Cloudy with a Chance of Malicious URLs
Wing Fei, Chia
It has often been said that the browser today is more complex and powerful than the operating system it runs on. The Internet is a learning resource, a communication medium and the ultimate entertainment centre, and the browser is the key to all of it. Yet the Internet is also addictive and outright dangerous, and the browser is the most popular route for infecting your computer.
Not only have search engine results been poisoned to lure users to malware, but legitimate high-traffic sites have also been compromised, either with malicious code or with third-party banner advertisements that direct the user to malware. At the same time, there are sites masquerading as your local bank to phish your personal details, and sites distributing server-side polymorphic malware that scan engines at times have difficulty keeping up with. Protecting the user from every single malicious URL out there is undeniably challenging, and may seem almost impossible. Whether blocking the malicious URL is the most effective solution is debatable, but it has certainly attracted everyone's attention these days, including that of testers and reviewers.
Because malicious URLs are complex and change rapidly over short periods of time, the benefit of real-time blocking is universal, regardless of the device and operating system used. Moreover, the in-the-cloud safety rating lookups performed for URLs are very small: for the size of an average virus definition database update, a client could perform thousands of lookups, and users are protected the instant a threat is identified.
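To illustrate why such lookups stay small, here is a minimal sketch of a client-side reputation check. All names here are hypothetical: the normalization rules, the 8-character hash prefix and the in-memory blocklist merely stand in for whatever wire format and cloud service a real product would use. The point is that the client sends a short fixed-size key rather than downloading a full definition database.

```python
import hashlib
from urllib.parse import urlsplit

def url_lookup_key(url: str) -> str:
    """Normalize a URL and hash it, so the client can query the cloud
    with a short fixed-size key instead of the full URL.
    (Lower-casing the path is a simplification for this sketch.)"""
    parts = urlsplit(url.lower())
    host = parts.hostname or ""
    path = parts.path or "/"
    canonical = f"{host}{path}"
    # Truncated hash keeps the lookup payload tiny.
    return hashlib.sha256(canonical.encode()).hexdigest()[:8]

# Hypothetical stand-in for the in-the-cloud reputation service.
BLOCKLIST = {url_lookup_key("http://evil.example.com/payload.exe")}

def is_malicious(url: str) -> bool:
    """A real client would send the key to the cloud service here."""
    return url_lookup_key(url) in BLOCKLIST
```

A few bytes per query is what makes real-time blocking practical on any device; the same URL in different cases or with a different scheme still normalizes to the same key.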
In this presentation, we will look at how effective it is to block malicious URLs and domains, wherever possible, rather than malicious files, and at the challenges of doing so, including the arrival of shortened URLs and internationalized domain names. How can we proactively crawl the entire Internet in search of these malicious URLs, when crawling is expensive, requires a great deal of hardware, is bandwidth-intensive, and demands knowing which parts of the Internet to concentrate on for our users? Or do we rely on each other, shifting the crawling responsibility to our users without compromising their privacy?
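One reason shortened URLs complicate blocking is that the visible link says nothing about its final destination. A common approach, sketched below under assumptions of my own (the `fetch_location` callable is hypothetical and injected so the example runs offline; in practice it would issue a HEAD request and read the `Location` header), is to resolve the redirect chain one hop at a time so each intermediate URL can be vetted against the reputation service before anything is fetched.

```python
def expand(url, fetch_location, max_hops=5):
    """Resolve a chain of shortened URLs hop by hop.

    fetch_location(url) returns the redirect target of one request,
    or None if the response is not a redirect. Capping the hop count
    and tracking visited URLs guards against redirect loops."""
    seen = {url}
    for _ in range(max_hops):
        target = fetch_location(url)
        if target is None or target in seen:
            return url  # final destination (or loop detected)
        seen.add(target)
        url = target  # each hop could be checked against the cloud here
    return url

# Offline demo: a fake redirect table standing in for real HEAD requests.
redirects = {
    "http://short.example/abc": "http://t.example/xyz",
    "http://t.example/xyz": "http://landing.example/page",
}
final = expand("http://short.example/abc", redirects.get)
```

The same hop-by-hop resolution also limits wasted bandwidth: a crawler can abandon a chain as soon as any intermediate hop is flagged, rather than downloading the final page.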