While the 'bad boys' on the hit TV show COPS and bad bots may only have a few things in common, one main difference stands between the two.
On COPS, you know exactly who the criminals are; they look, walk, talk and act like criminals. But bad bots? They are sneakier than the criminals you see on COPS. These bad bots often look like the good bots you want on your site. It can become incredibly confusing.
As a result, most website owners will do one of two things when it comes to bots: either not enough, or too much. Some website owners won’t do anything about bots, ultimately granting the bad bots unfettered access right alongside the good bots.
On the flip side, after reading horror stories about what malicious bots can do to a website, other owners react by putting every possible precaution in place; they block both good and bad bots, and in doing so hurt things like search engine rankings. Neither approach actually solves the problem at hand, so where does this leave us?
You’re excused if you’ve been thinking all bots are bad bots, the kind of malicious creepy-crawlies you don’t want on your website. It’s true that bots don’t get a ton of good press, although a lot of them should. Many of the bots that roam your website come courtesy of search engines. Googlebots are some of the most sought-after bots around because they do exactly what you’d assume: crawl your website to index its content and add it to Google search results.
If you were a hacker designing a bot to deploy on websites for malicious purposes, and you wanted that bot to roam everywhere without raising red flags, how would you disguise it? You create an imposter. For every 24 visits a website gets from Googlebots, one of those Googlebots will be a fake.
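The disguise works because the user-agent string a bot sends is entirely self-reported. As a purely illustrative sketch (the URL is just a placeholder), here is how little effort it takes to make an ordinary script announce itself as Googlebot:

```python
from urllib.request import Request, urlopen

# A hypothetical illustration: any script can copy Google's published
# user-agent string, so the header alone proves nothing about the visitor.
fake_googlebot = Request(
    "https://example.com/",
    headers={
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                      "+http://www.google.com/bot.html)"
    },
)

with urlopen(fake_googlebot) as response:
    # To the site, this request looks like an ordinary Googlebot visit.
    print(response.status)
```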
In an analysis of over 50 million fake Googlebot sessions on websites, it was found that 34.3% of all fake Googlebots were explicitly malicious. Around a quarter (24%) were used to launch DDoS attacks while just over 5% were used to scrape content, typically resulting in content being reused on other websites and potentially leading to SEO damage.
To distinguish legitimate bots from malicious ones, your website has to look past what a bot appears to be and examine the details under the surface. What is the bot’s classification or type? What is its user-agent string? What is its IP? Does that IP match the source it claims to come from? Is the bot coming from a region of the world known for malicious bot activity?
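One well-known way to answer the “does the IP match the source it claims to come from” question is a reverse-DNS lookup followed by a forward-DNS confirmation, the same approach Google documents for verifying real Googlebots. Below is a minimal Python sketch using only the standard library; the sample IP is just an example, and production code would also cache results:

```python
import socket

def is_verified_googlebot(ip_address: str) -> bool:
    """Check whether an IP that claims to be Googlebot really belongs to Google.

    1. Reverse-resolve the IP to a hostname.
    2. Confirm the hostname ends in googlebot.com or google.com.
    3. Forward-resolve that hostname and confirm it maps back to the same IP.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip_address)
    except socket.herror:
        return False  # no reverse DNS record at all

    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False

    try:
        _, _, forward_ips = socket.gethostbyname_ex(hostname)
    except socket.gaierror:
        return False

    return ip_address in forward_ips


# A session whose user-agent says "Googlebot" but whose IP fails this check
# is a strong candidate for the fake-Googlebot category.
print(is_verified_googlebot("66.249.66.1"))  # sample IP; output depends on DNS
```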
Some commonly used methods include geographic IP filtering and configuring the robots.txt file to allow the bots you want while shutting out the rest. These methods are far from perfect, though: bad bots won’t bother fetching the robots.txt file and will simply ignore its rules, crawling wherever they please.
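For example, a bare-bones robots.txt that welcomes Googlebot and turns every other crawler away might look like this:

```
# Allow Google's crawler everywhere
User-agent: Googlebot
Disallow:

# Shut out every other bot
User-agent: *
Disallow: /
```

The catch is that compliance is purely voluntary, which is exactly why this can only ever be a first line of defense.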
Since those methods aren’t foolproof, what method is? What you (or your security professional) need to do is enforce security policies on your website that shut out bots known to be malicious, whether by originating IP, geographic location, known suspicious user-agent strings, or another factor.
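As a rough illustration (not any particular product’s implementation), such a policy check might boil down to something like the sketch below, where the blocklists and the country code are hypothetical placeholders supplied by your own threat intelligence and GeoIP lookup:

```python
# Illustrative placeholders only; real deployments draw these from
# continuously updated threat-intelligence feeds.
BLOCKED_IPS = {"203.0.113.7"}                       # known-bad addresses
BLOCKED_COUNTRIES = {"XX"}                          # placeholder country codes
SUSPICIOUS_AGENT_FRAGMENTS = ("sqlmap", "masscan")  # example suspicious markers


def should_block(ip: str, user_agent: str, country_code: str) -> bool:
    """Return True if the visitor matches any known-bad signal."""
    if ip in BLOCKED_IPS:
        return True
    if country_code in BLOCKED_COUNTRIES:
        return True
    ua = user_agent.lower()
    return any(fragment in ua for fragment in SUSPICIOUS_AGENT_FRAGMENTS)


# Example use inside a request handler:
if should_block("203.0.113.7", "sqlmap/1.7", "XX"):
    pass  # respond with 403 instead of serving the page
```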
How do you know a bot is malicious? As in so many other avenues of life, having lots of information is key. The more the internet security community learns and, more importantly, shares about bad bots, the safer everyone’s website will be. A directory called BotoPedia compiles information about good and bad bots, and it is intended to provide a much clearer understanding of the bots you see crawling your website.
Now, how well does it work out for the criminals on COPS when they hide in garbage cans to avoid getting caught? Not well. Bad bots will keep trying to disguise themselves, much like those criminals holding their posts in the trash, but with the infosec community coming together, these bad bots will soon be taken out with the trash as well.