I probably spend far too much time trawling my logs looking for odd behavior. Now, I don't get much traffic here, but I'm a Yorkshireman, so I'm stingy with my bandwidth. I'll happily add search engines to my robots.txt, or block them at the firewall, if they get too greedy and crawl the site too often or don't obey robots.txt.

So when I noticed that a Russian search engine (a big one) was being a little aggressive and spoofing valid browser user agents, I dropped it at layer three, blocking its IP addresses outright.

A couple of hours later, I noticed something odd in the logs:

```
- - [25/Aug/2010:15:44:08 +0100] "GET / HTTP/1.0" 200 11190 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
- - [25/Aug/2010:15:44:08 +0100] "GET /system/themes/charcoal/style.css HTTP/1.0" 200 11755 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
- - [25/Aug/2010:15:44:09 +0100] "GET /system/themes/charcoal/ie.css HTTP/1.0" 200 1639 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
- - [25/Aug/2010:15:44:09 +0100] "GET /scripts/jquery.js HTTP/1.0" 200 56044 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
- - [25/Aug/2010:15:44:10 +0100] "GET /system/themes/charcoal/scripts/jquery.pngfix.js HTTP/1.0" 200 4290 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
- - [25/Aug/2010:15:44:11 +0100] "GET /system/themes/charcoal/scripts/fixpngs.js HTTP/1.0" 200 549 "" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
- - [25/Aug/2010:15:44:11 +0100] "GET /system/themes/charcoal/images/pwrd_habari.png HTTP/1.0" 200 3777 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
```

So, hours after I blocked a search engine that had been hitting me every day for months, I get my first ever hit with it as the referrer? OK, this _could_ be a coincidence.

Time to don the deerstalker!

I noticed that all the IPs were different, but it was clearly a single page load (I don't get anywhere near enough hits for this to be several visitors at the same time), so it was being tor'd somehow, i.e. bounced through Tor or something like it. A couple of whois queries later, I found that all of the IPs belonged to one ISP.

The section of the log above is all there was, nothing more. There were no requests for anything referenced in the CSS files (images and the like), which a real browser laying out the page would have fetched. So nothing was actually rendering the page, which means it probably wasn't really IE6 requesting it.
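The two tells above (one page load spread across several IPs, and stylesheets fetched without anything they reference) can be checked mechanically. Below is a minimal Python sketch, not the method I actually used. It assumes the Apache "combined" log format with client IPs intact, treats a pause of more than 5 seconds as the end of a page load, and needs a hand-maintained set of paths referenced from the CSS. The sample IPs (documentation ranges) and the image path are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Apache "combined" format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse(lines):
    """Yield (time, host, path, agent) for every line that matches LOG_RE."""
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            t = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
            yield t, m["host"], m["path"], m["agent"]


def page_loads(records, gap=timedelta(seconds=5)):
    """Group requests into page loads: same user agent, no pause longer than gap.

    The client IP is deliberately left out of the key, so one load spread
    across many addresses still ends up as a single group.
    """
    loads = []
    current = {}  # user agent -> the load currently being built for it
    for t, host, path, agent in sorted(records):
        load = current.get(agent)
        if load is None or t - load["last"] > gap:
            load = {"agent": agent, "hosts": set(), "paths": [], "last": t}
            loads.append(load)
            current[agent] = load
        load["hosts"].add(host)
        load["paths"].append(path)
        load["last"] = t
    return loads


def suspicious(load, css_assets):
    """Flag loads that use several IPs, or fetch CSS but nothing the CSS references."""
    fetched_css = any(p.endswith(".css") for p in load["paths"])
    fetched_assets = any(p in css_assets for p in load["paths"])
    return len(load["hosts"]) > 1 or (fetched_css and not fetched_assets)


if __name__ == "__main__":
    # Hypothetical log lines: documentation-range IPs, real paths from the post.
    ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    sample = [
        f'192.0.2.10 - - [25/Aug/2010:15:44:08 +0100] "GET / HTTP/1.0" 200 11190 "" "{ua}"',
        f'198.51.100.7 - - [25/Aug/2010:15:44:08 +0100] "GET /system/themes/charcoal/style.css HTTP/1.0" 200 11755 "" "{ua}"',
        f'203.0.113.42 - - [25/Aug/2010:15:44:09 +0100] "GET /scripts/jquery.js HTTP/1.0" 200 56044 "" "{ua}"',
    ]
    css_assets = {"/system/themes/charcoal/images/header.png"}  # hypothetical path
    for load in page_loads(parse(sample)):
        verdict = "suspicious" if suspicious(load, css_assets) else "ok"
        print(len(load["hosts"]), "IPs,", len(load["paths"]), "requests,", verdict)
```

Keying on the user agent rather than the IP is what lets a Tor-style spread of addresses collapse back into one visit; on a busier site you would want a tighter key. A returning visitor with a warm cache will also skip the CSS-referenced images, so a flag is a prompt to go and look, not a verdict.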

What was the alleged search term that brought the visitor *cough*BOT*cough* here? %e0 and friends are too high to be percent-encoded ASCII (ASCII stops at %7F), but they could be encoded Cyrillic. Given the source, I tried a Unicode Cyrillic text converter, which unsurprisingly came back with some Cyrillic text, and I then plugged that into Google Translate. The search term that led to the visit? "Arbitration manager".
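The same decoding can be done locally instead of through an online converter. This is a minimal Python sketch: it percent-decodes a string under a few candidate Cyrillic encodings and keeps the ones that decode cleanly. The candidate list is an assumption. One useful hint: in UTF-8, Cyrillic letters encode to byte pairs starting with %D0 or %D1, whereas in Windows-1251 (widely used on Russian sites at the time) %E0 is the letter "а". So %E0 in a Russian query string points towards Windows-1251 rather than UTF-8. The phrase in the example is hypothetical, not the real referrer, which isn't shown in the log above.

```python
from urllib.parse import quote, unquote

# Encodings worth trying on a Russian query string. Which one a given site
# actually used is an assumption; these were the common ones.
CANDIDATES = ("utf-8", "cp1251", "koi8-r")


def candidate_decodings(encoded: str) -> dict[str, str]:
    """Percent-decode `encoded` once per candidate encoding.

    Encodings whose decoder rejects the bytes are skipped. The survivors
    still need a human (or a dictionary) to pick the one that reads as words.
    """
    results = {}
    for enc in CANDIDATES:
        try:
            results[enc] = unquote(encoded, encoding=enc, errors="strict")
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this encoding
    return results


if __name__ == "__main__":
    # Hypothetical search phrase, encoded here purely for illustration.
    phrase = "арбитражный управляющий"
    encoded = quote(phrase, encoding="cp1251")
    print(encoded)  # Windows-1251 bytes, beginning with %E0 for "а"
    for enc, text in candidate_decodings(encoded).items():
        print(f"{enc:7} {text}")
```

UTF-8 rejects these bytes outright, but KOI8-R defines every byte value, so it happily turns them into gibberish. That is why the function returns every survivor instead of guessing.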

My conclusion from digging around? The search engine detects that its bot's trawling isn't having any luck anymore and wants to see whether this site is still active. So it fires up a connection through a localised Tor-style network (maybe all the exit nodes sit in its own address space) and grabs the root index.html with another bot. Why, I don't know. Nothing has been requested since, and it probably won't do them any good. Given this odd behavior, there are a few more rules in the firewall now ;)

