Sogou Search Engine Spider Smashing Websites

Was keeping an eye on our CPU usage on a newly provisioned VPS on which a part of AusGamers was recently transferred to and noticed a big, unusual spike in CPU usage:

Correlating this with another graph indicated it was something hitting our news or forum pages pretty hard, so I nabbed the Apache logs and quickly determined what it was – the “Sogou web spider”, hitting our front page twice a second, over and over again:

199.119.201.102 – – [13/Sep/2011:10:52:16 +1000] “GET / HTTP/1.0” 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+h
ttp://www.sogou.com/docs/help/webmasters.htm#07)”
199.119.201.102 – – [13/Sep/2011:10:52:16 +1000] “GET / HTTP/1.0” 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)”
199.119.201.102 – – [13/Sep/2011:10:52:17 +1000] “GET / HTTP/1.0” 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)”
199.119.201.102 – – [13/Sep/2011:10:52:17 +1000] “GET / HTTP/1.0” 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)”
199.119.201.102 – – [13/Sep/2011:10:52:17 +1000] “GET / HTTP/1.0” 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)”
199.119.201.102 – – [13/Sep/2011:10:52:18 +1000] “GET / HTTP/1.0” 301 233 “http://www.ausgamers.com” “Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)”

… and so on, for a total of 18,763 requests Eventually it moved on to our different pages, but I stopped counting.

The URL in our logs directs you to a Chinese language FAQ, which when run through the awesome translate feature in Chrome directs you to a form for which you can submit a complaint about “crawling too fast”. I did that (in English) and will be fascinated to see if I get a response.

In the meantime, we just blocked the IP address.

One thought on “Sogou Search Engine Spider Smashing Websites”

Leave a Reply

Your email address will not be published. Required fields are marked *


The reCAPTCHA verification period has expired. Please reload the page.