View Full Version : Yahoo excessive spiders
ErosOlmi
02-04-2008, 22:44
During the last month (or even more) the *.thinBasic.com web sites have seen a big increase in traffic.
Checking the web access stats I've found, just for the forum web site, bandwidth usage of more than 3 GBytes JUST FOR YAHOO SPIDERS >:( >:( >:(
See the attached image, which shows this site's spider stats just for March 2008. Slurp is Yahoo's spider.
I've sent a mail to Yahoo customer support asking them to please stop crawling our site in that way.
I've got a reply that they will check my request and answer within 24h. Hope so ???
In the meantime I've seen many posts on the internet complaining about excessive Yahoo spidering.
If you visit this forum during the day or night, you will see in the "OnLine" box on the left side the number of Yahoo spiders active on the site. Sometimes I can see more than 100 different Yahoo IPs all active at the same time.
I'm quite tired of this. We have limited bandwidth with our provider and I would not like to see it used this way.
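The bandwidth figure above comes from the web access stats. As a rough sketch of how such a per-spider number can be derived from IIS W3C logs (the field order and the sample log lines here are assumptions for illustration; real IIS logs declare their field layout in a #Fields: header line):

```python
from collections import defaultdict

# Assumed minimal W3C field order: date time c-ip cs(User-Agent) sc-bytes.
# IIS replaces spaces in the user-agent with '+', so it is a single token.
LOG_LINES = [
    "2008-03-01 10:00:01 74.6.17.51 Mozilla/5.0+(compatible;+Yahoo!+Slurp) 15320",
    "2008-03-01 10:00:02 66.249.66.1 Mozilla/5.0+(compatible;+Googlebot/2.1) 8210",
    "2008-03-01 10:00:03 74.6.17.52 Mozilla/5.0+(compatible;+Yahoo!+Slurp) 22100",
]

def bytes_per_agent(lines):
    """Sum response bytes (sc-bytes) per user-agent string."""
    totals = defaultdict(int)
    for line in lines:
        if line.startswith("#"):          # skip W3C header lines
            continue
        fields = line.split()
        agent, sc_bytes = fields[3], int(fields[4])
        totals[agent] += sc_bytes
    return totals

totals = bytes_per_agent(LOG_LINES)
slurp_total = sum(b for agent, b in totals.items() if "Slurp" in agent)
print(slurp_total)  # 37420
```

Run against a month of real logs, the same aggregation gives the per-crawler bandwidth totals shown in the stats.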
Just to let you know
Ciao
Eros
PS: I've already applied some changes to the Robots.txt file, but spiders (nowadays) just ignore it:
http://community.thinbasic.com/robots.txt
Michael Clease
02-04-2008, 22:57
Is there any chance that whoever hosts your site can do something about this spam traffic?
ErosOlmi
02-04-2008, 23:00
I suppose they are suffering from this problem like many others on the Net.
I think Yahoo is just trying to close the gap between themselves and Google. But this cannot be done using the resources of others.
In any case I will send my provider (http://hosting.aruba.it/?lang=EN) a mail.
Thanks
Eros
Michael Clease
02-04-2008, 23:06
I don't really know anything about these sorts of problems, but is .htaccess any good?
I found this http://webartistdesigns.com/block_bots_and_spiders.html
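For reference, the approach described on pages like that one boils down to an Apache rewrite rule that refuses requests by user-agent. A minimal sketch (mod_rewrite must be enabled; "Slurp" is the substring identifying Yahoo's crawler):

```apache
# Refuse (403 Forbidden) any request whose User-Agent contains "Slurp".
# Apache-only: this has no effect on IIS.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Slurp [NC]
RewriteRule .* - [F,L]
```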
ErosOlmi
02-04-2008, 23:11
.htaccess is for the Apache web server.
thinBasic is on a dedicated physical Win2003 server running IIS6.
Of course I could easily block Yahoo's IPs with Win2003 IP filtering, but I would not like to do that as option number 1 because:
as you can see, I would have to block almost an entire range of more than ... let's say a lot of IPs (http://ws.arin.net/whois/?queryinput=74.6.17.51)
and I would still like search engines to be able to find thinBasic.
Maybe I will leave this for option number ... 5 ;)
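Rather than blocking single addresses by hand, a small script can at least test whether a visitor falls into a crawler's allocated range. A minimal sketch in Python (the 74.6.0.0/16 range is an assumption based on the ARIN lookup of the sample IP above, not an authoritative list of Yahoo's ranges):

```python
import ipaddress

# Hypothetical crawler ranges for illustration only; real ranges should
# be taken from the registry (e.g. ARIN whois) for the crawler's IPs.
CRAWLER_RANGES = [ipaddress.ip_network("74.6.0.0/16")]

def is_crawler_ip(ip: str) -> bool:
    """Return True if the address falls inside any known crawler range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CRAWLER_RANGES)

print(is_crawler_ip("74.6.17.51"))   # True
print(is_crawler_ip("192.0.2.10"))   # False
```

The same containment test could drive a log filter or feed an IP-restriction list, which is why blocking "an entire range" is the only practical granularity here.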
Michael Clease
02-04-2008, 23:26
Thanks for the information. Sorry, I don't have any more ideas.
I am getting a new manager in 2 weeks who comes from an IT background, so I will be picking his brain about Server 2003 and its configuration. I just stumble around it at the moment, adding user accounts and email addresses / groups.
ErosOlmi
02-04-2008, 23:38
Oh, thanks Abraxas, but don't worry. I know how to do it the hard way ;)
I deal with these matters on a daily basis in my real-life job :D
I just would like to see a credible company like Yahoo do things the right way and not act like a bad boy.
Ciao
Eros
That is incredible Eros, good luck in getting it resolved, wow 3gb, Yikes Yahoo!
ErosOlmi
03-04-2008, 02:25
That's what I call support. I've got a nice reply from Yahoo :o
--- My request ---_________________________________________________
Subject: Reporting a Problem
Additional Information: Hi there. I manage a little web site at
www.thinbasic.com and community.thinbasic.com
Problem is that your spiders are visiting my site so frequently
that a lot of my limited bandwidth is consumed by them. For
about two weeks I have seen more than one hundred spiders present
almost at the same time, scanning all the posts in our forum.
I have limited bandwidth with my provider and my users suffer some
delays when using our sites.
I can understand your needs but please understand mine.
Hope someone will read this post and reply to me soon.
Thanks
Eros
While Viewing Form Name: http://help.yahoo.com/l/us/yahoo/search/search_support.html
---Yahoo reply ---_________________________________________________
Hello Eros,
Thank you for writing to Yahoo! Search.
We apologize for the excess traffic to your site. In a continuing effort
to improve our index freshness and comprehensiveness we are bringing
online additional crawler machines to prepare them for production.
To limit the amount of traffic your site receives, you can place a
crawl-delay in your robots.txt, specifying the number of seconds each
crawler will wait between each request. Please check out the following
pages for detailed information about this:
http://help.yahoo.com/help/us/ysearch/slurp/slurp-03.html
http://www.ysearchblog.com/archives/000078.html
Again, my apologies for the inconvenience and we appreciate your
patience as we're improving Yahoo! Search.
Thank you again for contacting Yahoo! Search.
Regards,
Yasmen
Yahoo! Search Customer Care
For assistance with all Yahoo! services, please visit: http://help.yahoo.com/
_________________________________________________
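Following Yahoo's suggestion, the change would be a Crawl-delay directive in robots.txt. A minimal sketch (the 30-second value is only an example; pick whatever delay fits the available bandwidth):

```text
# Ask Yahoo's crawler (Slurp) to wait between successive requests.
User-agent: Slurp
Crawl-delay: 30
```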
I will check what to do.
Ciao
Eros
Michael Hartlef
03-04-2008, 05:57
Thanks for the heads up. This really can get someone in trouble if they only have a small amount of free bandwidth.