Is Google out of control? |
sean
Moderator Group Sponsor Member Joined: 20 July 2005 Location: North Idaho Status: Offline Points: 7388 |
Posted: 13 Sep. 2011 at 4:34pm |
|||
Why is a "search robot" attempting to write replies to threads?
At any time of day, Google has 20-30, sometimes more, "search robots" perusing the forum. Not that I object to bots; they do serve a purpose, but IMO Google is way out of order here. Quite often, Google bots outnumber real people! Perhaps this is related to the "search time-outs" we've been experiencing? Google may just be eating up too much bandwidth. Even if not, Google has NO business trying to "write" replies! Russ, is there any way to limit the number of simultaneous connections Google (or any bot) can make? There are more "writes" than this photo shows; it's only what I could see on-screen without scrolling: |
||||
samcj2a
Member Sponsor Member x 5 Joined: 21 Oct. 2006 Location: Arlington, VA Status: Offline Points: 8549 |
|
|||
I've noticed the same thing. My best guess is that Google's web crawlers are simply following every link on every page, so they are also hitting the reply link in threads and the result is what we see.
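For what it's worth, the usual way to keep well-behaved crawlers off the reply link is a robots.txt rule. Below is a minimal Python sketch that checks such a rule with the standard-library `urllib.robotparser`. The paths (`/forum/new_reply_form.asp`, `/forum/forum_posts.asp`) and the `example.com` host are hypothetical placeholders, not this forum's actual URLs, and the rule only helps with bots that choose to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the path below is a placeholder for whatever
# URL the forum software uses for its "Post Reply" page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/new_reply_form.asp
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Reply form (hypothetical path) versus an ordinary topic page.
    print(is_allowed("Googlebot", "http://example.com/forum/new_reply_form.asp"))
    print(is_allowed("Googlebot", "http://example.com/forum/forum_posts.asp"))
```

A crawler that respects the rule would still index the topics themselves; it would just stop requesting the reply form.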
|
||||
sean
Moderator Group Sponsor Member Joined: 20 July 2005 Location: North Idaho Status: Offline Points: 7388 |
|
|||
None of the other bots do it (Bing, Yahoo, Baidu, plus a few others). Google certainly has the ability to know what NOT to follow.
The other bots don't abuse the privilege either; have never seen more than 2 simultaneous connections from any of them. If there's a way to limit Google, it should be so. Sean
|
||||
samcj2a
Member Sponsor Member x 5 Joined: 21 Oct. 2006 Location: Arlington, VA Status: Offline Points: 8549 |
|
|||
I agree that Google seems to be crawling more actively than the others. When I have a moment or you do, we can troll around the Web Wiz site and forum to find out what options are out there, if any.
|
||||
samcj2a
Member Sponsor Member x 5 Joined: 21 Oct. 2006 Location: Arlington, VA Status: Offline Points: 8549 |
|
|||
Well, one solution is to begin banning batches of Google robot IP addresses from the site, but I think a better solution is suggested by this snippet I found in response to a question about how to limit Google's impact on bandwidth:
". . . . I do not believe the reader’s intention was to exclude all of his Web pages from Google search engine results pages (SERPs). He just wants Google not to request pages from his server so often. Google actually has a Web page with this information and an email address. This is a direct quote from Google’s Webmaster FAQs page: 'Please send an email to googlebot@google.com with the name of your site and a detailed description of the problem. Please also include a portion of the weblog that shows Google accesses, so we can track down the problem more quickly on our end.'" |
||||
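The quoted FAQ asks for a portion of the weblog showing Google accesses. As a rough sketch of how that excerpt could be pulled out, the Python below filters log lines by crawler name and tallies hits per bot. It assumes only that each log line contains the crawler's user-agent string somewhere in it; the token list and function names are illustrative, and the real log format on the host is unknown.

```python
from collections import Counter

# User-agent substrings for the crawlers mentioned in this thread.
BOT_TOKENS = ("Googlebot", "bingbot", "Baiduspider", "Slurp")


def bot_lines(lines, token):
    """Return the log lines whose text mentions the given crawler token."""
    needle = token.lower()
    return [line for line in lines if needle in line.lower()]


def count_bot_hits(lines):
    """Count log lines per crawler token (first matching token wins)."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for token in BOT_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break
    return counts
```

Calling `bot_lines(open_log_file, "Googlebot")` on an opened log file would give the kind of excerpt the FAQ asks for, and `count_bot_hits` gives a quick sense of which crawler is busiest.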
russnj
Admin Webmaster Joined: 20 July 2005 Location: W. Windsor, NJ Status: Offline Points: 3943 |
|
|||
I really don't think it's a problem. Google has been doing it all along; it's just that with this version the active users are separated into categories.
And as far as bandwidth goes, we are not even at 40% of our monthly quotas. I am not aware of any way to limit the connections of a single bot, but there might be a way. One last thing: are you still getting timeouts since the update? |
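If the hosting setup allowed custom request handling (an assumption; nothing in this thread confirms it does), one generic way to cap simultaneous connections per bot is to count in-flight requests per crawler and refuse the extras. The sketch below shows the idea in Python. The class name, the cap of 2, and the user-agent tokens are illustrative choices, not features of the forum software.

```python
import threading
from collections import defaultdict

# User-agent substrings for the crawlers mentioned in this thread.
BOT_TOKENS = ("googlebot", "bingbot", "baiduspider", "slurp")


def identify_bot(user_agent):
    """Return the matching crawler token, or None for ordinary visitors."""
    lowered = user_agent.lower()
    for token in BOT_TOKENS:
        if token in lowered:
            return token
    return None


class BotConnectionLimiter:
    """Tracks in-flight requests per crawler and enforces a per-bot cap."""

    def __init__(self, max_per_bot=2):
        self.max_per_bot = max_per_bot
        self._active = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, bot):
        """Reserve a slot for this bot; return False if it is at the cap."""
        with self._lock:
            if self._active[bot] >= self.max_per_bot:
                return False
            self._active[bot] += 1
            return True

    def release(self, bot):
        """Free a slot once the bot's request has finished."""
        with self._lock:
            if self._active[bot] > 0:
                self._active[bot] -= 1
```

A request handler would call `try_acquire` before serving a crawler, answer with something like HTTP 503 when it returns `False`, and call `release` when the response is done. Real visitors (where `identify_bot` returns `None`) would bypass the limiter entirely.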
||||
43 MB, 48 CJ2A, 50 CJ3A, 55 M38A1, 56 CJ5, 79 M151A2, M100 ,65 M416 |
||||
kilroy
Member Joined: 14 Dec. 2009 Location: Iowa Status: Offline Points: 2096 |
|
|||
Yes.
|
||||
"You know, I'm too old a bunny to get very excited about all this."
General H.E. von Salmuth Commander German Fourth Army France: 5 June 1944 1947 Willys CJ2A 1947 Bantam T3-C Trailer |
||||
samcj2a
Member Sponsor Member x 5 Joined: 21 Oct. 2006 Location: Arlington, VA Status: Offline Points: 8549 |
|
|||
Overall, it is much better since the upgrade, but there are still some issues.
|
||||
russnj
Admin Webmaster Joined: 20 July 2005 Location: W. Windsor, NJ Status: Offline Points: 3943 |
|
|||
Looks like using the advanced search and selecting "show results as topics" causes a timeout almost every time for me. Is that the case for the rest of you?
|
||||
43 MB, 48 CJ2A, 50 CJ3A, 55 M38A1, 56 CJ5, 79 M151A2, M100 ,65 M416 |
||||
samcj2a
Member Sponsor Member x 5 Joined: 21 Oct. 2006 Location: Arlington, VA Status: Offline Points: 8549 |
|
|||
I think that I almost always choose the results as posts rather than topics. In that case, I find that the likelihood of the search timing out is strongly correlated with the sparseness of the results. In other words, if the search must go through the entire database to find the results, then the timeout is most likely to occur. If you pick an absolutely common term such as "Willys", it will find 100 results very quickly and not time out. I doubt that you ever get a timeout if you are not using the Advanced Search because, in that case, you are searching only within a single forum.
I'm pretty sure that, with the current size of the forum (number of posts), any search that requires going through most or all of the database will time out with the default setting of 60 seconds. If there is not an option to completely index the text of every post, I think that the only solution is to increase the length of the timeout period. I doubt that choosing "topics" vs. "posts" as the output will have any impact because the search will require the same amount of time.
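To illustrate the reasoning above, here is a small Python sketch contrasting a scan that stops at a result limit with a lookup in a prebuilt word index. It is a toy model under stated assumptions: how the forum software actually searches its database is not known from this thread, and the function names and the in-memory `posts` dictionary are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def scan_search(posts, term, limit=100):
    """Scan posts in order; return (matching ids, number of posts examined).

    A common term reaches the limit early and stops; a rare or absent
    term forces the scan through every post.
    """
    term = term.lower()
    hits = []
    examined = 0
    for post_id, text in posts.items():
        examined += 1
        if term in tokenize(text):
            hits.append(post_id)
            if len(hits) >= limit:
                break
    return hits, examined


def build_index(posts):
    """Build a word -> set of post ids index once, ahead of any search."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in tokenize(text):
            index[word].add(post_id)
    return index


def index_search(index, term, limit=100):
    """Look the term up directly; cost does not depend on how rare it is."""
    return sorted(index.get(term.lower(), ()))[:limit]
```

In this model, a common word fills the result limit after examining only a few posts, while a word that appears nowhere makes the scan examine everything, which matches the pattern of timeouts on sparse results. With an index, both cases are a single lookup.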
|
||||
samcj2a
Member Sponsor Member x 5 Joined: 21 Oct. 2006 Location: Arlington, VA Status: Offline Points: 8549 |
|
|||
At the moment, I'm having a hard time making it time out even when I pick terms that don't exist in the posts!
Edited by samcj2a - 15 Sep. 2011 at 10:07pm |
||||
sean
Moderator Group Sponsor Member Joined: 20 July 2005 Location: North Idaho Status: Offline Points: 7388 |
|
|||
I still get timeouts on 3-word "phrase" searches, but overall, it's working fairly well again.
|
||||
sean
Moderator Group Sponsor Member Joined: 20 July 2005 Location: North Idaho Status: Offline Points: 7388 |
|
|||
Dragging this back kicking and screaming. Earlier I had said:
.. but the tables have turned now. "Baidu" and "Bing" now comprise 90-95% of the bots. "Active user" stats from earlier this morning:
Nearly twice as many bots as real people! Of the 40 bots, the split was about 70%/30% Baidu/Bing. And this:
5 different Baidu bots on the same topic? Plus others. "Baidu" is a Chinese-language search engine (aka the "Chinese" Google), and is getting lots of bad press from other sites, reportedly even ignoring "robots.txt" rules.
Russ, this is not about monthly data quotas, but impacts server responsiveness. The more of these bots there are slamming the site, the slower the server is to "serve up" pages to real users. What happened w/Google? Were you able to limit the connections? Can the same be done w/Baidu & Bing? Sean |
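On the question of slowing down Bing and Baidu: robots.txt has a nonstandard `Crawl-delay` directive that some crawlers honor and others ignore (Google's crawler, for one, does not use it), alongside the ordinary `Disallow` rule for shutting a bot out completely. The Python sketch below parses a hypothetical robots.txt with the standard-library `urllib.robotparser` and reads back both settings. The 10-second delay and the `example.com` URL are illustrative values, and a crawler that ignores robots.txt would be unaffected by any of this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow one crawler down, shut another out.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: Baiduspider
Disallow: /
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    # Requested delay in seconds between fetches, or None if not set.
    print(rules.crawl_delay("bingbot"))
    # Whether each crawler may fetch a forum page at all.
    print(rules.can_fetch("Baiduspider", "http://example.com/forum/"))
    print(rules.can_fetch("bingbot", "http://example.com/forum/"))
```

`crawl_delay` needs Python 3.6 or later. For a bot that disregards these rules, the remaining option is blocking on the server side, by IP address or user agent.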
||||
russnj
Admin Webmaster Joined: 20 July 2005 Location: W. Windsor, NJ Status: Offline Points: 3943 |
|
|||
I can see blocking the Chinese spider, since we have no members from China that I am aware of, besides spammers.
But really, overall we are not even close to putting a strain on the servers with the number of users that come to the forum. As you can see, the most in the last three months was 95 users on at the same time; this software/server setup is designed to handle users in the hundreds. I can block the crawlers, but I would not expect to see any difference in the speed of the forum. It has more to do with who else is on the server and how efficiently the host manages the spikes in traffic that come along. |
||||
43 MB, 48 CJ2A, 50 CJ3A, 55 M38A1, 56 CJ5, 79 M151A2, M100 ,65 M416 |
||||
sean
Moderator Group Sponsor Member Joined: 20 July 2005 Location: North Idaho Status: Offline Points: 7388 |
|
|||
Russ:
I don't know how many servers webwiz has, but they certainly host many other forums besides us. If the bots are hitting those as hard as they're hitting us, it all adds up. I have no way to gauge the impact these bots cause. But every connection requires CPU time, the more connections, the slower the response, to greater or lesser degree. In the past, "page generated" times were typically under .100 seconds. They are now typically over .300 seconds, sometimes more. I won't belabor the point. Just my personal pet peeve with these obnoxious search engines. Sean
|
||||
sean
Moderator Group Sponsor Member Joined: 20 July 2005 Location: North Idaho Status: Offline Points: 7388 |
|
|||
BAIDU is gone! Hurrah!
|
||||
samcj2a
Member Sponsor Member x 5 Joined: 21 Oct. 2006 Location: Arlington, VA Status: Offline Points: 8549 |
|
|||
I agree that it seemed like it was overstaying its welcome, and it did not seem that much good for mankind was being done by having a Chinese-language search engine index the forum exhaustively.
|
||||
samcj2a
Member Sponsor Member x 5 Joined: 21 Oct. 2006 Location: Arlington, VA Status: Offline Points: 8549 |
|
|||
I just tried several searches and think things are much improved except that it would be nice if Russ would lift the 100 result limit again. The searches where there are many positive results are returned in just a few seconds. Those with several words, not all of which are satisfied, are returned in fairly short order too. The longest search that I did was where there were three words, but at least one was obscure but present with the others at least a few times. Even that one did not time out. I have to attribute it to the absence of Baidu, although we'll have to see what happens over time.
|
||||