

December 2, 2008
Historically, Google Blog Search has indexed blogs primarily via their RSS feeds, which meant that for blogs that published partial feeds, it indexed only the excerpt that appeared in the feed. Any links or text in the rest of the post weren’t available through Blog Search. (The full posts were indexed in web search, of course.)
The people behind the West Seattle Blog pointed out a change to this on Twitter.
According to Jeremy Hylton of the Google Blog Search team, they now index the full content of the page. This means that they index the full post even if the blog publishes a partial feed, and that they also index the non-post parts of the page, such as the sidebar. This is mostly an improvement, of course, but it’s causing some problems, particularly for people who have alerts set or who search for references to themselves, their sites, or their brands, when any of these are linked to in blogrolls.
The result is that any time a blog publishes a new post, Google Blog Search picks up the new page, including the sidebar details. So you may get an alert that there’s a new blog post about you, but when you go check it out, you find the post doesn’t even mention you!
Jeremy says:
We do expect to fix the problem you’re seeing. We’ll use the full page content, but exclude the content that isn’t really part of the post. I’m not sure if we’ll be able to make the change before the end of the year, but we are working on it and are pretty confident that it can be solved.
They’ll post once it’s been fixed.
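To make the difference concrete, here is a minimal Python sketch that compares matching a search term against the full page with matching it against the post body only. It is not Google’s implementation, just an illustration of the idea. The `post` class name, the sample page, and the `TextExtractor` and `mentions` names are all assumptions made up for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextExtractor(HTMLParser):
    """Collects all page text, and separately the text inside the post container."""

    def __init__(self, post_class="post"):
        super().__init__()
        self.post_class = post_class  # hypothetical class name marking the post body
        self.depth_in_post = 0        # greater than 0 while inside the post container
        self.all_text = []
        self.post_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth_in_post or self.post_class in classes:
            self.depth_in_post += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth_in_post:
            self.depth_in_post -= 1

    def handle_data(self, data):
        self.all_text.append(data)
        if self.depth_in_post:
            self.post_text.append(data)


# Made-up page: the brand appears only in the blogroll, not in the post.
PAGE = """
<html><body>
  <div class="post"><h2>Snow day</h2><p>Schools are closed today.</p></div>
  <div class="sidebar"><ul><li><a href="http://example.com/">Example Brand</a></li></ul></div>
</body></html>
"""


def mentions(term, html, post_only):
    """Return True if term appears in the page text (or only the post text)."""
    parser = TextExtractor()
    parser.feed(html)
    parts = parser.post_text if post_only else parser.all_text
    return term.lower() in " ".join(parts).lower()


full_page_hit = mentions("Example Brand", PAGE, post_only=False)
post_only_hit = mentions("Example Brand", PAGE, post_only=True)
print(full_page_hit, post_only_hit)
```

On this sample page, the full-page match fires because of the blogroll link, while the post-only match does not. Real blog templates mark up posts in many different ways, so identifying the post container reliably is the hard part.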
A funny aside: when searching to see where else this was being talked about, I came across this juxtaposition of results:


Yea, noticed this behavior about a month ago. Good you have confirmation. Posted my details over at http://www.seroundtable.com/archives/018624.html
I’ve experienced the problem that you’ve described (with sidebar content causing issues, etc). In the upgrade to Reputation Monitor that we’re launching soon, we’ve added a lot of behind-the-scenes processing to judge the relevance of a particular page to you/your search term – this should remove these false-positives that Google now returns.
And yes – interesting search results; Patrick does reap the benefits of being crawled about every two minutes.
I for one prefer full feeds – it’s better for the users, but clearly Google is not respecting the choices of the publishers.
But that’s just par for the course for Google (Google Books, YouTube . . . etc)
quadszilla, if there’s content that you don’t want to be indexed, robots.txt is great for that.
The problem now is Blog Search indexes some sites’ content better than Web search does but Google doesn’t cache it. So you find some juicy (but deleted) blog post in Blog Search, can’t access any cache for it, search for it on the Web, and still can’t find the cache because Web search never indexed it.
Why use Blog Search to index more thoroughly than Web search does if it doesn’t cache?
I’m seeing the opposite problem too, where the Web index for some sites is better than the Blog index for them is – Google just seems to be at loose ends with how best to organize information – and since that is what part of their business is about (most of it is about advertising, but still) you’d think they’d want to get their acts together on this.
Caveat: I’m speaking purely from a semantic viewpoint on this – technically I’m against caching or even web crawling by any search engine unless the owner of the content opts in – but that’s another post that I’m not replying to at the moment.
Pingback: Google Still Working On Making Blog Search More Relevant