Historically, Google Blog Search has indexed blogs primarily through their RSS feeds, so for blogs that published partial feeds, it indexed only the partial portion. Any links or text in the rest of the post weren’t available through Blog Search. (The full posts were indexed in web search, of course.)
The people behind the West Seattle Blog pointed out a change to this on Twitter.
According to Jeremy Hylton of the Google Blog Search team, they now index the full content of the page. This means they index the full post even if the blog publishes a partial feed, and they also index the non-post parts of the page. This is mostly an improvement, of course, but it’s causing some problems, particularly for people who set alerts or run searches for references to themselves, their sites, or their brands, when any of these appear as links in blogrolls.
The result is that anytime a blog publishes a new post, Google Blog Search picks up the new page, including the sidebar details. So you may get an alert that there’s a new blog post about you, but when you go check it out, you find the post doesn’t even mention you!
Jeremy says:
We do expect to fix the problem you’re seeing. We’ll use the full page content, but exclude the content that isn’t really part of the post. I’m not sure if we’ll be able to make the change before the end of the year, but we are working on it and are pretty confident that it can be solved.
They’ll post once it’s been fixed.
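To make the idea of "use the full page content, but exclude the content that isn’t really part of the post" concrete, here is a rough sketch in Python. It is not how Google Blog Search works; it only illustrates the general approach, and it assumes (hypothetically) that the post body sits in an element with the class "post" while the blogroll sits in a separate "sidebar" element. Real blog templates vary widely, and this sketch also assumes well-formed HTML where every non-void tag is closed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PostTextExtractor(HTMLParser):
    """Collects text only from inside elements whose class list contains post_class.

    The class name "post" is a hypothetical convention, not a standard.
    """

    def __init__(self, post_class="post"):
        super().__init__()
        self.post_class = post_class
        self.depth = 0    # nesting depth inside the post container; 0 means outside
        self.chunks = []  # text fragments found inside the post container

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            # Already inside the post: track nesting so we know when it ends.
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.post_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while inside the post container.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_post_text(html, post_class="post"):
    """Return the text of the post container, ignoring sidebars and other page chrome."""
    parser = PostTextExtractor(post_class)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# A made-up page: the brand appears only in the blogroll, not in the post.
PAGE = """
<html><body>
<div class="post"><h2>New coffee shop opens</h2><p>It opened on <b>Monday</b>.</p></div>
<div class="sidebar"><ul><li><a href="http://example.com">Example Brand</a></li></ul></div>
</body></html>
"""

text = extract_post_text(PAGE)
print("Example Brand" in text)  # the blogroll link is not part of the indexed text
```

With filtering along these lines, an alert for "Example Brand" would no longer fire just because the blog published a new post with that link in its sidebar.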
A funny aside: when searching to see where else this was being talked about, I came across this juxtaposition of results:
Tags: RSS




December 3rd, 2008 at 4:20 am
Yea, noticed this behavior about a month ago. Good you have confirmation. Posted my details over at http://www.seroundtable.com/archives/018624.html