<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Increasing Search Indexing Coverage With an XML Sitemap</title>
	<atom:link href="http://www.ninebyblue.com/blog/increasing-search-indexing-coverage-with-an-xml-sitemap/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ninebyblue.com/blog/increasing-search-indexing-coverage-with-an-xml-sitemap/</link>
	<description>by Vanessa Fox</description>
	<lastBuildDate>Wed, 14 Oct 2009 20:07:43 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: cesar</title>
		<link>http://www.ninebyblue.com/blog/increasing-search-indexing-coverage-with-an-xml-sitemap/comment-page-1/#comment-2652</link>
		<dc:creator>cesar</dc:creator>
		<pubDate>Wed, 19 Nov 2008 06:26:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.ninebyblue.com/?p=260#comment-2652</guid>
		<description>test comment</description>
		<content:encoded><![CDATA[<p>test comment</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard McLaughin</title>
		<link>http://www.ninebyblue.com/blog/increasing-search-indexing-coverage-with-an-xml-sitemap/comment-page-1/#comment-2635</link>
		<dc:creator>Richard McLaughin</dc:creator>
		<pubDate>Mon, 20 Oct 2008 16:48:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.ninebyblue.com/?p=260#comment-2635</guid>
		<description>(whine) I still find a lot of pages that are in my XML file that Google has yet to find.

Great post.</description>
		<content:encoded><![CDATA[<p>(whine) I still find a lot of pages that are in my XML file that Google has yet to find.</p>
<p>Great post.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Guardian</title>
		<link>http://www.ninebyblue.com/blog/increasing-search-indexing-coverage-with-an-xml-sitemap/comment-page-1/#comment-2650</link>
		<dc:creator>Guardian</dc:creator>
		<pubDate>Sat, 18 Oct 2008 06:44:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.ninebyblue.com/?p=260#comment-2650</guid>
		<description>Hey Vanessa,

A master stroke: an informative and resourceful post. I am almost a newbie to search, but I have a great ability to locate the right sources in the search industry, and you are among them. I have been following your RSS feeds as well as your blog. You will be happy to know that I have gained much knowledge about search just by following your blog as well as Danny&#039;s Daggle.</description>
		<content:encoded><![CDATA[<p>Hey Vanessa,</p>
<p>A master stroke: an informative and resourceful post. I am almost a newbie to search, but I have a great ability to locate the right sources in the search industry, and you are among them. I have been following your RSS feeds as well as your blog. You will be happy to know that I have gained much knowledge about search just by following your blog as well as Danny&#8217;s Daggle.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Atwood</title>
		<link>http://www.ninebyblue.com/blog/increasing-search-indexing-coverage-with-an-xml-sitemap/comment-page-1/#comment-2649</link>
		<dc:creator>Jeff Atwood</dc:creator>
		<pubDate>Fri, 17 Oct 2008 07:57:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.ninebyblue.com/?p=260#comment-2649</guid>
		<description>Hi Vanessa,

I entered a comment reply to this blog entry but it hasn&#039;t been posted yet. Did I mess up, or was it eaten by spam filters somehow?

At any rate, I just wanted to thank you for the great and informative blog entry.

Jeff</description>
		<content:encoded><![CDATA[<p>Hi Vanessa,</p>
<p>I entered a comment reply to this blog entry but it hasn&#8217;t been posted yet. Did I mess up, or was it eaten by spam filters somehow?</p>
<p>At any rate, I just wanted to thank you for the great and informative blog entry.</p>
<p>Jeff</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Atwood</title>
		<link>http://www.ninebyblue.com/blog/increasing-search-indexing-coverage-with-an-xml-sitemap/comment-page-1/#comment-2648</link>
		<dc:creator>Jeff Atwood</dc:creator>
		<pubDate>Wed, 15 Oct 2008 07:33:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.ninebyblue.com/?p=260#comment-2648</guid>
		<description>Hi Vanessa,

Great article! And thanks for Google Webmaster Tools &amp; sitemaps.org; they&#039;re both fantastic resources.

One clarification, however:

&quot;the problem likely isn’t with the dynamic nature of the URLs themselves. The issue is probably that the internal linking structure doesn’t provide links to every single page. Since Googlebot crawls the web by following links, it wouldn’t know about the unlinked URLs&quot;

I don&#039;t think this is true; every single question in the system can be reached through a direct hyperlink *IF* you follow the pagination links, as I mentioned in my article:

http://stackoverflow.com/questions
http://stackoverflow.com/questions?page=2
http://stackoverflow.com/questions?page=3
..
http://stackoverflow.com/questions?page=931

The problem, from our perspective, is that Googlebot simply wasn&#039;t doing that.

Blogs are a simpler case because all the archive pages are generally in the form:

http://myblog/archives/2008-06
http://myblog/archives/2008-07

etcetera. This led us to believe, based on observed behavior, that Googlebot couldn&#039;t follow our pagination links.

However, in retrospect sitemap.xml is probably a more *efficient* way for Googlebot (and any other search engine) to discover the URL of each question in Stack Overflow. No page loads are incurred on the server, no extra parsing of meaningless (to searchbots) markup, and so forth.</description>
		<content:encoded><![CDATA[<p>Hi Vanessa,</p>
<p>Great article! And thanks for Google Webmaster Tools &amp; sitemaps.org; they&#8217;re both fantastic resources.</p>
<p>One clarification, however:</p>
<p>&#8220;the problem likely isn’t with the dynamic nature of the URLs themselves. The issue is probably that the internal linking structure doesn’t provide links to every single page. Since Googlebot crawls the web by following links, it wouldn’t know about the unlinked URLs&#8221;</p>
<p>I don&#8217;t think this is true; every single question in the system can be reached through a direct hyperlink *IF* you follow the pagination links, as I mentioned in my article:</p>
<p><a href="http://stackoverflow.com/questions" rel="nofollow" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions?referer=');">http://stackoverflow.com/questions</a><br />
<a href="http://stackoverflow.com/questions?page=2" rel="nofollow" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions?page=2&amp;referer=');">http://stackoverflow.com/questions?page=2</a><br />
<a href="http://stackoverflow.com/questions?page=3" rel="nofollow" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions?page=3&amp;referer=');">http://stackoverflow.com/questions?page=3</a><br />
..<br />
<a href="http://stackoverflow.com/questions?page=931" rel="nofollow" onclick="pageTracker._trackPageview('/outgoing/stackoverflow.com/questions?page=931&amp;referer=');">http://stackoverflow.com/questions?page=931</a></p>
<p>The problem, from our perspective, is that Googlebot simply wasn&#8217;t doing that.</p>
<p>Blogs are a simpler case because all the archive pages are generally in the form:</p>
<p><a href="http://myblog/archives/2008-06" rel="nofollow" onclick="pageTracker._trackPageview('/outgoing/myblog/archives/2008-06?referer=');">http://myblog/archives/2008-06</a><br />
<a href="http://myblog/archives/2008-07" rel="nofollow" onclick="pageTracker._trackPageview('/outgoing/myblog/archives/2008-07?referer=');">http://myblog/archives/2008-07</a></p>
<p>etcetera. This led us to believe, based on observed behavior, that Googlebot couldn&#8217;t follow our pagination links.</p>
<p>However, in retrospect sitemap.xml is probably a more *efficient* way for Googlebot (and any other search engine) to discover the URL of each question in Stack Overflow. No page loads are incurred on the server, no extra parsing of meaningless (to searchbots) markup, and so forth.</p>
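<p>As a rough illustration of that point, a minimal sitemap.xml in the sitemaps.org format could list each question URL directly. This is only a sketch: the two question URLs below are hypothetical placeholders, not real Stack Overflow paths.</p>
<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"&gt;
  &lt;!-- One url entry per question; only loc is required --&gt;
  &lt;url&gt;
    &lt;loc&gt;http://stackoverflow.com/questions/1/example-question&lt;/loc&gt;
  &lt;/url&gt;
  &lt;url&gt;
    &lt;loc&gt;http://stackoverflow.com/questions/2/another-example-question&lt;/loc&gt;
  &lt;/url&gt;
&lt;/urlset&gt;</code></pre>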
]]></content:encoded>
	</item>
	<item>
		<title>By: Brent D. Payne</title>
		<link>http://www.ninebyblue.com/blog/increasing-search-indexing-coverage-with-an-xml-sitemap/comment-page-1/#comment-2645</link>
		<dc:creator>Brent D. Payne</dc:creator>
		<pubDate>Wed, 15 Oct 2008 04:42:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.ninebyblue.com/?p=260#comment-2645</guid>
		<description>Nice post Vanessa. Interesting how people interact a lot for a while and then not so much for a long while. Hopefully this is coming up on a time where we&#039;ll start doing more interacting naturally again. ;-)

P.S. I&#039;m on season 4 of Buffy. I&#039;m catching up to get the sub-culture of search--that you created.</description>
		<content:encoded><![CDATA[<p>Nice post Vanessa. Interesting how people interact a lot for a while and then not so much for a long while. Hopefully this is coming up on a time where we&#8217;ll start doing more interacting naturally again. <img src='http://www.ninebyblue.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>P.S. I&#8217;m on season 4 of Buffy. I&#8217;m catching up to get the sub-culture of search&#8211;that you created.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vanessa</title>
		<link>http://www.ninebyblue.com/blog/increasing-search-indexing-coverage-with-an-xml-sitemap/comment-page-1/#comment-2643</link>
		<dc:creator>Vanessa</dc:creator>
		<pubDate>Tue, 14 Oct 2008 07:31:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.ninebyblue.com/?p=260#comment-2643</guid>
		<description>There are several ways people approach what to put in Sitemaps:

-Put the important pages in the Sitemap. This method is a good one if it&#039;s problematic to put all pages in the Sitemap. The point of the Sitemap is to let search engines know more about your site, particularly about the pages of your site, and this approach tells the search engines about the pages you care about most. That should give search engines a signal that, all other things being equal, these pages are the ones you care about. (Of course, all signals normally aren&#039;t equal, so instead this will be one signal balanced among many, but the same idea holds.) So, that&#039;s a solid approach.

-Put the non-indexed pages in the Sitemap. The idea behind this method is that search engines already know about the rest of your site, so you&#039;re just making sure they know about these as well. This may seem the opposite of the first approach. After all, if from the first approach search engines should get a signal that the pages in the Sitemap are most important, then wouldn&#039;t the search engines use that same signal for this set of URLs, when really they might be the least important (hence the non-indexing)? It may seem that way, but actually that&#039;s not the case. Since search engines use the Sitemap as one of many signals, what you&#039;re really saying with URLs in a Sitemap is, &quot;Hey, search engine! Pay attention to these pages!&quot; It generally won&#039;t cause the search engine to then forsake all other signals that caused indexing of the other pages. It will just focus some extra attention on these. A Sitemap comes into play the most in the crawling process. So, if some pages aren&#039;t indexed, it makes sense to make sure the search engines know about them so they can crawl them.

-Put a comprehensive list of URLs in the Sitemap. This is my preferred approach when it&#039;s technically practical. Why not tell search engines what the definitive list of pages on your site is? Why limit it to really important ones? One benefit to this is that there&#039;s at least one place other than crawling that Sitemaps can be helpful, and that&#039;s canonicalization. If a search engine has detected that several URLs display the same page, the version of the URL that&#039;s in the Sitemap is a signal as to which is the canonical version.

In reality, any of these approaches is a good one. Sitemaps enable the site owner to have a voice in the long list of signals that search engines use to crawl and index pages. Since they&#039;re a signal and not a directive, they don&#039;t correlate to just one option. The signal tells the search engines that you care about their crawlers taking a look at these pages, and many times, they then do.

I imagine that each search engine uses the Sitemap signals slightly differently, since after all, each search engine has different crawling and indexing algorithms. However, I do think that it would be useful for the search engines to come together and let us know how exactly they use them and how they differ in using them. In particular, it would be very helpful if, as part of sitemaps.org, they got together and made sure they weren&#039;t using Sitemaps for opposing purposes. You don&#039;t want to have a shared standard that is used so differently that if a site owner compiles a Sitemap in a particular way, it helps with one search engine and hurts with another.

When I worked on the sitemaps.org collaboration, it was all about figuring out what the standard should be and coming together to support it. Now that all the major engines do, I think the next step is sorting out more details about how they&#039;re used (particularly since the search engines should now have lots of data about how they can best be used) and giving site owners best practices.</description>
		<content:encoded><![CDATA[<p>There are several ways people approach what to put in Sitemaps:</p>
<p>-Put the important pages in the Sitemap. This method is a good one if it&#8217;s problematic to put all pages in the Sitemap. The point of the Sitemap is to let search engines know more about your site, particularly about the pages of your site, and this approach tells the search engines about the pages you care about most. That should give search engines a signal that, all other things being equal, these pages are the ones you care about. (Of course, all signals normally aren&#8217;t equal, so instead this will be one signal balanced among many, but the same idea holds.) So, that&#8217;s a solid approach.</p>
<p>-Put the non-indexed pages in the Sitemap. The idea behind this method is that search engines already know about the rest of your site, so you&#8217;re just making sure they know about these as well. This may seem the opposite of the first approach. After all, if from the first approach search engines should get a signal that the pages in the Sitemap are most important, then wouldn&#8217;t the search engines use that same signal for this set of URLs, when really they might be the least important (hence the non-indexing)? It may seem that way, but actually that&#8217;s not the case. Since search engines use the Sitemap as one of many signals, what you&#8217;re really saying with URLs in a Sitemap is, &#8220;Hey, search engine! Pay attention to these pages!&#8221; It generally won&#8217;t cause the search engine to then forsake all other signals that caused indexing of the other pages. It will just focus some extra attention on these. A Sitemap comes into play the most in the crawling process. So, if some pages aren&#8217;t indexed, it makes sense to make sure the search engines know about them so they can crawl them.</p>
<p>-Put a comprehensive list of URLs in the Sitemap. This is my preferred approach when it&#8217;s technically practical. Why not tell search engines what the definitive list of pages on your site is? Why limit it to really important ones? One benefit to this is that there&#8217;s at least one place other than crawling that Sitemaps can be helpful, and that&#8217;s canonicalization. If a search engine has detected that several URLs display the same page, the version of the URL that&#8217;s in the Sitemap is a signal as to which is the canonical version.</p>
<p>In reality, any of these approaches is a good one. Sitemaps enable the site owner to have a voice in the long list of signals that search engines use to crawl and index pages. Since they&#8217;re a signal and not a directive, they don&#8217;t correlate to just one option. The signal tells the search engines that you care about their crawlers taking a look at these pages, and many times, they then do.</p>
<p>I imagine that each search engine uses the Sitemap signals slightly differently, since after all, each search engine has different crawling and indexing algorithms. However, I do think that it would be useful for the search engines to come together and let us know how exactly they use them and how they differ in using them. In particular, it would be very helpful if, as part of sitemaps.org, they got together and made sure they weren&#8217;t using Sitemaps for opposing purposes. You don&#8217;t want to have a shared standard that is used so differently that if a site owner compiles a Sitemap in a particular way, it helps with one search engine and hurts with another.</p>
<p>When I worked on the sitemaps.org collaboration, it was all about figuring out what the standard should be and coming together to support it. Now that all the major engines do, I think the next step is sorting out more details about how they&#8217;re used (particularly since the search engines should now have lots of data about how they can best be used) and giving site owners best practices.</p>
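<p>To make the canonicalization point above concrete, here is a minimal sketch, assuming a hypothetical site where http://www.example.com/widgets and http://www.example.com/widgets?sessionid=123 display the same page. Listing only the preferred URL in the Sitemap is one signal (not a directive) as to which version is canonical:</p>
<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"&gt;
  &lt;!-- List the preferred version only; the session-ID variant is left out --&gt;
  &lt;url&gt;
    &lt;loc&gt;http://www.example.com/widgets&lt;/loc&gt;
  &lt;/url&gt;
&lt;/urlset&gt;</code></pre>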
]]></content:encoded>
	</item>
	<item>
		<title>By: Amit Agarwal</title>
		<link>http://www.ninebyblue.com/blog/increasing-search-indexing-coverage-with-an-xml-sitemap/comment-page-1/#comment-2642</link>
		<dc:creator>Amit Agarwal</dc:creator>
		<pubDate>Tue, 14 Oct 2008 06:13:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.ninebyblue.com/?p=260#comment-2642</guid>
		<description>Hi Vanessa - Thanks for this informative article - the frequency field of sitemaps has always been very confusing but glad you covered it.

I was reading a recent post on SEOMoz that quoted a discussion from SMX East. It says &quot;Put really important pages in your sitemap, rather than every page on your site.&quot;

Would love to hear your opinion on this.</description>
		<content:encoded><![CDATA[<p>Hi Vanessa &#8211; Thanks for this informative article &#8211; the frequency field of sitemaps has always been very confusing but glad you covered it.</p>
<p>I was reading a recent post on SEOMoz that quoted a discussion from SMX East. It says &#8220;Put really important pages in your sitemap, rather than every page on your site.&#8221;</p>
<p>Would love to hear your opinion on this.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
