<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Nine By Blue &#187; SEO</title>
	<atom:link href="http://www.ninebyblue.com/category/blog/seo-blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ninebyblue.com</link>
	<description>by Vanessa Fox</description>
	<lastBuildDate>Mon, 30 Aug 2010 01:23:33 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Bad SEO Advice</title>
		<link>http://www.ninebyblue.com/blog/bad-seo-advice/</link>
		<comments>http://www.ninebyblue.com/blog/bad-seo-advice/#comments</comments>
		<pubDate>Mon, 30 Aug 2010 01:22:03 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1364</guid>
		<description><![CDATA[I come across bad SEO advice all the time. Much of it may seem obvious to those of us who have been involved in search for any length of time, but for people who haven&#8217;t, it can be difficult to know what&#8217;s concrete advice, what&#8217;s speculation, and what&#8217;s just plain terrible. For that matter, it [...]]]></description>
			<content:encoded><![CDATA[<p>I come across bad SEO advice all the time. Much of it may seem obvious to those of us who have been involved in search for any length of time, but for people who haven&#8217;t, it can be difficult to know what&#8217;s concrete advice, what&#8217;s speculation, and what&#8217;s just plain terrible. For that matter, it can be difficult for those outside of SEO to know what&#8217;s smart and what&#8217;s considered search engine manipulation.</p>
<p>I was in a meeting a few days ago and someone asked if it was true that for SEO purposes, a page should have as few outbound links as possible. I said outbound links were fine, great even! And then talked a bit about how it&#8217;s a bad idea to build pages for nuances in the search engine algorithms anyway, as hundreds of signals exist and they&#8217;re changing all the time. Oh, he said. We&#8217;ve been talking about implementing the <a href="http://searchengineland.com/canonical-tag-16537" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/canonical-tag-16537?referer=');">canonical tag</a>. We probably shouldn&#8217;t do that then. And I realized, how would a developer know that the canonical tag is awesome and the meta keywords tag isn&#8217;t? That you shouldn&#8217;t worry about keyword density but you should put important keywords in your title tag?</p>
<p>Recently, someone sent me an &#8220;SEO optimization report&#8221; for their site that came from automated software that guaranteed top ten rankings in 90 days. Some of the advice was good (use unique title tags), some was harmless (improve your Flesch readability ease score), and some was just crazy talk. Below is a bit of the crazy.</p>
<p><strong>&#8220;You should increase your keyword density. You can do this by removing some text.&#8221;</strong></p>
<p>This whole notion of keyword density has been around forever, but here&#8217;s what it really boils down to. How is your potential audience looking for this content? Put those words in your title tag, H1, and somewhere on the page. And use those words as anchor text in internal links to that page. If other sites link to the page using that anchor text, even better! It&#8217;s bad enough when people try to get the &#8220;right&#8221; keyword density by nonsensically repeating the same words over and over on a page, but removing other text? That&#8217;s just sad.</p>
<p><strong>&#8220;Keywords in the HTML comment tags help a good ranking in Google.&#8221;</strong></p>
<p>Um. Not really.</p>
<p><strong>&#8220;Some search engines penalize sites if the terms from the meta keywords tag don&#8217;t appear in the body of the page.&#8221;</strong></p>
<p>Well, first, search engines (in particular, Google) ignore the meta keywords tag. And also, this statement isn&#8217;t true.</p>
<p><strong>&#8220;Your page includes the meta Google-Site-Verification tag twice. Search engines could regard it as a spamming  attempt and might decide not to index your web site.&#8221;</strong></p>
<p>Wow. I assume this is simply a case of automation going awry and whoever wrote this software doesn&#8217;t actually think that having two verified Google Webmaster Tools accounts will cause Google to remove the site from the index. But even so, having duplicate meta tags of any kind doesn&#8217;t cause Google or Bing to flag the site for spam. I mentioned this was all about the crazy, right?</p>
<p><strong>&#8220;Some search engines don&#8217;t accept submissions with capitalized letters in titles or meta tags.&#8221;</strong></p>
<p>Maybe someone more familiar with old school directories can weigh in on where this comes from. But recommending that your title tags not contain capital letters? This may be automated software, but someone manually wrote that message.</p>
<p><strong>&#8220;Some search engines rank sites lower that are hosted at free hosting providers.&#8221;</strong></p>
<p><a href="http://www.mattcutts.com/blog/myth-busting-virtual-hosts-vs-dedicated-ip-addresses/" onclick="pageTracker._trackPageview('/outgoing/www.mattcutts.com/blog/myth-busting-virtual-hosts-vs-dedicated-ip-addresses/?referer=');">No</a>.</p>
<p>PS &#8211; Creative use of bold won&#8217;t actually help. And question marks in URLs are just fine.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/bad-seo-advice/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>URL Referrer Tracking</title>
		<link>http://www.ninebyblue.com/blog/url-referrer-tracking/</link>
		<comments>http://www.ninebyblue.com/blog/url-referrer-tracking/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 03:18:28 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1355</guid>
		<description><![CDATA[Note: This post was originally posted on Jane and Robot in November 2008 (by Nathan Buggia) and is being temporarily stored here.

There may be instances when you want to track the source of a request, and a common way of doing so is by using tracking parameters in URLs. Unfortunately, implementing referrer tracking in this [...]]]></description>
			<content:encoded><![CDATA[<p><strong><em>Note: </em></strong><em>This post was originally posted on Jane and Robot in November 2008 (by <a href="http://nathanbuggia.com" onclick="pageTracker._trackPageview('/outgoing/nathanbuggia.com?referer=');">Nathan Buggia</a>) and is being temporarily stored here.</em></p>
<div>
<p>There may be instances when you want to track the source of a request, and a common way of doing so is by using tracking parameters in URLs. Unfortunately, implementing referrer tracking in this way can result in significant issues with search engines. In particular, it can cause duplicate content issues (since the search engine bot finds multiple valid URLs that point to the same page) and ranking issues (since all the links to the page aren&#8217;t to the same URL).</p>
<p>Let&#8217;s say that Jane and Robot uploaded two different online training seminars to YouTube as part of a viral marketing effort to drive more traffic to our site. To gauge our return on investment from each of these seminars, we&#8217;ve added a tracking parameter to the link within each YouTube description that a customer can click on to learn more. Here are the two URLs: http://janeandrobot.com/?from=promo-seminar-1 and http://janeandrobot.com/?from=promo-seminar-2. Each would bring the customer to our home page (the same page served by http://janeandrobot.com) and we would track the conversions based on the from parameter in the URL.</p>
<p>While this solution may seem to work well initially, it can result in low quality tracking data and impact our search acquisition. Here&#8217;s a summary of the major problems:</p>
<ol>
<li>Duplicate content - search engines sometimes have difficulty determining if two URLs contain the exact same page (see <a href="/post/canonical-url-canonicalization-domain.aspx">canonicalization</a> for more information). In this case, we&#8217;re creating this problem because we&#8217;ve created multiple URLs for the same page. Search engines are likely to find all three URLs for the home page (the original plus the two promotional URLs) and store and rank them as separate content within their index. This could cause the search engine robots to crawl the page three times instead of just once (which may not be a big deal if we are only tracking two promotions, but could become a big problem if we used similar tracking parameters for many other campaigns and URLs). Not only are the robots using more bandwidth than is necessary, but since they don&#8217;t crawl a site infinitely, they could spend all the allotted time crawling duplicate pages and never get to some of the good unique pages on the site.</li>
<li>Ranking - search engines use the number of quality links pointing to a URL as a major signal in determining the authority and usefulness of that content. Because we now have three different URLs pointing to the same page, people have three choices when linking to it. The result is a lower rank for all of the variations of the URL. Search engines generally filter out duplicates, so for instance, if the original (canonical) home page has 100 incoming links and each URL with a tracking parameter has 25 links, then search engines might filter out the two URLs with fewer links and show only the canonical URL, ranking it at position eight for a particular query based on those 100 incoming links. If all incoming links were to the same URL, then search engines would count 150 links to the home page and might rank it at position three for that same query. Another danger is that if one of the YouTube promo videos becomes exceptionally popular, its promo URL might gain more links than the original home page URL. Using this same example, if one of the promo URLs gained 200 links, search engines might choose to display it in the search results over the original home page. This could cause a confusing experience for potential customers who are looking for your home page (http://janeandrobot.com/?from=promo-seminar-1 doesn&#8217;t look like a home page and searchers might be less likely to click on it, thinking it&#8217;s not the page they&#8217;re looking for). It&#8217;s also not ideal from a branding perspective.</li>
<li>Reporting quality - as social networking sites become more popular, we become more of a sharing culture online. Many people use bookmarks, online bookmarking sites such as Delicious, email, and other sharing sites such as Facebook, Twitter, and FriendFeed to save and share URLs. They&#8217;ll click on a URL, and if they like it, copy and paste it from the browser&#8217;s address bar. If the link they&#8217;re saving/sharing happens to be one of our promotional links, then they have preserved this link for all time, and everyone who clicks through the link will look identical to someone coming through the promo. This skews the reporting numbers of who went to the site after viewing the video &#8212; which was why we set up the tracking parameters in the first place!</li>
</ol>
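<p>The link arithmetic in the ranking example can be sketched in a few lines of Python. This is only an illustration of the counting argument: the link counts are the made-up figures from the example above, and the <code>canonicalize</code> and <code>consolidate</code> helpers are hypothetical names, not anything a search engine exposes.</p>

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAMS = {"from"}  # assumption: the only tracking parameter in use


def canonicalize(url):
    """Return the URL with tracking parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(kept), ""))


def consolidate(link_counts):
    """Sum incoming-link counts per canonical URL."""
    totals = {}
    for url, count in link_counts.items():
        key = canonicalize(url)
        totals[key] = totals.get(key, 0) + count
    return totals


# Illustrative figures from the ranking example: 100 + 25 + 25 links.
links = {
    "http://janeandrobot.com/": 100,
    "http://janeandrobot.com/?from=promo-seminar-1": 25,
    "http://janeandrobot.com/?from=promo-seminar-2": 25,
}
totals = consolidate(links)
print(totals)
```

<p>With the three URLs collapsed into one, the home page is credited with all 150 links instead of 100.</p>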
<h2>Implementation Options</h2>
<p>Unfortunately there is no perfect solution for this scenario, and what works best for you depends on your infrastructure and situation. Here we&#8217;ve listed several common solutions that you can choose from to improve your own implementation. We generally recommend the first solution (Redirects), but there are pros and cons to each option that you should review carefully before making your decision.</p>
<h3>Redirects (and Cookies)</h3>
<p>The first option strives to solve the problem by trapping all of the promotional requests, recording the tracking information, then removing the tracking parameter from the URL. This can be time consuming to implement, but it is the best all-round scenario to address the three major issues listed above.</p>
<p>If you wanted to get fancy, and track a user&#8217;s entire session based on your referral parameter, then you can use this method as well and simply set a cookie on the client machine at the same time you trap the request. This is recommended to understand the value of traffic from different sources. In either case, here are the steps you&#8217;ll need to undertake:</p>
<p>1. Trap the incoming request - find where your web site application&#8217;s logic processes the HTTP request for your page. Trap each request at that point and check if it has a tracking parameter. If it does, record this in your internal referral tracking system. You can record this either in your server logs, or in a custom referral tracking database you maintain on your own.</p>
<ul>
<li>If you also would like to track the entire user&#8217;s session, then you should also use this opportunity to set a cookie on the client.</li>
</ul>
<p>2. Implement the redirect - the next step is to implement a 301 redirect from the current URL to the same page without the tracking parameter (or the canonical URL). Don&#8217;t forget to set the Cache-Control header in the HTTP response to ensure that all the requests come to your server and don&#8217;t get handled automatically in some network-based cache. Here&#8217;s what a sample redirect header might look like:</p>
<div>
<pre>HTTP/1.1 301 Moved Permanently</pre>
<pre>Location: http://janeandrobot.com/</pre>
<pre>Cache-Control: max-age=0</pre>
</div>
<p>Note that ASP.Net and IIS both use 302 redirects by default, so you may need to manually create the 301 response code.</p>
<p>The way this works is that when a search engine encounters a promotional URL (http://janeandrobot.com/?from=promo-seminar-1) it issues an HTTP GET request to the URL. The HTTP response tells the search engine that this page has been permanently moved (301 Redirect) and provides the new address (the same as the old address but without the tracking parameter). The search engine then discards the first URL (with the tracking code) and only stores the second URL (without the tracking code). And everything is right in the world.</p>
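<p>Here is a minimal sketch of the two steps in Python, using only the standard library. It is framework-neutral and makes several assumptions: the tracking parameter is named <code>from</code> as in the example URLs, <code>referral_log</code> stands in for your server logs or tracking database, and the <code>referral_source</code> cookie name is hypothetical. A real application would wire this logic into whatever request handling it already has.</p>

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAM = "from"   # parameter name from the example URLs
referral_log = []         # stand-in for server logs or a tracking database


def handle_request(url, set_cookie=False):
    """Return (status, headers) for a request, redirecting tracked URLs."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    sources = [v for k, v in params if k == TRACKING_PARAM]
    if not sources:
        return 200, []   # no tracking parameter: serve the page normally

    # Step 1: record the referral before the parameter disappears.
    referral_log.append(sources[0])

    # Step 2: redirect permanently to the same URL minus the parameter.
    kept = [(k, v) for k, v in params if k != TRACKING_PARAM]
    target = urlunsplit((parts.scheme, parts.netloc, parts.path or "/",
                         urlencode(kept), ""))
    headers = [("Location", target), ("Cache-Control", "max-age=0")]

    if set_cookie:
        # Optional: remember the source for the rest of the session.
        cookie = SimpleCookie()
        cookie["referral_source"] = sources[0]   # hypothetical cookie name
        cookie["referral_source"]["path"] = "/"
        headers.append(("Set-Cookie", cookie["referral_source"].OutputString()))
    return 301, headers


status, headers = handle_request(
    "http://janeandrobot.com/?from=promo-seminar-1", set_cookie=True)
print(status, dict(headers))
```

<p>Because the redirect is issued for every requester, browsers and search engine robots see exactly the same response.</p>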
<p>This implementation is one of the best options, but it does have some limitations:</p>
<ul>
<li>One downside of this method is that it requires you to manage your own referral tracking system. Because it traps the referral parameters and removes them from the URL before the page actually loads, 3rd party referral tracking applications like Google Analytics, Omniture, WebTrends or Microsoft adCenter Analytics will not be able to track these referrals.</li>
</ul>
<h3>Canonical URL &lt;Link /&gt; Tag</h3>
<p>Possibly the simplest option to solve this issue is to take advantage of a new standard recently adopted by Google, Yahoo and Microsoft Live Search. Their solution to this problem is to use a new attribute of the &lt;link /&gt; tag to explicitly tell them what the canonical URL for the page is. Assuming the &lt;link /&gt; tag has been created correctly, the search engines will treat this like a 301 redirect to the canonical URL.</p>
<p>Here&#8217;s an example of using this tag:</p>
<div>
<pre>&lt;html&gt;</pre>
<pre>   &lt;head&gt;</pre>
<pre>      &lt;link rel="canonical" href="http://janeandrobot.com" /&gt;</pre>
<pre>   &lt;/head&gt;</pre>
<pre>&lt;/html&gt;</pre>
</div>
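<p>If your pages are generated by a template, the tag can be built from the requested URL. The following is a rough Python sketch, assuming <code>from</code> is the only tracking parameter; <code>canonical_link_tag</code> is a hypothetical helper name. Note that it emits the home page with a trailing slash, so pick one form and use it consistently.</p>

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAMS = {"from"}  # assumption: the only tracking parameter in use


def canonical_link_tag(request_url):
    """Build a canonical link tag with an absolute URL and no tracking data."""
    parts = urlsplit(request_url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path or "/",
                            urlencode(kept), ""))
    # Escape the URL so characters such as & are safe inside the attribute.
    return '<link rel="canonical" href="%s" />' % escape(canonical, quote=True)


tag = canonical_link_tag("http://janeandrobot.com/?from=promo-seminar-2")
print(tag)
```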
<p>Here are a few notes about implementing this tag:</p>
<ul>
<li>Search engines view this as a hint, not a command. Implementing this tag isn&#8217;t a guarantee, although Google said they will try their best to make it work. The reason they can&#8217;t give any guarantees is because they may detect that you are implementing it incorrectly, or it is being used for some type of spammy scenario.</li>
<li>Relative or absolute URLs are supported within the href attribute. However, I recommend that you use absolute URLs whenever possible. This helps the search engines further normalize the URLs because they see what protocol (http or https) you use, and whether or not you are prefixing your domain with &#8220;www.&#8221;.</li>
<li>Sub-domains are supported, separate domains are not. With this tag you can specify a different sub-domain; for example, within this URL (http://janeandrobot.com?from=promo-seminar-2) you could specify this canonical URL (http://videos.janeandrobot.com). However, the &lt;link /&gt; tag would not be valid if you specify a completely different domain like this: http://janeandrobot-videos.com.</li>
<li>Common Pitfalls&#8230; You&#8217;ll want to ensure that you don&#8217;t do anything silly like (i) create an infinite loop with two canonical tags pointing to each other (ii) have the canonical tag point to a page that returns a 404 status code. You should also make sure that your canonical URL is generally a short and simple URL.</li>
</ul>
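<p>The two pitfalls named in the last note are easy to check for if you already have a list of your pages. Here is a small Python sketch under that assumption: <code>canonical_of</code> and <code>status_of</code> are hypothetical dictionaries you would fill from your own crawl, and the example.com URLs are placeholders. It only catches loops between two pages, which is the case described above.</p>

```python
def find_canonical_problems(canonical_of, status_of):
    """Report two-page canonical loops and canonicals that return 404.

    canonical_of maps a page URL to the URL named in its canonical tag.
    status_of maps a URL to the HTTP status code it returns.
    Both are hypothetical inputs gathered from your own crawl.
    """
    problems = []
    for page, target in canonical_of.items():
        # Pitfall (i): two pages naming each other as canonical.
        if target != page and canonical_of.get(target) == page:
            problems.append((page, "loop with " + target))
        # Pitfall (ii): the canonical URL does not exist.
        if status_of.get(target) == 404:
            problems.append((page, "canonical returns 404"))
    return problems


canonical_of = {
    "http://example.com/a": "http://example.com/b",
    "http://example.com/b": "http://example.com/a",
    "http://example.com/c": "http://example.com/gone",
}
status_of = {"http://example.com/gone": 404}
problems = find_canonical_problems(canonical_of, status_of)
for page, issue in problems:
    print(page, "-", issue)
```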
<p>While this implementation seems a little too good to be true, there are a few potential downsides. The first is that if you implement it incorrectly, the search engines will simply ignore it, and that could be complicated to debug. The other issue is that it fixes issues #1 (duplicate content) and #2 (ranking) but does nothing to fix the 3rd issue of reporting. Still, given all of that I would likely implement this option first and do the others when I had some spare dev cycles.</p>
<h3>URL Fragment</h3>
<p>A simple and elegant option is to simply place the tracking parameter behind a hash mark in the URL, creating a URL fragment. Traditionally, these are used to denote links within a page, and are ignored completely by search engines. In fact, they simply truncate the URL fragment from the URL.</p>
<p>Old URL</p>
<ul>
<li>http://janeandrobot.com/?from=promo-seminar-1</li>
<li>http://janeandrobot.com/?from=promo-seminar-2</li>
</ul>
<p>New URL with URL Fragment</p>
<ul>
<li>http://janeandrobot.com/#from=promo-seminar-1</li>
<li>http://janeandrobot.com/#from=promo-seminar-2</li>
</ul>
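<p>The rewrite from the old form to the new one is mechanical. Here is a short Python sketch, assuming the query string holds nothing but the tracking parameter (as in the example URLs); <code>move_tracking_to_fragment</code> is a hypothetical helper name. The second print shows what is left once the fragment is truncated, which is the URL a search engine would keep.</p>

```python
from urllib.parse import urlsplit, urlunsplit, urldefrag


def move_tracking_to_fragment(url):
    """Rewrite ?from=... as #from=... (assumes the query is tracking data only)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/",
                       "", parts.query))


promo = move_tracking_to_fragment("http://janeandrobot.com/?from=promo-seminar-1")
print(promo)

# Dropping the fragment leaves only the canonical URL.
print(urldefrag(promo).url)
```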
<p>By default Google Analytics will ignore the fragment as well, however there is a simple work around that was provided to us by <a href="http://www.kaushik.net/avinash/" onclick="pageTracker._trackPageview('/outgoing/www.kaushik.net/avinash/?referer=');">Avinash Kaushik</a>, Google&#8217;s web metrics evangelist. Using the following JavaScript:</p>
<div>
<pre>var pageTracker = _gat._getTracker("UA-12345-1");</pre>
<pre>// Solution for domain level only</pre>
<pre>pageTracker._trackPageview(document.location.pathname + "/" + document.location.hash);</pre>
<pre>// If you have a path included in the URL as well</pre>
<pre>pageTracker._trackPageview(document.location.pathname + document.location.search +</pre>
<pre>                           "/" + document.location.hash);</pre>
</div>
<p>You can create a few additional variations of this if you also have additional queries in the URL you would like to track. Check with your web analytics provider to find out if you need to customize your implementation to account for using URL fragments for tracking.</p>
<p>Does this sound too simple and easy to be true? There are a couple downsides to this approach:</p>
<ul>
<li>This option fixes issues 1 (duplicate content) &amp; 2 (ranking) listed above, but it will not address the 3rd issue of reporting. You could still encounter some reporting issues using this method if people are bookmarking or emailing around the URL.</li>
<li>Typically you&#8217;ll have to write some custom code to parse the URL fragment. Since it&#8217;s a non-standard implementation, standard methods may not support this.</li>
</ul>
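<p>One thing to keep in mind when writing that custom code: browsers don&#8217;t send the fragment to the server, so the value has to be captured on the client (as the JavaScript above does) and parsed afterwards. The parsing itself is small. A Python sketch, assuming the fragment is formatted like a query string and the parameter is named <code>from</code>; <code>tracking_from_fragment</code> is a hypothetical helper name:</p>

```python
from urllib.parse import urlsplit, parse_qs


def tracking_from_fragment(url, param="from"):
    """Return the tracking value stored in a URL fragment, or None."""
    fragment = urlsplit(url).fragment
    values = parse_qs(fragment).get(param, [])
    return values[0] if values else None


print(tracking_from_fragment("http://janeandrobot.com/#from=promo-seminar-1"))
# An ordinary in-page anchor carries no tracking value.
print(tracking_from_fragment("http://janeandrobot.com/#top"))
```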
<h3>Robots Exclusion Protocol</h3>
<p>Another relatively simple solution is to use robots.txt to ensure that search engines are not indexing URLs that contain tracking parameters. This method enables you to ensure that the original (canonical) version of the URL is always the one indexed and avoids the duplicate content issues involving indexing and bandwidth.</p>
<p>Assuming that all of our tracking parameters will follow a similar pattern to this:</p>
<p>http://janeandrobot.com/?from=&lt;PromoID&gt;</p>
<p>we can easily create a pattern that will match for this. Below is a robots.txt file that implements the pattern:</p>
<div>
<pre># Sample Robots.txt file, single query parameter</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /?from=</pre>
</div>
<p>The User-agent line means that this rule should apply to all search engines (or robots crawling your site), and the Disallow line tells them that they can&#8217;t index any URLs that start with &#8216;janeandrobot.com/?from=&#8217; followed by a promotional code of any length. See complete information on using the <a href="/post/Managing-Robots-Access-To-Your-Website.aspx">Robots Exclusion Protocol</a>. Use this pattern if you will have multiple query parameters:</p>
<div>
<pre># Sample Robots.txt file, multiple query parameters</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /*from=</pre>
</div>
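<p>To see why the second pattern is needed, it helps to try both against a few paths. The Python below is a simplified sketch of wildcard matching (<code>*</code> matches any run of characters, a trailing <code>$</code> anchors the end), not any search engine&#8217;s actual implementation; <code>rule_matches</code> is a hypothetical helper and the <code>/videos</code> path is made up for illustration.</p>

```python
import re


def rule_matches(pattern, path):
    """True if a Disallow pattern matches a path (including any query string)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each * match any characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, mirroring prefix matching of rules.
    return re.match(regex, path) is not None


paths = ["/", "/?from=promo-seminar-1", "/videos?page=2&from=promo-seminar-1"]
for pattern in ["/?from=", "/*from="]:
    for path in paths:
        verdict = "blocked" if rule_matches(pattern, path) else "allowed"
        print(pattern, path, verdict)
```

<p>The single-parameter pattern only catches URLs where the tracking parameter comes first on the home page; the wildcard version also catches it after other parameters or on other paths.</p>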
<p>Once you&#8217;ve implemented the pattern appropriate for your site, you can easily check to see if it is working correctly by using the <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35237" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=35237&amp;referer=');">Google Webmaster Tools robots.txt analysis tool</a>. It enables you to test specific URLs against a test robots.txt file. Note that although this tool tests GoogleBot specifically, all the major search engines <a href="http://searchengineland.com/yahoo-google-microsoft-clarify-robotstxt-support-14125.php" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/yahoo-google-microsoft-clarify-robotstxt-support-14125.php?referer=');">support the same pattern matching rules</a>. In <a href="https://www.google.com/webmasters/tools" onclick="pageTracker._trackPageview('/outgoing/www.google.com/webmasters/tools?referer=');">Google Webmaster Tools</a>:</p>
<ol>
<li>Add the site, then click Tools &gt; Analyze robots.txt. (Unlike most features in Google Webmaster Tools, you don&#8217;t need to verify ownership of the site to use the robots.txt analysis tool). The tool displays the current robots.txt file.</li>
<li>Modify this file with the Disallow line for the tracking parameter. (If the site doesn&#8217;t yet have a robots.txt file, you&#8217;ll need to copy in both the User-agent and Disallow lines.)</li>
<li>In the Test URLs box, add a couple of the URLs you want to block. Also add a few URLs you do want indexed (such as the original version of the URL that you&#8217;re adding tracking parameters to).</li>
<li>Click Check. The tool displays how Googlebot would interpret the robots.txt file and if each URL you are testing would be blocked or allowed.</li>
</ol>
<p>At this point you may be thinking, wow, I can do all this and not have to write any new code? Unfortunately, there are even more downsides to this approach than the others:</p>
<ul>
<li>This option will fix issue 1 (duplicate content), but not issues 2 (ranking) and 3 (reporting). This can be a good interim solution while you&#8217;re implementing the more complete redirects solution, but it often isn&#8217;t useful enough on its own.</li>
<li>Likely this will take a little bit of extra testing to ensure you get the patterns correct in your robots.txt file and don&#8217;t inadvertently block content you want indexed.</li>
</ul>
<h3>Yahoo Site Explorer</h3>
<p>Yahoo provides an online tool designed to solve this scenario. However, the solution only helps with Yahoo search traffic. To use the Yahoo fix, simply go to <a href="http://siteexplorer.search.yahoo.com" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com?referer=');">http://siteexplorer.search.yahoo.com</a> and create an account for your web site in the Yahoo Site Explorer tool. Once you&#8217;ve verified ownership of your web site, you can use their <a href="http://www.ysearchblog.com/archives/000479.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000479.html?referer=');">Dynamic URL Rewriting</a> tool to indicate which parameters in your URLs Yahoo should ignore.</p>
<p><a href="http://siteexplorer.search.yahoo.com" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com?referer=');"><img title="yahoo-url-rewriting-tool" src="http://janeandrobot.com/wp-content/uploads/2008/11/yahoo2.png" alt="Yahoo URL Rewriting Tool" /></a></p>
<p>Simply specify the name of the parameter you use for referral tracking (in our example it is &#8216;from&#8217;), and set the action &#8216;Remove from URLs&#8217;. Yahoo will then remove that parameter from all of your URLs while processing them and give you a handy little report about how many URLs were impacted.</p>
<p>Again, this is another solution that seems too easy to be true, but again, there are some significant limitations with this approach:</p>
<ul>
<li>At the end of the day this is still a Yahoo-only solution. With approximately 20% market share, it is likely this will not meet all of your needs. However, if you do get some percentage of your traffic from Yahoo, there is no harm in doing this in the short term while you implement another method in the longer term.</li>
<li>The other problem with this solution is that it doesn&#8217;t solve issue #3 (reporting), so you are still susceptible to reporting errors due to folks bookmarking and emailing your URLs with tracking codes.</li>
</ul>
<h2>Common Pitfalls</h2>
<h3>Cloaking &amp; Conditional Redirects</h3>
<p>Some web sites and SEO consultants attempt to solve this with a technique called cloaking or conditional redirects. Essentially what these methods do is check if the HTTP GET request is coming from a search engine and then show them something different than normal users see. This something different could be a simple 301 redirect back to the page without the tracking parameter similar to our first solution above. The difference is that our solution implemented this redirect for all requesters, and cloaking and conditional redirects implement it only for search engines.</p>
<p>The big problem with this implementation method is that cloaking and conditional redirects are explicitly prohibited in the webmaster guidelines for <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=66355" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=66355&amp;referer=');">Google</a>, <a href="http://help.yahoo.com/l/us/yahoo/search/basics/basics-18.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/basics/basics-18.html?referer=');">Yahoo</a> and <a href="http://help.live.com/help.aspx?mkt=en-us&amp;project=wl_webmasters" onclick="pageTracker._trackPageview('/outgoing/help.live.com/help.aspx?mkt=en-us_amp_project=wl_webmasters&amp;referer=');">Live Search</a>. If you use this method, you risk your pages being penalized or banned by the search engines. The primary reason they prohibit this behavior is that they want to know exactly what content they are presenting to searchers using their service. When a web site shows something different to a search engine robot than to a general user, a search engine can never be sure what the user will see when they go to the web site. So, even if you&#8217;re thinking of implementing cloaking for what seems to be a <a href="http://www.ninebyblue.com/blog/whats-really-black-hat-anyway/">valid, and not deceptive, reason</a>, it&#8217;s still a technique search engines strongly discourage.</p>
<p>This leads to the second major problem with this implementation method &#8211; it adds significant complication and it can be difficult to monitor whether or not it&#8217;s working &#8211; for example, you have to test it while pretending to be each of the three search engines&#8217; robots. When things go wrong, it is likely that you&#8217;re not going to see it right away, and by the time you do, your search engine traffic may already be impacted. Check out this example when Nike ran into an <a href="http://www.vabeachkevin.com/nikecom-pay-attention-googlebot-cloaking-broken/" onclick="pageTracker._trackPageview('/outgoing/www.vabeachkevin.com/nikecom-pay-attention-googlebot-cloaking-broken/?referer=');">issue with cloaking</a>.</p>
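<p>One way to guard against accidentally shipping a conditional redirect is to check that your request handling returns the same thing no matter who is asking. The Python sketch below is only an illustration: the two handlers are hypothetical stand-ins for your application, and the user agent strings are simplified labels rather than the exact strings real crawlers send.</p>

```python
# Simplified stand-ins for a browser and three crawlers (not exact strings).
USER_AGENTS = ["Mozilla/5.0 (ordinary browser)", "Googlebot", "Slurp", "msnbot"]


def responds_consistently(handler, url, user_agents=USER_AGENTS):
    """True if the handler gives every user agent the same response."""
    responses = {handler(url, ua) for ua in user_agents}
    return len(responses) == 1


def unconditional(url, user_agent):
    # Every requester gets the same permanent redirect.
    return (301, "http://janeandrobot.com/")


def conditional(url, user_agent):
    # Pitfall: only requests that look like a crawler are redirected.
    if "bot" in user_agent.lower() or "slurp" in user_agent.lower():
        return (301, "http://janeandrobot.com/")
    return (200, url)


promo_url = "http://janeandrobot.com/?from=promo-seminar-1"
print(responds_consistently(unconditional, promo_url))
print(responds_consistently(conditional, promo_url))
```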
<h3>Crazy Tracking Codes</h3>
<p>Many studies on the web show <a href="http://www.marketingsherpa.com/article.php?ident=30181" onclick="pageTracker._trackPageview('/outgoing/www.marketingsherpa.com/article.php?ident=30181&amp;referer=');">customers prefer short, understandable URLs</a> over long complicated ones, and are more likely to click on them in the search results. In addition, users prefer descriptive keywords in URLs. Therefore, it might be worth your time to spend a few extra minutes thinking about the tracking codes you use to see if you can make them friendlier.</p>
<p>Good examples</p>
<ul>
<li>?from=promo</li>
<li>?from=developer-video</li>
<li>?partner=a768sdf129</li>
</ul>
<p>Bad examples</p>
<ul>
<li>?i=A768SDF129,re23ADFA,style-23423,date-2008-02-01&amp;page=2</li>
<li>?IAmSpyingOnYou=a768sdf129&amp;YouAreASucker=re23adfd</li>
</ul>
<h2>Testing Your Implementation</h2>
<p>So you&#8217;ve implemented your new favorite method, it compiles on your dev box, and now it&#8217;s time to roll it into production, right? Maybe not! The initial goal of referrer URL-based tracking was to understand where your traffic was coming from so you can use that information to optimize your business. To ensure the data you&#8217;re collecting is actually useful, we highly recommend that you do some testing to ensure that all the common scenarios are working the way you expect, and you know where the holes are in your measurement capabilities. As with all metrics on the web, there will be holes in your data so you need to know what they are and account for them.</p>
<p>The first step in testing the implementation is to try it with a test parameter, walking the full scenario through start to finish.</p>
<ol>
<li>Create several phoney promotional links that reflect the actual types of links you expect. This could be on your home page, product pages or with many additional query parameters that you might encounter.</li>
<li>Place these fake promotional links in a location that won&#8217;t confuse your customers but are likely to get indexed by search engines. Using a social networking site or a blog might serve this well.</li>
<li>Click through those links as a customer and verify that you get to the correct page with a good user experience. Be sure to take these into account as well:
<ul>
<li>Redirects operating properly (if you&#8217;re using them) - use the <a href="https://addons.mozilla.org/en-US/firefox/addon/3829" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org/en-US/firefox/addon/3829?referer=');">Live HTTP Headers</a> tool in FireFox to ensure the application is providing the correct headers (301 redirect and caching).</li>
<li>Major browsers all work - if you&#8217;re using cookies, you should test all the major browsers to ensure that they support cookies and that your scenario works the way you might expect. Don&#8217;t forget to try common mobile browsers if your customers access your site this way.</li>
</ul>
</li>
<li>Check out the search engine experience to ensure that you&#8217;re not running into the duplicate content or ranking issues.
<ul>
<li>Submit the URL to the major engines - if you place the test URLs in the right social network or place on your blog, they should get indexed within a week or so. If they don&#8217;t, you can also try the &#8220;submit a URL&#8221; tools from <a href="http://www.google.com/addurl/" onclick="pageTracker._trackPageview('/outgoing/www.google.com/addurl/?referer=');">Google</a>, <a href="http://siteexplorer.search.yahoo.com/submit" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com/submit?referer=');">Yahoo</a> and <a href="http://search.msn.com.sg/docs/submit.aspx" onclick="pageTracker._trackPageview('/outgoing/search.msn.com.sg/docs/submit.aspx?referer=');">Microsoft</a>, though they are not guaranteed to work. Essentially you want to make sure the search engines have had the opportunity to see these URLs.</li>
<li>Use &#8217;site:&#8217; command to ensure tracking URLs are not indexed - here&#8217;s an example query in <a href="/admin/Pages/site:janeandrobot.com%20inurl:from">Google</a>, <a href="http://siteexplorer.search.yahoo.com/search?p=http%3A%2F%2Fjaneandrobot.com&amp;fr=sfp" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com/search?p=http_3A_2F_2Fjaneandrobot.com_amp_fr=sfp&amp;referer=');">Yahoo</a>, and <a href="http://search.live.com/results.aspx?q=site%3Ajaneandrobot.com&amp;first=1&amp;FORM=PERE" onclick="pageTracker._trackPageview('/outgoing/search.live.com/results.aspx?q=site_3Ajaneandrobot.com_amp_first=1_amp_FORM=PERE&amp;referer=');">Microsoft</a> showing that our Jane and Robot example promotional URLs are not indexed.</li>
</ul>
</li>
<li>Take a look at your metrics and ensure the numbers you&#8217;re recording correlate to the testing you are doing. Some additional things to consider:
<ul>
<li><span style="text-decoration: underline;">Internal referrals </span>- you might also want to add some logic to your application to filter out (or exclude) all referrals from the development team and your own employees. This is often done by checking requests against a list of known employee or company IP addresses and scrubbing those from your tracking data.</li>
<li><span style="text-decoration: underline;">Caching Issues </span>- you might also want to try out several scenarios with multiple subsequent requests. You&#8217;ll want to ensure that every request is going to your server and not getting cached somewhere along the way.</li>
</ul>
</li>
</ol>
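<p>The internal-referral filtering mentioned in the last step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network ranges, the record layout, and the function names are hypothetical, and your analytics package may already offer an IP exclusion filter that makes custom code unnecessary.</p>

```python
import ipaddress

# Hypothetical internal ranges; replace these with your company's real networks.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_internal(ip_text):
    """Return True when the address falls inside any internal network."""
    address = ipaddress.ip_address(ip_text)
    return any(address in network for network in INTERNAL_NETWORKS)


def scrub_internal(records):
    """Keep only tracking records whose client IP is not internal."""
    return [record for record in records if not is_internal(record["ip"])]


# Hypothetical tracking records: one from an employee, one from a customer.
records = [
    {"ip": "10.1.2.3", "url": "/product?from=test1"},
    {"ip": "198.51.100.7", "url": "/product?from=test1"},
]
print(scrub_internal(records))
```

<p>Keeping the filter separate from the code that records hits means you can still keep the raw data and decide later which requests to exclude.</p>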
<h2>Related Resources</h2>
<ul>
<li>Related Internet Standards:
<ul>
<li><a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9" onclick="pageTracker._trackPageview('/outgoing/www.w3.org/Protocols/rfc2616/rfc2616-sec14.html_sec14.9?referer=');">W3C Standard For Cache-Control Header</a></li>
<li><a href="http://www.apps.ietf.org/rfc/rfc2396.html" onclick="pageTracker._trackPageview('/outgoing/www.apps.ietf.org/rfc/rfc2396.html?referer=');">URI Specification</a> (how URLs work)</li>
</ul>
</li>
<li>Tools Used in Article:
<ul>
<li><a href="http://google.com/webmasters/tools" onclick="pageTracker._trackPageview('/outgoing/google.com/webmasters/tools?referer=');">Google Webmaster Tools</a> (Robots.txt Tester)</li>
<li><a href="http://siteexplorer.search.yahoo.com/" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com/?referer=');">Yahoo Site Explorer</a> (Dynamic URL Rewriting)</li>
<li><a href="https://addons.mozilla.org/en-US/firefox/addon/3829" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org/en-US/firefox/addon/3829?referer=');">Live HTTP Headers</a> (View HTTP Headers)</li>
<li>Suggest URL Tool - <a href="http://www.google.com/addurl/" onclick="pageTracker._trackPageview('/outgoing/www.google.com/addurl/?referer=');">Google</a>, <a href="http://siteexplorer.search.yahoo.com/submit" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com/submit?referer=');">Yahoo</a>, <a href="http://search.msn.com.sg/docs/submit.aspx" onclick="pageTracker._trackPageview('/outgoing/search.msn.com.sg/docs/submit.aspx?referer=');">Microsoft</a></li>
</ul>
</li>
<li>Canonical Link Tag Standard
<ul>
<li><a href="http://searchengineland.com/canonical-tag-16537" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/canonical-tag-16537?referer=');">Search Engine Land Article</a> (Best Practices)</li>
<li><a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html?referer=');">Google Announcement Blog Post</a></li>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=139394" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=139394&amp;referer=');">Google Help Documentation</a></li>
<li><a href="http://ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/" onclick="pageTracker._trackPageview('/outgoing/ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/?referer=');">Yahoo Announcement Blog Post</a></li>
<li><a href="http://blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx?referer=');">Live Search Announcement Blog Post</a></li>
</ul>
</li>
<li>Related articles
<ul>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=66359" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=66359&amp;referer=');">Duplicate Content &#8211; Google Technical Support</a></li>
<li><a href="http://blogs.omniture.com/2008/10/01/campaign-tracking-inside-omniture-sitecatalyst/" onclick="pageTracker._trackPageview('/outgoing/blogs.omniture.com/2008/10/01/campaign-tracking-inside-omniture-sitecatalyst/?referer=');">URL Tracking in Omniture&#8217;s SiteCatalyst</a></li>
<li><a href="http://www.google.com/support/googleanalytics/bin/answer.py?answer=55515" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/googleanalytics/bin/answer.py?answer=55515&amp;referer=');">Goal Tracking in Google Analytics</a></li>
</ul>
</li>
<li><a href="http://www.kaushik.net/avinash" onclick="pageTracker._trackPageview('/outgoing/www.kaushik.net/avinash?referer=');">Occam’s Razor by Avinash Kaushik</a></li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/url-referrer-tracking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Managing Robot&#8217;s Access To Your Website</title>
		<link>http://www.ninebyblue.com/blog/managing-robots-access-to-your-website-2/</link>
		<comments>http://www.ninebyblue.com/blog/managing-robots-access-to-your-website-2/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 03:06:10 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1347</guid>
		<description><![CDATA[Note: This post was originally posted on Jane and Robot in June 2008 and is being temporarily stored here.

Controlling what content is blocked from being found in search engines is crucial for many websites. Fortunately, the major search engines and other well-behaved robots observe the Robots Exclusion Protocol (REP), which has evolved organically since the early [...]]]></description>
			<content:encoded><![CDATA[<p><strong><em>Note: </em></strong><em>This post was originally posted on Jane and Robot in June 2008 and is being temporarily stored here.</em></p>
<div>
<p>Controlling what content is blocked from being found in search engines is crucial for many websites. Fortunately, the major search engines and other well-behaved robots observe the <a href="http://www.robotstxt.org/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.robotstxt.org/?referer=');">Robots Exclusion Protocol</a> (REP), which has evolved organically since the early 1990s to provide a set of controls over what parts of a web site search engine robots can crawl and index.</p>
<p>Article Sections:</p>
<ul>
<li><a href="#Capabilities_of_the_REP">Capabilities of REP</a></li>
<li><a href="#Deciding_what_should_be_Public_vs._Private">Deciding What Should be Public vs. Private</a></li>
<li><a href="#Implementing_the_REP">Implementing the REP</a>
<ul>
<li><a href="#Site_Level_Implementation_(Robots.txt)">Site Level</a></li>
<li><a href="#Page_Level_Implementation_(META_Tags)">Page Level (Meta Tags)</a></li>
<li><a href="#HTTP_Header_Implementation_(X-ROBOTS-Tag)">Page Level (HTTP Header)</a></li>
<li><a href="#Content_Level_Implementation">Content Level</a></li>
</ul>
</li>
<li><a href="#Common_implementation_mistakes">Common Mistakes</a></li>
<li><a href="#Testing_your_implementation_">Testing Your Implementation</a></li>
<li><a href="#removal">Removing Content From Search Engine Indices</a></li>
<li><a href="#Additional_Resources:_">Additional Resources</a></li>
</ul>
<h2><a name="Capabilities_of_the_REP"></a>Capabilities of the REP</h2>
<p>The Robots Exclusion Protocol provides controls that can be applied at the site level (robots.txt), at the page level (META tag, or X-Robots-Tag), or at the HTML element level to control both the crawl of your site and the way it&#8217;s listed in the search engine results pages (SERPs). Below is a table listing the common scenarios, directives, and which search engines support them.</p>
<table border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top">Use Case</td>
<td valign="top">Robots.txt</td>
<td valign="top">META/ X-Robots-Tag</td>
<td valign="top">Other</td>
<td valign="top">Supported By</td>
</tr>
<tr>
<td valign="top">Allow access to your content</td>
<td valign="top">Allow</td>
<td valign="top">FOLLOW<br />
INDEX</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40364" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40364&amp;referer=');">Google</a><br />
<a href="http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Disallow access to your content</td>
<td valign="top">Disallow</td>
<td valign="top">NOINDEX<br />
NOFOLLOW</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=35303" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=35303&amp;referer=');">Google</a><br />
<a href="http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Disallow the indexing of images on the page</td>
<td valign="top"></td>
<td valign="top">NOIMAGEINDEX</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=79892" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=79892&amp;referer=');">Google</a></td>
</tr>
<tr>
<td valign="top">Disallow the display of a cached version of your content in the SERP</td>
<td valign="top"></td>
<td valign="top">NOARCHIVE</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35306=" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=35306=&amp;referer=');">Google</a><br />
<a href="http://help.yahoo.com/l/us/yahoo/search/deletion/basics-10.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/deletion/basics-10.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Disallow the creation of a description for this content in the SERP</td>
<td valign="top"></td>
<td valign="top">NOSNIPPET</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35304" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=35304&amp;referer=');">Google</a><br />
<a href="http://www.ysearchblog.com/archives/000587.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000587.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Disallow the translation of your content into other languages</td>
<td valign="top"></td>
<td valign="top">NOTRANSLATE</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/help/faq_translation.html#donttrans" onclick="pageTracker._trackPageview('/outgoing/www.google.com/help/faq_translation.html_donttrans?referer=');">Google</a></td>
</tr>
<tr>
<td valign="top">Do not follow or give weight to links within this content</td>
<td valign="top"></td>
<td valign="top">NOFOLLOW</td>
<td valign="top">a href attribute:<br />
rel=NOFOLLOW</td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=96569" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=96569&amp;referer=');">Google</a><br />
<a href="http://www.ysearchblog.com/archives/000069.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000069.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/livesearch/archive/2005/01/18/nofollow_tags.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2005/01/18/nofollow_tags.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Do not use the <a href="http://www.dmoz.org/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.dmoz.org/?referer=');">Open Directory Project</a> (ODP) to create descriptions for your content in the SERP</td>
<td valign="top"></td>
<td valign="top">NOODP</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35264" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=35264&amp;referer=');">Google</a><br />
<a href="http://help.yahoo.com/l/us/yahoo/search/indexing/indexing-11.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/indexing/indexing-11.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Do not use the Yahoo Directory to create descriptions for your content in the SERP</td>
<td valign="top"></td>
<td valign="top">NOYDIR</td>
<td valign="top"></td>
<td valign="top"><a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Yahoo</a></td>
</tr>
<tr>
<td valign="top">Do not index this specific element within an HTML page</td>
<td valign="top"></td>
<td valign="top"></td>
<td valign="top">class=robots-nocontent</td>
<td valign="top"><a href="http://www.ysearchblog.com/archives/000444.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000444.html?referer=');">Yahoo</a></td>
</tr>
<tr>
<td valign="top">Stop indexing this content after a specific date</td>
<td valign="top"></td>
<td valign="top">UNAVAILABLE_AFTER</td>
<td valign="top"></td>
<td valign="top"><a href="http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html" onclick="pageTracker._trackPageview('/outgoing/googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html?referer=');">Google</a></td>
</tr>
<tr>
<td valign="top">Disallow the creation of enhanced captions</td>
<td valign="top"></td>
<td valign="top">NOPREVIEW</td>
<td valign="top"></td>
<td valign="top"><a href="http://bing.com/community" onclick="pageTracker._trackPageview('/outgoing/bing.com/community?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Specify a sitemap file or a sitemap index file</td>
<td valign="top">Sitemap</td>
<td valign="top"></td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=64748" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=64748&amp;referer=');">Google</a><br />
<a href="http://www.ysearchblog.com/archives/000437.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000437.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/livesearch/archive/2007/04/11/discovering-sitemaps.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2007/04/11/discovering-sitemaps.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Specify how frequently a crawler may access your website</td>
<td valign="top">Crawl-Delay</td>
<td valign="top"></td>
<td valign="top"><a href="http://google.com/webmaster" onclick="pageTracker._trackPageview('/outgoing/google.com/webmaster?referer=');">Google WMT</a></td>
<td valign="top"><a href="http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-03.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-03.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/04/18/ramping-up-msnbot.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/04/18/ramping-up-msnbot.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Authenticate the identity of the crawler</td>
<td valign="top"></td>
<td valign="top"></td>
<td valign="top">Reverse DNS Lookup</td>
<td valign="top"><a href="http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html?referer=');">Google</a><br />
<a href="http://www.ysearchblog.com/archives/000460.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000460.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Request removal of your content from the engine&#8217;s index</td>
<td valign="top"></td>
<td valign="top"></td>
<td valign="top"><a href="http://google.com/webmaster" onclick="pageTracker._trackPageview('/outgoing/google.com/webmaster?referer=');">Google WMT</a><br />
<a href="http://siteexplorer.search.yahoo.com" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com?referer=');">Yahoo SE</a><br />
<a href="http://webmaster.live.com/" onclick="pageTracker._trackPageview('/outgoing/webmaster.live.com/?referer=');">Microsoft WMT</a></td>
<td valign="top"><a href="http://googlewebmastercentral.blogspot.com/2007/04/requesting-removal-of-content-from-our.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2007/04/requesting-removal-of-content-from-our.html?referer=');">Google</a><br />
<a href="http://help.yahoo.com/l/us/yahoo/search/siteexplorer/delete/" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/siteexplorer/delete/?referer=');">Yahoo</a><br />
Microsoft</td>
</tr>
</tbody>
</table>
<h2><a name="Deciding_what_should_be_Public_vs._Private"></a>Deciding What Should be Public vs. Private</h2>
<p>One of the first steps in managing the robots is knowing what type of content should be public vs. private. Start with the assumption that by default, everything is public, then explicitly identify the items that are private.</p>
<p>If you want search engines to access all the content on your site, you don&#8217;t need a robots.txt file at all. When a search engine tries to access the robots.txt file on your site and the server can&#8217;t return one (ideally by returning a 404 HTTP status code), the search engine treats this the same as a robots.txt file that allows access to everything.</p>
<p>Every website and every business has a different set of needs, so there&#8217;s no blanket rule for what to make private, but some common elements may apply.</p>
<ul>
<li><strong>Private data -</strong> You may have content on your site that you don&#8217;t want to be searchable in search engines. For instance, you may have private user information (such as addresses) that you don&#8217;t want surfaced. For this type of content, you may want to use a more secure approach that keeps all visitors from the pages (such as password protection). However, some types of content are fine for visitor access, but not search engine access. For instance, you may run a discussion forum that is open for public viewing, but you may not want individual posts to appear in search results for forum member names.</li>
<li><a name="noncontent"></a><strong>Non-content content </strong>- Some content, like <a href="/post/Effectively-Using-Images.aspx#noncontent">images used for navigation</a>, provides little value to searchers. It&#8217;s not harmful to include these items in search engine indices, but since search engines allocate limited bandwidth to crawl each site and limited space to store content from each site, it may make sense to block these items to help direct the bots to the content on your site that you do want indexed.</li>
<li><strong>Printer-friendly pages -</strong> If you have specific pages (URLs) that are formatted for printing, you may want to block them to avoid duplicate content issues. The drawback to allowing the printer-friendly page to be indexed is that it could potentially be listed in the search results instead of the default version of the page, which wouldn&#8217;t provide an ideal user experience for a visitor coming to the site through search.</li>
<li><strong>Affiliate links and advertising -</strong> If you include advertising on your site, you can keep search engine robots from following the links by redirecting them to a blocked page, then on to the destination page. (There are other methods for implementing advertising-based links as well.)</li>
<li><strong>Landing pages -</strong> Your site may include multiple variations of entry pages used for advertising purposes. For instance, you may run AdWords campaigns that link to a particular version of a page based on the ad, or you may print different URLs for different print ad campaigns (either for tracking purposes or to provide a custom experience related to the ad). Since these pages are meant to be an extension of the ad, and are generally near duplicates of the default version of the page, you may want to block these landing pages from being indexed.</li>
<li><strong>Experimental pages -</strong> As you try new ideas on your site (for instance, using A/B testing), you likely want to block all but the original page from being indexed during the experiment.</li>
</ul>
<h2><a name="Implementing_the_REP"></a>Implementing the REP</h2>
<p>REP is flexible and can be implemented a number of ways. This flexibility lets you easily specify some policies for your entire site (or subdomain) and then enhance them more granularly at the page or link level as needed.</p>
<h3><a name="Site_Level_Implementation_(Robots.txt)"></a>Site Level Implementation (Robots.txt)</h3>
<p>Site-wide directives are stored in a robots.txt file, which must be located in the root directory of each domain or subdomain (e.g. <a href="/robots.txt">http://janeandrobot.com/robots.txt</a>). Note that a robots.txt file only applies to the hostname where it is placed, and does not apply to subdomains. So a robots.txt file located at <a href="http://microsoft.com/robots.txt" onclick="pageTracker._trackPageview('/outgoing/microsoft.com/robots.txt?referer=');">http://microsoft.com/robots.txt</a> will not apply to the MSDN subdomain <a href="http://msdn.microsoft.com/" onclick="pageTracker._trackPageview('/outgoing/msdn.microsoft.com/?referer=');">http://msdn.microsoft.com</a>. However, the robots.txt file does apply to all subfolders and pages within the specified hostname.</p>
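<p>To make the per-hostname rule concrete, here is a small Python sketch that works out which robots.txt file governs a given page. The function name <code>robots_txt_url</code> and the sample page paths are assumptions made for illustration; only the hostnames come from the example above.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt location that applies to page_url.

    Only the scheme and hostname are kept, so every hostname
    (including each subdomain) maps to its own file in the root.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page paths on the two hostnames discussed above.
print(robots_txt_url("http://microsoft.com/windows/default.aspx"))
print(robots_txt_url("http://msdn.microsoft.com/library/default.aspx"))
```

<p>The two calls print different locations, which is why rules placed on the main domain never reach the subdomain.</p>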
<p>A robots.txt file is a UTF-8 encoded file that contains entries that consist of a user-agent line (that tells the search engine robot if the entry is directed at it) and one or more directives that specify content that the search engine robot is blocked from crawling or indexing. A simple robots.txt file is shown below.</p>
<div>
<pre>User-agent: *</pre>
<pre>Disallow: /private</pre>
</div>
<p><code>user-agent:</code> &#8211; Specifies which robots the entry applies to.</p>
<ul>
<li>Set this to <code>*</code> to specify that this entry applies to all search engine robots.</li>
<li>Set this to a specific robot name to provide instructions for just that robot. You can find a complete list of robot names at <a href="http://www.robotstxt.org" onclick="pageTracker._trackPageview('/outgoing/www.robotstxt.org?referer=');">robotstxt.org</a>.</li>
<li>If you direct an entry at a particular robot, then it obeys that entry instead of any entries defined for <code>user-agent: * </code>(rather than in addition to those entries).</li>
</ul>
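<p>The &#8220;instead of, not in addition to&#8221; behavior is easy to get wrong, so here is a quick way to see it using the robots.txt parser in Python&#8217;s standard library. Treat this as a sketch: the rules and the bot name <code>OtherBot</code> are made up for the example, and the standard library parser is only one implementation, so the real search engines may differ in other details.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a general entry plus one aimed at Googlebot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private

User-agent: Googlebot
Disallow: /drafts
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

private_page = "http://www.example.com/private/page.html"
draft_page = "http://www.example.com/drafts/page.html"

# Googlebot follows only its own entry, so /private is NOT blocked for it.
print("Googlebot, private:", parser.can_fetch("Googlebot", private_page))
print("Googlebot, drafts: ", parser.can_fetch("Googlebot", draft_page))

# A robot without its own entry falls back to the * entry.
print("OtherBot, private: ", parser.can_fetch("OtherBot", private_page))
print("OtherBot, drafts:  ", parser.can_fetch("OtherBot", draft_page))
```

<p>If you want a named robot to stay out of <code>/private</code> as well, repeat that <code>Disallow</code> line inside the robot&#8217;s own entry.</p>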
<p>The major search engines have multiple robots that crawl the web for different types of content (such as images or mobile). The names of a search engine&#8217;s robots generally all begin with the same string, so if you block the primary robot, all robots for that search engine are blocked as well. However, if you want to block only one of the more specific robots, you can block it directly and still allow web crawl access.</p>
<ul>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40364" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40364&amp;referer=');">Google</a> &#8211; The primary search engine robot is Googlebot.</li>
<li><a href="http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html?referer=');">Yahoo!</a> &#8211; The primary search engine robot is Slurp.</li>
<li><a href="http://www.bing.com/community/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx" onclick="pageTracker._trackPageview('/outgoing/www.bing.com/community/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx?referer=');">Bing</a> &#8211; The primary search engine robot is Bingbot. (The previous name for this bot was MSNbot, and Microsoft Bing continues to obey directives aimed at that bot as well.)</li>
</ul>
<p><code>Disallow: </code>- Specifies what content is blocked</p>
<ul>
<li>Must begin with a slash (<code>/</code>).</li>
<li>Blocks access to any URLs that begin with the characters after the <code>/</code>. For instance, <code>Disallow: /images</code> blocks access to <code>/images/</code>, <code>/images/image1.jpg</code>, and <code>/images10</code>.</li>
</ul>
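<p>Because a <code>Disallow</code> value is treated as a plain prefix, it can block more than the folder you had in mind. The following Python sketch mimics that prefix test; the helper name <code>is_blocked</code> and the <code>/img/logo.png</code> path are assumptions for illustration, and real crawlers add pattern matching on top of this.</p>

```python
def is_blocked(path, disallow_prefixes):
    """Return True when path starts with any Disallow value."""
    return any(path.startswith(prefix) for prefix in disallow_prefixes)


# Equivalent to a single "Disallow: /images" line.
rules = ["/images"]

for path in ["/images/", "/images/image1.jpg", "/images10", "/img/logo.png"]:
    print(path, is_blocked(path, rules))
```

<p>Changing the rule to <code>/images/</code> (with a trailing slash) would leave <code>/images10</code> crawlable, which is usually what people intend.</p>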
<p>You can specify other rules for search engine robots in addition to the standard instructions that block access to content as noted in <a href="#other">other robot instructions</a>.</p>
<p>Some things to note about robots.txt implementation:</p>
<ul>
<li>The major search engines support pattern matching using the asterisk character (*) for wildcard match and the dollar sign ($) for end of sequence matching as described below in <a href="#patterns">using pattern matching</a>.</li>
<li>The robots.txt file is case sensitive, so <code>Disallow: /images </code>would block <code>http://www.example.com/images</code> but not <code>http://www.example.com/Images</code>.</li>
<li>If conflicts exist in the file, the robot obeys the longest (and therefore generally more specific) line.</li>
</ul>
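<p>The case-sensitivity and longest-line notes can be sketched together in Python. This is a simplified model, assuming plain prefix rules with no wildcards; the function name <code>is_allowed</code> and the rule layout are hypothetical, and individual robots may resolve conflicts differently.</p>

```python
def is_allowed(path, rules):
    """Decide access using the longest matching rule.

    rules is a list of (directive, prefix) pairs, where directive is
    "allow" or "disallow". Matching is case sensitive, and a path
    that matches no rule is allowed.
    """
    matching = [rule for rule in rules if path.startswith(rule[1])]
    if not matching:
        return True
    directive, _prefix = max(matching, key=lambda rule: len(rule[1]))
    return directive == "allow"


rules = [
    ("disallow", "/images/"),
    ("allow", "/images/image1.jpg"),
]

print(is_allowed("/images/image1.jpg", rules))  # longer Allow line wins
print(is_allowed("/images/image2.jpg", rules))  # only the Disallow line matches
print(is_allowed("/Images/image2.jpg", rules))  # different case, no rule matches
```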
<h4>Basic Samples</h4>
<p>Block all robots - Useful when your site is in pre-launch development and isn&#8217;t ready for search traffic.</p>
<div>
<pre># This keeps out all well-behaved robots.</pre>
<pre># Disallow: * is not valid.</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /</pre>
</div>
<p>Keep out all bots by default - Blocks all pages except those specified. Not recommended, as it is difficult to maintain and diagnose.</p>
<div>
<pre># Stay out unless otherwise stated</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /</pre>
<pre>Allow: /Public/</pre>
<pre>Allow: /articles/</pre>
<pre>Allow: /images/</pre>
</div>
<p>Block specific content - The most common usage of robots.txt.</p>
<div>
<pre># Block access to the images folder</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /images/</pre>
</div>
<p><a name="allow"></a>Allow specific content - Block a folder, but allow access to selected pages in that folder.</p>
<div>
<pre># Block everything in the images folder</pre>
<pre># Except allow images/image1.jpg</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /images/</pre>
<pre>Allow: /images/image1.jpg</pre>
</div>
<p><a href="/admin/Pages/patterns"></a>Allow specific robot - Block a class of robots (for instance, Googlebot), but allow a specific bot in that class (for instance, Googlebot-Mobile).</p>
<div>
<pre># Block Googlebot access</pre>
<pre># Allow Googlebot-Mobile access</pre>
<pre>User-agent: Googlebot</pre>
<pre>Disallow: /</pre>
<pre>User-agent: Googlebot-Mobile</pre>
<pre>Allow: /</pre>
</div>
<h4>Pattern Matching Examples</h4>
<p>The major engines support two types of pattern matching.</p>
<ul>
<li>* matches any sequence of characters</li>
<li>$ matches the end of the URL.</li>
</ul>
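<p>One way to reason about these two characters is to translate a robots.txt pattern into a regular expression. The Python sketch below does that; the function names are hypothetical and this is not how any particular search engine implements matching, just a model of the two rules above.</p>

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    "*" becomes "any sequence of characters", a trailing "$" anchors
    the match at the end of the URL, and everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True when the pattern applies to the given URL path."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.cgi$", "/script1.cgi"))
print(matches("/*.cgi$", "/script1.cgi?value=1"))
print(matches("/*?", "/productlisting.aspx?nav=price"))
```

<p>Without the trailing <code>$</code>, a pattern behaves like a prefix, so <code>/*.cgi</code> would also match <code>/script1.cgi?value=1</code>.</p>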
<p>Block access to URLs that contain a set of characters - Use the asterisk (*) to specify a wildcard.</p>
<div>
<pre># Block access to all URLs that include an ampersand</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /*&amp;</pre>
</div>
<p>This directive would block search engines from crawling <code>http://www.example.com/page1.asp?id=5&amp;sessionid=xyz</code>.</p>
<p>Block access to URLs that end with a set of characters - Use the dollar sign ($) to match the end of the URL.</p>
<div>
<div>
<pre># Block access to all URLs that end in .cgi</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /*.cgi$</pre>
</div>
<p>This directive would block search engines from crawling <code>http://www.example.com/script1.cgi</code> but not from crawling <code>http://www.example.com/script1.cgi?value=1</code>.</p>
<p>Selectively allow access to a URL that matches a blocked pattern - Use the <code>Allow</code> directive in conjunction with pattern matching for more complex implementations.</p>
<div>
<pre># Block access to URLs that contain ?</pre>
<pre># Allow access to URLs that end in ?</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /*?</pre>
<pre>Allow: /*?$</pre>
</div>
<p>That directive blocks all URLs that contain <code>?</code> except those that end in <code>?</code>. In this example, the default version of the page will be indexable:</p>
<ul>
<li><code>http://www.example.com/productlisting.aspx?</code></li>
</ul>
<p>Variations of the page will be blocked:</p>
<ul>
<li><code>http://www.example.com/productlisting.aspx?nav=price</code></li>
<li><code>http://www.example.com/productlisting.aspx?sort=alpha</code></li>
</ul>
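<p>The wildcard and longest-rule behavior described above can be sketched in a few lines of Python. This is only an illustration, not any search engine&#8217;s actual matcher: the helper names <code>rule_to_regex</code> and <code>is_allowed</code> are hypothetical, and breaking ties in favor of <code>Allow</code> is an assumption of the sketch.</p>

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Decide whether a path may be crawled.

    rules is a list of (directive, pattern) pairs such as ("Disallow", "/*?").
    The longest matching pattern wins; a path that matches nothing is allowed.
    Ties go to Allow (an assumption of this sketch).
    """
    matches = [
        (len(pattern), directive.lower() == "allow")
        for directive, pattern in rules
        if pattern and rule_to_regex(pattern).match(path)
    ]
    if not matches:
        return True
    return max(matches)[1]


rules = [("Disallow", "/*?"), ("Allow", "/*?$")]
print(is_allowed("/productlisting.aspx?", rules))
print(is_allowed("/productlisting.aspx?nav=price", rules))
```

<p>With the two rules from the example, the URL ending in <code>?</code> should come back as allowed because the <code>Allow</code> pattern is the longer match, while the <code>nav=price</code> variation only matches the <code>Disallow</code> pattern.</p>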
<h4><a name="other"></a>Other robot instructions</h4>
</div>
<p>Specify a Sitemap or Sitemap index file - If you&#8217;d like to provide search engines with a comprehensive list of your best URLs, you can provide one or more <a href="http://sitemaps.org" target="_blank" onclick="pageTracker._trackPageview('/outgoing/sitemaps.org?referer=');">Sitemap</a> autodiscovery directives. Note that the user-agent line does not apply to this directive, so you cannot use it to offer a Sitemap to some search engines but not others.</p>
<div>
<pre># Please take my sitemap and index everything!</pre>
<pre>Sitemap: http://janeandrobot.com/sitemap.axd</pre>
</div>
<p>Reduce the crawling load - This only works with Microsoft and Yahoo. For Google you&#8217;ll need to specify a slower crawling speed through their <a href="http://google.com/webmaster" target="_blank" onclick="pageTracker._trackPageview('/outgoing/google.com/webmaster?referer=');">Webmaster Tools</a>. Be careful when implementing this because if you slow down the crawl too much, robots won&#8217;t be able to get to all of your site and you may lose pages from the index.</p>
<div>
<pre># Bingbot, please wait 5 seconds in between visits</pre>
<pre>User-agent: bingbot</pre>
<pre>Crawl-delay: 5</pre>
<pre># Yahoo's Slurp, please wait 12 seconds in between visits</pre>
<pre>User-agent: slurp</pre>
<pre>Crawl-delay: 12</pre>
</div>
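<p>If you want to check how a parser reads these crawl-delay lines, Python&#8217;s standard <code>urllib.robotparser</code> module can be used as a rough stand-in. It is not what Bing or Yahoo run, so treat the result as a sanity check only.</p>

```python
from urllib.robotparser import RobotFileParser

# The same rules as the sample above, parsed from a string instead of fetched.
ROBOTS_TXT = """\
# Bingbot, please wait 5 seconds in between visits
User-agent: bingbot
Crawl-delay: 5
# Yahoo's Slurp, please wait 12 seconds in between visits
User-agent: slurp
Crawl-delay: 12
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns None when no matching section sets a delay.
for agent in ("bingbot", "slurp", "Googlebot"):
    print(agent, parser.crawl_delay(agent))
```

<p>Here the parser should report a delay for bingbot and slurp and nothing for Googlebot, which matches the point above that Google&#8217;s crawl rate has to be set elsewhere.</p>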
<h3><a name="Page_Level_Implementation_(META_Tags)"></a>Page Level Implementation (META Tags)</h3>
<p>The REP page-level directives allow you to refine the site-wide policies on a page-by-page basis.</p>
<p>Placing a meta tag on the page - Place the meta tag inside the page&#8217;s head element. Separate multiple directives with commas inside the content attribute, e.g. &lt;meta name=&#8221;ROBOTS&#8221; content=&#8221;Directive1, Directive2&#8221;&gt;.</p>
<div>
<pre>&lt;html&gt;</pre>
<pre>&lt;head&gt;</pre>
<pre>&lt;title&gt;Your title here&lt;/title&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOINDEX"&gt;</pre>
<pre>&lt;/head&gt;</pre>
<pre>&lt;body&gt;Your page here&lt;/body&gt;</pre>
<pre>&lt;/html&gt;</pre>
</div>
<p>Targeting a specific search engine - Within the meta tag you can specify which search engine you would like to target, or you can target them all.</p>
<div>
<pre>&lt;!-- Applies to All Robots --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOINDEX"&gt;</pre>
<pre>&lt;!-- ONLY GoogleBot --&gt;</pre>
<pre>&lt;meta name="Googlebot" content="NOINDEX"&gt;</pre>
<pre>&lt;!-- ONLY Slurp (Yahoo) --&gt;</pre>
<pre>&lt;meta name="Slurp" content="NOINDEX"&gt;</pre>
<pre>&lt;!-- ONLY BingBot (Microsoft) --&gt;</pre>
<pre>&lt;meta name="BingBot" content="NOINDEX"&gt;</pre>
</div>
<p>Control how your listings appear - there is a set of options you can use to determine how your site will show up on the SERP. You can exert some control over how the description is created, and remove the &#8220;Cached page&#8221; link.</p>
<p><img title="example-serp" src="http://janeandrobot.com/wp-content/uploads/2008/06/example-serp.gif" alt="example-serp" /></p>
<div>
<pre>&lt;!-- Do not show a description for this page --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOSNIPPET"&gt;</pre>
<pre>&lt;!-- Do not use http://dmoz.org to create a description --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOODP"&gt;</pre>
<pre>&lt;!-- Do not present a cached version of the document in a search result --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOARCHIVE"&gt;</pre>
</div>
<p>Using other directives - Other meta robots directives are shown below.</p>
<div>
<pre>&lt;!-- Do not trust links on this page, could be user generated content (UGC) --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOFOLLOW"&gt;</pre>
<pre>&lt;!-- Do not index this page --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOINDEX"&gt;</pre>
<pre>&lt;!-- Do not index any images on this page (they may still be indexed if they are linked</pre>
<pre>     elsewhere). Better to use robots.txt if you really want them safe.</pre>
<pre>     This is a Google-only tag. --&gt;</pre>
<pre>&lt;meta name="GOOGLEBOT" content="NOIMAGEINDEX"&gt;</pre>
<pre>&lt;!-- Do not translate this page into other languages --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOTRANSLATE"&gt;</pre>
<pre>&lt;!-- NOT RECOMMENDED, there really isn't much point in using these --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="FOLLOW"&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="UNAVAILABLE_AFTER"&gt;</pre>
</div>
<h3><a name="HTTP_Header_Implementation_(X-ROBOTS-Tag)"></a>HTTP Header Implementation (X-Robots-Tag)</h3>
<p>Allows developers to specify page-level REP directives for non text/html content types like PDF, DOC, PPT, or dynamically generated images.</p>
<p>Using the X-Robots-Tag - to use the X-Robots-Tag, simply add it to your HTTP response headers as shown below. To specify multiple directives, you can either comma delimit them or add them as separate header items.</p>
<div>
<pre>HTTP/1.x 200 OK</pre>
<pre>Cache-Control: private</pre>
<pre>Content-Length: 2199552</pre>
<pre>Content-Type: application/octet-stream</pre>
<pre>Server: Microsoft-IIS/7.0</pre>
<pre>content-disposition: inline; filename=01 - The truth about SEO.ppt</pre>
<pre>X-Robots-Tag: noindex, nosnippet</pre>
<pre>X-Powered-By: ASP.NET</pre>
<pre>Date: Sun, 01 Jun 2008 19:25:47 GMT</pre>
</div>
<p>The X-Robots-Tag header supports most of the same directives as the meta tag. The only limitation of this method compared with the meta tag implementation is that there is no way to target a specific robot &#8211; though that probably isn&#8217;t a big deal for most use cases.</p>
<ul>
<li>X-Robots-Tag: noindex</li>
<li>X-Robots-Tag: nosnippet</li>
<li>X-Robots-Tag: notranslate</li>
<li>X-Robots-Tag: noarchive</li>
<li>X-Robots-Tag: unavailable_after: 7 Jul 2007 16:30:00 GMT</li>
</ul>
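<p>Because the directives can arrive either comma delimited or as separate header items, a tool that audits your responses has to merge both forms. A minimal Python sketch, assuming you have already collected the raw <code>X-Robots-Tag</code> values from a response (the function name <code>parse_x_robots_tag</code> is hypothetical):</p>

```python
def parse_x_robots_tag(header_values):
    """Merge one or more X-Robots-Tag header values into a dict of directives.

    Directives without an argument map to None; unavailable_after keeps its
    date string. Dates that contain a comma are not handled by this sketch.
    """
    directives = {}
    for value in header_values:
        for item in value.split(","):
            # Split on the first colon only, so the time in a date stays intact.
            name, _, argument = item.strip().partition(":")
            if name:
                directives[name.strip().lower()] = argument.strip() or None
    return directives


headers = ["noindex, nosnippet", "unavailable_after: 7 Jul 2007 16:30:00 GMT"]
print(parse_x_robots_tag(headers))
```

<p>Directive names are lowercased so that <code>NOINDEX</code> and <code>noindex</code> are treated alike.</p>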
<h3><a name="Content_Level_Implementation"></a>Content Level Implementation</h3>
<p>You can further refine your site level and page level directives within several content tags.</p>
<p>Each anchor tag (link) can be modified to tell search engines that you do not trust the page the link points to. This is typically used for links within user generated content (UGC) like wikis, blog comments, reviews and other community sites.</p>
<div>
<pre>&lt;a href="#" rel="NOFOLLOW"&gt;My Hyperlink&lt;/a&gt;</pre>
</div>
<p>Also, in Yahoo Search you can specify which &lt;div&gt; elements on a page you would not like indexed using the <code>class="robots-nocontent"</code> attribute. However, we don&#8217;t highly recommend using this attribute because it is not supported in any other engine, making it not super-useful.</p>
<div>
<pre>&lt;div class="robots-nocontent"&gt;</pre>
<pre>No content for you! (or at least Yahoo!)</pre>
<pre>&lt;/div&gt;</pre>
</div>
<h2><a name="Common_implementation_mistakes"></a>Common Mistakes</h2>
<p>While implementing the REP is generally straightforward, there are a few common mistakes.</p>
<ul>
<li>Googlebot follows the most specific directive, ignoring all others. In the robots.txt file, if you specify a section for all user-agents (<code>user-agent: *</code>) and also declare a section for Googlebot (<code>user-agent: Googlebot</code>), Google will disregard all sections in the robots.txt file except the Googlebot section. This could leave you exposing much more content to Google than you might have thought.</li>
</ul>
<div>
<pre># This keeps out all well-behaved robots</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /</pre>
<pre># This looks like it is giving Google access to only this directory, but since it is a</pre>
<pre># GoogleBot specific section, Google will disregard the previous section</pre>
<pre># and access the whole site.</pre>
<pre>User-agent: Googlebot</pre>
<pre>Allow: /Content_For_Google/</pre>
</div>
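<p>You can reproduce this surprise before you deploy by running the same rules through a parser. The sketch below uses Python&#8217;s standard <code>urllib.robotparser</code> as a stand-in for a real crawler, which is an assumption: it happens to apply the same rule of using only the section that names the robot. The URL and the name OtherBot are made up for the example.</p>

```python
from urllib.robotparser import RobotFileParser

# The rules from the sample above, without the comments.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /Content_For_Google/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A URL outside the directory that was meant to be the only one open to Google.
url = "http://www.example.com/private/report.html"
print("Googlebot:", parser.can_fetch("Googlebot", url))
print("OtherBot:", parser.can_fetch("OtherBot", url))
```

<p>The Googlebot section contains no <code>Disallow</code> line, so the parser treats the private URL as open to Googlebot while still refusing it to a robot that falls back to the <code>*</code> section.</p>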
<ul>
<li><strong>NOFOLLOW will most likely not prevent indexing -</strong> if you use <code>NOFOLLOW</code> at either the page or the link level, it is still possible for the linked pages to be indexed because the search engine may have found a reference to them from another source. Also note that search engines treat <code>rel="NOFOLLOW"</code> on an anchor tag as a recommendation, not a command. To ensure that content is not indexed, either use the <code>Disallow</code> directive at the site level, or use <code>NOINDEX</code> at the page level.</li>
<li><strong>Directives that are not recommended -</strong> the directives in the REP are all about exceptions; by default, robots assume they can crawl your whole site. Therefore, you do not need to explicitly use the <code>FOLLOW</code> and <code>INDEX</code> directives, as they will not be taken into account by the search engines. It sounds silly, but I&#8217;ve seen a few sites that have implemented these on every page and every link. Another directive that is not recommended is the <code>NOCACHE</code> directive. This was created by Microsoft, and is synonymous with <code>NOARCHIVE</code>. While they will most likely always continue to support the directive, it is better to use <code>NOARCHIVE</code> so it will work on all the search engines.</li>
<li><strong>Be cognizant of case -</strong> when referencing files and URLs in the robots.txt file, use a defensive approach to URL case, as the major engines do not handle it the same way (e.g. /Files does not always equal /files).</li>
</ul>
<h2><a name="Testing_your_implementation_"></a>Testing Your Implementation</h2>
<p>As you&#8217;re implementing your REP design, you should test it both before and after you deploy it. The easiest way to test it is to use the robots validator in Google&#8217;s Webmaster Tools. This tool is a good sanity check to ensure you&#8217;re not blocking URLs you want indexed. However, advanced developers (or paranoid ones with critical business requirements) will want to know definitively what the robots are doing, not simply rely on what the robots say they are doing. These folks will want to look at the tools as well as their server logs to see exactly what&#8217;s being crawled.</p>
<p>In addition to using validation tools, checking the search engines&#8217; reports on what they couldn&#8217;t access, and looking at log data to see what the search engine robots are crawling, you should check the search engine results to see if any pages you intend to block are being indexed. If they are, use the methods described in this section to ensure you are blocking them correctly and <a href="#removal">use the search engine tools to request that the pages be removed</a>.</p>
<p><a name="partial"></a><strong>When Blocked Content Appears to be Indexed - </strong>If search engines are blocked from crawling pages, they may still index the URL if the robot finds a link to that URL on a page that isn&#8217;t blocked. The listing may display the URL only, such as shown below.</p>
<p><img title="urlonly" src="http://janeandrobot.com/wp-content/uploads/2008/06/urlonly.gif" alt="urlonly" /></p>
<p>Or, it may include a title and in some instances, a description. This makes it appear as though the search engine robot is disregarding the directive that blocks access to the page, but the search engine is in fact obeying the directive not to crawl the page and is using anchor text from the link to that page and descriptive details from either the page that contains the link or a source such as the <a href="http://www.dmoz.org" onclick="pageTracker._trackPageview('/outgoing/www.dmoz.org?referer=');">Open Directory Project</a>.</p>
<p>For more details, see:</p>
<ul>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35667" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=35667&amp;referer=');">Google: partially indexed page</a></li>
<li><a href="http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-01.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-01.html?referer=');">Yahoo!: thin documents</a></li>
</ul>
<h3><a name="The_Easy_Way_"></a>The Easy Way</h3>
<p><strong>Search Engine Tools For Validation -</strong> Both Google and Microsoft provide some tools as part of their Webmaster Centers to help you verify if you&#8217;ve configured your REP the way you expect. Let&#8217;s start with Google&#8217;s tools:</p>
<p>The first thing you should check is the list of URLs that Google has seen from your website and not indexed due to the REP. Note that you can also download the list and filter, sort, and have-your-way-with-it in Excel.</p>
<p><img title="webmaster-robotstxt-blocked1" src="http://janeandrobot.com/wp-content/uploads/2008/06/webmaster-robotstxt-blocked1.gif" alt="webmaster-robotstxt-blocked1" /></p>
<p>The next step is to use their interactive robots.txt tool to analyze your rules and test specific URLs for blockage. When you pull up the tool they already should have it pre-populated with the robots.txt file they have on file from the last time they crawled. You can input a list of URLs you&#8217;d like to check below, select the user-agent you&#8217;d like to check against and the tool will tell you if they are blocked or not. You can also use the tool to test changes to your robots.txt file to see how Google would interpret things.</p>
<p><img title="google-analyze-robotstxt" src="http://janeandrobot.com/wp-content/uploads/2008/06/google-analyze-robotstxt.jpg" alt="google-analyze-robotstxt" /></p>
<p>Microsoft also provides a list of URLs blocked by robots.txt that Bingbot has tried to crawl.</p>
<h3><a name="The_Hard_Way_(More_Accurate)"></a>The Hard Way</h3>
<p><strong>More Accurate Views of Robot Access Through Your Logs -</strong> If you have a specific business need to ensure that the robots are following your rules (or you&#8217;re just paranoid), you should not simply rely on the tools they provide to test compliance. You&#8217;re going to need to go straight to the horse&#8217;s mouth and analyze your web server logs to see exactly what they are doing. There is no one easy tool for doing this; you&#8217;ll likely have to use an existing tool such as <a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07" onclick="pageTracker._trackPageview('/outgoing/www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&amp;referer=');">Microsoft HTTP Log Parser</a> or write your own. It isn&#8217;t difficult; it will simply take some time to implement. Useful references for this are the list of robot <a href="http://www.robotstxt.org/db.html" onclick="pageTracker._trackPageview('/outgoing/www.robotstxt.org/db.html?referer=');">user agents</a> and the more complete lists of bots from <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40364" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40364&amp;referer=');">Google</a> and <a href="http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx?referer=');">Microsoft</a>.</p>
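<p>If you do write your own, the core of it can be small. The Python sketch below assumes your server writes the common &#8220;combined&#8221; log format and that the robots identify themselves with the user-agent tokens listed in <code>BOT_TOKENS</code>; the function name, the token list, and the sample log lines are all made up for illustration.</p>

```python
import re
from collections import Counter

# Combined log format: the request line and the user agent are quoted fields.
# Group 1 is the requested path, group 2 is the user-agent string.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

# Assumed user-agent tokens; extend the tuple for the robots you care about.
BOT_TOKENS = ("googlebot", "bingbot", "slurp")


def crawled_paths(log_lines, blocked_prefixes):
    """Count requests that known robots made to paths you meant to block."""
    hits = Counter()
    for line in log_lines:
        found = LOG_LINE.search(line)
        if not found:
            continue
        path, agent = found.group(1), found.group(2).lower()
        bot = next((token for token in BOT_TOKENS if token in agent), None)
        if bot and path.startswith(tuple(blocked_prefixes)):
            hits[(bot, path)] += 1
    return hits


sample = [
    '66.249.66.1 - - [01/Jun/2008:19:25:47 +0000] "GET /images/a.jpg HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.7 - - [01/Jun/2008:19:26:02 +0000] "GET /images/a.jpg HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (Windows)"',
]
print(crawled_paths(sample, ["/images/"]))
```

<p>Any non-empty result means a robot requested a URL you thought was blocked. A user-agent string is easy to fake, so confirm the visitor&#8217;s identity before drawing conclusions.</p>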
<p><a name="verify"></a><strong>Verifying Robot Identity -</strong> Another thing you&#8217;ll likely want to consider in this endeavor is to validate that the robots are who they actually say they are. Google, Yahoo and Microsoft all support <a href="http://en.wikipedia.org/wiki/Reverse_DNS_lookup" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Reverse_DNS_lookup?referer=');">Reverse DNS authentication</a> of their robots. The process is pretty simple and is described by <a href="http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html?referer=');">Google</a>, <a href="http://www.ysearchblog.com/archives/000460.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000460.html?referer=');">Yahoo</a> and <a href="http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx?referer=');">Microsoft</a>. Essentially, you find out which domain each robot&#8217;s hostnames belong to and check for that domain in your tool. This way, if the IP address changes (which it will), you don&#8217;t need to update your code.</p>
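<p>The check itself is a reverse lookup followed by a forward lookup. Here is a minimal Python sketch; the domain suffixes in <code>CRAWLER_DOMAINS</code> are assumptions that you should confirm against each engine&#8217;s own documentation, and the function name is hypothetical. The two lookup functions are parameters so that the logic can be tested without touching the network.</p>

```python
import socket

# Assumed hostname suffixes for the major crawlers; verify before relying on them.
CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com", ".crawl.yahoo.net")


def is_verified_crawler(ip, reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Return True if the IP reverse-resolves to a crawler domain and the
    resulting hostname forward-resolves back to the same IP."""
    try:
        hostname = reverse_lookup(ip)[0]
        if not hostname.lower().endswith(CRAWLER_DOMAINS):
            return False
        # The forward lookup guards against someone faking the reverse record.
        return ip in forward_lookup(hostname)[2]
    except OSError:
        # Lookup failures (no record, timeout) count as unverified.
        return False
```

<p>In a log analysis script you would call <code>is_verified_crawler(ip)</code> for each address that claims to be a search engine robot, ideally caching the answer, since each call makes two DNS queries.</p>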
<p>Should you find any issues, where one of the robots is not minding the REP, or is misbehaving in some other way, you can always communicate directly with each engine through one of their forums:</p>
<ul>
<li><a href="http://groups.google.com/group/Google_Webmaster_Help-Indexing/topics" onclick="pageTracker._trackPageview('/outgoing/groups.google.com/group/Google_Webmaster_Help-Indexing/topics?referer=');">Google Crawling, Indexing and Ranking Forum</a></li>
<li><a href="http://help.yahoo.com/l/us/yahoo/search/search_support.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/search_support.html?referer=');">Yahoo Crawler Feedback Form</a></li>
<li><a href="http://forums.microsoft.com/webmaster/ShowForum.aspx?ForumID=1984&amp;SiteID=79" onclick="pageTracker._trackPageview('/outgoing/forums.microsoft.com/webmaster/ShowForum.aspx?ForumID=1984_amp_SiteID=79&amp;referer=');">Microsoft Crawler Error and Feedback Forum</a></li>
</ul>
<h2><a name="removal"></a>Removing Content From Search Engine Indices</h2>
<p>If you find that you haven&#8217;t implemented the techniques described here correctly and private content from your site is indexed, each of the major search engines has methods available for requesting that it be removed. For more information, see:</p>
<ul>
<li><a href="http://googlewebmastercentral.blogspot.com/2007/04/requesting-removal-of-content-from-our.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2007/04/requesting-removal-of-content-from-our.html?referer=');">Google: Requesting removal of content from our index</a></li>
<li><a href="http://help.yahoo.com/l/us/yahoo/search/siteexplorer/delete/" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/siteexplorer/delete/?referer=');">Yahoo!: Deleting URLs</a></li>
<li><a href="https://support.live.com/eform.aspx?productKey=wlsearch&amp;page=wlsupport_home_options_form_byemail&amp;ct=eformts" onclick="pageTracker._trackPageview('/outgoing/support.live.com/eform.aspx?productKey=wlsearch_amp_page=wlsupport_home_options_form_byemail_amp_ct=eformts&amp;referer=');">Live Search: Requesting content removal</a></li>
</ul>
<h2><a name="Additional_Resources:_"></a>Additional Resources:</h2>
<ul>
<li>Google
<ul>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40362" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40362&amp;referer=');">How to create a robots.txt file</a></li>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40364" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40364&amp;referer=');">Descriptions of each user-agent that Google uses</a></li>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40367" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40367&amp;referer=');">How to use pattern matching</a></li>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40368" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40368&amp;referer=');">How often we recrawl your robots.txt file</a></li>
<li><a href="http://googlewebmastercentral.blogspot.com/2006/08/all-about-googlebot.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2006/08/all-about-googlebot.html?referer=');">All about Googlebot</a></li>
</ul>
</li>
<li>Yahoo!
<ul>
<li><a href="http://www.ysearchblog.com/archives/000372.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000372.html?referer=');">Wild card support</a></li>
<li><a href="http://www.ysearchblog.com/archives/000508.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000508.html?referer=');">X-Robots tag directive support</a></li>
</ul>
</li>
<li>Microsoft Bing
<ul>
<li><a href="http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx?referer=');">Search robots in disguise</a></li>
</ul>
</li>
<li>Other resources
<ul>
<li><a href="http://searchengineland.com/070305-204850.php" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/070305-204850.php?referer=');">Search Engine Land: Meta Robots Tag 101</a></li>
<li><a href="http://searchengineland.com/080603-121100.php" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/080603-121100.php?referer=');">Search Engine Land: Yahoo!, Microsoft, Google Clarify Robots.txt Support</a></li>
<li><a href="http://searchengineland.com/070417-213813.php" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/070417-213813.php?referer=');">Search Engine Land: URL Removal Options</a></li>
<li><a href="http://www.robotstxt.org/" onclick="pageTracker._trackPageview('/outgoing/www.robotstxt.org/?referer=');">robotstxt.org</a></li>
<li><a href="http://en.wikipedia.org/wiki/Robots.txt" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Robots.txt?referer=');">Wikipedia: Robots Exclusion Standard</a></li>
</ul>
</li>
</ul>
<h3>Revision History</h3>
<ul>
<li>02/12/2009 &#8211; Google, Yahoo and Microsoft make a joint announcement of the rel=&#8217;Canonical&#8217; tag to make it easier for publishers to specify the canonical URLs.</li>
<li>06/04/2009 &#8211; Added NOPREVIEW tag announced this week by Microsoft. Used to disable the &#8216;hover preview&#8217; feature on their SERP.</li>
<li>08/19/10 &#8211; Changed Live Search to Bing throughout and MSNbot to Bingbot. Also removed references to Bing Webmaster Center robots.txt validator, <a href="http://searchengineland.com/all-new-microsoft-bing-webmaster-tools-46827" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/all-new-microsoft-bing-webmaster-tools-46827?referer=');">as it no longer exists</a>.</li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/managing-robots-access-to-your-website-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google AdWords Keyword Tool: Lessons on How Not to Move a URL</title>
		<link>http://www.ninebyblue.com/blog/how-not-to-move-a-url-lessons-from-the-google-adwords-keywords-tool-revamp/</link>
		<comments>http://www.ninebyblue.com/blog/how-not-to-move-a-url-lessons-from-the-google-adwords-keywords-tool-revamp/#comments</comments>
		<pubDate>Fri, 14 May 2010 17:10:23 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1220</guid>
		<description><![CDATA[I love the Google AdWords Keyword Tool.  I use it all the time and I demo it a lot when speaking at events. But I have no idea what the URL is. When I want to use it, I just do whatever everyone else does whenever they want to go to any site on the web. [...]]]></description>
			<content:encoded><![CDATA[<p>I love the Google AdWords Keyword Tool.  I use it all the time and I demo it a lot when speaking at events. But I have no idea what the URL is. When I want to use it, I just do whatever everyone else does whenever they want to go to any site on the web. I do a Google search for it. Specifically, I type [google adwords keyword tool] into Google and click on the first result.</p>
<p>Until today.</p>
<p>Now when I do that search, the tool is nowhere to be found.</p>
<p><a title="Google AdWords Keyword Tool by vanessafox, on Flickr" href="http://www.flickr.com/photos/vanessafox/4606373293/" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4606373293/?referer=');"><img src="http://farm4.static.flickr.com/3531/4606373293_012e367dc1.jpg" alt="Google AdWords Keyword Tool" width="485" height="500" /></a></p>
<p>Instead, I find the Google AdWords landing page, a bunch of articles on other sites about the keyword tool, and the UK version of the tool.  Huh. What&#8217;s up with that? (I don&#8217;t even see a paid search ad for it.)</p>
<p>Google recently completely revamped the tool and as part of that, changed the URL. The old URL was https://adwords.google.com/select/KeywordTool. Now, when you (or Googlebot, if you are not you, but instead, a search engine webcrawler looking to update your index) access that URL, you get a 302 redirect to https://adwords.google.com/o/Targeting/Explorer?__u=5701132992&amp;__c=8003242272&amp;stylePrefOverride=2#search.none!ideaType=KEYWORD&amp;requestType=IDEAS. And a captcha.</p>
<p>When I look for that URL in Google, I see it&#8217;s not indexed, although at least I see a paid search ad for the tool.</p>
<p><a title="Google's Keyword Tool by vanessafox, on Flickr" href="http://www.flickr.com/photos/vanessafox/4606373371/" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4606373371/?referer=');"><img src="http://farm2.static.flickr.com/1111/4606373371_2326a0cb7f.jpg" alt="Google's Keyword Tool" width="500" height="204" /></a></p>
<p>Of course, that URL doesn&#8217;t show up in part because it&#8217;s not a permanent URL. If I try to access inurl:https://adwords.google.com/o/Targeting/Explorer, I also get the keyword tool, via a set of redirects that leads to this URL: https://adwords.google.com/o/Targeting/Explorer?&amp;stylePrefOverride=2&amp;__u=5701132992&amp;__c=8003242272#search.none!ideaType=KEYWORD&amp;requestType=IDEAS.</p>
<p>Which is actually the same URL with the parameters in a different order. Although things don&#8217;t look much better in the Google index when I search for any URL that begins with https://adwords.google.com/o/Targeting/Explorer.</p>
<p><a title="Google Keyword Tool URL by vanessafox, on Flickr" href="http://www.flickr.com/photos/vanessafox/4606373537/" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4606373537/?referer=');"><img src="http://farm5.static.flickr.com/4008/4606373537_9fb0cb363b.jpg" alt="Google Keyword Tool URL" width="500" height="306" /></a></p>
<p>Let&#8217;s look a little more closely at those redirects. The source code of https://adwords.google.com/o/Targeting/Explorer looks like this:</p>
<pre id="line1">&lt;script type="text/javascript" language="javascript"&gt;
var jsRedirect = true;
var url = "/select/Login?aw3=true&amp;dst=%2Fo%2FTargeting%2FExplorer&amp;frag=search.none%21ideaType%3DKEYWORD%26requestType%3DIDEAS";
window.location.assign(url);
&lt;/script&gt; &lt;/body&gt; &lt;/html&gt;</pre>
<p>Which then leads to this (HTTP response edited to show the vital components for this example):</p>
<pre>https://adwords.google.com/select/Login?aw3=true&amp;dst=%2Fo%2FTargeting%2FExplorer&amp;frag=search.none%21ideaType%3DKEYWORD%26requestType%3DIDEAS</pre>
<pre>HTTP/1.1 302 Moved Temporarily</pre>
<pre>----------------------------------------------------------</pre>
<pre>https://adwords.google.com/um/StartNewLogin?aw3=true&amp;dst=%2Fo%2FTargeting%2FExplorer&amp;frag=search.none%21ideaType%3DKEYWORD%26requestType%3DIDEAS</pre>
<pre>HTTP/1.1 302 Moved Temporarily</pre>
<pre>----------------------------------------------------------</pre>
<pre>https://www.google.com/accounts/ServiceLogin?service=adwords&amp;hl=en_US&amp;ltmpl=regionale&amp;passive=true&amp;ifr=false&amp;alwf=true&amp;continue=https%3A%2F%2Fadwords.google.com%2Fum%2Fgaiaauth%3Fapt%3DNone%26ugl%3Dtrue</pre>
<pre>HTTP/1.1 302 Moved Temporarily</pre>
<pre>----------------------------------------------------------</pre>
<pre>https://adwords.google.com/um/gaiaauth?apt=None&amp;ugl=true&amp;pli=1&amp;auth=DQAAALUAAAAMl8-Jos-ywfDoe9g9erkG4klYT-fzYde8k9MEQMmOkonqCalB_LbFISNUgDOMGnoAZkaofqL2ZGwAbwAwV8-rQ6dGM9XnEjgrwUJsc9l_S-0NFsPz0om6ExrJSZf8lQnKJkASgaEqE7SGWbCpcMYd_qihOdzJVvGH0P7_jopql3FQJ5vGT6PuazK260Z2hXVAxy3eyEICPHqe7R9LvLrjbM1fHZLgquTrd6dMYIN64iMDKvShFg_rXfjonOCj6jo</pre>
<pre>HTTP/1.1 302 Moved Temporarily</pre>
<pre>----------------------------------------------------------</pre>
<pre>https://adwords.google.com/um/gaiaauth?apt=None&amp;ugl=true&amp;pli=1</pre>
<pre>HTTP/1.1 302 Moved Temporarily</pre>
<pre>----------------------------------------------------------</pre>
<pre>https://adwords.google.com/select/gaiaauth?__u=5701132992&amp;__c=8003242272&amp;stylePrefOverride=2&amp;30Login=true&amp;url=%2Fo%2FTargeting%2FExplorer#search.none!ideaType=KEYWORD&amp;requestType=IDEAS</pre>
<pre>HTTP/1.1 302 Moved Temporarily</pre>
<pre>----------------------------------------------------------</pre>
<pre>https://adwords.google.com/o/Targeting/Explorer?&amp;stylePrefOverride=2&amp;__u=5701132992&amp;__c=8003242272#search.none!ideaType=KEYWORD&amp;requestType=IDEAS</pre>
<pre>HTTP/1.1 200 OK</pre>
<p>Perhaps Google has a few things to add to their <a href="http://googlewebmastercentral.blogspot.com/2010/03/googles-seo-report-card.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2010/03/googles-seo-report-card.html?referer=');">SEO report card</a>:</p>
<ul>
<li>If possible, keep the URL the same when you change content.</li>
<li>If that&#8217;s not possible, don&#8217;t use a JavaScript redirect to move the URL.</li>
<li>Also <a href="http://googlewebmastercentral.blogspot.com/2008/11/date-with-googlebot-part-ii-http-status.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2008/11/date-with-googlebot-part-ii-http-status.html?referer=');">don&#8217;t use a 302</a>. <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=93633" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=93633&amp;referer=');">Use a 301</a>.</li>
<li>Don&#8217;t redirect multiple times.</li>
<li>If that&#8217;s not possible, keep the number of redirects under 5.</li>
<li>Googlebot can&#8217;t fill out a captcha.</li>
<li>If you don&#8217;t show up at all in organic search, a paid search ad for the name of your product can be helpful.</li>
</ul>
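<p>If you want to hold your own URL moves to that list, here is a minimal Python sketch. It assumes you have already recorded the chain as (status code, URL) pairs from a header trace like the one above. The function name, the constants, and the example.com URLs are all made up for illustration, and nothing is fetched over the network.</p>

```python
# Hypothetical helper: audits a recorded redirect chain against the list above.
REDIRECT_CODES = {301, 302, 303, 307, 308}
MAX_REDIRECTS = 4  # assumption: "under 5" means at most four hops


def audit_redirect_chain(hops):
    """Return warnings for a list of (status_code, url) tuples in request order."""
    warnings = []
    redirect_hops = [(status, url) for status, url in hops if status in REDIRECT_CODES]

    # Anything other than a 301 does not signal a permanent move.
    for status, url in redirect_hops:
        if status != 301:
            warnings.append(f"{status} at {url}: use a 301 for a permanent move")

    # Ideally there is a single redirect from the old URL to the new one.
    if len(redirect_hops) not in (0, 1):
        warnings.append(f"{len(redirect_hops)} redirects in a row: redirect once if you can")

    # If a chain is unavoidable, keep it short.
    if len(redirect_hops) not in range(MAX_REDIRECTS + 1):
        warnings.append(f"keep the number of redirects at {MAX_REDIRECTS} or fewer")

    return warnings


# Made-up chain for illustration: three temporary redirects, then the page.
example_chain = [
    (302, "https://example.com/old"),
    (302, "https://example.com/login"),
    (302, "https://example.com/interim"),
    (200, "https://example.com/new"),
]

for warning in audit_redirect_chain(example_chain):
    print(warning)
```

<p>This only inspects HTTP status codes, so a JavaScript redirect or a captcha along the way would still have to be spotted by hand.</p>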
<p>Confidential to the Google AdWords Keyword Tool team: If you&#8217;re not sure if your URLs are being crawled, check the crawl errors section of Google Webmaster Tools. It&#8217;s got this great <a href="https://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=35156" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=35156&amp;referer=');">&#8220;URLs not followed&#8221; report</a> that may provide some helpful information.</p>
<p>Confidential to everyone else: You can find the Google AdWords Keyword Tool <a href="https://adwords.google.com/o/Targeting/Explorer?__u=1000000000&amp;__c=1000000000&amp;stylePrefOverride=2#search.none!ideaType=KEYWORD&amp;requestType=IDEAS" onclick="pageTracker._trackPageview('/outgoing/adwords.google.com/o/Targeting/Explorer?_u=1000000000_amp_c=1000000000_amp_stylePrefOverride=2_search.none_ideaType=KEYWORD_amp_requestType=IDEAS&amp;referer=');">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/how-not-to-move-a-url-lessons-from-the-google-adwords-keywords-tool-revamp/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Microsites. A Bad Idea Most of the Time.</title>
		<link>http://www.ninebyblue.com/blog/microsites-a-bad-idea-most-of-the-time/</link>
		<comments>http://www.ninebyblue.com/blog/microsites-a-bad-idea-most-of-the-time/#comments</comments>
		<pubDate>Wed, 12 May 2010 01:16:46 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1200</guid>
		<description><![CDATA[This morning, TechFlash noted that Drugstore.com is expanding its microsite strategy around product categories. The growing list now includes:

deluxesalonsupply.com
beauty.com
visiondirect.com
sexualwellbeing.com
athisbest.com
thenaturalstore.com
allergysuperstore.com

Possibly you can already tell based on my headline that I think this is not the awesomest strategy ever. I could call it short-sighted, misguided, any number of other prefixed words, but perhaps I&#8217;ll just tell you [...]]]></description>
			<content:encoded><![CDATA[<p>This morning, TechFlash noted that <a href="http://www.techflash.com/seattle/2010/05/drugstorecom_expands_microsite_strategy.html" onclick="pageTracker._trackPageview('/outgoing/www.techflash.com/seattle/2010/05/drugstorecom_expands_microsite_strategy.html?referer=');">Drugstore.com is expanding its microsite strategy</a> around product categories. The growing list now includes:</p>
<ul>
<li>deluxesalonsupply.com</li>
<li>beauty.com</li>
<li>visiondirect.com</li>
<li>sexualwellbeing.com</li>
<li>athisbest.com</li>
<li>thenaturalstore.com</li>
<li>allergysuperstore.com</li>
</ul>
<p>Possibly you can already tell based on my headline that I think this is not the awesomest strategy ever. I could call it short-sighted, misguided, any number of other prefixed words, but perhaps I&#8217;ll just tell you why microsites are often not the best strategy to pursue. Before I get to that though, I want to point out that I find microsites a bad idea <em>most </em>of the time. Sometimes they are a great idea. Although the more of them you have, the less likely this is the case.</p>
<p>But, you protest! The TechFlash article said that beauty.com helped drugstore.com post a 20% revenue increase for Q1 2010. Sounds awesome. Microsites rule! Except that beauty.com is the one domain from the list that isn&#8217;t actually a microsite. It&#8217;s simply a vanity domain that redirects to the <a href="http://www.drugstore.com/beauty.asp?catid=9730" onclick="pageTracker._trackPageview('/outgoing/www.drugstore.com/beauty.asp?catid=9730&amp;referer=');">beauty category of drugstore.com</a>.  So certainly, by focusing resources and awareness on that category, they&#8217;ve likely managed to increase sales, but that increase has nothing to do with microsites.</p>
<p>So what&#8217;s wrong with microsites? Let&#8217;s tally up the ways.</p>
<p><strong>You lose brand identity and audience engagement</strong></p>
<p>You spend significant corporate energy on positive brand perception and awareness. And then you start over completely from scratch with an entirely new brand. Woo? If you are reaching an entirely different audience and your current brand would be confusing, then you may in fact want to build out a new brand, but in that case, you probably won&#8217;t be launching a microsite, you&#8217;ll launch a full site. In most cases, microsites are subsets of or promotions for the main site, with exactly the same audience. Do you really want to work at building up multiple brand identities? And do you really not want to benefit from the brand building in one category for another related category? (This becomes especially important with ecommerce sites, such as those drugstore.com operates. Even today, we don&#8217;t want to hand over our credit card information to just any site.)</p>
<p>Brand awareness has a search impact as well. As I note in the <a href="http://www.amazon.com/gp/product/0470537191?ie=UTF8&amp;tag=nibybl-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0470537191g" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/0470537191?ie=UTF8_amp_tag=nibybl-20_amp_linkCode=as2_amp_camp=1789_amp_creative=9325_amp_creativeASIN=0470537191g&amp;referer=');">searcher behavior chapter of my book</a>, searchers quickly evaluate the search results page to determine which result to click on. Many things go into that evaluation, but certainly brand recognition helps in evaluating credibility and perceived value.</p>
<p><strong>You lose the ability to leverage your audience</strong></p>
<p>Let&#8217;s say you launch an awesome site with a fantastic user experience, great products, and unrivaled customer support. For instance, let&#8217;s say you&#8217;re Zappos. Someone writes up a positive article about you in say, the <a href="http://www.nytimes.com/2008/05/24/technology/24online.html" onclick="pageTracker._trackPageview('/outgoing/www.nytimes.com/2008/05/24/technology/24online.html?referer=');">NY Times</a>. Readers start clicking over to your site. They see you sell running shoes. They just read about how great you are, so they feel confident about purchasing some products from your site. But maybe those same readers also need some clothes to go running in. If you had a separate runningclothes.com microsite, you&#8217;ve just missed a great opportunity to reach a targeted and motivated audience.</p>
<p><strong>You confuse people and search engines</strong></p>
<p>Oh, I won&#8217;t have that whole NY Times reader problem, you say. I&#8217;ll just keep a complete copy of my runningclothes.com content on my main site too! That way, I can reach the audience for my main site as well as get all the additional audience potential of the microsite. Oh really? First, that&#8217;s just confusing. If someone becomes accustomed to shopping for athletic clothes on your main site and then clicking over for shoes, but then one day they end up on runningclothes.com and everything looks the same&#8230; and yet the shoes are gone &#8212; that&#8217;s just not the experience you want to give users.</p>
<p>But the problem really comes in when you add search engines to the mix. Which version of the pages do you want them to index &#8212; the version on your main site or the version on your microsite? Likely you&#8217;re going to say the microsite. (Especially if you&#8217;ve built the microsite because you think keyword-rich domain names have great search potential &#8212; read on for more on that, by the way.) But the search engine is likely to index the version on your main site because that site has been around longer, has more links, and has more overall credibility with the search engines. No problem, you say. You&#8217;ll just block those pages with robots.txt. Well, OK. You can do that. But then you lose all search engine value from any of the external links those pages may accumulate. You also lose the search value of the internal link structure. That&#8217;s not awesome.</p>
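<p>To make the robots.txt option concrete, here is a small Python sketch using the standard library&#8217;s <code>urllib.robotparser</code>. The robots.txt rules, the <code>/running-clothes/</code> path, the example.com URLs, and the <code>is_crawlable</code> helper are all assumptions for illustration, not anything drugstore.com actually uses.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the main site, blocking the duplicated section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /running-clothes/
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Report whether the given robots.txt rules let a crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The blocked copy of the microsite content cannot be crawled...
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/running-clothes/shorts"))
# ...while the rest of the main site still can.
print(is_crawlable(ROBOTS_TXT, "http://www.example.com/shoes/"))
```

<p>The check only tells you what a well-behaved crawler is allowed to fetch. It says nothing about the links pointing at the blocked pages, which is exactly the value you give up.</p>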
<p>My guess is that drugstore.com has Zyrtec on both its main site and its allergysuperstore.com site. Along with all the user reviews, product details, and directions. Drugstore.com ranks #37 for [zyrtec]. allergysuperstore.com doesn&#8217;t rank at all.</p>
<p><strong>You may have to spend substantial additional resources</strong></p>
<p>The microsites run by drugstore.com all use the same template and content management system. So it seems like low  engineering overhead to maintain them all. But wait. As you build out the content of both sites, you have to decide which content to put where. And decide how to spend marketing, PR, and advertising resources. When you issue a press release, which site do you talk up? All of them? What if you have 20? And you likely are doing social media. Do you now maintain 20 Facebook pages and 20 Twitter accounts? I&#8217;m tired just thinking about it.</p>
<p>And if you&#8217;ve built the microsite specifically for an advertising campaign, what happens when the campaign is over? Do you maintain the site? Abandon it? Take it down? This question gets more complicated if the microsite included a social networking element. You&#8217;ve gotten your audience engaged, now what do you do with them?</p>
<p>During the 2009 Super Bowl, Jack-in-the-Box aired a commercial that showed Jack getting hit by a bus. They launched the microsite hangintherejack.com as part of the campaign. I&#8217;m not sure what happened with the lifecycle of the site, but that domain now redirects to jackinthebox.com, so whatever assets they built up there (both in terms of content and audience) have just been thrown away. (They did better with the <a href="http://twitter.com/jackbox" onclick="pageTracker._trackPageview('/outgoing/twitter.com/jackbox?referer=');">Twitter account</a> launched as part of the campaign. That wasn&#8217;t campaign-specific and it&#8217;s still being used by &#8220;Jack&#8221;.)</p>
<p><strong>You hobble your search acquisition efforts</strong></p>
<p>A big part of ranking well in search engines continues to be the strength of the external links to the site. If you maintain multiple sites, then you are diluting that external link value. If five people link to your main site and five people link to your microsite, each site is competing for rankings against the rest of the web with those five links. Instead, you could have one site competing with ten links. Anything that you do for offsite search engine optimization, you have to repeat for each site.</p>
<p><strong>It can be difficult to match promotions to search visibility</strong></p>
<p>One common case of microsites is when a company launches a new promotion. It seems to make perfect sense to launch a microsite as part of that promotion. You can tie branding to the promo and it can be a lot easier to outsource the development of the site to the agency that is managing the promotion creative than to try to coordinate in-house resources and add a section about the promotion to the main company website.</p>
<p>The trouble comes in when that promotion sparks search interest (which it undoubtedly will). I&#8217;ve <a href="http://searchengineland.com/scoring-the-superbowl-ads-do-broadcast-marketers-get-online-acquisition-16398" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/scoring-the-superbowl-ads-do-broadcast-marketers-get-online-acquisition-16398?referer=');">observed this with the Super Bowl commercials</a> in both 2009 and 2010. In 2009, <a href="http://www.ninebyblue.com/blog/social-media/superbowl-commercials-what-about-search-acquisition/">several sites, including Hyundai and Sobe</a> advertised taglines that had corresponding microsites, but those domains redirected to the main domain. Advertisers expected that viewers would type the URL into a browser address bar, but instead, many people typed the tagline or domain into a search box. Since the domain didn&#8217;t actually exist, the advertiser didn&#8217;t show up in search results. You can see this, for instance, with Hyundai&#8217;s Edit Your Own campaign.</p>
<p><a title="Hyundai Super Bowl Interest by vanessafox, on Flickr" href="http://www.flickr.com/photos/vanessafox/4600168660/" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4600168660/?referer=');"><img src="http://farm5.static.flickr.com/4057/4600168660_20cc46ecb7.jpg" alt="Hyundai Super Bowl Interest" width="500" height="376" /></a></p>
<p><a title="Hyundai Edit Your Own Search Results by vanessafox, on Flickr" href="http://www.flickr.com/photos/vanessafox/4600168698/" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4600168698/?referer=');"><img src="http://farm4.static.flickr.com/3389/4600168698_cb17f663ba.jpg" alt="Hyundai Edit Your Own Search Results" width="500" height="421" /></a></p>
<p>Another problem with launching a microsite at the same time as an ad campaign (even if you don&#8217;t redirect the URL) is that you don&#8217;t want to launch the site until the ad goes live, but you want the site to be visible in search results as soon as the ad goes live. And unfortunately, you can&#8217;t have both. The hangintherejack.com site noted above experienced this issue. It wasn&#8217;t indexed in Google until six hours after the commercial aired (and the site launched). For a site to be crawled, indexed, and ranking within six hours of launch seems pretty quick. Unless you&#8217;ve just spent millions on a Super Bowl commercial that&#8217;s caused the audience to search for the site in Google. You can, of course, mitigate this problem by buying paid search ads. But this blog post isn&#8217;t about how to work around microsite issues. It&#8217;s about why microsites can be problematic.</p>
<p>But, I can hear you asking. Wouldn&#8217;t an advertiser always have this problem, even if they just launched promotion-related content on their main site? Well, yes and no. At the very least, the domain is already known and being actively crawled by the search engines, so you increase your chances of a quick crawl of the new content, particularly if you link it from your home page as soon as it goes live. You can also launch the pages early (without all of the promotion-related content) and ensure the pages include the words that correspond to the queries the promotion will likely trigger, then swap out the content when the ad goes live.</p>
<p>For ad campaign-related web content, you always have to think through the implementation to ensure you leverage search interest, but your options are more limited when you&#8217;re dealing with a microsite.</p>
<p><strong>You don&#8217;t get the search engine value you think you get</strong></p>
<p>This is the crux of the issue in the case illustrated by drugstore.com. They aren&#8217;t launching microsites because they are working with an ad agency on creative for a campaign and it&#8217;s too difficult to get internal engineers to add content to their website. And they aren&#8217;t building a completely different business for an entirely new audience. They&#8217;re launching entire business verticals for the same audience as their primary site on keyword-rich domains. Why? It can&#8217;t be for the type-in traffic. Even <a href="http://www.elliotsblog.com/comparing-search-volume-to-traffic-54251" onclick="pageTracker._trackPageview('/outgoing/www.elliotsblog.com/comparing-search-volume-to-traffic-54251?referer=');">those who specialize in the domain business</a> will tell you that <a href="http://www.qualitynonsense.com/2477/domain-type-in-traffic/" onclick="pageTracker._trackPageview('/outgoing/www.qualitynonsense.com/2477/domain-type-in-traffic/?referer=');">type-in traffic is on a serious decline</a>. We can see this with the Dockers commercial from the 2010 Super Bowl. A URL was the number two spiking query on Google that day.</p>
<p><a title="Dockers Super Bowl Interest by vanessafox, on Flickr" href="http://www.flickr.com/photos/vanessafox/4600168734/" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4600168734/?referer=');"><img src="http://farm2.static.flickr.com/1393/4600168734_d449b2a9f3.jpg" alt="Dockers Super Bowl Interest" width="500" height="185" /></a></p>
<p>People use search engines as primary navigation for the web even when they already know the web address.</p>
<p>Generally, when I work with companies who want to use a bunch of keyword-rich domains, it&#8217;s because they think there&#8217;s some inherent search engine value in the domains themselves. This assumes that the brilliant PhDs at Google think to themselves: &#8220;Huh. This domain is cheaponlinebooks.com. It totally must be the most relevant result for [cheap online books] queries. After all, the words are right in the domain name!&#8221; However, as it turns out, this has been a technique used by spammers since the days of stone tablets and chisels. Or, OK. Since at least 1995. The <a href="http://www.highrankings.com/50-microsites" onclick="pageTracker._trackPageview('/outgoing/www.highrankings.com/50-microsites?referer=');">search engines are onto it</a>. (Well, <a href="http://www.seroundtable.com/archives/020382.html" onclick="pageTracker._trackPageview('/outgoing/www.seroundtable.com/archives/020382.html?referer=');">maybe not Bing quite yet</a>.)</p>
<p>There is so much super valuable content on domains that aren&#8217;t keyword rich and there is so much spammy, crappy content on keyword-rich domains that Google just doesn&#8217;t find it useful as a relevance signal.</p>
<p>Keywords <em>can </em>indirectly help when they&#8217;re in the URL because you&#8217;ll get anchor text credit for any URL-only links. But that really has nothing to do with the domain, so why not just use keyword-rich URLs on your main domain and get those benefits without incurring all of the drawbacks of microsites?</p>
<p>People also sometimes think operating multiple domains will help search engine rankings in other ways, such as that you can link to yourself for instant PageRank credit! Or that you can dominate the search results with all those domains. I hate to be the one to break the news, but search engines are on to those things too. Over time, search engines generally can figure out when sites are part of an owned network and then treat them accordingly (which is similar to how they would treat the content if it were all part of one site). And if you now want to ask how do they know so you can figure out how to hide it, then you&#8217;re getting dangerously close to thinking about search engine manipulation. Maybe <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=35769#3" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=35769_3&amp;referer=');">you should read this</a> and then come back.</p>
<p>Certainly, you&#8217;ll find many out there who swear up and down that <a href="http://www.seomoz.org/blog/its-a-feeding-frenzy-for-keywordrich-domains" onclick="pageTracker._trackPageview('/outgoing/www.seomoz.org/blog/its-a-feeding-frenzy-for-keywordrich-domains?referer=');">having keywords in the domain</a> makes a big difference. I think mostly this isn&#8217;t the case, and that any examples of keyword-rich domain names ranking well are also a case of the content on those domains actually being the most relevant result for a set of queries. Even if it did work, it would presumably only work for exact match, so you&#8217;d need a lot of domains to make up the set of queries you really want to rank for. That sounds exhausting. I also think that this is the kind of ranking signal that&#8217;s likely being tweaked all the time, and even if it works for a time, it&#8217;s a poor foundation for a long term business strategy.</p>
<p>But just as importantly, once you start focusing on building your business based on perceived signals in the search engine algorithms, you&#8217;ve lost sight of your customers and of why you&#8217;re building the business in the first place. While this may seem like a minor diversion, it may take you down a completely different path than the one that&#8217;s based on building substantial user value.</p>
<p><a title="Business Goals by vanessafox, on Flickr" href="http://www.flickr.com/photos/vanessafox/4599643095/" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4599643095/?referer=');"><img src="http://farm2.static.flickr.com/1307/4599643095_a110434a19.jpg" alt="Business Goals" width="500" height="389" /></a></p>
<p>Suddenly you&#8217;ve got a set of spam tactics rather than a business model.</p>
<p>So do keyword-rich domains have any value? Maybe. If you are starting a brand new site and can pick any domain name you want, in some cases it may make sense to go with a keyword-rich one. It will be memorable, easy to type, and will encourage useful anchor text. It might also encourage click through for URL-only links as it may be more obvious what the site is about. And if you have or can acquire a bunch of keyword-rich domains related to your industry, you may as well redirect them to your main site to capture any type-in traffic they happen to get (although don&#8217;t expect any SEO benefit from this).</p>
<p>But launching a whole bunch of keyword-rich microsites related to your industry in hopes that you&#8217;ll get all those microsites ranking separately for variations of a query? Probably not the awesomest idea ever.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/microsites-a-bad-idea-most-of-the-time/feed/</wfw:commentRss>
		<slash:comments>63</slash:comments>
		</item>
		<item>
		<title>Should Restaurants Care About Local Search?</title>
		<link>http://www.ninebyblue.com/blog/social-media/should-restaurants-care-about-local-search/</link>
		<comments>http://www.ninebyblue.com/blog/social-media/should-restaurants-care-about-local-search/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 01:26:00 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[SEO]]></category>
		<category><![CDATA[Social Media]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1094</guid>
		<description><![CDATA[Last week, I gave a workshop about local search at O&#8217;Reilly&#8217;s Where 2.0 conference. (I also did a short video on the topic.) One of the things I talked about was how important it is for local businesses to be visible in web search and map search results. After all, over 90% of us online [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, I gave a workshop about <a href="http://en.oreilly.com/where2010/public/schedule/detail/12388" onclick="pageTracker._trackPageview('/outgoing/en.oreilly.com/where2010/public/schedule/detail/12388?referer=');">local search</a> at O&#8217;Reilly&#8217;s Where 2.0 conference. (I also did a <a href="http://www.youtube.com/watch?v=zgmRWl89_F0" onclick="pageTracker._trackPageview('/outgoing/www.youtube.com/watch?v=zgmRWl89_F0&amp;referer=');">short video on the topic</a>.) One of the things I talked about was how important it is for local businesses to be visible in web search and map search results. After all, over 90% of us online use search engines to find information, and generally, those search engines are the major ones, rather than specific verticals. Microsoft research has found that <a href="http://searchengineland.com/live-blogging-microsoft-searchification-day-2007-12283" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/live-blogging-microsoft-searchification-day-2007-12283?referer=');">86% of searchers start at a major search engine</a> when shopping online. Even when consumers plan to purchase offline, they often go online first. <a href="http://searchengineland.com/offline-conversions-how-to-measure-the-real-roi-of-paid-search-in-a-multi-channel-world-37801" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/offline-conversions-how-to-measure-the-real-roi-of-paid-search-in-a-multi-channel-world-37801?referer=');">42% of retail sales in 2009</a> were online or &#8220;web-influenced&#8221; ($917 billion in US sales were &#8220;web-influenced&#8221;).
And more specific to local business, <a href="http://www.mediapost.com/publications/?art_aid=99952&amp;fa=Articles.showArticle" onclick="pageTracker._trackPageview('/outgoing/www.mediapost.com/publications/?art_aid=99952_amp_fa=Articles.showArticle&amp;referer=');">63% of consumers</a> use the internet to find local businesses, but only 44% of local businesses have a web site. That same study also found that <a href="http://www.mediapost.com/publications/?art_aid=99952&amp;fa=Articles.showArticle" onclick="pageTracker._trackPageview('/outgoing/www.mediapost.com/publications/?art_aid=99952_amp_fa=Articles.showArticle&amp;referer=');">50% of us turn to search engines first</a> for local business information, vs. 24% who turn to the Yellow Pages first.</p>
<p>After the Where 2.0 session, an attendee came up to me and asked me about restaurants. Is search really important, he wondered. Surely social media is where restaurants should concentrate efforts. After all, a new restaurant needs to raise hyperlocal awareness and no one is going to search for the name of a restaurant they&#8217;ve never heard of. He suggested a Facebook campaign that engages 100 consumers from the local neighborhood might be the best way to promote a new restaurant.</p>
<p><strong>An &#8220;And&#8221; Strategy, Not an &#8220;Or&#8221; Strategy</strong></p>
<p>First, I recounted what <a href="http://www.kaushik.net/avinash/" onclick="pageTracker._trackPageview('/outgoing/www.kaushik.net/avinash/?referer=');">Avinash Kaushik</a> noted at the <a href="http://outspokenmedia.com/internet-marketing-conferences/keynote-search-union/" onclick="pageTracker._trackPageview('/outgoing/outspokenmedia.com/internet-marketing-conferences/keynote-search-union/?referer=');">SMX keynote panel</a> that he and I were both on a few weeks ago. Social media hasn&#8217;t <em>replaced</em> search.  The question isn&#8217;t search <em>or</em> social media. The question is where are your customers. Certainly for a business such as a restaurant, social media may be a great place to reach new customers, but those same customers are likely searching as well. Overall search volume was <a href="http://www.comscore.com/Press_Events/Press_Releases/2010/1/Global_Search_Market_Grows_46_Percent_in_2009" onclick="pageTracker._trackPageview('/outgoing/www.comscore.com/Press_Events/Press_Releases/2010/1/Global_Search_Market_Grows_46_Percent_in_2009?referer=');">up 46% in 2009</a>, so it&#8217;s definitely not something that&#8217;s going away. (You can see Avinash and me <a href="http://live.webpronews.com/avinash-kaushik-the-analytic-evangelist-for-google-adwords-vanessa-fox-contributing-editor-of-se/" onclick="pageTracker._trackPageview('/outgoing/live.webpronews.com/avinash-kaushik-the-analytic-evangelist-for-google-adwords-vanessa-fox-contributing-editor-of-se/?referer=');">talk more about this</a>.)</p>
<p>Think about who you&#8217;re trying to reach. Initially, you want to raise overall awareness. Social media is great for this (as you&#8217;ll see in a minute). But what about this scenario?</p>
<blockquote><p>A woman is reading Twitter and sees that a new restaurant has opened up nearby. Later, when she and her husband are trying to decide (yet again!) what to have for dinner, she remembers the new restaurant. Finally, a new idea! She suggests it. Her husband says great, but what&#8217;s on the menu? Will I like it? The woman does a quick search on Google for the name of the restaurant to see if the web site has the menu. Huh. The restaurant doesn&#8217;t come up. She goes back to Twitter and starts scrolling back through the tweets, trying to find the right one. In the background, her husband is getting hungry. And after waiting a few minutes, he picks up the phone and orders a pizza.</p></blockquote>
<p>And as a restaurant owner, you want to be discoverable long term. Your potential customers (locals and visitors) might be searching for [mexican restaurant seattle]. Or even [best mexican restaurants in seattle]. Social media is great for recommendations from friends, but it&#8217;s not always searchable and you can&#8217;t always get the immediate answers you need when your husband has the phone in hand to order pizza again.</p>
<p><strong>A Holistic Search and Social Media Strategy</strong></p>
<p>You don&#8217;t have to choose an &#8220;or&#8221; strategy, because an &#8220;and&#8221; strategy is not that much more effort. You have a web site; you are engaging in social media. The only thing left is to make sure you understand how to be found in search, which primarily consists of:</p>
<ul>
<li>Understanding what your potential audience is searching for</li>
<li>Claiming your maps listings on the major search engines</li>
<li>Ensuring your web site is search-friendly</li>
<li>Leveraging social media to improve search visibility</li>
</ul>
<p>The awesome thing is that all of this is free.</p>
<p><strong>A Local Example: West Seattle Heartland Cafe</strong></p>
<p>A couple of months ago, I found out about a <a href="http://westseattleblog.com/2010/01/update-heartland-cafe-takes-shape-and-asks-for-your-help" onclick="pageTracker._trackPageview('/outgoing/westseattleblog.com/2010/01/update-heartland-cafe-takes-shape-and-asks-for-your-help?referer=');">new restaurant near me from the local neighborhood blog</a>.  This was a great use of social media (engaging with local bloggers who already have the attention of the target audience) and a great example of why engaging this way can be important. The restaurant is not only near me, but it&#8217;s directly next door to my bank, grocery store, and drugstore. The building is covered with HUGE &#8220;Heartland Cafe coming soon&#8221; signs. Yet I didn&#8217;t notice it until I read about it on the West Seattle blog.</p>
<p>I then learned that it was finally open by reading a tweet from <a href="http://twitter.com/westseattleblog/status/11548059468" onclick="pageTracker._trackPageview('/outgoing/twitter.com/westseattleblog/status/11548059468?referer=');">@westseattleblog</a>. The restaurant has a <a href="http://twitter.com/cafeheartland" onclick="pageTracker._trackPageview('/outgoing/twitter.com/cafeheartland?referer=');">Twitter account</a>! and a <a href="http://www.heartlandcafeseattle.com/" onclick="pageTracker._trackPageview('/outgoing/www.heartlandcafeseattle.com/?referer=');">web site</a>! These are all great things. But remember the &#8220;and&#8221; strategy. Can the Heartland Cafe be found in search? Sadly not.</p>
<p><a href="http://www.flickr.com/photos/vanessafox/4491840910/" title="Google Search Results: Heartland Cafe by vanessafox, on Flickr" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4491840910/?referer=');"><img src="http://farm5.static.flickr.com/4044/4491840910_67d9ede4b6.jpg" width="314" height="500" alt="Google Search Results: Heartland Cafe" /></a></p>
<p>So what should they do? Let&#8217;s go through the bullets I listed above.</p>
<p><strong>1. Understand what your potential audience is searching for<br />
</strong>You always want to be found for branded searches. In this case, that would be queries such as [heartland cafe] and [heartland cafe west seattle]. This restaurant probably also wants to be found for things like comfort food, breakfast, brunch, and bar. It&#8217;s important to know how consumers search, and for restaurant related searches, the Google AdWords Keyword Tool (that you don&#8217;t need an AdWords account to use) tells us that searchers look for [breakfast] three times as often as [brunch] and that we often search for [breakfast restaurants].</p>
<p><a href="http://www.flickr.com/photos/vanessafox/4491202135/" title="Keyword Research: Breakfast vs. Brunch by vanessafox, on Flickr" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4491202135/?referer=');"><img src="http://farm5.static.flickr.com/4023/4491202135_0e1f713b7c.jpg" width="500" height="480" alt="Keyword Research: Breakfast vs. Brunch" /></a></p>
<p>We look for restaurants more than bars and cafes and we&#8217;re often looking for menus and reviews.</p>
<p><a href="http://www.flickr.com/photos/vanessafox/4491202311/" title="Keyword Research: Restaurants by vanessafox, on Flickr" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4491202311/?referer=');"><img src="http://farm5.static.flickr.com/4047/4491202311_9ee29bdfc9.jpg" width="500" height="369" alt="Keyword Research: Restaurants" /></a></p>
<p>There&#8217;s lots to be learned from search data, but at a quick glance, it seems like Heartland Cafe should talk up its breakfast and provide an online menu.</p>
<p><strong>2. Claim your maps listing<br />
</strong>All of the search engines provide this service, but let&#8217;s use Google as an example. It&#8217;s important to claim your maps listing for many reasons, but two of the best are that people often search directly on the maps page (particularly on mobile devices) and that if search engines determine that a matching map result would be relevant to a web search, they&#8217;ll show it directly in the web search results. Say I&#8217;m driving with some friends and decide to check out this new Heartland Cafe but I don&#8217;t remember exactly where it is. I open up Google Maps and see&#8230; a Toyota dealership.</p>
<p><a href="http://www.flickr.com/photos/vanessafox/4491841432/" title="Google Local: Business Listings by vanessafox, on Flickr" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4491841432/?referer=');"><img src="http://farm5.static.flickr.com/4042/4491841432_8f49881f27.jpg" width="500" height="317" alt="Google Local: Business Listings" /></a></p>
<p>Not awesome.</p>
<p>How can the Heartland Cafe fix this? They just need to go into <a href="http://www.google.com/local/add/" onclick="pageTracker._trackPageview('/outgoing/www.google.com/local/add/?referer=');">Google&#8217;s Local Business Center</a> and claim their listing. It&#8217;s free and easy. It&#8217;s important to put the business into the right categories and provide complete information. The major search engines get local business data from third parties, so it&#8217;s likely that most businesses already have a listing (the Heartland Cafe doesn&#8217;t because it&#8217;s so new). If the maps list your business already, you can claim ownership, and then complete the listing so that it&#8217;s compelling for your target audience. And as you can see, new businesses should definitely add their listings so they show up right away.</p>
<p><a href="http://www.flickr.com/photos/vanessafox/4491841620/" title="Google Local Business Center by vanessafox, on Flickr" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4491841620/?referer=');"><img src="http://farm5.static.flickr.com/4028/4491841620_9e1f8b1d89.jpg" width="500" height="329" alt="Google Local Business Center" /></a></p>
<p>Google now has place pages that pull in a great deal of information from the web (such as images and reviews) and enable business owners to provide substantial detail.</p>
<p><a href="http://www.flickr.com/photos/vanessafox/4491841888/" title="Google Places: Restaurants by vanessafox, on Flickr" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4491841888/?referer=');"><img src="http://farm3.static.flickr.com/2679/4491841888_b3585357e2.jpg" width="500" height="412" alt="Google Places: Restaurants" /></a></p>
<p>You can see with this Coldwell Banker listing that the owner can provide a description, images, a web site, and more.</p>
<p><a href="http://www.flickr.com/photos/vanessafox/4491203259/" title="Google Local Claimed Listing by vanessafox, on Flickr" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4491203259/?referer=');"><img src="http://farm5.static.flickr.com/4050/4491203259_79678b9543.jpg" width="500" height="375" alt="Google Local Claimed Listing" /></a></p>
<p>Once the Heartland Cafe has created a robust Google Maps listing and has started getting good reviews, they may be able to show up in the local business results for a search such as this one:</p>
<p><a href="http://www.flickr.com/photos/vanessafox/4491842374/" title="Google Local One Box by vanessafox, on Flickr" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4491842374/?referer=');"><img src="http://farm5.static.flickr.com/4010/4491842374_2d732fab04.jpg" width="500" height="456" alt="Google Local One Box" /></a></p>
<p><strong>3. Ensure your web site is search friendly</strong><br />
Obviously, the first step here is to have a web site, which the Heartland Cafe has. Great! Unfortunately, it&#8217;s not showing up in search results, even for searches for the restaurant name and location. Not great. What&#8217;s going wrong?</p>
<p>It&#8217;s beyond the scope of this post to dive into ensuring that you&#8217;re providing compelling information that engages your audience, but key to this is ensuring you&#8217;re meeting the needs of searchers. Remember step 1 when we found that searchers are looking for menus? The Heartland Cafe&#8217;s web site doesn&#8217;t have one. That will not only limit search visibility, but it won&#8217;t answer one of the primary questions visitors to the site have. And if visitors can&#8217;t see the menu in advance, they may not decide to stop in and try the food.</p>
<p>However, the Heartland Cafe does have some great information on the site (address, including city and state &#8212; key to being seen as relevant for local searches, hours, details on the type of food). So why doesn&#8217;t any of it show up? The primary issues appear to be technical ones.</p>
<p>The individual pages don&#8217;t have corresponding unique URLs. All content loads on a single URL: <a href="http://www.heartlandcafeseattle.com" onclick="pageTracker._trackPageview('/outgoing/www.heartlandcafeseattle.com?referer=');">www.heartlandcafeseattle.com</a>. This means that search engines can&#8217;t index the content as they don&#8217;t have URLs to associate with that content. In addition, the content can&#8217;t be shared on social media. The site has an events calendar, but if I saw a cool event there and I wanted to post on Facebook about it and invite my friends, I&#8217;d have to tell them to go to the home page, then click events in the sidebar, then click&#8230; Why is this? Well, the site is entirely in Flash. It absolutely doesn&#8217;t need to be in Flash. The site could keep the exact look and feel it currently has and be in HTML. In fact, WordPress would be a quick and easy way to replicate the layout.</p>
<p>Normally when I see Flash sites, I recommend ways to combine Flash and HTML or point to ways of building <a href="http://searchengineland.com/google-io-new-advances-in-the-searchability-of-javascript-and-flash-but-is-it-enough-19881" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/google-io-new-advances-in-the-searchability-of-javascript-and-flash-but-is-it-enough-19881?referer=');">search-friendly Flash sites</a>, but in this case, the Flash doesn&#8217;t appear to be providing any benefits and is only detracting from the usability and searchability of the site.</p>
<p>Even with this problem, however, search engines should index the home page. Even though they won&#8217;t be able to extract any content from the pages, they can at least index the information in the title tag and meta description tag. The title tag in this case is &#8220;Heartland Cafe Seattle&#8221;, which is pretty good actually, although it could include a descriptor such as &#8220;classic midwestern comfort food&#8221;. But the <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=35264" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=35264&amp;referer=');">meta description tag</a> is missing entirely. A good meta description for this page might be &#8220;West Seattle&#8217;s best midwestern comfort food in the heart of the Admiral district for breakfast, dinner, late nights, and delicious cocktails.&#8221;</p>
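<p>As a rough sketch, the head of the home page could carry both tags like this. The title descriptor and description text are the suggestions from the paragraph above, and the markup is illustrative only, not taken from the restaurant&#8217;s actual site:</p>
<pre><code>&lt;head&gt;
  &lt;title&gt;Heartland Cafe Seattle: Classic Midwestern Comfort Food&lt;/title&gt;
  &lt;meta name="description" content="West Seattle's best midwestern comfort food in the heart of the Admiral district for breakfast, dinner, late nights, and delicious cocktails." /&gt;
&lt;/head&gt;</code></pre>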
<p>They can at least let search engines know the site exists by <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=156184" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=156184&amp;referer=');">submitting a Sitemap file</a>. This file can be as simple as a text file that lists the URLs of the site. (This step will be more useful once they associate individual URLs with each page on the site.)</p>
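<p>For illustration, a text Sitemap is a plain UTF-8 file with one full URL per line. The page URLs below are hypothetical, since the site doesn&#8217;t yet have individual URLs; they only show what the file might look like once it does:</p>
<pre><code>http://www.heartlandcafeseattle.com/
http://www.heartlandcafeseattle.com/menu/
http://www.heartlandcafeseattle.com/hours/
http://www.heartlandcafeseattle.com/events/</code></pre>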
<p>There are lots of other technical and content-focused things this business can do, but none of them will make much difference until the pages of the site have their own URLs.</p>
<p><strong>4. Leverage social media to improve search visibility</strong><br />
Not only does social media help you engage with audiences, but it increases your search visibility. In this case, the web site has some real problems it needs to fix before it can be found in search. But in the meantime the business owners can take better advantage of social media. Add the web site address to the Twitter profile. Add a Facebook fan page. As you can see from the earlier screenshot of the search results for the restaurant name, the business is only visible at all because of social media. Address the reviews on Yelp so potential customers know that you care. The Yelp page is, after all, the third result in searches for the restaurant name.</p>
<p>Does being found in search engines really matter for a local business such as a restaurant? We may not need to go farther than the search results themselves for an answer:</p>
<p><a href="http://www.flickr.com/photos/vanessafox/4491203547/" title="Do Search Results Matter for Local Restaurants? by vanessafox, on Flickr" onclick="pageTracker._trackPageview('/outgoing/www.flickr.com/photos/vanessafox/4491203547/?referer=');"><img src="http://farm5.static.flickr.com/4046/4491203547_be912c349e.jpg" width="500" height="102" alt="Do Search Results Matter for Local Restaurants?" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/social-media/should-restaurants-care-about-local-search/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Jane and Robot Meet up At Yahoo Santa Monica, Wednesday, August 19th</title>
		<link>http://www.ninebyblue.com/blog/seo-blog/jane-and-robot-meet-up-at-yahoo-santa-monica-wednesday-august-19th/</link>
		<comments>http://www.ninebyblue.com/blog/seo-blog/jane-and-robot-meet-up-at-yahoo-santa-monica-wednesday-august-19th/#comments</comments>
		<pubDate>Wed, 19 Aug 2009 08:06:43 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=897</guid>
		<description><![CDATA[If you&#8217;re in the Santa Monica area Wednesday evening, stop by Yahoo between 4:30 and 7 for food, drinks, site reviews, discussion, and Q&#38;A time. We&#8217;ll cover anything you want related to search acquisition, search-friendly web development, content strategy, and web analytics.
Sign up at upcoming!
]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re in the Santa Monica area Wednesday evening, stop by Yahoo between 4:30 and 7 for food, drinks, site reviews, discussion, and Q&amp;A time. We&#8217;ll cover anything you want related to search acquisition, search-friendly web development, content strategy, and web analytics.</p>
<p><a href="http://upcoming.yahoo.com/event/3089265/" onclick="pageTracker._trackPageview('/outgoing/upcoming.yahoo.com/event/3089265/?referer=');">Sign up at upcoming</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/seo-blog/jane-and-robot-meet-up-at-yahoo-santa-monica-wednesday-august-19th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Branded Search</title>
		<link>http://www.ninebyblue.com/blog/branded-search/</link>
		<comments>http://www.ninebyblue.com/blog/branded-search/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 17:44:14 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=885</guid>
		<description><![CDATA[Sometimes, we take branded search (searches for your brand name or domain name) for granted. We just assume that people who know our name will find us.
Today, I wanted to book a flight on BMI (more accurately known as British Midland Airways). First, I tried direct type in to bmi.com. That was not correct. (One [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes, we take branded search (searches for your brand name or domain name) for granted. We just assume that people who know our name will find us.</p>
<p>Today, I wanted to book a flight on BMI (more accurately known as British Midland Airways). First, I tried direct type in to bmi.com. That was not correct. (One of the many reasons no one types in domains anymore and everyone just relies on search.) So, I did a search for [bmi]. Huh. Nowhere to be found.</p>
<p><a href="http://www.ninebyblue.com/wp-content/uploads/2009/08/bmi.jpg"><img class="alignnone size-full wp-image-887" title="BMI Search" src="http://www.ninebyblue.com/wp-content/uploads/2009/08/bmi.jpg" alt="" width="500" height="512" /></a></p>
<p>Refining my search to [bmi airways] does bring up the site, but I could have just as easily refined my search to [aer lingus] or (please never let it come to this) [ryan air].</p>
<p>Because I&#8217;m a search geek, rather than just booking my flight, I also checked out the rankings for the BMI site (which, by the way, is flybmi.com). While it ranks #6 on Yahoo for its brand name, it comes in at a sad #128 on Google and doesn&#8217;t rank at all on Bing. Searchers who aren&#8217;t as hip to refinements and instead rely on paging through results will be looking for a long time.</p>
<p>I didn&#8217;t go so far as to investigate what the problem might be  (it could be a content problem, a linking problem, a technical problem, a penalty issue&#8230;), but perhaps BMI should.</p>
<p><strong>Update! Commenters have figured it out!</strong></p>
<p>Commenters (below) and <a href="http://twitter.com/willcritchlow/status/3313669101" onclick="pageTracker._trackPageview('/outgoing/twitter.com/willcritchlow/status/3313669101?referer=');">Twitterers</a> have <a href="http://twitter.com/rishil/status/3313747575" onclick="pageTracker._trackPageview('/outgoing/twitter.com/rishil/status/3313747575?referer=');">pointed out</a> that BMI does in fact rank for its name in the UK and Ireland. And I conveniently happen to be in London today and can see for myself that it&#8217;s true.</p>
<p><a href="http://www.ninebyblue.com/wp-content/uploads/2009/08/picture-1.png"><img class="alignnone size-full wp-image-891" title="BMI Results - UK" src="http://www.ninebyblue.com/wp-content/uploads/2009/08/picture-1.png" alt="" width="500" height="388" /></a></p>
<p>As with the US, a bunch of body mass index pages rank next, along with the Brain Mind Institute.</p>
<p>I concede that BMI&#8217;s primary customers are in the UK and Ireland, but they also likely have lots of customers like me who are booking from other countries. BMI surely doesn&#8217;t want to rank #128 for their own brand name on Google. So what are they to do?</p>
<p>I wrote up some thoughts on <a href="http://www.ninebyblue.com/blog/making-geotargeted-content-findable-for-the-right-searchers/">geolocation</a> a while back, but this query points out how tricky things can be. BMI could operate separate sites for each country, each with a specific TLD and have each TLD be an entry to a central reservation system. That might be cumbersome though, so they could at least specify the country location for each folder in Google Webmaster Tools to improve things on Google.</p>
<p>It looks like simply improving their site architecture could also make a difference. An easy clue that this might be an issue is that in the UK, www.flybmi.com/bmi/splash.aspx ranks #1 on Yahoo, www.flybmi.com ranks #1 with www.flybmi.com/bmi/en-gb/index.aspx (basically the same page) indented below it on Google, and www.flybmi.com/bmi ranks #1 on Bing. Without doing any digging into the site at all, we can see that at least four pages are competing with each other.</p>
<p>I would look into this further, but being in London, I have beer to drink at the pub.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/branded-search/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>This Week: Events in Seattle and San Jose!</title>
		<link>http://www.ninebyblue.com/blog/this-week-events-in-seattle-and-san-jose/</link>
		<comments>http://www.ninebyblue.com/blog/this-week-events-in-seattle-and-san-jose/#comments</comments>
		<pubDate>Mon, 03 Aug 2009 20:20:22 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=881</guid>
		<description><![CDATA[Come on by the Kit Cat Theatre in Seattle tonight for Ignite 7! In five minutes, I&#8217;ll tell you everything you need to know so that you never eat boring and non-delicious food again. Because seriously, life&#8217;s too short to eat bad food.
Then Wednesday evening, I&#8217;ll be hosting a Jane and Robot meetup at Yahoo&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>Come on by the Kit Cat Theatre in Seattle tonight for <a href="http://www.igniteseattle.com/2009/08/tonights-ignite-seattle-line-up/" onclick="pageTracker._trackPageview('/outgoing/www.igniteseattle.com/2009/08/tonights-ignite-seattle-line-up/?referer=');">Ignite 7</a>! In five minutes, I&#8217;ll tell you everything you need to know so that you never eat boring and non-delicious food again. Because seriously, life&#8217;s too short to eat bad food.</p>
<p>Then Wednesday evening, I&#8217;ll be hosting a Jane and Robot meetup at Yahoo&#8217;s headquarters in Sunnyvale. We&#8217;ll have the usual snacks and drinks and I&#8217;ll be answering whatever you throw at me. The in-house Yahoo SEOs will also be on hand to talk about what it&#8217;s like to do SEO at a large corporation that also happens to have a search engine (for now anyway).  <a href="http://upcoming.yahoo.com/event/3089227/" onclick="pageTracker._trackPageview('/outgoing/upcoming.yahoo.com/event/3089227/?referer=');">RSVP on upcoming</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/this-week-events-in-seattle-and-san-jose/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>You&#8217;re Invited To Yahoo! To Chat About Search!</title>
		<link>http://www.ninebyblue.com/blog/seo-blog/youre-invited-to-yahoo-to-chat-about-search/</link>
		<comments>http://www.ninebyblue.com/blog/seo-blog/youre-invited-to-yahoo-to-chat-about-search/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 16:32:50 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=870</guid>
		<description><![CDATA[I&#8217;m not positive, but it seems like there may be some news about Yahoo! floating around today. I only scanned the headlines, but I think it has something to do with the Jane and Robot meetups at the Yahoo! Sunnyvale and Santa Monica offices in August.
Next Wednesday, August 5th, come on over to the Yahoo! [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m not positive, but it seems like there may be <a href="http://searchengineland.com/its-finally-official-microsoft-yahoo-make-a-deal-yahoo-gives-up-on-search-23197" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/its-finally-official-microsoft-yahoo-make-a-deal-yahoo-gives-up-on-search-23197?referer=');">some news</a> about <a href="http://www.techmeme.com/090729/p25#a090729p25" onclick="pageTracker._trackPageview('/outgoing/www.techmeme.com/090729/p25_a090729p25?referer=');">Yahoo!</a> floating around today. I only scanned the headlines, but I think it has something to do with the Jane and Robot meetups at the Yahoo! Sunnyvale and Santa Monica offices in August.</p>
<p>Next Wednesday, August 5th, come on over to the Yahoo! Sunnyvale campus from 6pm to 8pm and then Wednesday, August 19th, stop by the Santa Monica campus from 5:30pm to 8pm. As with all of our <a href="http://janeandrobot.com/meetups" onclick="pageTracker._trackPageview('/outgoing/janeandrobot.com/meetups?referer=');">Jane and Robot meetups</a>, we&#8217;ll have lots of drinks, snacks, and knowledge and all for free. Can&#8217;t beat free! </p>
<p>If you&#8217;re a web developer, search marketer, entrepreneur, work at a startup, work at a large company, have a web site, are thinking about having a web site, blog, or otherwise do stuff with the internet, come by and get answers to your hardest questions about search.</p>
<ul>
<li>Sign up for the <a href="http://upcoming.yahoo.com/event/3089227/" onclick="pageTracker._trackPageview('/outgoing/upcoming.yahoo.com/event/3089227/?referer=');">Sunnyvale event</a></li>
<li>Sign up for the <a href="http://upcoming.yahoo.com/event/3089265/" onclick="pageTracker._trackPageview('/outgoing/upcoming.yahoo.com/event/3089265/?referer=');">Santa Monica event</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/seo-blog/youre-invited-to-yahoo-to-chat-about-search/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
