I come across bad SEO advice all the time. Much of it may seem obvious to those of us who have been involved in search for any length of time, but for people who haven’t, it can be difficult to know what’s concrete advice, what’s speculation, and what’s just plain terrible. For that matter, it can be difficult for those outside of SEO to know what’s smart and what’s considered search engine manipulation.
I was in a meeting a few days ago and someone asked if it was true that for SEO purposes, a page should have as few outbound links as possible. I said outbound links were fine, great even! And then talked a bit about how it’s a bad idea to build pages for nuances in the search engine algorithms anyway, as hundreds of signals exist and they’re changing all the time. Oh, he said. We’ve been talking about implementing the canonical tag. We probably shouldn’t do that then. And I realized, how would a developer know that the canonical tag is awesome and the meta keywords tag isn’t? That you shouldn’t worry about keyword density but you should put important keywords in your title tag?
Recently, someone sent me an “SEO optimization report” for their site that came from automated software that guaranteed top ten rankings in 90 days. Some of the advice was good (use unique title tags), some was harmless (improve your Flesch readability ease score), and some was just crazy talk. Below is a bit of the crazy.
“You should increase your keyword density. You can do this by removing some text.”
This whole notion of keyword density has been around forever, but here’s what it really boils down to. How is your potential audience looking for this content? Put those words in your title tag, H1, and somewhere on the page. And use those words as anchor text in internal links to that page. If other sites link to the page using that anchor text, even better! It’s bad enough when people try to get the “right” keyword density by nonsensically repeating the same words over and over on a page, but removing other text? That’s just sad.
“Keywords in the HTML comment tags help a good ranking in Google.”
Um. Not really.
“Some search engines penalize sites if the terms from the meta keywords tag don’t appear in the body of the page.”
Well, first, search engines (in particular, Google) ignore the meta keywords tag. And also, this statement isn’t true.
“Your page includes the meta Google-Site-Verification tag twice. Search engines could regard it as a spamming attempt and might decide not to index your web site.”
Wow. I assume this is simply a case of automation going awry and whoever wrote this software doesn’t actually think that having two verified Google Webmaster Tools accounts will cause Google to remove the site from the index. But even so, having duplicate meta tags of any kind doesn’t cause Google or Bing to flag the site for spam. I mentioned this was all about the crazy, right?
“Some search engines don’t accept submissions with capitalized letters in titles or meta tags.”
Maybe someone more familiar with old school directories can weigh in on where this comes from. But recommending that your title tags not contain capital letters? This may be automated software, but someone manually wrote that message.
“Some search engines rank sites lower that are hosted at free hosting providers.”
No.
PS – Creative use of bold won’t actually help. And question marks in URLs are just fine.
Note: This post was originally posted on Jane and Robot in May 2008 and is being temporarily stored here.
A picture is worth a thousand words. Unfortunately, when it comes to major search engines (which are still primarily text-based), a picture is worth a lot of blank space. Does this mean you shouldn’t use images on your site if you want to rank in search? Not at all. Just keep some simple things in mind when adding those images to your pages. As a bonus, these tips help not only with search engine robots, but with Jane as well! You want your site to be accessible in screen readers, to those who have images turned off in their browsers, and to those who have slow connections or are on mobile browsers and may have trouble loading images.
By providing search engine robots with textual information about the images on your site, your site can benefit not only from better placement in web search results, but in image search results also. Image search can provide substantial search traffic, so don’t overlook this as an acquisition channel.
Below are recommendations for using images effectively for both Jane and search engine robots.
Put text in straight HTML whenever possible. Sometimes web designers like to put text in images because they can use a wider variety of fonts and can manipulate the design more freely. Much of this styling can be done with CSS and in cases where it can’t, the extra design a graphical version of the text provides may not really add visitor value. In fact, it may detract from usability because it may be difficult to read. It also may hurt viral efforts since it can’t be copied and pasted. If I want to send an email to all of my friends suggesting we all go to a hot new restaurant, I may want to copy and paste a few menu items from the restaurant’s web site to send to them. If the menu is in an image, I can’t do that.
The most well-known method for making images accessible is effective use of the ALT attribute in the IMG element. And yet it’s very common to find empty ALT attributes all over the web.
<img src="/images/lavender-plant.jpg" alt="Picture of a lavender plant">
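One way to find empty or missing ALT attributes is to scan your pages for them. The sketch below uses only Python’s standard library. The class and function names are made up for illustration, and it treats every empty ALT as a problem, which may be too strict for purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> whose alt attribute is absent or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", ""))


def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


# Hypothetical page fragment: one good image, one empty alt, one missing alt.
page = (
    '<img src="/images/lavender-plant.jpg" alt="Picture of a lavender plant">'
    '<img src="/images/image123.jpg" alt="">'
    '<img src="/images/arrow.gif">'
)
print(images_missing_alt(page))  # ['/images/image123.jpg', '/images/arrow.gif']
```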
What about the TITLE attribute? It likely doesn’t provide direct search engine value, although it may be useful for your visitors.
If possible, describe the image in the name of the image file. For instance, lavender-plant.jpg is better than image123.jpg. If you are importing a lot of images, for instance, for a product database, it may be problematic to manually name each file. In this case, find programmatic ways to rename the images using text from how the images are tagged or categorized. If your filename includes multiple words, use hyphens to separate them. Search engines tend to see a hyphen as a separator and an underscore as a joiner, so lavender_plant would be seen as one word and lavender-plant would be seen as two.
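Programmatic renaming can be as simple as turning a tag or category label into a hyphenated file name. This is a minimal sketch, assuming the labels are plain ASCII text. The function name and the sample labels are hypothetical, and characters outside a-z and 0-9 are simply dropped.

```python
import re


def image_filename(label, extension="jpg"):
    """Turn a tag or category label into a hyphen-separated file name."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return "-".join(words) + "." + extension


print(image_filename("Lavender Plant"))                # lavender-plant.jpg
print(image_filename("lavender_plant (blue)", "png"))  # lavender-plant-blue.png
```

Underscores are treated as word breaks here, so an existing name like lavender_plant comes out with hyphens.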
Provide a caption below or above the image that describes what it’s about and gives context for how it relates to the rest of the page.
Try to include text around the image that relates to what the image is about. Text on the page helps search engines know what the page itself is about, which helps the page rank for relevant queries, but text near images can help those images rank in image search results as well.
If you use images in menus and other navigation, make sure that you use ALT text that replicates how the image represents that menu option. But also test the implementation by turning off images in your browser and making sure the links still work. Some implementations incorrectly require images to be enabled, causing search engine robots to be unable to follow those links.
Another potential usability issue with images and navigation is that if you use a textual link combined with a background image, the text may disappear if the image doesn’t load. (This issue can happen with this type of design in places other than menus, but that scenario is where it can be commonly seen.)
[Figure: the same navigational link shown with images enabled and with images disabled.]
Many web sites use an image for the header of the page or for the company logo. This implementation works well, but be sure that you replicate the company name, heading text, or other words from that image in the ALT text.
<h1>Company Name</h1>
The CSS for this implementation positions the text far off-screen with a large negative text-indent (-99999em). This is not recommended, both because it means that when a visitor loads the page with images turned off, the text can’t be seen (and so the heading space is simply blank) and because search engines may find the practice deceptive (the text is hidden).
.home-logo {
background:transparent url(/images/logo1.gif)
no-repeat scroll center top;
height:63px;
margin-top:35px;
text-indent:-99999em;
}

If you use a lot of non-content images (for instance, arrows, bullets, and boxes), you likely don’t want those indexed. Since search engine robots spend limited time crawling each site, it may make sense to block them from crawling these types of images so they can spend all the available resources on the pages and images you do want indexed. As a bonus, if you want to provide an image search on your site (for instance, using the Live Search API), if only content images are indexed, then the image results will be more useful for your visitors.
A good way to block non-content images is to place them in a separate folder from your content images and then block that folder using robots.txt. For instance, if you place these images in a folder called no_index_images, your robots.txt file would contain:
User-agent: *
Disallow: /no_index_images/
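If you want to confirm that a rule like this blocks what you expect, Python’s standard urllib.robotparser module can evaluate simple prefix rules. This is only a sketch: the host name and image paths are placeholders, and a real crawler may interpret robots.txt differently in edge cases.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /no_index_images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path):
    # www.example.com is a placeholder host for this check.
    return parser.can_fetch("*", "http://www.example.com" + path)


print(crawlable("/no_index_images/arrow.gif"))  # False
print(crawlable("/images/lavender-plant.jpg"))  # True
```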
With a little planning and good structure, you can effectively use images on your site in ways that benefit both Jane and robots. And by optimizing images in the ways described in this article, you may also be able to tap into an additional acquisition channel – image search.
Note: This post was originally posted on Jane and Robot in November 2008 (by Nathan Buggia) and is being temporarily stored here.
There may be instances when you want to track the source of a request, and a common way of doing so is by using tracking parameters in URLs. Unfortunately, implementing referrer tracking in this way can result in significant issues with search engines. In particular, it can cause duplicate content issues (since the search engine bot finds multiple valid URLs that point to the same page) and ranking issues (since all the links to the page aren’t to the same URL).
Let’s say that Jane and Robot uploaded two different online training seminars to YouTube as part of a viral marketing effort to drive more traffic to our site. To gauge our return on investment from each of these seminars, we’ve added a tracking parameter to the link within each YouTube description that a customer can click on to learn more. Here are the two URLs: http://janeandrobot.com/?from=promo-seminar-1 and http://janeandrobot.com/?from=promo-seminar-2. Each would bring the customer to our home page (the same page served by http://janeandrobot.com) and we would track the conversions based on the from parameter in the URL.
While this solution may seem to work well initially, it can result in low quality tracking data and impact our search acquisition. Here’s a summary of the major problems:
Unfortunately there is no perfect solution for this scenario, and what works best for you depends on your infrastructure and situation. Here we’ve listed several common solutions that you can choose from to improve your own implementation. We generally recommend the first solution (Redirects), but there are pros and cons to each option that you should review carefully before making your decision.
The first option strives to solve the problem by trapping all of the promotional requests, recording the tracking information, then removing the tracking parameter from the URL. This can be time consuming to implement, but it is the best all-round scenario to address the three major issues listed above.
If you wanted to get fancy, and track a user’s entire session based on your referral parameter, then you can use this method as well and simply set a cookie on the client machine at the same time you trap the request. This is recommended to understand the value of traffic from different sources. In either case, here are the steps you’ll need to undertake:
1. Trap the incoming request - find where your web site application’s logic processes the HTTP request for your page. Trap each request at that point and check if it has a tracking parameter. If it does, record this in your internal referral tracking system. You can record this either in your server logs, or in a custom referral tracking database you maintain on your own.
2. Implement the redirect - the next step is to implement a 301 redirect from the current URL to the same page without the tracking parameter (the canonical URL). Don’t forget to use the Cache-Control header in the HTTP response to ensure that all the requests come to your server and don’t get handled automatically by some network-based cache. Here’s what a sample redirect header might look like:
HTTP/1.1 301 Moved Permanently
Location: http://janeandrobot.com/
Cache-Control: max-age=0
Note that ASP.Net and IIS both use 302 redirects by default, so you may need to manually create the 301 response code.
The way this works is that when a search engine encounters a promotional URL (http://janeandrobot.com/?from=promo-seminar-1) it issues an HTTP GET request to the URL. The HTTP response tells the search engine that this page has been permanently moved (301 Redirect) and provides the new address (the same as the old address but without the tracking parameter). The search engine then discards the first URL (with the tracking code) and only stores the second URL (without the tracking code). And everything is right in the world.
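The two steps can be sketched in a few lines of Python. This is an illustration under assumptions, not a drop-in handler: the function names are invented, the tracking parameter is assumed to be called from as in our example, and recording the value is left as a comment because that depends on your own tracking system.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url, param="from"):
    """Return (canonical_url, tracking_value); tracking_value is None if absent."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    tracking = [value for key, value in pairs if key == param]
    remaining = [(key, value) for key, value in pairs if key != param]
    canonical = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(remaining), parts.fragment)
    )
    return canonical, (tracking[0] if tracking else None)


def redirect_response(url, param="from"):
    """Build (status, headers) for a 301, or None when no redirect is needed."""
    canonical, value = strip_tracking(url, param)
    if value is None:
        return None
    # A real handler would record `value` in its referral tracking system here.
    return "301 Moved Permanently", [
        ("Location", canonical),
        ("Cache-Control", "max-age=0"),
    ]


print(redirect_response("http://janeandrobot.com/?from=promo-seminar-1"))
```

Requests without the tracking parameter return None, so they fall through to normal page handling and never redirect to themselves.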
This implementation is one of the best options, but it does have some limitations:
Possibly the simplest option to solve this issue is to take advantage of a new standard recently adopted by Google, Yahoo and Microsoft Live Search. Their solution to this problem is to use a new attribute of the <link /> tag to explicitly tell them what the canonical URL for the page is. Assuming the <link /> tag has been created correctly, the search engines will treat it like a 301 redirect to the canonical URL.
Here’s an example of using this tag:
<html>
<head>
<link rel="canonical" href="http://janeandrobot.com" />
</head>
</html>
Here are a few notes about implementing this tag:
While this implementation seems a little too good to be true, there are a few potential downsides. The first is that if you implement it incorrectly, the search engines will simply ignore it, and that could be complicated to debug. The other issue is that it fixes issues #1 (duplicate content) and #2 (ranking) but does nothing to fix the 3rd issue of reporting. Still, given all of that I would likely implement this option first and do the others when I had some spare dev cycles.
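Since a broken tag is silently ignored, it can help to check what a page actually declares. The sketch below pulls every rel="canonical" href out of a page with Python’s standard HTML parser. The names are hypothetical, and it only checks that the tag is present and readable, not that the search engines will accept it.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_urls(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


page = '<html><head><link rel="canonical" href="http://janeandrobot.com" /></head></html>'
print(canonical_urls(page))  # ['http://janeandrobot.com']
```

An empty list, or a list with more than one entry, is a sign that the page deserves a closer look.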
A simple and elegant option is to simply place the tracking parameter behind a hash mark in the URL, creating a URL fragment. Traditionally, these are used to denote links within a page, and are ignored completely by search engines. In fact, they simply truncate the URL fragment from the URL.
Old URL
New URL with URL Fragment
By default Google Analytics will ignore the fragment as well. However, there is a simple workaround that was provided to us by Avinash Kaushik, Google’s web metrics evangelist, using the following JavaScript:
var pageTracker = _gat._getTracker("UA-12345-1");
// Solution for domain level only
pageTracker._trackPageview(document.location.pathname + "/" + document.location.hash);
// If you have a path included in the URL as well
pageTracker._trackPageview(document.location.pathname + document.location.search +
"/" + document.location.hash);
You can create a few additional variations of this if you also have additional queries in the URL you would like to track. Check with your web analytics provider to find out if you need to customize your implementation to account for using URL fragments for tracking.
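To make the idea concrete, here is a small sketch that moves a tracking parameter from the query string into the fragment, and then shows the URL that is left once the fragment is dropped. The function name is made up, and the parameter name from is taken from our earlier example.

```python
from urllib.parse import parse_qsl, urldefrag, urlencode, urlsplit, urlunsplit


def move_tracking_to_fragment(url, param="from"):
    """Rewrite ?from=value as #from=value, leaving other parameters in place."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    tracking = [(key, value) for key, value in pairs if key == param]
    remaining = [(key, value) for key, value in pairs if key != param]
    # Keep any existing fragment when there is no tracking parameter to move.
    fragment = urlencode(tracking) or parts.fragment
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(remaining), fragment)
    )


fragment_url = move_tracking_to_fragment("http://janeandrobot.com/?from=promo-seminar-1")
print(fragment_url)                 # http://janeandrobot.com/#from=promo-seminar-1
print(urldefrag(fragment_url).url)  # http://janeandrobot.com/
```

The second line of output is the URL without its fragment, which is the form a search engine would keep after truncating it.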
Does this sound too simple and easy to be true? There are a couple downsides to this approach:
Another relatively simple solution is to use robots.txt to ensure that search engines are not indexing URLs that contain tracking parameters. This method enables you to ensure that the original (canonical) version of the URL is always the one indexed and avoids the duplicate content issues involving indexing and bandwidth.
Assuming that all of our tracking parameters will follow a similar pattern to this:
http://janeandrobot.com/?from=<PromoID>
we can easily create a pattern that will match for this. Below is a robots.txt file that implements the pattern:
# Sample Robots.txt file, single query parameter
User-agent: *
Disallow: /?from=
The User-agent line means that this rule should apply to all search engines (or robots crawling your site), and the Disallow line tells them that they can’t crawl any URLs that start with ‘janeandrobot.com/?from=’ followed by a promotional code of any length. See complete information on using the Robots Exclusion Protocol. Use this pattern if you will have multiple query parameters:
# Sample Robots.txt file, multiple query parameters
User-agent: *
Disallow: /*from=
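The difference between the two patterns is easier to see with a small matcher. Python’s built-in robots.txt parser only does plain prefix matching, so this sketch translates a pattern into a regular expression instead. The function names are invented, and it assumes * matches any run of characters and a trailing $ anchors the end of the URL.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern (with * and $) into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern, path_and_query):
    # Patterns match from the start of the path, so re.match is used.
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None


print(is_blocked("/?from=", "/?from=promo-seminar-1"))           # True
print(is_blocked("/?from=", "/page?id=5&from=promo-seminar-1"))  # False
print(is_blocked("/*from=", "/page?id=5&from=promo-seminar-1"))  # True
print(is_blocked("/*from=", "/"))                                # False
```

Note that /*from= also matches any other parameter whose name ends in from, so check your own URLs for collisions before relying on it.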
Once you’ve implemented the pattern appropriate for your site, you can easily check to see if it is working correctly by using the Google Webmaster Tools robots.txt analysis tool. It enables you to test specific URLs against a test robots.txt file. Note that although this tool tests GoogleBot specifically, all the major search engines support the same pattern matching rules. In Google Webmaster Tools:
At this point you may be thinking, wow, I can do all this and not have to write any new code? Unfortunately, there are even more downsides to this approach than the others:
Yahoo provides an online tool designed to solve this scenario. However, the solution only helps with Yahoo search traffic. To use the Yahoo fix, simply go to http://siteexplorer.search.yahoo.com and create an account for your web site in the Yahoo Site Explorer tool. Once you’ve verified ownership of your web site, you can use their Dynamic URL Rewriting tool to indicate which parameters in your URLs Yahoo should ignore.
Simply specify the name of the parameter you use for referral tracking (in our example it is ‘from’), and set the action ‘Remove from URLs’. Yahoo will then remove that parameter from all of your URLs while processing them and give you a handy little report about how many URLs were impacted.
Again, this is another solution that seems too easy to be true, but again, there are some significant limitations with this approach:
Some web sites and SEO consultants attempt to solve this by a technique called cloaking or conditional redirects. Essentially what these methods do is check if the HTTP GET request is coming from a search engine and then show them something different than normal users see. This something different could be a simple 301 redirect back to the page without the tracking parameter, similar to our first solution above. The difference is that our solution implemented this redirect for all requesters, and cloaking and conditional redirects implement it only for search engines.
The big problem with this implementation method is that cloaking and conditional redirects are explicitly prohibited in the webmaster guidelines for Google, Yahoo and Live Search. If you use this method, you risk your pages being penalized or banned by the search engines. The primary reason they prohibit these behaviors is because they want to know exactly what content they are presenting to searchers using their service. When a web site shows something different to a search engine robot than to a general user, a search engine can never be sure what the user will see when they go to the web site. So, even if you’re thinking of implementing cloaking for what seems to be a valid, and not deceptive, reason, it’s still a technique search engines strongly discourage.
This leads to the second major problem with this implementation method: it adds significant complexity, and it can be difficult to monitor whether it’s working (for example, you have to test it while pretending to be each of the three search engines’ robots). When things go wrong, it is likely that you’re not going to see it right away, and by the time you do, your search engine traffic may already be impacted. Check out this example when Nike ran into an issue with cloaking.
Many studies on the web show that customers prefer short, understandable URLs over long complicated ones, and are more likely to click on them in the search results. In addition, users prefer descriptive keywords in URLs. Therefore, it might be worth your time to spend a few extra minutes thinking about the tracking codes you use to see if you can make them friendlier.
Good examples
Bad examples
So you’ve implemented your new favorite method, it compiles on your dev box, and now it’s time to roll it into production, right? Maybe not! The initial goal of referrer URL-based tracking was to understand where your traffic was coming from so you can use that information to optimize your business. To ensure the data you’re collecting is actually useful, we highly recommend that you do some testing to ensure that all the common scenarios are working the way you expect, and you know where the holes are in your measurement capabilities. As with all metrics on the web, there will be holes in your data so you need to know what they are and account for them.
The first step in testing the implementation is to try it with a test parameter, walking the full scenario through start to finish.
Note: This post was originally posted on Jane and Robot in June 2008 and is being temporarily stored here.
Controlling what content is blocked from being found in search engines is crucial for many websites. Fortunately, the major search engines and other well-behaved robots observe the Robots Exclusion Protocol (REP), which has evolved organically since the early 1990s to provide a set of controls over what parts of a web site search engine robots can crawl and index.
The Robots Exclusion Protocol provides controls that can be applied at the site level (robots.txt), at the page level (META tag, or X-Robots-Tag), or at the HTML element level to control both the crawl of your site and the way it’s listed in the search engine results pages (SERPs). Below is a table listing the common scenarios, directives, and which search engines support them.
| Use Case | Robots.txt | META/X-Robots-Tag | Other | Supported By |
|---|---|---|---|---|
| Allow access to your content | Allow | FOLLOW, INDEX | | Google, Yahoo, Microsoft |
| Disallow access to your content | Disallow | NOINDEX, NOFOLLOW | | Google, Yahoo, Microsoft |
| Disallow access to index images on the page | | NOIMAGEINDEX | | Google |
| Disallow the display of a cached version of your content in the SERP | | NOARCHIVE | | Google, Yahoo, Microsoft |
| Disallow the creation of a description for this content in the SERP | | NOSNIPPET | | Google, Yahoo, Microsoft |
| Disallow the translation of your content into other languages | | NOTRANSLATE | | |
| Do not follow or give weight to links within this content | | NOFOLLOW | a href attribute: rel=NOFOLLOW | Google, Yahoo, Microsoft |
| Do not use the Open Directory Project (ODP) to create descriptions for your content in the SERP | | NOODP | | Google, Yahoo, Microsoft |
| Do not use the Yahoo Directory to create descriptions for your content in the SERP | | NOYDIR | | Yahoo |
| Do not index this specific element within an HTML page | | | class=robots-nocontent | Yahoo |
| Stop indexing this content after a specific date | | UNAVAILABLE_AFTER | | |
| Disallow the creation of enhanced captions | | NOPREVIEW | | Microsoft |
| Specify a sitemap file or a sitemap index file | Sitemap | | | Google, Yahoo, Microsoft |
| Specify how frequently a crawler may access your website | Crawl-Delay | | Google WMT | Yahoo, Microsoft |
| Authenticate the identity of the crawler | | | Reverse DNS Lookup | Google, Yahoo, Microsoft |
| Request removal of your content from the engine’s index | | | Google WMT, Yahoo SE, Microsoft WMT | Google, Yahoo, Microsoft |
One of the first steps in managing the robots is knowing what type of content should be public vs. private. Start with the assumption that by default, everything is public, then explicitly identify the items that are private.
If you want search engines to access all the content on your site, you don’t need a robots.txt file at all. When a search engine tries to access the robots.txt file on your site and the server can’t return one (ideally by returning a 404 HTTP status code), the search engine treats this the same as a robots.txt file that allows access to everything.
Every website and every business has a different set of needs, so there’s no blanket rule for what to make private, but some common elements may apply.
REP is flexible and can be implemented a number of ways. This flexibility lets you easily specify some policies for your entire site (or subdomain) and then enhance them more granularly at the page or link level as needed.
Site wide directives are stored in a robots.txt file, which must be located in the root directory of each domain or sub-domain (e.g. http://janeandrobot.com/robots.txt.) Note that robots.txt files only apply to the hostname where they are placed, and do not apply to subdomains. So a robots.txt file located on http://microsoft.com/robots.txt will not apply to the MSDN subdomain http://msdn.microsoft.com. However, the robots.txt file does apply to all subfolders and pages within the specified hostname.
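Because each hostname has its own robots.txt file, the file that governs a page can be worked out from the page’s URL alone. The sketch below shows this. The function name and the page paths are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that applies to the given page."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://msdn.microsoft.com/en-us/library/"))
# http://msdn.microsoft.com/robots.txt
print(robots_txt_url("http://microsoft.com/windows/"))
# http://microsoft.com/robots.txt
```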
A robots.txt file is a UTF-8 encoded file that contains entries that consist of a user-agent line (that tells the search engine robot if the entry is directed at it) and one or more directives that specify content that the search engine robot is blocked from crawling or indexing. A simple robots.txt file is shown below.
User-agent: *
Disallow: /private
user-agent: - Specifies which robots the entry applies to.
Use * to specify that this entry applies to all search engine robots.
If you also include an entry for a specific robot, that robot follows its own entry in place of the entries for user-agent: * (rather than in addition to those entries).
The major search engines have multiple robots that crawl the web for different types of content (such as images or mobile). They generally begin all robots with the same name so that if you block the major robot, all robots for that search engine are blocked as well. However, if you want to block only the more specific robot, you can block it directly and still allow web crawl access.
Disallow: - Specifies what content is blocked.
Each path must begin with a slash (/).
The directive blocks any URL whose path begins with the specified value. For instance, Disallow: /images blocks access to /images/, /images/image1.jpg, and /images10.
You can specify other rules for search engine robots in addition to the standard instructions that block access to content, as noted in other robot instructions.
Some things to note about robots.txt implementation:
URL paths are case-sensitive. Disallow: /images would block http://www.example.com/images but not http://www.example.com/Images.
Block all robots - Useful when your site is in pre-launch development and isn’t ready for search traffic.
# This keeps out all well-behaved robots.
# Disallow: * is not valid.
User-agent: *
Disallow: /
Keep out all bots by default - Blocks all pages except those specified. Not recommended as it is difficult to maintain and diagnose.
# Stay out unless otherwise stated
User-agent: *
Disallow: /
Allow: /Public/
Allow: /articles/
Allow: /images/
Block specific content - The most common usage of robots.txt.
# Block access to the images folder
User-agent: *
Disallow: /images/
Allow specific content - Block a folder, but allow access to selected pages in that folder.
# Block everything in the images folder
# Except allow images/image1.jpg
User-agent: *
Disallow: /images/
Allow: /images/image1.jpg
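How a robot decides between an Allow and a Disallow that both match is worth understanding. The sketch below assumes the longest matching path wins, with Allow winning a tie, which is how Google documents its own behavior. Other robots may resolve conflicts differently, and the function name is hypothetical.

```python
def is_allowed(rules, path):
    """rules is a list of (directive, path_prefix) pairs from one robots.txt entry."""
    best_length, allowed = -1, True  # no matching rule means the URL is allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_length
        tie_for_allow = len(prefix) == best_length and is_allow
        if longer or tie_for_allow:
            best_length, allowed = len(prefix), is_allow
    return allowed


RULES = [("Disallow", "/images/"), ("Allow", "/images/image1.jpg")]
print(is_allowed(RULES, "/images/image1.jpg"))  # True
print(is_allowed(RULES, "/images/image2.jpg"))  # False
print(is_allowed(RULES, "/articles/"))          # True
```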
Allow specific robot - Block a class of robots (for instance, Googlebot), but allow a specific bot in that class (for instance, Googlebot-Mobile).
# Block Googlebot access
# Allow Googlebot-Mobile access
User-agent: Googlebot
Disallow: /
User-agent: Googlebot-Mobile
Allow: /
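The way a robot picks which entry to obey can be sketched as follows. This assumes a robot matches on the start of its name and follows only the most specific entry that names it, falling back to * when nothing else matches. The function name is invented, and the * entry in the sample data is added purely for illustration.

```python
def select_group(groups, robot_name):
    """groups maps a user-agent token to its rules; pick the one a robot obeys."""
    name = robot_name.lower()
    matches = [
        token for token in groups
        if token != "*" and name.startswith(token.lower())
    ]
    if matches:
        return max(matches, key=len)  # most specific token wins
    return "*" if "*" in groups else None


GROUPS = {
    "Googlebot": [("Disallow", "/")],
    "Googlebot-Mobile": [("Allow", "/")],
    "*": [("Disallow", "/private")],  # hypothetical catch-all entry
}
print(select_group(GROUPS, "Googlebot-Mobile"))  # Googlebot-Mobile
print(select_group(GROUPS, "Googlebot-Image"))   # Googlebot
print(select_group(GROUPS, "bingbot"))           # *
```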
The major engines support two types of pattern matching.
Block access to URLs that contain a set of characters - Use the asterisk (*) to specify a wildcard.
# Block access to all URLs that include an ampersand
User-agent: *
Disallow: /*&
This directive would block search engines from crawling http://www.example.com/page1.asp?id=5&sessionid=xyz.
Block access to URLs that end with a set of characters - Use the dollar sign ($) to specify end of line.
# Block access to all URLs that end in .cgi
User-agent: *
Disallow: /*.cgi$
This directive would block search engines from crawling http://www.example.com/script1.cgi but not from crawling http://www.example.com/script1.cgi?value=1.
Selectively allow access to a URL that matches a blocked pattern - Use the Allow directive in conjunction with pattern matching for more complex implementations.
# Block access to URLs that contain ?
# Allow access to URLs that end in ?
User-agent: *
Disallow: /*?
Allow: /*?$
That directive blocks all URLs that contain ? except those that end in ?. In this example, the default version of the page will be indexable:
http://www.example.com/productlisting.aspx?
Variations of the page will be blocked:
http://www.example.com/productlisting.aspx?nav=price
http://www.example.com/productlisting.aspx?sort=alpha
Specify a Sitemap or Sitemap index file - If you’d like to provide search engines with a comprehensive list of your best URLs, you can provide one or more Sitemap autodiscovery directives. Note, user-agent does not apply to this directive so you cannot use this to specify a Sitemap to some but not all search engines.
# Please take my sitemap and index everything!
Sitemap: http://janeandrobot.com/sitemap.axd
Reduce the crawling load - This only works with Microsoft and Yahoo. For Google you’ll need to specify a slower crawling speed through their Webmaster Tools. Be careful when implementing this because if you slow down the crawl too much, robots won’t be able to get to all of your site and you may lose pages from the index.
# Bingbot, please wait 5 seconds in between visits
User-agent: bingbot
Crawl-delay: 5
# Yahoo's Slurp, please wait 12 seconds in between visits
User-agent: slurp
Crawl-delay: 12
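To double-check what delay each robot would read from a file like this, Python’s standard urllib.robotparser exposes a crawl_delay method (available since Python 3.6). This is just a quick sanity check, and each search engine’s own parser remains the final authority.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Bingbot, please wait 5 seconds in between visits
User-agent: bingbot
Crawl-delay: 5
# Yahoo's Slurp, please wait 12 seconds in between visits
User-agent: slurp
Crawl-delay: 12
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("bingbot"))    # 5
print(parser.crawl_delay("slurp"))      # 12
print(parser.crawl_delay("googlebot"))  # None (no entry applies)
```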
The REP page-level directives allow you to refine the site wide policies on a page-by-page basis.
Placing a meta tag on the page - Place the meta tag in the head tag. Each directive should be comma delimited inside the tag. E.g. <meta name="ROBOTS" content="Directive1, Directive2">.
<html>
<head>
<title>Your title here</title>
<meta name="ROBOTS" content="NOINDEX">
</head>
<body>Your page here</body>
</html>
Targeting a specific search engine - Within the meta tag you can specify which search engine you would like to target, or you can target them all.
<!-- Applies to All Robots -->
<meta name="ROBOTS" content="NOINDEX">
<!-- ONLY GoogleBot -->
<meta name="Googlebot" content="NOINDEX">
<!-- ONLY Slurp (Yahoo) -->
<meta name="Slurp" content="NOINDEX">
<!-- ONLY BingBot (Microsoft) -->
<meta name="BingBot" content="NOINDEX">
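When a page mixes generic and robot-specific meta tags, it helps to see which directives a given robot ends up with. The sketch below assumes a robot combines the ROBOTS directives with the ones addressed to it by name. That is an assumption for illustration, so check each engine’s documentation for its actual rules. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta> tag, keyed by lower-cased name."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        values = {item.strip().upper() for item in content.split(",") if item.strip()}
        if name and values:
            self.directives.setdefault(name, set()).update(values)


def directives_for(html, robot):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    found = parser.directives
    # Assumption: generic and robot-specific directives are combined.
    return found.get("robots", set()) | found.get(robot.lower(), set())


page = (
    '<head><meta name="ROBOTS" content="NOARCHIVE">'
    '<meta name="Googlebot" content="NOINDEX, NOFOLLOW"></head>'
)
print(sorted(directives_for(page, "Googlebot")))  # ['NOARCHIVE', 'NOFOLLOW', 'NOINDEX']
print(sorted(directives_for(page, "Slurp")))      # ['NOARCHIVE']
```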
Control how your listings appear - there is a set of options you can use to determine how your site will show up on the SERP. You can exert some control over how the description is created, and remove the “Cached page” link.

<!-- Do not show a description for this page -->
<meta name="ROBOTS" content="NOSNIPPET">
<!-- Do not use http://dmoz.org to create a description -->
<meta name="ROBOTS" content="NOODP">
<!-- Do not present a cached version of the document in a search result -->
<meta name="ROBOTS" content="NOARCHIVE">
Using other directives - Other meta robots directives are shown below.
<!-- Do not trust links on this page, could be user generated content (UGC) -->
<meta name="ROBOTS" content="NOFOLLOW">
<!-- Do not index this page -->
<meta name="ROBOTS" content="NOINDEX">
<!-- Do not index any images on this page (will still index them if they are linked
elsewhere). Better to use robots.txt if you really want them safe.
This is a Google-only tag. -->
<meta name="GOOGLEBOT" content="NOIMAGEINDEX">
<!-- Do not translate this page into other languages-->
<meta name="ROBOTS" content="NOTRANSLATE">
<!-- NOT RECOMMENDED, there really isn't much point in using these -->
<meta name="ROBOTS" content="FOLLOW">
<meta name="ROBOTS" content="UNAVAILABLE_AFTER">
The X-Robots-Tag HTTP header allows developers to specify page-level REP directives for non text/html content types like PDF, DOC, PPT, or dynamically generated images.
Using the X-Robots-Tag - to use the X-Robots-Tag, simply add it to your header as shown below. To specify multiple directives you can either comma delimit them, or add them as separate header items.
HTTP/1.x 200 OK
Cache-Control: private
Content-Length: 2199552
Content-Type: application/octet-stream
Server: Microsoft-IIS/7.0
content-disposition: inline; filename=01 - The truth about SEO.ppt
X-Robots-Tag: noindex, nosnippet
X-Powered-By: ASP.NET
Date: Sun, 01 Jun 2008 19:25:47 GMT
The X-Robots-Tag directive supports most of the same directives as the meta tag. The only limitation with this method over the meta tag implementation is that there is no way to target a specific robot – though that probably isn’t a big deal for most use cases.
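How you add the header depends on your server. As one possible illustration, here is a rough sketch of a Python WSGI wrapper that adds X-Robots-Tag to every response that is not text/html. The function names and the default directive string are assumptions for the example; your own stack will likely have its own way of setting response headers.

```python
def add_x_robots_tag(app, directives="noindex, nosnippet"):
    """Wrap a WSGI app so that non-HTML responses carry an X-Robots-Tag header."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            content_type = ""
            for name, value in headers:
                if name.lower() == "content-type":
                    content_type = value.lower()
            # HTML pages can use the meta tag instead, so leave them alone.
            if not content_type.startswith("text/html"):
                headers = list(headers) + [("X-Robots-Tag", directives)]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def slides_app(environ, start_response):
    """Hypothetical app that serves a binary download."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"fake slide data"]


# The wrapped app can be handed to any WSGI server.
application = add_x_robots_tag(slides_app)
```

The wrapper only appends a header and never touches the response body, which keeps it safe to layer over an existing application.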
You can further refine your site level and page level directives within several content tags.
Each anchor tag (link) can be modified to tell search engines that you do not trust where this URL is pointing to. This is typically used for links within user generated content (UGC) like wikis, blog comments, reviews and other community sites.
<a href="#" rel="NOFOLLOW">My Hyperlink</a>
Also, in Yahoo Search you can specify which <div> elements on a page you would not like indexed using the class="robots-nocontent" attribute. However, we don’t recommend using this attribute because it is not supported in any other engine, making it not super-useful.
<div class="robots-nocontent">
No content for you! (or at least Yahoo!)
</div>
While implementing the REP is generally straight-forward, there are a few common mistakes.
If you declare a section for all robots (user-agent: *) and also declare a section for Googlebot (user-agent: Googlebot), Google will disregard all sections in the robots.txt file except the Googlebot section. This could potentially leave you exposing much more content to Google than you might have thought.
# This keeps out all well-behaved robots
User-agent: *
Disallow: /
# This looks like it is giving Google access to only this directory, but since it is a
# GoogleBot specific section, Google will disregard the previous section
# and access the whole site.
User-agent: Googlebot
Allow: /Content_For_Google/
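You can try this out for yourself with Python's standard-library urllib.robotparser, which picks the most specific matching section in a similar way. This is only a sketch of how one parser reads the file above, not a statement of how Googlebot itself is implemented. The bot name SomeOtherBot and the path /private/report.html are made up for the example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# This keeps out all well-behaved robots
User-agent: *
Disallow: /

# Googlebot-specific section
User-agent: Googlebot
Allow: /Content_For_Google/
"""


def can_crawl(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if this parser thinks user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("SomeOtherBot", "Googlebot"):
        print(agent, can_crawl(agent, "/private/report.html"))
```

The generic robot is blocked from the private path, while the Googlebot section contains no Disallow line at all, so nothing stops Googlebot from fetching it. Adding "Disallow: /" to the Googlebot section (alongside the Allow line) is the kind of fix to test with the same helper.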
Even if you use NOFOLLOW at either the page or the link level, it is still possible for the links from the page to be indexed because the search engine may have found a reference to them from another source. Another note: using rel="NOFOLLOW" within your anchor tag is still perceived as a recommendation by the search engines, not a command. To ensure that content is not indexed, either use the Disallow directive at the site level, or use NOINDEX at the page level.
Don’t bother with the FOLLOW and INDEX directives, as they will not be taken into account by the search engines. It sounds silly but I’ve seen a few sites that have implemented these on every page and every link. Another directive that is not recommended is the NOCACHE directive. This was created by Microsoft, and is synonymous with NOARCHIVE. While they will most likely always continue to support the directive, it is better to use NOARCHIVE so it will work on all the search engines.
As you’re implementing your REP design, you should test it both before you deploy it and after. The easiest way to test this is to use the robots validator in Google’s Webmaster Tools. This tool is a good sanity check to ensure you’re not blocking URLs you want indexed; however, advanced developers (or paranoid ones with critical business requirements) will want to know definitively what the robots are doing, not simply rely on what the robots say they are doing. These folks will want to look at the tools as well as their server logs to see exactly what’s being crawled.
In addition to using validation tools, reporting tools from the search engines on what they couldn’t access, and looking at log data to see what the search engine robots are crawling, you should check the search engine results to see if any pages you are intending to block are being indexed. If they are, use the methods described in this section to ensure you are blocking them correctly and use the search engine tools to request that the pages be removed.
When Blocked Content Appears to be Indexed - If search engines are blocked from crawling pages, they may still index the URL if the robot finds a link to that URL on a page that isn’t blocked. The listing may display the URL only, such as shown below.

Or, it may include a title and in some instances, a description. This makes it appear as though the search engine robot is disregarding the directive that blocks access to the page, but the search engine is in fact obeying the directive not to crawl the page and is using anchor text from the link to that page and descriptive details from either the page that contains the link or a source such as the Open Directory Project.
For more details, see:
Search Engine Tools For Validation - Both Google and Microsoft provide some tools as part of their Webmaster Centers to help you verify if you’ve configured your REP the way you expect. Let’s start with Google’s tools:
The first thing you should check is the list of URLs that Google has seen from your website and not indexed due to the REP. Note you can also download the list and filter, sort, and have-your-way-with-it in Excel.

The next step is to use their interactive robots.txt tool to analyze your rules and test specific URLs for blockage. When you pull up the tool they already should have it pre-populated with the robots.txt file they have on file from the last time they crawled. You can input a list of URLs you’d like to check below, select the user-agent you’d like to check against and the tool will tell you if they are blocked or not. You can also use the tool to test changes to your robots.txt file to see how Google would interpret things.

Microsoft also has a list of URLs blocked by robots.txt that Bingbot has tried to crawl.
More Accurate Views of Robot Access Through Your Logs - If you have a specific business need to ensure that the robots are following your rules (or you’re just paranoid), then you should not simply rely on the tools they provide to test compliance. You’re going to need to go straight to the horse’s mouth and analyze your web server logs to see exactly what they are doing. There is no one easy tool for doing this; you’ll likely have to use an existing tool (such as Microsoft HTTP Log Parser) or write your own. It isn’t difficult; it will simply take some time to implement. A useful reference for this is a list of all the robot user agents, and a more complete list of bots from Google and Microsoft.
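If you do write your own, the core of it can be quite small. Here is a rough sketch in Python that counts which paths each robot requested. It assumes your server writes the common "combined" log format and that matching on a substring of the user agent is good enough for a first pass. The marker list, the function name, and the sample log line are all assumptions for illustration.

```python
import re
from collections import Counter

# Assumed substrings to look for in the User-Agent field; extend as needed.
BOT_MARKERS = ("Googlebot", "Slurp", "bingbot")

# Pulls the request path and the user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawled_paths(lines):
    """Count (bot marker, path) pairs for requests made by known robots."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Not a line in the format we expect.
        agent = match.group("agent").lower()
        for marker in BOT_MARKERS:
            if marker.lower() in agent:
                hits[(marker, match.group("path"))] += 1
    return hits


if __name__ == "__main__":
    # Made-up sample line, for illustration only.
    sample = [
        '66.249.66.1 - - [01/Jun/2008:19:25:47 +0000] "GET /private/report.html HTTP/1.1" '
        '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
    ]
    for (bot, path), count in crawled_paths(sample).items():
        print(bot, path, count)
```

Comparing the paths in this output against your Disallow rules shows quickly whether a robot is requesting URLs you meant to block. Keep in mind that user agents can be faked, so treat the result as a starting point.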
Verifying Robot Identity - Another thing you’ll likely want to consider in this endeavor is to validate that the robots are who they actually say they are. Google, Yahoo and Microsoft all support reverse DNS authentication of their robots. The process is pretty simple and is described by Google, Yahoo and Microsoft. Essentially, you find out what range their robot’s DNS is hosted in, and use that in your tool. This way, if the address changes (which it will), you don’t need to update your code.
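The check itself is a reverse lookup followed by a forward lookup that has to point back at the same IP address. Here is a minimal sketch in Python. The list of trusted hostname suffixes is an assumption on my part, so confirm the current values in each engine's documentation before relying on it. The lookups are passed in as parameters so the logic can be exercised without touching the network.

```python
import socket

# Assumed hostname suffixes; check each engine's documentation for current values.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com", ".crawl.yahoo.net")


def is_verified_bot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a trusted host that resolves back to ip."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip)
    except OSError:
        return False  # No reverse DNS record.
    if not hostname.lower().endswith(TRUSTED_SUFFIXES):
        return False  # Resolves, but not into a search engine's domain.
    try:
        # The forward lookup guards against someone faking their reverse record.
        return ip in forward_lookup(hostname)
    except OSError:
        return False


if __name__ == "__main__":
    # This call performs real DNS lookups.
    print(is_verified_bot("66.249.66.1"))
```

Because the check relies on hostnames rather than a hard-coded IP list, it keeps working when the engines move their crawlers to new addresses.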
Should you find any issues, where one of the robots is not minding the REP, or is misbehaving in some other way, you can always communicate directly with each engine through one of their forums:
If you find that you haven’t implemented the techniques described here correctly and private content from your site is indexed, each of the major search engines has methods available for requesting that it be removed. For more information, see:
Some awesome and exciting things are happening around here, and as part of that, janeandrobot.com has been down for maintenance. I thought that would be quick, but it’s looking like it might be another month. (A whole month!) I’ve had a lot of people asking how to access the articles since it’s been down, so I’m going to post them all here, and put in some temporary redirects from the old URLs to these ones. That last part might be a challenge, as I’m going to dive into if you can even implement redirects when using a WordPress plugin that has the site down for maintenance, particularly on a Windows-based GoDaddy server. Stay tuned!
(Will be an interesting study in indexing in search too!)
I’ve been thinking a lot lately about how to manage my time and all the information that comes at me every day. I know a lot of you do too. Many of us run our own companies, are working on cool projects that absorb all of our attention, and are constantly trying to find balance.
In that light, then, the premise of Clay Shirky’s new book Cognitive Surplus: Creativity and Generosity in a Connected Age seems a bit out of left field. The idea is that we have so much free time we just don’t know what to do with ourselves, so in lieu of any better ideas, we watch a lot of TV. And if we watched even slightly less TV, we’d have time to do things that actually mattered. Like edit Wikipedia. Or create lolcats. Or at least, that’s the premise on the face of it, which for me made the book difficult to read. Because I don’t watch a lot of TV. Nor does anyone I know. And anyway, what’s the difference between relaxing and recharging by watching a bit of TV vs. reading a book? Or enjoying the sunset. Or taking a nap.
More on all of that in a bit, but first, here are some thoughts I did get from the book that weren’t necessarily related to the implied premise, but that I found way more interesting.
The rise of “citizen journalism”
Shirky points to many examples where the ability of regular citizens to become reporters of the world around them has led to amazing things. And it’s true. Iranians can tweet about the elections to let the world know what’s happening there. The Sudanese can text incident information to help organizations map out needs. These uses of technology are awesome, but I don’t know that they’re the result of a cognitive surplus. They didn’t come about because the Iranians and the Sudanese were watching too much television and found new uses of their time by way of technology. They came about because people had a new mechanism to capture and broadcast what was happening in their lives. Anne Frank didn’t have Twitter, so she used pen and paper.
The surplus here isn’t the time we spend watching TV. It’s increased access to technology. Shirky notes that “the chance that anyone with a camera will come across an event of global significance is simply the number of witnesses of the event times the percentage of them that have cameras.”
So much content: what to consume?
This idea of citizen journalism isn’t universally embraced. I was at an event a few weeks ago and listened in on a conversation about how blog content isn’t vetted and can’t really be relied upon in the same way that traditional journalism can. Shirky does address this, quoting what the novelist Harvey Swados said in 1951 of the advent of paperbacks:
“Whether this revolution in the reading habits of the American public means that we are being inundated by a flood of trash which will debase farther the popular taste, or that we shall now have available cheap editions of an ever-increasing list of classics, is a question of basic importance to our social and cultural development.”
Shirky notes we didn’t have to choose. We could have both. The same holds today with what’s available to us on the internet, be it vetted material from professionals, or ad-hoc creations from amateurs. In either case (and it’s really more of a broad spectrum than either/or), the same as with books or TV or any other type of information, it’s up to us to be careful consumers. Clay Johnson says we need to consciously consume. He asserts that our abundance isn’t with time, but with information. I know that’s certainly my situation. Time is the most precious possession I have, and I never seem to have enough of it. But information? I’ve got that in spades. It threatens to bury me alive.
In Although Of Course You End Up Becoming Yourself: A Road Trip with David Foster Wallace, David Lipsky recounts David Foster Wallace describing this back in 1996, even before we had Twitter and YouTube competing for our attention:
“I received five hundred thousand discrete bits of information today, of which maybe twenty-five are important. And how am I going to sort those out, you know? …I think a lot of people feel — not overwhelmed by the amount of stuff they have to do. But overwhelmed by the number of choices they have, and by the number of discrete, different things that come at them… the number of small insistent tugs on them, from a number of different systems and directions.”
As we are provided with more ways to create, we have more to sort through to consume.
Fail a lot in order to succeed
I first started thinking about the idea of valuing failure when reading The Geography of Bliss: One Grump’s Search for the Happiest Places in the World. In it, the author Eric Weiner describes how in Iceland, practically everyone is a painter or a poet at least in part because the Icelandic culture doesn’t have the same view of success and failure as the American one does. You don’t have to be a good painter to be a painter. Just paint! When you aren’t constrained by success metrics, you feel freer to try more things. Weiner writes “If you are free to fail, you are free to try.”
Shirky is advocating this idea as well. The act of creation is what’s important, even if it’s bad Charmed fan fiction. And while I certainly think anyone who wants to write poetry should go for it, I also find the notion of failing a lot in order to succeed to be interesting. We tend to fear failure. Shirky describes how failure helps us succeed using a book metaphor: “If there was an easy formula for writing something that would become prized for decades or centuries, we wouldn’t need experimentation, but there isn’t, so we do.”
User-generated content: are we giving something up for free or getting something for free?
Shirky writes about services like YouTube and Flickr, “it can seem unfair for amateurs to be contributing their work for free to people who are making money from aggregating and sharing that work.” He notes Nicholas Carr’s use of the term “digital sharecropping” to describe how content creators are being potentially ripped off. But are they? Shirky concludes that (amateur) content creators don’t mind because they are creating for love and not for money.
I dunno. I think that at least in some cases content creators don’t mind because they don’t look at it as “digital sharecropping” — giving away their labor to others who profit. They look at it as a fair exchange of services. The content creators get a place to host their work, the tools to share it with others, and wide visibility — for free! This is something that was difficult, if not impossible, before the web, and something that we tended to pay fairly hefty prices for in the early days of the web. And this (mostly free) opportunity is what makes much of what Shirky celebrates in his book possible.
Why we share
Shirky references a 2006 NYU paper called “Commons-Based Peer Production and Virtue” that describes what motivates us to voluntarily contribute to groups. In addition to personal motivations such as autonomy and competence, the paper describes social motivations around connectedness and sharing/generosity. Yahoo’s recently released reputation model addresses the personal motivations, but not the social ones. And the social ones can certainly be motivating. Shirky calls this, in part, “go[ing] public to find people who think like you.” He says to ask of users:
He asks these questions to answer the question of why people would share, create, and build communities, but I think they are also great questions to ask when building a new community and attempting to encourage user participation.
We don’t want things for the sake of those things; we want what those things provide
I think this is an important idea for anyone making any content available, building any product, appealing to any audience. Shirky brings this up to explain why older people would adopt email. It’s not that they wanted to try out the latest technology. They wanted what all of us want: to communicate with others. He writes “no one wants e-mail for itself, any more than anyone wants electricity for itself; rather, we want the things that electricity enables.”
But this notion goes well beyond his point. No one cares about your features or that you’ve worked really hard on your product or about all the data you’ve just made available as an XML file. They care about solving their problems, doing things that make them happy, making their lives better. Focus on how you can help your audience do those things and you’ve got their attention. (I talked about this during my 60 seconds as part of the Influencer Project.)
The value of combinability
Shirky writes “if you have a stick, and someone gives you another one, you have two sticks. If you have a piece of knowledge — that rubbing two sticks together in a certain way can make fire — you can do something of value you couldn’t do before.” And here too is another new surplus the culture of the web gives us. By sharing knowledge, tools, failures, successes, ideas, we can better combine them for sums much greater than the parts. He notes that the community size has to be big enough, sharing has to be easy, there should be a common format or way of understanding the information, and then, there’s the last component, the one that technology can’t solve — people. Can we work well together? Do we understand each other, trust each other, want others to make what we do better?
Build rules as you need them
Don’t spend time creating a solution to a problem until you have a problem. I think this holds true of online communities, ways of iterating online products, and even building startups. When I started my company a couple of years ago, I didn’t set up any processes at all. I’m building them out now as I find I need them, based on experience of what’s been working and not. If I had set everything up in advance, I’d still be spending just as much time now adjusting it.
What about TV?
I think that if Shirky had relied less on the idea of using TV time for more productive things, the book would have been stronger. I clearly found much of what he wrote about interesting, but I got distracted every time he’d bring the point back to how dang much we watch television.
Shirky and I really aren’t so far apart on how we think about human behavior. He writes that “human motivations change little over the years, but the opportunity can change a little or a lot, depending on the social environment.” But then we diverge: “the raw material of this change is the free time available to us.” In truth, the stats point at television viewing at an all-time high over the same period that Shirky notes the explosion of creation and sharing online. We aren’t watching less TV in order to upload cute videos of our cat to YouTube. We’re doing both.
Do we really watch that much TV a day?
This was the first point that distracted me. I started wondering what those stats really mean. Most people I know who do watch TV tend to do it while they are getting ready for work in the morning, and eating breakfast, and writing their college essays. How much of that time is really spent solely in front of the TV? Because you can’t really make a lolcat in lieu of watching TV while you’re ironing your clothes.
David Foster Wallace talked about our excessive TV watching way back in 1990 in his essay “E Unibus Pluram: Television and US Fiction”. In that essay, he describes a 1985 book called Life After Television: The Coming Transformation of Media and American Life. This book paints a picture of a future world where TVs will not just passively feed us what the broadcaster wants, but will be an “interactive net” of everyone’s TVs and we’ll go from “passive dependence” to everyone being “their own harried guy with earphones and clipboard”. The author, George Gilder, writes, “we will, in short, be able to engineer our own dreams.”
The book’s portrait of how we would do that is different from what’s come to be, but the general idea isn’t so far off.
Is community engagement and creation really better than and a reasonable alternative to TV?
Shirky asserts that creation — any creation — is better than mere consumption. But is that true? Is creating a lolcat and sharing it really better than relaxing to an episode of 30 Rock? And what about the percentage of those hours we spend watching the news (or possibly The Daily Show) to learn about the world? I know that in my case, I watch TV when my brain is unable to do anything else. I’ve been working for 16 hours, I can’t even process words in books very well, and I need to distract my brain so that I can get some sleep. In those instances, I find TV useful in ways that editing Wikipedia couldn’t be.
Shirky notes, “the stupidest possible creative act is still a creative act.” Implying that a creative act always trumps acts of other kinds, I suppose. Explaining why it’s better to play World of Warcraft (acknowledging that some may think of this as “grown men and women sitting in their basements pretending to be elves”) than watch TV, he says “at least they’re doing something… however pathetic it is to sit in your basement pretending to be an elf, I can tell you from personal experience: it’s worse to sit in your basement trying to decide whether Ginger or Mary Ann is cuter.”
Maybe for Shirky it is. Not that I’m a TV apologist, but one could say the same of reading: it’s a solitary activity (generally more so than TV), you aren’t creating anything or doing anything as you read. Or as you sit on a bench and watch the water. As I wrote at the beginning, it’s the insistence in the book to always bring everything back to the time we waste on TV that I find fault with. I’m not at all saying that creating and sharing and being social are bad things.
And certainly too much TV is probably not great. Going back to Wallace again: he rather famously had a love/hate relationship with TV, and likened television to candy.
“What if you ate it all the time? Real pleasurable, but it dudn’t have any calories in it. There’s something really vital about food that candy’s missing… There’s nothing sinister, the thing that’s sinister about it is the pleasure that it gives you to make up for what it’s missing is a kind of… addictive, self-consuming pleasure.”
And at least in part, he agreed with what Shirky would later focus on in this book, as well as perhaps agreeing with me:
“It gives you a certain kind of pleasure that I would argue is fairly passive. There’s not a whole lot of thought involved, the thought is often fantasy like, ‘I am this guy, I’m having this adventure.’ And it’s a way to take a vacation from myself for a while. And that’s fine — I think sort of the same way candy is fine.”
And perhaps Wallace also would agree with Clay Johnson’s assertion that our problems with information overload are around what and how we choose to consume. Wallace noted that his book Infinite Jest wasn’t an indictment of entertainment, but was about our relationship to it.
“Why am I getting 75 percent of my calories from candy? I mean that’s something that a little tiny child would do, and that would be all right. But we’re postpubescent, right? Somewhere along the line, we’re supposed to have grown up.”
Shirky also maintains that we are shifting from strictly consumption around TV to “opportunities to comment on the material, share it with friends… and discuss it with other viewers”. I’d argue that we’ve always done that; we simply didn’t do it so publicly and we did it with our friends and coworkers rather than strangers around the world. Sure, it’s easier to share fanfiction now than it was in the 70s when we had to mimeograph ’zines and send them through the mail, but is Shirky really saying fanfiction is how we should spend our supposed “cognitive surplus”? (Particularly since writing fanfiction about TV shows (and commenting on them, labeling them, and so forth), at least, has a prerequisite of watching the shows in question on TV.)
Those who want to create and share and be communal are and always have been. Those who want to watch TV will. And many of us will do both.
Early in the book, Shirky writes, “this book is about the novel resource that has appeared as the world’s cumulative free time is addressed in aggregate.” But once you forget about the free time and TV aspects of the book and focus on the rest, it seems that what he’s really saying is that our human tendencies to create and share, which we’ve always felt regardless of the free time we have available, can now be acted on globally and at scale, and there’s real value to be harnessed from that.
Information overload is not a new concept for me. After all, I declared email bankruptcy way back in 2007. Sadly, that was a bit too well publicized at places like USA Today and the London Times, which means it’s REALLY embarrassing that I haven’t yet learned from my experiences.
Recently, I was reading Clay Johnson’s blog post on information dieting and attention fitness, which came at a very relevant time as I’m once again trying to figure out how to fit about a thousand hours of stuff in every 24 hour day. Last weekend at FOO camp, I attended a session Jane McGonigal held about hurry sickness and having time poverty. Scott Berkun, who was at the session, later did a blog post about the cult of busy. I was talking about all of this on Twitter, and Scott Hanselman sent me a link to a talk he gave about managing information overload. I said what he talked about was scary. He asked me why. The answer wouldn’t fit into 140 characters so I thought I’d do a blog post.
I am, without a doubt, too busy. I absolutely suffer from what Scott Hanselman calls “psychic weight” and often find myself “thrashing to disk”. There’s so much that my brain can’t even figure out where to start. Part of it is inefficiency, sure. But a lot of it is simply having too much to do. At the FOO session, I described it this way: I say yes to too many things, but I want to say yes to lots of things. It’s what keeps life interesting and has gotten me to where I am now. I get to do lots of amazing things because I’m open to new stuff as it comes along. But Scott Berkun said, you gotta say no to stuff. You just have to.
In Clay’s post, he says the problem isn’t too much information, it’s too much consumption. Saying yes to too much. Trying to do too much at once.
But how do we fix it? Maybe some of it is little things. Scott Hanselman suggests not checking your email first thing in the morning. Clay suggests keeping open tabs to a minimum and closing all windows except the ones you need to accomplish the task at hand. Which apparently means we should try to have only one task at hand at a time. But it’s also big things. Not saying yes to too much in the first place.
So why is the thought of figuring this out so scary? Why is it scary to consider NOT reaching for the phone to check email while still in bed in the morning? What’s so frightening about saying no? Anyone who knows me knows I want to do it all. But I want to do it all well, which clearly is difficult when you’re doing everything at once. So maybe I’m fearing the opposite of what I should.
So, I ask you: have you solved this problem? For the little things and for the big things. How do you get the information about the world that you want without getting so drowned in information that it bogs down your life? How do you keep on top of what you’ve said yes to? How do you keep from saying yes to too much?
PS – I don’t need another copy of Getting Things Done. I already have two.
I love the Google AdWords Keyword Tool. I use it all the time and I demo it a lot when speaking at events. But I have no idea what the URL is. When I want to use it, I just do what everyone else does whenever they want to go to any site on the web. I do a Google search for it. Specifically, I type [google adwords keyword tool] into Google and click on the first result.
Until today.
Now when I do that search, the tool is nowhere to be found.
Instead, I find the Google AdWords landing page, a bunch of articles on other sites about the keyword tool, and the UK version of the tool. Huh. What’s up with that? (I don’t even see a paid search ad for it.)
Google recently completely revamped the tool and as part of that, changed the URL. The old URL was https://adwords.google.com/select/KeywordTool. Now, when you (or Googlebot, if you are not you, but instead, a search engine webcrawler looking to update your index) access that URL, you get a 302 redirect to https://adwords.google.com/o/Targeting/Explorer?__u=5701132992&__c=8003242272&stylePrefOverride=2#search.none!ideaType=KEYWORD&requestType=IDEAS. And a captcha.
When I look for that URL in Google, I see it’s not indexed, although at least I see a paid search ad for the tool.
Of course, that URL doesn’t show up in part because it’s not a permanent URL. If I try to access inurl:https://adwords.google.com/o/Targeting/Explorer, I also get the keyword tool, via a set of redirects that leads to this URL: https://adwords.google.com/o/Targeting/Explorer?&stylePrefOverride=2&__u=5701132992&__c=8003242272#search.none!ideaType=KEYWORD&requestType=IDEAS.
Which is actually the same URL with the parameters in a different order. Although things don’t look much better in the Google index when I search for any URL that begins with https://adwords.google.com/o/Targeting/Explorer.
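If you'd rather not eyeball long query strings to see that, a few lines of Python can do the comparison. This is just a sketch: the function name same_resource is made up, and it treats two URLs as the same when everything matches except the order of the query parameters, which is an assumption about how the server reads them.

```python
from urllib.parse import parse_qsl, urlsplit

FIRST = (
    "https://adwords.google.com/o/Targeting/Explorer"
    "?__u=5701132992&__c=8003242272&stylePrefOverride=2"
    "#search.none!ideaType=KEYWORD&requestType=IDEAS"
)
SECOND = (
    "https://adwords.google.com/o/Targeting/Explorer"
    "?&stylePrefOverride=2&__u=5701132992&__c=8003242272"
    "#search.none!ideaType=KEYWORD&requestType=IDEAS"
)


def same_resource(url_a, url_b):
    """Compare two URLs while ignoring the order of their query parameters."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    same_parts = (a.scheme, a.netloc, a.path, a.fragment) == (
        b.scheme,
        b.netloc,
        b.path,
        b.fragment,
    )
    # parse_qsl skips the empty pair created by the stray leading "&".
    return same_parts and sorted(parse_qsl(a.query)) == sorted(parse_qsl(b.query))


if __name__ == "__main__":
    print(same_resource(FIRST, SECOND))
```

A search engine has no way of knowing that parameter order doesn't matter on a given site, which is one reason shuffled parameters can end up treated as separate URLs.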
Let’s look a little more closely at those redirects. The source code of https://adwords.google.com/o/Targeting/Explorer looks like this:
<script type="text/javascript" language="javascript">
var jsRedirect = true;
var url = "/select/Login?aw3=true&dst=%2Fo%2FTargeting%2FExplorer&frag=search.none%21ideaType%3DKEYWORD%26requestType%3DIDEAS";
window.location.assign(url);
</script>
</body>
</html>
Which then leads to this (HTTP response edited to show the vital components for this example):
https://adwords.google.com/select/Login?aw3=true&dst=%2Fo%2FTargeting%2FExplorer&frag=search.none%21ideaType%3DKEYWORD%26requestType%3DIDEAS
HTTP/1.1 302 Moved Temporarily
----------------------------------------------------------
https://adwords.google.com/um/StartNewLogin?aw3=true&dst=%2Fo%2FTargeting%2FExplorer&frag=search.none%21ideaType%3DKEYWORD%26requestType%3DIDEAS
HTTP/1.1 302 Moved Temporarily
----------------------------------------------------------
https://www.google.com/accounts/ServiceLogin?service=adwords&hl=en_US&ltmpl=regionale&passive=true&ifr=false&alwf=true&continue=https%3A%2F%2Fadwords.google.com%2Fum%2Fgaiaauth%3Fapt%3DNone%26ugl%3Dtrue
HTTP/1.1 302 Moved Temporarily
----------------------------------------------------------
https://adwords.google.com/um/gaiaauth?apt=None&ugl=true&pli=1&auth=DQAAALUAAAAMl8-Jos-ywfDoe9g9erkG4klYT-fzYde8k9MEQMmOkonqCalB_LbFISNUgDOMGnoAZkaofqL2ZGwAbwAwV8-rQ6dGM9XnEjgrwUJsc9l_S-0NFsPz0om6ExrJSZf8lQnKJkASgaEqE7SGWbCpcMYd_qihOdzJVvGH0P7_jopql3FQJ5vGT6PuazK260Z2hXVAxy3eyEICPHqe7R9LvLrjbM1fHZLgquTrd6dMYIN64iMDKvShFg_rXfjonOCj6jo
HTTP/1.1 302 Moved Temporarily
----------------------------------------------------------
https://adwords.google.com/um/gaiaauth?apt=None&ugl=true&pli=1
HTTP/1.1 302 Moved Temporarily
----------------------------------------------------------
https://adwords.google.com/select/gaiaauth?__u=5701132992&__c=8003242272&stylePrefOverride=2&30Login=true&url=%2Fo%2FTargeting%2FExplorer#search.none!ideaType=KEYWORD&requestType=IDEAS
HTTP/1.1 302 Moved Temporarily
----------------------------------------------------------
https://adwords.google.com/o/Targeting/Explorer?&stylePrefOverride=2&__u=5701132992&__c=8003242272#search.none!ideaType=KEYWORD&requestType=IDEAS
HTTP/1.1 200 OK
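If you want to trace a chain like this on your own site, here is a rough sketch in Python that follows HTTP redirects one hop at a time and records each status code. The function names are my own, the example.test URLs are made up, and it uses HEAD requests, which some servers answer differently than GET. It also only sees HTTP redirects, so a JavaScript redirect like the one above would show up as a plain 200.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url):
    """Issue one HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_once, max_hops=10):
    """Return a list of (status, url) pairs, one per hop in the redirect chain."""
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((status, url))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return chain


if __name__ == "__main__":
    # Stand-in for real network calls, so the sketch runs offline.
    fake_site = {
        "http://example.test/old": (302, "/login"),
        "http://example.test/login": (302, "http://example.test/new"),
        "http://example.test/new": (200, None),
    }
    for status, hop in trace_redirects("http://example.test/old", fetch=lambda u: fake_site[u]):
        print(status, hop)
```

A long run of 302s in the output is a hint that a moved page is being treated as temporary at every step, which is exactly the pattern in the chain above.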
Perhaps Google has a few things to add to their SEO report card:
Confidential to the Google AdWords Keyword Tool team: If you’re not sure if your URLs are being crawled, check the crawl errors section of Google Webmaster Tools. It’s got this great “URLs not followed” report that may provide some helpful information.
Confidential to everyone else: You can find the Google AdWords Keyword Tool here.
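If you want to run the same kind of check on your own URLs, here is a rough Python sketch that tallies the hops in a captured header trace like the one above (six 302 redirects before the final 200). It only parses text that has already been captured. The function names and the example.com URLs are placeholders of mine, and it assumes the same layout as the trace above: a URL line, a status line, then a row of dashes.

```python
def summarize_trace(trace: str):
    """Return a list of (url, status_code) pairs from a captured header trace."""
    hops = []
    current_url = None
    for raw_line in trace.splitlines():
        line = raw_line.strip()
        # Skip blank lines and the dashed separator rows.
        if not line or set(line) == {"-"}:
            continue
        if line.startswith("HTTP/"):
            # Status lines look like "HTTP/1.1 302 Moved Temporarily".
            status = int(line.split()[1])
            hops.append((current_url, status))
            current_url = None
        else:
            current_url = line
    return hops


def count_redirects(hops):
    """Count the hops whose status code is in the 3xx range."""
    return sum(1 for _, status in hops if 300 <= status < 400)


# Hypothetical trace in the same layout (placeholder URLs, not real ones).
sample = """\
https://example.com/tool
HTTP/1.1 302 Moved Temporarily
----------------------------------------------------------
https://example.com/login
HTTP/1.1 302 Moved Temporarily
----------------------------------------------------------
https://example.com/app
HTTP/1.1 200 OK
"""

hops = summarize_trace(sample)
print(count_redirects(hops), "redirects before status", hops[-1][1])
```

A long chain is not automatically a problem, but every extra hop is one more place for a crawler to give up, so it helps to know the count.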
This morning, TechFlash noted that Drugstore.com is expanding its microsite strategy around product categories. The growing list now includes:
Possibly you can already tell based on my headline that I think this is not the awesomest strategy ever. I could call it short-sighted, misguided, any number of other prefixed words, but perhaps I’ll just tell you why microsites are often not the best strategy to pursue. Before I get to that though, I want to point out that I find microsites a bad idea most of the time. Sometimes they are a great idea. Although the more of them you have, the less likely this is the case.
But, you protest! The TechFlash article said that beauty.com helped drugstore.com post a 20% revenue increase for Q1 2010. Sounds awesome. Microsites rule! Except that beauty.com is the one domain from the list that isn’t actually a microsite. It’s simply a vanity domain that redirects to the beauty category of drugstore.com. So certainly, by focusing resources and awareness on that category, they’ve likely managed to increase sales, but that increase has nothing to do with microsites.
So what’s wrong with microsites? Let’s tally up the ways.
You lose brand identity and audience engagement
You spend significant corporate energy on positive brand perception and awareness. And then you start over completely from scratch with an entirely new brand. Woo? If you are reaching an entirely different audience and your current brand would be confusing, then you may in fact want to build out a new brand, but in that case, you probably won’t be launching a microsite, you’ll launch a full site. In most cases, microsites are subsets of or promotions for the main site, with exactly the same audience. Do you really want to work at building up multiple brand identities? And do you really not want to benefit from the brand building in one category for another related category? (This becomes especially important with ecommerce sites, such as those drugstore.com operates. Even today, we don’t want to hand over our credit card information to just any site.)
Brand awareness has a search impact as well. As I note in the searcher behavior chapter of my book, searchers quickly evaluate the search results page to determine which result to click on. Many things go into that evaluation, but certainly brand recognition helps in evaluating credibility and perceived value.
You lose the ability to leverage your audience
Let’s say you launch an awesome site with a fantastic user experience, great products, and unrivaled customer support. For instance, let’s say you’re Zappos. Someone writes up a positive article about you in say, the NY Times. Readers start clicking over to your site. They see you sell running shoes. They just read about how great you are, so they feel confident about purchasing some products from your site. But maybe those same readers also need some clothes to go running in. If you had a separate runningclothes.com microsite, you’ve just missed a great opportunity to reach a targeted and motivated audience.
You confuse people and search engines
Oh, I won’t have that whole NY Times reader problem, you say. I’ll just keep a complete copy of my runningclothes.com content on my main site too! That way, I can reach the audience for my main site as well as get all the additional audience potential of the microsite. Oh really? First, that’s just confusing. If someone becomes accustomed to shopping for athletic clothes on your main site and then clicking over for shoes, but then one day they end up on runningclothes.com and everything looks the same… and yet the shoes are gone — that’s just not the experience you want to give users.
But the problem really comes in when you add search engines to the mix. Which version of the pages do you want them to index: the version on your main site or the version on your microsite? Likely you’re going to say the microsite. (Especially if you’ve built the microsite because you think keyword-rich domain names have great search potential; read on for more on that, by the way.) But the search engine is likely to index the version on your main site because that site has been around longer, has more links, and has more overall credibility with the search engines. No problem, you say. You’ll just block those pages with robots.txt. Well, OK. You can do that. But then you lose all search engine value from any of the external links those pages may accumulate. You also lose the search value of the internal link structure. That’s not awesome.
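To make the robots.txt option concrete, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The example.com domain and the `/running-clothes/` path are made up for illustration. The point is only to show what a blocked section looks like to a well-behaved crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the main site: block the duplicated section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /running-clothes/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "*") -> bool:
    """True if the rules above allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


duplicate_ok = is_crawlable("https://www.example.com/running-clothes/shorts")
rest_ok = is_crawlable("https://www.example.com/running-shoes/")
print("duplicate section crawlable:", duplicate_ok)
print("rest of site crawlable:", rest_ok)
```

Pages under the blocked path simply are not fetched, which is the trade-off described above: the duplicate goes away, and so does whatever those pages could have earned.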
My guess is that drugstore.com has Zyrtec on both its main site and its allergysuperstore.com site. Along with all the user reviews, product details, and directions. Drugstore.com ranks #37 for [zyrtec]. allergysuperstore.com doesn’t rank at all.
You may have to spend substantial additional resources
The microsites run by drugstore.com all use the same template and content management system. So it seems like low engineering overhead to maintain them all. But wait. As you build out the content of both sites, you have to decide which content to put where. And decide how to spend marketing, PR, and advertising resources. When you issue a press release, which site do you talk up? All of them? What if you have 20? And you likely are doing social media. Do you now maintain 20 Facebook pages and 20 Twitter accounts? I’m tired just thinking about it.
And if you’ve built the microsite specifically for an advertising campaign, what happens when the campaign is over? Do you maintain the site? Abandon it? Take it down? This question gets more complicated if the microsite included a social networking element. You’ve gotten your audience engaged, now what do you do with them?
During the 2009 Super Bowl, Jack-in-the-Box aired a commercial that showed Jack getting hit by a bus. They launched the microsite hangintherejack.com as part of the campaign. I’m not sure what happened with the lifecycle of the site, but that domain now redirects to jackinthebox.com, so whatever assets they built up there (both in terms of content and audience) have just been thrown away. (They did better with the Twitter account launched as part of the campaign. That wasn’t campaign-specific and it’s still being used by “Jack”.)
You hobble your search acquisition efforts
A big part of ranking well in search engines continues to be the strength of the external links to the site. If you maintain multiple sites, then you are diluting that external link value. If five people link to your main site and five people link to your microsite, each site is competing for rankings against the rest of the web with those five links. Instead, you could have one site competing with ten links. Anything that you do for offsite search engine optimization, you have to repeat for each site.
It can be difficult to match promotions to search visibility
One common case of microsites is when a company launches a new promotion. It seems to make perfect sense to launch a microsite as part of that promotion. You can tie branding to the promo and it can be a lot easier to outsource the development of the site to the agency that is managing the promotion creative than to try to coordinate in-house resources and add a section about the promotion to the main company website.
The trouble comes in when that promotion sparks search interest (which it undoubtedly will). I’ve observed this with the Super Bowl commercials in both 2009 and 2010. In 2009, several sites, including Hyundai and Sobe advertised taglines that had corresponding microsites, but those domains redirected to the main domain. Advertisers expected that viewers would type the URL into a browser address bar, but instead, many people typed the tagline or domain into a search box. Since the domain didn’t actually exist, the advertiser didn’t show up in search results. You can see this, for instance, with Hyundai’s Edit Your Own campaign.
Another problem with launching a microsite at the same time as an ad campaign (even if you don’t redirect the URL) is that you don’t want to launch the site until the ad goes live, but you want the site to be visible in search results as soon as the ad goes live. And unfortunately, you can’t have both. The hangintherejack.com site noted above experienced this issue. It wasn’t indexed in Google until six hours after the commercial aired (and the site launched). For a site to be crawled, indexed, and ranking within six hours of launch seems pretty quick. Unless you’ve just spent millions on a Super Bowl commercial that’s caused the audience to search for the site in Google. You can, of course, mitigate this problem by buying paid search ads. But this blog post isn’t about how to work around microsite issues. It’s about why microsites can be problematic.
But, I can hear you asking. Wouldn’t an advertiser always have this problem, even if they just launched promotion-related content on their main site? Well, yes and no. At the very least, the domain is already known and being actively crawled by the search engines, so you increase your chances of a quick crawl of the new content, particularly if you link it from your home page as soon as it goes live. You can also launch the pages early (without all of the promotion-related content) and ensure the pages include the words that correspond to the queries the promotion will likely trigger, then swap out the content when the ad goes live.
For ad campaign-related web content, you always have to think through the implementation to ensure you leverage search interest, but your options are more limited when you’re dealing with a microsite.
You don’t get the search engine value you think you get
This is the crux of the issue in the case illustrated by drugstore.com. They aren’t launching microsites because they are working with an ad agency on creative for a campaign and it’s too difficult to get internal engineers to add content to their website. And they aren’t building a completely different business for an entirely new audience. They’re launching entire business verticals for the same audience as their primary site on keyword-rich domains. Why? It can’t be for the type-in traffic. Even those who specialize in the domain business will tell you that type-in traffic is on a serious decline. We can see this with the Dockers commercial from the 2010 Super Bowl. A URL was the number two spiking query on Google that day.
People use search engines as primary navigation for the web even when they already know the web address.
Generally, when I work with companies who want to use a bunch of keyword-rich domains, it’s because they think there’s some inherent search engine value in the domains themselves. This assumes that the brilliant PhDs at Google think to themselves: “Huh. This domain is cheaponlinebooks.com. It totally must be the most relevant result for [cheap online books] queries. After all, the words are right in the domain name!” However, as it turns out, this has been a technique used by spammers since the days of stone tablets and chisels. Or, OK. Since at least 1995. The search engines are onto it. (Well, maybe not Bing quite yet.)
There is so much super valuable content on domains that aren’t keyword rich and there is so much spammy, crappy content on keyword-rich domains that Google just doesn’t find it useful as a relevance signal.
Keywords can indirectly help when they’re in the URL because you’ll get anchor text credit for any URL-only links. But that really has nothing to do with the domain, so why not just use keyword-rich URLs on your main domain and get those benefits without incurring all of the drawbacks of microsites?
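As a rough illustration of that last point, here is one way to turn a category or product name into a keyword-rich path on the main domain. The `slugify` and `category_url` helpers, the example.com domain, and the category names are all hypothetical. Most content management systems have their own version of this.

```python
import re

MAIN_SITE = "https://www.example.com"  # placeholder for your main domain


def slugify(text: str) -> str:
    """Lowercase the text and join its alphanumeric words with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


def category_url(category: str, product: str = "") -> str:
    """Build a descriptive URL on the main site instead of a new domain."""
    parts = [slugify(category)]
    if product:
        parts.append(slugify(product))
    return MAIN_SITE + "/" + "/".join(parts) + "/"


print(category_url("Running Clothes"))
print(category_url("Running Clothes", "Trail Shorts (Black)"))
```

The words people search for end up in the URL either way. The difference is that every link to these pages also counts toward the one domain you are already building up.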
People also sometimes think operating multiple domains will help search engine rankings in other ways, such as that you can link to yourself for instant PageRank credit! Or that you can dominate the search results with all those domains. I hate to be the one to break the news, but search engines are on to those things too. Over time, search engines generally can figure out when sites are part of an owned network and then treat them accordingly (which is similar to how they would treat the content if it were all part of one site). And if you now want to ask how do they know so you can figure out how to hide it, then you’re getting dangerously close to thinking about search engine manipulation. Maybe you should read this and then come back.
Certainly, you’ll find many out there who swear up and down that having keywords in the domain makes a big difference. I think mostly this isn’t the case, and that any examples of keyword-rich domain names ranking well are also a case of the content on those domains actually being the most relevant result for a set of queries. Even if it did work, it would presumably only work for exact match, so you’d need a lot of domains to make up the set of queries you really want to rank for. That sounds exhausting. I also think that this is the kind of ranking signal that’s likely being tweaked all the time, and even if it works for a time, it’s a poor foundation for a long-term business strategy.
But just as importantly, once you start focusing on building your business based on perceived signals in the search engine algorithms, you’ve lost sight of why you’re building the business in the first place and of your customers. While this may seem like a minor diversion, it may take you down a completely different path than the one that’s based on building substantial user value.
Suddenly you’ve got a set of spam tactics rather than a business model.
So do keyword-rich domains have any value? Maybe. If you are starting a brand new site and can pick any domain name you want, in some cases it may make sense to go with a keyword-rich one. It will be memorable, easy to type, and will encourage useful anchor text. It might also encourage click through for URL-only links as it may be more obvious what the site is about. And if you have or can acquire a bunch of keyword-rich domains related to your industry, you may as well redirect them to your main site to capture any type-in traffic they happen to get (although don’t expect any SEO benefit from this).
But launching a whole bunch of keyword-rich microsites related to your industry in hopes that you’ll get all those microsites ranking separately for variations of a query? Probably not the awesomest idea ever.
Last week, I gave a workshop about local search at O’Reilly’s Where 2.0 conference. (I also did a short video on the topic.) One of the things I talked about was how important it is for local businesses to be visible in web search and map search results. After all, over 90% of us online use search engines to find information, and generally, those search engines are the major ones, rather than specific verticals. Microsoft research has found that 86% of searchers start at a major search engine when shopping online. Even when consumers plan to purchase offline, they often go online first. 42% of retail sales in 2009 were online or “web-influenced” ($917 billion in US sales were “web-influenced”). And more specific to local business, 63% of consumers use the internet to find local businesses, but only 44% of local businesses have a web site. That same study also found that 50% of us turn to search engines first for local business information, vs. 24% who turn to the Yellow Pages first.
After the Where 2.0 session, an attendee came up to me and asked me about restaurants. Is search really important, he wondered. Surely social media is where restaurants should concentrate efforts. After all, a new restaurant needs to raise hyperlocal awareness and no one is going to search for a restaurant name they’ve never heard of. He suggested a Facebook campaign that engages 100 consumers from the local neighborhood might be the best way to promote a new restaurant.
An “And” Strategy, Not an “Or” Strategy
First, I recounted what Avinash Kaushik noted at the SMX keynote panel that he and I were both on a few weeks ago. Social media hasn’t replaced search. The question isn’t search or social media. The question is where are your customers. Certainly for a business such as a restaurant, social media may be a great place to reach new customers, but those same customers are likely searching as well. Overall search volume was up 46% in 2009, so it’s definitely not something that’s going away. (You can see Avinash and I talk more about this.)
Think about who you’re trying to reach. Initially, you want to raise overall awareness. Social media is great for this (as you’ll see in a minute). But what about this scenario?
A woman is reading Twitter and sees that a new restaurant has opened up nearby. Later, when she and her husband are trying to decide (yet again!) what to have for dinner, she remembers the new restaurant. Finally, a new idea! She suggests it. Her husband says great, but what’s on the menu? Will I like it? The woman does a quick search on Google for the name of the restaurant to see if the web site has the menu. Huh. The restaurant doesn’t come up. She goes back to Twitter and starts scrolling back through the tweets, trying to find the right one. In the background, her husband is getting hungry. And after waiting a few minutes, he picks up the phone and orders a pizza.
And as a restaurant owner, you want to be discoverable long term. Your potential customers (locals and visitors) might be searching for [mexican restaurant seattle]. Or even [best mexican restaurants in seattle]. Social media is great for recommendations from friends, but it’s not always searchable and you can’t always get the immediate answers you need when your husband has the phone in hand to order pizza again.
A Holistic Search and Social Media Strategy
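You don’t have to choose an “or” strategy, because an “and” strategy is not that much more effort. You have a web site; you are engaging in social media. The only thing left is to make sure you understand how to be found in search, which primarily consists of:
1. Understanding what your potential audience is searching for
2. Claiming your maps listing
3. Ensuring your web site is search friendly
4. Leveraging social media to improve search visibility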
The awesome thing is that all of this is free.
A Local Example: West Seattle Heartland Cafe
A couple of months ago, I found out about a new restaurant near me from the local neighborhood blog. This was a great use of social media (engaging with local bloggers who already have the attention of the target audience) and a great example of why engaging this way can be important. The restaurant is not only near me, but it’s directly next door to my bank, grocery store, and drugstore. The building is covered with HUGE “Heartland Cafe coming soon” signs. Yet I didn’t notice it until I read about it on the West Seattle blog.
I then learned that it was finally open by reading a tweet from @westseattleblog. The restaurant has a Twitter account! and a web site! These are all great things. But remember the “and” strategy. Can the Heartland Cafe be found in search? Sadly not.
So what should they do? Let’s go through the bullets I listed above.
1. Understand what your potential audience is searching for
You always want to be found for branded searches. In this case, that would be queries such as [heartland cafe] and [heartland cafe west seattle]. This restaurant probably also wants to be found for things like comfort food, breakfast, brunch, and bar. It’s important to know how consumers search, and for restaurant related searches, the Google AdWords Keyword Tool (that you don’t need an AdWords account to use) tells us that searchers look for [breakfast] three times as often as [brunch] and that we often search for [breakfast restaurants].
We look for restaurants more than bars and cafes and we’re often looking for menus and reviews.
There’s lots to be learned from search data, but at a quick glance, it seems like Heartland Cafe should talk up its breakfast and provide an online menu.
2. Claim your maps listing
All of the search engines provide this service, but let’s use Google as an example. It’s important to claim your maps listing for many reasons, but two of the best are that people often search directly on the maps page (particularly on mobile devices) and that if search engines determine that a matching map result would be relevant to a web search, they’ll show it directly in the web search results. Say I’m driving with some friends and decide to check out this new Heartland Cafe but I don’t remember exactly where it is. I open up Google Maps and see… a Toyota dealership.
Not awesome.
How can the Heartland Cafe fix this? They just need to go into Google’s Local Business Center and claim their listing. It’s free and easy. It’s important to put the business into the right categories and provide complete information. The major search engines get local business data from third parties, so it’s likely that most businesses already have a listing (the Heartland Cafe doesn’t because it’s so new). If the maps list your business already, you can claim ownership, and then complete the listing so that it’s compelling for your target audience. And as you can see, new businesses should definitely add their listings so they show up right away.
Google now has place pages that pull in a great deal of information from the web (such as images and reviews) and enable business owners to provide substantial detail.
You can see with this Coldwell Banker listing that the owner can provide a description, images, a web site, and more.
Once the Heartland Cafe has created a robust Google Maps listing and has started getting good reviews, they may be able to show up in the local business results for a search such as this one:
3. Ensure your web site is search friendly
Obviously, the first step here is to have a web site, which the Heartland Cafe has. Great! Unfortunately, it’s not showing up in search results, even for searches for the restaurant name and location. Not great. What’s going wrong?
It’s beyond the scope of this post to dive into ensuring that you’re providing compelling information that engages your audience, but key to this is ensuring you’re meeting the needs of searchers. Remember step 1 when we found that searchers are looking for menus? The Heartland Cafe’s web site doesn’t have one. That will not only limit search visibility, but it won’t answer one of the primary questions visitors to the site have. And if visitors can’t see the menu in advance, they may not decide to stop in and try the food.
However, the Heartland Cafe does have some great information on the site (address, including city and state — key to being seen as relevant for local searches, hours, details on the type of food). So why doesn’t any of it show up? The primary issues appear to be technical ones.
The individual pages don’t have corresponding unique URLs. All content loads on a single URL: www.heartlandcafeseattle.com. This means that search engines can’t index the content as they don’t have URLs to associate with that content. In addition, the content can’t be shared on social media. The site has an events calendar, but if I saw a cool event there and I wanted to post on Facebook about it and invite my friends, I’d have to tell them to go to the home page, then click events in the sidebar, then click… Why is this? Well, the site is entirely in Flash. It absolutely doesn’t need to be in Flash. The site could keep the exact look and feel it currently has and be in HTML. In fact, WordPress would be a quick and easy way to replicate the layout.
Normally when I see Flash sites, I recommend ways to combine Flash and HTML or point to ways of building search-friendly Flash sites, but in this case, the Flash doesn’t appear to be providing any benefits and is only detracting from the usability and searchability of the site.
Even with this problem, however, search engines should index the home page. Even though they won’t be able to extract any content from the pages, they can at least index the information in the title tag and meta description tag. The title tag in this case is “Heartland Cafe Seattle”, which is pretty good actually, although it could include a descriptor such as “classic midwestern comfort food”. But the meta description tag is missing entirely. A good meta description for this page might be “West Seattle’s best midwestern comfort food in the heart of the Admiral district for breakfast, dinner, late nights, and delicious cocktails.”
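A quick way to spot this kind of gap is to check a page’s head for a title and a meta description. Here is a small sketch using Python’s built-in `html.parser`. The `sample_page` markup is a stand-in I wrote to match the description above (title present, description missing), not the site’s actual source, and the class and function names are my own.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Collect the <title> text and the meta description, if present."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr_map.get("name") or "").lower() == "description":
            self.description = attr_map.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_head(html: str):
    """Return (title, description); description is None when the tag is absent."""
    parser = HeadAudit()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description


# Stand-in markup: a title but no meta description.
sample_page = """\
<html><head>
<title>Heartland Cafe Seattle</title>
</head><body></body></html>
"""

title, description = audit_head(sample_page)
print("title:", title)
print("meta description:", description if description else "MISSING")
```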
They can at least let search engines know the site exists by submitting a Sitemap file. This file can be as simple as a text file that lists the URLs of the site. (This step will be more useful once they associate individual URLs with each page on the site.)
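Here is a minimal sketch of generating that kind of text Sitemap: one absolute URL per line. The `/menu` and `/events` paths are hypothetical, since the site would first need to give each page its own address, and `build_text_sitemap` is just a name I picked.

```python
def build_text_sitemap(urls):
    """Return sitemap text: one absolute URL per line, duplicates removed."""
    seen = set()
    lines = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"Sitemap URLs must be absolute: {url!r}")
        seen.add(url)
        lines.append(url)
    return "\n".join(lines) + "\n"


# Hypothetical URLs the site might expose once each page has its own address.
pages = [
    "http://www.heartlandcafeseattle.com/",
    "http://www.heartlandcafeseattle.com/menu",
    "http://www.heartlandcafeseattle.com/events",
    "http://www.heartlandcafeseattle.com/menu",  # duplicate, dropped
]

sitemap_text = build_text_sitemap(pages)
print(sitemap_text, end="")
```

Save the output as a UTF-8 text file on the site and submit its location to the search engines. With only the home page available today, the file would have exactly one line.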
There are lots of other technical and content-focused things this business can do, but none of them will make much difference until the pages of the site have their own URLs.
4. Leverage social media to improve search visibility
Not only does social media help you engage with audiences, but it increases your search visibility. In this case, the web site has some real problems it needs to fix before it can be found in search. But in the meantime the business owners can take better advantage of social media. Add the web site address to the Twitter profile. Add a Facebook fan page. As you can see from the earlier screenshot of the search results for a search for a restaurant name, the business is only visible at all because of social media. Address the reviews on Yelp so potential customers know that you care. The Yelp page is, after all, the third result in searches for the restaurant name.
Does being found in search engines really matter for a local business such as a restaurant? We may not need to go farther than the search results themselves for an answer: