Blog: SEO

Google Misspelling Match: A Tale Of Two Searches

January 27, 2009

As a collective people, we aren’t always the greatest spellers. Even those of us (ahem) with English degrees have our weaknesses (as you’ll see in a minute). Knowing that, site owners have long wondered about how best to make sure their sites show up in search engines for misspelled queries.


Web Development/SEO Meetup, This Thursday January 22nd in Mountain View

January 19, 2009

We’ve had several Jane and Robot webdev/SEO meetups in Seattle and they’ve been lots of fun, so we’ve decided to take things on the road! We’re planning to do several in the bay area (and at least one in LA) and the first one is this Thursday, January 22nd at Ooyala in Mountain View.

Nate and I will be on hand to answer all your pressing search-related questions, so if you’re a web developer who wants to learn more about making web apps search-friendly or if you’re an SEO who’s interested in the technical side of search engine optimization, come on out! We’ll get things started around 6 and start the questions around 7. If you plan to attend and would like a mini site review, just post the URL here, along with your questions.

There should be plenty of time for networking, and lots of drinks and snacks!

RSVP and get directions at Upcoming.

Upcoming webdev/search conferences

And if you’re interested in the techie side of SEO, don’t miss Developer Day at SMX West in Santa Clara, CA on February 10th. If you’re a marketer who works with developers, this is a great opportunity to learn the marketer-to-developer translation glossary, and if you’re a developer looking for a primer on the technical side of SEO, you should definitely check it out.

And of course the O’Reilly Found conference, which Nate and I are developing specifically for web developers, is coming up June 9-11 in Burlingame, CA. We’re finalizing the agenda now, so if you’ve pitched to speak, expect to hear back from us soon. We’ll be opening registration next month, so in the meantime, sign up for the newsletter to stay up to date with the latest.

Hope to see you all on Thursday!

A Short Case Study on Redirects: 301s vs. 302s

January 5, 2009

When I moved to this site a couple of months ago and redirected the old blog to this one, I experienced the joys and sorrows that we all do when we move sites. I was redirecting at several levels. From the .htaccess file of the old site, I was doing a site-wide redirect. From the .htaccess file of the new site, I was redirecting from the old URL pattern to the new one (and from the non-www version to the www version). WordPress itself was also redirecting, for instance, from the version of the URL without the trailing slash to the version with it. Within .htaccess, some redirects use RedirectMatch, others use Redirect, and others are really URL rewrites using mod_rewrite.

I also redirected my feed using .htaccess and Feedburner, and am still sorting out how to redirect and consolidate (and whether you really can), even if you’re using MyBrand.

There’s a lot going on and I intentionally unleashed it using all kinds of variations to see what would happen. For you. Surely someone will send me a cookie.

The difference between 301s and 302s

I have a whole set of notes that I’m writing up on all the different redirection options and what the pitfalls are, but for now, I wanted to illustrate the difference between a 301 and a 302. I am asked this a lot (“does it really make a difference?”). It can get confusing, because although search engines say to use a 301 when moving a site (or page), server software tends to use a 302 as the default when you implement a generic redirect (without specifying whether it should be a 301 or 302).

Some of the redirects for my site move were initially implemented as 302s and others as 301s, so I could see how things worked in real time.

A 302 is a “temporary” redirect

Search engines tend to interpret a 302 as an instruction to index the old URL but the new content. It keeps the old URL because the server has said that the new one is only “temporary”. You can see that here with a Google search for my name. As you can see, the URL is for the old site, but the title and description in the search result are from the new one.

A 301 is a “permanent” redirect

Search engines interpret a 301 as an instruction to replace the old URL with the new one. As you can see, once I changed the redirect to 301, the new URL showed up in the place of the old one.

The new URL is ranking in the same location for the query [vanessa] as the old URL was, which implies that the links and anchor text that pointed to the old URL are now being transferred to the new URL. In addition, you might notice that the URL I’m redirecting my old home page to (www.ninebyblue.com/blog/) has a toolbar PageRank of 6, while the home page of this site (www.ninebyblue.com) has a toolbar PageRank of 4, so at least in terms of visible toolbar PageRank, that credit is passed via the 301 as well.

Should you use a 301 or 302?

As you can see, for a permanent site (or page) move (and to consolidate duplicate pages), a 301 is the way to go, and you should check the details of your implementation to make sure that your redirect really is happening via a 301.
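One way to check what your server is really sending is to request the old URL without following the redirect and look at the status code. Here is a minimal sketch in Python using only the standard library. The function names are my own for illustration, and any URL you pass in is up to you; nothing here is specific to a particular server setup.

```python
import http.client
from urllib.parse import urlsplit


def describe_redirect(status):
    """Label an HTTP status code in the terms used in this post."""
    if status == 301:
        return "permanent (301)"
    if status == 302:
        return "temporary (302)"
    if 300 <= status < 400:
        return f"other redirect ({status})"
    return f"not a redirect ({status})"


def first_hop(url, timeout=10):
    """Request url WITHOUT following redirects.

    Returns (status code, value of the Location header or None).
    http.client never follows redirects on its own, which is the point here.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

To use it, call first_hop() with the old address, pass the returned status to describe_redirect(), and repeat with the Location value if you have several layers of redirects (as I did) and want to see each hop.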


Learn more up close and personal!

If you’re geeky like me, and want more techie stuff, come on out to SMX West Developer Day in February or the O’Reilly Found conference in June! Both are shaping up to be quite geeky, and by that I mean, very fun.

Last Chance to Submit a Speaking Proposal for the O’Reilly Found Conference!

December 16, 2008

Tomorrow is the last day to submit a speaking proposal for the O’Reilly Found conference. I’m co-chairing the conference with Nate Buggia, who runs the Microsoft Live Search Webmaster Center. Found is a conference for web developers that focuses on the technical issues involved with building web sites that are search engine-friendly.

We particularly would like to hear from you if you:

  • Are a developer who has real-world experience making web infrastructure (LAMP stack, Microsoft stack, CMS systems, ecommerce systems, etc.) easily crawlable by search engines.
  • Are a developer who’s run into big search engine issues with your web app and have diagnosed things and lived through it. Tell us about your battle scars!
  • Manage developers or work with developers and have developed successful SEO processes that build search-friendliness into the development cycle.
  • Work for a company that provides infrastructure that makes SEO-friendly development easier.
  • Have case studies, metrics, and stories you can share about the importance of search today and how vital the technical side of development is to SEO.

Since time is running out, feel free to send a paragraph or two over about what you’d like to talk about and we can contact you to work out the details.

The conference is in the San Francisco area in June. If you’d like to attend, look for registration details early next year. And comment here, email, or use the conference site wiki to tell us what issues you’d like to hear more about at the conference!

Tips on Implementing URL Tracking

December 8, 2008

Nate has just posted an article on Jane and Robot about options for implementing URL referrer tracking so as not to dilute search rankings or introduce duplicate content issues. There are lots of tricky issues around using parameters in URLs, and this article dives into one use case: using tracking parameters to monitor where referrals are coming from.

The biggest problem with tracking parameters tends to be PageRank dilution. In the example used in the article, Jane and Robot wants to know which of two promotional videos drives more traffic back to the site. So each video links to the same page but uses a different tracking parameter in the URL. If the videos become really popular and lots of people blog about them and share them, Jane and Robot could accumulate quite a few links. But although those links are all pointing to a single page, they’re pointing to two separate URLs.

Google has been working on figuring out the canonical version of the URL in these cases, but they likely haven’t perfected it yet, and I don’t think the other search engines have either. So, if possible, it’s best to find a way to track these URLs separately for metrics purposes but consolidate them for optimal search value.

The Jane and Robot article describes some ways for doing just that.
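To make the idea concrete, here is a rough sketch of one common approach (not necessarily one of the options in the Jane and Robot article): record the tracking value for your metrics, then work out the single parameter-free URL you’d want everything consolidated to. The parameter names in TRACKING_PARAMS and the function names are assumptions for illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of parameters used only for tracking; adjust for your site.
TRACKING_PARAMS = {"source", "ref", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url, tracking_params=TRACKING_PARAMS):
    """Return url with tracking parameters removed and all others kept in order."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in tracking_params]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


def split_for_tracking(url, tracking_params=TRACKING_PARAMS):
    """Return (canonical URL, dict of tracking values found in url)."""
    parts = urlsplit(url)
    tracked = {key: value
               for key, value in parse_qsl(parts.query, keep_blank_values=True)
               if key in tracking_params}
    return canonical_url(url, tracking_params), tracked
```

With this, http://example.com/page?source=video1 and http://example.com/page?source=video2 both map to http://example.com/page, while the source value is still available to log before you redirect.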

(And if you’re interested in this more technical side of SEO, don’t forget about our O’Reilly Found conference, which is all about building search-friendly web applications. We’re accepting speaking proposals now and registration will open up early next year.)

Going On Cranky Geeks To Discuss Whether Or Not “Google Must Die” With John Dvorak

December 7, 2008

A few weeks ago, John Dvorak wrote an article on pcmag.com titled Why Google Must Die. It’s all about how SEO is destroying the internet, puppies, and rainbows. It’s SEO is the Worst Thing Ever Invented all over again.

Actually, Dvorak takes a bit of a different spin than the typical “SEO helps the terrorists win” perspective. He asserts that businesses need SEO because Google sucks so much and can’t figure out what relevant results are for searchers unless site owners give them a little help. Danny Sullivan and others took him to task on Sphinn, breaking down his points one by one.

Watch us battle it out on Cranky Geeks

I’ll be on Dvorak’s show Cranky Geeks this week to fight it out, er, I mean discuss it with him in a professional manner.  I had a great time last time I was on the show, and I’m looking forward to another chance to be geeky yet cranky. You can tune in live Wednesday, December 10th at 12:30 or download the podcast later from the site or from Tivo.


Google Blog Search Changes How It Indexes Posts

December 2, 2008

Historically, Google Blog Search has indexed primarily via RSS feeds, which meant that for blogs that published partial feeds, Google Blog Search only indexed that partial portion. Any links or text in the rest of the post weren’t available through Blog Search. (The full posts were indexed in web search, of course.)

The people behind the West Seattle Blog pointed out a change to this on Twitter.

According to Jeremy Hylton of the Google Blog Search team, they now index the full content of the page. This means that not only do they index the full post even if the blog publishes a partial feed, but it means that they index the non-post parts of the pages as well. This is mostly an improvement, of course, but it’s causing some problems, particularly for people who have alerts set or do searches for references to themselves, their sites, or their brands when any of these are linked to in blogrolls.

The result is that anytime a blog publishes a new post, Google Blogsearch picks up the new page, including the sidebar details. So you may get an alert that there’s a new blog post about you, but when you go check it out, you find the post doesn’t even mention you!

Jeremy says:

We do expect to fix the problem you’re seeing. We’ll use the full page content, but exclude the content that isn’t really part of the post. I’m not sure if we’ll be able to make the change before the end of the year, but we are working on it and are pretty confident that it can be solved.

They’ll post once it’s been fixed.

A funny aside: when searching to see where else this was being talked about, I came across this juxtaposition of results:

Making Geotargeted Content Findable For the Right Searchers

December 1, 2008

A few weeks ago, I organized and moderated several sessions at SMX London. One of those sessions was about international SEO, which, in part, touched on the issues related to having content available for multiple countries and languages. What’s the best way to make sure that searchers in a particular country or speaking a particular language are able to easily find the content you have available for them?

Last week, I was reading Eric Ward’s column Now Is the Winter of Linking’s Discontent, where he writes:

Personalized search results have been with us for a while, but this patent [about Google's personalized search patent that Bill Hartzer discusses on his blog] is chock full of link building implications. I’d say this is especially true for web sites trying to do business in multiple countries but offering their content in only one language. And if you take the time and effort to truly make your content available in other languages, do you also need to host that other language content on a server based in that country if you want to rank well for searches originating from that country? What about duplicate content? Aren’t French and German versions of a site, if hosted in France and Germany, duplicate? Hmmm.

These questions came up at SMX London as well. How do search engines sort out content targeted for particular languages and regions and what are the best practices for making sure you’re being seen by your target audience?

How search engines determine the geographic intent of the searcher

Search engines try to display the most relevant results possible to a searcher. The language of the searcher, the searcher’s geographic location, the way the searcher accesses the search engine, and language or regional intent in the query are all factors the search engines consider when determining relevance. Since queries are generally three to four words long, search engines use all the signals they can beyond the query to figure out what searchers are really looking for.

For instance, if a searcher in Ireland searches for [airline booking], they’ll likely get a very different list of results than a searcher in the United States, as the results will skew towards Irish airlines. But this doesn’t just happen at the country level. If a searcher in Seattle searches for [pizza], they’ll likely get more Seattle-based pizza listings than a searcher in Boston would. And for Google in particular, a searcher who’s logged into a Google account and has set a default location in Google Maps may get even more targeted results. Google has made this option more visible lately, and for queries they think may have local intent, they offer a zip code option:

In addition, a searcher will get different results when:

  • Searching google.fr from the US.
  • Searching google.fr from France.
  • Searching google.fr and choosing “French pages”
  • Searching google.fr and choosing “pages from France”

And, as you might imagine, including a geographic location in the query impacts results as well. A search for [restaurant in Dublin] returns different results than [restaurant], regardless of the other signals. And searching in a particular language will generally return results in that language. For instance, look at the results for the query [donde esta los cabos] from a US IP address on google.com:

So, to recap, some ways search engines determine regional intent include:

  • Domain accessed (google.co.uk vs. google.fr)
  • Language-restriction (only search French pages)
  • Country-restriction (only search pages in France)
  • Location of searcher (at the country level, as well as more local levels, such as the city)
  • Locational or language intent in the query
  • Searcher’s default location (such as set in Google Maps)
  • The language the query was composed in

Remember that search engines make slight tweaks to their algorithms all the time as they test what changes improve results. As personalized search becomes more important, it would make sense that if a searcher generally clicks on results in a particular language or country, pages in that language or from that country may start to appear more often for that searcher.

Note that I’m mixing language and region together a bit for the purposes of this article, although they are, of course, different. And issues can crop up because there’s not a one-to-one mapping between language and country. For instance, if someone is searching for Spanish pages, should a search engine return pages from both Mexico and Spain? (Probably if the query is language-specific but not regional; and perhaps search engines should use the country associated with the site as a signal for the language the site is in.) Conversely, if you have a site that targets Spanish speakers, do you need separate sites for both Mexico and Spain? (Maybe not if your content isn’t regional, but how then do you ensure your content is returned for searchers in both Mexico and Spain?)

How search engines determine the relevance of the page

Once a search engine decides what is relevant for the query, what signals from the pages come into play? They include the following:

  • Top-level domain (TLD): Many domains can only be used for a particular country. For instance, .fr always signifies a domain in France. TLD could potentially be used as a signal in determining language as well: a .fr domain is likely to have French content. Many domains, however, aren’t country-specific. .com, .net, and .org are well-known examples, but some countries allow their domains to be used by anyone. For instance, .tv is the TLD for Tuvalu, but that country has negotiated an agreement to make the TLD available for anyone. The exception to the standard seems to be .us. While it’s intended for US-based domains, it hasn’t really taken off, and .com is much more commonly used.
  • Server location: For domains that are not country-specific (such as .com or .tv), search engines use the geographic location of the server where the site is hosted to determine country. For instance, a .com hosted in Canada is seen as a Canadian site and a .com hosted in Australia is seen as an Australian site.
  • Google Webmaster Tools setting: Google Webmaster Tools includes an option for specifying the geographic location of a site. This option isn’t available if the TLD is country-specific. This setting basically replaces the server hosting location signal. This option is useful not only because you can host your domain anywhere and still set a location, but also because you can set each subdomain and subfolder of your site separately, if you’d like. For instance, you can set es.mysite.com or mysite.com/es to Spain and uk.mysite.com or mysite.com/uk to the United Kingdom. The disadvantage to this solution is that it only works for Google.
  • Location of incoming links: If 90% of the incoming links to a site are from Germany, then search engines figure the site is German, or at the very least, of interest to German searchers.
  • Language of pages: Again, language is technically a different relevance factor than country, but the two go hand in hand. If a site is in French, then it’s likely a site from France. The biggest signal used here is probably (as you might imagine) the language of the text on the pages. This criterion isn’t foolproof. What if the page includes multiple languages, for instance? The meta data and character encoding can help here. For instance, if you are translating your English pages into other languages, don’t forget to translate your title tag and meta description tag as well.
  • Address: For local queries (for instance, that [pizza] query from a Seattle searcher), search engines might use the physical address they find on the page, as well as any information from the search engine’s local index (for example, Google’s Local Business Center). If your site is for a local business, make sure you include your full address and register with each engine’s local index. Even if your site isn’t specifically for a local business, you may want to include regional signals on your site. For instance, if your site is windycityrestaurantreviews.com, and you have a page about each Chicago restaurant, you might assume that anyone coming to the site understands the context is Chicago, and that you don’t need to include “Chicago, IL” in each restaurant’s address. However, when a search engine sees “Joe’s Pizza, 123 Main St.”, there’s no indication that this restaurant is in Chicago. This can cause a usability issue with visitors coming to the site from search as well. Those visitors aren’t coming to the page from the home page that may say “Reviews of all Chicago Restaurants”. They may go directly from search to the page about Joe’s Pizza, and would need confirmation that 123 Main St. is indeed in Chicago.
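As a toy illustration of the first signal in the list above, here is a sketch of how a TLD could be classified as country-specific or generic. This is only my own simplification to show the idea, not how any search engine actually works; the lookup tables are a tiny hypothetical subset, and .tv is treated as generic because of the arrangement described above.

```python
from urllib.parse import urlsplit

# Tiny illustrative subset of country-code TLDs; the real list is much longer.
COUNTRY_TLDS = {
    "fr": "France",
    "de": "Germany",
    "es": "Spain",
    "ie": "Ireland",
    "uk": "United Kingdom",
}

# TLDs treated here as not tied to a country (including .tv, as noted above).
GENERIC_TLDS = {"com", "net", "org", "tv"}


def tld_signal(url):
    """Return (kind, country) where kind is 'country-specific', 'generic', or 'unknown'."""
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    if tld in COUNTRY_TLDS:
        return ("country-specific", COUNTRY_TLDS[tld])
    if tld in GENERIC_TLDS:
        return ("generic", None)
    return ("unknown", None)
```

For a generic result, the other signals in the list (server location, the Webmaster Tools setting, incoming links, language, and address) are what would have to carry the regional information.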

How should a site owner architect a geographically targeted site?

Ideally, a company should maintain separate sites for each country, each with the correct TLD. When you do this, search engines can easily determine which page to show for searchers in different countries.

What about duplicate content?

Even if the content is the same across each site, you don’t need to worry about duplicate content. Remember that search engines generally don’t penalize for duplicate content, they filter. And in this case, filtering is exactly what you want. You want the search engine to show the UK page to searchers in the UK and filter out the US page. And that’s what search engines typically do.

If you are targeting only one country and have the .com rather than the correct TLD, make sure it’s hosted in the target country. (Check with your hosting company, if you use one, to verify where the server is actually located.)

Sounds easy enough, but this solution doesn’t work for everyone. You may not be able to get the TLD for every country you operate in, or for other infrastructure-related reasons, you may need to host all the content on the same domain. In that case, I would recommend the following:

  • Putting content for each country on a subdomain or subfolder. (Either is fine, but if you’re starting from scratch and have a choice, I’d generally suggest going with a subdomain.)
  • Ensuring all content (including title tag and meta description) is localized.
  • Focusing on regional link-building efforts. For instance, make sure that your PR team is targeting newspapers in local regions, not just near the corporate office.
  • Including location-specific terms in internal anchor text. For instance, you might want to create an HTML site map that links to each country’s “home page” on the domain.

More strength in one domain?

At SMX London, there was some debate about whether it was better to have a single domain for all countries to consolidate PageRank, and whether multiple domains (one for each country) would dilute the overall strength. Remember that relevance is a critical factor for search engine ranking and PageRank alone doesn’t equal relevance. A page that is deemed highly relevant for a query but has low PageRank is going to rank above a page that has high PageRank but low relevance.

With that in mind, TLD is a strong relevance factor for results in a particular country. As for the argument that it’s more work to build links to multiple sites than to one, I contend it’s around the same, since even if you had the country-specific information on subdomains or in subfolders instead, you’d still want to build regional links to each. So, I would generally recommend TLDs if you can get them.

However, if you have a .com (for instance), with separate subdomains that you’ve been maintaining for a period of time, it probably makes sense to leave things as is and consider the other relevance factors (regional links, language of content, etc.). If you radically change your site structure (for instance, from subdomains to separate TLDs), you’ll need to have the content recrawled, reindexed, and reranked, and may need to change user perception, branding, link building efforts, among other things. And that may take some time. In a situation like this, I would recommend changing only if you’re having substantial problems getting the right content to be returned for the right country indices.

What about targeting multiple countries?

What if you want results returned to everyone? Or you have German content you want returned in Germany, Switzerland, and Austria? Unfortunately, there’s no perfect solution. In some cases, you’ll have to rely on the search engines to understand what results your pages are relevant for, but keep in mind that a more specific site may be seen as more relevant.

In some cases, other sites may be more relevant. For instance, if you have a US site in English that targets tourists worldwide, your content won’t be shown to searchers in France who select “only French pages”. And even if searchers don’t filter using that option, a site that has created content in French, targeted to tourists in France who are planning a visit to the US is likely to be seen as more relevant than your site targeting the world.

What about IP-Targeting?

Some sites detect the location of the visitor based on IP address, and redirect them to a country (or other location)-specific page. While this seems to be a user-friendly solution, some issues exist:

  • The location may be incorrect. For instance, many AOL users appear to be coming from Virginia.
  • The searcher may want a different location. For instance, when I was in Zurich, I still wanted the US Hertz site, but Hertz sent me to the Swiss site automatically and gave me no options for navigating elsewhere.
  • Search engines need unique URLs in order to index content separately.
  • Search engines crawl your site from a particular location, but you want all locations indexed.

If you have your site set to detect a visitor’s location and show content based on that, I would recommend the following:

  • Serve a unique URL for distinct content. For instance, don’t show English content to US visitors on mysite.com and French content to French visitors on mysite.com. Instead, redirect English visitors to mysite.com/en and French visitors to mysite.com/fr. That way search engines can index the French content using the mysite.com/fr URL and can index English content using the mysite.com/en URL.
  • Provide links to enable visitors (and search engines) to access other language/country content. For instance, if I’m in Zurich, you might redirect me to the Swiss page, but provide a link to the US version of the page. Or, simply present visitors with a home page that enables them to choose the country. You can always store the selection in a cookie so visitors are redirected automatically after the first time.
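The two recommendations above can be sketched as a small piece of routing logic. This is a simplified illustration under my own assumptions: the country codes, the /ch path, and the function names are hypothetical, and how you detect the country or read the cookie depends entirely on your stack.

```python
# Hypothetical mapping from detected country code to the path holding its content.
LOCALE_PATHS = {"US": "/en", "FR": "/fr", "CH": "/ch"}
DEFAULT_PATH = "/en"


def choose_locale_path(detected_country, cookie_choice=None):
    """Pick the locale path for a visitor arriving at the bare home page.

    A choice stored in a cookie from an earlier visit wins over IP-based
    detection, so a traveler who picked the US site keeps getting it.
    """
    if cookie_choice in LOCALE_PATHS.values():
        return cookie_choice
    return LOCALE_PATHS.get(detected_country, DEFAULT_PATH)


def alternate_links(current_path):
    """List the other locale paths so every page can link to them."""
    return sorted(path for path in set(LOCALE_PATHS.values())
                  if path != current_path)
```

Because each locale lives at its own URL and every page links to the alternates, a crawler coming from a single location can still reach and index all of the versions, and a visitor who lands on the wrong one has a way out.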

Google isn’t the only search engine

Of course, Google and Yahoo and Live aren’t the only search engines. If you’re targeting other countries, research who the dominant search players are there and how to best optimize for them. Mona Elesseily recently wrote an article on Search Engine Land about international search markets, and while she was focusing on paid search, the players and numbers are similar for organic search.

An international strategy is about more than targeting

Of course, a lot more goes into creating localized content. You should localize, not just translate, the content. Searcher behavior and customer needs may be different from country to country. Even simple phrasing may be slightly different. Different PR efforts may be needed to build awareness and links. And there are conversion factors to consider. At SMX London, several panelists pointed out that searchers are more likely to click search results that have their local TLDs in the domain, because they feel more confident those domains will give them localized content.

Hopefully, this article can help sort out some of the issues that arise when planning a global site strategy, but it’s certainly only a starting point.


(By Vanessa, who clearly is still working through technical blog issues.)

Technical SEO Issues Are Hard and We Need Good Solutions

October 30, 2008

Those of you who work in search marketing know that there’s a ton that goes into search engine optimization. And solutions, particularly for technical issues, can be complicated. It’s easy to say that a site shouldn’t use tracking parameters in URLs, but it’s a bit harder to really dig into the optimal solution that takes into account why the tracking parameters are used and what the system architecture that processes them is like.

While I was at Google and since, I’ve gotten lots of questions from developers about search-related issues. There’s so much great information out there about search marketing, but less about pure techie issues that dig deep into the code and technology stacks.

I’ve always been interested in all things geeky, so this led me to do a workshop for developers about search at Web 2.0 Expo earlier this year, launch Jane and Robot, a site focused on search-related issues just for developers, host meetups for web developers in Seattle, and organize a “developer day” as part of SMX Advanced.

With those aims in mind, I’m co-chairing a new conference with Nate Buggia, put on by O’Reilly. Called Found, this conference is specifically for web developers and will feature lots of case studies and real world examples from developers about their experiences and what works. We’ll cover both the LAMP and Microsoft stacks, and will dig into diagnosing technical issues, and will advance best practices to build into the web development process to ensure web applications are search-engine friendly.

This conference isn’t meant to compete with search marketing conferences like SMX, SES, or Pubcon. While those conferences definitely cover technical issues, the primary audience is search marketers (on both the organic and paid side). We envision that attendees of those conferences might send the developers they work with to the O’Reilly conference. We hope this will help with communication between marketing and development. When search marketers tell developers that the site’s URL structure or use of AJAX or Flash or redirects needs to be changed for SEO improvement, those developers can have a handy toolbox for exactly the best way to implement those changes based on the infrastructure of the site.

I also talk to a lot of developers who have launched startups, and while they have a development background, their companies are just too small for them to hire separate SEO expertise. So giving them guidance on how to build their sites so they can be crawled and indexed by search engines is another goal. Startups can definitely benefit from the potential customers that search can bring.

The call for proposals is open now, so if you’re a developer who’s tackled a tough search-related problem, or you’re a search marketer who’s spent a lot of time working with developers on best practices related to SEO, we’d love to hear from you!

We’d also love to hear from developers about what issues they face and what they’d most like to see covered at the conference. And from search marketers about what technical issues they see most often.

(I’m about to jump on a plane for SMX London, so I might be a bit slow in responding to comments over the next few days. But I’ll find an internet connection again as soon as I can!)

Increasing Search Indexing Coverage With an XML Sitemap

October 13, 2008

I just read Jeff Atwood’s post on Coding Horror about the importance of Sitemaps. I’m always eager to hear about people’s experiences since I spent so much time on XML Sitemaps and getting sitemaps.org launched while I was at Google. Sitemaps, of course, are supported by Google, Yahoo, and Live Search. All you have to do is reference the Sitemap location in your robots.txt file and all the engines will pick it up.
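If you haven’t built one before, here is a minimal sketch of generating a Sitemap and the robots.txt line that points to it, using only Python’s standard library. The function names and example URLs are placeholders of my own; the namespace and the Sitemap: directive come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML Sitemap (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL when serializing.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def robots_txt_line(sitemap_url):
    """Return the robots.txt directive that tells crawlers where the Sitemap is."""
    return f"Sitemap: {sitemap_url}"
```

You would write the output of build_sitemap() to a file such as sitemap.xml on your site and add the line from robots_txt_line() to robots.txt. Optional elements like lastmod are left out to keep the sketch short.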

Atwood noted that he uses Google to search for his own stuff, which makes it that much more frustrating when some of the content isn’t indexed. (Not to mention of course, the lost visitor opportunities.) Once he created an XML Sitemap, Google started finding and indexing more of his pages. Yay!

However, he and his commenters had a few questions about the process, so I thought I’d take a few minutes to answer them. Of course, I don’t work for Google anymore, so these answers are entirely my own. If you want official answers, check out the Official Google Webmaster Help forum.

Why is Google having so much trouble crawling my dynamic site? Can’t Googlebot figure out my URL scheme? (I’m paraphrasing Atwood’s post here.)
I haven’t spent a lot of time studying stackoverflow.com (the site in question), but since Google is crawling and indexing the URLs after finding them in the Sitemap, the problem likely isn’t with the dynamic nature of the URLs themselves. The issue is probably that the internal linking structure doesn’t provide links to every single page. Since Googlebot crawls the web by following links, it wouldn’t know about the unlinked URLs. Atwood notes this possibility:

“On a Q&A site like Stack Overflow, only the most recent questions are visible on the homepage… I guess I was spoiled by my previous experience with blogs, which are almost incestuously hyperlinked, where everything ever posted has a permanent and static hyperlink attached to it, with simple monthly and yearly archive pages. With more dynamic websites, this isn’t necessarily the case.”
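If you want to check whether this is happening on your own site, here is a minimal sketch in Python. It assumes you can export two things from your application: a list of every page that exists, and a map of which pages link to which. The function names and the sample paths are hypothetical, and a real crawler does far more than this, but the idea is the same: anything you can't reach by following links from the homepage is a page a link-following crawler won't find on its own.

```python
from collections import deque


def reachable_pages(links, start):
    """Breadth-first walk of the link graph, starting from one page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def unlinked_pages(all_pages, links, start="/"):
    """Pages that exist but cannot be reached by following links from start."""
    return sorted(set(all_pages) - reachable_pages(links, start))


# Hypothetical data: the homepage only links to the two newest questions.
links = {
    "/": ["/questions/3", "/questions/2"],
    "/questions/3": ["/"],
    "/questions/1": ["/"],
}
all_pages = ["/", "/questions/1", "/questions/2", "/questions/3"]
print(unlinked_pages(all_pages, links))
```

Pages that show up in this list are good candidates for a Sitemap, or better yet, for some archive-style internal links.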

Of course, pages with few links to them (particularly no external links) may not have substantial PageRank and therefore are unlikely to rank for anything other than long tail queries. But since the scenario Atwood describes is all about long tail queries (typing in the exact title of a page, for instance), getting those pages crawled and indexed is sufficient.

To dig a bit more into Atwood’s needs, he says, “It’s far easier to outsource the burden of search to Google and their legions of server farms than it is for our tiny development team to do it on our one itty-bitty server. At least not well.” If he’s looking to provide comprehensive search for visitors of his site, he might consider Google’s custom search engine (CSE). Generally, the CSE searches over what’s in the Google index. But if you’ve submitted a Sitemap, Google will maintain a CSE-specific index that contains any URLs from the Sitemap that aren’t in Google’s web search index. So, the CSE could provide even better search results than a regular web search.

Why would Google put some URLs in the CSE-specific index and not the regular web index? Well, Google’s algorithms use lots of criteria for determining not only how to rank pages, but what pages to crawl and index as well. So, if, for instance, Googlebot has crawled what it’s deemed the maximum number of URLs from your site for the week for the web index (I’m over-simplifying here a bit), it can still add the remainder to the CSE index.

It doesn’t sound very scalable. (from John Topley)
You can easily write a script that updates the Sitemap each time the site is updated. And if your Sitemap reaches the maximum size, you can break it up into multiple Sitemaps automatically or you can segment them by folder (or whatever organizational structure works best for you). If you want, you can even ping the search engines each time the Sitemap is updated, or you can just reference it in your robots.txt file as Atwood suggests and let them pick it up.
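As a rough illustration, here is a minimal sketch of such a script in Python. It assumes your application can hand it a list of URLs. The function names and the `sitemap-N.xml` file naming are my own choices for the example, not anything the protocol requires. The sitemaps.org protocol limits a single Sitemap file to 50,000 URLs, so the sketch splits longer lists into several files and ties them together with a Sitemap index. The protocol also caps file size, which this sketch doesn't check.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50000  # per-file URL limit in the sitemaps.org protocol


def build_sitemap(urls):
    """Return one <urlset> document (as bytes) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_sitemaps(urls, base_url, max_urls=MAX_URLS_PER_SITEMAP):
    """Return {filename: bytes}; splits into several files when needed."""
    urls = list(urls)
    chunks = [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]
    if len(chunks) <= 1:
        return {"sitemap.xml": build_sitemap(urls)}

    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for number, chunk in enumerate(chunks, start=1):
        name = f"sitemap-{number}.xml"  # hypothetical naming scheme
        files[name] = build_sitemap(chunk)
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = base_url + name
    # sitemap.xml becomes the index that points at the numbered files.
    files["sitemap.xml"] = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return files


def robots_txt_line(base_url):
    """The line to add to robots.txt so the engines can find the Sitemap."""
    return f"Sitemap: {base_url}sitemap.xml"
```

Run something like this whenever the site is updated, write each entry of the returned dictionary to your web root, and add the output of `robots_txt_line` to your robots.txt file once.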

How do you determine change frequency? (John Topley)
If your script can determine this, then you can set it up programmatically. Otherwise, I’d skip this optional element and just concentrate on listing the URLs.
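If you do want to set it programmatically, one simple approach (and this is just a sketch, not an official recommendation) is to look at how often a page has been edited in the past. The example below assumes your application stores a list of edit timestamps per page. The function name and the bucket boundaries are my own assumptions. The labels it returns are valid changefreq values from the sitemaps.org protocol.

```python
from datetime import datetime, timedelta

# Upper bound on the average gap between edits for each changefreq label.
# The boundaries are an assumption for this sketch, not part of the protocol.
_BUCKETS = [
    (timedelta(hours=1), "hourly"),
    (timedelta(days=1), "daily"),
    (timedelta(weeks=1), "weekly"),
    (timedelta(days=31), "monthly"),
]


def estimate_changefreq(edit_times):
    """Return a changefreq label, or None when there is too little history."""
    times = sorted(edit_times)
    if len(times) < 2:
        return None  # not enough data: leave changefreq out for this URL
    average_gap = (times[-1] - times[0]) / (len(times) - 1)
    for limit, label in _BUCKETS:
        if average_gap <= limit:
            return label
    return "yearly"


# Hypothetical page edited once a week for a month.
first_edit = datetime(2008, 9, 1)
weekly_edits = [first_edit + timedelta(weeks=i) for i in range(4)]
print(estimate_changefreq(weekly_edits))
```

When the function returns `None`, just list the URL without a change frequency.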

I think google is not happy with the “dynamic” parts of the url e.g. “?” or “&” (Marcel Sauer)
Google does fine with dynamic URLs. They can have trouble if the dynamic nature of the site leads to things like infinite URLs, lots of URLs that display the same page, crazy parameters, or recursive redirects, but as I noted above, the trouble tends not to be with the URLs themselves, but the fact that they aren’t always well-linked.
