The First Rule Of Indexing: Make Sure You're Letting The Site Be Indexed
Yesterday, someone asked me to take a look at their blog. It had been indexed in search engines fairly well, but had been migrated to WordPress a couple of weeks ago and since then, everything had taken a dive. They wondered how long it took Google to follow redirects.
I started checking things out. Were all the old URLs redirecting one-to-one to the new URLs with 301s? Yep. Were the 301s pointing at the right new pages? Yep. How does the robots.txt file look? It’s allowing everything. Looks good so far.
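For the record, you don’t have to click through every URL by hand to run that kind of check. Here’s a rough sketch of the idea in Python using the requests library; the old-to-new URL pairs below are made up, so substitute your own redirect map:

import requests

# Hypothetical sample of old-to-new URL pairs to spot-check.
REDIRECT_MAP = {
    "http://example.com/old-post.html": "http://example.com/blog/old-post/",
    "http://example.com/other-post.html": "http://example.com/blog/other-post/",
}

for old_url, expected in REDIRECT_MAP.items():
    # Don't follow the redirect; we want to look at the redirect response itself.
    resp = requests.head(old_url, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location", "")
    if resp.status_code != 301:
        print(f"{old_url}: expected a 301, got {resp.status_code}")
    elif location != expected:
        print(f"{old_url}: 301 points at {location}, not {expected}")
    else:
        print(f"{old_url}: OK")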
Then, I checked out the site in Google webmaster tools. Which is a pretty handy tool, by the way. You should all really look into it! I noticed that the query stats weren’t reporting any data, which could mean that the site hadn’t come up for any searches lately. Not a good sign. Then I checked the crawl rate information. Indeed, the crawl was slowing down.
And blogsearch hadn’t indexed any new pages since the day of the migration.
Huh. My first guess was that it was taking a while for Google to pick up the redirects because it wasn’t crawling the site very quickly. So I set out to figure out why that might be. But then I thought I’d just check one last thing.
And sure enough. There it was. In the source code of every page.
<meta name="robots" content="noindex, nofollow">
The search engines were following the redirects and getting a locked door. Suck.
What likely happened is that the designer put up the meta tag during the development phase and forgot to take it off once the site went live.
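If you ever want to catch a leftover tag like that without viewing source on every template, it’s easy enough to script. Here’s a quick sketch, again in Python with placeholder URLs, that fetches a couple of live pages and flags any robots meta tag carrying noindex or nofollow:

from html.parser import HTMLParser

import requests

class RobotsMetaFinder(HTMLParser):
    """Collects the content of any <meta name="robots"> tags on a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append(attrs.get("content") or "")

# Hypothetical pages to spot-check: the home page and a post.
for url in ["http://example.com/", "http://example.com/blog/old-post/"]:
    finder = RobotsMetaFinder()
    finder.feed(requests.get(url, timeout=10).text)
    for directive in finder.directives:
        if "noindex" in directive.lower() or "nofollow" in directive.lower():
            print(f"{url}: robots meta tag says '{directive}'")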
We could have spent days investigating the redirects and how they were being followed and indexed, speculating about possible penalties, or sending out perky yet persuasive link exchange requests in a far-flung effort to increase PageRank and kick-start crawling. But sometimes, the simplest explanation is the right one.
The first rule of indexing: make sure you’re letting the site be indexed.
As a side note, through webmaster tools we found that some of the old pages were returning 403 errors, which means that when they shut down the old blog, they password protected some of the pages. Obviously, search engine bots can’t follow redirects to the new pages if they can’t crawl the old ones, so be careful about shutting things down too quickly. We also found a list of old pages that were returning 404s, which was a handy way to see which redirects were missed.
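If you can export that list of old URLs (from webmaster tools or the old sitemap), a similar loop will bucket them by status code so the 403s and the missed 404s jump right out. Another rough sketch with made-up URLs:

from collections import defaultdict

import requests

# Hypothetical list of old URLs, e.g. pulled from webmaster tools or the old sitemap.
old_urls = [
    "http://example.com/old-post.html",
    "http://example.com/protected-post.html",
]

by_status = defaultdict(list)
for url in old_urls:
    resp = requests.head(url, allow_redirects=False, timeout=10)
    by_status[resp.status_code].append(url)

# 301s are what we want; 403s mean the old page is locked, 404s mean a redirect was missed.
for status, urls in sorted(by_status.items()):
    print(status, len(urls))
    for url in urls:
        print("   ", url)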
I think that meta tag thing might mostly do the trick though. Just a hunch.