Google Moderator Beta: Ask a Google Engineer

September 30, 2008

A few days ago, I noticed that Zurich-based Googler John Mueller posted a Twitter link to Google’s new Moderator application and invited everyone to ask Google engineers, such as Matt Cutts, questions. I asked Matt to bring me some frozen yogurt from the Google cafe. As Google Moderator takes advantage of the wisdom of crowds, my question soon plummeted to last place as apparently I’m the only one interested in my need for icy treats and everyone else (er, at least 86 people) voted that they didn’t like my question. (I don’t mind being voted down; I just wish I at least had some frozen yogurt!)

Matt blogged about this tool, explaining that it has been available internally at Google for a while and is a great way to prioritize questions.

If only it were searchable
It seems like a pretty cool tool. The biggest drawback I see to using it is that none of the content gets indexed, so unlike something like Yahoo! Answers, any work you put into asking or answering questions can’t be found later by those searching for that information. Why can no content be indexed, you ask? For one thing, the content is entirely in JavaScript. With JavaScript turned off, all you see is a message that says:

Google Moderator is a tool that allows distributed communities to submit and vote on questions for talks, presentations, and events. You must have JavaScript enabled in order to use this feature.

You can see that Google is indexing what’s in the noscript tag by checking out the search result:

[Screenshot: Google search result for Google Moderator showing the noscript text]

Huh.

The other problem with indexing is that every URL is differentiated by characters that begin with a #. Even if the content did load without JavaScript, search engines would see every URL as moderator.appspot.com, because they drop everything after a # (traditional web practice dictates that the # in a URL indicates an anchor point within the existing page).
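To make the # problem concrete, here is a minimal Python sketch of how a crawler that drops fragments would treat a handful of URLs. The fragment values below are invented for illustration (they are not Moderator's real URL format), and real crawlers are certainly more involved than this.

```python
from urllib.parse import urldefrag

# Hypothetical Moderator-style URLs; the fragment values are made up.
urls = [
    "http://moderator.appspot.com/#question-list",
    "http://moderator.appspot.com/#question-42",
    "http://moderator.appspot.com/#answer-42",
]


def crawlable_url(url: str) -> str:
    """Return the URL with everything from the # onward removed."""
    return urldefrag(url).url


# Collapse the list to the distinct URLs left once fragments are dropped.
unique_pages = {crawlable_url(u) for u in urls}
print(unique_pages)
```

All three URLs collapse into the same address, so from the search engine's point of view there is only one page to index.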

As a sidenote on how ensuring your site is search-friendly can help usability, note that these issues also keep the back button from working in the browser, so if you read an answer, you can’t easily get back to the list of questions.

(The site has other, more minor, search issues, such as that the logo doesn’t link to the home page and there’s no meta description, but really, fixing those issues would be like using a thimble to bail out a sinking boat with a hole the size of a bowling ball in the bottom of it.)

Go ahead, ask me a question
I started a question series to test out the system, and I will answer the questions that are voted to the top, but I’m likely to answer them here (and post links to the answers there) because of that er, minor indexing issue. Feel free to ask a question there and test things out yourself.

Answering questions to Matt
I’m not Matt, nor do I play him on TV, but I did see a few questions to him that I thought I’d steal away to answer. (Although he has answered quite a few already himself!)

Q: What’s the best way to get a count of indexed pages in Google? Last I checked, Webmaster Tools just links to the standard “site:” operator. Various query tricks have had different levels of success in the past, but none have been reliable. (Nick, Chicago)

I love this question because I get to talk up Google Webmaster Central. A reliable count does indeed exist! Simply create an XML Sitemap that includes a comprehensive and accurate list of the pages you would like indexed. Alternatively, you could create several Sitemaps, each with a different set of pages you want to track. If you want to track more than 50,000 URLs, simply add multiple Sitemaps to one Sitemap Index file. Submit the Sitemap or Sitemap Index file to Google Webmaster Tools and check back after it’s been processed. The Sitemaps tab displays not only the Sitemap URL count, but also tells you the number of URLs from that Sitemap that have been indexed. You can track this number over time to measure indexing coverage.

[Screenshot: Sitemaps tab in Google Webmaster Tools showing the Sitemap URL count and indexed URL count]

It’s a pretty handy trick and much more accurate than the site: operator.
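The Sitemap setup described above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the example.com URLs, the sitemap-N.xml file names, and the helper function names are all hypothetical, and the output leaves off the XML declaration you would normally put at the top of each file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the Sitemap protocol


def build_sitemap(urls):
    """Return a <urlset> document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document pointing at each Sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="unicode")


def split_into_sitemaps(urls, base_url):
    """Chunk URLs into Sitemap files and build one index that lists them."""
    chunks = [
        urls[i:i + MAX_URLS_PER_SITEMAP]
        for i in range(0, len(urls), MAX_URLS_PER_SITEMAP)
    ]
    files = {
        f"sitemap-{n}.xml": build_sitemap(chunk)
        for n, chunk in enumerate(chunks, start=1)
    }
    index = build_sitemap_index([f"{base_url}/{name}" for name in files])
    return files, index


# Hypothetical site with 120,000 pages.
pages = [f"http://www.example.com/page-{n}" for n in range(120_000)]
files, index = split_into_sitemaps(pages, "http://www.example.com")
print(len(files))
```

Splitting by site section instead of by a fixed size works the same way: pass each section's URLs to build_sitemap separately, and you get a separate indexed count for each one.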

Q: Is Google looking for a true solution to deal with duplicate content between UK & US Websites owned by the same company? (François, Brussels) and How will Google identify that a particular Website belongs to a particular location even when it is hosted in the US, and use that data to show that website to users of the country for which it is appropriate? (Cold, Jaipur India)

I can’t speak for what Google is looking to do, but I do know that generally search engines filter duplicate content and show the most relevant version to the searcher. So, in the case of US and UK content, Google would look to show the US version to the US searcher and the UK version to the UK searcher. It wouldn’t generally look to show both versions in a single search result. Google figures out which is more relevant to the searcher using things like the searcher’s geographic location (based on IP address) and whether the searcher is using google.com or google.co.uk.

You can provide signals to Google about the content in a few ways: using the TLD (putting the US content on yoursite.com and the UK content on yoursite.co.uk) or domains hosted in the target country (yoursiteus.com hosted in the US and yoursiteuk.com hosted in the UK); segmenting the content into subdomains or subfolders and then specifying the target country for each in Google Webmaster Tools (us.yoursite.com associated with US and uk.yoursite.com associated with UK); and using the meta language element. (To be honest, I’m not entirely sure about the meta country tag. Anyone have experience with this?)

[Screenshot: geographic target setting in Google Webmaster Tools]

So, back to the tool
In particular, I think this tool could be really handy for Q&A at conferences. I’ve spoken at several conferences (including SMX and Web 2.0 Expo) where attendees could use an online system to submit questions, and I’ve used Twitter for this a few times, but this is the first system I’ve seen that also lets the rest of the audience vote questions up or down. I may have to try it out for my next event.


9 Comments

If the “Indexed URLs” in the sitemap within webmaster tools is more accurate, then why do we always get people in GWHG that show zero (or some unreal number) in that metric when clearly they do have many pages indexed? Is it more accurate, when updated, but slow to update? If that’s the case then accuracy isn’t truly accurate as they sure don’t update those stats too often.

Just wondering… :)

The lack of search is quite disturbing; the success of GMail & Google Reader owes a lot to their powerful search.

That said, I like what Google are obviously trying to do with this tool – to make a friendlier environment for n00b queries than Groups and increase communication with their users. I hope they develop the tool and it takes off.

I kind of agree on the indexability of the pages, but on the other hand a lot of that content is extremely short-lived — it’s used as a basis to get a discussion going and, as with this post, the actual content (answer) uses a different medium (which might get indexed or might be an answer given live).

One thing I’d like to see is a better way of linking to individual pages / questions though. The long anchor is confusing and not all CMS convert it into links properly (so I have to use something like the cool cli.gs service). Maybe that’s a part of the short life-span of a question queue as well?

Anyway, if you want to bring back some feedback to the people behind Google Moderator, check out their set at http://cli.gs/6UG72y .

John Honeck, I’m not sure about that. I haven’t seen the issue you describe. Maybe someone from the team will come by and give us more info.

John Mueller, I see your point, but my guess is that lots of people will use this for things other than short-lived talks with live answers. Looking at that scenario as a product manager, I’d say that a better solution would be to enable an event to set up a series that requires attendees to log in in some way (perhaps provide a conference code). After all, for events with live answers, being able to access the series at all isn’t all that helpful for those who aren’t attending, and in fact, the event organizers may not want those who aren’t attending to access any content from the event.

On the other hand, there are likely lots of other use cases where valuable content is being offered up and could be well-served by Google’s mission of organizing the world’s information and making it universally accessible. ;)

andy, agreed — internal search would be useful as well. Even with the use case described above (live events), attendees may want to search to see if their question was already asked.

That “indexed URLs” number is a tricky one :D . It’s “of the URLs listed in the Sitemap file, how many are indexed exactly as they are listed in the Sitemap file.” The count will generally be lower than the total number of URLs indexed because there are almost always URLs that we discover which aren’t in the Sitemap file and there are some URLs that we discover which might be canonical versions of a Sitemaps URL (for example, you might list /folder/default.asp and we might choose to index /folder/ instead). These things make it hard to use that metric as a way of keeping track of the real number of URLs we have indexed.

It is, however, great at giving you information regarding your Sitemap file: If the count is much lower than what you have indexed, chances are you could improve your Sitemap file. Also, you might choose to split up the URLs on your site and create Sitemap files for logical sections of your site — in this case, you’d get feedback about how well each section is indexed, even if the URLs are not in separate folders.

John

Ah see, that’s exactly why I think it’s a *great* metric to keep track of how many of the URLs you want indexed actually are indexed. As you say, it only works if you list every URL you want indexed. If you want 10 URLs indexed and Google does indeed have 10 URLs indexed, are those the right 10? (OK, maybe 10 is a bad example because you could just look.)

As for indexing different versions of URLs, I’d phrase it a little differently. If someone puts a URL in the Sitemap and Google indexes a different version of that page, then generally Google isn’t indexing the canonical version. It’s indexing the non-canonical version that it *thinks* is canonical. This is something that site owners may want to know. Ideally, all the non-canonical versions 301 redirect to the canonical, which would reduce this problem. Also, my understanding is that the URL listed in the Sitemap is a signal (of course, one of many) in determining what the canonical version is.

In any case, the ideal situation is one in which the Sitemap file contains a complete list and Google indexes those versions and not others. This count should help site owners get to that ideal.

In theory, you’re right, Vanessa :) . However, in practice, I can tell you that most Sitemap files are far from perfect and therefore it’s not quite that easy or useful, at least for the sites with imperfect Sitemap files. If your Sitemap file is perfect (and hey, it’s not impossible, it just takes some thought — thinking about things that you should have thought about a long time ago anyway), then these numbers can help you, as you mentioned.

The other thing John Honeck mentioned is that sometimes the data is off (or just missing). I know the team is working on that, and it can be frustrating (and scary, say when you see “0” as the number of indexed URLs).

Anyway, I’m glad to see Sitemaps and Webmaster Tools haven’t lost their grip on you completely :-) .

JohnMu,

Heh. Yep. I’ve certainly seen my share of imperfect Sitemap files. But I have talked to a lot of people who want a good metric for indexing and are willing to put in the time to make sure that their Sitemaps are accurate, in part to use this measurement.

Hopefully the bugs you mention about inaccurate counts will be resolved soon.

In Google Webmaster Tools, there is usually a small difference between the total number of URLs in the Sitemap and the number of URLs indexed by Google. Only in rare cases are the two the same. Why aren’t all of the URLs in the Sitemap indexed?
