Blueprint - Server Log Analysis
We provide instructions for uploading log files to a private Amazon S3 bucket, which Blueprint then reads from. We recommend setting up a script that uploads log files daily or weekly.
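A minimal sketch of such an upload script, assuming the AWS CLI is installed and configured on the server. The bucket name, key layout, and log path below are hypothetical placeholders, not values Blueprint requires; use the details from our setup instructions.

```python
import datetime
import subprocess

def s3_key_for(log_date):
    """Build a date-based object key so daily uploads never collide.
    This naming scheme is illustrative, not mandated by Blueprint."""
    return f"logs/{log_date:%Y/%m/%d}/access.log.gz"

def upload_log(local_path, bucket, log_date):
    """Upload one compressed log file with the AWS CLI.
    Assumes `aws` is on PATH and credentials are already configured."""
    subprocess.run(
        ["aws", "s3", "cp", local_path, f"s3://{bucket}/{s3_key_for(log_date)}"],
        check=True,  # raise if the upload fails, so cron mails the error
    )

# Example (placeholder paths and bucket name):
# upload_log("/var/log/nginx/access.log.1.gz", "example-blueprint-logs",
#            datetime.date.today() - datetime.timedelta(days=1))
```

Scheduling this script from cron (or any task scheduler) once a day keeps the bucket current without manual steps.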
We recommend providing full log files if possible. Full logs enable us to diagnose indexing issues: for instance, if we find that some URLs are not indexed, we can determine whether they've ever been crawled, and we can report how long it takes for a site to be completely recrawled. If the site is load balanced, having logs from all servers helps us find misconfiguration on a single server that might not otherwise be evident. If we have only sampled logs, Blueprint can still surface substantial information, such as canonicalization issues, server errors, and problems with redirects (for instance, 302s where 301s are expected).
We can’t provide you with our server logs because they contain customer data. Do we have any alternatives?
Blueprint processes search engine bot data only, so you can exclude all customer data before uploading the log files. Our instructions provide details of writing a script that parses out only the data we need before uploading.
Because Blueprint needs only search engine bot data, parsing out everything else should make the files significantly smaller. In addition, you can set your script to upload the logs at any interval you'd like: rather than uploading an entire week's worth at once, you can upload daily or several times a day. You can also compress the files before uploading them.
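The filtering and compression steps above can be sketched as follows. This is a simplified example: the user-agent patterns are common crawler names we've chosen for illustration, and real crawler verification (bots can spoof user agents) is out of scope here.

```python
import gzip
import re

# Illustrative user-agent substrings for major search engine crawlers;
# adjust the list to match the bots covered in the Blueprint setup instructions.
BOT_PATTERN = re.compile(
    r"Googlebot|Bingbot|Slurp|DuckDuckBot|Baiduspider|YandexBot"
)

def filter_bot_lines(lines):
    """Yield only log lines whose user-agent field matches a known crawler,
    dropping all customer traffic before anything leaves your servers."""
    for line in lines:
        if BOT_PATTERN.search(line):
            yield line

def write_filtered_log(src_path, dest_path):
    """Read a raw access log, keep search engine bot traffic only,
    and write the result as a gzip-compressed file ready to upload."""
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            gzip.open(dest_path, "wt", encoding="utf-8") as dest:
        dest.writelines(filter_bot_lines(src))
```

Running `write_filtered_log` on each rotated log before uploading both strips customer data and shrinks the file, addressing the privacy and size concerns at once.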
Blueprint processes server log data each night, so by uploading daily, you’ll always see information from the previous day.
Without server log data, you can still take advantage of a number of Blueprint features. You won't get insights into how search engines are crawling the site, but you'll still get substantial data about how audiences are finding the site, search engine rankings, and how searchers interact with your listings. As we add more technical features that don't rely on server logs, you'll begin to see more data about issues search engines are having with crawling and indexing.