Fast Sitecore Log Analyzer for Sitecore on Azure

Sitecore Log Analyzer is an amazing tool. Whenever you need to wade through your Sitecore log files, instead of opening five or twenty-odd log files in your favorite text editor and performing a "Find in files" operation, you can use it to open multiple log files in one session, limit the analyzed date/time range, categorize entries into errors, warnings and so on (you can even browse the performance counters), and view everything on a helpful timeline.

This all works excellently if you have an on-premises Sitecore installation that you can point the Log Analyzer at so it can grok your logs. Things get harder if your Sitecore solution lives on Azure.

Background

If your Sitecore solution was developed and deployed to Azure with the help of the Azure module, there are no physical log files to point the Analyzer at. Sitecore on Azure is a PaaS solution, which, among other things, implies a 650 MB size limit on your web server VM. The Azure module therefore reconfigures the logging machinery to write through the .NET tracing infrastructure, which means your logs end up in Azure Table Storage, in a table called "WADLogsTable" (alongside other tables -- for example, a separate table stores the IIS failed request traces).
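
For reference, each row in WADLogsTable carries one trace entry in a handful of columns. Below is a minimal sketch of such a row as a TableEntity, using the classic Microsoft.WindowsAzure.Storage SDK; the column names reflect the schema commonly seen in WADLogsTable, so treat them as assumptions and verify them against your own storage account.

```csharp
using Microsoft.WindowsAzure.Storage.Table;

// Minimal mapping of a WADLogsTable row, based on the commonly observed schema.
// Column names are assumptions -- verify them against your own table.
public class WadLogEntity : TableEntity
{
    public long EventTickCount { get; set; }  // time the event was captured, in ticks
    public string DeploymentId { get; set; }
    public string Role { get; set; }
    public string RoleInstance { get; set; }
    public int Level { get; set; }            // severity (numeric; mapping depends on the trace source)
    public int EventId { get; set; }
    public string Message { get; set; }       // the actual log line written by Sitecore
}
```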

Out of the box, the Log Analyzer supports this scenario as well... in a way. You may edit its configuration file and enable its "Azure module" (which from now on I'll call the "Azure plugin", so as not to confuse it with the official Sitecore Azure module). Once you do that, clicking the "Select Location" button prompts you with a chooser where you can either select a file system log store, or provide the credentials for your Azure storage account so that the Analyzer can scan the WADLogsTable table. So no problem, right?

Problem

Well, things aren't so simple. I tried opening the logs for one of our clients' solutions, which has been online since early 2013. Since logs don't get archived when they live in table storage, there was going to be plenty of data to browse through. However, even limiting the date/time range to the last 5 minutes resulted in the Analyzer choking on the sheer amount of data.

So, I used a decompiler to find the root cause, and sure enough, after a short search the reason was evident: if you provide a date/time range, the Azure plugin queries WADLogsTable on its Timestamp column, which is bad. Exceptionally bad.

Firstly, the Timestamp column records when the entry was committed to the storage account, which does not necessarily match the time the event was actually captured (the diagnostics infrastructure is allowed to buffer entries in memory before committing them to storage in batches). The column that does hold the capture time is EventTickCount, expressed in ticks since the epoch. However, that is not the right column to query either, because secondly, and more importantly, neither of these two columns is indexed. Querying either of them therefore results in a full table scan, which can take a very, very long time to complete, especially when the table holds millions of rows. The only two indexed columns on that table are PartitionKey and RowKey.
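
To make the problem concrete, here is roughly what a Timestamp-based date filter looks like with the classic Microsoft.WindowsAzure.Storage SDK. This is a reconstruction based on the behaviour described above, not the decompiled plugin code verbatim, and the connection string is a placeholder.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

class SlowWadLogsQuery
{
    static void Main()
    {
        string connectionString = "UseDevelopmentStorage=true"; // replace with your storage account
        var table = CloudStorageAccount.Parse(connectionString)
            .CreateCloudTableClient()
            .GetTableReference("WADLogsTable");

        DateTimeOffset from = DateTimeOffset.UtcNow.AddMinutes(-5);
        DateTimeOffset to = DateTimeOffset.UtcNow;

        // Filtering on Timestamp (or EventTickCount) cannot use an index, so the storage
        // service walks the entire table and filters rows one by one.
        var slowQuery = new TableQuery<DynamicTableEntity>().Where(
            TableQuery.CombineFilters(
                TableQuery.GenerateFilterConditionForDate("Timestamp", QueryComparisons.GreaterThanOrEqual, from),
                TableOperators.And,
                TableQuery.GenerateFilterConditionForDate("Timestamp", QueryComparisons.LessThan, to)));

        // With millions of rows, enumerating this takes a very long time.
        foreach (var entity in table.ExecuteQuery(slowQuery))
        {
            Console.WriteLine(entity.Properties["Message"].StringValue);
        }
    }
}
```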

Solution

So how does this help us? Well, the PartitionKey column actually represents the number of ticks since the epoch, normalized to the minute. So we can filter by date on the PartitionKey column, which is indexed, and the query performs nicely (see the sketch below). Furthermore, since the good people who authored the Log Analyzer have taken a modular approach, there's nothing stopping us from creating our own module to replace the default one (apart from the fact that there is no released source code -- nothing a good decompiler can't handle).
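
Here is a minimal sketch of the indexed approach, again using the classic Microsoft.WindowsAzure.Storage SDK. The PartitionKey format used here (the UTC tick count rounded down to the minute, zero-padded to 19 digits) is the one commonly observed in WADLogsTable, so double-check it against your own data; this illustrates the idea behind the replacement plugin rather than its exact code.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

class FastWadLogsQuery
{
    // Assumed PartitionKey format: UTC ticks rounded down to the minute, zero-padded
    // to 19 digits (the "0"-prefixed keys commonly seen in WADLogsTable).
    static string ToPartitionKey(DateTime utc)
    {
        long ticks = utc.Ticks - (utc.Ticks % TimeSpan.TicksPerMinute);
        return ticks.ToString("D19");
    }

    static void Main()
    {
        string connectionString = "UseDevelopmentStorage=true"; // replace with your storage account
        var table = CloudStorageAccount.Parse(connectionString)
            .CreateCloudTableClient()
            .GetTableReference("WADLogsTable");

        DateTime from = DateTime.UtcNow.AddMinutes(-5);
        DateTime to = DateTime.UtcNow;

        // Range filter on PartitionKey: the storage service can seek straight to the
        // requested minute range instead of scanning the whole table.
        var fastQuery = new TableQuery<DynamicTableEntity>().Where(
            TableQuery.CombineFilters(
                TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.GreaterThanOrEqual, ToPartitionKey(from)),
                TableOperators.And,
                TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, ToPartitionKey(to))));

        foreach (var entity in table.ExecuteQuery(fastQuery))
        {
            long eventTicks = entity.Properties["EventTickCount"].Int64Value ?? 0;
            Console.WriteLine("{0:u} {1}",
                new DateTime(eventTicks, DateTimeKind.Utc),
                entity.Properties["Message"].StringValue);
        }
    }
}
```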

And that's exactly what I decided to do. You may visit the project's GitHub page here, or you may download the compiled plugin from here. The plugin behaves exactly like the default one, except it actually manages to retrieve your logs :)

Have fun and, as always, happy coding!