Solr to the rescue!

As described in part 1, storing logging data in the application's own data storage turned out to be a very bad choice. We needed to keep the log data in a separate store and still be able to query it. Enter Solr, the indexing engine. It can handle millions of documents and very frequent additions, it scales well, and it answers queries quickly.

Initial Configuration

We returned to the original idea of keeping all four types of logging in the same log, so a single Solr core stores all the logging data. Our document definition thus became as follows:

   <field name="_version_" type="long" indexed="true" stored="true"/>
   <field name="_root_" type="string" indexed="true" stored="false"/>
   <field name="LogId" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
   <field name="LogDate" type="date" indexed="true" stored="true"/>
   <field name="Position" type="string" indexed="true" stored="true" />
   <field name="Message" type="text_xml" indexed="true" stored="true" />
   <field name="Extras" type="text_xml" indexed="true" stored="true" /> 
   <field name="Duration" type="long" indexed="false" stored="true" />
   <field name="Channel" type="string" indexed="true" stored="true" /> 
   <field name="LogType" type="string" indexed="true" stored="true" /> 

The above is pretty straightforward, except for the "text_xml" field type. This is a type we created, defined as follows:

    <fieldType name="text_xml" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="(&lt;(/?))|((/?)&gt;)|(\s+)|=|&quot;"/>
      </analyzer>
    </fieldType>

The reason behind the above is that the client wanted to be able to search the logs based on data in the content (i.e. data in "Message" and "Extras"). Since both of these usually contain XML data, we thought that a valid course of action would be to tokenize the XML on each and every element name, attribute name, attribute value and element value. That is what the "pattern" does: it splits the text on tag delimiters (<, </, > and />), whitespace, equals signs and double quotes, and whatever lies between them becomes a token.
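To get a feel for what this pattern produces, here is a minimal Python sketch that applies the same regular expression (with the XML entities decoded) and keeps the non-empty text between matches. It only approximates what the Solr tokenizer does in split mode: Python's `re` module and Java's regex engine agree on this particular pattern, but the sample input and the `tokenize_xml` helper are made up for illustration.

```python
import re

# The pattern from the text_xml field type, with the XML entities
# (&lt; &gt; &quot;) decoded back to <, > and ".
SEPARATORS = re.compile(r'(<(/?))|((/?)>)|(\s+)|=|"')


def tokenize_xml(text):
    """Return the non-empty pieces of text that lie between separator matches."""
    tokens = []
    index = 0
    for match in SEPARATORS.finditer(text):
        # Keep whatever sits between the previous separator and this one.
        if match.start() > index:
            tokens.append(text[index:match.start()])
        index = match.end()
    # Keep any trailing text after the last separator.
    if index < len(text):
        tokens.append(text[index:])
    return tokens


# Hypothetical log content, not taken from the real logs.
sample = '<order id="42"><item>book</item></order>'
print(tokenize_xml(sample))
# ['order', 'id', '42', 'item', 'book', 'item', 'order']
```

Element names, attribute names, attribute values and element values all end up as separate tokens, so a query on any of them can match the log entry.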

There are also some extra fields in the document, namely "Channel" (a string identifying the source of the request) and "LogType" (with four possible values, ERR, DEBUG, INFO and TRANS, for the four types of logs).
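As an illustration of a document that fits this definition, the following Python sketch builds one log entry as a dictionary and serializes it the way Solr's JSON update handler expects (a JSON array of documents). The `build_log_document` helper, the UUID used for LogId and the sample values are assumptions made for the example; the post does not say how the real LogId is generated or what unit Duration uses.

```python
import json
import uuid
from datetime import datetime, timezone

# The four log types named in the post.
LOG_TYPES = {"ERR", "DEBUG", "INFO", "TRANS"}


def build_log_document(log_type, channel, position, message,
                       extras="", duration=0, when=None):
    """Build one log entry using the field names from the schema."""
    if log_type not in LOG_TYPES:
        raise ValueError("unknown LogType: %r" % (log_type,))
    when = when or datetime.now(timezone.utc)
    return {
        "LogId": str(uuid.uuid4()),  # assumption: any unique string will do
        # Solr date fields expect ISO 8601 in UTC with a trailing Z.
        "LogDate": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "Position": position,
        "Message": message,
        "Extras": extras,
        "Duration": duration,  # unit is not stated in the post
        "Channel": channel,
        "LogType": log_type,
    }


doc = build_log_document(
    "INFO", "web", "OrderService.Submit", '<order id="42"/>',
    when=datetime(2015, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
# Request body for a POST to the core's /update handler.
print(json.dumps([doc], indent=2))
```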

Additional Configuration

Our first attempt showed that we needed to tweak the Solr configuration, as the web services became about 100% slower in response. In actual seconds, the services responded in about 1.5-2 seconds before the switch to Solr, and in about 2.5-3 seconds after it.

We traced the delay to the fact that we were committing each and every log entry as soon as it was added to the Solr core. Each commit triggered a new Searcher, which is a very expensive operation. So we removed the explicit Commit call and instead enabled AutoCommit (to automatically add all pending documents to the main index) every 15 seconds and AutoSoftCommit (to trigger a new Searcher) every 60 seconds, so that all the committed documents would appear in searches. Both are set in the updateHandler section of our core's solrconfig.xml, as autoCommit with a maxTime of 15000 ms and autoSoftCommit with a maxTime of 60000 ms.
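A sketch of how these two settings might look in solrconfig.xml, reconstructed from the intervals described here rather than copied from the original file. The `openSearcher` value is an assumption (the text implies that only the soft commit opens a new Searcher), and the fragment is wrapped in a few lines of Python only so that it can be parsed and sanity-checked.

```python
import xml.etree.ElementTree as ET

# Reconstruction from the 15 s / 60 s intervals in the text, NOT the original file.
# openSearcher=false is an assumption: it keeps the hard commit from opening
# a new Searcher, leaving that to the soft commit.
UPDATE_HANDLER_XML = """
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
"""


def commit_intervals_seconds(xml_text):
    """Return (hard commit interval, soft commit interval) in seconds."""
    root = ET.fromstring(xml_text.strip())
    hard_ms = int(root.findtext("autoCommit/maxTime"))
    soft_ms = int(root.findtext("autoSoftCommit/maxTime"))
    return hard_ms / 1000, soft_ms / 1000


print(commit_intervals_seconds(UPDATE_HANDLER_XML))
# (15.0, 60.0)
```

With settings like these, a newly added log entry can take up to 60 seconds to show up in search results, which is the price paid for not opening a Searcher on every write.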


With this change, the response times fell to about 1-1.5 seconds, making the web services about 16% faster on average.

Displaying the data

So, we've taken the load of logging off the main storage, thus regaining maintainability and also separating the main data from the log data. The next and final goal was to make the data easy to search. We immediately saw that the Solr web interface was not up to the job: to read a single log entry, where both the Message and the Extras fields were in XML format, we'd have to copy the data from the results panel, paste it into Notepad and then turn on word wrap. Not convenient. So we created a viewer, a WinForms application for viewing and querying this log. It offers a reduced set of options compared to the Solr web interface, but it covers what is needed most of the time. When first launched, it shows the following:

Solr Viewer initial view

You can change the URL of the core you wish to query (helpful for switching between development, staging and production instances). There are also fields for the query, the returned field list, the sorting and the paging.

If an error occurs, then a helpful message shows up at the top right:

Solr viewer error display

The "View raw query" option lets you see what is actually passed to Solr. When clicked, it opens a modal window with the actual query, which can be selected, copied to the clipboard and pasted into a browser's address bar:

raw query viewer
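For readers who want to reproduce such a raw query outside the viewer, here is a rough Python sketch that assembles a Solr select URL from the same inputs the viewer exposes (core URL, query, field list, sorting and paging). The `build_select_url` helper, its default values and the core URL are assumptions for illustration; the query parameters themselves (`q`, `fl`, `sort`, `start`, `rows`, `wt`) are standard Solr ones.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(core_url, query="*:*", fields="*",
                     sort="LogDate desc", start=0, rows=20):
    """Assemble a select URL for a Solr core from the viewer-style inputs."""
    params = {
        "q": query,      # the query field
        "fl": fields,    # the returned field list
        "sort": sort,    # the sorting
        "start": start,  # paging: offset of the first result
        "rows": rows,    # paging: page size
        "wt": "json",
    }
    return core_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical core URL; replace with the development/staging/production one.
url = build_select_url(
    "http://localhost:8983/solr/logs",
    query="LogType:ERR AND Channel:web",
    fields="LogId,LogDate,Position,Duration",
)
print(url)
# Decode it again to check that nothing was lost in the encoding.
print(parse_qs(urlsplit(url).query)["q"])
```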

When all goes well, the right side fills up with the results:

Solr viewer main view with results

The "Message" and "Extras" are viewable in a separate, modal window:

Message/Extras viewer

The "raw query" also appears, and we have an extra option, to "commit pending data". Clicking on it triggers an immediate commit, thus making the logs real time:

Manual data commit
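The same kind of on-demand commit can be requested from a script. The sketch below is not the viewer's code: it assumes a core reachable at a hypothetical URL and uses Solr's standard `commit=true` parameter on the update handler.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_commit_url(core_url):
    """URL that asks the core's update handler for an immediate commit."""
    return core_url.rstrip("/") + "/update?" + urlencode({"commit": "true"})


def commit_pending(core_url, timeout=10):
    """Send the commit request and return the HTTP status code."""
    with urlopen(build_commit_url(core_url), timeout=timeout) as response:
        return response.status


# Hypothetical core URL; only the URL is built here, nothing is sent.
print(build_commit_url("http://localhost:8983/solr/logs"))
# http://localhost:8983/solr/logs/update?commit=true
```

Calling `commit_pending` against a live core makes pending log entries searchable right away, at the cost of opening a new Searcher, so it is best kept for the occasional manual refresh.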

The actual viewer is attached to this post.