Getting Started With Tstats & Accelerated Data Models – Part 3

Once your data model has been accelerated, it's time to start writing SPL queries that take advantage of the newly accelerated data. There are many ways to do this, but we'll assume you don't yet know how to write a proper tstats query and take the path of least resistance.

SPL & Stuff

It should be noted that I've accelerated the past year's worth of data in my data model. First, let's write a query without tstats so we have a baseline to compare against. I'll be using this query posted on GoSplunk. The query searches over the previous month's Apache web logs and returns results in 2.689 seconds:
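The GoSplunk query itself isn't reproduced here, but a traditional search of that general shape (counting distinct client IPs in 5-minute buckets) might look something like the following. The index and sourcetype names are assumptions for illustration, so adjust them to your environment:

index=apache sourcetype=access_combined | timechart span=5m dc(clientip) AS visitors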

Pivot, Pivot Good

Not bad, but I have a relatively small set of data. Let's see what a query looks like when it takes advantage of the data model without acceleration. There are a few ways to do this, but if you haven't mastered writing tstats queries yet, the easiest method I've found is to take advantage of pivot! To get started, go to "Settings" and then select "Data Models".

Once in the Data Models screen, choose "Pivot".

Expand your data model and choose a field of interest. For this use-case I'm going to continue working with the incredibly simple data model we created and select clientip.

For this example, select "Top Values by Time".

We'll want to match the time range we used in our initial query, so select "Previous Month".

Great, we've got results! But we just want to grab the query that pivot has generated for us. To do this, open the Job Inspector, scroll down to the section labeled "search", copy the query, and open a normal search within Splunk.


Change the time range to match our earlier example (previous month in this case) and run the pasted query. You should get results identical to what was displayed in the pivot search. The search isn't exactly what we want, though, so let's use what pivot gave us as a starting point and modify it where needed.
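The exact search pivot generates depends on your data model and the report you built, but for a "Top Values by Time" report on clientip, the copied query will be roughly of this shape (an illustrative sketch against our Apache_Data model, not Splunk's verbatim output):

| tstats prestats=true count from datamodel=Apache_Data where nodename=web_traffic by _time span=1d, web_traffic.clientip | timechart useother=true count by web_traffic.clientip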

Pivot helped us get started, and it shows how a tstats query works, but for this case it's not a good fit. In fact, we can strip out most of the pivot query and still accurately produce our timechart use-case. We'll end up with the following query: | tstats dc(web_traffic.clientip) as visitors from datamodel=Apache_Data BY _time span=5m
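As an aside, tstats also accepts a where clause between from and by if you need to scope the results. For example, to count visitors for successful requests only (assuming the data model carries the Apache status field; that field name is an assumption here):

| tstats dc(web_traffic.clientip) as visitors from datamodel=Apache_Data where web_traffic.status=200 BY _time span=5m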

Open up the Job Inspector and see how much faster tstats against the data model is. In our use-case the search took only 0.318 seconds, compared to the 2.689 seconds of the earlier traditional search.

Yes, it gets faster

Earlier we enabled acceleration; let's take advantage of that by simply adding "summariesonly=t". Our query now looks like this: | tstats summariesonly=t dc(web_traffic.clientip) as visitors from datamodel=Apache_Data BY _time span=5m

If we open the Job Inspector, we see that this search took 0.139 seconds to complete.
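One caveat worth knowing: summariesonly=t restricts tstats to the accelerated summaries, so any events that haven't been summarized yet are silently excluded. If you're unsure whether your summaries cover the full time range, a quick sanity check (just a sketch using the same model) is to run a simple count with and without the restriction and compare:

| tstats summariesonly=t count from datamodel=Apache_Data

| tstats summariesonly=f count from datamodel=Apache_Data

If the two counts differ over the same time range, part of that range hasn't been summarized yet.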

Survey says

Using DMA we've reduced the overall search time from 2.689 seconds down to 0.139, a decrease by a factor of nearly 20! As stated in Part 1, if your environment has fairly static dashboards that users are constantly accessing, DMA will serve you very well. I will re-emphasize: verify that disk space is a non-issue before implementing DMA over large datasets or long time ranges. Note that in addition to the increased storage, you will essentially have a scheduled search running every 5 minutes if you use the default DMA build settings. This reduces the number of concurrent searches available to other users and scheduled searches, so be mindful of the number of data models you build and accelerate!
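For reference, the acceleration backfill window and that 5-minute build schedule live in datamodels.conf. A minimal sketch for the model used above might look like this (the cron shown is Splunk's default; adjust earliest_time to however far back you want to accelerate):

[Apache_Data]
acceleration = true
acceleration.earliest_time = -1y
acceleration.cron_schedule = */5 * * * *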
