fighting the lack of good ideas

a poor user’s guide to accelerating data models in splunk

Data Models are one of the major underpinnings of Splunk’s power and flexibility.

They’re the only way to benefit from the powerful pivot command, for example.

They underlie Splunk Enterprise Security (probably the biggest “non-core” use of Splunk amongst all their customers).

Key to achieving peak performance from Splunk Data Models, though, is that they be “accelerated“.

Unfortunately (or, fortunately, if you’re administering the environment, and your users are mostly casually-experienced with Splunk), the ability to accelerate a Data Model is controlled by the extensive RBACs available in Splunk.

So what is a poor user to do if they want their Data Model to be faster (or even “complete”) when using it to power pivot tables, visualizations, etc?

This is something I’ve run into with customers who don’t want to give me higher-level permissions in their environment.

And it’s something you’re likely to run into – if you’re not a “privileged user”.

Let’s say you have a Data Model that’s looking at firewall logs (cisco ios syslog). Say you want to look at these logs going back over days or weeks, and display results in a pivot table.

If you’re in an environment like I was working in recently, where looking at even 100 hours (slightly over 4 days) worth of these events can take 6 or 8 or even 10 minutes to plow through before your pivot can start working (and, therefore, before the dashboard you’re trying to review is fully-loaded).


One more thing.

That search that’s powering your Data Model? Sometimes (for unknown reasons (that I don’t have the time to fully ferret-out)), it will fail to return “complete” results (vs running it in Search).

So what is a poor user to do?

Here’s what I’ve done a few times.

I schedule the search to run every X often (maybe every 4 or 12 hours) via a scheduled Report.

And I have the search do an outputlookup to a CSV file.

Then in my Data Model, instead of running the “raw search”, I’ll do the following:

| inputlookup <name-of-generated-csv>

That’s it.

That’s my secret.

When your permissions won’t let you do “what you want” … pretend you’re Life in Ian Malcom‘s mind – find a way!

splunk: match a field’s value in another field

Had a Splunk use-case present itself today on needing to determine if the value of a field was found in another – specifically, it’s about deciding if a lookup table’s category name for a network endpoint is “the same” as the dest_category assigned by a Forescout CounterACT appliance.

We have “customer validated” (and we all know how reliable that kind of data can be… (the customer is always wrong) names for network endpoints.

These should be “identical” to the dest_category field assigned by CounterACT … but, as we all know, “should” is a funny word.

What I tried (that does not work) was to get like() to work:

| eval similar=if(like(A,'%B%') OR like(B,'%A%'), "yes", "no")

I tried a slew of variations around the theme of trying to get the value of the field to be in the match portion of the like().

What I ended-up doing (that does work) is this:

| eval similar=if((match(A,B) OR match(B,A)), "yes", "no")

That uses the value of the second field listed to be the regular expression clause of the match() function.

Things you should do ahead of time:

  • match case between the fields (I did upper() .. lower() would work as well)
  • remove “unnecessary” characters – in my case, I yoinked all non-word characters with this replace() eval: | eval A=upper(replace(A,"\W",""))
  • know that there are limitations to this comparison method
    • “BOB” will ‘similar’ match to “BO”, but not “B OB” (hence removing non-word characters before the match())
    • “BOB” is not ‘similar’ to “ROB” – even though, in the vernacular, both might be an acceptible shortening of “ROBERT”
  • if you need more complex ‘similar’ matching, checkout the JellyFisher add-on on Splunkbase

Thanks, also, to @trex and @The_Tick on the Splunk Usergroups Slack #search-help channel for working me towards a solution (even though what they suggested was not the direction I ended up going).

how-to timechart [possibly] better than timechart in splunk

I recently had cause to do an extensive trellised timechart for a dashboard at $CUSTOMER in Splunk.

They have a couple hundred locations reporting networked devices.

I needed to report on how many devices they’ve reported every day over the last 90 days (I would have liked to go back further…but retention is only 90 days on this data).

My initial instinct was to do this:

index=ndx sourcetype=srctp site=* ip=* earliest=-90d
| timechart limit=0 span=1d dc(ip) by site

Except…that takes well over an hour to run – so the job gets terminated at ~60 minutes.

What possible other approaches could be made?



Here are a few that I thought about:

  1. Use multisearch, and group 9 10d searches together.
    • I’ve done things like this before with good success. But it’s … ugly. Very, very ugly.
    • You can almost always accomplish what you want via stats, too – but it can be tricky.
  2. Pre-populate a lookup table with older data (a la option 1 above, but done “by hand”), and then just append “more recent” data onto the table in the future.
    • This would give the advantage of getting a longer history going forward
    • Ensuring “cleanliness” of the table would require some maintenance scheduled searches/reports … but it’s doable
  3. Something else … that “happens” to work like a timechart – but runs in an acceptable time frame.
  4. Try binning _time
    1. Tried – didn’t work 🤨

So what did I do?

I asked for ideas.

If you’re regularly (or irregularly) using Splunk, you should join the Splunk Usergroups Slack.

Go join it now, if you’re not on it already.

Don’t worry – this blog post will be here when you get back.

You’ve joined? Good good. Look me up – I’m @Warren Myers. And I love to help when I can 🤠.

I asked in #search-help.

And within a couple minutes, had some ideas from somebody to use the “hidden field” date_day and do a | stats dc(ip) by date_day site. Unfortunately, this data source is JSON that comes-in via the HEC.


Lo and behold!

I can “fake” date_day by using strftime!

Specifically, here’s the eval command:

| eval date=strftime(_time,"%Y-%m-%d")

This converts from the hidden _time field (in Unix epoch format) to yyyy-mm-dd.

This is the 🔑!

What does this line do? It lets me stats-out by day and site (just like timechart does … but it runs way faster (Why? I Don’t Know. He’s on third. And I Don’t Give a Darn! (Oh! That’s our shortstop!)).

How much faster?

At least twice as fast! It takes ~2200 seconds to complete, but given that the timechart form was being nuked at 3600 seconds, and it was only about 70% done … this is better!

The final form for the search:

index=ndx sourcetype=srctp site=* ip=* earliest=-90d@ latest=-1d@
| table site ip _time
| eval date=strftime(_time,"%Y-%m-%d")
| stats dc(ip) as inventory by date site

I’ve got this in a daily-scheduled Report that I then draw-into Dashboard(s) as needed (no point in running more often, since it’s summary data that only “changes” (at most) once a day).

Hope this helps somebody – please leave a comment if it helps you!

finally starting to get some good docs amassed

I had a decent library of documentation, templates, hand-offs, slide decks, etc in my pre-Splunk consulting life (technically, I still have them).

It’s nice to be finally getting a decent collection to draw from for my customers in my post-automation consulting life.

you can’t disaggregate

Had a customer recently ask about to disaggregate a Splunk search that had aggregated fields because they export to CSV horribly.

Here’s the thing.

You can’t disaggregate aggregated fields.

And there’s a Good Reason™, too: aggregation, by definition, is a one-way street.

You can’t un-average something.

Average is an aggregation function.

So why would you think you could disaggregate any other Splunk aggregation operation (like values or list)?

You can’t.

And you shouldn’t be able to (as nice as the theoretical use case for it might be).

So what is a body to do when you have a use case for a clean-to-export report that looks as if it had been aggregated, but every field in each row cleanly plunks-out to a single comma-separated value?

Here’s what I did:

{parent search}
| join {some field that'll exist in the subsearch}
[ search {parent search}
 | stats {some stats functions here} ]
| fields - {whatever you don't want}
| sort - {fieldname}

What does that end up doing?

The subsearch is identical to the outer search, plus whatever filtering/where/|stats you might want/need to do.

Using the resultant, filtered set, join on a field you know will be unique [enough].

Then sort however you’d like, and remove whatever fields you don’t want in the final display.

Of course, be sure your subsearch will complete in under 60 seconds and/or return fewer than 10,000 lines (unless you’ve modified your Splunk limits.conf)

stats values vs stats list in splunk

Splunk’s | stats functions are incredibly useful and powerful.

There are two, list and values that look identical…at first blush.

But they are subtly different. Here’s how they’re not the same.

values is an aggregating, uniquifying function.

list is an aggregating, not uniquifying function.

“Whahhuh?!” I hear you ask.

Here’s a prime example – say you’re aggregating on the field IP_addr all user values.

Your search might contain the following chunk: | stats values(user) as user by IP_addr. So for each unique IP address, you will collate a uniquified list of users. Maybe you have the following two IP addresses: & And you have the following user-IP address pairings: kingpin11, fergus97, gerfluggle, kingping11, jbobgorry

values will aggregate all of the following users associated with IP addresses: & gerfluggle, jbobgorry, kingping11; & fergus97.

That’s nice – it’s pretty.

But it exports in lousy form if you need to further process the data in another tool (eg Microsoft Excel).

When Splunk exports those results in a CSV, instead of getting a nice, processable file, you get tabs separating what would otherwise be individual items that have all been grouped into one field.

Enter list.

list doesn’t uniquify the values given to it, so while you still only get one line per IP address (since that was our by clause in the snippet above), you get as many IP addresses listed as there are users (in this example).

This makes for an exportable, more processable set of results that a tool like Excel can ingest to perform further analysis with relatively little reformatting needed.

Come back tomorrow for how to get the export to work “out of the box”.

don’t use symlinks unless you *know* you can

I first ran into this on Solaris in the context of [then] Opsware SAS (then HP SA, now owned by Microfocus). Bind mounts might be OK … so unless the tarball has symlinks included, don’t use them – they get traversed differently than “real” directories.

In short, when directory traversals are done, sometimes it looks at the permissions bits and if the first character is not a d (for a symlink, it’s always an l), many processes can fail.

Symlinking files is [possibly] a different story: though permissions are usually wonky on symlinks (most often lrwxrwxrwx vs -rw-r--r--, for example), since you cannot traverse into a file (whereas you can into a directory), it’s generally ok

Also – sometimes when directory listings are pulled, the symlink is fully-dereferenced, and something that appears to be in, say, $SPLUNK_HOME/etc/deployment_apps but is really in, say, /some/other/place, there are some times when Splunk will decide not to deploy it, because it’s not where it “belongs”.

Also – checksums can be computed on the symlink and not the actual file, in some (perhaps all) instances: so if, for example, you have the same outputs.conf in several apps by way of symlink, and you change it in one, the checksum for all the others may (and typically do) not get updated … so you can be left in an inconsistent state for your configs (because not all locations that should’ve received the updated outputs.conf have received it, since they’re symlinks and not a real file, and the checksum may not update on those particular apps).

Moral of the story?

Unless you really know what you’re doing, never use symlinks with Splunk.