It’s fairly common to want to table the data you’ve found in a search in Splunk – heck, if you’re not prettying the data up somewhy, why are you bothering with the tool?
But I digress.
There are two (at least) ways of making a table – you can use the |table <field(s)>
syntax, or you can use |stats <some function> <field(s)>
approach.
Interestingly, in my testing in both test and production environments, using the |stats...
approach is consistently 10-15% faster than the |table...
option.
Why? I don’t know. He’s on third. And I don’t give a darn!
This is another case of technical intricacies mattering … but I don’t know what is going on under the hood that makes the apparently-more-complex option run faster than the apparently-simler option.
Maybe someday someone in Splunk engineering will be able to enlighten me to that.
This reminds me a bit of an optimization I was able to help a friend with upwards of 12 years ago – they had queries running in MySQL that were taking forever to complete (and by “forever”, I mean they were running sometimes 4-5 times a long as the interval between running them (they ran every 5 mintues, but could take 20+ minutes to finish!)).
What I found, at least back in the dark days of MySQL 3.x was that using IN(...)
was loads faster than using OR
statements.
So a query that had a clause WHERE name IN("bob","sarah","mike","terry","sue") would run anywhere from 20-90% quicker than the logically-equivalent
WHERE name="bob" OR name="sarah" OR name="mike" OR name="terry" OR name="sue"
(given a large enough dataset overwhich it was running … on small [enough] tables (say up to a couplefew thousand records), the OR
version would run equally, or occasionally faster).
In their case, by switching to the IN(...)
form, queries went from taking 20+ minutes to finishing in ~20 seconds
!
Bonus tidbit:
It is well-known in Splunkland that using dedup
is an “expensive” operation. Want a clever way around it (that is much faster)? Instead of doing something like index=myndx | fields ip host | dedup host
, run index=myndx | fields ip host | stats count by host | fields -count
. The |stats .. |fields -count
trick seems to run anywhere from 15-30% faster than dedup
.