But I digress.
There are two (at least) ways of making a table – you can use the
|table <field(s)> syntax, or you can use
|stats <some function> <field(s)> approach.
Interestingly, in my testing in both test and production environments, using the
|stats... approach is consistently 10-15% faster than the
Why? I don’t know. He’s on third. And I don’t give a darn!
This is another case of technical intricacies mattering … but I don’t know what is going on under the hood that makes the apparently-more-complex option run faster than the apparently-simler option.
Maybe someday someone in Splunk engineering will be able to enlighten me to that.
This reminds me a bit of an optimization I was able to help a friend with upwards of 12 years ago – they had queries running in MySQL that were taking forever to complete (and by “forever”, I mean they were running sometimes 4-5 times a long as the interval between running them (they ran every 5 mintues, but could take 20+ minutes to finish!)).
What I found, at least back in the dark days of MySQL 3.x was that using
IN(...) was loads faster than using
So a query that had a clause
WHERE name IN("bob","sarah","mike","terry","sue") would run anywhere from 20-90% quicker than the logically-equivalent
WHERE name="bob" OR name="sarah" OR name="mike" OR name="terry" OR name="sue" (given a large enough dataset overwhich it was running … on small [enough] tables (say up to a couplefew thousand records), the
OR version would run equally, or occasionally faster).
In their case, by switching to the
IN(...) form, queries went from taking 20+ minutes to finishing in ~20
It is well-known in Splunkland that using
dedup is an “expensive” operation. Want a clever way around it (that is much faster)? Instead of doing something like
index=myndx | fields ip host | dedup host, run
index=myndx | fields ip host | stats count by host | fields -count. The
|stats .. |fields -count trick seems to run anywhere from 15-30% faster than
Also published on Medium.