antipaucity

fighting the lack of good ideas

4 places to check your website’s ssl/tls security settings

Qualys – https://www.ssllabs.com/ssltest

High-Tech Bridge – https://www.htbridge.com/ssl

Comodo – https://sslanalyzer.comodoca.com

SSL Checker – https://www.sslchecker.com/sslchecker

hey, virtualbox – don’t be retarded

Ran across this error recently in an Ubuntu guest on my VirtualBox install: VBoxClient: (seamless): failed to start, Stage: Setting guest IRQ filter mask Error: VERR_INTERNAL_ERROR

Gee, isn’t that a useful message.

Fortunately, there was a forums.virtualbox thread on just this error.

The upshot is that this error is actually caused because of a failure during the initial install of the VirtualBox Guest Additions.

In the middle of what looks like, at quick glance, a successful GA installation, is this nugget: Please install the gcc make perl packages from your distribution.

The GA installer can’t compile kernel modules without a compiler.

And that makes sense.

What doesn’t make sense is that this error is even possible to get! The GA installer must run as root (or via sudo).

If those package are missing, the installer should stop what it’s doing, ask the user if they want to install these packages (because without them the GA installer won’t install everything), and then when the user invariably answers “yes” (because – duh! – why wouldn’t they want this to work?), go run an apt -y install gcc make perl.

But is that what Oracle in their infinite wisdom decide to do?

No. They decided it’s better to just quietly report in the middle of a bunch of success statements that “oh, by the way – couldn’t actually do what you wanted, but if you don’t notice, you’re going to spend hours on Google trying to figure it out”.

Morons.

It realy isn’t that hard to make human-friendly error messages … nor to even try to pre-solve the error condition you found!

more thoughts on `|stats` vs `|dedup` in splunk

Yesterday I wrote-up a neat little find in Splunk wherein running stats count by ... is substantially faster than running dedup ....

After some further reflection over dinner, I figured out the major portion of why this is – and I feel a little dumb for not having thought of it before. (A coworker added some more context, but it’s a smaller reason of why one is faster then the other.)

The major reason stats count by... is faster than dedup ... is that stats can hand-off the counting process to something else (though, even if it doesn’t, incrementing a hashtable entry by 1 every time you encounter an instance isn’t terribly computationally complex) and keep going.

In contrast, dedup must compare every individual returned event’s field that matches what you’re trying to dedup to it’s growing list of unique entries for that field.

In the particular case I was seeing yesterday, that means that every single event in the list of 4,000,000 events returned by the search has to be compared one at a time to a list (that I know is going to top out at about 11,000). To use Big-O Notation, this is an O(n*m) operation (bordering on O(n2))!

That initial list of length m fills pretty quickly (it is, after all, only going to get to ~11,000 total entries (in this case)), but as it grows to its max, it gets progressively harder and hard to check whether or not the next event has already been dedup’d.

At ~750,000 events returned (roughly 1/5 my total), the list is unique field values was 98% complete – yet there were still ~3.2 million events left to go (to find just 2% more unique field values).

Those last 3.2 million events each need to check against the list of >10,500 entries – which means, roughly, 16,8 billion comparisons still need to be made!

(Because linear searching finds what it’s looking for on average by the time it has traversed half the list. If the list is being created in a slighly more efficient manner (say a heap or [balanced] binary search tree), it will still take ~43 million comparisons (3.2 million * log2(11,000)).)

Compare this to the relative complexity of using |stats count by ... – it still has to run through all 4 million events, but all it is doing is adding one to the list for every value that shows up in that particular field – IOW, it “only” has to do a total of 4 million [simple] things (because it does need to look at every event returned). dedup at a minimum is going to do ~54 million comparison (and probably a lot more – given it doesn’t merely take 13x the time to run, but closer to 25x).

The secondary contributing factor – important, but not as much a factor as what I covered above – is that dedup must process the whole event, whereas stats chucks everything that isn’t part of what it’s counting (so if an event is 1kb in size, dedup has to return the whole kb, while stats is only looking at maybe 1/10 the total (if you include a coupld extra fields)).

Another neat aspect of using |stats is that it creates a table for you – if you’re running |dedup, you then have to |table ... to get the fields you want displayed how you want.

And adding |table adds to the run time.

So there you have it – turns out those CompSci 201 classes do come in handy 18 years later 🤓

splunk oddity #17681 – stats vs table

It’s fairly common to want to table the data you’ve found in a search in Splunk – heck, if you’re not prettying the data up somewhy, why are you bothering with the tool?

But I digress.

There are two (at least) ways of making a table – you can use the |table <field(s)> syntax, or you can use |stats <some function> <field(s)> approach.

Interestingly, in my testing in both test and production environments, using the |stats... approach is consistently 10-15% faster than the |table... option.

Why? I don’t know. He’s on third. And I don’t give a darn!

This is another case of technical intricacies mattering … but I don’t know what is going on under the hood that makes the apparently-more-complex option run faster than the apparently-simler option.

Maybe someday someone in Splunk engineering will be able to enlighten me to that.

This reminds me a bit of an optimization I was able to help a friend with upwards of 12 years ago – they had queries running in MySQL that were taking forever to complete (and by “forever”, I mean they were running sometimes 4-5 times a long as the interval between running them (they ran every 5 mintues, but could take 20+ minutes to finish!)).

What I found, at least back in the dark days of MySQL 3.x was that using IN(...) was loads faster than using OR statements.

So a query that had a clause WHERE name IN("bob","sarah","mike","terry","sue") would run anywhere from 20-90% quicker than the logically-equivalent WHERE name="bob" OR name="sarah" OR name="mike" OR name="terry" OR name="sue" (given a large enough dataset overwhich it was running … on small [enough] tables (say up to a couplefew thousand records), the OR version would run equally, or occasionally faster).

In their case, by switching to the IN(...) form, queries went from taking 20+ minutes to finishing in ~20 seconds!

Bonus tidbit:

It is well-known in Splunkland that using dedup is an “expensive” operation. Want a clever way around it (that is much faster)? Instead of doing something like index=myndx | fields ip host | dedup host, run index=myndx | fields ip host | stats count by host | fields -count. The |stats .. |fields -count trick seems to run anywhere from 15-30% faster than dedup.

a lot of travel

Over the past month, and through the end of March, I’ve done, and will be doing, a lot of travel for work.

Nothing I haven’t done before, but it’s been a long time since I’ve had to be onsite for more than a couple weeks at a time – most customer leap at the chance to do remote work.

Sadly, that has not been possible with one customer, and the other is highly reticent to allow contractors to engage remotely until they’ve put in several weeks of face time.

Face-to-face interactions are certainly important (I even noted so 4.5 years ago), but conference calls, webexes, and the like can most assuredly replace much of that.

the death of the “car analogy”

With the rise of the “sharing economy”, and companies like Lyft proudly declaring 250,000 people ditched cars in favor of ride-sharing, what will be the fate of the venerable “car analogy“?

Heck, what was the common analogy before cars?

How will language and colloquial usage change with the [eventual] death of the car as the most common means of transportation (presuming, of course, it actually will die)?

I wonder if the death of the car will prove to be, in the historical view, something like the loss of the shared social experience that TV used to be.

document what didn’t work

In a recent episode of Paul’s Security Weekly, an off-hand comment was made about documentation: you shouldn’t merely document what to do, nor even why, but also what you tried that didn’t work (ie, augment the status quo).

The upshot being, to save whomever comes to this note next (especially if it turns out to be yourselfeffort you spent that was in vain.

This is similar to a famous quote attributed to Edison,

I have not failed. I’ve just found 10,000 ways that won’t work.

In light of my recommended, preferred practice and policy of “terse verbosity“, I would strongly suggest not placing the “doesn’t work” in-line, typically. Instead, put footnotes, an appendix, etc. But always

explain everything you did, but use bullet points if possible, rather than prose form

Loads of other goodies in that episode, too – but this one jumped-out as applicable to everyone.