All posts by antipaucity

volvo moving towards waze-like functionality

Shared a TechCrunch story recently in a G+ group I’m in about Volvo debuting a new service that uploads live traffic data from its vehicles and sends it to other Volvos so they can avoid problem areas.

Or…a self-built and -hosted Waze.

You may recall that I wrote some about these kinds of things starting back in 2010.

Why, exactly, Volvo thinks it not only should do this, but that its customers should want the manufacturer doing it for them, is something I don’t understand right now.

Sure, it’s an interesting technical challenge – but it’s not exactly novel…except in the sense of the vehicle doing this for you instead of you doing it via some mobile app or other.

If this is something drivers must opt in to use, that’s OK. If you can opt out, that’s OK, too.

But if it’s not optional – this is a major privacy concern: one I’m sure many people would accept without even stopping to take a breath, but a major privacy concern nonetheless.

don’t use symlinks unless you *know* you can

I first ran into this on Solaris in the context of [then] Opsware SAS (then HP SA, now owned by Microfocus). Bind mounts might be OK … but unless the tarball already has symlinks included, don’t use them – symlinks get traversed differently than “real” directories.

In short, when a directory traversal is done, the process sometimes looks at the permission bits, and if the first character is not a d (for a symlink, it’s always an l), many processes can fail.

Symlinking individual files is [possibly] a different story: though permissions usually look wonky on symlinks (most often lrwxrwxrwx vs -rw-r--r--, for example), since you cannot traverse into a file (whereas you can into a directory), it’s generally OK.
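
To make the “first character” point concrete, here’s a minimal Python sketch – my illustration, not SA’s or Splunk’s actual traversal logic – showing that checking the entry itself (via lstat()) reports l for a symlinked directory where a naive mode-bit check expects d:

    # Minimal sketch: a traversal that inspects the entry itself (lstat) sees
    # an 'l' for a symlinked directory, not the 'd' it may be expecting.
    import os
    import stat
    import tempfile

    with tempfile.TemporaryDirectory() as base:
        real_dir = os.path.join(base, "real_app")
        linked_dir = os.path.join(base, "linked_app")
        os.mkdir(real_dir)
        os.symlink(real_dir, linked_dir)

        for path in (real_dir, linked_dir):
            mode = os.lstat(path).st_mode        # lstat() does NOT follow the link
            first_char = stat.filemode(mode)[0]  # 'd' for a directory, 'l' for a symlink
            verdict = "real directory" if stat.S_ISDIR(mode) else "NOT a real directory"
            print(f"{path}: {first_char} -> {verdict}")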

Also – sometimes when directory listings are pulled, the symlink is fully dereferenced, so something that appears to be in, say, $SPLUNK_HOME/etc/deployment_apps but is really in, say, /some/other/place can end up not being deployed, because Splunk decides it’s not where it “belongs”.

Also – checksums can be computed on the symlink and not the actual file in some (perhaps all) instances. So if, for example, you have the same outputs.conf in several apps by way of symlinks and you change it in one place, the checksums for the other apps may (and typically do) not get updated … leaving your configs in an inconsistent state, because not every location that should have received the updated outputs.conf registers the change – they’re symlinks, not real files, and the checksum may not update on those particular apps.
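
Here’s a small, hedged illustration of that general failure mode – my sketch of the concept, not Splunk’s actual checksum code: if a checksum is taken over the directory entry itself, changing the real file never changes the checksum of the symlinked copy.

    # Sketch: hashing the entry itself (the link's target string) instead of the
    # file it points to leaves the "copy" with a stale, never-changing checksum.
    import hashlib
    import os
    import tempfile

    def entry_checksum(path):
        """Hash the entry itself: the target string for a symlink, file contents otherwise."""
        if os.path.islink(path):
            return hashlib.md5(os.readlink(path).encode()).hexdigest()
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest()

    with tempfile.TemporaryDirectory() as base:
        real = os.path.join(base, "outputs.conf")
        link = os.path.join(base, "linked_outputs.conf")
        with open(real, "w") as f:
            f.write("[tcpout]\n")
        os.symlink(real, link)

        before = entry_checksum(link)
        with open(real, "a") as f:          # change the real file
            f.write("compressed = true\n")
        after = entry_checksum(link)
        print("checksum of the symlinked copy changed?", before != after)  # False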

Moral of the story?

Unless you really know what you’re doing, never use symlinks with Splunk.

a few selected horizon points

Based on some slightly simplified math (see the quick sketch after the list), here are approximate distances to an uninterrupted horizon from various viewing heights:

  • 6 feet – slightly-above-average human eye level: 3 miles
  • 20 feet – top of the roof of a typical one-story house: 5.5 miles
  • 50 feet – short hill / top of a tree or boom truck: 8.7 miles
  • 100 feet – ~10th story window: 12.3 miles
  • 250 feet – ~20th floor of an office building: 19.4 miles
  • 350 feet – top of the Cliffs of Dover: 22.9 miles
  • 1050 feet – Empire State Building observation deck: 39.7 miles
  • 1800 feet – observatory of Burj Khalifa: 52 miles
  • 14,115 feet – top of Pike’s Peak: 145.6 miles
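
For reference, those figures line up (to within ~0.1 mile of rounding) with the usual geometric approximation d ≈ √(1.5 × h), where h is the viewing height in feet and d is the distance in miles – presumably the “slightly simplified math” used above. A quick sketch:

    # Distance to the horizon, ignoring refraction: d_miles ≈ sqrt(1.5 * height_feet)
    # (from d = sqrt(2*R*h) with R ≈ 3959 miles and h converted from feet).
    import math

    def horizon_miles(height_feet):
        return math.sqrt(1.5 * height_feet)

    for height in (6, 20, 50, 100, 250, 350, 1050, 1800, 14115):
        print(f"{height:>6} ft -> {horizon_miles(height):6.1f} miles")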

on internet sales tax

The debate is raging again as the Supreme Court of the United States is getting ready to make a decision on collecting sales tax for online sales.

I’ve read as many viewpoints as I could find, both supporting and opposing requiring businesses to collect sales tax from their customers.

And my [current] view is that all businesses conducting business online should collect the sales tax you would have paid if you had gone to the seller in person.

Company in Oregon? No sales tax. Company in Kentucky? Sales tax.

Don’t collect it for wherever the buyer happens to be: collect it based on where the seller is.

Simple.

Straightforward.

And it’s something the merchant is already set up to do.


hey, virtualbox – don’t be retarded

Ran across this error recently in an Ubuntu guest on my VirtualBox install: VBoxClient: (seamless): failed to start, Stage: Setting guest IRQ filter mask Error: VERR_INTERNAL_ERROR

Gee, isn’t that a useful message.

Fortunately, there was a forums.virtualbox thread on just this error.

The upshot is that this error is actually caused by a failure during the initial install of the VirtualBox Guest Additions.

In the middle of what looks, at a quick glance, like a successful GA installation is this nugget: Please install the gcc make perl packages from your distribution.

The GA installer can’t compile kernel modules without a compiler.

And that makes sense.

What doesn’t make sense is that this error is even possible to get! The GA installer must run as root (or via sudo).

If those packages are missing, the installer should stop what it’s doing, ask the user if they want to install them (because without them the GA installer can’t install everything), and then, when the user invariably answers “yes” (because – duh! – why wouldn’t they want this to work?), go run apt -y install gcc make perl.
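
Something like the following little sketch is all it would take – purely illustrative, not Oracle’s actual installer, and assuming a Debian/Ubuntu guest where apt is available and the script is running as root:

    # Sketch: check for the build prerequisites up front, offer to install them,
    # and only then go build kernel modules. Assumes apt and root privileges.
    import shutil
    import subprocess

    REQUIRED = ("gcc", "make", "perl")

    def ensure_build_tools():
        missing = [tool for tool in REQUIRED if shutil.which(tool) is None]
        if not missing:
            return True
        answer = input(f"Missing {', '.join(missing)} – install now? [Y/n] ").strip().lower()
        if answer in ("", "y", "yes"):
            subprocess.run(["apt", "-y", "install", *missing], check=True)
            return True
        print("Cannot build the Guest Additions kernel modules without:", ", ".join(missing))
        return False

    if __name__ == "__main__":
        if not ensure_build_tools():
            raise SystemExit(1)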

But is that what Oracle, in their infinite wisdom, decided to do?

No. They decided it’s better to just quietly report in the middle of a bunch of success statements that “oh, by the way – couldn’t actually do what you wanted, but if you don’t notice, you’re going to spend hours on Google trying to figure it out”.

Morons.

It really isn’t that hard to make human-friendly error messages … nor to even try to pre-solve the error condition you found!

more thoughts on `|stats` vs `|dedup` in splunk

Yesterday I wrote up a neat little find in Splunk wherein running stats count by ... is substantially faster than running dedup ....

After some further reflection over dinner, I figured out the major portion of why this is – and I feel a little dumb for not having thought of it before. (A coworker added some more context, but it’s a smaller part of why one is faster than the other.)

The major reason stats count by... is faster than dedup ... is that stats can hand off the counting process to something else (though, even if it doesn’t, incrementing a hashtable entry by 1 every time you encounter an instance isn’t terribly computationally complex) and keep going.

In contrast, dedup must compare the field you’re deduplicating on in every individual returned event to its growing list of unique entries for that field.

In the particular case I was seeing yesterday, that means every single event in the list of 4,000,000 events returned by the search has to be compared, one at a time, to a list (that I know is going to top out at about 11,000 entries). To use Big-O notation, this is an O(n*m) operation (bordering on O(n²))!

That initial list of length m fills pretty quickly (it is, after all, only going to reach ~11,000 total entries in this case), but as it grows toward its max, it gets progressively harder and harder to check whether or not the next event has already been dedup’d.

At ~750,000 events returned (roughly 1/5 of my total), the list of unique field values was 98% complete – yet there were still ~3.2 million events left to go (to find just 2% more unique field values).

Those last 3.2 million events each need to be checked against the list of >10,500 entries – which means, roughly, 16.8 billion comparisons still need to be made!

(Because a linear search finds what it’s looking for, on average, by the time it has traversed half the list. If the list is maintained in a slightly more efficient structure (say a heap or [balanced] binary search tree), it will still take ~43 million comparisons (3.2 million * log2(11,000)).)

Compare this to the relative complexity of using |stats count by ... – it still has to run through all 4 million events, but all it is doing is adding one to the tally for every value that shows up in that particular field – IOW, it “only” has to do a total of 4 million [simple] things (because it does need to look at every event returned). dedup, at a minimum, is going to do ~54 million comparisons (and probably a lot more – given it doesn’t merely take 13x the time to run, but closer to 25x).
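
Here’s a rough, self-contained sketch of that difference – obviously not how Splunk implements either command, just the counting-vs-scanning shape of the work, with the volumes scaled down from 4 million events / ~11,000 values so it finishes quickly:

    # Counting into a hash table touches each event once (O(n)); a naive dedup
    # does a linear scan of the already-seen values for every event (O(n*m)).
    import random
    import time

    events = [f"host{random.randrange(1_000)}" for _ in range(100_000)]

    # "|stats count by host" style: one dict increment per event
    start = time.perf_counter()
    counts = {}
    for value in events:
        counts[value] = counts.get(value, 0) + 1
    print(f"hash-table count : {time.perf_counter() - start:.3f}s ({len(counts)} unique values)")

    # naive "|dedup host" style: linear membership test per event
    start = time.perf_counter()
    seen = []
    for value in events:
        if value not in seen:            # scans the growing list every single time
            seen.append(value)
    print(f"linear-scan dedup: {time.perf_counter() - start:.3f}s ({len(seen)} unique values)")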

The secondary contributing factor – important, but not as much a factor as what I covered above – is that dedup must process the whole event, whereas stats chucks everything that isn’t part of what it’s counting (so if an event is 1kb in size, dedup has to carry the whole kb, while stats is only looking at maybe 1/10 the total (if you include a couple extra fields)).

Another neat aspect of using |stats is that it creates a table for you – if you’re running |dedup, you then have to |table ... to get the fields you want displayed how you want.

And adding |table adds to the run time.

So there you have it – turns out those CompSci 201 classes do come in handy 18 years later 🤓