Category Archives: technical

ben thompson missed *a lot* in his microsoft-github article

Ben Thompson is generally spot-on in his analysis of industry goings-on. But he missed a lot in The Cost of Developers this week.

Here’s what he got right about this acquisition:

  • Developers can be quite expensive (though $7.5B in equity works out to only ~$265 per user – which is pretty cheap)
  • Microsoft is betting that a future of open-source, cloud-based applications that exist independent of platforms will be a large-and-increasing share of the future
  • That there is room in that future for a company to win by offering a superior user experience for developers directly, not simply exerting leverage on them
  • Microsoft is the best possible acquirer for GitHub
  • GitHub, having raised $350 million in venture capital, was not going to make it as an independent entity
  • Purely enterprise-focused companies like IBM or Oracle would be tempted to wring every possible bit of profit out of the company
  • What Microsoft wants is much fuzzier: it wants to be developers’ friend
  • [Microsoft] will be ever more invested in a world with no gatekeepers, where developer tools and clouds win by being better on the merits, not by being able to leverage users

And here’s what he missed and/or got wrong:

  • [Microsoft] is in second place in the cloud. Moreover, that second place is largely predicated on shepherding existing corporate customers to cloud computing; it is not clear why any new company — or developer — would choose Microsoft
  • It is very hard to imagine GitHub ever generating the sort of revenue that justifies this purchase price

Some of the below I commented on Google+ yesterday. The rest is in response to more idiocy & paranoia I’ve seen on some technical community mailing lists (bet you didn’t know those still existed) in the last 24 hours, or in response to specific items in Ben’s essay that are shortsighted, misguided, or incredibly wrong.

  • If you cannot see why new users, developers, and companies would go to Microsoft Azure offerings, you don’t understand what they’re doing
    • AWS is huge – but Azure and Google Cloud Platform (GCP) have major technical (and economic) advantages
    • Amazon likes to throw new cloud features at the wall like spaghetti to see what sticks; Google and Microsoft have clearly thought through this whole cloud business, and their offerings make incredibly solid business & technical sense over AWS in most use cases (the only [occasional] real exception being “but we already use AWS”). Have you not seen the Azure IoT offerings?
  • GitHub has not yet been profitable, and would probably have IPO’d (poorly) in the next year to keep from running out of cash
    • Arguably, GitHub would never become profitable on their own
  • Microsoft has a long history of contributing to OSS projects (most-to-all of which are on GitHub)
    • If they were going to acquire anyone in this space, GitHub is the only one that makes any sense
  • (This was tangentially mentioned in Ben’s essay via a link to his analysis of the Microsoft-LinkedIn acquisition in 2016.) Like the LinkedIn acquisition a couple years back (which has an obvious play for an eventual IDaaS: fully-and-forever integration with Office365 regardless of where you work, with everything following automagically), offering better integrations with their existing tools is a Good Thing™ for devs and end users alike – Visual Studio already had git integrations, and they should only get better with this acquisition; making those excellent developer tools even better means they’ll be better whether a developer is using GitHub, Bitbucket, GitLab, etc
  • The more-or-less instantaneous expansion of offered items in the Windows Store (some kind of cloud-based/distributed build-on-demand for software when you want it (and which fork you want)) to “everything” on GitHub is a brilliant possibility
    • In light of Apple’s announcement yesterday about enabling iOS apps to come to macOS over the next releases of iOS and macOS, this should have been at the forefront of most people’s thought processes (after the keynote was done, of course)
    • Through this acquisition, it’s [probably] likely more developers will use Microsoft APIs (.NET, etc) in their projects
  • Echoing Ballmer’s chant, “Developers! Developers! Developers!”: while Microsoft doesn’t really care about Windows anymore (just look at the recent reorg), Windows is still THE most widespread end-user platform in the world – and bringing millions more developers “into the fold” is genius
    • Even if some small percentage will opt to go elsewhere, most won’t change because, well, change is hard
    • All the developers Microsoft had that weren’t yet using GitHub will have a huge reason to start
  • Microsoft has typically been a buy-don’t-build shop (there are exceptions, but look at the original DOS, PowerPoint, SQL Server, Skype, their failed attempt at Yahoo!, etc): they could have spent 5-10x as much building something “as good as” GitHub, or they could buy it. They opted to buy – and with equity, note, not cash, which is smart from several business viewpoints, not least of which is the “enforced” interest the GitHub subsidiary (with its new CEO, etc) will have in continuing to ensure it is The place for developers to put their projects – after all, if that standing drops considerably, the value of the equity GitHub got in the deal drops with it

don’t use symlinks unless you *know* you can

I first ran into this on Solaris, in the context of [then] Opsware SAS (later HP SA, now owned by Microfocus). Bind mounts might be OK … but unless the tarball already has symlinks included, don’t use them – symlinks get traversed differently than “real” directories.

In short, when a directory traversal is done, the process sometimes looks at the permission bits – and if the first character of the mode string is not a d (for a symlink, it’s always an l), many processes can fail.
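
To make that concrete, here’s a minimal Python sketch – not Splunk’s actual code, just an illustration of the kind of naive check that trips over symlinks: a test on the first character of the ls -l-style mode string says “not a directory” for a symlinked directory, even though the link target is perfectly traversable.

```python
import os
import stat

def looks_like_directory(path):
    """Naive check, akin to parsing the first column of `ls -l`:
    lstat() does NOT follow symlinks, so a symlink to a directory
    reports 'lrwxrwxrwx', not 'drwxr-xr-x'."""
    mode = stat.filemode(os.lstat(path).st_mode)
    return mode[0] == "d"

def really_a_directory(path):
    """What you usually want: stat() follows the symlink."""
    return stat.S_ISDIR(os.stat(path).st_mode)

# A symlinked app directory (hypothetical path) fails the naive test
# but passes the real one:
#   looks_like_directory("/opt/splunk/etc/apps/myapp")  -> False (it's an 'l')
#   really_a_directory("/opt/splunk/etc/apps/myapp")    -> True
```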

Symlinking individual files is [possibly] a different story: though permissions usually look wonky on symlinks (most often lrwxrwxrwx vs -rw-r--r--, for example), since you cannot traverse into a file (whereas you can into a directory), it’s generally OK.

Also – sometimes when directory listings are pulled, the symlink is fully dereferenced; so something that appears to be in, say, $SPLUNK_HOME/etc/deployment_apps but is really in, say, /some/other/place will sometimes not get deployed by Splunk, because it’s not where it “belongs”.
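
A hypothetical sketch of that “is it where it belongs?” check (again, not Splunk’s code – just the shape of the problem): once the link is dereferenced, the app no longer appears to live under the deployment directory at all.

```python
import os

# Hypothetical deployment directory, per the example above
DEPLOY_DIR = os.path.join(os.environ.get("SPLUNK_HOME", "/opt/splunk"),
                          "etc", "deployment_apps")

def lives_in_deployment_dir(app_path):
    """Dereference symlinks and ask whether the *real* location is under
    the deployment directory. A symlinked app whose target actually lives
    in /some/other/place fails this test."""
    real = os.path.realpath(app_path)
    return real.startswith(os.path.realpath(DEPLOY_DIR) + os.sep)
```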

Also – in some (perhaps all) instances, checksums are computed on the symlink and not the actual file: so if, for example, you have the same outputs.conf in several apps by way of symlinks, and you change it in one, the checksums for all the others may (and typically do) not get updated … leaving your configs in an inconsistent state, because not every location that should’ve received the updated outputs.conf actually did – they’re symlinks, not real files, and the checksum may not update for those particular apps.
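
Here’s a hedged sketch of why that can happen – I don’t know Splunk’s internals, this just reproduces the symptom: if the checksum is taken over the link itself, it is really a hash of the target path, which never changes when the target file’s contents change.

```python
import hashlib
import os

def checksum_following_links(path):
    """Hash the contents of whatever the path ultimately points to."""
    with open(path, "rb") as fh:          # open() follows symlinks
        return hashlib.sha256(fh.read()).hexdigest()

def checksum_of_link_itself(path):
    """Hash the symlink itself, i.e. the target *path* string.
    Editing the target file never changes this value."""
    if os.path.islink(path):
        return hashlib.sha256(os.readlink(path).encode()).hexdigest()
    return checksum_following_links(path)

# Edit the real outputs.conf and only the non-symlinked copy's checksum moves;
# every app holding a symlink still reports the same old (unchanged) hash.
```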

Moral of the story?

Unless you really know what you’re doing, never use symlinks with Splunk.

hey, virtualbox – don’t be retarded

Ran across this error recently in an Ubuntu guest on my VirtualBox install: VBoxClient: (seamless): failed to start, Stage: Setting guest IRQ filter mask Error: VERR_INTERNAL_ERROR

Gee, isn’t that a useful message.

Fortunately, there was a forums.virtualbox thread on just this error.

The upshot is that this error is actually caused by a failure during the initial install of the VirtualBox Guest Additions.

In the middle of what looks, at a quick glance, like a successful GA installation is this nugget: Please install the gcc make perl packages from your distribution.

The GA installer can’t compile kernel modules without a compiler.

And that makes sense.

What doesn’t make sense is that this error is even possible to get! The GA installer must run as root (or via sudo).

If those packages are missing, the installer should stop what it’s doing, ask the user if they want to install them (because without them the GA installer can’t install everything), and then, when the user invariably answers “yes” (because – duh! – why wouldn’t they want this to work?), go run apt -y install gcc make perl.
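
A pre-flight check along these lines is all it would take – a sketch of what I’m describing, not Oracle’s installer; it assumes an apt-based distro and that it’s already running as root (which the GA installer requires anyway):

```python
import shutil
import subprocess
import sys

REQUIRED = ["gcc", "make", "perl"]  # the tools the kernel-module build needs

def ensure_build_tools():
    """Stop, say exactly what's missing, and offer to fix it before continuing."""
    missing = [tool for tool in REQUIRED if shutil.which(tool) is None]
    if not missing:
        return
    print("Guest Additions need these packages to build kernel modules: "
          + " ".join(missing))
    answer = input("Install them now via apt? [Y/n] ").strip().lower()
    if answer in ("", "y", "yes"):
        subprocess.run(["apt", "-y", "install", *missing], check=True)
    else:
        sys.exit("Cannot build kernel modules without a compiler; aborting "
                 "instead of half-installing.")
```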

But is that what Oracle, in their infinite wisdom, decided to do?

No. They decided it’s better to just quietly report in the middle of a bunch of success statements that “oh, by the way – couldn’t actually do what you wanted, but if you don’t notice, you’re going to spend hours on Google trying to figure it out”.

Morons.

It really isn’t that hard to make human-friendly error messages … nor even to try to pre-solve the error condition you found!

more thoughts on `|stats` vs `|dedup` in splunk

Yesterday I wrote up a neat little find in Splunk wherein running stats count by ... is substantially faster than running dedup ....

After some further reflection over dinner, I figured out the major portion of why this is – and I feel a little dumb for not having thought of it before. (A coworker added some more context, but it’s a smaller reason why one is faster than the other.)

The major reason stats count by ... is faster than dedup ... is that stats can hand off the counting process to something else (though, even if it doesn’t, incrementing a hashtable entry by 1 every time you encounter an instance isn’t terribly computationally complex) and keep going.

In contrast, dedup must compare the relevant field of every individual returned event to its growing list of unique entries for that field.

In the particular case I was seeing yesterday, that means every single one of the 4,000,000 events returned by the search has to be compared, one at a time, to a list (that I know is going to top out at about 11,000 entries). To use Big-O notation, this is an O(n*m) operation (bordering on O(n²))!

That initial list of length m fills pretty quickly (it is, after all, only going to reach ~11,000 total entries in this case), but as it grows toward its max, it gets progressively harder and harder to check whether or not the next event has already been dedup’d.

At ~750,000 events returned (roughly 1/5 of my total), the list of unique field values was 98% complete – yet there were still ~3.2 million events left to go (to find just 2% more unique field values).

Those last 3.2 million events each need to be checked against the list of >10,500 entries – which means, roughly, 16.8 billion comparisons still need to be made!

(Because a linear search finds what it’s looking for, on average, by the time it has traversed half the list. If the list were instead maintained in a slightly more efficient structure (say a heap or [balanced] binary search tree), those lookups would still take ~43 million comparisons (3.2 million * log2(11,000)).)

Compare this to the relative complexity of using |stats count by ... – it still has to run through all 4 million events, but all it is doing is adding one to the tally for every value that shows up in that particular field – IOW, it “only” has to do a total of 4 million [simple] things (because it does need to look at every event returned). dedup, at a minimum, is going to do ~54 million comparisons (and probably a lot more – given it doesn’t merely take 13x the time to run, but closer to 25x).
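
To put rough numbers on it, here’s a small Python model of the two strategies – assumed shapes, not Splunk internals: counting with a hash table touches each event once, while a naive dedup scans its growing list of unique values for every event.

```python
from collections import Counter

def stats_count_by(events, field):
    """Roughly the work |stats count by <field> implies:
    one hash-table increment per event -> O(n)."""
    counts = Counter()
    for event in events:
        counts[event[field]] += 1          # constant-ish work per event
    return counts

def naive_dedup(events, field):
    """Roughly the work a naive |dedup <field> implies:
    scan the values seen so far for every event -> O(n*m)."""
    seen = []                              # grows toward ~11,000 entries
    kept = []
    for event in events:
        value = event[field]
        if value not in seen:              # linear scan of `seen`
            seen.append(value)
            kept.append(event)             # dedup also drags the whole event along
    return kept

# With n = 4,000,000 events and m ≈ 11,000 unique values, the scan above
# averages ~m/2 comparisons per event once `seen` has filled up – billions
# of comparisons – while the Counter does ~4,000,000 increments total.
```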

The secondary contributing factor – important, but not as much as what I covered above – is that dedup must process the whole event, whereas stats chucks everything that isn’t part of what it’s counting (so if an event is 1 KB in size, dedup has to carry the whole kilobyte along, while stats is only looking at maybe 1/10 of the total (if you include a couple extra fields)).

Another neat aspect of using |stats is that it creates a table for you – if you’re running |dedup, you then have to |table ... to get the fields you want displayed how you want.

And adding |table adds to the run time.

So there you have it – turns out those CompSci 201 classes do come in handy 18 years later 🤓

splunk oddity #17681 – stats vs table

It’s fairly common to want to table the data you’ve found in a search in Splunk – heck, if you’re not prettying the data up somehow, why are you bothering with the tool?

But I digress.

There are two (at least) ways of making a table – you can use the |table <field(s)> syntax, or you can use |stats <some function> <field(s)> approach.

Interestingly, in my testing in both test and production environments, using the |stats... approach is consistently 10-15% faster than the |table... option.

Why? I don’t know. He’s on third. And I don’t give a darn!

This is another case of technical intricacies mattering … but I don’t know what is going on under the hood that makes the apparently-more-complex option run faster than the apparently-simpler option.

Maybe someday someone in Splunk engineering will be able to enlighten me to that.

This reminds me a bit of an optimization I was able to help a friend with upwards of 12 years ago – they had queries running in MySQL that were taking forever to complete (and by “forever”, I mean they were sometimes running 4-5 times as long as the interval between runs (they ran every 5 minutes, but could take 20+ minutes to finish!)).

What I found, at least back in the dark days of MySQL 3.x, was that using IN(...) was loads faster than using OR statements.

So a query with a clause like WHERE name IN("bob","sarah","mike","terry","sue") would run anywhere from 20-90% quicker than the logically-equivalent WHERE name="bob" OR name="sarah" OR name="mike" OR name="terry" OR name="sue" (given a large enough dataset over which it was running … on small [enough] tables (say up to a few thousand records), the OR version would run just as fast, or occasionally faster).

In their case, by switching to the IN(...) form, queries went from taking 20+ minutes to finishing in ~20 seconds!

Bonus tidbit:

It is well-known in Splunkland that using dedup is an “expensive” operation. Want a clever way around it (that is much faster)? Instead of doing something like index=myndx | fields ip host | dedup host, run index=myndx | fields ip host | stats count by host | fields -count. The |stats .. |fields -count trick seems to run anywhere from 15-30% faster than dedup.

they asked the right question

Let me compare the experience I wrote about yesterday to another I had the same year with the first customer I was ever sent to – HSBC.

Just a couple weeks after starting with ProServe in 2008, I was sent to Chicago to do a final PoC for HSBC. Someone else had done a PoC the previous year, but with HP’s acquisition of Opsware, HSBC (along with many other customers and potential customers) held off on signing a purchase contract so they could bundle “everything” they wanted from HP under one big honking purchase order.

And due to changes in the underlying product architecture, HSBC wanted a fresh demo to play with for a little while before writing that line item into their PO.

Enter me: a freshly-minted consultant who hadn’t yet developed a solid cheat sheet. So fresh, I thought staying 20 minutes away in a Comfort Inn to save $12 a night was smart (it’s not – always stay as close to your customer as you can, within budget, when you’re traveling). But I digress.

After a set of unexpected flight delays, instead of being able to start Monday before lunch, I didn’t even get to meet the customer team until almost end-of-business Monday. Tuesday morning, my main contact met me at the door, escorted me into their lab, and introduced me to the “spare” hardware I’d be working on – a ~5-year-old Sun server running Solaris 10 (thankfully – they’d only just upgraded that machine from Solaris 9 a couple weeks before).

Like my main contact in Nutley later that year, my main contact at HSBC was an old hat Solaris admin – he’d been using and administering Sun equipment for nearly 20 years. Smart guy (but, unlike the guy in NJ that summer, he wasn’t a Sun fanboi purist).

The reason we were using retired (and, possibly, resurrected) hardware was that they didn’t trust one of the sales reps (who had since been fired) who had made some pretty sweeping promises to them early in the sales cycle. And whoever had been in several months prior to do the first PoC had apparently complained bitterly about “having to use Sun”.

So they partially set me up to fail – but I was too dumb to realize it at the time…a perfect instance of the old phrase, “you can’t fool me, I’m too ignorant”.

I did have to suffer through slow network access (the onboard NIC “supported” 100Mbps … but it was flaky, so it had been down-throttled to just 10Mbps). To put this in a little context, that was slower than my home internet access – even then, 10 years ago!

Wednesday about lunchtime, the HSBC project manager for “HP automation initiatives” introduced herself and through our conversation, casually asked, “if you had your druthers, what kind of hardware would you install SA on to support our environment?”

So I answered with what I’d use: each server in each SA Core (they were going to have 3) should have 16+ x86-64 CPUs, at least 32 GB RAM, and ample storage (at least 100 GB just for the install, let alone extra space which might be needed for the software and OS libraries). Oh, and it should be running RHEL – don’t use Solaris as the host OS for HPSA.

She pressed me on why I suggested this, and I told her, “because SA is written on Linux and then ported to Solaris; every major issue SA has run into in the last few years regarding OS conflicts has happened on Sun hardware & OSes.”

A little while later, she thanked me for our conversation and for getting SA up and running so quickly (even on half-decade-old hardware, I had it installed and ready to demo in only a little over 1.5 days) – which gave me time to go through its functionality, show off some new things in 7.0 that hadn’t been possible (or as easy) in 6.1 (or 6.5, or 6.6), and even get told I could head out to the airport a little early on Thursday! Win-win-win all around.

Fast forward a few months.

I get a phone call from the engagement manager I’d worked with on the HSBC PoC week, and he asked me if I had a current passport. I told him, “yes,” and asked him why he wanted to know.

He then informed me that HSBC was getting ready to finalize a $12+ million hardware, software, and services sale … but would only be buying SA if I was available to install it.

That’s cool – getting asked back is always a Good Thing™ … but what does that have to do with having a current passport? Bob elaborated: HSBC had a policy of vendors doing installs on site (not weird). And two of those “on site” locations were not in the US: one would be in London, England, and the other in Hong Kong. “Would I be able to do that?” he wanted to know.

“Yes. Yes, I would.”

“OK,” he said, “I’ll send travel dates and details in a few days.”

I hung up, then wondered if I’d said “yes” maybe a little too quickly: who gets asked to be the installation engineer who’s holding-up the finalization of a multi-million-dollar sale? Especially when I knew there were folks at least as qualified, if not much more so, available?

This was my first experience with being asked-back as a consultant (I’d been asked-for when I worked in Support, but that was very different).

And, ultimately, it’s what led to the single best services engagement I had for quite a while. And to a [partially] company-paid vacation to the UK. And to the first stamps in my passport. And to a friendship with a customer contact in London who I’ve stayed in touch with ever since.

All from not knowing that the “project manager” was actually high enough up in the HSBC management chain that her recommendations/requests for external personnel would be honored even on big contracts – and from being truly honest with her when she asked what I viewed as a casual, throwaway question in a loud computer lab on a cool Wednesday afternoon in April.

The upshot is to always treat everyone you meet as “just another person” – whether a CEO or a janitor, they put their pants on the same way you do: one leg at a time.