antipaucity

fighting the lack of good ideas

what level of abstraction is appropriate?

Every day we all work at multiple levels of abstraction.

Perhaps this XKCD comic sums it up best:

[XKCD comic: "Abstraction". Title text: "If I'm such a god, why isn't Maru *my* cat?"]

But unless you’re weird and think about these kinds of things (like I do), you probably just run through your life happily interacting at whatever level seems most appropriate at the time.

Most drivers, for example, don’t think about the abstraction they use to interact with their car. Pretty much every car follows the same procedure for starting, shifting into gear, steering, and accelerating/decelerating: you insert a key (or have a fob), turn it (or push a button), move the drive mode selector (gear shift, knob, etc.), turn a steering wheel, and use the gas or brake pedals.

But that’s not really how you start a car. It’s not really how you select drive mode. It’s not really how you steer, etc.

But it’s a convenient, abstract interface to operate a car. It is one which allows you to adapt rapidly to different vehicles from different manufacturers which operate under the hood* in potentially very different ways.

The problem with any form of abstraction is that it’s just a summary – an interface – to whatever it is trying to abstract away. And sometimes those interfaces leak. You turn the key in your car and it doesn’t start. Crud. What did I forget to do, or is the car broken? Did I depress the brake and clutch pedal? Is it in Park? Did I make sure to not leave the lights on overnight? Did the starter motor seize? Is there gas in the tank? Did the fuel pump quit? These are all thoughts that might run through your mind (hopefully in decreasing order of likelihood) when the simple act of turning the key doesn’t work like you expect.

For a typical computer user, the only time they’ll even begin to care about how their system really works is when they try to do something they expect it to do … and it doesn’t. Just like drivers don’t think about their cars’ need for the fuel injector system to make minute adjustments thousands of times per second, most people don’t think about what it actually takes to go from typing “www.google.com” in their browser bar to getting the website returned (or how their computer goes from off to “ready to use” after pushing the power button).

Automation provides an abstraction over manual processes (be it furniture making or tier 1 operations run book scenarios). And abstractions are good things … except when they leak (or outright break).

Depending on your level of engagement, the abstraction you need to work with will differ – but knowing that you’re at some level of abstraction (and, ideally, which level) is vital to being the most effective at whatever your role is.

I was asked recently how a presentation on the benefits of automation would vary based on audience. The possible audiences given in the question were: engineer, manager, & CIO. And I realized that when I’ve been asked questions like this before, I’ve never answered them wrong, but I’ve answered them very inefficiently: I have never used the level of abstraction to solve the general case of what this question is really getting at. The question is not about whether or not you’re comfortable speaking to any given “level” of customer representative (though that’s important). It is not about verifying you’re not lying about your work history (though that’s also important).

No. That question is about finding out if you really know how to abstract to the proper level (presumably in leakier fashion the further up you go) for the specific “type” of person you are talking to.

It is vital to be able to do the “three pitches” – the elevator (30 second), the 3 minute, and the 30 minute. Every one will cover the “same” content – but in very different ways. It’s very much related to the “10/20/30 rule of PowerPoint” that Guy Kawasaki promulgates: “a PowerPoint presentation should have ten slides, last no more than twenty minutes, and contain no font smaller than thirty points.” Or, to quote Winston Churchill, “A good speech should be like a woman’s skirt; long enough to cover the subject and short enough to create interest.”

The answer that epiphanized for me when I was asked that question most recently was this: “I presume everyone in the room is ‘as important’ as the CIO – but everyone gets the same ‘sales pitch’ from me: it’s all about ROI. The ‘return’ on ‘investment’ is going to look different from the engineer’s, manager’s, or CIO’s perspectives, but it’s all just ROI.”

The exact same data presented at three different levels of abstraction will “look” different, even though it’s conveying the same thing – because the audience’s engagement is going to be at their level of abstraction (though hopefully they understand at least to some extent the levels above (and below) themselves).

A simple example: it currently takes a single engineer 8 hours to perform all of the tasks related to patching a Red Hat server. There are 1000 servers in the datacenter. Therefore it takes 8000 engineer-hours to patch them all.

That’s a lot.

It’s a crazy lot.

But I’ve seen it countless times in my career. It’s why patching can so easily get relegated to a once-a-year (or even less often) cycle. It’s also why so many companies leave their basic systems woefully out-of-date and exposed to known issues. If your patching team consists of 4 people, it’ll take them a year to patch all 1000 systems (8000 engineer-hours split four ways is about 2000 hours each, a full working year) – and then they just have to start over again. It’d be like painting the Golden Gate Bridge – an unending process.

Now let’s say you happen to have a management tool available (could be as simple as pssh with preshared SSH keys, or as big and encompassing as Server Automation). And let’s say you have a local mirror of RHN – so you can decide just what, exactly, of any given channel you want to apply in your updates.

Now that you have a central point from which you can launch tasks to all of the Red Hat servers that need to be updated, and a managed source from which each will source their updates, you can have a single engineer launch updates to dozens, scores, even hundreds of servers simultaneously – bringing them all up-to-date in one swell foop. What had taken a single engineer 8 hours is still 8 – but it’s 8 in parallel: in other words, the “same” 8 hours is now touching scores of machines instead of 1 at a time. The single engineer’s efficiency has been boosted by a factor of, say, 40 (let’s stay conservative – I’ve seen this number as high as 1000 or more).
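A rough sketch of that fan-out, for the curious. This is not how pssh or Server Automation work internally; it just shows one engineer's single command reaching many machines at once. It assumes key-based SSH logins are already set up and that the servers update with `yum -y update`. The function names, the `max_parallel` default of 40, and the host names in the usage note are all made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_update(host):
    """Run a package update on one host over SSH and return the exit code."""
    # BatchMode=yes makes ssh fail rather than prompt for a password,
    # so this only works where key-based login is already in place.
    cmd = ["ssh", "-o", "BatchMode=yes", host, "yum", "-y", "update"]
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def patch_all(hosts, run=ssh_update, max_parallel=40):
    """Patch every host, at most max_parallel at a time.

    Returns a dict mapping each host to the exit code of its update.
    `run` is injectable so the fan-out can be tried without real servers.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves input order, so zip pairs each host with its result.
        return dict(zip(hosts, pool.map(run, hosts)))
```

Calling `patch_all(["web01", "web02", "web03"])` would launch all three updates together and hand back which ones exited non-zero, which is the list the engineer actually needs to look at.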

Instead of it taking 8000 engineer-hours to update all 1000 servers, it’s now only 200. Your 4 engineer patching team can now complete their update cycle in well under 2 weeks. What had taken a full year, is now being measured in days or weeks.

The “return on investment” at the abstraction level of the engineer is that they have each been “given back” about 1950 hours a year to work on other things (which helps make them promotable). The team’s manager sees an ROI of more than 90% of his team’s time becoming available for new/different tasks (like patching a new OS). The CIO sees an ROI of 7800 FTE hours no longer being expended – which means the business’s need for expansion, with an associated doubling of server estate, is now feasible without having to double his patching staff.
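The arithmetic behind those three views can be laid out in a few lines. This is only a sketch of the example's numbers: the 2000 working hours per year is my assumption (roughly 50 weeks of 40 hours), and the function and key names are invented for illustration.

```python
def patching_roi(hours_per_server, servers, team_size, speedup,
                 work_hours_per_year=2000):
    """Express the same saving at the engineer, manager, and CIO levels."""
    manual = hours_per_server * servers    # engineer-hours, one server at a time
    automated = manual / speedup           # engineer-hours with parallel patching
    saved = manual - automated
    return {
        "manual_hours": manual,
        "automated_hours": automated,
        # Engineer's view: hours handed back to each person per cycle.
        "hours_back_per_engineer": saved / team_size,
        # Manager's view: fraction of the team's year freed for other work.
        "team_time_freed": saved / (team_size * work_hours_per_year),
        # CIO's view: total hours no longer being spent.
        "total_hours_saved": saved,
        # Hours each engineer spends on one full patch cycle.
        "cycle_hours_per_engineer": automated / team_size,
    }


roi = patching_roi(hours_per_server=8, servers=1000, team_size=4, speedup=40)
for name, value in roi.items():
    print(f"{name}: {value}")
```

Same data, three different headline numbers: 1950 hours back per engineer, 97.5% of the team's time freed, and 7800 hours saved overall.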

Every abstraction is like that – there is a different ROI for a taxi driver on his car “just working” than there is for a hot rodder who’s truly getting under the hood. But it’s still an ROI – one is getting his return by being able to ferry passengers for pay, and the other by souping-up his ride to be just that little (or lot) bit better. The ROI of a 1% fuel economy improvement by the fuel injector system being made incrementally smarter in conjunction with a lighter engine block might only be measured in cents per hour driving – but for FedEx, that will be millions of dollars a year in either unburned fuel, or additional deliveries (both of which are good for their bottom line).

Or consider the abstraction of talking about financial statements (be they for companies or governments) – they [almost] never list revenues and expenditures down to the penny. Not because they’re being lazy, but because the scale of the values being reported does not lend itself well to such mundane thinking. When a company like Apple has $178 billion in cash on hand, no one is going to care if it’s really $178,000,102,034.17 or $177,982,117,730.49. At that scale, $178 billion is a close-enough approximation to reality. And that’s what an abstraction is – it is an approximation to the reality being expressed down one level. It’s good enough to say that you start your car by turning the key – if you’re not an automotive engineer or mechanic. It’s good enough to approximate the US Federal Budget at $3.9 trillion or maybe $3900 billion (whether it should be that high is a totally different topic). But it’s not a good approximation to say $3,895,736,835,150.91 – it may be precise, but it’s not helpful.
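If you want to see that "close enough" idea mechanically, here is a small sketch that rounds a dollar figure to a few significant digits and names the scale. The function name and the choice of three significant digits are mine, not anything standard.

```python
from math import floor, log10


def approximate(amount, digits=3):
    """Round a dollar amount to `digits` significant figures and name its scale."""
    if amount == 0:
        return "$0"
    # Round first, so a value like 999.9 billion is reported as 1 trillion.
    rounded = round(amount, digits - 1 - floor(log10(abs(amount))))
    scales = [(1e12, "trillion"), (1e9, "billion"),
              (1e6, "million"), (1e3, "thousand")]
    for factor, name in scales:
        if abs(rounded) >= factor:
            return f"${rounded / factor:g} {name}"
    # Below a thousand dollars the pennies are still worth showing.
    return f"${amount:.2f}"


print(approximate(178_000_102_034.17))
print(approximate(177_982_117_730.49))
print(approximate(3_895_736_835_150.91))
```

Both of the Apple figures come out as "$178 billion" and the budget figure as "$3.9 trillion": the precise inputs differ, but at this level of abstraction they are the same statement.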

I guess that means the answer to the question I titled this post with is, “the level of abstraction appropriate is directly related to your ‘function’ in relation to the system at hand.” The abstraction needs to be helpful – the minute it is no longer helpful (by being either too approximate, or too precise), it needs to be refined and focused for the audience receiving it.


*see what I did there?

automation

I have been deeply involved in data center management and automation for well over 5 years.

Most companies still view automation the Wrong Way™, though – and it’s a hard mindset to change. Automation is NOT about reducing your headcount, or reducing hiring.

Automation is used to:

  • improve the efficiency of business tasks
  • improve employee productivity
  • reduce human error
  • ensure consistency and auditability
  • improve/ensure repeatability
  • replace “fire fighting” with planning and proactivity
  • ensure an organization can pass the bus test (which disturbingly-few can)
  • free engineers to work on interesting engineering problems – not day-to-day busywork

Cringely has an article on this topic this week, entitled “An IT labor economics lesson from Memphis for IBM”.

“How can a company 1/100,000th the size of IBM afford to have monitoring? Well, it seems DBADirect has its own monitoring tools and they are included as part of their service. It allows them to do a consistently good job with less labor. DBADirect does not need to use the cheapest offshore labor to be competitive. They’ve done what manufacturing companies have been doing for 100+ years – automating!

“Even today IBM is still in its billable hours mindset. The more bodies it takes to do a job the better. It views monitoring and automation tools as being a value added, extra cost option. It has not occurred to them you could create a better, more profitable service with more tools and fewer people. When you have good tools, the cost of the labor becomes less important.”

Any company that fails to realize that throwing more people at the problem is rarely the answer (something former IBMer Fred Brooks wrote about as a post-mortem of the OS/360 project in The Mythical Man-Month) is doomed to fail – consistently, and tragically.

And yet IBM is still in the mindset of the 1960s and raw, manual labor in an increasingly-connected, -compliant, -complex, and -cloudy world. They are still trying to solve problems the Risk way – throw a gob o’ guys at the problem, and roll over your opponents through sheer numbers.

In many ways, it is sad to see the demise of once-great companies like IBM. There’s the loss of competition, the passing of the Old Guard, etc.

But it’s also a huge opportunity for new businesses to come in, compete, and clean up in sectors the Big Guys can’t (or won’t) touch well.