Archive for September, 2011

ogsh/ogfs for fun and profit

Saturday, September 17th, 2011

The absolute coolest feature of HP’s Server Automation suite is the OGSH (or OGFS) – the Opsware Global SHell (or FileSystem).

I worked for Opsware before HP acquired them, when the OGSH was a new feature of the product (then called Opsware SAS, the Server Automation System). It’s a FUSE module that gives a [limited] bash interface to the managed environment by presenting a live query/view into the database and, ultimately, allowing manipulation of managed servers in the environment.

For example, to access a list of all managed servers, you log in to the global shell, then run:

cd /opsw/Server/@

The ‘@’ sign is used to indicate you are “there” – at the limit of that particular filter (in this case, “Server”).

Since it’s bash, you can run most common *nix utilities and commands. But the one that’s most handy, in my opinion, is rosh – the Remote Opsware SHell.

Remote shell opens an authenticated, logged session to a remote machine (*nix or Windows – doesn’t matter), based on your user’s/group’s permissions. For testing purposes, I always configure one group (and add myself) that can connect using root for *nix machines (and Administrator on Windows).

The basic command to connect to a machine is:

rosh -l [username] -n [machine]

You can also pass commands to rosh as if it were an ssh session:

rosh -l [username] -n [machine] '[command]'

For the fullest power of rosh, though, use it in a script or loop. For example:

for sn in *; do rosh -l root -n "$sn" 'uptime ; uname -a'; done

That will remote shell into every server in the current view, using standard shell expansion of the splat (*), and run uptime and uname -a, printing the results to screen. That particular command is handy for quick-and-dirty reports on the managed environment to see

  • which servers are up, and which aren’t
  • how long they’ve been up
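A slightly more defensive version of that loop can label each server as reachable or not. This is only a sketch: it assumes it runs from a directory whose entries are server names (as in the example above), and it assumes rosh exits non-zero when it cannot connect or log in, which is not confirmed here. The function name report_uptime is made up for illustration.

```shell
# Sketch: one line per server in the current directory.
# ASSUMPTION: rosh returns a non-zero exit status when the remote
# command cannot be run (server down, login refused, ...).
report_uptime() {
    for sn in *; do
        # </dev/null keeps rosh from reading the caller's stdin;
        # 2>/dev/null hides connection errors so the report stays tidy.
        if out=$(rosh -l root -n "$sn" 'uptime' </dev/null 2>/dev/null); then
            printf '%s\tUP\t%s\n' "$sn" "$out"
        else
            printf '%s\tUNREACHABLE\n' "$sn"
        fi
    done
}
```

Run it as report_uptime after changing into the server view; piping the result through grep UNREACHABLE would leave just the servers that did not answer.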

In addition to rosh, global shell exposes nearly all of the SA API, which is also accessible via Java, web services, and Python (using the “PyTwist” bindings written to access the Java interfaces).

bglug meeting move!

Saturday, September 17th, 2011

Due to an unforeseen circumstance, the 14:30 EDT BGLUG meeting today will be at the Tates Creek Public Library.

The library can be found at

3628 Walden Drive
Lexington, KY 40517

the ticket smash, raw metrics, and communication – how to have a successful support organization

Thursday, September 15th, 2011

When I worked at Opsware, and for a while after HP bought us, we used to try to have once- or twice-a-week meetings for each support group wherein we would bring our most difficult cases (with the difficulty being determined by the case owner), and have an opportunity for everyone on the team to ask questions, contribute, and maybe even solve the problem our customer was having.

Novel idea, isn’t it? The typical Support team is driven by stats – the number of tickets in their queue, age of the ticket, number solved/closed, number escalated, etc. Support is driven by these numbers because managers don’t think of any better way to do it.

All things being equal, if you can close 40 cases in a week, that’s a lot better than your podmate who “only” finished 12. But what about the complexity of each of those cases? And how much effort did each engineer put into them? Did the customer come back and ask for it to be closed because it’s either no longer an issue, or they solved it themselves? Is it a question that can be answered with a reference to a specific page/section of a manual? Or was it a problem that took multiple WebEx engagements, and dozens of contacts back and forth to find a solution because it was a deep bug?

Theoretically, the goal of “support” is to, well, support – get the problem reporter a solution of some kind they can use. That solution may be a bug fix, an RFE, a reference to a tutorial, a reconfiguration, or a workaround / alternative approach to their problem. A big problem with this setup is that the reporter rarely asks the right question. They ask what they have already decided the question is – and by biasing their initial report, they can often end up dragging out the solution process far longer than it should take. I recently wrote a guide on creating effective support tickets, based on my experience working in support, and interacting with various support organizations both before and since.

Reporter bias is the hardest issue to overcome, in my opinion; engineer bias is easier to get past because (hopefully) there are folks on the team you can bounce the problem off of who can help narrow down the problem and find a solution … or at least figure out where to try looking next.

Communication is the key to solving problems – when I was at Opsware we utilized internal IRC channels and (gasp!) talking with each other to try to find solutions to customer issues. We also spent a lot of time wording inquiries to the reporter to try to gain as much information as possible on each iteration of the communication process.

Another key to solving problems was to make records of cases with the following:

  • initial reported behavior (or lack thereof)
  • actual problem
  • solution

Those records were sometimes on wiki pages, sometimes in our Plone internal KB, and sometimes got “promoted” out to the customer-facing KB. All of these approaches helped us get problems solved faster – either by offloading the “work” to the customer (via a KB reference), or by being able to apply previous answers more quickly when new-but-similar/identical problems were reported.

The end goal of a support team is not to outdo one another on how many cases one engineer has in his queue, or how many another has closed – the end goal is to solve customer problems. “Works well in a team setting” is a qualification typically associated with support engineering employment listings – but all too often that gets reduced to a cliche that practically means “tries to outdo his cubemates by closing more cases than the next guy”.

I’m as much a fan of personal responsibility and action as the next red-blooded capitalist, so don’t take this next section to imply I’m promoting communalism.

The way a support team should work is the way [good] sports teams work, or the way a NASCAR team operates: yeah, it’s the driver of the car who gets the “glory”, but without his pit and maintenance crew, he’d be no better than you or me going to the grocery store. Any given support engineer gets to have his name tagged to the case for posterity – both with the good things he did, and the not-so-good ones. But since the goal is really to get the customer’s problem addressed, the ego of the engineer needs to be removed from the equation.

Bob Smith might be “the guy” who informed his customer of a solution, but generating the solution involved the other 7 people in his office. He gets the “fame” from Universal Widgets LLC, but he was just one of the [important] cogs in the process of resolving the issue.

The number of cases Bob has in his queue should have [almost] ZERO correlation to his skill as a technical engineer: it’s the 7 people behind him whom he can ask and brainstorm with that get the job done.

Maybe Bob gets to handle most of the “customer” action, but the other 7 are writing bug reports, solutions articles, etc. When evaluating that team, management needs to do just that: evaluate the team first, and the individuals second.

bglug meeting – 17 september – topic: data center automation

Wednesday, September 14th, 2011

The September meeting of the Bluegrass Linux User Group will be this Saturday, 17 Sep.

We’ll be meeting at Collexion’s facilities in Lexington at 2:30 p.m.

I will be presenting on data center automation, specifically on HP’s Server Automation platform (the tool I use on my day job).

Some [limited] history of HPSA is available on the Opsware Wikipedia page.

We’ll also briefly touch on some of the OSS alternatives to a full-blown environment like HPSA, such as:

reading again

Tuesday, September 13th, 2011

Wow. It’s been several months since I last posted a book review. I have been reading in the meantime – just haven’t gotten around to posting any reviews here.

In the intervening months I’ve read 1434 by Gavin Menzies (follow-on to 1421) and The Lost City of Z by David Grann. I’m currently reading Netherland by Joseph O’Neill and Radical by David Platt.

I also bought a Kobo ereader at one of the Borders stores in Louisville, and my wife and I have started reading The Adventures of Huckleberry Finn by Mark Twain together.

I’m sure there have been others, too – and I’ll be posting reviews of them over the coming weeks.

debugging authorized_keys and ssh

Tuesday, September 13th, 2011

I saw an interesting question this morning on ServerFault, entitled “SSH Prompts for password even though private keys are available, presented to server and known to it”.

  • when my user is not already connected to the server (first ssh connection), it prompts for password even though private keys are available (PuTTY + Pageant). After that first connection, if I open a secondary or a third connection it gets connected with the keys.
  • If I close all connections and open a new one it prompts for the password.
  • If I have let say 4 open connections and I close the first one (the one that prompted for the password), the fifth connection will be opened with the keys

Now that is an interesting problem. The answer supplied, with its follow-on comments, was also interesting, but the process behind solving this is even more fascinating, I think.

The issue is that password-less logins should work. sshd_config has been set properly, and there is a set of matching keys in authorized_keys.

But it doesn’t work, obviously – or there’d be no question raised.

A list of items to look into, both from the supplied answer, and from my own thoughts (somebody else beat me to an answer):

  • permissions on .ssh/authorized_keys (600 is the safe choice; with sshd’s default StrictModes, a group- or world-writable file is ignored)
  • verify sshd has been started/restarted post changes to sshd_config
  • check to see if home directory is remotely mounted / mounted on demand
  • check to see if key has a passphrase in use
  • look at /var/log/auth.log for errors
  • check to see if the home directory is encrypted (actual answer)
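The permissions item on that list is easy to script. The sketch below flags a home directory, .ssh directory, or authorized_keys file that is missing or writable by group or others, which is what sshd’s StrictModes check objects to. It assumes GNU stat (for stat -c), it does not check ownership, and the function name check_ssh_perms is made up for illustration.

```shell
# Sketch: report paths that are missing or group/world-writable.
# ASSUMPTION: GNU coreutils stat (the -c '%a' format option).
check_ssh_perms() {
    home_dir=$1
    status=0
    for path in "$home_dir" "$home_dir/.ssh" "$home_dir/.ssh/authorized_keys"; do
        if [ ! -e "$path" ]; then
            echo "MISSING $path"
            status=1
            continue
        fi
        mode=$(stat -c '%a' "$path")
        # Leading 0 makes the shell read the mode as octal;
        # 022 masks the group-write and other-write bits.
        if [ $(( 0$mode & 022 )) -ne 0 ]; then
            echo "TOO_OPEN $path ($mode)"
            status=1
        else
            echo "OK $path ($mode)"
        fi
    done
    return $status
}
```

Called as check_ssh_perms "$HOME" on the server, it prints one line per path and returns non-zero if anything looks wrong. A clean result only rules out this one cause; the other items on the list still apply.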

Debugging is something I have written about recently – it seems to come up over and over in my line of work.

It’s a skill that’s vital to have in the IT world, and yet an awful lot of folks do not have it.


The answer, for those interested:

It sounds like, for whatever reason, the user’s home directory is not available if the user is not logged in, so that sshd can’t find the authorized_keys file

The user’s home directory must be using ecrypt or something like that

that’d be the cause, then, since sshd can’t decrypt the contents of the home directory

Ubuntu Desktop asks if you want to encrypt the home directory (why not?) without mentioning what it may do to ssh… a simple “note: this will affect SSH…” would be helpful
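One way to test for this situation is to see whether the home directory is its own mount point, and of what type, while the user is logged in. The sketch below assumes Linux and a mounts table in /proc/mounts format (device, mount point, filesystem type, ...); it also assumes that an encrypted or remotely mounted home shows up there with a type such as ecryptfs or nfs. The function name mount_type_of is made up for illustration.

```shell
# Sketch: print the filesystem type mounted exactly at a directory.
# Prints nothing if the directory is not itself a mount point.
# ASSUMPTION: the mounts table uses /proc/mounts column order.
mount_type_of() {
    dir=$1
    mounts_file=${2:-/proc/mounts}
    # If several entries share the mount point, the last one listed wins.
    awk -v d="$dir" '$2 == d { t = $3 } END { if (t != "") print t }' "$mounts_file"
}
```

Running mount_type_of "$HOME" from an existing session and getting any output at all suggests the home directory is mounted separately, so it is worth checking whether it is still there when nobody is logged in.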