Archive for the ‘work’ Category

technical career development

Tuesday, September 20th, 2011

Career development. Career path. Development opportunities. Taking your career to the next level.

Terms and phrases we all hear and pretty much pass over in our day-to-day lives. Right up until we want to move to a new/better job or performance reviews roll around.

But what do they mean, and how can you advance your career (presuming, of course, that you want to)?

This is by no means an exhaustive list – indeed, I’d appreciate any other ideas / feedback / improvements y’all may suggest :)

For a software developer:

  • be the documentation KING of your code – if it’s not right, make it right
  • own every bug in your code – even when it’s not “yours”
  • be The Guy™ who learns a new component of the code/product (at least conversationally) every few weeks
  • write at least one tutorial a month on the internal wiki/kb about something you found or did with the code
  • write at least one tutorial or similar a month externally (maybe a personal blog) in a general fashion about something you learned or did

For a systems consultant:

  • be the documentation KING of every project you work on – make ABSOLUTELY sure the next guy can do more after you leave
  • own every issue you find, even when it’s really somebody else’s problem (no throwing it over the fence)
  • be The Guy™ who learns something new about the environment or product every couple weeks
  • write at least one tutorial a month and/or give an overview talk of something you learned/did
  • write about what you’ve done (changing names to protect the innocent) on a blog or elsewhere
  • teach as many people as are willing to learn what you know (in your company / on your team / etc)

Focus – decide where you want to be, and plot a course to get there.

Finally, NEVER make yourself “irreplaceable” – the instant you make yourself irreplaceable, you also make yourself unpromotable: after all, if you’re the Only Guy™ who can do your job, why would your boss/manager/supervisor even think of moving you into a new role?


As a side note – if you’re ever working at a customer site, don’t take calls from anyone other than the customer while you’re at your desk/cube/workspace: even if it’s project related, take it in a different room :)

ogsh/ogfs for fun and profit

Saturday, September 17th, 2011

The absolute coolest feature of HP’s Server Automation suite is the OGSH (or OGFS) – the Opsware Global SHell (or FileSystem).

I worked for Opsware before HP acquired them, and the OGSH was a new feature to the product (then called Opsware SAS (Server Automation System)). It’s a FUSE module that gives a [limited] bash interface to the managed environment by presenting a live query/view into the database, and, ultimately, allowing manipulation of managed servers in the environment.

For example, to access a list of all managed servers, you log in to the global shell, then:

cd /opsw/Server/@

The ‘@’ sign is used to indicate you are “there” – at the limit of that particular filter (in this case, “Server”).

Since it’s bash, you can run most common *nix utilities and commands. But the one that’s most handy, in my opinion, is rosh – the Remote Opsware SHell.

Remote shell opens an authenticated, logged session to a remote machine (*nix or Windows – doesn’t matter), based on your user’s/group’s permissions. For testing purposes, I always configure one group (and add myself) that can connect using root for *nix machines (and Administrator on Windows).

The basic command to connect to a machine is:

rosh -l [username] -n [machine]

You can also pass commands to rosh like it was an ssh session:

rosh -l [username] -n [machine] '[command]'

For the fullest power of rosh, though, use it in a script or loop. For example:

for sn in *; do rosh -l root -n "$sn" 'uptime ; uname -a'; done

That will remote shell into every server in the current view, using standard shell expansion of the splat (*), and run uptime and uname -a, printing the results to screen. That particular command is handy for quick-and-dirty reports on the managed environment to see

  • which servers are up, and which aren’t
  • how long they’ve been up

In addition to rosh, global shell provides a near-complete exposing of the SA API (which is also accessible via Java, web services, and Python, using the “PyTwist” bindings written to access the Java interfaces).

the ticket smash, raw metrics, and communication – how to have a successful support organization

Thursday, September 15th, 2011

When I worked at Opsware, and for a while after HP bought us, we used to try to have once- or twice-a-week meetings for each support group wherein we would bring our most difficult cases (with the difficulty being determined by the case owner), and have an opportunity for everyone on the team to ask questions, contribute, and maybe even solve the problem our customer was having.

Novel idea, isn’t it? The typical Support team is driven by stats – the number of tickets in their queue, age of the ticket, number solved/closed, number escalated, etc. Support is driven by these numbers because managers don’t think of any better way to do it.

All things being equal, if you can close 40 cases in a week, that’s a lot better than your podmate who “only” finished-out 12. But what about the complexity of each of those cases? And how much effort did each engineer put into them? Did the customer come back and ask for it to be closed because it’s either no longer an issue, or they solved it themselves? Is it a question that can be answered with a reference to a specific page/section of a manual? Or was it a problem that took multiple webex engagements, and dozens of contacts back and forth to find a solution because it was a deep bug?

Theoretically, the goal of “support” is to, well, support – get the problem reporter a solution of some kind they can use. That solution may be a bug fix, an RFE, a reference to a tutorial, reconfiguring, or a workaround / alternative approach to their problem. A big problem with this setup is that the reporter rarely asks the right question. They ask about what they have already decided the problem is – and by biasing their initial report, they can often end up dragging out the solution process far longer than it should take. I recently wrote a guide on creating effective support tickets, based on my experience working in support, and interacting with various support organizations both before and since.

Reporter bias is the hardest issue to overcome, in my opinion; engineer bias is easier to get past because (hopefully) there are folks you can bounce the problem off of in the team who can help narrow-down the problem and find a solution … or at least figure out where to try looking next.

Communication is the key to solving problems – when I was at Opsware we utilized internal IRC channels and (gasp!) talking with each other to try to find solutions to customer issues. We also spent a lot of time wording inquiries to the reporter to try to gain as much information as possible on each iteration of the communication process.

Another key to solving problems was to make records of cases with the following:

  • initial reported behavior (or lack thereof)
  • actual problem
  • solution

Those records were sometimes on wiki pages, sometimes in our Plone internal KB, and sometimes got “promoted” out to the customer-facing KB. All of these approaches helped us get problems solved faster – either by offloading the “work” to the customer (via a KB reference), or by being able to apply previous answers more quickly when new-but-similar/identical problems were reported.

The end goal of a support team is not to outdo one another on how many cases one engineer has in his queue, or how many another has closed – the end goal is to solve customer problems. “Works well in a team setting” is a qualification typically associated with support engineering employment listings – but all too often that gets reduced to a cliche that practically means “tries to outdo his cubemates by closing more cases than the next guy”.

I’m as much a fan of personal responsibility and action as the next red-blooded capitalist, so don’t take this next section to imply I’m promoting communalism.

The way a support team should work is the way [good] sports teams work, or the way a Nascar team operates: yeah, it’s the driver of the car who gets the “glory”, but without his pit and maintenance crew, he’d be no better than you or me going to the grocery store. Any given support engineer gets to have his name tagged to the case for posterity – both with the good things he did, and the not so good ones. But since the goal is really to get the customer’s problem addressed, the ego of the engineer needs to be removed from the equation.

Bob Smith might be “the guy” who informed his customer of a solution, but generating the solution involved the other 7 people in his office. He gets the “fame” from Universal Widgets LLC, but he was just one of the [important] cogs in the process of resolving the issue.

The number of cases Bob has in his queue should have [almost] ZERO correlation to his skill as a technical engineer: it’s the 7 people behind him whom he can ask and brainstorm with that get the job done.

Maybe Bob gets to handle most of the “customer” action, but the other 7 are writing bug reports, solutions articles, etc. When evaluating that team, management needs to do just that: evaluate the team first, and the individuals second.

bglug meeting – 17 september – topic: data center automation

Wednesday, September 14th, 2011

The September meeting of the Bluegrass Linux User Group will be this Saturday, 17 Sep.

We’ll be meeting at Collexion’s facilities in Lexington at 2:30p.

I will be presenting on data center automation, specifically on HP’s Server Automation platform (the tool I use on my day job).

Some [limited] history of HPSA is available on the Opsware wikipedia page.

We’ll also briefly touch on some of the OSS alternatives to a full-blown environment like HPSA, such as:

why technical intricacies matter

Monday, August 8th, 2011

I have been working on an upgrade for one of our customers for nearly a month.

Last week we spent about two hours focused on one specific problem that had been rearing its ugly head on an exceedingly-frequent basis: one of the components of the application was routinely pitching OutOfMemory errors from the Java Virtual Machine (jvm). The errors were actually being returned from WebLogic (currently an Oracle product; previously from BEA).

Much googling of the error messages returned the following Sun bug:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4697804, and the workaround:
Disable VM heap resizing by setting -mx and -ms to the same value.
This will prevent us from hitting the most common sources of the vm_exit_out_of_memory exits.
The best thing to do is increase swap size on the machines encountering this error.
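
In flag form, the first sentence of that workaround looks something like the sketch below. It uses the -Xms/-Xmx spellings of those options; fixed_heap_opts is a made-up helper name, and how the options actually reach WebLogic's start script is not shown here:

```shell
# Hypothetical helper: build JVM options that pin the heap to one size
# (initial size == maximum size), so the jvm never has to resize it.
# Argument: heap size in megabytes.
fixed_heap_opts() {
    echo "-Xms${1}m -Xmx${1}m"
}
```

For example, fixed_heap_opts 2560 prints: -Xms2560m -Xmx2560m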

[If you want to skip the rest of this, feel free: the short version is we boosted swap space from 1GB to 13GB, and it works like a champ now.]

Important Things You Should Know™

  • This version (1.4) and platform (32-bit) of Java are used by this product in this component for a variety of reasons
  • A 32-bit OS/machine¹ can only access ~3GB of RAM (due to OS overhead and bus address mapping strategies)
  • A 64-bit OS/machine can access between 2^48 and 2^64 bytes (256TB-16EB) of memory (depending on addressing model used)
  • There are two types of memory a system can use: heap and stack
  • The jvm gets memory for itself from the host OS from the heap
  • If more memory is needed by the Java application in question, and it has not yet exceeded the max (-Xmx argument) amount available to the jvm, the jvm will get more memory for itself from the system
  • The 32-bit jvm has a certain amount of overhead itself (I have seen 5-25%, depending on the application)
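
The address-space figures in that list are just powers of two. A small sketch (pow2_bytes is a hypothetical helper name) that works on the exponent rather than the value, so the 64-bit case does not overflow shell arithmetic:

```shell
# Print 2^bits bytes in binary units, e.g. 32 -> "4 GiB".
# Splits the exponent into a unit (every 10 bits) and a small remainder.
pow2_bytes() (
    bits=$1
    case $((bits / 10)) in
        0) unit=B ;;
        1) unit=KiB ;;
        2) unit=MiB ;;
        3) unit=GiB ;;
        4) unit=TiB ;;
        5) unit=PiB ;;
        6) unit=EiB ;;
        *) echo "unsupported exponent: $bits" >&2; exit 1 ;;
    esac
    echo "$((1 << (bits % 10))) $unit"
)
```

pow2_bytes 32 gives the theoretical 32-bit ceiling (before the OS overhead mentioned above), and pow2_bytes 48 and pow2_bytes 64 give the two ends of the 64-bit range.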

Environmental issues for the application in question

  • 8 CPUs
  • 32GB physical memory
  • ~9GB RAM in use, the rest unused
  • RHEL 4 64-bit
  • 1GB swap

Go check out this video while you think for a few seconds :)

Oh, you’re back? Welcome!

More details about the Sun jvm: when the jvm needs more memory, so long as the system can issue it, it will ask for a multiple of what it really needs (observationally about 40%, or 1.4x the “actual” request). And while it is asking for more memory, it swaps itself out to swap space (virtual memory, or a special location/partition on the drive). After it gets its new allocation, it loads itself back in from swap, and goes on its merry way.

Why does it ask for more than what the application “actually” requested? It’s a best-guess on the part of the jvm – if you have allocated 256M of RAM minimum, and 1G max, when the application asks for 257M, the jvm doesn’t want to ask for more RAM too often from the OS, so it asks for ~360M, with the theory being that if you needed 1M over your initial amount, you will likely need yet more. This continues on until the jvm has asked for as much RAM as it is allowed, or until the application quits – whichever comes first.
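
To make that arithmetic concrete, here is a toy model of the growth pattern. It assumes the application always needs just 1MB more than it currently has, and that the jvm rounds each request to 1.4x of that need; both are simplifications of the observed behavior, not documented jvm internals, and heap_growth is a made-up name:

```shell
# Toy model: print the sequence of heap sizes (in MB) from min to max,
# assuming every resize asks for ~1.4x of (current size + 1MB).
heap_growth() (
    size=$1
    max=$2
    printf '%s' "$size"
    while [ "$size" -lt "$max" ]; do
        need=$((size + 1))
        # 1.4x, rounded to the nearest MB using integer arithmetic
        size=$(( (need * 14 + 5) / 10 ))
        if [ "$size" -gt "$max" ]; then
            size=$max    # the jvm never gets more than its cap
        fi
        printf ' %s' "$size"
    done
    printf '\n'
)
```

heap_growth 256 1024 prints 256 360 505 708 993 1024 – five trips back to the OS before hitting the cap.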

Last piece of useful technical data:

  • The specific component in the application I was working with asks for 256MB to start, with a cap of 1280MB (we raised that to 2560MB (2.5GB) as an initial attempt to stave-off OutOfMemory errors)

I know it’s been a little while, but think back to that initial list of Important Things … and add into the mix that the component in question was chewing an entire CPU (in normal operation it rarely will go above 25%), and was using 3600MB of virtual memory and 2.8GB of real RAM. That’s a problem. First, because we have 32GB of real memory – there’s no reason the whole component shouldn’t fit in memory (2.8GB is equal to our 2.5GB max plus some jvm overhead). Second, because while it’s chewing an entire CPU, it’s never actually coming up, or, if it does, it’s taking an hour or more (when normally the entire application will start in 12-20 minutes from power on).

What was the problem with this ONE component? The detail is in the list of environmental factors: there was only 1GB of swap space. Uh oh. That means that unless the jvm asks for all 2.5GB up front, it will have to keep re-allocating memory to itself from the system. But with only 1GB of swap space, it has no place to unload itself to while it asks for more and then load itself back into RAM.

What to do? Let’s go back to that obscure Sun bug: “increase swap size on the machine”. We tried going from 1GB to 13GB (had a 12GB partition not being used, so we flipped it to be a swap partition) and rebooting the server.

After increasing swap space, not only does the application start in about the expected amount of time (~15 minutes), but it never pegs the CPU! Woot!

With a newer version of the product, there is an installation prerequisite check to ensure that there is as much swap space as physical RAM installed – but no explanation of why this is now the case.
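
I don’t know how the product’s own check is implemented, but the idea is simple enough to sketch on Linux. This assumes /proc/meminfo is available (it reports MemTotal and SwapTotal in kB); swap_covers_ram is a hypothetical name:

```shell
# Hypothetical pre-install check: is swap at least as large as physical RAM?
# Reads /proc/meminfo by default; pass another file to try it on saved data.
# Prints both sizes and returns 0 when swap >= RAM, 1 otherwise.
swap_covers_ram() (
    meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "MemTotal:"  { ram  = $2 }   # values are in kB
        $1 == "SwapTotal:" { swap = $2 }
        END {
            printf "ram_kb=%d swap_kb=%d\n", ram, swap
            exit (swap < ram)
        }
    ' "$meminfo"
)
```

On the 32GB server above, a strict check like this would have failed both with the original 1GB of swap and with the 13GB we ended up with.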

Whether the above travails are the entire reason, or merely a single example of why it’s important, I won’t be installing onto any machine that doesn’t have enough swap again.


¹ without special drivers/kernel modifications

new job

Monday, February 21st, 2011

Today I started a new job, which will hopefully involve a bit less travel than my last one did. I enjoyed working with my team at my last employer, and wish them the best in their future ventures.

Now off to find out where my first customer will be :)

upgrades

Monday, February 7th, 2011

I fly quite frequently – last year I re-met Delta’s Gold Medallion status, and made it all the way to Platinum (go me!).

One of the perks is that I frequently get upgraded from the coach tickets I book to First/Business seats instead – for free (and free == better). I was about 23k miles away from Diamond status last year when the year ended. Delta’s rules entail rolling-over your “extra” Medallion Qualification Miles (ie those over your last milestone but less than the next one) to kick-start the next year’s earnings.

For several months last year I was traveling nearly every week from my home in Lexington to Hartford CT – which was awesome from the frequent flier standpoint: the higher up you go on the status chart the more bonus miles you also accrue to your miles balance.

The tricky thing is that MQMs are not equivalent to FFMs – the “qualification” miles are actual miles flown or 500 (whichever is greater) per segment. Whereas the “flyer” miles are those you can turn-in for free flights (and on Delta specifically, you can turn-in miles as low as 25k for free roundtrips). FFMs also come from car rentals, purchases from partners, hotel stays, etc. Last summer Marriott was running a deal whereby for every stay, in addition to the “normal” miles earned, you received a 5k mile bonus, up to a max of 60k miles. I didn’t realize the deal early enough or would’ve gotten closer to the max, but it was still a nice bonus. (BTW – Marriott is doing a triple miles bonus right now.)

When I travel for work, I like to stay at one chain if available – Marriott. They’re generally friendly, the rooms are consistent, and the hotel-provided soaps and shampoos actually work (some places they, technically, clean .. but they’re quite harsh in the process). For the last several months I have been enjoying staying at the Courtyard Marriott in Cromwell CT: I have gotten to know several of the staff, and will be somewhat disappointed to no longer be seeing them on a weekly basis. With my current project coming to an end, I’ll have to get used to a new area.

I’ve also become acquaintances with the crew at Dollar rentacar in Windsor Locks (next to Bradley airport). Jeff and Ellen have been consummate professionals, and are always extremely friendly – even to customers who are anything but. They’ve gone out of their way to be nice to me and a couple other “regulars” because a) it’s good business, and b) we’re always friendly and smiling when we come in. Every week I reserve a compact because it’s the cheapest. Almost every week, without asking, Jeff has upgraded me (for free!) to a mid- or full-size – I even got upgraded to a mini van a couple times.

The bad part about all the travel is that my wife and I have only been married for 7.5 months – and being gone almost every week is just no fun at all. Thankfully, she’s been able to come with me a few times – but not as often as either of us would have liked.

It will be interesting to see where the next project will take me (and hopefully her!) in the upcoming weeks :)

jersey city

Wednesday, July 22nd, 2009

This week I have the pleasure of working in Jersey City.

No, that’s not sarcasm. Getting the PATH from my hotel in Manhattan to work is a cinch, and cheaper than taking the subway anywhere (fares on the subway are $2.25; the PATH is $1.75).

It would’ve been nice to be at the hotel across from work rather than having to take a train, but it’s not a big deal.

However, I am discovering that while cities offer a great deal of convenience, I have to agree with my fiancee that it’s nice to have space. Something that is sadly lacking in Manhattan, Jersey City, and Singapore (to name a few).

Being able to walk to everything is fantastic – certainly it’s cheaper than driving. But being all squished-up with everybody else isn’t really my speed. Maybe in a smaller city, like the one I grew up in, but not a big one.

i’m a s.w.a.t. member

Thursday, October 9th, 2008

I thought I was a helicopter parent, but I’m not.

What I do for a living, currently, is like being a SWAT team member – except instead of it being a team, it tends towards the commando.

I provide onsite installation, support, and extension of the server management suite my company offers. That means that I spend a lot of time in conference calls making sure the environment is ready to go when I hit the ground, and when I do hit the ground, I pretty much have to run full-tilt to make sure we get everything done inside both the available billable hours, and the environmental constraints of the customer.

Then I leave.

When I leave, I might never see that customer again, talk to them, or talk about them.

It’s a very different job from what I used to do, which was provide phone and email support to all our customers. I would close a handful of cases, just to have them replaced by new issues from the same (or different) customers.

I can’t say which I prefer, because I’ve only been doing this most recent job for a few months, but there are definitely pros and cons to each.

The flexibility to work from home, prepping for an onsite engagement, and then going onsite to be the Knight in Shining Armor can be a blast. It can also suck – along about the time a customer decides they don’t like you for whatever reason (“he wore the wrong color shirt!”), then it’s not so fun.

The other issue I have is that I still don’t know what I want to be when I grow up. Certainly I’m not claiming to be an immature child any more, but being an adult, and even having a job, doesn’t necessarily mean you know what you want to become later.

For now, this is good. But if/when I find that niche where I’ll really love what I do, I know I’m making the jump.

Who knows? Maybe I’ll be a “real” SWAT team member.

retirement party

Wednesday, September 3rd, 2008

My goal is to retire by the end of the year. To make sure I reach that goal, I’m requesting donations from friends, family, coworkers, acquaintances, and outright strangers.

If you’d like to contribute to my retirement fund, please send me an email: retire@warrenmyers.com. Donations of any size are accepted, but I don’t give tax receipts.

My goal: $1,570,796.33 by the end of the year. I have $3.14 as of today.