antipaucity

fighting the lack of good ideas

doing technical phone screens

Related to a previous post on career development, I thought it could be interesting to look at one approach to the technical screen that I have used over the past few years when interviewing candidates.

  1. for folks with no “real” experience yet, I ask them to rank themselves on a few key technologies on the “Google scale”
    • the range is 0..10 where a 0 is no knowledge, 1 is some, 10 is “you wrote the book”, 9 is you could’ve written the book, or you edited/contributed
    • on a few occasions, I have had folks ask to change their ranking from their initial [overconfident] statement to one that is much more in line with their true experience/comfort/knowledge level – and that’s OK in my book – honesty is always the best policy here
  2. a couple quick “about us” questions – open-ended inquiries that let the candidate tell me what they’ve done for work
    • this verifies their resume
    • gets them warmed-up for the rest of the call
    • allows the candidate to brag on something
  3. perhaps a couple quick probes to find out more about a specific experience
  4. a few basic / intermediate questions to assess candidate’s technical chops (ie, verify that their resume is accurate)
    • this goes along with my personal rule of “never put anything on a resume you don’t want to be asked about”
  5. open-ended, intentionally-vague questions to gauge problem solving ability, and methodologies
    • see how they go about refining the problem statement (if at all)
    • gauge estimation skills
    • gauge teamwork and delegation aptitude
  6. a few intermediate/advanced questions about an area they *don’t* know anything about – to gauge their response to unfamiliar/stressful situations
    • in my field in particular, it is impossible to know every new technology or even (probably) to be truly 100% aware of those that you do use every single day
  7. a few intermediate/advanced questions in their now-articulated fields of expertise (presuming I have any)
    • this verifies more of their stated (and unstated) job experience, and helps determine at what title/work level they should start
  8. lifestyle/workstyle questions
    • how much they enjoy travel
    • how they handle last-minute demands and “requests” by customers and management
  9. a few questions to gauge flexibility of response to changing requirements
    • for example, switching a project from being Solaris-based to Windows-based part way into implementation because a new CIO has come in, or new licensing is available, etc
  10. open time for them to ask me whatever they may wish to know that I can tell them
    • this usually ends-up being very short because the candidate was stressed-out over the interview, and can’t think of anything about the company they want to know on the spot

What I try to NEVER ask:

  • “trivia” questions – I bet there are C questions even K&R couldn’t answer 🙂
    • I guarantee I can ask you a question about your area of expertise you cannot answer…just like I guarantee you could do the same to me
    • since that is the case, trivia questions are pretty pointless, and more of an ego stroke to the asker than anything else
  • pointless “MindTrap“, lateral-thinking questions
    • riddles are fun – but only add to the stress of the interview (like “why are manhole covers round”)
  • pointless problem-solving and estimation problems
    • for example, “how would you move Mt Fuji”, or “how many gallons of water flow into New York Harbor from the Hudson River per hour”
    • estimation problems are wonderful tools and games to play, but not in an interview
  • illegal questions
    • sometimes they slip out, but it’s never intentional 🙂

I adjust my questioning to fit the situation, timing, and candidate responses – so it’s [somewhat] different every time.

When the interview is done, I write-up my evaluation of the candidate and send it on to the hiring manager. In line with Joel Spolsky‘s “Guerilla Guide to Interviewing“, I make sure to put my firm conclusion of Hire/No-Hire near the top, and again at the bottom – with my reasoning in between.

One thing I have noticed about almost every interview I have ever taken or given is that I end up learning something in the process – and not just about the candidate (or company). It’s important to listen to both how the candidate responds to questions, and what they say.

So, if you ever get the chance to interview with me, you have an idea of how I’m going to run the show 🙂

http is a stateless protocol

The ubiquitous protocol that enables the internet as we know it, http, is stateless.

Stateless merely means that any given request has nothing to do with the previous request, or the next one. This enables the world wide web, as web servers do not need to keep track of who is receiving data, nor how much they have: they get a request, and ship data to the requestor.

It is up to the requestor (often a web browser) to handle the incoming data.

If not every part of a web page, for example, is sent, the browser will display what it can.

This is analogous to a creditor sending you a bill (request), and you sending a check back to them – once the bill has been sent, the creditor knows nothing about the state of the bill until he receives a payment. Likewise, once the check is dropped in the mail, the payor knows nothing about his bill until the check clears his bank.
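To make the idea concrete, here is a minimal sketch of a stateless request handler. It is not a real web server: the `PAGES` table, the `Response` type, and the `handle` function are all hypothetical names made up for illustration. The point is only that each answer depends on the request alone.

```python
from dataclasses import dataclass

# Hypothetical in-memory "site": path -> page body.
PAGES = {"/": "<h1>home</h1>", "/about": "<h1>about</h1>"}


@dataclass(frozen=True)
class Response:
    status: int
    body: str


def handle(path: str) -> Response:
    """Answer one request using only the request itself.

    Nothing is recorded about the caller, so the server cannot say who
    asked, what they asked for last time, or whether the data arrived.
    """
    if path in PAGES:
        return Response(200, PAGES[path])
    return Response(404, "not found")


# The same request gets the same answer, in any order, from any caller.
first = handle("/about")
other = handle("/")
again = handle("/about")
```

Because `handle` keeps no memory between calls, `first` and `again` are identical, and the request for `/` in between changes nothing.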

Why is this important? Because of an oft-repeated “request for enhancement” to the product I use on a daily basis. When the implementors of Opsware SAS were deciding how a user should communicate with the system, they chose to run everything over http(s), because it’s commonplace, well-understood, and easy to work with.

One of the things about statelessness is that you cannot know how many people are using a given web page at the same time. Google cannot tell anyone how many people are actually looking at www.google.com at this moment. They can tell you how many loaded it, and how many just pressed “Search”, but they can’t know what percentage of the loaders promptly went elsewhere – either to a different page, or a different room in their home.

One way around the statelessness of http is to utilize cookies or session data – but that merely adds a check layer to the interaction, it does not provide true “statefulness”.

Several times during my time in Support at Opsware (and after HP’s acquisition), I would have a customer who was looking for the ability to determine who was logged-in at any given time (in similar fashion to running `w` or `who` or `finger` on a Linux/Unix system). This could be important to know whether a user is “doing something” before doing an application restart.

However, since communication is all done via http, there can be no state known in the tool. Once you load a web page, it is being viewed/rendered on your local machine in your web browser – the server could be shut off, your network connection removed, or any of a host of other simulations of restarting the application. And your browser would be none-the-wiser, nor should it be: it has the data it requested/received, and you’re doing something with it.

This carries over to the product I work with. Jobs might be scheduled by a user to run every day at 0200 – but he doesn’t need to be actively logged-in to have them run. Likewise, someone may have logged-in, but is not “doing” anything currently (maybe they’re at lunch).
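The closest a session layer can get is a "last seen" timestamp per user. The sketch below shows that approach and why it is only a guess. Every name in it (`last_seen`, `record_request`, `probably_active`) is hypothetical, and it is not how Opsware SAS works internally.

```python
from datetime import datetime, timedelta

# Hypothetical session table: user -> time of that user's most recent request.
last_seen: dict[str, datetime] = {}


def record_request(user: str, when: datetime) -> None:
    """Called for each authenticated request; remembers only the timestamp."""
    last_seen[user] = when


def probably_active(now: datetime, window: timedelta) -> list[str]:
    """Users whose last request fell within `window` of `now`.

    This is a guess: a user may have walked away right after the request,
    or may still be reading a page that was loaded before the window began.
    """
    return sorted(u for u, t in last_seen.items() if now - t <= window)


noon = datetime(2010, 1, 1, 12, 0)
record_request("alice", noon - timedelta(minutes=2))
record_request("bob", noon - timedelta(minutes=45))  # at lunch, or still reading?
active = probably_active(noon, timedelta(minutes=15))
```

Here `active` contains only `alice`, yet nothing in the data says whether `bob` is gone or simply staring at a page he loaded 45 minutes ago.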

Another case of why technical intricacies matter 🙂

the ticket smash, raw metrics, and communication – how to have a successful support organization

When I worked at Opsware, and for a while after HP bought us, we used to try to have once- or twice-a-week meetings for each support group wherein we would bring our most difficult cases (with the difficulty being determined by the case owner), and have an opportunity for everyone on the team to ask questions, contribute, and maybe even solve the problem our customer was having.

Novel idea, isn’t it? The typical Support team is driven by stats – the number of tickets in their queue, age of the ticket, number solved/closed, number escalated, etc. Support is driven by these numbers because managers don’t think of any better way to do it.

All things being equal, if you can close 40 cases in a week, that’s a lot better than your podmate who “only” finished-out 12. But what about the complexity of each of those cases? And how much effort did each engineer put into them? Did the customer come back and ask for it to be closed because it’s either no longer an issue, or they solved it themselves? Is it a question that can be answered with a reference to a specific page/section of a manual? Or was it a problem that took multiple webex engagements, and dozens of contacts back and forth to find a solution because it was a deep bug?

Theoretically, the goal of “support” is to, well, support – get the problem reporter a solution of some kind they can use. That solution may be a bug fix, an RFE, a reference to a tutorial, reconfiguring, or a work around / alternative approach to their problem. A big problem with this setup is that the reporter rarely asks the right question. They ask about what they have already decided the problem is – and by biasing their initial report that way, they can often end up dragging out the solution process far longer than it should take. I recently wrote a guide on creating effective support tickets, based on my experience working in support, and interacting with various support organizations both before and since.

Reporter bias is the hardest issue to overcome, in my opinion; engineer bias is easier to get past because (hopefully) there are folks you can bounce the problem off of in the team who can help narrow-down the problem and find a solution … or at least figure out where to try looking next.

Communication is the key to solving problems – when I was at Opsware we utilized internal IRC channels and (gasp!) talking with each other to try to find solutions to customer issues. We also spent a lot of time wording inquiries to the reporter to try to gain as much information as possible on each iteration of the communication process.

Another key to solving problems was to make records of cases with the following:

  • initial reported behavior (or lack thereof)
  • actual problem
  • solution

Those records were sometimes on wiki pages, sometimes in our Plone internal KB, and sometimes got “promoted” out to the customer-facing KB. All of these approaches helped us get problems solved faster – either by offloading the “work” to the customer (via a KB reference), or by being able to apply previous answers more quickly when new-but-similar/identical problems were reported.
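A record like that needs only three fields to be useful. The sketch below is one possible shape for it, not the format we actually used: the `CaseRecord` and `find_similar` names and both sample entries are made up for illustration, and the matching is deliberately naive.

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    reported: str   # initial reported behavior (or lack thereof)
    actual: str     # actual problem, once diagnosed
    solution: str   # what resolved it


def find_similar(records, text):
    """Return records whose reported behavior shares a word with `text`.

    Plain word overlap only; a real KB would use proper search.
    """
    words = set(text.lower().split())
    return [r for r in records if words & set(r.reported.lower().split())]


# Illustrative entries only.
kb = [
    CaseRecord("component throws OutOfMemory on startup",
               "too little swap for the jvm to resize its heap",
               "increase swap space"),
    CaseRecord("login page times out",
               "expired certificate on the load balancer",
               "renew the certificate"),
]
hits = find_similar(kb, "OutOfMemory errors after upgrade")
```

Keeping the reported behavior separate from the actual problem is what makes the lookup work: new tickets arrive phrased like the first field, not the second.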

The end goal of a support team is not to outdo one another on how many cases one engineer has in his queue, or how many another has closed – the end goal is to solve customer problems. “Works well in a team setting” is a qualification typically associated with support engineering employment listings – but all too often that gets reduced to a cliche that practically means “tries to outdo his cubemates by closing more cases than the next guy”.

I’m as much a fan of personal responsibility and action as the next red-blooded capitalist, so don’t take this next section to imply I’m promoting communalism.

The way a support team should work is the way [good] sports teams work, or the way a Nascar team operates: yeah, it’s the driver of the car who gets the “glory”, but without his pit and maintenance crew, he’d be no better than you or I going to the grocery store. Any given support engineer gets to have his name tagged to the case for posterity – both with the good things he did, and the not so good ones. But since the goal is really to get the customer’s problem addressed, the ego of the engineer needs to be removed from the equation.

Bob Smith might be “the guy” who informed his customer of a solution, but generating the solution involved the other 7 people in his office. He gets the “fame” from Universal Widgets LLC, but he was just one of the [important] cogs in the process of resolving the issue.

The number of cases Bob has in his queue should have [almost] ZERO correlation to his skill as a technical engineer: it’s the 7 people behind him whom he can ask and brainstorm with that get the job done.

Maybe Bob gets to handle most of the “customer” action, but the other 7 are writing bug reports, solutions articles, etc. When evaluating that team, management needs to do just that: evaluate the team first, and the individuals second.

why technical intricacies matter

I have been working on an upgrade for one of our customers for nearly a month.

Last week we spent about two hours focused on one specific problem that had been rearing its ugly head on an exceedingly-frequent basis: one of the components of the application was routinely pitching OutOfMemory errors from the Java Virtual Machine (jvm). The errors were actually being returned from WebLogic (currently an Oracle product; previously from BEA).

Much googling of the error messages returned the following Sun bug:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4697804, and the workaround:
Disable VM heap resizing by setting -mx and -ms to the same value.
This will prevent us from hitting the most common sources of the vm_exit_out_of_memory exits.
The best thing to do is increase swap size on the machines encountering this error.
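As a small illustration of the first part of that workaround: on current JVMs the options the bug report calls `-ms` and `-mx` are spelled `-Xms` and `-Xmx`. The helper below is a hypothetical convenience function, not part of any product, and the 1280 figure is just an example size.

```python
def fixed_heap_flags(heap_mb: int) -> list[str]:
    """Flags that pin the initial and maximum heap to the same size,
    so the jvm never has to resize its heap."""
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


flags = fixed_heap_flags(1280)
```

The resulting list can be placed on the `java` command line ahead of the main class or `-jar` argument.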

[If you want to skip the rest of this, feel free: the short version is we boosted swap space from 1GB to 13GB, and it works like a champ now.]

Important Things You Should Know™

  • A specific version (1.4) and platform (32-bit) of Java are used by this component of the product, for a variety of reasons
  • A 32-bit OS/machine¹ can only access ~3GB of RAM (due to OS overhead and bus address mapping strategies)
  • A 64-bit OS/machine can access between 2^48 and 2^64 bytes (256TB-16EB) of memory (depending on addressing model used)
  • There are two types of memory a system can use: heap and stack
  • The jvm gets memory for itself from the host OS from the heap
  • If more memory is needed by the Java application in question, and it has not yet exceeded the max (-Xmx argument) amount available to the jvm, the jvm will get more memory for itself from the system
  • The 32-bit jvm has a certain amount of overhead itself (I have seen 5-25%, depending on the application)
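The address-space figures in the list above are plain powers of two, and can be checked with a few lines of arithmetic. The constant names are my own. The ~3GB usable figure for 32-bit comes from the 4GB ceiling minus the overhead mentioned above.

```python
GIB = 2**30  # bytes in a gibibyte
TIB = 2**40  # bytes in a tebibyte
EIB = 2**60  # bytes in an exbibyte

addr32_gib = 2**32 // GIB  # 32-bit pointers: 4 GiB ceiling
addr48_tib = 2**48 // TIB  # 48-bit addressing: 256 TiB
addr64_eib = 2**64 // EIB  # full 64-bit addressing: 16 EiB

print(addr32_gib, "GiB /", addr48_tib, "TiB /", addr64_eib, "EiB")
```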

Environmental issues for the application in question

  • 8 CPUs
  • 32GB physical memory
  • ~9GB RAM in use, the rest unused
  • RHEL 4 64-bit
  • 1GB swap

Go check out this video while you think for a few seconds 🙂

Oh, you’re back? Welcome!

More details about the Sun jvm: when the jvm needs more memory, so long as the system can issue it, it will ask for a multiple of what it really needs (observationally about 40%, or 1.4x the “actual” request). And while it is asking for more memory, it swaps itself out to swap space (virtual memory, or a special location/partition on the drive). After it gets its new allocation, it loads itself back in from swap, and goes on its merry way.

Why does it ask for more than what the application “actually” requested? It’s a best-guess on the part of the jvm – if you have allocated 256M of RAM minimum, and 1G max, when the application asks for 257M, the jvm doesn’t want to ask for more RAM too often from the OS, so it asks for ~360M, with the theory being that if you needed 1M over your initial amount, you will likely need yet more. This continues on until the jvm has asked for as much RAM as it is allowed, or until the application quits – whichever comes first.
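That growth pattern can be sketched in a few lines. This is only a model of the behavior as I observed it, not the jvm's real algorithm: the `grow_heap` function, the 1.4 factor as a fixed constant, and the sequence of application needs are all assumptions for illustration.

```python
def grow_heap(current_mb, needed_mb, max_mb, factor=1.4):
    """Return the heap size after the application needs `needed_mb`.

    Assumed model: when the need exceeds the current heap, request
    `factor` times the need, capped at `max_mb`.
    """
    if needed_mb <= current_mb:
        return current_mb  # still fits; no resize
    return min(round(needed_mb * factor), max_mb)


heap = 256       # initial allocation, in MB
sizes = []
for needed in (257, 400, 700, 990):  # made-up sequence of application needs
    heap = grow_heap(heap, needed, max_mb=1024)
    sizes.append(heap)
```

With a 256M start and a 1G cap, this model resizes four times (to 360, 560, 980, then the 1024 cap) before it stops growing. Each of those resizes is a point where the jvm has to go back to the OS.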

Last piece of useful technical data:

  • The specific component in the application I was working with asks for 256MB to start, with a cap of 1280MB (we raised that to 2560MB (2.5GB) as an initial attempt to stave-off OutOfMemory errors)

I know it’s been a little while, but think back to that initial list of Important Things … and add into the mix that the component in question was chewing an entire CPU (in normal operation it rarely will go above 25%), and was using 3600MB of virtual memory and 2.8GB of real RAM. That’s a problem. First, because we have 32GB of real memory – there’s no reason the whole component shouldn’t fit in memory (2.8GB is equal to our 2.5GB max plus some jvm overhead). Second, because while it’s chewing an entire CPU, it’s never actually coming up, or, if it does, it’s taking an hour or more (when normally the entire application will start in 12-20 minutes from power on).

What was the problem with this ONE component? The detail is in the list of environmental factors: there was only 1GB of swap space. Uh oh. That means that unless the jvm asks for all 2.5GB up front, it will have to keep re-allocating memory to itself from the system. But with only 1GB of swap space, it has no place to unload itself to while it asks for more and then load itself back into RAM.

What to do? Let’s go back to that obscure Sun bug: “increase swap size on the machine”. We tried going from 1GB to 13GB (had a 12GB partition not being used, so we flipped it to be a swap partition) and rebooting the server.

After increasing swap space, not only does the application start in about the expected amount of time (~15 minutes), but it never pegs the CPU! Woot!

With a newer version of the product, there is an installation prerequisite check to ensure that there is as much swap space as physical RAM installed – but no explanation of why this is now the case.
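I don't know how the product's installer performs that check, but on Linux the same comparison can be made from `/proc/meminfo`, which reports `MemTotal` and `SwapTotal` in kB. The function names below are my own, and the sample text mimics the 32GB RAM / 1GB swap machine from this post.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values (kB for sizes)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def swap_at_least_ram(text):
    """True when there is at least as much swap as physical RAM."""
    info = parse_meminfo(text)
    return info["SwapTotal"] >= info["MemTotal"]


# Sample input shaped like /proc/meminfo: 32GB RAM, 1GB swap.
sample = """MemTotal:       33554432 kB
SwapTotal:       1048576 kB
"""
ok = swap_at_least_ram(sample)
```

On a real Linux host, the check would be `swap_at_least_ram(open("/proc/meminfo").read())`. For the sample above it fails, as it would have on our server before the change.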

Whether the above travails are the entire reason, or merely a single example of why it’s important, I won’t be installing onto any machine that doesn’t have enough swap again.


¹ without special drivers/kernel modifications

where google makes its money

Wired has an interesting infographic today from WordStream on where Google makes its money in advertising.

No surprise on some of the top entries, but the last was surprising (both to me, and to the folks who did the analysis): Cord Blood. Seems “rich parents” want to store their newborn’s umbilical cord blood for the stem cells it contains, in the hopes that they could be used later in life if some health crisis arises.

Fascinating.

kaching

I had been playing with a fun stock market simulator/investing application on Facebook until yesterday. It was called kaching (now defunct). The authors decided to focus their efforts on their for-pay service, kaching.com, and drop the free app on facebook.

That’s all well and good – folks making money does not bother me.

What does bother me is when the maintainers of the application say they are expressly not inviting the 60,000+ users to their new service. When I found out the app was being removed, I posted about it, and the reply I received from one of their admins was incredibly unprofessional and rude. He said they weren’t inviting the facebook users because they were not likely to want to use it, and wouldn’t pay for it.

My request was for the app to be kept up, just have maintenance on it cease and no new features be added. Keeping the app alive would have cost them next to nothing. Removing it has alienated 60,000+ people, [almost] all of whom complained and have made comments, like I am right now, warning people away from their “premium” services.

I’m all for folks making money: that is, after all, how bills get paid. I’m all for having a closed platform – if that’s what you want to do (though open platforms seem to last longer and work better overall… but that’s an entire series of posts in its own right). But using 60,000 folks on facebook to effectively beta test your premium services, and then dropping them just because you want to refocus, does not bode well for professionalism or future success. No, none of us paid to use the app. But an awful lot of us had a lot of fun playing with it.

Shame they’ve decided to upset 60,000 people in such a way. Even more of a shame is that they used 60,000 people as guinea pigs without telling us.

Active, free alternatives on facebook:

irony

About 2 years ago, I wrote about the problem of holding onto electronic stuff just because storage was cheap.

It wasn’t until I met my fiancee that I realized I did the same thing with “real” stuff – holding onto it just because it was there.

I’m nowhere near a candidate for Hoarding: Buried Alive (praise the Lord!) .. but I could easily have been in a few years had I not met someone so helpful in keeping priorities about “stuff” straight – if it has a home and can live there relatively neatly.. it’s ok. Otherwise, it needs to be elsewhere.

Why is the self-storage industry doing so well in the US? Why do we own so much stuff we can’t even keep it in our homes? When did “stuff” become more important than people?