antipaucity

fighting the lack of good ideas

assessment and capacity analysis and planning for virtualization initiatives

Q:

What would need to go into an assessment tool for a virtualization initiative?

A:

Typical factors may include:

  • current CPU load per server
  • what’s running on each server
  • current hardware of each server
  • expected percentage increase in usage
  • OS usage – homogeneous or heterogeneous
  • new hardware or re-use current hardware
  • storage needs
  • vendor for virtualization (VMware, Microsoft, Xen)

And don’t forget the all-important:

  • BUDGET

My experience is all with VMware, but this is what I’ve seen and used in the past:

  • look at all CPU utilizations currently
  • add those average and peak percentages in two separate columns
  • plan for ~10% overhead from your hypervisor of choice
  • for every 40% of ‘average’ or 80% of ‘peak’, use one server of the type you now consider “high-end” (i.e., if you have a total of 687% of ‘peak’, you need 9 physical servers running your hypervisor of choice, since 687 / 80 ≈ 8.6, rounded up)
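The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name, the 300% average total, and the way the hypervisor overhead is applied (as an optional multiplier on the summed load) are my assumptions. With the overhead left at zero it reproduces the 687% example.

```python
import math

def servers_needed(total_average_pct, total_peak_pct,
                   avg_per_server=40.0, peak_per_server=80.0,
                   hypervisor_overhead=0.0):
    """Estimate how many high-end hosts the summed CPU load calls for.

    Assumption: overhead (e.g. 0.10 for ~10%) simply scales both totals.
    """
    scale = 1.0 + hypervisor_overhead
    by_average = math.ceil(total_average_pct * scale / avg_per_server)
    by_peak = math.ceil(total_peak_pct * scale / peak_per_server)
    # Whichever column demands more hosts wins.
    return max(by_average, by_peak)

# 687% total peak (from the example above); 300% average is made up.
print(servers_needed(total_average_pct=300, total_peak_pct=687))  # 9
```

Passing `hypervisor_overhead=0.10` for the same totals bumps the answer to 10, which is one reason to decide up front whether your thresholds already include the hypervisor's share.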

Other thoughts:

  • I like to plan for 1 full spare physical server per ~6, so that I can use VMware’s vMotion for migrating servers around
  • plan for buying/utilizing SAN storage of some form so your VMs can be moved to different physical servers easily
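As a rough sketch of the spare-server rule, assuming you round up so that a partial group of six still gets a spare (that rounding choice and the function name are mine, not a VMware recommendation):

```python
import math

def hosts_with_spares(active_hosts, hosts_per_spare=6):
    """Add one spare host per ~6 active hosts, rounding up (an assumption)."""
    spares = math.ceil(active_hosts / hosts_per_spare)
    return active_hosts + spares

print(hosts_with_spares(9))  # 9 active hosts + 2 spares = 11
```

If budget is tight, rounding down instead would give 10 hosts for the same 9 active servers.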

I originally answered this topic ~2 years ago on serverfault.com.

new connexions collection available

I have been working on my Connexions submissions again recently, and have a collection ready for use (it will be growing as time goes on): “Debugging and Supporting Software Systems”.

I realize there are some small typos in the current text, but I will be addressing that in an upcoming revision 🙂

I’d love to get feedback from anyone on how it could be improved/expanded.

lightsquared attacking gps manufacturers

The LightSquared situation keeps getting more interesting. InfoWorld has another story on them attacking GPS manufacturers for not being more careful about filtering adjacent frequency bands (per a DoD recommendation from 2008).

LightSquared is at loggerheads with makers and users of GPS (Global Positioning System) over interference between the navigation system and its planned cellular LTE (Long-Term Evolution) network. That network would transmit on frequencies close to those used for GPS. The company has long argued that makers of GPS equipment are to blame for the interference because they don’t use strong enough filters to keep their receivers from searching for signals in LightSquared’s bands. But this is the first time LightSquared has accused the vendors of flouting a specific rule.

The DoD’s GPS Standard Positioning Service Performance Standard called for GPS receivers to filter out transmissions on frequencies adjacent to the GPS band, LightSquared told the FCC in a filing related to the agency’s ongoing consideration of the company’s network proposal. The standard, issued in September 2008, recommends that receivers reject all transmissions on frequencies that are more than 4MHz outside the GPS band, said Jeffrey Carlisle, LightSquared’s executive vice president for regulatory affairs and public policy. That 4MHz buffer is essentially a “guard band” to protect operations on either side, he said.

LightSquared plans eventually to use frequencies adjacent to the GPS band for its LTE network, but after mandatory tests earlier this year showed strong interference in that area, the company said it would start out in a slightly lower-frequency block.

Here’s something that’s a little disturbing, though:

There is no mandatory standard for filtering in GPS receivers, and the FCC does not certify the devices for this

And here:

In addition to the DoD recommendation, the International Telecommunication Union, a United Nations agency, has also warned since 2000 that stronger filtering might be necessary to protect GPS from nearby transmissions

The ‘Coalition to Save Our GPS’ had the following to say:

“GPS receivers incorporate filters that reject transmissions in adjacent bands that are hundreds of millions of times more powerful than those of GPS. What LightSquared is proposing, however, is to transmit signals that are at least one billion times more powerful,” the group said in a statement. “There has never been, nor will there ever be, a filter that can block out signals in an immediately adjacent frequency band that are so much more powerful, nor has LightSquared put forward any credible, independent expert opinion or other evidence that this is possible.”

I’m no expert, but “hundreds of millions” is distinctly not far-off from “one billion” (since one billion is equal to ten hundred million). I also acknowledge not having much domain expertise in radio signals, transmission, etc. – but what LightSquared is looking to do seems a lot more useful than worrying about some poorly-built GPS receivers.
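To put rough numbers on that comparison, here is a small sketch that expresses the gap both as a plain ratio and in decibels (the usual unit for power ratios). The three sample values for “hundreds of millions” are my own picks, since the coalition’s statement gives no exact figure.

```python
import math

def to_db(power_ratio):
    """Convert a power ratio to decibels."""
    return 10 * math.log10(power_ratio)

one_billion = 1e9
# Hypothetical readings of "hundreds of millions".
for rejected in (1e8, 5e8, 9e8):
    gap = one_billion / rejected
    print(f"filter handles {rejected:.0e}x: falls short by {gap:.1f}x ({to_db(gap):.1f} dB)")
```

At worst that is a factor of ten between what the filters are said to reject and what LightSquared is said to transmit.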

The FCC said earlier this week that it would not allow the LTE service to launch unless the interference issue was resolved.

LightSquared has said it is confident the plan will be approved next month.

light shows

I’ve recently had some travel for work that put me up in Indianapolis.

Tuesday evening I watched the best light show ever: a miles-high, miles-wide thunderhead flashing nearly constantly for over 30 minutes.

It was a little east of where I was staying in Fishers, but man was it pretty!

For the record, God’s fireworks are cooler than any 4th of July party 😀

connexions

A few years ago I was working for Sigma Xi as an intern, and was introduced to the then-young Connexions project from Rice University.

This week I was reminded of the service, and have started looking into ways I can contribute to their open repository of educational materials.

I’d written two articles published there when I was at Sigma Xi, and while one of them now looks somewhat quaint and dated, I think there are some other areas that I could contribute to that would be helpful.

CNX is free and open to anyone to use, add-to, modify, and reference – so have fun 🙂

the fcc decides to intervene on lightsquared

I’ve taken an interest in LightSquared recently.

Today InfoWorld reports that the FCC “won’t allow LightSquared’s proposed mobile broadband service to interfere with GPS signals, even though the potential interference would be caused by GPS receivers picking up signals outside of their designated spectrum”.

So, the devices are in error, but the FCC is going to prevent LightSquared from interfering?

Sounds like the FCC should be going after the receiver manufacturers to ensure their systems don’t bleed over, rather than after a company not operating on GPS spectrum.

Wait, I forgot: that’d be too logical for a government agency.

why technical intricacies matter

I have been working on an upgrade for one of our customers for nearly a month.

Last week we spent about two hours focused on one specific problem that had been rearing its ugly head on an exceedingly-frequent basis: one of the components of the application was routinely pitching OutOfMemory errors from the Java Virtual Machine (jvm). The errors were actually being returned from WebLogic (currently an Oracle product; previously from BEA).

Much googling of the error messages returned the following Sun bug:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4697804, and the workaround:
Disable VM heap resizing by setting -mx and -ms to the same value.
This will prevent us from hitting the most common sources of the vm_exit_out_of_memory exits.
The best thing to do is increase swap size on the machines encountering this error.
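For the first part of that workaround, here is a minimal sketch of building a launch command with the minimum and maximum heap pinned to the same value, so the jvm never has to resize. It uses the -Xms/-Xmx spellings of the flags; the jar name and the helper function are hypothetical, and a real WebLogic install sets these in its own start scripts.

```python
def java_command(heap_mb, jar_path):
    """Build a java command line with -Xms equal to -Xmx (no heap resizing)."""
    heap = f"{heap_mb}m"
    return ["java", f"-Xms{heap}", f"-Xmx{heap}", "-jar", jar_path]

# "component.jar" is a placeholder, not the real product's jar.
print(" ".join(java_command(1280, "component.jar")))
```

The trade-off is that the jvm claims its full heap at startup, whether or not the application ever needs it.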

[If you want to skip the rest of this, feel free: the short version is we boosted swap space from 1GB to 13GB, and it works like a champ now.]

Important Things You Should Know™

  • This component of the product uses Java 1.4 on a 32-bit platform, for a variety of reasons
  • A 32-bit OS/machine [1] can only access ~3GB of RAM (due to OS overhead and bus address mapping strategies)
  • A 64-bit OS/machine can access between 2^48 and 2^64 bytes (256TB-16EB) of memory (depending on addressing model used)
  • There are two types of memory a system can use: heap and stack
  • The jvm gets memory for itself from the host OS from the heap
  • If more memory is needed by the Java application in question, and it has not yet exceeded the max (-Xmx argument) amount available to the jvm, the jvm will get more memory for itself from the system
  • The 32-bit jvm has a certain amount of overhead itself (I have seen 5-25%, depending on the application)
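The address-space figures in that list are just powers of two. A quick sketch to check them (the helper name is mine, and it uses 1024-based units to match the TB/EB figures above):

```python
def human_size(num_bytes):
    """Render a byte count in the largest whole 1024-based unit."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    value = num_bytes
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value}{unit}"
        value //= 1024

for bits in (32, 48, 64):
    print(f"{bits}-bit addresses: {human_size(2 ** bits)}")
# 32-bit addresses: 4GB
# 48-bit addresses: 256TB
# 64-bit addresses: 16EB
```

The 4GB for 32-bit is the theoretical ceiling; the ~3GB figure above is what is left once the OS and address mappings take their share.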

Environmental issues for the application in question

  • 8 CPUs
  • 32GB physical memory
  • ~9GB RAM in use, the rest unused
  • RHEL 4 64-bit
  • 1GB swap

Go check out this video while you think for a few seconds 🙂

Oh, you’re back? Welcome!

More details about the Sun jvm: when the jvm needs more memory, so long as the system can issue it, it will ask for a multiple of what it really needs (observationally about 40%, or 1.4x the “actual” request). And while it is asking for more memory, it swaps itself out to swap space (virtual memory, or a special location/partition on the drive). After it gets its new allocation, it loads itself back in from swap, and goes on its merry way.

Why does it ask for more than what the application “actually” requested? It’s a best-guess on the part of the jvm – if you have allocated 256M of RAM minimum, and 1G max, when the application asks for 257M, the jvm doesn’t want to ask for more RAM too often from the OS, so it asks for ~360M, with the theory being that if you needed 1M over your initial amount, you will likely need yet more. This continues on until the jvm has asked for as much RAM as it is allowed, or until the application quits – whichever comes first.
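Here is a toy model of that growth pattern, assuming the ~1.4x behavior I observed holds at every step. This is not the jvm’s actual sizing algorithm; the function name and the sequence of application demands are made up for illustration.

```python
def growth_requests(initial_mb, max_mb, demands_mb, factor=1.4):
    """Return each new heap size the jvm would request under the 1.4x model."""
    heap = initial_mb
    steps = []
    for demand in demands_mb:
        if demand > max_mb:
            raise MemoryError(f"{demand}M exceeds the {max_mb}M cap")
        if demand > heap:
            # Ask for ~1.4x what is needed, but never more than the cap.
            heap = min(max_mb, round(demand * factor))
            steps.append(heap)
    return steps

# 256M minimum, 1G maximum, application demand creeping upward.
print(growth_requests(256, 1024, [200, 257, 300, 400, 600, 900]))
# [360, 560, 840, 1024]
```

Each entry in that list is one more trip to the OS for memory; setting the minimum equal to the maximum would leave the list empty.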

Last piece of useful technical data:

  • The specific component in the application I was working with asks for 256MB to start, with a cap of 1280MB (we raised that to 2560MB (2.5GB) as an initial attempt to stave-off OutOfMemory errors)

I know it’s been a little while, but think back to that initial list of Important Things … and add into the mix that the component in question was chewing an entire CPU (in normal operation it rarely will go above 25%), and was using 3600MB of virtual memory and 2.8GB of real RAM. That’s a problem. First, because we have 32GB of real memory – there’s no reason the whole component shouldn’t fit in memory (2.8GB is equal to our 2.5GB max plus some jvm overhead). Second, because while it’s chewing an entire CPU, it’s never actually coming up, or, if it does, it’s taking an hour or more (when normally the entire application will start in 12-20 minutes from power on).

What was the problem with this ONE component? The detail is in the list of environmental factors: there was only 1GB of swap space. Uh oh. That means that unless the jvm asks for all 2.5GB up front, it will have to keep re-allocating memory to itself from the system. But with only 1GB of swap space, it has no place to unload itself to while it asks for more and then load itself back into RAM.

What to do? Let’s go back to that obscure Sun bug: “increase swap size on the machine”. We tried going from 1GB to 13GB (had a 12GB partition not being used, so we flipped it to be a swap partition) and rebooting the server.

After increasing swap space, not only does the application start in about the expected amount of time (~15 minutes), but it never pegs the CPU! Woot!

With a newer version of the product, there is an installation prerequisite check to ensure that there is as much swap space as physical RAM installed – but no explanation of why this is now the case.
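I don’t know how the product’s installer performs that check, but on Linux a comparable one can be sketched by comparing the SwapTotal and MemTotal fields of /proc/meminfo. The function names and the sample text (32GB RAM, 1GB swap, like the server above) are mine.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values (kB for sizes)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values

def swap_covers_ram(meminfo_text):
    """True when swap space is at least as large as physical RAM."""
    info = parse_meminfo(meminfo_text)
    return info["SwapTotal"] >= info["MemTotal"]

# Sample shaped like /proc/meminfo: 32GB RAM, 1GB swap.
SAMPLE = "MemTotal:       33554432 kB\nSwapTotal:       1048576 kB\n"
print(swap_covers_ram(SAMPLE))  # False
```

On a real Linux host you would pass in the contents of /proc/meminfo instead of the sample. Note that even the 13GB of swap we ended up with would not pass this stricter rule on a 32GB machine.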

Whether the above travails are the entire reason, or merely a single example of why it’s important, I won’t be installing onto any machine that doesn’t have enough swap again.


[1] without special drivers/kernel modifications