why technical intricacies matter

I have been working on a upgrade for one of our customers for nearly a month.

Last week we spent about two hours focused on one specific problem that had been rearing its ugly head on an exceedingly-frequent basis: one of the components of the application was routinely pitching OutOfMemory errors from the Java Virtual Machine (jvm). The errors were actually being returned from WebLogic  (currently an Oracle product; previously from BEA).

Much googling of the error messages returned the following Sun bug:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4697804, and the workaround:
Disable VM heap resizing by setting -mx and -ms to the same value.
This will prevent us from hitting the most common sources of the vm_exit_out_of_memory exits.
The best thing to do is increase swap size on the machines encountering this error.

[If you want to skip the rest of this, feel free: the short version is we boosted swap space from 1GB to 13GB, and it works like a champ now.]

Important Things You Should Know™

  • The version (1.4) and platform (32-bit) of Java is used for a variety of reasons by this product in this component
  • A 32-bit OS/machine1 can only access ~3GB of RAM (due to OS overhead and bus address mapping strategies)
  • A 64-bit OS/machine can access between 248 and 264 bytes (256TB-16EB) of memory (depending on addressing model used)
  • There are two types of memory a system can use: heap and stack
  • The jvm gets memory for itself from the host OS from the heap
  • If more memory is need by the Java application in question, and it has not yet exceeded the max (-Xmx argument) amount available to the jvm, the jvm will get more memory for itself from the system
  • The 32-bit jvm has a certain amount of overhead itself (I have seen 5-25%, depending on the application)

Environmental issues for the application in question

  • 8 CPUs
  • 32GB physical memory
  • ~9GB RAM in use, the rest unused
  • RHEL 4 64-bit
  • 1GB swap

Go check out this video while you think for a few seconds 🙂

Oh, you’re back? Welcome!

More details about the Sun jvm: when the jvm needs more memory, so long as the system can issue it, it will ask for a multiple of what it really needs (observationally about 40%, or 1.4x the “actual” request). And while it is asking for more memory, it swaps itself out to swap space (virtual memory, or a special location/partition on the drive). After it gets its new allocation, it loads itself back in from swap, and goes on its merry way.

Why does it ask for more than what the application “actually” requested? It’s a best-guess on the part of the jvm – if you have allocated 256M of RAM minimum, and 1G max, when the application asks for 257M, the jvm doesn’t want to ask for more RAM too often from the OS, so it asks for ~360M, with the theory being that if you needed 1M over your initial amount, you will likely need yet more. This continues on until the jvm has asked for as much RAM as it is allowed, or until the application quits – whichever comes first.

Last piece of useful technical data:

  • The specific component in the application I was working with asks for 256MB to start, with a cap of 1280MB (we raised that to 2560MB (2.5GB) as an initial attempt to stave-off OutOfMemory errors)

I know it’s been a little while, but think back to that initial list of Important Things … and add into the mix that the component in question was chewing an entire CPU (in normal operation it rarely will go above 25%), and was using 3600MB of virtual memory and 2.8GB of real RAM. That’s a problem. First, because we have 32GB of real memory – there’s no reason the whole component shouldn’t fit in memory (2.8GB is equal to our 2.5GB max plus some jvm overhead). Second, because while it’s chewing an entire CPU, it’s never actually coming up, or, if it does, it’s taking an hour or more (when normally the entire application will start in 12-20 minutes from power on).

What was the problem with this ONE component? The detail is in the list of environmental factors: there was only 1GB of swap space. Uh oh. That means that unless the jvm asks for all 2.5GB up front, it will have to keep re-allocating memory to itself from the system. But with only 1GB of swap space, it has no place to unload itself to while it asks for more and then load itself back into RAM.

What to do? Let’s go back to that obscure Sun bug: “increase swap size on the machine”. We tried going from 1GB to 13GB (had a 12GB partition not being used, so we flipped it to be a swap partition) and rebooting the server.

After increasing swap space, not only does the application start in about the expected amount of time (~15 minutes), but it never pegs the CPU! Woot!

With a newer version of the product, there is an installation prerequisite check to ensure that there is as much swap space as physical RAM installed – but no explanation of why this is now the case.

Whether the above travails are the entire reason, or merely a single example of why it’s important, I won’t be installing onto any machine that doesn’t have enough swap again.


1 without special drivers/kernel modifications