fighting the lack of good ideas

watch your mtu size in openstack

For a variety of reasons related to package versions and support contracts, I was unable to use the Red Hat built KVM image of RHEL 7.2 for a recent project. (The saga of that is worthy of its own post – and maybe I’ll write it at some point. But not today.)

First thing I tried was to build an OpenStack instance off of the RHEL 7.2 media ISO directly – but that didn’t work.

So I built a small VM on another KVM host – with virt-viewer, virt-manager, etc. – got it set up and ready to go, then converted the qcow image to raw and plopped it into the OpenStack image inventory.

Then I deployed the two VMs I need for my project (complete with additional disk space, yada yada yada). So far, so good.

Floating IP assigned to the app server, proper network for both, static configs updated. Life is good.

Except I could not ssh out from the newly-minted servers to anywhere. Or when ssh did connect, it was super laggy.

I could ssh in, but not out. I could scp out (to some locales, but not others), but was not getting nearly the transfer rates I should have been seeing. Pings worked just fine. So did nslookup.

After a couple hours of fruitless searching, I got hold of the coworker who set up our OpenStack environment: maybe he’d know something.

We spent about another half hour on the phone before he said, “Hey – what’s your MTU set to?” “I dunno – whatever’s default, I guess.” “Try setting it to 1450.”

Why 1450? What’s wrong with the default of 1500? Theoretically, the whole reason defaults are, well, default, is that they should “just work”. In other words, they might not be optimal for your situation, but they should be more-or-less optimalish for most situations.

Unless you’re in a basically-vanilla “layered networking” environment (apologies if “layered networking” is the wrong term, it’s the one my coworker used, and it made sense to me – networking isn’t really my forte). Fortunately, my colleague had seen an almost-identical problem several months ago while playing with Docker containers. The maximum transmission unit (MTU) is the cap on the size of a single network packet. It matters in a TCP/IP environment because otherwise devices on the network won’t know how much data they can send each other in one packet.

1500 bytes is the default for most systems, as I mentioned before. But when you have a container / virtual machine / etc. hosted on a parent system whose MTU is set to 1500, the guest cannot have as large an MTU: the host would then have no room left to attach the extra routing bits it needs to identify which guest gets what data when it comes back. Small network requests, such as the ones ping sends, are nowhere near the MTU, so they work without a hitch.

For larger requests, you can (and will) start running into headroom issues – so either the guest MTU needs to shrink, or the host MTU needs to grow.
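To put some numbers on that, here is a minimal sketch of the arithmetic. One big assumption on my part: the post never says which encapsulation our OpenStack network uses, so I'm assuming VXLAN over IPv4, which adds 50 bytes of headers and happens to match the 1450 my coworker suggested. The function names (`guest_mtu`, `fits`) are made up for illustration.

```python
# Assumption: the overlay is VXLAN over IPv4. Other encapsulations
# (GRE, VXLAN over IPv6, ...) add a different number of bytes.
INNER_ETHERNET = 14  # the guest's Ethernet header, carried inside the tunnel
OUTER_IPV4 = 20      # IPv4 header the host adds
OUTER_UDP = 8        # UDP header the host adds
VXLAN_HEADER = 8     # VXLAN header the host adds
VXLAN_OVERHEAD = INNER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER  # 50


def guest_mtu(host_mtu: int, overhead: int = VXLAN_OVERHEAD) -> int:
    """Largest guest MTU that still leaves room for the host's extra headers."""
    return host_mtu - overhead


def fits(ip_packet_size: int, mtu: int) -> bool:
    """True if an IP packet of this size fits within the given MTU."""
    return ip_packet_size <= mtu


# A default Linux ping: IPv4 header + ICMP header + 56 bytes of payload.
PING_PACKET = 20 + 8 + 56  # 84 bytes

if __name__ == "__main__":
    safe = guest_mtu(1500)
    print(safe)                     # 1450
    print(fits(PING_PACKET, safe))  # True: pings are tiny
    print(fits(1500, safe))         # False: a full-size packet has no headroom
```

That lines up with what I saw: tiny packets like ping sail through, while anything that fills a 1500-byte packet does not.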

Growing the host’s MTU isn’t a great option in a live environment – because it could disrupt running guests. So shrinking the guest MTU needs to be done instead.
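If you want to check what a guest is currently using, Linux exposes each interface's MTU as a plain text file under `/sys/class/net/<interface>/mtu`. Here is a small sketch that reads it and compares it against a target. The helper names and the 1450 default are my own choices for illustration, and the right target depends on your environment.

```python
from pathlib import Path


def read_mtu(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read an interface's current MTU from sysfs (Linux only)."""
    return int((Path(sysfs_root) / iface / "mtu").read_text().strip())


def needs_shrinking(current_mtu: int, target_mtu: int = 1450) -> bool:
    """True if the guest's MTU is larger than the target for this network."""
    return current_mtu > target_mtu
```

On a guest you would call something like `needs_shrinking(read_mtu("eth0"))`, where `eth0` is a placeholder for whatever your interface is actually called. The change itself is typically made with `ip link set dev <interface> mtu 1450`, which does not survive a reboot on its own, so the static network config needs the same value.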

Hopefully this helps somebody else.

Now you know, and knowing is half the battle.

deploying openstack by ken pepple

Where do I begin?

How about with this being perhaps the most overpriced tech book I have ever seen. At just under 70 pages, and a penny shy of $25, Deploying OpenStack by Ken Pepple works out to more than 35 cents a page, which exceeds the cost-per-page figure of any textbook I can remember from college. Wow.

Thankfully, I did NOT pay for this book – I was able to borrow it from my local library. I do feel sad, though, that anyone paid for this.

There are a couple nice diagrams wedged in the pages, but this is worse than a documentation dump from the various OpenStack sites. This is a sad example of an O’Reilly book – one I would never have dared think would make it past their editorial board, let alone be published at such an outrageous price.

There are also several amusing typos – including claiming that the test server he used for one deployment had a 1.4 MHz CPU: Athlons were never measured below 600 MHz that I can recall, and certainly the dual-core system he talked about should have been in GHz.

At best, this is a published, overly-long blog post. At worst, it’s a pointless display of the hype surrounding “Cloud” – instead of giving lots of useful information, it’s stuck at the bare basics of the process, and frozen in time from more than a year ago! Given the rate of change in toolsets like this, any printed work on the technology needs not only a lot more content, but also a planned cycle of new editions – likely every year (or more often) … especially in the early stages of a project/product.

Do yourself a HUGE favor: skip this book, and read the online documentation instead. You’ll be very glad you did.