For a variety of reasons related to package versions and support contracts, I was unable to use the Red Hat built KVM image of RHEL 7.2 for a recent project. (The saga of that is worthy of its own post – and maybe I’ll write it at some point. But not today.)
First thing I tried was to build an OpenStack instance off of the RHEL 7.2 media ISO directly – but that didn’t work.
So I built a small VM on another KVM host – with virt-viewer, mirt-manager, etc – got it setup and ready to go, then went through the process of converting the qcow image to raw, and plopping it into the OpenStack image inventory.
Then I deployed the two VMs I need for my project (complete with additional disk space, yada yada yada). So far, so good.
Floating IP assigned to the app server, proper network for both, static configs updated. Life is good.
Except I cannot ssh out from the newly-minted servers anywhere. Or if it will ssh out, it’s super laggy.
I could ssh-in, but not out. I could scp out (to some locales, but not others), but was not getting nearly the transfer rates I should have been seeing. Pings worked just fine. So did nslookup.
After a couple hours of fruitless searching, got a hold of one of my coworkers who setup our OpenStack environment: maybe he’d know something.
We spent another about half hour on the phone, when he said, “hey – what’s your MTU set to?” “I dunno – whatever’s default, I guess. “Try setting it to 1450.”
Why 1450? What’s wrong with the default of 1500? Theoretically, the whole reason defaults are, well, default, is that they should “just work”. In other words, they might not be optimal for your situation, but they should be more-or-less optimalish for most situations.
Unless you’re in a basically-vanilla “layered networking” environment (apologies if “layered networking” is the wrong term, it’s the one my coworker used, and it made sense to me – networking isn’t really my forte). Fortunately, my colleague had seen an almost-identical problem several months ago playing with Docker containers. The maximum transmission unit is the cap on the network packet size, which is important to set in a TCP/IP environment – otherwise devices on the network won’t know how much data they can see at once from each other.
1500 bytes is the default for most systems, as I mentioned before, but when you have a container / virtual machine / etc hosted on a parent system whose MTU is set to 1500, the guest cannot have as large an MTU because then the host cannot attach whatever extra routing bits it needs to identify which guest gets what data when it comes back. For small network requests, such as ping uses, you’re nowhere near the MTU, so they work without a hitch.
For larger requests, you can (and will) start running into headspace issues – so either the guest MTU needs to shrink, or the host needs to grow.
Growing the host’s MTU isn’t a great option in a live environment – because it could disrupt running guests. So shrinking the guest MTU needs to be done instead.
Hopefully this helps somebody else.
Now you know, and knowing is half the battle.