antipaucity

fighting the lack of good ideas

create your own clustered cloud storage system with moosefs and pydio

This started off as a how-to on installing ownCloud, but their own installation procedures don’t work for the 8.0x release on CentOS 6.

Most of you know I’ve been interested in distributed / cloud storage for quite some time.

And that I find MooseFS to be fascinating. As of 2.0, MooseFS comes in two flavors – the Community Edition, and the Professional Edition. This how-to uses the CE flavor, but it’d work with the Pro version, too.

I started with the MooseFS install guide (pdf) and the Pydio quick start steps. And, as usual, I used Digital Ocean to host the cluster while I built it out. Of course, this will work with any hosting provider (even internal to your data center using something like Backblaze storage pods – I chose Digital Ocean because they have hourly pricing; Chunk Host is a “better” deal if you don’t care about hourly pricing). In many ways, this how-to is a response to the rather hackish (though quite functional) setup I threw together to offer file storage in an otherwise-overloaded lab several years back. Make sure you have “private networking” (or equivalent) enabled for your VMs – you don’t want to be sharing out your MooseFS storage to just anyone 🙂

Also, as I’ve done in other how-tos on this blog, I’m using CentOS Linux for my distro of choice (because I’m an RHEL guy, and it shortens my learning curve).

With the introduction out of the way, here’s what I did – and what you can do, too:

Preliminaries

  • spin up at least 3 systems (4 would be better; for purposes of this how-to, low-resource machines (512MB RAM, 20GB storage) were used – use the biggest-storage machines you can for Chunk Servers, and the biggest-RAM machine(s) you can for the Master(s))
    • 1 for the MooseFS Master Server (if using Pro, you want at least 2)
    • (1 or more for metaloggers – only for the Community edition, and not required)
    • 2+ for MooseFS Chunk Servers (two is the minimum needed to keep data available in the event of a Chunk Server failure)
    • 1 for Pydio (while this might be able to co-reside with the MooseFS Master, this tutorial uses a fully-separate / tiered approach)
  • make sure the servers are either all in the same data center, or that you’re not paying for inter-DC traffic
  • make sure you have “private networking” (or equivalent) enabled so you do not share your MooseFS mounts to the world
  • make sure you have some swap space on every server (it may not matter, but I prefer “safe” to “sorry”) – I covered how to do this in the etherpad tutorial, and a minimal sketch follows this list
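
If you need to add swap in a hurry, here is a minimal sketch using a 1GB swap file (the size is an assumption – scale it to your machine):

    # create, secure, and activate a 1GB swap file
    dd if=/dev/zero of=/swapfile bs=1M count=1024
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    # make it persistent across reboots
    echo '/swapfile swap swap defaults 0 0' >> /etc/fstab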

MooseFS Master

  • install MooseFS master
    • curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS && curl "http://ppa.moosefs.com/MooseFS-stable-rhsysv.repo" > /etc/yum.repos.d/MooseFS.repo && yum -y install moosefs-master moosefs-cli
  • make changes to /etc/mfs/mfsexports.cfg
    • # Allow everything but "meta".
    • #* / rw,alldirs,maproot=0
    • 10.132.0.0/16 / rw,alldirs,maproot=0
  • add hostname entry to /etc/hosts
    • 10.132.41.59 mfsmaster
  • start master (if it refuses on first start, see the note after this list)
    • service moosefs-master start
  • see how much space is available to you (none to start)
    • mfscli -SIN
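
A hedged note on first start: depending on the package version, the master may refuse to come up because no metadata file exists yet – the MooseFS install guide has you seed it from the shipped empty template. And it’s worth making the service survive a reboot while you’re at it:

    # seed the metadata file from the empty template shipped with the package
    cp /var/lib/mfs/metadata.mfs.empty /var/lib/mfs/metadata.mfs
    # start on boot (CentOS 6 sysv init)
    chkconfig moosefs-master on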

MooseFS Chunk(s)

  • install MooseFS chunk
    • curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS && curl "http://ppa.moosefs.com/MooseFS-stable-rhsysv.repo" > /etc/yum.repos.d/MooseFS.repo && yum -y install moosefs-chunkserver
  • add the mfsmaster line from previous steps to /etc/hosts
    • cat >> /etc/hosts
    • 10.132.41.59 mfsmaster
    • <ctrl>-d
  • make your share directory (on real hardware you’d back this with a dedicated disk – see the sketch after this list)
    • mkdir /mnt/mfschunks
  • add your freshly-made directory to the end of /etc/mfs/mfshdd.cfg, with a size you want to share
    • /mnt/mfschunks 15GiB
  • start the chunk
    • service moosefs-chunkserver start
  • on the MooseFS master, make sure your new space has become available
    • mfscli -SIN
  • repeat for as many chunks as you want to have
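
On anything beyond a toy build, you’d back /mnt/mfschunks with a dedicated disk rather than the root volume. A sketch, assuming the extra disk shows up as /dev/sdb (the device name and filesystem are assumptions):

    mkfs.ext4 /dev/sdb
    mkdir -p /mnt/mfschunks
    mount /dev/sdb /mnt/mfschunks
    echo '/dev/sdb /mnt/mfschunks ext4 defaults 0 0' >> /etc/fstab
    # the chunkserver runs as the mfs user, which needs to own the directory
    chown -R mfs:mfs /mnt/mfschunks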

Pydio / MooseFS Client

  • install MooseFS client
    • curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS && curl "http://ppa.moosefs.com/MooseFS-stable-rhsysv.repo" > /etc/yum.repos.d/MooseFS.repo && yum -y install moosefs-client
  • add the mfsmaster line from previous steps to /etc/hosts
    • cat >> /etc/hosts
    • 10.132.41.59 mfsmaster
    • <ctrl>-d
  • mount the MooseFS share somewhere Pydio will be able to get to it later (we’ll use a bind mount for that in a while) – the mount point has to exist first
    • mkdir -p /mnt/mfs && mfsmount /mnt/mfs -H mfsmaster
  • install Apache and PHP
    • yum -y install httpd
    • yum -y install php-common
      • you need more than this – Pydio’s setup wizard will flag whatever is missing, which on CentOS 6 generally means php, php-gd, php-mbstring, and php-xml at minimum; I took a messier route and installed Nginx then uninstalled it, which brought in all the PHP stuff I needed (and probably stuff I didn’t)
  • modify php.ini to support large files (Pydio is exclusively a webapp for now)
    • memory_limit = 384M
    • post_max_size = 256M
    • upload_max_filesize = 200M
  • grab Pydio
    • you can use either the yum method or the manual one – I picked manual
    • curl -O http://hivelocity.dl.sourceforge.net/project/ajaxplorer/pydio/stable-channel/6.0.6/pydio-core-6.0.6.tar.gz
      • URL correct as of publish date of this blog post
  • extract Pydio tgz to /var/www/html
  • move everything in /var/www/html/data to /mnt/mfs
  • bind mount /mnt/mfs to /var/www/html/data (a sketch for making both mounts survive a reboot follows this list)
    • mount --bind /mnt/mfs /var/www/html/data
  • set ownership of all Pydio files to apache:apache
    • cd /var/www/html && chown -R apache:apache *
    • note – this will give an error such as the one in this screenshot: [Screen Shot 2015-04-20 at 16.32.48]
    • this is “ok” – but don’t leave it like this (good enough for a how-to, not production)
  • start Pydio wizard
  • fill in the forms as they say they should be (admin, etc)
    • I picked “No DB” for this tutorial – you should use a database if you want to roll this out “for real”
  • log in and start using it
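
Neither the mfsmount nor the bind mount above survives a reboot on its own. A sketch of the /etc/fstab entries to make both persistent – the fuse line follows the pattern in the MooseFS docs, but treat the exact options as assumptions to verify against your version:

    mfsmount /mnt/mfs fuse defaults,mfsmaster=mfsmaster,_netdev 0 0
    /mnt/mfs /var/www/html/data none bind 0 0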

[screenshot: Screen Shot 2015-04-20 at 17.07.51]

Now what?

Why would you want to do this? Maybe you need an in-house shared/shareable storage environment for your company / organization / school / etc. Maybe you’re just a geek who likes to play with new things. Or maybe you want to get into the reselling business, and being able to offer a redundant, clustered, on-demand cloud storage service is something you, or your customers, would find profitable.

Caveats of the above how-to:

  • nothing about this example is “production-level” in any manner (I used Digital Ocean droplets at the very small end of the spectrum (512M memory, 20G storage, 1 CPU))
    • there is a [somewhat outdated] sizing guide for ownCloud (pdf) that shows just how much it wants for resources in anything other than a toy deployment
    • Pydio is pretty light on its basic requirements – which also helped this how-to out
    • while MooseFS is leaner when it comes to system requirements, it still shouldn’t be nerfed by being stuck on small machines
  • you shouldn’t be managing hostnames via /etc/hosts – you should be using DNS
    • DNS settings are far more than I wanted to deal with in this tutorial
  • security has, intentionally, been ignored in this how-to
    • just like verifying your inputs is ignored in the vast majority of programming classes, I ignored security considerations (other than putting the MooseFS servers on non-public-facing IPs)
    • don’t be dumb about security – it’s a real issue, and one you need to plan in from the very start
      • DO encrypt your file systems
      • DO ensure your passwords are complex (and used rarely)
      • DO use key-based authentication wherever possible
      • DON’T be naive
  • you should be on the MooseFS mailing list and the Pydio forum
    • the communities are excellent, and have been extremely helpful to me, even as a lurker
  • I cannot answer more than basic questions about any of the tools used herein
  • why I picked what I picked and did it the way I did
    • I picked MooseFS because it seems the easiest to run
    • I picked Pydio because the ownCloud docs were borked for the 8.0x release on CentOS 6 – and it seems better than alternatives I could find (Seafile, etc) for this tutorial
    • I wanted to use ownCloud because it has clients for everywhere (iOS, Android, web, etc)
    • I have no affiliation with either MooseFS or Pydio beyond thinking they’re cool
    • I like learning new things and showing them off to others

Final thoughts

Please go make this better and show off what you did that was smarter, more efficient, cheaper, faster, etc. Turn it into something you could deploy as an AMI on AWS. Or Docker containers. Or something I couldn’t imagine. Everything on this site is licensed under the CC BY 3.0 – have fun with what you find, make it awesomer, and then tell everyone else about it.

I think I’ll give LizardFS a try next time – their architecture is, diagrammatically, identical to the “pro” edition of MooseFS. And it’d be fun to have experience with more than one solution.

on-demand, secure, distributed storage

In follow-up to a friend’s blog post on TrueCrypt, and in conjunction with some previous investigations and interests of mine, I am wondering how difficult it would be to run a tool like MooseFS in conjunction with TrueCrypt to provide a Wuala-like service as a plausibly-deniable data haven a la Cryptonomicon.

testing hardware performance differences

I’ve been attempting to understand how hard disk cache sizes affect performance recently (and whether it’s worth shelling out about twice as much for a drive with 128MB of cache vs one with just 64MB).

What would be the best way to personally investigate the performance differences to help determine which is better (if there’s even a noticeable difference)?

What benchmarking tools are best suited for such a task?
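
One place I would probably start is fio – a minimal sketch, where /dev/sdX is a placeholder for the drive under test; direct=1 bypasses the OS page cache so the drive (and its onboard cache) actually does the work:

    # sustained 4k random reads against the raw device, read-only for safety
    fio --name=randread --filename=/dev/sdX --readonly \
        --rw=randread --bs=4k --iodepth=16 --ioengine=libaio \
        --direct=1 --runtime=60 --time_based

Running the identical job against both drives and comparing IOPS / latency should show whether the bigger cache earns its price; repeating with a write workload (on a scratch disk!) would exercise the write-back cache, where the larger buffer should matter most.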

#moosefs @smartfile – distributed, redundant file management (#olf2013 talk)

As promised, some follow-up to OLF.

Chris from SmartFile gave a great talk at OLF this year on MooseFS and how SmartFile leverages it to handle their rapidly-growing storage infrastructure.

Specifically, he compared it to Ceph and GlusterFS.

In short, MooseFS provides better configurability than either Ceph or GlusterFS, runs with lower overhead, and provides more flexibility for their environment.

One of the other cool things Chris said was they adopted the Backblaze Storage Pod design for their “green monster” storage units. Nice to see open source being leveraged not just in software, but hardware, too 🙂

p2p cloud storage

I have yet to find a peer-to-peer file storage system.

You’d think that with all the p2p and cloud services out there, there’d be a way of dropping files into a virtual folder and having them show up around the network (encrypted, of course) – replicated on some kind of RAID-over-WAN methodology.

I’m thinking Cryptonomicon’s data haven, but done on the cheap (and not in a dedicated place like the Sultanate of Kinakuta).

I’d use it if it were built.

digital preservation

I have been an active member on the Stack Exchange family of sites [nearly] since StackOverflow started a few years ago.

Recently a new proposal has been made for Digital Preservation. Many of the proposed questions are interesting (including one of mine) – and I would strongly encourage anyone interested in the topic to check it out.

The topic has re-sparked a question I have had for a long time – why is it important to archive data?

Not that I think it’s inherently bad to hold onto digital information for some period of time – but what is the impetus for storing it more-or-less forever?

In tech popculture we have services like Google’s gmail which starts users at a mind-boggling 7+ gigabytes of storage! For email! Who has 7GB of email that needs to be stored?! For a variety of reasons, I hold onto all of my work email for the duration of my employment with a given company – you never know when it might be useful (and it turns out it’s useful fairly frequently). But personal email? Really? Who needs either anywhere near that much, or to hold onto it for that long? And those few people who arguably DO need that much, or to keep it forever, can afford to store it somewhere safely.

I think there is a major failing in modern thinking that says we have to save everything we can just because we can. Is storage “cheap”? Absolutely. But the hoard / “archive” mentality that pervades modern culture needs to be combated heavily. We, as a people, need to learn how to forget – and how to remember properly. Our minds are, more and more, becoming “googlized”. We have decided it’s more important to know how to find what we want rather than to know it. And for some things, this is good:

If you are a machinist, is it better to know how to reverse-thread the inside of a titanium pipe end-cap, or to go look up what kind of tooling and lathe settings you will need when you get around to making that part? I suppose that if all you ever do in life is mill reverse-threaded titanium pipe end-caps, you should probably commit that piece of information to memory.

But we need to remember to forget, too:

when you need to make two of these things. Ever. In your entire life. In the entire history of every company you ever work for. Well, then I would say it’s better to go look up that particular datum when you need it. And then promptly forget it.

The historical value, interest, and amazing work contained in the “Domesday Book” has been of immense value to historians, archivists, politicians, and the general public. Various and sundry public records (census data, property deeds, genealogies, etc) are fantastic pieces to hold onto – and to make as available and accessible as possible.

Making various other archives available publicly is great too (eg the NYO&WRHS) – and I applaud each and every one of those efforts; indeed, I contribute to them whenever I can.

I continuously wonder, though, how many of these records and artifacts truly need to be saved – certainly it is true of physical artifacts that preservation is important, but how many copies of the first printing of Moby Dick do we need (to pick an example)?

I don’t know what the best answer is to digital hoarding, but preservation is a topic which needs to be considered carefully.

storage strategies – part 4

Last time I talked about storage robustifiers.

In the context of a couple applications with which I am familiar, I want to discuss ways to approach balancing storage types and allocations.

Storage Overview

Core requirements of any modern server, from a storage standpoint, are the following:

  • RAM
  • swap
  • Base OS storage
  • OS/application log storage
  • Application storage

Of course, many more elements could be added – but these are the ones I am going to consider today. From a cost perspective, hardware is almost always the least expensive part of an application deployment – licensing, development, maintenance, and other non-“physical” costs will typically far outweigh the expenses of hardware.

RAM

RAM is cheap, comparatively. Any modern OS will lap up as much memory as it is given, so I always try to err on the side of generous.

swap

After having found an instance where Swap Really Mattered™, I always follow the Red Hat guidelines which state that swap space should be equal to installed RAM plus 2 gigabytes. For example, if 16GB of RAM is installed, swap should at least equal 18GB. Whether swap should be on a physical disk or in a logical volume is up for debate, but do not chintz on swap! It is vital to the healthy operation of almost all modern systems!

Base OS

This will vary on a per-platform basis, but a common rule-of-thumb is that Linux needs about 10GB for itself. Windows 2008 R2 requests a minimum of 20GB, but 40GB is substantially better.

OS/application logs

Here is another wild variable – though most applications have pretty predictable log storage requirements. For example, HP’s Server Automation (HPSA) tool will rarely exceed 10GB in total log file usage. Some things, like Apache, may have varying log files depending on how busy a website is.

Application

Lastly, and definitely most importantly, is the discussion surrounding the actual space needed for an application to be installed and run. On two ends of the spectrum, I will use Apache and HPSA for my examples.

The Apache application only requires a few dozen megabytes to install and run. The content served up by Apache can, of course, be a big variable – a simple, static website might only use a few dozen megabytes, whereas a big, complex website (like StackOverflow) might use a few hundred gigabytes (in total, with any dynamically-generated content which might be in a database, or similar)*.

The best way to address varying storage needs, in my opinion, is to use a robustifying tool like LVM – take storage presented to the server, and conglomerate it into a single mount point (perhaps /var/www/html) so that as content needs change, it can be grown transparently to the website.
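
A minimal sketch of that conglomeration with LVM (the device and volume names are assumptions – substitute whatever your server actually presents):

    # pool two disks into one volume group, give it all to a single volume
    pvcreate /dev/sdb /dev/sdc
    vgcreate vg_www /dev/sdb /dev/sdc
    lvcreate -n lv_html -l 100%FREE vg_www
    mkfs.ext4 /dev/vg_www/lv_html
    mount /dev/vg_www/lv_html /var/www/html

    # when content outgrows the volume: add another disk and grow online
    vgextend vg_www /dev/sdd
    lvextend -r -l +100%FREE /dev/vg_www/lv_html

The -r flag on lvextend resizes the filesystem in the same step, so the website never has to come down for a storage add.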

Likewise, with HPSA, there are several base storage needs – for Oracle, initial software content, the application itself, etc. Picking up on a previous post on bind mounts, I think it a Very Good Thing™ to present a mass of storage to a single initial mount point, like /apps, and then put several subdirectories in place to hold the “actual” application. Storage usage for the “variables” of HPSA – Software Repository, OS Media, Model Repository (database) – is very hard to predict, but a base guideline is that you need 100GB in total to start.

Choosing Storage Types

My recommendations

This is how I like to approach storage allocation on a physical server (virtual machines are a whole other beast, which I’ll address in a future post) for HP Server Automation:

Base OS

I firmly believe this should be put on local storage – ideally a pair of mirror-RAIDed 73GB drives (these could be SSDs to accelerate boot time, but otherwise the OS, per se, should not see much “use”). You could easily get away with 36GB drives, but since drives are cheap, using slightly more than you “need” is fine.

swap

Again, following the Red Hat guidelines (plus my patented fudge growth factor), ideally I want swap to be on either a pair of 36GB or a pair of 73GB drives – not RAIDed (neither striping nor mirroring swap makes a great deal of sense). Yes, this means you should create a pair of swap partitions and present the whole shebang to the OS – a sketch follows.
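
What that might look like in /etc/fstab (device names are assumptions) – giving both partitions equal priority lets the kernel interleave pages across the two spindles, no RAID involved:

    /dev/sdc1 swap swap pri=10 0 0
    /dev/sdd1 swap swap pri=10 0 0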

OS/application logs

Maybe this is a little paranoid, but I like to have at least 30GB for log space (/var/log). I view logs as absolutely vital in the monitoring and troubleshooting arenas, so don’t chintz here!

Application

HPSA has four main space hogs, so I’ll talk about them as subheadings.

Oracle

It is important that the database has plenty of space – start at 200GB (if possible), and present it as a logically-managed volume group, preferably made up of one or more [growable] LUNs from a SAN.

Note: Thin provisioning is a perfectly acceptable approach to use, by the way (thin provisioning presents space as “available” but not yet “allocated” from the storage device to the server).

Core application

The application really doesn’t grow that much over time (patches and upgrades do cause growth, but they are pretty well-defined).

Since this is the case, carve 50-60GB and present it as a [growable] LUN via LVM to the OS.

OS Media

Depending on data retention policies, number of distinct OS flavors you need to deploy, and a few other factors, it is a Good Idea™ to allocate a minimum of 40GB for holding OS media (raw-copied content from vendor-supplied installation ISOs). RHEL takes about 3.5GB per copy, and Windows 2008 R2 takes about the same. Whether this space is presented as an NFS share from a NAS, or as a [growable] LUN under an LVM group from a SAN isn’t vitally-important, but having enough space most certainly is.

Software Library

This is truly the biggest wildcard of them all – how many distinct packages do you plan to deploy? How big are they? How many versions need to be kept? How many target OSes will you be managing?

I prefer to start with 50GB available to the Library. But I also expect that usage to grow rapidly once the system is in live use – Software Libraries exceeding 300GB are not uncommon in my field. As with the OS Media discussion, it isn’t vitally-important whether this space is allocated from a NAS or a SAN, but it definitely needs to be growable!

Closing comments (on HPSA and storage)

If separate storage options are not available for the big hogs of SA, allocating one big LVM volume (made up of LUNs and/or DAS volumes) and then relying on bind mounts is a great solution – it avoids worrying about any given chunk of the tool exceeding its bounds too badly, especially if other parts aren’t being used as heavily as anticipated. A sketch follows.
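
A sketch of that layout – the bind-mount targets here are placeholders, not HPSA’s actual directory names, so map them to wherever each component really lives:

    # one big LVM volume mounted at /apps, subdirectories bind-mounted
    # to wherever each component expects its storage
    mkdir -p /apps/oracle /apps/osmedia /apps/swlib
    mount --bind /apps/oracle  /example/oracle/path
    mount --bind /apps/osmedia /example/osmedia/path
    mount --bind /apps/swlib   /example/swlib/path

Every component then draws from the same pool, so a hog in one area simply consumes free space the others weren’t using.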


*Yes, yes, I know – once you hit a certain size, the presentation and content layers should be split into separate systems. For purposes of this example, I’m leaving it all together.