antipaucity

fighting the lack of good ideas

all hail the thunderstorm!

Got our first hail of the year today – pea sized, and not much (thankfully) – but it’s here.

owncloud vs pydio – more diy cloud storage

Last week I wrote a how-to on using Pydio as a front-end to a MooseFS distributed data storage cluster.

The big complaint I had while writing that was that I wanted to use ownCloud, but it doesn’t Just Work™ on CentOS 6*.

After finishing the tutorial, I decided to do some more digging – because ownCloud looks cool. And because it bugged me that it didn’t work on CentOS 6.

What I found is that ownCloud 8 doesn’t work on CentOS 6 (at least not easily).

The simple install guide and process really is about version 8, and the last one that can be speedy-installed is 7. And as everyone knows, major version releases often make major changes in how they work. This appears to be very much the case with ownCloud going from 7 to 8.

In fact, the two pages needed for installing ownCloud are so easy to follow, I see no reason to copy them here. It’s literally three shell commands followed by a web wizard. It’s almost too easy.

You need to have MySQL/MariaDB installed and ready to accept connections (or use SQLite) – make a database, user, and give the user perms on the db. And you need Apache installed and running (along with PHP – but yum will manage that for you).

If you’re going to use MooseFS (or any other similar tool) for your storage backend to ownCloud, be sure, too, to bind mount your MFS mount point back to the ownCloud data directory (by default it’s /var/www/html/owncloud/data). Note: you could start by using local storage for ownCloud, and only migrate to a distributed setup later.
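
A minimal sketch of that bind mount, assuming the MooseFS client has already mounted the cluster at /mnt/mfs (that path is my assumption; the data directory is the default mentioned above). The function name and the DRY_RUN switch are illustrative only, not part of ownCloud or MooseFS:

```shell
#!/bin/sh
# bind_data_dir SRC DST
# Bind mounts SRC (e.g. an MFS mount point) onto DST (e.g. the ownCloud data dir).
# When DRY_RUN is set to 1 it only prints the command it would run.
bind_data_dir() {
    src=$1; dst=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "mount --bind $src $dst"
    else
        mount --bind "$src" "$dst"
    fi
}

# Dry run first; unset DRY_RUN and run as root to do it for real.
DRY_RUN=1
bind_data_dir /mnt/mfs /var/www/html/owncloud/data
```

A bind mount made this way does not survive a reboot, so you would want to repeat it at boot time in whatever way your distro prefers.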

Pros of Pydio

  • very little futzing needed to make it work with CentOS 6
  • very clean user management
  • very clean webui
  • light system requirements (doesn’t even require a database)

Pros of ownCloud

  • apps available for major platforms (iOS, Android, desktop)
  • no futzing needed to work with CentOS 7
  • very clean user management
  • clean webui

Cons of Pydio

  • no interface except the webui

Cons of ownCloud

  • needs a database
  • heavier system requirements
  • doesn’t like CentOS 6

What about other cloud environments like Seafile? I like Seafile, too. Have it running, in fact. Would recommend it – though I think there are better options now (including ownCloud & Pydio).


*Why do I keep harping on the CentOS 6 vs 7 support / ease-of-use? Because CentOS / RHEL 7 is different from previous releases. I covered that it was different for the Blue Grass Linux User Group a few months ago. Yeah, I know I should be embracing the New Way™ of doing things – but like most people, I can be a technical curmudgeon (especially humorous when you consider I work in a field that is about not being curmudgeonly).

Guess this means I really need to dive into the new means of doing things (mostly the differences in how services are managed) – fortunately, the Fedora Project put together this handy cheatsheet. And Digital Ocean has a clew of tutorials on basic sysadmin things – one I used for this comparison was here.

jump start your brain by doug hall

I’m happy I didn’t pay for this copy of Jump Start Your Brain.

I’m saddened someone else did in order to give it to me.

The core of Doug Hall’s creative self-help book from 1996 is decent: get outside yourself, remember what it’s like to be a kid, have fun, don’t take yourself too seriously, and be willing to take calculated risks.

The problem is that summary could be said of pretty much any 3-5 page group of the book, and the rest of the pages seem to be filled with text, quotes, and graphics to show you that you can’t be effectively creative if you’re stagnant in your thinking.

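Save yourself the trouble of buying (or even reading) this book, and instead take its core advice: get outside yourself, remember what it’s like to be a kid, have fun, don’t take yourself too seriously, and be willing to take calculated risks.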

Maybe version 2.0 is better? I dunno. Not really psyched to find out.

But the blog looks nice.

steam by andrea sutcliffe

Andrea Sutcliffe’s book Steam: The Untold Story of America’s First Great Invention was a pure joy to read. This is the second review I’m writing with my “new” system, and I hope you find this book as interesting as I have.

In 1784, James Rumsey designed a boat that could, by purely mechanical means, move its way upstream. What he devised was truly brilliant: imagine a catamaran or pontoon boat with a platform across the two hulls. Anchored to the platform is a waterwheel. The waterwheel dips into the river, and is connected via a linkage to poles that push the boat against the current like a Venetian Gondola.

Why did he develop such a device? Because at the time, shipping by barge etc was incredibly simple downstream – you load-up the barge, give it a small crew, and float downriver. But there was no way of mechanically returning the vessel upstream (without using sail power, which can be fickle, and which takes up a lot of otherwise-usable cargo area). So barges and shipping vessels tended to be crudely made so they would only ever go downstream – at their destination they’d be turned into building materials. And the crews would have to return on foot. To put this in perspective, it took about 4 weeks to float a barge from Pittsburgh down the Ohio to the Mississippi to New Orleans. And it took about 6 months to get home.

Enter the need for reliable mechanical ship propulsion.

Beginning in his teens as a surveyor for the 6th Lord Fairfax, George Washington became enamored with the idea of inland navigation – that is, using streams, canals, rivers, and lakes to transport people and goods instead of the ocean. During his tenure as a surveyor, then an engineer, then a general, he never lost sight of what he viewed as the budding nation’s biggest hurdle to westward expansion – the overwhelmingly high cost of transporting goods from east to west, and vice versa. Along the coast, transport was simple and cheap. But to go far inland made prices exorbitantly high for both consumers and shippers – which made markets hard to tap.

The initial days of the steam wars are proof that ideas are worthless. Stationary steam engines, like those made by Boulton & Watt, were too heavy and inefficient to possibly consider putting on a boat – at any scale. So while the idea of steam-powered travel had been running around folks’ minds for 20+ years by the time Rumsey built his simple mechanical boat, there was no way to practically use it.

What was needed were major improvements on steam engine design and implementation before wider applications for their power could be found. This is where the steamboat wars start to become exciting. Independently, Rumsey and a man named John Fitch (with his business partner) developed the pipe boiler which reduced the amount of water needed for operating an engine for the same power output, increased fuel efficiency, cut heating time, and lightened the engine itself. Traditional steam engines used a pot boiler – effectively a massive tank of water that would be heated in gestalt. As anyone who has ever timed how long it takes to start boiling water in a tea kettle vs a stock pot knows, water is very difficult to heat, and lots of energy is needed to move it even a couple degrees.

The fact is, that one new idea leads to another, that to a third, and so on through a course of time until someone, with whom none of these ideas was original, combines all together, and produces what is justly called a new invention. –Thomas Jefferson

Fascinatingly, Thomas Jefferson was against the idea of patents and copyright law, and likely would have campaigned heavily against it in the Constitutional process had he not been Minister to France. From a letter he wrote years after serving on the first Patent Commission Board:

He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. That ideas should freely spread from one to another over the globe for the moral and mutual instruction of man, and improvement of his condition, seems to have been peculiarly and benevolently designed by nature… Inventions then cannot, in nature, be a subject of property. Society may give an exclusive right to the profits arising from them, as an encouragement to men to pursue ideas which may produce utility, but this may or may not be done, according to the will and convenience of the society.

Contrast this to the efforts of both Fitch and Rumsey, who lobbied for patent boards of some kind (at both the state and federal levels) between the end of the Revolutionary War and the ratification of the United States Constitution.

Sutcliffe’s account of the first “steamboat wars” shows that intellectual property litigation is an expensive, time-consuming, and distracting effort – whose end may or may not have any value.

Progress is not an illusion, it happens, but it is slow and invariably disappointing. –George Orwell

Thornton’s condenser is undoubtedly one of the best calculated to condense without a jet of water, but I conceive the difficulty of getting rid of the air insurmountable .. when [the air] is drove back again by the steam to the cold condenser, it becomes nearly equal to common air in density, and skulks into the bottom of the condenser for security. –John Fitch (describing a new condenser design in 1790)

Based upon the extensive research Ms Sutcliffe has done into the early history and designs of steam engines and their associated mechanical conveyances, an old idea of mine has newly gained plausible validity: that of a steam-powered tank. Back in high school I postulated that both the power-to-weight and power-to-size ratios of steam engines had advanced sufficiently by the late 1850s that, in conjunction with a primitive form of caterpillar track design (which Fitch would have called an “endless chain of feet” (vs an early idea of his to use an “endless chain of paddles”)), the first fully-mechanized war machines could have been built and sent into battle not in WWI, as the first tanks actually were, but instead during the Civil War – 50 years sooner. Leonardo da Vinci had designed a human-powered armored car in the late 15th century. Replacing man power with steam power could have been a logical thing to have done – but no one ever did.

In the availability of men willing to persevere with a possibly “ridiculous” idea, America had an advantage. –Frank D Prager on the early successes of the Industrial Revolution in America.

Fitch and Rumsey took their war to the people in a series of “pamphlets” published over the course of many months. From Sutcliffe’s description of a “pamphlet” in this context, it seems they were the late 18th century version of a sourced blog or op-ed. Ranging from 20 to 50 (or more) pages in length, with affidavits, letters, and histories presented, the pamphlet was the common man’s research or position paper. I suppose they may have been used by others, too – but the context given in Steam shows them used as marketing and propaganda pieces.

He that studies and writes on the improvements of the arts and sciences labours to benefit generations unborn, for it is impossible that his contemporaries will pay any attention to him. –Oliver Evans

It’s the same each time with progress. First they ignore you, then they say you’re mad, then dangerous, then there’s a pause and then you can’t find anyone who disagrees with you. –Tony Benn (British Labour politician)

Seems that’s where Gandhi may have gotten the inspiration for this famous quotation:

First they ignore you, then they laugh at you, then they fight you, then you win.

Or perhaps it was Benn who was inspired by Gandhi. Or maybe they just realized the same thing independently.

hey yahoo! sports – why not always post the magic number for every team?

Since the magic number (and I’ll take the example of baseball, because while I don’t get to watch them much, I do follow the Mets) is so easy to calculate, why not post it on the standings as soon as there have been games played?

This would be a good use of technology relative to baseball (or any sport).

In case you’re wondering, the math for the magic number is as follows:

G + 1 − WA − LB

where

  • G is the total number of games in the season
  • WA is the number of wins that Team A has in the season
  • LB is the number of losses that Team B has in the season

As of today, the magic number for the Mets is 162 + 1 – 12 – 6, or 145.
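
The arithmetic is simple enough to script. Here is a minimal sketch in shell (the function name is mine, not anything Yahoo! provides) that applies the formula above to whatever numbers you feed it:

```shell
#!/bin/sh
# magic_number GAMES WINS_A LOSSES_B
# Prints G + 1 - WA - LB, exactly as in the formula above.
magic_number() {
    games=$1; wins_a=$2; losses_b=$3
    echo $(( games + 1 - wins_a - losses_b ))
}

# The Mets example from above: 162-game season, 12 wins, 6 losses for Team B.
magic_number 162 12 6
```

Once the result drops to zero or below, Team A has clinched.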

why do i use digital ocean?

Besides the fact that I have a referral code, I think Digital Ocean has done a great job of making an accessible, affordable, cloud environment for folks (like me) to spin-up and -down servers for trying new things out.

You can’t beat an average of 55 seconds to get a new server.

There are other great hosting options out there. I know folks who work at and/or use Rackspace. And AWS. Or Chunk Host.

They all have their time and place, but for me, DO has been the best option for much of what I want to do.

Their API is simple and easily-accessed, billing is straight-forward, and you can make your own templates to deploy servers from. For example, I could make a template for MooseFS Chunk servers so I could just add new ones whenever I need them to the cluster.
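
As a rough sketch of what that could look like: the v2 API creates a droplet with a POST to /v2/droplets. Everything specific below is an assumption for illustration: the droplet name, the region and size slugs (which change over time), the numeric image ID (it stands in for your own template/snapshot), and the helper function names. The token is expected in a DO_TOKEN environment variable.

```shell
#!/bin/sh
# droplet_payload NAME REGION SIZE IMAGE_ID
# Prints the JSON body for a "create droplet" request.
# IMAGE_ID is assumed to be the numeric ID of your own snapshot/template.
droplet_payload() {
    printf '{"name":"%s","region":"%s","size":"%s","image":%s}\n' "$1" "$2" "$3" "$4"
}

# create_droplet NAME REGION SIZE IMAGE_ID
# Sends the request; stops with an error if DO_TOKEN is not set.
create_droplet() {
    curl -sS -X POST "https://api.digitalocean.com/v2/droplets" \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${DO_TOKEN:?set DO_TOKEN first}" \
        -d "$(droplet_payload "$@")"
}

# Show the request body without sending anything (placeholder values):
droplet_payload mfs-chunk-03 nyc3 512mb 1234567
```

Calling create_droplet with the same four arguments would actually send the request (and start the billing clock), so it is worth eyeballing the payload first.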

And I can expand/contract servers as needed, too.

create your own clustered cloud storage system with moosefs and pydio

This started-off as a how-to on installing ownCloud. But their own installation procedures don’t work for the 8.0x release and CentOS 6.

Most of you know I’ve been interested in distributed / cloud storage for quite some time.

And that I find MooseFS to be fascinating. As of 2.0, MooseFS comes in two flavors – the Community Edition, and the Professional Edition. This how-to uses the CE flavor, but it’d work with the Pro version, too.

I started with the MooseFS install guide (pdf) and the Pydio quick start steps. And, as usual, I used Digital Ocean to host the cluster while I built it out. Of course, this will work with any hosting provider (even internal to your data center using something like Backblaze storage pods – I chose Digital Ocean because they have hourly pricing; Chunk Host is a “better” deal if you don’t care about hourly pricing). In many ways, this how-to is in response to my rather hackish (though quite functional) need to offer file storage in an otherwise-overloaded lab several years back. Make sure you have “private networking” (or equivalent) enabled for your VMs – don’t want to be sharing-out your MooseFS storage to just anyone 🙂

Also, as I’ve done in other how-tos on this blog, I’m using CentOS Linux for my distro of choice (because I’m an RHEL guy, and it shortens my learning curve).

With the introduction out of the way, here’s what I did – and what you can do, too:

Preliminaries

  • spin-up at least 3 (4 would be better) systems (for purposes of the how-to, low-resource (512M RAM, 20G storage) machines were used; use the biggest [storage] machines you can for Chunk Servers, and the biggest [RAM] machine(s) you can for the Master(s))
    • 1 for the MooseFS Master Server (if using Pro, you want at least 2)
    • (1 or more for metaloggers – only for the Community edition, and not required)
    • 2+ for MooseFS Chunk Servers (minimum required to ensure data is available in the event of a Chunk failure)
    • 1 for Pydio (while this might be able to co-reside with the MooseFS Master – this tutorial uses a fully-separate / tiered approach)
  • make sure the servers are either all in the same data center, or that you’re not paying for inter-DC traffic
  • make sure you have “private networking” (or equivalent) enabled so you do not share your MooseFS mounts to the world
  • make sure you have some swap space on every server (may not matter, but I prefer “safe” to “sorry”) – I covered how to do this in the etherpad tutorial

MooseFS Master

  • install MooseFS master
    • curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS && curl "http://ppa.moosefs.com/MooseFS-stable-rhsysv.repo" > /etc/yum.repos.d/MooseFS.repo && yum -y install moosefs-master moosefs-cli
  • make changes to /etc/mfs/mfsexports.cfg
    • # Allow everything but “meta”.
    • #* / rw,alldirs,maproot=0
    • 10.132.0.0/16 / rw,alldirs,maproot=0
  • add hostname entry to /etc/hosts
    • 10.132.41.59 mfsmaster
  • start master
    • service moosefs-master start
  • see how much space is available to you (none to start)
    • mfscli -SIN

MooseFS Chunk(s)

  • install MooseFS chunk
    • curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS && curl "http://ppa.moosefs.com/MooseFS-stable-rhsysv.repo" > /etc/yum.repos.d/MooseFS.repo && yum -y install moosefs-chunkserver
  • add the mfsmaster line from previous steps to /etc/hosts
    • cat >> /etc/hosts
    • 10.132.41.59 mfsmaster
    • <ctrl>-d
  • make your share directory
    • mkdir /mnt/mfschunks
  • add your freshly-made directory to the end of /etc/mfs/mfshdd.cfg, with a size you want to share
    • /mnt/mfschunks 15GiB
  • start the chunk
    • service moosefs-chunkserver start
  • on the MooseFS master, make sure your new space has become available
    • mfscli -SIN
  • repeat for as many chunks as you want to have
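
If you are repeating the /etc/hosts step on several machines, the interactive cat >> /etc/hosts can be swapped for something scriptable. A minimal sketch (the function name is hypothetical, and the IP is the example one used above) that appends the entry only if the name is not already mentioned in the file:

```shell
#!/bin/sh
# add_host_entry IP NAME [FILE]
# Appends "IP NAME" to FILE (default /etc/hosts) unless NAME already appears in it.
add_host_entry() {
    ip=$1; name=$2; file=${3:-/etc/hosts}
    if grep -qw -- "$name" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Try it on a scratch file first; drop the third argument to edit /etc/hosts (as root).
scratch_hosts=$(mktemp)
add_host_entry 10.132.41.59 mfsmaster "$scratch_hosts"
add_host_entry 10.132.41.59 mfsmaster "$scratch_hosts"   # second call is a no-op
cat "$scratch_hosts"
```

The check is deliberately crude (any line mentioning the name counts, even a comment), which is fine for a how-to but not for much more.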

Pydio / MooseFS Client

  • install MooseFS client
    • curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS && curl "http://ppa.moosefs.com/MooseFS-stable-rhsysv.repo" > /etc/yum.repos.d/MooseFS.repo && yum -y install moosefs-client
  • add the mfsmaster line from previous steps to /etc/hosts
    • cat >> /etc/hosts
    • 10.132.41.59 mfsmaster
    • <ctrl>-d
  • mount MooseFS share somewhere where Pydio will be able to get to it later (we’ll use a bind mount for that in a while)
    • mfsmount /mnt/mfs -H mfsmaster
  • install Apache and PHP
    • yum -y install httpd
    • yum -y install php-common
      • you need more than this, and hopefully Apache grabs it for you – I installed Nginx then uninstalled it, which brought-in all the PHP stuff I needed (and probably stuff I didn’t)
  • modify php.ini to support large files (Pydio is exclusively a webapp for now)
    • memory_limit = 384M
    • post_max_size = 256M
    • upload_max_filesize = 200M
  • grab Pydio
    • you can use either the yum method, or the manual – I picked manual
    • curl -O http://hivelocity.dl.sourceforge.net/project/ajaxplorer/pydio/stable-channel/6.0.6/pydio-core-6.0.6.tar.gz
      • URL correct as of publish date of this blog post
  • extract Pydio tgz to /var/www/html
  • move everything in /var/www/html/data to /mnt/mfs (the MooseFS mount point from the mfsmount step)
  • bind mount /mnt/mfs to /var/www/html/data
    • mount --bind /mnt/mfs /var/www/html/data
  • set ownership of all Pydio files to apache:apache
    • cd /var/www/html && chown -R apache:apache *
    • note – this will give an error such as the one in the following screenshot:
    • [Screen Shot 2015-04-20 at 16.32.48]
    • this is “ok” – but don’t leave it like this (good enough for a how-to, not production)
  • start Pydio wizard
  • fill-in forms as they say they should be (admin, etc)
    • I picked “No DB” for this tutorial – you should use a database if you want to roll this out “for real”
  • log in and start using it
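
The php.ini changes in the steps above can also be scripted rather than made by hand. This is only a sketch: the helper name is made up, and it assumes the usual "key = value" layout of php.ini. It is shown against a scratch file rather than the real one:

```shell
#!/bin/sh
# set_php_ini KEY VALUE FILE
# Replaces an existing uncommented "KEY = ..." line in FILE, or appends one if none exists.
set_php_ini() {
    key=$1; value=$2; file=$3
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        tmp=$(mktemp)
        sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp"
        cat "$tmp" > "$file"    # overwrite in place, keeping the file's owner and mode
        rm -f "$tmp"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Demonstrate on a scratch file with two of the three settings already present.
scratch_ini=$(mktemp)
printf 'memory_limit = 128M\npost_max_size = 8M\n' > "$scratch_ini"
set_php_ini memory_limit 384M "$scratch_ini"
set_php_ini post_max_size 256M "$scratch_ini"
set_php_ini upload_max_filesize 200M "$scratch_ini"
cat "$scratch_ini"
```

Point it at your real php.ini (as root) once you are happy with what it does, and restart Apache afterwards so the new limits take effect.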

Now what?

Why would you want to do this? Maybe you need an in-house shared/shareable storage environment for your company / organization / school / etc. Maybe you’re just a geek who likes to play with new things. Or maybe you want to get into the reselling business, and being able to offer a redundant, clustered, cloud, on-demand type storage service is something you, or your customers, would find profitable.

Caveats of the above how-to:

  • nothing about this example is “production-level” in any manner (I used Digital Ocean droplets at the very small end of the spectrum (512M memory, 20G storage, 1 CPU))
    • there is a [somewhat outdated] sizing guide for ownCloud (pdf) that shows just how much it wants for resources in anything other than a toy deployment
    • Pydio is pretty light on its basic requirements – which also helped this how-to out
    • while MooseFS is leaner when it comes to system requirements, it still shouldn’t be nerfed by being stuck on small machines
  • you shouldn’t be managing hostnames via /etc/hosts – you should be using DNS
    • DNS settings are far more than I wanted to deal with in this tutorial
  • security has, intentionally, been ignored in this how-to
    • just like verifying your inputs is ignored in the vast majority of programming classes, I ignored security considerations (other than putting the MooseFS servers on non-public-facing IPs)
    • don’t be dumb about security – it’s a real issue, and one you need to plan-in from the very start
      • DO encrypt your file systems
      • DO ensure your passwords are complex (and used rarely)
      • DO use key-based authentication wherever possible
      • DON’T be naive
  • you should be on the MooseFS mailing list and the Pydio forum
    • the communities are excellent, and have been extremely helpful to me, even as a lurker
  • I cannot answer more than basic questions about any of the tools used herein
  • why I picked what I picked and did it the way I did
    • I picked MooseFS because it seems the easiest to run
    • I picked Pydio because the ownCloud docs were borked for the 8.0x release on CentOS 6 – and it seems better than alternatives I could find (Seafile, etc) for this tutorial
    • I wanted to use ownCloud because it has clients for everywhere (iOS, Android, web, etc)
    • I have no affiliation with either MooseFS or Pydio beyond thinking they’re cool
    • I like learning new things and showing them off to others

Final thoughts

Please go make this better and show-off what you did that was smarter, more efficient, cheaper, faster, etc. Turn it into something you could deploy as an AMI on AWS. Or Docker containers. Or something I couldn’t imagine. Everything on this site is licensed under the CC BY 3.0 – have fun with what you find, make it awesomer, and then tell everyone else about it.

I think I’ll give LizardFS a try next time – their architecture is, diagrammatically, identical to the “pro” edition of MooseFS. And it’d be fun to have experience with more than one solution.