Archive for the ‘technical’ Category

on-demand, secure, distributed storage – one step closer

In follow-up to a post from 2013, and another from earlier this year, I’ve been working on a pointy-clicky deployable MooseFS + ownCloud environment, built atop encrypted file systems, that you can rent or buy as a service from my company.

I’ve also – potentially – kicked off a new project with Bitnami to add MooseFS to their apps list.

automation is a multiplier

Multipliers. They’re ubiquitous – from ratchet wrenches to fertilizer, blocks-and-tackle to calculators, humans rely on multipliers all the time.

Multipliers are amazing things because they allow an individual to “do more with less” – a single person can build a coral castle with nothing more complex than simple machines. Or move 70 people at 70 miles per hour down an interstate merely by flexing his foot and twitching his arm.

Feats and tasks otherwise impossible become possible due to multipliers.

Automation is a multiplier. Some automation is obviously multiplicative – robots on assembly lines allow car manufacturers to output far more vehicles than they could in the pre-robot era. Even the assembly line itself is an automating force, and a multiplier of how many cars a set number of people can produce in a given time period.

In the ever-more-constrained world of IT that I orbit and transit through – with salary budgets cut or frozen, positions not backfilled, and the demands of end-users (whether internal or external) ever growing – technicians, engineers, project managers, and the like are always expected to do more with the same, or do more with less.

And that is where I, and the toolsets I work with, come into play – in the vital-but-hidden world of automation. Maybe it’s something as mundane as cutting requisition-to-delivery time of a server or service from weeks to hours. Maybe it’s something as hidden as automatically expanding application tiers based on usage demands – and dropping extra capacity when it’s no longer needed (one of the main selling points of cloud computing). The ROI of automation is always seen as a multiplier – because the individual actor is now able to Get Things Done™ and at least appear smarter (whether they are actually any smarter or not is a totally different question).

Go forth and multiply, my friends.

reverse proxying from apache to tomcat

After much hemming and hawing, I was able to get Apache working as a reverse proxy to Tomcat today.

<VirtualHost *:80>
    ServerName domain.com
    ServerAlias www.domain.com
    # pass the original Host header through to the backend
    ProxyPreserveHost on
    # forward all requests under / to the Tomcat app on port 8080
    ProxyPass / http://localhost:8080/path/
    # rewrite Location/redirect headers coming back from the app
    ProxyPassReverse / http://domain.com:8080/path/
</VirtualHost>

That’s all you need (though you can add much more). Note the trailing slashes on the proxy paths – without them, you have no dice.
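
If the proxy modules aren’t already loaded, something like the following should finish the job – a minimal sketch for a stock CentOS/RHEL-style layout (paths and service names are the defaults there; adjust for your distro):

    # confirm mod_proxy and mod_proxy_http are being loaded (the stock httpd.conf usually does this already)
    grep -E 'mod_proxy(_http)?\.so' /etc/httpd/conf/httpd.conf /etc/httpd/conf.d/*.conf
    # sanity-check the config, then reload Apache to pick up the new vhost
    apachectl configtest && service httpd reload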

pydio has clients now

As an update to my recent how-to, I found out from the founder of Pydio that there are dedicated clients now. IOW, you don’t have to use just the WebUI.

I haven’t tried any of them yet, but good to know they’re now there – it makes comparing Pydio and other tools like ownCloud easier.

what level of abstraction is appropriate?

Every day we all work at multiple levels of abstraction.

Perhaps this XKCD comic sums it up best:

[xkcd: “Abstraction” – alt text: “If I'm such a god, why isn't Maru *my* cat?”]

But unless you’re weird and think about these kinds of things (like I do), you probably just run through your life happily interacting at whatever level seems most appropriate at the time.

Most drivers, for example, don’t think about the abstraction they use to interact with their car. Pretty much every car follows the same procedure for starting, shifting into gear, steering, and accelerating/decelerating: you insert a key (or have a fob), turn it (or push a button), move the drive mode selection stick (gear shift, knob, etc), turn a steering wheel, and use the gas or brake pedals.

But that’s not really how you start a car. It’s not really how you select drive mode. It’s not really how you steer, etc.

But it’s a convenient, abstract interface to operate a car. It is one which allows you to adapt rapidly to different vehicles from different manufacturers which operate under the hood* in potentially very different ways.

The problem with any form of abstraction is that it’s just a summary – an interface – to whatever it is trying to abstract away. And sometimes those interfaces leak. You turn the key in your car and it doesn’t start. Crud. What did I forget to do, or is the car broken? Did I depress the brake and clutch pedal? Is it in Park? Did I make sure to not leave the lights on overnight? Did the starter motor seize? Is there gas in the tank? Did the fuel pump quit? These are all thoughts that might run through your mind (hopefully in decreasing order of likelihood and severity) when the simple act of turning the key doesn’t work like you expect.

For a typical computer user, the only time they’ll even begin to care about how their system really works is when they try to do something they expect it to do … and it doesn’t. Just like drivers don’t think about their cars’ need for the fuel injector system to make minute adjustments thousands of times per second, most people don’t think about what it actually takes to go from typing “www.google.com” in their browser bar to getting the website returned (or how their computer goes from off to “ready to use” after pushing the power button).

Automation provides an abstraction of manual processes (be it furniture making or tier 1 operations run book scenarios). And abstractions are good things … except when they leak (or outright break).

Depending on your level of engagement, the abstraction you need to work with will differ – but knowing that you’re at some level of abstraction (and, ideally, which level) is vital to being the most effective at whatever your role is.

I was asked recently how a presentation on the benefits of automation would vary based on audience. The possible audiences given in the question were: engineer, manager, & CIO. And I realized that when I’ve been asked questions like this before, I’ve never answered them wrong, but I’ve answered them very inefficiently: I have never used abstraction itself to solve the general case of what this question is really getting at. The question is not about whether or not you’re comfortable speaking to any given “level” of customer representative (though that’s important). It is not about verifying you’re not lying about your work history (though that’s also important).

No. That question is about finding out if you really know how to abstract to the proper level (presumably leakier as you go upward) for the specific “type” of person you are talking to.

It is vital to be able to do the “three pitches” – the elevator (30 second), the 3 minute, and the 30 minute. Every one will cover the “same” content – but in very different ways. It’s very much related to the “10/20/30 rule of PowerPoint” that Guy Kawasaki promulgates: “a PowerPoint presentation should have ten slides, last no more than twenty minutes, and contain no font smaller than thirty points.” Or, to quote Winston Churchill, “A good speech should be like a woman’s skirt; long enough to cover the subject and short enough to create interest.”

The answer that epiphanized for me when I was asked that question most recently was this: “I presume everyone in the room is ‘as important’ as the CIO – but everyone gets the same ‘sales pitch’ from me: it’s all about ROI. The ‘return’ on ‘investment’ is going to look different from the engineer’s, manager’s, or CIO’s perspectives, but it’s all just ROI.”

The exact same data presented at three different levels of abstraction will “look” different, even though it’s conveying the same thing – because the audience’s engagement is going to be at their level of abstraction (though hopefully they understand at least to some extent the levels above (and below) themselves).

A simple example: it currently takes a single engineer 8 hours to perform all of the tasks related to patching a Red Hat server. There are 1000 servers in the datacenter. Therefore it takes 8000 engineer-hours to patch them all.

That’s a lot.

It’s a crazy lot.

But I’ve seen it countless times in my career. It’s why patching can so easily get relegated to a once-a-year (or even less often) cycle. And why so many companies’ basic systems are woefully out-of-date, exposed to known issues. If your patching team consists of 4 people, it’ll take them a year to work through all 8,000 engineer-hours – roughly 2,000 hours apiece – and then they just have to start over again. It’d be like painting the Golden Gate Bridge – an unending process.

Now let’s say you happen to have a management tool available (could be as simple as pssh with preshared SSH keys, or as big and encompassing as Server Automation). And let’s say you have a local mirror of RHN – so you can decide just what, exactly, of any given channel you want to apply in your updates.

Now that you have a central point from which you can launch tasks to all of the Red Hat servers that need to be updated, and a managed source from which each will source their updates, you can have a single engineer launch updates to dozens, scores, even hundreds of servers simultaneously – bringing them all up-to-date in one swell foop. What had taken a single engineer 8 hours is still 8 – but it’s 8 in parallel: in other words, the “same” 8 hours is now touching scores of machines instead of 1 at a time. The single engineer’s efficiency has been boosted by a factor of, say, 40 (let’s stay conservative – I’ve seen this number as high as 1000 or more).
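
As a minimal sketch (assuming preshared SSH keys, a plain-text host list, and a local mirror already configured on each box – the file name here is a placeholder), a single engineer can kick off a whole batch at once:

    # patch every host listed in batch-01.txt in parallel; -t 0 disables the per-host timeout, -i streams output inline
    pssh -h batch-01.txt -l root -t 0 -i 'yum -y update'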

Instead of taking 8,000 engineer-hours to update all 1,000 servers, it now takes only 200. Your 4-engineer patching team can now complete its update cycle in well under 2 weeks. What had taken a full year is now measured in days or weeks.

The “return on investment” at the abstraction level of the engineer is that each has been “given back” 1,900 hours a year to work on other things (which helps make them promotable). The team’s manager sees an ROI of more than 90% of his team’s time now being available for new/different tasks (like patching a new OS). The CIO sees an ROI of 7,800 FTE hours no longer being expended – which means the business’ need for expansion, with an associated doubling of the server estate, is now feasible without having to double his patching staff.

Every abstraction is like that – there is a different ROI for a taxi driver on his car “just working” than there is for a hot rodder who’s truly getting under the hood. But it’s still an ROI – one is getting his return by being able to ferry passengers for pay, and the other by souping-up his ride to be just that little (or lot) bit better. The ROI of a 1% fuel economy improvement by the fuel injector system being made incrementally smarter in conjunction with a lighter engine block might only be measured in cents per hour driving – but for FedEx, that will be millions of dollars a year in either unburned fuel, or additional deliveries (both of which are good for their bottom line).

Or consider the abstraction of talking about financial statements (be they for companies or governments) – they [almost] never list revenues and expenditures down to the penny. Not because they’re being lazy, but because the scale of values being reported does not lend itself well to such mundane thinking. When a company like Apple has $178 billion in cash on hand, no one is going to care if it’s really $178,000,102,034.17 or $177,982,117,730.49. At that scale, $178 billion is a close-enough approximation to reality. And that’s what an abstraction is – it is an approximation to the reality being expressed down one level. It’s good enough to say that you start your car by turning the key – if you’re not an automotive engineer or mechanic. It’s good enough to approximate the US Federal Budget at $3.9 trillion or maybe $3900 billion (whether it should be that high is a totally different topic). But it’s not a good approximation to say $3,895,736,835,150.91 – it may be precise, but it’s not helpful.

I guess that means the answer to the question I titled this post with is, “the level of abstraction appropriate is directly related to your ‘function’ in relation to the system at hand.” The abstraction needs to be helpful – the minute it is no longer helpful (by being either too approximate, or too precise), it needs to be refined and focused for the audience receiving it.


*see what I did there?

may 11 bglug meeting 6:30p at beaumont branch: topic – freeipa

We will be meeting at the Beaumont Library Branch at 6:30p on 11 May.

Our speaker is the LUG’s own Nathaniel McCallum, one of the FreeIPA maintainers – and all-around nice guy.

Come out and support the LUG, learn something new, and meet cool people.

owncloud vs pydio – more diy cloud storage

Last week I wrote a how-to on using Pydio as a front-end to a MooseFS distributed data storage cluster.

The big complaint I had while writing that was that I wanted to use ownCloud, but it doesn’t Just Work™ on CentOS 6*.

After finishing the tutorial, I decided to do some more digging – because ownCloud looks cool. And because it bugged me that it didn’t work on CentOS 6.

What I found is that ownCloud 8 doesn’t work on CentOS 6 (at least not easily).

The simple install guide and process really targets version 8, and the last release that can be speedy-installed on CentOS 6 is 7. And as everyone knows, major version releases often make major changes in how they work. This appears to be very much the case with ownCloud going from 7 to 8.

In fact, the two pages needed for installing ownCloud are so easy to follow, I see no reason to copy them here. It’s literally three shell commands followed by a web wizard. It’s almost too easy.

You need to have MySQL/MariaDB installed and ready to accept connections (or use SQLite) – make a database, user, and give the user perms on the db. And you need Apache installed and running (along with PHP – but yum will manage that for you).
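
For example (a rough sketch – the database name, user, and password are placeholders of mine, not anything ownCloud mandates):

    mysql -u root -p -e "CREATE DATABASE owncloud;
      CREATE USER 'owncloud'@'localhost' IDENTIFIED BY 'pick-a-real-password';
      GRANT ALL PRIVILEGES ON owncloud.* TO 'owncloud'@'localhost';
      FLUSH PRIVILEGES;"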

If you’re going to use MooseFS (or any other similar tool) for your storage backend to ownCloud, be sure, too, to bind mount your MFS mount point back to the ownCloud data directory (by default it’s /var/www/html/owncloud/data). Note: you could start by using local storage for ownCloud, and only migrate to a distributed setup later.
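
Something along these lines works (assuming your MooseFS share is mounted at /mnt/mfs – that path is my assumption, not an ownCloud default):

    # point ownCloud's data directory at the distributed storage
    mount --bind /mnt/mfs /var/www/html/owncloud/data
    # and make it survive reboots
    echo "/mnt/mfs  /var/www/html/owncloud/data  none  bind  0 0" >> /etc/fstab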

Pros of Pydio

  • very little futzing needed to make it work with CentOS 6
  • very clean user management
  • very clean webui
  • light system requirements (doesn’t even require a database)

Pros of ownCloud

  • apps available for major mobile platforms (iOS, Android) and desktop
  • no futzing needed to work with CentOS 7
  • very clean user management
  • clean webui

Cons of Pydio

  • no interface except the webui

Cons of ownCloud

  • needs a database
  • heavier system requirements
  • doesn’t like CentOS 6

What about other cloud environments like Seafile? I like Seafile, too. Have it running, in fact. Would recommend it – though I think there are better options available now (including ownCloud & Pydio).


*Why do I keep harping on the CentOS 6 vs 7 support / ease-of-use? Because CentOS / RHEL 7 is different from previous releases. I covered that it was different for the Blue Grass Linux User Group a few months ago. Yeah, I know I should be embracing the New Way™ of doing things – but like most people, I can be a technical curmudgeon (especially humorous when you consider I work in a field that is about not being curmudgeonly).

Guess this means I really need to dive into the new means of doing things (mostly the differences in how services are managed) – fortunately, the Fedora Project put together this handy cheatsheet. And Digital Ocean has a slew of tutorials on basic sysadmin things – the one I used for this comparison is here.

why do i use digital ocean?

Besides the fact that I have a referral code, I think Digital Ocean has done a great job of making an accessible, affordable, cloud environment for folks (like me) to spin-up and -down servers for trying new things out.

You can’t beat an average of 55 seconds to get a new server.

There are other great hosting options out there. I know folks who work at and/or use Rackspace. And AWS. Or Chunk Host.

They all have their time and place, but for me, DO has been the best option for much of what I want to do.

Their API is simple and easily-accessed, billing is straight-forward, and you can make your own templates to deploy servers from. For example, I could make a template for MooseFS Chunk servers so I could just add new ones whenever I need them to the cluster.
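
For instance, creating a droplet is one authenticated POST against their v2 API (a rough sketch – the token, droplet name, region, size, and image slugs here are placeholders):

    curl -X POST "https://api.digitalocean.com/v2/droplets" \
      -H "Authorization: Bearer $DO_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"name":"mfs-chunk-03","region":"nyc3","size":"512mb","image":"centos-6-5-x64"}'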

And I can expand/contract servers as needed, too.

create your own clustered cloud storage system with moosefs and pydio

This started off as a how-to on installing ownCloud – but their own installation procedure doesn’t work for the 8.0x release on CentOS 6.

Most of you know I’ve been interested in distributed / cloud storage for quite some time.

And that I find MooseFS to be fascinating. As of 2.0, MooseFS comes in two flavors – the Community Edition, and the Professional Edition. This how-to uses the CE flavor, but it’d work with the Pro version, too.

I started with the MooseFS install guide (pdf) and the Pydio quick start steps. And, as usual, I used Digital Ocean to host the cluster while I built it out. Of course, this will work with any hosting provider (even internal to your data center using something like Backblaze storage pods – I chose Digital Ocean because they have hourly pricing; Chunk Host is a “better” deal if you don’t care about hourly pricing). In many ways, this how-to is in response to my rather hackish (though quite functional) need to offer file storage in an otherwise-overloaded lab several years back. Make sure you have “private networking” (or equivalent) enabled for your VMs – don’t want to be sharing-out your MooseFS storage to just anyone :)

Also, as I’ve done in other how-tos on this blog, I’m using CentOS Linux for my distro of choice (because I’m an RHEL guy, and it shortens my learning curve).

With the introduction out of the way, here’s what I did – and what you can do, too:

Preliminaries

  • spin-up at least 3 (4 would be better) systems (for purposes of the how-to, low-resource (512M RAM, 20G storage) machines were used; use the biggest [storage] machines you can for Chunk Servers, and the biggest [RAM] machine(s) you can for the Master(s))
    • 1 for the MooseFS Master Server (if using Pro, you want at least 2)
    • (1 or more for metaloggers – only for the Community edition, and not required)
    • 2+ for MooseFS Chunk Servers (minimum required to ensure data is available in the event of a Chunk failure)
    • 1 for Pydio (this might be able to co-reside with the MooseFS Master – this tutorial uses a fully-separate / tiered approach)
  • make sure the servers are either all in the same data center, or that you’re not paying for inter-DC traffic
  • make sure you have “private networking” (or equivalent) enabled so you do not share your MooseFS mounts to the world
  • make sure you have some swap space on every server (may not matter, but I prefer “safe” to “sorry”) – I covered how to do this in the etherpad tutorial; a quick sketch also follows this list
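
A minimal swap-file sketch (the size and path are arbitrary choices of mine – adjust to taste):

    dd if=/dev/zero of=/swapfile bs=1M count=1024   # 1 GiB
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    echo "/swapfile  none  swap  sw  0 0" >> /etc/fstab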

MooseFS Master

  • install MooseFS master
    • curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS && curl "http://ppa.moosefs.com/MooseFS-stable-rhsysv.repo" > /etc/yum.repos.d/MooseFS.repo && yum -y install moosefs-master moosefs-cli
  • make changes to /etc/mfs/mfsexports.cfg
    • # Allow everything but “meta”.
    • #* / rw,alldirs,maproot=0
    • 10.132.0.0/16 / rw,alldirs,maproot=0
  • add hostname entry to /etc/hosts
    • 10.132.41.59 mfsmaster
  • start master
    • service moosefs-master start
  • see how much space is available to you (none to start)
    • mfscli -SIN

MooseFS Chunk(s)

  • install MooseFS chunk
    • curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS && curl "http://ppa.moosefs.com/MooseFS-stable-rhsysv.repo" > /etc/yum.repos.d/MooseFS.repo && yum -y install moosefs-chunkserver
  • add the mfsmaster line from previous steps to /etc/hosts
    • cat >> /etc/hosts
    • 10.132.41.59 mfsmaster
    • <ctrl>-d
  • make your share directory
    • mkdir /mnt/mfschunks
  • add your freshly-made directory to the end of /etc/mfs/mfshdd.cfg, with the size you want to share
    • /mnt/mfschunks 15GiB
  • start the chunk
    • service moosefs-chunkserver start
  • on the MooseFS master, make sure your new space has become available
    • mfscli -SIN
  • repeat for as many chunks as you want to have
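
One note on that share directory (an assumption on my part: the stock rhsysv packages run the chunkserver as the mfs user, so it needs to own whatever path you list in mfshdd.cfg):

    chown -R mfs:mfs /mnt/mfschunks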

Pydio / MooseFS Client

  • install MooseFS client
    • curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS && curl "http://ppa.moosefs.com/MooseFS-stable-rhsysv.repo" > /etc/yum.repos.d/MooseFS.repo && yum -y install moosefs-client
  • add the mfsmaster line from previous steps to /etc/hosts
    • cat >> /etc/hosts
    • 10.132.41.59 mfsmaster
    • <ctrl>-d
  • mount the MooseFS share somewhere Pydio will be able to get to it later (we’ll use a bind mount for that in a while)
    • mkdir /mnt/moosefs && mfsmount /mnt/moosefs -H mfsmaster
  • install Apache and PHP
    • yum -y install httpd
    • yum -y install php-common
      • you need more than this, and hopefully Apache grabs it for you – I installed Nginx then uninstalled it, which brought-in all the PHP stuff I needed (and probably stuff I didn’t)
  • modify php.ini to support large files (Pydio is exclusively a webapp for now)
    • memory_limit = 384M
    • post_max_size = 256M
    • upload_max_filesize = 200M
  • grab Pydio
    • you can use either the yum method, or the manual – I picked manual
    • curl -O http://hivelocity.dl.sourceforge.net/project/ajaxplorer/pydio/stable-channel/6.0.6/pydio-core-6.0.6.tar.gz
      • URL correct as of publish date of this blog post
  • extract the Pydio tgz to /var/www/html (a command sketch follows this list)
  • move everything in /var/www/html/data to /mnt/moosefs
  • bind mount /mnt/moosefs to /var/www/html/data
    • mount --bind /mnt/moosefs /var/www/html/data
  • set ownership of all Pydio files to apache:apache
    • cd /var/www/html && chown -R apache:apache *
    • note – this will give an error (screenshot from the original post omitted) – this is “ok”, but don’t leave it like this (good enough for a how-to, not production)
  • start Pydio wizard
  • fill in the forms as directed (admin user, etc)
    • I picked “No DB” for this tutorial – you should use a database if you want to roll this out “for real”
  • log in and start using it
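
A rough command sketch of the extract / move / bind-mount steps above (the extracted directory name is my assumption based on the 6.0.6 tarball – check what tar actually produces before moving anything):

    tar -xzf pydio-core-6.0.6.tar.gz
    cp -a pydio-core-6.0.6/. /var/www/html/
    mv /var/www/html/data/* /mnt/moosefs/
    mount --bind /mnt/moosefs /var/www/html/data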


Now what?

Why would you want to do this? Maybe you need an in-house shared/shareable storage environment for your company / organization / school / etc. Maybe you’re just a geek who likes to play with new things. Or maybe you want to get into the reselling business, and being able to offer a redundant, clustered, cloud, on-demand type storage service is something you, or your customers, would find profitable.

Caveats of the above how-to:

  • nothing about this example is “production-level” in any manner (I used Digital Ocean droplets at the very small end of the spectrum (512M memory, 20G storage, 1 CPU))
    • there is a [somewhat outdated] sizing guide for ownCloud (pdf) that shows just how much it wants for resources in anything other than a toy deployment
    • Pydio is pretty light on its basic requirements – which also helped this how-to out
    • while MooseFS is leaner when it comes to system requirements, it still shouldn’t be nerfed by being stuck on small machines
  • you shouldn’t be managing hostnames via /etc/hosts – you should be using DNS
    • DNS settings are far more than I wanted to deal with in this tutorial
  • security has, intentionally, been ignored in this how-to
    • just like verifying your inputs is ignored in the vast majority of programming classes, I ignored security considerations (other than putting the MooseFS servers on non-public-facing IPs)
    • don’t be dumb about security – it’s a real issue, and one you need to plan-in from the very start
      • DO encrypt your file systems
      • DO ensure your passwords are complex (and used rarely)
      • DO use key-based authentication wherever possible
      • DON’T be naive
  • you should be on the MooseFS mailing list and the Pydio forum
    • the communities are excellent, and have been extremely helpful to me, even as a lurker
  • I cannot answer more than basic questions about any of the tools used herein
  • why I picked what I picked and did it the way I did
    • I picked MooseFS because it seems the easiest to run
    • I picked Pydio because the ownCloud docs were borked for the 8.0x release on CentOS 6 – and it seems better than alternatives I could find (Seafile, etc) for this tutorial
    • I wanted to use ownCloud because it has clients for everywhere (iOS, Android, web, etc)
    • I have no affiliation with either MooseFS or Pydio beyond thinking they’re cool
    • I like learning new things and showing them off to others

Final thoughts

Please go make this better and show-off what you did that was smarter, more efficient, cheaper, faster, etc. Turn it into something you could deploy as an AMI on AWS. Or Docker containers. Or something I couldn’t imagine. Everything on this site is licensed under the CC BY 3.0 – have fun with what you find, make it awesomer, and then tell everyone else about it.

I think I’ll give LizardFS a try next time – their architecture is, diagrammatically, identical to the “pro” edition of MooseFS. And it’d be fun to have experience with more than one solution.

keep your wordpress installs up-to-date

I run several websites on my server – nothing heavy, just some various vhosts for Apache.

Many (but not all) of them run WordPress.

At some unknown point (I haven’t kept the crap that was uploaded around for analysis), over 100,000 files were uploaded to the root directory of one of the websites – apparently the only one I did not have cron’d to keep up-to-date with the latest-and-greatest version of WordPress. Most of these were randomly-named HTML or JavaScript files. Sometime late Thursday night / early Friday morning of last week, some number of them were triggered, launching a DDoS (distributed denial-of-service) attack against a hosting company in England.

After a relatively short period of time (on the order of a couple hours at most), this otherwise-low-traffic site generated 48MB in Apache httpd logs (normal for a given day is on the order of a few dozen to couple hundred kilobytes).

My hosting provider, with no warning, “locked” my server, and sent me an administrative message with the following cryptic email:

Your server with the above-mentioned IP address has carried out an attack on another server on the Internet.

This has placed a considerable strain on network resources and, as a result, a segment of our network has been adversely affected.

Your server has therefore been deactivated as a precautionary measure.

A corresponding log history is attached at the end of this email.

10:00:21.645887 14:da:e9:b3:97:dc > 28:c0:da:46:26:0d, ethertype IPv4 (0x0800), length 1514: 176.9.40.74 > 85.233.160.139: ip-proto-17
10:00:21.646166 14:da:e9:b3:97:dc > 28:c0:da:46:26:0d, ethertype IPv4 (0x0800), length 1514: 176.9.40.74 > 85.233.160.139: ip-proto-17
10:00:21.649166 14:da:e9:b3:97:dc > 28:c0:da:46:26:0d, ethertype IPv4 (0x0800), length 1514: 176.9.40.74 > 85.233.160.139: ip-proto-17
10:00:21.649416 14:da:e9:b3:97:dc > 28:c0:da:46:26:0d, ethertype IPv4 (0x0800), length 1514: 176.9.40.74 > 85.233.160.139: ip-proto-17
10:00:21.649421 14:da:e9:b3:97:dc > 28:c0:da:46:26:0d, ethertype IPv4 (0x0800), length 1514: 176.9.40.74.54988 > 85.233.160.139.8888: UDP, length 8192

Gee, thanks, hosting company – that was informative.

After several hours of back-and-forth with their support group, I was finally able to get a rescue boot environment enabled, a KVM session to that environment, and could start diagnosing the problem(s). First, of course, came the normal checks of dmesg, /var/log/messages, and the like – nothing. Then there was running dig against the target address to find out who was being attacked (which is how I found the target IP belonged to a hosting provider in the UK). I was also Googling similar error messages, and finally found a clue (though I cannot recall where) that malicious JavaScript can cause messages like those provided to me to be trapped by external logging systems.

This led me to look in /var/log/httpd instead of just /var/log. And there is where I found the unusual log file for my LUG’s website here in Kentucky – bglug-access_log was 48 megabytes. And bglug-error_log was 4.3 MB. As I mentioned above, a typical access_log for that site is closer to ~100 KB.

Opening the ginormous log file showed a host of HTTP 200 response codes for things that looked nothing like WordPress files (things like “qdlrdi-casio-parliament-90treaty.html”). There shouldn’t be HTTP 200 (OK) response codes for non-WordPress files, because it’s a WordPress-powered website.
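
A rough way to spot that pattern (hypothetical – it assumes the default combined log format, where field 7 is the request path and field 9 the status code):

    awk '$9 == 200 {print $7}' /var/log/httpd/bglug-access_log \
      | grep -vE '^/(wp-|index\.php|$)' \
      | sort | uniq -c | sort -rn | head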

Running a file listing to screen failed (in the rescue boot environment) – but doing an ls -l > files.out, and then a wc -l files.out showed over 105,000 files in the root directory of the BGLUG website.

To get my server back up and online as quickly as possible, I edited the Apache vhosts.conf to disable the Blue Grass Linux User Group site, then contacted my hosting company with the root cause of the issue and what I had done to fix it (both needed for them to re-enable my system).

After getting the server back online normally, I was able to clear-out all the junk that had been transparently uploaded into the LUG’s site.
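
The cleanup itself was roughly along these lines (a hypothetical sketch – the docroot path and the “known-good” reference file are placeholders; review the list before deleting anything):

    cd /var/www/vhosts/bglug
    # list top-level HTML/JS files newer than a file I know is legitimate
    find . -maxdepth 1 -type f \( -name '*.html' -o -name '*.js' \) -newer wp-config.php > suspect.txt
    wc -l suspect.txt
    # after eyeballing the list:
    xargs -a suspect.txt rm -f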

One of the biggest annoyances of the whole process (after not having been given any warning from my hosting provider, just a summary disconnect) was that permissions on the directory for the website were “correct” – they should have disallowed uploading random junk to the server:
drwxr-xr-x 6 bglug apache 5611520 Apr 11 13:24 bglug

The user bglug had not been compromised (it hadn’t even logged in for a few months) – and neither had the apache group (which, of course, cannot log in, but still).

Apparently, some part of the version of WordPress the site was running (or a plugin) was compromised, and allowed a malicious attacker to upload junk to the server, and spawn this DDoS on my server.

Moral of the story? Keep all your software up-to-date, and monitor your logs for suspicious activity – I’m not sure monitoring would’ve helped me in this case, but it’s a Good Practice™ anyway.