Tag Archives: data

after “the cloud”

Cloud computing has been hyped for the last decade+.

For those few of you who haven’t heard of it or don’t understand it, cloud computing is a computing-as-a-utility concept wherein compute (and storage) happens on systems which you may not own. That’s it.

So – now that we’ve been offloading our storage, computing, and other tasks to others in an on-demand manner, what is next?

When computing started, it was centralized: you worked on terminals (that communicated right back to the central machine) and did not “own” any of the work at your local workstation.

Then we moved into the PC era where computing was done locally, and we only saved data to a server if “we wanted to be backed up”.

Now we’re moving back into a centralized (and distributed at the same time) computing environment where we can access the same document on our iPad and laptop and twelve other people can see it at the same time, too (eg Google Docs).

We are moving more and more toward ubiquitous computing – smartphones, tablets, laptops, PCs, servers, cars, everything we own is becoming computing-aware (also related: the “internet of things”).

What’s going to come after the cloud hype dies out and we’re back to “business as usual”? Well, other than some as-yet-unnamed term becoming the hot topic du jour – nothing. Computing hasn’t changed in the last 50 years except to become faster, smaller, and more prevalent.

Where computing happens will always depend on the given job at hand – we will centralize when it makes sense, we will distribute when it makes sense, and we will localize when it makes sense.

The real concern for the next decade is data security and integrity. It doesn’t matter where you store your data, or how you process it: if you cannot rely on its accuracy, integrity, and safety, it’s just so much noise.
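
One common way to check the integrity part of that, wherever the data lives, is to record a cryptographic digest when the data is written and compare it again when the data is read back. The following is a minimal sketch, assuming SHA-256 is an acceptable digest; the function names `sha256_digest` and `is_intact` are made up for illustration and are not from any particular service.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_digest: str) -> bool:
    """Report whether the data still matches the digest recorded earlier."""
    return sha256_digest(data) == expected_digest


# Hypothetical usage: record the digest at write time, check it at read time.
original = b"quarterly numbers"
recorded = sha256_digest(original)
print(is_intact(original, recorded))              # unchanged copy
print(is_intact(b"quarterly numbers!", recorded)) # altered copy
```

A digest only tells you that something changed, not what changed or who changed it; accuracy and safety need other measures on top.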

If you can’t access it when you want or need it, you’ve already lost.

maps

I love maps. I have a calendar with historical maps on my wall next to my desk. I love books based around atlases (such as the Historical Atlas of series, many by Ian Barnes (similarly related review)). I like going to museums, visiting websites, used book shops, etc and just perusing the maps. I used to have a small collection of rail and bus transit maps from around the world (London, Hong Kong, Singapore, New York City, Washington DC, Chicago …). On my phone I have Apple Maps, Google Maps, MapQuest, Scout, TeleNav, Park Me, and Google Earth.

I love books like 1421 by Gavin Menzies (my review) that have histories of map making, ancient maps reproduced, etc.

When I graduated from HVCC in 2001, I had hoped to join many of my classmates from school at MapInfo. I think GIS is fascinating (and know someone, now, who works for the KY government doing GIS).

I wish I could be a cartographer.

I can’t draw, though – so I sate my appetite for geography via reading maps others have made.

Data visualization, which is all map-making is, is another, broader interest of mine – but also one I don’t have enough of a grasp of to work with intelligently too often.

All this leads me to ask for the best introduction to GIS you have seen for someone interested in cartography, and with a basic knowledge of system design and architecture. What would it be?

establishing a data haven cloud

In Neal Stephenson’s seminal book, Cryptonomicon, he describes the creation of a “data haven” in the fictional Sultanate of Kinakuta.

Why has no-one started building such a service (or, at least not in a public way) on existing cloud services (eg AWS or Rackspace) and/or created their own global network?

Data backup and replication are not “difficult” – and neither is the concept of distributed (and replicated) storage (LeftHand Networks was doing RAID-over-LAN a while before HP bought them).
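
To make the “not difficult” claim concrete, here is a toy sketch of replicated, content-addressed storage. It assumes in-memory dictionaries standing in for storage nodes in different locations; the `ReplicatedStore` class and its methods are hypothetical, and a real service would add networking, encryption, and failure handling.

```python
import hashlib


class ReplicatedStore:
    """Toy store that writes every blob to all nodes, keyed by its SHA-256."""

    def __init__(self, node_count: int = 3):
        # Each dict is a stand-in for a storage node in a different location.
        self.nodes = [dict() for _ in range(node_count)]

    def put(self, data: bytes) -> str:
        """Copy the blob to every node and return its content-derived key."""
        key = hashlib.sha256(data).hexdigest()
        for node in self.nodes:
            node[key] = data
        return key

    def get(self, key: str) -> bytes:
        """Return the first replica whose contents still hash to the key."""
        for node in self.nodes:
            data = node.get(key)
            if data is not None and hashlib.sha256(data).hexdigest() == key:
                return data
        raise KeyError(key)


# Hypothetical usage: one replica is corrupted and one is lost,
# yet the blob is still readable from the remaining good copy.
store = ReplicatedStore(node_count=3)
key = store.put(b"secret ledger")
store.nodes[0][key] = b"tampered"
del store.nodes[1][key]
print(store.get(key))
```

Keying each blob by its own hash means a reader can verify any replica without trusting the node that served it.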

So – why is this not available as a service to which you can subscribe (or use anonymously)? It would mean incorporating in a ‘friendly’ country, offering anonymized connections (fully encrypted, etc), and providing a client that works a la Dropbox or Box.com.

There should be lots of companies who would love to offer a service like this – it should be fairly lucrative, and pretty easy to set up.

digital preservation

I have been an active member on the Stack Exchange family of sites [nearly] since Stack Overflow started a few years ago.

Recently a new proposal has been made for Digital Preservation. Many of the proposed questions are interesting (including one of mine) – and I would strongly encourage anyone interested in the topic to check it out.

The topic has resparked a question I have had for a long time – why is it important to archive data?

Not that I think it’s inherently bad to hold onto digital information for some period of time – but what is the impetus for storing it more-or-less forever?

In tech pop culture we have services like Google’s Gmail, which starts users at a mind-boggling 7+ gigabytes of storage! For email! Who has 7GB of email that needs to be stored?! For a variety of reasons, I hold onto all of my work email for the duration of my employment with a given company – you never know when it might be useful (and it turns out it’s useful fairly frequently). But personal email? Really? Who needs anywhere near that much, or needs to hold onto it for that long? And those few people who arguably DO need that much, or to keep it forever, can afford to store it somewhere safely.

I think there is a major failing in modern thinking that says we have to save everything we can just because we can. Is storage “cheap”? Absolutely. But the hoard / “archive” mentality that pervades modern culture needs to be combated heavily. We, as a people, need to learn how to forget – and how to remember properly. Our minds are, more and more, becoming “googlized”. We have decided it’s more important to know how to find what we want rather than to know it. And for some things, this is good:

If you are a machinist, is it better to know how to reverse-thread the inside of a titanium pipe end-cap, or to go look up what kind of tooling and lathe settings you will need when you get around to making that part? I suppose that if all you ever do in life is mill reverse-threaded titanium pipe end-caps, you should probably commit that piece of information to memory.

But we need to remember to forget, too:

Say you need to make two of these things. Ever. In your entire life. In the entire history of every company you ever work for. Well, then I would say it’s better to go look up that particular datum when you need it. And then promptly forget it.

The historical value, interest, and sheer work contained in the “Domesday Books” are amazing – and have been of immense value to historians, archivists, politicians, and the general public. Various and sundry public records (census data, property deeds, genealogies, etc) are fantastic pieces to hold onto – and to make as available and accessible as possible.

Making various other archives available publicly is great too (eg the NYO&WRHS) – and I applaud each and every one of those efforts; indeed, I contribute to them whenever I can.

I continuously wonder, though, how many of these records and artifacts truly need to be saved – certainly it is true of physical artifacts that preservation is important, but how many copies of the first printing of Moby Dick do we need (to pick an example)?

I don’t know what the best answer is to digital hoarding, but preservation is a topic which needs to be considered carefully.