Cloud computing is a sea change - How sysadmins can prepare

So there’s a sea change coming in where, why, and how sysadmin and system integrator talent is going to be applied to solve problems in the IT/web space. I’ve felt this way for some time, largely influenced by what I’ve seen working at Joyent (a cloud computing company) for nearly two years. As a profession we’ve been dabbling with the sorts of things I’m about to discuss for a long time, but I’ve never really articulated it myself and I’d like to get it off my chest, as it were. What happened was that last week I was at the Bootup Labs office talking to Trevor and Boris (among others) about where the cloud computing space is going and what kinds of software we as an industry are going to need to manage it. The conversation was fun, and as usual I rambled incoherently, but Trevor’s a smart lad with some ideas and he has taken those ramblings and run with them, putting up a compelling piece entitled Hosting Apocalypse that you should probably just go read. I’ll wait. What it did for me is put into focus the fact that the constructs we sysadmins deal with on a daily basis are changing. Here’s why.

The gist of Trevor’s post is that there’s almost certainly going to be a price war in the IaaS (Infrastructure as a Service), or “Cloud Computing”, space. The players (Amazon, Google, IBM, Microsoft) are massive, have huge war chests, and have the means (money) to destroy everybody else in order to establish themselves as dominant providers. There will probably end up being only a handful of providers (not unlike the server market today). That has some pretty big implications. The first casualties are likely going to be the traditional “hosting” companies that most web developers use today to deploy their web apps. They won’t be able to compete on cost. They won’t be able to compete on provisioning agility / scale, either. And scale is important (see below). So yeah, it’s gonna suck to be a ServerBeach or EV1 type provider sitting on datacenters full of aging, generic, power-sucking, under-utilized PCs trying to sell them one-by-one while competing against an Amazon or Microsoft or Google who just built a $500M datacenter floating in the freakin’ ocean equipped with the latest in purpose-built, power-saving virtualization nodes. Apocalypse? Probably, yes. Another immediate casualty is going to be “control panel” software, like Ensim or Plesk, and the shared hosting that it caters to. Shared hosting is gonna be squeezed out of the market, attacked on the bottom end by these IaaS giants, and rendered irrelevant on the top end by smarter services. Things like Google App Engine, Reasonably Smart, GitHub, Campaign Monitor, Mosso, and Heroku are all examples of the different kinds of “higher level hosting services” that are replacing the shared and dedicated hosts we used to use to do this kind of stuff. But control panel software is for pussies anyways, right, so why would a sysadmin care if they all eat it? Because the other casualty of the upcoming cloud era is going to be the traditional sysadmin.
That’s right, as an industry we’re truly about to automate ourselves out of our jobs. Which, of course, is what we’ve been nobly aspiring to do for 30 years, but, folks, this time we’re actually going to do it and it’s going to happen at a massive scale as the whole industry shifts to IaaS/PaaS. This, I think, is a sea change event for sysadmins. Our roles as System Integrators will totally change. Got your attention now?

You already see it on every site that talks about cloud computing: cost justifications showing how having to pay your own sysadmins costs soooo much money. There are a lot of small-shop sysadmins out there right now, working as employees or contractors, and the balance of power is being shifted to the hands of developers more than ever as they can now fire up VMs with a few lines of code. They don’t need sysadmins for basic server provisioning. What does it mean for us folks out there that have traditionally bought and/or managed “do it all” shared hosting or dedicated physical servers? It means we won’t be able to buy them, for one thing. And it means nobody will want us to, either. I can hear you now: “But they’re so flexible in the right hands!! I’m a scripting wizard and my dev peers and boss love me for setting up shit they can’t/don’t want to understand!! How do I continue to be valuable and solve problems?!?!” So yes, a login on a dedicated box or typical shared host is flexible. Yes, you got postfix working with SASL and that was a cause for celebration and adoration. But let’s be real here. Things are changing. And with that in mind, there are three things I think all good sysadmin types need to work on ASAP to continue to be valuable in the upcoming cloud computing era:

  1. We have to accept and plan for a future where getting access to bare metal will be a luxury. Clients won’t want to pay for it, and frankly we won’t have the patience or time to deal with waiting more than 5 minutes for *any* infrastructure soon anyways. We have to assume “shared hosting” is going to die. Plan your exit strategy now! The good news is that we’ll still have access to “bare” virtual machines. I definitely don’t see that going away; VMs are the building blocks for everything else these brave new PaaS/SaaS providers are cooking up for the millions of web devs out there to consume. So our sysadmin skills and OS knowledge will still be required, but not by as many people/shops. Some of us will be building VM images. Most of us won’t be. If we don’t know virtualization inside and out, we’ll be out of work. Disposable VM images and better tools are going to mean the vast majority of sysadmins will be totally out of the loop when it comes to spinning sites/apps up and down.
  2. We need to learn to program better, with richer languages. Why should the developers have all the fun? When you think about it sysadmins have a tremendous advantage: We know how the OS and computer work! Most web devs don’t!  :)  But let’s not get cocky. The fact that we know how to run 6 different apps/services on the same machine or can tweak the UNIX scheduler to do our bidding or can swap cron implementations won’t matter. Instead of scripting processes and files on a box in bash, we’ll need to script integration between processes and data in different VMs. Just as importantly, we’ll need to script integration with services like storage (think S3) and asynchronous messaging gateways (XMPP, AMQP), and talk to lots and lots of REST and SOAP APIs. Why? Because we sysadmins and systems engineers are going to be inventing all sorts of neat services that we and those pesky web devs (and ultimately enterprises) are going to eat up. Being able to integrate these new services and deployment options into existing solutions will be very, very valuable to everyone. In short, the things we need to do SI on have programmable interfaces, so we’d better learn to talk to them directly.
  3. Perhaps the biggest challenge we’re going to have is one we share with developers: How to deal with a world where everything is distributed, where latency is wildly variable, and where we have to scale for *throughput* more than anything else. I believe we’re only just starting to see the need for “scale” on the web and in systems work. We’ve gotten by on big centralized solutions for a long time as a profession and industry because it was simpler and in turn more reliable, but just like with processors “the free ride is over”. We sysadmins are going to have to deal with gobs of VMs, gobs of remote services, gobs of “things” that need to run somewhere, somehow. We’re often going to have to give up some of the flexibility luxuries we’ve enjoyed (like the POSIX filesystem model) to reap the rewards of distributed systems. We’ll increasingly depend on messaging even for the simplest of scripting tasks, I think, as just one example. We need to start thinking differently about how we architect and implement the solutions our customers and peers are going to need.
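
To make the "messaging even for the simplest of scripting tasks" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration: the standard library's in-process `queue.Queue` stands in for a real broker (AMQP, XMPP, whatever you end up with), and the message fields, the `check_vms` and `worker` names, and the always-"ok" status check are all made up for the example.

```python
import json
import queue
import threading


def worker(inbox, results):
    """Consume JSON messages until a None sentinel arrives."""
    while True:
        raw = inbox.get()
        if raw is None:  # sentinel: no more work for this worker
            break
        msg = json.loads(raw)
        # Hypothetical task: a real agent would go and poke the VM here.
        results.put(json.dumps({"vm": msg["vm"], "status": "ok"}))


def check_vms(vm_names, n_workers=3):
    """Fan status requests out to workers and collect their replies."""
    inbox, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    # Requests are plain JSON strings, as they would be on a real wire.
    for name in vm_names:
        inbox.put(json.dumps({"vm": name, "action": "status"}))
    # One sentinel per worker so every thread shuts down cleanly.
    for _ in threads:
        inbox.put(None)
    for t in threads:
        t.join()
    replies = []
    while not results.empty():
        replies.append(json.loads(results.get()))
    # Replies arrive in no particular order, so sort them for the caller.
    return sorted(replies, key=lambda r: r["vm"])
```

The point of the shape, rather than the code itself: the script that asks the question never talks to a machine directly. It drops requests on a queue and reads replies off another, so adding workers (or moving them to other VMs behind a real broker) buys throughput without changing the caller.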

So what can you do, in practical terms, today, to get ready for all this stuff? Start playing with a language like Ruby or Python and try doing some simple scripting in it instead of, say, bash. Play with stdin/stdout/stderr in the language so you can continue to build scripts The UNIX Way(tm). Add a REST or XML-RPC interface to that simple script so you can talk to it via a UNIX pipe and over the web. Fun! Try talking to EC2 and S3, via their CLI tools and via a language library. Wrap your head around Git. Set up an XMPP server and write a little bot, oops, I mean “agent”, that lets you see the status of all your VMs/machines via that XMPP server. Cool, huh? Play around with Xen, KVM, VMware, xVM, whatever virtualization stuff you can. Figure out how to move a Xen image to/from EC2 and your local XenSource or xVM. Not so easy, huh? Finally, spend some time learning about HPC technologies like batch schedulers and Globus and distributed file systems like Lustre. There’s a LOT of tech and lessons from the academic HPC world that are incredibly relevant yet under-used and under-appreciated outside of that niche. Install Grid Engine or Condor across a few nodes and write a script to inject jobs for queued processing to get a taste. Instead of storing all that logging data for the new billing project on a centralized SAN, could you use KFS and Hypertable? How many tens of thousands might that save the project? Try a test run on a cloud provider to proof-of-concept it. Now you’re talkin’.  :)
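
As a starting point for the first couple of suggestions, here is a rough sketch of a script that behaves like a UNIX filter and can also answer over XML-RPC. It uses only the Python standard library. The task itself (counting the first word of each line, such as a log level), the function names, the `--serve` flag, and the address and port are all assumptions picked for the example, not anything prescribed.

```python
import sys
from collections import Counter
from xmlrpc.server import SimpleXMLRPCServer


def count_levels(lines):
    """Count the first whitespace-separated word of each non-empty line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return dict(counts)


def run_filter(stdin=None, stdout=None, stderr=None):
    """Filter mode: data in on stdin, results on stdout, chatter on stderr."""
    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    stderr = stderr or sys.stderr
    counts = count_levels(stdin)
    for key in sorted(counts):
        stdout.write(f"{key}\t{counts[key]}\n")
    # Diagnostics go to stderr so they never pollute a downstream pipe.
    stderr.write(f"processed {sum(counts.values())} lines\n")
    return 0


def serve(host="127.0.0.1", port=8000):
    """Service mode: expose the same function over XML-RPC (blocks forever)."""
    server = SimpleXMLRPCServer((host, port))
    # Clients send the text as one string; split it into lines on arrival.
    server.register_function(
        lambda text: count_levels(text.splitlines()), "count_levels"
    )
    server.serve_forever()


def main(argv):
    """Pick a mode: '--serve' starts the service, anything else filters."""
    if len(argv) > 1 and argv[1] == "--serve":
        serve()
        return 0
    return run_filter()
```

Add the usual `if __name__ == "__main__": sys.exit(main(sys.argv))` line at the bottom, save it as something like `levels.py` (a made-up name), and `cat app.log | python levels.py | sort -k2 -n` works like any other filter, while `python levels.py --serve` lets a remote script call `count_levels` through `xmlrpc.client.ServerProxy`. Same logic, two front ends, which is pretty much the habit worth building.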

In short, all of us sysadmins are going to need to broaden our skill sets to reach out across the network to speak to distributed systems and services. Centralized anything is dead. Successful UNIX sysadmins have always been good at scripting; we just need to take that scripting to the next level, and yes, we’ll be doing a lot more of it. We’re going to have to take our HPC friends out for beers whenever possible and hear their battle stories of the dangers of latency and TCP. We’re going to need to learn to think differently and dive into distributed, event-driven architectures. We’re gonna have to play with some new stuff, we’re still going to have to be creative, and we’re going to have a lot of fun doing it!!

And that’s the ultimate take-away for me. I rarely get to see the physical servers anymore. I seldom set up or install software in an OS. I treat VMs like disposable containers. Verbs like “cloning” are part of my regular vocabulary. I now routinely work from within irb instead of a bash shell. I work with different constructs and ultimately my job is very different than it was just a few years ago. I’ve had to learn a lot of new stuff, and I still have far more to learn. Yet I’m having more fun now than I’ve had in years in this profession!! So don’t be shy, embrace the cloud. If you’re a UNIX sysadmin you already have the right stuff to succeed in this new world of utility on-demand computing, IMO.
The fact that I’ve started playing with Plan 9 and DragonFly again is, my friends, a different post.  ;)

