FastCGI, SCGI, and Apache: Background and Future
Over at the Ruby on Rails Weblog, David made a post titled Apache gets serious about FastCGI. I’ve tangled with FastCGI more than most people have, for a variety of reasons that I won’t get into except to say that I’ve spent time at places where Apache wasn’t the web server of choice and where PHP wasn’t the only language in use. Here are some of my thoughts on the whole FastCGI thing that may be useful to someone new to the game.
The first thing to realize is that most experienced webmasters have considered FastCGI essentially abandoned technology for a good many years. Like, half a decade, or in Internet time, a freaking eternity. It didn’t catch on for a bunch of reasons. One is that the FastCGI implementation in Apache hasn’t been one of Apache’s, shall we say, highlights. Another is that when quicker alternatives to traditional CGI were being dreamed up, Apache was The One True UNIX Web Server(tm), and a lot of people thought that The One True Way(tm) to go was with in-process modules. Modules could be very quick, with no IPC overhead, and they could get all sorts of useful info from the web server, which is easy when you’re in the damned web server itself and rather hard when you’re not. This made sense when processors ran at 500MHz and lots of UNIXes had crappy IPC performance. So lots of Apache modules were being created, life was good, and then mod_php happened. How PHP became the de facto Apache module that’s in every Apache install on the planet is a story I’ll let someone else tell, but the end result is that in the world of UNIX/Linux web development PHP became the only thing that mattered, and it was a module, so FastCGI as a generic, web-server-independent gateway was a solution looking for a problem that didn’t exist.
So, fast-forward 5 years: where does that leave Rails, Django, and whatever miracle framework crops up next week? In a tough spot, because PHP remains the only Apache language module that’s ever gotten widespread adoption. Why does this matter? Because these are large frameworks written in interpreted languages, and if you run them under CGI (the other technology still supported everywhere that lets you tell the web server to run some code), you have to load the whole shebang from scratch ON EVERY REQUEST. Performance will be shit, trust me, so for anything but development with these “megaframeworks” CGI is completely infeasible. You need a way of getting all that code loaded into memory and keeping it there across requests. That’s what you get with an Apache module, and that’s what you get with FastCGI. The renewed interest in FastCGI comes from the fact that PHP’s assumed rule of the webdev roost has suddenly been called into question: now you have these compelling competing frameworks, written in competing languages, that need persistence on the web server, and they’re not going to get it with an Apache module. Oh, and there’s another, more general trend at play that I’ll touch on below.
Back to the topic that motivated this post. What’s wrong with FastCGI in Apache? Anybody who’s tried to run a busy Zope site behind Apache via FastCGI knows this one. The UNIX domain sockets are unreliable, for unknown reasons. Switch to TCP runners and they sometimes hang. Inexplicably. I’ve seen it with PHP-via-FastCGI too. Matters aren’t helped by the fact that the FastCGI C code itself is crufty and, as I mentioned above, ancient (yes, 5 years ago is ancient). Finally, FastCGI in Apache just isn’t as flexible as we’ve come to expect things in Apache to be. With the general lack of interest in FastCGI over the years, none of this got fixed, so people who really wished it worked generally worked around the problem. SCGI came about in the Python world as a simpler FastCGI replacement. Zope, for example, is almost universally deployed behind Apache via mod_proxy these days. Ditto for Java. But the FastCGI technology itself is clearly quite a bit better than most people’s experience with it under Apache would lead them to believe. It’s been rock solid in Zeus for many years. Recently lighttpd has also proven that FastCGI can be quite robust and quick in an open source web server, to the point that its superior FastCGI implementation propelled lighttpd from nowhere into the limelight as *the* way to run the new wave of web frameworks. (Sidenote: Both lighttpd and Zeus are non-fork()ing asynchronous daemons; coincidence?)
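To show just how simple SCGI is: per the SCGI protocol description, the web server sends the request headers as a netstring (`<length>:<bytes>,`) of NUL-separated name/value pairs, with CONTENT_LENGTH first and an SCGI header set to 1, followed by the raw body. The Python sketch below encodes and decodes that framing. It’s my reading of the spec rather than a production implementation; the helper names and sample values are made up, and it assumes headers fit in latin-1.

```python
"""Sketch of SCGI request framing.

Headers travel as a netstring of NUL-separated name/value pairs;
CONTENT_LENGTH must come first and an "SCGI" header with value "1"
must be present. encode_scgi/decode_scgi are hypothetical helpers.
"""


def encode_scgi(environ: dict, body: bytes = b"") -> bytes:
    """Build the bytes a web server would send to an SCGI runner."""
    headers = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    headers += [(k, v) for k, v in environ.items()
                if k not in ("CONTENT_LENGTH", "SCGI")]
    # Each name and value is terminated by a NUL byte (latin-1 assumed).
    payload = b"".join(k.encode("latin-1") + b"\0" + v.encode("latin-1") + b"\0"
                       for k, v in headers)
    # Netstring: decimal length, colon, payload, comma; then the body.
    return str(len(payload)).encode("ascii") + b":" + payload + b"," + body


def decode_scgi(data: bytes) -> tuple:
    """Parse an SCGI request back into (headers dict, body bytes)."""
    colon = data.index(b":")
    length = int(data[:colon])
    payload = data[colon + 1:colon + 1 + length]
    if data[colon + 1 + length:colon + 2 + length] != b",":
        raise ValueError("malformed netstring: missing trailing comma")
    fields = payload.split(b"\0")[:-1]  # payload ends in NUL, drop trailing b""
    headers = {fields[i].decode("latin-1"): fields[i + 1].decode("latin-1")
               for i in range(0, len(fields), 2)}
    if fields[0] != b"CONTENT_LENGTH" or headers.get("SCGI") != "1":
        raise ValueError("not a valid SCGI request")
    start = colon + 2 + length
    body = data[start:start + int(headers["CONTENT_LENGTH"])]
    return headers, body


if __name__ == "__main__":
    req = encode_scgi({"REQUEST_METHOD": "POST", "REQUEST_URI": "/deepthought"},
                      b"What is the answer?")
    print(req)
    print(decode_scgi(req))
```

Compare that with FastCGI’s record types, request IDs, and multiplexing, and it’s easy to see why SCGI appealed as the simpler option.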
So FastCGI may be good stuff after all, but if we’ve gotten along just fine, thank you, without it for the last 5 years, the question becomes: do we really need FastCGI?
To answer that, let’s start with my comment on David’s post, in response to a reader who wondered why anyone would bother with FastCGI when mod_ruby exists, since mod_php and friends had been proven to be “the best solutions”:
Actually, Ahmad, mod_(php|perl|python|ruby) has often proven not to be the best solution in practice. Embedding your interpreters in the httpd process often ends up handcuffing you later when it comes time to upgrade a site (perhaps one of many), and it sucks away precious memory in each Apache process, making it harder to scale up higher-traffic sites. Per-user runners are also incredibly convenient for mapping sites into OS sandboxes (via ulimit, RBAC, SELinux, whatever).
I’m with David Morton though, and think that at this point SCGI is the better way to go if your backend doesn’t do smart proxy rewriting. FastCGI is quickly becoming irrelevant for the Python and Ruby frameworks that matter to me. I really think Zope has it right in this regard making it easy, quick, and reliable to proxy rewrite requests from Apache into the Zope appserver (and VirtualHostMonster) via mod_proxy.
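The per-user runner point from that comment is easy to illustrate: because a runner is just another process, you can fence it in before it ever starts. Here’s a minimal Python sketch, assuming a POSIX system. `start_runner()` is a hypothetical helper that applies ulimit-style caps via `resource.setrlimit` in the child just before exec; the limit values are illustrative, and switching to a per-site user (which needs root) and RBAC/SELinux confinement are left out.

```python
"""Sketch: start a persistent runner as its own process, with ulimit-style
caps applied before it runs (POSIX only).

start_runner() and its limit values are hypothetical/illustrative.
Switching to a per-site user (Popen's user= argument, Python 3.9+, needs
root) and RBAC/SELinux confinement are left out.
"""
import resource
import subprocess
import sys


def start_runner(cmd, max_open_files=64, max_cpu_seconds=600):
    """Launch `cmd` with lowered file-descriptor and CPU-time limits."""

    def apply_limits():
        # Runs in the child after fork() and before exec(), much like
        # `ulimit -n` / `ulimit -t` in a wrapper shell script.
        resource.setrlimit(resource.RLIMIT_NOFILE, (max_open_files, max_open_files))
        resource.setrlimit(resource.RLIMIT_CPU, (max_cpu_seconds, max_cpu_seconds))

    return subprocess.Popen(cmd, preexec_fn=apply_limits,
                            stdout=subprocess.PIPE, text=True)


if __name__ == "__main__":
    # Demo: a one-liner child that reports the file limit it inherited.
    child = start_runner([sys.executable, "-c",
                          "import resource; "
                          "print(resource.getrlimit(resource.RLIMIT_NOFILE))"])
    out, _ = child.communicate()
    print(out.strip())
```

An in-process module can’t do this per site: every interpreter embedded in httpd shares Apache’s user and limits.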
My point here is that Apache modules are not a viable path forward. I think most experienced sysadmins already know this. Building and maintaining Apache gets exponentially more complex as you add modules, and that’s reason enough to avoid them in my book, without even considering the memory consumption issue. A generic solution for persistent out-of-process page generation/handling is needed. There’s zero doubt about that. FastCGI is the leading contender, by default if for no other reason, so in this regard any work done to improve FastCGI support in Apache is great. I happen to think SCGI is a better route to go, but in the end they’re very similar, and if either becomes more mainstream we’ll all be better off and I’ll be happy.
Well… mostly happy… Because what I’m *really* wondering is whether we should be continuing with a CGI paradigm at all, or should we go the way of Java and Zope app servers and use what we already have: HTTP.
What Java and Zope app servers do (for the unfamiliar) is run their own solid HTTP servers that do intelligent URL parsing/generation for you, making it a piece of cake to stick them behind an HTTP proxy (like Apache’s mod_proxy, or Squid, or whatever) at an arbitrary point in the URI space. Typically you redirect some URL’s traffic (a virtual host, subdirectory, etc.) off to the dedicated app server the same way a proxy server sits between your web browser and the web server. It works just like directing requests off to a Handler in Apache, except the request is actually sent off to another HTTP server instead of being handed off to a module or CGI script. And of course the reply comes back as an HTTP object that’s sent back to the originator. There are a bunch of reasons why doing this with HTTP instead of CGI is a really nice approach. One is that setting up these app servers becomes pretty simple for sysadmins, and doing the configs on the upstream webserver/proxy is IDENTICAL no matter what kind of downstream app server you’re talking to. That reduces errors. It’s flexible, too: you can start up an app server instance (which, of course, acts like a web server) on a port, run it as whatever system user you want, jail it, zone it, firewall it, whatever, and then send HTTP requests to the thing. You can go straight to the app server in your web browser to debug stuff. Since it’s HTTP, we already have a full suite of tools that can do intelligent things with the protocol: firewalls, load balancers, proxies, and so on. There’s a huge market of mature “HTTP brokers”, both free and commercial, including Apache itself (mod_proxy, which can be hooked into rewrites).
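Here’s roughly what the “intelligent URL generation” part amounts to, as a Python/WSGI sketch under a couple of assumptions: the front-end proxy passes the public host name in an X-Forwarded-Host header (Apache’s mod_proxy adds one in reverse-proxy mode), and the public mount point is configured on the app side as the hypothetical `PUBLIC_PREFIX`. Zope’s VirtualHostMonster solves the same problem with its own, richer conventions; this is not how Zope does it.

```python
"""Sketch: build public URLs correctly when the app server sits behind a
reverse proxy at an arbitrary point in the URL space.

Assumptions (not from any particular framework): the proxy sends the
public host in X-Forwarded-Host, and the mount point is configured here
as the hypothetical PUBLIC_PREFIX.
"""

PUBLIC_PREFIX = "/shop"  # hypothetical: where the front-end proxy mounts us


def public_url(environ, path):
    """Return the absolute URL a browser should use to reach `path`."""
    forwarded = environ.get("HTTP_X_FORWARDED_HOST")
    if forwarded:
        # Chained proxies append hosts; the first one is what the client used.
        host = forwarded.split(",")[0].strip()
        prefix = PUBLIC_PREFIX
    else:
        # Someone hit the app server directly (e.g. while debugging).
        host = environ.get("HTTP_HOST", "localhost")
        prefix = ""
    # Scheme of the hop to us; a TLS-terminating proxy would have to pass
    # the original scheme along too, which this sketch ignores.
    scheme = environ.get("wsgi.url_scheme", "http")
    return f"{scheme}://{host}{prefix}{path}"


def app(environ, start_response):
    """Tiny WSGI app that links back to its own home page."""
    body = f"Home page: {public_url(environ, '/')}\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]
```

Hit the app server directly and its links point at the app server itself; come in through the proxy and they point at the public site. That’s what makes the “go straight to the app server to debug” trick painless.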
The fact that the Java and Zope camps went down this road before and both arrived at the HTTP app server solution makes me wonder why the current darlings haven’t also taken the same approach. One argument is that it’s much simpler to implement a CGI-like interface than a full-on HTTP server. But I don’t buy that, not with languages like Python and Ruby that make it as easy to embed an HTTP server in your app as a CGI-like interface! Maybe the stock httpd modules aren’t up to snuff? Could it be that hard to fix them if they are indeed lacking, and wouldn’t that be useful to all sorts of people who don’t need these huge frameworks? Maybe the Rails and Django folks just fear they’ll automatically become as complex and evil as the Java Application Servers they’re trying to out-simplify if they take this approach? Maybe it’s just too hard to support both CGI and HTTP models properly (the Zope guys ought to know the answer) and not worth the effort when a CGI-based approach “mostly works”? Maybe they think that CGI (and hence persistent CGI runners like FastCGI and SCGI) is the only path to widespread adoption given the current mass-hosting landscape?
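As for how easy embedding an HTTP server is: with Python’s standard library it’s a handful of lines. This sketch uses `wsgiref.simple_server`, which is a reference server, so it shows how little code is involved without settling whether the stock servers are up to snuff behind a busy proxy. The `app` here is a trivial stand-in for a real framework.

```python
"""Sketch: an app that is its own HTTP server, using only the standard library.

wsgiref is a reference implementation; this shows how little code is
involved, not that wsgiref belongs in front of real traffic.
"""
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Trivial WSGI app standing in for a real framework."""
    body = f"Hello from {environ['PATH_INFO']}\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(host="127.0.0.1", port=8080):
    """Serve `app` over plain HTTP until interrupted."""
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()
```

Call `serve()` and point mod_proxy, or just your browser, at http://127.0.0.1:8080/. The same `app` callable can also be run as a plain CGI script via the standard library’s `wsgiref.handlers.CGIHandler`, so supporting both models isn’t necessarily an either/or choice.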
Deployment, as always, is a key issue. In my mind a FastCGI-like solution ought to be more compatible with the web hosting crowd. FastCGI runners can be started up and shut down on the fly (it is a CGI imitator, after all). The app server approach, by contrast, is a bit more “static” and likely appeals more to the “in house” crowd (I hesitate to call this the “enterprise” crowd) who are running their own servers.
If you actually read this far, then I’m impressed, and hopefully you’ve learned something. If I’ve made an incorrect statement please do let me know in the comments below. It’s an exciting time to be involved in web development, and my first prediction for 2006 is that we’ll see either FastCGI or SCGI become a core Apache module, and that we won’t see a whole lot of movement towards an app server approach. Java and Zope are perceived as overly complex, and so people’s gut instincts may lead them away from the app server and towards FastCGI/SCGI. So pay attention to this FastCGI/SCGI stuff, ’cause it’s going to be important whether or not it’s the technically superior approach.
What’s on my wish list? Being able to use a threaded Apache with FastCGI/SCGI runners. That would be a powerful combination.