Archive for October, 2008

Apache - good for nothing?

Friday, October 31st, 2008

Apache (the http server) is one of the major reasons why PHP is so popular; but with today’s trend of using front-controllers have we obsoleted our old friend?

In the olden days we used to write web scripts in Perl and Apache would have to fork a new interpreter process to execute each script request. This was really slow, especially on the hardware available at the time. PHP came along and provided an embedded interpreter (mod_php) and gave a huge performance boost to the web-scripting world. This was a major motivator for migrating from Perl to PHP. It doesn’t hurt that Apache also happens to be a solid piece of software that can be (and probably has been) compiled on pretty much all operating systems out there. PHP has unashamedly piggy-backed on its host’s popularity, and for that, we salute you Apache!

Another reason PHP gained popularity is that you could drop a PHP file into any directory and Apache would just run it - no need for the “cgi-bin”. Today, website best-practice would typically recommend against executing “physical .php files” like add-user.php, and instead have pretty URLs like /users/add (good for readability and for SEO). Apache provides us with such functionality through a variety of methods, the most popular being mod_rewrite. We can easily use rewrite rules to route /users/add => add-user.php. Thing is, we’re not happy with that. We’ve all got drunk on “MVC” and cool applications use front-controllers (a single entry point for all requests, normally “index.php”). So instead of using Apache’s built-in functionality, we’re re-implementing the wheel inside our own applications. Why use mod_rewrite when we can do the same thing, but slower, using Zend_Controller_Router_Route_Regex? /me rolls-eyes.
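
To make the comparison concrete, here is a minimal sketch of the job a front controller's router takes over from rewrite rules: matching a request path against patterns and handing off to some code. It is written in Python rather than PHP, the handler names (add_user, view_user) and the route table are invented for illustration, and it makes no claim about how Zend_Controller_Router_Route_Regex works internally.

```python
import re

# Hypothetical handlers standing in for scripts such as add-user.php.
def add_user(params):
    return "add user form"

def view_user(params):
    return "viewing user " + params["id"]

# Route table: each entry pairs a compiled pattern with a handler.
ROUTES = [
    (re.compile(r"^/users/add$"), add_user),
    (re.compile(r"^/users/(?P<id>\d+)$"), view_user),
]

def dispatch(path):
    """Return (status, body) for a request path; the first matching route wins."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            # Named groups in the pattern become the handler's parameters.
            return 200, handler(match.groupdict())
    return 404, "not found"

if __name__ == "__main__":
    for path in ("/users/add", "/users/42", "/nope"):
        print(path, dispatch(path))
```

Every request pays for that loop inside the interpreter, which is the "same thing, but slower" trade-off: the routing is no smarter than a rewrite rule, it just lives in the application where the rest of the code can see it.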

We’re even ditching Apache for static stuff too. If you don’t already, it’s probably only a matter of time before you run CSS and JavaScript requests through PHP too (for good reason). Yahoo’s frontend performance guidelines suggest reducing the number of HTTP requests is important, so it makes sense to generate (and cache) a single CSS file and a single JS file. You might also do clever stuff like postfixing the URLs with a version number to help caching, and/or gzip/minify/manipulate headers. These are good things to do, but you’re now using Apache even less.
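
As a rough illustration of the "single file plus version postfix" idea, the sketch below concatenates several stylesheets, derives a version token and gzips the result. It is Python rather than PHP, the function names and the /css/site.css URL are made up, and it swaps the "version number" for a short content hash (an assumption of this sketch, chosen so the URL changes whenever the content does).

```python
import gzip
import hashlib

def bundle(sources):
    """Join stylesheet contents (already read into strings) into one payload."""
    return "\n".join(sources)

def versioned_url(base_url, payload):
    """Append a short content hash so caches see a new URL when the content changes."""
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]
    return base_url + "?v=" + digest

def compressed(payload):
    """Gzip the bundle once so the result can be cached and served repeatedly."""
    return gzip.compress(payload.encode("utf-8"))

if __name__ == "__main__":
    css = bundle(["body{margin:0}", "h1{color:red}"])
    print(versioned_url("/css/site.css", css))
    print(len(css), "bytes raw,", len(compressed(css)), "bytes gzipped")
```

On a bundle this tiny gzip will not save anything; the point is only where each step sits in the pipeline.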

So what is left? Images, video, flash? Lots of sites which graduate beyond a single-server use a “static file server” to handle this stuff. There’s also no point using a feature-rich, but slower, server like Apache for doing basic static file serving, when alternatives like Lighty/nginx can do this more efficiently. Another option might be to locate static content on a NAS of some sort which Apache will alias (Alias /images /mnt/nas/images). We’ve now got a document root that looks like this:-

index.php
.htaccess (to route everything to index.php)

Wow, clean! (Our PHP code is outside the docroot in some include_path location). I just had a thought - we could get rid of the document root entirely, by sticking this in our virtualhost: “php_value auto_prepend_file /path/to/app/index.php”. (That would be pretty funny, running a site without a docroot).

We don’t really use Apache for much, so why do we use it at all? I think it boils down to being a nice host environment for PHP. There’s nothing really wrong with that, except PHP is tightly coupled to Apache. I think Apache rocks, but if we just end up using it as a host for PHP, then that’s bloated overkill? What about if we stripped out the features of Apache that we don’t use until we’ve got a lightweight http wrapper for our PHP app to run in… isn’t that what Rubyists do with Mongrel?

The type of code I write in PHP today is request/response stuff. I get an incoming HTTP request, do some processing, and create an HTTP response. That sounds obvious, but the subtle difference to what I was doing 5 years ago, is that my PHP code is taking care of ALL of the request lifecycle. I don’t use Apache for authentication, logging, uri-routing, header setting, gzip, caching etc - everything is done by my code. It’s this slight shift in paradigm that makes me think “scripting application servers” like Mongrel+Rails or CherryPy are worth keeping an eye on, and another reason why I question the future of PHP.

Has PHP Peaked? Is the grass greener?

Thursday, October 30th, 2008

I’ve been “a web developer” for about 12 years now (crikey!). The first few were spent with Perl and then I moved over to PHP when I accepted that whilst mod_perl was “better”, it was also worse. Perl CGI was the de facto language for web scripting but it was slow and cumbersome to write web apps with. Then mod_perl came along and removed the speed issue and gave the developer more power, but it was also much harder to implement, and normally involved two Apache instances with one acting as a reverse-proxy for the other, and of course required a dedicated server (when dedicated servers were expensive!). Around the same time PHP 3 was taking off and it gave the speed boost of mod_perl but without the headache of implementation and made it much easier to write web scripts, particularly in a shared hosting environment.

Looking back at that transitional time, I get the vibe that a similar shift is happening today. From Perl->PHP developers got an easy life, with lots of the legwork that you used to do taken care of (like parsing the query string - s/%([a-fA-F0-9]{2})/pack "H2", $1/eg; #ring any bells?). I see the “new kids” on the block like Python and Ruby offering developers an easier life. They’re offering not only things like Rails/Django-type frameworks, but also a much more attractive package of support tools like gems/eggs, capistrano, migrations and both have unit-testing frameworks built in as a part of their core distributions. Ten years ago the average web developer didn’t care about unit tests, source control and deployment, but now we do and these two alternatives offer something more attractive.
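
For anyone who never had to do that legwork, here is a small sketch of it in Python: a hand-rolled query-string parser built around the same %XX substitution as the Perl one-liner, next to the standard library call that replaces it. The function names percent_decode and parse_query are invented for this example, and the hand-rolled version only handles simple single-byte cases.

```python
import re
from urllib.parse import parse_qsl

def percent_decode(text):
    """Replace each %XX with the character that has that hex code."""
    return re.sub(r"%([a-fA-F0-9]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)

def parse_query(query):
    """Split a query string into (name, value) pairs the old-fashioned way."""
    pairs = []
    for field in query.split("&"):
        if not field:
            continue
        name, _, value = field.partition("=")
        # '+' means space in form encoding; convert it before decoding %XX,
        # so that an encoded plus sign (%2B) survives as a literal '+'.
        pairs.append((percent_decode(name.replace("+", " ")),
                      percent_decode(value.replace("+", " "))))
    return pairs

if __name__ == "__main__":
    query = "name=Joe+Bloggs&lang=C%2B%2B&path=%2Fusers%2Fadd"
    print(parse_query(query))  # the legwork
    print(parse_qsl(query))    # what the library does for you
```

For plain ASCII input like this the two calls return the same pairs; the library version also copes with multi-byte encodings that the one-line regex never considered.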

The timing of this blog post and the recent namespace discussion/decision in the PHP world are not coincidental. I’m disappointed with how things have played out with that whole fiasco. The end result is pretty horrible from an average Joe developer’s position (fugly syntax, confusing separator) but how the community / core devs reached this point is the saddest part. There was so much bickering, trolling and flaming, which to me indicates a much larger problem: something’s wrong with the management of the project. It’s also not just this incident, there was drama recently with “contributor licenses” and PDO2 and I get the vibe that the activity of the project/community decreased a lot in recent years. It wasn’t so long ago that “PHP6” was going to be the next big thing and unicode was going to rock our socks, but now we’re still struggling to get a 5.3 release out the door and suddenly v6 seems farther away than ever. We also have a BDFL who contradicts the direction of the language. Rasmus does quite a bit of hating on “new PHP” style code and promotes an old-school approach (“let’s all use SQL inside our HTML!” ;)). I have no problem with someone having this opinion, but for that person to also be the guiding star of PHP is obviously an issue (I wonder if he wishes BDFL was a position one could resign from :D). Take namespaces as a good example of a lack of leadership - there was no one around to smack the kids, tell them to stfu and to make an executive decision as the mature adult.

The situation with PHP5 also depresses me. I know we can prove anything with statistics… but it’s 4 years old now, PHP4 is officially dead/end-of-life and yet PHP5 represents only 40% of installations. IE7 was released only 2 years ago and it looks like it has about a 50/50 split with IE6. Perhaps of greater concern is how many users are actually developing with PHP5? I’d bet that there are more “PHP4” developers out there than “PHP5”. If so, then does that drop the potential number of developers active in PHP5 open source projects? Back in the early 2000s there were loads of active PHP projects - everyone was on a level playing field and writing code for the same version. Today, I’d struggle to name a popular PHP5 project outside of frameworks: where the hell is our sexy PHP5 version of OSCommerce, “PHPAdsNew”, the classic forums, or even blogs (ok, Habari)? From what I can see, it’s just not the same as it used to be.

Times have changed. The web has moved on and so have the needs of its developers. I think we might have reached the peak of PHP. It’s clearly going to be a language that remains popular for years to come, but I think it’s like Perl when PHP 3 was released. It does the job, but there are compelling reasons to switch to another language which is now beyond the “fan boy” stage and is maturing (within the web-app sense). I also think it’s not just a “PHP vs Python” or “PHP vs Ruby” question, and also has a lot to do with how web architectures are evolving generally (distributed clouds, REST services, better caching options, prevalence of AJAX).

I’m conflicted about writing this post because the PHP job-market is really good at the moment and PHP is “breaking through into the enterprise”. I think there are more conferences going on around the world than there ever have been and there are lots of other signs that PHP is doing well. I don’t think the sky is falling or anything dramatic, but I do think the wind is changing. Perhaps the global economic situation might also affect us? There’s less capital expenditure going to be made, so does that mean less investment in rewriting old PHP4 systems to PHP5? Maybe it’ll be a good thing for PHP, in that companies won’t be risking/investing in moving away from PHP (and maybe even switching to PHP because it’s cheaper?). Or will it act to speed up the stagnation?

I plan on being a PHPer for years to come but I’ve got a sneaky feeling I’ll be having an affair with another language in the not too distant future. On that note… time to read a little more of my new Python book ;-).

Controller/Model Detox, please.

Monday, October 27th, 2008

This is a quick mini-rant on one reason why I’m annoyed with MVC frameworks.

My beef is that business logic should not be in controllers but it often is. Business logic is the “M” (model) part of MVC. The problem is the type of actions you want to perform in the model (“create post”, “view post”) tend to match up nicely with the controllers and actions (/post/add, /post/view). It’s very easy to get sucked in at this stage and create a new controller and some actions, and implement your “business logic” inside the controller framework. I’m guilty of this, and I’m getting pissed off at myself for doing it. It’s so frustrating to see myself doing something I know is bad.

I’ve started to think of applications “from the controller outwards” and my model has become merged into my controller (perhaps this condition is “controller driven design”). This is bad. Testing your business logic is now about testing it through your controllers, which sucks. You can’t reuse your model in another context (like command-line tools, or a webservice) without going through the controller dispatcher / bootstrap, and forget about evolving the model and refactoring it / extending it. You’re also very tightly coupled with the framework.

I feel like a noob and I know it’s a bad thing, but I can’t help but think it’s to be expected with a lot of the frameworks out there. If you’re writing a project with “XYZ Framework” then you get sucked into all the stuff it does and you feel like you need to use it as the starting point. My project is magically going to kick ass and be well written because it uses this framework! You probably start off by looking at tutorials, which focus more on the controllers than on the model. In a lot of cases the model layer is reduced to an object that represents a database table/record. This encourages the developer to write things like authorisation code and data validation at the controller level, rather than within the model.

I’m fed up of this. I need to readjust my perspective and start ignoring the framework when it comes to designing and writing my code. The controller layer should be the last thing you write, not the first; it should be a lightweight layer that interfaces between HTTP requests/responses and your model.
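
One way to picture that split is the sketch below, in Python rather than any particular PHP framework. The names PostService, ValidationError and add_post_action are hypothetical, and the request is just a dictionary standing in for whatever request object a framework provides. Validation and authorisation live in the model, and the controller only translates between HTTP-ish input/output and model calls.

```python
class ValidationError(Exception):
    """Raised by the model when input breaks a business rule."""

class PostService:
    """Model layer: owns validation and authorisation, knows nothing about HTTP."""

    def __init__(self):
        self._posts = {}
        self._next_id = 1

    def create_post(self, author, title, body):
        if not author.get("can_post"):
            raise PermissionError("author may not create posts")
        if not title.strip():
            raise ValidationError("title is required")
        post = {"id": self._next_id, "title": title.strip(), "body": body}
        self._posts[post["id"]] = post
        self._next_id += 1
        return post

def add_post_action(service, request):
    """Controller: map request data to a model call and the outcome to a status."""
    form = request["form"]
    try:
        post = service.create_post(request["user"],
                                   form.get("title", ""),
                                   form.get("body", ""))
    except PermissionError:
        return 403, "forbidden"
    except ValidationError as exc:
        return 400, str(exc)
    return 201, "created post %d" % post["id"]

if __name__ == "__main__":
    service = PostService()
    editor = {"can_post": True}
    print(add_post_action(service, {"user": editor, "form": {"title": "Hello"}}))
    # The same model works from a command-line tool, with no dispatcher involved.
    print(service.create_post(editor, "From the CLI", "no HTTP here"))
```

Because PostService never touches the request, it can be unit-tested directly and reused from a script or a web service, which is exactly what gets lost when the logic sits in the controller.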

Moving to EC2

Saturday, October 11th, 2008

I’ve been in lust with Amazon’s AWS services for the last year or two, particularly EC2. If you’ve not played with EC2 yet then I encourage you to do so. The first time you fire-up “10 servers” just to see if you can is a buzz and your mind will start going over all the cool things you could do.

It’s taken me a while but I’m now in the process of moving my personal servers into the cloud. The main reason for doing this is because I’m a geek and I want to play and learn more about AWS first-hand :D. It’ll be more expensive than a little VPS somewhere but it’s certainly not bank-breaking (should be $72/month + bandwidth @ $0.15/gig).

My first step in this move was to take control of my DNS. I don’t want to run my own DNS servers (ugh) but now that I am my own host I need to get a 3rd-party to handle this for me. I’ve heard good things about DNSMadeEasy so I set up a couple of domains with them to test things out. Their interface is a bit clunky but it works: you can easily set all sorts of DNS records and their help/faq/tutorials are good. So I use DNSMadeEasy to point my domain to an IP address, and the IP address is of my EC2 instance. Amazon now have something called an “Elastic IP” which means the IP address is “fixed” to the outside world, but I can map it to any of my EC2 instances (and remap it on demand).

As well as hating managing DNS I also hate managing email. At the moment I run my own Postfix/Dovecot server and manage a set of SpamAssassin rules. My plan is to offload this entirely to Gmail, using their POP/IMAP service mainly with the web-interface as a backup. I’m not sure how this will work out just yet but it should do what I need.

As for the rest of the server stuff… woot! I love messing around with LAMP stacks and experimenting with different configs and I’m in my element with this part. With EC2 you need to create an “image” (AMI) of your server, which means taking Amazon’s base Fedora image, installing your bits and pieces, then “saving” it. You then use this image to create new EC2 instances. For this I’m actually using Alestic’s basic Ubuntu Hardy image, which is a nice barebones linux install (I don’t think it even has a GCC compiler). From that I’ve installed the latest versions of Apache/PHP/MySQL from source as well as set up Subversion (to run over svn+ssh).

One of the perceived downsides of EC2 is that it’s “flakey”. It’s not a “real” machine with a “real” harddrive, so if it crashes or goes *poof* for some reason, then you lose all the data on it (that changed since you booted up the image). Whilst true, there’s an argument that this *could* happen right now on my and your real servers: their harddrives could explode and you’d lose all the data. What EC2 is making you do is think about that disaster scenario upfront and forcing you to do something about it. One natural solution is to write backup scripts that use S3 for storage, so I’m going to be setting some stuff up that periodically backs up my SVN repositories and other data to this super-redundant-never-gonna-let-you-down storage system. Amazon have also released something recently called Elastic Block Store (EBS) which is a “more real harddrive” than an EC2 instance’s local storage. Think of it like a NAS. The performance isn’t supposed to be as great as a normal physical machine but frankly, I don’t care. For me this is premature optimisation for projects that don’t even exist yet. I also reckon the flexibility/control that EC2 clusters provide outweighs the negative I/O aspect (it’s like writing a web app in PHP rather than C). My plan is to play with EBS for MySQL storage and see how it goes.

I’m looking forward to my new home! There are so many things I want to play with, things that without this sort of on-demand virtualisation I couldn’t otherwise do.

Quick’n’dirty localhost benchmark

Saturday, October 4th, 2008

My previous post suggested wrapping an application’s data layer underneath a REST API. I was curious as to the performance implications of making an “internal HTTP” request, for example requesting http://localhost/people, compared to a standard “mysql_query()”. This evening I’ve been playing around with trying to see what the difference might be, and hopefully find that the performance hit is reasonable enough to continue investigating the idea. This is really crude benchmarking at the moment, and closer to anecdotal evidence than anything statistical or complete.

I’m using microtime(true) calls to record the time of script execution. I set up one script, mysql.php, to connect to the db and execute a query. The other script, subrequest.php, issues an HTTP request via my home-rolled HTTP client to http://localhost/rawmysql.php (which is the same logic as mysql.php, but without some microtime() calls). I’m using a single install of Apache running on a little linux VPS.

Good news! At a really trivial level, the overhead of making the HTTP request is literally a couple of milliseconds, which is much better than I thought it would be (test this yourself by doing a file_get_contents("http://localhost"); on a static page you might get a sub-1ms response). I’m not sure why I thought it would be slower but my gut-feeling was I’d be looking at something in the 10-20ms region. I’m pleased it’s 10x faster than I expected ;-).
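
If you want to try the same kind of comparison without Apache, PHP or MySQL to hand, the sketch below shows the shape of it in Python: time a direct function call against the same call made through an HTTP request to localhost. Everything in it is a stand-in (fetch_people returns a fixed payload instead of running a query, and the server is Python's built-in http.server on a random port), so its numbers say nothing about the Apache/PHP setup described above.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

def fetch_people():
    """Stand-in for the real database query; returns a fixed payload."""
    return b'[{"name": "alice"}, {"name": "bob"}]'

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = fetch_people()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the benchmark output quiet

def average_seconds(func, runs):
    """Average wall-clock time of one call to func over the given number of runs."""
    start = time.perf_counter()
    for _ in range(runs):
        func()
    return (time.perf_counter() - start) / runs

def run_benchmark(runs=50):
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: the OS picks a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    def over_http():
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        try:
            conn.request("GET", "/people")
            return conn.getresponse().read()
        finally:
            conn.close()

    try:
        assert over_http() == fetch_people()  # both paths must return the same data
        direct = average_seconds(fetch_people, runs)
        via_http = average_seconds(over_http, runs)
    finally:
        server.shutdown()
        server.server_close()
    return direct, via_http

if __name__ == "__main__":
    direct, via_http = run_benchmark()
    print("direct call: %.3f ms" % (direct * 1000))
    print("via HTTP:    %.3f ms" % (via_http * 1000))
```

Like the original test, this is crude: it opens a fresh connection per request, runs client and server in one process, and averages over a small number of runs.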

There are a bunch of other issues I’m concerned about (size of recordsets, json encoding/decoding, compression, overhead of additional http requests / additional Apache instances…) but fundamentally making an HTTP request to the localhost is fast, and a REST Data Layer definitely looks feasible - especially when I think that one of the major advantages to this approach will be having a unified caching layer (that will make it easier and encourage developers to cache).

Next, to write a simple “REST Server” and get some proofs of concept online.