Posts Tagged ‘HTTP’

HPHP - First thoughts

Tuesday, February 2nd, 2010

Facebook have released HipHop, a piece of software that converts a PHP script into C++ and then compiles it into a static binary. That binary runs its own HTTP server.

About a year ago I wrote a couple of posts with my thoughts on Apache: how we don’t really use it for much, and why I think it’s still relevant. In one fell swoop Facebook has provided an answer to my ruminations: it is now possible (I think) to run PHP as a standalone application server (kinda like Rails with Mongrel, or Python with CherryPy et al).

Most of the applications I write have a single entry point that bootstraps the framework and handles all the routing - my “index.php”. For this scenario, using HPHP sounds like a win: you compile your index.php and, instead of using .htaccess to route a set of URLs to it, you just configure Apache to reverse-proxy those requests to your backend HPHP instance(s) (not sure yet if it will thread, fork or block). You could also still run regular PHP scripts on that Apache server, and just offload some requests to the HPHP daemon.

It sounds like it will really come into its own when you’re essentially running Apache as a container for a monolithic PHP application. Instead of running Apache+PHP on a backend server, you just run the HPHP application. Win! (This is going to rock for API/REST services.)

Some questions:-

  • How will it handle autoloading and dynamic classes? Presumably we’d have to give the compiler a list of directories of libraries to include in the compilation?
  • What about 5.3? I’ve personally switched; does HPHP support it? (Also, will HPHP and PHP releases be synced?)
  • How does the HTTP/server stuff work? Is it a single process that manages worker-pools, does it thread or what?
  • How will multiple applications work? I’m guessing each one will need a new HPHP daemon.

Apache - good for nothing?

Friday, October 31st, 2008

Apache (the HTTP server) is one of the major reasons why PHP is so popular, but with today’s trend of using front-controllers, have we obsoleted our old friend?

In the olden days we used to write web scripts in Perl and Apache would have to fork a new interpreter process to execute each script request. This was really slow, especially on the hardware available at the time. PHP came along and provided an embedded interpreter (mod_php) and gave a huge performance boost to the web-scripting world. This was a major motivator for migrating from Perl to PHP. It doesn’t hurt that Apache also happens to be a solid piece of software that can be (and probably has been) compiled on pretty much all operating systems out there. PHP has unashamedly piggy-backed on its host’s popularity, and for that, we salute you Apache!

Another reason PHP gained popularity is that you could drop a PHP file into any directory and Apache would just run it - no need for the “cgi-bin”. Today, website best-practice would typically recommend against executing “physical .php files” like add-user.php, and instead have pretty URLs like /users/add (good for readability and for SEO). Apache provides us with such functionality through a variety of methods, the most popular being mod_rewrite. We can easily use rewrite rules to route /users/add => add-user.php. Thing is, we’re not happy with that. We’ve all got drunk on “MVC” and cool applications use front-controllers (a single entry point for all requests, normally “index.php”). So instead of using Apache’s built-in functionality, we’re reinventing the wheel inside our own applications. Why use mod_rewrite when we can do the same thing, but slower, using Zend_Controller_Router_Route_Regex? /me rolls-eyes.
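To make the comparison concrete, here is a minimal sketch of what that in-application routing tends to look like: a table of regexes mapped to handlers, checked in order for every request. The function name, the route table, and the handler names are all made up for illustration (this is not how Zend_Controller_Router_Route_Regex is implemented), and it assumes PHP 7.1+ syntax.

```php
<?php
// Hypothetical front-controller router: regex pattern => handler name.
// None of these names come from a real framework.
function route(string $path, array $routes): ?array
{
    foreach ($routes as $pattern => $handler) {
        if (preg_match($pattern, $path, $matches) === 1) {
            // Keep only the named captures (string keys) as parameters.
            $params = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
            return ['handler' => $handler, 'params' => $params];
        }
    }
    return null; // nothing matched; the caller would send a 404
}

$routes = [
    '#^/users/add$#'         => 'add-user',
    '#^/users/(?P<id>\d+)$#' => 'show-user',
];

$match = route('/users/add', $routes);
```

Every request walks the table in PHP, which is the same job a rewrite rule does inside Apache before PHP is ever started.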

We’re even ditching Apache for static stuff too. If you don’t already, it’s probably only a matter of time before you run CSS and JavaScript requests through PHP too (for good reason). Yahoo’s frontend performance guidelines suggest reducing the number of HTTP requests is important, so it makes sense to generate (and cache) a single CSS file and a single JS file. You might also do clever stuff like postfixing the URLs with a version number to help caching, and/or gzip/minify/manipulate headers. These are good things to do, but you’re now using Apache even less.
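As a rough sketch of the bundling idea, the snippet below joins several stylesheet sources into one payload and derives a version token for the URL. Both function names are hypothetical, and using a content hash as the version is my assumption here; a release number or file mtime would do the same job. It assumes PHP 7.1+.

```php
<?php
// Hypothetical helper: concatenate stylesheet sources into one payload and
// derive a version token from the content, so the URL changes when the CSS does.
function buildBundle(array $sources): array
{
    $body = implode("\n", $sources);
    // Short content hash used as the version suffix (an assumption, see above).
    $version = substr(md5($body), 0, 8);
    return ['body' => $body, 'version' => $version];
}

// Hypothetical helper: append the version token to the bundle's URL.
function bundleUrl(string $basePath, string $version): string
{
    return $basePath . '?v=' . rawurlencode($version);
}

$bundle = buildBundle(['body { margin: 0; }', 'h1 { color: red; }']);
$url = bundleUrl('/css/site.css', $bundle['version']);
```

In a real setup the sources would come from files and the generated body would be cached rather than rebuilt on every request.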

So what is left? Images, video, Flash? Lots of sites that graduate beyond a single server use a “static file server” to handle this stuff. There’s also no point using a feature-rich, but slower, server like Apache for doing basic static file serving, when alternatives like Lighty/nginx can do this more efficiently. Another option might be to locate static content on a NAS of some sort which Apache will alias (Alias /images /mnt/nas/images). We’ve now got a document root that looks like this:-

index.php
.htaccess (to route everything to index.php)

Wow, clean! (Our PHP code is outside the docroot in some include_path location.) I just had a thought - we could get rid of the document root entirely, by sticking this in our virtualhost: “php_value auto_prepend_file /path/to/app/index.php”. (That would be pretty funny, running a site without a docroot.)

We don’t really use Apache for much, so why do we use it at all? I think it boils down to being a nice host environment for PHP. There’s nothing really wrong with that, except PHP is tightly coupled to Apache. I think Apache rocks, but if we just end up using it as a host for PHP, isn’t that bloated overkill? What if we stripped out the features of Apache that we don’t use until we’ve got a lightweight HTTP wrapper for our PHP app to run in… isn’t that what Rubyists do with Mongrel?

The type of code I write in PHP today is request/response stuff. I get an incoming HTTP request, do some processing, and create an HTTP response. That sounds obvious, but the subtle difference from what I was doing 5 years ago is that my PHP code is taking care of ALL of the request lifecycle. I don’t use Apache for authentication, logging, URI routing, header setting, gzip, caching etc - everything is done by my code. It’s this slight shift in paradigm that makes me think “scripting application servers” like Mongrel+Rails or CherryPy are worth keeping an eye on, and another reason why I question the future of PHP.

4hrs To Write “setUrl()”

Friday, May 16th, 2008

This evening I kicked off coding my new project (codename: April) which at its core is an HTTP Client library for PHP5.

Before writing any code, I did a fair bit of research into the other options that are out there, including analysing their source code to see how they do things. I don’t think I’ve ever looked at 3-4 different libraries side-by-side like this and compared in depth how they do essentially the same thing. It’s very interesting to see the different approaches the authors take as well as the bigger picture of their frameworks.

Of the libraries that I’m using for reference (Zend_Http_Client, Solar_Http, PEAR::HTTP_Request and PECL::HTTP) only PECL::HTTP fully separates Request/Response/Client. The others take a 2-component approach and have a Response object plus either the Request built into the Client (Zend) or the Client built into the Request (PEAR & Solar). I personally think it’s important to separate each of the three components (the input, the output, and the bit in the middle) to gain maximum flexibility and code reuse.
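A bare-bones sketch of the three-way split might look like the following. The class names and the callable “transport” are purely illustrative; this is not the API of Zend, Solar, PEAR or PECL, and it isn’t April’s code either. It assumes PHP 7.1+ syntax.

```php
<?php
// Hypothetical names throughout; not the API of any library mentioned above.

// The input: a plain description of what to send.
final class Request
{
    public $method;
    public $url;
    public $headers;

    public function __construct(string $method, string $url, array $headers = [])
    {
        $this->method = $method;
        $this->url = $url;
        $this->headers = $headers;
    }
}

// The output: a plain description of what came back.
final class Response
{
    public $status;
    public $body;

    public function __construct(int $status, string $body)
    {
        $this->status = $status;
        $this->body = $body;
    }
}

// The bit in the middle: turns a Request into a Response via a transport callable.
final class Client
{
    private $transport;

    public function __construct(callable $transport)
    {
        $this->transport = $transport;
    }

    public function send(Request $request): Response
    {
        return ($this->transport)($request);
    }
}

// A canned transport stands in for real network I/O in this sketch.
$client = new Client(function (Request $r): Response {
    return new Response(200, $r->method . ' ' . $r->url);
});
$response = $client->send(new Request('GET', 'http://example.com/'));
```

Because the Request and Response know nothing about sockets, the same objects can be reused in tests (as here) or with a different transport.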

Another area I looked at today was how URLs are handled and validated (or not!). Solar and Zend both have URI objects and PEAR has Net_Url(2). Solar and PEAR both use parse_url(); PEAR validates the parts using regexes, but Solar does no validation (using whatever parse_url() spits back). The Zend guys clearly don’t trust parse_url() and their implementation has lots of regexes to split the URI and validate it. None of them use PHP’s Input Filter, which conveniently has sanitizing and filtering calls for URLs (it could be that it’s for URLs and not URIs).
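The difference between splitting and validating is easy to see side by side. In this small sketch (describeUrl() is a made-up helper, assuming PHP 7.1+ with the filter extension enabled), parse_url() happily returns parts for a string that filter_var() with FILTER_VALIDATE_URL rejects.

```php
<?php
// Hypothetical helper comparing the two built-ins on the same input.
// parse_url() only splits a string into components; it does not validate.
// filter_var() with FILTER_VALIDATE_URL checks the overall shape of a URL.
function describeUrl(string $url): array
{
    $parts = parse_url($url);
    return [
        'parts' => $parts === false ? null : $parts,
        'valid' => filter_var($url, FILTER_VALIDATE_URL) !== false,
    ];
}

$good = describeUrl('http://example.com/users/add?x=1');
$bad  = describeUrl('not a url');
```

For the second input, parse_url() still hands back a “path” component, which is why relying on it alone amounts to no validation at all.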

That reminds me - I spent about an hour of my coding session researching and arguing with myself whether I should be using URL or URI terminology. My conclusion was I preferred URL (a URL specifies a location of the resource, which is implicit in an HTTP request); but also that URI was valid because in the context of an HTTP Client one should assume that the URI was a URL (there’s no way it could be a URN - just doesn’t make sense). Time well spent I think :-).

So, after lots of reading and thinking I wrote some test cases for April_Http_Request and put pen to paper to write her first function: setUrl() (and associated alias, setUri()). It took me 4hrs and it’s only one function but I think I’ve given it my fullest attention! I want this project to use the best bits from my reference libraries and I’m quite happy to spend time learning how they tick. I’m also going to use their tests (both Solar and Zend [and April] use PHPUnit) so my code should at least cover the same “gotchas” as they do. This worked amazingly well for testing my setUrl() method: I managed to reuse almost all the tests for Zend_Uri_Http and after a few failures I tweaked my code to pass - setUrl() is now covered by 25 or so tests (without me having to write them).

Fingers firmly crossed I can continue doing this, as that’s a lot of brainwork that I can reuse (under the New BSD license).