Roy Fielding: English, motherfucker, do you speak it?

March 26th, 2009

When I think of a REST web-service, simplicity springs to mind. I understand the web and HTTP, and my development languages (PHP & Javascript) allow me to easily interact with them. Compared to trying to get my head around “Enterprise SOAP”, especially from within PHP, REST is easy. Or so I thought.

I stumbled onto a blog post by Roy Fielding in which he bitch-slaps people who call their API “RESTful” when, according to him, it’s not. One might think that, as the person who defined the “REST” acronym, he is the authoritative voice on the matter. However, I challenge you to read that post and understand what he’s talking about.

A REST API should not be dependent on any single communication protocol, though its successful mapping to a given protocol may be dependent on the availability of metadata, choice of methods, etc. In general, any protocol element that uses a URI for identification must allow any URI scheme to be used for the sake of that identification. [Failure here implies that identification is not separated from interaction.]

Insert image of Samuel L. Jackson here!

Fielding posted a response to the criticism of his language, in which he explains (how kind!) that he’s directing his blog post at “specialists” who, in his eyes, should be able to understand it.

However, when I send out a message to API designers, I expect the audience to be reasonably competent in the field. I have to talk to them as a specialist because I want them to understand, as specialists themselves, exactly what I am trying to convey and not some second-order derivatives.

What use is criticising somebody when you do so in language that makes it difficult for them to understand your points? Presumably if the great unwashed have failed to grok the dissertation, then they’re not going to react well to more of the same “academic speak”, and the criticism will either fall on deaf ears or be only partly understood (if at all), which totally defeats its purpose.

I think he also contradicts himself in his response. He doesn’t want people to rely on “second-order derivatives”, yet he’s pleased that there are enough clever people out there who can explain it to the thickies (thus creating second-order derivatives).

Fortunately, there are more than enough people who are specialist enough to understand what I have written (even when they disagree with it) and care enough about the subject to explain it to others in more concrete terms, provide consulting if you really need it, or just hang out and metablog.

But isn’t this exactly the communication problem that causes people to misunderstand REST? All the designers of RESTful services who don’t understand him will now rely on other people’s translations (i.e. second-order derivatives), which may make the message fuzzy and wrong.

People who create APIs are often not academics. Why the fuck couldn’t he just explain himself in plain English, with some nice “for dummies” examples, and save us all some time? It angers me that instead of saying, “Wow, a lot of people who want to understand me didn’t understand my post; let me rephrase and give you some examples…”, he responds by patronising us. What a dick.

MacGDBp for xdebugging

January 18th, 2009

I’ve just got around to moving servers and switched out Zend’s debugging extension for Xdebug and I now need a client to do my debugging with (I’m ditching ZDE too). MacGDBp was hyped up a while ago as being some sort of amazing PHP debugging client for Mac so I gave it a shot.

My setup is a remote server with Xdebug running on it. To get this working you need to configure some settings in php.ini, such as the port (9000) and my local computer’s hostname (so either a static IP or some dynamic DNS hostname). You also need to allow incoming connections on your local network for port 9000 and have them forwarded to the correct machine (my laptop). This was all painless enough to set up, but I can imagine the firewall issue is a cockblock for corporate environments or debugging on the go with café wifi.

So now the networking is working, I connect to my server with an extra query string variable set for XDEBUG_SESSION_START and away we go - it works! I can step through my code and see the variable objects and so on… only one thing is missing: the source of the file being debugged! So far as I could see from a quick skim of the DBGp spec, there isn’t the ability to transfer the source code being debugged between Xdebug and the client; instead it refers to the full path of the file on the remote server, e.g. “file:///www/mysite/index.php”. This is fine if you’re working locally, because MacGDBp can use that path to open up the local source code and display the source/lines/breakpoints as you would expect; but it totally falls apart when you’re remote debugging, and it displays nothing. So I can debug remote scripts, but I can’t see the source of what I’m debugging. Bloody marvellous!

I managed to work around this by using MacFusion, which I use to edit remote files with a local IDE, and making a symbolic link between “/www” and “/Volumes/myremotesite” (which is the FUSE mount point for my remote site). It kinda sucks though: it would be much slicker if the debug client could get at the source directly (via its own use of FTP/SSH perhaps?).
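If the client let you configure it, a path map would be a tidier alternative to the symlink: rewrite the file:// URI that Xdebug reports into the equivalent path under the local FUSE mount. Here's a minimal Python sketch of that idea. `map_remote_path` and `PATH_MAP` are hypothetical names (as far as I know this isn't a MacGDBp setting), and the paths are just the ones from my setup.

```python
from urllib.parse import unquote, urlparse

# Hypothetical mapping: remote document root -> local FUSE mount point
PATH_MAP = {
    "/www": "/Volumes/myremotesite",
}


def map_remote_path(file_uri, path_map=PATH_MAP):
    """Translate a remote file:// URI (as reported by the debugger engine)
    into a local path, using the longest matching remote prefix."""
    remote_path = unquote(urlparse(file_uri).path)
    # Try longer prefixes first so e.g. /www/mysite would win over /www
    for remote_prefix in sorted(path_map, key=len, reverse=True):
        if remote_path == remote_prefix or remote_path.startswith(remote_prefix + "/"):
            return path_map[remote_prefix] + remote_path[len(remote_prefix):]
    # No mapping found: fall back to the path exactly as the engine reported it
    return remote_path


print(map_remote_path("file:///www/mysite/index.php"))
```

With the mapping above, the example prints the local copy's path under /Volumes/myremotesite.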

Anyways, I got things working ok in the end. There’s another thing I really don’t like about the client: I have to press the “start” button every time I want to do some debugging. So: press start, go to the browser and load the script, switch to the client and debug, make some changes to the script, press start, press refresh. It’s really annoying; why can’t it just “start” automatically after the end of each session? (So far as I know, it’s just listening on port 9000 for incoming debug sessions.)
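An always-on client is mostly a matter of going straight back to `accept()` when a session ends. Below is a rough Python sketch that rests on two assumptions. First, Xdebug connects out to the client on port 9000. Second, each DBGp message from the engine is framed as an ASCII length, a NUL byte, the XML body and a trailing NUL. `read_dbgp_packet`, `serve_forever` and the `handle_session` callback are hypothetical names. The callback is where the actual stepping and breakpoint conversation would go.

```python
import socket


def read_dbgp_packet(sock):
    """Read one engine packet: ASCII length, NUL, XML body, NUL.
    Returns the XML as a string, or None if the connection closed."""
    length_digits = b""
    while True:
        ch = sock.recv(1)
        if not ch:
            return None  # engine hung up
        if ch == b"\0":
            break
        length_digits += ch
    length = int(length_digits)
    body = b""
    while len(body) < length + 1:  # +1 for the trailing NUL
        chunk = sock.recv(length + 1 - len(body))
        if not chunk:
            return None
        body += chunk
    return body[:-1].decode("utf-8")


def serve_forever(handle_session, host="0.0.0.0", port=9000):
    """Keep listening: when one debug session ends, loop straight back to
    accept() instead of waiting for someone to press "start"."""
    with socket.create_server((host, port)) as server:
        while True:
            conn, _addr = server.accept()
            with conn:
                init_xml = read_dbgp_packet(conn)  # the engine speaks first with <init .../>
                if init_xml is not None:
                    handle_session(conn, init_xml)
```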

I’m really not impressed with this piece of kit so far (in part due to the hype around it and it not living up to my expectations). It’s the only “standalone” client out there, so I’m probably stuck with it for a while :).

Apache - good for nothing?

October 31st, 2008

Apache (the http server) is one of the major reasons why PHP is so popular; but with today’s trend of using front-controllers have we obsoleted our old friend?

In the olden days we used to write web scripts in Perl, and Apache would have to fork a new interpreter process to execute each script request. This was really slow, especially on the hardware available at the time. PHP came along and provided an embedded interpreter (mod_php) and gave a huge performance boost to the web-scripting world. This was a major motivator for migrating from Perl to PHP. It doesn’t hurt that Apache also happens to be a solid piece of software that can be (and probably has been) compiled on pretty much all operating systems out there. PHP has unashamedly piggy-backed on its host’s popularity, and for that, we salute you Apache!

Another reason PHP gained popularity is that you could drop a PHP file into any directory and Apache would just run it - no need for the “cgi-bin”. Today, website best-practice would typically recommend against executing “physical .php files” like add-user.php, and instead have pretty URLs like /users/add (good for readability and for SEO). Apache provides us with such functionality through a variety of methods, the most popular being mod_rewrite. We can easily use rewrite rules to route /users/add => add-user.php. Thing is, we’re not happy with that. We’ve all got drunk on “MVC” and cool applications use front-controllers (a single entry point for all requests, normally “index.php”). So instead of using Apache’s built-in functionality, we’re reinventing the wheel inside our own applications. Why use mod_rewrite when we can do the same thing, but slower, using Zend_Controller_Router_Route_Regex? /me rolls-eyes.
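For comparison, here is roughly what a front-controller router boils down to. It is a table of regexes checked in order against the request path, which is exactly the job a handful of rewrite rules would otherwise do. This is a Python sketch with a made-up route table and handler names, not the Zend_Controller_Router_Route_Regex API.

```python
import re

# Hypothetical route table: (HTTP method, URI regex, handler name)
ROUTES = [
    ("GET", re.compile(r"^/users/add$"), "add_user_form"),
    ("POST", re.compile(r"^/users/add$"), "add_user"),
    ("GET", re.compile(r"^/users/(?P<user_id>\d+)$"), "view_user"),
]


def route(method, path):
    """Return (handler_name, params) for the first matching route, or None."""
    for route_method, pattern, handler in ROUTES:
        if route_method != method:
            continue
        match = pattern.match(path)
        if match:
            # Named groups in the regex become the handler's parameters
            return handler, match.groupdict()
    return None


print(route("GET", "/users/42"))
```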

We’re even ditching Apache for static stuff too. If you don’t already, it’s probably only a matter of time before you run CSS and Javascript requests through PHP too (for good reason). Yahoo’s frontend performance guidelines suggest that reducing the number of HTTP requests is important, so it makes sense to generate (and cache) a single CSS file and a single JS file. You might also do clever stuff like postfixing the URLs with a version number to help caching, and/or gzip/minify/manipulate headers. These are good things to do, but you’re now using Apache even less.
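The bundling step might look something like this minimal Python sketch. It concatenates the sources, derives a short version string from a hash of the content (so the URL changes whenever a file changes), and keeps a gzipped copy to serve to clients that accept it. `bundle_assets` is a hypothetical helper. It takes file contents rather than paths to stay self-contained, and minification is left out.

```python
import gzip
import hashlib


def bundle_assets(sources):
    """Concatenate several CSS (or JS) sources into one response body.

    Returns (version, body, gzipped_body). The version is a short content
    hash, so it only changes when some source changes.
    """
    body = "\n".join(sources).encode("utf-8")
    version = hashlib.sha256(body).hexdigest()[:8]
    return version, body, gzip.compress(body)


css_files = ["body { margin: 0; }", "h1 { font-size: 2em; }"]
version, body, gz = bundle_assets(css_files)
# The versioned URL can then be served with far-future cache headers
print(f"/css/all.{version}.css", len(body), "bytes,", len(gz), "gzipped")
```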

So what is left? Images, video, flash? Lots of sites which graduate beyond a single-server use a “static file server” to handle this stuff. There’s also no point using a feature-rich, but slower, server like Apache for doing basic static file serving, when alternatives like Lighty/nginx can do this more efficiently. Another option might be to locate static content on a NAS of some sort which Apache will alias (Alias /images /mnt/nas/images). We’ve now got a document root that looks like this:-

index.php
.htaccess (to route everything to index.php)

Wow, clean! (Our PHP code is outside the docroot in some include_path location.) I just had a thought - we could get rid of the document root entirely by sticking this in our virtualhost: “php_value auto_prepend_file /path/to/app/index.php”. (That would be pretty funny, running a site without a docroot.)

We don’t really use Apache for much, so why do we use it at all? I think it boils down to it being a nice host environment for PHP. There’s nothing really wrong with that, except that PHP is tightly coupled to Apache. I think Apache rocks, but if we just end up using it as a host for PHP, isn’t that bloated overkill? What if we stripped out the features of Apache that we don’t use until we’ve got a lightweight HTTP wrapper for our PHP app to run in… isn’t that what Rubyists do with Mongrel?

The type of code I write in PHP today is request/response stuff. I get an incoming HTTP request, do some processing, and create an HTTP response. That sounds obvious, but the subtle difference from what I was doing 5 years ago is that my PHP code is taking care of ALL of the request lifecycle. I don’t use Apache for authentication, logging, URI routing, header setting, gzip, caching etc. - everything is done by my code. It’s this slight shift in paradigm that makes me think “scripting application servers” like Mongrel+Rails or CherryPy are worth keeping an eye on, and it’s another reason why I question the future of PHP.
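Python's WSGI interface is a reasonable illustration of that “lightweight wrapper” shape. The server hands the application a dict describing the request, and the application decides everything else. The minimal sketch below assumes a hypothetical /people resource and uses a placeholder bearer-token check. Routing, authentication, caching headers and gzip all live in application code.

```python
import gzip
import json


def app(environ, start_response):
    """A tiny WSGI application that owns the whole request lifecycle:
    routing, an auth check, response headers and compression."""
    path = environ.get("PATH_INFO", "/")
    if path != "/people":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    # Placeholder auth: a real app would check a session or token store here
    if environ.get("HTTP_AUTHORIZATION") != "Bearer secret":
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"Unauthorized"]
    body = json.dumps([{"name": "Alice"}, {"name": "Bob"}]).encode("utf-8")
    headers = [("Content-Type", "application/json"), ("Cache-Control", "max-age=60")]
    # Compress in the app instead of relying on a server module like mod_deflate
    if "gzip" in environ.get("HTTP_ACCEPT_ENCODING", ""):
        body = gzip.compress(body)
        headers.append(("Content-Encoding", "gzip"))
    headers.append(("Content-Length", str(len(body))))
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` is enough. The “server” part really is that thin.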

Has PHP Peaked? Is the grass greener?

October 30th, 2008

I’ve been “a web developer” for about 12 years now (crikey!). The first few were spent with Perl, and then I moved over to PHP when I accepted that whilst mod_perl was “better”, it was also worse. Perl CGI was the de facto language for web scripting, but it was slow and cumbersome to write web apps with. Then mod_perl came along, removed the speed issue and gave the developer more power, but it was also much harder to implement: it normally involved two Apache instances, one acting as a reverse-proxy for the other, and of course required a dedicated server (when dedicated servers were expensive!). Around the same time PHP 3 was taking off; it gave the speed boost of mod_perl without the headache of implementation and made it much easier to write web scripts, particularly in a shared hosting environment.

Looking back at that transitional time, I get the vibe that a similar shift is happening today. Moving from Perl to PHP, developers got an easy life, with lots of the legwork that you used to do taken care of (like parsing the query string - s/%([a-fA-F0-9]{2})/pack "H2", $1/eg; #ring any bells?). I see the “new kids” on the block like Python and Ruby offering developers an easier life. They’re offering not only Rails/Django-type frameworks, but also a much more attractive package of support tools like gems/eggs, Capistrano and migrations, and both have unit-testing frameworks built in as part of their core distributions. Ten years ago the average web developer didn’t care about unit tests, source control and deployment, but now we do, and these two alternatives offer something more attractive.
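For anyone who doesn't remember that legwork, here is a Python sketch of a hand-rolled query string parser in the spirit of the Perl one-liner. It splits on `&` and `=`, turns `+` into a space and decodes the `%XX` escapes. Next to it is the standard library's `parse_qs`, which does all of that for you. The helper names are illustrative only.

```python
import re
from urllib.parse import parse_qs


def percent_decode(text):
    """Decode %XX escapes at the byte level (like the old Perl one-liner),
    then interpret the resulting bytes as UTF-8."""
    raw = re.sub(rb"%([a-fA-F0-9]{2})",
                 lambda m: bytes([int(m.group(1), 16)]),
                 text.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


def decode_query_by_hand(query):
    """Hand-rolled parser: split on & and =, turn + into a space, then
    percent-decode. Repeated keys keep only the last value."""
    params = {}
    for pair in query.split("&"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        params[percent_decode(key.replace("+", " "))] = percent_decode(value.replace("+", " "))
    return params


query = "name=Caf%C3%A9+au+lait&page=2"
print(decode_query_by_hand(query))  # the legwork we used to do ourselves
print(parse_qs(query))              # the library version (values come back as lists)
```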

The timing of this blog post and the recent namespace discussion/decision in the PHP world are not coincidental. I’m disappointed with how things have played out with that whole fiasco. The end result is pretty horrible from an average Joe developer’s position (fugly syntax, confusing separator), but how the community/core devs reached this point is the saddest part. There was so much bickering, trolling and flaming, which to me indicates a much larger problem: something’s wrong with the management of the project. It’s also not just this incident; there was drama recently with “contributor licenses” and PDO2, and I get the vibe that the activity of the project/community has decreased a lot in recent years. It wasn’t so long ago that “PHP6” was going to be the next big thing and Unicode was going to rock our socks, but now we’re still struggling to get a 5.3 release out the door and suddenly v6 seems farther away than ever. We also have a BDFL who contradicts the direction of the language. Rasmus does quite a bit of hating on “new PHP” style code and promotes an old-school approach (“let’s all use SQL inside our HTML!” ;)). I have no problem with someone having this opinion, but for that person to also be the guiding star of PHP is obviously an issue (I wonder if he wishes BDFL was a position one could resign from :D). Namespaces are a good example of the lack of leadership - there was no one around to smack the kids, tell them to stfu and make an executive decision as the mature adult.

The situation with PHP5 also depresses me. I know we can prove anything with statistics… but it’s 4 years old now, PHP4 is officially dead/end-of-life, and yet PHP5 represents only 40% of installations. By comparison, IE7 was released only 2 years ago and it looks like it has about a 50/50 split with IE6. Perhaps of greater concern is how many users are actually developing with PHP5. I’d bet that there are more “PHP4” developers out there than “PHP5” ones. If so, does that reduce the potential number of developers active in PHP5 open source projects? Back in the early 2000s there were loads of active PHP projects - everyone was on a level playing field and writing code for the same version. Today, I’d struggle to name a popular PHP5 project outside of frameworks: where the hell is our sexy PHP5 version of OSCommerce, “PHPAdsNew”, the classic forums, or even blogs (ok, Habari)? From what I can see, it’s just not the same as it used to be.

Times have changed. The web has moved on and so have the needs of its developers. I think we might have reached the peak of PHP. It’s clearly going to be a language that remains popular for years to come, but I think it’s like Perl was when PHP 3 was released. It does the job, but there are compelling reasons to switch to another language which is now beyond the “fan boy” stage and is maturing (in the web-app sense). I also think it’s not just a “PHP vs Python” or “PHP vs Ruby” question; it also has a lot to do with how web architectures are evolving generally (distributed clouds, REST services, better caching options, prevalence of AJAX).

I’m conflicted about writing this post because the PHP job-market is really good at the moment and PHP is “breaking through into the enterprise”. I think there are more conferences going on around the world than there have ever been, and there are lots of other signs that PHP is doing well. I don’t think the sky is falling or anything dramatic, but I do think the wind is changing. Perhaps the global economic situation might also affect us? There’s going to be less capital expenditure, so does that mean less investment in rewriting old PHP4 systems in PHP5? Maybe it’ll be a good thing for PHP, in that companies won’t be risking/investing in moving away from PHP (and maybe even switching to PHP because it’s cheaper?). Or will it act to speed up the stagnation?

I plan on being a PHPer for years to come but I’ve got a sneaky feeling I’ll be having an affair with another language in the not too distant future. On that note… time to read a little more of my new Python book ;-).

Controller/Model Detox, please.

October 27th, 2008

This is a quick mini-rant on one reason why I’m annoyed with MVC frameworks.

My beef is that business logic should not be in controllers, but it often is. Business logic is the “M” (model) part of MVC. The problem is that the type of actions you want to perform in the model (“create post”, “view post”) tend to match up nicely with the controllers and actions (/post/add, /post/view). It’s very easy to get sucked in at this stage, create a new controller and some actions, and implement your “business logic” inside the controller framework. I’m guilty of this, and I’m getting pissed off at myself for doing it. It’s so frustrating to see myself doing something I know is bad.

I’ve started to think of applications “from the controller outwards” and my model has become merged into my controller (perhaps this condition is “controller driven design”). This is bad. Testing your business logic is now about testing it through your controllers, which sucks. You can’t reuse your model in another context (like command-line tools, or a webservice) without going through the controller dispatcher / bootstrap, and forget about evolving the model and refactoring it / extending it. You’re also very tightly coupled with the framework.

I feel like a noob and I know it’s a bad thing, but I can’t help but think it’s to be expected with a lot of the frameworks out there. If you’re writing a project with “XYZ Framework” then you get sucked into all the stuff it does and you feel like you need to use it as the starting point. My project is magically going to kick ass and be well written because it uses this framework! You probably start off by looking at tutorials, which focus more on the controllers than on the model. In a lot of cases the model layer is reduced to an object that represents a database table/record. This encourages the developer to write things like authorisation code and data validation at the controller level, rather than within the model.

I’m fed up of this. I need to readjust my perspective and start ignoring the framework when it comes to designing and writing my code. The controller layer should be the last thing you write, not the first; it should be a lightweight layer that interfaces between HTTP requests/responses and your model.
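As a rough illustration of “controller last”, here is a Python sketch. The business rules (who may post, what counts as a valid post) live in a model class that knows nothing about HTTP. The controller only translates a request into a model call and translates the result back into a response. `PostService`, `InMemoryStorage` and `post_add_controller` are hypothetical names, and the storage is an in-memory stand-in for a database.

```python
class ValidationError(Exception):
    """Raised by the model when input breaks a business rule."""


class InMemoryStorage:
    """Stand-in for a database table."""

    def __init__(self):
        self.rows = []

    def save(self, row):
        self.rows.append(row)
        return len(self.rows)  # new row id


class PostService:
    """Model layer: business rules live here, with no knowledge of HTTP."""

    def __init__(self, storage):
        self.storage = storage

    def create_post(self, author, title, body):
        if not author.get("can_post"):
            raise PermissionError("author is not allowed to post")
        if not title.strip():
            raise ValidationError("title is required")
        return self.storage.save({"author": author["name"], "title": title.strip(), "body": body})


def post_add_controller(request, service):
    """Controller layer (/post/add): map the request onto a model call and
    the outcome onto a (status, body) response. No business rules here."""
    form = request["form"]
    try:
        post_id = service.create_post(request["user"], form.get("title", ""), form.get("body", ""))
    except ValidationError as exc:
        return 400, str(exc)
    except PermissionError as exc:
        return 403, str(exc)
    return 201, f"/post/view/{post_id}"
```

Because `PostService` doesn't care who calls it, a command-line script or webservice can use it directly with no dispatcher or bootstrap involved. For example: `PostService(InMemoryStorage()).create_post({"name": "cron", "can_post": True}, "Nightly digest", "")`.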

Moving to EC2

October 11th, 2008

I’ve been in lust with Amazon’s AWS services for the last year or two, particularly EC2. If you’ve not played with EC2 yet then I encourage you to do so. The first time you fire-up “10 servers” just to see if you can is a buzz and your mind will start going over all the cool things you could do.

It’s taken me a while but I’m now in the process of moving my personal servers into the cloud. The main reason for doing this is that I’m a geek and I want to play and learn more about AWS first-hand :D. It’ll be more expensive than a little VPS somewhere but it’s certainly not bank-breaking (should be $72/month + bandwidth @ $0.15/gig).

My first step in this move was to take control of my DNS. I don’t want to run my own DNS servers (ugh), but now that I am my own host I need to get a 3rd party to handle this for me. I’ve heard good things about DNSMadeEasy, so I set up a couple of domains with them to test things out. Their interface is a bit clunky but it works: you can easily set all sorts of DNS records and their help/faq/tutorials are good. So I use DNSMadeEasy to point my domain to an IP address, and the IP address is of my EC2 instance. Amazon now have something called an “Elastic IP”, which means the IP address is “fixed” to the outside world, but I can map it to any of my EC2 instances (and remap it on demand).

As well as hating managing DNS I also hate managing email. At the moment I run my own Postfix/Dovecot server and manage a set of Spam Assassin rules. My plan is to offload this entirely to Gmail, using their POP/IMAP service mainly with the web-interface as a backup. I’m not sure how this will work out just yet but it should do what I need.

As for the rest of the server stuff… woot! I love messing around with LAMP stacks and experimenting with different configs, and I’m in my element with this part. With EC2 you need to create an “image” (AMI) of your server, which means taking Amazon’s base Fedora image, installing your bits and pieces, then “saving” it. You then use this image to create new EC2 instances. For this I’m actually using Alestic’s basic Ubuntu Hardy image, which is a nice barebones Linux install (I don’t think it even has a GCC compiler). On top of that I’ve installed the latest versions of Apache/PHP/MySQL from source, as well as set up Subversion (to run over svn+ssh).

One of the perceived downsides of EC2 is that it’s “flakey”. It’s not a “real” machine with a “real” harddrive, so if it crashes or goes *poof* for some reason, then you lose all the data on it that changed since you booted up the image. Whilst true, there’s an argument that this *could* happen right now on my and your real servers: their harddrives could explode and you’d lose all the data. What EC2 is doing is making you think about that disaster scenario upfront and forcing you to do something about it. One natural solution is to write backup scripts that use S3 for storage, so I’m going to be setting some stuff up that periodically backs up my SVN repositories and other data to this super-redundant-never-gonna-let-you-down storage system. Amazon have also released something recently called Elastic Block Store (EBS), which is a “more real harddrive” than an EC2 instance’s local storage. Think of it like a NAS. The performance isn’t supposed to be as great as a normal physical machine but frankly, I don’t care. For me this is premature optimisation for projects that don’t even exist yet. I also reckon the flexibility/control that EC2 clusters provide outweighs the negative I/O aspect (it’s like writing a web app in PHP rather than C). My plan is to play with EBS for MySQL storage and see how it goes.
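A backup script along those lines could be as small as this Python sketch. It dumps a Subversion repository with `svnadmin dump`, gzips it, and uploads it to S3 under a dated key. It assumes `svnadmin` is installed and that the (present-day) boto3 library is configured with AWS credentials. The bucket, key layout and function names are placeholders. Running it from cron would take care of the “periodically” part.

```python
import datetime
import gzip
import shutil
import subprocess


def backup_key(repo_name, now=None):
    """Build a dated S3 object key, e.g. svn/myrepo/2008-10-11T0300.dump.gz."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return f"svn/{repo_name}/{now:%Y-%m-%dT%H%M}.dump.gz"


def backup_svn_repo(repo_path, repo_name, bucket, dump_path="/tmp/svn.dump.gz"):
    """Dump one Subversion repository, gzip it and upload it to S3.

    Assumes svnadmin is on the PATH and boto3 has AWS credentials;
    bucket and paths are placeholders.
    """
    import boto3  # imported here so backup_key() works without boto3 installed

    with gzip.open(dump_path, "wb") as out:
        dump = subprocess.Popen(["svnadmin", "dump", "--quiet", repo_path],
                                stdout=subprocess.PIPE)
        shutil.copyfileobj(dump.stdout, out)  # stream the dump straight into gzip
        dump.stdout.close()
        if dump.wait() != 0:
            raise RuntimeError("svnadmin dump failed")
    key = backup_key(repo_name)
    boto3.client("s3").upload_file(dump_path, bucket, key)
    return key
```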

I’m looking forward to my new home! There are so many things I want to play with, things that without this sort of on-demand virtualisation I couldn’t otherwise do.

Quick’n’dirty localhost benchmark

October 4th, 2008

My previous post suggested wrapping an application’s data layer underneath a REST API. I was curious as to the performance implications of making an “internal HTTP” request, for example requesting http://localhost/people, compared to a standard “mysql_query()”. This evening I’ve been playing around with trying to see what the difference might be, hoping to find that the performance hit is reasonable enough to continue investigating the idea. This is really crude benchmarking at the moment, and closer to anecdotal evidence than anything statistical or complete.

I’m using microtime(true) calls to record the time of script execution. I set up one script, mysql.php, to connect to the db and execute a query. The other script, subrequest.php, issues an HTTP request via my home-rolled HTTP client to http://localhost/rawmysql.php (which is the same logic as mysql.php, but without some microtime() calls). I’m using a single install of Apache running on a little Linux VPS.

Good news! At a really trivial level, the overhead of making the HTTP request is literally a couple of milliseconds, which is much better than I thought it would be (test this yourself by doing a file_get_contents("http://localhost"); on a static page you might get a sub-1ms response). I’m not sure why I thought it would be slower, but my gut feeling was that I’d be looking at something in the 10-20ms region. I’m pleased it’s 10x faster than I expected ;-).
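The same kind of crude measurement can be reproduced with a short Python script. It starts a throwaway HTTP server on localhost and times a batch of GET requests against it. It only measures the raw HTTP round trip to a tiny static response (no MySQL, no PHP), and the numbers will vary by machine. `StaticHandler` and `time_localhost_requests` are illustrative names.

```python
import http.server
import threading
import time
import urllib.request


class StaticHandler(http.server.BaseHTTPRequestHandler):
    """Serves a tiny fixed body, standing in for a static page."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the benchmark output quiet


def time_localhost_requests(n=50):
    """Start a server on a free localhost port and return the average
    wall-clock time, in milliseconds, of n GET requests."""
    server = http.server.HTTPServer(("127.0.0.1", 0), StaticHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    # Bypass any configured HTTP proxy so we really hit localhost
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        start = time.perf_counter()
        for _ in range(n):
            with opener.open(url) as resp:
                resp.read()
        elapsed = time.perf_counter() - start
    finally:
        server.shutdown()
        server.server_close()
    return elapsed / n * 1000


print(f"average localhost GET: {time_localhost_requests():.2f} ms")
```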

There are a bunch of other issues I’m concerned about (size of recordsets, json encoding/decoding, compression, overhead of additional http requests / additional Apache instances…) but fundamentally making an HTTP request to the localhost is fast, and a REST Data Layer definitely looks feasible - especially when I think that one of the major advantages to this approach will be having a unified caching layer (that will make it easier and encourage developers to cache).

Next, to write a simple “REST Server” and get some proof of concepts online.

Using REST internally within an application

September 29th, 2008

I’m a total REST fanboi. It’s at the very core of “the web” and it’s a great way to open up a webservice fairly easily (and in a way that can be consumed easily).

I’ve recently been working on a web-app where the UI could make use of AJAX to increase the usability; but I’ve been a little hamstrung by my approach to my “controllers”. When I want to do something via AJAX then I find myself creating a specific route for that request, which means I kind of duplicate some of my logic (and it’s a pain to write extra code).

Take “in-page pagination” as an example. My app displays a table of people when /people is requested, along with all the rest of the markup for that page (navigation, panels of other data etc). When the user wants to view the 2nd page they click on “next”, which takes them to /people?page=2 and another table of data. To AJAX things up, I want to fire off an XMLHttpRequest when they click “next” which brings back a JSON string of “page 2 data”, which I can dynamically inject into the page. If I call /people?page=2 via AJAX, then I incur the overhead of generating “all the other panels/markup” as well as it returning a full HTML page, rather than a nice JSON string; my workaround is to create a new route, /people/ajax, which returns just a JSON string. That’s pretty sucky when you scale that up to even a small “real world” application.
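To make the duplication concrete, here is a Python sketch with a hypothetical `get_people` data call and the two actions it ends up behind. One is the normal full-page /people action. The other is the extra /people/ajax action, which does nothing but wrap the same call in `json.dumps`. The data and markup are placeholders.

```python
import html
import json
import math

# Placeholder data standing in for the "people" table
PEOPLE = [{"id": i, "name": f"Person {i}"} for i in range(1, 26)]


def get_people(page=1, per_page=10):
    """Data-layer call: one page of people plus paging metadata."""
    total_pages = max(1, math.ceil(len(PEOPLE) / per_page))
    page = min(max(1, page), total_pages)  # clamp out-of-range pages
    start = (page - 1) * per_page
    return {"page": page, "total_pages": total_pages,
            "people": PEOPLE[start:start + per_page]}


def people_action(query):
    """/people: the whole page (navigation, panels...) around the table."""
    data = get_people(int(query.get("page", 1)))
    rows = "".join(f"<tr><td>{html.escape(p['name'])}</td></tr>" for p in data["people"])
    return f"<html><body><!-- navigation, other panels --><table>{rows}</table></body></html>"


def people_ajax_action(query):
    """/people/ajax: the extra route, which only wraps the same data call."""
    return json.dumps(get_people(int(query.get("page", 1))))
```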

What would be cool is if I could automatically expose my “data mapper” (database) CRUD methods via HTTP. So instead of making a new Action for “/people/ajax” that essentially wraps a call to $db->getPeople() and then json_encodes the recordset, I could directly call getPeople() via AJAX. Hmn… I’m sure there’s a technology somewhere that would be appropriate for that! ;).

So my idea is to use REST as a layer on top of my application’s data. That’s nothing new, it’s called “a webservice!”, but what if we extend it to also completely wrap the data-layer from the internal application? So the only way to receive or update any data is by going through the REST interface, kinda like consuming your own internal webservice. Think of it like running 2 servers, a REST-server and a web-server (where your normal application lives). Your “application code” has to issue HTTP requests to the REST server in order to get data or to update records.
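Here is a minimal Python sketch of such a REST data layer, assuming a hypothetical `PeopleMapper` in place of the real database mapper. A `handle` method maps an HTTP method plus path onto the mapper's CRUD methods and returns a status code and a JSON body. It is called in-process here for simplicity. In the setup described above it would sit behind its own HTTP server, and the application code would reach it with ordinary HTTP requests.

```python
import json
import re
from urllib.parse import parse_qs, urlsplit


class PeopleMapper:
    """Stand-in data mapper; a real one would talk to MySQL."""

    def __init__(self):
        self.people = {1: {"id": 1, "name": "Alice"}, 2: {"id": 2, "name": "Bob"}}

    def get_people(self, page=1, per_page=10):
        rows = [self.people[k] for k in sorted(self.people)]
        start = (page - 1) * per_page
        return rows[start:start + per_page]

    def get_person(self, person_id):
        return self.people.get(person_id)

    def create_person(self, data):
        new_id = max(self.people, default=0) + 1
        self.people[new_id] = {"id": new_id, **data}
        return self.people[new_id]


class RestDataLayer:
    """Maps HTTP method + path onto mapper CRUD methods and returns
    (status, JSON body). The application never touches the mapper directly."""

    def __init__(self, mapper):
        self.mapper = mapper

    def handle(self, method, url, body=None):
        parts = urlsplit(url)
        query = parse_qs(parts.query)
        if parts.path == "/people" and method == "GET":
            page = int(query.get("page", ["1"])[0])
            return 200, json.dumps(self.mapper.get_people(page))
        if parts.path == "/people" and method == "POST":
            return 201, json.dumps(self.mapper.create_person(json.loads(body)))
        match = re.fullmatch(r"/people/(\d+)", parts.path)
        if match and method == "GET":
            person = self.mapper.get_person(int(match.group(1)))
            if person is None:
                return 404, json.dumps({"error": "not found"})
            return 200, json.dumps(person)
        return 404, json.dumps({"error": "no such resource"})
```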

So why do this? Well firstly it (probably) makes your entire application inherently RESTful, which helps solve my AJAX example. If you’ve already got something at the REST-server level that returns data when you call “GET /people?page=2”, then you immediately have access to that same data “webservice” from inside the browser: no need for extra controllers or forking a controller based on an “X-Requested-With” header. This is also true for “normal webservices” - you immediately have webservices available for anyone else to consume (obviously, subject to your security considerations). I think this approach might also help with caching: if all data requests go through the REST-server then it provides an abstraction for the caching (your “HTTP-client” just caches requests using normal HTTP caching principles). It also locks the data-layer away from your junior developers :D (and helps teach them REST!). I haven’t thought much about it, but it might help mask other scaling issues: the client, your app, just connects to the REST-server, which worries about the multiple backend MySQL or Memcached servers and external webservices. Maybe it also gives you a unified log of all data operations (by logging the “HTTP request”) regardless of where the actual data is stored? Anyways, I think there are some nice reasons why this architecture might be pretty cool.
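The caching point can be sketched in a few lines of Python: a client wrapper that remembers successful GET responses for as long as their `Cache-Control: max-age` allows. This is a deliberately simplified, hypothetical `CachingClient`. It has no `ETag` revalidation and no `Vary` handling, and its header lookup is case-sensitive. Still, it shows how every data request gets caching for free once it all goes through one HTTP client.

```python
import re
import time


class CachingClient:
    """Wraps a fetch function and caches GET responses according to the
    response's Cache-Control max-age, roughly as an HTTP cache would."""

    def __init__(self, fetch, clock=time.monotonic):
        self.fetch = fetch  # fetch(url) -> (status, headers_dict, body)
        self.clock = clock
        self.cache = {}     # url -> (expires_at, response)

    def get(self, url):
        cached = self.cache.get(url)
        if cached and cached[0] > self.clock():
            return cached[1]  # still fresh: no request to the REST-server
        response = self.fetch(url)
        status, headers, _body = response
        cache_control = headers.get("Cache-Control", "")
        match = re.search(r"max-age=(\d+)", cache_control)
        if status == 200 and match and "no-store" not in cache_control:
            self.cache[url] = (self.clock() + int(match.group(1)), response)
        return response
```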

Now you might be thinking, “Dude, this is gunna suck! An HTTP request is going to be much slower than my normal mysql_query()!”. And you’re probably right, in that extra layers of abstraction tend to have a speed hit. But depending on how it was implemented, it might not be much at all, and dare I say it might even give better performance than some of the “PHP ORM” libraries? Firstly, the HTTP request is at worst going to be on the same local network, probably on the same local machine (perhaps over a unix socket?), and at best not via any IPC at all - you could interface directly with the “REST-server” if that “server” was some PHP code (though this perhaps defeats some of the benefits of a client/server architecture).

Actually thinking a little more about it, I don’t think the overhead would be much at all. Memcached/MySQL/CouchDB etc are often distributed over multiple machines. CouchDB is very relevant to this idea, as they make data accessible via an HTTP/JSON interface. This gives an overview of how PHP and CouchDB can work together and is conceptually similar to how I’d see a “REST data layer” working.

When I get some free time (:() I’m going to explore this idea some more :).

Good experience with Delicious’s search

September 9th, 2008

I remember reading an article recently about what a well known and respected web designer thought were the fundamental skills and knowledge-areas for people aspiring to get into web design. It’s a great article: it covers the bases of art, typography and usability and crosses them with business skills like marketing, public speaking and communication. It’s the perfect link to send someone who asks, “how do I become a web designer?” (or rather, “how do I become a good web designer?”).

Recently someone asked me that question, so I naturally thought of this article, but couldn’t remember who wrote it or on what site it was. I knew it was by a “big name” and about web design skills/techniques/knowledge (my keywords), so I searched some of their sites but with no success. I also spent about 20 frustrating minutes “just fucking googling it” without success. So what’s next? I need to search blogs, so I figure Technorati: which was crap; and Google Blog Search: no dice (although I did manage to serendipitously find two or three other great links!). By now I’m really annoyed. I know this page exists, but “my” failure to find it is driving me mad. In desperation I google “web design” channels on FreeNode (IRC) to see if my description of the article rings any bells with anyone who might have it stored in their favourites… FAVOURITES! Why didn’t I think of that earlier?

I fire up www.delicious.com and on only my second search, my long-lost page shows up as the #1 url for the keywords “web designer skills“. Jackpot! Perhaps I got lucky as this is definitely a “bookmarkable link” but I was really impressed how quickly I found it with Delicious.

Here’s the link.

Someone Actually Made An ORM Abstraction Layer

August 14th, 2008

Last year I wrote a post about the rather silly idea of creating an ORM abstraction layer… I laughed out loud today when I read that someone actually wrote one.

“If you wait long enough, someone will write/build/design what you were thinking about.”

It’s only a matter of time before some bright spark decides that this needs to be extended across frameworks.