Google GData, or RESTful PUT done right
I’ve been looking at Google GData as a source of inspiration for protocol design for some stuff we’re doing at $REALJOB, and I must say I’m impressed: they managed to add some real value to Atom and RSS, while keeping things as simple as possible.
In particular, we’ve been having some discussion at work regarding the PUT method: someone expressed their worry about the fact that it’s not possible to do partial PUTs. This made me half-smile in recollection of the CAP fiasco [1]…
Let me go back one step. This is actually a classic problem in databases: how do you modify data while guaranteeing consistency and maximum parallelism? If you define consistency over a whole object (as opposed to only caring about single attributes), you don’t have a choice: partial modifications won’t cut it.
Allowing partial modifications means different clients can modify an object at the same time, as long as they don’t touch the same attribute. Think about a calendar event: a client might want to change the meeting title, while another might modify the description, and there’s no reason why that wouldn’t work (except that there is no way of knowing whether that’s semantically acceptable).
But now both clients have outdated data, so they need to fetch the latest version if they need to continue working with it. There is even a race condition here, but for the sake of argument, let’s ignore it. Remember, this is still the easy case.
Then you have to handle conflicts, that is, two clients trying to set the title to the same or to different values. In that case some piece of software needs to decide who gets to proceed and who needs to retry. What’s worse, the way changes are specified often makes them non-idempotent, meaning you can’t simply redo them and obtain the same result.
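To make the idempotency point concrete, here is a small sketch. The `Event` type and the "shift the start time" change are my own invented example, not anything taken from GData: a relative, partial change gives a different result when a client retries it, while storing the whole object does not.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Event:
    """Hypothetical calendar event, used only for illustration."""
    title: str
    start_minute: int  # minutes after midnight


def shift_start(event: Event, delta: int) -> Event:
    """Partial, relative change: the result depends on the current state."""
    return replace(event, start_minute=event.start_minute + delta)


def store(_current: Event, new: Event) -> Event:
    """Full store: the result is the new object, whatever was there before."""
    return new


event = Event("Planning", 600)

# A client retries the relative change after a timeout, so it runs twice.
once = shift_start(event, 30)
twice = shift_start(once, 30)

# Retrying a full store leaves the object in the same state as one store.
target = Event("Planning", 630)
stored_once = store(event, target)
stored_twice = store(stored_once, target)

print(once.start_minute, twice.start_minute)  # the retry moved the meeting again
print(stored_once == stored_twice)            # the retry changed nothing
```

Not every partial change is relative like this one, of course; setting a single attribute to a fixed value can safely be repeated. The trouble is that a protocol allowing partial changes usually ends up allowing the relative kind too.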
You can make partial modifications safe if you lock the object, but you would need to hold a write lock for extended periods of time, reducing concurrency and introducing the usual complexity associated with locks.
The alternative is to always do full stores: even if a client alters just one attribute, it needs to store back the whole thing. In other words, a store is an idempotent operation.
You might see the issue with this: two clients that perform a store would overwrite each other’s data, it would seem. However, there is a proven technique to avoid this: optimistic locking.
What this boils down to is: you add a ‘version’ attribute to each object; a write [2] only succeeds if the version is the same as expected, and a successful write automatically increases the version number. If two clients race to make changes, one of them will atomically perform the change; the other one will get an error, and will be forced to reload the object from the store. It’s up to the client whether to reapply some or all of the changes, possibly with help from a user.
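Here is a minimal in-memory sketch of that scheme. The `Store` class, its method names and the `VersionConflict` exception are all made up for illustration; a real store would do the version check inside a database transaction or a conditional update rather than under a Python lock.

```python
import threading


class VersionConflict(Exception):
    """Raised when a write is based on a stale version (hypothetical name)."""

    def __init__(self, current_version):
        super().__init__(f"stale version, current is {current_version}")
        self.current_version = current_version


class Store:
    """Toy object store with optimistic locking."""

    def __init__(self):
        self._lock = threading.Lock()  # makes check-and-write atomic
        self._objects = {}             # key -> (version, data)

    def create(self, key, data):
        with self._lock:
            self._objects[key] = (1, dict(data))
            return 1

    def read(self, key):
        with self._lock:
            version, data = self._objects[key]
            return version, dict(data)

    def write(self, key, expected_version, data):
        """Full store; succeeds only if nobody wrote since we read."""
        with self._lock:
            version, _ = self._objects[key]
            if version != expected_version:
                raise VersionConflict(version)
            self._objects[key] = (version + 1, dict(data))
            return version + 1


store = Store()
store.create("event", {"title": "Planning", "description": ""})

# Both clients read version 1.
v_a, a = store.read("event")
v_b, b = store.read("event")

a["title"] = "Sprint planning"
store.write("event", v_a, a)          # client A wins, version becomes 2

b["description"] = "Bring laptops"
conflict = False
try:
    store.write("event", v_b, b)      # client B still expects version 1
except VersionConflict:
    conflict = True
    v_b, fresh = store.read("event")  # reload, then reapply B's change
    fresh["description"] = "Bring laptops"
    store.write("event", v_b, fresh)

final_version, final = store.read("event")
print(conflict, final_version, final)
```

Note that the lock here is only held for the duration of the compare-and-write, not for the whole time a client is editing, which is the entire point compared to pessimistic locking.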
If people are willing to put up with the complexity of partial modifications, there must be some clear advantage, right? Well, yes, if you care a lot about how much data you send around. In fact, I’d contend it is really only a data compression technique; and you could apply some kind of delta reduction to optimistic locking too.
So to go back to RESTful protocols, GData and my $REALJOB: Google did something smart, which is embedding the version number inside the edit URI you PUT objects to. Clients PUTting to an obsolete URI get back a 409 (Conflict) error with the new edit URI. This is simple, effective, and unless you’re dealing with megabytes of data, efficient enough that the extra bytes don’t matter.
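As a rough illustration of the idea (and only that: the `Feed` class and the `/feeds/events/<id>/<version>` URI layout are my own assumptions, not GData’s actual API or URI format), a toy server that keeps the version in the edit URI could look like this:

```python
class Feed:
    """Toy in-memory server; the URI layout is invented for this sketch.

    Not thread-safe: a real server would make the check-and-update atomic.
    """

    def __init__(self):
        self._entries = {}  # entry id -> (version, body)

    def _edit_uri(self, entry_id):
        version, _ = self._entries[entry_id]
        return f"/feeds/events/{entry_id}/{version}"

    def post(self, entry_id, body):
        """Create an entry and return (status, edit URI)."""
        self._entries[entry_id] = (1, body)
        return 201, self._edit_uri(entry_id)

    def put(self, edit_uri, body):
        """Full replacement; the version comes from the URI itself."""
        entry_id, version_text = edit_uri.rsplit("/", 2)[-2:]
        version, _ = self._entries[entry_id]
        if int(version_text) != version:
            # Obsolete URI: refuse, and hand back the current edit URI.
            return 409, self._edit_uri(entry_id)
        self._entries[entry_id] = (version + 1, body)
        return 200, self._edit_uri(entry_id)


feed = Feed()
_, uri = feed.post("e1", "<entry>Planning</entry>")

# Two clients hold the same edit URI; the first PUT wins.
status_a, uri_a = feed.put(uri, "<entry>Sprint planning</entry>")
status_b, uri_b = feed.put(uri, "<entry>Planning (moved)</entry>")

# The loser retries against the edit URI it got back with the 409.
status_retry, uri_retry = feed.put(uri_b, "<entry>Sprint planning (moved)</entry>")

print(status_a, uri_a)
print(status_b, uri_b)
print(status_retry, uri_retry)
```

The nice property is that the client never handles a version number explicitly: it just PUTs to whatever edit URI it was last given, and a stale URI is by construction a stale version. In a real client, the retry would of course come after re-fetching the entry and deciding what to reapply.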
[1] I know there finally is an RFC for CAP, but I still consider it a fiasco because it took way too long, so long in fact that most projects interested in it moved on long ago to other protocols, or to other jobs in my case.
[2] The same holds for deletes.