Stuff Yaron Finds Interesting

Technology, Politics, Food, Finance, etc.

Optimistic Concurrency – A False Panacea

As soon as an Internet-scale service expects to allow clients to both read and write data, it's a sure bet that optimistic concurrency will come up. After all, how else do you solve the lost update problem without drowning in a sea of performance-crippling locks? Better yet, implementing optimistic concurrency in a service is pretty trivial. You just need some kind of change indication system (dates, e-tags, updategrams, etc.) and a half-decent transaction system (available off the shelf) and you're pretty much done. Unfortunately, all optimistic concurrency does is move the 'lost update' lump under the carpet and make it the client's problem. Moving the lump isn't bad, but it does mean that before declaring victory you absolutely must be sure that your clients have a workable solution for the merge issue.
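The server-side half really is small. Here is a minimal Python sketch of a version-checked write, assuming an in-memory store. `VersionedStore`, `ConflictError`, and the integer version counter are hypothetical names invented for illustration; the counter stands in for whatever change indicator (date, e-tag) a real service uses, and the lock stands in for the transaction system.

```python
import threading


class ConflictError(Exception):
    """Raised when the record changed since the client last read it."""


class VersionedStore:
    """Hypothetical in-memory store; the integer version plays the role of an e-tag."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the transaction system
        self._records = {}  # key -> (version, value)

    def read(self, key):
        with self._lock:
            return self._records.get(key, (0, None))

    def write(self, key, expected_version, value):
        # The lock covers only the check-and-set, never the client's think time.
        with self._lock:
            current_version, _ = self._records.get(key, (0, None))
            if current_version != expected_version:
                raise ConflictError(
                    f"{key}: expected v{expected_version}, found v{current_version}"
                )
            self._records[key] = (current_version + 1, value)
            return current_version + 1


store = VersionedStore()
store.write("customer-42", 0, {"phone": "555-0100"})

# Two clients read the same version of the record...
v_a, rec_a = store.read("customer-42")
v_b, rec_b = store.read("customer-42")

# ...the first write succeeds and the second is refused.
store.write("customer-42", v_a, {**rec_a, "phone": "555-0199"})
try:
    store.write("customer-42", v_b, {**rec_b, "fax": "555-0111"})
    second_write_accepted = True
except ConflictError:
    second_write_accepted = False  # update not lost, but the client now owns the problem
```

Note what the sketch leaves out: everything that happens after `ConflictError` is raised. That part is the client's job.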

[Ed. Note: I updated the article in response to a number of comments on the web.]

A classic optimistic concurrency scenario is a CRM system. Several sales associates all work with the same customer, and they all contribute to maintaining a common customer record in the CRM system. However, sales associates are often offline, and they make changes while offline that they expect to sync when they come back online. In most cases only one associate made a change during any given period, so the updates usually happen without conflict. But what happens when there is a conflict? Do you tell the sales associate, "Sorry, I've deleted all your changes because there's a conflict"? Or, better yet, do you show them some kind of generic merge UI that shows the current values, their proposed changes, and suggestions for how to reconcile the two? Anyone who has spent more than 5 seconds with real users understands just how much hatred the first scenario will cause and how much confusion the second will create.
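To see what such a merge UI would have to work from, here is a rough Python sketch of a field-level three-way merge. It assumes the client kept the copy of the record it originally synced (`base`) alongside its offline edits (`mine`) and the current server copy (`server`). The function name, the dict-based record shape, and the choice to treat a missing field as `None` are all assumptions made for illustration, not a description of any particular CRM.

```python
def three_way_merge(base, server, mine):
    """Merge field by field; return (merged, conflicts).

    A field that only one side changed merges automatically. A field that
    both sides changed to different values is reported as a conflict, and
    the server value is kept until someone decides.
    """
    merged, conflicts = {}, {}
    for field in sorted(set(base) | set(server) | set(mine)):
        b, s, m = base.get(field), server.get(field), mine.get(field)
        if m == b:
            value = s  # I did not touch this field, so take the server's value
        elif s == b or s == m:
            value = m  # only I changed it, or we both made the same change
        else:
            conflicts[field] = {"server": s, "mine": m}
            value = s
        if value is not None:  # assumption: None means the field is absent
            merged[field] = value
    return merged, conflicts


base = {"phone": "555-0100", "contact": "Pat"}
server = {"phone": "555-0199", "contact": "Pat"}
mine = {"phone": "555-0142", "contact": "Pat", "notes": "Prefers email"}

merged, conflicts = three_way_merge(base, server, mine)
```

Here the new `notes` field merges cleanly, but `phone` lands in `conflicts` with two candidate values. The code cannot say which number is right; that question still ends up in front of the sales associate.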

Optimistic concurrency fundamentally means that when the optimism isn't justified the client has to clean up the mess, which usually means some kind of merge. And let's face it, humans don't handle merges very well at all. So in practice, when there is a conflict, things tend to end up on the floor. In fact, most 'consumer oriented' systems I know don't use optimistic concurrency at all for this very reason. They tend instead to use 'last update wins'. This turns out to work pretty well in practice because most users instinctively know they don't want to deal with merging and will quickly break up a work space so that each user has their own section. I see this on wikis all the time, where different people will take ownership of a particular section of the page. People also tend to be pretty good at talking to each other, so pervasive changes are handled via a mutual agreement that everyone else will stay out until the person making the changes says OK. Of course, all these mechanisms are really just informal locks. And that makes sense. Humans understand exclusivity and locks make intuitive sense, so if the system (for good performance and other reasons) won't provide locking, the users will figure out their own way to implement locks, even if informally.

Still, there are certainly examples of systems where optimistic concurrency makes a ton of sense. For example, Libor's blog gives an example of a financial trading system where an order is sent in and optimistically recorded, but if the market price changes before execution the order is rejected. In this case a conflict isn't resolved via a merge but rather via a rejection. Excellent. This solves the merge problem for this scenario.
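The reject-instead-of-merge pattern can be sketched in a few lines of Python. This is only an illustration of the idea, not how the system Libor describes is actually built: the `Order` class, the `execute` function, the exact-match price check, and the use of integer cents are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Order:
    symbol: str
    quantity: int
    quoted_price_cents: int  # price the client saw when placing the order
    status: str = "recorded"  # accepted optimistically, not yet executed


def execute(order, market_price_cents):
    """Execute only if the price the client saw is still the market price."""
    if market_price_cents != order.quoted_price_cents:
        order.status = "rejected"  # no merge: the client simply resubmits
    else:
        order.status = "executed"
    return order.status


steady = Order("ACME", 100, quoted_price_cents=1250)
moved = Order("ACME", 100, quoted_price_cents=1250)

steady_result = execute(steady, market_price_cents=1250)
moved_result = execute(moved, market_price_cents=1275)
```

The conflict path has exactly one outcome, and the client already knows what to do with it: look at the new price and decide whether to place a new order.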

In my own experience I've seen order processing systems where, in the vast majority of cases, the optimism is justified and orders go through without a problem. But in some cases, often several days after the order was submitted, one or more things will go wrong and the order will have to be rejected. Making the multi-step order process transparent enough to allow a customer to understand what went wrong and potentially try to fix it is usually too expensive, so instead when an error occurs the system will abort the order and raise a red flag. The system admins then have to go dig through the various databases to figure out what the heck happened and how it can be fixed. This isn't pleasant, but because rejections are relatively rare it is more cost effective to just handle those rejections manually than to redesign the entire system.

In both of the previous cases optimistic concurrency worked just fine, but only because there was a well understood way of dealing with cases where the optimism wasn't justified. Which brings us to the point of the article. Optimistic concurrency is good stuff, but it isn't magic. Before allowing some server-side developer to wave off scalability issues by sprinkling magic 'optimistic concurrency' pixie dust, make sure you understand exactly how clients will deal with conflicts.
