In a previous article I argued that I needed some kind of journaling/backup for my Windows Azure Tables in order to handle my own screw ups. In this article I re-examine the value of versioning for recovering from self-inflicted data corruption, discuss backups as a possible substitute for versioning, look at what versioning might look like if added as a native feature of Windows Azure Table Store, and finish up by proposing a design that would let me implement versioning on top of Windows Azure Table Store.
This article is part of a series. Click here to see a summary and the complete list of articles in the series.
Contents:
- What about backups?
- Imagining a versioned Windows Azure Table Store
- In place versioning on top of the table store
The value of versioning in recovering from application errors has already been covered here and here. But to summarize: when it hits the fan, versioning can help one figure out if the original damage has been compounded by subsequent changes. Furthermore, by recording the outcome of every command, versioning lets one examine what happened in the past with less baggage than a command journal, avoiding the 'replay' issues that plague command journals.
Versioning is also useful as a last ditch 'go back in time' mechanism: if the damage is just too great to repair, at least the system can provide the option of turning back the clock to some better state. One shouldn't overstate the utility of this feature, though. In non-trivial cases there will be a variety of side effects of 'turning back the clock' that will be hard to control, and the clock can't go too far back before issues with schema changes, functionality changes, etc. come into play. Many of the same issues with replaying command journals apply to using versioning as an emergency escape hatch to the past.
So while versioning is useful, I suspect that command journals and tombstones in the average case probably provide the most bang for the buck. My real hope is that systems like Windows Azure Table Store will offer versioning as a feature so the cost and complexity of taking advantage of versioning will go way down.
As discussed below, implementing versioning on top of the Windows Azure Table Store, while not brain surgery, isn't trivial either. A much simpler technique would be to regularly back up tables. This can be done in the background without interfering with normal operations, so it's less risky.
Backups work using snapshots. At regular intervals the table is read in (typically with a filter that ignores values that haven't changed since the last snapshot) and a snapshot is created. Unfortunately snapshots miss things. If a value is changed multiple times between snapshots then the intermediate values will not be recorded.
This leads to situations where, if a buggy command is given between snapshots and the buggy value is then overwritten just before the snapshot, I have no way of knowing what the original value was unless I can replay the command (which is tricky and assumes that the value produced by the bug is predictable). This also makes it more or less impossible to handle the put syndrome, since I can't see whether the same value was written twice or a new value was written.
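The failure mode above is easy to see in a toy model. The following is a minimal sketch, assuming an in-memory dictionary as a stand-in for a table and a change log of per-row modification times (both illustrative, not the real table store API): a buggy write that is overwritten before the snapshot fires never makes it into any snapshot.

```python
def take_snapshot(table, last_snapshot_time, changed_at):
    """Delta snapshot: copy only rows changed since the previous snapshot."""
    return {k: v for k, v in table.items() if changed_at[k] > last_snapshot_time}

table = {}
changed_at = {}

# t=1: a buggy command writes a corrupted value
table["row1"] = "corrupted-value"
changed_at["row1"] = 1

# t=2: the corrupted value is overwritten before any snapshot runs
table["row1"] = "newer-value"
changed_at["row1"] = 2

# t=3: the snapshot fires and captures only the latest value;
# the corrupted intermediate value was never recorded anywhere
snapshot = take_snapshot(table, last_snapshot_time=0, changed_at=changed_at)
print(snapshot)  # {'row1': 'newer-value'}
```

A per-write version history would have kept both values; the delta snapshot, by construction, cannot.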
For similar reasons backups are not useful when dealing with the etag syndrome since it’s at best just luck if the snapshot happens to have captured the correct system state at the time the command was executed.
Backups also don't deal at all well with deletes. Unless one copies the entire table during every snapshot (a rather expensive proposition), any deleted records will be missed. So if one is going to implement delta-based snapshots (e.g. copying only things changed since the last snapshot) then one also needs to implement tombstones and back up the tombstone table as well.
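Here is a sketch of why the tombstone table is needed, again using an in-memory dictionary as a stand-in (the helper name is illustrative): a delta snapshot that copies only changed rows has no way to notice a row that simply vanished, so deletes must leave a record behind.

```python
def delete_with_tombstone(table, tombstones, key, now):
    """Delete a row, but record the deletion so a later delta snapshot sees it."""
    table.pop(key, None)
    tombstones[key] = now  # the tombstone is what makes the delete visible

table = {"a": 1, "b": 2}
tombstones = {}

delete_with_tombstone(table, tombstones, "a", now=5)

# A delta snapshot must back up both the surviving rows *and* the tombstone
# table; otherwise a restore can't distinguish "deleted" from "unchanged".
delta = {"rows": dict(table), "tombstones": dict(tombstones)}
print(delta)
```

Without the tombstone entry, restoring from deltas would quietly resurrect 'a' from an older snapshot.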
If a transaction is in progress during a snapshot then only the parts of the transaction that occurred before the snapshot will be captured; those coming after will be missed until the next snapshot. So restoring from the most recent snapshot means restoring the system to an inconsistent state. While inconsistency happens anyway in loosely coupled systems, it's one thing for a user to issue a command that fails in a bad way, something the user is generally told about. It's another thing for the system at some point to just 'shift state' to some previous, inconsistent, point and leave users to pick up the mess.
Still, for all of that, at least backups offer some hope of turning back the clock in the case of hopeless data corruption so perhaps they do have some value.
Versioning tends to come in two flavors, linear and non-linear. My belief is that Windows Azure Table Store only needs linear versioning. My reasoning is that, looked at through the lens of the CAP theorem, Windows Azure Table Store focuses on consistency and availability. If one is willing to give up partition tolerance (as the table store does) then most of the use cases for non-linear versioning go away. In a consistent system it is possible to enforce an order on writes, even without locking, thanks to the optimistic concurrency that the table store supports.
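The optimistic concurrency primitive that makes a linear order enforceable can be sketched as a compare-and-swap on an etag. This is a minimal in-memory illustration (the class and function names are mine, not the Azure SDK's): a write succeeds only if the caller presents the etag of the version it read, so concurrent writers are serialized into one history without locks.

```python
import uuid

class Row:
    """A stored row with an opaque etag that changes on every write."""
    def __init__(self, value):
        self.value = value
        self.etag = str(uuid.uuid4())

def conditional_update(row, new_value, if_match):
    """Write only if the caller's etag matches the stored one (If-Match semantics)."""
    if row.etag != if_match:
        raise RuntimeError("412 Precondition Failed: etag mismatch")
    row.value = new_value
    row.etag = str(uuid.uuid4())  # new version, new etag

row = Row("v1")
stale = row.etag

conditional_update(row, "v2", if_match=stale)      # first writer wins
try:
    conditional_update(row, "v3", if_match=stale)  # second writer, stale etag, loses
except RuntimeError as e:
    print(e)
```

The losing writer has to re-read and retry, so every row's history is forced into a single line of versions, which is exactly why linear versioning suffices here.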
So if the table store supported linear versioning then the experience would be that every write would cause a new version of a particular row to come into existence; I'll call the most recent version the tip version.
All existing store commands would work exactly as they do now but would only apply to the tip version and in the case of POST and PUT would create new tip versions. The delete command would create a tombstone entry stating that the row was deleted. The tombstone entry would be invisible to all the existing Windows Azure Table commands.
I don't think that check-in/check-out semantics are appropriate to a highly distributed system like the table store, so the commands available to a versioning aware client would actually be quite limited. I would add a way to specify a version in the URL of a row (say, with a query parameter), a way to include old versions in the output of a table query, and finally the ability to destroy (as in delete without a trace) a row. I don't know that much more than that is really needed in terms of interacting with older versions.
[Disclaimer: The following is more of a mental exercise. I haven’t had time to actually mock this up and make sure all the details are right.]
Right now Windows Azure Table Store doesn't offer versioning, so I've given some thought to how I might implement versioning myself on top of the table store. The services I work on that use the table store tend to be high read and low write. So I want an approach to adding versioning to the table store that places the cost of versioning more on writes than reads. I also want an approach that is more or less guaranteed to produce consistent output. That is, I don't want to end up in a situation where the state of my production tables and my version history are out of whack. The whole point of introducing versioning is that it's correct and complete so I can reason about certain things that would otherwise be hard to do. If I can't get consistency in my version store I might as well use backups, which at least are simpler to implement. Thankfully the table store provides the features to meet all of my requirements, including consistency.
The approach I would use is in place versioning. That is, the most current version of a row (referred to as the tip) and its previous versions all live in the same partition in the same table. This is the opposite of the approach I used with tombstones, because in the case of tombstones consistency wasn't a problem.
In the in place versioning approach the tip version of any row will have whatever partition key/row key it is supposed to have plus the prefix ”tip” on the row key. This means that anytime I want to interact with the tip version of a row I just generate the expected partition key/row key and add in ”tip” as a prefix on the row key. This makes reads fast.
Every row I'm versioning will contain a version ID, a monotonically increasing integer. The first time I create the ”tip” version of a row (i.e. when the row is first created) I will give it the version number 0. When updating a row I will copy the old value to a row whose row key carries the prefix ”old”, then update the tip version and increment its version number. The key to consistency with an in place versioning approach is that it's possible to both create the old version and update the tip atomically. The table store's entity group transaction mechanism is guaranteed to be atomic, and since it operates within a single partition it can be used to solve exactly this problem.
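The copy-then-increment dance above can be sketched as follows. This is an in-memory stand-in for a single partition (the class name and key scheme details are illustrative, not the real table store API); the two operations inside `update` stand in for one entity group transaction, meaning either both happen or neither does.

```python
class VersionedPartition:
    """In-memory model of one partition holding 'tip' and 'old' rows together."""
    def __init__(self):
        self.rows = {}  # row key -> {"value": ..., "version": int}

    def create(self, row_key, value):
        # First write of a row: the tip gets version 0.
        self.rows["tip" + row_key] = {"value": value, "version": 0}

    def update(self, row_key, new_value):
        tip = self.rows["tip" + row_key]
        # The next two statements model a single entity group transaction:
        # archive the current tip under an "old"-prefixed key, then bump the tip.
        old_key = "old" + row_key + "-" + str(tip["version"])
        self.rows[old_key] = dict(tip)  # copy of the old version
        tip["value"] = new_value
        tip["version"] += 1

p = VersionedPartition()
p.create("user42", "alice")
p.update("user42", "alice-updated")
print(p.rows["tipuser42"])    # {'value': 'alice-updated', 'version': 1}
print(p.rows["olduser42-0"])  # {'value': 'alice', 'version': 0}
```

Because the tip and its history share a partition key, the real entity group transaction can cover both writes; had the history lived in another table, no such atomicity guarantee would be available.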
To version enable a table store table I need either to build a proxy or a library. My guess is that I would use a library to save the processing and network time of a proxy, but what is really nice about a proxy is that I can use the proxy as a lock down mechanism. I can make sure nobody but the proxy has the key to the table, so if someone doesn't go through the proxy then they don't get access to the data. That alone will prevent tons of bugs. By having a single proxy I can also more easily control issues like versioning of the proxy code, which deals with a whole other set of bugs. But proxies do impose both processing and latency costs, so I have to weigh those in deciding between proxies and libraries.
The following goes through the standard methods in their non-version aware form and explains how their behavior would change if one was using a version aware library/proxy to interact with the table store using an in place versioning approach.
- Query: If the query contains a filter that specifies a rowkey then prefix the rowkey value(s) with ”tip”. In all cases add the filter argument (if it doesn't already exist) of ”rowkey gt 'old'”. This will filter out everything but tip versions of rows, since 'tip' sorts after 'old'.
- DELETE: A GET is needed to retrieve the current 'tip' version. If none exists then the request should fail since there is nothing to delete. If the tip version does exist then create an entity group transaction that includes creating a new row to act as the tombstone, with a column 'tombstone' set to true, as well as a delete command for the current tip that includes the etag from the GET in an if-match header.
- PUT: First retrieve the existing 'tip' version (using an etag if one was provided in an if-match or equivalent header). If there isn't one then the resource doesn't exist or has been deleted and so the request should fail. If the 'tip' version exists then an entity group transaction is needed to update the tip version as previously described, but using if-match with the etag retrieved from the original GET.
- MERGE: The logic is the same as PUT for all intents and purposes. It's just that values not specified in the MERGE request have to be retrieved from the soon to be replaced 'tip' version in order to create the 'old' prefixed copy.
- POST: Check to see if a tip version exists. If so, then fail. If not, then check to see if there is a tombstone. If so then issue the POST request with the version number set to an increment of the number in the tombstone. If there is no tombstone then the version number is 0 and the row key will have 'tip' added as a prefix.
- Entity group transaction: In essence, just glue together the instructions for the individual methods mentioned above and apply them to the contents of the entity group transaction. Entity group transactions even support if-match headers.
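The query rewrite in the first bullet is the piece a version-aware library would apply to every read. Below is a rough sketch of that step using simplified string handling (the function name and the filter-string manipulation are illustrative; the real Azure filter grammar is richer): prefix rowkey equality values with ”tip” and append the ”rowkey gt 'old'” guard so only tip rows come back.

```python
def rewrite_filter(filter_expr):
    """Rewrite a table query filter so it only matches 'tip' versions of rows."""
    # Prefix any rowkey equality value with "tip" so it hits the tip row key.
    if filter_expr and "rowkey eq '" in filter_expr:
        filter_expr = filter_expr.replace("rowkey eq '", "rowkey eq 'tip")
    # Always add the guard: 'tip...' sorts after 'old...', so this drops old versions.
    guard = "rowkey gt 'old'"
    if not filter_expr:
        return guard
    if guard in filter_expr:
        return filter_expr
    return "(" + filter_expr + ") and " + guard

print(rewrite_filter("rowkey eq 'user42'"))  # (rowkey eq 'tipuser42') and rowkey gt 'old'
print(rewrite_filter(None))                  # rowkey gt 'old'
```

Since reads dominate in the services described here, this rewrite is the only per-read cost of the scheme; all the heavier copy-and-increment work lands on the write path, as intended.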