Solving my multi-master synch problem – Well Duh, CouchDB
Friday August 23rd 2013, 9:49 am
Filed under: SOA/Web/Etc.
I really need to synch both arbitrary structured data and blobs in a multi-master, peer-to-peer environment. Oh, and I really don’t want to write the code to make this work, and it has to work on a variety of mobile, desktop and cloud environments. And yes, I want a pony with that. Thankfully there are ponies for everyone! The solution? CouchDB, duh!
1 Apache CouchDB Replication Protocol
Overview: This is the protocol used internally by CouchDB. It is an HTTP-based REST protocol that does not assume there is any particular master, just peers. Each synch, however, is directional, with a source and a target, so a bi-directional synch requires two runs, one in each direction. The target remembers the last synch ID (the source’s update sequence number) it has seen and uses it to request the list of changes from the source. That list identifies specific revisions of individual documents, which are then retrieved and used to update the target. Note that neither the source nor the target needs to run the synch itself. CouchDB is a DB with a set of REST APIs, so anyone with read access to the source and read/write access to the target can run the synch algorithm.
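To make the moving parts concrete, here is a minimal in-memory sketch of one directional pass. The `Peer` class and `replicate` function are hypothetical stand-ins, not CouchDB’s API; the comments point at the real HTTP endpoints (`_changes`, `_revs_diff`, `_bulk_docs` with `new_edits=false`) that each step loosely corresponds to. It also simplifies by keeping the checkpoint only on the target, whereas CouchDB records it on both sides.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical in-memory stand-in for one CouchDB database."""
    name: str
    docs: dict = field(default_factory=dict)         # doc_id -> {rev_id: body}
    changes: list = field(default_factory=list)      # [(seq, doc_id, rev_id), ...]
    checkpoints: dict = field(default_factory=dict)  # replication id -> last seq seen

    def put(self, doc_id, rev_id, body):
        """Store a revision and append it to this peer's changes feed."""
        self.docs.setdefault(doc_id, {})[rev_id] = body
        self.changes.append((len(self.changes) + 1, doc_id, rev_id))

    def changes_since(self, since):
        # Roughly GET /db/_changes?since=<seq>
        return [c for c in self.changes if c[0] > since]

    def revs_diff(self, wanted):
        # Roughly POST /db/_revs_diff: which of these revisions are missing here?
        return {doc_id: [r for r in revs if r not in self.docs.get(doc_id, {})]
                for doc_id, revs in wanted.items()}


def replicate(source, target):
    """One directional pass from source to target; returns revisions copied."""
    rep_id = f"{source.name}->{target.name}"
    since = target.checkpoints.get(rep_id, 0)
    feed = source.changes_since(since)
    wanted = {}
    for _, doc_id, rev_id in feed:
        wanted.setdefault(doc_id, []).append(rev_id)
    copied = 0
    for doc_id, revs in target.revs_diff(wanted).items():
        for rev_id in revs:
            # Roughly POST /db/_bulk_docs with new_edits=false (keep the source's rev id)
            target.put(doc_id, rev_id, source.docs[doc_id][rev_id])
            copied += 1
    if feed:
        target.checkpoints[rep_id] = feed[-1][0]  # resume from here next time
    return copied


a, b = Peer("a"), Peer("b")
a.put("doc1", "1-aaa", {"x": 1})
b.put("doc2", "1-bbb", {"y": 2})
print(replicate(a, b), replicate(b, a))  # 1 1
```

Running `replicate(a, b)` followed by `replicate(b, a)` is the two-run bi-directional synch, and because `replicate` only reads from the source and writes to the target, any third party could drive it.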
Standard License: Apache 2.0.
Authentication Mechanisms: CouchDB seems to support both name/password (with cookie-based sessions) and OAuth, but honestly any HTTP-friendly approach should work.
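As a rough illustration of the name/password route, this sketch only builds the requests; nothing is sent. It assumes a server at `http://localhost:5984` and shows two CouchDB mechanisms: HTTP Basic auth on every request, and a `POST` to `/_session` whose response sets an `AuthSession` cookie. OAuth is not covered.

```python
import base64
import json
import urllib.request

COUCH_URL = "http://localhost:5984"  # assumed server location


def basic_auth_request(path, user, password):
    """GET request carrying HTTP Basic credentials on every call."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(COUCH_URL + path)
    req.add_header("Authorization", f"Basic {token}")
    return req


def session_request(user, password):
    """POST /_session; on success CouchDB answers with an AuthSession cookie
    that later requests can send instead of the password."""
    body = json.dumps({"name": user, "password": password}).encode("utf-8")
    return urllib.request.Request(
        COUCH_URL + "/_session",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Building the requests sends nothing; pass them to urllib.request.urlopen() to do that.
req = basic_auth_request("/_all_dbs", "admin", "secret")
print(req.get_header("Authorization"))  # Basic YWRtaW46c2VjcmV0
```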
Authorization Mechanisms: I didn’t see anything about this, but I didn’t look very hard. My guess is that if anything is supported, it’s based on some kind of filtering.
Structured Data Support Mechanisms: The internal model is a set of documents, each defined in JSON. Each document contains an ID and is identified in the URL path by that ID. Each change to a document creates a new revision with its own revision ID. A revision is a complete copy of the document rather than a diff, so asking for a specific revision ID returns the whole document as it stood at that revision (until compaction discards old revision bodies).
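The snapshot-per-revision model can be sketched as follows. `Doc` and `next_rev` are hypothetical; the `N-<hex>` revision ID shape mirrors CouchDB’s, but the hash here is just an MD5 over the previous revision and the JSON body, not CouchDB’s actual algorithm.

```python
import hashlib
import json


def next_rev(prev_rev, body):
    """Hypothetical: produce an id shaped like CouchDB's "N-<hex>".
    CouchDB's real hash input differs; only the format is mimicked."""
    n = int(prev_rev.split("-", 1)[0]) + 1 if prev_rev else 1
    payload = (prev_rev or "") + json.dumps(body, sort_keys=True)
    return f"{n}-{hashlib.md5(payload.encode('utf-8')).hexdigest()}"


class Doc:
    """One document whose every revision is a full snapshot, not a diff."""

    def __init__(self, doc_id):
        self.doc_id = doc_id
        self.revisions = {}  # rev_id -> complete body at that revision
        self.current = None

    def update(self, body):
        rev = next_rev(self.current, body)
        self.revisions[rev] = dict(body)  # copy the whole document
        self.current = rev
        return rev

    def get(self, rev=None):
        # Like GET /db/doc?rev=<rev_id>: the entire document at that revision
        rev = rev or self.current
        return {"_id": self.doc_id, "_rev": rev, **self.revisions[rev]}


note = Doc("note")
r1 = note.update({"title": "Groceries", "body": "milk"})
r2 = note.update({"title": "Groceries", "body": "milk, eggs"})
print(note.get(r1)["body"])  # milk
```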
BLOB Support Mechanisms: A document can have one or more attachments, each identified by its own name. There doesn’t seem to be any diff support or other smart mechanism for handling changes to blobs; a changed attachment is transferred in full.
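Here is a sketch of what attachments look like on the wire, assuming CouchDB’s inline `_attachments` format (base64 `data` plus `content_type`). `blob_digest` approximates the `md5-<base64>` digest CouchDB reports in attachment stubs, and `needs_transfer` is a hypothetical helper showing the practical consequence of having no diff support: any change, however small, means re-sending the whole blob.

```python
import base64
import hashlib


def with_inline_attachment(doc, name, content_type, data):
    """Return a copy of doc carrying data as an inline attachment, in the
    _attachments shape CouchDB accepts (base64 "data" plus "content_type")."""
    attachments = dict(doc.get("_attachments", {}))
    attachments[name] = {
        "content_type": content_type,
        "data": base64.b64encode(data).decode("ascii"),
    }
    return {**doc, "_attachments": attachments}


def blob_digest(data):
    # Approximates the "md5-<base64>" digest shown in attachment stubs
    # (assumption: computed over the raw, uncompressed bytes).
    return "md5-" + base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def needs_transfer(old, new):
    # No delta encoding: any change at all means shipping the whole blob again.
    return blob_digest(old) != blob_digest(new)


doc = with_inline_attachment({"_id": "photo-1"}, "notes.txt", "text/plain", b"hello")
print(doc["_attachments"]["notes.txt"]["data"])  # aGVsbG8=
```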
Bi-Directional Synch Support Mechanisms: The protocol is unidirectional, so if you want multi-master you must run a synch in each direction. But CouchDB makes no assumptions about there being a single master, and the consequences show up in the conflict resolution support below.
Multi-Party Synch Support Mechanisms: You would be expected to set up a bunch of point-to-point synchronizations, one in each direction between each pair of peers. See below for conflict resolution.
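For the multi-party case, a full mesh needs one directed replication per ordered pair of peers, i.e. n(n-1) of them. This sketch generates documents shaped like entries in CouchDB’s `_replicator` database (`source`, `target`, `continuous`); the peer names, URLs, and `_id` scheme are hypothetical.

```python
from itertools import permutations


def mesh_replications(peers, continuous=True):
    """One directed replication per ordered pair of peers, n * (n - 1) in all.
    Each dict is shaped like a document for CouchDB's _replicator database."""
    return [
        {
            "_id": f"{src}-to-{dst}",  # hypothetical naming scheme
            "source": peers[src],
            "target": peers[dst],
            "continuous": continuous,
        }
        for src, dst in permutations(peers, 2)
    ]


peers = {  # hypothetical peer names and database URLs
    "phone": "http://phone.local:5984/notes",
    "laptop": "http://laptop.local:5984/notes",
    "cloud": "https://cloud.example.com/notes",
}
for rep in mesh_replications(peers):
    print(rep["_id"])
```

Three peers need six replications and ten peers need ninety. A full mesh is not strictly required, though: replicated revisions show up in the receiving database’s own changes feed, so a ring or star topology also converges, just with more hops.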
Conflict Resolution Model Support: If a conflict is detected between the state of a document in the target and in the source, both versions are kept: one is declared the winner and the other is retained as a conflicting revision. The winner is picked by a deterministic but semantically arbitrary rule (so every peer picks the same one), and the document is marked as conflicted so that dedicated software can come along later and try to fix things in a more reasonable way.
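Because the winner-picking rule is deterministic, every peer can agree on it without talking to the others. Here is a sketch of the rule as CouchDB’s documentation describes it: a non-deleted leaf beats a deleted one, then the higher revision number wins, then the higher revision hash by string comparison. `pick_winner` and the `(rev_id, deleted)` tuple format are assumptions for illustration.

```python
def pick_winner(leaves):
    """leaves: list of (rev_id, deleted) for a document's conflicting branches.
    Returns (winning rev_id, non-deleted losing rev_ids)."""
    def rank(leaf):
        rev_id, deleted = leaf
        pos, rev_hash = rev_id.split("-", 1)
        # Non-deleted first, then longer history, then higher hash string.
        return (not deleted, int(pos), rev_hash)

    winner = max(leaves, key=rank)[0]
    # Non-deleted losers are what CouchDB lists in a document's _conflicts.
    conflicts = [rev for rev, deleted in leaves if rev != winner and not deleted]
    return winner, conflicts


# Two peers edited revision 1 independently; a third branch ends in a deletion.
print(pick_winner([("2-7a1f", False), ("2-c93e", False), ("3-05bd", True)]))
# ('2-c93e', ['2-7a1f'])
```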
2 The dead & the interesting but not quite right
The following projects are in the right area but no longer appear to be alive: OpenSync and FeedSync.
2.1 Sync systems
BitTorrent Sync: A peer-to-peer synch protocol, although it seems mostly focused on large files. The website has some information, but the protocol doesn’t appear terribly open.
Gnome Conduit: This is really more code than protocol. It’s written in Python and has a fairly complex set of capabilities, including multi-master, extendable data types, conflict resolution, etc. But I’m looking for something that is a protocol first and is widely supported.
SyncML/OMA Data Synchronization V2.0: I didn’t look too deeply into this protocol, both because it appears to be client/server and because I don’t see any kind of open source community around it.
Wave: It still lives on as an Apache Incubator project. But honestly, I think it tries to do too many things, and it doesn’t seem to be making progress toward building the kind of software I can grab and run with in all the environments I need to support.
2.2 Distributed Databases
MongoDB: Uses master/slave replication, so it’s not appropriate.
Riak: This is really Dynamo, which is all about distributing a single ‘logical’ database into shards with redundant copies across many nodes. But I’m really looking for multi-master synch, where all the nodes are independent but have data they need to synch with each other.
Redis: Uses master/slave replication, so it’s not appropriate.
Cassandra: See the comments on Riak.