Below I look through options for adding ad-hoc mesh support to Thali. I evaluate Wi-Fi Direct, Bluetooth, Serval, OpenGarden and Commotion. It’s clear to me that at least the open source mesh technologies are not ready for prime time yet. They need more time in the oven. But both Commotion and Serval (which are working together) are exciting, and I can easily imagine them reaching that point in a few years. But not today.
This leaves Wi-Fi Direct and Bluetooth. Both are pretty seriously flawed for our mainline scenarios which involve opportunistic synching. They are really only useful when dealing with a small group of peers on a regular basis over a long period of time. We do have those scenarios but they aren’t as high priority at the moment. So for now I’m just not going to worry about it.
[Note: Originally published on 9/4 but updated on 9/5 thanks to Ben Mendis who kicked me in the rear to take a better look at Commotion. I updated again on 9/10 thanks to comments by Michael Rogers around limitations of Wi-Fi and options around Bluetooth. On 9/15 I updated the Bluetooth section based on more of the excellent conversation with Michael Rogers and added a section on AllJoyn]

1 Defining the problem

Thali's base communication mechanism is Tor hidden services. This enables Thali devices to reach each other, regardless of what NATs or firewalls are in their way, in a manner that is resistant to traffic analysis. But what happens when one isn’t on the Internet at all? We still want Thali devices to be able to communicate, so a goal has been to support some kind of ad-hoc communication mechanism. That is, if two Thali devices are close enough to reach each other directly via a technology like Wi-Fi or Bluetooth they should be able to communicate securely and privately. Technologies like Wi-Fi Direct exist to provide this kind of point to point communication.
Ideally, however, we would go a step further and use a technology that supports ad-hoc mesh networking. In an ad-hoc mesh system a large number of devices that are reasonably close to each other can automatically form a routing network. The idea is that Yael’s phone can see Muhammad’s laptop but not Yosi’s tablet, which is too far away for Yael’s phone to reach via a technology like Wi-Fi or Bluetooth. But Muhammad’s laptop can reach Yosi’s tablet because it’s in range. In a mesh system Yael, Muhammad and Yosi would form a mesh where Yael’s phone would be able to send a message to Yosi’s tablet via Muhammad’s laptop.
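The relay idea can be made concrete with a toy model. This is a minimal sketch, not how any of the contenders below actually route: devices are nodes, "in radio range" is an edge, and a breadth-first search finds the shortest chain of relays. The class name, method names and device labels are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of an ad-hoc mesh: devices are nodes, radio range is an edge. */
final class MeshSketch {
    private final Map<String, Set<String>> inRange = new HashMap<>();

    /** Records that two devices can reach each other directly. */
    void link(String a, String b) {
        inRange.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        inRange.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    /** Breadth-first search for the shortest relay path; empty list if unreachable. */
    List<String> route(String from, String to) {
        Map<String, String> previous = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        previous.put(from, from);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk the "previous" links back to the start, then reverse.
                List<String> path = new ArrayList<>();
                for (String hop = to; !hop.equals(from); hop = previous.get(hop)) {
                    path.add(hop);
                }
                path.add(from);
                Collections.reverse(path);
                return path;
            }
            for (String neighbor : inRange.getOrDefault(current, Collections.emptySet())) {
                // putIfAbsent returns null only the first time we see a device.
                if (previous.putIfAbsent(neighbor, current) == null) {
                    queue.add(neighbor);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

Linking Yael's phone to Muhammad's laptop and Muhammad's laptop to Yosi's tablet, then asking for a route from the phone to the tablet, yields the three-hop path through the laptop even though the phone and tablet were never linked directly.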

2 Evaluating contenders

In looking at possible technology we are looking for the following qualities:
Wireless Connectivity What wireless technologies does the contender support? Minimally we have to have Wi-Fi. But Bluetooth and NFC are nice-to-haves.
Requires Ad Hoc Wi-Fi Because of wide ranging legacy support it’s very popular to use ad-hoc Wi-Fi as a communication medium. But ad-hoc Wi-Fi is slow (capped at 11 Mbps), doesn’t allow the device to also be in infrastructure mode (so it loses its connection to the Internet) and on Android requires rooting the phone. These are all show stoppers for us. We really want to see Wi-Fi Direct, which suffers from none of these problems. We understand why ad-hoc Wi-Fi is a good choice for many people (especially in poorer nations with older devices) but we are targeting more modern devices so it isn’t a good choice for Thali.
Open Source/Open Standard Thali is an open source/open standard project so we need an ad-hoc mesh technology that is as well.
Supports IP We really don’t want to re-invent the Internet so we want an ad-hoc mesh technology that lets us route IP packets. That means it needs to be able to assign local IP addresses, listen for communications, etc. Ideally it would show up as an IP enabled network adapter.
Multi-platform Thali wants to run everywhere. Right now that means Android, Linux, Mac and Windows. But we also want to add iPhone and Windows Phone. So we want an ad-hoc mesh technology that either can or has a clear story for how it will reach all of those platforms.
Plays well with Java All our current software is written in Java so we really want a technology that will play well with our existing code.
Supports direct ad hoc connections Enables two devices to talk directly to each other.
Supports discovery Has a mechanism to enable devices within range of each other to discover each other.
Supports ad hoc meshes Is able to form a mesh for relay routing.

3 The contenders

3.1 Wi-Fi Ad-Hoc Mode

This isn’t really a contender but it’s going to cause enough confusion with Wi-Fi Direct that it’s worth calling out the differences. Wi-Fi Ad-Hoc Mode is part of the original Wi-Fi standards which contained two basic modes - infrastructure and ad-hoc. Most folks are used to infrastructure mode where there is a single access point that everybody connects to and all communication goes through that single access point. In ad-hoc mode it’s possible to set up peer to peer connections but a lot of key wi-fi features are missing.
  • There is a speed limitation of 11 Mbps.
  • There apparently is no signal strength monitoring.
  • There is no standard discovery story.
  • The security story is WEP, although Windows 7 supports WPA2.
  • A Wi-Fi adapter can be in infrastructure or ad-hoc mode, but not both. With Wi-Fi Direct it’s possible for an adapter to be in infrastructure mode and simultaneously accept Wi-Fi Direct connections. So one can be both connected to the Internet and using Wi-Fi Direct.
Wi-Fi Direct, discussed below, addresses all of these issues.

3.2 Wi-Fi Direct

Wi-Fi Direct is a standard from the Wi-Fi Alliance that is intended to make it easy for devices to communicate in an ad-hoc fashion.
Unfortunately Wi-Fi Direct as implemented by Google in Android is close to useless to Thali. The reason is outlined here. The problem is that every connection request to the device over Wi-Fi Direct must be confirmed by the user. For a typical Thali scenario, such as walking through a conference opportunistically exchanging content with people you know, this would be a nightmare. The user would be endlessly prompted with connect requests and would have to spend their time hitting ’ok’. Google did provide a private API to override this behavior but apparently it was removed in later releases of Android. There are some workarounds on rooted devices but right now we don’t want to go the rooted route.
Note that there are still scenarios where Wi-Fi Direct could be useful. If you have a small group of people who frequently interact without Internet infrastructure for long periods of time then Wi-Fi Direct is just fine.
Wi-Fi Direct explicitly supports legacy clients. It is possible to make a Wi-Fi Direct ’group’ (a dynamic meeting of Wi-Fi Direct clients) look like a normal Wi-Fi access point thanks to the ability of Wi-Fi Direct groups to have SSIDs that are visible to legacy clients. However I believe that for the legacy client to connect it will need to get a passphrase. In Android’s case it automatically generates a passphrase for each group but getting the passphrase is tricky. For example, imagine that we have two clients using Wi-Fi Direct and a third that is legacy. One of the two Wi-Fi Direct clients will become the group owner and only that client will have the passphrase; the other will not. Since there is no a priori way to know which client will be the group owner, both users have to check their clients to see if they have the passphrase. Furthermore it’s possible for a device to be a member of multiple groups, in which case figuring things out probably involves at least a few exploding heads. All of these problems are potentially addressable but... um... yuck.
Wireless Connectivity Well, it supports Wi-Fi and Wi-Fi and um... Wi-Fi.
Requires Ad Hoc Wi-Fi By definition, no.
Open Source/Open Standard It is 100% NOT an open standard. In fact, if you aren’t a member of the Wi-Fi Alliance then it costs $99 just to see the Wi-Fi Direct specification! There are open source implementations of Wi-Fi Direct, though I’m not sure how they pulled that off. It’s also not clear what the costs are to implement or if only hardware manufacturers have to pay. It’s confusing.
Supports IP Yes
Multi-platform Android has had support built in since 4.0. Windows appears to support it starting with Windows 7. Linux appears to have support but I think it’s only with a special kernel. Neither OS X nor iOS has support. Windows Phone 8.0 and above appear to have support if the hardware supports it.
Plays well with Java On Android, yes. On Windows and Linux, not so much. Windows only seems to offer Wi-Fi Direct APIs via Win32 so we would have to use JNI to set things up. Linux? I’m guessing it’s all C level APIs.
Supports direct ad hoc connections In short, yes. But every single connection has to be confirmed by the user.
Supports discovery It enables devices to advertise themselves using name/value pairs so it’s a reasonably flexible discovery mechanism.
Supports ad hoc meshes No. Its model is still hub and spoke; it just negotiates who the hub is.
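The name/value pairs mentioned under discovery could be packed for advertisement roughly like this. This is a minimal sketch that assumes a DNS-SD style TXT record layout (one length byte followed by name=value for each pair). Whether a given Wi-Fi Direct stack uses exactly this wire format is an assumption, and the class name and the keys in the usage note are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Packs and unpacks name/value pairs in an assumed DNS-SD style TXT layout. */
final class TxtRecordSketch {
    /** Encodes each pair as one length byte followed by "name=value" in UTF-8. */
    static byte[] encode(Map<String, String> pairs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> pair : pairs.entrySet()) {
            byte[] entry = (pair.getKey() + "=" + pair.getValue()).getBytes(StandardCharsets.UTF_8);
            if (entry.length > 255) {
                throw new IllegalArgumentException("entry longer than 255 bytes: " + pair.getKey());
            }
            out.write(entry.length);
            out.write(entry, 0, entry.length);
        }
        return out.toByteArray();
    }

    /** Reverses encode(); entries without a name before '=' are skipped. */
    static Map<String, String> decode(byte[] record) {
        Map<String, String> pairs = new LinkedHashMap<>();
        int offset = 0;
        while (offset < record.length) {
            int length = record[offset] & 0xFF; // length byte is unsigned
            offset++;
            if (offset + length > record.length) {
                throw new IllegalArgumentException("truncated record");
            }
            String entry = new String(record, offset, length, StandardCharsets.UTF_8);
            int equals = entry.indexOf('=');
            if (equals > 0) {
                pairs.put(entry.substring(0, equals), entry.substring(equals + 1));
            }
            offset += length;
        }
        return pairs;
    }
}
```

Encoding the made-up pairs app=thali and v=1 produces a 14 byte record, and decoding it returns the same two pairs in the same order.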

3.3 Bluetooth

Bluetooth is, at its essence, a wire replacement protocol. It is not intended to be used over long distances the way Wi-Fi is. It can support reasonably high data rate speeds but only very locally.
Wireless Connectivity Bluetooth’s protocol is designed to work well up to around 300 feet. With the introduction of Bluetooth 3.0, data rates have gone up to 24 Mbps.
Requires Ad Hoc Wi-Fi No.
Open Source/Open Standard I haven’t investigated this enough. It sorta looks like if you just don’t actually use the Bluetooth logo you could at least implement software compliant with the publicly available standards for free. I strongly suspect that there is a fee to implement the hardware due to some kind of group licensing consortium.
Supports IP Sorta. There is a native serialization protocol that looks kind of like TCP but is not. So we can’t just drop a Bluetooth server socket into an HTTP server and expect anything to work. However they do support streams so we could probably hack up something. There is a profile for Bluetooth called Personal Area Network (PAN) that does support IP but it is not supported natively on Android.
Multi-platform Yes.
Plays well with Java At least on Android.
Supports direct ad hoc connections Mostly, yes. Generally Bluetooth only wants to connect paired devices. This requires switching to discovery mode (which requires a user confirmation) and then approving a pairing (which also requires user confirmation). After that, any time the two devices are near each other (see below) they can connect securely. However, at least in Android land there is another way to connect, which is described below: using createInsecureRfcommSocketToServiceRecord and its other half, listenUsingInsecureRfcommWithServiceRecord. This just requires exchanging UUIDs and does not require pairing or any user dialogs.
Supports discovery Sorta. At least in Android there are two different ways to handle discovery. The traditional way is to switch the device into ’discovery mode’. This enables the device to be discovered by other Bluetooth devices. But making a device ’discoverable’ requires (at least in Android land) getting the user to click on a dialog, and it only lasts up to 2 minutes or so before the device isn’t discoverable anymore. So for the kind of opportunistic synching we want, where two users’ devices will just sync without bothering the users, it’s pretty much useless. However there is another choice for discovery. The way this choice works is that if, through some magic, one device knows the other device’s UUID (each Bluetooth device can advertise UUIDs) then at least in Android the first device can call the createInsecureRfcommSocketToServiceRecord API, pass in the UUID of the device it’s looking for and, if that device is in range, create a connection. It seems like this works via a Bluetooth protocol called Service Discovery Protocol and it works below the level of normal discovery. So in theory if one wanted to see if, say, any of the 200 people in one’s address book were around then one would have to start pumping out 200 different connection requests and see if any work. We have no idea what the battery implications are of constantly cycling through all 200 or so addresses just to see if they are around. In theory this could be done incredibly efficiently but it’s unclear how Bluetooth stacks in general handle this.
Supports ad hoc meshes No.
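To get a feel for the polling approach described under discovery, here is a back-of-the-envelope sketch. It only multiplies numbers: the per-attempt time is a parameter because nothing above says how long a failed connection attempt takes, and the class and method names are hypothetical.

```java
/** Rough cost model for trying every known contact one at a time. */
final class SweepEstimate {
    /** Seconds needed to try every contact once, one attempt after another. */
    static double sweepSeconds(int contacts, double secondsPerAttempt) {
        return contacts * secondsPerAttempt;
    }

    /** Full passes over the contact list that fit into one hour of non-stop polling. */
    static double sweepsPerHour(int contacts, double secondsPerAttempt) {
        return 3600.0 / sweepSeconds(contacts, secondsPerAttempt);
    }
}
```

With 200 contacts and an assumed (not measured) 5 seconds per attempt, SweepEstimate.sweepSeconds(200, 5.0) gives 1000 seconds, roughly 17 minutes per pass. Under that assumption someone who walks past and stays in range for only a minute or two could easily be missed, and that is before counting the battery cost.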

3.4 Serval

Serval is a project to enable phones to use Wi-Fi to build meshes for voice, pictures, web, etc. It’s mostly focused on disaster recovery and providing services in underserved or outrageously expensive areas. The website is a typical eye chart (ours is too, so I’m not pointing fingers) which makes it hard for me to be sure what’s really going on. But I think the core of their project is the Serval DNA.
Wireless Connectivity They seem focused on Wi-Fi but it’s not clear to me if they are using Wi-Fi ad-hoc mode or Wi-Fi Direct. Given the project’s affinity for Android I’d imagine they intend to use Wi-Fi Direct.
Requires Ad Hoc Wi-Fi I’m not completely sure, it looks like it does.
Open Source/Open Standard Their code is GPL 2.0 which pretty much means we can’t touch it since we are Apache 2.0.
Supports IP Sorta. It’s clear they aren’t terribly interested in supporting IP but apparently they currently expose their functionality over a UDP port (not TCP) to integrate with systems like Linux that expect that. However it also appears that they have implemented TCP over their Mesh Streaming Protocol and that it terminates locally. So if you go up high enough in their stack it seems you can find TCP.
Multi-platform Their focus seems to be on Android but they also run on Linux and Mac.
Plays well with Java Their core code is in C so JNI it is.
Supports direct ad hoc connections I believe so.
Supports discovery Yup.
Supports ad hoc meshes That would be the point.

3.5 OpenGarden

Because of their successful app FireChat they get a lot of attention, but their website is less than informative. The only things I know for sure at the time of writing are that their SDK is not yet available and that what little code they have released is GPL v3.

3.6 Commotion

This is a project to build metro area scale ad-hoc mesh wireless networks. Right now the project looks to be in extremely early stages and is not ready for anything like prime time (I know the feeling). They do use Serval for their key distribution but I’m not exactly clear on the relationship since they just talk about Serval in the context of crypto but Serval is its own stand alone mesh system. So do they use a mesh to handle keys for their mesh? Or are they using something more limited? It’s hard to tell.
Wireless Connectivity In theory it could run on anything but they support Wi-Fi as their primary mechanism.
Requires Ad Hoc Wi-Fi Yes, it appears they do.
Open Source/Open Standard They are AGPLv3 which pretty much means we can’t touch them with a 100 foot pole, at least not without turning them into some kind of executable we talk to over the local network (as in how we handle Tor).
Supports IP Yes, it looks like it.
Multi-platform They have support for Android (rooted) and that’s it for the PC world. They are working on commotion client but it is currently unstable and clearly marked as not working. It looks like they aspire that commotion client will support Linux, Mac and Windows.
Plays well with Java Everything seems to be in C, so JNI it is.
Supports direct ad hoc connections Yes.
Supports discovery Not yet anyway. According to their architecture page they haven’t implemented features like naming yet.
Supports ad hoc meshes That is their whole point in existence.

3.7 AllJoyn

It took a while to figure out but it seems the real docs for AllJoyn are here. AllJoyn came up because it provides essentially the same features as UPnP. That is, a way for devices to discover, connect and control each other. But AllJoyn actually doesn’t solve the key problem we need solved, which is the ability to dynamically create connectivity where there is no Internet connection. AllJoyn assumes the device is already connected to either normal Wi-Fi or Wi-Fi Direct and then builds from there.
Wireless Connectivity Wi-Fi based, but can in theory run on other transports.
Requires Ad Hoc Wi-Fi No
Open Source/Open Standard
Supports IP YES!!!!!!
Multi-platform Yes, with support for Android, iOS, Linux and Windows. No explicit Mac support but I suspect the Linux code could be adapted.
Plays well with Java I’m not 100% sure. Certainly on Android but I don’t know if the desktop SDKs expose Java APIs or just native.
Supports direct ad hoc connections Sorta. Again, AllJoyn just ’assumes’ there is a Wi-Fi connection and then it can do its thing. But it doesn’t really help resolve any of the issues identified with getting the Wi-Fi connection in the first place.
Supports discovery Yes.
Supports ad hoc meshes No.

4 The conclusion

One of the things that got me so incredibly excited about Thali is that the core building blocks I needed all existed. These turned out to be CouchDB, mutual SSL auth and Tor hidden services. They were there, just waiting to be used. This gave me confidence that the core Thali mission was doable, even with unbelievably constrained resources. I do not have the same confidence about mesh networking. Honestly the whole area is a bit of a mess. I’ve only listed a few contenders above but there are lots more. That isn’t a bug, it’s a feature. It means people are actively working on the problem and things will hopefully get better and better until we reach the point where the mesh is ready for prime time.
But for now our primary scenario, opportunistic synching, isn’t terribly workable with Wi-Fi Direct because of Android’s insistence on requiring user confirmation to join groups. Bluetooth is theoretically workable but only over short distances and with unknown effects on battery life due to the issues with the discovery mechanism discussed above. Besides, Android doesn’t support the PAN Bluetooth profile so we can’t really do IP over Bluetooth. We would instead have to take the stream from Bluetooth and try to hook up both SSL and HTTP over that. I’m not sure if that is even doable, that is, if the SSL stack would be happy with just a stream and not try to mess with anything below that. But in any case, this is a mess that I’ll be avoiding for right now.
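For the HTTP half of that idea, a handler written purely against Java streams does not care what transport is underneath. The following is a minimal sketch under that assumption: it reads one request from an InputStream and writes a fixed reply to an OutputStream, which is the same pair of objects an Android BluetoothSocket hands out via getInputStream() and getOutputStream(). The class name and reply text are made up, and this is nowhere near a full HTTP server. For the SSL half, javax.net.ssl.SSLEngine is designed to be transport independent, which suggests it could sit on the same streams, but that is not shown or tested here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Answers a single HTTP request over any pair of streams. */
final class StreamHttpSketch {
    /** Reads one request from in and writes a fixed plain-text reply to out. */
    static void serveOnce(InputStream in, OutputStream out) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
        String requestLine = reader.readLine();
        if (requestLine == null) {
            throw new IOException("stream closed before a request arrived");
        }
        // Skip the request headers; they end at the first empty line.
        String header = reader.readLine();
        while (header != null && !header.isEmpty()) {
            header = reader.readLine();
        }
        // A request line looks like "GET /path HTTP/1.1".
        String[] parts = requestLine.split(" ");
        String path = parts.length >= 2 ? parts[1] : "/";
        byte[] body = ("you asked for " + path + "\n").getBytes(StandardCharsets.UTF_8);
        String head = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/plain; charset=utf-8\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "Connection: close\r\n\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        out.flush();
    }
}
```

Because nothing in serveOnce mentions sockets, the same method can be exercised with in-memory byte array streams on a desktop before it ever touches a radio.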

A Wait, isn’t a direct (or mesh) connection a traffic analysis bonanza?

Sorta. So right now our wireless devices, especially Wi-Fi, use fixed MAC addresses. This provides a constant ID that can be used to follow us around. But unless there is extra data to associate that MAC address with an identity then it just says “I’ve seen this person before” but not “This is Joe”. If Thali were to do something obvious like advertise a user’s public key as part of mesh discovery then we would not only have a fixed ID like a MAC address but we would be taking it further by using an ID that presumably can be easily associated with an identity. It would be the moral equivalent of doing discovery via email address or SSN. Obviously, a bad idea.
So we will have to use a different approach.
There are a couple of approaches to deal with this situation. First, we can decide to do no worse, but also no better, than the status quo. If we assume that MACs are going to stay static then we don’t need to worry about using the same ID. We just have to worry about using an ID that isn’t immediately mappable to a user’s primary public key. We could do something as simple as creating a second public key that is only used in the mesh, advertising that, and communicating it to friends and such. This is easy and cheap and reduces the problem back to MACs.
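A minimal sketch of this first approach, with several assumptions labeled: the mesh-only key is a fresh RSA key pair unrelated to the primary one, what gets advertised is a SHA-256 fingerprint of the mesh key rather than the key itself, and friends have already been told the mesh key over some existing trusted channel. None of these choices come from Thali's design, and every name here is hypothetical.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.PublicKey;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Keeps a mesh-only identity separate from the primary key. */
final class MeshIdentitySketch {
    /** Maps a friend's advertised mesh fingerprint to a local contact name. */
    private final Map<String, String> meshIdToContact = new HashMap<>();

    /** Generates a fresh key pair; call once for the primary key, once for the mesh key. */
    static KeyPair newKeyPair() throws NoSuchAlgorithmException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    /** SHA-256 of the encoded public key as unpadded URL-safe Base64 (assumed ID format). */
    static String fingerprint(PublicKey key) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(key.getEncoded());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
    }

    /** Stores a friend's mesh key, learned out of band, under their contact name. */
    void rememberFriend(String contactName, PublicKey friendMeshKey)
            throws NoSuchAlgorithmException {
        meshIdToContact.put(fingerprint(friendMeshKey), contactName);
    }

    /** Looks up an ID seen during mesh discovery; empty for strangers. */
    Optional<String> whoIs(String advertisedMeshId) {
        return Optional.ofNullable(meshIdToContact.get(advertisedMeshId));
    }
}
```

An observer who sees the advertised fingerprint learns a stable ID, much like a MAC address, but cannot map it to the primary key without the friend list.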
But I am hoping we will start to see MAC addresses changed randomly. There are reasons why this can hurt but it’s necessary if we are to pass the security laugh test (along with the ability to completely change how cell phones work, but that’s another story). Once this happens then the mesh ID for Thali would become a security hole, not a feature because we would be re-introducing a constant ID.
So this leads to the second approach. In theory we could try some fun games. For example, imagine that someone has 500 people in their address book. They could advertise their public key 500 times, each time encrypted with the public key of one of their friends. Of course you don’t want to advertise who your friends are. So what you would actually do is encrypt your identity with each of your friends’ public keys but not include any of the headers that identify which public key you used to encrypt the value. The result? Everyone else in the mesh network would need to slurp up the 500 keys you just published and then try to decrypt each and every one to see if any of them were encrypted with their key. Now multiply that by how many people are on the network. This isn’t impossible btw. Say 500 addresses per user * 1000 users on the local mesh = 500,000 keys to test. Today that’s a bit much for a phone but in a year? Two years? Computing is growing vastly faster than human scale. Pretty much anything on human scale is going to be squashed by the computing available. So maybe this kind of brute force approach makes sense?
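The brute force idea can be sketched with the standard Java crypto APIs. This is an illustration under stated assumptions, not a vetted protocol: it uses RSA with OAEP padding so that decrypting with the wrong private key fails cleanly, and it encrypts a short identity token rather than a whole public key, because a full key would not fit in a single RSA-OAEP block. The class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.crypto.BadPaddingException;
import javax.crypto.Cipher;
import javax.crypto.IllegalBlockSizeException;

/** Header-less announcements that only the intended friends can recognize. */
final class BlindAnnouncementSketch {
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    /** Encrypts the identity once per friend; nothing in the output says which key was used. */
    static List<byte[]> announce(byte[] identity, List<PublicKey> friends)
            throws GeneralSecurityException {
        List<byte[]> blobs = new ArrayList<>();
        for (PublicKey friend : friends) {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, friend);
            blobs.add(cipher.doFinal(identity));
        }
        return blobs;
    }

    /** Tries every blob with our private key; the first clean decrypt was meant for us. */
    static Optional<byte[]> findAnnouncementForMe(List<byte[]> blobs, PrivateKey mine)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        for (byte[] blob : blobs) {
            cipher.init(Cipher.DECRYPT_MODE, mine); // re-init in case the last attempt failed
            try {
                return Optional.of(cipher.doFinal(blob));
            } catch (BadPaddingException | IllegalBlockSizeException notForUs) {
                // Decryption failed, so this blob was encrypted to someone else's key.
            }
        }
        return Optional.empty();
    }

    /** Number of blobs each listener must test, e.g. 500 * 1000 = 500,000. */
    static long blobsToTest(int contactsPerUser, int usersOnMesh) {
        return (long) contactsPerUser * usersOnMesh;
    }
}
```

The cost sits in findAnnouncementForMe: every listener performs one private-key operation per blob published by anyone nearby, which is where the 500,000 figure above comes from.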