Thali’s Story -1 – Getting Node.js and Local P2P to play well

As part of developing Thali we have a series of stories. One of them is story -1, whose goal is to enable the native discovery and high bandwidth P2P frameworks we have on iOS and Android to work with JXCore’s implementation of Node.js on those platforms. We had originally intended to start with story 0, but we have had enough challenges at the native layer that it made sense to break out a simpler story, using plain TCP/IP sockets rather than the full PouchDB stack, as our first baby step. This article explains the simplification.

1 -1 Deliverable

Our goal in this deliverable is to enable a phone to discover other phones and then raise a notification in node.js (running via JXCore) that another phone has been found. We then want to enable phones that have discovered each other to be able to connect to each other. The discovery part we have well underway but the part that seems to confuse folks a lot is exactly how the “connect” part works.
In -1 we have simplified the living daylights out of this requirement so we can get some basic infrastructure going. The requirement now is that we will stand up a TCP/IP socket server in Node.js. The server listens for incoming connections and then talks to each client over a raw TCP/IP socket. In our case the server implements echo, that is, whatever it receives as input it will just send back as output.
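To make the echo requirement concrete, here is a minimal sketch of such a server using Node.js's built-in `net` module. The helper name `createEchoServer` is made up for illustration and is not taken from the Thali test code.

```javascript
const net = require('net');

// Hypothetical helper (name is an assumption): builds, but does not start,
// a TCP/IP echo server.
function createEchoServer() {
  return net.createServer((socket) => {
    // Echo: every byte read from the socket is written straight back to it.
    socket.pipe(socket);
    // Drop the connection on a socket error instead of crashing the process.
    socket.on('error', () => socket.destroy());
  });
}
```

Calling `createEchoServer().listen(0, '127.0.0.1')` would start it on any free localhost port, and `server.address().port` then gives the port to share with the discovery layer.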
So the trick is for Phone A to discover Phone B, then for Phone A to establish a connection to Phone B and to use a TCP/IP client on Phone A to talk to the TCP/IP socket listener on Phone B.
Now for the test to fully pass this has to simultaneously happen in both directions. That is, if phones A and B are within range of each other then both phones should discover each other and both phones should have TCP/IP clients that will then connect to the TCP/IP socket listener on the other phone. I use the term simultaneously in the loosest possible sense. I don’t mean that anything happens in lock step, only that both devices should see each other at roughly the same time and finish exchanging data at roughly the same time.
And yes, this is as straightforward as it sounds. Well... sorta.
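The pass criterion can be sketched as a small Node.js check. This is only an illustration under assumptions: `echoRoundTrip` and `bothDirections` are hypothetical names, and the two ports stand for whatever localhost ports each phone is handed in order to reach the other.

```javascript
const net = require('net');

// Hypothetical helper: connect to an echo server on localhost, send one
// message, and resolve with whatever comes back.
function echoRoundTrip(port, message) {
  return new Promise((resolve, reject) => {
    const expected = Buffer.byteLength(message);
    const chunks = [];
    let received = 0;
    const client = net.connect({ port, host: '127.0.0.1' }, () => {
      client.write(message);
    });
    client.on('data', (chunk) => {
      chunks.push(chunk);
      received += chunk.length;
      // TCP may split the reply, so wait until every byte has been echoed.
      if (received >= expected) {
        client.end();
        resolve(Buffer.concat(chunks).toString());
      }
    });
    client.on('error', reject);
  });
}

// The test passes for a pair of phones only when both directions succeed.
// portToB is the port used to reach Phone B, portToA the port used to reach
// Phone A (both are assumptions of this sketch).
function bothDirections(portToB, portToA, message) {
  return Promise.all([
    echoRoundTrip(portToB, message),
    echoRoundTrip(portToA, message),
  ]).then((replies) => replies.every((reply) => reply === message));
}
```

The two round trips run concurrently but not in lock step, which matches the loose sense of "simultaneously" used here.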

2 Android complications

We are currently using Bluetooth for high bandwidth communications in Android. The good news is that Android’s Bluetooth uses a client/server paradigm to handle connectivity. That is, there is a Bluetooth client socket and a Bluetooth server socket. And yes, you can use both at the same time.
The way the logic works is that both phones will start up Bluetooth Server Sockets and each will block on an “accept” call waiting for someone to connect.
They will then both start local discovery to see if they can find any peers.
If a phone finds a peer then it will open a Bluetooth Client Socket which will terminate at the peer’s Bluetooth Server Socket. At this point the “client” peer has an output stream going to the “server” peer and an input stream to receive data from the “server” peer. At the same time the “server” peer will have an input stream from the “client” peer and an output stream to send data to the “client” peer.
Just to make things more fun it is not just possible, but necessary, that both phones are simultaneously client and server to the other. In other words Phone A will have a client Bluetooth socket terminating at Phone B’s Bluetooth server socket and more or less simultaneously Phone B will have a Bluetooth client socket that terminates at Phone A’s Bluetooth server socket.
Now this is all well and good but there is still a problem. Bluetooth doesn’t speak TCP/IP and our test is about TCP/IP clients being able to send messages to TCP/IP servers.
The answer is, we need to build two TCP/IP - Bluetooth bridges.

2.1 TCP/IP Client Bridge

The good news is that both TCP/IP on Android and Bluetooth on Android use stream abstractions to describe communications. So what we need to do is “cross the streams”. That is, when the TCP/IP client writes to its output stream we need to copy those bytes over to the Bluetooth client socket’s output stream. Simultaneously, when bytes come in over the Bluetooth client socket’s input stream we need to copy those bytes to the TCP/IP connection’s input stream. And yes, it really is that simple.
But who does the copying? That is where the TCP/IP Client Bridge comes in. To connect the TCP/IP Client Socket to the Bluetooth Client Socket we stick a TCP/IP Server Socket in between. This is the TCP/IP Client Bridge (it’s called a client bridge because it’s bridging a client, even though it uses a server socket to do so). A picture will probably make this easier to understand.
TCP/IP Client Socket’s Output Stream ---> TCP/IP Client Bridge’s Server Socket’s Input Stream ---> Bluetooth Client Socket’s Output Stream
TCP/IP Client Socket’s Input Stream <--- TCP/IP Client Bridge’s Server Socket’s Output Stream <--- Bluetooth Client Socket’s Input Stream
So to make this work, when the Node.js client finds out about another device and asks the native library to create a connection, the native library needs to:
  1. Open up a Bluetooth Client Socket to the remote device
  2. Set up the TCP/IP Client Bridge on any available port
  3. Return the port the TCP/IP Client Bridge is listening on to the node.js code
At that point the node.js code will point its TCP/IP Client to the port where the TCP/IP Client Bridge lives and now we can have the TCP/IP client’s data shared across Bluetooth.
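The real bridge lives in native Android code, but its logic can be modeled in Node.js to show the shape of it. In this sketch `remote` is a stand-in for the already opened Bluetooth Client Socket, modeled as a Node.js Duplex stream, and `startClientBridge` is a hypothetical name. Treating each bridge as serving exactly one TCP/IP connection is also an assumption of the sketch.

```javascript
const net = require('net');

// Model of the TCP/IP Client Bridge. `remote` (an assumption of this sketch)
// is a Duplex stream standing in for the Bluetooth Client Socket.
function startClientBridge(remote, onListening) {
  const bridge = net.createServer((local) => {
    // Assumption: one Bluetooth socket carries one TCP/IP connection,
    // so stop accepting further connections once the client has arrived.
    bridge.close();
    local.pipe(remote); // TCP/IP client output -> Bluetooth output stream
    remote.pipe(local); // Bluetooth input stream -> TCP/IP client input
    // No recovery logic: a failure or close on either side ends both.
    local.on('error', () => remote.destroy());
    remote.on('error', () => local.destroy());
    local.on('close', () => remote.destroy());
    remote.on('close', () => local.destroy());
  });
  // Port 0 asks the OS for any available port; bind to localhost only.
  bridge.listen(0, '127.0.0.1', () => onListening(bridge.address().port));
  return bridge;
}
```

The port passed to `onListening` is the value that would be returned to the Node.js code in step 3.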

2.2 TCP/IP Server Bridge

Of course what use is a TCP/IP client if we don’t have a TCP/IP server? In our test we will stand up a TCP/IP server on each device and that device will share its port with the discovery infrastructure provided by Thali. The idea then is that as soon as Thali gets the TCP/IP server’s port, Thali will set up a Bluetooth Server Socket. The goal is that when a Bluetooth Client Socket on a remote device terminates a connection at the Bluetooth Server Socket on this device, all content sent in and out will be relayed to the local TCP/IP server. The setup is essentially the same as above.
TCP/IP Server Socket’s Input Stream <--- TCP/IP Server Bridge’s Client socket’s Output Stream <--- Bluetooth Server Socket’s Input Stream
TCP/IP Server Socket’s Output Stream ---> TCP/IP Server Bridge’s Client socket’s Input Stream ---> Bluetooth Server Socket’s Output Stream
So in this scenario everything starts when the Node.js code starts a TCP/IP Server listener and then passes the port of the listener to the Native code as part of starting discovery. The Native code sets up the Bluetooth Server Socket and blocks on accept waiting for connections. When a connection comes in the native code will start the TCP/IP Server Bridge which is basically a TCP/IP client socket that will connect to the port that was passed in by the Node.js code and relay any data received from the Bluetooth Server Socket’s connection.
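As with the client side, the server bridge can be modeled in Node.js even though the real one is native code. In this sketch `incoming` is a Duplex stream standing in for a connection accepted on the Bluetooth Server Socket, `serverPort` is the port the Node.js code passed in when starting discovery, and `startServerBridge` is a hypothetical name.

```javascript
const net = require('net');

// Model of the TCP/IP Server Bridge. `incoming` (an assumption of this
// sketch) is a Duplex stream standing in for an accepted Bluetooth connection.
function startServerBridge(incoming, serverPort) {
  // The bridge is just a TCP/IP client pointed at the local Node.js server.
  const toServer = net.connect({ port: serverPort, host: '127.0.0.1' });
  incoming.pipe(toServer); // Bluetooth input stream -> TCP/IP server input
  toServer.pipe(incoming); // TCP/IP server output -> Bluetooth output stream
  // No recovery logic: a failure or close on either side ends both.
  toServer.on('error', () => incoming.destroy());
  incoming.on('error', () => toServer.destroy());
  toServer.on('close', () => incoming.destroy());
  incoming.on('close', () => toServer.destroy());
  return toServer;
}
```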

2.3 What about failures?

Bluetooth connections fail for all sorts of reasons. It’s not just that connections disappear. Sometimes they will appear to stay open but no matter what no more data will be transmitted. To deal with that we have timeouts that will tear down the connection if no data is sent for long enough.
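One way to picture such a timeout is an idle watchdog that is reset every time bytes are copied. This is a sketch under assumptions, not the native implementation: the class name `IdleWatchdog` and its methods are invented for illustration.

```javascript
// Hypothetical idle watchdog: calls onIdle if touch() is not called again
// within idleMs milliseconds.
class IdleWatchdog {
  constructor(idleMs, onIdle) {
    this.idleMs = idleMs;
    this.onIdle = onIdle;
    this.timer = null;
    this.touch(); // start counting as soon as the connection exists
  }

  // True while a timeout is pending.
  get armed() {
    return this.timer !== null;
  }

  // Call whenever data moves in either direction to restart the countdown.
  touch() {
    clearTimeout(this.timer);
    this.timer = setTimeout(() => {
      this.timer = null;
      this.onIdle();
    }, this.idleMs);
  }

  // Call when the connection is closed normally.
  stop() {
    clearTimeout(this.timer);
    this.timer = null;
  }
}
```

A bridge would call `touch()` for each chunk it relays and have `onIdle` tear the connection down, which covers the case where the link looks open but nothing gets through.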
For now we want to use a very simple error model: if anything goes wrong at the Bluetooth layer then the connection is to be torn down. The TCP/IP Client Bridge should return a TCP/IP connection exception and the TCP/IP Server Bridge should close its client connection. In both cases this will tell the Node.js code that the TCP/IP connection is gone. At that point it is up to the Node.js code to decide how to handle this.
In the case of a TCP/IP server there is nothing for the node.js code to do. Either the client reconnects or it doesn’t.
The more interesting question is how the TCP/IP client code should handle a connection failure. Its job is to be listening for events. If, around the time the TCP/IP connection fails, it gets a notification that the remote peer it was talking to isn’t around anymore then it has to know to give up.
If, on the other hand, the TCP/IP connection fails and no notification is received that the remote peer has left the area then it is up to the node.js client code to ask the native layer to please create a new connection to the remote peer. In that case the steps described above in the TCP/IP Client Bridge section will be repeated and the node.js client will receive a new port to point its TCP/IP client at.
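The client-side decision described in the last two paragraphs can be sketched as a tiny bit of bookkeeping. The class and method names here are hypothetical, and the real notifications would come from the native discovery layer.

```javascript
// Hypothetical bookkeeping for the Node.js client code: remember which peers
// discovery currently reports as present.
class PeerTracker {
  constructor() {
    this.present = new Set();
  }

  // Discovery reported that the peer is in range.
  peerFound(peerId) {
    this.present.add(peerId);
  }

  // Discovery reported that the peer has left the area.
  peerLost(peerId) {
    this.present.delete(peerId);
  }

  // Called when the TCP/IP connection to peerId fails.
  onConnectionFailed(peerId) {
    return this.present.has(peerId) ? 'reconnect' : 'give-up';
  }
}
```

On `'reconnect'` the Node.js code would ask the native layer for a new connection and point its TCP/IP client at the new port it gets back.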
So the TCP/IP Client Bridge and its server counterpart are intended to be dumb as posts. There is no error recovery logic.
Now, at some point we may want to change that. I have some ideas that would make error recovery a lot nicer for the node.js code but that is a problem for a future story.

3 iOS Complications

iOS is actually more complicated than Android because there we use the Multipeer Connectivity framework for high bandwidth communications and, unlike Bluetooth on Android, it does not use full duplex connections. Instead, one of the types of data one can send is a named stream. A named stream is simplex, that is, it goes in one direction, from the device that opened it to the device that receives it. But we are still in the TCP/IP business. So how do we run TCP/IP over that? The answer is that Thali has to add an extra layer of functionality.
We can do this through a simple naming convention. When iOS device A opens a named stream to iOS device B the name of the stream will be the concatenation of device A’s name plus device B’s plus the string “Request”. When device B receives the incoming stream from device A it will respond by opening its own named stream from Device B to Device A. That stream’s name will be identical to device A’s named stream with the exception that the string “Request” at the end will be replaced with the string “Response”. This means that the response stream from B to A will list A’s name first in the stream name.
By treating the two named streams as a socket we now have a full duplex connection.
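The naming convention is simple enough to write down as two functions. This is an illustrative sketch in JavaScript (the real code would be native iOS code), and the function names are assumptions.

```javascript
const REQUEST_SUFFIX = 'Request';
const RESPONSE_SUFFIX = 'Response';

// Name of the stream the opening device creates toward the receiving device.
function requestStreamName(openerName, receiverName) {
  return openerName + receiverName + REQUEST_SUFFIX;
}

// Name of the stream the receiver opens back: same name, with the trailing
// "Request" swapped for "Response".
function responseStreamName(requestName) {
  if (!requestName.endsWith(REQUEST_SUFFIX)) {
    throw new Error('not a request stream name: ' + requestName);
  }
  return requestName.slice(0, -REQUEST_SUFFIX.length) + RESPONSE_SUFFIX;
}
```

For devices named A and B, A opens "ABRequest" and B answers with "ABResponse", so the opener's name still comes first in the response stream's name.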
Once we have established a full duplex connection then everything goes more or less like Android. That is, we need both a TCP/IP Client Bridge and a TCP/IP Server Bridge.
Note that just as with Android, iOS also needs to be able to have two pairs of paired named streams. That way each device can be a client of the other. In other words if devices A and B discover each other then device A should have an output stream named ABRequest paired to an input stream named ABResponse. This represents A as a client of B. At the same time device A should have an input stream named BARequest and an output stream named BAResponse. This represents A as a server to B.
The same logic applies to B. B will have an input stream named ABRequest and an output stream named ABResponse representing B as a server to A. B will also have an output stream named BARequest and an input stream named BAResponse representing B as a client to A’s server.
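The four streams each device ends up holding can be listed with a small helper. Again this is only a sketch, and `streamsFor` and its field names are hypothetical.

```javascript
// Lists the four named streams a device holds once it and a remote device
// have discovered each other.
function streamsFor(local, remote) {
  return {
    // local acting as a client of remote
    clientOutput: local + remote + 'Request',
    clientInput: local + remote + 'Response',
    // local acting as a server to remote
    serverInput: remote + local + 'Request',
    serverOutput: remote + local + 'Response',
  };
}
```

Each output stream on one device is the same named stream as an input stream on the other, so `streamsFor('A', 'B').clientOutput` and `streamsFor('B', 'A').serverInput` are both "ABRequest".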

3.1 TCP/IP Client Bridge

As with Android, when a phone discovers another phone it will use its TCP/IP client to open a connection to the TCP/IP socket listener on the other phone. To make this work the Node.js code, upon being informed of the discovery, will ask the native code to please give it a localhost port to connect to in order to talk to the remote phone. The native code will then open a named stream from the local phone to the remote phone. As soon as the remote phone responds by opening a properly named stream in the opposite direction, the native code will set up the TCP/IP Client Bridge with a localhost port listener to relay data. The structure, when Phone A talks to Phone B, is:
TCP/IP Client Socket’s Output Stream ---> TCP/IP Client Bridge’s Server Socket’s Input Stream ---> Multipeer named Output stream “ABRequest”
TCP/IP Client Socket’s Input Stream <--- TCP/IP Client Bridge’s Server Socket’s Output Stream <--- Multipeer named Input stream “ABResponse”
So once the native layer is asked to connect to a remote device it will:
  1. Open up a named stream from the local device to the remote device
  2. Wait for the remote device to open up a properly named stream to the local device
  3. Set up the TCP/IP Client Bridge on any available port
  4. Return the port the TCP/IP Client Bridge is listening on to the node.js code
At that point the node.js code will point its TCP/IP Client to the port where the TCP/IP Client Bridge lives and now we can have the TCP/IP client’s data shared across the multipeer connectivity framework.

3.2 TCP/IP Server Bridge

The TCP/IP Server Bridge works in the same way. When the server starts up it will start its TCP/IP socket listener and share the localhost port of that socket listener with the native code. When the native code gets an incoming connection it will pipe it to that listener port.
In other words, if Phone B receives a request from Phone A:
TCP/IP Server Socket’s Input Stream <--- TCP/IP Server Bridge’s Client socket’s Output Stream <--- Multipeer named Input Stream “ABRequest”
TCP/IP Server Socket’s Output Stream ---> TCP/IP Server Bridge’s Client socket’s Input Stream ---> Multipeer named Output Stream “ABResponse”
The logic for the native code is:
  1. Receive a message from the Node.js code telling it the port that the Node.js local server is listening on
  2. Receive a named input stream from a remote device
  3. Establish an output stream to the device that sent the input stream with the appropriate name
  4. Start the TCP/IP Server Bridge and connect to the local Node.js server’s port as shared in step 1
Now it’s just a matter of copying data back and forth.

3.3 What about failures?

The basic idea for handling failures is the same for iOS as for Android. In this case however if the iOS TCP/IP Client Bridge should detect a problem with either the output or input streams that make up a single TCP/IP connection then it is up to the iOS code to tear down both streams and send a TCP/IP error on the local TCP/IP socket. It is then up to the Node.js code to decide if it wants to retry. If it does then the native code will start the process of establishing a named output stream and waiting for a responding properly named input stream from scratch. The same logic applies as in the Android section for handling notifications that a peer has left the area. At that point it is up to Node.js to figure this out and stop asking for connections to that peer. Any existing connections should automatically die but just in case they don’t the Node.js code should kill the connection anyway.
On the TCP/IP Server Bridge the logic is also similar. If the TCP/IP Server Bridge detects a problem with the named input or named output stream then it must kill both of them and wait for the remote device to re-establish the connection.
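The rule that both named streams live and die together can be sketched as follows. The streams are modeled as Node.js streams purely for illustration, and `pairNamedStreams` is a hypothetical name.

```javascript
// Treat a named input stream and a named output stream as one connection:
// if either one fails or closes, destroy both and report exactly once.
function pairNamedStreams(input, output, onTornDown) {
  let tornDown = false;
  const tearDown = (err) => {
    if (tornDown) return; // destroying the streams triggers 'close' again
    tornDown = true;
    input.destroy();
    output.destroy();
    onTornDown(err);
  };
  input.on('error', tearDown);
  output.on('error', tearDown);
  input.on('close', () => tearDown());
  output.on('close', () => tearDown());
}
```

On the client bridge `onTornDown` would surface a TCP/IP error on the local socket, while on the server bridge it would simply go back to waiting for the remote device to open a new request stream.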

4 So where is the code?

At the time I am writing this the code to test out Story -1 lives at https://github.com/thaliproject/Thali_CordovaPlugin/tree/story-1-yarong. The key test code lives inside of Thali_CordovaPlugin/test/sockettest.js. There is a function in there called nodeJSTest which I use as a quick and dirty local test just to make sure the test code does what I think. But the real test is called, not surprisingly, realTest(). That code is essentially a mock up based on my understanding of how we are writing the current native layer. But the master of the native layer API as exposed to Node.js is Matt. So please talk with him to get the final API specifications. But the idea is that the native layer on either iOS or Android should “just work” with realTest(). Please don’t get too hung up on any bugs in realTest(). Since I don’t have actual native code to test against I’m sure there are lots of bugs there.
For what it’s worth nodeJSTest() was tested on Android using the Thali_CordovaPlugin code and it ran just fine. So this at least argues that the basic infrastructure is working. So now the trick is to use realTest() and thus have a TCP/IP client on one device start communicating to the TCP/IP socket server on the other device and vice versa. This will show that both the native discovery and high bandwidth communication code paths work properly with Node.js.

4 thoughts on “Thali’s Story -1 – Getting Node.js and Local P2P to play well”

  1. Hey Yaron!
    Where can I find the Bluetooth/TCP bridge code? The link above gives me a 404.

    I found your blog and the Thali project the other day after struggling to get P2P working on Android. (Actually, I first heard about it from the issues you opened on couchbase-lite-java-listener). I’m following along on your and Jukka’s blogs, trying to wrap my head around Android’s Wi-Fi Direct. When I found Thali_CordovaPlugin_BtLibrary, I was so relieved I almost cried!! I’m considering ditching my native apps and moving to Cordova just so I can use Thali.

    Awesome work!

    1. I know you already found the code because I saw your issue, but for everyone else the code now lives on https://github.com/thaliproject/ in general. There are several things to keep in mind about the code, the most important of which is that this is still an experiment! For example, we currently use Wi-Fi Direct, which we know doesn’t work right on Android. We are also running into a lot of issues with Bluetooth on Android and are still trying to work around those. Once we get Bluetooth under control we will switch out Wi-Fi Direct for BLE. At that point we will hopefully have something that works reliably.
