Can Thali’s local P2P capability survive Android Marshmallow?

Starting with Android Marshmallow (see here) it is no longer supposed to be possible for programs running on Android to see the local device’s Wi-Fi or Bluetooth MAC address. This is a major problem for Thali because we depend on retrieving the Bluetooth address and then advertising it over BLE. A peer device then connects using the Bluetooth MAC retrieved over BLE with an insecure RFCOMM connection and then they authenticate/encrypt the connection at the application layer. But now this method won’t work because as of Marshmallow we aren’t supposed to be able to get the Bluetooth MAC address. Below I explore the various avenues we are investigating to try and deal with this change. But if anyone knows anyone at Google we can talk to about this that would be really helpful.

1 Who cares? Or why dynamic connectivity matters.

The core scenario for Thali is that we have a large number of devices (phones, laptops, IoT, etc.) that are all able to discover each other and communicate in the absence of Wi-Fi or cellular connectivity. Our current focus is mostly on factory environments, which typically have neither Wi-Fi nor cellular coverage.
It turns out we also need to enable P2P communication even when there is Wi-Fi because many Wi-Fi APs are configured to not allow for local multicast and explicitly prohibit devices attached to the WLAN from communicating directly with each other. This kind of isolated AP deployment is particularly common with corporate and public Wi-Fi Infrastructure Access Points.
To that end, on Android, we use BLE to let devices discover each other and Bluetooth to enable them to communicate directly. We don’t communicate that often because it drains battery. Instead we use BLE to determine when it’s worth communicating and if there is something interesting to be said then and only then will we open up a higher bandwidth (and more battery sucking) link over Bluetooth (and we hope eventually Wi-Fi).
In our security model all the various devices have their own identities and a trust infrastructure. For our corporate users this trust infrastructure is centralized. This means that devices are provisioned with a trusted key and/or a list of trusted devices and then when offline they will only communicate with devices that can show a trust assertion issued by the trusted key or if they are on the list of trusted devices. And yes, using Identity Exchange, we can create ad-hoc trust as well so users can choose who to trust without having to ask permission from anybody. Thali’s central ideal is no centralized points of control.
But the point is that we have lots of devices with lots of identities who are able to discover and ideally communicate even though they have never seen each other before and without necessarily having any human intervention.
One of our most critical scenarios is the walk by. This is also a scenario where Android used to shine. In this scenario someone has a phone in their pocket, they walk by another device, communicate over BLE, figure out that they should talk, and then do a data transfer over Bluetooth. The user is oblivious to this. Our app is running in the background. Everything is encrypted and authenticated (see here and here).
But the walk by scenario more or less dies with Marshmallow. The reason is that, as explained in gory detail below, most of the solutions for Marshmallow now require the user to manually accept a system dialog. In some cases this has to be done at least once for each and every device the user might talk to. Since this can easily be in the 10s or 100s of devices over say a month or two the user experience is clearly pretty awful.

2 Hack Android?

There are claims online (see here) that it is still possible to get the Bluetooth MAC address using either Settings.Secure or reflection on BluetoothAdapter.getDefaultAdapter(). I don’t know if this actually works and, if it does, whether it works on devices that aren’t rooted. We’ll need to try this out but even if this does work presumably Google will fix these “holes” eventually so they don’t seem like a good long term solution.
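Whichever trick we try, we need a way to tell a real address from the stand-in. The Android 6.0 release notes say that BluetoothAdapter.getAddress() and WifiInfo.getMacAddress() now return the constant 02:00:00:00:00:00. Below is a minimal sketch of such a check in plain Java. The class and method names are made up for illustration and are not part of Thali or Android.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not a Thali or Android API): is this address worth advertising over BLE? */
final class MacCheck {
    // Constant that Android 6.0 is documented to hand back instead of the real address.
    static final String MARSHMALLOW_PLACEHOLDER = "02:00:00:00:00:00";

    // Six colon-separated hex octets, e.g. "00:11:22:AA:BB:CC".
    private static final Pattern MAC =
            Pattern.compile("^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$");

    /** Returns true only for a well-formed address that is not the Marshmallow placeholder. */
    static boolean isUsable(String address) {
        if (address == null || !MAC.matcher(address).matches()) {
            return false;
        }
        return !MARSHMALLOW_PLACEHOLDER.equalsIgnoreCase(address);
    }
}
```

Anything a hack returns would go through MacCheck.isUsable before we trust it. A hack that "works" but only ever yields the placeholder is no better than the official API.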

3 The one time awful UX option

Thali developer Tomi Paananen nicknamed this hack “Help a brother out” or “Bro Mode”. This approach would let us learn our Bluetooth MAC address by asking a different device to tell us what it is.
The experience would be:
  1. A user, for the first time, is using a Thali app and detects over BLE someone they want to exchange data with. Let’s assume that the other person is also using Thali for the first time.
  2. One device will make itself discoverable over Bluetooth. This requires showing a system dialog to the user that they have to approve. The user obviously will have no understanding of why this is necessary. And obviously this doesn’t work at all if the two devices are both in their user’s pockets (e.g. the walk by scenario).
  3. At that point the other device will discover the first device’s Bluetooth address and send it over and now we can move back to our normal behavior.
This should be a one time experience. That is, once we get our Bluetooth address we shouldn’t ever need to ask for it again. So we can just record it and in the future advertise it over BLE as we used to do before Marshmallow.
Since it isn’t clear if this will actually work we have to build a test app and run it on two devices, Device A and B.
  1. On Device A we will hit a button telling the device to:
    1. start a Bluetooth service listener using a well known SDP UUID we call the Bro Mode UUID.
      1. See here for details on how to start a Bluetooth Service. Note however that we should use listenUsingInsecureRfcommWithServiceRecord on the default BluetoothAdapter rather than using listenUsingRfcommWithServiceRecord. That way later on Device B will be able to connect without having to be paired. Also note that this is where we pass in the Bro Mode UUID.
    2. Switch the phone into Bluetooth Discoverable mode. This will trigger the UX mentioned above.
      1. See here for details on how to activate discoverability.
  2. On Device B we will hit a button telling it to “discover” other Bluetooth devices.
    1. See here for details on how to begin scanning for discoverable devices. Note that thanks to the changes in Marshmallow we now need to have either ACCESS_FINE_LOCATION or ACCESS_COARSE_LOCATION permission in order to successfully see the results of the scan. Also, as explained here, the way to ask for these permissions has now changed.
    2. When we get a callback on the BroadcastReceiver for ACTION_FOUND we need to read the EXTRA_DEVICE field to get the BluetoothDevice object, call fetchUuidsWithSdp() on it and, once the ACTION_UUID broadcast delivers the results, see if the Bro Mode UUID is listed. If it is listed then we can call getAddress() to get Device A’s Bluetooth MAC address, call createInsecureRfcommSocketToServiceRecord with the Bro Mode UUID, connect, and send Device A’s Bluetooth MAC address over the connection.
      1. Another approach that might also work is that as soon as we get the BluetoothDevice from EXTRA_DEVICE we could immediately call getAddress(), record the result and then call createInsecureRfcommSocketToServiceRecord with the Bro Mode UUID and try to connect. If that connection works then we know we are connected to the right device and can send it the address. I just don’t know how long the connection attempt would take to fail if we aren’t talking to the right device.
  3. On Device A when the service receives the incoming connection it will read in the MAC address from the Bluetooth connection and display it on the screen. We will need to write the address down and then manually go to Settings->About Phone->Status->Bluetooth address and visually confirm that the address sent by Device B matches the device’s Bluetooth address.
This test will help us understand just how big the changes are in Marshmallow. For example,
Will Device B see Device A’s real address or a fake address?
Does insecure RFCOMM still work now that Google has taken away the ability to see Bluetooth MAC addresses?
A question that this test won’t answer is - does Android ever rotate the Bluetooth MAC address? We know that when performing scans they use a temporary Bluetooth MAC address. But what about for hosting Bluetooth services, do they always use the same address? One suspects that the answer is yes because otherwise a bunch of existing Bluetooth devices would break. But we shall see.
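The test plan above doesn’t say how Device B hands the address to Device A once the RFCOMM socket is open. One possibility, and it is only an assumption, is to send the six raw octets and nothing else. Below is a minimal sketch in plain Java that works against any InputStream/OutputStream pair, so it can be tried without Bluetooth hardware. The class name BroModeWire and the six byte format are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical wire format for the Bro Mode test app: exactly six raw address octets. */
final class BroModeWire {
    /** Device B side: write a MAC given as text such as "00:11:22:AA:BB:CC". */
    static void writeAddress(OutputStream out, String mac) throws IOException {
        String[] parts = mac.split(":");
        if (parts.length != 6) {
            throw new IllegalArgumentException("expected six octets: " + mac);
        }
        byte[] raw = new byte[6];
        for (int i = 0; i < 6; i++) {
            if (parts[i].length() != 2) {
                throw new IllegalArgumentException("bad octet: " + parts[i]);
            }
            raw[i] = (byte) Integer.parseInt(parts[i], 16); // NumberFormatException on non-hex
        }
        out.write(raw);
        out.flush();
    }

    /** Device A side: read six octets and format them the way the Settings screen shows them. */
    static String readAddress(InputStream in) throws IOException {
        byte[] raw = new byte[6];
        new DataInputStream(in).readFully(raw); // EOFException if the peer hangs up early
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            if (i > 0) {
                sb.append(':');
            }
            sb.append(String.format("%02X", raw[i] & 0xFF));
        }
        return sb.toString();
    }
}
```

In the test app the streams would come from the connected Bluetooth sockets on each side. The string readAddress returns is what we would compare by eye against the address shown in Settings.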

3.1 If this works how do we perform programmatic testing?

Let’s imagine that we adopt this approach. Now we want to run tests in our CI environment (see a picture here). How do we do it cleanly? If we have a fresh phone it won’t know its Bluetooth address and there is no way to get it that doesn’t require a person sitting there while the test runs in order to hit the Bluetooth discovery permission dialog. At best what we can do is manually get the Bluetooth addresses and provide them programmatically, which means we can never run automated tests on the system we use to get the Bluetooth addresses in the first place! This is just nuts. Maybe we need to connect a mechanical hand with a conductive finger tip to the Raspberry Pis you see in those pictures.

3.2 How do we handle spoof attacks?

A fairly obvious attack in this scenario is we have devices A and B. A is in discoverable mode and B is doing the discovering. What if B lies to A about what address it sees? In that case A will now go around advertising the wrong address!
Note that this is a denial of service attack, not a man in the middle attack. Well mostly anyway.
For simplicity’s sake let’s say that the fake Bluetooth address actually belongs to Device B. In that case Device A will go around telling people to connect to Device B. If Device B happens to be in the area then it will receive connections from folks who are looking for Device A. But Device B won’t be able to properly establish a data channel because we perform a PSK based TLS handshake and Device B won’t know the PSK (which is generated dynamically based on a combination of constant and ephemeral keys, see here and here). But at the least Device B would be able to tell that whoever the third device is (and we don’t leak any identity information about them) they were trying to connect to Device A. That kind of traffic analysis information has its own value. Although to be fair it doesn’t really matter because if Device B is close enough to be connected to then it is also close enough to overhear communications and see for itself that the new device wants to connect to Device A. So no additional harm is done. But in any case, while this is happening Device A is out of luck. Nobody is going to be able to connect to it. Ever.
One workaround is that if we hand out our address and get complaints over BLE that the connection never works then after a while we can assume we have a bad address and try discovery again. I’m sure the user won’t be at all confused why they are suddenly being asked to make their device discoverable over Bluetooth.
And yes, a smart attacker could use this recovery mechanism to force a device with the right MAC to take the wrong MAC. For example the attacker could initiate a connection and then send a BLE error saying that the Bluetooth connection didn’t work. The only good news is that the attacker would have to authenticate themselves with an identity the device being attacked accepts. So the device being attacked would know who the attacker was. In most real world scenarios that’s probably good enough.
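To make the recovery idea a little more concrete, here is a minimal sketch of one possible heuristic. It only counts complaints from distinct authenticated peers, so a single accepted identity complaining over and over can’t force rediscovery by itself. The class name, the threshold and the reset-on-success rule are all my assumptions. None of this is existing Thali code, and an attacker holding several accepted identities could still trigger it.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical heuristic: decide when the address we advertise is probably wrong. */
final class BadAddressDetector {
    private final int threshold;
    private final Set<String> complainingPeers = new HashSet<>();

    /** @param threshold number of distinct authenticated peers that must complain */
    BadAddressDetector(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.threshold = threshold;
    }

    /**
     * Record a BLE complaint from a peer whose identity we have already authenticated.
     * Returns true once enough distinct peers have complained that rediscovery looks warranted.
     */
    boolean recordFailure(String authenticatedPeerId) {
        complainingPeers.add(authenticatedPeerId); // repeat complaints from one peer count once
        return complainingPeers.size() >= threshold;
    }

    /** Any successful inbound Bluetooth connection shows the advertised address is right. */
    void recordSuccess() {
        complainingPeers.clear();
    }
}
```

Because the set holds authenticated identities, it also doubles as the list of suspects if the address later turns out to have been fine all along.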

4 The twice per user awful UX option

The previous approach is clearly not going along with what Google is trying to do, prevent apps from knowing the device’s MAC address, and so there are various reasons to believe it might not work or might stop working in the future. There is another option that should work just fine but at the cost of a truly awful user experience.
In this approach we use Bluetooth pairing. In the previous approach a device only needs to go into discoverable mode exactly once, ever, for all time, just to get its address from some authenticated peer. In this approach, anytime two devices discover each other for the first time one of the devices will have to go into discoverable mode and the other device will then have to discover and pair with it. This requires two dialogs. The first dialog is to enter discoverable mode and the second dialog is to accept a pairing. Details on how to pair are given here. In the future when a device finds via BLE that it wants to connect to another device it can call getBondedDevices (see here) to get a BluetoothDevice object for the remote device and begin a connection. Presumably Google will have to make sure this works regardless of what games they may play with Bluetooth MAC addresses.
Right away all of our walk by scenarios just died an awful death. If two folks happen to pass each other with data for each other but haven’t previously paired then even though they can discover each other over BLE and determine that they have data for each other they can’t move the data unless someone pulls their phone out of their pocket and completes the pairing.
The only good news is that once two devices are paired there never needs to be another pairing between those devices and the pairing only needs to go in a single direction. But imagine a factory environment with 10s or 100s of devices. How long before users stop using the app because they are sick to death of the double confirmation dialogs?

5 The once per user awful UX option

A variant on the previous section is to use Wi-Fi Direct instead of Bluetooth to communicate between Android devices. There are actually a few benefits to this approach. First, it requires fewer dialogs. Second, it should provide speeds that are at least 10x faster than Bluetooth. But there are problems.
To understand the problems let’s imagine that there is Device A and B. Device A discovers Device B over BLE and now they want to switch to Wi-Fi Direct. One way to do this would be to use the connect method but that would require the device calling connect to know the MAC address of the other device and that is the information that Google is hiding from us.
A workaround for this would be to use Wi-Fi Direct Service discovery. This would actually be really sweet. What we would do is advertise some random UUID over BLE. Then the discovering device would issue a Wi-Fi Direct Service discovery request (which, btw, doesn’t require any system UX) looking for the same UUID. When it finds it then it knows it has found over Wi-Fi Direct the same service that was advertising over BLE. Now the device could issue a Wi-Fi Direct connect request. This part is bad because one of the two devices MUST display a system UX asking the user if they are o.k. with establishing the connection. That is obviously ridiculously stupid, it has literally no security value, but it’s required by the Wi-Fi Direct standard and so apparently Google does it. So forget background scenarios. The only good news is that once the user says yes they shouldn’t have to say yes for that device again.
But wait, there’s a problem. We have spent months and months testing Wi-Fi Direct Service discovery and what we have discovered is - it doesn’t work reliably. If you have two phones of the same make and model then yes it can work reasonably well (we had amazingly good experiences with Samsung Galaxy S5s, for example). But if you try it with two different phones then discovery can take anywhere from minutes to never. We also found that Wi-Fi Direct Service Discovery seems to break the main Wi-Fi stack (e.g. Infrastructure Mode) so normal Wi-Fi stops working while we use Wi-Fi Direct Service Discovery. We also have found that sometimes Wi-Fi Direct Service Discovery will stop working completely until the device is rebooted.
So the bottom line is - Wi-Fi Direct Service Discovery doesn’t seem to work in the real world.
Now what does work a treat is Wi-Fi Direct Peer Discovery. Unfortunately only three values are advertised over Wi-Fi Direct Peer Discovery, a hard coded device type, a hard coded device name (that isn’t even guaranteed to be unique) and the MAC address. So there is no way for one device to signal to the other over BLE which Wi-Fi Direct peer it is. The best the discovering device can do is try to blindly connect to all the Wi-Fi Direct peers it finds and hope it gets lucky. That actually could work in most normal circumstances but one shudders to imagine what it would be like in say a conference room filled with Thali users.
But wait, there’s another hack! What we can do is have say Device A create a Wi-Fi Direct group (see here and here). As a side effect of how Wi-Fi Direct works this will create a Wi-Fi Direct Infrastructure group with a randomly generated SSID. Device A can find this SSID by calling requestGroupInfo and then calling getNetworkName on the WifiP2pGroup object passed to the callback.
Device A could then advertise that SSID over BLE. At that point Device B could do a Wi-Fi Infrastructure scan and call getScanResults on the WifiManager. Each ScanResult contains an SSID (which we match against the value the other device gave us over BLE) and a BSSID. The BSSID should (famous last words, I know) be the other device’s Wi-Fi MAC address. So Device B just has to grub through the results until it finds an entry with Device A’s SSID and now it knows Device A’s MAC address as well. Note that none of this will work properly if Device B doesn’t have ACCESS_FINE_LOCATION or ACCESS_COARSE_LOCATION permissions. The reason is that without these permissions Android won’t return the right BSSID value in the scan results.
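The “grub through the results” step can be sketched in plain Java. ScanEntry here is a hypothetical stand-in for Android’s ScanResult, carrying just the two fields we care about. Refusing to answer when two different BSSIDs share the SSID is my own assumption about how to play it safe, not something Android mandates.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for the two fields of a Wi-Fi scan result that matter here. */
final class ScanEntry {
    final String ssid;
    final String bssid;

    ScanEntry(String ssid, String bssid) {
        this.ssid = ssid;
        this.bssid = bssid;
    }
}

/** Hypothetical helper: map the SSID we heard over BLE to the group owner's BSSID. */
final class GroupOwnerLookup {
    /**
     * Returns the BSSID advertised with the given SSID, or empty if the SSID is absent
     * or if two different BSSIDs claim it (in which case we can't tell who is who).
     */
    static Optional<String> bssidFor(String advertisedSsid, List<ScanEntry> scan) {
        String found = null;
        for (ScanEntry entry : scan) {
            if (!advertisedSsid.equals(entry.ssid)) {
                continue;
            }
            if (found != null && !found.equalsIgnoreCase(entry.bssid)) {
                return Optional.empty(); // ambiguous: same SSID, different radios
            }
            found = entry.bssid;
        }
        return Optional.ofNullable(found);
    }
}
```

Since the SSID of a Wi-Fi Direct group is randomly generated, a collision should be rare. If one does show up, it is a decent hint that somebody is impersonating Device A’s group.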
From here Device B can now call connect and pass in a config with the deviceAddress set to the BSSID/MAC. Now Device B has to listen for the WIFI_P2P_CONNECTION_CHANGED_ACTION intent to know when the connection happens. Once the connection is confirmed then data can be exchanged.
All of which sounds lovely but remember, as mentioned above, that if this is the first time that Device A is connecting to Device B then a UX will be displayed by Android asking the user if they are o.k. establishing the Wi-Fi Direct connection. So if Device B is in the user’s pocket we can forget about the walk by scenario. And of course in a factory with 10s or 100s of devices how many times is the user going to have to hit “OK” before they can’t take it anymore?
Another not terribly obvious problem is that in Wi-Fi Direct on Android a device can either be a group owner (in the example above Device A is the group owner) or it can be a member of a group but it can’t be both at the same time. The reason this matters is that if Device C shows up and wants to talk to Device B it won’t be able to connect directly to Device B. The reason is that Device B is currently a member of a group owned by Device A. So Device C will either have to wait for Device B to cut its connection to Device A or Device C will have to join Device A’s group and hope that Device A will be willing to relay data to Device B. From a battery perspective I can’t imagine that the owner of Device A is going to be thrilled by any of this. And of course should Device A happen to walk away then the whole connection infrastructure collapses and we have to have logic smart enough to pick up the pieces. It’s all doable but what a serious pain!
Also note that all of this assumes that Wi-Fi Direct group connections don’t cause the same malfunctions in the Wi-Fi Infrastructure stack that Wi-Fi Direct Service Discovery does. Given our experience with Wi-Fi Direct to date I can’t say I’m holding my breath that it will work terribly well in practice.

6 The no UX but big security hole (maybe?) option

The last approach we have come up with has the advantage that it requires no system UX at all. It also uses Wi-Fi and thus is at least 10x faster than Bluetooth. The only disadvantage is that it opens up whole new vistas of security holes (maybe?) and may disable normal Wi-Fi on one (and depending on bugs, possibly both) of the devices.
Things start off the same as in the previous section. Device A creates a Wi-Fi Direct group and advertises the SSID over BLE. Device A will also have to send the password (it’s mandatory) for its Wi-Fi Direct group over BLE. Device A can get the password via getPassphrase and it’s straightforward enough to encrypt it using the keying material from Thali’s notification infrastructure.
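As an illustration of the encryption step, here is a minimal sketch using AES-GCM from the standard Java crypto APIs. It assumes, and this is purely an assumption, that the notification keying material can supply a 16 byte AES key shared by Device A and Device B. The class name PassphraseBox and the nonce-then-ciphertext layout are hypothetical and are not Thali’s actual format.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: protect the Wi-Fi Direct group passphrase before it goes out over BLE. */
final class PassphraseBox {
    private static final int NONCE_LEN = 12; // bytes, the usual GCM nonce size
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RNG = new SecureRandom();

    /** Returns nonce || ciphertext || tag, encrypted under the shared AES key. */
    static byte[] seal(byte[] key, String passphrase) throws GeneralSecurityException {
        byte[] nonce = new byte[NONCE_LEN];
        RNG.nextBytes(nonce); // fresh random nonce for every message
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, nonce));
        byte[] ciphertext = cipher.doFinal(passphrase.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[NONCE_LEN + ciphertext.length];
        System.arraycopy(nonce, 0, out, 0, NONCE_LEN);
        System.arraycopy(ciphertext, 0, out, NONCE_LEN, ciphertext.length);
        return out;
    }

    /** Reverses seal(); throws if the key is wrong or the bytes were tampered with. */
    static String open(byte[] key, byte[] sealed) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, sealed, 0, NONCE_LEN));
        byte[] plain = cipher.doFinal(sealed, NONCE_LEN, sealed.length - NONCE_LEN);
        return new String(plain, StandardCharsets.UTF_8);
    }
}
```

The sealed blob is 28 bytes longer than the passphrase (12 for the nonce, 16 for the tag), which matters given how little room BLE gives us. The authentication tag also means Device B will reject a passphrase that someone altered in flight rather than trying to join the wrong network.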
Device B then creates a WifiConfiguration and sets the SSID and password using the values sent by Device A over BLE. This configuration is then submitted via addNetwork. It’s at this point that things get slightly odd. In the old days we would have to call disconnect and then enableNetwork followed by reconnect to force Device B to connect to Device A as its Wi-Fi Infrastructure Access Point.
This disconnect/enableNetwork/reconnect approach isn’t great because it now makes Device A into Device B’s default access point and all network traffic from Device B is now going to head over to Device A. This has the same security implications as connecting to some random unrecognized Wi-Fi Infrastructure Access Point. It also means that if there is an application on Device B that was using the Internet in parallel with the Thali application (say someone streaming music) then it’s going to lose its Internet connection. So we have to be careful to only use this approach when no one else is actively using the Internet connection and even then we still have security issues.
Starting with Lollipop however a new Android feature was introduced, multi-networking. With multi-networking it is possible to call addNetwork and then use multi-networking as described here to get a Network object representing Device A’s Wi-Fi Direct Group. Then it becomes possible to call bindSocket on that Network object to bind a socket to that Wi-Fi Direct Group.
But I just don’t know how multi-networking works in practice when there are multiple Wi-Fi Access Points around. Reading online it seems like one of the (many) motivations for this feature would be to enable a phone to connect to a Wi-Fi Printer with its own AP without forcing all the other apps to connect to that printer as well. This would argue that multi-networking phones should support talking to multiple Wi-Fi Access Points in parallel. But typically network cards don’t support that in Wi-Fi Infrastructure mode. So I just don’t know. It might be that multi-networking means that the security threats of this approach are reduced, or not. We’ll have to experiment to find out.
Note, btw, that all of this assumes that Device A won’t have any nasty problems while it is hosting the Wi-Fi Direct Group. Our experience to date with Wi-Fi Direct Service Discovery doesn’t give us a lot of hope but it’s possible. In the worst case however we would have to figure out how to turn Device A into a hotspot, see here for details and check here for issues with Lollipop. But turning Device A into a hotspot means that Device A will almost certainly lose its Internet connection since typically a network card in AP mode can’t simultaneously connect to other Wi-Fi Access Points.

6.1 Can’t an attacker force a Wi-Fi Connection in the future?

In theory Android remembers Wi-Fi Access Points it has successfully connected to in the past. So an attacker who wants to hijack a user’s connection in the future could just turn on a Wi-Fi group with the same SSID and password as it previously used and trick the other device into connecting to it. But if we delete the network configuration after using it then Android shouldn’t remember the network and this attack shouldn’t work. This assumes, btw, that Android would automatically connect to a network that was connected to programmatically and not manually by a user.

7 Conclusion

This is an awesome amount of Yak shaving just to enable a P2P connection. Clearly something is pretty fundamentally wrong here. I wish we could just work with the Android folks to come up with a P2P approach that supports dynamic P2P networks, supports the walk by scenario, is secure and is user friendly. I don’t even think this would be terribly hard to do. Anyone know anyone at Google we can talk to about this?
