How do we exchange identities in Thali without making our users hate us?

In Thali identities are public keys. But typing in a 4 Kb RSA key or even a 512 bit EC key isn’t exactly easy. So how do users securely exchange their keys? Our original approach was using QRCodes. But lining up the phones, scanning the values, etc. is all a serious pain. So if ultimate security isn’t a requirement our backup plan is to use a variant of Bluetooth’s secure simple pairing with numeric comparison which itself is just an implementation of a coin-flip or commitment protocol. The main downside of this approach is that it provides a 1:1,000,000 chance of an attack succeeding.
[Updated on 3/9/2015 with a new appendix, the last two questions at the end are new.]

1 Scenario

Alice and Bob both have generated public keys on their cell phones. They now want to securely exchange those public keys in a manner that will prevent an unknown key-share attack (UKS). That is, so that Mallory can’t show up and man-in-the-middle Alice and Bob such that Alice thinks Bob’s key is PKM1 and Bob thinks Alice’s key is PKM2 where PKM1 and PKM2 are public key pairs controlled by Mallory.
A constraint in this specific scenario is that there is no cellular or Wi-Fi infrastructure. The key exchange has to occur using local radios (e.g. BLE, Bluetooth, Wi-Fi Direct, etc.). Note, nothing prevents exchanging keys over the Internet, it’s just that for this specific scenario we need something that will work when there is no Internet connectivity.
In our threat model we make the following assumptions:
  1. Mallory knows Alice and Bob’s public keys ahead of time. They are, after all, public.
  2. Mallory knows exactly when and where Alice and Bob will exchange their public keys and has the correct equipment to modify any signals in any way she wants. Conceptually one can imagine a wire running from Alice’s phone to Mallory’s ‘hack box’ to Bob’s phone such that Mallory can alter and delay any transmission at any time she wants.
  3. We assume Mallory can block all radio transmissions at any time for as long as she wants. Therefore we explicitly do not attempt to prevent denial of service attacks, only UKS attacks.
  4. We assume Mallory cannot change the software on Alice or Bob’s phones.
Our goal then is for Alice and Bob to exchange their keys without Mallory being able to successfully launch her UKS attack.

2 A constraint on our solution

Normally the way we deal with this scenario isn’t with a radio like BLE, Bluetooth or Wi-Fi. We use QRCodes. We would have Alice display her key on her phone and Bob would take a picture and then vice versa. Mallory does not have a good way to alter visual signals so this mechanism is generally considered secure. However it’s also a serious pain in the tuchus. Try it sometime (we have). It’s really annoying. And for some scenarios the annoyance level is high enough and the security required low enough that it’s not workable.
So we need a way to prevent the UKS attack from succeeding that doesn’t involve people taking pictures. Our working assumption however (possibly wrong, which is why we are bringing it up) is that a UKS can only be prevented if there is some secondary channel that cannot be easily faked by which the key exchange can be confirmed to have worked. Anything involving a radio is automatically assumed to be compromised by Mallory and pictures are already out for the reasons described. So what this leaves us with is humans.
Our assumption then is that to prevent the UKS attack, while still having an acceptable UX, we need the key exchange process to end by having the humans involved look at their phone screens and confirm… something. Ideally the something would be the same number showing on both of their screens. Furthermore to meet our UX requirements that number cannot be much more than 6 numeric digits. In other words the end of the exchange is “Hey Bob, does your phone show 123456? Yes Alice, it does” then both Alice and Bob hit confirm on their phones.

3 Modified Bluetooth Secure Simple Pairing with Numeric Comparison

The approach we want to use is a coin-flip or commitment protocol. It’s as old as the hills but the nice folks in Bluetooth land did a good job of documenting it so we are going to work off their work. Specifically we are using the Secure Simple Pairing (SSP) algorithm with numeric comparison taken from pg 616, section 2.3.5.6.2, figure 2.3 of Bluetooth Core Spec 4.2 (defined here).
There are however a few minor modifications we will need.
In the Bluetooth protocol the two sides exchange public keys in step 1 of their algorithm in order to perform a Diffie-Hellman key exchange. That is not needed in our case. Rather we will just exchange public keys but not generate the Diffie-Hellman key.
In addition the Bluetooth protocol, due to characteristics of the Bluetooth transport, has an identified ‘initiating device’ and ‘non-initiating device’. In our case we can’t be sure (and don’t want to care) who initiated because both users could have switched their phones into pair mode at the same time. To deal with this we will define a canonical ordering on public keys which will allow both user’s devices to determine who the ‘leader’ (a.k.a initiating device) is without any further communication. The selected leader will be responsible for calculating the compute confirmation value.
We intend to support RSA in preference to EC. Our reasoning is that there are still open issues regarding the security of the curves selected by EC and there has been substantially more work done on RSA attacks than there has been on EC. So we tend to think RSA (at key sizes of at least 2K) is more trustworthy than EC.
The Bluetooth algorithm however is defined only for EC.
For purposes of picking a leader when RSA is used we will compare the modulus’s of each public key as if they were integers and the smaller integer will be picked as the leader. For purposes of including RSA public keys into the hash functions we will encode both the modulus and exponents as ASCII strings encoding their integer values separated by a ‘.’ As given in the HTTPKEY URL Scheme.
We will define EC logic later when and if we adopt EC as an option.

A Um... how does the commitment protocol actually work?

For those that don’t feel like downloading the enormous PDF and looking up the table in question I have summarized the algorithm below.
  1. Use a cryptographically secure function to calculate a 128 bit random nonce RNmine
  2. Send public key PKmine and simultaneously wait to receive public key PKother
  3. If the modulus treated as an integer of PKmine> PKother goto step 11
  4. Calculate Cb =AES − CMACRNmine(PKmine||PKother) . That is, generate an AES-CMAC where the key is RNmine and the message is a UTF-8 string encoding PKmineconcatenated with PKother. Both public keys are presented as strings as given in the previous section.
  5. Send Cb to the other party.
  6. Wait to receive RNother
  7. Send RNmine
  8. Generate HashA = AES − CMACRNother(PKother||PKmine||RNmine)mod 232. RNmine is represented as an ASCII encoding of the random value represented as an integer. Now convert HashA into a decimal integer we’ll call IntHashA. Vb = IntHashA mod106. Thus Vb is a 6 digit decimal number.
  9. Display Vb and make sure it matches the Va value displayed on the other user’s screen.
  10. Goto 17.
  11. Wait to receive Cb from the other party
  12. Send RNmine
  13. Wait to receive RNother
  14. If Cb != AES − CMACRNother(PKother||PKmine) then fail!
  15. Generate HashB = AES − CMACRNmine(PKmine||PKother||RNother)mod 232. Now convert HashB into a decimal integer we’ll call IntHashB. Va = IntHashB mod 106. Thus Va is a 6 digit decimal number.
  16. Display Va and make sure it matches the Vb value displayed on the other user’s screen.
  17. End.

B Where does the 1,000,000 attack come from?

The core of a commitment protocol is that the attacker has to commit to their keys before learning anything useful. So in practice this means that Mallory has to pick PKM1 and PKM2 to give to Alice and Bob ahead of time. So the attack boils down to - given PKM1 will PKM2 when hashed with Bob’s information produce the same 6 digit hash code as PKM1 produced with Alice’s information?
Another way to think of this is that Alice uses PKM1 and gets some 6 digit hash code N. So now the question is - what is the probability that PKM2 when used by Bob will produce the same N that Alice got? If we assume that hash code results are evenly distributed given any arbitrary input this argues that the probability that PKM2 will result in hash code N is itself evenly distributed. In other words, this is the same as saying we have a 1,000,000 sided die (with a 6 digit hash code we have 1,000,000 possible values), what’s the chance that we will get N on that die?
That is, of course, 1 in 1,000,000.
So, yes, this means that if we reduced the hash code size to 5 digits then the chance of a successful attacks is 1 in 100,000 and if we choose 7 digits it would be 1 in 10,000,000.

C Does this mean I have to advertise my identity every time I exchange keys?

Let’s imagine that Alice and Bob are sitting together at the coffee shop and want to securely exchange keys. They both take out their phones and tell them to exchange identities. This will bring up a screen showing everyone within radio range who is trying to exchange identities. Bob sees Alice’s name listed and Alice sees Bob’s name listed. They choose each other. Their phones both display six digits, Bob confirms that Alice’s phone is showing the same digits as his and vice versa. Exchange is now complete.
An attacker could, of course, pretend to be Bob or Alice by broadcasting that they are Bob or Alice. In that case they would appear in the list of possible names. How would Bob know which Alice to pick? The good news is that even if Bob picks the wrong Alice the exchange only has a 1 in 1,000,000 chance of working. Most likely it will fail and the selected Alice will be marked as bad and Bob will have to choose from the remaining Alice’s.
So the exchange really is secure.
But there is a data leakage here. Bob and Alice both have to advertise some kind of identifier in order to kick off the exchange process. In other words at some point Alice told her phone “When I tell you to exchange an identity tell the other person my name is Alice”.
This means that once an exchange begins both Bob and Alice will be publicly, in the clear, advertising who they are. At least to the extent of advertising the string “Bob” or the string “Alice”.
For many scenarios this is o.k. The chance that an attacker just so happens to be in physical proximity when the exchange happens, happens to be listening and then can do something useful with the information is, for most people, reasonably small. Also if an attacker is that well provisioned then presumably they can also do fun things like track the MAC or cell ID of the phones. That is a much more devastating attack than happening to catch two strings.
But still, what if even this leakage is too much? There is a way around it but I suspect it’s not worth doing for most people. The work around is that Bob’s phone generates a random number and displays it on his screen. This is also the number that the phone will advertise itself as. Alice’s phone does the same. Bob and Alice both have to look at the other person’s phone and then pick from their list of potential pairing endpoints the endpoint with the same number.
All this does is just prevent them from having to display their real names.
But honestly if we are going to put people through that much trouble we should really just use QRCodes.

D Do we have to use numbers? Can’t we use words or something else?

A few people have tried to suggest this idea but I think “Philip” in the comments section did the best job of describing it. The idea at its core is a good one. Rather than having people compare numbers why not have them compare words? As Philip suggested we could have a 100 word dictionary and then break the 6 digits into 2 digit pairs and pick the word indexed to that pair. So, to use Philip’s example, 123456 would become 12 34 56 which would find the 12th, 34th and 56th word in the dictionary. Again, using Philip’s example, the 12th word could be “red”, the 34th “key” and the 56th “coin”. So instead of the screen showing ’123456’ the screen would show “Red Key Coin”. This is easier to read and to compare.
Philip also suggests a good optimization which is to use 3 different dictionaries for each of the 3 number pairs. That way there will be no repeat words which helps to prevent mistakes.
As I explained in my response to Philip’s comment, the problem with this approach is that it presumes that the people exchanging identities both read (or at least recognize) English. What happens if they both speak Japanese? Yes, we can try to complicate the protocol by negotiating dictionaries but what do we do if the two people involved speak different languages?
We also have to deal with dictionary divergence. Unless we have dictionaries for all languages and we freeze them forever it’s quite possible that two people using two different versions of the software can end up with two different dictionaries.
This is why we have so far stuck with numbers. Most of the world’s languages use Indo-Arabic numerals and even those that don’t quite likely recognize them. So it’s pretty much the most universal, robust solution we can come up with.
Note however that this does point out that we should try to make sure we use the same font/language for the numbers on all platforms and languages. That way object recognition can kick in to see if the patterns are the same.

7 thoughts on “How do we exchange identities in Thali without making our users hate us?”

  1. Not really directly relevant, but can I make one UX suggestion to you? If you have a six digit number, use three words instead.

    Have a 100 word dictionary, and break the six digits into three groups: 12, 34, 56. Then use words 12, 34 and 56. Instead of “hey bob, does your screen show ‘435612’?” The question could be “hey bob, does your screen say ‘red key coin’?”

    A 100 word dictionary is quite small, and much easier to validate (digit transposition would not be an issue). Better yet, use 3 100 word dictionaries if you have the space (like adjective verb noun – “red jumping squirrel”.

    With this, it becomes easier to validate reliably, and feels more human. You can even add a digit or two for better odds – “red jumping squirrel 32”

    1. This is actually a great idea if we have two English speakers. But what happens if we have two Russian speakers? Or better yet, what if we have a Russian and an English speaker?

      With numbers the two parties can show each other their screens, even without speaking a common language and since Indo-Arabic numerals are nearly universal they will each not just recognize the symbols but understand their meaning, thus reducing the probability of false positive matches. Although this does point out that we should make sure that the numbers are always displayed with identical fonts regardless of the underlying system’s language. Otherwise “object recognition” by the humans involved won’t work.

      There is also a problem of making sure that the two dictionaries are absolutely identical. As the system goes through different versions, as people re-implement the protocols (I hope) on different platforms, any skew in the dictionary contents will break the system. This isn’t an issue with numbers.

      So I kind of think we are stuck with numbers.

    1. So let’s be clear, my blog sucks. For example, there is the weird error one gets when submitting comments even though the comments actually do go through. There is the fact that when you click on links they will stay blue even if you have used them before. There is the lack of a search box on the front page. But by far the worst thing (other than possibly the content) on my blog is that scroll bar.

      The reason the scroll bar is there is because some of my articles include tables and those tables can be quite wide. The theme I’m using with WordPress has a fixed limit on the size of a column so those tables just get cut off. My awful temporary hack for this problem was to allow the column size to be dynamic but its now dynamic within a fixed size “framework” so the only way the browser can express the dynamic aspect is by sticking in the scroll bar. Which then breaks things like arrow navigation.

      I just don’t have time right now to hack all the CSS for some theme (and hence become the eternal owner of that theme’s maintenance, something else I don’t have time for) to make it properly handle dynamic sized columns up to maximum browser width. I’ve searched high and low to find a theme that supports a right column of unlimited width based on article content but I can’t find one! If you have a way that I can include tables like the one in http://www.goland.org/pickingabacklogforthali/ without needing the scroll bar I’d love to hear it!

      Help!!!

  2. Note that we plan on using HMAC-SHA256 rather than AES-CMAC because it’s easier to get a hold of HMAC-SHA256 in Node.js and for our purposes the two mechanisms are identical.

Leave a Reply

Your email address will not be published. Required fields are marked *