August 19, 2017

Why Making Johnny's Key Management Transparent is So Challenging

In light of the ongoing debate about the importance of using end-to-end encryption to protect our data and communications, several tech companies have announced plans to increase the encryption in their services. However, this isn’t a new pledge: since 2014, Google and Yahoo have been working on a browser plugin to facilitate sending encrypted emails using their services. Yet in recent weeks, some have criticized the fact that only alpha releases of these tools exist, and have started asking why they’re still a work in progress.

One of the main challenges to building usable end-to-end encrypted communication tools is key management. Services such as Apple’s iMessage have made encrypted communication available to the masses with an excellent user experience because Apple manages a directory of public keys in a centralized server on behalf of its users. But this also means users have to trust that Apple’s key server won’t insert spurious keys to intercept and manipulate users’ encrypted messages, whether because hackers have compromised it or because nation-state actors have compelled Apple to do so. The alternative, and more secure, approach is for the service provider to delegate key management to the users, so they aren’t vulnerable to a compromised centralized key server. This is how Google’s End-To-End works right now. But decentralized key management means users must “manually” verify each other’s keys to be sure that the keys they see for one another are valid, a process that several studies have shown to be cumbersome and error-prone for the vast majority of users. So users must choose between strong security and great usability.
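To make the “manual” verification step concrete: in many apps it comes down to two users comparing a short fingerprint of a public key out of band. The sketch below is a minimal illustration under our own assumptions (a SHA-256 digest truncated to 128 bits and grouped for readability); it is not the fingerprint format of End-To-End, Signal, or any other specific tool, and the key bytes are placeholders.

```python
import hashlib


def fingerprint(public_key: bytes, group: int = 4) -> str:
    """Return a short, human-comparable fingerprint of a public key.

    Hypothetical format: the first 128 bits of SHA-256 over the raw key
    bytes, rendered as hex and split into groups of `group` characters.
    """
    digest = hashlib.sha256(public_key).hexdigest()[:32]
    return " ".join(digest[i:i + group] for i in range(0, len(digest), group))


def keys_match(fp_on_alices_screen: str, fp_on_bobs_screen: str) -> bool:
    # "Manual" verification: Alice and Bob compare, out of band (in person
    # or over the phone), the fingerprint each device shows for Bob's key.
    return fp_on_alices_screen == fp_on_bobs_screen


bobs_real_key = b"bob-public-key"       # placeholder key bytes
key_alice_received = b"bob-public-key"  # what the key server gave Alice
print(fingerprint(bobs_real_key))
print(keys_match(fingerprint(key_alice_received), fingerprint(bobs_real_key)))  # True
```

The code is trivial; the burden is entirely on the humans, who must actually perform the comparison for every contact and notice a mismatch, which is the step studies have found users get wrong.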

In August 2015, we published our design for CONIKS, a key management system that addresses these usability and security issues. CONIKS makes the key management process transparent and publicly auditable. To evaluate the viability of CONIKS as a key management solution for existing secure communication services, we held design discussions with experts at Google, Yahoo, Apple and Open Whisper Systems, primarily over the course of 11 months (Nov ’14 – Oct ’15). From our conversations, we learned about the open technical challenges of deploying CONIKS in a real-world setting, and gained a better understanding of why implementing a transparent key management system isn’t a straightforward task.

Overview of CONIKS

In CONIKS, communication service providers (e.g., Google, Apple) run centralized key servers so that users don’t have to worry about encryption keys, but, unlike a conventional centralized key server, CONIKS key servers store the public keys in a tamper-evident directory that is publicly auditable yet privacy-preserving. On a regular basis, CONIKS key servers publish directory summaries, which allow users in the system to verify they are seeing consistent information. To achieve this transparent key management, CONIKS uses various cryptographic mechanisms that leave undeniable evidence if any malicious outsider or insider were to tamper with any key in the directory or present different parties with different views of the directory. These consistency checks can be automated and built into the communication apps to minimize user involvement.
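As a rough illustration of what a tamper-evident directory and its published summary can look like, here is a minimal Merkle-tree sketch. It is a simplification, not the CONIKS data structure: CONIKS uses a Merkle prefix tree with privacy-preserving lookup indices and commitments so that the tree doesn’t reveal usernames or keys, all of which this sketch omits. The names `leaf_hash`, `build_levels`, `inclusion_proof`, and `verify` are hypothetical. The idea is that the server publishes only the root, and a client checks that the key it was given for a username is consistent with that root.

```python
import hashlib
from typing import List, Tuple


def _h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def leaf_hash(username: str, public_key: bytes) -> bytes:
    # 0x00 prefix separates leaves from interior nodes; the length prefix
    # keeps the (username, key) boundary unambiguous.
    name = username.encode()
    return _h(b"\x00", len(name).to_bytes(2, "big"), name, public_key)


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01", left, right)


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return every level of the tree, from the leaves up to the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:  # odd-sized level: pair the last node with itself
            cur = cur + [cur[-1]]
        levels.append([node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def inclusion_proof(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling is on the left."""
    proof = []
    for level in levels[:-1]:
        sib = index ^ 1
        sibling = level[sib] if sib < len(level) else level[index]
        proof.append((sibling, index % 2 == 1))
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, proof: List[Tuple[bytes, bool]]) -> bool:
    node = leaf
    for sibling, sibling_is_left in proof:
        node = node_hash(sibling, node) if sibling_is_left else node_hash(node, sibling)
    return node == root


directory = {"alice": b"alice-pk", "bob": b"bob-pk", "carol": b"carol-pk"}
names = sorted(directory)
levels = build_levels([leaf_hash(n, directory[n]) for n in names])
root = levels[-1][0]  # what a published summary would commit to
proof = inclusion_proof(levels, names.index("bob"))
print(verify(root, leaf_hash("bob", b"bob-pk"), proof))   # True
print(verify(root, leaf_hash("bob", b"evil-pk"), proof))  # False
```

Because the root commits to every leaf, a server that hands Alice a spurious key for Bob must either fail this check or publish a root that differs from the one everyone else sees, which is exactly what the consistency checks look for.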

Why deploying transparent key management is difficult

In addition to its strong security and privacy features, CONIKS is designed to make efficient use of computational resources on both clients and servers, so it seems like an attractive choice for anyone looking to deploy a transparent key management system for their encrypted communication service. So, are we done? While our proof-of-concept secure chat application worked well in our experiments, we wanted to know whether major online communication service providers would consider CONIKS a viable key management system. From our discussions with the engineers at Google, Yahoo, Apple and Open Whisper Systems, we identified five main challenges that pose major barriers to a practical deployment of CONIKS.

1. Collaboration and interoperability.

Our original idea was that all CONIKS key servers would collaborate by auditing each other’s key directories, since this would make the consistency checks done by the communication app more efficient. In reality, however, collaboration is hampered by the fact that there are two types of communication services: centralized services (or “walled gardens”) such as iMessage and Signal, and federated protocols such as SMTP (email) or XMPP (instant messaging). By design, these services can’t all interoperate. The main challenge for walled gardens is then to find third-party auditors that both the service provider and its users are willing to trust. Federated services, on the other hand, already interoperate for communication and could audit each other, but this requires additional standardization of the formats for the data used to ensure key transparency. We learned that these engineers would largely support standardizing the transparency protocols, but also that establishing any standard often involves evaluating a number of existing deployments of the system, which can be a long process as multiple interested parties come to an agreement.
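To give a sense of what “standardization of formats” means in practice, the sketch below defines a hypothetical directory summary record with a canonical encoding, so that an auditor run by one provider can parse another provider’s summary and recompute exactly the same hash. The field names and the sorted-key JSON encoding are our assumptions for illustration only; a real standard would have to pin the encoding down much more precisely and would also cover signatures, proof formats, and versioning.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DirectorySummary:
    provider: str   # e.g. "mail.example.org" (hypothetical)
    epoch: int      # index of the publication interval
    root_hash: str  # hex-encoded root of the provider's key directory
    prev_hash: str  # hex hash of the previous summary ("" at epoch 0)

    def canonical_bytes(self) -> bytes:
        # Sorted keys and no insignificant whitespace, so every party that
        # parses a summary re-serializes it to exactly the same bytes.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode()

    def digest(self) -> str:
        return hashlib.sha256(self.canonical_bytes()).hexdigest()

    @classmethod
    def parse(cls, data: bytes) -> "DirectorySummary":
        return cls(**json.loads(data))


# Provider A publishes; an auditor run by provider B parses the bytes and
# checks that it derives the same digest that A committed to.
published = DirectorySummary("mail.example.org", 42, "ab" * 32, "cd" * 32)
wire = published.canonical_bytes()
audited = DirectorySummary.parse(wire)
print(audited == published, audited.digest() == published.digest())  # True True
```

The hard part is not the code but the agreement: every federated provider’s servers and auditors have to produce and accept byte-identical encodings, which is what the standardization process would have to settle.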

2. Maintaining trust in the provider.

Most engineers commended CONIKS’ ability to detect tampering and inconsistent views of a key directory. However, they were concerned that CONIKS makes it difficult to attribute an inconsistency to its actual source. Because key servers digitally sign all published data, there’s no question that the provider published an inconsistent view, but auditors have no way of determining whether the inconsistency was introduced by an outside attacker who compromised the key server, a system error, or a malicious employee. Because of this ambiguity, users may lose trust in the key server and migrate to a different communication provider whose transparent key server hasn’t exhibited any inconsistency, or worse, to a provider that doesn’t support transparency at all. This is an undesirable outcome for any service provider. Is there a way the provider can recover from accidental inconsistencies? A full recovery would require being able to prove the source of an inconsistency, which stands in direct conflict with the current design of CONIKS. Building a high-assurance key server that uses formal verification and secure hardware would minimize server bugs, but would still not solve the problem of distinguishing changes made by a single malicious employee from changes due to outside compromise or coercion.
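The attribution problem is visible in what the evidence itself looks like. In the minimal sketch below, two validly signed summaries for the same epoch with different roots are enough to prove that the key server equivocated, but nothing in that pair records who or what caused it. To stay dependency-free, the sketch uses HMAC from Python’s standard library as a stand-in for a digital signature; this is an assumption of the sketch, and unlike a real public-key signature, an HMAC tag can only be checked by someone holding the secret key. The key and function names are hypothetical.

```python
import hashlib
import hmac
from typing import NamedTuple, Optional, Tuple

# Hypothetical key. HMAC stands in for a public-key signature only to keep
# the sketch dependency-free; unlike a real signature, an HMAC tag can be
# checked only by parties holding this secret.
SERVER_KEY = b"hypothetical-key-server-secret"


class SignedSummary(NamedTuple):
    epoch: int
    root: bytes
    tag: bytes


def _message(epoch: int, root: bytes) -> bytes:
    return epoch.to_bytes(8, "big") + root


def sign(epoch: int, root: bytes, key: bytes = SERVER_KEY) -> SignedSummary:
    return SignedSummary(epoch, root, hmac.new(key, _message(epoch, root), hashlib.sha256).digest())


def is_valid(s: SignedSummary, key: bytes = SERVER_KEY) -> bool:
    expected = hmac.new(key, _message(s.epoch, s.root), hashlib.sha256).digest()
    return hmac.compare_digest(expected, s.tag)


def equivocation_evidence(a: SignedSummary, b: SignedSummary,
                          key: bytes = SERVER_KEY) -> Optional[Tuple[SignedSummary, SignedSummary]]:
    """Return (a, b) if together they prove the server equivocated, else None."""
    if is_valid(a, key) and is_valid(b, key) and a.epoch == b.epoch and a.root != b.root:
        return (a, b)
    return None


view_for_alice = sign(7, b"\x11" * 32)
view_for_bob = sign(7, b"\x22" * 32)
print(equivocation_evidence(view_for_alice, view_for_bob) is not None)  # True
```

The evidence binds the inconsistency to the server’s signing key and nothing else; an outside attacker with the key, a bug, and a rogue employee all produce exactly the same pair.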

3. Key loss and account recovery.

In an early version of CONIKS, once a key was bound to a specific username, it could only be changed with the owner’s authorization, made using the key being replaced. Unlike a password, which can be reset when lost, a lost key cannot be recovered: data encrypted for it becomes unreadable, and since no one else can authorize a key change, the user would lose access to her account. As a result of our early discussions, in which engineers suggested a more user-friendly approach, we designed a default account recovery mechanism for CONIKS: unauthorized key changes. However, this mechanism undermines users’ security because it leaves no cryptographic evidence of who initiated the change, so other app clients have no way of distinguishing account recovery from a compromised account. The engineers viewed developing a more secure account recovery mechanism, in which it’s unambiguously clear who initiated the recovery, as one of the main barriers to deploying CONIKS. While key loss is also a problem in other applications such as Bitcoin wallets, we haven’t explored whether the solutions in those domains could apply to CONIKS.
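The client-side consequence of this trade-off can be sketched as a small decision function: a key change accompanied by a valid signature from the previous key is accepted, while any other change could be either a legitimate recovery or a spurious key inserted by the server, and the client has no data with which to tell them apart. The signature check is passed in as a function because the signature scheme is outside the scope of this sketch; the `toy_verify` stub in the usage example is deliberately insecure and exists only to make the code runnable. All names here are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional

# verify(public_key, message, signature) -> bool. A real client would plug
# in a digital-signature library here; the scheme is an assumption left open.
Verifier = Callable[[bytes, bytes, bytes], bool]


class Verdict(Enum):
    ACCEPT = "authorized: the change is signed with the previous key"
    WARN = "unauthorized: account recovery or compromise, indistinguishable"


def change_message(username: str, new_key: bytes) -> bytes:
    return username.encode() + b"\x00" + new_key


def check_key_change(username: str, old_key: bytes, new_key: bytes,
                     signature: Optional[bytes], verify: Verifier) -> Verdict:
    if signature is not None and verify(old_key, change_message(username, new_key), signature):
        return Verdict.ACCEPT
    # No valid signature from the old key. This could be a legitimate recovery
    # after key loss, or a key server binding a spurious key to the username;
    # nothing in the data lets the client tell which.
    return Verdict.WARN


# Insecure stub verifier, for demonstration only.
def toy_verify(pk: bytes, msg: bytes, sig: bytes) -> bool:
    return sig == b"signed:" + pk + msg


sig = b"signed:" + b"old-pk" + change_message("alice", b"new-pk")
print(check_key_change("alice", b"old-pk", b"new-pk", sig, toy_verify))   # Verdict.ACCEPT
print(check_key_change("alice", b"old-pk", b"new-pk", None, toy_verify))  # Verdict.WARN
```

A more secure recovery mechanism would need to add some third input to this function, something that makes it unambiguous who initiated the change, so that the `WARN` branch could be split into “recovery” and “possible attack”.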

4. Proactive attack prevention.

The fact that CONIKS’ key servers publish directory summaries at set time intervals allows app clients to establish a clear linear history of the directory. While these summaries provide strong cryptographic proof when a provider has published inconsistent views of its directory, publishing this proof at these intervals also means attacks can only be detected after they have already happened. From our discussions, we learned these strong detection guarantees may still be insufficient for some secure communication providers, as they leave an open (yet brief) window for attack. Finding a practical solution that is proactive and can mitigate the risks of key server compromise, instead of detecting attacks after-the-fact, remains an open problem.
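One way to obtain a linear history, and the one we assume in this sketch, is for each summary to include the hash of its predecessor, so that each new summary commits to the entire history before it. A client that remembers the last summary it saw checks every new one against it; that check only runs once per interval, which is why an attack in the meantime is detected after the fact. The `Summary` fields and hash layout are illustrative assumptions, not the CONIKS wire format, and real summaries would also be signed.

```python
import hashlib
from dataclasses import dataclass
from typing import List

GENESIS_PREV = bytes(32)  # placeholder "previous hash" for the first summary


@dataclass(frozen=True)
class Summary:
    epoch: int
    root: bytes  # directory root committed to in this interval
    prev: bytes  # hash of the previous summary

    def digest(self) -> bytes:
        return hashlib.sha256(self.epoch.to_bytes(8, "big") + self.root + self.prev).digest()


def next_summary(chain: List[Summary], root: bytes) -> Summary:
    if not chain:
        return Summary(0, root, GENESIS_PREV)
    return Summary(chain[-1].epoch + 1, root, chain[-1].digest())


def extends(last_seen: Summary, new: Summary) -> bool:
    """What a client checks each interval against the summary it saw last."""
    return new.epoch == last_seen.epoch + 1 and new.prev == last_seen.digest()


def is_linear(chain: List[Summary]) -> bool:
    if not chain or chain[0].epoch != 0 or chain[0].prev != GENESIS_PREV:
        return False
    return all(extends(a, b) for a, b in zip(chain, chain[1:]))


chain: List[Summary] = []
for r in (b"\x01" * 32, b"\x02" * 32, b"\x03" * 32):
    chain.append(next_summary(chain, r))
print(is_linear(chain))  # True

# Rewriting history at epoch 1 breaks the link to epoch 2.
forked = [chain[0], Summary(1, b"\xee" * 32, chain[0].digest()), chain[2]]
print(is_linear(forked))  # False
```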

The biggest challenge: Effective user warnings

Even with the above challenges solved, engineers still face one significant barrier: false positives. The app client warns the user about a possible attack whenever it detects an anomaly, but many anomalies are benign. For example, the client will detect an unexpected key when the user adds a new device or re-installs the app to recover a lost account, just as it will when the key server maliciously changes the user’s key without proper authorization. Similarly, the client will warn about inconsistent directory summaries, but these may stem either from time synchronization problems between the server and the client or from an attempt by a malicious key server to publish different views of its directory.

Users are notorious for ignoring security warnings, so malicious CONIKS key servers may get away with attacks. In cases such as a fake account recovery, the attack leaves no hard evidence, so it’s crucial for the app client to be able to distinguish innocuous causes of warnings from attacks. But even then, can we design the user interface to convey clearly when it’s important for users’ security to act on a warning? As with other security-critical applications, a significant challenge to deploying transparent key management is to design security warnings that are effective even for users without knowledge of the underlying encryption and protocols.
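As a hedged sketch of the kind of triage a client might do before alarming the user, the function below compares two summaries obtained from different sources. Differing epochs are treated as possibly benign (for example, clock skew), but only for a bounded number of retries so that a server can’t hide behind that explanation indefinitely; the same epoch with different roots is treated as a real inconsistency. The policy, names, and retry bound are our assumptions. This does not help with the harder case above: an unexpected key from a new device and one from a malicious key change look identical to the client.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ObservedSummary:
    epoch: int
    root: bytes
    via: str  # where the client got it, e.g. "key server" or "auditor" (hypothetical)


class Diagnosis(Enum):
    CONSISTENT = "consistent"
    RETRY = "different epochs: possibly clock skew, re-fetch before warning"
    WARN = "same epoch, different roots: inconsistent views, warn the user"


def diagnose(a: ObservedSummary, b: ObservedSummary, retries_left: int) -> Diagnosis:
    if a.epoch != b.epoch:
        # Views from different intervals aren't directly comparable. Clock
        # skew is a benign explanation, but a server could also stall a client
        # this way, so retrying is allowed only a bounded number of times.
        return Diagnosis.RETRY if retries_left > 0 else Diagnosis.WARN
    if a.root != b.root:
        return Diagnosis.WARN
    return Diagnosis.CONSISTENT


from_server = ObservedSummary(12, b"\xaa" * 32, "key server")
from_auditor = ObservedSummary(13, b"\xbb" * 32, "auditor")
print(diagnose(from_server, from_auditor, retries_left=2))  # Diagnosis.RETRY
```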

What lies ahead

We are very thankful to the engineers for having taken the time to review our design for CONIKS and to exchange ideas about possible improvements. Our discussions were instrumental in revealing the design challenges we hadn’t considered in our research on CONIKS, and we hope shedding light on these open problems will inform future research on usable security and practical key management. But more importantly, speaking with engineers working on Google End-to-End, Yahoo End-to-End, iMessage, and Signal provided us with a first-hand perspective of the technical challenges of deploying transparent key management and usable encryption tools. Not only is this a lens through which we as academic researchers had a rare opportunity to view a system we developed, it also gave us a better appreciation for the engineering effort and time that finding a practical solution to all of these open deployment challenges requires.

We have also been in contact with a few smaller secure communication service providers, most notably Tor Messenger, with whom we’ve discussed their plans to deploy CONIKS as part of their systems. Although these small providers largely face the same challenges, we believe their smaller (and often more niche) user base lowers the barrier to adoption of a system like CONIKS. Unfortunately, we have no concrete information on when CONIKS will reach practical deployment. However, with the ongoing debate about encryption-by-default and backdoors, and the fact that transparent key management can provide hard evidence of coercion by nation-state actors, we suspect the pressure to deploy a system like CONIKS has never been greater.

CONIKS has provided the first steps towards transparency and is changing how communication service providers are thinking about key management. But overcoming the remaining barriers isn’t a minor endeavor. While some of these tech companies have had a slow start, we hope the renewed public interest and attention on end-to-end encryption will shift the focus to their usable encryption tools. We also hope this debate can provide further proof of the importance of key transparency to the companies that have been releasing usable encryption tools for several years. The engineers who have been working on all of these tools are incredibly dedicated and passionate about solving these open problems, and we’re optimistic that transparent key management is within our reach.

Comments

  1. Andrew Nambudripad says:

    So this is, and has been for nearly 20 years, a “solved problem” in the mathematical sense. You put a lot out there, so I’ll try to address your points as systematically as possible.

    The issues are purely social. Culprit 1: Apathetic end users who don’t know or care that the government is storing data en masse[1]. They’re complacent with Facebook-stalking exes after a few drinks, and with cat pictures. Culprit 2: As you mentioned, the corporations who fail to agree to collaborate in a federation-type sense because vendor lock-in is a large component of their business model. Interesting historical note: Google Hangouts, as I’m sure you know, was fully XMPP compatible.[2] Google soon shut down that service, to the chagrin of many CLI’ers who used third-party clients.

    I’ll tackle actual federation in a bit, but SSL works across all of the major browsers and platforms, so one does have a lot of interoperability. Every time you go to Amazon.com and whip out your Visa, that little green lock in the corner of your browser signifies you are in fact contacting the proper server, as verified by the Signing Authority. While not ‘federated’ in the way I presume you mean (i.e., decentralized), the 20 or 30 primary certificate authorities like Verisign, THAWTE, etc., that come preloaded with your operating system and/or browser act as a fairly standardized source of identity validation across devices, platforms, browsers, etc.

    But yes, it’s centralized, and as a result you’re susceptible to a nefarious (and/or compelled by the courts and/or under duress) primary authority provider compromising your communication. Peer-to-peer via well-established Diffie-Hellman and friends can somewhat eliminate the MitM.

    For 3, there are standards for revoking keys in even the most half-baked of protocols. (If we continue with my Verisign example, you can contact them the second you’re aware of any security breach and issue a security revocation via CRLs.) Again, not immune to three-letter agencies. (Don’t worry, I’ll address this too.)

    For 4: I’m not sure what you mean by “publishing this proof at these intervals also means attacks can only be detected after they have already happened.” That is a tautology. How could one detect an attack before it happens? I’m guessing you were a little tired when you chalked that, and I’m going to assume you meant that even though there is a ‘single source of truth’ to which one can refer, there is still an interval between the moment of malice and the moment the other actors in the system have been made aware of the compromised state. This again is a social problem, and I submit to you, dear reader, that it fundamentally has no solution. (Anyone can go rogue at any time, as I’m sure field operatives in the intelligence communities would attest.)

    5: The major problem. I don’t have an ACM subscription, but I’m going to guess the paper was about ‘users clicking yes to everything, as they’ve been desensitized by years of terms of service & such they’ve seen no value in, so they just think any dialog box that keeps them from their Facebook cats is mundane’. Again, this is social: the way you avoid it is by making the event so exceptional that they are *not* blind to it. Firefox, for example, I’m sure did fairly well with those huge red screens[3].

    People just don’t care. GPG has solved all of your issues. I’ll briefly go through it. The network is formed organically, often at “key signing parties” where you assign a value to someone based on how confidently you can verify them as... who they are. Best friend Bob gets a 5, the guy you went to summer camp with for 6 weeks in 1999 gets a 2. This organically forms a web of trust. If your laptop is stolen, the private key is protected by the passphrase, and you simply revoke the key and announce the revocation to the WoT (traditionally via that MIT server, which was around when I first found GPG at 12 and is still up some 15 years later), preventing transmission of new data to that keypair. Of course, you either need 1) backups of your keys (as well as the ability to recall your passphrase), or 2) what I do: give a trusted friend a revoke key. Apparently, there’s a 3rd option I wasn’t aware of until today: “A key owner may designate a third party that has permission to revoke the key owner’s key (in case the key owner loses their own private key and thus loses the ability to revoke their own public key)”[wikipedia]. In cases 2 or 3, if my designee transitions state to ‘nefarious’, worst case, Alice just gets an “oh, his key is invalid now” notice, which is a false positive, but no information has been compromised.

    How does this fit into .. well .. anything? Well, SAML2 is an SSO standard that implements PKI to authenticate one’s identity. It’s a standard by the OASIS standards body, well formalized, and deployed with great success on Internet2 among quite a few universities. Open-source implementations exist within the community. But the problem is Bob just wants to see Facebook cats, and Facebook just wants to keep its walled garden (you’ll see how FB, Google, Twitter and LinkedIn will all gladly let you use their widgets as a login mechanism, but none of them will interoperate).

    I promised I’d address the final point, federated servers and compromised authorities. Well, SAML2 conveniently has WS-Federation[3]. JWT (JSON Web Tokens) use a similar technology. In fact, the only way to solve your 2nd point is brilliantly simple: run your own Identity Server (see [4]).

    [1] https://youtu.be/XEVlyP4_11M?t=450 Sigh.
    [2] Which, actually, has a XEP (think “RFC but for XMPP”) for PGP. See here: http://mail.jabber.org/pipermail/standards/2016-January/030755.html I’ll get to PGP in a bit…
    [3] https://docs.wso2.com/display/IS510/WS-Federation
    [4] https://github.com/IdentityServer/IdentityServer3.Samples/tree/master/source/WebHost%20(Windows%20Auth%20All-in-One)

    • Peter Gerdes says:

      As for 3:

      The issue isn’t revoking keys. It is allowing some external source (say apple) to attest to the fact that the user claimed to lose their password and proved their identity to that source’s satisfaction.

      In other words, when a user loses access to a key, we want the people corresponding with them to be able to say:

      Hey, John appears to have lost his key and reset it. I guess it may be that the service provider is helping someone else pose as John but I know this isn’t just John’s ex-girlfriend faking a key reset because the provider signed the reset.

      4:

      There are plenty of protocols that let you immediately detect tampering. For instance, manual key verification.