November 23, 2024

CITP Expands Scope of RECAP

Today, we’re thrilled to announce the next version of our RECAP technology, dramatically expanding the scope of the project.

Having had some modest success at providing public access to legal documents, we’re now taking the next logical step, offering easy public access to illegal documents.

The Internet Archive, which graciously hosts RECAP’s repository of legal documents, was strangely unreceptive to our offer to let them store the world’s most comprehensive library of illegal documents. Fortunately, the Pirate Bay was happy to step in and help.

Interested in seeing what’s available? Then you might want to watch our brief instructional video.

Pseudonyms: The Natural State of Online Identity

I’ve been writing recently about the problems that arise when you try to use cryptography to verify who is at the other end of a network connection. The cryptographic math works, but that doesn’t mean you get the identity part right.

You might think, from this discussion, that crypto by itself does nothing — that cryptographic security can only be bootstrapped from some kind of real-world identity verification. That’s the way it works for website certificates, where a certificate authority has to check your bona fides before it will issue you a certificate.

But this intuition turns out to be wrong. There is one thing that crypto can do perfectly, without any real-world support: providing pseudonyms. Indeed, crypto is so good at supporting pseudonyms that we can practically say that pseudonyms are the natural state of identity online.

To explain why this is true, I need to offer a gentle introduction to a basic crypto operation: digital signatures. Suppose John Doe (“JD”) wants to use digital signatures. First, JD needs to create a private cryptographic key, which he does by generating some random numbers and combining them according to a special geeky recipe. The result is a unique private key that only JD knows. Next, JD uses a certain procedure to determine the public key that corresponds to his private key. He announces the public key to everyone. The math guarantees that (1) JD’s public key is unique and corresponds to JD’s private key, and (2) a person who knows JD’s public key can’t figure out JD’s private key.

Now JD can make digital signatures. If JD wants to “sign” a certain message M, he combines M with JD’s private key in a special way, and the result is JD’s “signature on M”. Now anybody can verify the signature, using JD’s public key. Only JD can make the signature, because only JD knows JD’s private key; but anybody can verify the signature.

At no point in this process does JD tell anybody who he is — I called him “John Doe” for a reason. Indeed, JD’s public key is a perfect pseudonym: it conveys nothing about JD’s actual identity, yet it has a distinct “owner” whose presence can be verified. (“You’re really the person who created this public key? Then you should be able to make a signature on the message ‘squeamish ossifrage’ for me….”)

Using this method, anybody can make up a fresh pseudonym whenever they want. If you can generate random numbers and do some math (or have your computer do those things for you), then you can make a fresh pseudonym. You can make as many as you want, without needing to coordinate with anybody. This is all easy to do.

These methods, pseudonyms and signatures, are used even in cases where we want to verify somebody’s real-world identity. When you connect to (say) https://mail.google.com, Google’s web server gives you its public key — a pseudonym — along with a digital certificate that attests that that public key — that pseudonym — belongs to Google Inc. Binding public keys — pseudonyms — to real-world identities is tedious and messy, but of course this is often necessary in practice.

Online, identities are hard to manage. Pseudonyms are easy.

Side-Channel Leaks in Web Applications

Popular online applications may leak your private data to a network eavesdropper, even if you’re using secure web connections, according to a new paper by Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang. (Chen is at Microsoft Research; the others are at Indiana.) It’s a sobering result — yet another illustration of how much information can be leaked by ordinary web technologies. It’s also really clever.

Here’s the background: Secure web connections encrypt traffic so that only your browser and the web server you’re visiting can see the contents of your communication. Although a network eavesdropper can’t understand the requests your browser sends, nor the replies from the server, it has long been known that an eavesdropper can see the size of the request and reply messages, and that these sizes sometimes leak information about which page you’re viewing, if the request size (i.e., the size of the URL) or the reply size (i.e., the size of the HTML page you’re viewing) is distinctive.

The new paper shows that this inference-from-size problem gets much, much worse when pages are using the now-standard AJAX programming methods, in which a web “page” is really a computer program that makes frequent requests to the server for information. With more requests to the server, there are many more opportunities for an eavesdropper to make inferences about what you’re doing — to the point that common applications leak a great deal of private information.

Consider a search engine that autocompletes search queries: when you start to type a query, the search engine gives you a list of suggested queries that start with whatever characters you have typed so far. When you type the first letter of your search query, the search engine page will send that character to the server, and the server will send back a list of suggested completions. Unfortunately, the size of that suggested completion list will depend on which character you typed, so an eavesdropper can use the size of the encrypted response to deduce which letter you typed. When you type the second letter of your query, another request will go to the server, and another encrypted reply will come back, which will again have a distinctive size, allowing the eavesdropper (who already knows the first character you typed) to deduce the second character; and so on. In the end the eavesdropper will know exactly which search query you typed. This attack worked against the Google, Yahoo, and Microsoft Bing search engines.

Many web apps that handle sensitive information seem to be susceptible to similar attacks. The researchers studied a major online tax preparation site (which they don’t name) and found that it leaks a fairly accurate estimate of your Adjusted Gross Income (AGI). This happens because the exact set of questions you have to answer, and the exact data tables used in tax preparation, will vary based on your AGI. To give one example, there is a particular interaction relating to a possible student loan interest calculation, that only happens if your AGI is between $115,000 and $145,000 — so that the presence or absence of the distinctively-sized message exchange relating to that calculation tells an eavesdropper whether your AGI is between $115,000 and $145,000. By assembling a set of clues like this, an eavesdropper can get a good fix on your AGI, plus information about your family status, and so on.

For similar reasons, a major online health site leaks information about which medications you are taking, and a major investment site leaks information about your investments.

The paper goes on to consider possible mitigations. The most obvious mitigation is to add padding to messages so that their sizes are not so distinctive — for example, every message might be padded to make its size a multiple of 256 bytes. This turns out to be less effective than you might expect — significant information can still leak even if messages are generously padded — and the padded messages are slower and more expensive to transmit.

We don’t know which sites the researchers studied, but it seems like a safe bet that most, if not all, of the sites in these product categories have similar problems. It’s important to keep these attacks in perspective — bear in mind that they can only be carried out by someone who can eavesdrop on the network between you and the site you’re visiting.

It’s becoming increasingly clear that securing web-based applications is very difficult, and that the basic tools for developing web apps don’t do much to help. The industry, and researchers, will be struggling with web app security issues for years to come.

Web Certification Fail: Bad Assumptions Lead to Bad Technology

It should be abundantly clear, from two recent posts here, that the current model for certifying the identity of web sites is deeply flawed. When you connect to a web site, and your browser displays an https URL and a happy lock or key icon indicating a secure connection, the odds that you’re connecting to an impostor site, despite your browser’s best efforts, are uncomfortably high.

How did this happen? The last two posts unpacked some of the detailed problems with the current system. Today I want to explore the root cause: today’s system is based on wildly unrealistic assumptions about organizations and trust.

The theory behind the system is simple. Browser vendors will identify a set of Certificate Authorities (CAs) who are trusted to certify identities. Browsers will automatically accept any identity certificate issued by any of the trusted CAs.

The first step in making this system work is identifying some CA who is trusted by everybody in the world.

If that last sentence didn’t strike you as odd, go back and read it again. That’s right, the system assumes that there is some party who is trusted by everyone in the world — a spectacularly naive assumption.

Network engineers like to joke about the “evil bit”, a hypothetical label put on each network packet, indicating whether the packet is evil. (See RFC 3514, Steve Bellovin’s classic parody standards document codifying the evil bit. I’ve always loved that the official Internet standards series accepts parody standards.) Well, the “trusted bit” for certificate authorities is pretty much as the same as the evil bit, only applied to organizations rather than network packets. Yet somehow we ended up with a design that relies on this “trusted bit”.

The reason, in part, is unclear thinking about institutional trust, abetted by the unclear language we often use in discussing trust online. For example, we tend to conflate two meanings of the word “trusted”. The first meaning of “trusted”, which is the everyday meaning, implies a judgment that a party is unlikely to misbehave. The second meaning of “trusted”, more common in military security settings, is a factual statement that someone is vulnerable to misbehavior by another. In an ideal world, we would make sure that someone was trusted in the first sense before they became trusted in the second sense, that is, we would make sure that someone was unlikely to misbehave before we we made ourselves vulnerable to their misbehavior. This isn’t easy to do — and we will forget entirely to do it if we confuse the two meanings of trusted.

The second linguistic problem is to use the passive-voice construction “A is trusted to do X” rather than the active form “B trusts A to do X.” The first form is problematic because it doesn’t say who is doing the trusting. Consider these two statements: (A) “CNNIC is a trusted certificate authority.” (B) “Everyone trusts CNNIC to be a certificate authority.” The first statement might sound plausible, but the second is obviously false.

If you try to explain to yourself why the existing web certification system is sound, while avoiding the two errors above (confusing two senses of “trusted”, and failing to say who is doing the trusting), you’ll see pretty quickly that the argument for the current system is tenuous at best. You’ll see, too, that we can’t fix the system by using different cryptography — what we need are new institutional arrangements.

Mozilla Debates Whether to Trust Chinese CA

[Note our follow-up posts on this topic: Web Security Trust Models, and Web Certification Fail: Bad Assumptions Lead to Bad Technology]

Sometimes geeky technical details matter only to engineers. But sometimes a seemingly arcane technical decision exposes deep social or political divisions. A classic example is being debated within the Mozilla project now, as designers decide whether the Mozilla Firefox browser should trust a Chinese certification authority by default.

Here’s the technical background: When you browse to a secure website (typically at a URL starting with “https:”), your browser takes two special security precautions: it sets up a private, encrypted “channel” to the server, and it authenticates the server’s identity. The second step, authentication, is necessary because a secure channel is useless if you don’t know who is on the other end. Without authentication, you might be talking to an impostor.

Suppose you’re connecting to https://mail.google.com, to pick up your Gmail. To authenticate itself to you, the server will (1) do some fancy math to prove to you that it knows a certain encryption key, and (2) present you with a digital certificate (or “cert”) attesting that only Google knows that encryption key. The cert is created by a Certification Authority (“CA”), which asserts that it has done the necessary due diligence to establish that the designated encryption key is known only to Google Inc.

If the CA is competent and honest, then you can rely on the cert, and your connection will be secure. But a dishonest CA can trick you into talking to an impostor site, so you need to be cautious about which CAs you trust. Your browser comes preinstalled with a list of CAs whom it will trust. In principle you can change this list, but almost nobody does. So browser vendors effectively decide which CAs their users will trust.

With this background in mind, let’s unpack the Mozilla debate. What set off the debate was the addition of the China Internet Network Information Center (CNNIC) as a trusted CA in Firefox. CNNIC is not part of the Chinese government but many people assert that it would be willing to act in concert with the Chinese government.

To see why this is worrisome, let’s suppose, just for the sake of argument, that CNNIC were a puppet of the Chinese government. Then CNNIC’s status as a trusted CA would give it the technical power to let the Chinese government spy on its citizens’ “secure” web connections. If a Chinese citizen tried to make a secure connection to Gmail, their connection could be directed to an impostor Gmail site run by the Chinese government, and CNNIC could give the impostor a cert saying that the government impostor was the real Gmail site. The Chinese citizen would be fooled by the fake Gmail site (having no reason to suspect anything was wrong) and would happily enter his Gmail password into the impostor site, giving the Chinese government free run of the citizen’s email archive.

CNNIC’s defenders respond that any CA could do such a thing. If the problem is that CNNIC is too close to a government, what about the CAs already on the Firefox CA list that are governments? Isn’t CNNIC being singled out because it is Chinese? Doesn’t the country with the largest Internet population deserve at least one slot among the dozens of already trusted CAs? These are all good questions, even if they’re not the whole story.

Mozilla’s decision touches deep questions of fairness, trust, and institutional integrity that I won’t even pretend to address in this post. No single answer will be right for all users.

Part of the problem is that the underlying technical design is fragile. Any CA can certify to any user that any server owns any name, so the consequences of a misplaced trust decision are about as bad as they can be. It’s tempting to write this off as bonehead design, but in truth the available design options are all unattractive.