April 25, 2024

Anticensorship in the Internet's Infrastructure

I’m pleased to announce a research result that Eric Wustrow, Scott Wolchok, Ian Goldberg, and I have been working on for the past 18 months: Telex, a new approach to circumventing state-level Internet censorship. Telex is markedly different from past anticensorship efforts, and we believe it has the potential to shift the balance of power in the censorship arms race.

What makes Telex different from previous approaches:

  • Telex operates in the network infrastructure — at any ISP between the censor’s network and non-blocked portions of the Internet — rather than at network end points. This approach, which we call “end-to-middle” proxying, can make the system robust against countermeasures (such as blocking) by the censor.
  • Telex focuses on avoiding detection by the censor. That is, it allows a user to circumvent a censor without alerting the censor to the act of circumvention. It complements anonymizing services like Tor (which focus on hiding with whom the user is attempting to communicate instead of that that the user is attempting to have an anonymous conversation) rather than replacing them.
  • Telex employs a form of deep-packet inspection — a technology sometimes used to censor communication — and repurposes it to circumvent censorship.
  • Other systems require distributing secrets, such as encryption keys or IP addresses, to individual users. If the censor discovers these secrets, it can block the system. With Telex, there are no secrets that need to be communicated to users in advance, only the publicly available client software.
  • Telex can provide a state-level response to state-level censorship. We envision that friendly countries would create incentives for ISPs to deploy Telex.

For more information, keep reading, or visit the Telex website.

The Problem

Government Internet censors generally use firewalls in their network to block traffic bound for certain destinations, or containing particular content. For Telex, we assume that the censor government desires generally to allow Internet access (for economic or political reasons) while still preventing access to specifically blacklisted content and sites. That means Telex doesn’t help in cases where a government pulls the plug on the Internet entirely. We further assume that the censor allows access to at least some secure HTTPS websites. This is a safe assumption, since blocking all HTTPS traffic would cut off practically every site that uses password logins.

<!– –>

Many anticensorship systems work by making an encrypted connection (called a “tunnel”) from the user’s computer to a trusted proxy server located outside the censor’s network. This server relays requests to censored websites and returns the responses to the user over the encrypted tunnel. This approach leads to a cat-and-mouse game, where the censor attempts to discover and block the proxy servers. Users need to learn the address and login information for a proxy server somehow, and it’s very difficult to broadcast this information to a large number of users without the censor also learning it.

How Telex Works

Telex turns this approach on its head to create what is essentially a proxy server without an IP address. In fact, users don’t need to know any secrets to connect. The user installs a Telex client app (perhaps by downloading it from an intermittently available website or by making a copy from a friend). When the user wants to visit a blacklisted site, the client establishes an encrypted HTTPS connection to a non-blacklisted web server outside the censor’s network, which could be a normal site that the user regularly visits. Since the connection looks normal, the censor allows it, but this connection is only a decoy.

The client secretly marks the connection as a Telex request by inserting a cryptographic tag into the headers. We construct this tag using a mechanism called public-key steganography. This means anyone can tag a connection using only publicly available information, but only the Telex service (using a private key) can recognize that a connection has been tagged.

As the connection travels over the Internet en route to the non-blacklisted site, it passes through routers at various ISPs in the core of the network. We envision that some of these ISPs would deploy equipment we call Telex stations. These devices hold a private key that lets them recognize tagged connections from Telex clients and decrypt these HTTPS connections. The stations then divert the connections to anti­censorship services, such as proxy servers or Tor entry points, which clients can use to access blocked sites. This creates an encrypted tunnel between the Telex user and Telex station at the ISP, redirecting connections to any site on the Internet.

<!– –>

Telex doesn’t require active participation from the censored websites, or from the non-censored sites that serve as the apparent connection destinations. However, it does rely on ISPs to deploy Telex stations on network paths between the censor’s network and many popular Internet destinations. Widespread ISP deployment might require incentives from governments.

Development so Far

At this point, Telex is a concept rather than a production system. It’s far from ready for real users, but we have developed proof-of-concept software for researchers to experiment with. So far, there’s only one Telex station, on a mock ISP that we’re operating in our lab. Nevertheless, we have been using Telex for our daily web browsing for the past four months, and we’re pleased with the performance and stability. We’ve even tested it using a client in Beijing and streamed HD YouTube videos, in spite of YouTube being censored there.

Telex illustrates how it is possible to shift the balance of power in the censorship arms race, by thinking big about the problem. We hope our work will inspire discussion and further research about the future of anticensorship technology.

You can find more information and prototype software at the Telex website, or read our technical paper, which will appear at Usenix Security 2011 in August.

"You Might Also Like:" Privacy Risks of Collaborative Filtering

Ann Kilzer, Arvind Narayanan, Ed Felten, Vitaly Shmatikov, and I have released a new research paper detailing the privacy risks posed by collaborative filtering recommender systems. To examine the risk, we use public data available from Hunch, LibraryThing, Last.fm, and Amazon in addition to evaluating a synthetic system using data from the Netflix Prize dataset. The results demonstrate that temporal changes in recommendations can reveal purchases or other transactions of individual users.

To help users find items of interest, sites routinely recommend items similar to a given item. For example, product pages on Amazon contain a “Customers Who Bought This Item Also Bought” list. These recommendations are typically public, and they are the product of patterns learned from all users of the system. If customers often purchase both item A and item B, a collaborative filtering system will judge them to be highly similar. Most sites generate ordered lists of similar items for any given item, but some also provide numeric similarity scores.

Although item similarity is only indirectly related to individual transactions, we determined that temporal changes in item similarity lists or scores can reveal details of those transactions. If you’re a Mozart fan and you listen to a Justin Bieber song, this choice increases the perceived similarity between Justin Bieber and Mozart. Because similarity lists and scores are based on perceived similarity, your action may result in changes to these scores or lists.

Suppose that an attacker knows some of your past purchases on a site: for example, past item reviews, social networking profiles, or real-world interactions are a rich source of information. New purchases will affect the perceived similarity between the new items and your past purchases, possibility causing visible changes to the recommendations provided for your previously purchased items. We demonstrate that an attacker can leverage these observable changes to infer your purchases. Among other things, these attacks are complicated by the fact that multiple users simultaneously interact with a system and updates are not immediate following a transaction.

To evaluate our attacks, we use data from Hunch, LibraryThing, Last.fm, and Amazon. Our goal is not to claim privacy flaws in these specific sites (in fact, we often use data voluntarily disclosed by their users to verify our inferences), but to demonstrate the general feasibility of inferring individual transactions from the outputs of collaborative filtering systems. Among their many differences, these sites vary dramatically in the information that they reveal. For example, Hunch reveals raw item-to-item correlation scores, but Amazon reveals only lists of similar items. In addition, we examine a simulated system created using the Netflix Prize dataset. Our paper outlines the experimental results.

While inference of a Justin Bieber interest may be innocuous, inferences could expose anything from dissatisfaction with a job to health issues. Our attacks assume that a victim reveals certain past transactions, but users may publicly reveal certain transactions while preferring to keep others private. Ultimately, users are best equipped to determine which transactions would be embarrassing or otherwise problematic. We demonstrate that the public outputs of recommender systems can reveal transactions without user knowledge or consent.

Unfortunately, existing privacy technologies appear inadequate here, failing to simultaneously guarantee acceptable recommendation quality and user privacy. Mitigation strategies are a rich area for future work, and we hope to work towards solutions with others in the community.

Worth noting is that this work suggests a risk posed by any feature that adapts in response to potentially sensitive user actions. Unless sites explicitly consider the data exposed, such features may inadvertently leak details of these underlying actions.

Our paper contains additional details. This work was presented earlier today at the 2011 IEEE Symposium on Security and Privacy. Arvind has also blogged about this work.

Identifying Trends that Drive Technology

I’m trying to compile a list of major technological and societal trends that influence U.S. computing research. Here’s my initial list. Please post your own suggestions!

  • Ubiquitous connectivity, and thus true mobility
  • Massive computational capability available to everyone, through the cloud
  • Exponentially increasing data volumes – from ubiquitous sensors, from higher-volume sensors (digital imagers everywhere!), and from the creation of all information in digital form – has led to a torrent of data which must be transferred, stored, and mined: “data to knowledge to action”
  • Social computing – the way people interact has been transformed; the data we have from and about people is transforming
  • All transactions (from purchasing to banking to voting to health) are online, creating the need for dramatic improvements in privacy and security
  • Cybercrime
  • The end of single-processor performance increases, and thus the need for parallelism to increase performance in operating systems and productivity applications, not just high-end applications; also power issues
  • Asymmetric threats, need for surveillance, reconnaissance
  • Globalization – of innovation, of consumption, of workforce
  • Pressing national and global challenges: climate change, education, energy / sustainability, health care (these replace the cold war)

What’s on your list? Please post below!

[cross-posted from CCC Blog]

Acceptance rates at security conferences

How competitive are security research conferences? Several people have been tracking this information. Mihai Christodorescu has a nice chart of acceptance and submission rates over time. The most recent data point we have is the 2009 Usenix Security Symposium, which accepted 26 of 176 submissions (a 14.8% acceptance ratio, consistent with recent years). Acceptance rates like that, at top security conferences, are now pretty much the norm.

With its deadline one week ago, ACM CCS 2009 got 317 submissions this year (up from 274 last year, and approx. 300 the year before) and ESORICS 2009, with a submission deadline last Friday night, got 222 submissions (up from about 170 last year).

Think about that: right now there are over 500 research manuscripts in the field of computer security fighting it out, and maybe 15-20% of those will get accepted. (And that’s not counting research in cryptography, or the security-relevant papers that regularly appear in the literature on operating systems, programming languages, networking, and other fields.) Ten years ago, when I first began as an assistant professor, there would be half as many papers submitted. At the time, I grumbled that we had too many security conferences and that the quality of the proceedings suffered. Well, that problem seems mostly resolved, except rather than having half as many conferences, we now have a research community that’s apparently twice as large. I suppose that’s a good thing, although there are several structural problems that we, the academic security community, really need to address.

  • What are we supposed to do with the papers that are rejected, resubmitted, rejected again, and so on? Clearly, some of this work has value and never gets seen. Should we make greater use of the arXiv.org pre-print service? There’s a crypto and computer security section, but it’s not heavily used. Alternatively, we could join on on the IACR Cryptology ePrint Archive or create our own.
  • Should we try to make the conference reviewing systems more integrated across conferences, such that PC comments from one conference show up in a subsequent conference, and the subsequent PC can see both drafts of the paper? This would make conference reviewing somewhat more like journal reviewing, providing a measure of consistency from one conference to the next.
  • Low acceptance ratios don’t necessarily achieve higher quality proceedings. There’s a distinctive problem that occurs when a conference has a huge PC and only three of them review any given paper. Great papers still get in and garbage papers are still rejected, but the outcomes for papers “on the bubble” becomes more volatile, depending on whether those papers get the right reviewers. Asking PC members to do more reviews is just going to lower the quality of the reviews or discourage people from accepting positions on PCs. Adding additional PC members could help, but it also can be unwieldy to manage a large PC, and there will be even more volatility.
  • Do we need another major annual computer security conference? Should more workshops be willing to take conference-length submissions? Or should our conferences raise their acceptance rates up to something like 25%, even if that means compressed presentations and the end of printed proceedings? How much “good” work is out there, if only there was a venue in which to print it?

About the only one of these ideas I don’t like is adding another top-level security conference. Otherwise, we could well do all-of-the-above, and that would be a good thing. I’m particularly curious if arbitrarily increasing the acceptance rates would resolve some of the volatility issues on the bubble. I think I’d rather that our conferences err on the side of taking the occasional bad/broken/flawed paper rather than rejecting the occasional good-but-misunderstood paper.

Maybe we just need to harness the power of our graduate students. When you give a grad student a paper to review, they treat it like a treasure and write a detailed review, even if they may not be the greatest expert in the field. Conversely, when you give an overworked professor a paper to review, they blast through it, because they don’t have the time to spend a full day on any given paper. Well, it’s not like our grad students have anything better to be doing. But does the additional time they can spend per paper make up for the relative lack of experience and perspective? Can they make good accept-or-reject judgements for papers on the bubble?

For additional thoughts on this topic, check out Matt Welsh’s thoughts on scaling systems conferences. He argues that there’s a real disparity between the top programs / labs and everybody else and that it’s worthwhile to take steps to fix this. (I’ll argue that security conferences don’t seem to have this particular problem.) He also points out what I think is the deeper problem, which is that hotshot grad students must get themselves a long list of publications to have a crack at a decent faculty job. This was emphatically not the case ten years ago.

See also, Birman and Schneider’s CACM article (behind a paywall, unless your university has a site license). They argue that the focus on short, incremental results is harming our field’s ability to have impact. They suggest improving the standing of journals in the tenure game and they suggest disincentivizing people from submitting junk / preliminary papers by creating something of a short-cut reject that gets little or no feedback and also, by virtue of the conferences not being blind-review, creates the possibility that a rejected paper could harm the submitter’s reputation.

Fingerprinting Blank Paper Using Commodity Scanners

Today Will Clarkson, Tim Weyrich, Adam Finkelstein, Nadia Heninger, Alex Halderman and I released a paper, Fingerprinting Blank Paper Using Commodity Scanners. The paper will appear in the Proceedings of the IEEE Symposium on Security and Privacy, in May 2009.

Here’s the paper’s abstract:

This paper presents a novel technique for authenticating physical documents based on random, naturally occurring imperfections in paper texture. We introduce a new method for measuring the three-dimensional surface of a page using only a commodity scanner and without modifying the document in any way. From this physical feature, we generate a concise fingerprint that uniquely identifies the document. Our technique is secure against counterfeiting and robust to harsh handling; it can be used even before any content is printed on a page. It has a wide range of applications, including detecting forged currency and tickets, authenticating passports, and halting counterfeit goods. Document identification could also be applied maliciously to de-anonymize printed surveys and to compromise the secrecy of paper ballots.

Viewed under a microscope, an ordinary piece of paper looks like this:

The microscope clearly shows individual wood fibers, laid down in a pattern that is unique to this piece of paper.

If you scan a piece of paper on an ordinary desktop scanner, it just looks white. But pick a small area of the paper, digitally enhance the contrast and expand the image, and you see something like this:

The light and dark areas you see are due to two factors: inherent color variation in the surface, and partial shadows cast by fibers in the paper surface. If you rotate the paper and scan again, the inherent color at each point will be the same, but the shadows will be different because the scanner’s light source will strike the paper from a different angle. These differences allow us to map out the tiny hills and valleys on the surface of the paper.

Here is a visualization of surface shape from one of our experiments:

This part of the paper had the word “sum” printed on it. You can clearly see the raised areas where toner was applied to the paper to make the letters. Around the letters you can see the background texture of the paper.

Computing the surface texture is only one part of the job. From the texture, you want to compute a concise, secure “fingerprint” which can survive ordinary wear and tear on the paper, such as crumpling, scribbling or printing, and moisture. You also want to understand how secure the technology will be in various applications. Our full paper addresses these issues too. The bottom-line result is a sort of unique fingerprint for each piece of paper, which can be determined using an ordinary desktop scanner.

For more information, see the project website or our research paper.