
Anticensorship in the Internet's Infrastructure

I’m pleased to announce a research result that Eric Wustrow, Scott Wolchok, Ian Goldberg, and I have been working on for the past 18 months: Telex, a new approach to circumventing state-level Internet censorship. Telex is markedly different from past anticensorship efforts, and we believe it has the potential to shift the balance of power in the censorship arms race.

What makes Telex different from previous approaches:

  • Telex operates in the network infrastructure — at any ISP between the censor’s network and non-blocked portions of the Internet — rather than at network end points. This approach, which we call “end-to-middle” proxying, can make the system robust against countermeasures (such as blocking) by the censor.
  • Telex focuses on avoiding detection by the censor. That is, it allows a user to circumvent a censor without alerting the censor to the act of circumvention. It complements anonymizing services like Tor (which focus on hiding with whom the user is communicating, rather than hiding the fact that the user is attempting to have an anonymous conversation) rather than replacing them.
  • Telex employs a form of deep-packet inspection — a technology sometimes used to censor communication — and repurposes it to circumvent censorship.
  • Other systems require distributing secrets, such as encryption keys or IP addresses, to individual users. If the censor discovers these secrets, it can block the system. With Telex, there are no secrets that need to be communicated to users in advance, only the publicly available client software.
  • Telex can provide a state-level response to state-level censorship. We envision that friendly countries would create incentives for ISPs to deploy Telex.

For more information, keep reading, or visit the Telex website.

The Problem

Government Internet censors generally use firewalls in their network to block traffic bound for certain destinations, or containing particular content. For Telex, we assume that the censor government desires generally to allow Internet access (for economic or political reasons) while still preventing access to specifically blacklisted content and sites. That means Telex doesn’t help in cases where a government pulls the plug on the Internet entirely. We further assume that the censor allows access to at least some secure HTTPS websites. This is a safe assumption, since blocking all HTTPS traffic would cut off practically every site that uses password logins.


Many anticensorship systems work by making an encrypted connection (called a “tunnel”) from the user’s computer to a trusted proxy server located outside the censor’s network. This server relays requests to censored websites and returns the responses to the user over the encrypted tunnel. This approach leads to a cat-and-mouse game, where the censor attempts to discover and block the proxy servers. Users need to learn the address and login information for a proxy server somehow, and it’s very difficult to broadcast this information to a large number of users without the censor also learning it.

How Telex Works

Telex turns this approach on its head to create what is essentially a proxy server without an IP address. In fact, users don’t need to know any secrets to connect. The user installs a Telex client app (perhaps by downloading it from an intermittently available website or by making a copy from a friend). When the user wants to visit a blacklisted site, the client establishes an encrypted HTTPS connection to a non-blacklisted web server outside the censor’s network, which could be a normal site that the user regularly visits. Since the connection looks normal, the censor allows it, but this connection is only a decoy.

The client secretly marks the connection as a Telex request by inserting a cryptographic tag into the headers. We construct this tag using a mechanism called public-key steganography. This means anyone can tag a connection using only publicly available information, but only the Telex service (using a private key) can recognize that a connection has been tagged.
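To make this concrete, here is a minimal sketch of a Diffie-Hellman-style tagging scheme in Python. It is illustrative only: the group parameters are toy values chosen for readability, the 16-byte mark is an arbitrary truncation, and the real Telex protocol embeds its tag in the TLS handshake using different primitives.

    import hashlib
    import secrets

    # Toy parameters for illustration only; not the real Telex values.
    p = 2**255 - 19   # a prime modulus
    g = 2             # generator

    def station_keygen():
        """Long-term key pair held by the Telex station."""
        priv = secrets.randbelow(p - 2) + 1
        return priv, pow(g, priv, p)

    def make_tag(station_pub):
        """Client side: tagging requires only public information."""
        r = secrets.randbelow(p - 2) + 1
        ephemeral = pow(g, r, p)          # g^r, looks like random bits
        shared = pow(station_pub, r, p)   # (g^s)^r, known only to client and station
        mark = hashlib.sha256(shared.to_bytes(32, "big")).digest()[:16]
        return ephemeral, mark            # together, the "tag"

    def is_tagged(station_priv, ephemeral, mark):
        """Station side: only the private key can recognize a tag."""
        shared = pow(ephemeral, station_priv, p)   # (g^r)^s, the same shared value
        expected = hashlib.sha256(shared.to_bytes(32, "big")).digest()[:16]
        return secrets.compare_digest(expected, mark)

    station_priv, station_pub = station_keygen()
    ephemeral, mark = make_tag(station_pub)
    assert is_tagged(station_priv, ephemeral, mark)            # tagged: recognized
    assert not is_tagged(station_priv, ephemeral, b"\0" * 16)  # untagged: ignored

In the full system, the secret shared through the tag is also what lets the station decrypt the tagged HTTPS connection, as described next.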

As the connection travels over the Internet en route to the non-blacklisted site, it passes through routers at various ISPs in the core of the network. We envision that some of these ISPs would deploy equipment we call Telex stations. These devices hold a private key that lets them recognize tagged connections from Telex clients and decrypt these HTTPS connections. The stations then divert the connections to anticensorship services, such as proxy servers or Tor entry points, which clients can use to access blocked sites. This creates an encrypted tunnel between the Telex user and the Telex station at the ISP, redirecting connections to any site on the Internet.


Telex doesn’t require active participation from the censored websites, or from the non-censored sites that serve as the apparent connection destinations. However, it does rely on ISPs to deploy Telex stations on network paths between the censor’s network and many popular Internet destinations. Widespread ISP deployment might require incentives from governments.

Development so Far

At this point, Telex is a concept rather than a production system. It’s far from ready for real users, but we have developed proof-of-concept software for researchers to experiment with. So far, there’s only one Telex station, on a mock ISP that we’re operating in our lab. Nevertheless, we have been using Telex for our daily web browsing for the past four months, and we’re pleased with the performance and stability. We’ve even tested it using a client in Beijing and streamed HD YouTube videos, in spite of YouTube being censored there.

Telex illustrates how it is possible to shift the balance of power in the censorship arms race, by thinking big about the problem. We hope our work will inspire discussion and further research about the future of anticensorship technology.

You can find more information and prototype software at the Telex website, or read our technical paper, which will appear at USENIX Security 2011 in August.

Yet again, why banking online .NE. voting online

One of the most common questions I get is “if I can bank online, why can’t I vote online?” A recently released (but undated) document, “Supplement to Authentication in an Internet Banking Environment,” from the Federal Financial Institutions Examination Council addresses some of the risks of online banking. Krebs on Security has a nice writeup of the issues, noting that the guidelines call for “layered security programs” to deal with these riskier transactions, such as:

  1. methods for detecting transaction anomalies;

  2. dual transaction authorization through different access devices;

  3. the use of out-of-band verification for transactions;

  4. the use of “positive pay” and debit blocks to appropriately limit the transactional use of an account;

  5. “enhanced controls over account activities,” such as transaction value thresholds, payment recipients, the number of transactions allowed per day and allowable payment days and times; and

  6. “enhanced customer education to increase awareness of the fraud risk and effective techniques customers can use to mitigate the risk.”

[I’ve replaced bullets with numbers in Krebs’ posting in the above list to make it easier to reference below.]

So what does this have to do with voting? Well, if you look at them in turn and consider how you’d apply them to a voting system:

  1. One could hypothesize doing this – if 90% of the people in a precinct vote R or D, that’s not a good sign – but too late to do much. Suggesting that there be personalized anomaly detectors (e.g., “you usually vote R but it looks like you’re voting D today, are you sure?”) would not be well received by most voters!

  2. This is the focus of a lot of work – but it increases the effort for the voter.

  3. Same as #2. But we have to be careful that we don’t make it too hard for the voter! See SpeakUp: Remote Unsupervised Voting for an example of how this might be done.

  4. I don’t see how that would apply to voting, although in places like Estonia where you’re allowed to vote more than once (but only the last vote counts) one could imagine limiting the number of votes that can be cast by one ID. Limiting the number of votes from a single IP address is a natural application – but since many ISPs use the same (or a few) IP addresses for all of their customers thanks to NAT, this would disenfranchise their customers.

  5. “You don’t usually vote in primaries, so we’re not going to let you vote in this one either.” Yeah, right!

  6. This is about the only one that could help – and try doing it on the budget of an election office!

Unsaid, but of course implied by the financial industry’s list, is that the goal is to reduce fraud to a manageable level, not to eliminate it. I’ve heard that 1% to 2% of online banking transactions are fraudulent, and at that level it’s clearly not putting banks out of business (judging by profit numbers). However, whether we can accept as high a level of fraud in voting as in banking is another question: in a U.S. presidential election with roughly 130 million votes cast, even 1% would be over a million tampered ballots, far more than the margin of victory in many contests.

None of this is to criticize the financial industry’s efforts to improve security! Rather, it’s to point out that try as we might, just because we can bank online doesn’t mean we should vote online.

Supreme Court Takes Important GPS Tracking Case

This morning, the Supreme Court agreed to hear an appeal next term in United States v. Jones (formerly United States v. Maynard), a case in which the D.C. Circuit Court of Appeals suppressed evidence of a criminal defendant’s travels around town, which the police had collected using a tracking device they attached to his car. For more background, consult the original opinion and Orin Kerr’s previous discussions of the case.

No matter what the Court says or holds, this case will probably prove to be a landmark. Watch it closely.

(1) Even if the Court says nothing else, it will face the constitutionality of the use by police of tracking beepers to follow criminal suspects. In a pair of cases from the mid-1980s, the Court held that the police did not need a warrant to use a tracking beeper to follow a car around on public, city streets (Knotts) but did need a warrant to follow a beeper that was moved indoors (Karo) because it “reveal[ed] a critical fact about the interior of the premises.” By direct application of these cases, the warrantless tracking in Jones seems constitutional, because it was restricted to movement on public, city streets.

Not so fast, said the D.C. Circuit. In Jones, the police tracked the vehicle 24 hours a day for four weeks. Citing the “mosaic theory often invoked by the Government in cases involving national security information,” the Court held that the whole can sometimes be more than the sum of its parts: tracking a car continuously for a month is constitutionally different in kind, not just in degree, from tracking a car along a single trip. This is a new approach to the Fourth Amendment, one arguably at odds with opinions from other Courts of Appeals.

(2) This case gives the Court the opportunity to speak generally about the Fourth Amendment and location privacy. Depending on what it says, it may provide hints for lower courts struggling with the government’s use of cell phone location information, for example.

(3) For support of its embrace of the mosaic theory, the D.C. Circuit cited a 1989 Supreme Court case, U.S. Department of Justice v. Reporters Committee for Freedom of the Press. In that case, which involved the Freedom of Information Act (FOIA), not the Fourth Amendment, the Court allowed the FBI to refuse to release compiled “rap sheets” about organized crime suspects, even though the rap sheets were compiled mostly from “public” information obtainable from courthouse records. In agreeing that the rap sheets nevertheless fell within a “personal privacy” exemption from FOIA, the Court embraced, for the first time, the idea that the whole may be worth more than the parts. The Court noted the difference “between scattered disclosure of the bits of information contained in a rap-sheet and revelation of the rap-sheet as a whole,” and found a “vast difference between the public records that might be found after a diligent search of courthouse files, county archives, and local police stations throughout the country and a computerized summary located in a single clearinghouse of information.” (FtT readers will see the parallels to the debates on this blog about PACER and RECAP.) In summary, it found that “practical obscurity” could amount to privacy.

Practical obscurity is an idea that hasn’t gotten much traction in the courts since Reporters Committee. But it is an idea well-loved by many privacy scholars, including myself, because it helps explain concerns about the privacy implications of aggregating and mining supposedly “public” data.

The Court, of course, may choose a narrow route for affirming or reversing the D.C. Circuit. But if it instead speaks broadly or categorically about the viability of practical obscurity as a legal theory, this case might set a standard that we will be debating for years to come.

What Gets Redacted in PACER?

In my research on privacy problems in PACER, I spent a lot of time examining PACER documents. In addition to researching the problem of “bad” redactions, I was also interested in learning about the pattern of redactions generally. To this end, my software looked for two redaction styles. One is the “black rectangle” redaction method I described in my previous post. This method sometimes fails, but most of these redactions were done successfully. The more common method (around two-thirds of all redactions) involves replacing sensitive information with strings of XXs.

Out of the 1.8 million documents it scanned, my software identified around 11,000 documents that appeared to have redactions. Many of them could be classified automatically (for example “123-45-xxxx” is clearly a redacted Social Security number, and “Exxon” is a false positive) but I examined several thousand by hand.
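As a rough sketch of the kind of pattern matching this involved (not my actual classifier; the pattern below is a simplified assumption), a rule for XX-style Social Security numbers might look like this:

    import re

    # Simplified sketch: match an SSN-shaped token in which any digit
    # group may be masked with Xs, e.g. "123-45-xxxx" or "xxx-xx-6789".
    SSN_SHAPED = re.compile(r"\b[\dXx]{3}-[\dXx]{2}-[\dXx]{4}\b")

    def find_redacted_ssns(text):
        hits = []
        for match in SSN_SHAPED.finditer(text):
            token = match.group(0)
            # Require at least one masked character, so a fully
            # visible number is not reported as a redaction.
            if any(c in "Xx" for c in token):
                hits.append(token)
        return hits

    # "Exxon" contains no hyphen-separated groups, so it never matches.
    print(find_redacted_ssns("SSN 123-45-xxxx; employer: Exxon"))  # ['123-45-xxxx']

A real pass needs a pattern for each category in the table below, plus the manual review described above for everything ambiguous.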

Here is the distribution of the redacted documents I found.

Type of Sensitive Information       No. of Documents
Social Security number                          4315
Bank or other account number                     675
Address                                          449
Trade secret                                     419
Date of birth                                    290
Unique identifier other than SSN                 216
Name of person                                   129
Phone, email, IP address                          60
National security related                         26
Health information                                24
Miscellaneous                                     68
Total                                           6208

To reiterate the point I made in my last post, I didn’t have access to a random sample of the PACER corpus, so we should be cautious about drawing any precise conclusions about the distribution of redacted information in the entire PACER corpus.

Still, I think we can draw some interesting conclusions from these statistics. It’s reasonable to assume that the distribution of redacted sensitive information is similar to the distribution of sensitive information in general. That is, assuming that parties who redact documents do a decent job, this list gives us a (very rough) idea of what kinds of sensitive information can be found in PACER documents.

The most obvious lesson from these statistics is that Social Security numbers are by far the most common type of redacted information in PACER. This is good news, since it’s relatively easy to build software to automatically detect and redact Social Security numbers.
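For instance, a single pattern over the extracted text would catch the standard format. This is a hedged sketch, not production redaction code; real filings also contain unhyphenated and OCR-mangled numbers:

    import re

    # Matches a fully visible SSN such as "123-45-6789".
    VISIBLE_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

    def redact_ssns(text):
        # Mask the final group, mirroring the XX style of redaction
        # described above.
        return VISIBLE_SSN.sub(lambda m: m.group(0)[:7] + "xxxx", text)

    print(redact_ssns("Claimant SSN 123-45-6789"))  # Claimant SSN 123-45-xxxx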

Another interesting case is the “address” category. Almost all of the redacted items in this category—393 out of 449—appear in the District of Columbia District. Many of the documents relate to search warrants and police reports, often in connection with drug cases. I don’t know if the high rate of redaction reflects the different mix of cases in the DC District, or an idiosyncratic redaction policy voluntarily pursued by the courts and/or the DC police but not by officials in other districts. It’s worth noting that the redaction of addresses doesn’t appear to be required by the federal redaction rules.

Finally, there’s the category of “trade secrets,” which is a catch-all term I used for documents whose redactions appear to be confidential business information. Private businesses may have a strong interest in keeping this information confidential, but the public interest in such secrecy here is less clear.

To summarize, out of 6208 redacted documents, there are 4315 Social Security numbers that could be redacted automatically by machine, 449 addresses whose redaction doesn’t seem to be required by the rules of procedure, and 419 “trade secrets” whose release would typically harm only the party that failed to redact them.

That leaves around 1000 documents that would expose risky confidential information if not properly redacted, or about 0.05 percent of the 1.8 million documents I started with. A thousand documents is worth taking seriously (especially given that there are likely to be tens of thousands in the full PACER corpus). The courts should take additional steps to monitor compliance with the redaction rules and sanction parties who fail to comply with them, and they should explore techniques to automate the detection of redaction failures in these categories.

But at the same time, a sense of perspective is important. This tiny fraction of PACER documents with confidential information in them is a cause for concern, but it probably isn’t a good reason to limit public access to the roughly 99.9 percent of documents that contain no sensitive information and may be of significant benefit to the public.

Thanks again to Carl Malamud and Public.Resource.Org for their support of my research.

Universities in Brazil are too closed to the world, and that's bad for innovation

When Brazilian president Dilma Rousseff visited China at the beginning of May, she came back with some good news (maybe too good to be entirely true). Among them was the announcement that Foxconn, the world’s largest contract manufacturer of electronics, will invest US$12 billion to open a large industrial plant in the country. The goal is to produce iPads and other key electronic components locally.

The announcement was widely praised and quickly made the headlines of all major newspapers. There is certainly reason for excitement. Brazil missed important waves of economic development, including industrialization (which only really happened in the 1940s) and the semiconductor wave, an industry that has shown only a few signs of development in the country until now.