January 11, 2025

China's New Mandatory Censorware Creates Big Security Flaws

Today Scott Wolchok, Randy Yao, and Alex Halderman at the University of Michigan released a report analyzing Green Dam, the censorware program that the Chinese government just ordered installed on all new computers in China. The researchers found that Green Dam creates very serious security vulnerabilities on users’ computers.

The report starts with a summary of its findings:

The Chinese government has mandated that all PCs sold in the country must soon include a censorship program called Green Dam. This software monitors web sites visited and other activity on the computer and blocks adult content as well as politically sensitive material. We examined the Green Dam software and found that it contains serious security vulnerabilities due to programming errors. Once Green Dam is installed, any web site the user visits can exploit these problems to take control of the computer. This could allow malicious sites to steal private data, send spam, or enlist the computer in a botnet. In addition, we found vulnerabilities in the way Green Dam processes blacklist updates that could allow the software makers or others to install malicious code during the update process. We found these problems with less than 12 hours of testing, and we believe they may be only the tip of the iceberg. Green Dam makes frequent use of unsafe and outdated programming practices that likely introduce numerous other vulnerabilities. Correcting these problems will require extensive changes to the software and careful retesting. In the meantime, we recommend that users protect themselves by uninstalling Green Dam immediately.

The researchers have released a demonstration attack which will crash the browser of any Green Dam user. Another attack, for which they have not released a demonstration, allows any web page to seize control of any Green Dam user’s computer.

This is a serious blow to the Chinese government’s mandatory censorware plan. Green Dam’s insecurity is a show-stopper — no responsible PC maker will want to preinstall such dangerous software. The software can be fixed, but it will take a while to test the fix, and there is no guarantee that the next version won’t have other flaws, especially in light of the blatant errors in the current version.

On China's new, mandatory censorship software

The New York Times reports that China will start requiring censorship software on PCs. One interesting quote stands out:

Zhang Chenming, general manager of Jinhui Computer System Engineering, a company that helped create Green Dam, said worries that the software could be used to censor a broad range of content or monitor Internet use were overblown. He insisted that the software, which neutralizes programs designed to override China’s so-called Great Firewall, could simply be deleted or temporarily turned off by the user. “A parent can still use this computer to go to porn,” he said.

In this post, I’d like to consider the different capabilities that software like this could give to the Chinese authorities, without getting too much into their motives.

Firstly, and most obviously, this software allows the authorities to do filtering of web sites and network services that originate inside or outside of the Great Firewall. By operating directly on a client machine, this filter can be aware of the operations of Tor, VPNs, and other firewall-evading software, allowing connections to a given target machine to be blocked, regardless of how the client tries to get there. (You can’t accomplish “surgical” Tor and VPN filtering if you’re only operating inside the network. You need to be on the end host to see where the connection is ultimately going.)

Software like this can do far more, since it can presumably be updated remotely to support any feature desired by the government authorities. This could be the ultimate “Big Brother Inside” feature. Not only can the authorities observe behavior or scan files within one given computer, but every computer now because a launching point for investigating other machines reachable over a local area network. If one such machine were connected, for example, to a private home network, behind a security firewall, the government software could still scan every other computer on the same private network, log every packet, and so forth. Would you be willing to give your friends the password to log into your private wireless network, knowing their machine might be running this software?

Perhaps less ominously, software like this could also be used to force users to install security patches, to uninstall zombie/botnet systems, and perform other sorts of remote systems administration. I can’t imagine the difficulty in trying to run the Central Government Bureau of National Systems Administration (would they have a phone number you could call to complain when your computer isn’t working, and could they fix it remotely?), but the technological base is now there.

Of course, anybody who owns their own computer will be able to circumvent this software. If you control your machine, you can control what’s running on it. Maybe you can pretend to be running the software, maybe not. That would turn into a technological arms race which the authorities would ultimately fail to win, though they might succeed in creating enough fear, uncertainty, and doubt to deter would-be circumventors.

This software will also have a notable impact in Internet cafes, schools, and other sorts of “public” computing resources, which are exactly the sorts of places that people might go when they want to hide their identity, and where the authorities could have physical audits to check for compliance.

Big Brother is watching.

NJ Voting-machine trial update

Earlier this month I testified in Gusciora v. Corzine, the trial in which the plaintiffs argue that New Jersey’s voting machines (Sequoia AVC Advantage) can’t be trusted to count the votes, because they’re so easily hacked to make them cheat.

I’ve previously written about the conclusions of my expert report: in 7 minutes you can replace the ROM and make the machine cheat in every future election, and there’s no practical way for the State to detect cheating machines (in part because there’s no voter-verified paper ballot).

The trial started on January 27, 2009 and I testified for four and a half days. I testified that the AVC Advantage can be hacked by replacing its ROM, or by replacing its Z80 processor chip, so that it steals votes undetectably. I testified that fraudulent firmware can also be installed into the audio-voting daughterboard by a virus carried through audio-ballot cartridges. I testified about many other things as well.

Finally, I testified about the accuracy of the Sequoia AVC Advantage. I believe that the most significant source of inaccuracy is its vulnerability to hacking. There’s no practical means of testing whether the machine has been hacked, and certainly the State of New Jersey does not even attempt to test. If we could somehow know that the machine has not been hacked, then (as I testified) I believe the most significant _other_ inaccuracy of the AVC Advantage is that it does not give adequate feedback to voters and pollworkers about whether a vote has been recorded. This can lead to a voter’s ballot not being counted at all; or a voter’s ballot counting two or three times (without fraudulent intent). I believe that this error may be on the order of 1% or more, but I was not able to measure it in my study because it involves user-interface interaction with real people.

In the hypothetical case that the AVC Advantage has not been hacked, I believe this user-interface source of perhaps 1% inaccuracy would be very troubling, but (in my opinion) is not the main reason to disqualify it from use in elections. The AVC Advantage should be disqualified for the simple reason that it can be easily hacked to cheat, and there’s no practical method that will be sure of catching this hack.

Security seals. When I examined the State’s Sequoia AVC Advantage voting machines in July 2008, they had no security seals preventing ROM replacement. I demonstrated on video (which we played in Court in Jan/Feb 2009) that in 7 minutes I could pick the lock, unscrew some screws, replace the ROM with one that cheats, replace the screws, and lock the door.

In September 2008, after the State read my expert report, they installed four kinds of physical security seals on the AVC Advantage. These seals were present during the November 2008 election. On December 1, I sent to the Court (and to the State) a supplemental expert report (with video) showing how I could defeat all of these seals.

In November/December the State informed the Court that they were changing to four new seals. On December 30, 2008 the State Director of Elections, Mr. Robert Giles, demonstrated to me the installation of these seals onto the AVC Advantage voting machine and gave me samples. He installed quite a few seals (of these four different kinds, but some of them in multiple places) on the machine.

On January 27, 2009 I sent to the Court (and to the State) a supplemental expert report showing how I could defeat all those new seals. On February 5th, as part of my trial testimony I demonstrated for the Court the principles and methods by which each of those seals could be defeated.

On cross-examination, the State defendants invited me to demonstrate, on an actual Sequoia AVC Advantage voting machine in the courtroom, the removal of all the seals, replacement of the ROM, and replacement of all the seals leaving no evidence of tampering. I then did so, carefully and slowly; it took 47 minutes. As I testified, someone with more practice (and without a judge and 7 lawyers watching) would do it much faster.

More Privacy, Bit by Bit

Before the Holidays, Yahoo got a flurry of good press for the announcement that it would (as the LA Times puts it) “purge user data after 90 days.” My eagle-eyed friend Julian Sanchez noticed that the “purge” was less complete than privacy advocates might have hoped. It turns out that Yahoo won’t be deleting the contents of its search logs. Rather, it will merely be zeroing out the last 8 bits of users’ IP addresses. Julian is not impressed:

dropping the last byte of an IP address just means you’ve narrowed your search space down to (at most) 256 possibilities rather than a unique machine. By that standard, this post is anonymous, because I guarantee there are more than 255 other guys out there with the name “Julian Sanchez.”

The first three bytes, in the majority of cases, are still going to be enough to give you a service provider and a rough location. Assuming every address in the range is in use, dropping the least-significant byte just obscures which of the 256 users at that particular provider is behind each query. In practice, though, the search space is going to be smaller than that, because people are creatures of habit: You’re really working with the pool of users in that range who perform searches on Yahoo. If your not-yet-anonymized logs show, say, 45 IP addreses that match those first three bytes making routine searches on Yahoo (17.6% of the search market x 256 = 45) you can probably safely assume that an “anonymized” IP with the same three leading bytes is one of those 45. If different users tend to exhibit different usage patterns in search time, clustering of queries, expertise with Boolean operators, or preferred natural language, you can narrow it down further.

I think this isn’t quite fair to Yahoo. Dropping the last eight bits of the IP address certainly doesn’t protect privacy as much as deleting log entries entirely, but it’s far from useless. To start with, there’s often not a one-to-one correspondence between IP addresses and Internet users. Often a single user has multiple IPs. For example, when I connect to the Princeton wireless network, I’m dynamically assigned an IP address that may not be the same as the IP address I used the last time I logged on. I also access the web from my iPhone and from hotels and coffee shops when I travel. Conversely, several users on a given network may be sharing a single IP address using a technology called network address translation. So even if you know the IP address of the user who performed a particular search, that may simply tell you that the user works for a particular company or connected from a particular coffee shop. Hence, tracking a particular user’s online activities is already something of a challenge, and it becomes that much harder if several dozen users’ online activities are scrambled together in Yahoo!’s logs.

Now, whether this is “enough” privacy depends a lot on what kind of privacy problem you’re worried about. It seems to me that there are three broad categories of privacy concerns:

  • Privacy violations by Yahoo or its partners: Some people are worried that Yahoo itself is tracking their online activities, building an online profile about them, and selling this information to third parties. Obviously, Yahoo’s new policy will do little to allay such concerns. Indeed, as David Kravets points out, Yahoo will have already squeezed all the personal information it can out of those logs before it scours them. If you don’t trust Yahoo or its business partners, this move isn’t going to make you feel very much safer.
  • Data breaches: A second concern involves cases where customer data falls into the wrong hands due to a security breach. In this case, it’s not clear that search engine logs are especially useful to data thieves in the first place. Data thieves are typically looking for information such as credit card and Social Security numbers that can make them a quick buck. People rarely type such information into search boxes. Some searches may be embarrassing to users, but they probably won’t be so embarrassing as to enable blackmail or extortion. So search logs are not likely to be that useful to criminals, whether or not they are “anonymized.”
  • Court-ordered information release: This is the case where the new policy could have the biggest effect. Consider, for example, a case where the police seek a suspect’s search results. The new policy will help protect privacy in three ways: first, if Yahoo! can’t cleanly filter search logs by IP address, judges may be more reluctant to order the disclosure of several dozen users’ search results just to give police information from a single suspect. Second, scrubbing the last byte of the IP address will make searching through the data much more difficult. Finally, the resulting data will be less useful in the court of law, because prosecutors will need to convince a jury that a given search was performed by the defendant rather than another user who happened to have a similar IP address. At the margin, then, Yahoo’s new policy seems likely to significantly enhance user privacy against government information requests. The same principle applies in the case of civil suits: the recording and movie industries, for example, will have a harder time using Yahoo!’s search logs as evidence that a user was engaged in illegal file-sharing.

So based on the small amount of information Yahoo has made available, it seems that the new policy is a real, if small, improvement in users’ privacy. However, it’s hard to draw any definite conclusions without more specific information about what information Yahoo! is saving. Because anonymizing data is a lot harder than people think. AOL learned this the hard way in 2006 when “anonymized” search results were released to researchers. People quickly noticed that you could figure out who various users were by looking at the contents of their searches. The data wasn’t so anonymous after all.

One reason AOL’s data wasn’t so anonymous is that AOL had “anonymized” the data set by assigning each user a unique ID. That meant people could look at all searches made by a single user and find searches that gave clues to the user’s identity. Had AOL instead stripped off the user information without replacing it, it would have been much harder to de-anonymize the data because there would be no way to match up different searches by the same user. If Yahoo’s logs include information linking each user’s various searches together, then even deleting the IP address entirely probably won’t be enough to safeguard user privacy. On the other hand, if the only user-identifying information is the IP address, then stripping off the low byte of the IP address is a real, if modest, privacy enhancement.

Researchers Show How to Forge Site Certificates

Today at the Chaos Computing Congress, a group of researchers (Alex Sotirov, Marc Stevens, Jake Appelbaum, Arjen Lenstra, Benne de Weger, and David Molnar) announced that they have found a way to forge website certificates that will be accepted as valid by most browsers. This means that they can successfully impersonate any website, even for secure connections.

Let me unpack that for non-experts.

One of the cornerstones of web security is the use of secure connections. When your browser makes a secure connection to (say) Amazon and gets a page to display, the browser displays in its address bar a URL like “https://www.amazon.com”. The “https” indicates that the secure (https) protocol was used, and the browser also displays a happy blue lock or key icon to tell you the connection was secured.

The browser cooperates with Amazon’s web server to secure the connection via a two-step process. First, the two computers negotiate a shared secret key that they can use to communicate privately, using crypto tricks that I won’t describe here. Second, your browser authenticates Amazon’s web server, that is, it assures itself that the party on the other end of the connection is the genuine Amazon.com server.

Amazon has a digital certificate that it sends to your browser, as part of proving its identity. The certificate is issued by a party called a certification authority or CA. Your browser comes pre-programmed with a list of CAs its trusts; you can change the list but hardly anyone does. If your browser makes an encrypted connection to “amazon.com”, and the party on the other end of the connection owns a certificate for the name “amazon.com”, and that certificate was issued by a CA that your browser trusts, then your browser will conclude that it has a secure connection to amazon.com.

Now we can understand what the researchers accomplished: they showed how to forge a certificate corresponding to any address on the Web. For example, they can forge a certificate that allows themselves, or you, or me, or anybody, to impersonate amazon.com, or freedom-to-tinker.com, or maybe even fbi.gov. That is supposed to be impossible, for obvious reasons.

The forged certificates will say they were issued by a CA called “Equifax Secure Global eBusiness”, which is trusted by the major browsers. The forged certificates will be perfectly valid; but they will have been made by forgers, not by the Equifax CA.

To do this, the researchers exploited a cryptographic weakness in one of the digital signature methods, “MD5 with RSA”, supported by the Equifax CA. The first step in this digital signature method is to compute the hash (strictly speaking, the cryptographic hash) of the certificate contents.

The hash is a short (128-bit) code that is supposed to be a kind of unique digest of the certificate contents. To be secure, the hash method has to have several properties, one of which is that it should be infeasible to find a collision, that is, to find two values A and B which have the same hash.

It was already known how to find collisions in MD5, but the researchers improved the existing collision-finding methods, so that they can now find two values R and F that have the same hash, where R is a “real” certificate that the CA will be willing to sign, and F is a forged certificate. This is deadly, because it means that a digital signature on R will also be a valid signature on F — so the attacker can ask the CA to sign the real certificate R, then copy the resulting signature onto F — putting a valid CA signature onto a certificate that the CA would never voluntarily sign.

To demonstrate this, the researchers created a forged certificate signed by the Equifax CA. For safety, they made the forged certificate expire in the past and point to a harmless site. But it’s clear from their description that they can forge a certificate for any site they want.

Whose fault is this? Partly it’s a consequence of problems with the MD5 hash method. It’s been known for a few years that MD5 is in the process of melting down, so prudent designers have been moving away from MD5, replacing it with newer, better hash methods. Similarly, prudent CAs should not be signing certificates that use MD5-based signature methods; instead they should insist on signature methods involving stronger hashes. The Equifax CA did not follow this precaution.

The problem can be fixed, for now, by having CAs refuse to create new MD5-based signatures. But this is a sobering reminder that the certification process that underlies web site authentication — a mechanism we all rely upon daily — is far from bulletproof.