January 16, 2025

If Wikileaks Scraped P2P Networks for "Leaks," Did it Break Federal Criminal Law?

On Bloomberg.com today, Michael Riley reports that some of the documents hosted at Wikileaks may not be “leaks” at all, at least not in the traditional sense of the word. Instead, according to a computer security firm called Tiversa, “computers in Sweden” have been searching the files shared on p2p networks like Limewire for sensitive and confidential information, and the firm supposedly has proof that some of the documents found in this way have ended up on the Wikileaks site. These charges are denied as “completely false in every regard” by Wikileaks lawyer Mark Stephens.

I have no idea whether these accusations are true, but I am interested to learn from the story that if they are true they might provide “an alternate path for prosecuting WikiLeaks,” most importantly because the reporter attributes this claim to me. Although I wasn’t misquoted in the article, I think what I said to the reporter is a few shades away from what he reported, so I wanted to clarify what I think about this.

In the interview and in the article, I focus only on the Computer Fraud and Abuse Act (“CFAA”), the primary federal law prohibiting computer hacking. The CFAA defines a number of federal crimes, most of which turn on whether an action on a computer or network was done “without authorization” or in a way that “exceeds authorized access.”

The question presented by the reporter to me (though not in these words) was: is it a violation of the CFAA to systematically crawl a p2p network like Limewire searching for and downloading files that might be mistakenly shared, like spreadsheets or word processing documents full of secrets?

I don’t think so. With everything I know about the text of this statute, the legislative history surrounding its enactment, and the cases that have interpreted it, this kind of searching and downloading won’t “exceed the authorized access” of the p2p network. This simply isn’t a crime under the CFAA.

But although I don’t think this is a viable theory, I can’t unequivocally dismiss it for a few reasons, all of which I tried to convey in the interview. First, some courts have interpreted “exceeds authorized access” broadly, especially in civil lawsuits arising under the CFAA. For example, back in 2001, one court declared it a CFAA violation to utilize a spider capable of collecting prices from a travel website by a competitor, if the defendant built the spider by taking advantage of “proprietary information” from a former employee of the plaintiff. (For much more on this, see this article by Orin Kerr.)

Second, it seems self-evident that these confidential files are being shared on accident. The users “leaking” these files are either misunderstanding or misconfiguring their p2p clients in ways that would horrify them, if only they knew the truth. While this doesn’t translate directly into “exceeds authorized access,” it might weigh heavily in court, especially if the government can show that a reasonable searcher/downloader would immediately and unambiguously understand that the files were shared on accident.

Third, let’s be realistic: there may be judges who are so troubled by what they see as the harm caused by Wikileaks that they might be willing to read the open-textured and mostly undefined terms of the CFAA broadly if it might help throw a hurdle in Wikileaks’ way. I’m not saying that judges will bend the law to the facts, but I think that with a law as vague as the CFAA, multiple interpretations are defensible.

But I restate my conclusion: I think a prosecution under the CFAA against someone for searching a p2p network should fail. The text and caselaw of the CFAA don’t support such a prosecution. Maybe it’s “not a slam dunk either way,” as I am quoted saying in the story, but for the lawyers defending against such a theory, it’s at worst an easy layup.

Some Technical Clarifications About Do Not Track

When I last wrote here about Do Not Track in August, there were just a few rumblings about the possibility of a Do Not Track mechanism for online privacy. Fast forward four months, and Do Not Track has shot to the top of the privacy agenda among regulators in Washington. The FTC staff privacy report released in December endorsed the idea, and Congress was quick to hold a hearing on the issue earlier this month. Now, odds are quite good that some kind of Do Not Track legislation will be introduced early in this new congressional session.

While there isn’t yet a concrete proposal for Do Not Track on the table, much has already been written both in support of and against the idea in general, and it’s terrific to see the issue debated so widely. As I’ve been following along, I’ve noticed some technical confusion on a few points related to Do Not Track, and I’d like to address three of them here.

1. Do Not Track will most likely be based on an HTTP header.

I’ve read some people still suggesting that Do Not Track will be some form of a government-operated list or registry—perhaps of consumer names, device identifiers, tracking domains, or something else. This type of solution has been suggested before in an earlier conception of Do Not Track, and given its rhetorical likeness to the Do Not Call Registry, it’s a natural connection to make. But as I discussed in my earlier post—the details of which I won’t rehash here—a list mechanism is a relatively clumsy solution to this problem for a number of reasons.

A more elegant solution—and the one that many technologists seem to have coalesced around—is the use of a special HTTP header that simply tells the server whether the user is opting out of tracking for that Web request, i.e. the header can be set to either “on” or “off” for each request. If the header is “on,” the server would be responsible for honoring the user’s choice to not be tracked. Users would be able to control this choice through the preferences panel of the browser or the mobile platform.

2. Do Not Track won’t require us to “re-engineer the Internet.”

It’s also been suggested that implementing Do Not Track in this way will require a substantial amount of additional work, possibly even rising to the level of “re-engineering the Internet.” This is decidedly false. The HTTP standard is an extensible one, and it “allows an open-ended set of… headers” to be defined for it. Indeed, custom HTTP headers are used in many Web applications today.

How much work will it take to implement Do Not Track using the header? Generally speaking, not too much. On the client-side, adding the ability to send the Do Not Track header is a relatively simple undertaking. For instance, it only took about 30 minutes of programming to add this functionality to a popular extension for the Firefox Web browser. Other plug-ins already exist. Implementing this functionality directly into the browser might take a little bit longer, but much of the work will be in designing a clear and easily understandable user interface for the option.

On the server-side, adding code to detect the header is also a reasonably easy task—it takes just a few extra lines of code in most popular Web frameworks. It could take more substantial work to program how the server behaves when the header is “on,” but this work is often already necessary even in the absence of Do Not Track. With industry self-regulation, compliant ad servers supposedly already handle the case where a user opts out of their behavioral advertising programs, the difference now being that the opt-out signal comes from a header rather than a cookie. (Of course, the FTC could require stricter standards for what opting-out means.)

Note also that contrary to some suggestions, the header mechanism doesn’t require consumers to identify who they are or otherwise authenticate to servers in order to gain tracking protection. Since the header is a simple on/off flag sent with every Web request, the server doesn’t need to maintain any persistent state about users or their devices’ opt-out preferences.

3. Microsoft’s new Tracking Protection feature isn’t the same as Do Not Track.

Last month, Microsoft announced that its next release of Internet Explorer will include a privacy feature called Tracking Protection. Mozilla is also reportedly considering a similar browser-based solution (although a later report makes it unclear whether they actually will). Browser vendors should be given credit for doing what they can from within their products to protect user privacy, but their efforts are distinct from the Do Not Track header proposal. Let me explain the major difference.

Browser-based features like Tracking Protection basically amount to blocking Web connections from known tracking domains that are compiled on a list. They don’t protect users from tracking by new domains (at least until they’re noticed and added to the tracking list) nor from “allowed” domains that are tracking users surreptitiously.

In contrast, the Do Not Track header compels servers to cooperate, to proactively refrain from any attempts to track the user. The header could be sent to all third-party domains, regardless of whether the domain is already known or whether it actually engages in tracking. With the header, users wouldn’t need to guess whether a domain should be blocked or not, and they wouldn’t have to risk either allowing tracking accidentally or blocking a useful feature.

Tracking Protection and other similar browser-based defenses like Adblock Plus and NoScript are reasonable, but incomplete, interim solutions. They should be viewed as complementary with Do Not Track. For entities under FTC jurisdiction, Do Not Track could put an effective end to the tracking arms race between those entities and browser-based defenses—a race that browsers (and thus consumers) are losing now and will be losing in the foreseeable future. For those entities outside FTC jurisdiction, blocking unwanted third parties is still a useful though leaky defense that maintains the status quo.

Information security experts like to preach “defense in depth” and it’s certainly vital in this case. Neither solution fully protects the user, so users really need both solutions to be available in order to gain more comprehensive protection. As such, the upcoming features in IE and Firefox should not be seen as a technical substitute for Do Not Track.

——

To reiterate: if the technology that implements Do Not Track ends up being an HTTP header, which I think it should be, it would be both technically feasible and relatively simple. It’s also distinct from recent browser announcements about privacy in that Do Not Track forces server cooperation, while browser-based defenses work alone to fend off tracking.

What other technical issues related to Do Not Track remain murky to readers? Feel free to leave comments here, or if you prefer on Twitter using the #dntrack tag and @harlanyu.

Monitoring all the electrical and hydraulic appliances in your house

Dan Wallach recently wrote about his smart electric meter, which keeps track of the second-by-second current draw of his whole house. But what he might like to know is, exactly what appliance is on at what time? How could you measure that?

You might think that one would have to instrument each different circuit at the breaker box, or every individual electric plug at the outlet. This would be expensive, not particularly for all the little sensors but for the labor of an electrician to install everything.

Recent “gee whiz” research by Professor Shwetak Patel‘s group at the University of Washington provides a really elegant solution. Every appliance you own–your refrigerator, your flat-screen TV, your toaster–has a different “electrical noise signature” that it draws from the wires in your house. When you turn it on, this signal is (inadvertently) sent through the electric wires to the circuit-breaker box. It’s not necessary (as one commenter suggested) to buy “smart appliances” that send purpose-designed on-off signals; your “dumb” appliances already send their own noise signatures.

Patel’s group built a device that you plug in to an electrical outlet, which figures out when your appliances are turning on and off. The device is equipped with a database of common signatures (it can tell one brand of TV from another!) and with machine-learning algorithms that figure out the unique characteristics of your particular devices (if you have two “identical” Toshiba TVs, it can tell them apart!). Patel’s device could be an extremely useful “green technology” to help consumers painlessly reduce their electricity consumption.

Patel can do the same trick on your water pipes. Each toilet flush or shower faucet naturally sends a different acoustic pressure signal, and a single sensor can monitor all your devices.

Of course, in addition to the “green” advantages of this technology, there are privacy implications. Even without your consent, the electric company and the water company are permitted to continuously measure your use of electricity and water; taken to the extreme, this monitoring alone could tell them exactly when you use each and every device in your house.

Court Rules Email Protected by Fourth Amendment

Today, the United States Court of Appeals for the Sixth Circuit ruled that the contents of the messages in an email inbox hosted on a provider’s servers are protected by the Fourth Amendment, even though the messages are accessible to an email provider. As the court puts it, “[t]he government may not compel a commercial ISP to turn over the contents of a subscriber’s emails without first obtaining a warrant based on probable cause.”

This is a very big deal; it marks the first time a federal court of appeals has extended the Fourth Amendment to email with such care and detail. Orin Kerr calls the opinion, at least on his initial read, “quite persuasive” and “likely . . . influential,” and I agree, but I’d go further: this is the opinion privacy activists and many legal scholars, myself included, have been waiting and calling for, for more than a decade. It may someday be seen as a watershed moment in the extension of our Constitutional rights to the Internet.

And it may have a more immediate impact on Capitol Hill, because in its ruling the Sixth Circuit also declares part of the Stored Communications Act (SCA) of the Electronic Communications Privacy Act unconstitutional. 18 U.S.C. 2703(b) allows the government to obtain email messages with less than a search warrant. This section has been targeted for amendment by the Digital Due Process coalition of companies, privacy groups, and academics (I have signed on) for precisely the reason now attacked by this opinion, because it allows warrantless government access to communications stored online. I am sure some congressional staffers are paying close attention to this opinion, and I hope it helps clear the way for an amendment to the SCA, to fix a now-declared unconstitutional law, if not during the lame duck session, then early in the next Congressional term.

Update: Other reactions from Dissent and the EFF.

On Facebook Apps Leaking User Identities

The Wall Street Journal today reports that many Facebook applications are handing over user information—specifically, Facebook IDs—to online advertisers. Since a Facebook ID can easily be linked to a user’s real name, third party advertisers and their downstream partners can learn the names of people who load their advertisement from those leaky apps. This reportedly happens on all ten of Facebook’s most popular apps and many others.

The Journal article provides few technical details behind what they found, so here’s a bit more about what I think they’re reporting.

The content of a Facebook application, for example FarmVille, is loaded within an iframe on the Facebook page. An iframe essentially embeds one webpage (FarmVille) inside another (Facebook). This means that as you play FarmVille, your browser location bar will show http://apps.facebook.com/onthefarm, but the iframe content is actually controlled by the application developer, in this case by farmville.com.

The content loaded by farmville.com in the iframe contains the game alongside third party advertisements. When your browser goes to fetch the advertisement, it automatically forwards to the third party advertiser “referer” information—that is, the URL of the current page that’s loading the ad. For FarmVille, the URL referer that’s sent will look something like:

http://fb-tc-2.farmville.com/flash.php?…fb_sig_user=[User’s Facebook ID]…

And there’s the issue. Because of the way Zynga (the makers of FarmVille) crafts some of its URLs to include the user’s Facebook ID, the browser will forward this identifying information on to third parties. I confirmed yesterday evening that using FarmVille does indeed transmit my Facebook ID to a few third parties, including Doubleclick, Interclick and socialvi.be.

Facebook policy prohibits application developers from passing this information to advertising networks and other third parties. In addition, Zynga’s privacy policy promises that “Zynga does not provide any Personally Identifiable Information to third-party advertising companies.”

But evidence clearly indicates otherwise.

What can be done about this? First, application developers like Zynga can simply stop including the user’s Facebook ID in the HTTP GET arguments, or they can place a “#” mark before the sensitive information in the URL so browsers don’t transmit this information automatically to third parties.

Second, Facebook can implement a proxy scheme, as proposed by Adrienne Felt more than two years ago, where applications would not receive real Facebook IDs but rather random placeholder IDs that are unique for each application. Then, application developers can be free do whatever they want with the placeholder IDs, since they can no longer be linked back to real user names.

Third, browser vendors can give users easier and better control over when HTTP referer information is sent. As Chris Soghoian recently pointed out, browser vendors currently don’t make these controls very accessible to users, if at all. This isn’t a direct solution to the problem but it could help. You could imagine a privacy-enhancing opt-in browser feature that turns off the referer header in all cross-domain situations.

Some may argue that this leak, whether inadvertent or not, is relatively innocuous. But allowing advertisers and other third parties to easily and definitively correlate a real name with an otherwise “anonymous” IP address, cookie, or profile is a dangerous path forward for privacy. At the very least, Facebook and app developers need to be clear with users about their privacy rights and comply with their own stated policies.