
Archives for July 2009

A Freedom-of-Speech Approach To Limiting Filesharing – Part I: Filesharing and Spam

[Today we kick off a series of three guest posts by Mitch Golden. Mitch was a professor of physics when, in 1995, he was bitten by the Internet bug and came to New York to become an entrepreneur and consultant. He has worked on a variety of Internet enterprises, including one in the filesharing space. As usual, the opinions expressed in these posts are Mitch’s alone. — Ed]

The battle between the record labels and filesharers has been somewhat out of the news of late, but it rages on still. There is an ongoing court case, Arista Records v LimeWire, in which a group of record labels is suing to have LimeWire held accountable for the copyright infringement done by its users. Though this case has attracted less attention than similar cases before it, it may raise interesting issues not addressed in previous cases. Though I am a technologist, not a lawyer, this series of posts will advocate a way of looking at the issues, including the legal ones, using a freedom-of-speech-based approach, which leads to some unusual conclusions.

Let’s start by reviewing some salient features of filesharing.

Filesharing is a way for a group of people – who generally do not know one another – to allow one another to see what files they collectively have on their machines, and to exchange desired files with each other. There are at least two components to a filesharing system: one that allows a user who is looking for a particular file to see if someone has it, and another that allows the file to be transferred from one machine to the other.
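To make those two components concrete, here is a toy sketch in Python – my illustration only, with invented peer names, not the gnutella protocol described just below – of a flood-style search that asks a peer and its neighbors who has a file, followed by a direct transfer from whichever peer answers.

    # Toy sketch of the two components: decentralized search plus direct transfer.
    # Invented names; this is not the gnutella protocol.
    class Peer:
        def __init__(self, name, files=None):
            self.name = name
            self.files = dict(files or {})   # filename -> file contents
            self.neighbors = []              # other peers this one knows about

        def search(self, filename, ttl=3):
            """Return a peer holding the file, asking neighbors up to ttl hops away."""
            if filename in self.files:
                return self
            if ttl > 0:
                for neighbor in self.neighbors:
                    hit = neighbor.search(filename, ttl - 1)
                    if hit is not None:
                        return hit
            return None

        def download(self, filename, source):
            """Copy the file directly from the peer that has it; no central server involved."""
            self.files[filename] = source.files[filename]

    # Three peers; none of them is a central index.
    alice, bob, carol = Peer("alice"), Peer("bob"), Peer("carol", {"song.mp3": b"..."})
    alice.neighbors = [bob]
    bob.neighbors = [carol]

    hit = alice.search("song.mp3")
    if hit is not None:
        alice.download("song.mp3", hit)
        print("alice got song.mp3 from", hit.name)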

One of the most popular filesharing programs in current use is LimeWire, which uses a protocol called gnutella. Gnutella is decentralized, in the sense that neither the search nor the exchange of files requires any central server. It is possible, therefore, for people to exchange copyrighted files – in violation of the law – without creating any log of the search or exchange in a central repository.

The gnutella protocol was originally created by developers from Nullsoft, the company that had developed the popular music player WinAmp, shortly after it was acquired by AOL. AOL was at that time merging with Time Warner, a huge media company, and so the idea that they would be distributing a filesharing client was quite unamusing to management. Work was immediately discontinued; however, the source for the client and the implementation of the protocol had already been released under the GPL, and so development continued elsewhere. LimeWire made improvements both to the protocol and the interface, and their client became quite popular.

The decentralized structure of filesharing does not serve a technical purpose. In general, centralized searching is simpler, quicker and more efficient, and so, for example, to search the web we use Google or Yahoo, which are gigantic repositories. In filesharing, the decentralized search structure instead serves a legal purpose: to diffuse the responsibility so that no particular individual or organization can be held accountable for promoting the illegal copying of copyrighted materials. At the time the original development was going on, the Napster case was in the news, in which the first successful filesharing service was being sued by the record labels. That case ended a few months later with Napster being shut down, as the US courts held it (a centralized search repository) responsible for the copyright-infringing filesharing its users were doing.

Whatever their legal or technical advantages, decentralized networks, by virtue of their openness, are vulnerable to a common problem: spam. For example, because anyone may send anyone else an e-mail, we are all subject to a deluge of messages trying to sell us penny stocks and weight loss remedies. Filesharing too is subject to this sort of cheating. If someone is looking for, say, Rihanna’s recording Disturbia, and downloads an mp3 file that purports to be such, what’s to stop a spammer from instead serving a file with an audio ad for a Canadian pharmacy?

Spammers on the filesharing networks, however, have more than just the usual commercial motivations in mind. In general, there are four categories of fake files that find their way onto the network.

  • Commercial spam
  • Pornography and ads for pornography
  • Viruses and trojans
  • Spoof files

The last of these has no real analogue to anything people receive in e-mail. It works as follows: if, for example, Rihanna’s record label wants to prevent you from downloading Disturbia, they might hire a company called MediaDefender. MediaDefender’s business is to put as many spoof files as possible on gnutella that purport to be Disturbia, but instead contain useless noise. If MediaDefender can succeed in flooding the network so that the real Disturbia is a needle in a haystack, then the record label has thwarted gnutella’s users from violating their copyright.
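A back-of-the-envelope calculation, with numbers I have made up purely for illustration, shows why the flooding strategy works: if spoofs outnumber genuine copies a hundred to one, a user picking search results at random needs on the order of a hundred downloads to find the real thing.

    # Illustrative numbers only: the effect of flooding search results with spoofs.
    genuine_copies = 50
    spoof_copies = 5000

    p_genuine = genuine_copies / (genuine_copies + spoof_copies)
    expected_downloads = 1 / p_genuine   # geometric distribution: tries until the first hit

    print(f"chance a randomly chosen result is genuine: {p_genuine:.1%}")   # ~1.0%
    print(f"expected downloads before a real copy: {expected_downloads:.0f}")   # ~101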

Since people are still using filesharing, clearly a workable solution has been found to the problem of spoof files. In tomorrow’s post, I discuss this solution, and in the following post, I suggest its legal ramifications.

If You're Going to Track Me, Please Use Cookies

Web cookies have a bad name. People often complain — with good reason — about sites using cookies to track them. Today I want to say a few words in favor of tracking cookies.

[Technical background: An HTTP “cookie” is a small string of text. When your web browser gets a file from a site, the site can send along a cookie. Your browser stores the cookie. Later, if the browser gets another file from the same site, the browser will send along the cookie.]
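As a rough illustration of that round trip (the cookie name and value here are made up), this minimal sketch uses only the Python standard library:

    # Minimal sketch of the cookie round trip; invented cookie name and value.
    from http.cookies import SimpleCookie

    # 1. The site's response includes a Set-Cookie header.
    set_cookie_header = "session_id=abc123; Path=/; HttpOnly"

    # 2. The browser stores the cookie.
    jar = SimpleCookie()
    jar.load(set_cookie_header)

    # 3. On a later request to the same site, the browser sends it back.
    cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
    print("Cookie:", cookie_header)   # -> Cookie: session_id=abc123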

What’s important about cookies, for our purposes, is that they allow a site to tell when it’s seeing the same browser (and therefore, probably, the same user) that it saw before. This has benign uses — it’s needed to implement the shopping cart feature of e-commerce sites (so the site knows which cart is yours) and to remember that you have logged in to a site so you don’t have to log in over and over.

The dark side of cookies involves “hidden” sites that track your activities across the web. Suppose you go to A.com, and A.com’s site includes a banner ad that is provided by the advertising service AdService.com. Later, you go to B.com, and B.com also includes a banner ad provided by AdService.com. When you’re reading A.com and your browser goes to AdService.com to get an ad, AdService.com gives you a cookie. Later, when you’re reading B.com and your browser goes back to AdService.com to get an ad, AdService.com will see the cookie it gave you earlier. This will allow AdService.com to link together your visits to A.com and B.com. Ad services that place ads on lots of sites can link together your activities across all of those sites, by using a “tracking cookie” in this way.
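Here is a toy model – my own sketch, not any real ad service’s code – of what the linking looks like on AdService.com’s end: one cookie, one profile, and a growing list of sites where that browser has been seen.

    # Toy model of a third-party ad server linking page views by cookie.
    import uuid

    profiles = {}   # cookie value -> sites on which this browser loaded our ads

    def serve_ad(cookie, referring_site):
        """Handle an ad request embedded in referring_site; return the cookie to set."""
        if cookie is None:
            cookie = uuid.uuid4().hex            # first sighting: issue a fresh ID
        profiles.setdefault(cookie, []).append(referring_site)
        return cookie                            # sent back in a Set-Cookie header

    # The same browser loads ads on two different sites:
    c = serve_ad(None, "A.com")   # browser receives a tracking cookie
    serve_ad(c, "B.com")          # the same cookie comes back from a different site
    print(profiles[c])            # -> ['A.com', 'B.com']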

The obvious response is to limit or regulate the use of tracking cookies — the government could limit them, industry could self-regulate, or users could shun sites that associate themselves with tracking cookies.

But this approach could easily backfire. It turns out that there are lots of ways for a site to track users, by recognizing something distinctive about the user’s computer or by placing a unique marker on the computer and recognizing it later. These other tracking mechanisms are hard to detect — new tracking methods are discovered regularly — and unlike cookies they can be hard for users to manage. The tools for viewing, blocking, and removing cookies are far from perfect, but at least they exist. Other tracking measures leave users nearly defenseless.
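One example of recognizing something distinctive about the user’s computer is to hash together attributes the browser reveals anyway. The attribute values below are invented, and real scripts collect far more, but the point stands: there is no cookie for the user to view, block, or remove.

    # Illustrative sketch of cookieless tracking by fingerprinting; invented values.
    import hashlib

    attributes = {
        "user_agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/532 ...",
        "screen": "1280x800x24",
        "timezone": "UTC-5",
        "plugins": "Flash 10.0, QuickTime, Java",
    }

    fingerprint = hashlib.sha1(
        "|".join(f"{k}={v}" for k, v in sorted(attributes.items())).encode()
    ).hexdigest()

    print(fingerprint)   # the same browser tends to hash to the same value on the next visit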

My attitude, as a user, is that if a site is going to track me, I want them to do it openly, using cookies. Cookies offer me less transparency and control than I would like, but the alternatives are worse.

If I were writing a self-regulation code for the industry, I would have the code require that cookies be the only means used to track users across sites.

Thoughtcrime Experiments

Cosmic rays can flip bits in memory cells or processor datapaths. Once upon a time, Sudhakar and I asked the question, “can an attacker exploit rare and random bit-flips to bypass a programming language’s type protections and thereby break out of the Java sandbox?”


A recently published science-fiction anthology, Thoughtcrime Experiments, contains a story, “Single-Bit Error,” inspired by our research paper. What if you could use cosmic-ray bit flips in neurons to bypass the “type protections” of human rationality?

In addition to 9 stories and 6 original illustrations, the anthology is interesting for another reason. It’s an experiment in do-it-yourself paying-the-artists high-editorial-standards open-source Creative-Commons print-on-demand publishing. Theorists such as Yochai Benkler have explained that production costs attributable to communications and coordination have been reduced into the noise by the Internet, and that this enables “peer production” that was not possible back in the 19th and 20th centuries. Now the Appendix to Thoughtcrime Experiments explains how to edit and produce your own anthology, complete with a sample publication contract.

It’s not all honey and roses, of course. The authors got paid, but the editors didn’t! The Appendix presents data on how many hours they spent “for free”. In addition, if you look closely, you’ll see that the way the authors got paid is that the editors spent their own money.

Still, part of the new theory of open-source peer-production asks questions like, “What motivates people to produce technical or artistic works? What mechanisms do they use to organize this work? What is the quality of the work produced, and how does it contribute to society? What are the legal frameworks that will encourage such work?” This anthology and its appendix provide an interesting datapoint for the theorists.

Assorted targeted spam

You can run, but you can’t hide. Here are a few of the latest things I’ve seen, in no particular order.

  • On a phpBB-style chat board which I sometimes frequent, there was a thread about do-it-yourself television repair, dormant for over a year. Recently, there was a seemingly robotic post, from a brand-new user, that was still on-topic, giving general diagnosis advice and offering to sell parts for TV repair. The spam was actually somewhat germane to the main thread of the discussion. Is it still spam?
  • In my email, I recently got a press release for a local fried chicken franchise celebrating its 40th anniversary. My blogging output generally doesn’t extend to writing restaurant reviews (tempting as that might be), although I do sometimes link to foodie things from Google Reader, which also show up in my public FriendFeed. Spam or not spam?