February 24, 2018

Archives for February 2010

What Third Parties Know About John Doe

As David mentioned in his previous post, plaintiffs’ lawyers in online defamation suits will typically issue a sequence of two “John Doe” subpoenas to try to unmask the identity of anonymous online speakers. The first subpoena goes to the website or content provider where the allegedly defamatory remarks were posted, and the second subpoena is sent to the speaker’s ISP. Both entities—the content provider and the ISP—are natural targets for civil discovery. Their logs together will often contain enough information to trace the remarks back to the speaker’s real identity. But when this isn’t enough to identify the speaker, the discovery process traditionally fails.

Are plaintiffs in these cases out of luck? Not if their lawyers know where else to look.

There are numerous third party web services that may hold just enough clues to reidentify the speaker, even without the help of the content provider or the ISP. The vast majority of websites today depend on third parties to deliver valuable services that would otherwise be too expensive or time-consuming to develop in-house. Services such as online advertising, content distribution and web analytics are almost always handled by specialized servers from third party businesses. As such, a third party can embed its service into a wide variety of sites across the web, allowing it to track users across all the sites where it maintains a presence.

Take for example the popular online blog Boing Boing. Upon loading its main page while recording the HTTP session, I noticed that my browser was automatically redirected to domains owned by no fewer than 17 distinct third party entities: 10 services that engage in advertising or marketing, five that embed media or integrate social networking functionality, and two that provide web analytics. By visiting this single webpage, my digital footprints have been scattered to and collected by at least 17 other online entities that I made no deliberate attempt to contact. And each of these entities will likely have stored a cookie on my web browser, allowing it to identify me uniquely later when I browse to one of its other partner sites. I don’t mean to pick on Boing Boing specifically: taking advantage of third party services is a nearly universal practice on the web today. But it’s exactly this pervasiveness that makes it so plausible, if not probable, that all of my digital footprints together could link much of my online activities back to my actual identity.

To make this point concrete, let’s say I post a potentially defamatory remark about someone using a pseudonym in the comments section of a Boing Boing article. It happens that for each article, Boing Boing displays the number of times that the article has been shared on Facebook. In order to fetch the current number, Boing Boing redirects my browser to api.facebook.com to make a real-time query to the Facebook API. Since I happen to be logged in to Facebook at the time of the request, my browser forwards with the query my unique Facebook cookie, which includes information that explicitly identifies me—namely, my e-mail address that doubles as my Facebook username.

In order to integrate a bit of useful social networking functionality, Boing Boing enables Facebook, a third party in this situation, to learn which articles I visit on Boing Boing and the dates and times of my visits. The same is true for Tweetmeme, which can now positively link my Twitter account—which I’m also logged in to—with my Boing Boing visits. Even without an authenticated login, the 15 other third parties present on Boing Boing could track me using any number of different methods, including browser fingerprinting, to build detailed dossiers that slowly begin to piece together who I am.
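To illustrate how a single embedded third party could stitch visits together, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the log format, the cookie IDs, and the example URLs are hypothetical, and no real service is known to store its data this way. The idea is only that a request carrying the same cookie, plus a Referer header naming the embedding page, is enough to group page views by browser.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical request log as a third party might see it: the cookie it set
# earlier, plus the Referer header naming the page that embedded its content.
requests_seen = [
    {"cookie_id": "c-101", "referer": "https://boingboing.net/article-a", "time": "2010-02-17T18:01"},
    {"cookie_id": "c-202", "referer": "https://boingboing.net/article-a", "time": "2010-02-17T18:02"},
    {"cookie_id": "c-101", "referer": "https://news.example/story-9", "time": "2010-02-17T18:30"},
    {"cookie_id": "c-101", "referer": "https://boingboing.net/article-b", "time": "2010-02-18T09:15"},
]


def build_profiles(log):
    """Group every observed page view under the cookie that accompanied it."""
    profiles = defaultdict(list)
    for entry in log:
        site = urlsplit(entry["referer"]).hostname
        profiles[entry["cookie_id"]].append((entry["time"], site, entry["referer"]))
    return dict(profiles)


profiles = build_profiles(requests_seen)

# All sites on which the browser holding cookie "c-101" was observed.
sites_for_c101 = sorted({site for _, site, _ in profiles["c-101"]})
print(sites_for_c101)
```

If the cookie also belongs to a logged-in account, as in the Facebook example above, the profile is tied to a named person rather than just to a browser.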

From the perspective of a plaintiff’s lawyer, even if Boing Boing is unwilling or unable to produce any useful information, these third parties might be able to uniquely identify me as the likely defamer, or at least narrow the list of possible speakers down to a handful of users. But tracing speech is not always this easy. Tomorrow, I’ll discuss more complicated discovery strategies and the extent to which they are technically feasible.

Identifying John Doe: It might be easier than you think

Imagine that you want to sue someone for what they wrote, anonymously, in a web-based online forum. To succeed, you’ll first have to figure out who they really are. How hard is that task? It’s a question that Harlan Yu, Ed Felten, and I have been kicking around for several months. We’ve come to some tentative answers that surprised us, and that may surprise you.

Until recently, I thought the picture was very grim for would-be plaintiffs, writing that it should be simple for “even a non-technical Internet user to engage in effectively untraceable speech online.” I still think it’s feasible for most users, if they make enough effort, to remain anonymous despite any level of scrutiny they are practically likely to face. But in recent months, as Harlan, Ed, and I have discussed this issue, we’ve started to see a flip side to the coin: In many situations, it may be far easier to unmask apparently anonymous online speakers than they, I, or many others in the policy community have appreciated. Today, I’ll tell a story that helps explain what I mean.

Anonymous online speech is a mixed bag: it includes some high value speech such as political dissent in repressive regimes, some dreck we happily tolerate on First Amendment grounds, and some material that violates the laws of many jurisdictions, including child pornography and defamatory speech. For purposes of this discussion, let’s focus on cases like the recent AutoAdmit controversy, in which a plaintiff wishes to bring a defamation suit against an anonymous or pseudonymous poster to a web based discussion forum. I’ll assume, as in the AutoAdmit suit, that the plaintiff has at least a facially plausible legal claim, so that if everyone’s identity were clear, it would also be clear that the plaintiff would have the legal option to bring a defamation suit. In the online context, these are usually what’s called “John Doe” suits, because the plaintiff’s lawyer does not know the name of the defendant in the suit, and must use “John Doe” as a stand in name for the defendant. After filing a John Doe suit, the plaintiff’s lawyer can use subpoenas to force third parties to reveal information that might help identify the John Doe defendant.

In situations like these, if a plaintiff’s lawyer cannot otherwise determine who the poster is, the lawyer will typically subpoena the forum web site, seeking the IP address of the anonymous poster. Many widely used web based discussion systems, including for example the popular WordPress blogging platform, routinely log the IP addresses of commenters. If the web site is able to provide an IP address for the source of the allegedly defamatory comment, the lawyer will do a reverse lookup, a WHOIS search, or both, on that IP address, hoping to discover that the IP address belongs to a residential ISP or another organization that maintains detailed information about its individual users. If the IP address does turn out to correspond to a residential ISP — rather than, say, to an open wifi hub at a coffee shop or library — then the lawyer will issue a second subpoena, asking the ISP to reveal the account details of the user who was using that IP address at the time it was used to transmit the potentially defamatory comment. This is known as a “subpoena chain” because it involves two subpoenas (one to the web site, and a second one, based on the results of the first, to the ISP).
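The reverse lookup step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the IP addresses come from ranges reserved for documentation, the hostname is invented, and `fake_resolver` is a hypothetical offline stand-in so the example runs without network access. In real use the default resolver, `socket.gethostbyaddr`, would query DNS, and a WHOIS search would be a separate step not shown here.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the hostname registered for an IP address, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None
    return hostname


def fake_resolver(ip):
    """Hypothetical stand-in for DNS, mimicking gethostbyaddr's return shape."""
    table = {"203.0.113.7": "pool-7.dsl.isp.example"}
    if ip not in table:
        raise socket.herror(1, "Unknown host")
    return table[ip], [], [ip]


# A hostname that looks like a residential pool suggests a second subpoena
# to the ISP; a failed lookup leaves the lawyer with WHOIS or nothing.
print(reverse_lookup("203.0.113.7", resolver=fake_resolver))
print(reverse_lookup("198.51.100.20", resolver=fake_resolver))
```

A hostname is only a hint about who operates the address. It does not by itself show whether the operator keeps per-user records.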

Of course, in many cases, this method won’t work. The forum web site may not have logged the commenter’s IP address. Or, even if an address is available, it might not be readily traceable back to an ISP account: the anonymous commenter may have been using an anonymization tool like Tor to hide his address. Or he may have been coming online from a coffee shop or similarly public place (which typically will not have logged information about its transient users). Or, even if he reached the web forum directly from his own ISP, that ISP might be located in a foreign jurisdiction, beyond the reach of an American lawyer’s usual legal tools.

Is this a dead end for the plaintiff’s lawyer, who wants to identify John Doe? Probably not. There is a range of other parties, not yet part of our story, who might have information that could help identify John Doe. When it comes to the AutoAdmit site, one of these parties is StatCounter.com, a web traffic measurement service that AutoAdmit uses to keep track of trends in its traffic over time.

At the moment I am writing this post, anyone can verify that AutoAdmit uses StatCounter by visiting AutoAdmit.com and choosing “View Source” from the web browser menu. The first screenful of web page code that comes up includes a block of text helpfully labeled “StatCounter Code,” which in turn runs a small piece of JavaScript that places a personalized StatCounter cookie on the machine of every user who visits AutoAdmit, or else (if one is already present) detects and records exactly which cookie it is. That’s how StatCounter can tell which visitors to AutoAdmit.com are new, which ones are returning, and which pages on the site are of greatest interest to new and returning users. StatCounter is in a position to track not only each user, but also each page, and each visit by a user to a certain page, over time. This includes not only the home page, but also the particular web page for each discussion “thread” on the site. Moreover, each post (even if anonymous) is marked with the time it was posted, down to the minute. So the plaintiff’s lawyer in our story could go to StatCounter, and ask only about visits to the particular thread where the relevant message was posted. If the post went up at 6:03 p.m. on a certain date, the lawyer could ask StatCounter, “What if anything do you know about the person who visited this web page at 6:03 p.m. on this date?” Of course, if John Doe’s browser is configured to refuse cookies, he wouldn’t be trackable. But most web based discussion sites, including AutoAdmit, rely on cookies to let people log in to their pseudonymous accounts in order to post comments in the first place. In any case, the web is a much less convenient place without cookies, and as a practical matter most users do allow them.
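The lawyer’s question amounts to a simple filter over a visit log. The Python sketch below assumes a hypothetical log of (cookie, page, time) records; the field names, cookie IDs, thread path, and date are all invented for illustration, and nothing here reflects how StatCounter actually stores or exposes its data.

```python
from datetime import datetime

# Hypothetical visit log kept by an analytics provider.
visits = [
    {"cookie_id": "sc-1", "page": "/thread/123", "time": datetime(2010, 2, 1, 18, 3, 12)},
    {"cookie_id": "sc-2", "page": "/thread/123", "time": datetime(2010, 2, 1, 18, 3, 47)},
    {"cookie_id": "sc-3", "page": "/thread/123", "time": datetime(2010, 2, 1, 18, 9, 5)},
    {"cookie_id": "sc-4", "page": "/thread/999", "time": datetime(2010, 2, 1, 18, 3, 30)},
]


def visitors_at_minute(log, page, when):
    """Return the cookies that loaded `page` during the minute containing `when`."""
    target = when.replace(second=0, microsecond=0)
    return sorted(
        {
            v["cookie_id"]
            for v in log
            if v["page"] == page
            and v["time"].replace(second=0, microsecond=0) == target
        }
    )


# "Who visited this thread at 6:03 p.m. on this date?"
candidates = visitors_at_minute(visits, "/thread/123", datetime(2010, 2, 1, 18, 3))
print(candidates)
```

On a busy thread this filter may return several cookies, so the timestamp narrows the field rather than always picking out a single visitor.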

In fact, the lawyer may be able to do better still: The anonymous commenter will have accessed the page at least twice — once to view the discussion as it stood before he took part, and again after clicking the button to add his own post to the mix. If StatCounter recorded both visits, as it very likely would have, then it becomes even easier to tie the anonymous commenter to his StatCounter cookie (and to whatever browsing history StatCounter has associated with that cookie).
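The two-visit pattern can narrow the field further. As a rough sketch under the same hypothetical log format (invented cookie IDs, paths, and times, and an arbitrary 15-minute lookback that is purely an assumption), one could keep only the cookies that loaded the thread during the posting minute and also loaded it at least once shortly before:

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical visit log; all values are invented for illustration.
visits = [
    {"cookie_id": "sc-1", "page": "/thread/123", "time": datetime(2010, 2, 1, 17, 58, 40)},
    {"cookie_id": "sc-1", "page": "/thread/123", "time": datetime(2010, 2, 1, 18, 3, 12)},
    {"cookie_id": "sc-2", "page": "/thread/123", "time": datetime(2010, 2, 1, 18, 3, 47)},
    {"cookie_id": "sc-3", "page": "/thread/123", "time": datetime(2010, 2, 1, 17, 50, 0)},
    {"cookie_id": "sc-3", "page": "/thread/123", "time": datetime(2010, 2, 1, 17, 55, 0)},
]


def likely_posters(log, page, posted_minute, lookback=timedelta(minutes=15)):
    """Cookies seen on `page` in the posting minute and at least twice in the window."""
    start = posted_minute - lookback
    end = posted_minute + timedelta(minutes=1)
    in_window = [v for v in log if v["page"] == page and start <= v["time"] < end]
    counts = Counter(v["cookie_id"] for v in in_window)
    at_post = {v["cookie_id"] for v in in_window if v["time"] >= posted_minute}
    return sorted(c for c in at_post if counts[c] >= 2)


suspects = likely_posters(visits, "/thread/123", datetime(2010, 2, 1, 18, 3))
print(suspects)
```

Here sc-2 drops out because it appears only once, and sc-3 drops out because neither of its visits falls in the posting minute. A reader who happened to refresh the page at the right moment would still match, so the result is a shortlist, not proof.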

There are a huge number of things to discuss here, and we’ll tackle several in the coming days. What would a web analytics provider like StatCounter know? Likely answers include IP addresses, times, and durations for the anonymous commenter’s previous visits to AutoAdmit. What about other, similar services, used by other sites? What about “beacons” that simply and silently collect data about users, and pay webmasters for the privilege? What about behavioral advertisers, whose business model involves tracking users across multiple sites and developing knowledge of their browsing habits and interests? What about content distribution networks? How would this picture change if John Doe were taking affirmative steps, such as using Tor, to obfuscate his identity?

These are some of the questions that we’ll try to address in future posts.

CITP Seeks Visiting Faculty, Scholars or Policy Experts for 2010-2011

The Center for Information Technology Policy (CITP) at Princeton University seeks candidates for positions as visiting faculty members or researchers, or postdoctoral research associates for the 2010-2011 academic year.

About CITP

Digital technologies and public life are constantly reshaping each other—from net neutrality and broadband adoption, to copyright and file sharing, to electronic voting and beyond.

Realizing digital technology’s promise requires a constant sharing of ideas, competencies and norms among the technical, social, economic and political domains.

The Center for Information Technology Policy is Princeton University’s effort to meet this challenge. Its new home, which opened in September 2008, is a state of the art facility designed from the ground up for openness and collaboration. Located at the intellectual and physical crossroads of Princeton’s engineering and social science communities, the Center’s research, teaching and public programs are building the intellectual and human capital that our technological future demands.

To see what this mission can mean in practice, take a look at our website, at http://citp.princeton.edu.

About the Search

The Center has secured limited resources from a range of sources to support visiting faculty, scholars or policy experts for up to one-year appointments during the 2010-2011 academic year. We are interested in applications from academic faculty and researchers as well as from individuals who have practical experience in the policy arena. The rank and status of the successful applicant(s) will be determined on a case-by-case basis. We are particularly interested in hearing from faculty members at other universities and from individuals who have first-hand experience in public service in the technology policy area.

The successful applicant(s) will conduct research, engage in public programs, and may teach a seminar during their appointment subject to review and approval by the Dean of the Faculty. They’ll play an important role at a pivotal time in the development of this new center. They may be appointed to a visiting faculty or visiting fellow position, a term-limited research position, or a postdoctoral appointment, depending on qualifications.

We are happy to hear from anyone who works at the intersection of digital technology and public life. In addition to our existing strengths in computer science and sociology, we are particularly interested in identifying engineers, economists, lawyers, civil servants and policy analysts whose research interests are complementary to our existing activities.

If you are interested, please submit a CV and cover letter, stating background, intended research, and salary requirements, to https://jobs.princeton.edu.

Princeton University is an equal opportunity employer and complies with applicable EEO and affirmative action regulations. For information about applying to Princeton and voluntarily self-identifying, please see http://www.princeton.edu/dof/about_us/dof_job_openings/

Deadline: March 1, 2010.