April 17, 2014

Identifying John Doe: It might be easier than you think

Imagine that you want to sue someone for what they wrote, anonymously, in a web-based online forum. To succeed, you’ll first have to figure out who they really are. How hard is that task? It’s a question that Harlan Yu, Ed Felten, and I have been kicking around for several months. We’ve come to some tentative answers that surprised us, and that may surprise you.

Until recently, I thought the picture was very grim for would-be plaintiffs, writing that it should be simple for “even a non-technical Internet user to engage in effectively untraceable speech online.” I still think it’s feasible for most users, if they make enough effort, to remain anonymous despite any level of scrutiny they are practically likely to face. But in recent months, as Harlan, Ed, and I have discussed this issue, we’ve started to see a flip side to the coin: In many situations, it may be far easier to unmask apparently anonymous online speakers than they, I, or many others in the policy community have appreciated. Today, I’ll tell a story that helps explain what I mean.

Anonymous online speech is a mixed bag: it includes some high-value speech such as political dissent in repressive regimes, some dreck we happily tolerate on First Amendment grounds, and some material that violates the laws of many jurisdictions, including child pornography and defamatory speech. For purposes of this discussion, let’s focus on cases like the recent AutoAdmit controversy, in which a plaintiff wishes to bring a defamation suit against an anonymous or pseudonymous poster to a web-based discussion forum. I’ll assume, as in the AutoAdmit suit, that the plaintiff has at least a facially plausible legal claim, so that if everyone’s identity were clear, it would also be clear that the plaintiff would have the legal option to bring a defamation suit. In the online context, these are usually what’s called “John Doe” suits, because the plaintiff’s lawyer does not know the name of the defendant in the suit, and must use “John Doe” as a stand-in name for the defendant. After filing a John Doe suit, the plaintiff’s lawyer can use subpoenas to force third parties to reveal information that might help identify the John Doe defendant.

In situations like these, if a plaintiff’s lawyer cannot otherwise determine who the poster is, the lawyer will typically subpoena the forum web site, seeking the IP address of the anonymous poster. Many widely used web-based discussion systems, including for example the popular WordPress blogging platform, routinely log the IP addresses of commenters. If the web site is able to provide an IP address for the source of the allegedly defamatory comment, the lawyer will do a reverse lookup, a WHOIS search, or both, on that IP address, hoping to discover that the IP address belongs to a residential ISP or another organization that maintains detailed information about its individual users. If the IP address does turn out to correspond to a residential ISP (rather than, say, to an open wifi hub at a coffee shop or library), then the lawyer will issue a second subpoena, asking the ISP to reveal the account details of the user who was using that IP address at the time it was used to transmit the potentially defamatory comment. This is known as a “subpoena chain” because it involves two subpoenas (one to the web site, and a second one, based on the results of the first, to the ISP).
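To make the “reverse lookup” step concrete, here is a minimal Python sketch. It is only an illustration, not a description of what any lawyer or investigator actually runs: the function names are hypothetical, the address 203.0.113.7 is a reserved documentation address, and the lookup uses whatever DNS resolver the local machine is configured with. A WHOIS search, which reports the organization an address block is registered to, would be a separate step and is not shown.

```python
import ipaddress
import socket


def ptr_name(ip: str) -> str:
    """Return the DNS name that a reverse (PTR) lookup of `ip` queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str, resolver=socket.gethostbyaddr):
    """Return the hostname registered for `ip`, or None if there is none.

    `resolver` is injectable (an assumption made for this sketch) so the
    function can be exercised without network access.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:  # covers socket.herror and socket.gaierror
        return None


# The name a resolver would ask about for this address:
print(ptr_name("203.0.113.7"))
```

A hostname that follows an ISP's naming pattern for customer connections would suggest a residential account; a missing or generic record would send the lawyer to WHOIS instead.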

Of course, in many cases, this method won’t work. The forum web site may not have logged the commenter’s IP address. Or, even if an address is available, it might not be readily traceable back to an ISP account: the anonymous commenter may have been using an anonymization tool like Tor to hide his address. Or he may have been coming online from a coffee shop or similarly public place (which typically will not have logged information about its transient users). Or, even if he reached the web forum directly from his own ISP, that ISP might be located in a foreign jurisdiction, beyond the reach of an American lawyer’s usual legal tools.

Is this a dead end for the plaintiff’s lawyer, who wants to identify John Doe? Probably not. There is a range of other parties, not yet part of our story, who might have information that could help identify John Doe. When it comes to the AutoAdmit site, one of these parties is StatCounter.com, a web traffic measurement service that AutoAdmit uses to keep track of trends in its traffic over time.

At the moment I am writing this post, anyone can verify that AutoAdmit uses StatCounter by visiting AutoAdmit.com and choosing “View Source” from the web browser menu. The first screenful of web page code that comes up includes a block of text helpfully labeled “StatCounter Code,” which in turn runs a small piece of JavaScript that places a personalized StatCounter cookie on the machine of every user who visits AutoAdmit, or else (if one is already present) detects and records exactly which cookie it is. That’s how StatCounter can tell which visitors to AutoAdmit.com are new, which ones are returning, and which pages on the site are of greatest interest to new and returning users. StatCounter is in a position to track not only each user, but also each page, and each visit by a user to a certain page, over time. This includes not only the home page, but also the particular web page for each discussion “thread” on the site. Moreover, each post (even if anonymous) is marked with the time it was posted, down to the minute. So the plaintiff’s lawyer in our story could go to StatCounter, and ask only about visits to the particular thread where the relevant message was posted. If the post went up at 6:03 p.m. on a certain date, the lawyer could ask StatCounter, “What if anything do you know about the person who visited this web page at 6:03 p.m. on this date?” Of course, if John Doe’s browser is configured to refuse cookies, he wouldn’t be trackable. But most web-based discussion sites, including AutoAdmit, rely on cookies to let people log in to their pseudonymous accounts in order to post comments in the first place. In any case, the web is a much less convenient place without cookies, and as a practical matter most users do allow them.
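The lawyer’s question amounts to a simple filter over an analytics log. The Python sketch below shows the idea under loudly stated assumptions: the `Visit` record, its field names, the thread URL, and the sample entries are all invented for illustration, and nothing here reflects how StatCounter actually stores or exposes its data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Visit:
    """One page load as an analytics service might record it (hypothetical)."""
    cookie_id: str
    ip: str
    url: str
    when: datetime


def visits_in_posting_minute(visits, url, posted_at):
    """Return the visits to `url` that fall in the same minute as `posted_at`."""
    minute = posted_at.replace(second=0, microsecond=0)
    return [
        v for v in visits
        if v.url == url and v.when.replace(second=0, microsecond=0) == minute
    ]


# Invented sample data: a thread URL and three logged page loads.
THREAD = "http://forum.example/thread/123"
LOG = [
    Visit("c-17", "198.51.100.4", THREAD, datetime(2010, 2, 8, 17, 41, 9)),
    Visit("c-42", "203.0.113.7", THREAD, datetime(2010, 2, 8, 18, 3, 20)),
    Visit("c-99", "192.0.2.55", "http://forum.example/", datetime(2010, 2, 8, 18, 3, 31)),
]

# The post is stamped 6:03 p.m., so ask about that minute on that thread.
matches = visits_in_posting_minute(LOG, THREAD, datetime(2010, 2, 8, 18, 3))
for v in matches:
    print(v.cookie_id, v.ip, v.when)
```

On a busy thread more than one visitor may land in the same minute, so a query like this narrows the field rather than settling the question by itself.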

In fact, the lawyer may be able to do better still: The anonymous commenter will have accessed the page at least twice — once to view the discussion as it stood before he took part, and again after clicking the button to add his own post to the mix. If StatCounter recorded both visits, as it very likely would have, then it becomes even easier to tie the anonymous commenter to his StatCounter cookie (and to whatever browsing history StatCounter has associated with that cookie).
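The two-visit pattern can be expressed as a slightly stricter filter: keep only cookies that loaded the thread during the posting minute and also loaded it earlier. The Python sketch below is an illustration only. The log format (tuples of cookie ID, URL, and time), the 30-minute lookback window, and the sample data are assumptions introduced here, not anything StatCounter is known to provide.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def likely_posters(visits, url, posted_at, lookback=timedelta(minutes=30)):
    """Return cookie IDs that hit `url` in the posting minute and also earlier.

    `visits` is an iterable of (cookie_id, url, when) tuples. The lookback
    window is an arbitrary choice made for this sketch.
    """
    minute_start = posted_at.replace(second=0, microsecond=0)
    minute_end = minute_start + timedelta(minutes=1)

    # Group the hits on this thread by cookie.
    hits = defaultdict(list)
    for cookie_id, page, when in visits:
        if page == url:
            hits[cookie_id].append(when)

    suspects = set()
    for cookie_id, times in hits.items():
        in_minute = [t for t in times if minute_start <= t < minute_end]
        if not in_minute:
            continue  # never loaded the thread when the post went up
        last = max(in_minute)
        # Require an earlier load: the view before the comment was submitted.
        if any(minute_start - lookback <= t < last for t in times):
            suspects.add(cookie_id)
    return suspects


# Invented sample data.
THREAD = "http://forum.example/thread/123"
POSTED = datetime(2010, 2, 8, 18, 3)
LOG = [
    ("cookie-A", THREAD, datetime(2010, 2, 8, 17, 58, 10)),  # reads the thread
    ("cookie-A", THREAD, datetime(2010, 2, 8, 18, 3, 20)),   # page reloads after posting
    ("cookie-B", THREAD, datetime(2010, 2, 8, 18, 3, 45)),   # a passer-by, one load only
    ("cookie-C", THREAD, datetime(2010, 2, 8, 17, 40, 0)),
    ("cookie-C", THREAD, datetime(2010, 2, 8, 17, 50, 0)),   # left before the post
]

suspects = likely_posters(LOG, THREAD, POSTED)
print(suspects)
```

Requiring both loads excludes a reader who merely happened to open the thread in the same minute, which is why the second visit makes the match more convincing.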

There are a huge number of things to discuss here, and we’ll tackle several in the coming days. What would a web analytics provider like StatCounter know? Likely answers include IP addresses, times, and durations for the anonymous commenter’s previous visits to AutoAdmit. What about other, similar services, used by other sites? What about “beacons” that simply and silently collect data about users, and pay webmasters for the privilege? What about behavioral advertisers, whose business model involves tracking users across multiple sites and developing knowledge of their browsing habits and interests? What about content distribution networks? How would this picture change if John Doe were taking affirmative steps, such as using Tor, to obfuscate his identity?

These are some of the questions that we’ll try to address in future posts.

Comments

  1. jan says:

    You mention that the anonymous commenter might have used Tor to hide her IP address: in the most common setting, using Firefox + Torbutton + Tor, javascript and java are disabled; and Tor-state cookies are isolated from non-Tor cookies. Therefore, she would not be tracked by web-analytics providers or advertisement networks on the forum page.

  2. GaryM says:

    The StatCounter scenario would be circumvented if the user had NoScript or something similar to selectively disable JavaScript by site. I use NoScript and avoid enabling scripting for domains that do cross-site tracking.

    I have my browser configured to forget all cookies at the end of a session, which limits any site’s ability to track me by cookies alone.

  3. rp says:

    And once you’ve got the browsing history from statcounter (or whatever other tool is available) you can (I would assume) subpoena any site where the perp might have left identifying information, whether other forums or shopping sites. This could get around some of the difficulties of getting a determinative assignment of IP to identity from the ISP.

    The question, it seems, is what level of screening someone has to use to maintain a reasonable expectation of anonymity — use a free wifi connection for scurrilous posts, use a different browser, use Tor, use a different computer, all of the above…

  4. Fred von Lohmann says:

    On this general subject, I’d call your attention to the article in WIRED magazine about Evan Ratliff’s effort to “disappear.” Despite his technical savvy and his use of tools like Tor, he was ultimately caught by a Twitter + Facebook hack. One could certainly imagine setting bait for online defamers using similar tricks (e.g., a Facebook group claiming to defame or defend the victim).

  5. Debianero Rumbero says:

    In spite of all those no-script, no-cookie, etc. precautions, people tend to forget local shared objects (LSOs).

    Guess what? LSOs are used by all versions of Adobe Flash Player.

  6. Mrten says:

    Besides cookies, there are lots of other ways to identify users: user-agent strings, installed plugins, screen size, system fonts. The only question that remains is: what gets logged?

    See this site for an eye-opening test: https://panopticlick.eff.org/

  7. golden says:

    This is why you should always install the firefox plugins:

    *) BetterPrivacy: which deletes all LSO cookies when you close your browser
    *) Ghostery: which shows you whose bugs are on every page and allows you to turn any and all of them off
    *) TACO: opts you out of all the advertising network cookies

    You should also set your browser to clear all cookies when you close it.

    Not that this gets you all the way there, but it at least makes you a bit harder to track.

  8. Whoever says:

    Most people post both anonymous writings in some forums and non-anonymous writings in other forums. When will it be possible, or is it now possible for people to be identified by writing style?

  9. Alvaro Del Hoyo says:

    So are IP addresses personal data or personally identifiable information?
    In Europe (Spain), it will depend on whether this lawyer’s subpoena chain could be considered a “reasonable effort,” or whether the lawyer would be successful in a “reasonable term.”
    In my opinion, information is personal whether or not the subject is identified or identifiable. Information is personal if its treatment could affect the subject’s rights. So IP addresses should be considered personal data, and privacy policies should cover this kind of data.
    Regards

  10. jan says:

    Tor + Firefox + Torbutton disables active content such as Adobe Flash by default. An attacker could not access LSO in this way.
    Stylometrics is more promising. If the commenter has written enough, analyzing her writing style is very useful, especially when the context is a strong community, or a specialized subject matter.

  11. Arvind Narayanan says:

    Do you know of a single article/paper anywhere that contains the whole story of the Autoadmit controversy, with all the minutiae? Thanks.

  12. Natanael L says:

    My cut-n-paste-and-type-some list of addons and stuff to edit in Firefox for privacy:

    Good old cookies!!!
    Firefox is set to “Always Ask”. When the inevitable cookie popups occur, the vast majority of domains receive a permanent Deny and go away “for ever” as far as cookies are concerned. Always Ask also has a secondary benefit (see AdBlock entry below) in detecting and removing potential privacy threat vectors.

    Addons:

    NoScript – Blocks JavaScript, adds a whole lot of privacy back. Lets you allow JS per page as well as temporarily per page

    BetterPrivacy – deletes all LSO cookies when you close your browser

    Ghostery – shows you whose bugs are on every page and allows you to turn any and all of them off

    TACO: opts you out of all the advertising network cookies

    RequestPolicy – Blocks all requests to other domains the way NoScript blocks JavaScript. Note that it may be a real head-buster. Attempts on http://www.site.com to fetch images from img.site.com will be blocked by default…
    http://www.requestpolicy.com/

    AdBlock Plus – Use the subscription EasyPrivacy+EasyList. “any site that results in cookie popups from “suspicious” third-party domains earn themselves a close inspection with AdBlocks “List Blockable Content” window. They are typically found to be ad-servers or behavioral-trackers of some type, and are gleefully added to AdBlock. They never even got the chance to run their scripts (see previous entry). Stat counters are summarily “terminated” this way.”

    RefControl – “Referer, what referer? Unless, of course, http://news.google.com/ gets me over the wall and in the garden. Again, some work-related sites get exemptions.”

    User Agent Switcher – Why tell server stuff like this about you: “Mozilla/5.0 (Windows; U; Windows NT 6.1; ru-RU; rv:1.9.2) Gecko/20100105 MRA 5.6 (build 03278) Firefox/3.6 (.NET CLR 3.5.30729)” – When you can tell this: “Mozilla/5.0 (en-US) Gecko Firefox/3.6”? (http://user-agent-string.info/parse)

    * Don’t forget to visit https://panopticlick.eff.org/ !!!

    And of course, feel free to try TorButton, maybe even with Tor. :P

    And more:

    ” Tracking by writing style? – Comment by Whoever on February 9th, 2010 at 1:30 am.
    Most people post both anonymous writings in some forums and non-anonymous writings in other forums. When will it be possible, or is it now possible for people to be identified by writing style?”
    Yeah, it is possible! Gotta figure out a way to neutralize this one …

    ” LSO and writing style – Comment by jan on February 9th, 2010 at 3:03 am.
    [...] Stylometrics is more promising. If the commenter has written enough, analyzing her writing style is very useful, specially when the context is a strong community, or a specialized subject matter.”

  13. Not-so-anonymous says:

    Wow, that is eye-opening. I normally don’t click links in web forums, but since I trust eff.org, I thought I would give Panopticlick a try. I don’t do a lot of weird things, so I was a bit surprised at first that it claimed I was completely unique among the over 600K that have been tested. As I read through the collected data it became very clear how not-so-anonymous I am. The web browser lists the fonts I have installed, and I have three in-house custom-made fonts on this computer, one I made myself that is on only maybe a dozen or so systems in the entire world. Yep, I guess I am pretty easy to track that way.