What Third Parties Know About John Doe

February 9, 2010 by Harlan Yu

As David mentioned in his previous post, plaintiffs’ lawyers in online defamation suits will typically issue a sequence of two “John Doe” subpoenas to try to unmask the identity of anonymous online speakers. The first subpoena goes to the website or content provider where the allegedly defamatory remarks were posted, and the second subpoena is sent to the speaker’s ISP. Both entities—the content provider and the ISP—are natural targets for civil discovery. Their logs together will often contain enough information to trace the remarks back to the speaker’s real identity. But when this isn’t enough to identify the speaker, the discovery process traditionally fails.

Are plaintiffs in these cases out of luck? Not if their lawyers know where else to look.

There are numerous third party web services that may hold just enough clues to reidentify the speaker, even without the help of the content provider or the ISP. The vast majority of websites today depend on third parties to deliver valuable services that would otherwise be too expensive or time-consuming to develop in-house. Services such as online advertising, content distribution and web analytics are almost always handled by specialized servers from third party businesses. As a result, a third party can embed its service into a wide variety of sites across the web, allowing it to track users across all the sites where it maintains a presence.

Take for example the popular blog Boing Boing. When I loaded its main page while recording the HTTP session, I noticed that my browser automatically sent requests to domains owned by no fewer than 17 distinct third party entities: 10 services that engage in advertising or marketing, five that embed media or integrate social networking functionality, and two that provide web analytics. By visiting this single webpage, I scattered digital footprints that were collected by at least 17 other online entities that I made no deliberate attempt to contact. Each of these entities will likely have stored a cookie in my web browser, allowing it to identify me uniquely later when I browse to one of its other partner sites. I don’t mean to pick on Boing Boing specifically; taking advantage of third party services is a nearly universal practice on the web today. But it’s exactly this pervasiveness that makes it possible, if not probable, that all of my digital footprints together could link much of my online activities back to my actual identity.
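A count like this can be reproduced from any recorded page load. The sketch below is only an illustration: the URLs are hypothetical stand-ins rather than the actual Boing Boing session, and `base_domain` uses a naive "last two labels" rule where a real tool would consult the Public Suffix List.

```python
from urllib.parse import urlparse

def base_domain(url):
    """Naive registrable domain: the last two labels of the hostname.

    Assumption: adequate for simple .com/.net names only; a real tool
    would use the Public Suffix List instead.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def third_party_domains(page_url, request_urls):
    """Return the sorted base domains contacted that differ from the page's own."""
    first_party = base_domain(page_url)
    return sorted({base_domain(u) for u in request_urls} - {first_party})

# Hypothetical recording of one page load; every path and the example-* hosts
# are made up for illustration.
requests_seen = [
    "http://boingboing.net/",
    "http://boingboing.net/style.css",
    "http://api.facebook.com/hypothetical-share-count",
    "http://ads.example-adnetwork.com/banner.js",
    "http://cdn.example-adnetwork.com/creative.png",
    "http://stats.example-analytics.com/pixel.gif",
]

print(third_party_domains("http://boingboing.net/", requests_seen))
```

Grouping by base domain rather than by hostname keeps a single company that serves content from several subdomains from being counted more than once.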

To make this point concrete, let’s say that, using a pseudonym, I post a potentially defamatory remark about someone in the comments section of a Boing Boing article. It happens that for each article, Boing Boing displays the number of times that the article has been shared on Facebook. To fetch the current number, the Boing Boing page directs my browser to api.facebook.com to make a real-time query to the Facebook API. Since I happen to be logged in to Facebook at the time of the request, my browser sends my unique Facebook cookie along with the query. That cookie includes information that explicitly identifies me: my e-mail address, which doubles as my Facebook username.

In order to integrate a bit of useful social networking functionality, Boing Boing enables Facebook, a third party in this situation, to learn which articles I visit on Boing Boing and the dates and times of my visits. The same is true for Tweetmeme, which can now positively link my Twitter account—which I’m also logged in to—with my Boing Boing visits. Even without an authenticated login, the 15 other third parties present on Boing Boing could track me using any number of different methods, including browser fingerprinting, to build detailed dossiers that slowly begin to piece together who I am.
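To see why a cookie plus a record of the embedding page is enough to build such a dossier, consider what a third party's own request log might look like. The following is a minimal sketch under stated assumptions: the record layout, cookie values, and URLs are all hypothetical, and no real service's log format is implied.

```python
from collections import defaultdict

# Hypothetical third-party log: one record per embedded request.
# "referer" is the page that embedded the third party's content.
log = [
    {"cookie": "u-1001", "referer": "http://boingboing.net/article-a", "time": "2010-02-09T14:02:11"},
    {"cookie": "u-2002", "referer": "http://boingboing.net/article-a", "time": "2010-02-09T14:05:40"},
    {"cookie": "u-1001", "referer": "http://example-news.com/story-7", "time": "2010-02-09T15:30:02"},
    {"cookie": "u-1001", "referer": "http://boingboing.net/article-b", "time": "2010-02-09T16:12:55"},
]

def build_profiles(records):
    """Group page visits by cookie value, in chronological order."""
    profiles = defaultdict(list)
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    for rec in sorted(records, key=lambda r: r["time"]):
        profiles[rec["cookie"]].append((rec["time"], rec["referer"]))
    return dict(profiles)

profiles = build_profiles(log)
for when, page in profiles["u-1001"]:
    print(when, page)
```

If the cookie belongs to an authenticated account, as in the Facebook case, each profile is tied to a named user; otherwise it is tied only to a browser until some other clue links the two.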

From the perspective of a plaintiff’s lawyer, even if Boing Boing is unwilling or unable to produce any useful information, these third parties might be able to uniquely identify me as the likely defamer, or at least narrow the list of possible speakers down to a handful of users. But tracing speech is not always this easy. Tomorrow, I’ll discuss more complicated discovery strategies and the extent to which they are technically feasible.
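The narrowing step described above can be sketched as a simple filter over a third party's log: keep only the cookies seen on the article in question shortly before the comment appeared. Everything here is an assumption made for illustration, including the log fields, the ten-minute window, and the function name `candidates`.

```python
from datetime import datetime, timedelta

def candidates(records, article_url, posted_at, window_minutes=10):
    """Return cookie IDs seen on article_url within window_minutes before posted_at."""
    earliest = posted_at - timedelta(minutes=window_minutes)
    return sorted({
        rec["cookie"]
        for rec in records
        if rec["referer"] == article_url
        and earliest <= datetime.fromisoformat(rec["time"]) <= posted_at
    })

# Hypothetical third-party log entries (cookie values and URLs are made up).
third_party_log = [
    {"cookie": "u-1001", "referer": "http://boingboing.net/article-a", "time": "2010-02-09T14:02:11"},
    {"cookie": "u-2002", "referer": "http://boingboing.net/article-a", "time": "2010-02-09T14:05:40"},
    {"cookie": "u-3003", "referer": "http://boingboing.net/article-a", "time": "2010-02-09T09:15:00"},
    {"cookie": "u-4004", "referer": "http://boingboing.net/article-b", "time": "2010-02-09T14:04:00"},
]

comment_time = datetime(2010, 2, 9, 14, 7, 0)
print(candidates(third_party_log, "http://boingboing.net/article-a", comment_time))
```

A filter like this yields a list of possible speakers rather than a single identification; how short that list is depends on how busy the article was around the time of the comment.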

Comments

tz says

February 9, 2010 at 9:18 pm

Add “betterprivacy” to take care of Flash cookies (of course I also use flashblock), and noredirect, which makes me confirm each jump.
You can whine (slanderously) wearing a balaclava, but your voice won’t be masked.
There is a difference between pseudonymous and anonymous.
The average user doesn’t necessarily know it, and there are evil entities, but then there’s a show called “the world’s dumbest criminals” for a reason.

There might be some mild expectation of anonymity, but if needed I would expect people to do at least a bit of study.
- tz says
  
  February 10, 2010 at 11:40 am
  
  The “ghostery” extension shows and blocks “web bugs” – though some of them need to be turned off (e.g. comments on blogspot disappear if some of the more generic google API things are disabled – which is how I was reminded about this).
Anonymous says

February 9, 2010 at 6:54 pm

Here are my browsing SOPs:

Cookies:

Firefox is set to “Always Ask”. When the inevitable cookie popups occur, the vast majority of domains receive a permanent Deny and go away “for ever” as far as cookies are concerned. Always Ask also has a secondary benefit (see AdBlock entry below) in detecting and removing potential privacy threat vectors.

After cookies, my Holy Trinity of Firefox plugins are:

NoScript:

The default is Off. A surprisingly high number of sites function quite well without Javascript, arguably even better (faster, less clutter) than they do with Javascript running. Of the rest, the vast majority (Expedia comes to mind) require me to temporarily (for the life of the browser session) turn on Javascript, and only for the primary domain. There are only a few (5 or so), work-related sites that get to have Javascript permanently switched on.

AdBlock Plus:

In addition to EasyList, any site that results in cookie popups from “suspicious” third-party domains earns itself a close inspection with AdBlock’s “List Blockable Content” window. The domains are typically found to be ad-servers or behavioral-trackers of some type, and are gleefully added to AdBlock. They never even got the chance to run their scripts (see previous entry). Stat counters are summarily “terminated” this way.

RefControl

Referer, what referer? Unless, of course, http://news.google.com/ gets me over the wall and in the garden. Again, some work-related sites get exemptions.

I’m under no illusions that even with the non-default settings above I’m actually browsing “privately”. I’m not aware of any browser that really provides any decent “privacy leakage” management tools. Chrome’s process-per-tab is sort of a starting point, but I’m thinking more along the lines of third-party-request controls.

But they are, at least, some reasonable lines of defense.
- Natanael L says
  
  February 10, 2010 at 7:36 pm
  
  RequestPolicy!
  http://www.requestpolicy.com/
  
  Try it out and tell me how it went.
  
  (Please do note that it’s by faaar “worse” than NoScript in terms of blocking things – if you hate building white lists of sites manually, don’t bother using it)
  - Anonymous says
    
    February 11, 2010 at 11:43 am
    
    Taking it out for a spin today, thanks.
