October 14, 2024

May 14-15: Future of News workshop

We’re excited to announce a workshop on “The Future of News”, to be held May 14 and 15 in Princeton. It’s sponsored by the Center for InfoTech Policy at Princeton.

Confirmed speakers include Kevin Anderson, David Blei, Steve Borriss, Dan Gillmor, Matthew Hurst, Markus Prior, David Robinson, Clay Shirky, Paul Starr, and more to come.

The Internet—whose greatest promise is its ability to distribute and manipulate information—is transforming the news media. What’s on offer, how it gets made, and how end users relate to it are all in flux. New tools and services allow people to be better informed and more instantly up to date than ever before, opening the door to an enhanced public life. But the same factors that make these developments possible are also undermining the institutional rationale and economic viability of traditional news outlets, leaving profound uncertainty about how the possibilities will play out.

Our tentative topics for panels are:

  • Data mining, visualization, and interactivity: To what extent will new tools for visualizing and artfully presenting large data sets reduce the need for human intermediaries between facts and news consumers? How can news be presented via simulation and interactive tools? What new kinds of questions can professional journalists ask and answer using digital technologies?
  • Economics of news: How will technology-driven changes in advertising markets reshape the news media landscape? Can traditional, high-cost methods of newsgathering support themselves through other means? To what extent will action-guiding business intelligence and other “private journalism”, designed to create information asymmetries among news consumers, supplant or merge with globally accessible news?
  • The people formerly known as the audience: How effectively can users collectively create and filter the stream of news information? How much of journalism can or will be “devolved” from professionals to networks of amateurs? What new challenges do these collective modes of news production create? Could informal flows of information in online social networks challenge the idea of “news” as we know it?
  • The medium’s new message: What are the effects of changing news consumption on political behavior? What does a public life populated by social media “producers” look like? How will people cope with the new information glut?

Registration: Registration, which is free, carries two benefits: We’ll have a nametag waiting for you when you arrive, and — this is the important part — we’ll feed you lunch on both days. To register, please contact CITP’s program assistant, Laura Cummings-Abdo, at . Include your name, affiliation and email address.

Online Symposium: Voluntary Collective Licensing of Music

Today we’re kicking off an online symposium on voluntary collective licensing of music, over at the Center for InfoTech Policy site.

The symposium is motivated by recent movement in the music industry toward the possibility of licensing large music catalogs to consumers for a fixed monthly fee. For example, Warner Music, one of the major record companies, just hired Jim Griffin to explore such a system, in which Internet Service Providers would pay a per-user fee to record companies in exchange for allowing the ISPs’ customers to access music freely online. The industry had previously opposed collective licenses, making them politically non-viable, but the policy logjam may be about to break, making this a perfect time to discuss the pros and cons of various policy options.

It’s an issue that evokes strong feelings – just look at the comments on David’s recent post.

We have a strong group of panelists:

  • Matt Earp is a graduate student in the i-school at UC Berkeley, studying the design and implementation of voluntary collective licensing systems.
  • Ari Feldman is a Ph.D. candidate in computer science at Princeton, studying computer security and information policy.
  • Ed Felten is a Professor of Computer Science and Public Affairs at Princeton.
  • Jon Healey is an editorial writer at the Los Angeles Times and writes the paper’s Bit Player blog, which focuses on how technology is changing the entertainment industry’s business models.
  • Samantha Murphy is an independent singer/songwriter and Founder of SMtvMusic.com.
  • David Robinson is Associate Director of the Center for InfoTech Policy at Princeton.
  • Fred von Lohmann is a Senior Staff Attorney at the Electronic Frontier Foundation, specializing in intellectual property matters.
  • Harlan Yu is a Ph.D. candidate in computer science at Princeton, working at the intersection of computer science and public policy.

Check it out!

Phorm’s Harms Extend Beyond Privacy

Last week, I wrote about the privacy concerns surrounding Phorm, an online advertising company that has teamed up with British ISPs to track users’ Web behavior from within their networks. A new report has since emerged with technical details about its Webwise system, and it’s not just privacy that now seems to be at risk. The report exposes a system that actively degrades the user experience and alters users’ interactions with content providers. Even more importantly, the Webwise system is a clear violation of the sacred end-to-end principle that guides the core architectural design of the Internet.

Phorm’s system does more than just passively gain “access to customers’ browsing records” as previously suggested. Instead, Phorm plans to install a network switch at each participating ISP that actively interferes with the user’s browsing session by injecting multiple URL redirections before the user can retrieve the requested content. Sparing you most of the nitty-gritty technical details, the switch intercepts the initial HTTP request to the content server to check whether a Webwise cookie – containing the user’s randomly assigned identifier (UID) – exists in the browser. It then impersonates the requested server to trick the browser into accepting a spoofed cookie (which I will explain later) that contains the same UID. Only then will the switch forward the request and return the actual content to the user. Basically, this amounts to a big technical hack by Phorm to set the cookies that track users as they browse the Web.

In all, a user’s initial request is redirected three times for each domain that is contacted. Though this may not seem like much, this extra layer of indirection harms the user by degrading the overall browsing experience. It imposes an unnecessary delay that will likely be noticeable to users.
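To make the reported flow concrete, here is a minimal, hypothetical simulation of the three-redirect dance described above. All names (`Browser`, `Switch`, `webwise.example`, the cookie names) are illustrative inventions of mine, not Phorm’s actual code; the sketch only mirrors the behavior the report describes.

```python
import uuid

class Browser:
    """A toy browser that follows redirects and stores cookies."""
    def __init__(self):
        self.cookies = {}                    # keyed by (domain, cookie_name)

    def get(self, url, switch):
        redirects = 0
        while True:
            resp = switch.handle(self.cookies, url)
            self.cookies.update(resp.get("set_cookies", {}))
            if "redirect" in resp:
                url, redirects = resp["redirect"], redirects + 1
            else:
                return resp["content"], redirects

class Switch:
    """The in-network box sitting between the browser and content servers."""
    PHORM = "webwise.example"                # hypothetical Phorm domain

    def handle(self, cookies, url):
        domain, _, marker = url.partition("?")
        if domain == self.PHORM:
            # Hop 2: Phorm's own domain reads or assigns the UID cookie,
            # then bounces the browser back with the UID tagged in the URL.
            uid = cookies.get((self.PHORM, "uid")) or str(uuid.uuid4())
            return {"set_cookies": {(self.PHORM, "uid"): uid},
                    "redirect": f"{marker}?uid={uid}"}
        if marker.startswith("uid="):
            # Hop 3: impersonate the content server to plant a spoofed
            # first-party cookie carrying the same UID, then strip the tag.
            uid = marker[len("uid="):]
            return {"set_cookies": {(domain, "webwise"): uid},
                    "redirect": domain}
        if (domain, "webwise") not in cookies:
            # Hop 1: no spoofed cookie for this domain yet; detour via Phorm.
            return {"redirect": f"{self.PHORM}?{domain}"}
        # Cookies in place: forward the request and serve the real content.
        return {"content": f"page from {domain}"}
```

Running the first visit to any domain through this sketch takes three redirects before content arrives, and leaves the same UID planted in a spoofed first-party cookie for every domain visited, which is the crux of the complaint.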

The spoofed cookie that Phorm stores on the user’s browser during this process is also a highly questionable practice. Generally speaking, a cookie is specific to a particular domain and the browser typically ensures that a cookie can only be read and written by the domain it belongs to. For example, data in a yahoo.com cookie is only sent when you contact a yahoo.com server, and only a yahoo.com server can put data into that cookie.

But since Phorm controls the switch at the ISP, it can bypass this usual guarantee by impersonating the server to add cookies for other domains. To continue the example, the switch (1) intercepts the user’s request, (2) pretends to be a yahoo.com server, and (3) injects a new yahoo.com cookie that contains the Phorm UID. The browser, believing the cookie to actually be from yahoo.com, happily accepts and stores it. This cookie is used later by Phorm to identify the user whenever the user visits any page on yahoo.com.

Cookie spoofing is problematic because it can change the interaction between the user and the content-providing site. Suppose a site’s privacy policy promises the user that it does not use tracking cookies. But because of Phorm’s spoofing, the browser will store a cookie that (to the user) looks exactly like a tracking cookie from the site. Now, the switch typically strips out this tracking cookie before it reaches the site, but if the user moves to a non-Phorm ISP (say at work), the cookie will actually reach the site in violation of its stated privacy policy. The cookie can also cause other problems, such as a cookie collision if the site cookie inadvertently has the same name as the Phorm cookie.
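The leak scenario above can be sketched in a few lines. This is my own illustration, not Phorm’s code: the point is simply that the spoofed cookie lives in the browser, and only the in-network switch removes it in transit.

```python
def cookies_received_by_site(browser_cookies, via_phorm_switch):
    """Return the cookies the content site actually sees for its own domain."""
    sent = dict(browser_cookies)
    if via_phorm_switch:
        # The switch strips its own tracking cookie before forwarding.
        sent.pop("webwise", None)
    return sent

# A browser holding the site's own session cookie plus the spoofed one.
stored = {"session": "abc123", "webwise": "uid-42"}   # "uid-42": the Phorm UID

at_home = cookies_received_by_site(stored, via_phorm_switch=True)   # Phorm ISP
at_work = cookies_received_by_site(stored, via_phorm_switch=False)  # other ISP
```

On the Phorm ISP the site never sees the `webwise` cookie; from any other network it arrives looking exactly like the site’s own tracking cookie, contradicting the site’s stated policy.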

Disruptive activities inside the network often create this sort of unexpected problem for both users and websites, which is why computer scientists are skeptical of ideas that violate the end-to-end principle. For the uninitiated, the principle, in short, states that system functionality should almost always be implemented at the end hosts of the network, with a few justifiable exceptions. For instance, almost all security functionality (such as data encryption and decryption) is done by the end hosts and only rarely by machines inside the network.

The Webwise system has no business being inside the network and has no role in transporting packets from one end of the network to the other. The technical Internet community has been worried for years about the slow erosion of the end-to-end principle, particularly by ISPs who are looking to further monetize their networks. This principle is the one upon which the Internet is built and one which the ISPs must uphold. Phorm’s system, nearly in production, is a cogent realization of this erosion, and ISPs should keep Phorm outside the gate.

NJ Election Discrepancies Worse Than Previously Thought, Contradict Sequoia’s Explanation

I wrote previously about discrepancies in the vote totals reported by Sequoia AVC Advantage voting machines in New Jersey’s presidential primary election, and the incomplete explanation offered by Sequoia, the voting machine vendor. I published copies of the “summary tapes” printed by nine voting machines in Union County that showed discrepancies; all of them were consistent with Sequoia’s explanation of what went wrong.

This week we obtained six new summary tapes, from machines in Bergen and Gloucester counties. Two of these new tapes contradict Sequoia’s explanation and show more serious discrepancies than we saw before.

Before we dig into the details, let’s review some background. At the end of Election Day, each Sequoia AVC Advantage voting machine prints a “summary tape” (or “results report”) that lists (among other things) the number of votes cast for each candidate on that machine, and the total voter turnout (number of votes cast) in each party. In the Super Tuesday primary, a few dozen machines in New Jersey showed discrepancies in which the number of votes recorded for candidates in one party exceeded the voter turnout in that party. For example, the vote totals section of a tape might show 61 total votes for Republican candidates, while the turnout section of the same tape shows only 60 Republican voters.

Sequoia’s explanation was that in certain circumstances, a voter would be allowed to vote in one party while being recorded in the other party’s turnout. (“It has been observed that the ‘Option Switch’ or Party Turnout Totals section of the Results Report may be misreported whereby turnout associated with the party or option switch choice is misallocated. In every instance, however, the total turnout, or the sum of the turnout allocation, is accurate.”) Sequoia’s memo points to a technical flaw that might cause this kind of misallocation.

The nine summary tapes I had previously were all consistent with Sequoia’s explanation. Though the total votes exceeded the turnout in one party, the votes were less than the turnout in the other party, so that the discrepancy could have been caused by misallocating turnout as Sequoia described. For example, a tape from Hillside showed 61 Republican votes cast by 60 voters, and 361 Democratic votes cast by 362 voters, for a total of 422 votes cast by 422 voters. Based on these nine tapes, Sequoia’s explanation, though incomplete, could have been correct.
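Sequoia’s explanation thus constrains what a tape can show: per-party turnout may be misallocated, but the grand totals must agree. Here is a small check of the Hillside figures quoted above; the function and variable names are mine, for illustration only.

```python
def fits_sequoia_explanation(votes_by_party, turnout_by_party):
    """Sequoia's claim: turnout may be misallocated between parties,
    but total votes must still equal total turnout."""
    return sum(votes_by_party.values()) == sum(turnout_by_party.values())

# Figures from the Hillside tape quoted above.
hillside_votes = {"Rep": 61, "Dem": 361}
hillside_turnout = {"Rep": 60, "Dem": 362}

# Per-party discrepancies exist (+1 Republican, -1 Democratic),
# but both grand totals come to 422, so the tape fits the explanation.
per_party_gap = {p: hillside_votes[p] - hillside_turnout[p]
                 for p in hillside_votes}
```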

But look at one of the new tapes, from Englewood Cliffs, District 4, in Bergen County. Here’s a relevant part of the tape:

The Republican vote totals are Giuliani 1, Paul 1, Romney 6, McCain 14, for a total of 22. The Democratic totals are Obama 33, Edwards 2, Clinton 49, for a total of 84. That comes to 106 total votes across the two parties.

The turnout section (or “Option Switch Totals”) shows 22 Republican voters and 83 Democratic voters, for a total of 105.

This is not only wrong – 106 votes cast by 105 voters – but it’s also inconsistent with Sequoia’s explanation. Sequoia says that all of the voters show up in the turnout section, but a few might show up in the wrong party’s turnout. (“In every instance, however, the total turnout, or the sum of the turnout allocation, is accurate.”) That’s not what we see here, so Sequoia’s explanation must be incorrect.
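The arithmetic on this tape can be checked directly; the variable names below are mine, but every number is taken from the tape as read above.

```python
# Vote totals from the Englewood Cliffs, District 4 tape.
republican = {"Giuliani": 1, "Paul": 1, "Romney": 6, "McCain": 14}
democratic = {"Obama": 33, "Edwards": 2, "Clinton": 49}
turnout = {"Rep": 22, "Dem": 83}     # "Option Switch Totals" section

total_votes = sum(republican.values()) + sum(democratic.values())
total_turnout = sum(turnout.values())

# Sequoia's explanation requires these totals to match; on this tape
# they differ by one vote.
discrepancy = total_votes - total_turnout
```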

And that’s not all. Each machine has a “public counter” that keeps track of how many votes were cast on the machine in the current election. The public counter, which is found on virtually all voting machines, is one of the important safeguards ensuring that votes are not cast improperly. Here’s the top of the same tape, showing the public counter as 105.

The public counter is important enough that the poll workers actually sign a statement at the bottom of the tape, attesting to the value of the public counter. Here’s the signed statement from the same tape:

The public counter says 105, even though 106 votes were reported. That’s a big problem.

Another of the new tapes, this one from West Deptford in Gloucester County, shows a similar discrepancy: 167 total votes, a total turnout of 166, and a public counter showing 166.

How many more New Jersey tapes show errors? What’s wrong with Sequoia’s explanation? What really happened? We don’t know the answers to any of these questions.

Isn’t it time for a truly independent investigation?

UPDATE (April 11): The New Jersey Secretary of State and the two affected counties are now saying that I am misreading the two tapes discussed here. In particular, they are now saying that the tape image included above shows 48 votes for Hillary Clinton, not 49. They’re also saying now that the West Deptford tape shows two votes for Ron Paul, not three.

It’s worth noting that the counties originally read the tapes as I did. When I sent an open records request for tapes showing discrepancies, they sent these tapes – which they would not have done had they read the tapes as they now do. Also, the Englewood Cliffs tape pictured above shows hand-written numbers that must have been written by a county official (they were on the tapes before they were copied and sent to us), showing 84 votes for Democratic candidates, consistent with the county’s original reading of the tape (but not its new reading).

In short, the Secretary of State talked to the counties, and then the counties changed their minds about how to read the tapes.

So: were the counties right before, or are they right now? Decide for yourself – here are the tapes: Englewood Cliffs, West Deptford.

UPDATE (April 14): Regardless of what these two tapes show, plenty of other tapes from the Feb. 5 primary show discrepancies that the state and counties are not disputing. These other discrepancies are consistent with Sequoia’s explanation (though that explanation is incomplete and more investigation is needed to tell whether it is correct). Thus far we have images of at least thirty such tapes.

Bad Phorm on Privacy

Phorm, an online advertising company, has recently made deals with several British ISPs to gain unprecedented access to every single Web action taken by their customers. The deals will let Phorm track search terms, URLs and other keywords to create online behavior profiles of individual customers, which will then be used to provide better-targeted ads. The company claims that “No private or personal information, or anything that can identify you, is ever stored – and that means your privacy is never at risk.” Although Phorm might have honest intentions, their privacy claims are, at best, misleading to customers.

Their privacy promise is that personally-identifiable information is never stored, but they make no promises about how the raw logs of search terms and URLs are used before they are deleted. It’s clear from Phorm’s online literature that they use this sensitive data for ad delivery purposes. In one example, they claim advertisers will be able to target ads directly to users who see the keywords “Paris vacation” either as a search or within the text of a visited webpage. Without even getting to the storage question, users will likely perceive Phorm’s access and use of their behavioral data as a compromise of their personal privacy.

What Phorm does store permanently are two pieces of information about each user: (1) the “advertising categories” that the user is interested in and (2) a randomly-generated ID from the user’s browser cookie. Each raw online action is sorted into one or more categories, such as “travel” or “luxury cars”, that are defined by advertisers. The privacy worry is that as these categories become more specific, the behavioral profiles of each user become ever more precise. Phorm seems to impose no limit on the specificity of these defined categories, so for all intents and purposes, these categories over time will become nearly identical to the search terms themselves. Indeed, they market their “finely tuned” service as analogous to typical keyword search campaigns that advertisers are already used to. Phorm has a strong incentive to store arbitrarily specific interest categories about each user to provide optimally targeted ads, and thus boost the profits of their advertising business.

The second protection mechanism is a randomly-generated ID number stored in a browser cookie that Phorm uses to “anonymously” track a user as she browses the web. This ID number is stored with the list of the interest categories collected for that user. Phorm should be given credit for recognizing this as more privacy-protecting than simply using the customer’s name or IP address as an identifier (something even Google has disappointingly failed to recognize). But past experience suggests these protections are unlikely to be enough. The storage of random user IDs mapped to keywords mirroring actual search queries is highly reminiscent of the AOL data fiasco from 2006, in which AOL released “anonymized” search histories containing 20 million keywords. It turned out to be easy to identify specific individuals by name based solely on their search histories.

At the very least, the company’s employees will be able to access an AOL-like dataset about the ISP’s customers. Granted, determining whether a particular dataset is personally identifiable is a notoriously difficult problem and a subject of ongoing research. But it’s inaccurate for Phorm to claim that personally-identifiable information is not being stored, and to promise users that their privacy is not at risk.