July 29, 2016


Brexit Exposes Old and Deepening Data Divide between EU and UK

After the Brexit vote, politicians, businesses and citizens are all wondering what’s next. In general, legal uncertainty permeates Brexit, but in the world of bits and bytes, Brussels and London have in fact been on a collision course at least since the 90s. The new British prime minister, Theresa May, has been personally responsible for a deepening divide across the North Sea on data and communication policy. Although EU citizens will see stronger privacy and cybersecurity protections through EU law post-Brexit, multinational companies should be particularly worried about how future regulation will treat the loads of data they traffic about customers, employees, and deals between the EU and the UK.  [Read more…]


Pokémon Go and The Law: Privacy, Intellectual Property, and Other Legal Concerns

Pokémon Go made 22-year-old Kyrie Tompkins fall and twist her ankle. “[The game] vibrated to let me know there was something nearby and I looked up and just fell in a hole,” she told local news outlet WHEC 10.

So far, no one has sued Niantic or The Pokémon Company for injuries suffered while playing Pokémon Go. But it’s only a matter of time before the first big Pokémon Go-related injury, whether that comes in the form of a pedestrian drowning while catching a Magikarp (the most embarrassing possible injury) or a car accident caused by a distracted driver playing the game.

Before the first lawsuits arrive, here’s a brief analysis of some of the legal issues involved with the new hit mobile game.

LIABILITY FOR INJURIES

A few minor injuries have already happened to Pokémon Go players.  If a serious accident does occur, injured players can look to legal precedent from Snapchat-related car crashes.

The Snapchat claimants sued on a theory of product liability, essentially stating that Snapchat created a product that had inherent risks of foreseeable harm to consumers and/or released a product without sufficient warnings against potential harms. Similarly, Pokémon Go players could argue that it’s predictable that players would stare at their phones while walking distractedly, ignoring natural hazards and oncoming cars.

However, many of the Snapchat lawsuits center on Snapchat’s speed filter encouraging drivers to Snap while driving. No such filter exists for Pokémon Go. In fact, the game is not playable if the player is moving above a certain speed.

Furthermore, Pokémon Go has a number of warnings and safeguards against playing while driving or walking at dangerous speeds. A full-screen warning is displayed during loading that warns users against distracted playing. The game’s Terms of Service also includes disclaimers against liability and a warning about Safe Play: “During game play, please be aware of your surroundings and play safely.”

PRIVACY

Let’s start with the good:

Niantic has properly covered the basic privacy law requirements. The app includes clearly visible links to their privacy policy, which is also written clearly and (relatively) understandably. The privacy policy includes the necessary information, for both U.S. and E.U. users. Niantic has taken the necessary steps to protect children’s privacy as well.

And now for the possibly less-good:

Early on, players noticed a concerning privacy setting that effectively allowed Niantic access and control over players’ Google accounts. Niantic quickly fixed this problem and removed the access controls in an update. It’s likely that this level of Google account control was a holdover from the days when Niantic was still under the Google umbrella. I would chalk this up as a wash for Niantic, as the privacy concern was resolved fairly quickly.

Now, the real concern here is that the app takes in a lot of information. A lot of information. Some of it is personally identifiable information (like your name and email address). Some of it is user-submitted, like names you give to the forty Rattatas you catch in one day, because even the Pokémon in Manhattan are mostly rats and pigeons. Pokémon Go collects so much information that Senator Al Franken was inspired to publish a letter to Niantic demanding more clarity on the game’s privacy protections.

The most concerning privacy issue with this app is the constant tracking of location data. Some of these concerns were already noted, to less fanfare, with the release of Ingress, the precursor to Pokémon Go. By agreeing to the Pokémon Go privacy policy, you explicitly agree to allow Niantic to track your location any time you use the app. Most players leave the app open at all times, waiting for that sweet, sweet buzz of a new wild Pokémon appearing. This means that, effectively, you give permission for Niantic to track your movements all day, every day, wherever you go.

Niantic also does not provide much information on how your data can be shared. The privacy policy allows Niantic to “share aggregated information and non-identifying information with third parties for research and analysis, demographic profiling, and other similar purposes.” This means data on your daily commute can be sold to marketing companies to better market to you, the consumer. Niantic promises not to share any of this data without aggregating the data (grouping it together with others’ data) and stripping it of identifying information (your name, email, etc.). [Read more…]


A Peek at A/B Testing in the Wild

[Dillon Reisman was previously an undergraduate at Princeton when he worked on a neat study of the surveillance implications of cookies. Now he’s working with the WebTAP project again in a research + engineering role. — Arvind Narayanan]

In 2014, Facebook revealed that they had manipulated users’ news feeds for the sake of a psychology study looking at users’ emotions. We happen to know about this particular experiment because it was the subject of a publicly-released academic paper, but websites do “A/B testing” every day that is completely opaque to the end-user. Of course, A/B testing is often innocuous (say, to find a pleasing color scheme), but the point remains that the user rarely has any way of knowing in what ways their browsing experience is being modified, or why their experience is being changed in particular.

By testing websites over time and in a variety of conditions we could hope to discover how users’ browsing experience is manipulated in not-so-obvious ways. But one third-party service actually makes A/B testing and user tracking human-readable — no reverse-engineering or experimentation necessary! This is the widely-used A/B testing provider Optimizely; Jonathan Mayer had told us it would be an interesting target of study.* Their service is designed to expose in easily-parsable form how its clients segment users and run experiments on them directly in the JavaScript they embed on websites. In other words, if example.com uses Optimizely, the entire logic used by example.com for A/B testing is revealed to every visitor of example.com.

That means that the data collected by our large-scale web crawler OpenWPM contains the details of all the experiments that are being run across the web using Optimizely. In this post I’ll show you some interesting things we found by analyzing this data. We’ve also built a Chrome extension, Pessimizely, that you can download so you too can see a website’s Optimizely experiments. When a website uses Optimizely, the extension will alert you and attempt to highlight any elements on the page that may be subject to an experiment. If you visit nytimes.com, it will also show you alternative news headlines when you hover over a title. I suggest you give it a try!

 

The New York Times website, with headlines that may be subject to an experiment highlighted by Pessimizely.

 

The Optimizely Scripts

Our OpenWPM web crawler collects and stores the JavaScript embedded on every page it visits. This makes it straightforward to query for every page that uses Optimizely and to grab and analyze the code each page gets from Optimizely. Once the scripts were collected, we investigated them through regular-expression matching and manual analysis.
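
As a rough illustration of that filtering step, here is a minimal Python sketch. The record layout (`page_url`, `script_url`) and the sample URLs are hypothetical and are not OpenWPM's actual schema. The pattern assumes that Optimizely scripts are served from `cdn.optimizely.com`, which would need to be checked against real crawl data.

```python
import re
from collections import defaultdict

# Assumption: Optimizely snippets are loaded from cdn.optimizely.com/js/<id>.js.
OPTIMIZELY_SCRIPT = re.compile(
    r"^(?:https?:)?//cdn\.optimizely\.com/js/\d+\.js", re.IGNORECASE
)

def pages_using_optimizely(records):
    """Map each page URL to the Optimizely script URLs it loaded."""
    hits = defaultdict(list)
    for record in records:
        if OPTIMIZELY_SCRIPT.search(record["script_url"]):
            hits[record["page_url"]].append(record["script_url"])
    return dict(hits)

# Hypothetical crawl records, for illustration only.
sample_records = [
    {"page_url": "http://example.com",
     "script_url": "https://cdn.optimizely.com/js/12345.js"},
    {"page_url": "http://example.com",
     "script_url": "https://example.com/static/app.js"},
    {"page_url": "http://example.org",
     "script_url": "https://example.org/main.js"},
]

optimizely_pages = pages_using_optimizely(sample_records)
```

Only the first record matches here, so `optimizely_pages` holds a single entry for example.com. A real analysis would then fetch and inspect the stored script bodies.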


  "4495903114": {
      "code": …
      "name": "100000004129417_1452199599 
               [A.] New York to Appoint Civilian to Monitor Police Surveillance -- 
               [B.] Sued Over Spying on Muslims, New York Police Get Oversight",
      "variation_ids": ["4479602534","4479602535"],
      "urls": [{
        "match": "simple",
        "value": "http://www.nytimes.com"
      }],
      "enabled_variation_ids": ["4479602534","4479602535"]
    },

An example of an experiment from nytimes.com that is A/B testing two variations of a headline in a link to an article.
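
To give a flavor of the regular-expression matching involved, the sketch below pulls the two headlines out of an experiment name like the one above. It assumes the `[A.]` / `[B.]` labelling seen in this one example; other sites may name their experiments differently, and the helper `headline_variants` is ours, not part of any Optimizely tooling.

```python
import re

# Matches labels such as "[A.]" or "[B.]" that precede each headline variant.
VARIANT_LABEL = re.compile(r"\[([A-Z])\.\]\s*")

def headline_variants(experiment_name):
    """Split an experiment name into {label: headline} pairs."""
    parts = VARIANT_LABEL.split(experiment_name)
    # With one capturing group, split() yields:
    # [prefix, label1, text1, label2, text2, ...]
    labels, texts = parts[1::2], parts[2::2]
    return {
        label: " ".join(text.split()).rstrip("- ")
        for label, text in zip(labels, texts)
    }

# Fields copied from the example above; the elided "code" field is omitted.
experiment = {
    "name": ("100000004129417_1452199599 "
             "[A.] New York to Appoint Civilian to Monitor Police Surveillance -- "
             "[B.] Sued Over Spying on Muslims, New York Police Get Oversight"),
    "variation_ids": ["4479602534", "4479602535"],
}

variants = headline_variants(experiment["name"])
```

Each label maps to one headline, and the number of headlines recovered matches the number of entries in `variation_ids`.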

From a crawl of the top 100k sites in January 2016, we found and studied 3,306 different websites that use Optimizely. The Optimizely script for each site contains a data object that defines:

  1. How the website owner wants to divide users into “audiences,” based on any number of parameters like location, cookies, or user-agent.
  2. Experiments that the users might experience, and what audiences should be targeted with what experiments.
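
The two parts of the data object can be pictured with a toy structure like the one below. This is a sketch under stated assumptions: only `name` and `variation_ids` mirror the example shown earlier, while `audiences`, `conditions`, and `audience_ids` are illustrative field names and not Optimizely's actual schema.

```python
# Hypothetical data object; field names other than "name" and
# "variation_ids" are assumptions made for illustration.
data = {
    "audiences": {
        "aud1": {"name": "US visitors", "conditions": {"country": "US"}},
        "aud2": {"name": "Returning visitors", "conditions": {"cookie": "returning=1"}},
    },
    "experiments": {
        "exp1": {"name": "Headline test",
                 "variation_ids": ["v1", "v2"],
                 "audience_ids": ["aud1"]},
        "exp2": {"name": "Button color",
                 "variation_ids": ["v3", "v4", "v5"],
                 "audience_ids": ["aud1", "aud2"]},
    },
}

def experiments_by_audience(data):
    """List the experiment names that target each named audience."""
    result = {aud["name"]: [] for aud in data["audiences"].values()}
    for exp in data["experiments"].values():
        for aud_id in exp["audience_ids"]:
            result[data["audiences"][aud_id]["name"]].append(exp["name"])
    return result

targeting = experiments_by_audience(data)
```

Because developers name their audiences and experiments, a listing like `targeting` is often readable without any further reverse-engineering.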

The Optimizely script reads from the data object, then executes a JavaScript payload and sets cookies depending on whether the user is in an experimental condition. The site owner populates the data object through Optimizely’s web interface; who on a website’s development team can access that interface, and what they can do, is a question for the site owner. The developer also helpfully provides names for their user audiences and experiments.

In total, we found around 51,471 experiments on the 3,306 websites in our dataset that use Optimizely. On average each website has approximately 15.2 experiments, and each experiment has about 2.4 possible variations. We have only scratched the surface of some of the interesting things sites use A/B testing for, and here I’ll share a couple of the more interesting examples:

 

News publishers test the headlines users see, with differences that impact the tone of the article

A widespread use of Optimizely among news publishers is “headline testing.” To use an actual recent example from nytimes.com, a link to an article headlined:

“Turkey’s Prime Minister Quits in Rift With President”

…to a different user might appear as…

“Premier to Quit Amid Turkey’s Authoritarian Turn.”

The second headline suggests a much less neutral take on the news than the first. That sort of difference can paint a user’s perception of the article before they’ve read a single word. We found other examples of similarly politically-sensitive headlines changing, like the following from pjmedia.com:

“Judge Rules Sandy Hook Families Can Proceed with Lawsuit Against Remington”

…could appear to some users as…

“Second Amendment Under Assault by Sandy Hook Judge.”

While editorial concerns might inform how news publishers change headlines, it’s clear that a major motivation behind headline testing is the need to drive clicks. A third variation we found for the Sandy Hook headline above is the much vaguer sounding “Huge Development in Sandy Hook Gun Case.” The Wrap, an entertainment news outlet, experimented with replacing “Disney, Paramount  Had Zero LGBT Characters in Movies Last Year” with the more obviously “click-baity” headline “See Which 2 Major Studios Had Zero LGBT Characters in 2015 Movies.”

We were able to identify 17 different news websites in our crawl that in the past have done some form of headline testing. This is most likely an undercount in our crawl — most of these 17 websites use Optimizely’s integrations with other third-party platforms like Parse.ly and WordPress for their headline testing, making them more easily identified. The New York Times website, for instance, implements its own headline testing code.

Another limitation of what we’ve found so far is that the crawls that we analyzed only visit the homepage of each site. The OpenWPM crawler could be configured, however, to browse links from within a site’s homepage and collect data from those pages. A broader study of the practices of news publishers could use the tool to drill down deeper into news sites and study their headlines over time.

 

Websites identify and categorize users based on money and affluence

Many websites target users based on IP and geolocation. But when IP/geolocation are combined with notions of money the result is surprising. The website of a popular fitness tracker targets users that originate from a list of six hard-coded IP addresses labelled “IP addresses Spending more than $1000.” Two of the IP addresses appear to be larger enterprise customers: a medical research institute and a prominent news magazine. Three belong to unidentified Comcast customers. These big-spending IP addresses were targeted in the past with an experiment that presented the user with a button suggesting that the user either “learn more” about a particular product or “buy now.”

Connectify, a large vendor of networking software, uses geolocation on a coarser level — they label visitors from the US, Australia, UK, Canada, Netherlands, Switzerland, Denmark, and New Zealand as coming from “Countries that are Likely to Pay.”

Non-profit websites also experiment with money. charity: water (charitywater.org) and the Human Rights Campaign (hrc.org) both have experiments defined to change the default donation amount a user might see in a pre-filled text box.

 

Web developers use third-party tools for more than just their intended use

A developer following the path of least resistance might use Optimizely to do other parts of their job simply because it is the easiest tool available. Some of the more exceptional “experiments” deployed by websites are simple bug-fixes, described with titles like, “[HOTFIX][Core Commerce] Fix broken sign in link on empty cart,” or “Fix- Footer links errors 404.” Other experiments betray the haphazard nature of web development, with titles like “delete me,” “Please Delete this Experiment too,” or “#Bugfix.”

We might see these unusual uses because Optimizely allows developers to edit and roll out new code with little engineering overhead. With the inclusion of one third-party script, a developer can leverage the Optimizely web interface to do a task that might otherwise take more time or careful testing. This is one example of how third parties have evolved to become integral to the entire functionality and development of the web, raising security and privacy concerns.

 

The need for transparency

Much of the web is curated by inscrutable algorithms running on servers, and a concerted research effort is needed to shed light on the less-visible practices of websites. Thanks to the Optimizely platform we can at least peek into that secret world.

We believe, however, that transparency should be the default on the web, not the accidental product of one third party’s engineering decisions. Privacy policies are a start, but they generally only cover a website’s data collection and third-party usage on a coarse level. The New York Times Privacy Policy, for instance, does not even suggest that headline testing is something they might do, despite how it could drastically alter your consumption of the news. If websites had to publish more information about which third parties they use and how they use them, regulators could use that information to better protect consumers on the web. Considering the potentially harmful effects of how websites might use third parties, more transparency and oversight are essential.

 

@dillonthehuman


* This was a conversation a year ago, when Jonathan was a grad student at Stanford.


The Princeton Web Census: a 1-million-site measurement and analysis of web privacy

Web privacy measurement — observing websites and services to detect, characterize, and quantify privacy impacting behaviors — has repeatedly forced companies to improve their privacy practices due to public pressure, press coverage, and regulatory action. In previous blog posts I’ve analyzed why our 2014 collaboration with KU Leuven researchers studying canvas fingerprinting was successful, and discussed why repeated, large-scale measurement is necessary.

Today I’m pleased to release initial analysis results from our monthly, 1-million-site measurement. This is the largest and most detailed measurement of online tracking to date, including measurements for stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and “cookie syncing”.  These results represent a snapshot of web tracking, but the analysis is part of an effort to collect data on a monthly basis and analyze the evolution of web tracking and privacy over time.

Our measurement platform used for this study, OpenWPM, is already open source. Today, we’re making the datasets for this analysis available for download by the public. You can find download instructions on our study’s website.

New findings

We provide background information and a summary of each of our main findings on our study’s website. The paper goes into even greater detail and provides the methodological details on the measurement and analysis of each finding. One of our more surprising findings was the discovery of two apparent attempts to use the HTML5 Audio API for fingerprinting.

The figure is a visualization of the audio processing executed on users’ browsers by third-party fingerprinting scripts. We found two different AudioNode configurations in use. In both configurations an audio signal is generated by an oscillator and the resulting signal is hashed to create an identifier. Initial testing shows that the techniques may have some limitations when used for fingerprinting, but further analysis is necessary. You can help us with that (and test your own device) by using our demonstration page here.
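
The generate-then-hash idea can be sketched offline in Python. This is only a loose stand-in: the real scripts run in the browser's audio stack, and it is differences in that processing across devices that would make the hash distinctive, which this sketch does not model. The oscillator frequency, sample count, and the use of SHA-256 are all assumptions chosen for illustration, not details taken from the scripts we observed.

```python
import hashlib
import math
import struct

def oscillator(frequency_hz, sample_rate=44100, n_samples=4096):
    """Generate a sine-wave signal, standing in for an audio oscillator."""
    return [
        math.sin(2 * math.pi * frequency_hz * i / sample_rate)
        for i in range(n_samples)
    ]

def fingerprint(samples):
    """Hash the raw sample bytes into a short hex identifier."""
    raw = struct.pack(f"<{len(samples)}d", *samples)
    return hashlib.sha256(raw).hexdigest()[:16]

# Assumed frequency; any fixed value works for the illustration.
signal = oscillator(10000)
identifier = fingerprint(signal)
```

Running the same code twice on one machine yields the same identifier, while any change in the generated samples changes it. That sensitivity is what a fingerprinting script would rely on if devices process the signal slightly differently.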

See the paper for our analysis of a consolidated third-party ecosystem, the effects of third parties on HTTPS adoption, and the performance of tracking protection tools. In addition to audio fingerprinting, we show that canvas fingerprinting is being used by more third parties, but on fewer sites; that a WebRTC feature can be and is being used for tracking; and how the HTML Canvas is being used to discover users’ fonts.

What’s next? We are exploring ways to share our data and analysis tools in a form that’s useful to a wider and less technical audience. As we continue to collect data, we will also perform longitudinal analyses of web tracking. In other ongoing research, we’re using the data we’ve collected to train machine-learning models to automatically detect tracking and fingerprinting.


Is Tesla Motors a Hidden Warrior for Consumer Digital Privacy?

Amid the privacy intrusions of modern digital life, few are as ubiquitous and alarming as those perpetrated by marketers. The economics of the entire industry are built on tools that exist in shadowy corners of the Internet and lurk about while we engage with information, products and even friends online, harvesting our data everywhere our mobile phones and browsers dare to go.

This digital marketing model, developed three decades ago and premised on the idea that it’s OK for third parties to gather our private data and use it in whatever way suits them, will grow into a $77 billion industry in the U.S. this year, up from $57 billion in 2014, according to Forrester Research.

Storm clouds are developing around the industry, however, and there are new questions being raised about the long-term viability of surreptitious data-gathering as a sustainable business model. Two factors are typically cited: Regulators in Europe have begun, and those in the U.S. are poised to begin, reining in the most intrusive of these marketing practices; and the growth of the mobile Internet, and the related reliance on apps rather than browsers for 85% of our mobile online activity, have made it more difficult to gather user data.

Then there is Tesla Motors and its advertising-averse marketing model, which does not use third-party data to raise awareness and interest in its brand, drive desire for its products or spur action by its customers. Instead, the electric carmaker relies on cultural branding, a concept popularized recently by Douglas Holt, formerly of the Harvard Business School, to do much of the marketing heavy lifting that brought it to the top of the electric vehicle market. And while Tesla is not the only brand engaging digital crowd culture and shunning third-party data-gathering, its success is causing the most consternation within the ranks of intrusion marketers.

[Read more…]


The Interconnection Measurement Project

Building on the March 11 release of the “Revealing Utilization at Internet Interconnection Points” working paper, today, CITP is excited to announce the launch of the Interconnection Measurement Project. This unprecedented initiative includes the launch of a project-specific website and the ongoing collection, analysis, and release of capacity and utilization data from ISP interconnection points. CITP’s Interconnection Measurement Project uses the same method that I detailed in the working paper and includes the participation of seven ISPs—Bright House Networks, Comcast, Cox, Mediacom, Midco, Suddenlink, and Time Warner Cable.

The project website, which we aim to update regularly, includes additional views of the data that are not included in the working paper. The visualizations are organized into three categories: (1) Aggregate Views; (2) Regional Views; and (3) Views by Interconnect. The Aggregate Views provide peak utilization, growth in capacity and usage, as well as the distribution of peak utilization across interconnects and across participating ISPs, on a monthly basis across the entire data set. The Regional Views provide monthly peak utilization by region and distribution of peak utilization across interconnects by region. Finally, the Views by Interconnect provide details into daily per-link utilization statistics, as well as the distribution of peak utilization by link and by capacity, also on a monthly basis. The website visualizations also include an additional month of data (March 2016) beyond what the original working paper included. CITP plans to regularly update the visualizations with new data to provide a picture of how the Internet is evolving, and we will assess the project annually to ensure that the data, reports, and insights that we offer remain relevant.

The March data is consistent with the initial findings detailed in the working paper: that many interconnects have significant spare capacity, that this spare capacity exists both across ISPs in each region and in aggregate for any individual ISP, and that the aggregate utilization across interconnects is roughly 50 percent during peak periods.

The seven participating ISPs collectively account for about 50 percent of all US broadband subscribers. We at CITP hope that these ISPs are merely the pioneers of what may eventually become a much larger effort. As we continue to advance this field of research and deepen our understanding of traffic characteristics at interconnection points, we welcome the participation of even more ISPs as well as other network operators and edge providers in this important effort.


Apple Encryption Saga and Beyond: What U.S. Courts Can Learn from Canadian Caselaw

It has been said that privacy is “at risk of becoming a real human right.” The exponential increase of personal information in the hands of organizations, particularly sensitive data, creates a significant rise in the perils accompanying formerly negligible privacy incidents. At one time considered too intangible to merit even token compensation, risks of harm to privacy interests have become so ubiquitous in the past three years that they require special attention.

Legal and social changes have for their part also increased potential privacy liability for private and public entities when they promise, and fail, to guard our personal data (think Ashley Madison…). First among those changes has been the emergence of a “privacy culture,” a process bolstered by the trickle-down effect of Julia Angwin’s investigative series titled “What They Know,” and the heightened attention that the mainstream media now attaches to privacy incidents. Second, courts in various common law jurisdictions are beginning to recognize intangible privacy harms and have been increasingly willing to certify class action lawsuits for privacy infringements that previously would have been summarily dismissed.

Prior to 2012, it was difficult to find examples of judicially recognized losses arising from privacy breaches. Since then, however, the legal environment in common law jurisdictions, and in Canada in particular, has changed dramatically. Claims related to privacy mishaps are now commonplace, and there has been an exponential multiplication in the number of matters involving inadvertent communication or improper disposal of personal data, portable devices, and cloud computing.
[Read more…]


The Defend Trade Secrets Act and Whistleblowers

As Freedom to Tinker readers know, I’ve been an active opponent of the federal Defend Trade Secrets Act (DTSA). Though my position on the DTSA remains unchanged, I was both surprised and pleased to see that the revised Defend Trade Secrets Act now includes a narrow, but potentially useful, provision intended to protect whistleblowers from trade secret misappropriation actions.

As attendees at yesterday’s wonderful CITP talk by Bart Gellman were fortunate to hear, whistleblowing remains a critical but imperfect tool of public access to the internal operations of our institutions, from corporations to government. Trade secrecy operates in the opposite direction, and has the robust ability to thwart regulation, limit public accountability, and criminalize whistleblowing. I’ve regularly called trade secrecy the most powerful intellectual property law (IP) tool of information control, as it prevents not just use of, but access to and even knowledge about the very existence of information. Indeed, it surpasses other IP law in that power by a wide margin. Thus, if the DTSA is moving forward, the inclusion of even a limited whistleblower exception in the DTSA is a good thing.

Nonetheless, it is very important to recognize what this provision won’t achieve. As written, the provision prevents liability under federal and state trade secret law for “the disclosure of a trade secret that … is made … in confidence to a Federal, State, or local government official, either directly or indirectly, or to an attorney; and … solely for the purpose of reporting or investigating a suspected violation of law; or … is made in a complaint or other document filed in a lawsuit or other proceeding, if such filing is made under seal.” Thus, as written, the provision does not appear to immunize sharing trade secret information with the press or the public at large. As Gellman’s work has shown, the press is often the first and only avenue for access to critical information about our public and private black boxes.

[Read more…]


Internet Voting? Really?

Recently I gave a talk at the local Princeton University TEDx event. My topic was voting: America’s voting systems in the 19th and 20th centuries, and whether we should vote using the Internet. You can see the talk here:

 

Internet Voting? Really?

 


On distracted driving and required phone searches

A recent Ars Technica article discussed several U.S. states that are considering adding a “roadside textalyzer” that operates analogously to roadside Breathalyzer tests. In the same way that alcohol and drugs can impair a driver’s ability to navigate the road, so can paying attention to your phone rather than the world beyond. Many states “require” drivers to consent to Breathalyzer tests, where that “requirement” boils down to serious penalties if the driver declines. Vendors like Cellebrite are pushing for analogous requirements, for which they just happen to sell products.
[Read more…]