September 19, 2019

Watching You Watch: The Tracking Ecosystem of Over-the-Top TV Streaming Devices

By Hooman Mohajeri Moghaddam, Gunes Acar, Ben Burgess, Arunesh Mathur, Danny Y. Huang, Nick Feamster, Ed Felten, Prateek Mittal, and Arvind Narayanan

By 2020, one third of US households are estimated to have "cut the cord", i.e., discontinued their multichannel TV subscriptions and switched to internet-connected streaming services. Over-the-Top ("OTT") streaming devices such as Roku and Amazon Fire TV, which currently sell for between $30 and $100, are cheap alternatives to smart TVs for cord-cutters. Instead of charging more for the hardware or a membership, Roku and Amazon Fire TV monetize their platforms through advertisements, which rely on tracking users' viewing habits.

Although tracking of users on the web and on mobile is well studied, tracking on smart TVs and OTT devices has remained unexplored. To address this gap, we conducted the first study of tracking on OTT platforms. In a paper that we will present at the ACM CCS 2019 conference, we found that: 

  • Major online trackers such as Google and Facebook are also highly prominent in the OTT ecosystem. However, OTT channels also contain niche, lesser-known trackers such as adrise.tv and monarchads.com.
  • The information shared with tracker domains includes video titles (see Figure 1), channel names, permanent device identifiers, and wireless SSIDs.
  • The countermeasures made available to users are ineffective at preventing tracking.
  • Roku had a vulnerability that allowed malicious web pages visited by Roku users to geolocate them, read device identifiers, and install channels without consent.
Figure 1. The AsianCrush channel on Roku sends the device ID and video title to the online video advertising platform spotxchange.com.

Method and Findings:

Similar to how Android and iOS support third-party apps, Amazon and Roku support third-party applications known as channels, ranging from popular channels like Netflix and CNN to many obscure ones.

Automation is one of the main challenges of studying how these channels track users: tools that automate interaction with web pages (such as Selenium) do not exist for OTT platforms. To address this challenge, we developed a system that can automatically download OTT channels and interact with them, all while intercepting the network traffic and performing best-effort TLS interception (a high-level sketch of the crawl loop follows; we describe the components of our tool in the Appendix). Using this crawler, we collected data from the top 1000 channels in each of the Roku and Amazon Fire TV channel stores.
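The crawl loop itself is conceptually simple. The following is a hypothetical Python sketch; the helper names (install_channel, audio_indicates_playback, and so on) are illustrative stand-ins for the components described in the Appendix, not our released code:

```python
# Hypothetical sketch of the per-channel crawl loop. Helper names are
# illustrative stand-ins for the components described in the Appendix.
MAX_INTERACTION_STEPS = 20

def crawl_channel(channel_id: str):
    start_network_capture(channel_id)   # pcap + best-effort TLS interception
    install_channel(channel_id)         # via the platform's remote-control API
    launch_channel(channel_id)
    for _ in range(MAX_INTERACTION_STEPS):
        save_screenshot(channel_id)     # kept for post-processing/validation
        if audio_indicates_playback():  # audio level signals a playing video
            break
        press_key("Select")             # otherwise, try to start a video
    uninstall_channel(channel_id)
    stop_network_capture(channel_id)

for channel_id in top_1000_channels:
    crawl_channel(channel_id)
```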

The distribution of trackers by channel category and rank is shown in Figure 2. The "Games" category of Roku channels contacts the most trackers: nine of the top ten channels (ordered by the number of trackers) are categorized as game channels. On Fire TV, by contrast, five of the ten channels with the most trackers are "News" channels, and the top three contact close to 60 tracker domains each. Below we summarize our findings:

Figure 2. Distribution of trackers by channel ranks and channel categories.

Google and Facebook are among the most popular trackers

Google and Facebook domains (doubleclick.net, google-analytics.com, googlesyndication.com, and facebook.com) are among the most prevalent trackers in OTT channels on both platforms we studied. Google's doubleclick.net appeared on 975 of the top 1000 Roku channels, while amazon-adsystem.com appeared on 687 of the top 1000 Amazon Fire TV channels.

Table 1. Most prevalent trackers on top 1000 channels on Roku (left) and Amazon (right).

User and device identifiers shared with trackers

Trackers have access to a wide range of device and user identifiers on OTT platforms. Some of these identifiers can be reset by users (e.g., Advertising IDs), while others are permanent (e.g., serial numbers, MAC addresses). To detect the identifiers shared with trackers, we followed the method described by Englehardt et al. to search for device and user identifiers in the network traffic of the top 1000 channels for each platform. This allowed us to detect leaks even when the identifiers were encoded or hashed. An overview of the leaked IDs on each platform is given in Table 2.
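As a rough illustration of this approach (the actual method in Englehardt et al. also handles nested and chained encodings), one can precompute common encodings and hashes of each known identifier and then search for those tokens in captured requests:

```python
# Illustrative sketch of identifier-leak detection: precompute common
# encodings/hashes of a known identifier, then search requests for them.
import base64
import hashlib
from urllib.parse import quote

def candidate_tokens(identifier: str) -> set:
    raw = identifier.encode()
    return {
        identifier,
        hashlib.md5(raw).hexdigest(),
        hashlib.sha1(raw).hexdigest(),
        hashlib.sha256(raw).hexdigest(),
        base64.b64encode(raw).decode(),
        quote(identifier, safe=""),
    }

def find_leaks(identifier: str, url: str, body: str) -> list:
    tokens = candidate_tokens(identifier)
    return [part for part, text in (("url", url), ("body", body))
            if any(t in text for t in tokens)]

# Example: check whether a (made-up) serial number leaks in a request URL.
print(find_leaks("1GU384000000", "http://tracker.example/?d=1GU384000000", ""))
```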

Table 2. Overview of the identifier and information leakage detected in the Roku (left) and Fire TV (right) crawls.

Channels share video titles with third-party trackers

Out of 100 randomly selected channels on each of Roku and Amazon Fire TV, we found 9 channels on Roku (e.g., "CBS News" and "News 5 Cleveland WEWS") and 14 channels on Fire TV (e.g., "NBC News" and "Travel Channel") that leaked the title of the video being watched to a tracking domain. On Roku, all video titles were leaked over unencrypted connections, exposing users' video histories to eavesdroppers. On Fire TV, only two channels (NBC News and WRAL) used an unencrypted connection when sending the title to tracking domains.

The overwhelming majority of channels use unencrypted connections

Of the top 1000 channels we studied on each platform, 794 channels on Roku and 762 on Amazon Fire TV had at least one unencrypted HTTP session, potentially exposing users' information and identities to network adversaries.
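As a rough sketch of how such a count can be derived from captured traffic (our analysis ran over the crawler's pcap files; the snippet below uses scapy and a naive heuristic for spotting plaintext HTTP requests, so treat it as an illustration rather than the exact pipeline):

```python
# Toy sketch: flag a channel's pcap if it contains at least one plaintext
# HTTP request. Uses scapy (pip install scapy); the heuristic simply looks
# for TCP payloads that start with an HTTP method.
from scapy.all import TCP, Raw, rdpcap

HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"OPTIONS ")

def has_plaintext_http(pcap_path: str) -> bool:
    for pkt in rdpcap(pcap_path):
        if pkt.haslayer(TCP) and pkt.haslayer(Raw):
            if bytes(pkt[Raw].load).startswith(HTTP_METHODS):
                return True
    return False
```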

Countermeasures

OTT platforms provide privacy options that purport to limit tracking on their devices: "Limit Ad Tracking" on Roku and "Disable Interest-based Ads" on Amazon Fire TV. Our measurements show that these options fall short of preventing tracking: turning them on did not change the number of trackers contacted. Turning on "Limit Ad Tracking" on Roku did reduce the number of Advertising ID leaks from 390 to zero, but did not change the number of serial number leaks.

Roku Remote Control API Vulnerability

To investigate other ways OTT devices may compromise user privacy and security, we analyzed the local API endpoints of Roku and Fire TV. OTT devices expose such interfaces to enable debugging, remote control, and home automation by mobile apps and other automation software. We discovered a vulnerability in Roku's remote control API that allows an attacker to do the following (a sketch of the API appears after this list):

  • send commands to install, uninstall, or launch channels and collect unique identifiers from Roku devices, even when the connected display is turned off;
  • geolocate Roku users via the SSID of the wireless network;
  • extract the MAC address, serial number, and other unique identifiers to track users or respawn tracking identifiers (similar to evercookies);
  • get the list of installed channels and use it for profiling purposes.
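
For context, the sketch below shows the kind of unauthenticated requests Roku's documented External Control Protocol (ECP) accepts from the local network; the vulnerability allowed malicious web pages to issue equivalent requests from a victim's browser. The IP address is a placeholder, and the exact fields returned vary by firmware version:

```python
# Hedged sketch: probing Roku's External Control Protocol (ECP) from the
# local network. Endpoint paths are from Roku's public ECP documentation;
# ROKU_IP is a placeholder for the device's LAN address.
import requests

ROKU_IP = "192.168.1.50"          # placeholder LAN address
BASE = f"http://{ROKU_IP}:8060"   # ECP listens on port 8060, unauthenticated

# Enumerate installed channels (returned as XML) -- usable for profiling.
apps = requests.get(f"{BASE}/query/apps", timeout=5).text

# Read device information; identifiers such as the serial number and WiFi
# MAC appear here (exact fields vary by firmware version).
info = requests.get(f"{BASE}/query/device-info", timeout=5).text

# Launch a channel by its store ID (12 is Netflix in Roku's ECP examples)
# and drive the UI with keypresses, e.g., to click through an install dialog.
requests.post(f"{BASE}/launch/12", timeout=5)
requests.post(f"{BASE}/keypress/Select", timeout=5)
```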

We reported the vulnerability to Roku in December 2018. Roku addressed the issue and finalized rolling out their security fix by March 2019.

Going forward

Our research shows that users, who are already pervasively tracked on the web and on mobile, face another set of privacy-intrusive tracking practices when using OTT streaming platforms. A combination of technical and policy solutions could address these privacy and security issues. OTT platforms should offer better privacy controls, similar to the Incognito/Private Browsing modes of modern web browsers. Platform policies should disincentivize insecure connections; for example, clear-text connections should be blocked unless the channel requests an exception. Regulators and policymakers should ensure that the privacy protections available for brick-and-mortar video rental services, such as the Video Privacy Protection Act (VPPA), are updated to cover emerging OTT platforms.

Appendix

Crawler architecture:

We set out to build a crawler to study the tracking and privacy practices of OTT channels at scale. Our crawler installs a channel, launches it, and attempts to view a video on the channel, all while collecting network traffic and attempting "best-effort" TLS interception. The crawler consists of the following hardware components:

  • A desktop machine connected to the Internet acts as a wireless access point (AP).
  • An OTT stick connects to the Internet via the WiFi AP provided by the desktop machine. It also connects to a TV through an HDMI capture-and-split card that sidesteps HDCP protections.

The desktop machine orchestrates our crawls and has the following software components:

  • Automatic interaction engine:
    • Remote Control API: OTT platforms provide an API that enables remote control apps to send commands such as switching or installing channels. We wrote our own wrappers for both Roku's and Amazon Fire TV's remote APIs (a minimal sketch appears after Figure 3).
    • Audio/Video processing: We process the audio from the OTT device on the desktop machine and use it to detect video playback, which guides our automatic interaction with channels. Video input is also saved as screenshots for post-processing and validation.
  • Network Capture: We collect the network traffic of the OTT devices as pcap files and dump all DNS transactions into a Redis database.
  • TLS interception: We use mitmproxy to perform "best-effort" TLS interception. For each channel and each new TLS endpoint, we attempt to intercept the traffic using a self-signed certificate. If interception fails, we add the endpoint to a no-intercept list to avoid further interception attempts. On Amazon Fire TV, we root the device using a previously known vulnerability and install mitmproxy's self-signed certificate in the device's certificate store. In addition, we use Frida to bypass certificate pinning.
Figure 3. Overview of our smart crawler.
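
To make the remote-control wrappers concrete, here is a minimal sketch of what they might look like, assuming Roku's documented External Control Protocol (ECP) endpoints and adb access to a Fire TV with ADB debugging enabled; the IP addresses and channel ID are placeholders:

```python
# Minimal sketch of remote-control wrappers. The Roku wrapper uses the
# documented ECP HTTP endpoints; the Fire TV wrapper shells out to adb
# (the device must have ADB debugging enabled).
import subprocess
import requests

class RokuRemote:
    def __init__(self, ip: str):
        self.base = f"http://{ip}:8060"   # ECP port

    def keypress(self, key: str):
        # e.g., "Home", "Select", "Play"
        requests.post(f"{self.base}/keypress/{key}", timeout=5)

    def launch(self, channel_id: int):
        requests.post(f"{self.base}/launch/{channel_id}", timeout=5)

class FireTVRemote:
    def __init__(self, ip: str):
        subprocess.run(["adb", "connect", f"{ip}:5555"], check=True)

    def keypress(self, keycode: str):
        # e.g., "KEYCODE_DPAD_CENTER", "KEYCODE_HOME"
        subprocess.run(["adb", "shell", "input", "keyevent", keycode],
                       check=True)

# Example: open a Roku channel (12 is Netflix in Roku's ECP examples)
# and press Select once the UI loads.
roku = RokuRemote("192.168.1.50")
roku.launch(12)
roku.keypress("Select")
```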

Deconstructing Google’s excuses on tracking protection

By Jonathan Mayer and Arvind Narayanan.

Blocking cookies is bad for privacy. That’s the new disingenuous argument from Google, trying to justify why Chrome is so far behind Safari and Firefox in offering privacy protections. As researchers who have spent over a decade studying web tracking and online advertising, we want to set the record straight.

Our high-level points are:

1) Cookie blocking does not undermine web privacy. Google’s claim to the contrary is privacy gaslighting.

2) There is little trustworthy evidence on the comparative value of tracking-based advertising.

3) Google has not devised an innovative way to balance privacy and advertising; it is latching onto prior approaches that it previously disclaimed as impractical.

4) Google is attempting a punt to the web standardization process, which will at best result in years of delay.

What follows is a reproduction of excerpts from yesterday’s announcement, annotated with our comments.

Technology that publishers and advertisers use to make advertising even more relevant to people is now being used far beyond its original design intent – to a point where some data practices don’t match up to user expectations for privacy.

Google is trying to thread a needle here, implying that some level of tracking is consistent with both the original design intent for web technology and user privacy expectations. Neither is true.

If the benchmark is original design intent, let’s be clear: cookies were not supposed to enable third-party tracking, and browsers were supposed to block third-party cookies. We know this because the authors of the original cookie technical specification said so (RFC 2109, Section 4.3.5). 

Similarly, if the benchmark is user privacy expectations, let’s be clear: study after study has demonstrated that users don’t understand and don’t want the pervasive web tracking that occurs today. 

Recently, some other browsers have attempted to address this problem, but without an agreed upon set of standards, attempts to improve user privacy are having unintended consequences.

This is clearly a reference to Safari’s Intelligent Tracking Prevention and Firefox’s Enhanced Tracking Protection, which we think are laudable privacy features. We’ll get to the unintended consequences claim.

First, large scale blocking of cookies undermine people’s privacy by encouraging opaque techniques such as fingerprinting. With fingerprinting, developers have found ways to use tiny bits of information that vary between users, such as what device they have or what fonts they have installed to generate a unique identifier which can then be used to match a user across websites. Unlike cookies, users cannot clear their fingerprint, and therefore cannot control how their information is collected. We think this subverts user choice and is wrong.

To appreciate the absurdity of this argument, imagine the local police saying, “We see that our town has a pickpocketing problem. But if we crack down on pickpocketing, the pickpocketers will just switch to muggings. That would be even worse. Surely you don’t want that, do you?”

Concretely, there are several things wrong with Google’s argument. First, while fingerprinting is indeed a privacy invasion, that’s an argument for taking additional steps to protect users from it, rather than throwing up our hands in the air. Indeed, Apple and Mozilla have already taken steps to mitigate fingerprinting, and they are continuing to develop anti-fingerprinting protections.

Second, protecting consumer privacy is not like protecting security—just because a clever circumvention is technically possible does not mean it will be widely deployed. Firms face immense reputational and legal pressures against circumventing cookie blocking. Google’s own privacy fumble in 2012 offers a perfect illustration of our point: Google implemented a workaround for Safari’s cookie blocking; it was spotted (in part by one of us), and it had to settle enforcement actions with the Federal Trade Commission and state attorneys general. Afterward, Google didn’t double down—it completely backed away from tracking cookies for Safari users. Based on peer-reviewed research, including our own, we’re confident that fingerprinting continues to represent a small proportion of overall web tracking. And there’s no evidence of an increase in the use of fingerprinting in response to other browsers deploying cookie blocking.

Third, even if a large-scale shift to fingerprinting is inevitable (which it isn’t), cookie blocking still provides meaningful protection against third parties that stick with conventional tracking cookies. That’s better than the defeatist approach that Google is proposing.

This isn’t the first time that Google has used disingenuous arguments to suggest that a privacy protection will backfire. We’re calling this move privacy gaslighting, because it’s an attempt to persuade users and policymakers that an obvious privacy protection—already adopted by Google’s competitors—isn’t actually a privacy protection.

Second, blocking cookies without another way to deliver relevant ads significantly reduces publishers’ primary means of funding, which jeopardizes the future of the vibrant web. Many publishers have been able to continue to invest in freely accessible content because they can be confident that their advertising will fund their costs. If this funding is cut, we are concerned that we will see much less accessible content for everyone. Recent studies have shown that when advertising is made less relevant by removing cookies, funding for publishers falls by 52% on average.

The overt paternalism here is disappointing. Google is taking the position that it knows better than users—if users had all the privacy they want, they wouldn’t get the free content they want more. So no privacy for users.

As for the “recent studies” that Google refers to, that would be one paragraph in one blog post presenting an internal measurement conducted by Google. There is a glaring omission of the details of the measurement that are necessary to have any sort of confidence in the claim. And as long as we’re comparing anecdotes, the international edition of the New York Times recently switched from tracking-based behavioral ads to contextual and geographic ads—and it did not experience any decrease in advertising revenue.

Independent research doesn’t support Google’s claim either: the most recent academic study suggests that tracking only adds about 4% to publisher revenue. This is a topic that merits much more research, and it’s disingenuous for Google to cherry pick its own internal measurement. And it’s important to distinguish the economic issue of whether tracking benefits advertising platforms like Google (which it unambiguously does) from the economic issue of whether tracking benefits publishers (which is unclear).

Starting with today’s announcements, we will work with the web community to develop new standards that advance privacy, while continuing to support free access to content. Over the last couple of weeks, we’ve started sharing our preliminary ideas for a Privacy Sandbox – a secure environment for personalization that also protects user privacy. Some ideas include new approaches to ensure that ads continue to be relevant for users, but user data shared with websites and advertisers would be minimized by anonymously aggregating user information, and keeping much more user information on-device only. Our goal is to create a set of standards that is more consistent with users’ expectations of privacy.

There is nothing new about these ideas. Privacy preserving ad targeting has been an active research area for over a decade. One of us (Mayer) repeatedly pushed Google to adopt these methods during the Do Not Track negotiations (about 2011-2013). Google’s response was to consistently insist that these approaches are not technically feasible. For example: “To put it simply, client-side frequency capping does not work at scale.” We are glad that Google is now taking this direction more seriously, but a few belated think pieces aren’t much progress.

We are also disappointed that the announcement implicitly defines privacy as confidentiality. It ignores that, for some users, the privacy concern is behavioral ad targeting—not the web tracking that enables it. If an ad uses deeply personal information to appeal to emotional vulnerabilities or exploits psychological tendencies to generate a purchase, then that is a form of privacy violation—regardless of the technical details. 

We are following the web standards process and seeking industry feedback on our initial ideas for the Privacy Sandbox. While Chrome can take action quickly in some areas (for instance, restrictions on fingerprinting) developing web standards is a complex process, and we know from experience that ecosystem changes of this scope take time. They require significant thought, debate, and input from many stakeholders, and generally take multiple years.

Apple and Mozilla have tracking protection enabled, by default, today. And Apple is already testing privacy-preserving ad measurement. Meanwhile, Google is talking about a multi-year process for a watered-down form of privacy protection. And even that is uncertain—advertising platforms dragged out the Do Not Track standardization process for over six years, without any meaningful output. If history is any indication, launching a standards process is an effective way for Google to appear to be doing something on web privacy, but without actually delivering. 

In closing, we want to emphasize that the Chrome team is full of smart engineers passionate about protecting their users, and it has done incredible work on web security. But it is unlikely that Google can provide meaningful web privacy while protecting its business interests, and Chrome continues to fall far behind Safari and Firefox. We find this passage from Shoshana Zuboff’s The Age of Surveillance Capitalism to be apt:

“Demanding privacy from surveillance capitalists or lobbying for an end to commercial surveillance on the internet is like asking old Henry Ford to make each Model T by hand. It’s like asking a giraffe to shorten its neck, or a cow to give up chewing. These demands are existential threats that violate the basic mechanisms of the entity’s survival.”

It is disappointing—but regrettably unsurprising—that the Chrome team is cloaking Google’s business priorities in disingenuous technical arguments.

Thanks to Ryan Amos, Kevin Borgolte, and Elena Lucherini for providing comments on a draft.
 

Should we regulate the makers or users of insecure IoTs?

By Matheus V. X. Ferreira, Danny Yuxing Huang, Tithi Chattopadhyay, Nick Feamster, and S. Matthew Weinberg

Recent years have seen a proliferation of "smart-home" or IoT devices, many of which are known to contain security vulnerabilities that have been exploited to launch high-profile attacks and disrupt Internet-wide services such as Twitter and Reddit.

The sellers (e.g., manufacturers) and buyers (e.g., consumers) of such devices could improve their security, but there is little incentive to do so. For the sellers, implementing security features on IoT devices, such as using encryption or having no default passwords, could introduce extra engineering cost. Similarly, security practices, such as regularly updating the firmware or using complex and difficult-to-remember passwords, may be a costly endeavor for many buyers.

As a result, sellers' and buyers' security practices are less than optimal, which increases vulnerabilities that ultimately harm other buyers. In other words, their actions impose a negative externality on other buyers. This scenario, where individuals acting in their own self-interest harm the common good, is referred to as the tragedy of the commons.

One approach to incentivize agents to adopt optimal practices is through external regulations. In this blog post, we discuss two potential approaches that a regulator may adopt in the case of IoT security:

  • Regulating the seller – imposing minimum security standards on sellers of IoT devices;
  • Regulating the buyer – encouraging IoT device buyers to adopt security practices through rewards (e.g., ISP discounts for buyers without signs of malicious network traffic) or penalties (e.g., fines for buyers whose devices engaged in DDoS attacks).

The goal of this hypothetical regulator is to minimize the negative externality due to compromised devices while maximizing the profitability of device manufacturers. We show that, in some cases, if buyers are rewarded for security practices (or penalized for the lack thereof), sellers can earn higher profits by implementing extra security features on their devices.

Challenges in regulation

The hypothetical regulator’s ability to achieve the goal of minimizing negative externality depends on whether buyers can secure their devices more efficiently than sellers. 

If, for instance, buyers regularly update their devices' firmware or set strong passwords, then regulating the sellers alone can be costly, i.e., inefficient. On the other hand, rewarding buyers for security practices (or penalizing them for the lack thereof) can still be inefficient if there is little buyers can do to improve security, or if they cannot distinguish good from bad security practices.

These challenges lead us to explore how the efficiency of buyers at improving their own security affects the effectiveness of regulation.

Modeling the efficiency of buyers’ security practices

A stochastic model captures the uncertainty in the efficiency of the buyer's security practices when the buyer is incentivized through regulation. A buyer has low efficiency when additional security effort by the buyer reduces risk less than the same effort would if it came from the seller. Conversely, a buyer has high efficiency when additional security effort translates into large improvements in security.

As an example, consider the buyer’s efficiency in a system where users (i.e., buyers) log into a website using passwords. Let’s first make two assumptions:

  1. We first assume that the website is secure. The probability of a user’s account being compromised depends on, for instance, how strong the password is. A weak password or a reused password is likely correlated with a high chance of the account being stolen; on the other hand, a strong, random password is correlated with the opposite. We say that the users/buyers are highly efficient in providing security with respect to the website operator (i.e., seller); in this case, efficiency > 1. Figure 1-a shows an example of the distribution of the buyers’ efficiency.
  2. We next assume that the website is not secure — e.g., running outdated server software. The probability of a buyer’s account being compromised depends less on password strength, for instance, but rather more on how insecure the website server is; in this case, efficiency < 1. Figure 1-b shows an example of the distribution of the buyers’ efficiency.

In reality, Assumptions (1) and (2) rarely hold in isolation; rather, they coexist to varying degrees. Figure 1-c shows an example of such a mixture.

Figure 1

The same model can be used to study scenarios where the actions of different agents cause externalities to common goods such as clean air.

Regulatory policies in a market of polluting cars often focus on regulating the production of vehicles (i.e., sellers). Once a car is purchased, there is little the buyer can do to lower pollution besides regular maintenance of the vehicle. In this case, the buyer’s efficiency would resemble Figure 1-b.

At the other extreme, in government (i.e., seller) auctions of oil-exploration licenses, firms (i.e., buyers) are regulated and fined for potential environmental impacts. Comparing the efficiency of the government with that of the firms, the firms are better positioned to adopt good practices and control the environmental impact of their activities. The firms' efficiency would resemble Figure 1-a.

Regulatory Impact on Manufacturer Profit

Another consideration for any regulator is the impact these regulations have on the profitability of a manufacturer.

Any regulation will directly (through higher production cost) or indirectly impact the sellers’ profit. By creating economic incentives for buyers to adopt better practices through fines (or taxes), we indirectly affect the prices a buyer is willing to pay for a product.

In Figure 2, we plot the maximum profit a seller can acquire in expectation from a population of buyers such that:

  • the buyer's value for the IoT device is drawn uniformly between $0 and $20, and the buyer's efficiency is drawn uniformly from [0, 1], [0, 3], or [2, 3];
  • a regulator imposes on buyers a fine ranging from $0 to $10 and/or imposes on sellers a minimum production cost ranging from $0 to $5 (e.g., for investing in extra security/safety features).
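
The profit curves in Figure 2 come from the paper's analysis; purely as an illustration of the setup above, the toy Monte Carlo sketch below prices a device against simulated buyers. How the fine, the buyer's efficiency, and the seller's security cost enter willingness-to-pay here is our own stylization, not the paper's model:

```python
# Toy Monte Carlo sketch of the Figure 2 setup. The way the fine, buyer
# efficiency, and seller security cost enter willingness-to-pay below is
# our own stylization for illustration, not the paper's actual model.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

def max_profit(fine, sec_cost, eff_lo, eff_hi):
    value = rng.uniform(0, 20, N)          # buyer's value for the device
    eff = rng.uniform(eff_lo, eff_hi, N)   # buyer's security efficiency
    # Stylized assumption: the expected fine paid falls as the buyer's
    # efficiency and the seller's security investment rise.
    expected_fine = fine / (1 + eff + sec_cost)
    willingness = np.maximum(value - expected_fine, 0)
    # The seller picks the price maximizing (price - cost) * demand(price).
    prices = np.linspace(0.1, 20, 200)
    return max((p - sec_cost) * np.mean(willingness >= p) for p in prices)

print(max_profit(fine=5, sec_cost=2, eff_lo=0, eff_hi=1))  # low-efficiency buyers
print(max_profit(fine=5, sec_cost=2, eff_lo=2, eff_hi=3))  # high-efficiency buyers
```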

If buyers have low efficiency (Figure 2-a) and are liable for the externalities caused by their devices, regulating the sellers can, in fact, increase the manufacturer's profit, since the regulation reduces the chance that buyers are compromised. As buyers become more efficient (Figure 2-b and then 2-c), regulating the sellers can only lower profit, since efficient buyers prefer to provide security themselves.

Figure 2

Selecting the Best Regulation

To select the optimal regulatory strategy while limiting the impact on sellers' profit, we must understand the distribution of buyers' efficiency.

We show that in homogeneous markets, where buyers' ability to follow security practices is uniformly high or uniformly low (Figures 1-a and 1-b), the optimal regulatory policy is to regulate only the buyers or only the sellers, respectively.

In arbitrary markets, where buyers' ability to follow security practices can have high variance (Figure 1-c), we show that although the optimal policy may require regulating both buyers and sellers, there is always an approximately optimal policy that regulates just one. In other words, policies that either only create incentives for buyers or only regulate sellers can approximate the optimal policy that intervenes on both.

In practice, it is challenging to infer all the features that affect the efficiency of buyers, that is, to precisely measure efficiency distributions like those in Figures 1-a to 1-c. Our theoretical results give security researchers a tool to derive an approximately optimal regulation from an inaccurate model of the efficiency distribution.

If we estimate that most of the population purchasing a device is highly efficient, then regulating only the buyers is approximately optimal. Conversely, if we estimate that the population is highly inefficient, then regulating only the sellers approximates the optimal regulation.

At the end of the day, a better understanding of the efficiency of buyers' security practices will put us in a better position to choose regulatory strategies for different information technology markets, such as the market for IoT devices, without the need for complex regulation.

For more details, see the full paper, which was presented at The Web Conference 2019: https://arxiv.org/abs/1902.10008.