October 15, 2019

Watching You Watch: The Tracking Ecosystem of Over-the-Top TV Streaming Devices

By Hooman Mohajeri Moghaddam, Gunes Acar, Ben Burgess, Arunesh Mathur, Danny Y. Huang, Nick Feamster, Ed Felten, Prateek Mittal, and Arvind Narayanan

By 2020, an estimated one third of US households will “cut the cord”, i.e., discontinue their multichannel TV subscriptions and switch to internet-connected streaming services. Over-the-Top (“OTT”) streaming devices such as Roku and Amazon Fire TV, which currently sell for between $30 and $100, are cheap alternatives to smart TVs for cord-cutters. Instead of charging more for the hardware or the membership, Roku and Amazon Fire TV monetize their platforms through advertisements, which rely on tracking users’ viewing habits.

Although tracking of users on the web and on mobile is well studied, tracking on smart TVs and OTT devices has remained unexplored. To address this gap, we conducted the first study of tracking on OTT platforms. In a paper that we will present at the ACM CCS 2019 conference, we found that: 

  • Major online trackers such as Google and Facebook are also highly prominent in the OTT ecosystem. However, OTT channels also contain niche and lesser known trackers such as adrise.tv and monarchads.com.
  • The information shared with tracker domains includes video titles (see Figure 1), channel names, permanent device identifiers and wireless SSIDs.
  • Countermeasures made available to users are ineffective at preventing tracking.
  • Roku had a vulnerability that allowed malicious web pages visited by Roku users to geolocate users, read device identifiers and install channels without their consent.
 Figure 1. AsianCrush channel on Roku sends the device ID and video title to online video advertising platform spotxchange.com

Method and Findings:

Similar to how Android or iOS supports third-party apps, Amazon and Roku support third-party applications known as channels, ranging from popular channels like Netflix and CNN to several obscure ones.

Automation is one of the main challenges of studying how these channels track users. Tools that automate interaction with web pages (such as Selenium) do not exist for OTT platforms. To address this challenge, we developed a system that can automatically download OTT channels and interact with them, all while capturing the network traffic and performing best-effort TLS interception. We describe the different components of our tool in the Appendix. Using this crawler, we collected data from the top 1000 channels on both the Roku and the Amazon Fire TV channel stores.

The distribution of trackers by channel category and rank is shown in Figure 2. On Roku, channels in the “Games” category contact the most trackers: nine of the top ten channels (ordered by the number of trackers) are categorized as game channels. On Fire TV, by contrast, five of the ten channels with the most trackers are “News” channels, and the top three channels contact close to 60 tracker domains each. Below we summarize our findings:

Figure 2. Distribution of trackers by channel ranks and channel categories.

Google and Facebook are among the most popular trackers

Google and Facebook domains (doubleclick.net, google-analytics.com, googlesyndication.com and facebook.com) are among the most prevalent trackers in the OTT channels on both platforms we studied. Google’s doubleclick.net appeared on 975 of the top 1000 Roku channels, while amazon-adsystem.com appeared on 687 of the top 1000 Amazon Fire TV channels.

Table 1. Most prevalent trackers on top 1000 channels on Roku (left) and Amazon (right).

User and device identifiers shared with trackers

Trackers have access to a wide range of device and user identifiers on OTT platforms. Some of these identifiers can be reset by users (e.g., Advertising IDs), while others are permanent (e.g., serial numbers, MAC addresses). To detect the identifiers shared with trackers, we followed the method described by Englehardt et al. to search for device and user identifiers in the network traffic of the top 1000 channels for each platform. This allowed us to detect leaks even when the identifiers were encoded or hashed. An overview of the leaked IDs on each platform is given in Table 2.
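The search for encoded or hashed identifiers can be illustrated with a minimal sketch. It is not the code used in the study: the function names, the particular set of encodings and hashes, and the sample identifiers are all assumptions chosen for illustration. The idea is to precompute several transformed forms of each known identifier and then look for any of them in the captured request data.

```python
import base64
import hashlib
from urllib.parse import quote


def candidate_encodings(identifier: str) -> set[str]:
    """Return the identifier plus a few common encoded and hashed forms.

    The choice of encodings here is illustrative, not exhaustive.
    """
    raw = identifier.encode("utf-8")
    forms = {identifier}
    forms.add(quote(identifier, safe=""))             # URL-encoded
    forms.add(base64.b64encode(raw).decode("ascii"))  # Base64
    for name in ("md5", "sha1", "sha256"):            # hex digests
        forms.add(hashlib.new(name, raw).hexdigest())
    return forms


def find_leaks(identifiers: dict[str, str], payload: str) -> list[str]:
    """Return the names of identifiers found in payload in any known form.

    Matching is case-insensitive so that, for example, a lowercased MAC
    address or an uppercase hex digest is still detected.
    """
    haystack = payload.lower()
    leaked = []
    for name, value in identifiers.items():
        forms = candidate_encodings(value)
        if any(form.lower() in haystack for form in forms):
            leaked.append(name)
    return leaked
```

For example, with a hypothetical serial number `YH009E000001`, a request whose query string carries the SHA-1 digest of that serial would be reported as a serial number leak even though the raw value never appears on the wire.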

Table 2. Overview of identifier and information leakage detected in the Roku (left) and the FireTV (right) crawls.

Channels share video titles with third-party trackers

Out of 100 randomly selected channels on each platform, we found 9 channels on Roku (e.g., “CBS News” and “News 5 Cleveland WEWS”) and 14 channels on Fire TV (e.g., “NBC News” and “Travel Channel”) that leaked the title of the video to a tracking domain. On Roku, all video titles were leaked over unencrypted connections, exposing users’ viewing history to eavesdroppers. On Fire TV, only two channels (NBC News and WRAL) used an unencrypted connection when sending the title to tracking domains.

Overwhelming majority of the channels use unencrypted connections

Out of the 1000 channels we studied on Roku and Amazon Fire TV, 794 channels on Roku and 762 on Amazon Fire TV had at least one unencrypted HTTP session, potentially exposing users’ information and identities to network adversaries.
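The criterion used above (a channel counts if it made at least one unencrypted HTTP request) can be expressed in a few lines. This is only a sketch: the input format, a mapping from channel name to the URLs it requested, and the function name are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def channels_with_cleartext(requests_by_channel: dict[str, list[str]]) -> set[str]:
    """Return the channels that made at least one plain-HTTP request.

    requests_by_channel is a hypothetical mapping from channel name to
    the list of URLs observed while that channel was running.
    """
    return {
        channel
        for channel, urls in requests_by_channel.items()
        if any(urlsplit(url).scheme == "http" for url in urls)
    }
```

A channel that sends a single request over HTTP is counted, no matter how many of its other requests use HTTPS.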

Countermeasures

OTT platforms provide privacy options that purport to limit tracking on their devices: “Limit Ad Tracking” on Roku and “Disable Interest-based Ads” on Amazon Fire TV. Our measurements show that these privacy options fall short of preventing tracking. Turning on these options did not change the number of trackers contacted. Turning on “Limit Ad Tracking” on Roku reduced the number of advertising ID leaks from 390 to zero, but did not change the number of serial number leaks.

Roku Remote Control API Vulnerability

To investigate other ways OTT devices may compromise user privacy and security, we analyzed local API endpoints of Roku and Fire TV. OTT devices expose such interfaces to enable debugging, remote control, and home automation by mobile apps and other automation software. We discovered a vulnerability in Roku’s remote control API that allows an attacker to:

  • send commands to install, uninstall, or launch channels and collect unique identifiers from Roku devices, even when the connected display is turned off;
  • geolocate Roku users via the SSID of the wireless network;
  • extract the MAC address, serial number, and other unique identifiers to track users or respawn tracking identifiers (similar to evercookies);
  • get the list of installed channels and use it for profiling purposes.

We reported the vulnerability to Roku in December 2018. Roku addressed the issue and finalized rolling out their security fix by March 2019.

Going forward

Our research shows that users, who are already being pervasively tracked on the web and mobile, face another set of privacy-intrusive tracking practices when using their OTT streaming platforms. Addressing these privacy and security issues will take a combination of technical and policy solutions. OTT platforms should offer better privacy controls, similar to the Incognito/Private Browsing Mode of modern web browsers. Platform policies should disincentivize insecure connections. For example, clear-text connections should be blocked unless the channel requests an exception. Regulators and policy makers should ensure that the privacy protections available for brick-and-mortar video rental services, such as the Video Privacy Protection Act (VPPA), are updated to cover emerging OTT platforms.

Appendix

Crawler architecture:

We set out to build a crawler to study tracking and privacy practices of OTT channels at scale. Our crawler installs a channel, launches it, and attempts to view a video on the channel, while collecting network traffic and attempting “best-effort” TLS interception. The crawler consists of a number of different hardware devices:

  • A desktop machine connected to the Internet acts as a wireless access point (AP).
  • An OTT stick connects to the Internet via the WiFi AP provided by the desktop machine. It also connects to a TV through an HDMI Capture and Split Card to sidestep the HDCP protections.

The desktop machine orchestrates our crawls and has the following software components:

  • Automatic interaction engine:
    • Remote Control API: OTT platforms provide an API to enable remote control apps to send commands such as switching or installing channels. We wrote our own wrappers for both Roku and Amazon Fire TV’s remote APIs.
    • Audio/Video processing: We process the audio from the OTT device on the desktop machine and use it to detect video playback, which guides our automatic interaction with channels. Video input is also saved as screenshots for post-processing and validation.
  • Network Capture: We collect network traffic of the OTT devices as pcap files and dump all DNS transactions in a Redis database.
  • TLS interception: We use mitmproxy to perform “best-effort” TLS interception. For each channel and each new TLS endpoint, we attempt to intercept the traffic using a self-signed certificate. If the interception fails, we add the endpoint to a no-intercept list to avoid further interception attempts. On Amazon Fire TV, we managed to root the device using a previously known vulnerability and installed mitmproxy’s self-signed certificate in the device certificate store. In addition, we used Frida to bypass certificate pinning.
Figure 3. Overview of our smart crawler.
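The “best-effort” interception logic described in the list above can be sketched as a small state machine. This is a simplified illustration, not the crawler’s actual code and not mitmproxy’s API: the class name, the `try_intercept` callback, and the choice to reset the no-intercept list at the start of each channel are all assumptions.

```python
from typing import Callable


class BestEffortInterceptor:
    """Decide, per TLS endpoint, whether to intercept or pass traffic through."""

    def __init__(self, try_intercept: Callable[[str], bool]) -> None:
        # try_intercept is a hypothetical callback that attempts interception
        # with the self-signed certificate and reports whether the client
        # completed the TLS handshake.
        self._try_intercept = try_intercept
        self.no_intercept: set[str] = set()

    def start_channel(self) -> None:
        """Forget earlier failures when a new channel is launched (assumed)."""
        self.no_intercept.clear()

    def handle(self, endpoint: str) -> str:
        """Return 'intercept', 'failed', or 'passthrough' for one connection."""
        if endpoint in self.no_intercept:
            # A previous attempt failed; leave this endpoint alone.
            return "passthrough"
        if self._try_intercept(endpoint):
            return "intercept"
        # The client rejected our certificate: remember the endpoint so that
        # later connections are not intercepted again.
        self.no_intercept.add(endpoint)
        return "failed"
```

Under this sketch, the first connection to an endpoint that rejects the certificate fails, and any retry by the channel then reaches the real server untouched, which keeps the channel functional at the cost of not seeing that endpoint’s plaintext.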

User Perceptions of Smart Home Internet of Things (IoT) Privacy

by Noah Apthorpe

This post summarizes a research paper, authored by Serena Zheng, Noah Apthorpe, Marshini Chetty, and Nick Feamster from Princeton University, which is available here. The paper will be presented at the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) on November 6, 2018.

Smart home Internet of Things (IoT) devices have a growing presence in consumer households. Learning thermostats, energy tracking switches, video doorbells, smart baby monitors, and app- and voice-controlled lights, speakers, and other devices are all increasingly available and affordable. Many of these smart home devices continuously monitor user activity, raising privacy concerns that may pose a barrier to adoption.

In this study, we conducted 11 interviews of early adopters of smart home technology in the United States, investigating their reasons for purchasing smart-home IoT devices, perceptions of smart home privacy risks, and actions taken to protect their privacy from entities external to the home who create, manage, track, or regulate IoT devices and their data.

We recruited participants by posting flyers in the local area, emailing listservs, and asking through word of mouth. Our recruiting resulted in six female and five male interviewees, ranging from 23 to 45 years old. The majority of participants were from the Seattle metropolitan area; the others were from New Jersey, Colorado, and Texas. The participants came from a variety of living arrangements, including families, couples, and roommates. All participants were fairly affluent, technically skilled, and highly interested in new technology, fitting the profile of “early adopters.” Each interview began with a tour of the participant’s smart home, followed by a semi-structured conversation with specific questions from an interview guide and open-ended follow-up discussions on topics of interest to each participant.

The participants owned a wide variety of smart home devices and shared a broad range of experiences about how these devices have impacted their lives. They also expressed a range of privacy concerns, including intentional purchasing and device interaction decisions made based on privacy considerations. We performed open coding on transcripts of the interviews and identified four common themes:

  1. Convenience and connectedness are priorities for smart home device users. These values dictate privacy opinions and behaviors. Most participants cited the ability to stay connected to their homes, families, or pets as primary reasons for purchasing and using smart home devices. Values of convenience and connectedness outweighed other concerns, including obsolescence, security, and privacy. For example, one participant commented, “I would be willing to give up a bit of privacy to create a seamless experience, because it makes life easier.”
  2. User opinions about who should have access to their smart home data depend on perceived benefit from entities external to the home, such as device manufacturers, advertisers, Internet service providers, and the government. For example, participants felt more comfortable sharing their smart home data with advertisers if they believed that they would receive improved targeted advertising experiences.
  3. User assumptions about privacy protections are contingent on their trust of IoT device manufacturers. Participants tended to trust large technology companies, such as Google and Amazon, to have the technical means to protect their data, although they could not confirm if these companies actually performed encryption or anonymization. Participants also trusted home appliance and electronics brands, such as Philips and Belkin, although these companies have limited experience making Internet-connected appliances. Participants generally rationalized their reluctance to take extra steps to protect their privacy by referring to their trust in IoT device manufacturers to not do anything malicious with their data.
  4. Users are less concerned about privacy risks from devices that do not record audio or video. However, researchers have demonstrated that metadata from non-A/V smart home devices, such as lightbulbs and thermostats, can provide enough information to infer user activities, such as home occupancy, work routines, and sleeping patterns. Additional outreach is needed to inform consumers about non-A/V privacy risks.

Recommendations. These themes motivate recommendations for smart home device designers, researchers, regulators, and industry standards bodies. Participants’ desires for convenience and trust in IoT device manufacturers limit their willingness to take action to verify or enforce smart home data privacy. This means that privacy notifications and settings must be exceptionally clear and convenient, especially for smart home devices without screens. Improved cybersecurity and privacy regulation, combined with industry standards outlining best privacy practices, would also reduce the burden on users to manage their own privacy. We encourage follow-up studies examining the effects of smart home devices on privacy between individuals within a household and comparing perceptions of smart home privacy in different countries.

For more details about our interview findings and corresponding recommendations, please read our paper or see our presentation at CSCW 2018.

Full citation: Serena Zheng, Noah Apthorpe, Marshini Chetty, and Nick Feamster. 2018. User Perceptions of Smart Home IoT Privacy. In Proceedings of the ACM on Human-Computer Interaction, Vol. 2, CSCW, Article 200 (November 2018), 20 pages. https://doi.org/10.1145/3274469

Internet of Things in Context: Discovering Privacy Norms with Scalable Surveys

by Noah Apthorpe, Yan Shvartzshnaider, Arunesh Mathur, Nick Feamster

Privacy concerns surrounding disruptive technologies such as the Internet of Things (and, in particular, connected smart home devices) have been prevalent in public discourse, with privacy violations from these devices occurring frequently. As these new technologies challenge existing societal norms, determining the bounds of “acceptable” information handling practices requires rigorous study of user privacy expectations and normative opinions towards information transfer.

To better understand user attitudes and societal norms concerning data collection, we have developed a scalable survey method for empirically studying privacy in context.  This survey method uses (1) a formal theory of privacy called contextual integrity and (2) combinatorial testing at scale to discover privacy norms. In our work, we have applied the method to better understand norms concerning data collection in smart homes. The general method, however, can be adapted to arbitrary contexts with varying actors, information types, and communication conditions, paving the way for future studies informing the design of emerging technologies. The technique can provide meaningful insights about privacy norms for manufacturers, regulators, researchers and other stakeholders.  Our paper describing this research appears in the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies.

Scalable CI Survey Method

Contextual integrity. The survey method applies the theory of contextual integrity (CI), which frames privacy in terms of the appropriateness of information flows in defined contexts. CI offers a framework to describe flows of information (attributes) about a subject from a sender to a receiver, under specific conditions (transmission principles).  Changing any of these parameters of an information flow could result in a violation of privacy.  For example, a flow of information about your web searches from your browser to Google may be appropriate, while the same information flowing from your browser to your ISP might be inappropriate.

Combinatorial construction of CI information flows. The survey method discovers privacy norms by asking users about the acceptability of a large number of information flows that we automatically construct using the CI framework. Because the CI framework effectively defines an information flow as a tuple (attributes, subject, sender, receiver, and transmission principle), we can automate the process of constructing information flows by defining a range of parameter values for each tuple and generating a large number of flows from combinations of parameter values.
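The combinatorial construction can be sketched with a Cartesian product over parameter values. The snippet below is a minimal illustration, not the survey generator itself: the `Flow` type, the function names, the sentence template, and the short parameter lists in the usage note are assumptions.

```python
from itertools import product
from typing import NamedTuple, Optional, Sequence


class Flow(NamedTuple):
    """One contextual-integrity information flow."""
    sender: str
    subject: str
    attribute: str
    recipient: str
    transmission_principle: Optional[str]  # None models a flow with no condition


def build_flows(
    senders: Sequence[str],
    attributes: Sequence[str],
    recipients: Sequence[str],
    principles: Sequence[Optional[str]],
    subject: str = "its owner",
) -> list[Flow]:
    """Generate every combination of the given parameter values."""
    return [
        Flow(sender, subject, attribute, recipient, principle)
        for sender, attribute, recipient, principle
        in product(senders, attributes, recipients, principles)
    ]


def describe(flow: Flow) -> str:
    """Render a flow as a sentence that a respondent could rate."""
    sentence = (
        f"A {flow.sender} sends {flow.subject}'s {flow.attribute} "
        f"to {flow.recipient}"
    )
    if flow.transmission_principle:
        sentence += f" {flow.transmission_principle}"
    return sentence + "."
```

With two devices, two information types, two recipients, and three transmission principles (one of them `None`), `build_flows` yields 2 × 2 × 2 × 3 = 24 flows. A real survey would then drop combinations that make little sense before showing them to respondents.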

Applying the Survey Method to Discover Smart Home Privacy Norms

We applied the survey method to 3,840 IoT-specific information flows involving a range of device types (e.g., thermostats, sleep monitors), information types (e.g., location, usage patterns), recipients (e.g., device manufacturers, ISPs) and transmission principles (e.g., for advertising, with consent). 1,731 Amazon Mechanical Turk workers rated the acceptability of these information flows on a 5-point scale from “completely unacceptable” to “completely acceptable”.

Trends in acceptability ratings across information flows indicate which context parameters are particularly relevant to privacy norms. For example, the following heatmap shows the average acceptability ratings of all information flows with pairwise combinations of recipients and transmission principles.

Average acceptability scores of information flows with given recipient/transmission principle pairs. For example, the top left box shows the average acceptability score of all information flows with the recipient “its owner’s immediate family” and the transmission principle “if its owner has given consent.” Higher (more blue) scores indicate that flows with the corresponding parameters are more acceptable, while lower (more red) scores indicate that the flows are less acceptable. Flows with the null transmission principle are controls with no specific condition on their occurrence. Empty locations correspond to less intuitive information flows that were excluded from the survey. Parameters are sorted by descending average acceptability score for all information flows containing that parameter.
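The per-cell averages in such a heatmap amount to grouping ratings by (recipient, transmission principle) and taking the mean of each group. The sketch below assumes that the 5-point scale is coded numerically, for instance from -2 for “completely unacceptable” to 2 for “completely acceptable”; that coding, the function name, and the input format are illustrative assumptions rather than details taken from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Optional


def mean_by_pair(
    ratings: Iterable[tuple[str, Optional[str], float]],
) -> dict[tuple[str, Optional[str]], float]:
    """Average the scores for each (recipient, transmission principle) pair.

    Each rating is a (recipient, transmission_principle, score) triple;
    a principle of None stands for the null transmission principle.
    """
    buckets: dict[tuple[str, Optional[str]], list[float]] = defaultdict(list)
    for recipient, principle, score in ratings:
        buckets[(recipient, principle)].append(score)
    return {pair: mean(scores) for pair, scores in buckets.items()}
```

Pairs that never occur in the input simply have no entry in the result, which corresponds to the empty locations in the heatmap.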

These results provide several insights about IoT privacy, including the following:

  • Advertising and Indefinite Data Storage Generally Violate Privacy Norms. Respondents viewed information flows from IoT devices for advertising or for indefinite storage as especially unacceptable. Unfortunately, advertising and indefinite storage remain standard practice for many IoT devices and cloud services.
  • Transitive Flows May Violate Privacy Norms. Consider a device that sends its owner’s location to a smartphone, and the smartphone then sends the location to a manufacturer’s cloud server. This device initiates two information flows: (1) to the smartphone and (2) to the phone manufacturer. Although flow #1 may conform to user privacy norms, flow #2 may violate norms. Manufacturers of devices that connect to IoT hubs (often made by different companies), rather than directly to cloud services, should avoid having these devices send potentially sensitive information with greater frequency or precision than necessary.

Our paper expands on these findings, including more details on the survey method, additional results, analyses, and recommendations for manufacturers, researchers, and regulators.

We believe that the survey method we have developed is broadly applicable to studying societal privacy norms at scale and can thus better inform privacy-conscious design across a range of domains and technologies.