May 24, 2017

Archives for October 2016

The Effects of the Forthcoming FCC Privacy Rules on Internet Security

Last week, the Federal Communications Commission (FCC) announced new privacy rules that govern how Internet service providers can share information about consumers with third parties. One focus of this rulemaking has been the use and sharing of so-called “Customer Proprietary Network Information (CPNI)” (information about subscribers) for advertising. The Center for Information Technology Policy and the Center for Democracy and Technology jointly hosted a panel exploring this topic last May, and I have previously written on certain aspects of this issue, including what ISPs might be able to infer about user behavior even if network traffic were encrypted.

Although the forthcoming rulemaking targets the collection, use, and sharing of customer data with “third parties”, an important and often-forgotten facet of this discussion is that (1) ISPs rely on the collection, use, and sharing of CPNI to operate and secure their networks, and (2) network researchers (myself included) rely on this data to conduct our research. As one example, in work discussed today in the Wall Street Journal, we used DNS domain registration data to identify cybercriminals before they launch attacks; performing this research required access to all .com domain registrations. We have also developed algorithms that detect the misuse of DNS domain names by analyzing the DNS lookups themselves, and we have worked with ISPs to explore the relationship between Internet speeds and usage, which required access to byte-level usage data from individual customers. ISPs, in turn, rely on third parties, including Verisign and Arbor Networks, to detect and mitigate attacks, and network equipment vendors use traffic traces from ISPs to test new products and protocols. In short, although the goal of the FCC’s rulemaking is to protect consumer data, the rulemaking could have had unintended negative consequences for the stability and security of the Internet, as well as for Internet innovation.

In response to the potential negative effects this rule could have had on Internet security and on networking researchers, I filed a comment with the FCC highlighting how network operators and researchers depend on data to keep the network operating well, to keep it secure, and to foster continued innovation. My May comment describes the types of data that Internet service providers (ISPs) collect, how they use that data for operational and research purposes, and the potential privacy concerns with each dataset. The comment exhaustively enumerates the types of data that ISPs collect; the following data types are particularly interesting because ISPs and researchers rely on them heavily, yet they also raise certain privacy concerns:

  • IPFIX (“NetFlow”) data, which is the Internet traffic equivalent of call data records. IPFIX data is collected at a router and contains statistics about each traffic flow that traverses the router, such as the flow’s “metadata” (e.g., the source and destination IP addresses and the start and end times of the flow). This data doesn’t contain “payload” information, but as previous research on telephone metadata has shown, a lot can be learned about a user from this kind of information. Nonetheless, this data has been used in research and security for many purposes, including detecting botnets and denial-of-service attacks.
  • DNS query data, which records the domain names that each IP address (i.e., customer) looks up (e.g., from a Web browser or an IoT device). DNS query data can be highly revealing, as we have shown in previous work. At the same time, DNS query data is also incredibly valuable for detecting Internet abuse, including botnets and malware.
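To illustrate why flow metadata alone is both useful and revealing, here is a minimal Python sketch. The `FlowRecord` fields are a simplified, hypothetical subset of what a real IPFIX/NetFlow export contains, and the fan-out heuristic and its threshold are illustrative assumptions rather than a production botnet or scan detector: the sketch simply counts how many distinct destinations each source contacts, which requires no payload at all.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowRecord:
    """Simplified, hypothetical flow record: metadata only, no payload."""
    src_ip: str
    dst_ip: str
    dst_port: int
    start: float   # flow start time, seconds since the epoch
    end: float     # flow end time, seconds since the epoch
    num_bytes: int

def fan_out(records):
    """Map each source IP to the number of distinct destination IPs it contacted."""
    destinations = defaultdict(set)
    for r in records:
        destinations[r.src_ip].add(r.dst_ip)
    return {src: len(dsts) for src, dsts in destinations.items()}

def flag_high_fan_out(records, threshold=100):
    """Return source IPs whose fan-out reaches a (hypothetical) threshold."""
    return sorted(src for src, n in fan_out(records).items() if n >= threshold)

# One host touching 150 different addresses on port 23, plus one ordinary HTTPS flow.
records = [FlowRecord("10.0.0.5", f"192.0.2.{i}", 23, 0.0, 0.1, 60) for i in range(150)]
records.append(FlowRecord("10.0.0.7", "198.51.100.1", 443, 0.0, 5.0, 12000))
print(flag_high_fan_out(records))  # ['10.0.0.5']
```

The same per-source aggregation that surfaces a scanning host also summarizes whom every ordinary customer talks to, which is exactly the privacy tension described above.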

Over the summer, I gave a follow-up presentation and filed follow-up comments (several of which were jointly authored with members of the networking and security research community) to help draw attention to how much Internet research depends on access to this type of data. In early August, a group of us filed a comment with proposed wording for the upcoming rule. In this comment, we delineated the types of work that should be exempt from the upcoming rules. We argued that research should be exempt from the rulemaking if the research: (1) aims to promote the security, stability, and reliability of networks; (2) does not have the end goal of violating user privacy; (3) has benefits that outweigh the privacy risks; (4) takes steps to mitigate privacy risks; and (5) would be enhanced by access to the ISP data. In delineating this type of research, our goal was to explicitly “carve out” researchers at universities and research labs without opening a loophole for third-party advertisers.

Of course, the exception notwithstanding, researchers should also be mindful of user privacy when conducting research. Just because a researcher is “allowed” to receive a particular data trace from an ISP does not mean that such data should be shared. For example, much network and security research is possible with de-identified network traffic data (e.g., data with anonymized IP addresses) or without packet “payloads” (i.e., the packet contents that Deep Packet Inspection examines). Researchers and ISPs should always apply data minimization techniques that limit the disclosure of private information to the granularity necessary to perform the research. Various minimization practices exist, such as hashing or removing IP addresses and aggregating statistics over longer time windows. The network and security research communities should continue developing norms and standard practices for deciding when, how, and to what degree private data from ISPs can be minimized when it is shared.
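The minimization practices mentioned above (hashing IP addresses, coarsening them, and aggregating over longer time windows) can be sketched in a few lines of Python. This is a hypothetical illustration, not a vetted anonymization scheme: the secret key, the /24 prefix length, and the one-hour window are assumptions chosen for the example.

```python
import hashlib
import hmac
import ipaddress
from collections import defaultdict

def pseudonymize_ip(ip: str, key: bytes) -> str:
    """Replace an IP address with a keyed hash (HMAC-SHA256), shortened to 16 hex chars."""
    return hmac.new(key, ip.encode(), hashlib.sha256).hexdigest()[:16]

def truncate_ip(ip: str, prefix_len: int = 24) -> str:
    """Coarsen an IPv4 address to its enclosing network, e.g. 203.0.113.0/24."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))

def aggregate_bytes(events, window_seconds: int = 3600):
    """Sum bytes per (client, time window) instead of keeping per-event records.

    events: iterable of (client_id, timestamp_seconds, num_bytes) tuples.
    """
    totals = defaultdict(int)
    for client, ts, num_bytes in events:
        window_start = int(ts // window_seconds) * window_seconds
        totals[(client, window_start)] += num_bytes
    return dict(totals)

# Hypothetical secret key, held by the ISP and never shared with researchers.
key = b"example-secret-key"
events = [("203.0.113.77", 10, 500), ("203.0.113.77", 1800, 700), ("203.0.113.77", 3700, 100)]
minimized = aggregate_bytes((pseudonymize_ip(ip, key), ts, n) for ip, ts, n in events)
print(sorted(minimized.values()))   # [100, 1200]
print(truncate_ip("203.0.113.77"))  # 203.0.113.0/24
```

A keyed hash is used instead of a plain hash because the IPv4 address space has only about 4.3 billion addresses, so an unkeyed hash can be reversed simply by hashing every address. Consistent pseudonyms still let records from the same customer be linked to one another, so this kind of pseudonymization reduces, but does not eliminate, privacy risk.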

The FCC, ISPs, customers, and researchers should all care about the security, operation, and performance of the Internet. Achieving these goals often involves sharing customer data with third parties, such as the network and security research community. As a member of the research community, I am looking forward to reading the text of the rule, which, if our comments are incorporated, will help preserve both customer privacy and the research that keeps the Internet secure and performing well.

Learning Privacy Expectations by Crowdsourcing Contextual Informational Norms

[This post reports on joint work with Schrasing Tong, Thomas Wies (NYU), Paula Kift (NYU), Helen Nissenbaum (NYU), Lakshminarayanan Subramanian (NYU), Prateek Mittal (Princeton) — Yan]

To appear in the proceedings of the Fourth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2016)

We would like to thank Joanna Huey for helpful comments and feedback.

Motivation

The advent of social apps, smartphones, and ubiquitous computing has transformed our day-to-day lives. The incredible pace at which new and disruptive services continue to emerge challenges our perception of privacy. To keep pace with this rapidly evolving cyber reality, we need to devise agile methods and frameworks for developing privacy-preserving systems that align with users’ evolving privacy expectations.

Previous efforts [1,2,3] have tackled this with the assumption that privacy norms are provided through existing sources such as law, privacy regulations, and legal precedents. They have focused on formally expressing privacy norms and devising a corresponding logic to enable automatic inconsistency checks and efficient enforcement of those norms.

However, because many of the existing regulations and privacy handbooks were written well before the Internet revolution took place, they often lag behind and do not adequately reflect the application logic of modern systems. For example, the Family Educational Rights and Privacy Act (FERPA) was enacted in 1974, long before Facebook, Google, and many other online applications were used in an educational context. More recent legislation faces similar challenges as novel services introduce new ways to exchange information and consequently shape new, unconsidered information flows that can change our collective perception of privacy.
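To make the idea of formally expressed norms concrete, here is a toy Python sketch. It is not the formalism used in [1,2,3] or in our paper; it assumes only that a flow is described by Contextual Integrity’s five parameters (sender, recipient, subject, information type, transmission principle), that each hypothetical norm permits or forbids one exact flow, and that flows with no matching norm are denied. The “inconsistency check” looks for norms that both permit and forbid the same flow.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Flow:
    """An information flow described by Contextual Integrity's five parameters."""
    sender: str
    recipient: str
    subject: str
    attribute: str
    transmission_principle: str

@dataclass(frozen=True)
class Norm:
    """A (hypothetical) norm that either permits or forbids one exact flow."""
    flow: Flow
    allowed: bool

def conflicts(norms):
    """Inconsistency check: pairs of norms that both permit and forbid the same flow."""
    return [(a, b) for a, b in combinations(norms, 2)
            if a.flow == b.flow and a.allowed != b.allowed]

def is_permitted(flow, norms):
    """Permitted only if some norm allows the flow and none forbids it (default deny)."""
    matching = [n.allowed for n in norms if n.flow == flow]
    return bool(matching) and all(matching)

# Hypothetical norms for an educational context.
grades_to_parent = Flow("teacher", "parent", "student", "grades", "to keep parents informed")
grades_to_advertiser = Flow("teacher", "advertiser", "student", "grades", "in exchange for payment")
norms = [Norm(grades_to_parent, True), Norm(grades_to_advertiser, False)]

print(is_permitted(grades_to_parent, norms))      # True
print(is_permitted(grades_to_advertiser, norms))  # False
print(len(conflicts(norms + [Norm(grades_to_parent, False)])))  # 1
```

Exact matching keeps the sketch short; realistic formalisms generalize over roles, attribute hierarchies, and temporal conditions, which is where automated reasoning becomes valuable. The hard part this post is concerned with is where the norms themselves come from when existing law is silent or outdated.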

Crowdsourcing Contextual Privacy Norms

Building on the theory of Contextual Integrity (CI), we are exploring ways to uncover societal privacy norms by leveraging advances in crowdsourcing technology.

In our recent paper, we present a methodology that we believe can be used to extract a societal notion of privacy expectations. The results can be used to fine-tune existing privacy guidelines as well as to gain a better perspective on users’ expectations of privacy.

Sign up now for the first workshop on Data and Algorithmic Transparency

I’m excited to announce that registration for the first workshop on Data and Algorithmic Transparency is now open. The workshop will take place at NYU on Nov 19. It convenes an emerging interdisciplinary community that seeks transparency and oversight of data-driven algorithmic systems through empirical research.

Despite the short notice of the workshop’s announcement (about six weeks before the submission deadline), we were pleasantly surprised by the number and quality of the submissions that we received. We ended up accepting 15 papers, more than we’d originally planned to, and still had to turn away good papers. The program includes both previously published work and original papers submitted to the workshop, and has just the kind of multidisciplinary mix we were looking for.

We settled on a format that’s different from the norm but probably familiar to many of you. We have five panels, one on each of the five main themes that emerged from the papers. The panels will begin with brief presentations, with the majority of the time devoted to in-depth discussions led by one or two commenters who will have read the papers beforehand and will engage with the authors. We welcome the audience to participate; to enable productive discussion, we encourage you to read or skim the papers beforehand. The previously published papers are available to read; the original papers will be made available in a few days.

I’m very grateful to everyone on our program committee for their hard work in reviewing and selecting papers. We received very positive feedback from authors on the quality of reviews of the original papers, and I was impressed by the work that the committee put in.

Finally, note that the workshop will take place at NYU rather than Columbia as originally announced. We learnt some lessons on the difficulty of finding optimal venues in New York City on a limited budget. Thanks to Solon Barocas and Augustin Chaintreau for their efforts in helping us find a suitable venue!

See you in three weeks, and don’t forget the related and colocated DTL and FAT-ML events.