October 1, 2022

Archives for April 2022

A Multi-pronged Strategy for Securing Internet Routing

By Henry Birge-Lee, Nick Feamster, Mihir Kshirsagar, Prateek Mittal, Jennifer Rexford

The Federal Communications Commission (FCC) is conducting an inquiry into how it can help protect against security vulnerabilities in the internet routing infrastructure. A number of large communication companies have weighed in on the approach the FCC should take. 

CITP’s Tech Policy Clinic convened a group of experts in information security, networking, and internet policy to submit an initial comment offering a public interest perspective to the FCC. This post summarizes our recommendations on why the government should take a multi-pronged strategy to promote security that involves incentives and mandates. Reply comments from the public are due May 11.

The core challenge in securing the internet routing infrastructure is that the original design of the network did not prioritize security against adversarial attacks. Instead, the original design focused on how to route traffic through decentralized networks with the goal of delivering information packets efficiently, while not dropping traffic. 

At the heart of this routing system is the Border Gateway Protocol (BGP), which allows independently-administered networks (Autonomous Systems or ASes) to announce reachability to IP address blocks (called prefixes) to neighboring networks. But BGP has no built-in mechanism to distinguish legitimate routes from bogus routes. Bogus routing information can redirect internet traffic to a strategic adversary, who can launch a variety of attacks, or the bogus routing can lead to accidental outages or performance issues. Network operators and researchers have been actively developing measures to counteract this problem.

At a high level, the current suite of BGP security measures depends on building systems to validate routes. But for these technologies to work, most participants have to adopt them or the security improvements will not be realized. In other words, adoption has many of the hallmarks of a “chicken and egg” situation. As a result, there is no silver bullet to address routing security.
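To make route validation concrete, here is a minimal Python sketch modeled on route origin validation, the RPKI-based check described in RFC 6811. This post does not single out a specific mechanism, so treat the choice as an illustration only. The names `Roa` and `validate_origin` are assumptions made for this example, and the prefixes and AS numbers come from ranges reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Roa:
    """A simplified route origin authorization (hypothetical structure)."""
    prefix: str       # address block the owner has registered
    max_length: int   # longest prefix length the owner allows to be announced
    asn: int          # AS number authorized to originate the block


def validate_origin(route_prefix: str, origin_asn: int, roas: Iterable[Roa]) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip records for the other IP version or for unrelated address space.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some record but matched by none: likely bogus.
    # Covered by no record at all: nothing can be said either way.
    return "invalid" if covered else "not-found"


roas = [Roa(prefix="203.0.113.0/24", max_length=24, asn=64496)]

print(validate_origin("203.0.113.0/24", 64496, roas))   # valid
print(validate_origin("203.0.113.0/24", 64511, roas))   # invalid (wrong origin)
print(validate_origin("203.0.113.0/25", 64496, roas))   # invalid (too specific)
print(validate_origin("198.51.100.0/24", 64496, roas))  # not-found (no record)
```

The last case hints at why adoption matters: a bogus route is only flagged when the owner of the address block has published a record and the receiving network actually runs the check.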

Instead, we argue, the government needs a cross-layer strategy that uses a carrot-and-stick approach to push different elements of the infrastructure to adopt security measures that protect legitimate traffic flows. Our comment identifies specific actions Internet Service Providers, Content Delivery Networks and Cloud Providers, Internet Exchange Points, Certificate Authorities, Equipment Manufacturers, and DNS Providers should take to improve security. We also recommend that the government fund and support academic research centers that collect real-time data from a variety of sources that measure traffic and how it is routed across the internet.

We anticipate several hurdles to our recommended cross-layer approach: 

First, to mandate the cross-layer security measures, the FCC has to have regulatory authority over the relevant players. And, to the extent a participant does not fall under the FCC’s authority, the FCC should develop a whole-of-government approach to secure the routing infrastructure.

Second, large portions of the internet routing infrastructure lie outside the jurisdiction of the United States. As such, there are international coordination issues that the FCC will have to navigate to achieve the security properties needed. That said, if there is a sufficient critical mass of providers who participate in the security measures, that could create a tipping point for a larger global adoption.

Third, the package of incentives and mandates that the FCC develops has to account for the risk that there will be recalcitrant small and medium-sized firms that might undermine the comprehensive approach that is necessary to truly secure the infrastructure.

Fourth, while it is important to develop authenticated routes for traffic to counteract adversaries, there is an under-appreciated risk from a flipped threat model – the risk that an adversary takes control of an authenticated node and uses that privileged position to disrupt routing. There are no easy fixes to this threat – but an awareness of this risk can allow for developing systems to detect such actions, especially in international contexts.  

How the National AI Research Resource can steward the datasets it hosts

Last week I participated on a panel about the National AI Research Resource (NAIRR), a proposed computing and data resource for academic AI researchers. The NAIRR’s goal is to subsidize the spiraling costs of many types of AI research, costs that have put such research out of reach of most academic groups.

My comments on the panel were based on a recent study by researchers Kenny Peng, Arunesh Mathur, and me (NeurIPS ’21) on the potential harms of AI. We looked at almost 1,000 research papers to analyze how they used datasets, which are the engine of AI.

Let me briefly mention just two of the many things we found, and then I’ll present some ideas for NAIRR based on our findings. First, we found that “derived datasets” are extremely common. For example, there’s a popular facial recognition dataset called Labeled Faces in the Wild, and there are at least 20 new datasets that incorporate the original data and extend it in some way. One of them adds race and gender annotations. This means that a dataset may enable new harms over time. For example, once you have race annotations, you can use them to build a model that tracks the movement of ethnic minorities through surveillance cameras, which some governments seem to be doing.

We also found that dataset creators are aware of their datasets’ potential for misuse, so they often attach licenses that restrict use to research and exclude commercial purposes. Unfortunately, we found evidence that many companies simply get around this by downloading a model pre-trained on that dataset (in a research context) and using that model in commercial products.

Stepping back, the main takeaway from our paper is that dataset creators can sometimes — but not always — anticipate the ways in which a dataset might be used or misused in harmful ways. So we advocate for what we call dataset stewarding, which is a governance process that lasts throughout the lifecycle of a dataset. Note that some prominent datasets see active use for decades.

I think NAIRR is ideally positioned to be the steward of the datasets that it hosts, and perform a vital governance role over datasets and, in turn, over AI research. Here are a few specific things NAIRR could do, starting with the most lightweight ones.

1. NAIRR should support a communication channel between a dataset creator and the researchers who use that dataset. For example, if ethical problems, or even scientific problems, are uncovered in a dataset, it should be possible to notify users about them. As trivial as this sounds, it is not always the case today. Prominent datasets have been retracted over ethical concerns without a way to notify the people who had downloaded them.

2. NAIRR should standardize dataset citation practices, for example, by providing Digital Object Identifiers (DOIs) for datasets. We found that citation practices are chaotic, and there is currently no good way to find all the papers that use a dataset to check for misuse.

3. NAIRR could publish standardized dataset licenses. Dataset creators aren’t legal experts, and most of the licenses don’t accomplish what dataset creators want them to accomplish, enabling misuse.

4. NAIRR could require some analog of broader impact statements as part of an application for data or compute resources. Writing a broader impact statement could encourage ethical reflection by the authors. (A recent study found evidence that the NeurIPS broader impact requirement did result in authors reflecting on the societal consequences of their technical work.) Such reflection is valuable even if the statements are not actually used for decision making about who is approved. 

5. NAIRR could require some sort of ethical review of proposals. This goes beyond broader impact statements by making successful review a condition of acceptance. One promising model is the Ethics and Society Review instituted at Stanford. Most ethical issues that arise in AI research fall outside the scope of Institutional Review Boards (IRBs), so even a lightweight ethical review process could help prevent obvious-in-hindsight ethical lapses.

6. If researchers want to use a dataset to build and release a derivative dataset or pretrained model, then there should be an additional layer of scrutiny, because these involve essentially republishing the dataset. In our research, we found that this is the start of an ethical slippery slope, because data and models can be recombined in various ways and the intent of the original dataset can be lost.

7. There should be a way for people to report to NAIRR that some ethics violation is going on. The current model, for lack of anything better, is vigilante justice: journalists, advocates, or researchers sometimes identify ethical issues in datasets, and if the resulting outcry is loud enough, dataset creators feel compelled to retract or modify them. 

8. NAIRR could effectively partner with other entities that have emerged as ethical regulators. For example, conference program committees have started to incorporate ethics review. If NAIRR made it easy for peer reviewers to check the policies for any given data or compute resource, that would let them verify that a submitted paper is compliant with those policies.
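As a rough illustration of how items 1, 2, and 6 could fit together, here is a minimal Python sketch of a registry that gives each dataset a persistent identifier, records who downloaded it, tracks derived datasets, and notifies every affected user when a problem is reported. Everything in it is hypothetical: the `Registry` class, its method names, and the identifiers are made up for this example and are not part of any NAIRR design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DatasetRecord:
    identifier: str                # persistent ID, standing in for a DOI
    license_name: str
    parent: Optional[str] = None   # dataset this one was derived from, if any
    users: Set[str] = field(default_factory=set)


class Registry:
    """Hypothetical steward that tracks datasets, derivatives, and users."""

    def __init__(self) -> None:
        self.records: Dict[str, DatasetRecord] = {}
        self.inbox: Dict[str, List[str]] = {}   # user -> notices received

    def register(self, identifier: str, license_name: str,
                 parent: Optional[str] = None) -> None:
        if identifier in self.records:
            raise ValueError(f"{identifier} is already registered")
        if parent is not None and parent not in self.records:
            raise KeyError(f"unknown parent dataset: {parent}")
        self.records[identifier] = DatasetRecord(identifier, license_name, parent)

    def record_download(self, identifier: str, user: str) -> None:
        self.records[identifier].users.add(user)

    def descendants(self, identifier: str) -> List[str]:
        """Return every dataset derived, directly or indirectly, from this one."""
        found: List[str] = []
        frontier = [identifier]
        while frontier:
            current = frontier.pop()
            for record in self.records.values():
                if record.parent == current:
                    found.append(record.identifier)
                    frontier.append(record.identifier)
        return found

    def notify(self, identifier: str, message: str) -> Set[str]:
        """Send one notice to each user of the dataset or of its derivatives."""
        recipients: Set[str] = set()
        for dataset_id in [identifier] + self.descendants(identifier):
            recipients |= self.records[dataset_id].users
        for user in sorted(recipients):
            self.inbox.setdefault(user, []).append(f"[{identifier}] {message}")
        return recipients


registry = Registry()
registry.register("example/faces", "research-only")
registry.register("example/faces-annotated", "research-only", parent="example/faces")
registry.record_download("example/faces", "lab-a")
registry.record_download("example/faces-annotated", "lab-b")

notified = registry.notify("example/faces", "Dataset retracted over ethical concerns.")
print(sorted(notified))  # ['lab-a', 'lab-b']
```

The point of tracking the parent link is that a retraction of the original dataset reaches users of derived datasets too, which is exactly the group that is hardest to reach today.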

There is no single predominant model for ethical review of AI research analogous to the IRB model for biomedical research. It is unlikely that one will emerge in the foreseeable future. Instead, a patchwork is taking shape. The NAIRR is set up to be a central player in AI research in the United States and, as such, bears responsibility for ensuring that the research that it supports is aligned with societal values.

——–

I’m grateful to the NAIRR task force for inviting me and to my fellow panelists and moderators for a stimulating discussion. I’m also grateful to Sayash Kapoor and Mihir Kshirsagar, with whom I previously submitted a comment on this topic to the relevant federal agencies, and to Solon Barocas for helpful discussions.

A final note: the aims of the NAIRR have themselves been contested and are not self-evidently good. However, my comments (and the panel overall) assumed that the NAIRR will be implemented largely as currently conceived, and focused on harm mitigation.

CITP Case Study on Regulating Facial Recognition Technology in Canada

Canada, like many jurisdictions in the United States, is grappling with the growing usage of facial recognition technology in the private and public sectors. This technology is being deployed at a rapid pace in airports, retail stores, social media platforms, and by law enforcement – with little oversight from the government. 

To help address this challenge, I organized a tech policy case study on the regulation of facial recognition technology with Canadian members of parliament – The Honorable Greg Fergus and Matthew Green. Both sit on the House of Commons’ Standing Committee on Access to Information, Privacy, and Ethics (ETHI), and I served as a legislative aide to them through the Parliamentary Internship Programme before joining CITP. Our goal for the session was to put policymakers in conversation with subject matter experts.

The core problem is that there is a lack of accountability in the use of facial recognition technology that exacerbates historical forms of discrimination and puts marginalized communities at risk for a wide range of harms. For instance, a recent story describes the fate of three black men who were wrongfully arrested after being misidentified by facial recognition software. As the Canadian Civil Liberties Association argues, the police’s use of facial recognition technology, notably provided by the New York-based company Clearview AI, “points to a larger crisis in police accountability when acquiring and using emerging surveillance tools.”

A number of academics and researchers – such as DAIR Institute’s Timnit Gebru and the Algorithmic Justice League’s Joy Buolamwini, who documented the misclassification of darker-skinned women in a recent paper – are bringing attention to the discriminatory algorithms associated with facial recognition that have put racialized people, women, and members of the LGBTIQ community at greater risk of false identification.

Meanwhile, Canadian officials are beginning to tackle the real-world consequences of the use of facial recognition. A year ago, the Office of the Privacy Commissioner found that Clearview AI had scraped billions of images of people from the internet in what “represented mass surveillance and was a clear violation of the privacy rights of Canadians.”

Following that investigation, Clearview AI stopped providing services to the Canadian market, including the Royal Canadian Mounted Police. In light of these findings and the absence of dedicated legislation, the ETHI Committee began studying the uses of facial recognition technology in May 2021, and has recently resumed this work by focusing on the use by various levels of government in Canada, law enforcement agencies, and private corporations. 

The CITP case study session on March 24 began with a presentation by Angelina Wang, a graduate affiliate of CITP, who provided a technical overview in which she explained the different functions and harms associated with this technology. Following Wang’s presentation, I provided a regulatory overview of how U.S. lawmakers have addressed facial recognition by noting the different legislative strategies deployed for law enforcement, private, and public sector uses. We then had a substantive, free-flowing discussion with CITP researchers and the policymakers about the challenges and opportunities for different regulatory strategies.

Following CITP’s case study session, Wang and Dr. Elizabeth Anne Watkins, a CITP Fellow, were invited to testify before the ETHI committee in an April 4 hearing. Wang discussed the different tasks facial recognition technology can and cannot perform, how the models are created, why they are susceptible to adversarial attacks, and the ethical implications behind the creation of this technology. Drawing on her research, Dr. Watkins provided an overview of the privacy, security, and safety concerns related to private industry’s use of facial verification on workers. The committee is expected to report its findings by the end of May 2022.

We continue to do research on how Canada might regulate facial recognition technology and will publish those analyses in the coming months.