
Archives for March 2022

Recommendations for introducing greater safeguards and transparency into CS conference funding

In Part 1 of this piece, I provided evidence of the extent to which some of the world’s top computer science conferences are financially reliant upon some of the world’s most powerful technology companies. In this second part, I lay out a set of recommendations for ways to help ensure that these entanglements of industry and academia don’t grant companies undue influence over the conditions of knowledge creation and exchange.

To be clear, I am not suggesting that conferences stop accepting money from tech companies, nor am I saying there is no place for Big Tech investment in academic research. I am simply advocating for conference organizers to adopt greater safeguards to increase transparency and mitigate the potential agenda-setting effects associated with companies’ funding of and presence in academic spaces.

While I am not claiming that sponsors have any say over which papers are or aren’t published, in the next few paragraphs I will show how agenda-setting can happen in a much more subtle yet pervasive way.

Resurrecting conferences as “trading zones”

Setting the agenda in a given field means determining and prioritizing topics of focus and investment. Research priorities are not neutral or naturally occurring—they are the result of social and political construction. And because a great deal of CS funding comes from tech companies, these priorities are likely to be shaped by what is considered valuable or profitable to those companies.

One example of the tech industry’s agenda-setting power is the way in which AI/ML research has been conceptualized in narrower terms to prioritize technical work. For instance, despite its valuable contributions to the understanding of priorities inherent in ML research, the Birhane et al. paper I cited in Part 1 was rejected from publication at the 2021 NeurIPS Conference with a dismissive meta-review, which is just one example of how the ML community has marginalized critical work and elevated technical work. Other examples of corporate agenda-setting in CS include the aforementioned way in which tech companies’ definitions of privacy and security vary from those of consumer advocates, and the way in which the field of human-computer interaction (HCI) often focuses on influencing user behavior rather than stepping back to reflect on necessary systemic changes at the platform level.

In deciding which conferences to fund, and shaping which ideas and work get elevated within those conferences, tech companies contribute to the creation of a prestige hierarchy. This, in turn, influences which kinds of people self-select into submitting their work to and attending those conferences. Further, sponsorship perks afford companies a prominent presence at CS conferences through expos and other events. Combined, these factors mold CS conferences into sites of commercially oriented activity.

It is important to make space at top conferences for work that doesn’t necessarily advance commercial innovation. Beyond simply serving as a channel for publishing and broadcasting academic papers, conferences have the potential to serve as sites of critique, activism and advocacy. These seemingly secondary functions of academic gatherings are, in actuality, critical functions that need to be preserved.

In “Engaging, Designing and Making Digital Systems,” Janet Vertesi et al. describe spaces of collaboration between scholarship and design as “trading zones”, where engagements can be corporate, critical, inventive, or focused on inquiry. While corporate work engages from within companies, critical engagement requires the existence of a trading zone in which domain scientists, computer scientists and engineers can meet and engage in dialogue. Vertesi et al. write, “Critical engagements typically embrace intersections between IT research and corporations yet eschew immediate pay-offs for companies or designers.”

Even if sponsoring companies don’t have a direct hand in deciding which work gets published, their presence at academic conferences gives them both insight into ideas and work being shared among attendees, and opportunities to push specific messaging around their brand through advertising and recruitment events. Therefore, instituting sponsorship policies and increasing transparency would help to both curb their potential influence, as well as make clear to conference participants the terms of companies’ financial contributions.

Introducing greater safeguards around conference sponsorship would not be unprecedented; for example, there have been similar efforts in the medical community to curb the influence of pharmaceutical and medical device manufacturing companies on clinical conferences.

Asking accountability conferences to practice what they preach

In particular, tech conferences whose mission is explicitly related to ethics and accountability deserve a higher level of scrutiny for their donor relationships. However, my survey of some of the most prominent conferences in this space found that many of them do not provide a list of donors, nor do they disclose any sponsorship policies on their websites.

That said, some conferences have been reevaluating their fundraising practices after recognizing that certain sponsors’ actions were not aligning with their values. For example, in March 2021, the ACM Conference for Fairness, Accountability, and Transparency (FAccT) suspended its sponsorship relationship with Google in protest of the company’s firing of two of its top Ethical AI researchers, who had been examining biases built into the company’s AI systems.

FAccT committee member Suresh Venkatasubramanian tweeted that the decision to drop Google as a supporter was “in the best interests of the community” while the committee revised its sponsorship policy. Conference sponsorship co-chair Michael Ekstrand told VentureBeat that having Google as a sponsor could impede FAccT’s Strategic Plan. (It should be noted that FAccT still accepted funding from DeepMind, a subsidiary of Google’s parent company Alphabet, for its 2021 conference.)

The conference recently published a new sponsorship policy, acknowledging that “outside contributions raise serious concerns about the independence of the conference and the legitimacy that the conference may confer on sponsors and supporters.” Other conferences, like the ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO) and the Association for Computational Linguistics (ACL) Conference have also posted sponsorship and/or conflict of interest policies on their websites.

While it might be expected that ethics and fairness-oriented conferences would have a more robust protocol around which funds they accept, it is in the best interest of all CS conferences to think critically about and mitigate the constraints associated with accepting corporate sponsorship. 

Recommendations for Best Practices

In many instances, accepting corporate sponsorship is a necessary evil that enables valuable work to be done and allows greater access to resources and opportunities like conferences. In the long term, there should be a concerted effort to resurrect computer science conferences as a neutral territory for academic exploration based on what scholars, not corporations, deem to be worthy of pursuit. However, a more immediate solution could be to establish and enforce a series of best practices to ensure greater academic integrity of conferences that do rely on corporate sponsorship. 

Many scholars, like those who signed the Funding Matters petition in 2018, have argued in favor of establishing rigorous criteria and guidelines for corporate sponsorship of research conferences. I have developed a set of recommendations for conferences to serve as a jumping-off point for ensuring greater transparency and accountability in their decision-making process: 

  • Evaluate sponsors through the lens of your organization’s mission and values. Determine which lines you’re not willing to cross.
    • Are there companies whose objectives or outputs run counter to your values? Are there actions you refuse to legitimize or companies whose reputation might significantly compromise the integrity of the conferences they fund? Review your existing sponsors to ensure that none of them are crossing that line, and use it as a threshold for determining whether to accept funding from others in the future.
    • For example, in the sponsorship policy for the ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), organizers reserve the right to “decline or return any funding should the sponsorship committee decide that the funding source is misaligned with the mission of the initiative and conference.”
  • Be transparent about who is sponsoring your conference, how much they are contributing, and what benefits they receive as a condition of their contributions.
    • While many conferences do list the logos of their sponsors on their websites, it is not often clear how much money those organizations gave and how exactly it was used. To ensure greater transparency, publish a list of sponsors on your website and in other promotional materials and make the details of your call for sponsorship publicly available and easily accessible. 
    • Make sure to make this information public ahead of the conference, so that invited speakers and other attendees can make an informed decision about whether or not they want to participate. (1)
  • Develop rigorous policies to prevent sponsors from influencing the content or speakers of conference events. 
    • Establish a solid gift acceptance policy and thorough gift agreement outlining the kinds of funding you will and will not accept to ensure that your donors’ support is not restricted and does not come with strings attached.
    • For example, the FAccT conference recently published a new statement outlining their practices around sponsorship and financial support, which denies sponsors say over any part of the conference organization or content. In addition, sponsors can only contribute to a general fund, rather than being able to specify how their contributions are used.
  • Encourage open discussion during the conference about the implications of accepting corporate funding and potential alternatives.
    • For example, the ACM Conference on Computer Science and Law has committed to devoting time to a “discussion of practical strategies for and ethical implications of different funding models for both research and conference sponsorship in the nascent ACM Computer Science and Law community.”
  • Make sure the industry in general, or any one company in particular, is not over-represented among sponsors or conference organizers.
    • Consider whether certain sponsors might be working to whitewash or silence certain areas of research. What are the interests or intentions of the organization offering you sponsorship funds? What do they hope to gain from this relationship? (2)
    • For example, the EAAMO sponsorship committee commits to “seek[ing] funding from a diverse set of sources which may include academic institutions, charitable organizations, foundations, industry, and government sources.”
  • Consider seeking alternative, industry-independent sources of funding whose interests are less likely to conflict with the subject/mission of your conference.
    • That being said, it is important to bear in mind that, as Phan et al. pointed out in their recent paper, “philanthropic foundation funding from outside Big Tech interests present different and complex considerations for researchers as producers and suppliers of ethics work.” This is why having a diversity of sources is preferable.

In working to reclaim conferences as a space of academic exploration untainted by corporate interests, the field of computer science can help to ensure that its research is better positioned to serve the best interests of the public.

(1) Several speakers backed out of their scheduled appearances at the UCLA Institute for Technology, Law & Policy’s November 2021 Power and Accountability in Tech conference after learning the center had accepted sponsorship money from the Koch Foundation, which has funded attacks on antiracist scholarship.

(2) For example, in 2016, the Computers, Privacy, & Data Protection Conference (CPDP) chose to stop accepting sponsorship funding from Palantir after participants like Aral Balkan pulled out of a panel and described CPDP’s acceptance of the company’s contributions as “privacy-washing”.

Many thanks, once again, to Prof. Arvind Narayanan for his guidance and support.

Holding Purveyors of “Dark Patterns” for Online Travel Bookings Accountable

Last week, my former colleagues at the New York Attorney General’s Office (NYAG) scored a $2.6 million settlement with Fareportal – a large online travel agency that used deceptive practices, known as “dark patterns,” to manipulate consumers into booking online travel.

The investigation exposes how Fareportal, which operates under several brands, including CheapOair and OneTravel, used a series of deceptive design tricks to pressure consumers into buying flights, hotels, and other travel products. In this post, I share the details of the investigation’s findings and use them to highlight why we need further regulatory intervention to prevent similar conduct from becoming entrenched in other online services.

The NYAG investigation picks up on the work of researchers at Princeton’s CITP that exposed the widespread use of dark patterns on shopping websites. Using the framework we developed in a subsequent paper for defining dark patterns, the investigation reveals how the travel agency weaponized common cognitive biases to take advantage of consumers. The company was charged under the Attorney General’s broad authority to prohibit deceptive acts and practices. In addition to paying $2.6 million, the New York City-based company agreed to reform its practices.

Specifically, the investigation documents how Fareportal exploited the scarcity bias by displaying, next to the top two flight search results, a false and misleading message about the number of tickets left for those flights at the advertised price. It manipulated consumers by adding 1 to the number of tickets the consumer had searched for and claiming that only X+1 tickets were left at that price. So, if you searched for one round-trip ticket from Philadelphia to Chicago, the site would say “Only 2 tickets left” at that price, while a consumer searching for two such tickets would see a message stating “Only 3 tickets left” at the advertised price.

In 2019, Fareportal added a design feature that exploited the bandwagon effect by displaying how many other people were looking at the same deal. The site used a computer-generated random number between 28 and 45 to show the number of other people “looking” at the flight. It paired this with a false countdown timer that displayed an arbitrary number that was unrelated to the availability of tickets. 

Similarly, Fareportal exported its misleading tactics to hotel bookings on its mobile apps. The apps misrepresented the percentage of rooms shown that were “reserved” by using a computer-generated number keyed to when the customer was trying to book a room. So, for example, if the check-in date was 16-30 days away, the message would indicate that between 41% and 70% of the hotel rooms were booked, but if it was less than 7 days away, it showed that between 81% and 99% of the rooms were reserved. But, of course, those percentages were pure fiction. The apps used a similar tactic for displaying the number of people “viewing” hotels in the area. This time, they generated the number from the nightly rate of the fifth hotel returned in the search, taking the difference between the numerical value of the dollar figure and the numerical value of the cents figure. (If the rate was $255.63, consumers were told 192 people were viewing the hotel listings in the area.)
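To make the mechanics concrete, here is a minimal sketch (in Python; the function names and structure are mine, with only the figures taken from the findings described above) of how such fabricated scarcity indicators could be generated. The point is that every “scarcity” number the consumer saw was a function of their own inputs or of arbitrary constants, never of actual inventory or demand.

```python
import random
from datetime import date

def fake_tickets_left(tickets_searched: int) -> str:
    # "Only X+1 tickets left": always one more than the number the consumer searched for.
    return f"Only {tickets_searched + 1} tickets left at this price"

def fake_flight_viewers() -> int:
    # A random number between 28 and 45, unrelated to real demand.
    return random.randint(28, 45)

def fake_rooms_reserved_pct(check_in: date, today: date):
    # Percentage of rooms "reserved", keyed to how far away check-in is,
    # not to actual availability.
    days_out = (check_in - today).days
    if days_out < 7:
        return random.randint(81, 99)
    if 16 <= days_out <= 30:
        return random.randint(41, 70)
    return None  # other date windows are not described in the findings summarized above

def fake_hotel_viewers(fifth_result_nightly_rate: float) -> int:
    # Dollars minus cents of the fifth search result's nightly rate:
    # a rate of $255.63 yields 255 - 63 = 192 "people viewing".
    dollars = int(fifth_result_nightly_rate)
    cents = round((fifth_result_nightly_rate - dollars) * 100)
    return dollars - cents
```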

Fareportal used these false scarcity indicators across its websites and mobile platforms to pitch products such as travel protection and seat upgrades, inaccurately representing how many other consumers had purchased the product in question.

In addition, the NYAG charged Fareportal with using a pressure tactic that forced consumers to accept or decline a travel protection policy to “protect the cost of [their] trip” before completing a purchase. This practice is described in the academic literature as a covert pattern that uses “confirmshaming” and “forced action” to influence choices.

Finally, the NYAG took issue with how Fareportal manipulated price comparisons to suggest it was offering tickets at a discounted price, when in fact, most of the advertised tickets were never offered for sale at the higher comparison price. The NYAG rejected Fareportal’s attempt to use a small pop-up to cure the false impression conveyed by the visual slash-through image that conveyed the discount. Similarly, the NYAG called out how Fareportal hid its service fees by disguising them as being part of the “Base Price” of the ticket rather than the separate line item for “Taxes and Fees.” These tactics are described in the academic literature as using “misdirection” and “information hiding” to influence consumers. 


The findings from this investigation illustrate why dark patterns are not simply aggressive marketing practices, as some commentators contend, but require regulatory intervention. Specifically, such shady practices are difficult for consumers to spot and to avoid, and, as we argued, risk becoming entrenched across different travel sites that have an incentive to adopt similar practices. As a result, Fareportal, unfortunately, will not be the first or the last online service to deploy such tactics. But this creates an opportunity for researchers, consumer advocates, and design whistleblowers to step forward and spotlight such practices to protect consumers and help create a more trustworthy internet.

The tech industry controls CS conference funding. What are the dangers?

Research about the influence of computing technologies, such as artificial intelligence (AI), on society relies heavily upon the financial support of the very companies that produce those technologies. Corporations like Google, Microsoft, and IBM spend millions of dollars each year to sponsor labs, professorships, PhD programs, and conferences in fields like computer science (CS) and AI ethics at some of the world’s top institutions. Industry is the main consumer of academic CS research, and 84% of CS professors receive at least some industry funding. All of these factors contribute to the significant influence tech firms wield over the kinds of questions that are and aren’t asked about their products, and which information is and isn’t made available about their social impact. 

As consciousness about these conflicts of interest builds, we are seeing growing calls from scholars in and around CS to disentangle the discipline from Big Tech’s corporate agenda. However, given the extent to which much of CS academia relies on funding from major tech corporations, this is much easier said than done. As I argue below, a more achievable yet valuable goal might be to introduce better safeguards in spaces like conferences to mitigate undue corporate influence over essential research.

I will make my case in two parts. First, in today’s post, I will:

  • Provide a quick overview of discourse regarding Big Tech’s dominance in CS research, and
  • Use a dataset I’ve compiled to illustrate the extent to which conferences—an essential arena for knowledge sharing in the field of computer science—are financially reliant upon some of the world’s most powerful technology companies.

In my second post, I will follow up with my recommendations for steps that can be taken to minimize the potential chilling or agenda-setting effects brought on by corporate funding on CS research.

A short survey of concerns about Big Tech’s influence

Relying on large companies and the resources they control can create significant limitations for the kinds of CS research that are proposed, funded and published. The tech industry plays a large hand in deciding what is and isn’t worthy of examination, or how issues are framed. For instance, a tech company might have a very different definition of privacy from that which is used by consumer rights advocates. But if the company is determining the parameters for the kinds of research it wishes to sponsor, it can choose to fund proposals that align with or uphold its own interpretation. 

The scope of what is reasonable to study is therefore shaped by what is of value to tech companies. There is little incentive for these corporations to fund academic research about issues that they consider more marginal or which don’t relate to their priorities. 

A 2020 study on artificial intelligence research found that “with respect to AI, firms have increased corporate research significantly,” in the form of both company-level publications as well as collaborations with elite universities. This trend was illustrated in an analysis by Birhane et al. of top-cited papers published at premier machine learning conferences, which revealed “substantive and increasing corporate presence” in that research. In 2018-19, nearly 80% of the annotated papers had some sort of corporate ties, by either author affiliation or funding. Moreover, the analysis found that corporate presence is more pronounced in the conference papers that end up receiving the most citations.

Birhane et al. write, “the top stated values of ML… such as performance, generalization, and efficiency may not only enable and facilitate the realization of Big Tech’s objectives, but also suppress values such as beneficence, justice, and inclusion.”

One of the most vocal critics of Big Tech’s “capture” of CS academia is Meredith Whittaker, a former Google employee turned Senior Advisor on AI at the Federal Trade Commission. She argues that tech companies, hoping to muffle critics and fend off mounting regulatory pressure, are eager to shape the narrative around their technologies’ social impact by funding favorable research. This has led to widespread corporate sponsorship of labs, faculty positions, graduate programs, and conferences—all of which are reliant on these companies for not only funding, but often also access to data and computing resources. This industry capture of tech research—wherein corporations strategically fund research or public campaigns in a way that serves their own agenda—has been described by scholars like Thao Phan et al. as “philanthrocapitalism.”

Furthermore, as Whittaker argues, the tech industry’s dominance in CS research “threatens to deprive frontline communities, policymakers, and the public of vital knowledge about the costs and consequences of AI and the industry responsible for it—right at the time that this work is most needed.” Recognizing this threat, other ex-Googlers like Timnit Gebru and Alex Hanna have taken the initiative to launch the Distributed AI Research Institute, in an effort to create space for “independent, community-rooted AI research free from Big Tech’s pervasive influence.”

I do wish to make clear that receiving funding from an organization that doesn’t completely align with one’s values does not necessarily mean one’s research is compromised. Corporate funding of AI research is not inherently bad, and academics who do not accept Big Tech money can still produce ethically questionable research. Furthermore, individuals who accept Big Tech funding can still be critical of the corporations’ products and their influence on society. 

However, I agree with academics like Moshe Y. Vardi who argue that we must grapple with the contradictions inherent in accepting funding for research such as AI ethics from companies whose interests may run counter to the public good. In a recent article, Vardi, who is the senior editor of Communications of the ACM (1), urged his colleagues to think more critically about their field’s relationship to “surveillance-capitalism corporations”, writing: “The biggest problem that computing faces today is not that AI technology is unethical—though machine bias is a serious issue—but that AI technology is used by large and powerful corporations to support a business model that is, arguably, unethical.”

Analysis: FAAMG companies dominate conference sponsorship

One way to begin to address these conflicts of interest is by reflecting on the conditions of knowledge creation and exchange—in spaces such as academic conferences—and thinking critically and openly about the compromises and tradeoffs inherent in accepting funding from the industry that controls the subject of one’s study. In the field of computer science, conferences are the primary venue for sharing one’s research with others in the discipline. Therefore, sponsoring these gatherings gives firms valuable influence over and insight into what’s happening at the cutting edge of topics like machine learning and human-computer interaction.

In an effort to get a better understanding of who the major players are in this realm, I reviewed the websites for the top 25 CS conferences (based on H-5 index and impact score) to compile information about all of the organizations that have financially supported them between 2019 and 2021. I found that a majority of the most frequent and most generous sponsors, often donating tens of thousands of dollars per conference, were powerful technology companies.

This spreadsheet contains sponsorship data for the top 25 most frequent sponsors (2). Of the 10 sponsors who supported the largest numbers of different conferences in the past three years, five are “FAAMG” companies (Facebook, Apple, Amazon, Microsoft, Google)—six if you count DeepMind, a subsidiary of Google’s parent company Alphabet. No non-profit organizations, government science funding agencies, or sponsors from outside the U.S. or China appeared among the top 10.

Overall, among the most frequent and most generous supporters of the top 25 CS conferences, the only non-tech/non-corporate donor was the National Science Foundation, which sponsored five different conferences (11 total gatherings) with donations typically ranging between $15,000 and $25,000. 

In addition to having their company name and logo listed on conference promotional materials, top sponsors (who often give upwards of $50,000) receive perks such as opportunities to sponsor prizes or student grants, complimentary registrations and private meeting rooms, access to databases of conference registrants interested in recruitment opportunities, virtual booths or priority exhibition spaces, advertising opportunities and press support, and access to attendee metrics on “exhibitor dashboards”. A “Hero Sponsor” who gave $50,000 or more to the 2021 Conference on Human Factors in Computing Systems (CHI), for example, would have received 34 different benefits – which cumulatively create opportunities for continuous access to and influence on attendees throughout the event.

It is difficult to get an accurate estimate of exactly how much money each company donates to these conferences, as these numbers are not consistently reported to the public. Some conferences only publish a list of supporters with no details about how much each one gave. Others assign sponsorship levels such as “Platinum” or “Diamond”, but the monetary value associated with each level varies by conference and year. When dollar amounts are provided, they often represent a potential range of several thousand dollars—for instance, a Platinum Sponsor of the 2021 SIGMOD/PODS conference might have given anywhere between $16,000 and $31,999. Furthermore, it is difficult to gain insight into how exactly these funds are used.

Given the extent of financial entanglement between Big Tech and academia, it might be unrealistic to expect CS scholars to completely resist accepting any industry funding—instead, it may be more practicable to make a concerted effort to establish higher standards for and greater transparency regarding sponsorship.

In Part 2 of this article, I will recommend steps that can be taken to minimize the potential chilling or agenda-setting effects brought on by corporate funding on CS research.

(1) Six of the top 25 CS conferences in the world are organized by ACM, the Association for Computing Machinery. Between 2019 and 2021, many of those conferences were largely funded by American tech companies like Apple, Amazon, Facebook, Google, IBM, and Microsoft, and Chinese ones like Alibaba, Baidu, ByteDance, and Huawei.

(2)  I have compiled a conference sponsorship database that includes extensive data that is not included in this spreadsheet. If you are interested in reviewing it, or in collaborating on further data collection, I would be happy to share it privately.

Many, many thanks to Prof. Arvind Narayanan and Karen Rouse for their thoughtful guidance on and support with this piece. 

Attackers exploit fundamental flaw in the web’s security to steal $2 million in cryptocurrency

By Henry Birge-Lee, Liang Wang, Grace Cimaszewski, Jennifer Rexford and Prateek Mittal

On Thursday, Feb. 3, 2022, attackers stole approximately $2 million worth of cryptocurrency from users of the Korean crypto exchange KLAYswap. This theft, which was detailed in a Korean-language blog post by the security firm S2W, exploited systemic vulnerabilities in the Internet’s routing ecosystem and in the Public Key Infrastructure (PKI), leaving the Internet’s most sensitive financial, medical and other websites vulnerable to attack.

Remarkably, years earlier, researchers at Princeton University predicted such attacks in the wild and successfully developed initial countermeasures against them, which we will describe here. But unless these flaws are addressed holistically, a vast number of applications can be compromised by the exact same type of attack.

Unlike many attacks that are caused by zero-day vulnerabilities (which are often patched rapidly) or a blatant disregard for security precautions, the KLAYswap attack was not related to any software or security configuration used by KLAYswap. Rather, it was a well-crafted example of a cross-layer attack exploiting weaknesses across the routing system, public key infrastructure, and web development practices. We’ll discuss defenses more in a subsequent blog post, but protecting against this attack demands security improvements across all layers of the web ecosystem.

The vulnerabilities exploited in this attack have not been mitigated. They are just as viable today as they were when this attack was launched. That is because the hack exploited structural vulnerabilities in the trust the PKI places in the Internet’s routing infrastructure.

Postmortem

The February 3 attack happened precisely at 1:04:18 a.m. GMT (10:04 a.m. Korean Time), when KLAYswap was compromised using a fundamental vulnerability in the trust placed in various layers of the web’s architecture. 

KLAYswap is an online cryptocurrency exchange that offers users a web interface for trading cryptocurrency. As part of its platform, KLAYswap relied on a JavaScript library written by Korean tech company Kakao Corp. When users were on the cryptocurrency exchange, their browsers would load Kakao’s JavaScript library directly from Kakao’s servers at the following URL:

https://developers[.]kakao.com/sdk/js/kakao.min.js

It was actually this URL that was the attacker’s target, not any of the resources operated by KLAYswap itself. Attackers exploited a technique known as a Border Gateway Protocol (BGP) hijack to launch this attack. A BGP hijack happens when a malicious network essentially lies to neighboring networks about what Internet addresses (or IP addresses) it can reach. If the neighboring networks believe this lie, they will route the victim’s traffic to the malicious network for delivery, instead of to the networks connecting to the legitimate owner of those IP addresses, allowing the traffic to be intercepted.

Specifically, the domain name in the URL above, developers.kakao.com, resolves to two IP addresses: 121.53.104.157 and 211.249.221.246. Packets going to these IP addresses are supposed to be routed to Kakao. During the attack, the adversary’s malicious network announced two IP prefixes (i.e., blocks of IP addresses that are used when routing traffic) that caused traffic to these addresses to be routed to the adversary instead.
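To see why such an announcement attracts traffic, recall that routers forward packets along the most specific (longest) matching prefix they know about. The sketch below is a minimal illustration of longest-prefix matching; the covering prefixes are hypothetical and are not a reconstruction of the actual announcements in this incident, and a hijack can also attract traffic by other means, such as offering a more attractive path for the same prefix.

```python
from ipaddress import ip_address, ip_network

# Hypothetical routing table: these prefixes are illustrative only and are
# NOT the prefixes actually involved in the KLAYswap incident.
routes = {
    ip_network("121.53.104.0/22"): "legitimate path toward Kakao",
    ip_network("121.53.104.0/24"): "adversary's more-specific announcement",
}

def best_route(dst: str) -> str:
    """Longest-prefix match: the most specific matching route wins."""
    dst_ip = ip_address(dst)
    matches = [net for net in routes if dst_ip in net]
    return routes[max(matches, key=lambda net: net.prefixlen)]

print(best_route("121.53.104.157"))  # -> adversary's more-specific announcement
```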

When KLAYswap customers requested kakao.min.js, the adversary served them a malicious JavaScript file that caused users’ cryptocurrency transactions to transfer funds to the adversary instead of the intended destination. After running the attack for several hours, the adversary withdrew its route and cashed out by converting its coins to untraceable currencies. By the time the dust settled, the adversary had stolen approximately $2 million worth of various currencies from users of KLAYswap and walked away with approximately $1 million worth of various cryptocurrencies. (Some losses were due to fees and exchange rates associated with exfiltrating the currencies from the KLAYswap ecosystem.)

But what about cryptography?

The second and most dangerous element of the attack was its neutralization of the Internet’s encryption defenses. While there is a moderate level of complexity associated with BGP hijacks, they do happen relatively often (some of the most egregious examples involve China Telecom routing about 15 percent of Internet traffic through its network for 18 minutes and Pakistan Telecom accidentally taking down YouTube in a botched attempt at local censorship).

What is unprecedented in this attack (to our knowledge) is the complete bypassing of the cryptographic protections offered by the TLS protocol. TLS is the workhorse of encryption of the World Wide Web and is part of the reason the web is trusted with more and more secure applications like financial services and medical systems. Among other security properties, TLS is designed to protect the confidentiality and integrity of user data. TLS allows a web service and a client (like a user of KLAYswap) to securely exchange data even over a potentially untrusted network (like the adversary’s network in the event of this attack) and also ensure (in theory) they are talking to the legitimate endpoint. 

Yet, ironically, KLAYswap and Kakao were properly using TLS, and it was not a vulnerability in the TLS protocol that was exploited during the attack. Instead, the attack exploited the false trust that TLS places in the routing infrastructure. TLS relies on the Public Key Infrastructure (PKI) to confirm the identity of the web servers. The PKI is tasked with distributing digitally signed certificates that verify the server’s identity (in this case the domain name like developers.kakao.com) and the server’s cryptographic key. If a server presents a valid certificate, even if there is another network in the middle, a client can encrypt data that only the real server can read.

Using its BGP hijack, the adversary first targeted the PKI and launched a man-in-the-middle attack on the certificate distribution process. Only after it had acquired a valid digital certificate for the target domain did it aim its attack towards real users by serving its malicious JavaScript file over an encrypted connection.

Certificate Authorities (or CAs, the entities that sign digital certificates in the PKI) have a similar identity problem to the one in TLS connections. CAs are approached by customers with requests to sign certificates. The CA needs to make sure the customer requesting a certificate actually controls the associated domain name. To verify identity (and thus bootstrap trust for the entire TLS ecosystem), CAs perform domain control validation, requiring users to prove control of the domain listed in their certificate requests. Since the server might be getting a TLS certificate for the first time, domain control validation is often performed over plain HTTP, with no security attached.

But now we are back to square one: the adversary simply needs to perform a BGP hijack to attract the domain control validation traffic from the CA, pretend to be the victim website, and serve the content the CA requested. After receiving a signed certificate for the victim’s domain, the adversary can serve real users over the supposedly “secure” TLS connection. This is indeed what happened in the KLAYswap attack, and it makes the attack particularly scary for other secure applications across the Internet. The attackers hijacked developers.kakao.com, approached the certificate authority ZeroSSL, requested a certificate for developers.kakao.com, and served this certificate to KLAYswap users who were downloading the JavaScript library over presumably “secure” TLS.
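For intuition about why this works, consider HTTP-based domain control validation of the kind many CAs perform (for example, an ACME HTTP-01-style challenge): the CA asks the requester to publish a token at a well-known URL on the domain and then fetches it over plain HTTP from a single vantage point. The sketch below is a simplified, hypothetical rendering of that flow, not any particular CA’s implementation; the point is that whoever the routing system delivers the CA’s request to, legitimate owner or BGP hijacker, can answer the challenge.

```python
import secrets
import urllib.request

def issue_challenge(domain: str) -> tuple[str, str]:
    # The CA generates a random token that the requester must publish on the domain.
    token = secrets.token_urlsafe(16)
    challenge_url = f"http://{domain}/.well-known/acme-challenge/{token}"
    return token, challenge_url

def validate_domain(expected_token: str, challenge_url: str) -> bool:
    # The CA fetches the token over plain HTTP. Whoever receives this request --
    # the legitimate server or a network that has hijacked the domain's routes --
    # can serve the expected token and pass validation.
    with urllib.request.urlopen(challenge_url, timeout=10) as resp:
        return resp.read().decode().strip() == expected_token
```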

While Princeton researchers anticipated this attack and effectively deployed the first countermeasures against it, fully securing the web from it is still an ongoing effort.

Ever since our live demo of this type of attack at HotPETS’17 and our USENIX Security ‘18 paper “Bamboozling Certificate Authorities with BGP” that developed a taxonomy of BGP attacks on the PKI, we have actively been working on developing defenses against it. The defense that has had the biggest impact (that our group developed in our 2018 USENIX Security paper) is known as multiple vantage point domain control verification. 

In multiple vantage point verification, a CA performs domain control validation from many vantage points spread throughout the Internet instead of a single vantage point that can easily be affected by a BGP attack. As we measured in our 2021 USENIX Security paper, this is effective because many BGP attacks are localized to only a part of the Internet, so it becomes significantly less likely that an adversary will hijack all of a CA’s diverse vantage points (compared to traditional domain control validation). We have worked with Let’s Encrypt, the world’s largest web PKI CA, to fully deploy multiple vantage point validation, and every certificate it signs is validated using this technology (over a billion since the deployment in February 2020). Cloudflare has developed a deployment as well, which is available for other interested CAs.
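The quorum logic behind this defense can be summarized in a few lines. The sketch below uses a hypothetical 75% threshold, not Let’s Encrypt’s actual policy; because a typical BGP hijack is visible from only part of the Internet, an attacker who fools one vantage point is unlikely to fool enough of them to reach the quorum.

```python
def multi_vantage_validate(domain, expected_token, vantage_checks, quorum=0.75):
    """Accept domain validation only if a quorum of vantage points agrees.

    `vantage_checks` is a list of callables, each performing the HTTP
    challenge check from a different network location and returning a bool.
    The 0.75 quorum is a hypothetical parameter chosen for illustration.
    """
    successes = sum(1 for check in vantage_checks if check(domain, expected_token))
    return successes / len(vantage_checks) >= quorum
```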

But multiple vantage point validation at just a single CA is still not enough. The Internet is only as strong as its weakest link. Currently, Let’s Encrypt is the only certificate authority using multiple vantage point validation and an adversary can, for many domains, pick which CA to use in an attack. To prevent this, we advocate for universal adoption through the CA/Browser Forum (the governing body for CAs). 

Additionally, some BGP attacks can still fool all of a CA’s vantage points. To reduce the impact of BGP attacks, we need security improvements in the routing infrastructure as well. In the short term, deployed routing technologies like the Resource Public Key Infrastructure (RPKI) could significantly limit the spread of BGP attacks and make them much less likely to be successful. Today only about 35 percent of the global routing table is covered by RPKI, but this is rapidly growing as more networks adopt this new technology. In the long run, we need a much more secure underlying routing layer for the Internet. Examples of this are BGPsec, where routers cryptographically sign and verify BGP update messages (although current router hardware cannot perform the cryptographic operations quickly enough), and clean-slate initiatives like SCION, which changes the format of IP packets to offer significantly more secure packet forwarding and routing decisions.
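As a rough picture of what RPKI adds, route origin validation checks each received BGP announcement against cryptographically signed Route Origin Authorizations (ROAs): an announcement is valid only if some ROA covers the prefix, the announced prefix is no more specific than the ROA’s maximum length, and the origin AS matches. The sketch below uses made-up ROA data and documentation prefixes purely for illustration.

```python
from ipaddress import ip_network

# Hypothetical ROA: "AS 12345 may originate 203.0.113.0/24, up to a /24."
ROAS = [{"prefix": ip_network("203.0.113.0/24"), "max_len": 24, "asn": 12345}]

def rpki_origin_validate(prefix: str, origin_asn: int) -> str:
    """Return 'valid', 'invalid', or 'not-found' for an announced prefix/origin pair."""
    announced = ip_network(prefix)
    covered = False
    for roa in ROAS:
        if announced.subnet_of(roa["prefix"]):
            covered = True
            if announced.prefixlen <= roa["max_len"] and origin_asn == roa["asn"]:
                return "valid"
    return "invalid" if covered else "not-found"

print(rpki_origin_validate("203.0.113.0/24", 12345))  # valid
print(rpki_origin_validate("203.0.113.0/25", 66666))  # invalid (too specific, wrong origin AS)
```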

Overall, seeing an adversary execute this attack in the real world puts immense importance on securing the PKI from routing attacks. Moving forward with RPKI and multiple vantage point domain validation is a must if we want to continue trusting the web with secure applications. In the meantime, thousands of secure applications that trust TLS to protect against network attacks are vulnerable in the same way KLAYswap was.

Calling for Investing in Equitable AI Research in Nation’s Strategic Plan

By Solon Barocas, Sayash Kapoor, Mihir Kshirsagar, and Arvind Narayanan

In response to the Request for Information to the Update of the National Artificial Intelligence Research and Development Strategic Plan (“Strategic Plan”), we submitted comments providing suggestions for how the Strategic Plan should focus government funding priorities and resources on societal issues such as equity, especially in communities that have been traditionally underserved.

The Strategic Plan highlights the importance of investing in research about developing trust in AI systems, which includes requirements for robustness, fairness, explainability, and security. We argue that the Strategic Plan should go further by explicitly including a commitment to making investments in research that examines how AI systems can affect the equitable distribution of resources. Specifically, there is a risk that without such a commitment, we make investments in AI research that can marginalize communities that are disadvantaged. Or, even in cases where there is no direct harm to a community, the research support focuses on classes of problems that benefit the already advantaged communities, rather than problems facing disadvantaged communities.  

We make five recommendations for the Strategic Plan:  

First, we recommend that the Strategic Plan outline a mechanism for a broader impact review when funding AI research. The challenge is that the existing mechanisms for ethics review of research projects – Institutional Review Boards (“IRB”) –  do not adequately identify downstream harms stemming from AI applications. For example, on privacy issues, an IRB ethics review would focus on the data collection and management process. This is also reflected in the Strategic Plan’s focus on two notions of privacy: (i) ensuring the privacy of data collected for creating models via strict access controls, and (ii) ensuring the privacy of the data and information used to create models via differential privacy when the models are shared publicly. 

But both of these approaches are focused on the privacy of the people whose data has been collected to facilitate the research process, not the people to whom research findings might be applied. 

Take, for example, the potential impact of face recognition for detecting ethnic minorities. Even if the researchers who developed such techniques had obtained approval from the IRB for their research plan, secured the informed consent of participants, applied strict access control to the data, and ensured that the model was differentially private, the resulting model could still be used without restriction for surveillance of entire populations, especially as institutional mechanisms for ethics review such as IRBs do not consider downstream harms during their appraisal of research projects. 

We recommend that the Strategic Plan include as a research priority supporting the development of alternative institutional mechanisms to detect and mitigate the potentially negative downstream effects of AI systems. 

Second, we recommend that the Strategic Plan include provisions for funding research that would help us understand the impact of AI systems on communities, and how AI systems are used in practice. Such research can also provide a framework for informing decisions on which research questions and AI applications are too harmful to pursue and fund. 

We recognize that it may be challenging to determine what kind of impact AI research might have as it affects a broad range of potential applications. In fact, many AI research findings will have dual use: some applications of these findings may promise exciting benefits, while others would seem likely to cause harm. While it is worthwhile to weigh these costs and benefits, decisions about where to invest resources should also depend on distributional considerations: who are the people likely to suffer these costs and who are those who will enjoy the benefits? 

While there have been recent efforts to incorporate ethics review into the publishing processes of the AI research community, adding similar considerations to the Strategic Plan would help to highlight these concerns much earlier in the research process. Evaluating research proposals according to these broader impacts would help to ensure that ethical and societal considerations are incorporated from the beginning of a research project, instead of remaining an afterthought.

Third, our comments highlight the reproducibility crisis in fields adopting machine learning methods and the need for the government to support the creation of computational reproducibility infrastructure and a reproducibility clearinghouse that sets up benchmark datasets for measuring progress in scientific research that uses AI and ML. We suggest that the Strategic Plan borrow from the NIH’s practices to make government funding conditional on disclosing research materials, such as the code and data, that would be necessary to replicate a study.

Fourth, we focus attention on the industry phenomenon of using a veneer of AI to lend credibility to pseudoscience, a practice we describe as AI snake oil. We see evaluating validity as a core component of ethical and responsible AI research and development. The Strategic Plan could support such efforts by prioritizing funding for setting standards and for making tools available to independent researchers to validate claims about the effectiveness of AI applications.


Fifth, we document the need to address the phenomenon of “runaway datasets” — the practice of broadly releasing datasets used for AI applications without mechanisms of oversight or accountability for how that information can be used. Such datasets raise serious privacy concerns and they may be used to support research that is counter to the intent of the people who have contributed to them. The Strategic Plan can play a pivotal role in mitigating these harms by establishing and supporting appropriate data stewardship models, which could include supporting the development of centralized data clearinghouses to regulate access to datasets.