October 2, 2022

Archives for August 2021

We need a personal digital advocate

I recently looked up a specialized medical network. For weeks after the search, I was bombarded with ads for the network and related services: the Internet clearly thought I was in the market for a new doctor. The funny thing is that I was looking this up for someone else, so all this information, pushed on me across browsers and devices, was not really relevant. I wish I could muster such relentlessness and consistency for the things that really matter to me!

This is but one example of the huge imbalance between the power of the algorithms that track our economic interactions online and the power of individual consumers to have a say in the information collected about them. Based on our past search histories, we are offered products that might not be in our best interest. Even worse, as individuals we have no way to know whether economic opportunities are being advertised to us equitably. This is true in small things, such as price discrimination on shoes, and in important things, such as job searches. Does your internet reality reflect a lower-wage world because the internet knows you are female?

Current rules and regulations that attempt to protect our interests online are woefully lacking, which is understandable: they were never designed for the digital world. The challenge is not just documenting (and proving) bad behavior such as discrimination or dark patterns, but also allocating responsibility, which means untangling the stack of intertwined ad technologies and entities behind the bad behavior. The most viable regulation-based proposals tend to revolve around holding companies such as Facebook or Google accountable. This approach is important but limited in several ways: it does not work well across corporate and national boundaries, it places companies in a significant conflict of interest while they implement the regulations, and it does nothing to address the growing imbalance between the user and the data centers behind the phone screen.

What we would like to propose is a radically different approach to restoring the balance of power between algorithms and individual users: the Personal Digital Advocate. The broad idea is to give consumers (both as individuals and as a group) an algorithm that is answerable only to them and has computing power and access to information equal to what companies currently possess. Here is a sample of the benefits such an advocate could provide:

  1. Today you can’t possibly know when a company is upselling you because your purchase history indicates that you are not going to check. The advocate will be able to detect this by comparing the price you are offered with the prices offered to other people for the same product over the past several months.
  2. The advocate will be able to detect gender- and race-based discrimination in job searches by noticing that you are not being shown the full range of jobs for which you qualify.
  3. Instead of incomplete, messy, often wrong data about you being bought and sold on the internet behind your back and outside of your control (which is the default mode now), you will be able to use your digital advocate to freely offer certain information about yourself to companies in a way that benefits both you and the company. For instance, suppose you are shopping for a minivan. You have looked at all kinds of brands, but you know that you will only buy a Toyota or a Honda. This is information you might not mind sharing if there were a way to do so. Kia dealerships could then stop wasting their money and your time by advertising to you, and the ads you do get might actually become more relevant.
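To make the first benefit concrete, here is a minimal sketch of the kind of check an advocate might run. It assumes, hypothetically, that the advocate holds a pooled record of the prices other users were offered for the same product; the function name `price_flag` and the 5% tolerance are illustrative choices, not part of any proposed specification.

```python
from statistics import median


def price_flag(my_price, observed_prices, tolerance=0.05):
    """Return True if my_price exceeds the median price that other people
    were offered for the same product by more than `tolerance` (a fraction).

    Hypothetical helper: assumes the advocate holds a pooled record of
    prices offered to other users over the past several months.
    """
    if not observed_prices:
        return False  # nothing to compare against, so no evidence either way
    typical = median(observed_prices)
    return my_price > typical * (1 + tolerance)


# Illustrative (made-up) offers for one product, pooled across users.
others = [79.0, 80.0, 82.5, 78.0, 81.0]
print(price_flag(95.0, others))  # True: well above the median of 80.0
print(price_flag(82.0, others))  # False: within 5% of the median
```

A real advocate would have to account for timing, shipping, coupons, and much more, but comparing your quote against what others were offered is the core of the idea described in the first item.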

For a digital advocate to become viable, two major policy changes will need to take place – one in the legal and one in the technical domain:

Legal: In the absence of a legal framework, it will always be more profitable for a digital advocate to sell the consumer out (e.g., it is easy to see how it could start steering people toward certain products for a commission). Fortunately, frameworks that prevent this already exist in other arenas. One good example is the lawyer/client relationship. A law firm could otherwise profit handsomely by betraying a client and using the client’s information against them (e.g., by leaking the client’s willingness to pay in a real estate deal and then collecting a commission), but a lawyer who did that would be disbarred, or worse. There needs to be a “bar” of sorts for the digital advocate.

Technical: A technical framework will need to be developed that lets the advocate access all the information it needs when it needs it. “Digital rights” laws such as GDPR and CCPA will need to incorporate a digital access mandate, allowing end users to nominate a bot to uphold their rights (such as the right to refuse cookies without having to navigate a dark pattern, or the right to download one’s data in a timely manner).

Regulation always tends to lag behind advances in technology, across industries and throughout history. For instance, there was a notable lag between the time when medications started being mass-produced and the emergence of the FDA. Our ancestors who lived in that “gap” probably consumed “medications” ranging from harmlessly ineffective to outright dangerous. The algorithms that govern our online lives (which increasingly merge with our offline lives) change more quickly than the products of any other industry, and moreover, they can adapt automatically to regulations. Regulations, which have trouble keeping up with progress in general, will struggle especially against such an adaptive opponent. Thus, the only sustainable way to protect ourselves online is to create an algorithm that will protect us and can evolve as fast as the algorithms that can, wittingly or unwittingly, harm us.

Mark Braverman is a professor of computer science at Princeton University, and is part of the theory group. His research focuses on algorithms and computational complexity theory, as well as building connections to other disciplines, including information theory, mathematical analysis, and mechanism design.

A longer version of this post is available in the form of an essay here.

It’s still practically impossible to secure your computer (or voting machine) against attackers who have 30 minutes of access

It has been understood for decades that it’s practically impossible to secure your computer (or a computer-based device such as a voting machine) from attackers who have physical access. The basic principle is that someone with physical access doesn’t have to log in with the password: they can just unscrew your hard drive (or SSD, or other memory) and read the data, or overwrite it with modified data, modified application software, or a modified operating system. This is an example of an “Evil Maid” attack, in the sense that if you leave your laptop alone in your hotel room while you’re out, the cleaning staff could, in principle, borrow your laptop for half an hour and perform such attacks. Other Evil Maid attacks may not require unscrewing anything; the attacker might just plug a device into the USB port, for example.

And indeed, though it may take a lot of skill and time to design the attack, anyone can be trained to carry it out. Here’s how to do it on an unsophisticated 1990s-era voting machine (still in use in New Jersey):

Andrew Appel replacing a memory chip on an AVC Advantage voting machine, circa 2007. The sophisticated tool being used here is: a screwdriver

More than twenty years ago, computer companies started implementing protections against these attacks. Full-disk encryption means that the data on the disk isn’t readable without the encryption key. (But that key must be present somewhere in your computer, so that it can access the data!) Trusted platform modules (TPM) encapsulate the encryption key, so attackers (even Evil Maids) can’t get the key. So in principle, the attacker can’t “hack” the computer by installing unauthorized software on the disk. (TPMs can serve other functions as well, such as “attestation of the boot process,” but here I’m focusing on their use in protecting whole-disk encryption keys.)
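The sealing idea can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not a real TPM interface: it models a single Platform Configuration Register (PCR) that can only be “extended” with hashes of each boot component, and a disk key that the chip releases only when the PCR again holds the value it had when the key was sealed. The class and function names are hypothetical.

```python
import hashlib
import secrets


class ToyTPM:
    """Toy model of TPM sealing; hypothetical, NOT a real TPM API.

    A Platform Configuration Register (PCR) starts at all zeros and can only
    be extended: new = SHA-256(old || SHA-256(measurement)). A secret sealed
    under one PCR value is released only when the PCR holds that value again.
    """

    def __init__(self):
        self.pcr = bytes(32)
        self._sealed = {}  # PCR value at sealing time -> secret

    def reset(self):
        # A reboot returns the PCR to its initial value; sealed data persists.
        self.pcr = bytes(32)

    def extend(self, measurement: bytes):
        digest = hashlib.sha256(measurement).digest()
        self.pcr = hashlib.sha256(self.pcr + digest).digest()

    def seal(self, secret: bytes):
        self._sealed[self.pcr] = secret

    def unseal(self) -> bytes:
        if self.pcr not in self._sealed:
            raise PermissionError("boot measurements do not match sealing policy")
        return self._sealed[self.pcr]


def boot(tpm, components):
    """Measure each boot component, in order, into the PCR."""
    tpm.reset()
    for image in components:
        tpm.extend(image)


tpm = ToyTPM()
disk_key = secrets.token_bytes(32)
good_chain = [b"firmware v1", b"bootloader v1", b"kernel v1"]

boot(tpm, good_chain)
tpm.seal(disk_key)                 # key bound to the known-good boot state

boot(tpm, good_chain)
assert tpm.unseal() == disk_key    # same software, so the key is released

boot(tpm, [b"firmware v1", b"evil bootloader", b"kernel v1"])
try:
    tpm.unseal()
except PermissionError as err:
    print("refused:", err)
```

Booting modified software changes the PCR value, so the toy TPM refuses to release the key. Note what the model leaves out: once `unseal` succeeds, the key still has to travel from the chip to the CPU over the motherboard.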

So it’s worth asking, “how well do these protections work?” If you’re running a sophisticated company and you hire a well-informed and competent CIO to implement best practices, can you equip all your employees with laptops that resist evil-maid attacks? And the answer is: It’s still really hard to secure your computers against determined attackers.

In this article, “From stolen laptop to inside the company network,” the Dolos Group (a cybersecurity penetration-testing firm) documents an assessment they did for an unnamed corporate client. The client asked, “if a laptop is stolen, can someone use it to get into our internal network?” That the client was willing to pay to have this question answered already indicates how seriously it takes security. And, in fact, Dolos starts their report by listing all the things the client got right: there are many potential entry points that the client’s cybersecurity configuration had successfully shut down.

Indeed, this laptop had full disk encryption (FDE); it had a TPM (trusted platform module) to secure the FDE encryption key; the BIOS was configured well, locked with a BIOS password, attack pathways via NetBIOS Name Service were shut down, and so on. But there was a vulnerability in the way that FDE talked to TPM over the SPI bus. And if that last sentence doesn’t speak to you, then how about this: They found one chip on the motherboard (labeled CMOS in the picture),

Photo from Dolos Group, https://dolosgroup.io/blog/2021/7/9/from-stolen-laptop-to-inside-the-company-network

that was listening in on the conversation between the trusted platform module and the full-disk encryption. They built a piece of equipment that they could hand to an Evil Maid, who could clip it onto the CMOS chip:

and in a few seconds the Evil Maid could learn the secret key, then (in a few minutes) read the entire (decrypted) disk drive, or install a new operating system to run in virtualized mode. Once the key leaks, FDE is irrelevant, and so is the TPM.

Then the attacker can get into the corporate network. Or, in a scenario Dolos doesn’t describe, the attacker could install spyware or malware on the hard drive, remove the blue clip, screw the cover back on, and return the laptop to the hotel room.

This particular vulnerability can be patched, but computer systems are very complex these days; there will almost always be another security slip-up.

And what about voting machines? Are voting machines well protected by TPM and FDE, and can the protections in voting machines be bypassed? For voting machines, the Evil Maid is not a hotel employee, it may be a corrupt election warehouse worker, a corrupt pollworker at 6am, or anyone who has unattended access to the voting machine for half an hour. In many jurisdictions, voting machines are left unattended at polling places before and after elections.

We would like to know, “Is the legitimate vote-counting program installed in the voting machine, or has some hacker replaced it with a cheating program?”

One way the designer/vendor of a voting machine could protect the firmware (operating system and vote-counting program) against hacking is to “store it on a whole-disk-encrypted drive, and lock the key inside the TPM.” This is supposed to work, but the Dolos report shows that in practice there tend to be slip-ups.

As an alternative to the FDE+TPM approach described above, there are other ways to (try to) ensure that the right firmware is running; they have names such as “Secure Boot” and “Trusted Boot”, and they rely on firmware and hardware such as UEFI and the TPM. Again, these are supposed to be secure, and in practice they’re a lot more secure than doing nothing, but their implementations may have slip-ups.

The new VVSG 2.0, the “Voluntary Voting Systems Guidelines 2.0” in effect February 2021, requires cryptographic boot verification (see section 14.13.1-A) — that is, “cryptographically verify firmware and software integrity before the operating system is loaded into memory.” But the VVSG 2.0 doesn’t require anything as secure as (hardware-assisted) “Secure Boot” or “Trusted Boot”. They say, “This requirement does not mandate hardware support for cryptographic verification” and “Verifying the bootloader itself is excluded from this requirement.” That leaves voting machines open to the kind of security gap described in Voting Machine Hashcode Testing: Unsurprisingly insecure, and surprisingly insecure. That wasn’t just a slip-up, it was a really insecure policy and practice.
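To see why excluding the bootloader matters, here is a hedged sketch of “verify before loading”: the boot code checks each firmware image against a manifest of SHA-256 digests that carries an authentication tag. Real Secure Boot checks public-key signatures against keys anchored in firmware or hardware; this sketch substitutes an HMAC with a hypothetical vendor key purely so it runs on the Python standard library (an HMAC key stored on the machine could itself be read by an Evil Maid, which is one reason real systems use signatures). It is not how any certified voting system works.

```python
import hashlib
import hmac

# HYPOTHETICAL vendor key. Real Secure Boot verifies public-key signatures
# against keys anchored in firmware/hardware; HMAC is a stand-in so this
# sketch needs only the standard library.
VENDOR_KEY = b"hypothetical-vendor-key"


def _tag(manifest):
    blob = repr(sorted(manifest.items())).encode()
    return hmac.new(VENDOR_KEY, blob, hashlib.sha256).hexdigest()


def sign_manifest(images):
    """Vendor side: record the SHA-256 digest of every image and tag the list."""
    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in images.items()}
    return manifest, _tag(manifest)


def verify_before_boot(images, manifest, tag):
    """Boot side: check the manifest's tag, then every image's digest."""
    if not hmac.compare_digest(_tag(manifest), tag):
        return False
    return all(
        name in images and hashlib.sha256(images[name]).hexdigest() == digest
        for name, digest in manifest.items()
    )


def tampered_verify(images, manifest, tag):
    # What a replaced, unverified bootloader could run instead: always "pass".
    return True


images = {"os": b"operating system v1", "tally": b"vote-counting program v1"}
manifest, tag = sign_manifest(images)
print(verify_before_boot(images, manifest, tag))  # True: untouched images

hacked = dict(images, tally=b"vote-stealing program")
print(verify_before_boot(hacked, manifest, tag))  # False: digest mismatch
print(tampered_verify(hacked, manifest, tag))     # True: the check itself was replaced
```

If the images are untouched, verification passes; if the vote-counting program is swapped, it fails. But the check is only as trustworthy as the code performing it: a replaced bootloader can simply run something like `tampered_verify` and report success. Anchoring the first check in hardware is what hardware-assisted Secure Boot is meant to provide.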

And by the way, no voting machines have yet been certified to VVSG 2.0, and there’s not even a testing lab that’s yet accredited to test voting machines to the 2.0 standard. Existing voting machines are certified to a much weaker VVSG 1.0 or 1.1 that doesn’t even consider these issues.

Even the most careful and sophisticated Chief Information Officers using state-of-the-art practices find it extremely difficult to secure their computers against Evil Maid attacks. And there has never been evidence that voting-machine manufacturers are among the most careful and sophisticated cyberdefense practitioners. Most voting machines are made to old standards that have zero protection against Evil Maid attacks; the new standards require cryptographic boot verification, but in a weaker form than hardware-assisted Secure Boot with a TPM; and no voting machines are yet certified to those new standards.

Here’s an actual voting-machine hacking device made by scientists studying India’s voting machines. Just turn the knob on top to program which candidate you want to win:

Voting-machine hacking device made by the authors of “Security Analysis of India’s Electronic Voting Machines”, by Hari K. Prasad et al., 17th ACM Conference on Computer and Communications Security, 2010. https://indiaevm.org/

Wholesale attacks on Election Management computers

And really, the biggest danger is not a “retail” attack on one machine by an Evil Maid; it’s a “wholesale” attack that penetrates a corporate network (of a voting-machine manufacturer) or a government network (of a state or county running an election) and “hacks” thousands of voting machines all at once. The Dolos report reminds us again why it’s a bad idea for voting machines to “phone home” over a cell-phone network to connect to the internet (or to a corporate or county network): not only does this expose the voting machine to hackers anywhere on the internet, it also allows a voting machine that has been hacked by an Evil Maid to attack the county network it phones.

Even more of a threat is that an attacker with physical access to an Election Management System (that is, a state or county computer used to manage elections) can spread malware to all the voting machines that the EMS programs. How hard is it to hack into an EMS? Like the PC that Dolos hacked into, an EMS is just a laptop computer, but the county or state that owns it may not be as security-savvy as Dolos’s client. Quite likely, an EMS is not hard to hack into with physical access.

Conclusion: Don’t let your security depend entirely on “an attacker with physical access still can’t hack me.”

So you can’t be sure what vote-counting (or vote-stealing) software is running in your voting machine. But we knew that already. Our protection, for accurate vote counts, is to vote on hand-marked paper ballots, counted by optical-scan voting machines. If those optical-scan voting machines are hacked, by an Evil Maid, by a corrupt election worker, or by anyone else who gains access for half an hour, then we can still be protected by the consistent use of Risk-Limiting Audits (RLAs) to detect when the computers claim results different from what’s actually marked on the ballots; and by recounting those paper ballots by hand, to correct the results. More states should consistently use RLAs.
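There are several RLA methods, and the post does not prescribe one. As an illustration, here is a minimal sketch of a ballot-polling audit in the style of the BRAVO method for a single two-candidate contest. It assumes the auditors draw paper ballots uniformly at random and record each as a vote for the reported winner (`"W"`), the reported loser (`"L"`), or neither; it omits multi-candidate contests, the sampling mechanics, and escalation rules.

```python
def bravo_audit(sample, reported_winner_share, risk_limit=0.05):
    """Sketch of a BRAVO-style ballot-polling audit, two-candidate contest.

    sample: ballots drawn uniformly at random from the paper ballots, each
    recorded as "W" (reported winner), "L" (reported loser), or anything
    else (a vote for neither). Returns the number of ballots examined when
    the reported outcome is confirmed at the given risk limit, or None if
    the sample runs out first (the audit must then escalate).
    """
    p = reported_winner_share
    if not 0.5 < p <= 1.0:
        raise ValueError("the reported winner's share must exceed 1/2")
    t = 1.0  # sequential likelihood ratio: reported outcome vs. a tie
    for n, ballot in enumerate(sample, start=1):
        if ballot == "W":
            t *= p / 0.5
        elif ballot == "L":
            t *= (1.0 - p) / 0.5
        # ballots for neither candidate leave t unchanged
        if t >= 1.0 / risk_limit:
            return n
    return None


print(bravo_audit(["W"] * 30, reported_winner_share=0.6))       # 17
print(bravo_audit(["L", "W"] * 50, reported_winner_share=0.6))  # None
```

Each ballot for the reported winner multiplies the test statistic by 2p and each ballot for the loser by 2(1 - p), where p is the reported winner’s share; the audit stops once the statistic reaches 1/α. With a reported share of 0.6 and a 5% risk limit, a sample made up entirely of winner ballots confirms the outcome after 17 ballots, since 1.2^17 ≈ 22.2 ≥ 20 while 1.2^16 ≈ 18.5. If the statistic never gets there, the audit escalates, ultimately to a full hand count of the paper ballots.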

The use of hand-marked paper ballots with routine RLAs can protect us from wholesale attacks. But it would be better to have, in addition, proper cybersecurity hygiene in election management computers and voting machines.

I thank Ars Technica for bringing the Dolos report to my attention. Their article concludes with many suggestions made by security experts for shutting down the particular vulnerability that Dolos found. But remember, even though you can (with expertise) shut down some particular loophole, you can’t know how many more are out there.

Facebook’s Illusory Promise of Transparency

By Orestis Papakyriakopoulos, Ashley Gorham, Eli Lucherini, Mihir Kshirsagar, and Arvind Narayanan.

Facebook’s latest move to obstruct academic research about its platform by disabling NYU’s Ad Observatory is deeply troubling. Facebook claims to offer researchers access to its FORT Researcher Platform as an alternative, but that offer is illusory, as we recently learned first-hand in our ongoing research project studying how social media platforms amplified or moderated the distribution of political ads in the 2020 U.S. elections.

As part of our research, in March 2021, we attempted to gain access to the FORT dataset. Facebook told us that we had to sign a “strictly non-negotiable” agreement that was “mandated by Cambridge Analytica and the FTC.” We pushed back on this take-it-or-leave-it approach, noting that nothing in the consent decree mandated such an agreement. In a subsequent email, Facebook conceded that it was under no legal mandate and that its approach was based simply on an internal business justification.

We then continued trying to negotiate the terms of access with Facebook. A few clauses in the agreement were problematic for us, most prominently a pre-publication review. We sought to clarify whether Facebook would assert that information about how its advertising platform was used to target political ads in the 2020 elections is “Confidential Information” that the agreement would allow it to “remove” from our publication. Understandably, we did not want to spend time on research without some assurance that we could publish our work without Facebook’s permission. Indeed, as we subsequently discovered, one project had negotiated to exclude such a clause. But to date, Facebook has not explained its position on pre-publication review to us.

Separately, we had a more basic question: what additional data fields were available to researchers through the FORT Platform, and were there any restrictions on the types of tools we could use to analyze the data? Although Facebook promised to get back to us “shortly,” we have been waiting for a response since May, despite following up diligently.

Our experience dealing with Facebook highlights its long-running pattern of misdirection and doublespeak to dodge meaningful scrutiny of its actions. While researchers and investigative journalists have other means of analyzing the platform’s practices (e.g., Citizen Browser and Mozilla Rally), the reality is that Facebook controls the information the public needs to understand its powerful role in our society. And if Facebook continues to hide behind illusory offers, we need legislation to force it to provide meaningful access.

Studying the societal impact of recommender systems using simulation

By Eli Lucherini, Matthew Sun, Amy Winecoff, and Arvind Narayanan.

For those interested in the impact of recommender systems on society, we are happy to share several new pieces:

  • a software tool for studying this interface via simulation
  • the accompanying paper
  • a short piece on methodological concerns in simulation research
  • a talk offering a critical take on research on filter bubbles.

We elaborate below.

Simulation is a valuable way to study the societal impact of recommender systems.

Recommender systems in social media platforms such as Facebook and Twitter have been criticized for the risks they might pose to society, such as amplifying misinformation or creating filter bubbles. But there isn’t yet consensus on the scope of these concerns, the underlying factors, or ways to remedy them. Because these phenomena arise through repeated system interactions over time, methods that assess the system at a single point in time provide minimal insight into the mechanisms behind them. In contrast, simulations can model how users, items, and algorithms interact over arbitrarily long timescales. As a result, simulation has proved to be a valuable tool for assessing the impact of recommender systems on the content users consume and on society.

This is a burgeoning area of research. We identified over a dozen studies that use simulation to study questions such as filter bubbles and misinformation. As an example of a study we admire, Chaney et al. illustrate the detrimental effects of algorithmic confounding, which occurs when a recommendation algorithm is trained on user interaction data that is itself influenced by the prior recommendations of the algorithm. Like all simulation research, this is a statement about a model and not a real platform. But the benefit is that it helps isolate the variables of interest so that relationships between them can be probed deeply in a way that improves our scientific understanding of these systems.
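To illustrate the kind of feedback loop that algorithmic confounding describes, here is a deliberately tiny simulation. It is not Chaney et al.’s model or T-RECS code; the popularity recommender, the parameters, and the “distinct items consumed” measure are all hypothetical choices. In the feedback setting, the recommender ranks items by past clicks, but clicks can only happen on items it already showed; in the baseline, slates are drawn at random, independent of past interactions.

```python
import random


def simulate(feedback, num_users=50, num_items=100, slate_size=5, steps=20, seed=0):
    """Toy recommender loop; returns how many distinct items get consumed.

    Hypothetical model: each user has hidden random preferences and, at each
    step, consumes the item they like best from the slate they are shown.
    """
    rng = random.Random(seed)
    prefs = [[rng.random() for _ in range(num_items)] for _ in range(num_users)]
    clicks = [0] * num_items  # interaction data the recommender trains on
    consumed = set()
    for _ in range(steps):
        if feedback:
            # Popularity recommender trained on past clicks; those clicks could
            # only occur on items it recommended earlier. Ties go to lower
            # item indices because sorted() is stable.
            slate = sorted(range(num_items), key=lambda i: -clicks[i])[:slate_size]
        for user in range(num_users):
            if not feedback:
                # Baseline: slate drawn independently of past interactions.
                slate = rng.sample(range(num_items), slate_size)
            choice = max(slate, key=lambda i: prefs[user][i])
            clicks[choice] += 1
            consumed.add(choice)
    return len(consumed)


print(simulate(feedback=True))   # never more than slate_size (5)
print(simulate(feedback=False))
```

`simulate(feedback=True)` can never return more than `slate_size` distinct items, because only the items in the first slate ever accumulate clicks, whereas the random-slate baseline spreads consumption across many more items. In this toy, the homogenization comes entirely from the training data being shaped by earlier recommendations, not from users’ preferences.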

T-RECS: A new tool for simulating recommender systems

So far, most simulation studies of algorithmic systems have relied on ad hoc code written from scratch, which is time-consuming, raises the likelihood of bugs, and limits reproducibility. We present T-RECS (Tools for RECommender system Simulation), an open-source simulation tool designed to enable investigations of emerging complex phenomena caused by millions of individual actions and interactions in algorithmic systems, including filter bubbles, political polarization, and (mis)information diffusion. In the accompanying paper, we describe its design in detail and present two case studies.

T-RECS is flexible and can simulate just about any system in which “users” interact with “items” mediated by an algorithm. This is broader than just recommender systems: for example, we used T-RECS to reproduce a study on the virality of online content. T-RECS also supports two-sided platforms, i.e., those that include both users and content creators. The system is not limited to social media either: it can also be used to study music recommender systems or e-commerce platforms. With T-RECS, researchers with expertise in social science but limited engineering expertise can still leverage simulation to answer important questions about the societal effects of algorithmic systems.

What’s wrong with current recsys simulation research?

In a companion paper to T-RECS, we offer a methodological critique of current recommender systems simulation research. First, we observe that each paper tends to operationalize constructs such as polarization in subtly different ways. Despite seemingly minor differences, the effects may be vastly different, making comparisons between papers infeasible. We acknowledge that this is natural in the early stages of a discipline and is not necessarily a crisis by itself. Unfortunately, we also observe low transparency: papers do not specify their constructs in enough detail to allow others to reproduce and build on them, and practices such as sharing code and data are not yet the norm in this community.
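Here is a small, hypothetical illustration of how two reasonable operationalizations of “polarization” can disagree. Neither measure is taken from any particular paper: one uses the variance of users’ opinion scores on a [-1, 1] scale, the other the share of users holding extreme opinions (|x| > 0.8). On two made-up populations, they rank the populations in opposite orders.

```python
from statistics import pvariance


def polarization_variance(opinions):
    """Construct A (hypothetical): spread of opinion scores on [-1, 1]."""
    return pvariance(opinions)


def polarization_extremity(opinions, threshold=0.8):
    """Construct B (hypothetical): share of users whose |opinion| > threshold."""
    return sum(abs(x) > threshold for x in opinions) / len(opinions)


# Two made-up populations of 100 users each.
split = [-0.7] * 50 + [0.7] * 50                # two moderate camps
fringe = [-1.0] * 10 + [0.0] * 80 + [1.0] * 10  # centrist mass, extreme fringes

for name, pop in [("split", split), ("fringe", fringe)]:
    print(name, round(polarization_variance(pop), 2), polarization_extremity(pop))
```

By variance, the evenly split population (about 0.49) looks more polarized than the fringe population (0.2); by extremity share, the fringe population (0.2) looks more polarized than the split one (0.0). A paper reporting either number without fully specifying the construct would be hard to compare with one reporting the other.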

We advocate adopting software tools such as T-RECS that would help address both issues. Researchers would be able to draw on a standard library of models and constructs. Further, they could easily share reproduction materials as notebooks containing code, data, results, and documentation packaged together.

Why do we need simulation, again?

Given that it is tricky to do simulation correctly and even harder to do it in a way that allows us to draw meaningful conclusions that apply to the real world, one may wonder why we need simulation to understand the societal impacts of recommender systems at all. Why not stick with auditing or observational studies of real platforms? A notable example of such a study is “Exposure to ideologically diverse news and opinion on Facebook” by Bakshy et al. The study found that while Facebook’s users primarily consume ideologically aligned content, the role of Facebook’s news feed algorithm is minimal compared to users’ own choices.

In a recent talk, one of us (Narayanan) discussed the limitations of quantitative studies of real platforms, focusing on the question of filter bubbles. The argument is this: the question of interest is causal in nature, but we can’t answer causal questions because the entire system evolves as one unit over a long period of time. Faced with this inherent limitation, studies such as the Facebook study above inevitably study very narrow versions of the question, focusing on a snapshot in time and ignoring feedback loops and other complications. Thus, while there is nothing wrong with these studies, they tell us little about the questions we really care about, and yet are widely misinterpreted to mean more than they do.

In conclusion, every available method for studying the societal impact of recommender systems has severe limitations. Yet this is an urgent question with enormous consequences; the study of these questions has been called a crisis discipline. We need every tool in the toolbox, even if none is perfect for the job. We need auditing and observational studies; we need qualitative studies; and we need simulation. Through T-RECS and its accompanying papers, we hope to both systematize research in this area and provide foundational infrastructure.