August 8, 2022


Magical thinking about Ballot-Marking-Device contingency plans

The Center for Democracy and Technology recently published a report, “No Simple Answers: A Primer on Ballot Marking Device Security,” by William T. Adler. Overall, it’s well-informed, clearly presents the problems as of 2022, and is definitely worth reading. After explaining the issues and controversies, the report presents recommendations, most of which make a lot of sense, and the states should indeed act upon them. But there’s one key recommendation in which Dr. Adler tries to provide a simple answer, and unfortunately his answer invokes a bit of magical thinking that seriously compromises the report’s conclusions. By asking but not answering the question “what should an election official do if there are reports of BMDs printing wrong votes?”, Dr. Adler avoids the inevitable conclusion that BMDs-for-all-voters is a hopelessly flawed, insecurable method of voting. Because the answer to that question is, unfortunately, that there’s nothing election officials could usefully do in that case.

BMDs (ballot marking devices) are used now in several states and there is a serious problem with them (as the report explains): “a hacked BMD could corrupt voter selections systematically, such that a candidate favored by the hacker is more likely to win.”  That is, if a state’s BMDs are hacked by someone who wants to change the result of an election, the BMDs can print ballots with votes on them different from what the voters indicated on the touchscreen.  Because most voters won’t inspect the ballot paper carefully enough before casting their ballot, most voters won’t notice that their vote has been changed.  The voters who do notice are (generally) allowed to “spoil” their ballot and cast a new one; but the substantial majority of voters, those who don’t check their ballot paper carefully, are vulnerable to having their votes stolen.

One simple answer is not to use BMDs at all: let voters mark their optical-scan paper ballots with a pen (that is, HMPB: hand-marked paper ballots).  A problem with this simple answer (as the report explains) is that some voters with disabilities cannot mark a paper ballot with a pen.  And (as the report explains) if BMDs are reserved just for the use of voters with disabilities, then those BMDs become “second class”: pollworkers are unfamiliar with how to set them up, rarely used machines may not work in the polling place when turned on, paper ballots cast by voters with disabilities are distinguishable from those marked with a pen, and so on.

So Dr. Adler seems to accept that BMDs, with their serious vulnerabilities, are inevitably going to be adopted—and so he makes recommendations to mitigate their insecurities.  And most of his recommendations are spot-on:  incorporate the cybersecurity measures required by the VVSG 2.0, avoid the use of bar codes and QR codes, adopt risk-limiting audits (RLAs).  Definitely worth doing those things, if election officials insist on adopting this seriously flawed technology in the first place.

But then he makes a recommendation intended to address the core problem: a cheating BMD can print fraudulent votes that will survive any recount or audit.  The report recommends:

Another way is to depend on voter reports. In an election with compromised BMDs modifying votes in a way visible to voters who actively verify and observe those modifications, it is likely that election officials would receive an elevated number of reported errors. In order to notice a widespread issue, election officials must be monitoring election errors in real-time across a county or state. If serious problems are revealed with the BMDs that cast doubt on whether votes were recorded properly, either via parallel testing or from voter reports, election officials must respond. Accordingly, election officials should have a contingency plan in the event that BMDs appear to be having widespread issues. Such a plan would include, for instance, having the ability to substitute paper ballots for BMDs, decommissioning suspicious BMDs, and investigating whether other machines are also misbehaving. Stark (2019) has warned, however, that because it is likely not possible to know how many or which ballots were affected, the only remedy to this situation may be to hold a new election.

This is the magical thinking:  “election officials should have a contingency plan.”  The problem is, when you try to write down such a plan, there’s nothing that actually works!  Suppose the election officials rely on voter reports (or on the rate of spoiled ballots); suppose the “contingency plan” says, for example, “if x percent of the voters report malfunctioning BMDs, or y percent of voters spoil their ballots, then we will . . .”  Then we will what?  Remove those BMDs from service in the middle of the day?  But then all the votes already cast on those BMDs will have been affected by the hack; that could be thousands of votes.  Or what else?  Discard all the paper ballots that were cast on those BMDs?  Clearly you can’t do that without holding an entirely new election.  And what if those x% or y% of voters were fraudulently reporting BMD malfunction or fraudulently spoiling their ballots to trigger the contingency plan?  There’s no plan that actually works.

Everything I’ve explained here was already written down in “Ballot-marking devices cannot ensure the will of the voters” (2020 [non-paywall version]) and in “There is no reliable way to detect hacked ballot-marking devices” (2019), both of which Dr. Adler cites.  But an important purpose of magical thinking is to avoid facing difficult facts.

It’s like saying, “to prevent climate change we should just use machines to pull 40 billion tons of CO2 out of the atmosphere each year.”  But there is no known technology that can do this.  All the direct-air-capture facilities deployed to date can capture just 0.00001 billion tons per year.  Just because we really, really want something to work is not enough.

There is an inherent problem with BMDs: they can change votes in a way that will survive any audit or recount.  Not only is there “no simple solution” to this problem, there’s no solution period.  Perhaps someday a solution will be identified.  Until then, BMDs-for-all-voters is dangerous, even with all known mitigations.

Toward Trustworthy Machine Learning: An Example in Defending against Adversarial Patch Attacks (2)

By Chong Xiang and Prateek Mittal

In our previous post, we discussed adversarial patch attacks and presented our first defense algorithm, PatchGuard. The PatchGuard framework (small receptive field + secure aggregation) has become the most popular defense strategy over the past year, subsuming a long list of defense instances (Clipped BagNet, De-randomized Smoothing, BagCert, Randomized Cropping, PatchGuard++, ScaleCert, Smoothed ViT, ECViT). In this post, we present a different way of building robust image classification models: PatchCleanser. Instead of using small receptive fields to suppress the adversarial effect, PatchCleanser directly masks out adversarial pixels in the input image. This design makes PatchCleanser compatible with any high-performance image classifier and allows it to achieve state-of-the-art defense performance.

PatchCleanser: Removing the Dependency on Small Receptive Fields

The limitation of small receptive fields. We have seen that the small receptive field plays an important role in PatchGuard: it limits the number of corrupted features and lays a foundation for robustness. However, the small receptive field also limits the information received by each feature and, as a result, hurts clean model performance (when there is no attack). For example, the PatchGuard models (BagNet + robust masking) achieve only 55%-60% clean accuracy on the ImageNet dataset, while state-of-the-art undefended models, which all have large receptive fields, achieve 80%-90%.

This huge drop in clean accuracy discourages the real-world deployment of PatchGuard-style defenses. A natural question to ask is:

Can we achieve strong robustness without the use of small receptive fields? 

YES, we can. We propose PatchCleanser, which uses an image-space pixel-masking strategy to make the defense compatible with any state-of-the-art image classification model (with a large receptive field).

A pixel-masking defense strategy. The high-level idea of PatchCleanser is to apply pixel masks to the input image and evaluate model predictions on masked images. If a mask removes the entire patch, the attacker has no influence over the classification, and thus any image classifier can make an accurate prediction on the masked image. However, the challenge is: how can we mask out the patch, especially when the patch location is unknown? 

Pixel masking: the first attempt. A naive approach is to choose a mask and apply it to all possible image locations. If the mask is large enough to cover the entire patch, then at least one mask location can remove all adversarial pixels. 

We provide a simplified visualization below. When we apply masks to an adversarial image (top of the figure), the model prediction is correct as “dog” when the mask removes the patch at the upper left corner. Meanwhile, the predictions on other masked images are incorrect since they are influenced by the adversarial patch — we see a prediction disagreement among different masked images. On the other hand, when we consider a clean image (bottom of the figure), the model predictions usually agree on the correct label since both we and the classifier can easily recognize the partially occluded dog. 

visual examples of one-mask predictions

Based on these observations, we can use the disagreement in one-mask predictions to detect a patch attack; a similar strategy is used by the Minority Reports defense, which takes inconsistency in the prediction voting grid as an attack indicator. However, can we recover the correct prediction label instead of merely detecting an attack? Or equivalently, how can we know which mask removes the entire patch? Which class label should an image classifier trust — dog, cat, or fox?  
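To make this first attempt concrete, here is a minimal sketch (in Python) of generating one-masked images and flagging an attack when the one-mask predictions disagree. The `classify` callable, the mask size, and the stride are illustrative assumptions; this is not the authors’ released code.

```python
# A minimal sketch of first-round ("one-mask") prediction and disagreement-based
# attack detection. `classify(image) -> label` stands in for any image classifier.
import numpy as np
from typing import Callable, List, Tuple

def one_mask_images(image: np.ndarray, mask_size: int, stride: int) -> List[np.ndarray]:
    """Slide a square zero-mask over the image and return all masked copies."""
    h, w = image.shape[:2]
    masked = []
    for y in range(0, max(h - mask_size, 0) + 1, stride):
        for x in range(0, max(w - mask_size, 0) + 1, stride):
            m = image.copy()
            m[y:y + mask_size, x:x + mask_size] = 0  # occlude this window
            masked.append(m)
    return masked

def detect_attack(image: np.ndarray,
                  classify: Callable[[np.ndarray], int],
                  mask_size: int, stride: int) -> Tuple[bool, int]:
    """Return (attack_suspected, majority_label) from the one-mask predictions.

    If every one-mask prediction agrees, treat the image as clean and return
    that label; any disagreement is treated as a possible patch attack.
    """
    preds = [classify(m) for m in one_mask_images(image, mask_size, stride)]
    labels, counts = np.unique(preds, return_counts=True)
    majority = int(labels[np.argmax(counts)])
    return len(labels) > 1, majority
```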

Pixel masking: the second attempt. The solution turns out to be super simple: we can perform a second round of masking on the one-masked images (see visual examples below). If the first-round mask already removes the patch (top of the figure), then our second-round masking is applied to a “clean” image, and thus all two-mask predictions will have a unanimous agreement. On the other hand, if the patch is not removed by the first-round mask (bottom of the figure), the image is still “adversarial”. We will then see a disagreement in two-mask predictions; we shall not trust the prediction labels.

visual examples of two-mask predictions
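Following the description above, here is a simplified sketch of how the two rounds of masking can be combined into a recovery procedure. It reuses a generic `classify` callable and a precomputed mask set, both of which are assumptions of this sketch; the paper’s actual inference and certification algorithms include further details.

```python
# A simplified sketch of double-masking recovery. Each mask is (top, left, size)
# of a square zero-mask; `classify(image) -> label` is any image classifier.
# This illustrates the idea in the post, not the paper's exact algorithm.
import numpy as np
from typing import Callable, List, Tuple

Mask = Tuple[int, int, int]  # (top, left, size)

def apply_mask(image: np.ndarray, mask: Mask) -> np.ndarray:
    y, x, s = mask
    out = image.copy()
    out[y:y + s, x:x + s] = 0
    return out

def double_masking_predict(image: np.ndarray,
                           classify: Callable[[np.ndarray], int],
                           masks: List[Mask]) -> int:
    # Round 1: predictions on all one-masked images.
    one_mask_preds = [classify(apply_mask(image, m)) for m in masks]
    if len(set(one_mask_preds)) == 1:
        return one_mask_preds[0]           # unanimous agreement: trust the label

    labels, counts = np.unique(one_mask_preds, return_counts=True)
    majority = int(labels[np.argmax(counts)])

    # Round 2: for each minority ("disagreeing") first-round mask, check whether
    # all two-mask predictions agree with its label. If they do, that mask most
    # likely removed the patch, so return its label.
    for first_mask, pred in zip(masks, one_mask_preds):
        if pred == majority:
            continue
        once = apply_mask(image, first_mask)
        if all(classify(apply_mask(once, m)) == pred for m in masks):
            return pred
    return majority                        # fall back to the majority label
```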

In our PatchCleanser paper, we further discuss how to generate a mask set such that at least one mask can remove the entire patch regardless of the patch location. We further prove that if the model predictions on all possible two-masked images are correct, PatchCleanser can always make correct predictions. 
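As a rough illustration of that covering requirement, the sketch below constructs a one-dimensional set of mask positions such that any contiguous patch of at most `patch_size` pixels is fully covered by at least one mask; a two-dimensional mask set can be formed as the product of two such one-dimensional sets. The construction and parameter names here are assumptions for illustration; see the PatchCleanser paper for the construction actually used and its proof.

```python
# A rough 1-D sketch of a "covering" mask set: with stride
# s = ceil((image_size - patch_size + 1) / num_masks) and mask size
# patch_size + s - 1, every patch of at most patch_size pixels lies entirely
# inside at least one mask. Not the paper's exact construction.
import math
from typing import List, Tuple

def covering_masks_1d(image_size: int, patch_size: int, num_masks: int) -> Tuple[int, List[int]]:
    """Return (mask_size, start_positions) for masks along one image axis."""
    stride = math.ceil((image_size - patch_size + 1) / num_masks)
    mask_size = patch_size + stride - 1
    # Clamp the last start positions so every mask stays inside the image;
    # clamping only shifts a mask left, which preserves the covering property.
    starts = [min(i * stride, image_size - mask_size) for i in range(num_masks)]
    return mask_size, starts

# Example: a 224-pixel axis, patches up to 32 pixels wide, and 6 masks per axis.
if __name__ == "__main__":
    size, starts = covering_masks_1d(224, 32, 6)
    print(size, starts)  # mask size 64 with starts [0, 33, 66, 99, 132, 160]
```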

PatchCleanser performance. In the figure below, we plot the defense performance on the 1000-class ImageNet dataset (against a 2%-pixel square patch anywhere on the image). We can see that PatchCleanser significantly outperforms prior works (which are all PatchGuard-style defenses with small receptive fields). Notably, (1) the certified robust accuracy of PatchCleanser (62.1%) is even higher than the clean accuracy of all prior defenses, and (2) the clean accuracy of PatchCleanser (83.9%) is similar to that of vanilla undefended models! These results further demonstrate the strength of defenses that are compatible with any state-of-the-art classification model (with a large receptive field).

Clean accuracy and certified robust accuracy on the ImageNet dataset; certified robust accuracy evaluated against a 2%-pixel square patch anywhere on the image

(certified robust accuracy is a provable lower bound on model robust accuracy; see the PatchCleanser paper for more details)

Takeaways. In PatchCleanser, we demonstrate that small receptive fields are not necessary for strong robustness. We design an image-space pixel-masking strategy that is compatible with any image classifier. This compatibility allows us to use state-of-the-art image classifiers and achieve significant improvements over prior works.

Conclusion: Using Logical Reasoning for Building Trustworthy ML systems

In the era of big data, we have been amazed at the power of statistical reasoning/learning: an AI model can automatically extract useful information from a large amount of data and significantly outperform manually designed models (e.g., hand-crafted features). However, these learned models can behave unexpectedly when they encounter “adversarial examples,” and they therefore lack the reliability needed for security-critical applications. In our posts, we demonstrate that we can additionally apply logical reasoning to statistically learned models to achieve strong robustness. We believe the combination of logical and statistical reasoning is a promising and important direction for building trustworthy ML systems.


New Study Analyzing Political Advertising on Facebook, Google, and TikTok

By Orestis Papakyriakopoulos, Christelle Tessono, Arvind Narayanan, Mihir Kshirsagar

With the 2022 midterm elections in the United States fast approaching, political campaigns are poised to spend heavily to influence prospective voters through digital advertising. Online platforms such as Facebook, Google, and TikTok will play an important role in distributing that content. But our new study, “How Algorithms Shape the Distribution of Political Advertising: Case Studies of Facebook, Google, and TikTok,” which will appear at the Artificial Intelligence, Ethics, and Society conference in August, shows that the platforms’ tools for voluntary disclosures about political ads do not provide the transparency the public needs. More details can also be found on our website: campaigndisclosures.princeton.edu.

Our paper conducts the first large-scale analysis of public data from the 2020 presidential election cycle to critically evaluate how online platforms affect the distribution of political advertisements. We analyzed a dataset of over 800,000 ads about the 2020 U.S. presidential election that ran in the two months prior to the election, obtained from the ad libraries of Facebook and Google, which the companies created to offer more transparency about political ads. We also collected and analyzed 2.5 million TikTok videos from the same time period. The platforms built these ad libraries in part to stave off potential regulation such as the Honest Ads Act, which sought to impose greater transparency requirements on platforms carrying political ads. But our study shows that the ad libraries fall woefully short of their own objectives of being more transparent about who pays for the ads and who sees them, as well as the broader objective of bringing transparency to the role online platforms play in shaping the distribution of political advertising.

We developed a three-part evaluative framework to assess the platform disclosures: 

1. Do the disclosures meet the platforms’ self-described objective of making political advertisers accountable?

2. How do the platforms’ disclosures compare against what the law requires for radio and television broadcasters?

3. Do the platforms disclose all that they know about the ad targeting criteria, the audience for the ads, and how their algorithms distribute or moderate content?

Our analysis shows that the ad libraries do not meet any of these objectives. First, the ad libraries provide only partial disclosures of the audience characteristics and targeting parameters of placed political ads, and those disclosures do not allow us to understand how political advertisers reached prospective voters. For example, we compared ads in the ad libraries that were shown to different audiences with dummy ads that we created on the platforms (Figure 1). In many cases, we measured a significant difference in the calculated cost-per-impression between the two types of ads, which we could not explain with the available data; a simplified sketch of this bounds comparison appears after the figure caption below.

  • Figure 1. We plot the generated cost-per-impression of ads in the ad libraries that were (1) targeted to all genders and ages on Google, (2) targeted to females aged 25-34 on YouTube, (3) seen by all genders and ages in the US on Facebook, and (4) seen only by females of all ages located in California on Facebook.  For Facebook, lower and upper bounds are provided for impressions. For Google, lower and upper bounds are provided for both cost and impressions, given the extensive “bucketing” of parameters the ad libraries perform when reporting them; these bounds are denoted in the figures with boxes, and points represent the median value of the boxes. We compare the generated cost-per-impression of these ads with the cost-per-impression of a set of dummy ads we placed on the platforms with the exact same targeting parameters and audience characteristics. Black lines represent the upper and lower boundaries of an ad’s cost-per-impression as we extracted them from the dummy ads. We label an ad placement as “plausible targeting” when its cost-per-impression overlaps with the range we calculated, indicating that the ad library plausibly provides all relevant targeting parameters and audience characteristics for the ad.  Similarly, a placement labeled “unexplainable targeting” is an ad whose cost-per-impression falls outside the upper and lower bounds we calculated, meaning that the platform potentially does not disclose full information about the distribution of the ad.
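To make the bounds comparison concrete, here is a minimal sketch of the kind of check described above, using hypothetical field names for the bucketed spend and impression values reported by an ad library and for the cost-per-impression range measured from our dummy ads; the study’s actual analysis code and data schema differ.

```python
# A minimal sketch of labeling an ad placement as "plausible targeting" or
# "unexplainable targeting" by checking whether the cost-per-impression range
# implied by the ad library overlaps the range measured from dummy ads.
# All field names and numbers here are hypothetical.
from typing import Tuple

def cpi_range(spend_low: float, spend_high: float,
              impressions_low: float, impressions_high: float) -> Tuple[float, float]:
    """Lower/upper bounds on cost-per-impression from bucketed spend and impressions."""
    return spend_low / impressions_high, spend_high / impressions_low

def label_placement(library_cpi: Tuple[float, float],
                    dummy_cpi: Tuple[float, float]) -> str:
    """'plausible targeting' if the two ranges overlap, else 'unexplainable targeting'."""
    lib_lo, lib_hi = library_cpi
    ref_lo, ref_hi = dummy_cpi
    overlaps = lib_lo <= ref_hi and ref_lo <= lib_hi
    return "plausible targeting" if overlaps else "unexplainable targeting"

# Example: an ad reported as $100-$199 spend with 10,000-15,000 impressions,
# compared against a dummy-ad range of $0.005-$0.012 per impression.
if __name__ == "__main__":
    lib = cpi_range(100, 199, 10_000, 15_000)
    print(lib, label_placement(lib, (0.005, 0.012)))
```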

Second, broadcasters are required to offer advertising space to political advertisers at the same price they offer it to commercial advertisers. But we find that the platforms charged campaigns different prices for distributing ads. For example, on average, the Trump campaign paid more per impression on Facebook (about 18 impressions per dollar, roughly 5.6 cents per impression) than the Biden campaign (about 27 impressions per dollar, roughly 3.7 cents per impression). On Google, the Biden campaign paid more per impression than the Trump campaign. Unfortunately, while we attempted to control for factors that might account for different prices for different audiences, the data does not allow us to probe the precise reason for the differential pricing.

Third, the platforms do not disclose the detailed information about audience characteristics that they make available to advertisers, nor do they explain how their algorithms distribute or moderate the ads. For example, we see that campaigns placed ads on Facebook that were not ostensibly targeted by age, yet the ads were not distributed uniformly across age groups.  We also find that platforms applied their ad moderation policies inconsistently, with some instances of the same ad being removed and others not, and without any explanation of the decision to remove an ad (Figure 2).

  • Figure 2. Comparison of different instances of moderated ads across platforms. The light blue bars show how many instances of a single ad were moderated, and the maroon bars show how many instances of the same ad were not. The results suggest inconsistent moderation of content across platforms, with some instances of the same ad being removed and others not.

Finally, we observed new forms of political advertising that are not captured in the ad libraries. Specifically, campaigns appear to have used influencers to promote their messages without adequate disclosure. For example, on TikTok, we document how political influencers, who were often linked with PACs, generated billions of impressions from their political content. This new type of campaigning still remains unregulated and little is known about the practices and relations between influencers and political campaigns.  

In short, the platforms’ self-regulatory disclosures are inadequate, and we need more comprehensive disclosures from platforms to understand their role in the political process. Our key recommendations include:

– Requiring that each political entity registered with the FEC use a single, universal identifier for campaign spending across platforms to allow the public to track their activity.

– Developing a cross-platform data repository, hosted and maintained by a government or independent entity, that collects political ads, their targeting criteria, and the audience characteristics that received them. 

– Requiring platforms to disclose information that will allow the public to understand how the algorithms distribute content and how platforms price the distribution of political ads. 

– Developing a comprehensive definition of political advertising that includes influencers and other forms of paid promotional activity.