January 22, 2022

“Signal Loss” and advertising privacy on Facebook

The 2021 Kyoto Prize in Advanced Technology, a major award administered by a Japanese foundation, goes to Andrew Chi-Chih Yao, a Chinese computer scientist who earned PhDs from Harvard and the University of Illinois before being a professor at MIT, Stanford, and Princeton and then becoming Dean of an important theoretical computer science education program at Tsinghua University.  Professor Yao is a theorist, his many major important results are in “computational complexity theory,” so how did he win an international award in “Advanced Technology?”    Well, one of his major results led to the invention of Secure Multiparty Computation (MPC), by which two or more people can pool their data to compute a result without actually disclosing their data to each other.  And in this article I’ll explain how one present day company seems to be applying MPC to try to comply with privacy rules issued by regulators.

Facebook tracks your web browsing in order to make money delivering ads to you.  Facebook has been under pressure from the European Union and from Apple to be less invasive of your privacy.  For example, in 2017 the EU put out a new Privacy Directive, and in 2019 Apple’s Safari browser stopped attaching cookies to third-party image requests.  In this article I’ll discuss some indications that Facebook is beginning to adjust its advertising-tracking model so they can make money without invading your privacy quite as much.  They are experimenting with secure multiparty computation, a “privacy enhancing technology” developed in academia, to measure which ad “impressions” convert to purchases on the average–but without knowing which individuals saw an ad and then made a purchase.

When you browse from one web site to another, many sites snoop on your browsing history, by tracking mechanisms such as cookies and single-pixel images (whose purpose is to track your http image-load requests).  Much of this tracking is for the purpose of making money by targeting ads to you.  Merchants (for example Nike) pay web sites (such as Facebook) to deliver ad views (“impressions”), and Nike pays more for impressions that “convert”, that is, lead to a purchase.  So, Facebook and (independently) Nike would like to (1) deliver ads that are likely to convert, and (2) measure which impressions are converting.  Facebook wants to make more money by delivering to you the ads most likely to convert, and Nike wants to make sure it’s getting its money’s worth from its ad budget.

One way to do this is: when you make a purchase at the Nike online store, the browser sends Facebook a copy of your nike.com shopping cart and your Facebook user ID.  Then Facebook looks up what Nike ads they displayed recently to that user ID; those ads converted to shoe sales, and Nike is happy to pay more for such ads.

That tracking can be a terrible invasion of privacy.  So for years now, regulators (in California and the European Union) and browser makers (like Firefox) have been adjusting restrictions on cookies (and other kinds of tracking) to try to improve privacy.  Facebook’s internal euphemism for privacy enforcement is “signal loss.”  Here’s an analysis of the problem, from an advertiser’s point of view (warning: much marketing-speak!).  The “signal” is the data that Facebook needs to manage its core revenue stream, advertising.  

(When the browser maker (Google, or Apple) is also a major advertising platform, there’s an inherent conflict of interest:  Apple’s Safari restricts Facebook and Google’s ad tracking more than it restricts Apple’s own ad tracking, and Google-the-browser-maker delayed tightening Chrome’s cookie-rules for two years because Google-the-ad-platform needed those cookies.)

So for years now, advertising platforms (like Facebook) have been adapting the way they intrusively track you, so they can still make money delivering relevant ads.   For example, aside from using cookies and tracking pixels inside the browser, Nike and Facebook share things they know about you outside, like your e-mail address, phone number, and home address.  Facebook’s system for that is their “Conversions API”, a software interface for merchants, to measure which advertisements “convert”, using server-to-server communications in the back end.

In any case, there’s pressure on Facebook (and Google and other advertising platforms) to be more respectful of privacy.  When it was just the U.S. Congress asking Zuck to testify at hearings, Facebook could perhaps laugh it off, but when entities with real enforcement power (Apple and California and the EU) start to insist on their users’ and citizens’ privacy, then Facebook might ask themselves, “How can we make money without so much privacy invasion?”

Google is trying one method, called “Federated Learning of Cohorts” (FLoC), but privacy advocates have severely criticized it, for good reason: instead of sharing your entire history, FLoC labels you with a summary of your history.  That’s still a significant privacy invasion, and it may even make it easier for bad guys to track large numbers of people in harmful ways.

Is there a better way?  Academic research on secure multiparty computation (MPC) has shown how to measure a global property (how many Nike ads converted into sales) without identifying specific users’ histories.  In particular, with the right multiparty protocol, Nike (or Facebook) can’t tell which specific purchases at nike.com resulted from ads, and they can’t tell which specific ad-impressions resulted in sales, but they can measure the average.  And that’s good enough:  good enough for Facebook to target you with ads that are more likely to convert; good enough for Nike to know that they’re getting their money’s worth from Facebook.

The way this kind of MPC would work is,  the ad platform (such as Facebook) knows what ads it shows to each user.  The merchant (such as Nike) knows which shoes it sold to each purchaser.  They want to jointly compute the effectiveness of the ad campaign, but without Facebook revealing to Nike anything about individual users, and without Nike revealing to Facebook anything about individual purchasers.  (but see note 1 below) So Nike would encrypt its collection of shopping carts, and Facebook would encrypt its collection of ad-impression data, and they use homomorphic encryption to compute the “join” of these relations without either one seeing the other’s unencrypted data.

And indeed,  Facebook claims to be adopting this method (though this explainer is very short on technical details).  But Facebook has a public github repo for their new API for advertisers, based on MPC.  And this press release says they’re already testing their “Private Lift Measurement” with some advertisers.  

Will Facebook adopt this for all advertisers?  If they do, then I think it really will be a privacy improvement.  It’s more private than Google’s solution of publicly labeling each user with a summary of their history.  Of course, Facebook still knows where on Facebook you’ve been, in every detail; and Nike still knows what you browsed in their on-line store; but nobody will know both at once.

Although MPC can measure ad conversions–whether Facebook is delivering ads that will increase shoe sales–it probably cannot target ads quite as precisely.  That is, Facebook’s machine-learning criteria to decide which ads to show you might work better if they do their super-privacy-intrusive tracking of everything you do on and off Facebook.  By limiting their tracking to on-Facebook-only, they may find that ad impressions have a slightly lower conversion rate, so Facebook makes slightly less money.  Time will tell whether they’re willing to take that hit.

And SMP won’t solve other societal problems that aren’t related to privacy:  The duopoly of Google/Youtube and Facebook/Instagram in online advertising, Youtube and Facebook’s recommender systems pushing users towards extreme views, Youtube and Facebook trying to maximize the amount of time you waste on-line, Instagram harmful to teenage girls–none of these are about privacy, and secure multiparty computation doesn’t address those problems.

Note 1. Sarah Scheffler, a postdoctoral fellow at Princeton’s Center for Information Technology Policy (CITP), writes, MPC’s “private” nature in these descriptions depends not only on using MPC, but also on using MPC to compute a privacy-preserving function.  MPC could be used in the way you describe to privately compute average ad conversions, but could also be used to say, “privately” compute the list of users who are shared between Nike and Facebook or something (and I’ve heard suggestions of it being used for exactly that purpose).  In the latter case, it’s still technically more private than Facebook and Nike comparing lists in the clear, but I don’t think it’s what most people want from “private computing”. 

So Sarah and I took a look at Facebook’s open-source MPC repo, where we see strong evidence that they are computing appropriately private functions (such as “total value of an ad campaign”).

Could quantum computers be cost-effective by 2036?

In theory, quantum computers could be much more efficient at some kinds of tasks, which could be potentially disruptive in applications areas such as cryptography. But you know: in theory, theory and practice are the same, but in practice, they are not. So it’s interesting to find applications where quantum computing might possibly be useful and cost-effective in the near(ish) future.

So let’s talk about cell-phone towers. A 5G base station contains a phased-array antenna, that is, a 2-d array of small antennas that can be aimed in a particular direction (for sending or receiving) by adjusting the phase of their signal. And with the superposition principle, this phased-array antenna can simultaneously aim one signal at your iPhone here, while aiming another signal at that Android phone over there, and hundreds more devices all at once. To do that the base station must compute a difficult (NP-complete) combinatorial optimization problem every few milliseconds, as the phones move around. This is called “Massive MIMO,” Massive Multiple-Input-Multiple-Output communication.

Why bother? Well, in dense and expensive cities, it’s so expensive to construct another cell-phone tower that it’s worth the cost of continuous heavy computation to optimize the number of simultaneous connections that each tower can handle. That cost is mostly in the electricity it takes to run a big high-powered compute server for the base station–both in dollars and in carbon footprint.

Recently my colleague Kyle Jamieson and his student Srikar Kasi published a couple of papers demonstrating that a commercially available Quantum Annealing computer, the D-Wave, can solve toy-scale MIMO antenna optimization problems. Unlike “gate model” quantum computers, which implement Von Neumann machines that can compute in a superposition of states, quantum annealing finds optimal or near-optimal solutions to node labeling on a weighted undirected graph. You might think that this is not very general, but plenty of real-world optimization problems can be encoded into this graph problem. And, more to the point: small-to-medium scale quantum annealing computers are practical and commercially available now, whereas gate-model quantum computers are not yet practical except at tiny scale.

But still, quantum annealing computers (such as the D-Wave) are expensive, and require special refrigeration units to get them down to 15 milliKelvin. Could there really be an application where it’s more cost-effective to buy one of those, instead of just a rack full of Arm servers?

Massive MIMO, the antenna-aiming problem, is a good candidate: it demands that the same specialized NP-complete problem be solved every millisecond; and improving the optimization means that more cell-phone conversations can be handled by the same expensive cell tower.

A new paper by Srikar Kasi and colleagues at Princeton, InterDigital, and University College London tries to answer the question, “based on current trends in the improvement of commercially shipped quantum annealing hardware, how many years in the future will be it cheaper to run a quantum annealer than to run a rack full of conventional servers?” And the answer is, maybe in the range 2027-2036 the two solutions will be equally cost-effective for medium-large cellular base stations. And maybe after 2036 the quantum annealer will consume 45% less electricity (be 45% cheaper to run) than conventional computers for large cellular base stations.

That’s pretty exciting. Because up to now I thought that quantum computing is the technology of the future and always will be. But 2036 is only fifteen years away, and maybe we’ll really see this happen.

Additional remark: Quantum simulators are another kind of special-purpose (not gate-model) soon-or-now practical quantum computer. What I find interesting about Massive MIMO is that the quantum computer is solving a combinatorial optimization problem that is not directly about quantum mechanics.

Another 2020 lawsuit over internet voting

Last week I summarized 4 lawsuits filed in 2020 over internet voting, in VA, NJ, NY, NH. Then I learned there was another in North Carolina.

In 2020 the North Carolina Council of the Blind sued the State Board of Elections, demanding that the Board offer “alternative format absentee ballots allowing private and independent method of absentee ballots that is accessible for Plaintiffs and others with vision disabilities.”

The Plaintiffs did not specifically demand internet ballot return. Instead, they demanded compliance with the Americans for Disabilities Act that requires “reasonable modifications in the Absentee Voting Program to avoid discrimination against Plaintiffs on the basis of disability.” The Plaintiffs also explained methods that other states were already using to accomplish this [all excerpted verbatim from the Complaint]:

  • Maryland developed an online marking tool that allows voters to access and mark their absentee ballots on their computers. . . . Although the absentee ballot must still be printed, signed, and returned to the voter’s local board of election, voters need not sacrifice the secrecy of their ballots to receive assistance with signing because the signature page prints separately from the ballot.
  • Oregon, Wisconsin, and New Hampshire have employed an accessible electronic voting system that can be used for both in-person and absentee voting. Using the platform, voters can access their absentee ballots through their web browser and mark their ballots on their computers. Voters then print their ballots and mail them back to their local boards of election where their votes are counted. … The platform used by these states has been released as open source technology, meaning that it can be used by the NCSBOE, or by any other entity, free of charge.
  • West Virginia has similarly provided an electronic absentee ballot delivery and marking tool to voters with disabilities. Voters access the electronic absentee ballot tool via a web portal, where they are guided with on-screen instructions on how to open and complete the electronic ballot. After completing the ballot, West Virginia law provides that qualifying voters may either (1) print and mail their absentee ballot to their county clerk, or (2) submit their absentee ballot electronically to their county clerk directly through the web portal.
  • Alaska [has] have provided accessible remote voting options for some elections . . . an electronic absentee ballot that can be completed and transmitted using the voter’s computer.
  • Michigan made its UOCAVA PDF ballots accessible and available to blind voters . . . [but no electronic return of voted ballots]
  • New York . . . agreed to email accessible absentee ballots to qualified voters with disabilities [but no electronic return of voted ballots]

In summary, the Complaint laid out a variety of ways in which other states had complied with the ADA, but did not specifically request internet ballot return. The motion for preliminary injunction was similarly open-ended.

The Court, apparently, felt that such an open-ended injunction would be difficult to enforce. And apparently North Carolina was already using Democracy Live’s system for UOCAVA voters, and said so in court. So the Court’s order read, “Defendants are hereby ORDERED to open the Democracy Live portal to plaintiffs and other blind voters as expeditiously as possible so that it may be utilized for the November 3, 2020 election.”

A year later, in August 2021, the State agreed to settle the case, having agreed to use the Democracy Live “accessible electronic voting portal” through which “the State Board provides visually impaired voters the ability to request to vote absentee, to mark their absentee ballots, and to return their absentee ballots.” In addition they “identified . . . vendors who are capable providing Large Print, Braille, or accessible electronic formats of absentee ballots and other communications related to voting processes.”

Judge Boyle’s final judgment omits any mention of Democracy Live, and orders “The North Carolina State Board of Elections (“NCSBE”) shall ensure that blind voters can request, mark, and return their absentee ballot through accessible electronic means in the 2021 municipal elections (whenever held) and in all subsequent elections.” (emphasis mine) So it appears that internet voting for blind voters in North Carolina currently has the force of law behind it.

It’s too bad that an opportunity was lost here. Last week I wrote, “it would be a good idea for the National Federation of the Blind to remove internet ballot return from its set of demands, and focus on the other reasonable and practical reforms that they have been requesting (or suing for).” And here is a case where the North Carolina Council for the Blind had already done just what I suggested, demanding “reasonable modifications in the Absentee Ballot program” but not specifically demanding the insecure, unsafe, unverifiable extreme of internet ballot return.

The North Carolina State Board of Elections then unnecessarily chose to include internet voting in their response, because their vendor (Democracy Live) made that convenient for them. (Democracy Live does that even more insecurely than you might have imagined! And each of their competitors does internet voting insecurely in its own surprising way!) And then Judge Boyle enshrined this expedient into an Order.

I am not a lawyer. But a lawyer friend writes: This judgment is only a legal way station; don’t assume this is anything near a permanent order, if other evidence is later adduced undermining its soundness. And really, the Plaintiffs’ initial motion was very poor lawyering in one critical respect: the request for “reasonable modifications” with examples from other States was basically punting to the Federal Judge to do their thinking for them, for the court to supply the terms for a precise order that for some inexplicable reason, they did not craft and then support in their brief.  It’s not surprising that, given the lack of expertise in the federal courts on internet voting insecurity, election security ambiguities, and various NC election administrative requirements, that the court (likely law clerks) jumped to the Democracy Live solution, which NC had already adopted for a different purpose.