April 24, 2024

“Signal Loss” and advertising privacy on Facebook

The 2021 Kyoto Prize in Advanced Technology, a major award administered by a Japanese foundation, goes to Andrew Chi-Chih Yao, a Chinese computer scientist who earned PhDs from Harvard and the University of Illinois, was a professor at MIT, Stanford, and Princeton, and then became Dean of an important theoretical computer science education program at Tsinghua University. Professor Yao is a theorist whose many major results are in “computational complexity theory,” so how did he win an international award in “Advanced Technology”? Well, one of his major results led to the invention of Secure Multiparty Computation (MPC), by which two or more parties can pool their data to compute a result without actually disclosing their data to each other. In this article I’ll explain how one present-day company seems to be applying MPC to try to comply with privacy rules issued by regulators.

Facebook tracks your web browsing in order to make money delivering ads to you. Facebook has been under pressure from the European Union and from Apple to be less invasive of your privacy. For example, in 2017 the EU put out a new Privacy Directive, and in 2019 Apple’s Safari browser stopped attaching cookies to third-party image requests. Here I’ll discuss some indications that Facebook is beginning to adjust its advertising-tracking model so it can make money without invading your privacy quite as much. It is experimenting with secure multiparty computation, a “privacy-enhancing technology” developed in academia, to measure how often ad “impressions” convert to purchases on average, but without knowing which individuals saw an ad and then made a purchase.

When you browse from one web site to another, many sites snoop on your browsing history using tracking mechanisms such as cookies and single-pixel images (whose purpose is to track your HTTP image-load requests). Much of this tracking is done to make money by targeting ads to you. Merchants (for example, Nike) pay web sites (such as Facebook) to deliver ad views (“impressions”), and Nike pays more for impressions that “convert,” that is, lead to a purchase. So Facebook and (independently) Nike would like to (1) deliver ads that are likely to convert, and (2) measure which impressions are converting. Facebook wants to make more money by delivering to you the ads most likely to convert, and Nike wants to make sure it’s getting its money’s worth from its ad budget.

One way to do this: when you make a purchase at the Nike online store, the browser sends Facebook a copy of your nike.com shopping cart along with your Facebook user ID. Facebook then looks up which Nike ads it recently displayed to that user ID; those ads converted to shoe sales, and Nike is happy to pay more for such ads.
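To make the privacy cost concrete, here is a minimal Python sketch of that naive attribution step, written from Facebook’s side. The log format, user IDs, ad IDs, and the seven-day attribution window are all hypothetical; the point is only that the lookup cannot happen unless Facebook learns who the buyer is.

```python
from datetime import datetime, timedelta

# Hypothetical Facebook-side log: user ID -> list of (ad ID, time the ad was shown).
impressions = {
    "fb_user_123": [("nike_running_ad", datetime(2021, 6, 1, 9, 0))],
    "fb_user_456": [("nike_hoops_ad", datetime(2021, 6, 2, 18, 30))],
}

ATTRIBUTION_WINDOW = timedelta(days=7)  # assumed look-back window


def attribute_purchase(fb_user_id, cart, purchase_time):
    """Return the Nike ads this user saw within the window before buying.

    The cart is not needed for the lookup, but in this design Facebook
    receives it anyway, together with the buyer's identity."""
    return [
        ad_id
        for ad_id, shown_at in impressions.get(fb_user_id, [])
        if ad_id.startswith("nike_")
        and timedelta(0) <= purchase_time - shown_at <= ATTRIBUTION_WINDOW
    ]


# The browser reports a purchase: the cart contents plus the Facebook user ID.
print(attribute_purchase("fb_user_123", ["running shoes, size 10"],
                         datetime(2021, 6, 3, 12, 0)))
# ['nike_running_ad']
```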

That tracking can be a terrible invasion of privacy. So for years now, regulators (in California and the European Union) and browser makers (like Mozilla, maker of Firefox) have been tightening restrictions on cookies (and other kinds of tracking) to try to improve privacy. Facebook’s internal euphemism for privacy enforcement is “signal loss.” Here’s an analysis of the problem from an advertiser’s point of view (warning: much marketing-speak!). The “signal” is the data that Facebook needs to manage its core revenue stream, advertising.

(When the browser maker, such as Google or Apple, is also a major advertising platform, there’s an inherent conflict of interest: Apple’s Safari restricts Facebook’s and Google’s ad tracking more than it restricts Apple’s own, and Google-the-browser-maker delayed tightening Chrome’s cookie rules for two years because Google-the-ad-platform needed those cookies.)

Meanwhile, advertising platforms (like Facebook) have been adapting the way they intrusively track you, so they can still make money delivering relevant ads. For example, aside from using cookies and tracking pixels inside the browser, Nike and Facebook share things they know about you outside the browser, like your e-mail address, phone number, and home address. Facebook’s system for that is its “Conversions API,” a software interface that lets merchants measure which advertisements “convert,” using server-to-server communications in the back end.
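Sharing identifiers “outside” the browser need not mean sending them in plaintext. A common practice, which this sketch assumes (it is not a reproduction of the Conversions API itself), is for each side to normalize an identifier such as an e-mail address and hash it with SHA-256. The Python below shows that this hides the raw address but not the match: equal inputs give equal hashes, so the two companies can still link the same person.

```python
import hashlib


def normalize_and_hash_email(email: str) -> str:
    """Trim and lowercase an e-mail address, then return its SHA-256 hex digest."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Nike's side: the address typed at checkout (hypothetical customer).
nike_hash = normalize_and_hash_email("  Jane.Doe@Example.com ")
# Facebook's side: the address on the user's account.
facebook_hash = normalize_and_hash_email("jane.doe@example.com")

# Equal hashes: Facebook can link the purchase to the account
# without ever receiving the plaintext address.
print(nike_hash == facebook_hash)  # True
```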

In any case, there’s pressure on Facebook (and Google and other advertising platforms) to be more respectful of privacy. When it was just the U.S. Congress asking Zuck to testify at hearings, Facebook could perhaps laugh it off, but when entities with real enforcement power (Apple, California, and the EU) start to insist on their users’ and citizens’ privacy, Facebook might ask itself, “How can we make money without so much privacy invasion?”

Google is trying one method, called “Federated Learning of Cohorts” (FLoC), but privacy advocates have severely criticized it, and for good reason: instead of sharing your entire history, FLoC labels you with a summary of your history (a “cohort” ID shared with many other users). That’s still a significant privacy invasion, and it may even make it easier for bad guys to track large numbers of people in harmful ways.
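Public descriptions of FLoC’s 2021 trial said the cohort label came from a locality-sensitive hash (SimHash) of the domains in your browsing history. The toy Python below illustrates only that general idea. The 8-bit label size and the per-domain hashing are assumptions, not Chrome’s actual algorithm. The thing to notice is that the label is still computed from your history, which is exactly what the critics object to.

```python
import hashlib

COHORT_BITS = 8  # toy label size (an assumption for illustration)


def domain_vector(domain: str) -> list[int]:
    """Deterministic pseudo-random +1/-1 vector for a domain, one entry per label bit."""
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    return [1 if (digest[i // 8] >> (i % 8)) & 1 else -1 for i in range(COHORT_BITS)]


def cohort_id(history: list[str]) -> int:
    """SimHash of the set of visited domains: sum the vectors, keep the signs as bits."""
    totals = [0] * COHORT_BITS
    for domain in set(history):
        for i, v in enumerate(domain_vector(domain)):
            totals[i] += v
    return sum(1 << i for i, t in enumerate(totals) if t > 0)


# Similar histories tend to land on the same or nearby labels.
print(cohort_id(["nike.com", "espn.com", "runnersworld.com"]))
```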

Is there a better way? Academic research on secure multiparty computation (MPC) has shown how to measure a global property (how many Nike ads converted into sales) without identifying specific users’ histories. In particular, with the right multiparty protocol, Nike (or Facebook) can’t tell which specific purchases at nike.com resulted from ads, and they can’t tell which specific ad impressions resulted in sales, but they can measure the average. And that’s good enough: good enough for Facebook to target you with ads that are more likely to convert, and good enough for Nike to know that it’s getting its money’s worth from Facebook.
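One basic MPC building block for “measuring the average” is additive secret sharing. Each private number is split into two shares that individually look random; each party adds up only its own shares, and only the combined total is ever revealed. This Python sketch is a generic illustration, not Facebook’s protocol. It starts from conversion flags that are already secret-shared (the flags are hypothetical); in a full protocol the flags would themselves be computed under MPC from both companies’ data, so nobody ever holds the per-user list in the clear.

```python
import secrets

MODULUS = 2**64  # shares are integers mod 2^64


def share(value: int) -> tuple[int, int]:
    """Split value into two shares; either share alone is uniformly random."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS


def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % MODULUS


# Hypothetical per-user conversion flags (1 = saw an ad, then bought).
conversions = [1, 0, 0, 1, 1, 0, 0, 0]
shares = [share(c) for c in conversions]

# Party 0 and party 1 each sum only the shares they hold...
sum0 = sum(s0 for s0, _ in shares) % MODULUS
sum1 = sum(s1 for _, s1 in shares) % MODULUS

# ...and only the combined total (hence the average) is revealed.
total = reconstruct(sum0, sum1)
print(total, total / len(conversions))  # 3 0.375
```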

Here is how this kind of MPC would work. The ad platform (such as Facebook) knows what ads it shows to each user. The merchant (such as Nike) knows which shoes it sold to each purchaser. They want to jointly compute the effectiveness of the ad campaign, but without Facebook revealing to Nike anything about individual users, and without Nike revealing to Facebook anything about individual purchasers (but see Note 1 below). So Nike would encrypt its collection of shopping carts, Facebook would encrypt its collection of ad-impression data, and they would use homomorphic encryption to compute the “join” of these relations without either one seeing the other’s unencrypted data.
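Here is a toy Python sketch of such a privacy-preserving join, in the style of published private intersection-sum protocols. It is an illustration under stated assumptions, not Facebook’s design. Matching uses commutative (Diffie-Hellman-style) blinding of hashed user IDs rather than homomorphic encryption; the sum over matched rows uses Paillier homomorphic encryption; both parties are assumed to follow the protocol honestly; and the key sizes are far too small for real security. The names `fb_viewers`, `nike_sales`, and `private_intersection_sum` are hypothetical.

```python
import hashlib
import math
import secrets

# Toy parameters only: real deployments use much larger, vetted groups and keys.
P = 2**127 - 1  # a Mersenne prime; blinding happens in the group Z_P*


def hash_to_group(user_id: str) -> int:
    """Map a user ID to a nonzero element of Z_P* using SHA-256."""
    digest = int.from_bytes(hashlib.sha256(user_id.encode("utf-8")).digest(), "big")
    return digest % (P - 1) + 1


def blinding_key() -> int:
    """Secret exponent coprime to P-1, so x -> x**k mod P is one-to-one."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


class Paillier:
    """Minimal additively homomorphic Paillier scheme (toy key size)."""

    def __init__(self) -> None:
        p, q = 2**61 - 1, 2**31 - 1           # known primes, far too small for real use
        self.n = p * q                        # public
        self.n2 = self.n * self.n
        self.phi = (p - 1) * (q - 1)          # private
        self.mu = pow(self.phi, -1, self.n)   # private

    def encrypt(self, m: int) -> int:         # needs only the public n
        while True:
            r = secrets.randbelow(self.n - 1) + 1
            if math.gcd(r, self.n) == 1:
                break
        return (1 + m * self.n) * pow(r, self.n, self.n2) % self.n2

    def add(self, c1: int, c2: int) -> int:   # Enc(m1) * Enc(m2) = Enc(m1 + m2)
        return c1 * c2 % self.n2

    def decrypt(self, c: int) -> int:         # needs the private phi and mu
        return (pow(c, self.phi, self.n2) - 1) // self.n * self.mu % self.n


def private_intersection_sum(fb_viewers: set[str], nike_sales: dict[str, int]) -> tuple[int, int]:
    """Facebook learns only how many users matched; Nike learns only the total sales."""
    a, b = blinding_key(), blinding_key()     # Facebook's and Nike's secret exponents
    nike_key = Paillier()                     # Nike's key pair
    rng = secrets.SystemRandom()

    # Facebook -> Nike: its viewer IDs, blinded with a and shuffled.
    fb_once = [pow(hash_to_group(u), a, P) for u in fb_viewers]
    rng.shuffle(fb_once)

    # Nike -> Facebook: those IDs re-blinded with b, plus Nike's own IDs
    # blinded with b and paired with encrypted sale amounts (shuffled).
    fb_twice = {pow(x, b, P) for x in fb_once}
    nike_once = [(pow(hash_to_group(u), b, P), nike_key.encrypt(amount))
                 for u, amount in nike_sales.items()]
    rng.shuffle(nike_once)

    # Facebook: blinding commutes, so H(u)**(a*b) matches exactly on shared users.
    enc_total = nike_key.encrypt(0)           # fresh Enc(0) rerandomizes the result
    matches = 0
    for y, enc_amount in nike_once:
        if pow(y, a, P) in fb_twice:
            enc_total = nike_key.add(enc_total, enc_amount)
            matches += 1

    # Facebook -> Nike: a single ciphertext; Nike decrypts only the aggregate.
    return matches, nike_key.decrypt(enc_total)


print(private_intersection_sum({"alice", "bob", "carol"},
                               {"bob": 8999, "carol": 12000, "dave": 5000}))
# (2, 20999)
```

In this design Facebook learns how many users appear in both sets, and Nike learns only the total sales from those users; neither learns which individuals matched. Starting the sum from a fresh encryption of zero rerandomizes the final ciphertext, so Nike cannot tell which of its encrypted amounts were combined.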

And indeed, Facebook claims to be adopting this method (though this explainer is very short on technical details). But Facebook has a public GitHub repo for its new MPC-based API for advertisers, and this press release says it is already testing its “Private Lift Measurement” with some advertisers.

Will Facebook adopt this for all advertisers? If it does, then I think it really will be a privacy improvement. It’s more private than Google’s solution of publicly labeling each user with a summary of their history. Of course, Facebook still knows where on Facebook you’ve been, in every detail, and Nike still knows what you browsed in its online store, but nobody will know both at once.

Although MPC can measure ad conversions (whether Facebook is delivering ads that increase shoe sales), it probably cannot target ads quite as precisely. That is, Facebook’s machine-learning criteria for deciding which ads to show you might work better with its super-privacy-intrusive tracking of everything you do on and off Facebook. If Facebook limits its tracking to activity on Facebook itself, it may find that ad impressions have a slightly lower conversion rate, so it makes slightly less money. Time will tell whether it is willing to take that hit.

And MPC won’t solve other societal problems that aren’t related to privacy: the duopoly of Google/YouTube and Facebook/Instagram in online advertising; YouTube’s and Facebook’s recommender systems pushing users toward extreme views; YouTube and Facebook trying to maximize the amount of time you waste online; Instagram’s harm to teenage girls. None of these are about privacy, and secure multiparty computation doesn’t address them.

Note 1. Sarah Scheffler, a postdoctoral fellow at Princeton’s Center for Information Technology Policy (CITP), writes: “MPC’s ‘private’ nature in these descriptions depends not only on using MPC, but also on using MPC to compute a privacy-preserving function. MPC could be used in the way you describe to privately compute average ad conversions, but it could also be used to, say, ‘privately’ compute the list of users who are shared between Nike and Facebook or something (and I’ve heard suggestions of it being used for exactly that purpose). In the latter case, it’s still technically more private than Facebook and Nike comparing lists in the clear, but I don’t think it’s what most people want from ‘private computing’.”
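Scheffler’s point is that MPC protects the inputs, while the output is whatever the chosen function reveals. The two hypothetical functions below are written in the clear, as the “ideal functionality” an MPC protocol would emulate. They take identical inputs, yet the first reveals one aggregate number and the second reveals exactly which users are shared.

```python
def campaign_total(viewers: set[str], purchases: dict[str, int]) -> int:
    """Aggregate output: total purchase value from users who saw the ad."""
    return sum(amount for user, amount in purchases.items() if user in viewers)


def shared_users(viewers: set[str], purchases: dict[str, int]) -> set[str]:
    """Per-user output: exactly which users saw the ad and then bought."""
    return viewers & set(purchases)


viewers = {"alice", "bob", "carol"}
purchases = {"bob": 8999, "carol": 12000, "dave": 5000}
print(campaign_total(viewers, purchases))  # 20999
print(sorted(shared_users(viewers, purchases)))  # ['bob', 'carol']
```

Computing `shared_users` under MPC would still hide the rest of each company’s list, but its output is itself the kind of per-person linkage that privacy rules are meant to prevent.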

So Sarah and I took a look at Facebook’s open-source MPC repo, where we found strong evidence that it computes appropriately private functions (such as “total value of an ad campaign”).