September 14, 2024

PrivAds: Behavioral Advertising without Tracking

There’s an interesting new paper out of Stanford and NYU, about a system called “PrivAds” that tries to provide behavioral advertising on web sites, without having a central server gather detailed information about user behavior. If the paper’s approach turns out to work, it could have an important impact on the debate about online advertising and privacy.

Advertisers have obvious reasons to show you ads that match your interests. You can benefit too, if you see ads that are relevant to your needs, rather than ones you don’t care about. The problem, as I argued in my Congressional testimony, comes when sites track your activities, and build up detailed files on you, in order to do the targeting.

PrivAds tries to solve this problem by providing behavioral advertising without having any server track you. The idea is that your own browser will track you, and analyze your online activities to build a model of your interests, but your browser won’t reveal this information to anyone else. When a site wants to show you an interest-based ad, your browser will choose the ad from a portfolio of ads offered by the ad service.
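To make the idea concrete, here is a minimal sketch of purely local profiling and ad selection. It is not the PrivAds algorithm, which this post does not spell out. The topic labels, the scoring rule (sum of matching topic counts), and the names Ad, build_profile and choose_ad are all hypothetical. What it shows is that every input and output of the selection step can stay inside the browser.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    ad_id: str
    topics: frozenset  # hypothetical topic labels supplied with the ad portfolio


def build_profile(visited_page_topics):
    """Count how often each topic appears in the user's browsing history.
    The resulting profile lives only in local browser storage."""
    profile = Counter()
    for topics in visited_page_topics:
        profile.update(topics)
    return profile


def choose_ad(profile, portfolio):
    """Pick the ad whose topics best match the local profile.
    Ads are pre-sorted by ad_id and max() keeps the first maximal item,
    so ties go to the alphabetically first ad_id."""
    def score(ad):
        return sum(profile[t] for t in ad.topics)
    return max(sorted(portfolio, key=lambda ad: ad.ad_id), key=score)


if __name__ == "__main__":
    history = [{"cycling", "travel"}, {"cycling"}, {"cooking"}]
    portfolio = [
        Ad("bike-shop", frozenset({"cycling"})),
        Ad("cruise", frozenset({"travel"})),
        Ad("knives", frozenset({"cooking"})),
    ]
    print(choose_ad(build_profile(history), portfolio).ad_id)  # bike-shop
```

The ad service only ever supplies the portfolio; which ad won, and why, never has to leave the machine.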

The tricky part is how your browser can do all of this without incidentally leaking your activities to the server. For example, the ad agency needs to know how many times each ad was shown. How can you report this to the ad service without revealing which ads you saw? PrivAds offers a solution based on fancy cryptography, so that the ad agency can aggregate reports from many users, without being able to see the users’ individual reports. Similarly, every interaction between your browser and the outside must be engineered carefully so that behavioral advertising can occur but the browser doesn’t telegraph your actions.
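The aggregation step can be illustrated with Paillier encryption, a standard additively homomorphic scheme in which multiplying two ciphertexts yields an encryption of the sum of their plaintexts. This is a toy sketch, not necessarily the construction PrivAds uses. The primes are far too small to be secure, the function names are hypothetical, and it assumes the decryption key is held by some party other than the ad service, one that only ever decrypts aggregated totals. The post does not say who that party would be.

```python
import math
import secrets

# Toy parameters: 10007 and 10009 are prime but far too small to be secure.
P, Q = 10007, 10009
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)  # private key part, held by the hypothetical key holder
MU = pow(LAM, -1, N)          # private key part; valid because the generator is N + 1


def encrypt(m):
    """Paillier-encrypt a non-negative integer m < N using generator N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2


def decrypt(c):
    """Recover the plaintext; only the key holder can run this."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N


def aggregate(reports):
    """Run by the ad service: combine per-user reports without decrypting any.
    Each report is a list of ciphertexts, one per ad in the portfolio."""
    totals = reports[0]
    for report in reports[1:]:
        totals = [add_encrypted(a, b) for a, b in zip(totals, report)]
    return totals


if __name__ == "__main__":
    # Three hypothetical users' view counts for a three-ad portfolio.
    views = [[2, 0, 1], [0, 3, 1], [1, 1, 0]]
    reports = [[encrypt(v) for v in user] for user in views]
    print([decrypt(c) for c in aggregate(reports)])  # [3, 4, 2]
```

The ad service handles only ciphertexts, so it can compute encrypted totals but cannot read any single user's row; the privacy of the scheme then rests on the key holder refusing to decrypt individual reports.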

It’s not clear at this point whether the PrivAds approach will work, in the sense of protecting privacy without reducing the effectiveness of ad targeting. It’s clear, though, that PrivAds is asking an important question.

If the PrivAds approach succeeds, demonstrating that behavioral advertising does not require tracking, this doesn’t mean that companies will stop wanting to track you — but it does mean that they won’t be able to use advertising as an excuse to track you.

Comments

  1. I have not read the paper, and maybe I should, but one obvious problem would be the advertisers' confidence that the browser is interpreting user behavior and profiles consistently with how the advertiser defines the demographic target of its advertisement.

    I can’t see advertisers being happy giving this kind of control to a browser, and even if it ever gets used, I doubt it will last long before advertisers start gaming the system, perhaps leading to a whole new class of privacy concerns.

    • The more important limitation is that it relies on client-side behaviour. The servers won’t have control over the clients, so a client could simply choose not to show ads. In fact, as has been pointed out, that option already exists with Adblock and similar plugins.

      Relying on the client more is not going to be popular with advertisers.

      • It seems to me that would be easily solved by forcing the client to select, display and certify display of the ads before receiving the rest of the content. Or perhaps before receiving a key to decrypt the rest of the content. It would cause pages to load slower, but would still preserve anonymity for the client while ensuring ads do actually get shown. The owner of the content clearly has a right to ensure the ads are shown as a condition of releasing the content, but it would turn the client’s ad blocker software into a content blocker as well, which wouldn’t be popular with clients.
        The next thing on the market would be a blocker that lies to the server about displaying the ads, and the server would have to begin only serving to clients it has previously certified won’t lie to it. But that breaks down the whole “open architecture” concept.

      • Yeah, I agree that the servers won’t have control over the clients. The clients would need to understand how to set it up. Otherwise, if you have up-to-date antivirus software, it could easily be blocked.

  2. but it does mean that they won’t be able to use advertising as an excuse to track you.

    Since when have “they” needed an “excuse”?

    As has been pointed out upthread, technologies to accomplish this have been around for 10+ years. The reason ad servers track individuals around the web is that somebody may be willing to pay for this data someday. Why on Earth do you think Rupert Murdoch bought MySpace? To keep up with his favorite indie bands?

    Perhaps you’re taking the position that, without the “excuse” of contextual advertising being tied to user tracking, the ad servers will be under some kind of market and/or political pressure to mend their privacy-invading ways. I submit to you that, for the past 50+ years, the advertising industry has maintained publicly that 1) advertising only exists to inform consumers and should therefore be protected under the First Amendment, and 2) advertising is capable of manipulating people into preferring one of two identical alternatives and therefore all businesses should invest heavily in advertising. The internal contradiction between these two positions is obvious, yet somehow they’ve never been pressured to choose between them… I’m guessing they’re not going to feel a whole lot of heat after being deprived of their “excuse” for spying on people.

  3. Kiaser Zohay says

    I already have a browser plug-in that shows me ads based on my preferences.

    It is called Adblock.

    https://addons.mozilla.org/firefox/addon/1865

    kz

  4. Is this any different from the infomediaries that were proposed in the late 1990s? I discuss the challenges of client-side behavioral targeting (including the possible interference of anti-adware laws) at http://ssrn.com/abstract=912524.

    Eric

    • What’s new in the PrivAds paper, compared to your previous “Coasean filters” work, is that they drill down into the specific mechanisms you would need to prevent the client-side “agent” or “filter” from leaking information to outsiders about your interests.

      As an example, an ad network wants to know how many times each ad was viewed, but your agent doesn’t want to reveal to anyone how many times you viewed each ad, because that would reveal information about your interests. So PrivAds provides a mechanism for letting the ad network learn the aggregate viewing data of a population of users, without being able to learn viewing data for any particular user (or small group of users). This requires use of fancy homomorphic encryption technology.

      Their goal is to enumerate the possible information leaks in a realistic ad service scenario, and to find specific technical methods for eliminating each of those leaks. They haven’t entirely succeeded at that goal, in my view, but they get a lot closer than anybody has before.

      • “As an example, an ad network wants to know how many times each ad was viewed, but your agent doesn’t want to reveal to anyone how many times you viewed each ad, because that would reveal information about your interests. So PrivAds provides a mechanism for letting the ad network learn the aggregate viewing data of a population of users, without being able to learn viewing data for any particular user (or small group of users). This requires use of fancy homomorphic encryption technology.”

        Really? I’d have thought a much simpler mechanism would suffice: if three ads are to be retrieved, request and retrieve the three chosen ads along with three more, randomly selected ones, and discard the latter. Mix up the temporal order of the six requests, so that the non-random ones aren’t the first three, or alternating, or in any other pattern, but are randomly positioned, and differently positioned every time.

        The aggregate statistics will still show which are more popular than which, but there’ll be a bit of random noise. For instance, if the portfolio has 100 ads, every selected ad is matched by exactly one random ad, and one ad is selected twice as often as a second, and there are, say, 600 selections of ads, with 200 of the popular one and 100 of that second one, then:

        The popular one is loaded 200 times by selection. Of the other 400 ad requests, 4 are accompanied by a random selection of the popular one. The popular one is retrieved 204 times.

        The second one is loaded 100 times by selection. Of the other 500 ad requests, 5 are accompanied by a random selection of the second one. The second one is retrieved 105 times.

        Someone analyzing the ads to determine view rates knows that of the 1200 total ad retrievals, 600 were random, so each ad in the 100-ad inventory can be expected to have had 6 random retrievals that don’t count as views. Subtracting, the popular one gets an estimated 198 views and the second one 99. These match the true 2:1 ratio (twice 99 is 198) and are close to the true numbers of selections (200 and 100).

        The downside is that a statistical picture might still be built up for individuals, though again a noisy one. There is deniability that a particular ad was shown to a particular user because of that user’s interests. If a user retrieves the popular ad above, knowing that 4 out of 204 retrievals were random, one estimates a 98% chance that the ad was selected rather than random in that particular instance.

        However, although you’ll know with high likelihood that a popular ad was selected, this tells you the least about the user, since a commonplace selection that most people make reveals little about anyone. Conversely, when an ad is unpopular, the random retrievals swamp the targeted ones, so the ad selections that would reveal the most specific things about a person are exactly the ones you can least reliably attribute to that person.

        So, this simple randomization scheme provides some protection of privacy, particularly around more esoteric interests. If you follow the crowd in some respect, this bland fact is not concealed; if you don’t in some other, the specifics are concealed (though if you rarely retrieve a popular ad, your unusual disinterest in some topic may be exposed to profiling).

        A more sophisticated randomization scheme could involve the ad service exposing ad retrieval frequencies in some form, say XML, and the random retrievals skewing toward the frequencies. The eventual effect is for every selection of an ad to be accompanied by a random retrieval of the SAME ad (by a different user). So the traffic statistics for the ads can be approximately deconvolved to their true viewing statistics simply by halving the numbers in this case, and unusual disinterests are masked by frequent random retrievals of the popular ad. (The random retrieval always being of a different ad than the corresponding selection retrieval means the more the popular ad is selected, the less it’s randomly retrieved by the same user’s browser, and vice versa.)

        On the other hand, this whole scheme seems to put ad retrieval under the browser’s control to such an extent that ad-blocking becomes completely trivial: an open source browser can easily be hacked to follow any ad protocol exactly, except that anything thus retrieved is not displayed at all. This is perfect, undetectable ad blocking. The only way this ad system beats the current server-side animated-GIF Russian roulette is if people have a strong incentive NOT to disable the ads. Making the ads so useful and so targeted that people prefer their presence to their absence seems implausible, because ads are usually an unwanted distraction when they pop up, regardless of whether they’d be welcome as, say, a sponsored search result. And we already have good ways of targeting search results.

        Now, a technology similar to these but anonymizing SEARCH QUERIES sounds enormously interesting…

  5. John Millington says

    “the ad agency can aggregate reports from many users, without being able to see the users’ individual reports.”

    Why does this remind me of another topic I’ve seen on this blog recently? 😉
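The decoy-retrieval scheme proposed in the reply to comment 4 can be sketched in a few lines. This is the commenter's idea, not the PrivAds mechanism, and it is a minimal illustration under stated assumptions: each selected ad is paired with exactly one decoy drawn uniformly from the whole portfolio, and the names decoy_batch, estimate_views and chance_selected are hypothetical. Subtracting the expected 6 decoy hits (600 decoys spread over 100 ads) from the 204 and 105 retrievals in that worked example gives 198 and 99.

```python
import random


def decoy_batch(selected, portfolio, rng):
    """Browser side: mix the chosen ad IDs with one decoy each, in shuffled order.
    Assumption: decoys are drawn uniformly (with replacement) from the portfolio."""
    batch = list(selected) + [rng.choice(portfolio) for _ in selected]
    rng.shuffle(batch)  # so position does not reveal which requests are real
    return batch


def estimate_views(retrievals, total_decoys, portfolio_size):
    """Server side: subtract the expected number of decoy hits for one ad."""
    return retrievals - total_decoys / portfolio_size


def chance_selected(selected_count, decoy_count):
    """Probability that a single observed retrieval of an ad was a real selection."""
    return selected_count / (selected_count + decoy_count)


if __name__ == "__main__":
    # Numbers from the worked example: 600 selections, 600 decoys, 100 ads.
    print(estimate_views(204, 600, 100))       # 198.0
    print(estimate_views(105, 600, 100))       # 99.0
    print(round(chance_selected(200, 4), 2))   # 0.98

    # A hypothetical browser request for three chosen ads.
    rng = random.Random(0)
    portfolio = [f"ad{i}" for i in range(100)]
    print(decoy_batch(["ad3", "ad17", "ad42"], portfolio, rng))
```

Seeding the generator only makes the demo repeatable; a real browser would use fresh randomness for every batch, since predictable decoys would let the server separate them from genuine selections.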