August 29, 2016


Can Facebook really make ads unblockable?

[This is a joint post with Grant Storey, a Princeton undergraduate who is working with me on a tool to help users understand Facebook’s targeted advertising.]

Facebook announced two days ago that it would make its ads indistinguishable from regular posts, and hence impossible to block. But within hours, the developers of Adblock Plus released an update which enabled the tool to continue blocking Facebook ads. The ball is now back in Facebook’s court. So far, all it’s done is issue a rather petulant statement. The burning question is this: can Facebook really make ads indistinguishable from content? Who ultimately has the upper hand in the ad blocking wars?

There are two reasons — one technical, one legal — why we don’t think Facebook will succeed in making its ads unblockable, if a user really wants to block them.

The technical reason is that the web is an open platform. When you visit, Facebook’s server sends your browser the page content along with instructions on how to render them on the screen, but it is entirely up to your browser to follow those instructions. The browser ultimately acts on behalf of the user, and gives you — through extensions — an extraordinary degree of control over its behavior, and in particular, over what gets displayed on the screen. This is what enables the ecosystem of ad-blocking and tracker-blocking extensions to exist, along with extensions for customizing web pages in various other interesting ways.

Indeed, the change that Adblock Plus made in order to block the new, supposedly unblockable ads is just a single line in the tool’s default blocklist:[id^="substream_"] div[id^="hyperfeed_story_id_"][data-xt]

What’s happening here is that Facebook’s HTML code for ads has slight differences from the code for regular posts, so that Facebook can keep things straight for its own internal purposes. But because of the open nature of the web, Facebook is forced to expose these differences to the browser and to extensions such as Adblock Plus. The line of code above allows Adblock Plus to distinguish the two categories by exploiting those differences.

Facebook engineers could try harder to obfuscate the differences. For example, they could use non-human-readable element IDs to make it harder to figure out what’s going on, or even randomize the IDs on every page load. We’re surprised they’re not already doing this, given the grandiose announcement of the company’s intent to bypass ad blockers. But there’s a limit to what Facebook can do. Ultimately, Facebook’s human users have to be able to tell ads apart, because failure to clearly distinguish ads from regular posts would run headlong into the Federal Trade Commission’s rules against misleading advertising — rules that the commission enforces vigorously. [1, 2] And that’s the second reason why we think Facebook is barking up the wrong tree.

Facebook does allow human users to easily recognize ads: currently, ads say “Sponsored” and have a drop-down with various ad-related functions, including a link to the Ad Preferences page. And that means someone could create an ad-blocking tool that looks at exactly the information that a human user would look at. Such a tool would be mostly immune to Facebook’s attempts to make the HTML code of ads and non-ads indistinguishable. Again, the open nature of the web means that blocking tools will always have the ability to scan posts for text suggestive of ads, links to Ad Preferences pages, and other markers.

But don’t take our word for it: take our code for it instead. We’ve created a prototype tool that detects Facebook ads without relying on hidden HTML code to distinguish them. [Update: the source code is here.] The extension examines each post in the user’s news feed and marks those with the “Sponsored” link as ads. This is a simple proof of concept, but the detection method could easily be made much more robust without incurring a performance penalty. Since our tool is for demonstration purposes, it doesn’t block ads but instead marks them as shown in the image below.  

All of this must be utterly obvious to the smart engineers at Facebook, so the whole “unblockable ads” PR push seems likely to be a big bluff. But why? One possibility is that it’s part of a plan to make ad blockers look like the bad guys. Hand in hand, the company seems to be making a good-faith effort to make ads more relevant and give users more control over them. Facebook also points out, correctly, that its ads don’t contain active code and aren’t delivered from third-party servers, and therefore aren’t as susceptible to malware.

Facebook does deserve kudos for trying to clean up and improve the ad experience. If there is any hope for a peaceful resolution to the ad blocking wars, it is that ads won’t be so annoying as to push people to install ad blockers, and will be actually useful at least some of the time. If anyone can pull this off, it is Facebook, with the depth of data it has about its users. But is Facebook’s move too little, too late? On most of the rest of the web, ads continue to be creepy malware-ridden performance hogs, which means people will continue to install ad blockers, and as long as it is technically feasible for ad blockers to block Facebook ads, they’re going to continue to do so. Let’s hope there’s a way out of this spiral.

[1] Obligatory disclaimer: we’re not lawyers.

[2] Facebook claims that Adblock Plus’s updates “don’t just block ads but also posts from friends and Pages”. What they’re most likely referring to that Adblock Plus blocks ads that are triggered by one of your friends Liking the advertiser’s page. But these are still ads: somebody paid for them to appear in your feed. Facebook is trying to blur the distinction in its press statement, but it can’t do that in its user interface, because that is exactly what the FTC prohibits.


The workshop on Data and Algorithmic Transparency

From online advertising to Uber to predictive policing, algorithmic systems powered by personal data affect more and more of our lives. As our society begins to grapple with the consequences of this shift, empirical investigation of these systems has proved vital to understand the potential for discrimination, privacy breaches, and vulnerability to manipulation.

This emerging field of research, which we’re calling Data and Algorithmic Transparency, seems poised to grow dramatically. But it faces a number of methodological challenges which can only be solved by bringing together expertise from a variety of disciplines. That is why Alan Mislove and I are organizing the first workshop on Data and Algorithmic Transparency at Columbia University on Nov 19, 2016.

Here are three reasons you should participate in this workshop.

  1. Start of a new, interdisciplinary community. The set of disciplines represented on the Program Committee is strikingly diverse: Internet measurement, information privacy/security, computer systems, human-computer interaction, law, and media studies. Industrial research and government are also represented. We expect the workshop itself to have a similar mix of participants, and that is exactly what is needed to make transparency research a success. Alan and I (and others including Nikolaos Laoutaris) are committed to growing and nurturing this community over the next several years.
  1. Co-located with two other exciting events: the Data Transparency Lab conference (DTL ‘16) and the Fairness, Accountability, and Transparency in Machine Learning workshop (FAT-ML ‘16). DTL shares many of the goals of the DAT workshop, but is non-academic. FAT-ML has a complementary relationship with the goals of DAT: it seeks to develop machine learning techniques for developers of algorithmic systems to improve fairness and accountability, whereas DAT seeks to analyze existing systems, typically “from the outside”. The events are consecutive and non-overlapping, and participants of each event are encouraged to attend the others.
  1. A format that makes the most of everyone’s time. At most computer science conferences, each speaker mumbles through their slides while the audience is a sea of laptops, awaiting their turn. DAT will be the opposite. We plan to have paper discussions instead of paper presentations, with commenters and participants, rather than authors, doing most of the speaking about each paper. This first edition of DAT will be non-archival (but peer-reviewed), and one goal of the discussions is to help authors improve their papers for later publication. We are also soliciting talk proposals about already published work; groups of accepted talks will be organized into panels.

See you in New York City!


Revealing Algorithmic Rankers

By Julia Stoyanovich (Assistant Professor of Computer Science, Drexel University) and Ellen P. Goodman (Professor, Rutgers Law School)

ProPublica’s story on “machine bias” in an algorithm used for sentencing defendants amplified calls to make algorithms more transparent and accountable. It has never been more clear that algorithms are political (Gillespie) and embody contested choices (Crawford), and that these choices are largely obscured from public scrutiny (Pasquale and Citron). We see it in controversies over Facebook’s newsfeed, or Google’s search results, or Twitter’s trending topics. Policymakers are considering how to operationalize “algorithmic ethics” and scholars are calling for accountable algorithms (Kroll, et al.).

One kind of algorithm that is at once especially obscure, powerful, and common is the ranking algorithm (Diakopoulos). Algorithms rank individuals to determine credit worthiness, desirability for college admissions and employment, and compatibility as dating partners. They encode ideas of what counts as the best schools, neighborhoods, and technologies. Despite their importance, we actually can know very little about why this person was ranked higher than another in a dating app, or why this school has a better rank than that one. This is true even if we have access to the ranking algorithm, for example, if we have complete knowledge about the factors used by the ranker and their relative weights, as is the case for US News ranking of colleges. In this blog post, we argue that syntactic transparency, wherein the rules of operation of an algorithm are more or less apparent, or even fully disclosed, still leaves stakeholders in the dark: those who are ranked, those who use the rankings, and the public whose world the rankings may shape.

Using algorithmic rankers as an example, we argue that syntactic transparency alone will not lead to true algorithmic accountability (Angwin). This is true even if the complete input data is publicly available. We advocate instead for interpretability, which rests on making explicit the interactions between the program and the data on which it acts. An interpretable algorithm allows stakeholders to understand the outcomes, not merely the process by which outcomes were produced.

Opacity in Algorithmic Rankers

Algorithmic rankers take as input a database of items and produce a ranked list of items as output. The relative ranking of the items may be computed based on an explicitly provided scoring function. Or the ranking function may be learned, using learning-to-rank methods that are deployed extensively in information retrieval and recommender systems.

The simplest kind of a ranker is a score-based ranker, which applies a scoring function independently to each item and then sorts the items on their scores. Many of these rankers use monotone aggregation scoring functions, such as weighted sums of attribute values with non-negative weights. In the very simplest case, the score of an item is computed by sorting on the value of just one attribute, i.e., by setting the weight of that attribute to 1 and of all other attributes to 0.

This is illustrated in our running example in Table 1, which gives a ranking of 51 computer science departments as per (CSR). We augmented the data with several attributes from the assessment of research-doctorate programs by the National Research Council (NRC) to illustrate some points. Source of an attribute (CSR or NRC) is listed next to the attribute name. We recognize that the augmented CS rankings are already syntactically transparent. What’s more, they provide the entire data set. We use them for illustrative purposes.

Table 1: A ranking of Computer Science departments per, with additional attributes from the NRC assessment dataset. Here, the average count computes the geometric mean of the adjusted number of publications in each area by institution, faculty is the number of faculty in the department, pubs is the average number of publications per faculty (2000-2006) , GRE is the average GRE scores (2004-2006). Departments are ranked by average count.

Rank (CSR) Name Average Count (CSR) Faculty (CSR) Pubs (NRC) GRE (NRC)
1 Carnegie Mellon University 18.3 122 2 791
2 Massachusetts Institute of Technology 15 64 3 772
3 Stanford University 14.3 55 5 800
4 University of California–Berkeley 11.4 50 3 789
5 University of Illinois–Urbana-Champaign 10.5 55 3 772
full table
45 Yale University 1.5 18 2 800
45 University of Virginia 1.5 18 2 789
45 University of Rochester 1.5 18 3 786
48 Arizona State University 1.4 14 2 787
48 University of Arizona 1.4 18 2 784
48 Virginia Polytechnic Institute and State University 1.4 32 1 780
48 Washington University in St. Louis 1.4 17 2 790

Ranked results are difficult for people to interpret, whether a ranking is computed explicitly or learned, whether the method (e.g., the scoring function or, more generally, the model) is known or unknown, and whether the user can access the entire output or only the highest-ranked items (the top-k). There are several sources of this opacity, illustrated below for score-based rankers.

Sources of Opacity

Source 1: The scoring formula alone does not indicate the relative rank of an item. Rankings are, by definition, relative, while scores are absolute. Knowing how the score of an item is computed says little about the outcome — the position of a particular item in the ranking, relative to other items. Is 10.5 a high score or a low score? That depends on how 10.5 compares to the scores of other items, for example to the highest attainable score and to the highest score of some actual item in the input. In our example in Table 1 this kind of opacity is mitigated because there is both syntactic transparency (the scoring formula is known) and the input is public.

Source 2: The weight of an attribute in the scoring formula does not determine its impact on the outcome. Consider again the example in Table 1, and suppose that we first normalize the values of the attributes, and then compute the score of each department by summing up the values of faculty (with weight 0.2), average count (with weight 0.3) and GRE (with weight 0.5). According to this scoring method, the size of the department (faculty) is the least important factor. Yet, it will be the deciding factor that sets apart top-ranked departments from those in lower ranks, both because the value of this attribute changes most dramatically in the data, and because it correlates with average count (in effect, double-counting). In contrast, GRE is syntactically the most important factor in the formula, yet in this dataset it has very close values for all items, and so has limited actual effect on the ranking.

Source 3: The ranking output may be unstable. A ranking may be unstable because of the scores generated on a particular dataset. An example would be tied scores, where the tie is not reflected in the ranking. In this case, the choice of any particular rank order is arbitrary. Moreover, unless raw scores are disclosed, the user has no information about the magnitude of the difference in scores between items that appear in consecutive ranks. In Table 1, CMU (18.3) has a much higher score than the immediately following MIT (15). This is in contrast to, e.g., UIUC (10.5, rank 5) and UW (10.3, rank 6), which are nearly tied. The difference in scores between distinct adjacent ranks decreases dramatically as we move down the list: it is at most 0.3, and usually 0.1, for departments in ranks 16 through 48. CSRankings’ syntactic transparency (disclosing its ranking method to the user) and accessible data allow us to see the instability, but this is unusual.

Source 4: The ranking methodology may be unstable. The scoring function may produce vastly different rankings with small changes in attribute weights. This is difficult to detect even with syntactic transparency, and even if the data is public. Malcolm Gladwell discusses this issue and gives compelling examples in his 2011 piece, The Order of Things. In our example in Table 1, a scoring function that is based on a combination of pubs and GRE would be unstable, because the values of these attributes are both very close for many of the items and induce different rankings, and so prioritizing one attribute over the other slightly would cause significant re-shuffling.

The opacity concerns described here are all due to the interaction between the scoring formula (or, more generally, an a priori postulated model) and the actual dataset being ranked. In a recent paper, one of us observed that structured datasets show rich correlations between item attributes in the presence of ranking, and that such correlations are often local (i.e., are present in some parts of the dataset but not in others). To be clear, this kind of opacity is present whether or not there is syntactic transparency.

Harms of Opacity

Opacity in algorithmic rankers can lead to four types of harms:

(1) Due process / fairness. The subjects of the ranking cannot have confidence that their ranking is meaningful or correct, or that they have been treated like similarly situated subjects. Syntactic transparency helps with this but it will not solve the problem entirely, especially when people cannot interpret how weighted factors have impacted the outcome (Source 2 above).

(2) Hidden normative commitments. A ranking formula implements some vision of the “good.” Unless the public knows what factors were chosen and why, and with what weights assigned to each, it cannot assess the compatibility of this vision with other norms. Even where the formula is disclosed, real public accountability requires information about whether the outcomes are stable, whether the attribute weights are meaningful, and whether the outcomes are ultimately validated against the chosen norms. Did the vendor evaluate the actual effect of the features that are postulated as important by the scoring / ranking mode? Did the vendor take steps to compensate for mutually-reinforcing correlated inputs, and for possibly discriminatory inputs? Was stability of the ranker interrogated on real or realistic inputs? This kind of transparency around validation is important for both learning algorithms which operate according to rules that are constantly in flux and responsive to shifting data inputs, and for simpler score-based rankers that are likewise sensitive to the data.

(3) Interpretability. Especially where ranking algorithms are performing a public function (e.g., allocation of public resources or organ donations) or directly shaping the public sphere (e.g., ranking politicians), political legitimacy requires that the public be able to interpret algorithmic outcomes in a meaningful way. At the very least, they should know the degree to which the algorithm has produced robust results that improve upon a random ordering of the items (a ranking-specific confidence measure). In the absence of interpretability, there is a threat to public trust and to democratic participation, raising the dangers of an algocracy (Danaher) – rule by incontestable algorithms.

(4) Meta-methodological assessment. Following on from the interpretability concerns is a meta question about whether a ranking algorithm is the appropriate method for shaping decisions. There are simply some domains, and some instances of datasets, in which rank order is not appropriate. For example, if there are very many ties or near-ties induced by the scoring function, or if the ranking is too unstable, it may be better to present data through an alternative mechanism such as clustering. More fundamentally, we should question the use of an algorithmic process if its effects are not meaningful or if it cannot be explained. In order to understand whether the ranking methodology is valid, as a first order question, the algorithmic process needs to be interpretable.

The Possibility of Knowing

Recent scholarship on the issue of algorithmic accountability has devalued transparency in favor of verification. The claim is that because algorithmic processes are protean and extremely complex (due to machine learning) or secret (due to trade secrets or privacy concerns), we need to rely on retrospective checks to ensure that the algorithm is performing as promised. Among these checks would be cryptographic techniques like zero knowledge proofs (Kroll, et al.) to confirm particular features, audits (Sandvig) to assess performance, or reverse engineering (Perel and Elkin-Koren) to test cases.

These are valid methods of interrogation, but we do not want to give up on disclosure. Retrospective testing puts a significant burden on users. Proofs are useful only when you know what you are looking for. Reverse engineering with test cases can lead to confirmation bias. All these techniques put the burden of inquiry exclusively on individuals for whom interrogation may be expensive and ultimately fruitless. The burden instead should fall more squarely on the least cost avoider, which will be the vendor who is in a better position to reveal how the algorithm works (even if only partially). What if food manufacturers resisted disclosing ingredients or nutritional values, and instead we were put to the trouble of testing their products or asking them to prove the absence of a substance? That kind of disclosure by verification is very different from having a nutritional label.

What would it take to provide the equivalent of a nutritional label for the process and the outputs of algorithmic rankers? What suffices as an appropriate and feasible explanation depends on the target audience.

For an individual being ranked, a useful description would explain his specific ranked outcome and suggest ways to improve the outcome. What changes can NYU CS make to improve its ranking? Why is the NYU CS department ranked 24? Which attributes make this department perform worse than those ranked higher? As we argued above, the answers to these questions depend on the interaction between the ranking method and the dataset over which the ranker operates. When working with data that is not public (e.g., involving credit or medical information about individuals), an explanation mechanism of this kind must be mindful of any privacy considerations. Individually-responsive disclosures could be offered in a widget that allows ranked entities to experiment with the results by changing the inputs.

An individual consumer of a ranked output would benefit from a concise and intuitive description of the properties of the ranking. Based on this explanation, users will get a glimpse of, e.g., the diversity (or lack thereof) that the ranking exhibits in terms of attribute values. Both attributes that comprise the scoring function, if known (or, more generally, features that make part of the model), and attributes that co-occur or even correlate with the scoring attributes, can be described explicitly. In our example in Table 1, a useful explanation may be that a ranking on average count will over-represent large departments (with many faculty) at the top of the list, while GRE does not strongly influence rank.

Figure 1: A hypothetical Ranking Facts label.

Figure 1 presents a hypothetical “nutritional label” for rankings, using the augmented CSRankings in Table 1 as input. Inspired by Nutrition Facts, our Ranking Facts label is aimed at the consumer, such as a prospective CS program applicant, and addresses three of the four opacity sources described above: relativity, impact, and output stability. We do not address methodological stability in the label. How this dimension should be quantified and presented to the user is an open technical problem.

The Ranking Facts show how the properties of the 10 highest-ranked items compare to the entire dataset (Relativity), making explicit cases where the ranges of values, and the median value, are different at the top-10 vs. overall (median is marked with red triangles for faculty size and average publication count). The label lists the attributes that have most impact on the ranking (Impact), presents the scoring formula (if known), and explains which attributes correlate with the computed score. Finally, the label graphically shows the distribution of scores (Stability), explaining that scores differ significantly up to top-10 but are nearly indistinguishable in later positions.

Something like the Rankings Facts makes the process and outcome of algorithmic ranking interpretable for consumers, and reduces the likelihood of opacity harms, discussed above. Beyond Ranking Facts, it is important to develop Interpretability tools that enable vendors to design fair, meaningful and stable ranking processes, and that support external auditing. Promising technical directions include, e.g., quantifying the influence of various features on the outcome under different assumptions about availability of data and code, and investigating whether provenance techniques can be used to generate explanations.


A Peek at A/B Testing in the Wild

[Dillon Reisman was previously an undergraduate at Princeton when he worked on a neat study of the surveillance implications of cookies. Now he’s working with the WebTAP project again in a research + engineering role. — Arvind Narayanan]

In 2014, Facebook revealed that they had manipulated users’ news feeds for the sake of a psychology study looking at users’ emotions. We happen to know about this particular experiment because it was the subject of a publicly-released academic paper, but websites do “A/B testing” every day that is completely opaque to the end-user. Of course, A/B testing is often innocuous (say, to find a pleasing color scheme), but the point remains that the user rarely has any way of knowing in what ways their browsing experience is being modified, or why their experience is being changed in particular.

By testing websites over time and in a variety of conditions we could hope to discover how users’ browsing experience is manipulated in not-so-obvious ways. But one third-party service actually makes A/B testing and user tracking human-readable — no reverse-engineering or experimentation necessary! This is the widely-used A/B testing provider Optimizely; Jonathan Mayer had told us it would be an interesting target of study.* Their service is designed to expose in easily-parsable form how its clients segment users and run experiments on them directly in the JavaScript they embed on websites. In other words, if uses Optimizely, the entire logic used by for A/B testing is revealed to every visitor of

That means that the data collected by our large-scale web crawler OpenWPM contains the details of all the experiments that are being run across the web using Optimizely. In this post I’ll show you some interesting things we found by analyzing this data. We’ve also built a Chrome extension, Pessimizely, that you can download so you too can see a website’s Optimizely experiments. When a website uses Optimizely, the extension will alert you and attempt to highlight any elements on the page that may be subject to an experiment. If you visit, it will also show you alternative news headlines when you hover over a title. I suggest you give it a try!


The New York Times website, with headlines that may be subject to an experiment highlighted by Pessimizely.


The Optimizely Scripts

Our OpenWPM web crawler collects and stores javascript embedded on every page it visits. This makes it straightforward to make a query for every page that uses Optimizely and grab and analyze the code they get from Optimizely. Once collected, we investigated the scripts through regular expression-matching and manual analysis.

  "4495903114": {
      "code": …
      "name": "100000004129417_1452199599 
               [A.] New York to Appoint Civilian to Monitor Police Surveillance -- 
               [B.] Sued Over Spying on Muslims, New York Police Get Oversight",
      "variation_ids": ["4479602534","4479602535"],
      "urls": [{
        "match": "simple",
        "value": ""
      "enabled_variation_ids": ["4479602534","4479602535"]

An example of an experiment from that is A/B testing two variations of a headline in a link to an article.

From a crawl of the top 100k sites in January 2016, we found and studied 3,306 different websites that use Optimizely. The Optimizely script for each site contains a data object that defines:

  1. How the website owner wants to divide users into “audiences,” based on any number of parameters like location, cookies, or user-agent.
  2. Experiments that the users might experience, and what audiences should be targeted with what experiments.

The Optimizely script reads from the data object and then executes a javascript payload and sets cookies depending on if the user is in an experimental condition. The site owner populates the data object through Optimizely’s web interface – who on a website’s development team can access that interface and what they can do is a question for the site owner. The developer also helpfully provides names for their user audiences and experiments.

In total, we found around 51,471 experiments on the 3,306 websites in our dataset that use Optimizely. On average each website has approximately 15.2 experiments, and each experiment has about 2.4 possible variations. We have only scratched the surface of some of the interesting things sites use A/B testing for, and here I’ll share a couple of the more interesting examples:


News publishers test the headlines users see, with differences that impact the tone of the article

A widespread use of Optimizely among news publishers is “headline testing.” To use an actual recent example from the, a link to an article headlined:

“Turkey’s Prime Minister Quits in Rift With President”

…to a different user might appear as…

“Premier to Quit Amid Turkey’s Authoritarian Turn.”

The second headline suggests a much less neutral take on the news than the first. That sort of difference can paint a user’s perception of the article before they’ve read a single word. We found other examples of similarly politically-sensitive headlines changing, like the following from

“Judge Rules Sandy Hook Families Can Proceed with Lawsuit Against Remington”

…could appear to some users as…

“Second Amendment Under Assault by Sandy Hook Judge.”

While editorial concerns might inform how news publishers change headlines, it’s clear that a major motivation behind headline testing is the need to drive clicks. A third variation we found for the Sandy Hook headline above is the much vaguer sounding “Huge Development in Sandy Hook Gun Case.” The Wrap, an entertainment news outlet, experimented with replacing “Disney, Paramount  Had Zero LGBT Characters in Movies Last Year” with the more obviously “click-baity” headline “See Which 2 Major Studios Had Zero LGBT Characters in 2015 Movies.”

We were able to identify 17 different news websites in our crawl that in the past have done some form of headline testing. This is most likely an undercount in our crawl — most of these 17 websites use Optimizely’s integrations with other third-party platforms like and WordPress for their headline testing, making them more easily identified. The New York Times website, for instance, implements its own headline testing code.

Another limitation of what we’ve found so far is that the crawls that we analyzed only visit the homepage of each site. The OpenWPM crawler could be configured, however, to browse links from within a site’s homepage and collect data from those pages. A broader study of the practices of news publishers could use the tool to drill down deeper into news sites and study their headlines over time.


Websites identify and categorize users based on money and affluence

Many websites target users based on IP and geolocation. But when IP/geolocation are combined with notions of money the result is surprising. The website of a popular fitness tracker targets users that originate from a list of six hard-coded IP addresses labelled “IP addresses Spending more than $1000.” Two of the IP addresses appear to be larger enterprise customers — a medical research institute a prominent news magazine. Three belong to unidentified Comcast customers. These big-spending IP addresses were targeted in the past with an experiment presented the user a button that either suggested the user “learn more” about a particular product or “buy now.”

Connectify, a large vendor of networking software, uses geolocation on a coarser level — they label visitors from the US, Australia, UK, Canada, Netherlands, Switzerland, Denmark, and New Zealand as coming from “Countries that are Likely to Pay.”

Non-profit websites also experiment with money. charity: water ( and the Human Rights Campaign ( both have experiments defined to change the default donation amount a user might see in a pre-filled text box.


Web developers use third-party tools for more than just their intended use

A developer following the path of least resistance might use Optimizely to do other parts of their job simply because it is the easiest tool available. Some of the more exceptional “experiments” deployed by websites are simple bug-fixes, described with titles like, “[HOTFIX][Core Commerce] Fix broken sign in link on empty cart,” or “Fix- Footer links errors 404.” Other experiments betray the haphazard nature of web development, with titles like “delete me,” “Please Delete this Experiment too,” or “#Bugfix.”

We might see these unusual uses because Optimizely allows developers to edit and rollout new code with little engineering overhead. With the inclusion of one third-party script, a developer can leverage the Optimizely web interface to do a task that might otherwise take more time or careful testing. This is one example of how third-parties have evolved to become integral to the entire functionality and development of the web, raising security and privacy concerns.


The need for transparency

Much of the web is curated by inscrutable algorithms running on servers, and a concerted research effort is needed to shed light on the less-visible practices of websites. Thanks to the Optimizely platform we can at least peek into that secret world.

We believe, however, that transparency should be the default on the web — not the accidental product of one third-party’s engineering decisions. Privacy policies are a start, but they generally only cover a website’s data collection and third-party usage on a coarse level. The New York Times Privacy Policy, for instance, does not even suggest that headline testing is something they might do, despite how it could drastically alter your consumption of the news. If websites had to publish more information about what third-parties they use and how they use them, regulators could use that information to better protect consumers on the web. Considering the potentially harmful effects of how websites might use third-parties, more transparency and oversight is essential.



* This was a conversation a year ago, when Jonathan was a grad student at Stanford.


Why Making Johnny’s Key Management Transparent is So Challenging

In light of the ongoing debate about the importance of using end-to-end encryption to protect our data and communications, several tech companies have announced plans to increase the encryption in their services. However, this isn’t a new pledge: since 2014, Google and Yahoo have been working on a browser plugin to facilitate sending encrypted emails using their services. Yet in recent weeks, some have criticized that only alpha releases of these tools exist, and have started asking why they’re still a work in progress.

One of the main challenges to building usable end-to-end encrypted communication tools is key management. Services such as Apple’s iMessage have made encrypted communication available to the masses with an excellent user experience because Apple manages a directory of public keys in a centralized server on behalf of their users. But this also means users have to trust that Apple’s key server won’t be compromised or compelled by hackers or nation-state actors to insert spurious keys to intercept and manipulate users’ encrypted messages. The alternative, and more secure, approach is to have the service provider delegate key management to the users so they aren’t vulnerable to a compromised centralized key server. This is how Google’s End-To-End works right now. But decentralized key management means users must “manually” verify each other’s keys to be sure that the keys they see for one another are valid, a process that several studies have shown to be cumbersome and error-prone for the vast majority of users. So users must make the choice between strong security and great usability.

In August 2015, we published our design for CONIKS, a key management system that addresses these usability and security issues. CONIKS makes the key management process transparent and publicly auditable. To evaluate the viability of CONIKS as a key management solution for existing secure communication services, we held design discussions with experts at Google, Yahoo, Apple and Open Whisper Systems, primarily over the course of 11 months (Nov ‘14 – Oct ‘15). From our conversations, we learned about the open technical challenges of deploying CONIKS in a real-world setting, and gained a better understanding for why implementing a transparent key management system isn’t a straightforward task.
[Read more…]


Apple, FBI, and Software Transparency

The Apple versus FBI showdown has quickly become a crucial flashpoint of the “new Crypto War.” On February 16 the FBI invoked the All Writs Act of 1789, a catch-all authority for assistance of law enforcement, demanding that Apple create a custom version of its iOS to help the FBI decrypt an iPhone used by one of the San Bernardino shooters. The fact that the FBI allowed Apple to disclose the order publicly, on the same day, represents a rare exception to the government’s normal penchant for secrecy.

The reasons behind the FBI’s unusually loud entrance are important – but even more so is the risk that after the present flurry concludes, the FBI and other government agencies will revert to more shadowy methods of compelling companies to backdoor their software. This blog post explores these software transparency risks, and how new technical measures could help ensure that the public debate over software backdoors remains public.
[Read more…]


Free Law Project Partnering in Stewardship of RECAP

More than five years ago, I spoke at CITP about the US Federal Courts electronic access system called PACER. I noted that despite centuries of precedent stating that the public should have access to the law as openly and freely as possible, the courts were charging unreasonable rates for access to the public record. As it happened, David Robinson, Harlan Yu, Bill Zeller, and Ed Felten had recently written their paper “Government Data and the Invisible Hand“, arguing that:

…the executive branch should focus on creating a simple, reliable and publicly accessible infrastructure that exposes the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data.

After my talk, Harlan Yu and Tim Lee approached me with an idea to make millions of court records available for free: a simple browser extension that made it easy for individuals to share the records that they had purchased from PACER with others who were looking for the same records. The idea became RECAP (“turning PACER around”), and the tool has indeed helped to liberate millions of public records in the years since then. But the time has come to turn over our stewardship, and we could not be more pleased that CITP is announcing a new partnership with Free Law Project to take over and expand upon RECAP.
[Read more…]


Software Transparency

Thanks to the recent NSA leaks, people are more worried than ever that their software might have backdoors. If you don’t believe that the software vendor can resist a backdoor request, the onus is on you to look for a backdoor. What you want is software transparency.

Transparency of this type is a much-touted advantage of open source software, so it’s natural to expect that the rise of backdoor fears will boost the popularity of open source code. [Read more…]


Open Internet Advisory Committee kick-off

Last Friday, we had the first meeting of the Open Internet Advisory Committee (OIAC), called for by the FCC in the recent Open Internet Order. The members of the OIAC  consist of a mix of folks from venture capital firms, ISPs, governance organizations, community organizations, and academics like myself.  The OIAC’s mission is to “track and evaluate the effect of the FCC’s Open Internet rules, and to provide any recommendations it deems appropriate to the FCC regarding policies and practices related to preserving the open Internet.”  The video of our kick-off meeting is online.

In addition to the full meeting of the OIAC, we broke into four working groups — on mobile broadband, economic impacts of Open Internet frameworks, specialized services, and transparency. I’m chairing the group looking at Mobile Broadband, a tremendously important area since wireless and cellular networks are rapidly becoming the dominant way users access the Internet.  I’m looking forward to working with the rest of the working group, and the rest of the OIAC, to better understand how to ensure openness and transparency, while preserving the ability of service providers to manage their networks.  One thing is certain — fun and interesting times lie ahead!


The Engine of Job Growth? Tracking SBA-backed Loans Through

Last week at a Town Hall Meeting in New Hampshire, President Obama stated that “we’re going to start where most new jobs start—with small businesses,” and he encouraged Congress to transfer $30 billion from the Troubled Asset Relief Program to a new program called the Small Business Lending Fund. As this proposal was unveiled, the Administrator of the U.S. Small Business Administration (SBA) Karen Mills sat directly behind the President, reflecting the fact that the Administration’s proposal is a vote of confidence in the SBA and its existing loan programs.

The central role proposed for the SBA invites questions about existing SBA loans made with Recovery Act funds. These loans can be tracked through, the official “user-friendly, public-facing website” that has evolved under the direction of the Recovery Accountability and Transparency Board, an agency created when the President signed into law the American Recovery and Reinvestment Act of 2009 (ARRA) on February 17, 2009.

Curious about how well works, I analyzed a stimulus loan to a business in Red Lodge, Montana, where I live. First I accessed “Agency Reported” data through, and then compared that information with what I could learn from field visits with the loan recipient and the community banker who made the loan.

What the drill-down map at tells you: According to the map available at the official website, a local business called “Sheep Mountain Feed” received an $81,000 loan through the Small Business Administration’s (SBA) “Rural Lender Advantage.”

What the drill-down map at doesn’t tell you: The official website does not specify how the loan proceeds were spent. Nor does the website explain if the $81,000 is the face value of the loan or the amount guaranteed by the SBA. For that matter, SBA’s role in making the loan is not clarified.

To learn more about these things, I called Sheep Mountain Feed and arranged a visit with the owner, a woman named Deb Padget who, before opening the store, had ranched 2,000 head of bison. I also met with the local banker who arranged the loan (the SBA relies on lenders to make the loans it guarantees), and an SBA employee based in Helena Montana. And for background I reviewed the June 8, 2009 Federal Register Notice relating to SBA’s temporary 90% guarantee (thanks to Princeton’s Fed Thread project).

Sheep Mountain Feed is a retail store catering to animal farmers and pet owners that sells animal feed, electric fencing, baby chicks, and other odds and ends such as buckets and horseshoes sold at any rural animal store. When Deb decided to buy the business in April of 2009, she had managed the retail store for three years, and she wanted to make some changes. Without abandoning the “large-animal” owners who had built the feed business, she saw an opportunity to focus more on pet owners. “Everybody in Red Lodge has a dog,” she told me. “Not everybody has a horse.”

She would need to buy pet supplies to take things in this new direction, and she would also need money to buy the business and remodel the interior of the store. This is how she spent the loan proceeds that she eventually received—buying and remodeling Sheep Mountain Feed, and purchasing inventory. However, the first bank she visited rejected her within ten minutes. At the second bank she tried out, she met with local loan officer and learned quickly that he was also from a North Dakota farming family. Here she got a warmer welcome, and was told that her timing was good: In March 2009, about one month before Deb’s visit, the SBA received $730 million in funding from the ARRA to offer increased loan guarantees and the temporary elimination of loan fees.

To get this “stimulus loan” Deb would need to submit a business plan with her loan application, but she’d never before needed a business plan and didn’t even have an executive summary. She was sent to an SBA employee in Billings for free counseling, and this employee helped Deb to prepare a business plan from scratch. (At one point, in order to develop Deb’s financial projections, the SBA contact called her own dog-groomer to find out about the going-rate for grooming sessions in Billings).

The U.S. Small Business Administration (SBA) was created in 1953 as an independent agency of the federal government to help people start and grow businesses. Even without the stimulus money, SBA’s so-called 7(a) loan program guarantees up to 85% of a qualifying loan made to a local business through a local bank. The guarantee is designed to induce local banks to lend more into the community by removing most of the risk of default. And as previously mentioned, in early 2009 the SBA received Recovery money to guarantee up to 90% of 7(a) loans. This is the kind of loan that Deb received.

In addition to subsidizing SBA’s temporary 90 percent guarantee, the Recovery Act also allowed SBA to temporarily waive certain fees that it charges. Usually the agency collects fees equal to three percent of the loan’s face value to cover delinquencies. Lenders and borrowers pay these fees. In this case, the community bank that made the loan and Deb would have had to pay $2,790 just to close the deal. We know this because the breakdown of the loan to Sheep Mountain Feed at shows an “original subsidy cost” of $2,790. By studying the data at USASpending, and interviewing offline sources, it also emerged that $81,000 is the amount guaranteed by the SBA (Sheep Mountain Feed got $90,000).

The takeaway from this study is that provides good data, but not always enough context (e.g. an explanation of SBA’s role) to understand the data. Yet in the absence of, even learning that Sheep Mountain Feed received a government-backed loan would have been difficult, so the official website is a helpful starting point for people motivated to track stimulus money.

By disseminating information about a Montana-based loan to citizens in every state, including citizens not predisposed to support any specific local project, provides the public with information about what the government is doing and invites feedback. How the government processes this feedback—and in general takes advantage of the insight of people inside and outside the Federal government—is an open question, but at least the Recovery Board is on it, and now it’s also the focus of a working group (pursuant to OMB’s December 8, 2009 Open Government Directive).

In that spirit, here are a few suggestions for making more useful to people trying to track SBA-backed stimulus loans.

(1) Create web links to the SBA website where the agency explains how the standard and stimulus-enriched 7(a) loan program works (SBA itself does not make loans, but instead guarantees a portion of loans made and administered by banks);

(2) Create links to the Small Business Act (15 U.S.C. § 636, as amended), the relevant provisions of the American Recovery and Reinvestment Act of 2009 affecting the SBA, (ARRA, P. L. 111-5, §§501-502), and the provisions of the Department of Defense Appropriations Act, 2010 that extend the stimulus-enriched SBA program through the end of February 2010;

(3) Establish links from to, particularly targeted links showing the source of the stimulus loan information. does explain that “Agency Reported Data” comes from three sources, including, but there are no links from stimulus projects to USASpending.

This project was more about than the SBA, but listening to President Obama urge the creation of a Small Business Lending Fund because it “will help small banks do even more of what our economy needs – and that’s ensure that small businesses are once again the engine of job growth in America,” there was the obvious question about the $90,000 loan to Sheep Mountain Feed: Would it create or retain any jobs? I put this question to Deb. She said that the loan “created” one full-time job, her job running the business. She’s also employing a dog-groomer part-time, and another part-time employee (a student) who works on weekends. Getting these facts is easier than knowing if the full $90,000 loan to Sheep Mountain Feed should be credited to the Recovery Act. Would the business have received the loan anyway, even without SBA’s extra 5% guarantee and the temporary elimination of $2,790.00 in fees? The only sure thing is that estimating the employment impact of the Recovery Act is complicated (it was the subject of a recent OMB Guidance Memorandum). That’s something everybody can agree on.