December 9, 2022

New Study Analyzing Political Advertising on Facebook, Google, and TikTok

By Orestis Papakyriakopoulos, Christelle Tessono, Arvind Narayanan, Mihir Kshirsagar

With the 2022 midterm elections in the United States fast approaching, political campaigns are poised to spend heavily to influence prospective voters through digital advertising. Online platforms such as Facebook, Google, and TikTok will play an important role in distributing that content. But our new study, How Algorithms Shape the Distribution of Political Advertising: Case Studies of Facebook, Google, and TikTok, which will appear at the Artificial Intelligence, Ethics, and Society conference in August, shows that the platforms’ tools for voluntary disclosure about political ads do not provide the transparency the public needs. More details can also be found on our website: campaigndisclosures.princeton.edu.

Our paper conducts the first large-scale analysis of public data from the 2020 presidential election cycle to critically evaluate how online platforms affect the distribution of political advertisements. We analyzed a dataset of over 800,000 ads about the 2020 U.S. presidential election that ran in the two months before the election, which we obtained from the Facebook and Google ad libraries. We also collected and analyzed 2.5 million TikTok videos from the same period. The platforms created these ad libraries to offer more transparency about political ads, in an attempt to stave off potential regulation such as the Honest Ads Act, which sought to impose greater transparency requirements on platforms carrying political ads. But our study shows that the ad libraries fall woefully short of their own objectives of being transparent about who pays for the ads and who sees them, as well as the broader objective of shedding light on the role online platforms play in shaping the distribution of political advertising.

We developed a three-part evaluative framework to assess the platform disclosures: 

1. Do the disclosures meet the platforms’ self-described objective of making political advertisers accountable?

2. How do the platforms’ disclosures compare against what the law requires for radio and television broadcasters?

3. Do the platforms disclose all that they know about the ad targeting criteria, the audience for the ads, and how their algorithms distribute or moderate content?

Our analysis shows that the ad libraries do not meet any of these objectives. First, the ad libraries provide only partial disclosures of the audience characteristics and targeting parameters of political ads, and these disclosures do not allow us to understand how political advertisers reached prospective voters. For example, we compared ads in the ad libraries that were shown to different audiences with dummy ads that we created on the platforms (Figure 1). In many cases, we measured a significant difference between the cost-per-impression calculated for the two types of ads, which we could not explain with the available data.

  • Figure 1. We plot the calculated cost per impression of ads in the ad libraries that were (1) targeted to all genders and ages on Google, (2) targeted to females aged 25-34 on YouTube, (3) seen by all genders and ages in the US on Facebook, and (4) seen only by females of all ages located in California on Facebook. For Facebook, lower and upper bounds are provided for impressions. For Google, lower and upper bounds are provided for both cost and impressions, because the ad libraries report these parameters in coarse “buckets”; these ranges are denoted in the figures with boxes. Points represent the median value of the boxes. We compare the calculated cost-per-impression of these ads with the cost-per-impression of a set of dummy ads we placed on the platforms with exactly the same targeting parameters and audience characteristics. Black lines represent the upper and lower bounds of an ad’s cost-per-impression as extracted from the dummy ads. We label an ad placement as “plausible targeting” when its cost-per-impression overlaps with the range we calculated, meaning we can assume that the ad library discloses all relevant targeting parameters and audience characteristics for that ad. Conversely, a placement labeled “unexplainable targeting” is an ad whose cost-per-impression falls outside the upper and lower bounds we calculated, meaning the platforms may not be disclosing full information about how the ad was distributed.
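The labeling rule in the Figure 1 caption reduces to an interval-overlap test. The sketch below is a minimal illustration, not the study's code: it assumes each library ad is reported as lower/upper bounds on spend and impressions, and that the dummy-ad benchmark is a [low, high] cost-per-impression range. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BucketedAd:
    # Ad libraries report ranges ("buckets") instead of exact values.
    # For an exactly reported value, set low == high.
    spend_low: float
    spend_high: float
    impressions_low: int
    impressions_high: int


def cpi_range(ad: BucketedAd) -> tuple[float, float]:
    """Smallest and largest cost-per-impression consistent with the buckets."""
    if ad.impressions_low <= 0:
        raise ValueError("impressions_low must be positive")
    return ad.spend_low / ad.impressions_high, ad.spend_high / ad.impressions_low


def label_placement(ad: BucketedAd, benchmark: tuple[float, float]) -> str:
    """Compare an ad's CPI range with the [low, high] CPI range from dummy ads."""
    low, high = cpi_range(ad)
    bench_low, bench_high = benchmark
    overlaps = low <= bench_high and bench_low <= high
    return "plausible targeting" if overlaps else "unexplainable targeting"
```

For example, an ad reported at $100-$199 and 10,000-15,000 impressions has a cost-per-impression range of roughly $0.0067 to $0.0199. Against a hypothetical dummy-ad range of $0.005 to $0.010 it would be labeled plausible, and against $0.03 to $0.05 it would be labeled unexplainable.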

Second, broadcasters are required to offer advertising space at the same price to political advertisers as they do to commercial advertisers. But we find that the platforms charged campaigns different prices for distributing ads. For example, on average, the Trump campaign on Facebook paid more per impression (~18 impressions/dollar) compared to the Biden campaign (~27 impressions/dollar). On Google, the Biden campaign paid more per impression compared to the Trump campaign. Unfortunately, while we attempted to control for factors that might account for different prices for different audiences, the data does not allow us to probe the precise reason for the differential pricing. 
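The Facebook averages are easier to compare once converted to cost per thousand impressions (CPM), a common unit for pricing ads. The quick check below uses only the two rounded averages quoted above and does not control for audience differences.

```python
def cpm_from_impressions_per_dollar(impressions_per_dollar: float) -> float:
    """Cost in dollars per thousand impressions (CPM)."""
    return 1000 / impressions_per_dollar


trump_fb = cpm_from_impressions_per_dollar(18)  # about $55.56 per 1,000 impressions
biden_fb = cpm_from_impressions_per_dollar(27)  # about $37.04 per 1,000 impressions
print(f"Trump: ${trump_fb:.2f} CPM, Biden: ${biden_fb:.2f} CPM")
print(f"Ratio: {trump_fb / biden_fb:.2f}x")  # 27 / 18 = 1.50x
```

On these rounded averages, the Trump campaign paid about 50% more per impression on Facebook than the Biden campaign.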

Third, the platforms do not disclose the detailed information about audience characteristics that they make available to advertisers. Nor do they explain how their algorithms distribute or moderate ads. For example, we see that campaigns placed ads on Facebook that were not ostensibly targeted by age, yet the ads were not distributed uniformly across age groups. We also find that platforms applied their ad moderation policies inconsistently: some instances of a moderated ad were removed while others were not, with no explanation for the decision to remove an ad (Figure 2).

  • Figure 2. Comparison of different instances of moderated ads across platforms. The light blue bars show how many instances of a single ad were moderated, and the maroon bars show how many instances of the same ad were not. The results suggest inconsistent moderation of content across platforms, with some instances of the same ad being removed and others not.
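The comparison in Figure 2 amounts to grouping instances of the same ad and counting outcomes per group. This is a minimal sketch that assumes each instance has already been matched to a shared ad ID. That matching step and the `(ad_id, was_removed)` input format are hypothetical.

```python
from collections import defaultdict


def inconsistent_moderation(instances):
    """Return ads whose instances received mixed moderation outcomes.

    `instances` is an iterable of (ad_id, was_removed) pairs, one per
    instance of an ad as listed in an ad library (hypothetical format).
    """
    counts = defaultdict(lambda: [0, 0])  # ad_id -> [removed, kept]
    for ad_id, was_removed in instances:
        counts[ad_id][0 if was_removed else 1] += 1
    return {
        ad_id: {"removed": removed, "kept": kept}
        for ad_id, (removed, kept) in counts.items()
        if removed and kept  # both outcomes observed for the same ad
    }
```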

Finally, we observed new forms of political advertising that are not captured in the ad libraries. Specifically, campaigns appear to have used influencers to promote their messages without adequate disclosure. For example, on TikTok, we document how political influencers, who were often linked with PACs, generated billions of impressions from their political content. This new type of campaigning remains unregulated, and little is known about influencers’ practices or their relationships with political campaigns.

In short, the online platforms’ self-regulatory disclosures are inadequate, and we need more comprehensive disclosures from platforms to understand their role in the political process. Our key recommendations include:

– Requiring that each political entity registered with the FEC use a single, universal identifier for campaign spending across platforms to allow the public to track their activity.

– Developing a cross-platform data repository, hosted and maintained by a government or independent entity, that collects political ads, their targeting criteria, and the audience characteristics that received them. 

– Requiring platforms to disclose information that will allow the public to understand how the algorithms distribute content and how platforms price the distribution of political ads. 

– Developing a comprehensive definition of political advertising that includes influencers and other forms of paid promotional activity.

Holding Purveyors of “Dark Patterns” for Online Travel Bookings Accountable

Last week, my former colleagues at the New York Attorney General’s Office (NYAG) scored a $2.6 million settlement with Fareportal, a large online travel agency that used deceptive practices, known as “dark patterns,” to manipulate consumers into booking travel online.

The investigation exposes how Fareportal, which operates under several brands including CheapOair and OneTravel, used a series of deceptive design tricks to pressure consumers to buy tickets for flights, hotels, and other travel purchases. In this post, I share the details of the investigation’s findings and use them to highlight why we need further regulatory intervention to prevent similar conduct from becoming entrenched in other online services.

The NYAG investigation picks up on the work of researchers at Princeton’s CITP that exposed the widespread use of dark patterns on shopping websites. Using the framework we developed in a subsequent paper for defining dark patterns, the investigation reveals how the travel agency weaponized common cognitive biases to take advantage of consumers. The company was charged under the Attorney General’s broad authority to prohibit deceptive acts and practices. In addition to paying $2.6 million, the New York City-based company agreed to reform its practices.

Specifically, the investigation documents how Fareportal exploited the scarcity bias by displaying, next to the top two flight search results, a false and misleading message about the number of tickets left for those flights at the advertised price. It manipulated consumers by adding 1 to the number of tickets the consumer had searched for (X) to show that there were only X+1 tickets left at that price. So, if you searched for one round-trip ticket from Philadelphia to Chicago, the site would say “Only 2 tickets left” at that price, while a consumer searching for two such tickets would see a message stating “Only 3 tickets left” at the advertised price.
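The rule is simple enough to state as code, which makes plain that the message carried no information about actual inventory. This is a reconstruction from the NYAG’s description, not Fareportal’s actual code, and the function names are hypothetical.

```python
def fake_tickets_left(tickets_searched: int) -> int:
    # Always one more than the number of tickets the consumer searched for,
    # regardless of how many seats were actually available.
    return tickets_searched + 1


def scarcity_message(tickets_searched: int) -> str:
    return f"Only {fake_tickets_left(tickets_searched)} tickets left"
```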

In 2019, Fareportal added a design feature that exploited the bandwagon effect by displaying how many other people were looking at the same deal. The site used a computer-generated random number between 28 and 45 to show the number of other people “looking” at the flight. It paired this with a false countdown timer that displayed an arbitrary number that was unrelated to the availability of tickets. 
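Because the viewer count was drawn at random, it could not reflect anything about the flight being viewed. This is a minimal reconstruction from the NYAG’s description. It assumes both endpoints (28 and 45) are inclusive, and the function name is hypothetical.

```python
import random


def fake_viewer_count(_flight_id: str, rng: random.Random) -> int:
    # The flight is deliberately ignored: the number of people "looking"
    # was random and not derived from any real traffic.
    return rng.randint(28, 45)  # inclusive bounds assumed
```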

Similarly, Fareportal extended its misleading tactics to hotel bookings on its mobile apps. The apps misrepresented the percentage of the rooms shown that were “reserved,” using a computer-generated number keyed to how far away the check-in date was when the customer tried to book a room. So, for example, if the check-in date was 16-30 days away, the message would indicate that 41-70% of the hotel rooms were booked, but if it was less than 7 days away, it showed that 81-99% of the rooms were reserved. Of course, those percentages were pure fiction. The apps used a similar tactic for displaying the number of people “viewing” hotels in the area. This time, they generated the number from the nightly rate of the fifth hotel returned in the search, subtracting the numerical value of the cents figure from the numerical value of the dollar figure. (If the rate was $255.63, consumers were told 192 people were viewing hotel listings in the area.)
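Both hotel tactics can be reconstructed from the NYAG’s description. This sketch models only the two check-in bands quoted above. Drawing a uniform random value within each band is an assumption, and the function names are hypothetical.

```python
import random
from typing import Optional


def fake_percent_reserved(days_until_checkin: int, rng: random.Random) -> Optional[int]:
    # Only the two bands quoted in the NYAG findings are modeled; the
    # uniform random draw within each band is an assumption.
    if 16 <= days_until_checkin <= 30:
        return rng.randint(41, 70)
    if days_until_checkin < 7:
        return rng.randint(81, 99)
    return None  # band not described in the findings


def fake_viewers_in_area(fifth_hotel_rate: str) -> int:
    # "People viewing hotels in the area": dollar figure minus cents figure
    # of the fifth search result's nightly rate, e.g. "$255.63" -> 255 - 63.
    dollars, cents = fifth_hotel_rate.lstrip("$").replace(",", "").split(".")
    return int(dollars) - int(cents)
```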

Fareportal used these false scarcity indicators across its websites and mobile platforms to pitch products such as travel protection and seat upgrades, inaccurately representing how many other consumers had purchased the product in question.

In addition, the NYAG charged Fareportal with using a pressure tactic that made consumers accept or decline the purchase of a travel protection policy to “protect the cost of [their] trip” before completing a purchase. The academic literature describes this practice as a covert pattern that uses “confirmshaming” and “forced action” to influence choices.

Finally, the NYAG took issue with how Fareportal manipulated price comparisons to suggest it was offering tickets at a discounted price, when in fact, most of the advertised tickets were never offered for sale at the higher comparison price. The NYAG rejected Fareportal’s attempt to use a small pop-up to cure the false impression conveyed by the visual slash-through image that conveyed the discount. Similarly, the NYAG called out how Fareportal hid its service fees by disguising them as being part of the “Base Price” of the ticket rather than the separate line item for “Taxes and Fees.” These tactics are described in the academic literature as using “misdirection” and “information hiding” to influence consumers. 


The findings from this investigation illustrate why dark patterns are not simply aggressive marketing practices, as some commentators contend, but require regulatory intervention. Specifically, such shady practices are difficult for consumers to spot and to avoid, and, as we have argued, they risk becoming entrenched across travel sites that have the incentive to adopt similar practices. As a result, Fareportal, unfortunately, will be neither the first nor the last online service to deploy such tactics. But this creates an opportunity for researchers, consumer advocates, and design whistleblowers to step forward and spotlight such practices to protect consumers and help create a more trustworthy internet.

When Terms of Service limit disclosure of affiliate marketing

By Arunesh Mathur, Arvind Narayanan and Marshini Chetty

In a recent paper, we analyzed affiliate marketing on YouTube and Pinterest. We found that on both platforms, only about 10% of all content with affiliate links is disclosed to users as required by the FTC’s endorsement guidelines.

One way to improve the situation is for affiliate marketing companies (and other “influencer” agencies) to hold their registered content creators to the FTC’s endorsement guidelines. To better understand affiliate marketing companies’ current practices, we examined the terms and conditions of eleven of the most common affiliate marketing companies in our dataset, and specifically noted whether they required content creators to disclose their affiliate content or whether they mentioned the FTC’s guidelines upon registration.

Affiliate program | Requires disclosure?
AliExpress | No
Amazon | Yes
Apple | No
Commission Junction | No
Ebay | Yes
Impact Radius | No
Rakuten Marketing | No
RewardStyle | N/A
ShopStyle | Yes
ShareASale | No
The table above summarizes our findings. All the terms and conditions were accessed May 1, 2018 from the affiliate marketing companies’ websites. We did not hyperlink those terms and conditions that were not available publicly. All the companies that required disclosure also mentioned the FTC’s endorsement guidelines.

Out of the top 10 programs in our corpus, only 3 explicitly instructed their creators to disclose their affiliate links to their users. In all three cases (Amazon, Ebay, and ShopStyle), the companies called out the FTC’s endorsement guidelines. Of particular interest is Amazon’s affiliate marketing terms and conditions (Amazon was the largest affiliate marketing program in our dataset).

Amazon’s terms and conditions: When content creators sign up on Amazon’s website, they are bound by the program’s terms and conditions, including Section 5, titled “Identifying Yourself as an Associate”.

Figure 1: The disclosure requirement in Section 5 of Amazon’s terms and conditions document.

As seen in Figure 1, the terms of Section 5 do not explicitly mention the FTC’s endorsement guidelines but constrain participants to add only the following disclosure to their content: “As an Amazon Associate I earn from qualifying purchases”. In fact, the terms go so far as to warn users that “Except for this disclosure, you will not make any public communication with respect to this Agreement or your participation in the Associates Program”.

However, if participants click on the “Program Policies” link in the terms and conditions (which they are also bound by, by virtue of agreeing to the terms and conditions), they are specifically made responsible for complying with the FTC’s endorsement guidelines (Figure 2): “For example, you will be solely responsible for… all applicable laws (including the US FTC Guides Concerning the Use of Endorsement and Testimonials in Advertising)…”. Here, Amazon asks content creators to comply with the FTC’s guidelines without specifying exactly how. It is important to note that the FTC’s guidelines themselves do not impose any specific disclosure wording on content creators. Rather, they suggest that content creators use clear, explanatory disclosures that convey to users the advertising relationship behind affiliate marketing.

Figure 2: The disclosure requirement from Amazon’s “Program Policies” page.

We learned about these clauses from the coverage of our paper on BBC’s You and Yours podcast (~16 mins in). A YouTuber on the show pointed out that Amazon’s clause prevented him from disclosing anything about the affiliate program publicly.

Indeed, as we describe in the above sections, Amazon’s terms and conditions seem to contradict its Program Policies. On the one hand, Amazon binds its participants to the FTC’s endorsement guidelines, but on the other, it severely constrains the disclosures content creators can make about their participation in the program.

Further, researchers are still figuring out which types of disclosures are effective from a user perspective. Content creators might want to adapt the form and content of disclosures based on the findings of such research and the affordances of the social platforms. For example, on YouTube, it might be best to call out the affiliate relationship in the video itself, at the point where content creators urge viewers to “check out the links in the description below”, rather than merely in the description. The rigid wording mandated by Amazon seemingly prevents such customization, and may not make the affiliate relationship adequately clear to users.

Affiliate marketing companies wield strong influence over the content creators that register with their programs, and can hold them accountable to ensure they disclose these advertising relationships in their content. At the very least, they should not make it harder to comply with applicable laws and regulations.

Refining the Concept of a Nutritional Label for Data and Models

By Julia Stoyanovich (Assistant Professor of Computer Science at Drexel University)  and Bill Howe (Associate Professor in the Information School at the University of Washington)

In August 2016, Julia Stoyanovich and Ellen P. Goodman spoke in this forum about the importance of bringing interpretability to the algorithmic transparency debate. They focused on algorithmic rankers, discussed the harms of opacity, and argued that the burden of making ranked outputs transparent rests with the producer of the ranking. They went on to propose a “nutritional label” for rankings called Ranking Facts.

In this post, Julia Stoyanovich and Bill Howe discuss their recent technical progress on bringing the idea of Ranking Facts to life, placing the nutritional label metaphor in the broader context of the ongoing algorithmic accountability and transparency debate.

In 2016, we began with a specific type of nutritional label that focuses on algorithmic rankers.  We have since developed a Web-based Ranking Facts tool, which will be presented at the upcoming ACM SIGMOD 2018 conference.   

Figure 1: Ranking Facts on the CS departments dataset. The Ingredients widget (green) has been expanded to show the details of the attributes that strongly influence the ranking. The Fairness widget (blue) has been expanded to show details of the fairness computation.

Figure 1 presents Ranking Facts for CS department rankings, the same dataset as was used for illustration in our August 2016 post.  The nutritional label was constructed automatically, and consists of a collection of visual widgets, each with an overview and a detailed view.  

  • Recipe widget succinctly describes the ranking algorithm. For example, for a score-based ranker that uses a linear scoring formula to assign a score to each item, each attribute would be listed together with its weight.
  • Ingredients widget lists attributes most material to the ranked outcome, in order of importance. For example, for a linear model, this list could present the attributes with the highest learned weights.
  • Stability widget explains whether the ranking methodology is robust on this particular dataset – would small changes in the data, such as those due to uncertainty or noise, result in significant changes in the ranked order?  
  • Fairness and Diversity widgets quantify whether the ranked outcome exhibits parity (according to some measure – three such measures are presented in Figure 1), and whether the set of results is diverse with respect to one or several demographic characteristics.
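The Recipe, Ingredients, and Stability widgets can be illustrated for the simplest case the list mentions, a linear score-based ranker. This sketch is not the Ranking Facts implementation. The attribute names are hypothetical, Ingredients is approximated by the largest absolute weights (as suggested above for linear models), and Stability is approximated by a simple noise-perturbation test on the top-k set. The Fairness and Diversity widgets are not shown.

```python
import random


def score(item, weights):
    # Linear scoring formula: weighted sum of the item's attribute values.
    return sum(w * item[attr] for attr, w in weights.items())


def rank(items, weights):
    return sorted(items, key=lambda item: score(item, weights), reverse=True)


def recipe(weights):
    # Recipe: list each attribute together with its weight.
    return [f"{attr}: {w}" for attr, w in weights.items()]


def ingredients(weights, k=2):
    # Ingredients (simplified): the k attributes with the largest absolute
    # weights; only meaningful if attributes are on comparable scales.
    return sorted(weights, key=lambda attr: abs(weights[attr]), reverse=True)[:k]


def top_k_names(ranked, k):
    return {item["name"] for item in ranked[:k]}


def stability(items, weights, k=3, noise=0.01, trials=100, seed=0):
    # Stability (simplified): fraction of trials in which perturbing every
    # scored attribute by up to +/- `noise` (relative) keeps the top-k set.
    rng = random.Random(seed)
    baseline = top_k_names(rank(items, weights), k)
    unchanged = 0
    for _ in range(trials):
        noisy = [
            {attr: value * (1 + rng.uniform(-noise, noise)) if attr in weights else value
             for attr, value in item.items()}
            for item in items
        ]
        if top_k_names(rank(noisy, weights), k) == baseline:
            unchanged += 1
    return unchanged / trials
```

With well-separated scores, small perturbations leave the top-k set intact and stability is 1.0. Closely spaced scores near the k-th position would push it lower.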

What’s new about nutritional labels?

The database and cyberinfrastructure communities have been studying systems and standards for metadata, provenance, and transparency for decades. For example, the First Provenance Challenge in 2008 led to the creation of the Open Provenance Model, which standardized years of previous efforts across multiple communities. We are now seeing renewed interest in these topics due to the proliferation of machine learning applications that use data opportunistically. Several projects are emerging that explore this concept, including the Dataset Nutrition Label at the Berkman Klein Center at Harvard and the MIT Media Lab, Datasheets for Datasets, and emerging work on Data Statements for NLP datasets by Bender and Friedman. In our work, we are interested in automating the creation of nutritional labels, for both datasets and models, and in providing open source tools for others to use in their projects.

Is a nutritional label simply an apt new name for an old idea?  We think not! We see nutritional labels as a unifying metaphor that is responsive to changes in how data is being used today.  

Datasets are now increasingly used to train models to make decisions once made by humans.  In these automated systems, biases in the data are propagated and amplified with no human in the loop.  The bias, and the effect of the bias on the quality of decisions made, is not easily detectable due to the relative opacity of the system.  As we have seen time and time again, models will appear to work well, but will silently and dangerously reinforce discrimination. Worse, these models will legitimize the bias — “the computer said so.”  So we are designing nutritional labels for data and models to respond specifically to the harms implied by these scenarios, in contrast to the more general concept of just “data about data.”

Use cases for nutritional labels: Enhancing data sharing in the public sector

Since we first began discussing nutritional labels in 2016, we’ve seen increased interest from  the public sector in scenarios where data sharing is considered high-risk. Nutritional labels can be used to support data sharing, while mitigating some of the associated risks. Consider these examples:

Algorithmic transparency law in New York City

New York City recently passed a law requiring that a task force be put in place to survey the current use of “automated decision systems,” defined as “computerized implementations of algorithms, including those derived from machine learning or other data processing or artificial intelligence techniques, which are used to make or assist in making decisions,” in City agencies.  The task force will develop a set of recommendations for enacting algorithmic transparency, which, as we argued in our testimony before the New York City Council Committee on Technology regarding Automated Processing of Data, cannot be achieved without data transparency. Nutritional labels can support data transparency and interpretability,  surfacing the statistical properties of a dataset, the methodology that was used to produce it, and, ultimately, substantiating the “fitness for use” of a dataset in the context of a specific automated decision system or task.

Addressing the opioid epidemic

An effective response to the opioid epidemic requires coordination among at least three sectors: health care, criminal justice, and emergency housing. The optimization problem is to assign resources, such as hospital rooms, jail cells, and shelter beds, to at-risk citizens effectively, fairly, and transparently. Yet centralizing all the data is disallowed by law, so solving the global optimization problem is difficult. We’ve seen interest in using nutritional labels to share the details of local resource allocation strategies, to help bootstrap a coordinated response without violating data sharing principles. In this case, the nutritional labels are shared separately from the datasets themselves.

Mitigating urban homelessness

With the Bill and Melinda Gates Foundation, we are integrating data about homeless families from multiple government agencies and non-profits to understand how different pathways through the network of services affect outcomes.  Ultimately, we are using machine learning to deliver prioritized recommendations to specific families. But the families and case workers need to understand how a particular recommendation was made, so they can in turn make an informed decision about whether to follow it.  For example, income levels, substance abuse issues, or health issues may all affect the recommendation, but only the families themselves know whether the information is reliable.

Sharing transportation data

At the University of Washington, we are developing the Transportation Data Collaborative, an honest broker system that can provide reports and research to policy makers while maintaining security and privacy for sensitive information about companies and individuals.  We are releasing nutritional labels for reports, models, and synthetic datasets that we produce to share known biases about the data and our methods of protecting privacy.

Properties of a nutritional label

To differentiate a nutritional label from more general forms of metadata, we articulate several properties:

  • Comprehensible: The label is not a complete (and therefore overwhelming) history of every processing step applied to produce the result.  This approach has its place and has been extensively studied in the literature on scientific workflows, but is unsuitable for the applications we target.  The information on a nutritional label must be short, simple, and clear.
  • Consultative: Nutritional labels should provide actionable information, rather than just descriptive metadata.  For example, universities may invest in research to improve their ranking, or consumers may cancel unused credit card accounts to improve their credit score.
  • Comparable: Nutritional labels enable comparisons between related products, implying a standard.
  • Concrete: The label must contain more than just general statements about the source of the data; such statements do not provide sufficient information to make technical decisions on whether or not to use the data.

Data and models are chained together into complex automated pipelines — computational systems “consume” datasets at least as often as people do, and therefore also require nutritional labels!  We articulate additional properties in this context:

  • Computable: Although primarily intended for human consumption, nutritional labels should be machine-readable to enable specific applications: data discovery, integration, automated warnings of potential misuse.  
  • Composable: Datasets are frequently integrated to construct training data; the nutritional labels must be similarly integratable.  In some situations, the composed label is simple to construct: the union of sources. In other cases, the biases may interact in complex ways: a group may be sufficiently represented in each source dataset, but underrepresented in their join.  
  • Concomitant: The label should be carried with the dataset; systems should be designed to propagate labels through processing steps, modifying the label as appropriate, and implementing the paradigm of transparency by design.
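The Composable property, and its warning about joins, can be made concrete with a small machine-readable label. This is a minimal sketch. The `NutritionLabel` fields, the function names, and the toy data are hypothetical. The data is chosen so that group B makes up half of each source but only a quarter of their join.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NutritionLabel:
    # Hypothetical machine-readable label: provenance plus group proportions.
    sources: set[str]
    group_share: dict[str, float] = field(default_factory=dict)


def group_shares(rows, group_key):
    counts = Counter(row[group_key] for row in rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def compose_by_join(left, right, key, group_key, left_label, right_label):
    """Inner-join two datasets on `key` and build a label for the result."""
    right_index = {}
    for rrow in right:
        right_index.setdefault(rrow[key], []).append(rrow)
    joined = [{**lrow, **rrow} for lrow in left for rrow in right_index.get(lrow[key], [])]
    label = NutritionLabel(
        # The easy part: provenance composes as a union of sources.
        sources=left_label.sources | right_label.sources,
        # The hard part: group proportions cannot be derived from the input
        # labels alone and must be recomputed on the joined data.
        group_share=group_shares(joined, group_key),
    )
    return joined, label
```

With left IDs 1-6 (groups A, A, A, B, B, B) and right IDs 1, 2, 3, 6, 7, 8 (groups A, A, A, B, B, B), the join keeps IDs 1, 2, 3, and 6. Group B’s share therefore drops from 0.5 in each source to 0.25 in the result.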

Going forward

We are interested in the application of nutritional labels at various stages in the data science lifecycle: Data scientists triage datasets for use to train their models; data practitioners inspect and validate trained models before deploying them in their domains; consumers review nutritional labels to understand how decisions that affect them were made and how to respond.  

The software infrastructure implied by nutritional labels suggests a number of open questions for the computer science community: Under what circumstances can nutritional labels be generated automatically for a given dataset or model? Can we automatically detect and report potential misuse of datasets or models, given the information in a nutritional label?  We’ve suggested that nutritional labels should be computable, composable, and concomitant — carried with the datasets to which they pertain; how can we design systems that accommodate these requirements?  

We look forward to opening these discussions with the database community at two upcoming events:  at ACM SIGMOD 2018, where we are organizing a special session on a technical research agenda in data ethics and responsible data management,  and at VLDB 2018, where we will run a debate on data and algorithmic ethics.

Is affiliate marketing disclosed to consumers on social media?

By Arunesh Mathur, Arvind Narayanan and Marshini Chetty

YouTube has millions of videos similar in spirit to this one:

The video reviews Blue Apron, an online grocery service, describing how it is efficient and cheaper than buying groceries at the store. The description of the video has a link to Blue Apron that gets you $30 off your first order, a seemingly sweet offer.

The video’s description contains an affiliate link (marked in red).

What you might miss, though, is that the link in question is an “affiliate” link. Clicking on it takes you through five redirects courtesy of Impact, an affiliate marketing company, which tracks the subsequent sale and provides a kickback to the YouTuber, in this case Melea Johnson. YouTubers use affiliate marketing to monetize their channels and support their activities.

This example is not unique to YouTube or affiliate marketing. There are several marketing strategies that YouTubers, Instagrammers, and other content creators on social media (called influencers in marketing-speak) engage in to generate revenue: affiliate marketing, paid product placements, product giveaways, and social media contests.

Endorsement-based marketing is regulated. In the United States, the Federal Trade Commission requires that these endorsement-based marketing strategies be disclosed to end-users so they can give appropriate weight to content creators’ endorsements. In 2017 alone, the FTC sent cease and desist letters to Instagram celebrities who were partnering with brands and reprimanded YouTubers with gaming channels who were endorsing gambling companies, all without appropriate disclosure. Ensuring that content creators disclose these relationships will likely become all the more important as advertisers and brands attempt to target consumers on their existing social networks, and as lack of disclosure causes harm to end-users.

Our research. In a paper that is set to appear at the 2018 IEEE Workshop on Consumer Protection in May, we conducted a study to better understand how content creators on social media disclose their relationships with advertisers to end-users. Specifically, we examined affiliate marketing disclosures, the ones that need to accompany affiliate links, which content creators placed along with their content on both YouTube and Pinterest.

How we found affiliate links. To study this empirically, we gathered two large datasets consisting of nearly half a million YouTube videos and two million Pinterest pins. We then examined the description of the YouTube videos and the Pinterest pins to look for affiliate links. This was a challenging problem, since there is no comprehensive public repository of affiliate marketing companies and links.

However, affiliate links do contain predictable patterns, because they are designed to carry information about the specific content creator and merchant. For instance, an affiliate link to Amazon contains the tag URL parameter, which carries the name of the creator who is set to make money from the sale. Using this insight, we built a database containing, for each domain, all the sub-domains, paths, and parameters that appeared with it. We then examined this database and manually classified each entry as either affiliate or non-affiliate by searching for information about the organization owning that domain, sometimes even signing up as affiliates to validate our findings. Through this process, we compiled a list of 57 URL patterns from 33 affiliate marketing companies, the most comprehensive publicly available curated list of this kind (see the Appendix of the paper and our GitHub repo).
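The pattern-matching idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling used in the paper: the two entries in `AFFILIATE_PATTERNS` are hypothetical stand-ins (only the Amazon `tag` parameter comes from the text above; `network.example` is a made-up network on a reserved domain), and the helper names `build_inventory` and `is_affiliate` are introduced here purely for illustration. `build_inventory` gathers the sub-domains, paths, and query parameters seen with each domain, the raw material a human would then classify; `is_affiliate` checks a URL against the curated patterns.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# Hypothetical patterns for illustration only; the paper's curated list of
# 57 patterns is in its appendix and GitHub repo. Each entry is
# (domain, required path prefix or None, required query parameter or None).
AFFILIATE_PATTERNS = [
    ("amazon.com", None, "tag"),        # Amazon: creator ID carried in ?tag=
    ("network.example", "/c/", None),   # made-up network with path-based links
]


def build_inventory(urls):
    """For each domain, collect the sub-domains, paths and query parameters seen with it."""
    inventory = defaultdict(lambda: {"subdomains": set(), "paths": set(), "params": set()})
    for url in urls:
        parts = urlparse(url)
        labels = (parts.hostname or "").lower().split(".")
        # Naive "last two labels" domain; a real tool would use the Public Suffix List.
        domain = ".".join(labels[-2:])
        entry = inventory[domain]
        entry["subdomains"].add(".".join(labels[:-2]))
        entry["paths"].add(parts.path)
        entry["params"].update(parse_qs(parts.query).keys())
    return inventory


def is_affiliate(url):
    """Return True if the URL matches any of the curated affiliate patterns."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    params = parse_qs(parts.query)
    for domain, path_prefix, param in AFFILIATE_PATTERNS:
        if host != domain and not host.endswith("." + domain):
            continue  # different domain
        if path_prefix is not None and not parts.path.startswith(path_prefix):
            continue  # right domain, wrong path
        if param is not None and param not in params:
            continue  # required tracking parameter missing
        return True
    return False


links = [
    "https://www.amazon.com/dp/B000000000?tag=somecreator-20",
    "https://www.amazon.com/dp/B000000000",
    "https://go.network.example/c/12345/abc",
]
print([is_affiliate(u) for u in links])                        # [True, False, True]
print(sorted(build_inventory(links)["amazon.com"]["params"]))  # ['tag']
```

The same product page counts as affiliate content only when the creator-identifying parameter is present, which is why the inventory tracks parameters and paths per domain rather than just domain names.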

How we scanned for disclosures. We could expect to find affiliate link disclosures in the description of the videos or pins, during the course of the video, or on the pin’s image. We began our analysis by manually inspecting 20 randomly selected affiliate videos and pins, searching for any mention of the affiliate nature of the accompanying URLs. We found that none of these videos or pins conveyed this information.

Instead, we turned our attention to inspecting the descriptions of the videos and pins. Given that any sentence (or phrase) could contain a disclosure, we first parsed descriptions into sentences using automated methods. We then clustered these sentences using hierarchical clustering, and manually identified the clusters of sentences that represented disclosure wording.
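A rough sketch of the split-then-cluster step follows. The text does not specify the sentence splitter, sentence representation, distance, or linkage, so this sketch assumes a naive regex splitter, bag-of-words token sets, Jaccard distance, and average-linkage agglomerative clustering with a hypothetical cutoff of 0.6. A production pipeline would more likely use a proper sentence tokenizer and a library implementation such as SciPy's; the point here is only to show how near-identical disclosure wording ends up grouped for a human to label.

```python
import re
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace, and on newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def token_set(sentence):
    """Represent a sentence as its set of lowercase word tokens."""
    return frozenset(re.findall(r"[a-z']+", sentence.lower()))


def jaccard_distance(a, b):
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def cluster_sentences(sentences, threshold=0.6):
    """Average-linkage agglomerative clustering over Jaccard distance.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (a hypothetical cutoff).
    """
    sets = [token_set(s) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]

    def linkage(c1, c2):
        dists = [jaccard_distance(sets[i], sets[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (a, b), dist = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[a].extend(clusters[b])  # a < b, so deleting b leaves a in place
        del clusters[b]
    return [[sentences[i] for i in c] for c in clusters]


descriptions = [
    "New haul video! This post contains affiliate links.",
    "Thanks for watching. This video contains affiliate links.",
    "Subscribe for more videos!",
]
sentences = [s for d in descriptions for s in split_sentences(d)]
for cluster in cluster_sentences(sentences):
    print(cluster)
```

With these three toy descriptions, the two “contains affiliate links” sentences merge into one cluster while every other sentence stays on its own; a person would then mark that cluster as disclosure wording.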

What we found. Of all the YouTube videos and Pinterest pins that contained affiliate links, only ~10% and ~7% respectively contained accompanying disclosures. When these disclosures were present, we could classify them into three types:

  • Affiliate link disclosures: The first type of disclosure simply stated that the link was an “affiliate link”, or that “affiliate links were included”. On YouTube and Pinterest, this type of disclosure was present on ~7% and ~4.5% of all affiliate videos and pins, respectively.
  • Explanation disclosures: The second type of disclosure attempted to explain what an affiliate link was, along the lines of “This is an affiliate link and I receive a commission for the sales”. These disclosures, which are of the type the FTC expects in its guidelines, appeared on only ~2% each of all affiliate videos and pins.
  • Support channel disclosures: Finally, the third type of disclosure, exclusive to YouTube, told users that they would be supporting the channel by clicking on the links in the description (without specifying exactly how). These disclosures were present in about 2.5% of all affiliate videos.
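To make the three categories concrete, here is a hypothetical keyword-based classifier. It is not how the disclosures were labeled in the study (that was done by manually inspecting sentence clusters), and the cue words below are assumptions chosen for illustration. Explanation disclosures are checked first because they usually also contain the words “affiliate link”.

```python
import re

# Hypothetical cue patterns; the study identified disclosure wording by
# manually labeling sentence clusters, not with rules like these.
AFFILIATE_RE = re.compile(r"\baffiliate\b", re.I)
PAYMENT_RE = re.compile(r"\b(commission|compensat\w*|earn\w*|paid)\b", re.I)
SUPPORT_RE = re.compile(r"\bsupport\w* (the|this|my|our) channel\b", re.I)


def classify_disclosure(sentence):
    """Map a sentence to one of the three disclosure types, or None."""
    mentions_payment = PAYMENT_RE.search(sentence) is not None
    mentions_affiliate = AFFILIATE_RE.search(sentence) is not None
    # Explanations come first: they usually also say "affiliate link".
    if mentions_payment and (mentions_affiliate or "link" in sentence.lower()):
        return "explanation"
    if mentions_affiliate:
        return "affiliate_link"
    if SUPPORT_RE.search(sentence):
        return "support_channel"
    return None


examples = [
    "Affiliate links were included.",
    "This is an affiliate link and I receive a commission for the sales.",
    "Clicking the links below helps support the channel!",
    "Subscribe for more videos!",
]
for s in examples:
    print(classify_disclosure(s), "|", s)
```

Rules like these are brittle (a sentence that says “commission” in an unrelated sense would be misfiled), which is one reason a clustering-plus-manual-review approach is safer for measurement.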

In the paper, we present additional findings, including how the disclosures varied by content type, and compare the engagement metrics of affiliate and non-affiliate content.

Cause for concern. Our results paint a bleak picture: the vast majority of affiliate content on both platforms has no accompanying disclosures. Worse, affiliate link disclosures, which the FTC specifically advises against using, were the most prevalent. In future work, we hope to investigate the reasons behind this lack of disclosure. Is it because the affiliates are unaware that they need to disclose? How aware are they of the FTC’s specific guidelines?

Further, we are concluding a user study that examines the efficacy of these disclosures as they exist today: Do users think of affiliate content as an endorsement by the content creator? Do users notice the accompanying disclosures? What do the disclosures communicate to users?

What can be done? Our results also provide several starting points for improvement by various stakeholders in the affiliate marketing industry. For instance, social media platforms can do a lot more to ensure content creators disclose their relationships with advertisers to end-users, and that end-users understand the relationship. Recently, YouTube and Instagram have taken steps in this direction, releasing tools that enable disclosures, but it’s unlikely that any one type of disclosure will cover all marketing practices.

Similarly, affiliate marketing companies can hold their registered content creators accountable to better standards. On examining the affiliate terms and conditions of the eight most common affiliate marketing companies in our dataset, we noted that only two explicitly pointed to the FTC’s guidelines.

Finally, we argue that web browsers can do more to help users identify disclosures, by automatically detecting both the disclosures and the content that needs to be disclosed. Machine learning and natural language processing techniques can be particularly helpful in designing tools that enable such automatic analyses. We are working towards building a browser extension that can detect, present, and explain these disclosures to end-users.
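As a rough illustration of the kind of check such a tool might run, the sketch below flags a description that contains an affiliate link but no recognizable disclosure. Everything in it is an assumption: `looks_affiliate` applies a single hypothetical rule (Amazon URLs carrying a `tag` parameter), `DISCLOSURE_RE` is a short hypothetical list of cue words, and it is written in Python for readability even though a real browser extension would run JavaScript. Keyword matching will miss many phrasings, which is where the machine learning and NLP techniques mentioned above would come in.

```python
import re
from urllib.parse import urlparse, parse_qs

URL_RE = re.compile(r"https?://\S+")
# Hypothetical disclosure cues; keyword rules like these miss many phrasings.
DISCLOSURE_RE = re.compile(
    r"\baffiliate\b|\bcommission\b|\bsupport\w* (?:the|this|my|our) channel\b", re.I
)


def looks_affiliate(url):
    """Hypothetical single rule: an Amazon URL carrying a ?tag= creator ID."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    on_amazon = host == "amazon.com" or host.endswith(".amazon.com")
    return on_amazon and "tag" in parse_qs(parts.query)


def audit_description(text):
    """Report affiliate URLs in a description and whether any disclosure cue appears."""
    urls = [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]  # drop trailing punctuation
    affiliate_urls = [u for u in urls if looks_affiliate(u)]
    disclosed = DISCLOSURE_RE.search(text) is not None
    return {
        "affiliate_urls": affiliate_urls,
        "disclosed": disclosed,
        "needs_attention": bool(affiliate_urls) and not disclosed,
    }


desc = "My favorite blender: https://www.amazon.com/dp/B000000000?tag=somecreator-20"
print(audit_description(desc)["needs_attention"])
print(audit_description(desc + "\nThis is an affiliate link; I earn a commission.")["needs_attention"])
```

The first description would be flagged (an affiliate link with no disclosure), while the second would not, since it contains explanation-style wording.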