March 20, 2019

Do Mobile News Alerts Undermine Media’s Role in Democracy? Madelyn Sanfilippo at CITP

Why do different people sometimes get different articles about the same event, sometimes from the same news provider? What might that mean for democracy?

Speaking at CITP today is Dr. Madelyn Rose Sanfilippo, a postdoctoral research associate here at CITP. Madelyn empirically studies the governance of sociotechnical systems, as well as outcomes, inequality, and consequences within these systems, using mixed-methods research designs.

Today, Madelyn tells us about a large-scale project with Yafit Lev-Aretz to examine how push notifications and the personalized distribution and consumption of news might influence readers and democracy. The project is funded by the Tow Center for Digital Journalism at Columbia University and the Knight Foundation.

Why Do Push Notifications Matter for Democracy?

Americans’ trust in media has been diverging in recent years, even as society worries about the risks that echo chambers pose to democracy. Madelyn also tells us about changes in how Americans get their news.

Push notifications are one of those changes: alerts that news organizations send to our computers and mobile phones about stories they consider important. And we get a lot of them. In 2017, Tow Center researcher Pete Brown found that people get almost one push notification per minute on their phones, interrupting us with news.

In 2017, 85% of Americans were getting news via their mobile devices, and while it’s not clear how much of that came from push notifications, mobile phones tend to come with news apps that have push notifications enabled by default.

When Madelyn and Yafit started to analyze push notifications, they noticed something fascinating: the same publisher often pushes different headlines to different platforms. They also found that news publishers use less objective, more subjective and emotional language in those notifications.

Madelyn and Yafit especially wanted to know whether media outlets covered breaking news differently based on the political affiliation of their readers. Comparing notifications about disasters, gun violence, and terrorism, they found differences in the number of push notifications sent by publishers with higher and lower political affiliation. They also found differences in the machine-coded subjectivity and objectivity of how these publishers covered those stories.

Composite subjectivity of different sources (higher is more subjective)
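The talk does not say how the notifications were machine-coded for subjectivity. As a rough illustration only, here is a minimal Python sketch of how a composite, per-publisher score could be built, assuming a simple lexicon-based scorer. The lexicon, its weights, and the function names are hypothetical stand-ins, not the researchers' method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon: word -> subjectivity weight in [0, 1].
# A real study would use a validated lexicon or a trained model instead.
SUBJECTIVE_WEIGHTS = {
    "shocking": 1.0,
    "horrific": 1.0,
    "heartbreaking": 1.0,
    "terrifying": 1.0,
    "tragic": 0.9,
    "devastating": 0.9,
}


def subjectivity(text: str) -> float:
    """Weighted share of a notification's tokens that appear in the lexicon."""
    tokens = [t.strip(".,!?:;'\"").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(SUBJECTIVE_WEIGHTS.get(t, 0.0) for t in tokens) / len(tokens)


def composite_by_source(notifications):
    """Average subjectivity per publisher; higher means more subjective.

    notifications: iterable of (source, notification_text) pairs.
    """
    scores = defaultdict(list)
    for source, text in notifications:
        scores[source].append(subjectivity(text))
    return {source: mean(values) for source, values in scores.items()}
```

Averaging per notification, rather than pooling all tokens, keeps one long alert from dominating a publisher's score; either choice is defensible for a sketch like this.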

Do Push Notifications Create Political Filter Bubbles?

Finally, Madelyn and Yafit wanted to know whether the personalization of push notifications shaped what people might be aware of. First, Madelyn explains that personalization takes multiple forms:

  • Curation: sometimes personalized algorithms (as in Google News) curate which articles we see
  • Content: sometimes the content itself is personalized, so two people see very different text even though they’re reading the same article

Together, they found that location-based personalization is common. Madelyn tells us about three different notifications that NBC News sent to people the morning after the Democratic primary. Not only did national audiences get different notifications, but different cities received alerts that mentioned Democratic and Republican candidates differently. Aside from the midterms, Madelyn and her colleagues found that sports news is often personalized by location.

Behavioral Personalization

Madelyn tells us that many news publishers also personalize news articles based on information about their readers, including their reading behavior and survey responses. They found that some news publishers personalize messages based on what they consider to be a person’s reading level. They also found evidence that publishers tailor news based on personal information that readers never provided to the publisher.

Governing News Personalization

How can we ensure that news publishers are serving democracy in the decisions that they make and the knowledge they contribute to society? At many publishers, decisions about the structure of news personalization are made by the business side of the organization.

Madelyn tells us about future research she hopes to do. She’s looking at the tools available to news readers for managing these notifications, as well as policy avenues for governing news personalization.

Madelyn also thanks her funders for supporting this collaboration with Yafit Lev-Aretz: the Knight Foundation and the Tow Center for Digital Journalism.

All the News That’s Fit to Change: Insights into a corpus of 2.5 million news headlines

[Thanks to Joel Reidenberg for encouraging this deeper dive into news headlines!]

There is no guarantee that a news headline you see online today will not change tomorrow, or even in the next hour, or that it will be the same headline your neighbor sees right now. For a real-life example of the type of change that can happen, consider this explosive headline from NBC News…

“Bernanke: More Execs Deserved Jail for Financial Crisis”

…contrasted with the much more subdued…

“Bernanke Thinks More Execs Should Have Been Investigated”

These headlines clearly suggest different stories, which is worrying because of the effect that headlines have on our perception of the news: a recent survey found that “41 percent of Americans report that they watched, read, or heard any in-depth news stories, beyond the headlines, in the last week.”

As part of the Princeton Web Transparency and Accountability Project (WebTAP), we wanted to understand more about headlines. How often do news publishers change headlines on articles? Do variations offer different editorial slants on the same article? Are some variations ‘clickbait-y’?

To answer these questions, we collected about 1.5 million article links seen since June 1st, 2015 on 25 news sites’ front pages through the Internet Archive’s Wayback Machine. Some articles were linked to with more than one headline (at different times or on different parts of the page), so we ended up with a total of ~2.5 million headlines.[1] To clarify, we define headlines as the text linking to articles on the front page of news websites; we are not talking about headlines on the article pages themselves. Our corpus is available for download here. In this post we’ll share some preliminary research and outline further research questions.
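The collection step can be sketched roughly as follows: for each front-page snapshot, pull out every link together with its anchor text, which is what we count as a headline. This is a minimal sketch using Python's standard `html.parser`, assuming the snapshot HTML has already been downloaded; the class and function names are hypothetical. Links inside Wayback Machine snapshots are generally rewritten to point at archive.org, so a real pipeline would likely also need to map them back to the original article URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect (absolute URL, anchor text) pairs from one HTML snapshot."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.pairs = []
        self._href = None   # URL of the <a> tag we are currently inside, if any
        self._text = []     # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace; skip links with no visible text.
            text = " ".join("".join(self._text).split())
            if text:
                self.pairs.append((self._href, text))
            self._href = None


def extract_headlines(html: str, base_url: str):
    """Return the (link, anchor text) pairs found in one front-page snapshot."""
    parser = AnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.pairs
```

Nested markup such as `<b>` inside a link is flattened into a single headline string, and image-only links are dropped because they carry no anchor text.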

One in four articles had more than one headline associated with it

Our analysis was limited by how many snapshots of the news sites the Wayback Machine took. Especially for the six months of data from 2015, some of the less popular news sites had fewer daily snapshots than the more popular sites, which might suppress our measure of headline variation on less popular websites. Even so, we were able to capture many instances of articles with multiple headlines for each site we looked at.
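Once every (article, headline) pair is collected, a figure like "one in four" comes from a simple grouping step. A minimal sketch, assuming hypothetical records of the form (site, article URL, headline text):

```python
from collections import defaultdict


def headline_variation(records):
    """Per site, the share of articles seen with two or more distinct headlines.

    records: iterable of (site, article_url, headline) tuples gathered from
    every snapshot of a site's front page (hypothetical record format).
    """
    seen = defaultdict(lambda: defaultdict(set))
    for site, url, headline in records:
        seen[site][url].add(headline.strip())
    return {
        site: sum(len(heads) > 1 for heads in articles.values()) / len(articles)
        for site, articles in seen.items()
    }
```

An article captured in only one snapshot can never show more than one headline, which is why sparse snapshot coverage on less popular sites would tend to push this share down.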

Clickbait is common, and hasn’t changed much in the last year

We took a first pass at our data using an open-source library to classify headlines as clickbait. The classifier was trained by its developer using Buzzfeed headlines as clickbait and New York Times headlines as non-clickbait, so it could more accurately be called a Buzzfeed classifier. Unsurprisingly, then, Buzzfeed had the most detected clickbait headlines of the sites we looked at.

But we also discovered that more “traditional” news outlets regularly use clickbait headlines too. The Wall Street Journal, for instance, has used clickbait headlines in place of more traditional headlines for its news stories, as in two variations they tried for an article on the IRS:

‘Think Tax Time Is Tough? Try Being at the IRS’

vs.

‘Wait Times Are Down, But IRS Still Faces Challenges’

Overall, we found that at least 10% of headlines were classified as clickbait on a majority of the sites we looked at. We also found that clickbait does not appear to be any more or less common now than it was in June 2015.
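We don't name the classifier's model here, so the following is only an illustration of the general approach (learn from labeled headlines, then label new ones) using multinomial naive Bayes, a swapped-in technique rather than the library's actual algorithm. The eight training headlines are invented. The sketch also computes per-site clickbait shares and checks whether a majority of sites clear a 10% threshold, the kind of summary reported above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; apostrophes are kept so "won't" stays one token."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        self.vocab_size = len(set().union(*self.word_counts.values()))
        return self

    def predict(self, text):
        def log_score(c):
            denom = self.totals[c] + self.vocab_size
            return self.log_prior[c] + sum(
                math.log((self.word_counts[c][w] + 1) / denom) for w in tokenize(text)
            )
        return max(self.classes, key=log_score)


# Invented (hypothetical) training examples standing in for a labeled corpus.
TRAIN = [
    ("You Won't Believe What Happened Next", "clickbait"),
    ("21 Things Only Cat Owners Will Understand", "clickbait"),
    ("This One Trick Will Change How You Cook", "clickbait"),
    ("Can You Guess Which Celebrity This Is", "clickbait"),
    ("Senate Passes Budget Bill After Long Debate", "news"),
    ("Federal Reserve Holds Interest Rates Steady", "news"),
    ("Storm Causes Flooding Across Coastal Towns", "news"),
    ("Court Rules On Voter Identification Law", "news"),
]
model = NaiveBayes().fit([t for t, _ in TRAIN], [label for _, label in TRAIN])


def clickbait_rates(model, headlines_by_site):
    """Share of each site's headlines that the model labels as clickbait."""
    return {
        site: sum(model.predict(h) == "clickbait" for h in heads) / len(heads)
        for site, heads in headlines_by_site.items()
        if heads
    }


def majority_above(rates, threshold=0.10):
    """True if more than half of the sites have a clickbait share >= threshold."""
    return sum(r >= threshold for r in rates.values()) > len(rates) / 2
```

Because a model like this learns whatever separates its two training sources, training it on Buzzfeed and New York Times headlines yields a Buzzfeed-versus-NYT style detector, which is exactly the caveat about the library's classifier.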

Using lexicon-based heuristics we were able to identify many instances of bias in headlines

Identifying bias in headlines is a much harder problem than finding clickbait. One research group from Stanford approached detecting bias as a machine learning problem — they trained a classifier to recognize when Wikipedia edits did or did not reflect a neutral point of view, as identified by thousands of human Wikipedia editors. While Wikipedia edits and headlines differ in some pretty important ways, using their feature set was informative. They developed a lexicon of suspect words, curated from decades of research on biased language. Consider the use of the root word “accuse,” as in this example we found from Time Magazine:

‘Roger Ailes Resigns From Fox News’

vs.

‘Roger Ailes Resigns From Fox News Amid Sexual Harassment Accusations’

The first headline just offers the “who” and “what” of the news story — the second headline’s “accusations” add the much more attention-grabbing “why.” Some language use is more subtle, like in this example from Fox News:

‘DNC reportedly accuses Sanders campaign of improperly accessing Clinton voter data’

vs.

‘DNC reportedly punishes Sanders campaign for accessing Clinton voter data’

The facts implied by these headlines are different in a very important way. The second headline, unlike the first, can cause a reader to presuppose that the Sanders campaign did do something wrong or malicious, since it is being punished. The first headline hedges the story significantly, saying only that the DNC accuses the Sanders campaign of doing something “improper”; it does not suggest that the accusation is true. The researchers identify this as a bias of entailment.

Using a modified version of the biased-language lexicon, we looked at our own corpus of headlines and identified when headline variations added or dropped these biased words. We found approximately 3,000 articles in which headline variations for the same article used different biased words, which you can look at here. Our data collection clearly provides evidence of editorial bias playing a role in the different headlines we see on news sites.
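The add-or-drop heuristic can be sketched in a few lines. Below, a handful of word roots stands in for the modified biased-language lexicon (the real lexicon is much larger, and these roots are a hypothetical sample); a pair of headline variants is flagged when a biased word appears in one but not the other. This is a minimal sketch, not our actual pipeline.

```python
import re
from itertools import combinations

# Hypothetical sample of lexicon entries, matched as word prefixes (roots).
BIASED_ROOTS = ("accus", "alleg", "punish", "claim", "admit")


def biased_words(headline):
    """Words in the headline that start with one of the biased roots."""
    words = re.findall(r"[a-z]+", headline.lower())
    return {w for w in words if w.startswith(BIASED_ROOTS)}


def differential_bias(variants):
    """For every pair of headline variants of one article, return the
    biased words present in one variant but not the other."""
    results = []
    for a, b in combinations(variants, 2):
        diff = biased_words(a) ^ biased_words(b)  # symmetric difference
        if diff:
            results.append((a, b, diff))
    return results
```

Run on the Fox News pair above, this flags "accuses" and "punishes"; on the Time Magazine pair it flags "accusations", whether or not the accusation is itself the story.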

Detecting all instances of bias and avoiding false positives is an open research problem

While identifying bias in 3,000 articles’ headlines is a start, we think we’ve identified only a fraction of biased articles. One reason we miss bias is that our heuristic defines differential bias narrowly (explicit use of biased words in one headline that are not present in another). There are also false positives among the headlines that we detected as biased. For instance, an allegation or an accusation might show a lack of neutrality in a Wikipedia article, but in a news story an allegation or accusation may simply be the story.

We know for sure that our data contains evidence of editorial and biased variations in headlines, but we still have a long way to go. We would like to be able to identify at scale and with high confidence when a news outlet experiments with its headlines. But there are many obstacles compared to the previous work on identifying bias in Wikipedia edits:

– Without clear guidelines, finding bias in headlines is a more subjective exercise than finding it in Wikipedia articles.

– Headlines are more information-dense than Wikipedia articles: a headline’s implication rests on only a few words, so each word carries more weight.

– Many of the stories that news outlets publish are necessarily more political than most Wikipedia articles.

If you have any ideas on how to overcome these obstacles, we invite you to reach out to us or take a look at the data yourself, available for download here.

@dillonthehuman

[1] Our main measurements ignore subpages like nytimes.com/pages/politics, which appear to often have article links that are at some point featured on the front page. For each snapshot of the front page, we collected the links to articles seen on the page along with the ‘anchor text’ of those links, which are generally the headlines that are being varied.

Sloppy Reporting on the "University Personal Records" Data Breach by the New York Times Bits Blog

This morning I ran across a distressing headline while perusing my RSS feeds. The New York Times’ Bits Blog proclaimed that, “Hackers Breach 53 Universities and Dump Thousands of Personal Records Online.” I clicked, and was informed that:

Hackers published online Monday thousands of personal records from 53 universities, including Harvard, Stanford, Cornell, Princeton, Johns Hopkins, the University of Zurich and other universities around the world.

I stifled the instinct to do a spit-take with my morning cup of coffee.
