December 5, 2024

SESTA May Encourage the Adoption of Broken Automated Filtering Technologies

The Senate is currently considering the Stop Enabling Sex Traffickers Act (SESTA, S. 1693), with a hearing scheduled for tomorrow. In brief, the proposed legislation threatens to roll back aspects of Section 230 of the Communications Decency Act (CDA), which relieve content providers, or so-called “intermediaries” (e.g., Google, Facebook, Twitter), of liability for the content that is hosted on their platforms. Section 230 protects these platforms from prosecution in federal civil or state courts for the activities of their customers.

One of the corollaries of SESTA is that, with increased liability, content providers might feel compelled to rely more on automated classification filters and algorithms to detect and eliminate unwanted content on the Internet. Having spent more than ten years developing these types of classifiers to detect “unwanted traffic” ranging from spam to phishing attacks to botnets, I am deeply familiar with both the potential and the limitations of automated filtering algorithms for identifying such content. Existing algorithms can be effective for detecting and predicting certain types of “unwanted traffic” (most notably, attack traffic), but the current approaches to detecting unwanted speech fall far short of being able to reliably detect illegal speech.

Content filters are inaccurate. Notably, the oft-referenced technologies for detecting illegal speech or imagery (e.g., PhotoDNA, EchoPrint) rely on matching content that is posted online against a corpus of known illegal content (e.g., text, audio, imagery). Unfortunately, because these technologies rely on analyzing the content of the posted material, they are prone to both false positives (i.e., mistakenly identifying innocuous content as illegal) and false negatives (i.e., failing to detect illegal content entirely). The network security community has been through this scenario before, in the context of spam filtering: years ago, spam filters would analyze the text of messages to determine whether a particular message was legitimate or spam; it wasn’t long before spammers developed tens of thousands of ways to spell “Rolex” and “Viagra” to evade these filters. They also came up with other creative ways to evade them, such as stuffing messages with Shakespeare and delivering their messages in a variety of formats, ranging from compressed audio to images to spreadsheets.
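The evasion problem described above can be illustrated with a deliberately simple sketch. It is not how PhotoDNA, EchoPrint, or any production spam filter works; the blocklist, function name, and sample messages are all hypothetical and chosen only to show how an exact-match content filter yields both false negatives and false positives.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKED_TERMS = {"rolex", "viagra"}


def naive_content_filter(message: str) -> bool:
    """Return True if any alphanumeric token in the message is on the blocklist."""
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return any(token in BLOCKED_TERMS for token in tokens)


# Hypothetical messages: three spam variants and one legitimate question.
samples = {
    "plain spam": "Cheap Rolex watches here",
    "obfuscated spam": "Cheap R0lex and V1agra here",
    "spaced spam": "Cheap R o l e x watches",
    "legitimate": "My grandfather left me his Rolex; how do I insure it?",
}

results = {label: naive_content_filter(text) for label, text in samples.items()}

for label, flagged in results.items():
    print(f"{label}: {'blocked' if flagged else 'allowed'}")
```

In this toy setup only the unobfuscated spam is caught. The two misspelled variants pass through (false negatives), while the legitimate message is blocked (a false positive), because the filter sees a matching word and nothing else.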

In short, content-based filters have largely failed to keep up with the agility of spammers. Evaluation of EchoPrint, for example, suggests that its error rates are far too high for use in an automated filtering context. Depending on the length of the file and the type of encoding, error rates are around 1–2%, where an error could be either a false negative or a false positive. By contrast, when we were working on spam filters, our discussions with online email service providers suggested that any spam filtering algorithm with a false positive rate above 0.01% would be unacceptable, in part because of the free speech questions and concerns it would raise. In other words, some of the existing automated fingerprinting services that providers might rely on as a result of SESTA might have false positive rates that are as much as two orders of magnitude greater than might otherwise be considered acceptable. We have written extensively about the limitations of these automated filters in the context of copyright.
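To make the gap between these figures concrete, the following back-of-the-envelope sketch compares the 0.01% threshold with the 1–2% error rates quoted above. Two assumptions are mine rather than the source's: the whole reported error rate is treated as false positives (an upper bound, since the reported figure mixes false positives and false negatives), and the daily volume of one million legitimate posts is purely hypothetical. Rates are expressed as counts per 10,000 items so that the arithmetic stays in integers.

```python
# Rates expressed as errors per 10,000 items to keep the arithmetic exact.
ACCEPTABLE_FP_PER_10K = 1              # 0.01%, the threshold cited by email providers
REPORTED_ERRORS_PER_10K = (100, 200)   # 1% and 2%, the reported error range

# Hypothetical platform volume, for illustration only.
LEGITIMATE_POSTS_PER_DAY = 1_000_000


def wrongly_flagged(posts: int, rate_per_10k: int) -> int:
    """Expected number of legitimate posts flagged at the given false positive rate."""
    return posts * rate_per_10k // 10_000


# Upper bound: assumes every reported error is a false positive.
ratios = [rate // ACCEPTABLE_FP_PER_10K for rate in REPORTED_ERRORS_PER_10K]
baseline = wrongly_flagged(LEGITIMATE_POSTS_PER_DAY, ACCEPTABLE_FP_PER_10K)
worst_cases = [
    wrongly_flagged(LEGITIMATE_POSTS_PER_DAY, rate)
    for rate in REPORTED_ERRORS_PER_10K
]

print("ratio to acceptable rate:", ratios)
print("legitimate posts flagged per day at the acceptable rate:", baseline)
print("legitimate posts flagged per day at the reported rates:", worst_cases)
```

Under these assumptions the reported rates are 100 to 200 times the acceptable threshold, and a platform of this hypothetical size would wrongly flag on the order of ten to twenty thousand legitimate posts per day rather than about one hundred.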

Content filters cannot identify context. Similarly, today, users who post content online have many tools at their disposal to evade the relatively brittle content-based filters. Detecting unwanted or illegal content on intermediary platforms is even more challenging than filtering spam. Spam messages are typically readily apparent, as they have links to advertisers, phishing sites, and so forth; the challenge on intermediary platforms, by contrast, entails detecting copyright infringement, hate speech, terrorist speech, sex trafficking, and so forth. Yet simply detecting the presence of something that matches content in a database cannot account for considerations such as fair use, parody, or coercion. Relying on automated content filters will not only produce mistakes in classifying content; these filters also have no hope of classifying context.

A possible step forward: Classifiers based on network traffic and sending patterns. About ten years ago, we realized the failure of content filters and began exploring how network traffic patterns might produce a stronger signal for illicit activity. We observed that while it was fairly easy for a spammer to change the content of a message, it was potentially much more costly for a spammer to change sending patterns, such as email volumes and where messages were originating from and going to. We devised classifiers for email traffic that relied on “network-level features” that now form the basis of many online spam filters. I think there are several grand challenges that lie ahead in determining whether similar approaches could be used to identify unwanted or illegal posts on intermediary content platforms. For example, it might be the case that certain types of illegal speech are characterized by high volumes of re-tweets, short reply times, many instances of repeated messages, or some other feature that is characteristic of the traffic or the accounts that post those messages.
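As a rough illustration of what “network-level features” could look like, the sketch below derives per-sender features from a log of sending events without ever inspecting message content. It is not the classifier referred to above. The `Event` record, the feature names, and the thresholds in `looks_suspicious` are all hypothetical, and a real system would learn its decision rule from labeled data rather than use fixed cutoffs.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Event:
    """One sending event; message content is deliberately not recorded."""
    sender: str
    recipient: str
    timestamp: float  # seconds since some reference point


def sending_features(events):
    """Compute simple per-sender features from sending patterns alone."""
    by_sender = defaultdict(list)
    for event in events:
        by_sender[event.sender].append(event)

    features = {}
    for sender, sent in by_sender.items():
        sent.sort(key=lambda e: e.timestamp)
        gaps = [b.timestamp - a.timestamp for a, b in zip(sent, sent[1:])]
        features[sender] = {
            "volume": len(sent),
            "distinct_recipients": len({e.recipient for e in sent}),
            "median_gap_s": median(gaps) if gaps else None,
        }
    return features


def looks_suspicious(f, min_volume=50, max_median_gap_s=2.0):
    """Hypothetical rule: many messages sent in rapid succession."""
    return (
        f["volume"] >= min_volume
        and f["median_gap_s"] is not None
        and f["median_gap_s"] <= max_median_gap_s
    )


# Hypothetical log: one bulk sender and one ordinary user.
events = [Event("bulk", f"user{i}", float(i)) for i in range(100)]
events += [
    Event("alice", "bob", 0.0),
    Event("alice", "carol", 3600.0),
    Event("alice", "bob", 9000.0),
]

features = sending_features(events)
flags = {sender: looks_suspicious(f) for sender, f in features.items()}
print(flags)
```

The point of the design is that rewording a message does not change any of these features. To evade such a rule, a sender would have to reduce volume or slow down, which is the kind of change the paragraph above describes as more costly.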

Unfortunately, the reality is that we are far from having automated filtering technology that can reliably detect a wide range of illegal content. Determining how and whether various types of illegal content could be identified remains an open research problem. To suggest that “Any start-up has access to low cost and virtually unlimited computing power and to advanced analytics, artificial intelligence and filtering software” (a comment that was made in a recent letter to Congress on the question of SESTA) vastly overstates the current state of the art. The bottom line is that we do not yet know whether we can design automated filters to detect illegal content on today’s online platforms. A potentially unwanted side effect of SESTA is that intermediaries might feel compelled, for fear of liability, to deploy these imperfect technologies on their platforms. The likely result would be over-blocking of legal, legitimate content, along with a failure to effectively deter or prevent the illegal speech that can easily evade today’s content-based filters.

Summary: Automated filters are not “there yet”. Automated filters often fail even at the simple task of matching content against known offending content, typically because content-based filters are so easily evaded. An interesting question is whether other “signals”, such as network traffic and posting patterns, or other characteristics of user accounts (e.g., age of account, number and characteristics of followers), might help us identify illegal content of various types. But much research is needed before we can comfortably say that these algorithms are even remotely effective at curbing illegal speech. And even as we work to improve the effectiveness of these automated fingerprinting and filtering technologies, they will likely remain, at best, an aid that intermediaries might opt to use. I cannot foresee false positive rates ever reaching zero, so by no means should we require intermediaries to use these algorithms and technologies in hopes that doing so will eliminate all illegal speech. Such a requirement would undoubtedly also curb legal and legitimate speech.