March 19, 2024

Content Moderation for End-to-End Encrypted Messaging

Thursday evening, the Attorney General, the Acting Homeland Security Secretary, and top law enforcement officials from the U.K. and Australia sent an open letter to Mark Zuckerberg. The letter emphasizes the scourge of child abuse content online, and the officials call on Facebook to pause its plans to implement end-to-end encryption across its messaging platforms.

The letter arrived the same week as a widely shared New York Times article describing how reports of child abuse content are multiplying. The article provides a heartbreaking account of how the National Center for Missing and Exploited Children (NCMEC) and law enforcement agencies are overburdened and under-resourced in addressing horrible crimes against children.

Much of the public discussion about content moderation and end-to-end encryption over the past week has appeared to reflect two common technical assumptions:

  1. Content moderation is fundamentally incompatible with end-to-end encrypted messaging.
  2. Enabling content moderation for end-to-end encrypted messaging fundamentally poses the same challenges as enabling law enforcement access to message content.

In a new discussion paper, I provide a technical clarification for each of these points.

  1. Some forms of content moderation may be compatible with end-to-end encrypted messaging, without compromising important security principles or undermining policy values.
  2. Enabling content moderation for end-to-end encrypted messaging is a different problem from enabling law enforcement access to message content. The problems involve different technical properties, different spaces of possible designs, and different information security and public policy implications.

I aim to demonstrate these clarifications by formalizing specific content moderation properties for end-to-end encrypted messaging, then offering at least one possible protocol design for each property.

  • User Reporting: If a user receives a message that he or she believes contains harmful content, can the user report that message to the service provider? (A minimal sketch of this property follows the list.)
  • Known Content Detection: Can the service provider automatically detect when a user shares content that has previously been labeled as harmful? (Also sketched after the list.)
  • Classifier-based Content Detection: Can the service provider detect when a user shares new content that has not been previously identified as harmful, but that an automated classifier predicts may be harmful?
  • Content Tracing: If the service provider identifies a message that contains harmful content, and the message has been forwarded by a sequence of users, can the service provider trace which users forwarded the message?
  • Popular Content Collection: Can the service provider curate a set of content that has been shared by a large number of users, without knowing which users shared the content?
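
To make the user reporting property concrete, one well-known approach is message franking, which Facebook deploys for abuse reporting in Messenger's end-to-end encrypted conversations. The Python sketch below is illustrative only, not a design from the discussion paper: the sender commits to the plaintext with a one-time key, the provider countersigns the opaque commitment during delivery, and a recipient who chooses to report reveals the plaintext and key so the provider can verify what was sent. The function names and metadata format are hypothetical, HMAC stands in for a formally binding commitment, and the end-to-end encryption itself is assumed to happen elsewhere.

    import hashlib
    import hmac
    import os


    def sender_commit(plaintext: bytes) -> tuple[bytes, bytes]:
        """Sender side: commit to the message with a one-time franking key.
        The key travels to the recipient inside the end-to-end encrypted payload;
        only the commitment (an HMAC tag) is visible to the provider."""
        franking_key = os.urandom(32)
        commitment = hmac.new(franking_key, plaintext, hashlib.sha256).digest()
        return franking_key, commitment


    def provider_countersign(commitment: bytes, metadata: bytes, provider_key: bytes) -> bytes:
        """Provider side: bind the opaque commitment to delivery metadata
        (e.g., sender identity and timestamp) without learning the plaintext."""
        return hmac.new(provider_key, commitment + metadata, hashlib.sha256).digest()


    def provider_verify_report(plaintext: bytes, franking_key: bytes, commitment: bytes,
                               countersignature: bytes, metadata: bytes,
                               provider_key: bytes) -> bool:
        """Provider side, on a report: the recipient reveals the plaintext and
        franking key, and the provider checks both the commitment and its own
        countersignature, so an altered or fabricated report is rejected."""
        expected_commitment = hmac.new(franking_key, plaintext, hashlib.sha256).digest()
        expected_countersig = hmac.new(provider_key, commitment + metadata,
                                       hashlib.sha256).digest()
        return (hmac.compare_digest(expected_commitment, commitment) and
                hmac.compare_digest(expected_countersig, countersignature))


    # Hypothetical flow for a single message:
    provider_key = os.urandom(32)                       # held only by the provider
    message = b"example message contents"
    key, com = sender_commit(message)                   # sender, before encryption
    sig = provider_countersign(com, b"from=alice", provider_key)  # provider, on relay
    assert provider_verify_report(message, key, com, sig, b"from=alice", provider_key)

The design point is that the provider learns plaintext only when the recipient affirmatively reports it, and the countersignature prevents a recipient from attributing a fabricated message to the sender.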
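
The known content detection property can be made similarly concrete. In today's unencrypted systems, providers match uploads against previously labeled content using perceptual hashes such as PhotoDNA; in an end-to-end encrypted setting, one family of designs moves a matching step to the client, before encryption. The sketch below uses an exact cryptographic hash and a plaintext hash list purely for illustration; it is trivially evaded by changing a single byte, and it is not one of the discussion paper's protocol designs, which would also have to specify what the provider learns on a match and how the hash list itself stays confidential.

    import hashlib

    # Hypothetical set of digests for content previously labeled as harmful,
    # distributed to the client. Deployed systems use perceptual hashes (e.g.,
    # PhotoDNA) rather than exact digests, and might rely on a private membership
    # test instead of shipping the list in the clear.
    KNOWN_HARMFUL_DIGESTS: set[str] = set()


    def matches_known_content(attachment: bytes) -> bool:
        """Client-side check, run before the attachment is end-to-end encrypted:
        does the outgoing content exactly match previously labeled content?"""
        return hashlib.sha256(attachment).hexdigest() in KNOWN_HARMFUL_DIGESTS

The open design questions, which the sketch omits, are where the matching runs, how robust the hash is to small modifications, and what, if anything, the provider learns when a match occurs.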

The discussion paper is inherently preliminary; it is an agenda for further interdisciplinary research (including my own). I am not yet prepared to normatively advocate for or against the protocol designs that I describe. I am not claiming that these concepts provide sufficient content moderation capabilities, the same content moderation capabilities as current systems, or sufficient robustness against evasion. I am also not claiming that these designs adequately address information security risks or public policy values, such as free speech, international human rights, or economic competitiveness.

I do not know if there is a viable path forward for content moderation and end-to-end encrypted messaging that will be acceptable to technology platforms, law enforcement, NCMEC, civil society groups, information security experts, and other stakeholders. I do have confidence that, if such a path exists, we will only find it through open research and dialogue.