September 24, 2018

Is It Time for a Data Sharing Clearinghouse for Internet Researchers?

Today’s Senate hearing with Facebook’s Mark Zuckerberg will start a long discussion about how Internet companies collect data and handle user privacy. Although the spotlight is currently on Facebook, we shouldn’t forget that the picture is broader: companies from device manufacturers to ISPs collect network traffic and use it for a variety of purposes.

The uses we will hear about today largely involve the widespread collection of data about Internet users for targeted content delivery and advertising. Meanwhile, yesterday Facebook announced an initiative to share data with independent researchers to study social media’s impact on elections. So at the same time that Facebook is being raked over the coals for sharing its data with “researchers” (Cambridge Analytica), it has announced a program to share that data with (presumably more “legitimate”) researchers.

Internet researchers depend on data. Sometimes, we can gather the data ourselves, using measurement tools deployed at the edge of the Internet (e.g., in home networks, on phones). In other cases, we need data from the companies that operate parts of the Internet, such as an Internet service provider (ISP), an Internet registrar, or an application provider (e.g., Facebook).

  • If incentives align, data flows to the researcher. Interacting with a company can work very well when goals are aligned. I’ve worked with companies to develop new spam filtering algorithms, to develop new botnet detection algorithms, and to highlight empirical results that have informed policy debates.
  • If incentives do not align, then the researcher probably won’t get the data. When research is purely technical, incentives often align. When the technical work crosses over into policy (as it does in areas like net neutrality, and as we are seeing with Facebook), there can be significant (sometimes insurmountable) hurdles to data access.

How an Internet Researcher Gets Data Today

How do Internet researchers get data from companies today? An Internet operator I know aptly characterizes the status quo:

“Show Internet operators you can do something useful, and they’ll give you data.”

Researchers get access to Internet data from companies in two ways: (1) working for the company (as an employee), or (2) working with the company (as an “independent” researcher).

Option #1: Work for a Company.

Working for a company offers privileged access to data, which can be used to mint impressive papers (irreproducibility aside) simply because nobody else has the same data. I have taken this approach myself on a number of occasions, having worked for an ISP (AT&T), a DNS company (Verisign), and an Internet security service provider (Damballa).

How this approach works. In the 2000s, research labs at AT&T and Sprint had privileged access to data, which gave rise to a proliferation of papers on “Everything You Wanted to Know About the Internet Backbone But Were Afraid to Ask”.  Today, the story repeats itself, except that the players are Google and Facebook, and the topic du jour is data-center networks.

Shortcomings of This Approach. Much research—from projects with a longer arc to certain policy-oriented questions—would never come to light if we relied only on company employees to do it. By the nature of their work, however, company employees lack independence: they lack both the autonomy to select problems and the ability to take positions or publish results that run counter to the company’s goals or priorities. This shortcoming may not matter if what the researcher wants to work on and what the company wants to accomplish are the same. For many technical problems, this is the case (although the technical community still tends to develop tunnel vision around areas where data is abundant, while neglecting other areas). But for many problems—ranging from those with a longer arc to deployment to those that may run counter to a company’s priorities—we can’t rely on industry to do the work.

Option #2: Work with a Company.

How this approach works. A researcher may instead work with a company, typically gaining privileged access to data for a particular project. Sometimes, we demonstrate the promise of a technique with some data that we can gather or bootstrap without any help and use that initial study to pique the interest of a company who may then share data with us to further develop the idea.

Shortcomings of this approach. Research done in collaboration with a company often has shortcomings similar to those of research done within a company’s walls. If the results of the research align with the company’s perspectives and viewpoints, then data sharing is copacetic. Even these cooperative settings pose some risks to researchers, who may create the perception that they are not independent merely by their association with the company. With purely technical research, the risks are lower, though still non-zero: because the work depends on privileged data access, for example, the researcher may face challenges in presenting the research in a way that would help others reproduce it.

With technical work that can inform or speak to policy questions, the concerns are greater. First, certain types of research or results may never come to light: if a company doesn’t like the results that the data analysis might produce, it may simply not share the data, or it may ask for “pre-publication review” of results based on that data (a practice that is also common for research conducted within companies). There is also a second, more subtle concern. Even when the work is technically watertight, a researcher can still face questions—fair or unfair—about the soundness of the work, due to the perceived motivations or agendas of the cooperating parties.

Current Data Sharing Approaches are Good, But They are Not Sufficient

The above methods for data sharing can work well for certain types of research. In my career, I have made hay playing by these rules—often first demonstrating the viability of an idea with a smaller dataset that we gather ourselves, and then “pitching” the idea to a company.

Yet, in my experience these approaches have two shortcomings. The first relates to incentives. The second relates to privacy.

Problem #1: Incentives.

Certain types of work depend on access to Internet data, but the company that holds the data may not have a direct incentive to facilitate the research. Studies of Facebook’s effect on elections certainly fall into this category: the company simply may not like the results of the research.

But, there are plenty of other lines of research that fall into the category where incentives may not align. Other examples range from measurements of Internet capacity and performance as they relate to broadband regulation (e.g., net neutrality) to evaluation of an online platform’s content moderation algorithms and techniques. Lots of other work relating to consumer protection falls into this category as well. We have to rely on users and researchers measuring things at the edge of the network to figure out what’s going on; from this vantage point, certain activities may naturally slip under the radar more easily.
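To make the “measuring at the edge” point concrete, here is a minimal sketch of the kind of measurement a user or researcher can run without any cooperation from an ISP or platform: time a download from a test server and report rough downstream throughput. The test URL is a placeholder I am assuming for illustration; real measurement efforts (speed test platforms, home-network deployments, and the like) must handle server placement, cross traffic, and many other confounders that this sketch ignores.

```python
# Minimal edge-measurement sketch: time an HTTP download and estimate
# downstream throughput.  TEST_URL is a hypothetical placeholder, not a
# real measurement endpoint.
import time
import urllib.request

TEST_URL = "https://example.com/test-object.bin"  # placeholder test file

def measure_throughput(url: str) -> float:
    """Download `url` once and return achieved throughput in Mbps."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=30) as resp:
        nbytes = len(resp.read())
    elapsed = time.monotonic() - start
    return (nbytes * 8) / (elapsed * 1_000_000)

if __name__ == "__main__":
    print(f"downstream throughput: {measure_throughput(TEST_URL):.1f} Mbps")
```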

The current Internet data sharing roadmap doesn’t paint a rosy picture for research where incentives don’t align. Even when incentives do align, there can be perceptions of “capture”—effectively shilling an intellectual or technical finding in exchange for data access.

It is in the interests of everyone—the academics and their industry partners alike—to establish more formal modes of data exchange when either (1) there is a determination that the problem is important to study for the health of the Internet or for the benefit of consumers; or (2) there is the potential that the research will be perceived as not objective due to the nature of the data sharing agreement.

Problem #2: Privacy.

Sharing Internet data with researchers can introduce substantial privacy risks, and the need to share data with any researcher who works with a company should be evaluated carefully—ideally by an independent third party.

When helping develop the researcher exception to the FCC’s broadband privacy rules, I submitted a comment that proposed the following criteria for sharing ISP data with researchers:

  1. Purpose of research. The data supports research that aims to promote the security, stability, and reliability of networks. The research should have clear benefits for Internet innovation, operations, or security.
  2. Research goals do not violate privacy. The goals of the research do not include compromising consumer privacy.
  3. Privacy risks of data sharing are offset by benefits of the research. The risks of the data exchange are offset by the benefits of the research.
  4. Privacy risks of the data sharing are mitigated. Researchers should strive to use de-identified data wherever possible.
  5. The data adds value to the research. The research is enhanced by access to the data.

Yet, outlining the criteria is one thing. The thornier question (which we did not address!) is: Who gets to decide the answers?

Universities have institutional review boards that can help evaluate the merits of such a data sharing agreement. But Cambridge Analytica’s work might have had the veneer of “research”, and a company may have no internal incentive to independently evaluate a data sharing agreement on its merits. In light of recent events, we may be headed toward the conclusion that such data-sharing agreements should always be vetted by independent third-party review. If the research doesn’t involve a university, however, the natural question is: Who is that third party?

Looking Ahead: Data Clearinghouses for Internet Data?

Certain types of Internet research—particularly those that involve thorny regulatory or policy questions—could benefit from an independent clearinghouse, where researchers could propose studies and experiments and have them evaluated and selected by an independent third party based on their benefits and risks. Facebook is exploring this avenue in the limited setting of election integrity. This is an exciting step.

Moving forward, it will be interesting to see how Facebook’s meta-experiment on data sharing plays out, and whether it—or some variant—can serve as a model for Internet data sharing writ large. In purely technical areas, such a clearinghouse could allow a broader range of researchers to explore, evaluate, reproduce, and extend work that for now remains largely irreproducible because the data is under lock and key. For these questions, there could be significant benefit to the scientific community. In areas where the technical work or data analysis informs policy questions, the benefits to consumers could be even greater.

Artificial Intelligence and the Future of Online Content Moderation

Yesterday in Berlin, I attended a workshop on the use of artificial intelligence in governing communication online, hosted by the Humboldt Institute for Internet and Society.

Context

In the United States and Europe, many platforms that host user content, such as Facebook, YouTube, and Twitter, have enjoyed safe harbor protections for the content they host, under laws such as Section 230 of the Communications Decency Act (CDA), the Digital Millennium Copyright Act (DMCA), and, in Europe, Articles 12–15 of the eCommerce Directive. Some of these laws, such as the DMCA, provide platforms immunity from copyright damages if they remove content once they have knowledge that it is unlawful. Section 230 of the CDA provides broad immunity to platforms, with the express goals of promoting economic development and free expression. Daphne Keller has a good summary of the legal landscape on intermediary liability.

Platforms are now facing increasing pressure to detect and remove illegal (and, in some cases, legal-but-objectionable) content. In the United States, for example, bills in the House and Senate would remove safe harbor protection for platforms that do not remove illegal content related to sex trafficking. The European Union has also been considering laws that would limit the immunity of platforms that do not remove illegal content, which in the EU includes four categories: child sex abuse, incitement to terrorism, certain types of hate speech, and intellectual property or copyright infringement.

The mounting pressure on platforms to moderate online content coincides with increasing attention to algorithms that can automate the process of content moderation (“AI”) for the detection and ultimate removal of illegal (or unwanted) content.

The focus of yesterday’s workshop was to explore the role of AI in moderating online content and its implications for how content moderation is governed.

Setting the Tone: Challenges for Automated Filtering

Malavika Jayaram from Digital Asia Hub and I delivered the two opening “impulse statements” for the day. Malavika talked about some of the inherent limitations of AI for automated detection (with a reference to the infamous “Not Hot Dog” app) and pointed out some of the ways that platforms are being pressured to adopt automated content moderation tools.

I spoke about our long line of research on applying machine learning to detect a wide range of unwanted traffic, ranging from spam to botnets to bulletproof scam hosting sites. I then talked about how the dialog has in some ways used the technical community’s past success in spam filtering to suggest that automated filtering of other types of content should be as easy as flipping a switch. Since spam detection was something we knew how to do, then surely the platforms could also ferret out everything from copyright violations to hate speech, right?

Evan Engstrom and I have previously written about the difficulty of applying automated filtering algorithms to copyrighted content in practice. In short, even with a database that matches fingerprints of audio and video content against fingerprints of known copyrighted content, the methods are imperfect. When the problem is framed in terms of incitement to violence or hate speech, automated detection becomes even more challenging, due to “corner cases” such as parody, fair use, irony, and so forth. A recent article from James Grimmelmann summarizes some of these challenges.
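To see why fingerprint matching is imperfect even in the “easy” case, here is a toy sketch in the spirit of a perceptual “difference hash”. It is emphatically not any platform’s actual algorithm; the pixel values and threshold are made up for illustration.

```python
# Toy fingerprint matching, in the spirit of a perceptual "difference
# hash" -- a sketch only, not any platform's actual algorithm.
def dhash_bits(pixels):
    """1 if a pixel is brighter than its right-hand neighbor, else 0.
    `pixels` is a small 2D grid of grayscale values."""
    return [1 if left > right else 0
            for row in pixels
            for left, right in zip(row, row[1:])]

def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))

def matches(candidate, database, threshold=2):
    """Flag `candidate` if it is within `threshold` bits of a known fingerprint."""
    return any(hamming(candidate, known) <= threshold for known in database)

known_clip = dhash_bits([[10, 20, 30, 40], [40, 30, 20, 10]])  # fingerprint of a known work
reencoded  = dhash_bits([[12, 21, 33, 41], [41, 32, 19, 9]])   # slightly re-encoded copy
print(matches(reencoded, [known_clip]))  # True: the near-copy is caught
# But a crop or remix can fall outside the threshold (a false negative),
# and unrelated content can land inside it (a false positive) -- and the
# fingerprint says nothing about fair use, parody, or licensing.
```

The point is that even matching known content involves thresholds and error rates, and it says nothing about classifying content that has never been seen before.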

What I Learned

Over the course of the day, I learned many things about automated filtering that I hadn’t previously thought about.

  • Regulators and platforms are under tremendous pressure to act, based on the assumption that the technical problems are easy.  Regulators and platforms alike are facing increasing pressure to act, as I previously mentioned. Part of the pressure comes from a perception that detection of unwanted content is a solved problem. This myth is sometimes perpetuated by the designers of the original content fingerprinting technologies, some of which are now in widespread use. But, there’s a big difference between testing fingerprints of content against a database of known offending content and building detection algorithms that can classify the semantics of content that has never been seen before. An area where technologists can contribute to this dialog is in studying and demonstrating the capabilities and limitations of automated filtering, both in terms of scale and accuracy. Technologists might study existing automated filtering techniques or design new ones entirely.
  • Takedown requests are a frequent instrument for censorship. I learned about the prevalence of “snitching”, whereby one user may request that a platform take down objectionable content by flagging the content or otherwise complaining about it—in such instances, oppressed groups (e.g., Rohingya Muslims) can be disproportionately targeted by large campaigns of takedown requests. (It was not known whether such campaigns to flag content have been automated on a large scale, but my intuition is that they likely have been.) In such cases, the platforms err on the side of removing content, and the process for “remedy” (i.e., restoring the content) can be slow and tedious. This process creates a lever for censorship and suppression of speech. The trends are troubling: according to a recent article, a year ago Facebook removed 50 percent of content that Israel requested be removed; now that figure is 95 percent. Jillian York runs a site where users can report these types of takedowns, but these reports and statistics are all self-reported. A useful project might be to automate the measurement of takedowns for some portion of the ecosystem or group of users.
  • The larger platforms share content hashes of unwanted content, but the database and process are opaque. About nine months ago, Twitter, Facebook, YouTube, and Microsoft formed the Global Internet Forum to Counter Terrorism. Essentially, the project relies on something called the Shared Industry Hash Database. It’s very challenging to find anything about this database online aside from a few blog posts from the companies, although it does seem in some way associated with Tech Against Terrorism. The secretive nature of the shared hash database and the process itself has a couple of implications. First, the database is difficult to audit—if content is wrongly placed in the database, removing it would appear next to impossible. Second, only the member companies can check content against the database, essentially preventing smaller companies (e.g., startups) from benefitting from the information. Such limits in knowledge could prove to be a significant disadvantage if the platforms are ultimately held liable for the content that they host. As I discovered throughout the day, the opaque nature of commercial content moderation is a recurring theme, which I’ll return to later.
  • Different countries have very different definitions of unlawful content. The patchwork of laws governing speech on the Internet makes regulation complicated, as different countries have different laws and restrictions on speech. For example, “incitement to violence” or “hate speech” might mean something different in Germany (where Nazi propaganda is illegal) than it does in Spain (where it is illegal to insult the king) or France (which recently vowed to ferret out racist content on social media). When applying this observation to automated detection of illegal content, things become complicated: it becomes impossible to train a single classifier that can be applied generally; essentially, each jurisdiction needs its own classifier.
  • Norms and speech evolve over time, often rapidly. Several attendees observed that most of the automated filtering techniques today boil down to flagging content based on keywords. Such a model can be incredibly difficult to maintain, particularly when it comes to detecting certain types of content such as hate speech. For one, norms and language evolve; a word that is innocuous or unremarkable today could take on an entirely new meaning tomorrow. Complicating matters further, people sometimes try to regain control in an online discussion by co-opting a slur; a model that bases classification on the presence of certain keywords can therefore produce unexpected false positives, especially in the absence of context (the sketch after this list illustrates the problem).
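As a concrete illustration of that last point, here is a deliberately naive sketch of keyword-based flagging. The blocklist and posts are made up (a placeholder term stands in for a real slur); the point is that, without context, the same match fires on abuse, on a victim quoting their abuser, and on reclaimed usage alike.

```python
# Naive keyword flagging -- a sketch with a made-up blocklist, not any
# platform's real system.
BLOCKLIST = {"slurx"}   # placeholder term standing in for a real slur

def flag(post: str) -> bool:
    """Flag a post if any word (ignoring case and punctuation) is blocklisted."""
    words = {w.strip(".,!?\"'").lower() for w in post.split()}
    return bool(words & BLOCKLIST)

posts = [
    "You are a slurx",                      # abusive use: flagged (intended)
    'He called me a "slurx" and it hurt',   # victim reporting abuse: flagged (false positive)
    "Proud to reclaim slurx as our own",    # reclaimed usage: flagged (false positive)
]
for p in posts:
    print(flag(p), "->", p)
```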

Takeaways

Aside from the information I learned above, I also took away a few themes about the state of online content moderation:

  • There will likely always be a human in the loop. We must figure out what role the human should play. Detection algorithms are only as good as their input data. If the data is biased, if norms and language evolve, or if data is mislabeled (an even more likely occurrence, since a label like “hate speech” could differ by country), then the outputs will be incorrect. Additionally, algorithms can only detect proxies for semantics and meaning (e.g., an ISIS flag, a large swath of bare skin) but have much more difficulty assessing context, fair use, parody, and other nuance. In short, on a technical front, we have our work cut out for us. It was widely held that humans will always need to be in the loop, and that AI should merely be an assistive technology, for triage, scale, and improving human effectiveness and efficiency when making decisions about moderation. Figuring out the appropriate division of labor between machines and humans is a challenging technical, social, and legal problem.
  • Governance and auditing are currently challenging because decision-making is secretive. The online platforms currently control all aspects of content moderation and governance. They have the data; nobody else has it. They know the classification algorithms they use and the features they use as input to those algorithms; nobody else knows them. They are also the only ones who have insight into the ultimate decision-making process. This situation is different from other unwanted traffic detection problems that the computer science research community has worked on, where it was relatively easy to get a trace of email spam or denial-of-service traffic, either by generating it or by working with an ISP. In this situation, everything is under lock and key. This lack of public access to data and information makes it difficult to audit the process that platforms are currently using, and it also raises important questions about governance:
    • Should the platforms be the ultimate arbiter in takedown and moderation?
    • Is that an acceptable situation, even if we don’t know the rules that they are using to make those decisions?
    • Who trains the algorithms, and with what data?
    • Who gets access to the models and algorithms? How does disclosure work?
    • How does a user learn that his or her content was taken down, as well as why it was taken down?
    • What are the steps to remedy an incorrect, unlawful, or unjust takedown request?
    • How can we trust the platforms to make the right decisions when in some cases it is in their financial interests to suppress speech? History has suggested that trusting the platforms to do the right thing in these situations can lead to restrictions on speech.

Many of the above questions are regulatory. Yet, technologists can play a role in answering some of them. For example, measurement tools might detect and evaluate removal and takedowns of content for some well-scoped forum or topic. A useful starting point for the design of such a measurement system could be a platform such as Politwoops, which monitors tweets that politicians have deleted.
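As a rough sketch of what such a measurement tool might look like (under the assumption that the tracked posts are publicly reachable URLs, which is not true of every platform), one could periodically re-check a fixed set of posts and log when one that was previously reachable starts returning a “gone” status. The URLs below are placeholders; a real system would need platform APIs, rate limiting, and careful attention to terms of service.

```python
# Rough sketch of a takedown/deletion monitor in the spirit of Politwoops:
# periodically re-check a set of public post URLs and log when one that was
# previously reachable starts returning 404/410.  The URLs are placeholders.
import time
import urllib.request
import urllib.error

TRACKED = [
    "https://example.com/posts/1",   # placeholder post URLs
    "https://example.com/posts/2",
]

def is_reachable(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as e:
        return e.code not in (404, 410)   # treat "gone" responses as removed
    except urllib.error.URLError:
        return True                        # network error: don't count as removal

last_seen = {url: True for url in TRACKED}

while True:
    for url in TRACKED:
        reachable = is_reachable(url)
        if last_seen[url] and not reachable:
            print(f"possible takedown: {url} at {time.ctime()}")
        last_seen[url] = reachable
    time.sleep(3600)  # re-check hourly
```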

Summary

The workshop was enlightening. I came as a technologist wanting to learn more about how computer science might be applied to the social and legal problems concerning content moderation; I came away with a few ideas, fueled by exciting discussion. The attendees were a healthy mix of computer scientists, regulators, practitioners, legal scholars, and human rights activists. I’ve worked on censorship of Internet protocols for many years, but in some sense measuring censorship can feel a bit like looking for one’s keys under the lamppost—my sense is that the real power over speech is now held by the platforms, and as a community we need new mechanisms—technical, legal, economic, and social—to hold them to account.

New Jersey Takes Up Net Neutrality: A Summary, and My Experiences as a Witness

On Monday afternoon, I testified before the New Jersey State Assembly Committee on Science, Technology, and Innovation, which is chaired by Assemblyman Andrew Zwicker, who also happens to represent Princeton’s district.

On the committee agenda were three bills related to net neutrality.

Let’s quickly review the recent events. In December 2017, the Federal Communications Commission (FCC) rolled back the now-famous 2015 Open Internet Order, which required Internet service providers (ISPs) to abide by several so-called “bright line” rules: (1) no blocking lawful Internet traffic; (2) no throttling or degrading the performance of lawful Internet traffic; (3) no paid prioritization of one type of traffic over another; and (4) transparency about network management practices that may affect the forwarding of traffic. In addition to these rules, the FCC order also re-classified Internet service as a “Title II” telecommunications service—placing it under the jurisdiction of the FCC’s rulemaking authority—overturning the “Title I” information service classification that ISPs previously enjoyed.

The distinction of Title I vs. Title II classification is nuanced and complicated, as I’ve previously discussed. Re-classification of ISPs as a Title II service certainly comes with a host of complicated regulatory strings attached.  It also places the ISPs in a different regulatory regime than the content providers (e.g., Google, Facebook, Amazon, Netflix).

The rollback of the Open Internet Order reverted not only the ISPs’ Title II classification, but also the four “bright line” rules. In response, many states have recently been considering and enacting their own net neutrality legislation, including Washington, Oregon, California, and now New Jersey. Generally speaking, these state laws are far less complicated than the original FCC order. They typically re-instate the FCC’s bright-line rules, but entirely avoid the question of Title II classification.

On Monday, the New Jersey State Assembly considered three bills relating to net neutrality. Essentially, all three provide financial and other incentives for ISPs to abide by the bright line rules, requiring ISPs to follow those rules as a condition for:

  1.  securing any contract with the state government (which can often be a significant revenue source);
  2. gaining access to utility poles (which is necessary for deploying infrastructure);
  3. municipal consent (which is required to occupy a city’s right-of-way).

I testified at the hearing, and I also submitted written testimony, which you can read here. This was my first time testifying before a legislative committee; it was an interesting and rewarding experience. Below, I’ll briefly summarize the hearing and my testimony (particularly in the context of the other testifying witnesses), as well as my experience as a testifying witness (complete with some lessons learned).

My Testimony

Before I wrote my testimony, I thought hard about what a computer scientist with my expertise could bring to the table as a testifying expert. I focused my testimony on three points:

  • No blocking and no throttling are technically simple to implement. One argument that opponents of the legislation make is that state laws on blocking and throttling could become exceedingly difficult to implement, particularly if each state has its own laws. In short, the argument is that state laws could create a complex regulatory “patchwork” that is burdensome to implement. If we were considering a version of the FCC’s several-hundred-page Open Internet Order in each state, I might tend to agree. But the New Jersey bills are simple and concise: each is only a couple of pages. They basically say “don’t block or throttle lawful content”, with clear carve-outs for illegal traffic, attack traffic, and so forth. My comments focused on the simplicity of implementation, and on the point that we need not fear a patchwork of laws if the default is a simple rule that prevents blocking or throttling. In my oral testimony, I added (mostly for color) that the Internet is already a patchwork of tens of thousands of independently operated networks around the world, and that our protocols support carrying Internet traffic over a variety of physical media, from optical networks to wireless networks to carrier pigeon. I also took the opportunity to make the point that ISPs are, in a relative sense, pretty good actors in this space right now, in contrast to content providers who have regularly blocked access to content either for anti-competitive reasons or as a condition for doing business in certain countries.
  • Prioritization can be useful for certain types of traffic, but it is distinct from paid prioritization. Some ISPs have been arguing recently that prohibiting paid prioritization would prohibit (among other things) the deployment of high-priority emergency services over the Internet. Of course, anyone who has taken an undergraduate networking course will have learned about prioritization (e.g., Weighted Fair Queueing), and how prioritization (and even shaping) can improve application performance, by ensuring that interactive, delay-sensitive applications such as gaming are not queued behind lower-priority bulk transfers, such as a cloud backup. Yet, prioritization of certain classes of applications over others is a different matter from paid prioritization, whereby one customer might pay an ISP for higher priority over competing traffic. I discussed the differences at length. I also talked about prioritization and paid prioritization more generally: it’s not just about what a router does, but about who has access to what infrastructure. The bills address “prioritization” merely as a packet scheduling exercise—a router services one queue of packets at a faster rate than another queue. But there are plenty of other ways that some content can be made to “go faster” than other content; one example is deployment across a so-called Content Delivery Network (CDN)—a distributed network of content caches close to users. Some application or content providers may enjoy an unfair advantage (“priority”) over others merely by virtue of the infrastructure they have access to. Neither the repealed FCC rules nor the state bills say anything about this type of prioritization, which could be applied in anti-competitive ways. Finally, I talked about how prioritization is a bit of a red herring as long as there is spare capacity. Again, in an undergraduate networking course, we talk about resource allocation concepts such as max-min fairness, where every sender gets the capacity it requires as long as capacity exceeds total demand. Thus, it is also important to ensure that ISPs and application providers continue to add capacity, both in their networks and at the interconnects between their networks.
  • Transparency is important for consumers, but figuring out exactly what ISPs should expose, in a way that’s meaningful to consumers and not unduly burdensome, is technically challenging. Consumers have a right to know about the service that they are purchasing from their ISP, as well as whether (and how well) that service can support different applications. Disclosure of network management practices and performance certainly makes good sense on the surface, but here the devil is in the details. An ISP could be very specific in its disclosure. It could say, for example, that it has deployed a token bucket filter of a certain size, fill rate, and drain rate, and detail the places in its network where such mechanisms are deployed (the sketch after this list shows why such a disclosure, while precise, means little to a consumer). This would constitute a disclosure of a network management practice, but it would be meaningless for consumers. On the other hand, other disclosures might be so vague as to be useless: a statement from the ISP that it might throttle certain types of high-volume traffic at times of high demand would not help a consumer figure out how particular applications will perform. In this sense, paragraph 226 of the Restoring Internet Freedom Order, which talks about consumers’ need to understand how the network is delivering service for the applications that they care about, is spot on. There’s only one problem with that provision: technically, ISPs would have a hard time doing this without direct access to the client or server side of an application. In short: transparency is challenging. To be continued.
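Here is the minimal token-bucket sketch mentioned above. The parameters are made up (this is a toy, not any ISP’s actual configuration), but they form a complete, precise description of a throttling mechanism that nonetheless tells a consumer very little about how a video stream or a backup will behave.

```python
# Minimal token-bucket sketch -- a toy with made-up parameters, not any
# ISP's actual configuration.  It illustrates why disclosing raw
# parameters (bucket size, fill rate) is precise but unhelpful to consumers.

class TokenBucket:
    def __init__(self, capacity_bytes: float, fill_rate_bps: float):
        self.capacity = capacity_bytes      # maximum burst size, in bytes
        self.fill_rate = fill_rate_bps / 8  # sustained rate, in bytes/second
        self.tokens = capacity_bytes        # start with a full bucket
        self.last = 0.0                     # virtual clock, in seconds

    def allow(self, packet_bytes: int, now: float) -> bool:
        """Admit the packet if enough tokens have accumulated; otherwise drop/queue it."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False

# A "10 Mbps sustained, 2 MB burst" policer: short bursts (web pages) pass
# untouched, but a sustained high-volume flow is clipped to the fill rate.
bucket = TokenBucket(capacity_bytes=2_000_000, fill_rate_bps=10_000_000)
sent = sum(bucket.allow(1500, t * 0.0001) for t in range(20_000))  # ~30 MB offered in 2 seconds
print(f"{sent} of 20000 packets admitted")
```

The translation a consumer actually needs (“after roughly the first 2 MB of burst, sustained transfers are limited to about 10 Mbps”) is exactly what a raw parameter disclosure leaves out.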

The Hearing and Vote

The hearing itself was interesting. Several witnesses testified in opposition to the bills, including Jon Leibowitz of Davis Polk (retained by Internet service providers) and a representative from US Telecom. The arguments against the bills were primarily legal and business-oriented. Essentially, the legal argument against the bills is that the states should leave this problem to the federal government. The arguments are (roughly) as follows: (1) the Restoring Internet Freedom Order pre-empts state net neutrality laws; (2) the Federal Trade Commission has this well in hand, now that ISPs are back in Title I territory (and as a former FTC commissioner, Leibowitz knows well the types of authority the FTC has to bring such cases, as well as the many cases it has brought against Google, Facebook, and others); (3) the state laws will create a patchwork of laws and introduce regulatory uncertainty, making it difficult for ISPs to operate efficiently and creating uncertainty for future investment.

The arguments in opposition to the bills were orthogonal to the points I made in my own testimony. In particular, I disclaimed any legal expertise on pre-emption. I was, however, able to comment on whether I thought the second and third arguments held water from a technical perspective. While the second point about FTC authority is mostly a legal question, I understood enough about the FTC Act, and the circumstances under which the FTC brings cases, to comment on whether the bills in question give consumers more protection than they would otherwise have with just the FTC rules in place. My perspective was that they do, although this is a really interesting case of the muddy distinction between technology and the law: to really dive into the arguments, it helps to know a bit about both. I was also able to comment on the “patchwork” assertion from a technical perspective, as I discussed above.

At the end of the hearing, there was a committee vote on all three bills. It was interesting to see both the voting process and the commentary that each committee member offered with their votes. In the end, there were two abstentions, with the rest in favor. The members who abstained did so largely on the legal question concerning state pre-emption—perhaps foreshadowing the next round of legal battles.

Lessons Learned

Through this experience, I once again saw the value in having technologists at the table in these forums, where the laws that govern the future of the Internet are being written and decided on. I learned a couple of important lessons, which I’ve briefly summarized below.

My job was to bring technical clarity, not to advocate policy. As a witness, technically I am picking a side; in these settings, even when making technical points, one is typically doing so to serve one side of a policy or legal argument. Naturally, given my arguments, I registered as a witness in favor of the legislation.

However, and importantly: that doesn’t mean my job was to advocate policy. As a technologist, my role as a witness was to explain to lawmakers technical concepts that could help them make better sense of the various arguments in the room. I steered clear of rendering legal opinions, and where my comments did rely on legal frameworks, I made it clear that I was not an expert in those matters, but was speaking on technical points within the context of the laws as I understood them. Finally, when figuring out how to frame my testimony, I consulted many people: the lawmakers, my colleagues at Princeton, and even the ISPs themselves. In all cases, I asked these stakeholders about the topics I might focus on, as opposed to asking what, specifically, I should say. I thought hard about what a computer scientist could bring to the discussion, and about ensuring that what I said was technically accurate.

A simple technical explanation is of utmost importance. In such a committee hearing, advocates and lobbyists abound (on both sides); technologists are rare. I suspect I was the only technologist in the room. Most of the people in the room have jobs that require them to make arguments serving a particular stakeholder. In doing so, they may muddy the waters, either accidentally or intentionally. To advance their arguments, some people may even say things that are blatantly false (thankfully that didn’t happen on Monday, but I’ve seen it happen in similar forums). Perhaps surprisingly, such discourse can fly by completely unnoticed, because the people in the room—especially the decision-makers—don’t have as deep an understanding of the technology as the technologists do. Technologists need to be in the room, to shed light and to call foul—and, importantly, to do so using accessible language and examples that non-technical policy-makers can understand.