March 20, 2019

Bridging Tech-Military AI Divides in an Era of Tech Ethics: Sharif Calfee at CITP

In a time when U.S. tech employees are organizing against corporate-military collaborations on AI, how can the ethics and incentives of military, corporate, and academic research be more closely aligned on AI and lethal autonomous weapons?

Speaking today at CITP was Captain Sharif Calfee, a U.S. Naval Officer who serves as a surface warfare officer. He is a graduate of the U.S. Naval Academy and U.S. Naval Postgraduate School and a current MPP student at the Woodrow Wilson School.

Afloat, Sharif most recently served as the commanding officer, USS McCAMPBELL (DDG 85), an Aegis guided missile destroyer. Ashore, Sharif was most recently selected for the Federal Executive Fellowship program and served as the U.S. Navy fellow to the Center for Strategic & Budgetary Assessments (CSBA), a non-partisan, national security policy analysis think-tank in Washington, D.C.

Sharif spoke to CITP today with some of his own views (not speaking for the U.S. government) about how research and defense can more closely collaborate on AI.

Over the last two years, Sharif has been working on ways for the Navy to accelerate AI and adopt commercial systems to get more unmanned systems into the fleet. Toward this goal, he recently interviewed 160 people at 50 organizations. His talk today is based on that research.

Sharif next tells us about a rift between the U.S. government and companies/academia in AI. This rift is a symptom, he tells us, of a growing “civil-military divide” in the US. In previous generations, big tech companies worked closely with the U.S. military, and a majority of elected representatives in Congress had prior military experience. That’s no longer true. There is now a bifurcation between the experiences of Americans who serve in the military and those who haven’t. This lack of familiarity, he says, complicates moments when companies and academics discuss the potential of working with and for the U.S. military.

Next, Sharif says that conversations about tech ethics in the technology industry are creating a conflict that is making it difficult for the U.S. military to work with them. He tells us about Project Maven, a project that Google and the Department of Defense worked on together to analyze drone footage using AI. Their purpose was to reduce the number of casualties to civilians who are not considered battlefield combatants. This project, which wasn’t secret, burst into public awareness after a New York Times article and a letter from over three thousand employees. Google declined to renew the DOD contract and updated their motto.

U.S. Predator Drone (via Wikimedia Commons)

On the heels of their Project Maven decision, Google also faced criticism for working with the Chinese government to provide services in China in ways that enabled certain kinds of censorship. Suddenly, Google found themselves answering questions about why they were collaborating with China on AI and not with the U.S. military.

How do we resolve this impasse in collaboration?

  • The defense acquisition process is hard for small, nimble companies to engage in
  • Defense contracts are too slow, too expensive, too bureaucratic, and not profitable
  • Companies aren’t necessarily interested in the same type of R&D products that the DOD wants
  • National security partnerships with the government might affect opportunities in other international markets
  • The Cold War is “ancient history” for the current generation
  • Global, international corporations don’t want to take sides on conflicts
  • Companies and employees seek to create good. Government R&D may conflict with that ethos

Academics also have reasons not to work for the government:

  • Worried about how their R&D will be utilized
  • Schools of faculty may philosophically disagree with the government
  • Universities are incubators of international talent, and government R&D could be divisive, not inclusive
  • Government R&D is sometimes kept secret, which hurts academic careers

Faced with this, according to Sharif, the U.S. government is sometimes baffled by people’s ideological concerns. Many in the government remember the Cold War and knew people who lived and fought in World War Two. They can sometimes be resentful about a cold shoulder from academics and companies, especially since the military funded the foundational work in computer science and AI.

Sharif tells us that R&D reached an inflection point in the 1990s. During the Cold War, new technologies were developed through defense funding (the internet, GPS, nuclear technology) and then they reached industry. Now the reverse happens. Now technologies like AI are being developed by the commercial sector and reaching government. That flow is not very nimble. DOD acquisition systems are designed for projects that take 91 months to complete (like a new airplane), while companies adopt AI technologies in 6-9 months (see this report by the Congressional Research Service).

Conversations about policy and law also constrain the U.S. government from developing and adopting lethal autonomous weapons systems, says Sharif. Even as we have important questions about the ethical risks of AI, Sharif tells us that other governments don’t have the same restrictions. He asks us to imagine what would have happened if nuclear weapons weren’t developed first by the U.S.

How can divides between the U.S. government and companies/academia be bridged? Sharif suggests:

  • The U.S. government must substantially increase R&D funding to help regain influence
  • Establish a prestigious one-year DOD/Government R&D fellowship program for top-notch STEM grads, to be completed before they join the commercial sector
  • Expand on the Defense Innovation Unit
  • Elevate the Defense Innovation Board in prominence and expand the project to create conversations that bridge ideological divides. Organize conversations at high levels and middle management levels to accelerate this familiarization.
  • Increase DARPA and other collaborations with commercial and academic sectors
  • Establish joint DOD and Commercial Sector exchange programs
  • Expand the number of DOD research fellows and scientists present on university campuses in fellowship programs
  • Continue to reform DOD acquisition processes to streamline for sectors like AI

Sharif has also recommended to the U.S. Navy that they create an Autonomy Project Office to enable the Navy to better leverage R&D. The U.S. Navy has used structures like this for previous technology transformations on nuclear propulsion, the Polaris submarine missiles, naval aviation, and the Aegis combat system.

At the end of the day, says Sharif, what happens in a conflict where the U.S. does not have the technological overmatch and is overmatched by someone else? What are the real-life consequences? That’s what’s at stake in collaborations between researchers, companies, and the U.S. Department of Defense.

Princeton Students: Learn the Design & Ethics of Large-Scale Experimentation

Online platforms, which monitor and intervene in the lives of billions of people, routinely host thousands of experiments to evaluate policies, test products, and contribute to theory in the social sciences. These experiments are also powerful tools to monitor injustice and govern human and algorithm behavior. How can we do field experiments at scale, reliably, and ethically?

This spring I’m teaching the undergraduate/graduate class SOC 412: Designing Field Experiments at Scale for the second year. In this hands-on class for students in the social sciences, computer science, and HCI, you will start experimenting right away, learn best practices in experiments in real-world settings, and learn to think critically about the knowledge and power of experimentation. The final project is a group project to design or analyze a large-scale experiment in a novel way. I approach the class with an expectation that each project could become a publishable academic research project.

Project Opportunities for 2019

This year, students will have opportunities to develop the following final projects:

  • Working with WikiLovesAfrica to test ideas for broadening global contributions to Wikipedia and broadening how media made by Africans is used and understood by the rest of the world
  • Data-mining a dataset from roughly a thousand experiments conducted on Wikipedia to make new discoveries about participation online
  • Developing new experiments together with moderators on reddit
  • Your own experiment, including your senior project, if your department approves
  • … additional opportunities TBD

Unsolicited Student Reviews from 2018

“I  recently accepted an associate product manager position at [company]. They essentially work to AB test commercial initiatives by running campaigns like field experiments.  In fact, it seems like a lot of what was covered in SOC 412 will be relevant there!  As a result, I felt that working as a product manager there would give me a lot of influence over not only statistical modeling approaches, but also user privacy and ethical use of data.”

“From my experience, very few professors take the time to provide such personalized feedback and I really appreciate it.”

“Take! This! Class! Even if you’ll never do an online experiment in your line of work, it’s important to know the logistical and ethical issues because such experiments are going on in your daily life, whether you know it or not.”

“Instructions were always very clear. Grading guidelines were also very clear and Nathan’s feedback was always super helpful!”

“I appreciate the feedback you gave throughout the course, and I also value the conversations we had outside of class. As somebody that’s still trying to find myself as a researcher, it was very helpful to get your perspective.”

Sample Class Projects from 2018

Here are examples of projects that students did last year:

Promoting Inclusion and Participation in an Online Gender-Related Discussion Community

Many users join gender-related discussions online to discuss current events and their personal experiences. However, people sometimes feel unwelcome in those communities for two reasons. First of all, they may be interested in participating in constructive discussions, but their opinions differ from a community’s vocal majority. Accordingly, they feel uncomfortable voicing these opinions due to fear of an overwhelmingly negative reaction. Second, as we discovered in a survey, many participants in online gender conversations wish to make the experience uncomfortable for commenters.

In this ongoing study, two undergraduate students worked with moderators of an online community to test the effects on newcomer participation of interventions that provide first-time participants with more accurate information about the values of the community and its organizers.


🗳 Auditing Facebook and Google Election Ad Policies 🇺🇸

Austin Hounsel developed software to generate advertisements and direct volunteers to test and compare the boundaries of Google and Facebook’s election advertising policies. In the class, Austin chose statistical methods and developed an experiment plan. Our findings were published in The Atlantic and will also be submitted as a computer science conference paper (full code, data, and details are available on GitHub).

In this study, we asked how often Facebook and Google mistakenly prohibit ads that are not election-related, and what kinds of ads are affected. Over 23 days, 7 U.S. citizens living inside and outside the United States attempted to publish 477 non-election advertisements that varied in the type of ad, its potentially-mistaken political leaning, and its geographic targeting to a federal or state election voter population. Google did not prohibit any of the ads posted. Facebook prohibited 4.2% of the submitted ads.

Improvements in 2019

Last year was the very first prototype of this class. Thanks to helpful feedback from students, I have adjusted the course to:

  • Provide more space for student discussion, via a dedicated precept period
  • Speed up the research design process, with more refined software examples
  • Improve team selection so students can spend more time focused on projects and less on choosing projects
  • Improve the course readings and materials
  • Take class discussions away from Piazza to Slack
  • Streamline the essay grading process for me and for students

I’ve written more about what I learned from the class in a series of posts here at Freedom to Tinker (part 1) (part 2).

Pilots of risk-limiting election audits in California and Virginia

In order to run trustworthy elections using hackable computers (including hackable voting machines), “elections should be conducted with human-readable paper ballots. … States should mandate risk-limiting audits prior to the certification of election results.”

What is a risk-limiting audit, and how do you perform one? An RLA is a human inspection of a random sample of the paper ballots (or batches of ballots)—using a scientific method that guarantees with high confidence that if the voting machines claimed the wrong winner, then the audit will declare, “I cannot confirm this election,” in which case a by-hand recount is appropriate.  This is protection against voting-machine miscalibration, or against fraudulent hacks of the voting machines.

That’s what it is, but how do you do it?  RLAs require not only a statistical design, but a practical plan for selecting hundreds of ballots from among millions of sheets of paper.  It’s an administrative process as much as it is an algorithm.
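To make the statistical side concrete, here is a minimal sketch of one well-known style of RLA, a ballot-polling audit in the spirit of the BRAVO method, for a contest with two candidates. This is an illustration only, and not necessarily the method used in any of the pilots described in this post. The function name, the `"W"`/`"L"` labels, and the 5% risk limit are assumptions chosen for the example.

```python
def ballot_polling_audit(sampled_votes, reported_winner_share, risk_limit=0.05):
    """Sequential ballot-polling test for a two-candidate contest (sketch).

    sampled_votes: iterable of "W" (vote for the reported winner) or "L"
        (vote for the reported loser), in the order the ballots were drawn
        at random. These labels are hypothetical, chosen for this example.
    reported_winner_share: the winner's share of valid votes according to
        the voting machines (must be above 0.5).
    Returns (confirmed, ballots_examined).
    """
    if not 0.5 < reported_winner_share < 1.0:
        raise ValueError("reported winner share must be between 0.5 and 1")

    threshold = 1.0 / risk_limit  # stop once the evidence ratio reaches this
    ratio = 1.0                   # evidence for the reported outcome vs. a tie
    examined = 0

    for vote in sampled_votes:
        examined += 1
        if vote == "W":
            ratio *= reported_winner_share / 0.5
        elif vote == "L":
            ratio *= (1.0 - reported_winner_share) / 0.5
        # Any other value (for example an invalid ballot) leaves the ratio as is.
        if ratio >= threshold:
            return True, examined  # outcome confirmed at the chosen risk limit

    # Sample exhausted without enough evidence: "I cannot confirm this election."
    return False, examined
```

If the function returns `False`, the auditors would draw more ballots or proceed to a by-hand recount. The smaller the reported margin, the more ballots this kind of test needs before it can confirm the outcome.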

In 2018, RLAs were performed by the state of Colorado.  In addition, two just-published reports describe pilot RLAs performed by Orange County, California and Fairfax, Virginia.  From these reports (and from the audits they describe) we can learn a lot about how RLAs work in practice.

Orange County, CA Pilot Risk-Limiting Audit, by Stephanie Singer and Neal McBurnett, Verified Voting Foundation, December 2018.

Neal Kelley, Registrar of Voters of Orange County, ran an RLA of 3 county-wide races in the June 2018 primary, with assistance from Verified Voting.  About 635,000 ballots were cast; many ballots were 3 pages long (printed both sides), about 1.4 million sheets overall.  Of these, just 160 specific (randomly selected) ballot sheets  needed to be found and tabulated by human inspection.  How do you manage a million sheets of paper?

Orange County elections warehouse during the June 2018 risk-limiting audit

Like this!  Keep well-organized ballot manifests that list each batch of ballots (initially counted by optical scanners), where it came from, and how many ballots it contains.  How do you know how many ballots are in each batch?  The optical scanners tell you, but you don’t want to trust the optical scanners (a hacked scanner could influence the audit by lying about how many ballots are in a batch).  So you weigh the batch on a high-precision scale, which tells you the count to within ±2 sheets.  And so on.  You can read the details in the report, which really helps to demystify the process.  Still, there are many ways of doing an RLA, and this report describes just one of them.  The audit was finished before the deadline for certifying election results.  The estimated salary cost of the staff of the Registrar of Voters, for the days running the audit, was under $4000.
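To illustrate why the manifest matters, here is a small sketch of how a manifest can turn a random draw into instructions such as "pull sheet 37 from batch B". It is a simplification under stated assumptions, not the procedure from the Orange County report: the manifest layout (a list of batch name and sheet count pairs) and the function names are hypothetical, and Python's built-in generator stands in for the publicly seeded, documented random number generator a real audit would use.

```python
import bisect
import itertools
import random


def locate_sheet(manifest, sheet_index):
    """Map a 0-based global sheet index to (batch_id, 1-based position in batch).

    manifest: list of (batch_id, sheet_count) pairs in storage order
        (a hypothetical layout chosen for this example).
    """
    if not manifest:
        raise ValueError("manifest is empty")
    batch_ids = [batch_id for batch_id, _ in manifest]
    counts = [count for _, count in manifest]
    cumulative = list(itertools.accumulate(counts))  # running total of sheets
    if not 0 <= sheet_index < cumulative[-1]:
        raise IndexError("sheet index outside the manifest")

    # First batch whose running total exceeds the index holds the sheet.
    batch = bisect.bisect_right(cumulative, sheet_index)
    sheets_before = cumulative[batch] - counts[batch]
    return batch_ids[batch], sheet_index - sheets_before + 1


def draw_sample(manifest, sample_size, seed):
    """Pick sample_size distinct sheets uniformly at random, as pull instructions."""
    total_sheets = sum(count for _, count in manifest)
    rng = random.Random(seed)  # stand-in generator; see the assumptions above
    picks = sorted(rng.sample(range(total_sheets), sample_size))
    return [locate_sheet(manifest, index) for index in picks]
```

For example, with `manifest = [("A", 100), ("B", 250), ("C", 50)]`, calling `draw_sample(manifest, 10, seed=2018)` yields ten (batch, position) pairs that staff can retrieve from the warehouse. Because every sheet index depends on the counts in the manifest, a wrong count for one batch shifts the positions drawn from later batches, which is why the counts are checked independently of the scanners.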

City of Fairfax, VA Pilot Risk-Limiting Audit, by Mark Lindeman, Verified Voting Foundation, December 2018.

Brenda Cabrera, General Registrar of the City of Fairfax, ran a pilot RLA of the June 12th 2018 Republican primary Senate election, with assistance from Verified Voting.  There were 948 ballots cast, and the audit team ran the audit three ways, to test three different RLA methods.   The audit was scheduled to take two days but finished ahead of schedule.

Colorado ran statewide RLAs of its 2018 primary and general elections, after pilot projects in previous years.

From all these activities we continue to learn more about how to run trustworthy elections.  I encourage state and local election officials nationwide to try RLA pilots of their own.  The Verified Voting Foundation, Democracy Works, the Democracy Fund, Free and Fair, and other individuals and organizations are available to provide advice.