January 20, 2019

Princeton Students: Learn the Design & Ethics of Large-Scale Experimentation

Online platforms, which monitor and intervene in the lives of billions of people, routinely host thousands of experiments to evaluate policies, test products, and contribute to theory in the social sciences. These experiments are also powerful tools to monitor injustice and govern human and algorithm behavior. How can we do field experiments at scale, reliably, and ethically?

This spring I’m teaching the undergraduate/graduate class SOC 412: Designing Field Experiments at Scale for the second year. In this hands-on class for students in the social sciences, computer science, and HCI, you will start experimenting right away, learn best practices for experiments in real-world settings, and learn to think critically about the knowledge and power of experimentation. For the final project, groups design or analyze a large-scale experiment in a novel way. I approach the class with the expectation that each final project could become a publishable academic research project.

Project Opportunities for 2019

This year, students will have opportunities to develop the following final projects:

  • Working with WikiLovesAfrica to test ideas for broadening global contributions to Wikipedia and broadening how media made by Africans is used and understood by the rest of the world
  • Data-mining a dataset from roughly a thousand experiments conducted on Wikipedia to make new discoveries about participation online
  • Developing new experiments together with moderators on reddit
  • Your own experiment, including your senior project, if your department approves
  • … additional opportunities TBD

Unsolicited Student Reviews from 2018

“I recently accepted an associate product manager position at [company]. They essentially work to AB test commercial initiatives by running campaigns like field experiments.  In fact, it seems like a lot of what was covered in SOC 412 will be relevant there!  As a result, I felt that working as a product manager there would give me a lot of influence over not only statistical modeling approaches, but also user privacy and ethical use of data.”

“From my experience, very few professors take the time to provide such personalized feedback and I really appreciate it.”

“Take! This! Class! Even if you’ll never do an online experiment in your line of work, it’s important to know the logistical and ethical issues because such experiments are going on in your daily life, whether you know it or not.”

“Instructions were always very clear. Grading guidelines were also very clear and Nathan’s feedback was always super helpful!”

“I appreciate the feedback you gave throughout the course, and I also value the conversations we had outside of class. As somebody that’s still trying to find myself as a researcher, it was very helpful to get your perspective.”

Sample Class Projects from 2018

Here are examples of projects that students did last year:

Promoting Inclusion and Participation in an Online Gender-Related Discussion Community

Many users join gender-related discussions online to discuss current events and their personal experiences. However, people sometimes feel unwelcome in those communities for two reasons. First, they may be interested in participating in constructive discussions, but their opinions differ from a community’s vocal majority. Accordingly, they feel uncomfortable voicing these opinions due to fear of an overwhelmingly negative reaction. Second, as we discovered in a survey, many participants in online gender conversations wish to make the experience uncomfortable for other commenters.

In this ongoing study, two undergraduate students worked with moderators of an online community to test how interventions that give first-time participants more accurate information about the values of the community and its organizers affect newcomer participation.


🗳 Auditing Facebook and Google Election Ad Policies 🇺🇸

Austin Hounsel developed software to generate advertisements and direct volunteers to test and compare the boundaries of Google’s and Facebook’s election advertising policies. In the class, Austin chose statistical methods and developed an experiment plan. Our findings were published in The Atlantic and will also be submitted as a computer science conference paper (full code, data, and details are available on GitHub).

In this study, we asked how common these mistakes are and what kinds of ads are mistakenly prohibited by Facebook and Google. Over 23 days, 7 U.S. citizens living inside and outside the United States attempted to publish 477 non-election advertisements that varied in the type of ad, its potentially mistaken political leaning, and its geographic targeting to a federal or state election voter population. Google did not prohibit any of the ads posted. Facebook prohibited 4.2% of the submitted ads.
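As a quick back-of-the-envelope sketch (my own illustration here, not an analysis from the study): 4.2% of 477 corresponds to roughly 20 prohibited ads, and a standard Wilson score interval shows how much uncertainty a sample of that size leaves in the estimated prohibition rate.

    from math import sqrt

    def wilson_ci(successes, n, z=1.96):
        # Wilson score interval for a binomial proportion (z=1.96 -> ~95%)
        p_hat = successes / n
        denom = 1 + z**2 / n
        center = (p_hat + z**2 / (2 * n)) / denom
        half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
        return center - half, center + half

    # 4.2% of 477 submitted ads is roughly 20 ads (a count inferred from
    # the percentage above, not taken directly from the paper)
    low, high = wilson_ci(20, 477)
    print(f"prohibition rate 20/477 = {20/477:.1%}, 95% CI ({low:.1%}, {high:.1%})")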

Improvements in 2019

Last year was the very first prototype of this class. Thanks to helpful feedback from students, I have adjusted the course to:

  • Provide more space for student discussion, via a dedicated precept period
  • Speed up the research design process, with more refined software examples
  • Improve team selection so students can spend more time focused on projects and less on choosing projects
  • Improve the course readings and materials
  • Move class discussions from Piazza to Slack
  • Streamline the essay grading process for me and for students

I’ve written more about what I learned from the class in a series of posts here at Freedom to Tinker (part 1) (part 2).

Pilots of risk-limiting election audits in California and Virginia

In order to run trustworthy elections using hackable computers (including hackable voting machines), “elections should be conducted with human-readable paper ballots. … States should mandate risk-limiting audits prior to the certification of election results.”

What is a risk-limiting audit, and how do you perform one? An RLA is a human inspection of a random sample of the paper ballots (or batches of ballots)—using a scientific method that guarantees with high confidence that if the voting machines claimed the wrong winner, then the audit will declare, “I cannot confirm this election,” in which case a by-hand recount is appropriate.  This is protection against voting-machine miscalibration, or against fraudulent hacks of the voting machines.
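To make the statistical side concrete, here is a minimal Python sketch of one well-known ballot-polling method, the BRAVO test of Lindeman and Stark, simplified to a two-candidate race. It illustrates the general technique only; the pilots described below used their own procedures, and the 5% risk limit here is just an example.

    import random

    def bravo_audit(sampled_ballots, reported_share, risk_limit=0.05):
        # Sequential likelihood-ratio test: reported outcome vs. a tie.
        # sampled_ballots: 'winner'/'loser' labels drawn uniformly at random
        # from the paper ballots; reported_share: the reported winner's
        # share of the two-candidate vote (must exceed 0.5).
        t = 1.0
        for ballot in sampled_ballots:
            if ballot == 'winner':
                t *= 2 * reported_share
            elif ballot == 'loser':
                t *= 2 * (1 - reported_share)
            if t >= 1 / risk_limit:
                return True   # sample confirms the outcome; stop auditing
        return False          # cannot confirm: sample more, or hand count

    # Simulated example: a 60/40 race, audited at a 5% risk limit
    ballots = ['winner'] * 600 + ['loser'] * 400
    random.shuffle(ballots)
    print(bravo_audit(ballots, reported_share=0.60))

If the test statistic never reaches 1/risk_limit before the sample runs out, the audit escalates, in the limit to a full hand count.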

That’s what it is, but how do you do it?  RLAs require not only a statistical design, but a practical plan for selecting hundreds of ballots from among millions of sheets of paper.  It’s an administrative process as much as it is an algorithm.

In 2018, RLAs were performed by the state of Colorado.  In addition, two just-published reports describe pilot RLAs performed by Orange County, California and Fairfax, Virginia.  From these reports (and from the audits they describe) we can learn a lot about how RLAs work in practice.

Orange County, CA Pilot Risk-Limiting Audit, by Stephanie Singer and Neal McBurnett, Verified Voting Foundation, December 2018.

Neal Kelley, Registrar of Voters of Orange County, ran an RLA of 3 county-wide races in the June 2018 primary, with assistance from Verified Voting.  About 635,000 ballots were cast; many ballots were 3 pages long (printed both sides), about 1.4 million sheets overall.  Of these, just 160 specific (randomly selected) ballot sheets needed to be found and tabulated by human inspection.  How do you manage a million sheets of paper?

[Photo: Orange County elections warehouse during the June 2018 risk-limiting audit]

Like this!  Keep well-organized ballot manifests that list each batch of ballots (as initially counted by optical scanners), where each batch came from, and how many ballots it contains.  How do you know how many ballots are in each batch?  The optical scanners tell you, but you don’t want to trust the optical scanners (a hacked scanner could influence the audit by lying about how many ballots are in a batch).  So you weigh the batch on a high-precision scale that tells you the count to within ±2 sheets.  And so on.  You can read the details in the report, which really helps to demystify the process.  Still, there are many ways of doing an RLA, and this report describes just one of them.  The audit was finished before the deadline for certifying election results.  The estimated salary cost of the staff of the Registrar of Voters, for the days running the audit, was under $4000.
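As a sketch of the sheet-selection step (with hypothetical batch names and a made-up seed; real audits use published sampling tools and a seed generated publicly, often by rolling dice), the idea is to treat the manifest as one long sequence of sheets and translate each random draw into “pull sheet k from batch B”:

    import random

    # Hypothetical ballot manifest: (batch_id, sheet_count) pairs,
    # cross-checked against the scanners, e.g. by weighing each batch
    manifest = [("precinct-001-box-A", 412),
                ("precinct-001-box-B", 397),
                ("precinct-002-box-A", 455)]

    def draw_sample(manifest, n_draws, seed):
        # Map uniform random draws over all sheets to (batch, position)
        total = sum(count for _, count in manifest)
        rng = random.Random(seed)  # in practice: a public dice-roll seed
        picks = []
        for _ in range(n_draws):
            k = rng.randrange(total)  # 0-based index over all sheets
            for batch_id, count in manifest:
                if k < count:
                    picks.append((batch_id, k + 1))  # 1-based within batch
                    break
                k -= count
        return sorted(picks)

    for batch, position in draw_sample(manifest, n_draws=5, seed=20180605):
        print(f"pull sheet {position} from {batch}")

Because the seed is generated publicly and the manifest is published, anyone can re-run the selection and check that the audit team pulled the right sheets.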

City of Fairfax, VA Pilot Risk-Limiting Audit, by Mark Lindeman, Verified Voting Foundation, December 2018.

Brenda Cabrera, General Registrar of the City of Fairfax, ran a pilot RLA of the June 12, 2018 Republican Senate primary, with assistance from Verified Voting.  There were 948 ballots cast, and the audit team ran the audit three ways, to test three different RLA methods.  The audit was scheduled to take two days but finished ahead of schedule.

Colorado ran statewide RLAs of its 2018 primary and general elections, after pilot projects in previous years.

From all these activities we continue to learn more about how to run trustworthy elections.  I encourage state and local election officials nationwide to try RLA pilots of their own.  The Verified Voting Foundation, Democracy Works, the Democracy Fund, Free and Fair, and other individuals and organizations are available to provide advice.

Why voters should mark ballots by hand

Because voting machines contain computers that can be hacked to make them cheat, “Elections should be conducted with human-readable paper ballots. These may be marked by hand or by machine (using a ballot-marking device); they may be counted by hand or by machine (using an optical scanner).  Recounts and audits should be conducted by human inspection of the human-readable portion of the paper ballots.”

Ballot-marking devices (BMDs) contain computers too, and those can also be hacked to make them cheat.  But the principle of voter verifiability is that when the BMD prints out a summary card of the voter’s choices, which the voter can hold in hand before depositing it for scanning and counting, the voter can verify the printout that can later be recounted by human inspection.

[Image: ExpressVote ballot card, with bar codes for the optical scanner and a human-readable summary of choices for use in voter verification and in recounts or audits.]

But really?  As a practical matter, do voters verify their BMD-printed ballot cards, and are they even capable of it?  Until now, there hasn’t been much scientific research on that question.

A new study by Richard DeMillo, Robert Kadel, and Marilyn Marks now answers that question with hard evidence:

  1. In a real polling place, half the voters don’t inspect their ballot cards, and the other half inspect for an average of 3.9 seconds (for a ballot with 18 contests!).
  2. When asked, immediately after depositing their ballot, to review an unvoted copy of the ballot they just voted on, most won’t detect that the wrong contests are presented, or that some are missing.

This can be seen as a refutation of Ballot-Marking Devices as a concept.  Since we cannot trust a BMD to accurately mark the ballot (because it may be hacked), and we cannot trust the voter to accurately review the paper ballot (or even to review it at all), what we can most trust is an optical-scan ballot marked by the voter, with a pen.  Although optical-scan ballots aren’t perfect either, that’s the best option we have to ensure that the voter’s choices are accurately recorded on the paper that will be used in a recount or random audit.