July 19, 2018

Can Classes on Field Experiments Scale? Lessons from SOC412

Last semester, I taught a Princeton undergrad/grad seminar on the craft, politics, and ethics of behavioral experimentation. The idea was simple: since large-scale human subjects research is now common outside universities, we need to equip students to make sense of that kind of power and think critically about it.

Path diagram from SOC412 lecture on the Social Media Color Experiment

Most behavioral experiments out in the world are conducted by people with no university training. In 2016, bloggers at NerdyData estimated that A/B testing company Optimizely’s software was deployed on over half a million websites. In 2017, the company announced that it had passed its one millionth experiment. Companies trying to support millions of behavioral studies aren’t waiting for universities to train socially conscious experimenters. Instead, training happens in hotel ballrooms at events like Opticon, which draws in over a thousand people every year, SearchEngineLand’s similarly sized SMX marketing conference series, and O’Reilly’s Strata conferences. And while scientists might consider experiments to be innocuous on their own, many have begun to wonder if the drive to optimize profits through mass behavioral experimentation may have damaging side effects.

Traditionally, training on field experiments has primarily been offered to grad students, mostly through mentorship with an advisor, in small graduate seminars, or in classes like ICPSR’s field experiments summer course. Master’s programs in education, economics, and policy also have a history of classes on evaluation. These classes tend to focus on the statistics of experimentation or on the politics of government-directed research. So far, I’ve only found two other undergraduate field experiments classes: one by Esther Duflo in economics and Carolina Castilla’s class at Colgate.

My class, SOC412, set out to introduce students to the actual work of doing experiments and also to give them an opportunity to reflect, discuss, and write about the wider societal issues surrounding behavioral research at scale in today’s society. This 10-student seminar class was a prototype for a much larger lecture class I’m considering. The class also gave me an opportunity to grow at teaching hybrid classes that combine statistics with critical reflection.

In this post, I describe the class, imagine how it could be improved as a seminar, outline what might need to change for a larger lecture class, and share what I learned. I will also include notes for anyone thinking about teaching a class like this.

Goals of the Class

My goal for students was to introduce them to the process of conducting experiments in the context of wider debates about the role of experiments in society. By the end of the class, students would have designed and conducted more than one field experiment and have a chance to write about how that experiment connects to these wider social issues. The class alternated between lecture/discussion sessions and lab-like sessions focused on methods. Assignments in the first part of the semester focused on the basics of experimentation, and a second part of the class focused more on developing a final project. You can see the syllabus here.

Scaffolding Student Work on Field Experiments

Designing and completing a field experiment in a single semester isn’t just a lot of work; it requires multiple external factors to converge:

  • Collaborations with external partners need to go smoothly
  • The delivery of the intervention and any measurement need to be simple
  • Experiments need to be doable in the available time
  • The university’s ethics board needs to work on the timeline of an undergrad class

In the first post about SOC412, I give more detail on the work I did to scaffold external factors.

SOC412 also gave me a chance to test the idea that the software I’m developing with the team at CivilServant could reduce the overhead of planning and conducting meaningful experiments with communities. By dedicating some of CivilServant’s resources to the class and inviting our community partners to work with students after our research summit in January, I hoped that students would be able to complete a full research cycle in the course of the class.

How the CivilServant software supports community experiments online

We almost did it <grin>. Students were able to fully develop experiment plans by the end of the semester, and we are conducting all of the studies they designed. Here are some of the great outcomes I got to see from students and that I want to remember for my own future teaching:

  • Asking students to do a first experiment in their own lives is a powerful way to prompt student reflection on the ethics of experimentation
  • Conversations with affected communities do help students think more critically about the contributions and limitations of experimentation in pragmatic settings
  • The statistics parts of the class went smoothest when I anticipated student needs and wrote well documented code for students to work from
  • It worked well to review basic statistical concepts through prepared datasets and transition students to data from their own experiments partway through the course
  • Lectures that illustrated central concepts in multiple ways worked well
  • Simulations were powerful ways to illustrate p-hacking, false positives, false negatives, and decision criteria for statistical results, since we could adjust the parameters and see the results to grow student intuitions
  • Short student presentations prompted close reading by students of specific field experiment designs and gave them a chance to explore personal interests more deeply
  • I think I did the right thing to offer students a chance to develop their own unique research ideas beyond CivilServant. This added substantial time to my workload, but it allowed students to explore their own interests. I don’t think it will scale well.
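
To make the point about simulations concrete, here is a minimal sketch of the kind of simulation that can illustrate false positives and p-hacking. It is an illustrative example, not the code used in SOC412: the function names, the sample sizes, and the choice of a normal-approximation test are all assumptions. Every simulated experiment has no true effect, so any "significant" result is a false positive. An analyst who tests several outcomes and reports whichever one clears the threshold will find something far more often than the nominal 5%.

```python
import math
import random


def p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    z = (mean_a - mean_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_outcomes, n_per_arm=50, n_sims=1000, alpha=0.05, seed=412):
    """Share of null experiments where at least one outcome looks significant.

    Assumption: outcomes are independent standard-normal draws, and the
    treatment has no effect on any of them.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = []
        for _ in range(n_outcomes):
            control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
            treatment = [rng.gauss(0, 1) for _ in range(n_per_arm)]
            p_values.append(p_value(control, treatment))
        # The "p-hacker" reports success if any single outcome passes.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, "outcomes tested:", false_positive_rate(k))
```

With one outcome, the rate should land near the nominal 5%. With ten independent outcomes, theory puts it at about 1 - 0.95^10, or roughly 40%. Changing `n_outcomes` and `alpha` and re-running is the kind of parameter adjustment that can help build intuition.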

Areas for Improving the Class

Here are some of the things that prevented us from meeting the full goals of the class, and how I would teach a seminar class differently in the future:

  • Online discussion:
    • Never use Piazza again. The system steers conversations toward question-answer with the instructor rather than discussion, and the company data-mines student behavior and sells it to job recruiters (they make a big show about opt-in, but it’s a dark pattern default checkbox). I’m thinking about shifting to the open source tools Discourse and NB.
  • Statistics:
    • Introduce students directly to a specific person who can provide extra statistics support as needed, rather than just point them to institutional resources (Brendan Nyhan does this in his politics experiments syllabus)
    • Pre-register every anticipated hypothesis test before the class, unless you want students to legitimately question your own work after you teach them about p-hacking <grin>
    • When teaching meta-analysis and p-hacking, give students a pre-baked set of experiment results (I’m working on getting several large corpora of A/B tests for this, please get in touch if you know where I can find one)
  • Designing experiments:
    • Students conducted power analyses based on historical data, and difficulties with power analysis caused substantial delays on student projects. Develop standard software code for conducting power analyses for experiments with count variable outcomes, which can be reasonably run on student laptops before the heat death of the universe. 
    • Experiments are a multi-stage process where early planning errors can compound. The class needs a good way to handle ongoing documents that will be graded over time, and which may need to be directly adjusted by the instructor or TA for the project to continue.
    • When using software to carry out experiment interventions, don’t expect that students will read the technical description of the system. Walk students through a worked example of an experiment carried out using that software.
    • Create a streamlined process for piloting surveys quickly
    • Create a standard experiment plan template in Word, Google Docs, and LaTeX. Offering an outline and example still yields considerable variation between student work
    • Consider picking a theme for the semester, which will focus students’ theory reading and their experiment ideas
    • Since classes have hard deadlines that cannot easily be altered, do not support student research ideas that involve any new software development.
  • Participatory research process:
    • Schedule meetings with research partners before the class starts and include a regular meeting time in the syllabus (Nyhan does something similar with an “X period”). If you want to offer students input, choose the meeting time at the beginning of the semester and stick to it. Otherwise, you will lose time to scheduling and projects will slip.
    • Write a guide for students on the process of co-designing a research study, one that you run by research partners, that lets students track where they are in the process, check off their progress, and report their status to the instructor.
  • Team and group selection:
    • While it would be nice to allow students to form project teams based on the studies they are interested in, teams likely need to be formed and settled before students are in a position to imagine and develop final project ideas.
  • Writing: Even students with statistics training will have limited experience writing about statistics for a general audience. Here are two things I would do differently:
    • Create a short guide, partly based on the Chicago Guide to Writing about Numbers, that shows a single finding well reported, poorly/accurately reported, and poorly/inaccurately reported. Talk through this example in class/lab.
    • In the early part of the class, while waiting for results from their own first set of experiments, assign students to write results paragraphs from a series of example studies, referring to the guide.
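
On the power analysis point in the list above: one common way to keep power analyses for count outcomes fast enough for a laptop is to estimate power by simulation. The sketch below is illustrative only, not CivilServant’s code or the code students used. It assumes Poisson-distributed counts (real community data is often overdispersed, so a negative binomial model fit to historical data may be more appropriate), a simple normal-approximation test, and hypothetical function and parameter names.

```python
import math
import random


def poisson_draw(rng, rate):
    """Draw one Poisson count using Knuth's method (suited to small rates)."""
    limit = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (mean_a - mean_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulated_power(control_rate, rate_ratio, n_per_arm,
                    n_sims=500, alpha=0.05, seed=412):
    """Share of simulated experiments that detect the assumed effect.

    Assumptions: control counts are Poisson(control_rate) and treatment
    counts are Poisson(control_rate * rate_ratio).
    """
    rng = random.Random(seed)
    treatment_rate = control_rate * rate_ratio
    detections = 0
    for _ in range(n_sims):
        control = [poisson_draw(rng, control_rate) for _ in range(n_per_arm)]
        treatment = [poisson_draw(rng, treatment_rate) for _ in range(n_per_arm)]
        if p_value(control, treatment) < alpha:
            detections += 1
    return detections / n_sims


if __name__ == "__main__":
    # Hypothetical scenario: 2 events per unit in control, a 50% increase.
    for n in (20, 50, 100, 200):
        print(n, "per arm:", simulated_power(2.0, 1.5, n))
```

Students could sweep `n_per_arm` until the estimated power crosses their target (80% is a common convention), taking `control_rate` from historical data.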

Supporting a Class With a Startup Nonprofit

This class would not have been possible without the CivilServant nonprofit or Eric Pennington, CivilServant’s data architect. CivilServant provides software infrastructure for collecting data, conducting surveys, and carrying out randomized trials with online communities. The CivilServant nonprofit (which gained a legal status independent of MIT on April 1st, halfway through the semester) also provided research relationships for students. While grad students developed their own studies, undergraduate students used CivilServant software and depended on the nonprofit’s partner relationships.

After the class, some students expressed regret that they didn’t end up doing research outside of the opportunities provided through CivilServant. During the semester, I developed several opportunities to conduct field experiments on the Princeton campus, and I explored further ideas with university administrators. Unfortunately, none of the fascinating student ideas or university leads were achievable within a semester (negotiating with university administrators takes time).

Between the cost of the summit and staff time, CivilServant put substantial resources into the class. Was it worth the time and expense? When working with learners, our research couldn’t happen as quickly or efficiently as it might have otherwise. Yet student research also helped CivilServant better focus our software engineering priorities. Supporting the class also gave us a first taste of what it might be like to combine a faculty position with my public interest research. Next spring, we will need to plan well to ensure that CivilServant’s wider work isn’t put on hold to support the class.

Should SOC 412 Be a Lecture or Seminar?

Do I think this class can scale to be a lecture course? I think a larger lecture course may be possible with some modifications under specific conditions:

  • Either (a) drop the participatory component of the course or (b) organize each precept (section) to carry out a single field experiment, coordinated by the preceptor (TA)
  • If needed, relax the goal of completing studies by the end of the semester and find other ways for students to develop their experience communicating results
  • The technical processes for student experiments should not require any custom software, or it will be impossible to support a large number of student projects. This would constrain the scope of possible experiments but increase the chance of students completing their experiments
  • If I’m to teach this as a lecture course next year, I should apply for a teaching grant from Princeton, since scaling the class will take substantial work on software, assignments, and class materials to formalize
  • Notes on Preceptors (TAs)
    • Careful preceptor recruitment, training, and coordination would be essential to scale this class
    • If each precept (section) does a single experiment, the work of developing studies will need to be distributed and managed differently than with the teams of 2-3 that I led
    • The class needs clear grading systems and rubrics for student writing assignments
    • Preceptors in the course could receive a privileged authorship position on any peer reviewed studies from their section, in acknowledgment of the substantial work of supporting this course

Should You Teach a Class Like This?

I had an amazing time teaching SOC412, the students learned well, and we completed and are launching a series of field experiments, all of which are publishable. Teaching this class with ten students was a lot of work, much more than a typical discussion seminar. If you’re thinking about teaching a class like this, here are some questions to ask yourself:

  • do you have the means to deploy multiple field experiments?
  • do you have staff who can support community partnerships?
  • do you have enough partners lined up?
  • is your IRB responsive enough to approve amendments quickly during a semester?
  • does your department already teach students the needed statistics prerequisites?
  • do you have streamlined ways to conduct experiments that will work for learners?
  • do you have Standard Operating Procedures for common study types, along with full code for the statistical methods?
  • do you have the resources to continuously update any incomplete parts of student projects throughout the course to ensure the quality of projects?

Teaching the Craft, Ethics, and Politics of Field Experiments

How can we manage the politics and ethics of large-scale online behavioral research? When this question came up in April during a forum on Defending Democracy at Princeton, Ed Felten mentioned on stage that I was teaching a Princeton undergrad class on this very topic. No pressure!

Ed was right about the need: people with undergrad computer science degrees routinely conduct large-scale behavioral experiments affecting millions or billions of people. Since large-scale human subjects research is now common, universities need to equip students to make sense of and think critically about that kind of power.


The Rise of Artificial Intelligence: Brad Smith at Princeton University

What will artificial intelligence mean for society, jobs, and the economy?

Speaking today at Princeton University is Brad Smith, President and Chief Legal Officer of Microsoft. I was in the audience and live-blogged Brad’s talk.

CITP director Ed Felten introduces Brad’s lecture by saying that the tech industry is at a crossroads. With the rise of AI and big data, people have realized that the internet and technology are having a big, long-term effect on many people’s lives. At the same time, we’ve seen increased skepticism about technology and the role of the tech industry in society.

The good news, says Ed, is that plenty of people in the industry are up to the task of explaining what the industry does to cope with these problems in a productive way. What the industry needs now, says Ed, is what Brad offers: a thoughtful approach to the challenges that our society faces, one that acknowledges the role of tech companies, seeks constructive solutions, and takes responsibility that works across society. If there’s one thing we could do to help the tech industry cope with these questions, says Ed, it would be to clone Brad.

Imagining Artificial Intelligence in Thirty Years

Brad opens by mentioning the new book by his team: The Future Computed: Artificial Intelligence and Its Role in Society. While writing the book, they realized that it’s not helpful to think about change in the next year or two. Instead, we should be thinking about periods of ten to thirty years.

What was life like twenty years ago? In 1998, people often began their day without anything digital. They would put on a television, listen to the radio, and pull out a calendar. If you needed to call someone, you would use a landline phone to reach them. At that time, the one common joke about technology was whether people could program their VCRs.

In 2018, the first thing that many people reach for is their phone. Even if you manage to keep your phone in another room, you’ll find yourself reaching for your phone or sitting down in front of your laptop. You now use those devices to find out what happened in the world and with your friends.

What will the world look like in 2038? By that time, Brad argues that we’ll be living with artificial intelligence. Digital assistants are already part of our lives, but they’ll be more common at that time. Rather than looking at lots of apps, we’ll have a digital assistant that will talk to us and tell us what the traffic will be like for us. Twenty years from now, you’ll probably have your digital assistant talking to you as you shave or put on your makeup in the morning.

What is Artificial Intelligence?

To understand what that will mean in our lives, we need to understand what artificial intelligence really is. Even today, computers can recognize people, and they can do more: they can make sense of someone’s emotions from their face. We’ve seen the same with the ability of computers to understand language, Brad says. Not only can computers recognize speech, they can also sift through knowledge, make sense of it, and reach conclusions.

In the world today, we read about AI and expect it all to arrive one day, says Brad. That’s not how it’s going to work: AI will become more and more part of our lives in pieces. He tells us about the BMW pedestrian alert, which allows cars to detect pedestrians, beep, signal to the driver, and apply the brakes. Brad also tells us about the Steno app, which records and transcribes. Microsoft now has a version of Skype that detects and auto-translates the conversation, something they’ve now integrated with PowerPoint. Spotify, Netflix, and iTunes all use artificial intelligence to deliver suggestions for the next TV show. None of these systems work with 100% perfection, but neither do human beings. When asking about an AI system, we need to ask when computers will become as good as a human being.

What advances make AI real? Microsoft, Amazon, Google, and others build data centers that are many football fields large. This enables companies to gather huge computational power and vast amounts of data. Because algorithms get better with more data, companies have an insatiable appetite for data.

The Challenges of Imagining the Future

All of this is exciting, says Brad, and could deliver huge promise for the world. But we can’t afford to look at this future with uncritical eyes. The world needs to make sense of the risks. As computers behave more like humans, what will that mean for real people? Many people like Stephen Hawking, Elon Musk, and others are warning us about that future. But there is no crystal ball. For a long time, says Brad, I’ve admired futurists, but if a futurist gets something wrong, probably nobody remembers they got it wrong. We may be able to discern patterns, but nobody has a crystal ball.

Learning from The History of the Automobile

How can we think about what may be coming? The first option is to learn from history, not because it repeats itself but because it provides insights. To illustrate this, Brad starts by talking about the transition from horses to automobiles. He shows us a photo of Bertha Benz, whose dowry paid for her husband Karl’s new business. One morning in 1888, she got up and left her husband a note saying that she was taking the car and driving the kids 70 kilometers to visit her mother. Before the day was over, she had to repair the car, but by the end of the day, they had reached her mother’s house. This stunt convinced the world that the automobile would be important to the future.

Next, Brad shows us a photo of New York City in 1905, with streets full of horses and hardly any cars. Twenty years later, there were no horses on the streets. The horse population declined and jobs involved in supporting them disappeared. These direct economic effects weren’t as important as the indirect effects. Consumer credit wasn’t necessarily connected to the automobile, but it was an indirect outcome. Once people wanted to buy cars, they needed a way to finance the cars. Advertising also changed: when people were driving past billboards at speed, advertisers invented logos to make their companies more recognizable.

How Institutions Evolve to Meet Technology & Economic Changes

The effects of the automobile weren’t all good. As the population of horses declined, farmers got smart and grew less hay. They shifted their acreage to wheat and corn and the prices plummeted. Once the prices plummeted, farmers’ income plummeted. As the farmers fell behind on their loans, the rural banks tried to foreclose on them, leading to broad financial collapse. Many of the things we take for granted today come from that experience: the FDIC and insurance regulation, farm subsidies, and many other parts of our infrastructure. With AI, we need to be prepared for changes as substantial.

Understanding the Impact of AI on the Economy

Brad tells us another story about how offices worked. In the 1980s, you handed someone a hand-written document and someone would type it for you. Between the 1980s and today, two big changes happened. First, secretarial staff went on the decline and the professional IT staff was born. Second, people realized that everyone needed to understand how to use computers.

As we think about how work will change, we need to ask what jobs AI will replace. To answer this question, let’s think about what computers can do well: vision, speech, language, and knowledge. Jobs involving decision-making are already being done by computers (radiology, call centers, fast food orders, auto drivers). Jobs involving translation and learning will also become automated, including machinery inspection and the work of paralegals. At Microsoft, the company used to have multiple people whose job was to inspect fire extinguishers. Now the company has devices that automatically record data on their status, reducing the work involved in maintaining them.

Some jobs are less likely to be replaced by AI, says Brad: anything that requires human understanding and empathy. Nurses, social workers, therapists, and teachers are more likely to be people who will use AI than be replaced by it. This may lead people to take on jobs that they take more satisfaction in doing.

Some of the most exciting developments for AI in the next five years will be in the area of disability. Brad shows us a project called “Seeing AI,” which offers an app that describes a person’s surroundings using a phone camera. The app can read barcodes and identify food, identify currency bills, describe a scene, and read text in one’s surroundings. What’s exciting is what it can do for people. The project has already carried out 3 million tasks and it’s getting better and smarter as it goes. This system could be a game changer for people with blindness, says Brad.

Why Ethics Will Be a Growth Area for AI

What jobs will AI create? It’s easier to think about the jobs it will replace than what it will create. When young people in kindergarten today enter the workplace, he says, the majority of jobs will be ones that don’t yet exist. Some of the new jobs will be ones that support AI to work: computer science, data science, and ethics. “Ultimately, the question is not only what computers *can* do,” says Brad, “it’s what computers *should* do.” Under the ethics of AI, the fields of reliability/safety and privacy/security are well developed. Other important areas that are less well developed are research on fairness and inclusiveness. Two issues underlie all the rest. Transparency is important because people need to understand how those systems work.

AI Accountability and Transparency

Finally, one of the most important questions of our time is: “how do we ensure accountability of machines?” Will we ensure that machines are accountable to people, and will those people be accountable to other people? Only with accountability will be able to

What would it mean to create a Hippocratic oath for AI developers? Brad asks: what does it take to train a new generation of people to work on AI with that kind of commitment and principle in mind? These aren’t just questions for people at big tech companies. As companies, governments, universities, and individuals take the building blocks of AI and use them, AI ethics are becoming important to every part of society.

Artificial Intelligence Policy

If we are to stay true to timeless values, says Brad, we need to ask whether we want only ethical people to behave ethically, or everyone to behave ethically. That’s what law does; AI will create new questions for public policy and the evolution of the law. That’s why skilling up for the future isn’t just about science, technology, engineering, and math: as computers behave more like humans, the social sciences and humanities will become even more important. That’s why diversity in the tech industry is also important, says Brad.

How AI is Transforming the Liberal Arts, Engineering, and Agriculture

Brad encourages us to think about disciplines that AI can make more impactful: AI is changing healthcare (cures for cancer), agriculture (precision farming), accessibility, and our environment. He concludes with two examples. First, Brad talks about the Princeton Geniza Lab, led by Marina Rustow, which is using AI to analyze documents that have been scattered all around the world. Using AI, researchers are joining these digitized fragments. Engineering isn’t only for the engineers: everybody in the liberal arts can benefit from learning a little bit of computer science and data science, and every engineer is going to need some more liberal arts in their future. Brad also tells us about the AI for Earth project, which provides seed funds to researchers who work on the future of the planet. Projects include smart grids in Norway that make energy usage more efficient, a project by the Singaporean government to do smart climate control in buildings, and a project in Tasmania that supports precision farming, saving 30% on irrigation costs.

These examples give us a glimpse of what it means to prepare for an AI-powered future, says Brad. We’re also going to need to do more work: we may need a new social contract, because people are going to need to learn new skills, find new career pathways, create new labor rules and protections, and rethink the social safety net as these changes ripple throughout the economy.

Creating the Future of Artificial Intelligence

Where will AI take us? Brad encourages students to think about the needs of the world and what AI has to offer. It’s going to take a whole generation to think through what AI has to offer and create that future, and he encourages today’s students to seize that challenge.