December 13, 2018

Building Respectful Products using Crypto: Lea Kissner at CITP

How can we build respect into products and systems? What role does cryptography play in respectful design?

Speaking today at CITP is Lea Kissner (@LeaKissner), global lead of Privacy Technology at Google. Lea has spent the last 11 years designing and building security and privacy for Google projects from the grittiest layers of infrastructure to the shiniest user features — and cleaning up when something goes awry. She earned a Ph.D. in cryptography at Carnegie Mellon and a B.S. in CS from UC Berkeley.

As head of privacy at Google, Lea is crafts privacy reviews, defines what privacy means at Google, and leads a team that supports privacy across Google. Her team also creates tools and infrastructure that manage privacy across the company. If you’ve reviewed your privacy on Google, deleted your data, or shared any information with Google, Lea and her team have shaped your experience.

How does Lea think about privacy? When working to build products that respect users, Lea reminds us that it’s important for people to feel safe. This is a full-stack problem, all the way from humans and societies down to the level of hardware. Since society varies widely, people have very expectations around privacy and security, but not in the ways you would anticipate. Lea talks about many assumptions that don’t apply globally: not all languages have a word for privacy, people don’t always have control over their physical devices, and they often operate in settings of conflict.

Lea next talks about the case of online harassment. She describes hate speech as a distributed denial of service attack, a way to suppress speech they don’t like. Many platforms enable this kind of harassment, allowing anyone to send messages to anyone and enabling mass harassment. Sometimes it’s possible for platforms to develop policies to manage these problems, but platforms are often unable to intervene in cases of conflicting values.

Lea tells us about one project she worked on during the Arab uprisings. When people’s faces appeared in videos of protests, those people sometimes faced substantial risks when videos became widely viewed. Lea’s team worked with YouTube to implement software that allowed content creators to blur the faces of people appearing in videos.

Next, Lea describes the ways that her team links research with practical benefits to people. Her team’s ethnographers study differences in situations and norms. These observations shape how her team designs systems. As they create more systems, they then create design patterns, then do user testing on those patterns. Research with humans is important at both ends of the work: when understanding the meaning and nature of the challenges, and when testing systems.

Finally, Lea argues that we need to make privacy and security easy for people to do. Right now, cryptography processes are hard for people to use, and hard for people to implement. Her team focuses on creating systems to minimize the number of things that humans need to do in order to stay secure.

How Cryptography Projects can Fail

Lea next tells us about common failures in privacy and security.

The first way to fail is to create your own cryptography system. That’s a dangerous thing to do, says Lea. Why do people do this? Some think they’re smart and know enough just enough to be dangerous. Some think it’s cool to roll their own. Some don’t understand how cryptography works. Sometimes it seems too expensive (in terms of computation and network) for them to use a third-party system. To make good crypto easier, Lea’s team has created Tink, a multi-language, cross-platform library that provides cryptographic APIs that are secure, easy to use correctly, and hard(er) to misuse.

Lea urges us, “Do me a solid. Don’t give people excuses to roll their own crypto.”

Another area where people fail is in privacy-preserving computation. Lea tells us the story of a feature within Google where people wanted to send messages to someone whose phone number they have. Simple, right? Lea unpacks how complex such features can be, how easy it is to enable privacy breaches, and how expensive it can be to offer privacy. She describes a system that stores a large number of phone numbers associated with user IDs. By storing information with encrypted user IDs, it’s possible to enable people to manage their privacy. When Lea’s team estimated the impact of this privacy feature, they realized that it would require more than all of Google’s total computational power. They’re still working on that one.

Privacy is easier to implement in structured analysis of databases such as advertising metrics, says Lea. Google has had more success adopting privacy practices in areas like advertising dashboards that don’t involve real-time user experiences.

Hardware failures are a major source of privacy and security failures. Lea tells us about the squirrels and sharks that have contributed to Amazon and Yahoo data failures by nibbling on cables. She then talks to us about sources of failures from software errors, as well as key errors. Lea tells us about Google’s Key Management Server, which knows about data objects and the keys that pertain to those objects. Keys in this service need to be accessed quickly and globally.

How do generalized key management servers fail? First, encrypted data compresses poorly. If a million people send each other the same image, a typical storage system can compress it efficiently, storing it only once. An encrypted storage system has to encrypt and store each image individually. Second, people who store information often like to index and search for information. Exact matches are easy, but if you need to retrieve a range of things from a period of time, you need an index, and to create an index, the software needs to know what’s inside the encrypted data. Sharding, backing up, and caching data is also very difficult when information is encrypted.

Next, Lea tells us about the problem of key rotation. People need to be able to change their keys in any usable encryption system. When rotating keys, for every single object, you need to decrypt it using the key and then re-encrypt it using a new key. During this process, you can’t shut down an entire service in order to re-do the encryption. Within a large organization like Google, key rotation should be regular, but if it needs to be coordinated across a large number of people. Lea’s team tried something like this, but it ended up being too complex for the company’s needs. After trying this, they moved key management to the storage level, where it would be possible to manage and rotate keys independently of software teams.

What do we learn from this? Lea tells us that cryptography is a tool for turning things into key management problems. She encourages us to avoid rolling our own cryptography, to design scalable privacy-preserving systems, plan for key management up front, and evaluate the success of a design in the full stack, working from humans all the way to the hardware.

How can we scale private, smart contracts? Ed Felten on Arbitrum

Smart contracts are powerful virtual referees for holding money and carrying out agreed-on procedures in cases of disputes, but they can’t guarantee privacy and have strict scalability limitations. How can we improve on these constraints?

Here at the Center for IT Policy, it’s the first event of our weekly Tuesday lunch series. Speaking today is Professor Ed Felten, director of CITP. Ed served at the White House as the deputy U.S. chief technology officer from June 2015 to January 2017. Ed was also the first chief technologist for the Federal Trade Commission from January 2011 until September 2012.

What is cryptocurrency? Ed describes a situation where Alice wants to share money with Bob. She digitally signs a data structure indicating that coin C should be paid to Bob’s address, and she sends it to the Bitcoin network. The systems in the network then gossip to each other that Alice wants to pay Bob.

This brings us to the blockchain. The blockchain is a data structure that includes information about transactions and a link to a previous block. Each block includes a cryptographic hash to the previous block, and if anyone accepts the block, they accept the rest of the chain. When Alice creates a transaction, it will be added to a block by a bitcoin miner. This miner then tries to succeed at getting their block to the blockchain- if the miner succeeds, then Alice’s transaction is accepted and it will be deemed to have happened.  That’s how Bitcoin works- it keeps track of all previous transactions, and that’s how it keeps track of currency.

Smart contracts are another blockchain idea- but it’s a misnomer. Here’s how it works. If Alice and Bob want to make an agreement and have a protocol for carrying it out, they write down computer code that defines the behavior of a third party. One way to do it is to have a trusted third party carry out that protocol. A smart contract creates a virtual third party, writes code describing what it should do, and then instantiates it into the blockchain system. Then, if all goes well, the contract will behave according to its code, and it will act as the third party or referee in the agreement between Alice and Bob. Ed shows this with a pile of money because one thing it can do is to receive coins, own them, and do whatever with those coins that its code has defined. The contract, in this sense, is a trusted third party expressed in code.

What can smart contracts do? One option is escrow. Maybe Alice wants to buy books and doesn’t want to pay until she receives the book– but maybe the shop will only ship after payment. This is typically what an escrow agent does. In the optimistic case, the escrow agent receives the money and transfers the money to the shop once the books have been received. Smart contracts can play the role of the escrow agent. Why set this up in code? In theory, an escrow agent defined in code will be less likely to carry out fraud.

Smart contracts can also support sealed-bid auctions. Ed asks us to imagine that someone is selling naming rights to a cafeteria. Everyone submits bids secretly, the “envelopes” are opened at the end of the bid, and whoever bid the most wins. Smart contracts can give people assurances that people will carry out key actions in the process by requiring them to provide a deposit, where they know what will happen with their money

The most popular smart contract system is Ethereum, in which all contract code and data is public. Every miner emulates every execution step of every contract. That is slow, expensive, and doesn’t scale, so Ethereum requires people to pay what they call “gas” – in exchange for computation and storage done by a contract. The high cost to the miners of emulating these steps translates to a high cost of gas.

Contract complexity on Ethereum is capped by a “global gas limit” – defining the maximum amount of contract work that the miners are able to do. Roughly speaking, Ed says, the total computational capacity of Ethereum is less than a tenth of a laptop. These scalability limitations make many protocols impossible, and blockchain space is very limited.

Ethereum also has privacy limitations- Bitcoin scripts and Ethereum code are all public. Not everyone wants the full details of every contract to be visible to everyone. In some cases, you might want something more like a traditional business contract, where the contract terms are normally only known to the parties.

Can we scale smart contracts? That’s what the Arbitrum team was trying to do. To make clear what the team is doing, Ed describes three areas where someone could do work. Rather than focus on the consensus level, the Arbitrum team focused on scaling the smart contracts.

  • Kalodner, H., Goldfeder, S., Chen, X., Weinberg, S. M., & Felten, E. W. (2018, August). Arbitrum: scalable, private smart contracts. In Proceedings of the 27th USENIX Conference on Security Symposium (pp. 1353-1370). USENIX Association.

How can you scale smart contracts? Ed’s team worked on an off-chain protocol. The work is performed out-of-band by the transacting parties. The computation and storage are done off-chain. All of these things need to be linked back to the chain.

Ed quickly summarizes approaches that have been taken, including SNARKs, Incentivized Verification (TrueBit), and State Channels. He goes more in-depth about TrueBit. In this system of incentivized verifiers, a group of “verifiers” volunteer to check computations. They are rewarded more if they find errors. Anyone can be a verifier, and the reward is split among them. If a computation checked by a verifier is incorrect, the verifier can give an efficient proof of incorrectness.

But there’s a participation dilemma to incentivized verifiers. Imagine a game-theory situation where there are N players, who can pay 1 to participate. Imagine that a participating verifier pretends to be more than one verifier (sybils). In that situation, if you have enough people wearing different sybil masks, people are disincentivized from being a verifier. The creators of TrueBit have shown that their system is “one-shot sybil proof.” As a result, if someone claims to be two people, they get two shares of the reward, but the shares are smaller, so that it would have been more profitable to claim (honestly) to be a single party.

Verification is a repeated game; in these cases, a verifier might sacrifice something in one situation in order to gain over long time. In their paper, Ed and his collaborators shared a game theoretic proof showing that every one-shot sybil proof participation game allows a situation where one verifier can bully all other players into not participating by flooding the system with fake verifiers.

The limits of other approaches show why Ed and his collaborators created Arbitrum, which uses a combination of protocol design, incentives, and a virtual machine infrastructure to carry out scalable, trustworthy smart contracts. Arbitrum starts by assuming an underlying consensus layer, which they call a “verifier.”

The Arbitrum system is built around Managers, who manage a virtual machine that carries out computation and data. Arbitrum provides an “any-trust” guarantee; as long as at least one manager of a VM is honest, the VM will execute correctly according to its code.

Imagine that Bob and Alice are going to play a chess competition. They create code that holds the gold medal, receives alternating moves, verifies the validity of the game, and pays the winner. Bob and Alice put the code onto a VM. Who are the managers in this situation? Alice and Bob can be the managers, and so long as they can hold each other accountable, the contract will work.

How can managers in Arbitrum cooperate to advance the state of a VM? Managers have incentives to agree unanimously about what a VM will do. If they all agree and digitally sign the assertion, the system accepts their assertions, since the system assumes that at least one manager is acting honestly. What if managers dispute the claim? A manager can make an assertion and deposit some funds. Another manager can challenge the assertion and also deposit some funds. If there’s a challenge, the system referees the dispute and takes the deposit of the manager that was lying. When a challenge happens, the asserter divides their assertion in half and the challenger must identify which half of the process was incorrect. Eventually, the dispute is narrowed from a large process into a single instruction. The system can then check the one-instruction claim to find out who’s lying.

By dividing the dispute down to a single instruction, Ed says it’s possible to decide the dispute efficiently in a way that minimizes privacy leaks. He then describes the data structure in Arbitrum that stores the state of a program as a tree of cryptographically-stored information. Conventional virtual machines store code and data in ways that require logarithmic time to verify instructions. First, Arbitrum stores data in fixed sized “tuples” that can be arranged in a tree structure. Second, application code manages the tree rather than the VM emulator. In the typical VM, a single instruction takes O(log n) to execute. In Arbitrum, it takes O(log n) instructions to execute something but each instruction takes constant time. And because Arbitrum narrows down verification to a single instruction, resolving a dispute can take constant time.

The state of a VM is revealed only to the VM’s managers– for example, Alice and Bob would be the only people who need to know what moves were made in the chess game. The only things that appear on the chain are: saltable hashes of the VM state, the number and timing of the steps, and the messages/money sent and received by the VM.

The Arbitrum team has implemented this system with 6,800 lines of Go code, a VM emulator, assembler, and loader. They have an honest manager module that makes and defends assertions. Their proof of concept uses a centralized verifier for simplicity, but you could easily replace this pluggable module that allows multiple verifiers. They also have an Arbitrum standard library.

How well does this scale? Ed describes an example contract, showing that at the high end, Arbitrum can work at roughly a million times the performance of Ethereum. Ed thinks it’s the only system that provides scalability, privacy, and a programmable modules for writing smart contracts.

Questions

After the talk, I asked Ed if collaborations like this are common- ones that bring together game-theoretical mechanism design, cryptography, and algorithm/data structure design. Ed responded that most cryptocurrency work does combine these things. What makes Arbitrum unusual, Ed explained, is the way in which the research team re-designed the VM in a way that makes the protocol more scalable. It’s hard for people to keep all of those things in mind, and Ed says that it’s easy to get things wrong– which is why peer review is so important in cryptocurrency research.

Can Classes on Field Experiments Scale? Lessons from SOC412

Last semester, I taught a Princeton undergrad/grad seminar on the craft, politics, and ethics of behavioral experimentation. The idea was simple: since large-scale human subjects research is now common outside universities, we need to equip students to make sense of that kind of power and think critically about it.

In this post, I share lessons for teaching a class like this and how I’m thinking about next year.

Path diagram from SOC412 lecture on the Social Media Color Experiment

[Read more…]