Archives for 2016

Bitcoin is unstable without the block reward

With Miles Carlsten, Harry Kalodner, and Matt Weinberg, I have a new paper titled On the instability of Bitcoin without the block reward, which Harry will present at ACM CCS next week. The paper predicts that miner incentives will start to go haywire as Bitcoin rewards shift from block rewards to transaction fees, based on theoretical results that closely match up with findings from our new Bitcoin mining simulator.

Bitcoin provides two incentives for miners: block rewards and transaction fees. Currently the vast majority of miner revenues come from block rewards, but in the long run they will come primarily from transaction fees as block rewards dwindle. This design decision has been discussed a lot, but mostly in terms of monetary policy and hardly ever in terms of security. There has been an implicit belief that the transition to transaction fees will not affect the security and stability of the block chain, and in particular that it is immaterial whether miners receive (say) 25 bitcoins as a fixed reward or 25 bitcoins in expectation via transaction fees.

We reexamine this assumption in our paper, and our findings make disturbing news for the future security of Bitcoin and many other cryptocurrencies. Our key insight is that with only transaction fees, the variance of the miner reward is very high due to the randomness of the block arrival time, and it becomes attractive to fork a “wealthy” block to “steal” the rewards therein. [1]

The figure shows a scenario where forking might be more profitable than extending the longest chain. See the paper for a full explanation.
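
To get a feel for the variance at the heart of this argument, here is a minimal simulation sketch (a toy model in the spirit of footnote [1], not our open-source simulator): transaction fees accrue at a uniform rate, blocks arrive as a Poisson process, and each block claims everything accrued since its predecessor. The per-block fee reward then inherits the exponential distribution of the block interarrival time, so its standard deviation is roughly as large as its mean, whereas a fixed block reward has no variance at all.

```python
import random

def simulate_fee_rewards(n_blocks=100_000, fee_rate=1.0, mean_block_time=600.0):
    """Toy model: fees accrue at a constant rate, blocks arrive as a Poisson
    process, and each block claims every fee accrued since its predecessor,
    so the per-block reward is fee_rate times an exponential interarrival time."""
    rewards = [fee_rate * random.expovariate(1.0 / mean_block_time)
               for _ in range(n_blocks)]
    mean = sum(rewards) / n_blocks
    std = (sum((r - mean) ** 2 for r in rewards) / n_blocks) ** 0.5
    return mean, std

mean, std = simulate_fee_rewards()
print(f"mean per-block fee reward: {mean:.1f}  std dev: {std:.1f}")
# The standard deviation is roughly equal to the mean; a fixed block reward
# would have zero variance.
```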

Here’s how things could go wrong. Due to the possibility of profitable forking, the default strategy is no longer best; we lay out a menagerie of interesting and bizarre strategies in the paper. The most worrisome is “undercutting,” where miners capture as little of the available transaction fees as they can get away with, leaving the rest in the pool as an incentive for the next miner to extend their block rather than a competing block.
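
As a deliberately stylized illustration of the undercutting intuition (a toy payoff comparison, not the formal game analyzed in the paper), compare extending the current tip with forking it while leaving just enough fees unclaimed that a myopic next miner prefers the fork:

```python
def prefer_undercut(tip_fees, mempool_fees, epsilon=0.001):
    """Toy comparison of extending the chain tip vs. forking ("undercutting") it.

    Extending: claim all currently unclaimed mempool fees.
    Undercutting: fork the tip and re-claim its fees, but leave slightly more
    than `mempool_fees` unclaimed so that a myopic next miner prefers to build
    on the fork rather than on the original tip.
    """
    extend_reward = mempool_fees
    undercut_reward = (tip_fees + mempool_fees) - (mempool_fees + epsilon)
    return undercut_reward > extend_reward  # roughly: tip_fees > mempool_fees

# A "wealthy" tip that claimed 4.5 BTC in fees, with only 0.3 BTC of new fees
# in the mempool, is a tempting target under this toy model.
print(prefer_undercut(tip_fees=4.5, mempool_fees=0.3))  # True
```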

We also show rigorously that selfish mining gets worse when block rewards are replaced by transaction fees, motivated by the following intuition: if you happen to mine a new block just seconds after the last one was found, you gain nothing by publishing, so you might as well keep it for selfish mining in case you get lucky. The variance in transaction fees enables strategies like this that simply don’t make sense when the block reward is fixed.

If miners switch to these deviant strategies, the blockchain will be much less secure because of the mining power wasted due to constant forking, undercutting, and withholding of found blocks.

We derive most of our results in two separate ways: analytically, i.e., using game theory, and with a new mining simulator that we created. This gives us added confidence in our findings. For example, in one setting, the theory predicts a rather grotesque equilibrium involving the Lambert W function, with the proof running to several pages. Sure enough, in our simulations of the same setting, the Lambert miner does best. We hope that our analytical techniques as well as our simulator will be useful to other researchers. We have made the simulator code open-source.

What is the impact of our findings? The Bitcoin community will probably need to respond to this problem in the long run, potentially via a fork, to discourage deviant strategies. We aren’t predicting that deviant strategies will arise in the short term, and there is a long runway for mitigation steps to be rolled out. The fact that blocks have filled up due to their 1MB limit decreases the variance of transaction fees between different blocks, and this mitigates the problem somewhat, although it is far from a complete and satisfactory solution. For example, at the time of writing our paper, the previous 1000 blocks included per-block transaction fees ranging from 0.03 to 4.51 BTC, with a mean of 0.49 and standard deviation of 0.25 (over half the mean!). So simply maintaining the block-size limit probably won’t resolve the underlying issues.

At a deeper level, our results suggest a fundamental rethinking of the role of block rewards in cryptocurrency design. The prevailing view is that the block reward is a necessary but temporary evil to achieve an initial allocation of coins in the absence of a central authority. The transaction-fee regime is seen as the ideal steady state of the system. But our work shows that incentivizing compliant miner behavior in the transaction fee regime is a significantly more daunting task than in the block reward regime. So perhaps designers of new cryptocurrencies should make the block reward permanent and accept monetary inflation as inevitable. Transaction fees would still exist, but merely as an incentive for miners to include transactions in their blocks.

One final point: there is a science of designing economic incentives so that rational players will behave in a desired way, and it’s called mechanism design. Creators of cryptocurrencies (as well as creators of applications such as the DAO) are essentially doing mechanism design. But mechanism design is hard, and our paper is the latest among many to point out that the mechanisms embedded in cryptocurrencies have flaws. Yet, sadly, the cryptocurrency community is currently disjoint from the mechanism design community. That is why I’m thrilled that mechanism design expert Matt Weinberg, who’s behind all the sophisticated theory in our paper, is joining Princeton’s faculty next semester. Expect more research from us on the mechanism design of cryptocurrencies!

[1] The problems we uncover arise not because transaction fees may arrive erratically, but because blocks inevitably arrive unpredictably. We model transaction fees as arriving at a uniform rate. The rate is non-uniform in practice, which is an additional complication. This is a theme throughout our paper: we show that undesirable behaviors will arise even in simplified, “clean” models. This is bad news both because we think things will probably be worse in practice and because we want cryptocurrency mining games to be analytically tractable. Our work shows that in a transaction-fee regime, predicting behavior will be fiendishly complex.

Update: see also Bryan Ford’s response to this post (and paper).

Open Review leads to better books

My book manuscript, Bit by Bit: Social Research in the Digital Age, is now in Open Review.  That means that while the book manuscript goes through traditional peer review, I also posted it online for a parallel Open Review.  During the Open Review everyone—not just traditional peer reviewers—can read the manuscript and help make it better.

Schematic of Open Review.

Screenshot of Open Review interface.

I think that the Open Review process will lead to better books, higher sales, and increased access to knowledge.  In this blog post, I’d like to describe the feedback that I’ve received during the first month of Open Review and what I’ve learned from the process.

The Effect of DNS on Tor’s Anonymity

This blog post is joint work with Benjamin Greschbach, Tobias Pulls, Laura M. Roberts, and Nick Feamster.

With almost two million daily users and roughly 7,000 relays, the Tor network is the largest anonymity network operating today. The Tor Project maintains a privacy-enhanced version of the popular Firefox web browser—Tor Browser—that bounces its network traffic over the Tor network. By using Tor Browser, users can protect their web browsing from government surveillance, stalkers, advertisement-injecting ISPs, and nosy neighbors. Tor works by decoupling a user’s identity (i.e., her IP address, which reveals where in the world she is) from her activity (e.g., visiting Facebook). When using Tor, your ISP can no longer learn what websites you visit, but it does know that you are using Tor. The website you visit (e.g., Facebook) will also know that you are using Tor, but it will not learn your IP address. The EFF maintains a great interactive diagram illustrating information leakage in several scenarios.

While the use of Tor constitutes a significant privacy gain over off-the-shelf web browsers, it is no panacea, and the Tor Project is upfront about its limitations. These limitations are not news to the research community. It is well understood that low-latency anonymity networks such as Tor cannot protect against so-called global passive adversaries. We define such adversaries as those able to monitor both the traffic that enters the network and the traffic that exits it. Such an adversary can run a correlation attack: it can match packets that go into the network to packets that leave it, or in other words, link a client’s identity (her IP address) to her activity (e.g., visiting Facebook), and thus break anonymity. By design, Tor cannot protect against these global passive adversaries. However, there is still merit in understanding which organizations are in a position to launch such attacks and how practical these attacks are. Only then can we gain an understanding of how much of a real-world threat these attacks constitute.
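
To make the idea of a correlation attack concrete, here is a deliberately simplified sketch (a toy illustration, not the methodology of any particular study): suppose the adversary records per-second byte counts for a flow it sees entering a guard relay and for several candidate flows it sees leaving exit relays. Simply correlating the volume time series is often enough to link the entering flow to the right exiting flow.

```python
import numpy as np

def pearson(ingress, egress):
    """Pearson correlation between two equal-length per-second byte-count series."""
    a, b = np.asarray(ingress, dtype=float), np.asarray(egress, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])

def best_match(ingress, egress_flows):
    """Return the exiting flow whose traffic pattern best matches the entering
    flow, i.e., link a client's identity to her activity."""
    return max(egress_flows, key=lambda name: pearson(ingress, egress_flows[name]))

# Hypothetical per-second byte counts observed at both edges of the network.
entering = [12000, 800, 45000, 300, 9000, 22000]
exiting = {
    "facebook.com": [11800, 750, 44500, 350, 8800, 21500],
    "example.org":  [500, 500, 600, 480, 520, 510],
}
print(best_match(entering, exiting))  # "facebook.com"
```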

The research community has put a lot of effort into modeling correlation attacks and their effect on Tor users. To date, these studies have modeled mostly web traffic, i.e., HTTP over port 80 and HTTPS over port 443, which makes sense as web traffic likely constitutes the bulk of a typical Tor user’s online activity. However, there is more to network traffic than just web traffic. Before a browser can fetch a web page such as Facebook, it has to send out a DNS request to learn that the domain facebook.com is reachable via its IP address 173.252.120.68 (among others). DNS traffic is ubiquitous, both over and outside of Tor, so how does this relate to correlation attacks? DNS is a very different beast than HTTP, which is what most research has studied. HTTP traffic travels from a user’s exit relay directly to one or more web servers. DNS traffic, however, can traverse very different paths, depending on how exit relays (the relays in the Tor network that provide the link between Tor and the Internet) are configured. In particular, we find that DNS traffic traverses many networks that are entirely different than the networks that subsequent web traffic traverses. This means that past research likely underestimated the threat of correlation attacks because there are entities on the Internet such as ISPs, autonomous systems, or Internet exchange points that can monitor some DNS traffic but not web traffic coming out of the Tor network and potentially use the DNS traffic to deanonymize Tor users.
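
As a small illustration of why the resolver configuration matters, here is a minimal sketch using the dnspython library (the “local resolver” address below is a placeholder from the documentation range; 8.8.8.8 is Google’s public resolver): whichever resolver an exit relay forwards the query to determines which third party gets to observe the lookup.

```python
import dns.resolver  # pip install dnspython

# The same lookup becomes visible to very different parties depending on
# which resolver the exit relay is configured to use.
for label, nameserver in [("local resolver (placeholder)", "192.0.2.53"),
                          ("Google public DNS", "8.8.8.8")]:
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [nameserver]
    try:
        answer = resolver.resolve("facebook.com", "A")
        print(label, "->", [rr.address for rr in answer])
    except Exception as exc:
        print(label, "-> lookup failed:", exc)
```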

Overview of the concept: Past traffic correlation studies have focused on linking the TCP stream entering the Tor network to the one(s) exiting the network. We show that an adversary can also link the associated DNS traffic, which can be exposed to many more autonomous systems than the TCP stream.

After many months of work, we have published a research paper on how DNS affects Tor’s anonymity. In our work, we show how an adversary can combine monitored DNS requests with well-understood website fingerprinting attacks to create a new type of DNS-enhanced correlation attack, or DNS-enhanced website fingerprinting attack, depending on how you look at it. We know of several organizations that are in a position to use these attacks, notably Google. Because it operates a popular open DNS resolver, the company can already observe many DNS requests. Additionally, Google can monitor some network traffic that is entering the Tor network: for example, via Google Fiber, via guard relays that are occasionally run in Google’s cloud, and formerly via meek app engine, which is now defunct.

Our results show that by monitoring DNS traffic, an attacker can enhance a website fingerprinting attack to be highly reliable. When our new, combined attack identifies a website, it is highly likely to be correct for a large number of websites on the Internet. Furthermore, we show that our new type of attack works well on websites that are infrequently visited over the entire Tor network. Presumably, this includes censored websites in different regions of the world and websites for whistleblowers and activists, who are in many ways the Tor users most in need of protection. Beyond further motivating the deployment of website fingerprinting defenses in Tor, our attack has implications for the design of such defenses: they should probably be stronger than previously thought.
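
The way DNS observations sharpen a fingerprinting attack can be sketched roughly as follows (this is only the high-level filtering idea; the paper describes the actual classifiers and evaluation): act on a fingerprinting guess only when the corresponding domain also shows up in the DNS traffic the adversary monitors for that user around the same time.

```python
def dns_confirmed_guess(fingerprint_scores, observed_dns_domains, threshold=0.9):
    """Combine a website-fingerprinting classifier with monitored DNS requests.

    fingerprint_scores: dict mapping candidate domains to classifier confidence
    for a single Tor connection (hypothetical input format).
    observed_dns_domains: domains the adversary saw being resolved (e.g., at an
    open resolver) in the same time window.

    Requiring both signals to agree trades some recall for much higher
    precision, which is what makes the combined attack reliable.
    """
    domain, score = max(fingerprint_scores.items(), key=lambda kv: kv[1])
    if score >= threshold and domain in observed_dns_domains:
        return domain
    return None  # abstain rather than risk a false positive

print(dns_confirmed_guess({"wikileaks.org": 0.95, "nytimes.com": 0.60},
                          {"wikileaks.org", "twitter.com"}))  # wikileaks.org
```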

Our study also provides new insight into the current state of exit relays in the Tor network: We find that a significant fraction of exit relays send DNS requests to Google’s public resolvers. Google sees about one-third of DNS requests that exit from the Tor network—an alarmingly high fraction for a single company, particularly because Tor’s very design avoids centralized points of control and observation. Although Tor is reasonably decentralized, our work shows that this does not hold for the wider ecosystem that Tor exists in. We also simulate the Internet-scale implications of our work using the Tor Path Simulator (TorPS). This allows us to understand what-if scenarios, such as the implications for Tor users if all exit relays were to run their own local resolvers. Our results show that while running local DNS resolvers on exit relays has its benefits, it also means that DNS requests are very exposed to network-level adversaries.

So what does our work mean for Tor users? We believe that there is no immediate cause for concern. Adversaries that can already monitor large fractions of the Internet—for many people, the biggest threat—will not do any better with our attack. Instead, we investigate how “semi-global” adversaries can get the most out of the data they have. Finally, the Tor Project is already working on techniques to make website fingerprinting attacks harder.

Our project page has more information about our study. Together with the research paper that is now under review, we also publish our code, data, and detailed instructions to replicate our work.

My testimony before the House Subcommittee on IT

I was invited to testify yesterday before the U.S. House of Representatives Subcommittee on Information Technology, at a hearing entitled “Cybersecurity: Ensuring the Integrity of the Ballot Box.”  My written testimony is available here.  My 5-minute opening statement went as follows:

My name is Andrew Appel.  I am Professor of Computer Science at Princeton University.   In this testimony I do not represent my employer. I’m here to give my own professional opinions as a scientist, but also as an American citizen who cares deeply about protecting our democracy.

My research is in software verification, computer security, technology policy, and election machinery.  As I will explain, I strongly recommend that, at a minimum, the Congress seek to ensure the elimination of Direct-Recording Electronic voting machines (sometimes called “touchscreen” machines), immediately after this November’s election; and that it require sensible auditing after every election to ensure that systems are functioning properly and to prove to the American people that their votes are counted as cast.

There are cybersecurity issues in all parts of our election system:  before the election, voter-registration databases; during the election, voting machines; after the election, vote-tabulation / canvassing / precinct-aggregation computers.  In my opening statement I’ll focus on voting machines.  The other topics are addressed in a recent report I have co-authored entitled “Ten Things Election Officials Can Do to Help Secure and Inspire Confidence in This Fall’s Elections.”

In the U.S. we use two kinds of voting machines: optical scanners that count paper ballots, and “touchscreen” voting machines, also called “Direct-Recording Electronic.”  Each voting machine is a computer, running a computer program.  Whether that computer counts the votes accurately, or makes mistakes, or cheats by shifting votes from one candidate to another, depends on what software is installed in the computer.  We all use computers, and we’ve all had occasion to install new software.  Sometimes it’s an app we purchase and install on purpose, sometimes it’s a software upgrade sent by the company that made our operating system.  Installing new software in a voting machine is not really much different from installing new software in any other kind of computer.

Installing new software is how you hack a voting machine to cheat. In 2009, in the courtroom of the Superior Court of New Jersey,  I demonstrated how to hack a voting machine.  I wrote a vote-stealing computer program that shifts votes from one candidate to another.   Installing that vote-stealing program in a voting machine takes 7 minutes, per machine, with a screwdriver.  I did this in a secure facility and I’m confident my program has not leaked out to affect real elections, but really the software I built was not rocket science — any computer programmer could write the same code.  Once it’s installed, it could steal elections without detection for years to come.

Voting machines are often delivered to polling places several days before the election—to elementary schools, churches, firehouses.  In these locations anyone could gain access to a voting machine for 10 minutes.  Between elections the machines are routinely opened up for maintenance by county employees or private contractors.  Let’s assume they have the utmost integrity, but still, in the U.S. we try to run our elections so that we can trust the election results without relying on any one individual.

Other computer scientists have demonstrated similar hacks on many models of machine. This is not just one glitch in one manufacturer’s machine, it’s the very nature of computers.

So how can we trust our elections when it’s so easy to make the computers cheat?  Forty states already know the answer:  vote on optical-scan paper ballots.  (My written testimony clarifies this statement.)  The voter fills in the bubble next to the name of their preferred candidate, then takes this paper ballot to the scanner—right there in the precinct—and feeds it in.  That opscan voting machine has a computer in it, and we can’t 100% prevent the computer from being hacked, but that very paper ballot marked by the voter drops into a sealed ballot box under the opscan machine.  Those ballots can be recounted by hand, in a way we can trust.

Unfortunately, there are still about 10 states that primarily use paperless touchscreen voting computers.  There’s no paper ballot to recount.  After the voter touches the screen, we have to rely on the computer—that is, we have to rely on whatever program is installed in the computer that day—to print out the true totals when the polls close.

So what must we do?  In the near term, we must not connect the voting machines to the Internet.  The same goes for those computers used to prepare the electronic ballot definition files before each election, that are used to program the voting machines—that is, we must not connect the voting machines even indirectly to the Internet.  Many able and competent election administrators already follow this “best practice.”  I hope that all 9000 counties and states that run elections follow this practice, and other security best practices, but it’s hard to tell whether they all do.

These and other best practices can help protect against hacking of voting machines by people in other countries through the Internet.  But they can’t protect us from mistakes, software bugs, miscalibration, insider hacking, or against local criminals with access to the machines before or after elections.  So what we must do as soon as possible after November is to adopt nationwide what 40 states have already done: paper ballots, marked by the voter, countable by computer but recountable by hand.

In 2000 we all saw what a disastrously unreliable technology those punch-card ballots were.  So in 2002 the Congress outlawed punch-card ballots, and that was very appropriate.  I strongly recommend that the Congress seek to ensure the elimination of paperless “touchscreen” voting machines, immediately after this November’s election.

Are you really anonymous online?

As you browse the internet, online advertisers track nearly every site you visit, amassing a trove of information on your habits and preferences. When you visit a news site, they might see you’re a fan of basketball, opera and mystery novels, and accordingly select ads tailored to your tastes. Advertisers use this information to create highly personalized experiences, but they typically don’t know exactly who you are. They observe only your digital trail, not your identity itself, and so you might feel that you’ve retained a degree of anonymity.

In new work with Ansh Shukla, Sharad Goel and Arvind Narayanan, we show that these anonymous web browsing records can in fact often be tied back to real-world identities. (Check out our demo, and see if we can figure out who you are.)

At a high level, our approach is based on a simple observation. Each person has a highly distinctive social network, composed of family and friends from school, work, and various stages of one’s life. As a consequence, the set of links in your Facebook and Twitter feeds is likewise highly distinctive, and clicking on these links leaves a tell-tale mark in your browsing history.

Given only the set of web pages an individual has visited, we determine which social media feeds are most similar to it, yielding a list of candidate users who likely generated that web browsing history. In this manner, we can tie a person’s real-world identity to the nearly complete set of links they have visited, including links that were never posted on any social media site. This method requires only that a person click on the links appearing in their social media feeds, not that they post any content.

Carrying out this strategy involves two key challenges, one theoretical and one engineering. The theoretical problem is quantifying how similar a specific social media feed is to a given web browsing history. One simple similarity measure is the fraction of links in the browsing history that also appear in the feed. This metric works reasonably well in practice, but it overstates similarity for large feeds, since those simply contain more links. We instead take an alternative approach. We posit a stylized, probabilistic model of web browsing behavior, and then compute the likelihood a user with that social media feed generated the observed browsing history. It turns out that this method is approximately equivalent to scaling the fraction of history links that appear in the feed by the log of the feed size.
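
A stripped-down version of that score might look like the following sketch (treating the history and each feed as plain sets of links, and using one rough reading of the “discount large feeds by a log factor” idea rather than the paper’s exact likelihood formula):

```python
import math

def similarity(history, feed):
    """Toy similarity between a browsing history and a social media feed.

    Both arguments are sets of links. The raw overlap fraction overstates
    similarity for large feeds, so it is discounted by a term that grows
    logarithmically with the feed size (a rough stand-in for the
    likelihood-based score described above).
    """
    if not history or not feed:
        return 0.0
    overlap = len(history & feed) / len(history)
    return overlap / math.log(2 + len(feed))

def rank_candidates(history, candidate_feeds, k=10):
    """Rank candidate users by how well their feed explains the history."""
    ranked = sorted(candidate_feeds,
                    key=lambda user: similarity(history, candidate_feeds[user]),
                    reverse=True)
    return ranked[:k]
```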

The engineering challenge is identifying the most similar feeds in real time. Here we turn to Twitter, since Twitter feeds (in contrast to Facebook) are largely public. However, even though the feeds are public, we cannot simply create a local copy of Twitter against which we can run our queries. Instead we apply a series of heuristics to dramatically reduce the search space. We then combine caching techniques with on-demand network crawls to construct the feeds of the most promising candidates. On this reduced candidate set, we apply our similarity measure to produce the final results. Given a browsing history, we can typically carry out this entire process in under 60 seconds.

Our initial tests indicate that for people who regularly browse Twitter, we can deduce their identity from their web browsing history about 80% of the time. Try out our web application, and let us know if it works on you!