Ed Felten's blog

It’s Time for India to Face its E-Voting Problem

The unjustified arrest of Indian e-voting researcher Hari Prasad, while an ordeal for Prasad and his family, and an embarrassment to the Indian authorities, has at least helped to focus attention on India’s risky electronic voting machines (EVMs).

Sadly, the Election Commission of India, which oversees the country’s elections, is still sticking to its position that the machines are “perfect” and “fully tamperproof”, despite evidence to the contrary including convincing peer-reviewed research by Prasad and colleagues, not to mention the common-sense fact that no affordable electronic device can ever hope to be perfect or tamperproof. The Election Commission can no longer plausibly deny that EVM vulnerabilities exist. The time has come for India to have an honest, public conversation about how it votes.

The starting point for this discussion must be to recognize the vulnerabilities of EVMs. Like paper ballots, the ballots stored in an EVM are subject to tampering during and after the election, unless they are monitored carefully. But EVMs, unlike paper ballots, are also subject to tampering before the election, perhaps months or years in advance. Indeed, for many EVMs these pre-election vulnerabilities are the most serious problem.

So which voting system should India use? That's a question for the nation to decide based on its own circumstances, but it appears there is no simple answer. The EVMs have problems, and old-fashioned paper ballots have their own problems. Despite noisy claims to the contrary from various sides, showing that one is imperfect does not prove that the other must be used. Most importantly, the debate must recognize that there are more than two approaches -- for example, most U.S. jurisdictions are now moving to systems that combine paper and electronics, such as precinct-count optical scan systems in which the voter marks a paper ballot that is immediately read by an electronic scanner. Whether a similar system would work well for India remains an open question, but there are many options, including new approaches that haven't been invented yet, and India will need to do some serious analysis to figure out what is best.

To find the best voting system for India, the Election Commission will need all of the help it can get from India's academic and technical communities. It will especially need help from people like Hari Prasad. Getting Prasad out of jail and back to work in his lab would not only serve justice -- which should be reason enough to free him -- but would also serve the voters of India, who deserve a better voting system than they have.

Tagged:  

My Experiment with "Digital Drugs"

The latest scare meme is "digital drugs" or "i-dosing", in which kids listen to audio tracks that supposedly induce altered mental states. Concerned adults fear that these "digital drugs" may be a gateway to harder (i.e., actual) drugs. Rumors are circulating among some kids: "I heard it was like some weird demons and stuff through an iPod". In a way, it's a perfect storm of scare memes, involving (1) "drugs", (2) the Internet, and (3) kids listening to freaky music.

When I heard about these "digital drugs", I naturally had to try them, in the interest of science.

(All joking aside, I only did this because I knew it was safe and legal. I don't like to mess with my brain. I rely on my brain to make my living. Without my brain, I'd be ... a zombie, I guess.)

I downloaded a "digital drug" track, donned good headphones, lay down on my bed, closed my eyes, blanked my mind, and pressed "play". What I heard was a kind of droning noise, accompanied by a soft background hiss. It was not unlike the sound of a turboprop airplane during post-takeoff ascent, with two droning engines and the soft hiss of a ventilation fan. This went on for about fifteen minutes, with the drone changing pitch every now and then. That was it.

Did this alter my consciousness? Not really. If anything, fifteen minutes of partial sensory deprivation (eyes closed, hearing nothing but droning and hissing) might have put me in a mild meditative state, but frankly I could have reached that state more easily without the infernal droning, just by lying still and blanking my mind.

Afterward I did some web surfing to try to figure out why people think these sounds might affect the brain. To the extent there is any science at all behind "digital drugs", it involves playing sounds of slightly different frequencies into your two ears, thereby supposedly setting up a low-frequency oscillation in the auditory centers of your brain, which will supposedly interact with your brain waves that operate at a very similar frequency. This theory could be hooey for all I know, but it sounds kind of science-ish so somebody might believe it. I can tell you for sure that it didn't work on me.

So, kids: don't do digital drugs. They're a waste of time. And if you don't turn down the volume, you might actually damage your hearing.

Tagged:  

Identifying Trends that Drive Technology

I’m trying to compile a list of major technological and societal trends that influence U.S. computing research. Here’s my initial list. Please post your own suggestions!

  • Ubiquitous connectivity, and thus true mobility
  • Massive computational capability available to everyone, through the cloud
  • Exponentially increasing data volumes – from ubiquitous sensors, from higher-volume sensors (digital imagers everywhere!), and from the creation of all information in digital form – has led to a torrent of data which must be transferred, stored, and mined: “data to knowledge to action”
  • Social computing – the way people interact has been transformed; the data we have from and about people is transforming
  • All transactions (from purchasing to banking to voting to health) are online, creating the need for dramatic improvements in privacy and security
  • Cybercrime
  • The end of single-processor performance increases, and thus the need for parallelism to increase performance in operating systems and productivity applications, not just high-end applications; also power issues
  • Asymmetric threats, need for surveillance, reconnaissance
  • Globalization – of innovation, of consumption, of workforce
  • Pressing national and global challenges: climate change, education, energy / sustainability, health care (these replace the cold war)

What’s on your list? Please post below!

[cross-posted from CCC Blog]

Tagged:  

The Stock-market Flash Crash: Attack, Bug, or Gamesmanship?

Andrew wrote last week about the stock market's May 6 "flash crash", and whether it might have been caused by a denial-of-service attack. He points to a detailed analysis by nanex.com that unpacks what happened and postulates a DoS attack as a likely cause. The nanex analysis is interesting and suggestive, but I see the situation as more complicated and even more interesting.

Before diving in, two important caveats: First, I don't have access to raw data about what happened in the market that day, so I will accept the facts as posited by nanex. If nanex's description is wrong or incomplete, my analysis won't be right. Second, I am not a lawyer and am not making any claims about what is lawful or unlawful. With that out of the way ...

Here's a short version of what happened, based on the nanex data:
(1) Some market participants sent a large number of quote requests to the New York Stock Exchange (NYSE) computers.
(2) The NYSE normally puts outgoing price quotes into a queue before they are sent out. Because of the high rate of requests, this queue backed up, so that some quotes took a (relatively) long time to be sent out.
(3) A quote lists a price and a time. The NYSE determined the price at the time the quote was put into the queue, and timestamped each quote at the time it left the queue. When the queues backed up, these quotes would be "stale", in the sense that they had an old, no-longer-accurate price --- but their timestamps made them look like up-to-date quotes.
(4) These anomalous quotes confused other market participants, who falsely concluded that a stock's price on the NYSE differed from its price on other exchanges. This misinformation destabilized the market.
(5) The faster a stock's price changed, the more out-of-kilter the NYSE quotes would be. So instability bred more instability, and the market dropped precipitously.

The first thing to notice here is that (assuming nanex has the facts right) there appears to have been a bug in the NYSE's system. If a quote goes out with price P and time T, recipients will assume that the price was P at time T. But the NYSE system apparently generated the price at one time (on entry to the queue) and the timestamp at another time (on exit from the queue). This is wrong: the timestamp should have been generated at the same time as the price.

But notice that this kind of bug won't cause much trouble under normal conditions, when the queue is short so that the timestamp discrepancy is small. The problem might not have be noticed in normal operation, and might not be caught in testing, unless the testing procedure takes pains to create a long queue and to check for the consistency of timestamps with prices. This looks like the kind of bug that developers dread, where the problem only manifests under unusual conditions, when the system is under a certain kind of strain. This kind of bug is an accident waiting to happen.

To see how the accident might develop and be exploited, let's consider the behavior of three imaginary people, Alice, Bob, and Claire.

Alice knows the NYSE has this timestamping bug. She knows that if the bug triggers and the NYSE starts issuing dodgy quotes, she can make a lot of money by exploiting the fact that she is the only market participant who has an accurate view of reality. Exploiting the others' ignorance of real market conditions---and making a ton of money---is just a matter of technique.

Alice acts to exploit her knowledge, deliberately triggering the NYSE bug by flooding the NYSE with quote requests. The nanex analysis implies that this is probably what happened on May 6. Alice's behavior is ethically questionable, if not illegal. But, the nanex analysis notwithstanding, deliberate triggering of the bug is not the only possibility.

Bob also knows about the bug, but he doesn't go as far as Alice. Bob programs his systems to exploit the error condition if it happens, but he does nothing to cause the condition. He just waits. If the error condition happens naturally, he will exploit it, but he'll take care not to cause it himself. This is ethically superior to a deliberate attack (and might be more defensible legally).

(Exercise for readers: Is it ethical for Bob to deliberately refrain from reporting the bug?)

Claire doesn't know that the NYSE has a bug, but she is a very careful programmer, so she writes code that watches other systems for anomalous behavior and ignores systems that seem to be misbehaving. When the flash crash occurs, Claire's code detects the dodgy NYSE quotes and ignores them. Claire makes a lot of money, because she is one of the few market participants who are not fooled by the bad quotes. Claire is ethically blameless --- her virtuous programming was rewarded. But Claire's trading behavior might look a lot like Alice's and Bob's, so an investigator might suspect Claire of unethical or illegal behavior.

Notice that even if there are no Alices or Bobs, but only virtuous Claires, the market might still have a flash crash and people might make a lot of money from it, even in the absence of a denial-of-service attack or indeed of any unethical behavior. The flood of quote requests that trigged the queue backup might have been caused by another bug somewhere, or by an unforeseen interaction between different systems. Only careful investigation will be able to untangle the causes and figure out who is to blame.

If the nanex analysis is at all correct, it has sobering implications. Financial markets are complex, and when we inject complex, buggy software into them, problems are likely to result. The May flash crash won't be the last time a financial market gyrates due to software problems.

Tagged:  

How Not to Fix Soccer

With the World Cup comes the quadrennial ritual in which Americans try to redesign and improve the rules of soccer. As usual, it’s a bad idea to redesign something you don’t understand---and indeed, most of the proposed changes would be harmful. What has surprised me, though, is how rarely anyone explains the rationale behind soccer’s rules. Once you understand the rationale, the rules will make a lot more sense.

So here’s the logic underlying soccer’s rules: the game is supposed to scale down, so that an ordinary youth or recreation-league game can be played under the exact same rules used by the pros. This means that the rules must be designed so that the game can be run by a single referee, without any special equipment such as a scoreboard.

Most of the popular American team sports don’t scale down in this way. American football, basketball, and hockey --- the most common inspirations for “reformed” soccer rules --- all require multiple referees and special equipment. To scale these sports down, you have to change the rules. For example, playground basketball has no shot clock, no counting of fouls, and nonstandard rules for awarding free throws and handling restarts---it’s fun but it’s not the same game the Lakers play. Baseball is the one popular American spectator sport that does scale down.

The scaling principle accounts for soccer’s seemingly odd timekeeping. The clock isn’t stopped and started, because we can’t assume a separate timekeeping official and we don’t want to burden the referee’s attention with a lot of clock management. The time is not displayed to the players, because we can’t assume the availability of a scoreboard. And because the players don’t know the exact remaining time, the referee gives the players some leeway to finish an attack even if the nominal finishing time has been reached. Most of the scalable sports lack a clock --- think of baseball and volleyball --- but soccer manages to reconcile a clock with scalability. Americans often want to “fix” this by switching to a scheme that requires a scoreboard and timekeeper.

The scaling principle also explains the system of yellow and red cards. A hockey-style penalty box system requires special timing and (realistically) a special referee to manage the penalty box and timer. Basketball-style foul handling allows penalties to mount up as more fouls are committed by the same player or team, which is good, but it requires elaborate bookkeeping to keep track of fouls committed by each player and team fouls per half. We don’t want to make the soccer referee keep such detailed records, so we simply ask him to record yellow and red cards, which are rare. He uses his judgment to decide when repeated fouls merit a yellow card. This may seem arbitrary in particular cases but it does seem fair on average. (There’s a longer essay that could be written applying the theory of efficient liability regimes to the design of sports penalties.)

It’s no accident, I think, that scalable sports such as soccer and baseball/softball are played by many Americans who typically watch non-scalable sports. There’s something satisfying about playing the same game that the pros play. So, my fellow Americans, if you’re going to fix soccer, please keep the game simple enough that the rest of us can still play it.

Tagged:  

NJ Voting Machines Left Unattended, Despite Court Opinion

It's Election Day in New Jersey. Longtime readers know that in advance of elections I visit polling places in Princeton, looking for voting machines left unattended, where they are vulnerable to tampering. In the past I have always found unattended machines in multiple polling places.

I hoped this time would be different, given that Judge Feinberg, in her ruling on the New Jersey voting machine case, urged the state not to leave voting machines unattended in public.

Despite the judge's ruling, I found voting machines unattended in three of the four Princeton polling places I visited on Sunday and Monday. Here are my photos from three polling places.

This morning I cast my ballot on one of these machines.

Privacy Theater

I have a piece in today's NY Times "Room for Debate" feature, on whether the government should regulate Facebook. In writing the piece, I was looking for a pithy way to express the problems with today's notice-and-consent model for online privacy. After some thought, I settled on "privacy theater".

Bruce Schneier has popularized the term "security theater," denoting security measures that look impressive but don't actually protect us---they create the appearance of security but not the reality. When a security guard asks to see your ID but doesn't do more than glance at it, that's security theater. Much of what happens at airport checkpoints is security theater too.

Privacy theater is the same concept, applied to privacy. Facebook's privacy policy runs to almost 6000 words of dense legalese. We are all supposed to have read it and agreed to accept its terms. But that's just theater. Hardly any of us have actually read privacy policies, and even fewer consider carefully their provisions. As I wrote in the Times piece, we pretend to have read sites' privacy policies, and the sites pretend that we have understood and consented to all of their terms. It's privacy theater.

Worse yet. privacy policies are subject to change. When sites change their policies, we get another round of privacy theater, in which sites pretend to notify us of the changes, and we pretend to consider them before continuing our use of the site.

And yet, if we're going to replace the notice-and-consent model, we need something else to put in its place. At this point, It's hard to see what that might be. It might help to set up default rules, on the theory that a policy that states how it differs from the default might be shorter and simpler than a stand-alone policy, but that approach will only go so far.

In the end, we may be stuck with privacy theater, just as we're often stuck with security theater. If we can't provide the reality of privacy or security, we can settle for theater, which at least makes us feel a bit better about our vulnerability.

Tagged:  

Developing Texts Like We Develop Software

Recently I was asked to speak at a conference for university librarians, about how the future of academic publication looks to me as a computer scientist. It's an interesting question. What do computer scientists have to teach humanists about how to write? Surely not our elegant prose style.

There is something distinctive about how computer scientists write: we tend to use software development tools to "develop" our texts. This seems natural to us. A software program, after all, is just a big text, and the software developers are the authors of the text. If a tool is good for developing the large, complex, finicky text that is a program, why not use it for more traditional texts as well?

Like software developers, computer scientist writers tend to use version control systems. These are software tools that track and manage different versions of a text. What makes them valuable is not just the ability to "roll back" to old versions -- you can get that (albeit awkwardly) by keeping multiple copies of a file. The big win with version control tools is the level of control they give you. Who wrote this line? What did Joe write last Tuesday? Notify me every time section 4 changes. Undo the changes Fred made last Wednesday, but leave all subsequent changes in place. And so on. Version control systems are a much more powerful relative of the "track changes" and "review" features of standard word processors.

Another big advantage of advanced version control is that it enables parallel development, a style of operation in which multiple people can work on the text, separately, at the same time. Of course, it's easy to work in parallel. What's hard is to merge the parallel changes into a coherent final product --- which is a huge pain in the neck with traditional editing tools, but is easy and natural with a good version control system. Parallel development lets you turn out a high-quality product faster --- it's a necessity when you have hundred or thousands of programmers working on the same product --- and it vastly reduces the amount of human effort spent on coordination. You still need coordination, of course, but you can focus it where it matters, on the conceptual clarity of the document, without getting distracted by version-wrangling.

Interestingly, version control and parallel development turn out to be useful even for single-author works. Version control lets you undo your mistakes, and to reconstruct the history of a problematic section. Parallel development is useful if you want to try an experiment --- what happens if I swap sections 3 and 4? --- and try out this new approach for a while yet retain the ability to accept or reject the experiment as a whole. These tools are so useful that experienced computer scientists tend to use them to write almost anything longer than a blog post.

While version control and parallel development have become standard in computer science writing, there are other software development practices that are only starting to cross the line into CS writing: issue tracking and the release early and often strategy.

Issue tracking systems are used to keep track of problems, bugs, and other issues that need to be addressed in a text. As with version control, you can do this manually, or rely on a simple to-do list, but specialized tools are more powerful and give you better control and better visibility into the past. As with software, issues can range from small problems (our terminology for X is confusing) to larger challenges (it would be nice if our dataset were bigger).

"Release early and often" is a strategy for rapidly improving a text by making it available to users (or readers), getting feedback, and rapidly turning out a new version that addresses the feedback. Users' critiques become issues in the issue tracking system; authors modify the text to address the most urgent issues; and a new version is released as soon as the text stabilizes. The result is rapid improvement, aligned with the true desires of users. This approach requires the right attitude from users, who need to be willing to tolerate problems, in exchange for a promise that their critiques will be addressed promptly.

What does all of this mean for writers who are not computer scientists? I won't be so bold as to say that the future of writing will be just exactly like software development. But I do think that the tools and techniques of software development, which are already widely used by computer scientist writers, will diffuse into more common usage. It will be hard to retrofit them into today's large, well-established editing software, but as writing tools move into the cloud, I wouldn't be surprised to see them take on more of the attributes of today's software development tools.

One consequence of using these tools is that you end up with a fairly complete record of how the text developed over time, and why. Imagine having a record like that for the great works of the past. We could know what the writer did every hour, every day while writing. We could know which issues and problems the author perceived in earlier versions of the text, and how these were addressed. We could know which issues the author saw as still unfixed in the final published text. This kind of visibility will be available into our future writing -- assuming we produce works that are worthy of study.

Tagged:  

India's Electronic Voting Machines Have Security Problems

A team led by Hari Prasad, Alex Halderman, and Rop Gonggrijp released today a technical paper detailing serious security problems with the electronic voting machines (EVMs) used in India.

The independent Electoral Commission of India, which is generally well respected, has dealt poorly with previous questions about EVM security. The chair of the Electoral Commission has called the machines "infallible" and "perfect" and has rejected any suggestion that security improvements are even possible. I hope the new study will cause the EC to take a more realistic approach to EVM security.

The researchers got their hands on a real Indian EVM which they were able to examine and analyze. They were unable to extract the software running in the machine (because that would have required rendering the machine unusable for elections, which they had agreed not to do) so their analysis focused on the hardware. They were able to identify several attacks that manipulated the hardware, either by replacing components or by clamping something on to a chip on the motherboard to modify votes. They implemented demonstration attacks, actually building proof-of-concept substitute hardware and vote-manipulation devices.

Perhaps the most interesting aspect of India's EVMs is how simple they are. Simplicity is a virtue in security as in engineering generally, and researchers (including me) who have studied US voting machines have advocated simplifying their design. India's EVMs show that while simplicity is good, it's not enough. Unless there is some way to audit or verify the votes, even a simple system is subject to manipulation.

If you're interested in the details, please read the team's paper.

The ball is now in the Election Commission's court. Let's hope that they take steps to address the EVM problems, to give the citizens of the world's largest democracy the transparent and accurate elections they deserve.

Tagged:  

Needle-in-a-Haystack Problems, and P vs. NP

Last week I wrote about needle-in-a-haystack problems, in which it's hard to find the solution but if somebody tells you the solution it's easy to verify. A commenter asked whether such problems are related to the P vs. NP problem, which is the most important unsolved problem in theoretical computer science. It turns out that they are related, and that needle-in-a-haystack problems are a nice framework for explaining the P vs. NP problem, which few non-experts seem to understand.

Stated simply, the P vs. NP problem asks whether needle-in-a-haystack computational problems can exist. To understand what this means, we need to take a short detour into computer science theoryland. It's perfectly safe, but stay close to the group so you don't wander off into a finite field.... Okay, let's meet P and NP.

P is the set of problems that can be solved efficiently. The precise theoretical definition of "efficient" is a bit subtle: we define the "size" of a problem to be the number of characters needed to write down the problem; and we say a solution method is efficient if, when the problem is large, the method can finish in a length of time that is less than the problem size raised to some power.

For example, suppose we are given two N-digit numbers, A and B, and a 2N-digit number C, and we're asked whether A times B equals C. It requires 4N characters to write down the problem, one character for each digit in each of the numbers. We can solve the problem by multiplying A and B, using the multiplication method we learned in school, and then comparing the result to C. The multiplication will take us roughly N-squared steps, and the comparison is faster than that. So the solution time will grow roughly as N-squared, and if N is large this is less than the third power of the problem size (i.e., less than 4N raised to the third power). Therefore multiplication is in P, which matches our intuition that we know how to multiply efficiently.

Here's another example, the "traveling salesperson problem" (TSP): given a list of cities, a table showing the airfare between each pair of cities, and a total travel budget, is there an itinerary that visits every city on the list and returns to where it started, without exceeding the total travel budget? One way to solve this problem is to try out all of the possible itineraries, and see if there's one that is cheap enough. That works, but it is not efficient, because its running time grows faster than any fixed power of the problem size. No efficient algorithm for the TSP is known. There might be one, but if there is one we haven't discovered it yet. So we don't know if the TSP is in P.

You might have guessed by now that P stands for "polynomial", as in "solvable in polynomial time". That is indeed the origin of the name P. And you might go on to guess that NP stands for "not polynomial" -- but that would be wrong. If you must know, NP stands for "nondeterministic polynomial". I won't bother explaining what "nondeterministic" means here, because that has confused generations of students. Instead, let's use an alternative, but equivalent, definition of NP.

NP is the set of problems having yes/no answers, for which, whenever the correct answer is "yes", there is some "hint" that allows us to verify the "yes" answer efficiently. Think about the TSP: if somebody tells you a cheap itinerary, you can verify efficiently that that itinerary visits all of the cities, ends where it started, and costs less than the travel budget. So the TSP is in NP.

Multiplication is in NP as well. Given any hint ("booga booga", say) we can verify that A times B equals C, by simply ignoring the hint, and then multiplying and comparing as above. By a similar argument, any problem in P must also be in NP.

You can see now how this connects with needles and haystacks. If a problem is in NP but not in P, it's like a needle-in-a-haystack problem: we can verify the answer efficiently if we're given a hint, but without a hint we can't find the answer efficiently. So asking whether needle-in-a-haystack problems exist is exactly the same as asking whether P is different from NP.

Are P and NP different? We don't know. Consider the Traveling Salesperson Problem. We know it's in NP, because we can verify the answer efficiently given a hint, but we don't know if it's in P. We don't know if the TSP can be solved efficiently -- we haven't found an efficient solution yet but we can't rule out the possibility that somebody will discover one tomorrow. A lot of people have tried really hard for a long time to find an efficient solution, so most experts tend to assume that no efficient solution is possible -- but the experts have been wrong before.

If, on the other hand, P and NP turn out to be equal, the consequences would be huge. For example, much of cryptography would collapse. Decrypting a message is supposed to be a needle-in-a-haystack problem---easy if you have a hint (i.e., the secret decryption key) but hard otherwise.

It's annoying, to say the least, that such a basic question about information remains unanswered. For decades theorists have been trying to scale the P vs. NP mountain from various directions,without success. The rest of us have gotten accustomed to assuming that needle-in-a-haystack problems exist, and hoping for the best.

Syndicate content