November 24, 2024

Manipulating Reputation Systems

BoingBoing points to a nice pair of articles by Annalee Newitz on how people manipulate online reputation systems like eBay’s user ratings, Digg, and so on.

There’s a myth floating around that such systems distill an uncannily accurate folk judgment from the votes submitted by millions of ordinary citizens. The wisdom of crowds, and all that. In fact, reputation systems are fraught with problems, and the most important systems survive because companies expend great effort to supplement the algorithms by investigating abuse and trying to compensate for it. eBay, for example, reportedly works very hard to fight abuse of its reputation system.

Why do people put more faith in reputation systems than the systems really deserve? One reason is the compelling but not entirely accurate analogy to the power of personal reputations in small town gossip networks. If a small-town merchant is accused of cheating a customer, everyone in town will find out quickly and – here’s where the analogy goes off the rails – individual townspeople will make nuanced judgments based on the details of the story, the character of the participants, and their own personal experiences. The reason this works is that the merchant, the customer, and the person evaluating the story are embedded in a complex, densely interconnected network.

When the network of participants gets much bigger and the interconnections much sparser, there is no guarantee that the same system will still work. Even if it does work, a large-scale system might succeed for different reasons than the small-town system. What we need is some kind of theory: some kind of explanation for why a reputation system can succeed. Our theory, whatever it is, will have to account for the desires and incentives of participants, the effect of relevant social norms, and so on.

The incentive problem is especially challenging for recommendation services like Digg. Digg assumes that users will cast votes for the sites they like. If I vote for sites that I really do like, this will mostly benefit strangers (by helping them find something cool to read). But if I sell my votes or cast them for sites run by my friends and me, I will benefit more directly. In short, my incentive is to cheat. These sorts of problems seem likely to get worse as a service grows, because the stakes will grow and the sense of community may weaken.

It seems to me that reputation systems are a fruitful area for technical, economic and social research. I know there is research going on already – and readers will probably chastise me in the comments for not citing it all – but we’re still far from understanding online reputation.

Sarasota: Could a Bug Have Lost Votes?

At this point, we still don’t know what caused the high undervote rate in Sarasota’s Congressional election. [Background: 1, 2.] There are two theories. The State-commissioned study released last week argues that for the theory that a badly designed ballot caused many voters to not see that race and therefore not cast a vote.

Today I want to make the case for the other theory: that a malfunction or bug in the voting machines caused votes to be not recorded. The case sits on four pillars: (1) The postulated behavior is consistent with a common type of computer bug. (2) Similar bugs have been found in voting machines before. (3) The state-commissioned study would have been unlikely to find such a bug. (4) Studies of voting data show patterns that point to the bug theory.

(1) The postulated behavior is consistent with a common type of computer bug.

Programmers know the kind of bug I’m talking about: an error in memory management, or a buffer overrun, or a race condition, which causes subtle corruption in a program’s data structures. Such bugs are maddeningly hard to find, because the problem isn’t evident immediately but the corrupted data causes the program to go wrong in subtle ways later. These bugs often seem to be intermittent or “random”, striking sometimes but lying dormant at other times, and seeming to strike more or less frequently depending on the time of day or other seemingly irrelevant factors. Every experienced programmer tells horror stories about such bugs.

Such a bug is consistent with the patterns we saw in the election. Undervotes didn’t happen to every voter, but they did happen in every precinct, though with different frequency in different places.

(2) Similar bugs have been found in voting machines before.

We know of at least two examples of similar bugs in voting machines that were used in real elections. After problems in Maryland voting machines caused intermittent “freezing” behavior, the vendor recalled the motherboards of 4700 voting machines to remedy a hardware design error.

Another example, this time caused by a software bug, was described by David Jefferson:

In the volume testing of 96 Diebold TSx machines … in the summer of 2005, we had an enormously high crash rate: over 20% of the machines crashed during the course of one election day’s worth of votes. These crashes always occurred either at the end of one voting transaction when the voter touched the CAST button, or right at the beginning of the next voter’s session when the voter SmartCard was inserted.

It turned out that, after a huge effort on Diebold’s part, a [Graphical User Interface] bug was discovered. If a voter touched the CAST button a sloppily, and dragged his/her finger from the button across a line into another nearby window (something that apparently happened with only one of every 400 or 500 voters) an exception would be signaled. But the exception was not handled properly, leading to stack corruption or heap corruption (it was never clear to us which), which apparently invariably lead to the crash. Whether it caused other problems also, such as vote corruption, or audit log corruption, was never determined, at least to my knowledge. Diebold fixed this bug, and at least TSx machines are free of it now.

These are the two examples we know about, but note that neither of these examples was made known to the public right away.

(3) The State-commissioned study would have been unlikely to find such a bug.

The State of Florida study team included some excellent computer scientists, but they had only a short time to do their study, and the scope of their study was limited. They did not perform the kind of time-consuming dynamic testing that one would use in an all-out hunt for such a bug. To their credit, they did the best they could given the limited time and tools they had, but they would have had to get lucky to find such a bug if it existed. Their failure to find such a bug is not strong evidence that a bug does not exist.

(4) Studies of voting data show patterns that point to the bug theory.

Several groups have studied detailed data on the Sarasota election results, looking for patterns that might help explain what happened.

One of the key questions is whether there are systematic differences in undervote rate between individual voting machines. The reason this matters is that if the ballot design theory is correct, then the likelihood that a particular voter undervoted would be independent of which specific machine the voter used – all voting machines displayed the same ballot. But an intermittent bug might well manifest itself differently depending on the details of how each voting machine was set up and used. So if undervote rates depend on attributes of the machines, rather than attributes of the voters, this tends to point toward the bug theory.

Of course, one has to be careful to disentangle the possible causes. For example, if two voting machines sit in different precincts, they will see different voter populations, so their undervote rate might differ even if the machines are exactly identical. Good data analysis must control for such factors or at least explain why they are not corrupting the results.

There are two serious studies that point to machine-dependent results. First, Mebane and Dill found that machines that had a certain error message in their logs had a higher undervote rate. According to the State study, this error message was caused by a particular method used by poll workers to wake the machines up in the morning; so the use of this method correlated with higher undervote rate.

Second, Charles Stewart, an MIT political scientist testifying for the Jennings campaign in the litigation, looked at how the undervote rate depended on when the voting machine was “cleared and tested”, an operation used to prepare the machine for use. Stewart found that machines that were cleared and tested later (closer to Election Day) had a higher undervote rate, and that machines that were cleared and tested on the same day as many other machines also had a higher undervote rate. One possibility is that clearing and testing a machine in a hurry, as the election deadline approached or just on a busy day, contributed to the undervote rate somehow.

Both studies indicate a link between the details of a how a machine was set up and used, and the undervote rate on that machine. That’s the kind of thing we’d expect to see with an intermittent bug, but not if undervotes were caused strictly by ballot design and user confusion.

Conclusion

What conclusion can we draw? Certainly we cannot say that a bug definitely caused undervotes. But we can say with confidence that the bug theory is still in the running, and needs to be considered alongside the ballot design theory as a possible cause of the Sarasota undervotes. If we want to get to the bottom of this, we need to investigate further, by looking more deeply into undervote patterns, and by examining the voting machine hardware and software.

[Correction (Feb. 28): I changed part (3) to say that the team “had” only a short time to do their sstudy. I originally wrote that they “were given” only a short time, which left the impression that the state had set a time limit for the study. As I understand it, the state did not impose such a time limit. I apologize for the error.]

Sarasota Voting Machines Insecure

The technical team commissioned by the State of Florida to study the technology used in the ill-fated Sarasota election has released its report. (Background: on the Sarasota election problems; on the study.)

One revelation from the study is that the iVotronic touch-screen voting machines are terribly insecure. The machines are apparently susceptible to viruses, and there are many bugs a virus could exploit to gain entry or spread:

We found many instances of [exploitable buffer overflow bugs]. Misplaced trust in the election definition file can be found throughout the iVotronic software. We found a number of buffer overruns of this type. The software also contains array out-of-bounds errors, integer overflow vulnerabilities, and other security holes. [page 57]

The equation is simple: sloppy software + removable storage = virus vulnerability. We saw the same thing with the Diebold touchscreen voting system.

Another example of poor security is in the passwords that protect crucial operations such as configuring the voting machine and modifying its software. There are separate passwords for different operations, but the system has a single backdoor that allows all of the passwords to be bypassed by an adversary who can learn or guess a one-byte secret, which is easily guessed since there are only 256 possibilities. (p. 67) For example, an attacker who gets private access to the machine for just a few minutes can apparently use the backdoor to install malicious software onto a machine.

Though the machines’ security is poor and needs to be fixed before it is used in another election, I agree with the study team that the undervotes were almost certainly not caused by a security attack. The reason is simple: only a brainless attacker would cause undervotes. An attack that switched votes from one candidate to another would be more effective and much harder to detect.

So if it wasn’t a security attack, what was the cause of the undervotes?

Experience teaches that systems that are insecure tend to be unreliable as well – they tend to go wrong on their own even if nobody is attacking them. Code that is laced with buffer overruns, array out-of-bounds errors, integer overflow errors, and the like tends to be flaky. Sporadic undervotes are the kind of behavior you would expect to see from a flaky voting technology.

The study claims to have ruled out reliability problems as a cause of the undervotes, but their evidence on this point is weak, and I think the jury is still out on whether voting machine malfunctions could be a significant cause of the undervotes. I’ll explain why, in more detail, in the next post.

Why Understanding Programs is Hard

Senator Sam Brownback has reportedly introduced a bill that would require the people rating videogames to play the games in their entirety before giving a rating. This reflects a misconception common among policymakers: that it’s possible to inspect a program and figure out what it’s going to do.

It’s true that some programs can be completely characterized by inspection, but this is untrue for many programs, including some of the most interesting and useful ones. Even very simple programs can be hard to understand.

Here, for example, is a three-line Python program I just wrote:

import sys, sha

h = sha.new(sha.new(sys.argv[1]).digest()[:9]).digest()

if h.startswith(“abcdefghij”): print “Drat”

(If you don’t speak Python, here’s what the program does: it takes the input you give it, and performs some standard, complicated mathematical operations on the input to yield a character-string. If the first ten characters of that string happen to be “abcedfghij”, then the program prints the word “Drat”; otherwise it doesn’t print anything.)

Will this program ever print a four-letter word? There’s no practical way to tell. It’s obvious that the program’s behavior depends on what input you give it, but is there any input that causes h to start with the magic value abcedfghij? You can run the program and inspect the code but you won’t be able to tell. Even I don’t know, and I wrote the program!

Now you might object that even if we can’t tell whether the program will output a four-letter word, the presence of print “Drat” in the code shows that the programmer at least wanted “Drat” to be a possible output.

Fair enough; but here’s a slightly more complicated program that might or might not print “Drat”.

import sys, sha

h = sha.new(sha.new(sys.argv[1]).digest()[:9]).digest()

if h.startswith(“abcdef”): print h[6:9]

The behavior of this program again depends on its input. For some inputs, it will produce no output. For other inputs, it will produce an output that depends in a complicated way on the input it got. Can this program ever print “Drat”? You can’t tell, and neither can I.

Nonexperts are often surprised to learn that programs can do things the programmers didn’t expect. These surprises can be vexing; but they’re also the main reason computer science is fun.

AACS: Slow Start on Traitor Tracing

[Previous posts in this series: 1, 2, 3, 4, 5, 6, 7, 8.]

Alex wrote on Thursday about the next step in the breakdown of AACS, the encryption scheme used on next-gen DVD discs (HD-DVD and Blu-ray): last week a person named Arnezami discovered and published a processing key that apparently can be used to decrypt all existing discs.

We’ve been discussing AACS encryption, on and off, for several weeks now. To review the state of play: the encryption scheme serves two purposes: key distribution and traitor tracing. Key distribution ensures that every player device, except devices that have been blacklisted, can decrypt a disc. Traitor tracing helps the authorities track down which player has been compromised, if key information is leaked. The AACS authorities encode the header information for each disc in such a way that keys are distributed properly and traitor tracing can occur.

Or that’s the theory, at least. In practice, the authorities are making very little use of the traitor tracing facilities. We’re not sure why this is. They surely have an interest in tracing traitors, and failing to encode discs to facilitate traitor tracing is just a lost opportunity.

The main traitor tracing feature is the so-called sequence key mechanism. This mechanism is not used at all on any of the discs we have seen, nor have we seen any reports of its use.

A secondary traitor tracing feature involves the use of processing keys. Each player device has a unique set of a few hundred device keys, from which it can calculate a few billion different processing keys. Each processing key is computable by only a fraction of the players in the world. Each disc’s headers include a list of the processing keys that can decrypt the disc; any one of the listed processing keys is sufficient to decrypt the disc.

For some reason, all existing discs seem to list the same set of 512 processing keys. Each player will be able to compute exactly one of these processing keys. So when Arnezami leaked a processing key, the authorities could deduce that he must have extracted it from a player that knew that particular processing key. In other words, it narrowed down the identity of his player to about 0.2% of all possible players.

Because all existing discs use the same set of processing keys, the processing key leaked by Arnezami can decrypt any existing disc. Had the authorities used different sets of processing keys on different discs – which was perfectly feasible – then a single processing key would not have unlocked so many discs. Arnezami would have had to extract and publish many processing keys, which would have made his job more difficult, and would have further narrowed down which player he had.

The ability to use different processing key sets on different discs is part of the AACS traitor tracing facility. In failing to do this, the authorities once again failed to use the traitor tracing mechanisms at their disposal.

Why aren’t the authorities working as hard as they can to traitor-trace compromised players? Sure, the sequence key and processing key mechanisms are a bit complex, but if the authorities weren’t going to use these mechanisms, then why would they have gone to the difficulty and expense of designing them and requiring all players to implement them? It’s a mystery to us.