June 28, 2022

Archives for August 2008

Lenz Ruling Raises Epistemological Questions

Stephanie Lenz’s case will be familiar to many of you: After publishing a 29-second video on YouTube that shows her toddler dancing to the Prince song “Let’s Go Crazy,” Ms. Lenz received email from YouTube, informing her that the video was being taken down at Universal Music’s request. She filed a DMCA counter-notification claiming the video was fair use, and the video was put back up on the site. Now Ms. Lenz, represented by the EFF, is suing Universal, claiming that the company violated section 512(f) of the Digital Millennium Copyright Act. Section 512(f) creates liability for a copyright owner who “knowingly materially misrepresents… that material or activity is infringing.”

On Wednesday, the judge denied Universal’s motion to dismiss the suit. The judge held that “in order for a copyright owner to proceed under the DMCA with ‘a good faith belief that the use of the material in the manner complained of is not authorized by the copyright owner, its agent, or the law,’ the owner must evaluate whether the material makes fair use of the copyright.”

The essence of Lenz’s claim is that when Universal sent a notice claiming her use was “not authorized by… the law,” they already knew her use was actually lawful. She cites news coverage that suggests that Universal’s executives watched the video and then, at Prince’s urging, sent a takedown notice they would not have opted to send on their own. Wednesday’s ruling gives the case a chance to proceed into discovery, where Lenz and the EFF can try to find evidence to support their theory that Universal’s lawyers recognized her use was legally authorized under fair use—but caved to Prince’s pressure and sent a spurious notice anyway.

Universal’s view is very different from Lenz’s and, apparently, from the judge’s—they claim that the sense of “not authorized by… the law” required for a DMCA takedown notice is that a use is unauthorized in the first instance, before possible fair use defenses are considered. This position is very important to the music industry’s current practice of sending automated takedown notices based on recognizing copyright works; if copyright owners were required to form any kind of belief about the fairness of a use before asking for a takedown, then this kind of fully computer-automated mass request might not be possible, since it’s hard to imagine a computer performing the four-factor weighing test that informs a fair use determination.

Seen in this light, the case has at least as much to do with the murky epistemology of algorithmic inference as it does with fair use per se. The music industry uses takedown bots to search out and flag potentially infringing uses of songs, and then in at least some instances to send automated takedown notices. If humans at Universal manually review a random sample of the bot’s output, and the statistics and sampling issues are well handled, and they find that a certain fraction of the bot’s output is infringing material, then they can make an inference. They can infer with the statistically appropriate level of confidence that the same fraction of songs in a second sample, consisting of bot-flagged songs “behind a curtain” that have not manually reviewed, are also infringing. If the fraction of material that’s infringing is high enough—e.g. 95 percent?—then one can reasonably or in good faith (at least in the layperson, everyday sense of those terms) believe that an unexamined item turned up by the bot is infringing.

The same might hold true if fair use is also considered: As long a high enough fraction of the material flagged by the bot in the first, manual human review phase turns out to be infringement-not-defensible-as-fair-use, a human can believe reasonably that a given instance flagged by the bot—still “behind the curtain” and not seen by human eyes—is probably an instance of infringement-not-defensible-as-fair-use.

The general principle here would be: If you know the bot is usually right (for some definition of “usually”), and don’t have other information about some case X on which the bot has offered a judgment, then it is reasonable to believe that the bot is right in case X—indeed, it would be unreasonable to believe otherwise, without knowing more. So it seems like there is some level of discernment, in a bot, that would suffice in order for a person to believe in good faith that any given item identified by the bot was an instance of infringement suitable for a DMCA complaint. (I don’t know what the threshold should be, who should decide, or whether or not the industry’s current bots meet it.) This view, when it leads to auto-generated takedown requests, has the strange consequence that music industry representatives are asserting that they have a “good faith belief” certain copies of certain media are infringing, even when they aren’t aware that those copies exist.

Here’s where the sidewalk ends, and I begin to wish I had formal legal training: What are the epistemic procedures required to form a “good faith belief”? How about a “reasonable belief”? This kind of question in the law surely predates computers: It was Oliver Wendell Holmes, Jr. who first created the reasonable man, a personage Louis Menand has memorably termed “the fictional protagonist of modern liability theory.” I don’t even know to whom this question should be addressed: Is there a single standard nationally? Does it vary circuit by circuit? Statute by statute? Has it evolved in response to computer technology? Readers, can you help?

Gymnastics Scores and Grade Inflation

The gymnastics scoring in this year’s Olympics has generated some controversy, as usual. Some of the controversy feel manufactured: NBC tried to create a hubbub over Nastia Liukin losing the uneven bars gold medal on the Nth tiebreaker; but top-level sporting events whose rules do not admit ties must sometimes decide contests by tiny margins.

A more interesting discussion relates to a change in the scoring system, moving from the old 0.0 to 10.0 scale, to a new scale that adds together an “A score” measuring the difficulty of the athlete’s moves and a “B score” measuring how well the moves were performed. The B score is on the old 0-10 scale, but the A score is on an open-ended scale with fixed scores for each constituent move and bonuses for continuously connecting a series of moves.

One consequence of the new system is that there is no predetermined maximum score. The old system had a maximum score, the legendary “perfect 10”, whose demise is mourned old-school gymnastics gurus like Bela Karolyi. But of course the perfect 10 wasn’t really perfect, at least not in the sense that a 10.0 performance was unsurpassable. No matter how flawless a gymnast’s performance, it is always possible, at least in principle, to do better, by performing just as flawlessly while adding one more flip or twist to one of the moves. The perfect 10 was in some sense a myth.

What killed the perfect 10, as Jordan Ellenberg explained in Slate, was a steady improvement in gymnastic performance that led to a kind of grade inflation in which the system lost its ability to reward innovators for doing the latest, greatest moves. If a very difficult routine, performed flawlessly, rates 10.0, how can you reward an astonishingly difficult routine, performed just as flawlessly? You have to change the scale somehow. The gymnastics authorities decided to remove the fixed 10.0 limit by creating an open-ended difficulty scale.

There’s an interesting analogy to the “grade inflation” debate in universities. Students’ grades and GPAs have increased slowly over time, and though this is not universally accepted, there is plausible evidence that today’s students are doing better work than past students did. (At the very least, today’s student bodies at top universities are drawn from a much larger pool of applicants than before.) If you want a 3.8 GPA to denote the same absolute level of performance that it denoted in the past, and if you also want to reward the unprecendented performance of today’s very best students, then you have to expand the scale at the top somehow.

But maybe the analogy from gymnastics scores to grades is imperfect. The only purpose of gymnastics scores is to compare athletes, to choose a winner. Grades have other purposes, such as motivating students to pay attention in class, or rewarding students for working hard. Not all of these purposes require consistency in grading over time, or even consistency within a single class. Which grading policy is best depends on which goals we have in mind.

One thing is clear: any discussion of gymnastics scoring or university grading will inevitably be colored by nostalgic attachment to the artists or students of the past.

How do you compare security across voting systems?

It’s a curious problem: how do you compare two completely unrelated voting systems and say that one is more or less secure than the other?  How can you meaningfully compare the security of paper ballots tabulated by optical scan systems with DRE systems (with or without VVPAT attachments)?

There’s a clear disconnect on this issue.  It shows up, among other places, in a recent blog post by political scientist Thad Hall:

The point here is that, when we think about paper ballots and absentee voting, we do not typically think about or evaluate them “naked” but within an implementation context yet we think nothing of evaluating e-voting “naked” and some almost think it “cheating” to think about e-voting security within the context of implementation.  However, if we held both systems to the same standard, the people in California probably would not be voting using any voting system; given its long history, it is inconceivable that paper ballots would fail to meet the standards to which e-voting is held, absent evaluating its implementation context.

Hall then goes on to point to his recent book with Mike Alvarez, Electronic Elections, that beats on this particular issue at some length.  What that book never offers, however, is a decent comparison between electronic voting and anything else.

I’ve been thinking about this issue for a while: there must be a decent, quantitative way to compare these things.  Turns out, we can leverage a foundational technique from computer science theory: complexity analysis.  CS theory is all about analyzing the “big-O” complexity of various algorithms.  Can we analyze this same complexity for voting systems’ security flaws?

I took a crack at the problem for a forthcoming journal paper.  I classified a wide variety of voting systems according to how much effort you need to do to influence all the votes: effort proportional to the total number of voters, effort proportional to the number of precincts, or constant effort; less effort implies less security.  I also broke this down by different kinds of attacks: integrity attacks that try to change votes in a stealthy fashion, confidentiality attacks that try to learn how specific voters cast their votes, and denial of service attacks that don’t care about stealth but want to smash parts of the election.  This was a fun paper to write, and it nicely responds to Hall and Alvarez’s criticisms.  Have a look.

(Joe Hall also responded to Thad Hall’s post.)