Archives for August 2008

Lenz Ruling Raises Epistemological Questions

Stephanie Lenz’s case will be familiar to many of you: After publishing a 29-second video on YouTube that shows her toddler dancing to the Prince song “Let’s Go Crazy,” Ms. Lenz received email from YouTube, informing her that the video was being taken down at Universal Music’s request. She filed a DMCA counter-notification claiming the video was fair use, and the video was put back up on the site. Now Ms. Lenz, represented by the EFF, is suing Universal, claiming that the company violated section 512(f) of the Digital Millennium Copyright Act. Section 512(f) creates liability for a copyright owner who “knowingly materially misrepresents… that material or activity is infringing.”

On Wednesday, the judge denied Universal’s motion to dismiss the suit. The judge held that “in order for a copyright owner to proceed under the DMCA with ‘a good faith belief that the use of the material in the manner complained of is not authorized by the copyright owner, its agent, or the law,’ the owner must evaluate whether the material makes fair use of the copyright.”

The essence of Lenz’s claim is that when Universal sent a notice claiming her use was “not authorized by… the law,” they already knew her use was actually lawful. She cites news coverage that suggests that Universal’s executives watched the video and then, at Prince’s urging, sent a takedown notice they would not have opted to send on their own. Wednesday’s ruling gives the case a chance to proceed into discovery, where Lenz and the EFF can try to find evidence to support their theory that Universal’s lawyers recognized her use was legally authorized under fair use—but caved to Prince’s pressure and sent a spurious notice anyway.

Universal’s view is very different from Lenz’s and, apparently, from the judge’s—they claim that the sense of “not authorized by… the law” required for a DMCA takedown notice is that a use is unauthorized in the first instance, before possible fair use defenses are considered. This position is very important to the music industry’s current practice of sending automated takedown notices based on automatically recognizing copyrighted works; if copyright owners were required to form any kind of belief about the fairness of a use before asking for a takedown, then this kind of fully computer-automated mass request might not be possible, since it’s hard to imagine a computer performing the four-factor weighing test that informs a fair use determination.
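To make the contrast concrete, here is a rough sketch of what a fully automated, match-and-notify pipeline could look like; the fingerprinting stand-in, the function names, and the sample data are all invented for illustration, and nothing in the sketch even gestures at a four-factor analysis:

```python
# Hypothetical sketch of an automated, no-human-in-the-loop takedown pipeline.
# The "fingerprint" is a trivial stand-in (a hash of the audio bytes); real
# systems use proprietary acoustic fingerprinting. The point is the control
# flow: a catalog match triggers a notice, and nothing here weighs purpose,
# amount used, market effect, or any other fair use factor.

def make_fingerprint(audio_bytes: bytes) -> int:
    return hash(audio_bytes)  # stand-in for acoustic fingerprinting

def scan_and_notify(uploads, catalog_fingerprints, send_takedown_notice):
    """Send a takedown notice for every upload whose audio matches the catalog."""
    for url, audio_bytes in uploads:
        if make_fingerprint(audio_bytes) in catalog_fingerprints:
            send_takedown_notice(url)

# Hypothetical usage:
catalog = {make_fingerprint(b"lets-go-crazy-master")}
uploads = [("http://video.example/dancing-toddler", b"lets-go-crazy-master")]
scan_and_notify(uploads, catalog, lambda url: print("takedown notice sent for", url))
```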

Seen in this light, the case has at least as much to do with the murky epistemology of algorithmic inference as it does with fair use per se. The music industry uses takedown bots to search out and flag potentially infringing uses of songs, and then, in at least some instances, to send automated takedown notices. Suppose humans at Universal manually review a random sample of the bot’s output, handle the statistics and sampling carefully, and find that a certain fraction of the sampled items is infringing. They can then infer, with the statistically appropriate level of confidence, that roughly the same fraction of the bot-flagged songs still “behind a curtain”, the ones no human has reviewed, is also infringing. If that fraction is high enough (95 percent, say?), then one can reasonably, or in good faith (at least in the layperson, everyday sense of those terms), believe that an unexamined item turned up by the bot is infringing.
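Here is a minimal sketch of that inference, assuming a hypothetical manual review of 500 flagged items and a textbook normal-approximation confidence interval; none of the numbers describe Universal’s actual practice:

```python
import math

def infringement_estimate(sample_size, num_infringing, z=1.96):
    """Point estimate and 95% normal-approximation confidence interval for the
    fraction of bot-flagged items that are infringing, based on a manually
    reviewed random sample. All numbers used below are hypothetical."""
    p_hat = num_infringing / sample_size
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Hypothetical review: 500 flagged items examined by hand, 475 found infringing.
p_hat, lower, upper = infringement_estimate(500, 475)
print(f"estimate {p_hat:.1%}, 95% CI [{lower:.1%}, {upper:.1%}]")
# Whether a result like this supports a "good faith belief" about the
# unexamined items depends on where the threshold lies and who sets it.
```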

The same might hold true if fair use is also considered: As long as a high enough fraction of the material flagged by the bot in the first, manually reviewed phase turns out to be infringement-not-defensible-as-fair-use, a human can reasonably believe that a given instance flagged by the bot—still “behind the curtain” and not seen by human eyes—is probably an instance of infringement-not-defensible-as-fair-use.

The general principle here would be: If you know the bot is usually right (for some definition of “usually”), and don’t have other information about some case X on which the bot has offered a judgment, then it is reasonable to believe that the bot is right in case X—indeed, it would be unreasonable to believe otherwise, without knowing more. So it seems there is some level of discernment, in a bot, that would suffice for a person to believe in good faith that any given item identified by the bot was an instance of infringement suitable for a DMCA complaint. (I don’t know what the threshold should be, who should decide, or whether the industry’s current bots meet it.) This view, when it leads to auto-generated takedown requests, has the strange consequence that music industry representatives are asserting a “good faith belief” that certain copies of certain media are infringing, even when they aren’t aware that those copies exist.

Here’s where the sidewalk ends, and I begin to wish I had formal legal training: What are the epistemic procedures required to form a “good faith belief”? How about a “reasonable belief”? This kind of question in the law surely predates computers: It was Oliver Wendell Holmes, Jr. who first created the reasonable man, a personage Louis Menand has memorably termed “the fictional protagonist of modern liability theory.” I don’t even know to whom this question should be addressed: Is there a single standard nationally? Does it vary circuit by circuit? Statute by statute? Has it evolved in response to computer technology? Readers, can you help?

Gymnastics Scores and Grade Inflation

The gymnastics scoring in this year’s Olympics has generated some controversy, as usual. Some of the controversy feels manufactured: NBC tried to create a hubbub over Nastia Liukin losing the uneven bars gold medal on the Nth tiebreaker, but top-level sporting events whose rules do not admit ties must sometimes decide contests by tiny margins.

A more interesting discussion relates to a change in the scoring system, moving from the old 0.0-to-10.0 scale to a new scale that adds together an “A score” measuring the difficulty of the athlete’s moves and a “B score” measuring how well the moves were performed. The B score is on the old 0-10 scale, but the A score is open-ended, with fixed values for each constituent move and bonuses for continuously connecting a series of moves.
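As a toy illustration of how the two numbers combine, here is a short sketch; the move values and connection bonus are invented for the example, not taken from the actual Code of Points:

```python
# Toy illustration of the new scoring arithmetic: an open-ended A score
# (difficulty) added to a B score (execution, on the old 0-10 scale).
# The move values and connection bonus below are made up for illustration.

def a_score(move_values, connection_bonuses):
    """Difficulty: fixed value per move plus bonuses for connected series."""
    return sum(move_values) + sum(connection_bonuses)

def total_score(move_values, connection_bonuses, b_score):
    return a_score(move_values, connection_bonuses) + b_score

# A hypothetical routine: four moves, one connection bonus, and 0.6 in
# execution deductions from a maximum B score of 10.0.
print(round(total_score([1.8, 1.7, 2.0, 1.6], [0.2], 10.0 - 0.6), 1))  # 16.7
```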

One consequence of the new system is that there is no predetermined maximum score. The old system had a maximum score, the legendary “perfect 10”, whose demise is mourned by old-school gymnastics gurus like Bela Karolyi. But of course the perfect 10 wasn’t really perfect, at least not in the sense that a 10.0 performance was unsurpassable. No matter how flawless a gymnast’s performance, it is always possible, at least in principle, to do better, by performing just as flawlessly while adding one more flip or twist to one of the moves. The perfect 10 was in some sense a myth.

What killed the perfect 10, as Jordan Ellenberg explained in Slate, was a steady improvement in gymnastic performance that led to a kind of grade inflation in which the system lost its ability to reward innovators for doing the latest, greatest moves. If a very difficult routine, performed flawlessly, rates 10.0, how can you reward an astonishingly difficult routine, performed just as flawlessly? You have to change the scale somehow. The gymnastics authorities decided to remove the fixed 10.0 limit by creating an open-ended difficulty scale.

There’s an interesting analogy to the “grade inflation” debate in universities. Students’ grades and GPAs have increased slowly over time, and though this is not universally accepted, there is plausible evidence that today’s students are doing better work than past students did. (At the very least, today’s student bodies at top universities are drawn from a much larger pool of applicants than before.) If you want a 3.8 GPA to denote the same absolute level of performance that it denoted in the past, and if you also want to reward the unprecedented performance of today’s very best students, then you have to expand the scale at the top somehow.

But maybe the analogy from gymnastics scores to grades is imperfect. The only purpose of gymnastics scores is to compare athletes, to choose a winner. Grades have other purposes, such as motivating students to pay attention in class, or rewarding students for working hard. Not all of these purposes require consistency in grading over time, or even consistency within a single class. Which grading policy is best depends on which goals we have in mind.

One thing is clear: any discussion of gymnastics scoring or university grading will inevitably be colored by nostalgic attachment to the artists or students of the past.

How do you compare security across voting systems?

It’s a curious problem: how do you compare two completely unrelated voting systems and say that one is more or less secure than the other?  How can you meaningfully compare the security of paper ballots tabulated by optical scan systems with DRE systems (with or without VVPAT attachments)?

There’s a clear disconnect on this issue.  It shows up, among other places, in a recent blog post by political scientist Thad Hall:

The point here is that, when we think about paper ballots and absentee voting, we do not typically think about or evaluate them “naked” but within an implementation context yet we think nothing of evaluating e-voting “naked” and some almost think it “cheating” to think about e-voting security within the context of implementation.  However, if we held both systems to the same standard, the people in California probably would not be voting using any voting system; given its long history, it is inconceivable that paper ballots would fail to meet the standards to which e-voting is held, absent evaluating its implementation context.

Hall then goes on to point to his recent book with Mike Alvarez, Electronic Elections, that beats on this particular issue at some length.  What that book never offers, however, is a decent comparison between electronic voting and anything else.

I’ve been thinking about this issue for a while: there must be a decent, quantitative way to compare these things.  Turns out, we can leverage a foundational technique from computer science theory: complexity analysis.  CS theory is all about analyzing the “big-O” complexity of various algorithms.  Can we analyze this same complexity for voting systems’ security flaws?

I took a crack at the problem for a forthcoming journal paper.  I classified a wide variety of voting systems according to how much effort an attacker needs to influence all the votes: effort proportional to the total number of voters, effort proportional to the number of precincts, or constant effort; less effort implies less security.  I also broke this down by different kinds of attacks: integrity attacks that try to change votes in a stealthy fashion, confidentiality attacks that try to learn how specific voters cast their votes, and denial of service attacks that don’t care about stealth but want to smash parts of the election.  This was a fun paper to write, and it nicely responds to Hall and Alvarez’s criticisms.  Have a look.
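To give a flavor of the structure, here is a small sketch of the classification: attack types crossed with attacker-effort classes. The systems and cell values below are placeholders chosen to illustrate the approach, not the actual results from the paper:

```python
# Illustrative sketch: classify voting systems by the attacker effort needed
# to affect the whole election, for each attack type. The asymptotic classes
# follow the approach described above; the specific entries are placeholders.

from enum import Enum

class Effort(Enum):
    CONSTANT = 1       # O(1): a single point of attack suffices
    PER_PRECINCT = 2   # O(precincts): must touch each precinct or machine
    PER_VOTER = 3      # O(voters): must touch each voter or ballot

# Hypothetical table: system -> {attack type -> effort class}.
classification = {
    "hypothetical optical scan": {
        "integrity": Effort.PER_PRECINCT,
        "confidentiality": Effort.PER_VOTER,
        "denial_of_service": Effort.PER_PRECINCT,
    },
    "hypothetical DRE": {
        "integrity": Effort.CONSTANT,
        "confidentiality": Effort.PER_PRECINCT,
        "denial_of_service": Effort.CONSTANT,
    },
}

def less_secure_than(system_a, system_b, attack):
    """Lower attacker effort means less security against that attack."""
    return classification[system_a][attack].value < classification[system_b][attack].value

print(less_secure_than("hypothetical DRE", "hypothetical optical scan", "integrity"))  # True
```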

(Joe Hall also responded to Thad Hall’s post.)

Is the New York Times a Confused Company?

Over lunch I did something old-fashioned—I picked up and read a print copy of the New York Times. I was startled to find, on the front of the business section, a large, colorfully decorated feature headlined “Is Google a Media Company?” The graphic accompanying the story shows a newspaper masthead titled “Google Today,” followed by a list of current and imagined future offerings, from Google Maps and Google Earth to Google Drink and Google Pancake. Citing the new, Wikipedia-esque service Knol, and using the example of that service’s wonderful entry on buttermilk pancakes, the Times story argues that Knol’s launch has “rekindled fears among some media companies that Google is increasingly becoming a competitor. They foresee Google’s becoming a powerful rival that not only owns a growing number of content properties, including YouTube, the top online video site, and Blogger, a leading blogging service, but also holds the keys to directing users around the Web.”

I hope the Times’s internal business staff is better grounded than its reporters and editors appear to be—otherwise, the Times is in even deeper trouble than its flagging performance suggests. Google isn’t becoming a media company—it is one now and always has been. From the beginning, it has sold the same thing that the Times and other media outlets do: Audiences. Unlike the traditional media outlets, though, online media firms like Google and Yahoo have decoupled content production from audience sales. Whether selling ads alongside search results, or alongside user-generated content on Knol or YouTube, or displaying ads on a third party blog or even a traditional media web site, Google acts as a broker, selling audiences that others have worked to attract. In so doing, they’ve thrown the competition for ad dollars wide open, allowing any blog to sap revenue (proportionately to audience share) from the big guys. The whole infrastructure is self-service and scales down to be economical for any publisher, no matter how small. It’s a far cry from an advertising marketplace that relies, as the newspaper business traditionally has, on human ad sales. In the new environment, it’s a buyer’s market for audiences, and nobody is likely to make the kinds of killings that newspapers once did. As I’ve argued before, the worrying and plausible future for high-cost outlets like the Times is a death of a thousand cuts as revenues get fractured among content sources.

One might argue that sites like Knol or Blogger are a competitive threat to established media outlets because they draw users away from those outlets. But Google’s decision to add these sites hurts its media partners only to the (small) extent that the new sites increase the total amount of competing ad inventory on the web—that is, the supply of people-reading-things to whom advertisements can be displayed. To top it all off, Knol lets authors, including any participating old-media producers, capture revenue from the eyeballs they draw. The revenues in settings like these are slimmer because they are shared with Google, as opposed to being sold directly by NYTimes.com or some other establishment media outlet. But it’s hard to judge whether the Knol reimbursement would be higher or lower than the equivalent payment if an ad were displayed on the established outlet’s site, since Google does not disclose the fraction of ad revenue it shares with publishers in either case. But the addition of one more user-generated content site, whether from Google or anyone else, is at most a footnote to the media industry trend: Google’s revenues come from ads, and that makes it a media company, pure and simple.

Comcast Gets Slapped, But the FCC Wisely Leaves its Options Open

The FCC’s recent Comcast action—whose full text is unavailable as yet, though it was described in a press release and statements from each commissioner—is a lesson in the importance of technological literacy for policymaking. The five commissioners’ views, as reflected in their statements, correlate strongly with the degree of understanding of the fact pattern that each statement reveals. Both dissenting commissioners, it turns out, materially misunderstood the technical facts on which they assert their decisions were based. But the majority, despite technical competence, avoided a bright line rule—and that might itself turn out to be great policy.

Referring to what she introduces as the “BitTorrent-Comcast controversy,” dissenting Commissioner Tate writes that after the FCC began to look into the matter, “the two parties announced on March 27 an agreement to collaborate in managing web traffic and to work together to address network management and content distribution.” Where private parties can agree among themselves, Commissioner Tate sensibly argues, regulators ought to stand back. But as Ed and others have pointed out before, this has never been a two-party dispute. BitTorrent, Inc., which negotiated with Comcast, doesn’t have the power to redefine the open BitTorrent protocol whose name it shares. Anyone can write client software to share files using today’s version of the BitTorrent protocol – and no agreement between Comcast and BitTorrent, Inc. could change that. Indeed, if the protocol were modified to buy overall traffic reductions by slowing downloads for individual users, one might expect many users to decline to switch. For this particular issue to be resolved among the parties, Comcast would have to negotiate with all (or at least most of) the present and future developers of BitTorrent clients. A private or mediated resolution among the primary actors involved in this dispute has not taken place and isn’t, as far as I know, currently being attempted. So while I share Ms. Tate’s wise preference for mediation and regulatory reticence, I don’t think her view in this particular case is available to anyone who fully understands the technical facts.

The other dissenting commissioner, Robert McDowell, shares Ms. Tate’s confusion about who the parties to the dispute are, chastising the majority for going forward after Comcast and BitTorrent, Inc. announced their differences settled. He’s also simply confused about the technology, writing that “the vast majority of consumers” “do not use P2P software to watch YouTube” when (a) YouTube isn’t delivered over P2P software, so its traffic numbers don’t speak to the P2P issue and (b) YouTube is one of the most popular sites on the web, making it very unlikely that the “vast majority of consumers” avoid the site. Likewise, he writes that network management allows companies to provide “online video without distortion, pops, and hisses,” analog problems that aren’t faced by digital media.

The majority decision, in finding Comcast’s activities collectively to be over the line from “reasonable network management,” leaves substantial uncertainty about where that line lies, which is another way of saying that the decision makes it hard for other ISPs to predict what kinds of network management, short of what Comcast did, would prompt sanctions in the future. For example, what if Comcast or another ISP were to use the same tools only to target BitTorrent files that appear, after deep packet inspection, to violate copyright? The commissioners were at pains to emphasize that networks are free to police their networks for illegal content. But a filter designed to impede transfer of most infringing video would be certain to generate a significant number of false positives, and the false positives (that is, transfers of legal video impeded by the filter) would act as a thumb on the scales in favor of traditional cable service, raising the same body of concerns about competition that the commissioners cite as a background factor informing their decision to sanction Comcast. We don’t know how that one would turn out.
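A quick back-of-the-envelope calculation shows the scale of the false-positive problem; every number below is an assumption chosen only to illustrate the point, not a measurement of any real network or filter:

```python
# Back-of-the-envelope arithmetic for the false-positive worry above.
# Every number is an assumption made up for illustration.

daily_transfers = 10_000_000   # hypothetical BitTorrent transfers per day on one ISP
lawful_fraction = 0.20         # assume one in five transfers is lawful
false_positive_rate = 0.02     # assume the filter wrongly flags 2% of lawful transfers

blocked_lawful = daily_transfers * lawful_fraction * false_positive_rate
print(f"lawful transfers impeded per day: {blocked_lawful:,.0f}")  # 40,000
```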

McDowell’s brief highlights the ambiguity of the finding. He writes: “This matter would have had a better chance on appeal if we had put the horse before the cart and conducted a rulemaking, issued rules and then enforced them… The majority’s view of its ability to adjudicate this matter solely pursuant to ancillary authority is legally deficient as well. Under the analysis set forth in the order, the Commission apparently can do anything so long as it frames its actions in terms of promoting the Internet or broadband deployment.”

Should the commissioners have adopted a “bright line” rule, as McDowell’s dissent suggests? The Comcast ruling’s uncertainty guarantees a future of envelope-pushing and resource-intensive, case-by-case adjudication, whether in regulatory proceedings or the courts. But I actually think that might be the best available alternative here. It preserves the Commission’s ability to make the right decision in future cases without having to guess, today, what precise rule would dictate those future results. (On the flip side, it also preserves the Commission’s ability to make bad choices in the future, especially if diminished public interest in the issue increases the odds of regulatory capture.) If Jim Harper is correct that Martin’s support is a strategic gambit to tie the issue up while broadband service expands, this suggests that Martin believes, as I do, that uncertainty about future interventions is a good way to keep ISPs on their best behavior.