July 26, 2017

Lenz Ruling Raises Epistemological Questions

Stephanie Lenz’s case will be familiar to many of you: After publishing a 29-second video on YouTube that shows her toddler dancing to the Prince song “Let’s Go Crazy,” Ms. Lenz received an email from YouTube informing her that the video was being taken down at Universal Music’s request. She filed a DMCA counter-notification claiming the video was fair use, and the video was put back up on the site. Now Ms. Lenz, represented by the EFF, is suing Universal, claiming that the company violated section 512(f) of the Digital Millennium Copyright Act. Section 512(f) creates liability for a copyright owner who “knowingly materially misrepresents… that material or activity is infringing.”

On Wednesday, the judge denied Universal’s motion to dismiss the suit. The judge held that “in order for a copyright owner to proceed under the DMCA with ‘a good faith belief that the use of the material in the manner complained of is not authorized by the copyright owner, its agent, or the law,’ the owner must evaluate whether the material makes fair use of the copyright.”

The essence of Lenz’s claim is that when Universal sent a notice claiming her use was “not authorized by… the law,” they already knew her use was actually lawful. She cites news coverage that suggests that Universal’s executives watched the video and then, at Prince’s urging, sent a takedown notice they would not have opted to send on their own. Wednesday’s ruling gives the case a chance to proceed into discovery, where Lenz and the EFF can try to find evidence to support their theory that Universal’s lawyers recognized her use was legally authorized under fair use—but caved to Prince’s pressure and sent a spurious notice anyway.

Universal’s view is very different from Lenz’s and, apparently, from the judge’s—they claim that the sense of “not authorized by… the law” required for a DMCA takedown notice is that a use is unauthorized in the first instance, before possible fair use defenses are considered. This position is very important to the music industry’s current practice of sending automated takedown notices based on recognizing copyrighted works; if copyright owners were required to form any kind of belief about the fairness of a use before asking for a takedown, this kind of fully computer-automated mass request might not be possible, since it’s hard to imagine a computer performing the four-factor weighing test that informs a fair use determination.

Seen in this light, the case has at least as much to do with the murky epistemology of algorithmic inference as it does with fair use per se. The music industry uses takedown bots to search out and flag potentially infringing uses of songs, and then in at least some instances to send automated takedown notices. If humans at Universal manually review a random sample of the bot’s output, and the statistics and sampling issues are well handled, and they find that a certain fraction of the bot’s output is infringing material, then they can make an inference. They can infer, with the statistically appropriate level of confidence, that the same fraction of a second sample, consisting of bot-flagged songs “behind a curtain” that have not been manually reviewed, is also infringing. If the fraction of material that’s infringing is high enough—e.g. 95 percent?—then one can reasonably or in good faith (at least in the layperson, everyday sense of those terms) believe that an unexamined item turned up by the bot is infringing.
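For concreteness, here is a minimal sketch of that inference. The audit size, the counts, and the 95 percent threshold are all hypothetical, and the normal approximation used here is just one standard way to put a lower bound on a binomial proportion:

```python
import math

def lower_confidence_bound(hits: int, n: int, z: float = 1.645) -> float:
    """One-sided ~95% lower bound on the true infringing fraction,
    using the normal approximation to the binomial (z = 1.645)."""
    p = hits / n
    return p - z * math.sqrt(p * (1 - p) / n)

# Hypothetical audit: humans review 400 bot-flagged items and find
# 388 of them to be infringing.
p_hat = 388 / 400                         # observed fraction: 0.97
lower = lower_confidence_bound(388, 400)  # roughly 0.956

# Even the pessimistic bound exceeds the (purely illustrative) 95%
# threshold, so under this standard a reviewer could extend a belief
# about the audited sample to the unexamined items behind the curtain.
meets_threshold = lower > 0.95
```

Nothing in the ruling endorses any particular statistical procedure; this just makes explicit the kind of calculation the aggregate argument relies on.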

The same might hold true if fair use is also considered: As long as a high enough fraction of the material flagged by the bot in the first, manually reviewed phase turns out to be infringement-not-defensible-as-fair-use, a human can reasonably believe that a given instance flagged by the bot—still “behind the curtain” and not seen by human eyes—is probably an instance of infringement-not-defensible-as-fair-use.

The general principle here would be: If you know the bot is usually right (for some definition of “usually”), and don’t have other information about some case X on which the bot has offered a judgment, then it is reasonable to believe that the bot is right in case X—indeed, it would be unreasonable to believe otherwise, without knowing more. So it seems like there is some level of discernment, in a bot, that would suffice in order for a person to believe in good faith that any given item identified by the bot was an instance of infringement suitable for a DMCA complaint. (I don’t know what the threshold should be, who should decide, or whether or not the industry’s current bots meet it.) This view, when it leads to auto-generated takedown requests, has the strange consequence that music industry representatives are asserting that they have a “good faith belief” certain copies of certain media are infringing, even when they aren’t aware that those copies exist.
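As a toy illustration of how one might reason about where such a threshold belongs, consider the expected per-notice cost of sending unreviewed automated notices versus paying a human to review each flagged item first. Every dollar figure below is invented for the sake of the example:

```python
def expected_cost_per_notice(accuracy: float,
                             wrong_notice_cost: float,
                             review_cost: float) -> dict:
    """Expected per-notice cost of two policies (illustrative model only):
    'auto'   -- send every bot-flagged notice unreviewed; pay for mistakes.
    'manual' -- pay a human to review each flagged item first."""
    return {
        "auto": (1 - accuracy) * wrong_notice_cost,
        "manual": review_cost,
    }

# Hypothetical figures: a wrongful notice costs $500 in liability and
# goodwill, a human review costs $5, and the bot is right 95% of the time.
costs = expected_cost_per_notice(accuracy=0.95,
                                 wrong_notice_cost=500.0,
                                 review_cost=5.0)
# With these numbers, review (~$5 per notice) beats automation (~$25),
# but cheap reviews or a very accurate bot would flip the comparison.
```

The point is not that good faith reduces to accounting, only that a threshold becomes tractable once mistakes carry a price.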

Here’s where the sidewalk ends, and I begin to wish I had formal legal training: What are the epistemic procedures required to form a “good faith belief”? How about a “reasonable belief”? This kind of question in the law surely predates computers: It was Oliver Wendell Holmes, Jr. who gave us the classic theory of the reasonable man, a personage Louis Menand has memorably termed “the fictional protagonist of modern liability theory.” I don’t even know to whom this question should be addressed: Is there a single standard nationally? Does it vary circuit by circuit? Statute by statute? Has it evolved in response to computer technology? Readers, can you help?

Comments

  1. John Millington says:

    “If the fraction of material that’s infringing is high enough—e.g. 95 percent?—then one can reasonably or in good faith (at least in the layperson, everyday sense of those terms) believe that an unexamined item turned up by the bot is infringing.”

    And you can reasonably or in good faith (at least in the layperson, everyday sense of those terms) believe that 1 in 20 of the unexamined items is bogus and, if sent, will damage an innocent person.

    If I tell a robot to shoot a gun into an empty field every 10 minutes and then for a couple days I observe that it shoots people 0% of the time, that observation isn’t going to carry much weight for my defense once someone gets shot.

    DMCA notices carry significant force. The hosting ISP _must_ comply or they’re in big trouble, so when these things get sent out, people’s speech is going to get pulled. Humans should review them *all*, and take accountability for the fact that they _will_ shut someone down, at least for a while. *Bogus* DMCA notices, where if any human had just looked at the use they would have known it was obviously fair use, should not be tolerated.

    The idea of a *robot* blindly sending notices that *people* are legally obligated to obey is ludicrous. The mere deployment and operation of such a robot is of pretty questionable “good faith.”

  2. I think this whole line of argument fails on a “good for the goose, good for the gander” basis.

    Copyright holders have always insisted that each use of a copyrighted work requires individual attention, not treatment through some mass process.

    Consider this hypothetical example. Suppose I write an article and want to include several paragraphs from a book. So I call the book publisher and get permission. Then I write 19 more similar articles, each time getting permission from a book publisher to include an excerpt from a book. Now when I write the 21st article, am I entitled to argue, “I don’t need to ask permission from the book publisher, because I asked permission 20 times in similar circumstances and got permission each time, so I am entitled to assume that permission would be forthcoming if I asked for it”? The publisher would have apoplexy if I made such an argument.

    If copyright holders maintain that any use of their copyrighted work requires individual attention from a human being and an individual license or permission, they have no right to issue takedown notices through an automated process. Copyright holders should have to live by the same rules that they insist everyone else must follow.

  3. “The general principle here would be: If you know the bot is usually right (for some definition of ‘usually’), and don’t have other information about some case X on which the bot has offered a judgment, then it is reasonable to believe that the bot is right in case X—indeed, it would be unreasonable to believe otherwise, without knowing more.”

    I think the law is interested in the specific case, not the statistical case. The key phrase is “without knowing more,” and of course it’s amazing how much you can not see, just by not looking. Suppose I were to drive down a busy street while blindfolded: would it be reasonable to argue that I didn’t see the people I ran into? I believe that there is a liability created by recklessness.

    “So it seems like there is some level of discernment, in a bot, that would suffice in order for a person to believe in good faith that any given item identified by the bot was an instance of infringement suitable for a DMCA complaint. (I don’t know what the threshold should be, who should decide, or whether or not the industry’s current bots meet it.)”

    There’s an easy system to both set the thresholds and allow the use of bots. All you have to do is hold the RIAA liable for the times they do get it wrong, and deliver a similar penalty to what they would have delivered to the infringer. Then (if they are smart), the RIAA would settle out of court quickly on those cases where they screwed up, and (if the bot really is 95% accurate) then they make more money than they lose. If the bot is 25% accurate then they lose more money than they make, time to get a better bot (or a phone call to India).

  4. “The idea of a *robot* blindly sending notices that *people* are legally obligated to obey, is ludicrous. The mere deployment and operation of such a robot, is of pretty questionable ‘good faith.’”

    My understanding is that machines cannot legally make decisions, nor can they legally instigate actions. In situations where there is an outward appearance that a machine is taking action, the legal interpretation is that the person (or possibly multiple people) who set the machine in motion actually created that action. When computers are attacked by malware, the trail of legal responsibility gets very murky, but it never stops in the computer.

    This is a professional opinion, but it is the opinion of a professional programmer trying to understand the law, and not the opinion of a professional lawyer trying to understand computers (you get to choose which is the more dangerous).

  5. Viacom has sent more than 200,000 takedowns to YouTube. So if Viacom accepted a 95% confidence level, it would result in 10,000 noninfringing uses being pulled as “collateral damage.” And we can presume that the number of DMCA takedowns issued will only be increasing as more copyright owners start policing more hosting sites.

    The question regarding “reasonable belief” is not an aggregate inquiry. Rather, it is a question asked of the particular content being censored off the Internet: could a reasonable copyright owner have that belief after viewing this particular video? Mistakes are punishable under 512(f) by a suit for damages and attorneys fees.

    So if your bot is going to be wrong 5% of the time, you’ll have to calculate the resulting legal costs and judgments as part of the “cost of doing business” for your bot. That gets the incentives right — you have a financial incentive to improve your bot to get that cost down.

    It’s worth remembering that the DMCA takedown process was intended to target obvious infringements, not EVERY arguable infringement. If your bot is looking for MD5 hashes of verbatim copies (YouTube does this already), you should have much better than 95% confidence. On the other hand, if you’re relying on loose fingerprinting algorithms to find every piece of content with more than 1 second of the video track of any MGM film, then you are already well beyond the intent of the DMCA takedown process.

  6. If the copyright holders have a reasonable belief that up to 95% of content is infringing, then they have a reasonable belief that 5% is actually not infringing. That should be pretty damning.

  7. Conveniently enough, just today Bruce Schneier posted a link to http://dmca.cs.washington.edu/ and the paper discussing the receipt of takedown notices for the IP addresses of printers and wireless access points. The issues are a little different from the YouTube notices, but the point about reasonable belief in the accuracy of bots is the same.

    In addition, I’d like to point out that the takedown notices, nominally signed by human beings, don’t say “we believe that X is infringing”; they say “we know that X is infringing.” In any other legal context (say, knowing that 95% of celebrities have at some point used drugs and thus feeling confident in publishing a statement that some particular celebrity is a drug user), making an unbacked claim of fact rather than opinion that someone else is violating the law can result in serious professional and financial penalties.

  8. I am reminded of an old Judge Wapner quote from the TV show, “The People’s Court.” He would often say, “The definition of negligence is to look, but not see.” In this case, it’s typically applied to motor vehicle drivers, or other cases of where someone should have taken the care to look where they were going or what they were doing. Often the defendant would whine, “But judge, I *did* look! I just didn’t see her/him/it!” Not a defense according to ol’ Judge Wapner! 🙂

    NOTE: This reply will probably be taken down by the owner of “The People’s Court” as a matter of course. 🙂

  9. There are several problems with doing DMCA notices statistically; it would create the same effect as, say, putting all celebrities in jail because 95% of them use drugs.

    The DMCA is very powerful and often abused to censor people, which is something that shouldn’t happen. For that reason the DMCA requires the one sending the notice to take on a lot of liability if they turn out to be wrong, in order to prevent abuse of the system.

    Looking at how the system has been abused, the risks are not nearly severe enough.

  10. A good faith belief cannot rest on a known statistical error rate that could easily have been investigated further before action was taken.

    I think that this is also the lay person’s interpretation.

  11. Liam Hegarty says:

    There is an even more basic question: Can a machine form a belief, much less a good faith belief? Short of the singularity, I don’t think so.