March 29, 2024

Twenty-First Century Wiretapping: Recognition

For the past several weeks I’ve been writing, on and off, about how technology enables new types of wiretapping, and how public policy should cope with those changes. Having laid the groundwork (1; 2; 3; 4; 5), we’re now ready to bite into the most interesting question. Suppose the government is running, on every communication, some algorithm that classifies messages as suspicious or not, and that every conversation labeled suspicious is played for a government agent. When, if ever, is the government justified in using such a scheme?

Many readers will say the answer is obviously “never”. Today I want to argue that that is wrong – that there are situations where automated flagging of messages for human analysis can be justified.

A standard objection to this kind of algorithmic triggering is that authority to search or wiretap must be based on individualized suspicion, that is, that there must be sufficient cause to believe that a specific individual is involved in illegal activity, before that individual can be wiretapped. To the extent that that is an assertion about current U.S. law, it doesn’t answer my question – recall that I’m writing here about what the legal rules should be, not what they are. Any requirement of individualized suspicion must be justified on the merits. I understand the argument for it on the merits. All I’m saying is that that argument doesn’t win by default.

One reason it shouldn’t win by default is that individualized suspicion is sometimes consistent with algorithmic recognition. Suppose that we have strong cause to believe that Mr. A is planning to commit a terrorist attack or some other serious crime. This would justify tapping Mr. A’s phone. And suppose we know Mr. A is visiting Chicago but we don’t know exactly where in the city he is, and we expect him to make calls on random hotel phones, pay phones, and throwaway cell phones. Suppose further that the police have good audio recordings of Mr. A’s voice.

The police propose to run automated voice recognition software on all phone calls in the Chicago area. When the software flags a recording as containing Mr. A’s voice, that recording will be played for a police analyst, and if the analyst confirms the voice as Mr. A’s, the call will be recorded. The police ask us, as arbiters of the public good, for clearance to do this.

If we knew that the voice recognition algorithm would be 100% accurate, then it would be hard to object to this. Using an automated algorithm would be more consistent with the principle of individualized suspicion than would be the traditional approach of tapping Mr. A’s home phone. His home phone, after all, might be used by an innocent family member or roommate, or by a plumber working in his house.

But of course voice recognition is not 100% accurate. It will miss some of Mr. A’s calls, and it will incorrectly flag some calls by others. How serious a problem is this? It depends on how many errors the algorithm makes. The traditional approach sometimes records innocent people – others might use Mr. A’s phone, or Mr. A might turn out to be innocent after all – and these errors make us cautious about wiretapping but don’t preclude wiretapping if our suspicion of Mr. A is strong enough. The same principle ought to hold for automated voice recognition. We should be willing to accept some modest number of errors, but if errors are more frequent we ought to require a very strong argument that recording Mr. A’s phone calls is of critical importance.
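The base-rate arithmetic behind this concern is worth making concrete. The sketch below uses invented numbers (call volume, error rates, and Mr. A’s calling habits are all hypothetical), but it shows why even a fairly accurate recognizer produces flags that are mostly false positives when the target’s calls are a tiny fraction of all traffic:

```python
# Hypothetical figures, for illustration only: a metro area's daily call
# volume, a small number of calls actually made by the target, and a
# recognizer with a 99% hit rate and a 0.1% false-alarm rate.

daily_calls = 10_000_000       # assumed total calls per day
target_calls = 20              # assumed calls made by Mr. A
true_positive_rate = 0.99      # assumed: flags 99% of Mr. A's calls
false_positive_rate = 0.001    # assumed: wrongly flags 0.1% of other calls

flagged_target = target_calls * true_positive_rate
flagged_innocent = (daily_calls - target_calls) * false_positive_rate

# Of all flagged calls, what fraction actually involve Mr. A?
precision = flagged_target / (flagged_target + flagged_innocent)
print(f"calls flagged per day: {flagged_target + flagged_innocent:.0f}")
print(f"fraction of flags that are really Mr. A: {precision:.4f}")
```

With these assumed numbers, roughly ten thousand calls a day get flagged for a human analyst, and only about one flag in five hundred is genuinely Mr. A. The error rate of the algorithm, not just its existence, is what determines how much innocent conversation reaches human ears.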

In practice, we would want to set out crisply defined criteria for making these determinations, but we don’t need to do that exercise here. It’s enough to observe that given sufficiently accurate voice recognition technology – which might exist some day – algorithmically triggered recording can be (a) justified, and (b) consistent with the principle of individualized suspicion.

But can algorithmic triggering be justified, even if not based on individualized suspicion? I’ll argue next time that it can.

Twenty-First Century Wiretapping: Your Dog Sees You Naked

Suppose the government were gathering information about your phone calls: who you talked to, when, and for how long. If that information were made available to human analysts, your privacy would be impacted. But what if the information were made available only to computer algorithms?

A similar question arose when Google introduced its Gmail service. When Gmail users read their mail, they see advertisements. Servers at Google select the ads based on the contents of the email messages being displayed. If the email talks about camping, the user might see ads for camping equipment. No person reads the email (other than the intended recipient) – but Google’s servers make decisions based on the email’s contents.
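The mechanism described can be sketched in a few lines. This is a toy illustration, not Google’s actual ad-selection system; the keyword table and function name are invented:

```python
# Toy sketch: select ads by keyword match against the message body.
# No person reads the email -- only this code does.

ADS = {  # hypothetical keyword -> ad mapping
    "camping": "Discount tents and sleeping bags",
    "mortgage": "Compare refinancing rates",
    "guitar": "Online guitar lessons",
}

def select_ads(email_body: str) -> list[str]:
    """Return ads whose keyword appears as a word in the message."""
    words = set(email_body.lower().split())
    return [ad for keyword, ad in ADS.items() if keyword in words]

print(select_ads("We should go camping next weekend"))
```

The real system is surely far more sophisticated, but the structure is the same: a decision is made based on the email’s contents without any human reading it.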

Some people saw this as a serious privacy problem. But others drew a line between access by people and by computers, seeing access by even sophisticated computer algorithms as a privacy non-event. One person quipped that “Worrying about a computer reading your email is like worrying about your dog seeing you naked.”

So should we worry about the government running computer algorithms on our call data? I can see two main reasons to object.

First, we might object to the government gathering and storing the information at all, even if the information is not (supposed to be) used for anything. Storing the data introduces risks of misuse, for example, that cannot exist if the data is not stored in the first place.

Second, we might object to actions triggered by the algorithms. For example, if the algorithms flag certain records to be viewed by human analysts, we might object to this access by humans. I’ll consider this issue of algorithm-triggered access in a future post – for now, I’ll just observe that the objection here is not to the access by algorithms, but to the access by humans that follows.

If these are the only objections to algorithmic analysis of our data, then it’s not the use of computer algorithms itself that troubles us. What really bothers us is access to our data by people, whether as part of the plan or as unplanned abuse.

If we could somehow separate the use of algorithms from the possibility of human-mediated privacy problems, then we could safely allow algorithms to crawl over our data. In practice, though, algorithmic analysis goes hand in hand with human access, so the question of how to apportion our discomfort is mostly of theoretical interest. It’s enough to object to the possible access by people, while being properly skeptical of claims that the data is not available to people.

The most interesting questions about computerized analysis arise when algorithms bring particular people and records to the attention of human analysts. That’s the topic of my next post.

Twenty-First Century Wiretapping: Storing Communications Data

Today I want to continue the series of posts about new technology and wiretapping (previous posts: 1, 2, 3) by talking about what is probably the simplest case, involving gathering and storage of data by government. Recall that I am not considering what is legal under current law, which is an important issue but is beyond my expertise. Instead, I am considering the public policy question of what rules, if any, should constrain the government’s actions.

Suppose the government gathered information about all phone calls, including the calling and called numbers and the duration of the call, and then stored that information in a giant database, in the hope that it might prove useful later in criminal investigations or foreign intelligence. Unlike the recently disclosed NSA call database, which is apparently data-mined, we’ll assume that the data isn’t used immediately but is only stored until it might be needed. Under what circumstances should this be allowed?

We can start by observing that government should not have free rein to store any data it likes, because storing data, even if it is not supposed to be accessed, still imposes some privacy harm on citizens. For example, the possibility of misuse must be taken seriously where so much data is at issue. Previously, I listed four types of costs imposed by wiretapping. At least two of those costs – the risk that the information will be abused, and the psychic cost of being watched (such as wondering “How will this look?”) – apply to stored data, even if nobody is supposed to look at it.

It follows that, before storing such data, government should have to make some kind of showing that the expected value of storing the data outweighs the harms, and that there should be some kind of plan for minimizing the harms, for example by storing the data securely (even against rogue insiders) and discarding the data after some predefined time interval.
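The “predefined time interval” safeguard could be as simple as an unconditional pruning pass over the stored records. This is a minimal sketch; the record fields and the 90-day window are invented for illustration:

```python
# Sketch of a retention-window safeguard: records older than the
# predefined interval are discarded unconditionally, regardless of
# whether anyone thinks they might be useful later.

from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # assumed retention period

def prune(records: list[dict], now: datetime) -> list[dict]:
    """Keep only records newer than the retention window."""
    cutoff = now - RETENTION
    return [r for r in records if r["timestamp"] >= cutoff]

now = datetime(2006, 6, 1)
records = [
    {"caller": "555-0100", "timestamp": datetime(2006, 5, 20)},
    {"caller": "555-0101", "timestamp": datetime(2006, 1, 15)},
]
print(len(prune(records, now)))  # the January record is discarded
```

The point of making discarding automatic and unconditional is that it shrinks the window of exposure: data that no longer exists cannot be abused, by insiders or anyone else.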

The most important safeguard would be an enforceable promise by government not to use the data without getting further permission (and showing sufficient cause). That promise might be broken, but it changes the equation nevertheless by reducing the likelihood and scope of potential misuse.

To whom should the showing of cause be made? Presumably the answer is “a court”. The executive branch agency that wanted to store data would have to convince a court that the expected value of storing the data was sufficient, in light of the expected costs (including all costs/harms to citizens) of storing it. The expected costs would be higher if data about everyone were to be stored, and I would expect a court to require a fairly strong showing of significant benefit before authorizing the retention of so much data.

Part of the required showing, I think, would have to be an argument that there is not some way to store much less data and still get nearly the same benefit. An alternative to storing data on everybody is to store data only about people who are suspected of being bad guys and therefore are more likely to be targets of future investigations.

I won’t try to calibrate the precise weights to place on the tradeoff between the legitimate benefits of data retention and the costs. That’s a matter for debate, and presumably a legal framework would have to be more precise than I am. For now, I’m happy to establish the basic parameters and move on.

All of this gets more complicated when government wants to have computers analyze the stored data, as the NSA is apparently doing with phone call records. How to think about such analyses is the topic of the next post in the series.

Zfone Encrypts VoIP Calls

Phil Zimmermann, who created the PGP encryption software and faced a government investigation as a result, now offers a new program, Zfone, that provides end-to-end encryption of computer-to-computer (VoIP) phone calls, according to a story in yesterday’s New York Times.

One of the tricky technical problems in encrypting communications is key exchange: how to get the two parties to agree on a secret key that only they know. This is often done with a cumbersome “public key infrastructure” (PKI), which wouldn’t work well for this application. Zfone has a clever key exchange protocol that dispenses with the PKI and instead relies on the two people reading short character strings to each other over the voice connection. This will provide a reasonably secure shared secret key, as long as the two people recognize each other’s voices.

(Homework problem for security students: What does the string-reading accomplish? Based on just the information here, how do you think the Zfone key exchange protocol works?)
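One plausible shape for such a protocol, offered as a guess based only on the description above and not as a description of the actual Zfone design: an unauthenticated Diffie-Hellman exchange, followed by a short authentication string (SAS) derived from the shared secret, which both parties read aloud. A man-in-the-middle who ran two separate exchanges would produce two different strings, and the mismatch would be caught as long as the parties recognize each other’s voices. All parameters below are toy-sized and insecure:

```python
# Toy sketch of a DH-plus-SAS key exchange (not the real Zfone protocol).
# The modulus is far too small for real use; it only illustrates the idea.

import hashlib
import secrets

P = 0xFFFFFFFB  # toy prime modulus (2**32 - 5), insecure on purpose
G = 5           # toy generator

def keypair():
    x = secrets.randbelow(P - 2) + 1   # private exponent
    return x, pow(G, x, P)             # (private, public)

def sas(shared: int) -> str:
    """Short human-readable string derived from the shared secret."""
    return hashlib.sha256(str(shared).encode()).hexdigest()[:4]

a_priv, a_pub = keypair()              # Alice
b_priv, b_pub = keypair()              # Bob

shared_a = pow(b_pub, a_priv, P)       # Alice's view of the secret
shared_b = pow(a_pub, b_priv, P)       # Bob's view of the secret

assert shared_a == shared_b            # honest exchange: same secret
print("read aloud:", sas(shared_a))    # both parties say this string
```

If an attacker sat in the middle and negotiated separate secrets with each party, the two ends would compute different SAS values, and reading them aloud would expose the attack.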

In the middle of the article is this interesting passage:

But Mr. Zimmermann, 52, does not see those fearing government surveillance — or trying to evade it — as the primary market [for Zfone]. The next phase of the Internet’s spyware epidemic, he contends, will be software designed to eavesdrop on Internet telephone calls made by corporate users.

“They will have entire digital jukeboxes of covertly acquired telephone conversations, and suddenly someone in Eastern Europe is going to be very wealthy,” he said.

Though the article doesn’t say so directly, this passage seems to imply that Zfone can protect against spyware-based eavesdropping. That’s not right.

One of the challenges in using encryption is that the datastream is not protected before it is encrypted at the source, or after it is decrypted at the destination. If you and I are having a Zfone-protected conversation, spyware on your computer could capture your voice before it is encrypted for transmission to me, and could also capture my voice after it is decrypted on your computer. Zfone is helpless against this threat, as are other VoIP encryption schemes.

All of this points to an interesting consequence of strong encryption. As more and more communications are strongly encrypted, would-be spies have less to gain from wiretapping and more to gain from injecting malware into their targets’ computers. Yet another reason to expect a future with even more malware.

Twenty-First Century Wiretapping: Not So Hypothetical

Two weeks ago I started a series of posts (so far: 1, 2) about how new technologies change the policy issues around government wiretapping. I argued that technology changed the policy equation in two ways, by making storage much cheaper, and by enabling fancy computerized analyses of intercepted communications.

My plan was to work my way around to a carefully-constructed hypothetical that I designed to highlight these two issues – a hypothetical in which the government gathered a giant database of everybody’s phone call records and then did data mining on the database to identify suspected bad guys. I had to lay a bit more groundwork before getting to the hypothetical, but I was planning to get to it after a few more posts.

Events intervened – the “hypothetical” turned out, apparently, to be true – which makes my original plan moot. So let’s jump directly to the NSA call-database program. Today I’ll explain why it’s a perfect illustration of the policy issues in 21st century surveillance. In the next post I’ll start unpacking the larger policy issues, using the call record program as a running example.

The program illustrates the cheap-storage trend for obvious reasons: according to some sources, the NSA’s call record database is the biggest database in the world. This part of the program probably would not have been possible, within the NSA’s budget, until the last few years.

The data stored in the database is among the least sensitive (i.e., private) communications data around. This is not to say that it has no privacy value at all – all I mean is that other information, such as full contents of calls, would be much more sensitive. But even if information about who called whom is not particularly sensitive for most individual calls, the government might, in effect, make it up on volume. Modestly sensitive data, in enormous quantities, can add up to a big privacy problem – an issue that is much more important now that huge databases are feasible.

The other relevant technology trend is the use of automated algorithms, rather than people, to analyze communications traffic. With so many call records, and relatively few analysts, simple arithmetic dictates that the overwhelming majority of call records will never be seen by a human analyst. It’s all about what the automated algorithms do, and which information gets forwarded to a person.
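The “simple arithmetic” is easy to make concrete. The figures below are invented but plausible in scale:

```python
# Back-of-the-envelope illustration (hypothetical figures): with billions
# of call records and a limited analyst workforce, only a sliver of the
# records can ever reach human eyes -- the algorithms pick which sliver.

records_per_day = 1_000_000_000    # assumed daily call-record volume
analysts = 1_000                   # assumed number of analysts
reviews_per_analyst_per_day = 200  # assumed review capacity per analyst

human_capacity = analysts * reviews_per_analyst_per_day
fraction_seen = human_capacity / records_per_day
print(f"fraction of records a human could see: {fraction_seen:.6%}")
```

With these assumptions, humans could review about 0.02% of the records. Everything else lives and dies by what the automated algorithms decide.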

I’ll start unpacking these issues in the next post, starting with the storage question. In the meantime, let me add my small voice to the public complaints about the NSA call record program. They ruined my beautiful hypothetical!