January 16, 2025

Twenty-First Century Wiretapping: Recognition

For the past several weeks I’ve been writing, on and off, about how technology enables new types of wiretapping, and how public policy should cope with those changes. Having laid the groundwork (1; 2; 3; 4; 5) we’re now ready to bite into the most interesting question. Suppose the government is running, on every communication, some algorithm that classifies messages as suspicious or not, and that every conversation labeled suspicious is played for a government agent. When, if ever, is government justified in using such a scheme?

Many readers will say the answer is obviously “never”. Today I want to argue that that is wrong – that there are situations where automated flagging of messages for human analysis can be justified.

A standard objection to this kind of algorithmic triggering is that authority to search or wiretap must be based on individualized suspicion, that is, that there must be sufficient cause to believe that a specific individual is involved in illegal activity, before that individual can be wiretapped. To the extent that that is an assertion about current U.S. law, it doesn’t answer my question – recall that I’m writing here about what the legal rules should be, not what they are. Any requirement of individualized suspicion must be justified on the merits. I understand the argument for it on the merits. All I’m saying is that that argument doesn’t win by default.

One reason it shouldn’t win by default is that individualized suspicion is sometimes consistent with algorithmic recognition. Suppose that we have strong cause to believe that Mr. A is planning to commit a terrorist attack or some other serious crime. This would justify tapping Mr. A’s phone. And suppose we know Mr. A is visiting Chicago but we don’t know exactly where in the city he is, and we expect him to make calls on random hotel phones, pay phones, and throwaway cell phones. Suppose further that the police have good audio recordings of Mr. A’s voice.

The police propose to run automated voice recognition software on all phone calls in the Chicago area. When the software flags a recording as containing Mr. A’s voice, that recording will be played for a police analyst, and if the analyst confirms the voice as Mr. A’s, the call will be recorded. The police ask us, as arbiters of the public good, for clearance to do this.

If we knew that the voice recognition algorithm would be 100% accurate, then it would be hard to object to this. Using an automated algorithm would be more consistent with the principle of individualized suspicion than would be the traditional approach of tapping Mr. A’s home phone. His home phone, after all, might be used by an innocent family member or roommate, or by a plumber working in his house.

But of course voice recognition is not 100% accurate. It will miss some of Mr. A’s calls, and it will incorrectly flag some calls by others. How serious a problem is this? It depends on how many errors the algorithm makes. The traditional approach sometimes records innocent people – others might use Mr. A’s phone, or Mr. A might turn out to be innocent after all – and these errors make us cautious about wiretapping but don’t preclude wiretapping if our suspicion of Mr. A is strong enough. The same principle ought to hold for automated voice recognition. We should be willing to accept some modest number of errors, but if errors are more frequent we ought to require a very strong argument that recording Mr. A’s phone calls is of critical importance.

In practice, we would want to set out crisply defined criteria for making these determinations, but we don’t need to do that exercise here. It’s enough to observe that given sufficiently accurate voice recognition technology – which might exist some day – algorithmically triggered recording can be (a) justified, and (b) consistent with the principle of individualized suspicion.

But can algorithmic triggering be justified, even if not based on individualized suspicion? I’ll argue next time that it can.

Comments

  1. Walter Faxon says

    When the police start using machines to recognize voices, to ensure their continued utility there must be a simultaneous ban on machines or computer programs that distort human speech (in real time) in a way that police machines cannot follow. Possession of such a machine (or access to such a program, even a web application that leaves no permanent trace on the suspect’s computer) would be prima facie evidence of criminal/terrorist intent.

    Not much room left for tinkering there.

  2. Thanks for your comments. Most of these issues I plan to deal with in future posts. That includes the false positive problem, the concern that the infrastructure for content-based triggering invites abuse, and the question of how to reconcile these arguments with current law.

  3. The point I wanted to make has been made in two different ways by Jim Lyons and Michael Weiksner; I’ll add a third. If your detection is anything less than perfect (which of course will be the case), and the baseline rate of what you are testing for is small, you’ll get more false positives than real positives. This same point has been made by Bruce Schneier about mass wiretapping, and by the mathematician John Allen Paulos in his book _Innumeracy_ about whether you should be concerned about a positive medical test for a serious disease. This is why it’s foolish to have mandatory drug testing in grade schools or mandatory HIV testing as a condition of marriage (the few states that had it scrapped it because it was costing hundreds of thousands of dollars per genuine positive result; it makes more sense to target specific at-risk populations for HIV education, screening, and treatment).

  4. The question/problem is posed wrongly. The real issue is not whether to monitor a given individual/entity in the presence of unknown communication paths (i.e. roving wiretaps), but that this opens up a new avenue of obtaining unauthorized wiretaps that are hard to flag/prove.

    For the sake of argument I presume that current person-bound wiretaps are implemented with the cooperation of telco companies. That is, when wiretapping a line/cell phone, the individual phone line is not bugged, but the traffic is intercepted inside the telco operator’s network(s), using mandated interfaces. Let’s assume a warrant has to be presented prior to or within a set time of obtaining the intercept of an individual/entity.

    With “comprehensive” wiretaps, any technically feasible way of intercepting will require routing a large enough portion of the traffic to/through agency equipment without individual warrants, and for reasons of “sovereignty” and “national security”, agencies will not/cannot disclose what is happening inside the equipment, i.e. what is searched for and what is recorded.

    Within technical feasibility, this will provide means of virtually unlimited and unprovable illicit surveillance.

    From here on the rest is bickering about minor detail, and Devonavar has outlined some of the social implications.

  5. There’s also the consideration that in order to search based on voice pattern, you have to have a high quality recording of a person’s voice. If you start treating voice patterns like fingerprints, this becomes a powerful protection mechanism. Let’s say that calls aren’t recorded unless they’re flagged, and calls are only flagged when they match a voice pattern in the database. If the database consists of voice patterns that are a part of an ongoing criminal investigation which this is authorized for, then it can be flagged for human analysis. This covers the terrorism issue, since you could get the voice pattern of suspected terrorists through conventional wiretaps. It also keeps the system from doing anything to people who aren’t being investigated.

  6. Perhaps you will address this later, but I think the standard relates to Bayes. The classic example is the random drunk driving checkpoint. Although the breathalyzer is 95% accurate, only 2% of drivers are actually drunk drivers. Therefore, the chance that you are a drunk driver *given* that you test positive at a checkpoint is 28% (.95*.02/(.95*.02+.05*.98)), not something that passes the reasonable doubt threshold.

    What’s the percentage of communications that are terrorist related? .01%? .00000000001%? You need to have a VERY accurate system before you get anything but false positives. That’s why it’s dangerous to add more hay when you are searching for a needle in the haystack.
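The checkpoint arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `posterior` and its parameter names are made up here, and it assumes that "95% accurate" means both a 95% detection rate and a 5% false positive rate, which is how the comment's own formula treats it.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' rule: probability of being a true case given a positive test."""
    true_pos = sensitivity * prior                    # real cases that test positive
    false_pos = false_positive_rate * (1.0 - prior)   # innocent cases that test positive
    return true_pos / (true_pos + false_pos)


# Checkpoint example from the comment: 2% of drivers drunk, test 95% accurate.
checkpoint = posterior(prior=0.02, sensitivity=0.95, false_positive_rate=0.05)
print(f"checkpoint: {checkpoint:.1%}")  # about 28%

# Same test applied where only 0.01% of cases are real (the first rate floated above).
rare = posterior(prior=0.0001, sensitivity=0.95, false_positive_rate=0.05)
print(f"rare event: {rare:.2%}")
```

With the rarer base rate, well under 1% of positives are real, which is the needle-in-a-haystack point in numerical form.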

  7. john erickson says

    Let me preface these comments by stating that I am extremely skeptical that the alleged analysis is as limited in scope as they claim, simply because I don’t believe the NSA knows what they’re looking for. IF a massive analysis of phone records is indeed being performed, I would expect it to be applied across the board to everything, to uncover emergent patterns and thus uncover leads. I am NOT suggesting that they actually DO this; I’m just saying that IF they are doing this, they should maximise the data set…

    There are two general approaches to crunching and categorizing the data. In the first, supervised categorization, the target categories (i.e. the target dimensionality) are known and each item is best-fit based on a vector of extracted features. In some forms, the number of targets is fixed but their definition is not. In the second approach, unsupervised categorization, an unlimited number of categories is allowed to emerge by discovering natural affinities between items. In my view the first is a waste of time UNLESS you know EXACTLY what you are looking for; the second seems to be what you need to discover patterns that you didn’t have prior knowledge of.

    Unsupervised categorization seems more interesting because it has the potential to unveil patterns that we SHOULD be paying attention to. The problem with UC is that in order to increase its computational efficiency, we must use feature vectors (i.e. dimensionally reduce) as surrogates for the real content; what granularity is sufficient? Should we detect keywords? Keyphrases? If you don’t know what you’re looking for, you had better have a pretty large feature vector befitting of the fact that you are clueless, so that your categorization algorithm can better produce leads…

    Then there is the legal side. To echo John Jordan’s point, what is the legal standing of all of this? The last time I checked, the authorities couldn’t file warrants based on my phone message(s) producing feature vectors that resulted in some algorithm lumping my message(s) in with certain others.

  8. John Jordan says

    On first read, I disagreed with the premise. However, should such a system exist, we need to have certain controls to prevent abuse.

    First, any mass monitoring should be based on the individual and not the whole population. That is, there are no warrants for keywords in any conversation captured.

    Second, the system should have at least two levels of disinterested parties validating (a) the speaker is the target of the warrant and (b) the content of the conversation matches those topics for which the warrant grants access. Once these two verification steps are completed, then, the law enforcement body has access to the conversation.

    This is a model similar to one I read in Conde Nast magazine (sorry, no link 🙁 ) about how Heathrow scans passenger bags. Any bag that fails a test is subject to greater scrutiny on the next level, and only after two or three automated scans (and failures) is the bag subject to inspection by a person.

    By following this model, we get some assurance that only the relevant calls made by the suspect are subject to scrutiny. The rest of the law abiding citizens that either sound similar to the suspect or use words that could be suspect are not immediately targeted for investigation.

  9. Jim Lyon says

    In Ed’s hypothetical, the automated algorithm would need extraordinarily high accuracy to pass the sniff test. This is true of every algorithm that attempts to find a needle in a haystack.

    I would guess that Chicago has about 10,000,000 phone calls on a given day. If the putative algorithm is 99% accurate, it will flag 100,000 of these calls as suspicious. If our suspect made 10 of these calls, there is only a 0.0001 chance that a flagged call is from the target of the investigation. I don’t think that this creates a reasonable suspicion to listen to the call.

    In order to have even a 50-50 chance that a call is from the suspect, the algorithm would need 99.9999% accuracy, which I believe will be unattainable for the foreseeable future.
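These figures can be checked with a short Python sketch. The function name `flagged_call_odds` is hypothetical, the call volume is the comment's own guess, and the code assumes a single "accuracy" figure applies both to catching the suspect's calls and to correctly ignoring everyone else's.

```python
def flagged_call_odds(total_calls, target_calls, accuracy):
    """Fraction of flagged calls that really come from the target."""
    error_rate = 1.0 - accuracy
    true_flags = target_calls * accuracy                     # suspect's calls correctly flagged
    false_flags = (total_calls - target_calls) * error_rate  # other calls flagged by mistake
    return true_flags / (true_flags + false_flags)


# Assumed inputs from the comment: 10,000,000 calls a day, 10 made by the suspect.
at_99 = flagged_call_odds(10_000_000, 10, 0.99)
at_six_nines = flagged_call_odds(10_000_000, 10, 0.999999)

print(f"99% accurate:       {at_99:.6f}")         # roughly 0.0001
print(f"99.9999% accurate:  {at_six_nines:.3f}")  # roughly 0.5
```

Under these assumptions the error rate has to fall to about one in a million before a flagged call is as likely as not to be the suspect's.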

    Overall, I agree with Devonavar: mass surveillance is a lousy tool for investigation, but it’s a great tool for maintaining power. We need to approach the discussion from that direction.

  10. If the police have that much information on Mr A and they can’t come up with ways to catch him using existing capability, doesn’t that make them incompetent?
    And there are practicalities to be observed, e.g. if the auto scanner doesn’t recognize the voice, does that mean it’s not the voice they are looking for or that it’s been encrypted? Do you ban encryption, and would that help in this scenario (if you think encryption is being used, what do the police do)?

    To be effective, laws need to be observant of practical realities. There may be some practical scenario that illustrates a justification to auto scan all phone calls, but I don’t think you’ve found it yet.

  11. I find that your “standard argument” for never allowing mass wiretapping does not even come close to my own concerns about wiretapping. I can’t speak for the masses, but my own feeling is that the issue isn’t individual suspicion, but whether we should allow the creation of a tool that can facilitate perfect enforcement — of anything.

    I would say that mass wiretapping is less a tool of enforcement than a tool of power. Mass wiretapping opens up possibilities for mass manipulation that should never be opened. I believe there is general consensus on this, if the typical reaction to 1984 is anything to judge by.

    I object to allowing machines to judge what constitutes suspicious activity, and I object even more to allowing humans to do so in an environment where every citizen can be monitored. The first objection does not relate to the possibility of the machine being mistaken, but to the fact that I do not trust any algorithm a human can create to take context, social factors, or changing notions of justice fully into account. The second objection is the power objection. I do not want to live in a society where every word we say is purposefully monitored. Such monitoring opens up the possibility of perfect enforcement, which I do not think is anywhere near as good as you make out.

    The problem with perfect enforcement is at least twofold:

    1) It makes possible the perfect enforcement of questionable laws. Do we really want to make it possible for entire segments of society to be sectioned off and labelled “in violation of section 3.ii of the anti-clown act”? On a more serious note, do we want it to be possible to keep watch on, say, all people of Mexican descent? Such a questionable act could easily be justified with statistics so long as the right (wrong) people are in power. After all, Mexicans are more likely than other groups to be illegal immigrants, right?

    2) It fosters an environment of paranoia. People self-censor when they believe they can be heard. Do we really want this force to become strong enough that people stop questioning the laws in society? I see a strong force that prevents unjust laws from being changed here.

    To sum up, I don’t think the accuracy of the algorithms that are applied in mass wiretapping is the issue. My concern is that the potential for abuse (and the severity of that abuse) far outweighs the potential benefit of better or perfect enforcement.

  12. Roving wiretaps have been around for years, so the idea of automating them seems reasonably plausible. To what extent, however, does automation create a demand that wasn’t there before, at least partly to justify the price of the infrastructure? To what extent does it lower the perceived cost of doing the tap, so that requests will be filed as a matter of course?

  13. My concern is what happens after the algorithm triggers. In your example, law enforcement is placed into the loop to verify the algorithm’s result. However, as the number of potential “hits” goes up, the temptation will be to take immediate and automatic action without the encumbrance of a FISA-like review.

    For example, take the NSA call detail logging scenario that appears to be going on right now. When given the phone number of a known or suspected terrorist, the algorithm might be to search all incoming and outgoing calls from that number and place the people who own *those* numbers (or all that live at the billing address) on the terrorist watch list. After all, if they’re not terrorists, they will notify the government and ask to have themselves taken off the list, right? Easier said than done.

    http://www.baltimoresun.com/news/opinion/oped/bal-op.terrorist28may28,0,2723012.story