August 23, 2017

Twenty-First Century Wiretapping: Storing Communications Data

Today I want to continue the post-series about new technology and wiretapping (previous posts: 1, 2, 3), by talking about what is probably the simplest case, involving gathering and storage of data by government. Recall that I am not considering what is legal under current law, which is an important issue but is beyond my expertise. Instead, I am considering the public policy question of what rules, if any, should constrain the government’s actions.

Suppose the government gathered information about all phone calls, including the calling and called numbers and the duration of the call, and then stored that information in a giant database, in the hope that it might prove useful later in criminal investigations or foreign intelligence. Unlike the recently disclosed NSA call database, which is apparently data-mined, we’ll assume that the data isn’t used immediately but is only stored until it might be needed. Under what circumstances should this be allowed?

We can start by observing that government should not have free rein to store any data it likes, because storing data, even if it is not supposed to be accessed, still imposes some privacy harm on citizens. For example, the possibility of misuse must be taken seriously where so much data is at issue. Previously, I listed four types of costs imposed by wiretapping. At least two of those costs – the risk that the information will be abused, and the psychic cost of being watched (such as wondering about “How will this look?”) – apply to stored data, even if nobody is supposed to look at it.

It follows that, before storing such data, government should have to make some kind of showing that the expected value of storing the data outweighs the harms, and that there should be some kind of plan for minimizing the harms, for example by storing the data securely (even against rogue insiders) and discarding the data after some predefined time interval.

The most important safeguard would be an enforceable promise by government not to use the data without getting further permission (and showing sufficient cause). That promise might be broken, but it changes the equation nevertheless by reducing the likelihood and scope of potential misuse.

To whom should the showing of cause be made? Presumably the answer is “a court”. The executive branch agency that wanted to store data would have to convince a court that the expected value of storing the data was sufficient, in light of the expected costs (including all costs/harms to citizens) of storing it. The expected costs would be higher if data about everyone were to be stored, and I would expect a court to require a fairly strong showing of significant benefit before authorizing the retention of so much data.

Part of the required showing, I think, would have to be an argument that there is not some way to store much less data and still get nearly the same benefit. An alternative to storing data on everybody is to store data only about people who are suspected of being bad guys and therefore are more likely to be targets of future investigations.

I won’t try to calibrate the precise weights to place on the tradeoff between the legitimate benefits of data retention and the costs. That’s a matter for debate, and presumably a legal framework would have to be more precise than I am. For now, I’m happy to establish the basic parameters and move on.

All of this gets more complicated when government wants to have computers analyze the stored data, as the NSA is apparently doing with phone call records. How to think about such analyses is the topic of the next post in the series.

Comments

  1. Richard says:

    Everyone (that is, business and government) now realizes that we live in an information age, and even if we do not now have a use (good or ill) for any particular type of information at present, it has become cheap enough to store currently-useless information that no one wants to waste its future potential. The ‘best’ policy from that point of view is to store everything against the day when some way is found to make that old information invaluable. The “making it useful” part is what is truly scary.

    Government keeping its own dataflow on citizens is bad enough, and worsened in this case by forcibly (if not actually legally) diverting and capturing business dataflows as well. But what about business dataflow itself? Are the phone companies maintaining permanent records of our calls for their own purposes now? Every company seems to want to join this database bandwagon, from grocery ‘customer loyalty cards’ on up through every transaction we make. I worry what happens when all these companies (and probably government too) start consolidating all these types of records into a meta-database to profile all our behavior, much as was done with our financial information by the credit bureaus. I never gave anyone permission for that … and I’m sure no one will ask me about the coming data consolidation, mining and profiling.

  2. I’m not sure this kind of calculus of harm versus benefit is terribly useful, because neither human beings nor conventional error-propagation techniques are much good at dealing with (what would turn out to be if you’re right about the need for such lengthy discussion) small differences between very large numbers. On the one hand you have the risk of nuclear or biological catastrophe; on the other, you have the risk of a corrupt police state so pervasive it makes normal life and commerce impossible, each with probabilities difficult to calculate. On the third hand you have a spectrum of lesser harms from both terrorism and life in the panopticon, also with intractable probability distributions. On the fourth you have the question of what use total surveillance will actually be in preventing major (or even minor) terrorist activities.

    It’s this last that decides the issue for me: the reports of counterproductiveness that filtered back from the NSA program, where thousands of false “leads” took investigators’ time away from more substantial avenues of inquiry, do not bode well. What they suggest is that to make significant beneficial use of the data turned up by mining such an archive, you would have to hire so many more boots-on-the-ground investigators as to turn the nation (or the world) into a police state. Or, in pseudo-quantitative terms, multiplying your potential benefit by the negligible or even negative efficiency of the followup process gets you as close to zero as makes no difference.

    It’s tempting, of course, to imagine collecting such an archive, to be used only when the effectiveness of data-mining algorithms and investigative techniques has provably increased to a point where the tradeoffs would be positive, but that way lies stupid.

  3. The question is not whether the government should or should not collect such information, but whether it should be able to prevent public access to its repositories of such information and public inspection and use of the algorithms it uses to analyse it.

    Transparency is the name of the game.

    You cannot control information, its collection or analysis.

    You can only hope to assure the integrity of the government by making it as transparent as possible.

  4. dr2chase says:

    The cost-benefit question is an interesting one. I think P2P file sharing is going to make life hard for the wire tappers and traffic analyzers.

    Right now, P2P file-sharing on a popular torrent from my venerable Mac G4 has connections to us*, is, sg*, ca, mx, gb*, fr*, de, hu*, nl, au, eg, cn, kr*, pl. About 25% of the connections are encrypted to thwart traffic shaping, all using RC4-160. Each country with at least one connection encrypted is indicated with a * above. So, from a home DSL line, 9Kb/second of unbreakably encrypted traffic, every second, sent all over the world, and if I judge from what I see, at least 10% of the P2P users are doing the same.

    I find it puzzling that the spooks are willing to lean on the phone companies to get access to call records, but are not willing, or did not have the foresight, to lean on them to not shape traffic. It is unlikely that encrypted P2P would have appeared so soon, or be so widely deployed, if the ISPs were not discriminating against P2P traffic. If the ISPs start using even cleverer ways of detecting and impeding P2P, I am sure that the P2P programs will respond with new obfuscations.

  5. To clarify my previous post:

    You amend the constitution to make it a requirement that whatever information the government collects concerning the people it serves, along with any analysis or processing it uses, must be made available to the people.

    This requirement will tend to make the government a little more prudent in its choice of what information to collect.

    There is only public and private.

    There is no ‘private, but also known by the representatives of the people and its agents’ – unless you really fancy a vicarious police state.

  6. This only really applies to the Internet. Phone companies already store full calling records, and in the case of cell phones, the locations the calls are made from. They have to. They bill you for it. And all of this information is accessible to police, but they need the paperwork. I don’t trust the phone companies, but I trust them more than the government. Why should we let the people we’re trying to protect our information from hold it, while the people who have legitimate use for it can do the same job?

    Next problem. In terms of the Internet, what are we actually talking about storing? A list of visited websites? We’re transferring more and more data. No matter how cheap storage gets we can’t possibly record everything that the entire nation transfers, especially as VOIP and video conferencing become more prominent.

  7. What I find to be perhaps the greatest cause for concern is that law enforcers typically fear embarrassment above all else. Once they’ve begun an investigation, based on whatever flimsy evidence they can lay their hands on, there is a tremendous amount of pressure to show that the suspect is guilty of something. The sort of guilt by association that call records will enable will not serve to convict, but will serve to launch investigations that end in frame-ups.

    Of course, having posted this, I’m now doubtless associated with some sort of terrorist-sympathizer cabal.

  8. For an example of Kevin’s concern, see the Yee investigation, which started with allegations of treason and espionage, and ended up with a plea to charges of adultery. Even the current partial-surveillance state suffers seriously from the lack of a downside to fishing expeditions.

  9. Throughout your posting, Ed, the elephant in the room, so to speak, is the word, “Constitution”. You never actually mention it anywhere–for the obvious reason that it’s far from clear that the US Constitution–even after being inflated by generations of quasi-religious interpreters into a kind of complete guide to life and law for the faithful–has so much as a word to say on the subject at hand. Nevertheless, your language is full of oblique references to what might be called a Constitutional approach to the issue: “government should not have free rein to store any data it likes”; “government should have to make some kind of showing that the expected value of storing the data outweighs the harms”; “The executive branch agency that wanted to store data would have to convince a court that the expected value of storing the data was sufficient”; and so on.

    None of this is the slightest bit obvious or even natural to those of us who think of democracy independently of the American Constitutional tradition. In other countries, what data the government stores about its citizens is (only) subject, like every other government policy, to the rules of democratic accountability. If the public are unfazed by government storage of a certain type of data, then abstract concerns about the government’s “free rein to store any data it likes” are irrelevant. Likewise, the tradeoffs between privacy and security inherent in such data storage are debated in the court of public opinion, not in “a court” (which in the US means, for all intents and purposes, the US Supreme Court). Finally, even if the public decides that some kind of independent oversight is necessary, judicial oversight is only one kind that it might prefer to apply–and it’s far from obvious that it’s the best kind. An independent or quasi-independent government board or agency could be established for the purpose, or a legislative committee could be given oversight powers, or (as has been suggested here in the comments) mandated disclosure, combined with a vigilant press and public, might be deemed sufficient.

    Of course, the American Constitutional approach is not entirely devoid of appealing features, and it certainly has plenty of devoted–one might say, religious–enthusiasts. It might be worthwhile thinking a bit about the alternatives, though, and at least making a pro forma case for your choice of a Constitutional approach, should you decide to stick with it.

  10. @paul – Indeed, Yee is an example of just the sort of thing I was speaking of. More recently, there’s the Florida grad student who has been under investigation based on an anonymous denunciation of his horror fiction – apparently originating from someone who was disgruntled by the student’s moderation at Wikipedia. http://pulpdecameron.livejournal.com/52385.html

    Given how easy it is for law enforcement to plant contraband on someone’s person or property, I’m amazed that any investigations ever end up exonerating anyone. It really is a tribute to our officers that so many resist the temptation to find evidence of some crime or other.

    Nevertheless, I’m not prepared to surrender all my rights to them. Any legal system – including absolute dictatorship – is fine if run by angels. I want one that can be run by humans.

  11. @Dan Simon:

    I would certainly imagine that seizure of telephone records is at least colorably “unreasonable” under Amendment IV. Deciding on the “reasonableness” of the seizure is in the domain of the judiciary. While you argue that the Constitution is silent on the matter, I would argue that Amendment IV (and for that matter, Amendments IX and X) have quite a bit of bearing – and I would be astounded if any Federal judge would not at least admit that a plausible Constitutional question exists.

    There is certainly the question of whether the Executive Branch has overstepped the authority delegated to it by the Congress – which falls back on the old question of whether its powers are delegated by the working of legislation or inherent. Alas, past Supreme Court decisions seem to shed little light http://caselaw.lp.findlaw.com/data/constitution/article02/01.html

  12. Whatever the Constitution itself says, the important thing would seem to be the Fourth Amendment…

  13. I agree a bit with what Crosby Fitch says. Though there is a fine line with what the government deems public. I guess that’s the big question in that regard. I don’t understand how our Corporate owned (wink wink) media doesn’t jump all over this and show some real journalism. The growth of the blog world has shown we are hungry for honest answers and we care about what’s going on. Now we just need to do something about it.

  14. the_zapkitty says:

    Yo, I suspect the SuncMax spammer will get deleted, and this reply as well, but that’s ok… I mean well. I just can’t let falsehoods stand… even lies by omission 🙂

    The problem was that neither Sony BMG nor SuncMax ASKED FOR USER PERMISSION before ‘dialing home’ [i]in the first place[/i].

    [b]That’s[/b] the kind of arrogance that got you sued.

    A court-mandated investigation into whether Sony BMG or SuncMax had committed [i]further[/i] offenses with this data-gathering capability doesn’t invalidate the original admission: the software dialed home without permission.

    So you have missed the point entirely… which is a SuncMax specialty, ne?

  15. the_zapkitty says:

    [b][/b]…[i][/i]…[code][/code]…

    Can you tell I’ve spent too much time on phpBB and not enough responding to blogs? 🙂

  16. There should really be a standardization of the “tag language” on all these “postable sites”. 😛

    They admit they gathered album ids and IP addresses. The latter can sometimes be tied to an individual, and often to a small region such as a single city. If the album ids were serial numbers rather than all those for copies of a given album being the same, then those could be tied to a particular purchase, and that would associate the purchaser with a small region (where the store was) and may identify them exactly (CC and debit card purchase, including any mail-order or online purchase where there’s no “region where the store was”).

    This means they could, in fact, have hung a name on most of their data streams and tied nearly all of them to at least a geographical region no larger than a city. That comes damned close to effectively gathering personally-identifiable information.

  17. It seems to me that the whole concept of the government storing phone records on a contingency basis misses most of the point of the wire tapping in the first place. Our government ought to be concerned with preventive approaches to crime in general. Yes, convicting terrorists or what have you after the fact is important, but we should mitigate the potential in the first place.

  18. Dan,

    You’re right that I was thinking within the American constitutional tradition. The most important aspect of that tradition, for my purposes here, is the separation of powers. Rather than saying “the courts” needed to give permission, I might have said “another branch of government”.

    FISA is an instructive example here. The executive branch carries out surveillance; the judiciary determines whether the executive has made the necessary showing to justify a search; and the legislative branch passed the law that determines what the necessary showing is. That seems to match well with the natures of the three branches: the executive acts to gather intelligence and catch criminals; the judiciary makes factual determinations about specific cases; the legislative establishes general rules.

    Sometimes surveillance requires secrecy, so the specific target doesn’t know he is being watched. I’m very uneasy about allowing one branch of government to carry out secret surveillance, without some kind of oversight by another branch. The ability to have (relative) independent oversight is one of the big advantages of the separation of powers.

  19. Car Carl,

    Suppose that last week, in May, intelligence analysts discovered that Mr. A is probably a terrorist planning an attack in July. They want to know who talked with Mr. A in March and April; but they can’t do that unless the records were stored. So storing records can be useful in preventing crime.

  20. Note to readers:

    I deleted a lengthy off-topic comment here. It was unrelated to the topic of this post, and it consisted entirely of the text of a long document posted elsewhere on the web (rather than a link).

    Discussion of old posts belongs on those posts. Reposting of long documents available elsewhere (rather than linking, along with commentary supported by selective quotations) is not appropriate in the comments.

  21. I have no problem with gathering data to prevent a crime. But if they are to do so, it should be with the understanding that if obtained without a warrant despite an expectation of privacy, it cannot subsequently be used to convict. This seems to be the only sensible way to balance these interests in light of the Fourth Amendment.

    The current US govt. on the other hand, is guilty of all of the following: failing to prevent (if not actually causing) terrorist attacks about which they did have prior fairly specific knowledge; imprisonments without due process of law; incommunicado detainments without access even to legal counsel; extradition of prisoners to regimes that permit torture; operating their own extrajudicial prisons off home soil that engage in torture. In one notorious instance, a Canadian of Middle-Eastern descent trying to cross the border into the US was not simply turned back, but captured and deported — not to Canada, but to a regime in the Middle East with one of the worst human-rights records out there, and this after having, as I recall, fled that regime previously to live as a refugee in Canada.

    I would be very uncomfortable with a precancerous government storing all that data, whatever its constitution said about how it could be used judicially, given how it might be used extrajudicially beyond merely crime prevention. The present US government is so far out of balance it isn’t funny — the executive branch is out of control, the legislative branch is authorizing it, and the judiciary, which should rein both of them in, is sitting around with thumbs up their asses when they aren’t handing down execrable verdicts like Blizzard v. BnetD. IOW, when they’re not just doing nothing, as likely as not they’re actually part of the problem.

    What the hell happened? It seems to have been sliding downhill since the second world war, increasingly disengaging more and more regulatory mechanisms, much like a precancerous cell…

  22. Actually, when I said “what the hell happened”, I didn’t intend it as a rhetorical question; I’m genuinely curious to know. 😛

  23. They’ve wanted to do this for a long time. It would not surprise me much to find out that it’s been going on for longer than a few years.

    In the mid-80s I heard AI researcher Ed Feigenbaum speak at Stetson University in central Florida. He recounted how some unnamed government agency had contracted him to do a feasibility study on just this kind of analysis of phone call data. He didn’t say whether he’d come to the conclusion that it would be feasible.