Suppose the government were gathering information about your phone calls: who you talked to, when, and for how long. If that information were made available to human analysts, your privacy would be impacted. But what if the information were made available only to computer algorithms?
A similar question arose when Google introduced its Gmail service. When Gmail users read their mail, they see advertisements. Servers at Google select the ads based on the contents of the email messages being displayed. If the email talks about camping, the user might see ads for camping equipment. No person reads the email (other than the intended recipient) – but Google’s servers make decisions based on the email’s contents.
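To make the mechanism concrete, here is a toy sketch of how content-based ad selection might work. (This is purely illustrative: Google’s actual system is proprietary and far more sophisticated, and the keyword table and function here are my own invention.)

```python
# Toy model of content-based ad selection. The keyword-to-ad table and
# matching rule are illustrative only, not Google's actual algorithm.
AD_TABLE = {
    "camping": ["Discount tents", "Hiking boots on sale"],
    "mortgage": ["Compare refinance rates"],
}

def select_ads(message_body, max_ads=3):
    """Return ads whose trigger keyword appears in the message text."""
    words = set(message_body.lower().split())
    matches = []
    for keyword, ads in AD_TABLE.items():
        if keyword in words:
            matches.extend(ads)
    return matches[:max_ads]

# The message content drives the ad choice; no human ever sees it.
print(select_ads("Want to go camping next weekend?"))
# -> ['Discount tents', 'Hiking boots on sale']
```

The point of the sketch is that the ad choice depends on the message’s contents even though no human ever reads the message.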
Some people saw this as a serious privacy problem. But others drew a line between access by people and by computers, seeing access by even sophisticated computer algorithms as a privacy non-event. One person quipped that “Worrying about a computer reading your email is like worrying about your dog seeing you naked.”
So should we worry about the government running computer algorithms on our call data? I can see two main reasons to object.
First, we might object to the government gathering and storing the information at all, even if the information is not (supposed to be) used for anything. Storing the data introduces risks (of misuse, for example) that cannot exist if the data is not stored in the first place.
Second, we might object to actions triggered by the algorithms. For example, if the algorithms flag certain records to be viewed by human analysts, we might object to this access by humans. I’ll consider this issue of algorithm-triggered access in a future post – for now, I’ll just observe that the objection here is not to the access by algorithms, but to the access by humans that follows.
If these are the only objections to algorithmic analysis of our data, then it’s not the use of computer algorithms that troubles us. What really bothers us is access to our data by people, whether as part of the plan or as unplanned abuse.
If we could somehow separate the use of algorithms from the possibility of human-mediated privacy problems, then we could safely allow algorithms to crawl over our data. In practice, though, algorithmic analysis goes hand in hand with human access, so the question of how to apportion our discomfort is mostly of theoretical interest. It’s enough to object to the possible access by people, while being properly skeptical of claims that the data is not available to people.
The most interesting questions about computerized analysis arise when algorithms bring particular people and records to the attention of human analysts. That’s the topic of my next post.
Re automated vs. human review: Even if review is automated, it implements human (imposed) judgement. The automated procedure is just a tool. When you hit somebody with a car, you can claim the car hit the person and you only pressed the gas pedal (or perhaps not even that), but nobody is going to buy that. So why should anybody buy the argument that automated judgement is not tantamount to human judgement? In practice it is likely to be even worse, because automated judgement has to be simplistic: concepts that humans grasp easily are not easily formalized.
Dennis D. McDonald: More likely than Verizon making money by providing your call data, they are being compelled to bear the extra cost incurred, and have worked it into the price structure you are charged. One of those “costs of doing business”, along with excessive employee wages and healthcare costs.
FISA approval should be required for a human to view a record that was flagged by automated analysis. And it is feasible to design a management system that would let FISA do this efficiently.
In the mid-’90s I reviewed a book by Diffie and Landau called Privacy on the Line, which discussed abuses of wiretapping power. Most of these abuses can be curtailed with simple oversight solutions that cause a more independent set of eyes to take a look at what is happening.
Ever seen the movie “The Net”? With that hot actress? Scary stuff.
I think that the notion of human access to data groveled over by algorithms is really a proxy for potentially permanent “real-world” consequences of any kind. Obviously if there are no consequences — or no consequences outside your own head, as with the Gmail ads — it’s more difficult to get upset. But of course no one is going to spend terabytes of storage and untold petacycles of CPU on getting results that will uniformly be thrown away.
So which real-world consequences would be OK and which wouldn’t? Should you be put on a permanent terrorist watch list as a result of your calling patterns (even if the calls never see human scrutiny)? What about a few percent higher chance of selection for extra screening at the airport? What about a few more points on your probability of a tax audit? A few points off your credit score?
In some ways, having consequences for data-mining that don’t directly involve human scrutiny scares me more than the idea of having phone records and recordings bumped to a human analyst, because those consequences imply (or perhaps ultimately require) a densely-connected, completely nontransparent web of connections among the various authentication and authorization infrastructures in our lives. And that in turn means that the old science fiction scenario of finding your credit cards, driver’s license, passports and internet access all revoked at the same time for no discoverable reason becomes a little more plausible.
I think it’s quite likely that Google has made Gmail searchable by NSA agents, in a similar way to AT&T’s voluntary compliance with warrantless wiretapping.
I guess “human actions” covers any action resulting from the automatic scanning. It seems to me there could be all manner of escalations of automatic actions before any genuine human is triggered, and I think they are all objectionable (unlike my dog or my spam filter, which I am aware of and to a large extent control).
But there are also other objectionable consequences, such as handing over bulk information to commercial organisations in an “anonymous” manner. For example, suppose the government decides it can raise funds by selling info on “suggested” targets derived from the auto-scanned data. They wouldn’t need to hand over the actual data, and they wouldn’t need to identify individuals (it could be just families or communities), but privacy is still violated and this abuse would be very difficult to detect.
I agree that a major concern is with the human actions that follow the triggers created by algorithmic scanning, not the algorithmic scanning itself. If no human actions were to be triggered, there would be no reason for the NSA scanning of non-content phone data.
I expect that in a few months we will learn about the triggering of non-court-approved actions taken to examine phone call content based on analysis of call patterns.
However, if Verizon is going to make money from selling data about my calling patterns to the federal government, I believe that (a) Verizon should disclose that fact to me and (b) I should receive a percentage of the money earned, perhaps as a line-item discount on my monthly phone bill.
Professor Felten raises Gmail’s scanning as an example of the erosion of privacy through automated scanning, but Gmail’s scanning is not the same type or scale of freedom erosion as the NSA database.
– Google gave notice before scanning Gmail messages. The NSA gave no notice.
– We can choose another email provider instead of Gmail. Many of us cannot choose a phone provider that does not provide data to the NSA.
– We can choose to encrypt all outgoing email so that any ISP- or Gmail-based scanner has nothing to scan (see the sketch after this list). The NSA data is connection data, not content.
– Finally, once again, the NSA data is connection data, not content. To be equivalent, Gmail would have to scan the email headers, determine who you are corresponding with, and then select ads based on the identity of the sender.
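To illustrate the encryption point above, here is a minimal sketch using the third-party Python `cryptography` package (my choice, purely to keep the example self-contained; real email encryption would use an OpenPGP tool such as GnuPG with the recipient’s public key). Once the message body is ciphertext, a content scanner has nothing meaningful to match:

```python
# Minimal sketch: an encrypted message body gives a server-side scanner
# nothing to keyword-match. Fernet (symmetric) is used only to keep the
# example self-contained; real email encryption would use OpenPGP with
# the recipient's public key.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in this toy setup, shared out of band
cipher = Fernet(key)

body = b"Let's go camping next weekend!"
ciphertext = cipher.encrypt(body)  # all the mail server (or scanner) sees
print(ciphertext)                  # opaque token; no keywords to match

assert cipher.decrypt(ciphertext) == body  # only the key holder can read it
```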
I’ve got to agree with Hal here. At issue are the severe actions that the government can take. The government’s ability to imprison people means that the Gmail example in the introduction distracts us from the real issue. Governments have the ability to misuse data to an extent that clearly surpasses corporate misuse. This is why we need to demand even more openness from governments than from corporations.
I don’t think that a principled argument against automated analysis of traffic data can be made when the results are either anonymous statistics or thrown away without being seen by a human. There are, however, quite a few practical arguments to be made against the data collection activities of the NSA.
First, some background: history has shown that not every government acts in the best interests of its citizens; we can all point to governments in recent history that were outright repressive. Even governments that were democratically elected have turned on their populations.
There are three hard questions that a democracy needs to address:
a) How do we minimize our chances that a thug gets elected?
b) How do we reduce the damage that a thug can do as leader?
c) How do we get rid of a thug that we accidentally elected?
With respect to b): crime needs darkness and secrecy. Transparency in government and accountability for government actions can limit the damage done by an occasional bad choice of representative.
The NSA tried to keep its wiretaps and collection of call records secret. It is fighting calls for accountability by claiming the state secrets privilege. Such behaviour is not something one should expect from a government organisation in a democratic country.
Brings to mind “Minority Report” by Philip K. Dick, made into a movie in 2002. The Precogs are essentially algorithms for identifying potential evildoers. Furthermore, as our hero Anderton (Tom Cruise) cruised through the mall, he was presented with individualized ads derived from remote sensors reading his identity. Do we have a copyright problem here? 🙂
Email is not ordinarily especially private. Google can read your Gmail. Your ISP can read it too, along with your regular email. So can anyone with a sniffer on a networked machine along the route the data travels. Mail with sensitive contents should be encrypted.
More generally, perhaps we need a movement of cypherpunk types to engage in self-help in these areas. Preferably a large number of people, all of whom do things like gratuitously encrypt ordinary, non-sensitive emails, surf with anonymizing proxies, use Tor and Freenet, share a bunch of (innocuous) files via BitTorrent, place random local hang-up calls now and again, and more generally act so as to confound any attempts by prying eyes (or computers) at traffic analysis or outright capture of their communications. The philosophy is a First Amendment one: “if companies and governments want this info that badly, make ’em work for it!”
It’s not a matter of privacy, it’s a matter of freedom. There’s no privacy difference between AT&T knowing my phone records and the U.S. government knowing them. The mere fact that the government was able to collect these phone records from businesses means that my privacy was already gone.
The difference is that AT&T cannot throw me into prison if it sees something it doesn’t like in my phone contacts, and the government can.
Positioning the issue as one of privacy is a big mistake. There is NO privacy in phone call records, by their nature. The issue is one of government action and possible infringement on freedom.
I would agree with your statement that our discomfort stems mostly from the potential for human access to this data. Of course, access by algorithms must lead to human access if the information is to be of any use. In the example cited, of looking at whom someone had talked to and for how long, isn’t that sort of information already available to law enforcement with a warrant? And to target any individual, we would theoretically need information, beyond the simple call records, pointing to a suspected terrorist. So, if we can already get this sort of information about individuals when we have reason to suspect them of something, I don’t see a benefit of this system large enough to outweigh the current regime.
And as a rather obvious corollary of this discussion, the reason why Gmail’s ads are not “a serious privacy problem” is that the only person viewing the results of Google’s email-reading, ad-selecting algorithm is the person to whom the email was sent… which is along the lines of what I thought when the whole hoopla about Gmail’s ads erupted. As an aside, people who don’t want any computer program to read their email would have to start by disabling their spam filters…
If the ads for a particular email were accessible to someone else, then it would be a serious privacy problem.