May 27, 2022

Archives for May 2006

Art of Science, and Princeton Privacy Panel

Today I want to recommend two great things happening at Princeton, one of which is also on the Net.

Princeton’s second annual Art of Science exhibit was unveiled recently, and it’s terrific, just like last year. Here’s some background, from the online exhibit:

In the spring of 2006 we again asked the Princeton University community to submit images—and, for the first time, videos and sounds—produced in the course of research or incorporating tools and concepts from science. Out of nearly 150 entries from 16 departments, we selected 56 works to appear in the 2006 Art of Science exhibition.

The practices of science and art both involve the single-minded pursuit of those moments of discovery when what one perceives suddenly becomes more than the sum of its parts. Each piece in this exhibition is, in its own way, a record of such a moment. They range from the image that validates years of research, to the epiphany of beauty in the trash after a long day at the lab, to a painter’s meditation on the meaning of biological life.

You can view the exhibit online, but the best way to see it is in person, in the main hallway of the Friend Center on the Princeton campus. One of the highlights is outdoors: a fascinating metal object that looks for all the world like a modernist sculpture but was actually built as a prototype winding coil for a giant electromagnet that will control superhot plasma in a fusion energy experiment. (The online photo doesn’t do it justice.)

If you’re on the Princeton campus on Friday afternoon (June 2), you’ll want to see the panel discussion on “Privacy and Security in the Digital Age”, which I’ll be moderating. We have an all-star group of panelists:
* Dave Hitz (Founder, Network Appliance)
* Paul Misener (VP for Global Public Affairs, Amazon)
* Harriet Pearson (Chief Privacy Officer, IBM)
* Brad Smith (Senior VP and General Counsel, Microsoft)
It’s in 006 Friend, just downstairs from the Art of Science exhibit, from 2:00 to 3:00 on Friday.

These panelists are just a few of the distinguished Princeton alumni who will be on campus this weekend for Reunions.

Twenty-First Century Wiretapping: Your Dog Sees You Naked

Suppose the government were gathering information about your phone calls: who you talked to, when, and for how long. If that information were made available to human analysts, your privacy would be impacted. But what if the information were made available only to computer algorithms?

A similar question arose when Google introduced its Gmail service. When Gmail users read their mail, they see advertisements. Servers at Google select the ads based on the contents of the email messages being displayed. If the email talks about camping, the user might see ads for camping equipment. No person reads the email (other than the intended recipient) – but Google’s servers make decisions based on the email’s contents.

Some people saw this as a serious privacy problem. But others drew a line between access by people and by computers, seeing access by even sophisticated computer algorithms as a privacy non-event. One person quipped that “Worrying about a computer reading your email is like worrying about your dog seeing you naked.”

So should we worry about the government running computer algorithms on our call data? I can see two main reasons to object.

First, we might object to the government gathering and storing the information at all, even if the information is not (supposed to be) used for anything. Storing the data introduces risks of misuse, for example, that cannot exist if the data is not stored in the first place.

Second, we might object to actions triggered by the algorithms. For example, if the algorithms flag certain records to be viewed by human analysts, we might object to this access by humans. I’ll consider this issue of algorithm-triggered access in a future post – for now, I’ll just observe that the objection here is not to the access by algorithms, but to the access by humans that follows.

If these are only objections to algorithmic analysis of our data, then it’s not the use of computer algorithms that troubles us. What really bothers us is access to our data by people, whether as part of the plan or as unplanned abuse.

If we could somehow separate the use of algorithms from the possibility of human-mediated privacy problems, then we could safely allow algorithms to crawl over our data. In practice, though, algorithmic analysis goes hand in hand with human access, so the question of how to apportion our discomfort is mostly of theoretical interest. It’s enough to object to the possible access by people, while being properly skeptical of claims that the data is not available to people.

The most interesting questions about computerized analysis arise when algorithms bring particular people and records to the attention of human analysts. That’s the topic of my next post.

Twenty-First Century Wiretapping: Storing Communications Data

Today I want to continue the post-series about new technology and wiretapping (previous posts: 1, 2, 3), by talking about what is probably the simplest case, involving gathering and storage of data by government. Recall that I am not considering what is legal under current law, which is an important issue but is beyond my expertise. Instead, I am considering the public policy question of what rules, if any, should constrain the government’s actions.

Suppose the government gathered information about all phone calls, including the calling and called numbers and the duration of the call, and then stored that information in a giant database, in the hope that it might prove useful later in criminal investigations or foreign intelligence. Unlike the recently disclosed NSA call database, which is apparently data-mined, we’ll assume that the data isn’t used immediately but is only stored until it might be needed. Under what circumstances should this be allowed?

We can start by observing that government should not have free rein to store any data it likes, because storing data, even if it is not supposed to be accessed, still imposes some privacy harm on citizens. For example the possibility of misuse must be taken serious where so much data is at issue. Previously, I listed four types of costs imposed by wiretapping. At least two of those costs – the risk that the information will be abused, and the psychic cost of being watched (such as wondering about “How will this look?”) – apply to stored data, even if nobody is supposed to look at it.

It follows that, before storing such data, government should have to make some kind of showing that the expected value of storing the data outweighs the harms, and that there should be some kind of plan for minimizing the harms, for example by storing the data securely (even against rogue insiders) and discarding the data after some predefined time interval.

The most important safeguard would be an enforceable promise by government not to use the data without getting further permission (and showing sufficient cause). That promise might possibly be broken, but it changes the equation nevertheless by reducing the likelihood and scope of potential misuse.

To whom should the showing of cause be made? Presumably the answer is “a court”. The executive branch agency that wanted to store data would have to convince a court that the expected value of storing the data was sufficient, in light of the expected costs (including all costs/harms to citizens) of storing it. The expected costs would be higher if data about everyone were to be stored, and I would expect a court to require a fairly strong showing of significant benefit before authorizing the retention of so much data.

Part of the required showing, I think, would have to be an argument that there is not some way to store much less data and still get nearly the same benefit. An alternative to storing data on everybody is to store data only about people who are suspected of being bad guys and therefore are more likely to be targets of future investigations.

I won’t try to calibrate the precise weights to place on the tradeoff between the legitimate benefits of data retention and the costs. That’s a matter for debate, and presumably a legal framework would have to be more precise than I am. For now, I’m happy to establish the basic parameters and move on.

All of this gets more complicated when government wants to have computers analyze the stored data, as the NSA is apparently doing with phone call records. How to think about such analyses is the topic of the next post in the series.