Computer science research on re-identification has repeatedly demonstrated that sensitive information can be inferred even from de-identified data in a wide variety of domains. This has posed a vexing problem for practitioners and policy makers. If the absence of “personally identifying information” cannot be relied on for privacy protection, what are the alternatives? Joanna Huey, Ed Felten, and I tackle this question in a new paper, “A Precautionary Approach to Big Data Privacy”. Joanna presented the paper at the Computers, Privacy & Data Protection conference earlier this year.
The Open PACER Act provides for free and open access to electronic federal court records. The courts currently offer an expensive and difficult-to-use web site. They charge more than their cost of offering the service, and more than Congress has authorized, in violation of the E-Government Act of 2002. This Act seeks to compel the courts, once and for all, to fulfill Congress’ longstanding vision of making this information “freely available to the greatest extent possible”.
Zeynep pointed to her New York Times op-ed, “Beware the Smart Campaign,” about political campaigns collecting and exploiting detailed information about individual voters. Given the emerging conventional wisdom that the Obama campaign’s technological superiority played an important role in the President’s re-election, we should expect more aggressive attempts to micro-target voters by both parties in future election cycles. Let’s talk about how voters might respond.
I just published a new opinion piece in the New York Times, entitled “Beware the Smart Campaign”. I react to the Obama campaign’s successful use of highly quantitative voter targeting that is inspired by “big data” commercial marketing techniques and implemented through state-of-the-art social science knowledge and randomized field experiments. In the op-ed, I wonder whether the “persuasion score” strategy championed by Jim Messina, Obama’s campaign manager, is on balance good for democracy in the long run.
Mr. Messina is understandably proud of his team, which included an unprecedented number of data analysts and social scientists. As a social scientist and a former computer programmer, I enjoy the recognition my kind are getting. But I am nervous about what these powerful tools may mean for the health of our democracy, especially since we know so little about it all.
For all the bragging on the winning side — and an explicit coveting of these methods on the losing side — there are many unanswered questions. What data, exactly, do campaigns have on voters? How exactly do they use it? What rights, if any, do voters have over this data, which may detail their online browsing habits, consumer purchases and social media footprints?
You can read the full article here.
The argument in an op-ed is necessarily concise and leaves out much of the nuance, but I think this is an important question facing democracies. The key to my argument is that the combination of big data analytics and better social science isn’t just the same old, same old; it poses novel threats to healthy public discourse. I welcome feedback and comments, as we are just starting to grapple with these new developments!