If you talk about ‘metadata’, ‘big data’ and ‘Big Brother’ just as easily as you order a pizza, ethnography and anthropology are probably not your first points of reference. But the upshot of a recent encounter between ethnographer Tom Boellstorff and Edward Snowden (not IRL but IRP) is that tech policy wonks and researchers should be careful with their day-to-day vocabulary, as concepts carry politics of control and power.
In ‘Making big data, in theory’, ethnographer and anthropologist Tom Boellstorff discusses Edward Snowden’s revelations IRP (In Research Paper). The paper, published on 7 October, is a refreshing 12-page take on the construction of some very dominant concepts in tech policy and research today.
Take the concept of ‘metadata’. In support of the intelligence programs disclosed by Snowden, proponents claim that one of the most controversial programs is just about ‘metadata’ – who you call, where you were – not about the content of communications. A historical analysis of the Western misconception of Aristotle’s Metaphysics explains how the Greek prefix meta has falsely come to imply hierarchy in our tech language today. With hierarchy comes classification: ‘it establishes an implicit system of control’. Boellstorff argues that such control creates the power to marginalize the actual implications of the program and to conveniently obscure aspects such as scale – crucial to the debate, as the extremely large scale of the data collection may reveal highly sensitive information about a person, organization or people, as Ed argued in his testimony before the Senate Judiciary Committee.
Such rational arguments make perfect sense: it’s much more revealing to learn that I’ve called my shrink four times a week for the last four years, and four times on Christmas Eve, than to know what I’ve actually said in a single conversation. But once you control the conceptualization of ‘metadata’, you influence public debate about the intelligence program. Not unlike most of our concepts, ‘metadata’ is not a fact; it is made up. Subsequent re-conceptualization (Bruce Schneier: ‘metadata equals surveillance’) might make perfect sense, but it becomes extremely hard, and you find yourself on the defensive.
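To make the point about scale concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the call records, the labels ‘me’, ‘therapist’ and ‘pizzeria’, and the dates are invented and do not come from any real program or dataset. It only shows that records holding nothing but who called whom, and when, are enough to surface a pattern like the one above.

```python
from collections import Counter
from datetime import date

# Hypothetical call records: (caller, callee, day). No call content is stored.
records = [
    ("me", "therapist", date(2013, 12, 23)),
    ("me", "therapist", date(2013, 12, 24)),
    ("me", "therapist", date(2013, 12, 24)),
    ("me", "therapist", date(2013, 12, 24)),
    ("me", "therapist", date(2013, 12, 24)),
    ("me", "pizzeria", date(2013, 12, 20)),
]


def calls_per_callee(records):
    """Count how often each number was called, using only who-called-whom."""
    return Counter(callee for _, callee, _ in records)


def calls_on_day(records, callee, day):
    """Count calls to one callee on one specific day."""
    return sum(1 for _, c, d in records if c == callee and d == day)


print(calls_per_callee(records).most_common(1))
print(calls_on_day(records, "therapist", date(2013, 12, 24)))
```

Two trivial aggregations already single out the therapist and the cluster of calls on Christmas Eve, without a word of any conversation.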
Or take the concept of ‘big data’, a frame carefully constructed by proponents of systematic surveillance for commercial purposes. The concept breathes the promise of a crystal ball and a solution for all problems of humankind. While careful theoretical examination of the concept of ‘big data’ is lacking, and in fact the trustworthiness or statistical quality of data analysis is often quite problematic, ‘big data’ has today become institutionalized, part of tech policy lingo and an attractive resource for research funding. Meanwhile, to proponents, questioning ‘big data’ is the same as rejecting modern life, rejecting connecting with friends and, you know, world peace. Yes, we care about your privacy, but we need to solve the problem of aging first. The problem of aging, you mean the natural process that is part of us being humans?
In a ‘big data’ world, Boellstorff argues, ‘surveillance’ should not be understood through the lens of Orwell’s Big Brother or Foucault’s Panopticon. A more nuanced metaphor can be found in Foucault’s analysis of the confession, something Boellstorff in a brilliant part of the paper calls an ‘incitement to disclose’. This reminds me of Barry Wellman’s networked individualism [pdf]: reject ‘big data’, and risk being an outcast of your network – both socially and technically, you’re not part of the same systems if you don’t participate. Boellstorff terms this a ‘dialectic of surveillance and recognition’, a dynamic that spurs a completely different set of trade-offs for users than merely rejecting Big Brother or the Panopticon post-Snowden. If one follows Boellstorff, addressing surveillance requires a different set of policy responses than the usual pleas for oversight, transparency and accountability of intelligence practices. It illustrates that addressing the many problems around surveillance should take the entire infrastructure for ‘big data’ into account, rather than only one institution or individual technology – as Joris van Hoboken aptly noted with regard to drones at last week’s Drone Conference in NYC.
Boellstorff mainly aims to raise theoretical questions about the underlying theory of our concepts, and to warn against carelessness when we talk about concepts, because of their inherent politics of power and control. His ethnographic and anthropological view on tech policy should be a source of introspection and inspiration for all involved in tech research and policy. If we really live in a ‘digital age’, research on the use of our concepts, such as the use of ‘privacy’ in various communities of computer scientists [see Claudia Lopez and Seda Gürses, IEEE P&S 2013, pdf], becomes increasingly important to flesh out the underlying incentives for framing the concepts in the way we do, sometimes even without fully realizing it.
‘Making big data, in theory’ reminds us that concepts like ‘metadata’, ‘big data’ and ‘Big Brother’, which tech researchers and policymakers use on a daily basis, are made up, and may be carefully constructed with a certain politics in mind. Beware of those politics – or would you rather notice that someone sneaked a good chunk of chili onto your pizza after you’ve taken a bite?
Thanks to fellow CITP Fellow Merlyna Lim for pointing me to this paper.
Thank you so much for the props and this amazing discussion – I am flattered and honored!
Excellent article about the not so defined arenas of Big Data, Metadata, and the images of how the playing field seems as vast as the jargon implies, with meanings and speculations being read into everyday actions. At the moment, we are studying, or revisiting, ethical theories in my Digital Ethics class, and something very basic keeps popping up in light of surveillance practices and implications, and of the fact that metadata analysis can be dangerously speculative and way more intrusive than it seems to be, or that “not reading” e-mail is semantically problematic. Here’s the thing that keeps popping up for me. Utilitarianism, the basic consequential ethical theory. I have this unsettling notion that not only will a consequential approach not be the focus of all of these Big Data applications, but that Big Data has the mechanism to order those consequences. Yikes!
Senator Dianne Feinstein has just published an op-ed in USA Today in which she calls for continuation of the NSA call-records program because ‘it is not surveillance’: http://www.usatoday.com/story/opinion/2013/10/20/nsa-call-records-program-sen-dianne-feinstein-editorials-debates/3112715/
Instead of a Nietzschean “superman”, we have a similarly amoral, or even evil, but nicer sounding “metaman”. As in “I never metaman I didn’t like”. Some Will Rogers that.
“This reminds of Barry Wellman’s networked individualism [pdf]: reject ‘big data’, and risk to be an outcast of your network – both socially and technically, you’re not part of the same systems if you don’t participate.”
It also reminds me of John the Divine’s mark of the beast: you can’t buy or sell without it.
It also reminds me of all the libertarian claptrap about “voice and exit”; namely that the key difference is between de jure exit (the shopworn chant “nobody’s holding a gun to your head”) and de facto exit (you have a viable alternative available upon exit).
Uh, no.
“metadata” doesn’t mean data that controls other data, and it’s not “made up” either. “metadata” means data *about* other data. It is a broadly applicable concept, not something that was invented to justify surveillance.
@lawrence: boellstorff writes about ‘meta’ as having acquired two meanings in tech parlance today: 1) hierarchically [see post], 2) laterally – which is the ‘data about data’ meaning you ascribe to. he critiques that second conception as well – in short: metadata is not a separate category of data, it is data itself.
but my point here is that nuanced discussions about conceptualizations get blown away by societal debates. proponents adopt the first – false – hierarchical approach (“it’s just metadata”) which isn’t sufficiently debunked by journalists, tweeters, policy wonks and researchers. control definitions and conceptualization, control the debate.
a broader point is that here lies an important role for the social sciences, which are often seen within the tech policy and research bubble as a somewhat inferior class of stakeholders when discussing technology issues.
A librarian here. My colleagues have used metadata–the term and the function–for ages in both senses, as far as I can tell. Boellstorff isn’t very clear about the distinction. I don’t see why “language about language” is clearly distinct from “language of a second order.” Put another way, if “hierarchical” use of the term imports a degree of preference for or higher valuation of the metadata over the data it describes, or vice versa, rather than a value-neutral account of its operation with respect to the data it describes, that could very well be because under some circumstances metadata are indeed more valuable than the data (or vice versa). An example might be the publication of a work released multiple times in a series of editions. The cataloger identifies and creates the appropriate metadata to distinguish editions. This permits researchers seeking a particular edition to locate it, where the title, author, publisher, etc., would not assist in that endeavor.
If Boellstorff wants to “challenge assumptions of a neat division between data and metadata,” he’ll get no complaint from me. But really, anybody with the slightest familiarity with the concept would be surprised to learn that the division had ever been assumed to be neat. It’s always the case that one person’s metadata can be another’s data.
The problem with media discussion about metadata as it pertains to the early stories about Verizon’s records delivered to the NSA is twofold. First, as the OP makes clear, sometimes metadata can be more revealing or intrusive than the data themselves. The media have been roundly condemned for failing to register this fact. The second feature of the problem, though, has received almost no media attention, at least to my knowledge. Metadata are not by definition content-free. It may very well be the case that Verizon’s metadata entailed no recording of content of phone calls. Nobody “listened in,” and so nobody could tag or generate means for topical access to the calls. But a cataloger does routinely provide topical metadata when she assigns subject headings to cataloged works. Furthermore, there is nothing preventing Verizon from 1) recording (but not “listening to”) the calls, 2) automatically transcribing them, and 3) generating a word cloud assigned to the respective call. Under that scenario, not only would analysts of the metadata be able to identify the fact of multiple telephone calls to one’s therapist, but they would also be able to catch a glimpse of the content.
thanks, Dean, that’s a great perspective. It has received some, but far too little, media attention that the ‘metadata’ of e-mails is basically the e-mail header which contains the subject line of e-mails. That clearly reveals content.
That discussion of “big data” is itself carefully constructed. When I hear the term, I think of Big Ag, Big Pharma, Big Money or Big Business, i.e. a bunch of very powerful, mostly faceless folks who most definitely do not have my best interests in mind. To think of “big data” as something that is a priori good requires having drunk the Kool-Aid and being on the inside already. (Which may be true of most researchers, but not, I think, the public in general.)