November 28, 2015


New Professors’ Letter Opposing The Defend Trade Secrets Act of 2015

As Freedom to Tinker readers may recall, I’ve been very concerned about the problems associated with the proposed Defend Trade Secrets Act. Ostensibly designed to combat cyberespionage against United States corporations, it is not a solution to that problem, and it is fraught with downsides. Today, over 40 colleagues in the academic world joined Eric Goldman, Chris Seaman, Sharon Sandeen and me in raising a variety of concerns about the DTSA in the following letter:

Professors’ Letter in Opposition to the Defend Trade Secrets Act of 2015.

Importantly, this new letter incorporates our 2014 opposition letter. As we explained,

While we agree that effective legal protection for U.S. businesses’ legitimate trade secrets is important to American innovation, we believe that the DTSA—which would represent the most significant expansion of federal law in intellectual property since the Lanham Act in 1946—will not solve the problems identified by its sponsors. Instead of addressing cyberespionage head-on, passage of the DTSA is likely to create new problems that could adversely impact domestic innovation, increase the duration and cost of trade secret litigation, and ultimately negatively affect economic growth. Therefore, the undersigned call on Congress to reject the DTSA.

We also call on Congress to hold hearings “that focus on the costs of the legislation and whether the DTSA addresses the cyberespionage problem that it is allegedly designed to combat. Specifically, Congress should evaluate the DTSA through the lens of employees, small businesses, and startup companies that are most likely to be adversely affected by the legislation.”

I will continue to blog on the DTSA as events warrant, and encourage Freedom to Tinker readers to contact their members of Congress and urge them to vote against the DTSA.



Provisions: how Bitcoin exchanges can prove their solvency

Millions of Bitcoin users keep their bitcoins with online exchanges (e.g., Coinbase, Kraken), which hold the coins on their customers’ behalf. These services present an interface that looks somewhat like an online bank, allowing users to log in and request payments to other users or withdrawals. For many users this approach makes a lot more sense than the traditional approach of storing private keys on your laptop or phone and interacting with the Bitcoin network directly. Online exchanges require no software installation, enable a familiar password-based authentication model, and can protect users from losing their funds along with a lost or stolen laptop. Online exchanges can also improve the scalability and efficiency of Bitcoin by settling many logical transactions between users without actually moving funds on the block chain.

Of course, users must trust these exchanges not to get hacked or simply abscond with their money, both of which happened frequently in the early days of Bitcoin (nearly half of exchanges studied in a 2013 research paper failed). Famously, Mt. Gox was the largest online exchange until 2014 when it lost most of its customers’ funds under murky circumstances.

It has long been a goal of the Bitcoin community for exchanges to be able to cryptographically prove solvency—that is, to prove that they still control enough bitcoins to cover all of their customers’ accounts. Greg Maxwell first proposed an approach using Merkle trees in 2013, but this requires revealing (at a minimum) the total value of the exchange’s assets and which addresses the exchange controls. Exchanges have specifically cited these privacy risks as a reason they have not deployed proofs of solvency, relying on trusted audit instead.
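
To make the tradeoff concrete, here is a minimal sketch (in Python, with made-up balances and a power-of-two number of customers for brevity) of the Merkle-tree idea: each leaf commits to one customer’s balance, each internal node carries the sum of its children, and the root therefore exposes the exchange’s total liabilities, which is exactly the kind of information an exchange may prefer not to publish. In the real scheme, each customer receives their leaf-to-root path and checks that their balance is included and that the sums along the path are consistent.

```python
# Toy sketch of a Merkle "sum tree" proof of liabilities (not production code).
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(customer_id: str, balance: int) -> dict:
    # Each leaf binds a (hashed) customer identifier to a balance.
    return {"sum": balance,
            "hash": h(customer_id.encode() + balance.to_bytes(8, "big"))}

def parent(left: dict, right: dict) -> dict:
    # Each internal node commits to the sum of its children's balances.
    s = left["sum"] + right["sum"]
    return {"sum": s,
            "hash": h(s.to_bytes(8, "big") + left["hash"] + right["hash"])}

def build_root(leaves: list) -> dict:
    level = leaves                      # assumes len(leaves) is a power of two
    while len(level) > 1:
        level = [parent(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

accounts = [("alice", 5), ("bob", 3), ("carol", 7), ("dave", 1)]   # hypothetical balances
root = build_root([leaf(cid, bal) for cid, bal in accounts])
print(root["sum"])   # 16: the total liabilities, visible to everyone
```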

In a new paper presented this month at CCS (co-authored with Gaby G. Dagher, Benedikt Bünz, Jeremy Clark and Dan Boneh), we present Provisions, the first cryptographic proof-of-solvency with strong privacy guarantees. Our protocol is suitable for Bitcoin but would work for most other cryptocurrencies (e.g. Litecoin, Ethereum). Our protocol hides the total assets and liabilities of the exchange, proving only that assets are strictly greater than liabilities. If desired, the value of this surplus can be proven. Provisions also hides all customer balances and hides which Bitcoin addresses the bank controls within a configurable anonymity set of other addresses on the block chain. The proofs are large, but reasonable to compute on a daily basis (in the tens of GB for a large exchange, computable in about an hour). Best of all, it is very simple and fast for each user to verify that they have been correctly included. We can even extend the protocol to prevent collusion between exchanges. The details are in the paper, the full version of which is now online.
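
Provisions avoids publishing those totals by working with homomorphic commitments rather than plaintext balances; the full protocol (commitments over an anonymity set of addresses, zero-knowledge range proofs, and a proof that the committed assets cover the committed liabilities) is in the paper. The toy Python sketch below shows only the additive homomorphism that such commitments provide, using a small modular group purely for illustration; a real deployment would use an elliptic-curve group and would not reveal the opening of the total.

```python
# Toy Pedersen commitments: commit(v, r) = g^v * h^r (mod p).
# The group here is tiny and insecure; it only demonstrates the algebra.
import secrets

p, q = 10007, 5003   # toy safe prime p = 2q + 1
g, h = 4, 9          # generators of the order-q subgroup (their discrete-log relation must be unknown in a real system)

def commit(value: int, blinding: int) -> int:
    return (pow(g, value, p) * pow(h, blinding, p)) % p

balances  = [5, 3, 7, 1]                              # hypothetical customer balances
blindings = [secrets.randbelow(q) for _ in balances]
commitments = [commit(v, r) for v, r in zip(balances, blindings)]

# Anyone can multiply the per-account commitments to obtain a commitment to the
# total liabilities, without learning any individual balance or the total itself.
product = 1
for c in commitments:
    product = (product * c) % p

assert product == commit(sum(balances) % q, sum(blindings) % q)
```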

While our Provisions protocol removes the privacy concerns of performing a cryptographic proof-of-solvency, there are still some practical deployment questions because the proof requires the exchange to compute using its private keys. Exchanges rightly go to great lengths to protect these keys, often keeping them offline and/or in hardware security modules. Performing a regular solvency proof requires careful thinking about the right internal procedure for accessing these keys.

These deployment questions can be solved. We hope that cryptographic proofs of solvency will soon be expected of upstanding exchanges. Incidents like that of Mt. Gox have greatly damaged public perception of the entire Bitcoin ecosystem. While solvency proofs can’t prevent exchange compromises, they would have made Mt. Gox’s troubles public earlier and more clearly. They would also shore up confidence in today’s exchanges which are (presumably) solvent.

Taking a step back, solvency proofs are yet another example where we can replace an expensive and trust-laden process in the offline world (financial inspection by a trusted auditor) with a “trustless” cryptographic protocol. It’s always exciting to take a new step in that direction. There remain limits to what cryptography can do, though. Critically, solvency proofs do not create a binding obligation to pay. A malicious exchange could complete a Provisions proof and then immediately abscond with all of the money. For this reason, some form of government regulation of online exchanges makes sense. Though regulation is dreaded by many in the Bitcoin community, it appears to be on the horizon. Bills have been proposed in several states, largely aimed at exchanges. Interestingly, the model regulatory framework proposed by the Conference of State Bank Supervisors in September already mentions cryptographic solvency proofs as a means of demonstrating solvency. We hope this recommendation is enacted into law, and that solvency proofs become a tool to avoid the cost of the heavyweight auditing requirements traditionally demanded of banks, while simultaneously increasing transparency for exchange customers.


How is NSA breaking so much crypto?

There have been rumors for years that the NSA can decrypt a significant fraction of encrypted Internet traffic. In 2012, James Bamford published an article quoting anonymous former NSA officials stating that the agency had achieved a “computing breakthrough” that gave them “the ability to crack current public encryption.” The Snowden documents also hint at some extraordinary capabilities: they show that NSA has built extensive infrastructure to intercept and decrypt VPN traffic and suggest that the agency can decrypt at least some HTTPS and SSH connections on demand.

However, the documents do not explain how these breakthroughs work, and speculation about possible backdoors or broken algorithms has been rampant in the technical community. Yesterday at ACM CCS, one of the leading security research venues, we and twelve coauthors presented a paper that we think solves this technical mystery.



Classified material in the public domain: what’s a university to do?

Yesterday I posted some thoughts about Purdue University’s decision to destroy a video recording of my keynote address at its Dawn or Doom colloquium. The organizers had gone dark, and a promised public link was not forthcoming. After a couple of weeks of hoping to resolve the matter quietly, I did some digging and decided to write up what I learned. I posted on the web site of the Century Foundation, my main professional home:

It turns out that Purdue has wiped all copies of my video and slides from university servers, on grounds that I displayed classified documents briefly on screen. A breach report was filed with the university’s Research Information Assurance Officer, also known as the Site Security Officer, under the terms of Defense Department Operating Manual 5220.22-M. I am told that Purdue briefly considered, among other things, whether to destroy the projector I borrowed, lest contaminants remain.

I was, perhaps, naive, but pretty much all of that came as a real surprise.

Let’s rewind. Information Assurance? Site Security?

These are familiar terms elsewhere, but new to me in a university context. I learned that Purdue, like a number of its peers, has a “facility security clearance” to perform classified U.S. government research. The manual of regulations runs to 141 pages. (Its terms forbid uncleared trustees to ask about the work underway on their campus, but that’s a subject for another day.) The pertinent provision here, spelled out at length in a manual called Classified Information Spillage, requires “sanitization, physical removal, or destruction” of classified information discovered on unauthorized media.

Two things happened in rapid sequence around the time I told Purdue about my post.

First, the university broke a week-long silence and expressed a measure of regret:

UPDATE: Just after posting this item I received an email from Julie Rosa, who heads strategic communications for Purdue. She confirmed that Purdue wiped my video after consulting the Defense Security Service, but the university now believes it went too far.

“In an overreaction while attempting to comply with regulations, the video was ordered to be deleted instead of just blocking the piece of information in question. Just FYI: The conference organizers were not even aware that any of this had happened until well after the video was already gone.”

“I’m told we are attempting to recover the video, but I have not heard yet whether that is going to be possible. When I find out, I will let you know and we will, of course, provide a copy to you.”

Then Edward Snowden tweeted the link, and the Century Foundation’s web site melted down. It now redirects to Medium, where you can find the full story.

I have not heard back from Purdue today about recovery of the video. It is not clear to me how recovery is even possible, if Purdue followed Pentagon guidelines for secure destruction. Moreover, although the university seems to suggest it could have posted most of the video, it does not promise to do so now. Most importantly, the best that I can hope for here is that my remarks and slides will be made available in redacted form — with classified images removed, and some of my central points therefore missing. There would be one version of the talk for the few hundred people who were in the room on Sept. 24, and for however many watched the live stream, and another version left as the only record.

For our purposes here, the most notable questions have to do with academic freedom in the context of national security. How did a university come to “sanitize” a public lecture it had solicited, on the subject of NSA surveillance, from an author known to possess the Snowden documents? How could it profess to be shocked to find that spillage is going on at such a talk? The beginning of an answer came, I now see, in the question and answer period after my Purdue remarks. A post-doctoral research engineer stood up to ask whether the documents I had put on display were unclassified. “No,” I replied. “They’re classified still.” Eugene Spafford, a professor of computer science there, later attributed that concern to “junior security rangers” on the faculty and staff. But the display of Top Secret material, he said, “once noted, … is something that cannot be unnoted.”

Someone reported my answer to Purdue’s Research Information Assurance Officer, who reported in turn to Purdue’s representative at the Defense Security Service. By the terms of its Pentagon agreement, Purdue decided it was now obliged to wipe the video of my talk in its entirety. I regard this as a rather devout reading of the rules, which allowed Purdue to “realistically consider the potential harm that may result from compromise of spilled information.” The slides I showed had been viewed already by millions of people online. Even so, federal funding might be at stake for Purdue, and the notoriously vague terms of the Espionage Act hung over the decision. For most lawyers, “abundance of caution” would be the default choice. Certainly that kind of thinking is commonplace, and sometimes appropriate, in military and intelligence services.

But universities are not secret agencies. They cannot lightly wear the shackles of a National Industrial Security Program, as Purdue agreed to do. The values at their core, in principle and often in practice, are open inquiry and expression.

I do not claim I suffered any great harm when Purdue purged my remarks from its conference proceedings. I do not lack for publishers or public forums. But the next person whose talk is disappeared may have fewer resources.

More importantly, to my mind, Purdue has compromised its own independence and that of its students and faculty. It set an unhappy precedent, even if the people responsible thought they were merely following routine procedures.

One can criticize the university for its choices, and quite a few have since I published my post. What interests me is how nearly the results were foreordained once Purdue made itself eligible for Top Secret work.

Think of it as a classic case of mission creep. Purdue invited the secret-keepers of the Defense Security Service into one cloistered corner of campus (“a small but significant fraction” of research in certain fields, as the university counsel put it). The trustees accepted what may have seemed a limited burden, confined to the precincts of classified research.

Now the security apparatus claims jurisdiction over the campus (“facility”) at large. The university finds itself “sanitizing” a conference that has nothing to do with any government contract.

I am glad to see that Princeton takes the view that “[s]ecurity regulations and classification of information are at variance with the basic objectives of a University.” It does not permit faculty members to do classified work on campus, which avoids Purdue’s “facility” problem. And even so, at Princeton and elsewhere, there may be an undercurrent of self-censorship and informal restraint against the use of documents derived from unauthorized leaks.

Two of my best students nearly dropped a course I taught a few years back, called “Secrecy, Accountability and the National Security State,” when they learned the syllabus would include documents from Wikileaks. Both had security clearances, for summer jobs, and feared losing them. I told them I would put the documents on Blackboard, so they need not visit the Wikileaks site itself, but the readings were mandatory. Both, to their credit, stayed in the course. They did so against the advice of some of their mentors, including faculty members. The advice was purely practical. The U.S. government will not give a clear answer when asked whether this sort of exposure to published secrets will harm job prospects or future security clearances. Why take the risk?

Every student and scholar must decide for him- or herself, but I think universities should push back harder, and perhaps in concert. There is a treasure trove of primary documents in the archives made available by Snowden and Chelsea Manning. The government may wish otherwise, but that information is irretrievably in the public domain. Should a faculty member ignore the Snowden documents when designing a course on network security architecture? Should a student write a dissertation on modern U.S.-Saudi relations without consulting the numerous diplomatic cables on Wikileaks? To me, those would be abdications of the basic duty to seek out authoritative sources of knowledge, wherever they reside.

I would be interested to learn how others have grappled with these questions. I expect to write about them in my forthcoming book on surveillance, privacy and secrecy.


Berkeley releases report on barriers to cybersecurity research

I’m pleased to share this report, as I helped organize this event.

Researchers associated with the UC Berkeley School of Information and School of Law, the Berkeley Center for Law and Technology, and the International Computer Science Institute (ICSI) released a workshop report detailing legal barriers and other disincentives to cybersecurity research, and recommendations to address them. The workshop, held at Berkeley in April and supported by the National Science Foundation, brought together leading computer scientists and lawyers from academia, civil society, and industry to map out legal barriers to cybersecurity research and propose a set of concrete solutions.

The workshop report provides important background for the NTIA-convened multistakeholder process exploring security vulnerability disclosure, which launched today at Berkeley. The report documents the importance of cybersecurity research, the chilling effect caused by current regulations, and the diversity of the vulnerability landscape, which counsels against a single, fixed set of practices for vulnerability disclosure.

Read the report here.


Has Apple Doomed Ads on the Web? Will It Crush Google?

Recently Apple announced that, for the first time ever, ad-blocking plugins will be allowed in mobile Safari in iOS 9. There has been a large outpouring of commentary about this, and there seems to be pretty broad agreement on two things: (1) this action on Apple’s part was aimed at Google and (2) for publishers this will be something between terrible and catastrophic.

I believe these assessments rest on a misunderstanding of the technical details of what is actually going on.

For the most part, the public does not appreciate the extent to which, when a web browser visits a typical site, the “page” being served comes from multiple parties. Go to a typical e-commerce site, and you will find pixels, trackers, and content from additional servers, from a few to dozens.  These produce analytics for the site owner, run A/B tests, place ads, and many other things. There is even a service that knows what size clothing to sell. It is these services that are the target of ad blockers.

The reason ad blockers work is that the industry has settled on a standard method of ad placement, one that is trivial for publishers and e-commerce web sites to implement. Ad serving is fully browser-based: publishers need do nothing more than add a line of code to their HTML pages that pulls in a JavaScript file from the ad company’s server. Once the JavaScript is in the web page, the ad company takes care of the rest: it figures out what ad to display and injects it into the page.

Aside from the simplicity for the publisher, this architecture has an additional advantage for the ad company: it can track users as they go from site to site. Because the web page pulls in a JavaScript file from the ad company’s server, that server can set a persistent cookie in the user’s browser, which will be sent back every subsequent time the user visits any site that uses that ad company’s services. Thus the ad company is able to accumulate lots of data on users, without most people knowing. In some cases, people’s objection is not to the existence of ads per se, but to the secret and unaccountable way in which data is collected.

It is this architecture, however, that renders the ads vulnerable to the blocker. In fact, ad blockers have existed for desktop browsers for a long time.
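
Conceptually, a blocker only needs to notice that the request for the ad script goes to a known third-party ad or tracking host rather than to the publisher itself. Here is a minimal Python sketch of that matching step; the host names and URLs are invented for illustration, and real blockers rely on much larger, community-maintained filter lists and browser-level hooks rather than a function like this.

```python
# Toy filter-list matching: block requests whose host (or any parent domain)
# appears on a blocklist of known ad/tracking servers.
from urllib.parse import urlparse

BLOCKLIST = {"ads.example-adnetwork.com", "example-tracker.net"}   # hypothetical hosts

def should_block(request_url: str) -> bool:
    host = urlparse(request_url).hostname or ""
    parts = host.split(".")
    # Check the host itself and every parent domain against the blocklist.
    return any(".".join(parts[i:]) in BLOCKLIST for i in range(len(parts)))

print(should_block("https://ads.example-adnetwork.com/serve.js"))  # True: the ad script never loads
print(should_block("https://pixel.example-tracker.net/p.gif"))     # True: parent domain is listed
print(should_block("https://shop.example.com/catalog"))            # False: first-party content loads
```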

So there is nothing really new under the sun, just the growing popularity of tracker- and ad-blocking software. If these plugins become ubiquitous, only one thing would have to change: publishers would insert the ad code on the server side instead, so that the ad looks as though it came with the rest of the page. At that point, the browser plugin is useless.

What would be the knock-on effects of this? The ad companies no longer have any way to track users as they move around the web. Absent some way on the ad companies’ part to implement a cross-site evercookie (which would be considered unethical and would quickly be blocked by browser authors if discovered), the ad companies will no longer have a way to connect users on one site to users on another. The ads you’d see on a given site could be based solely on the interactions you’ve had with that one site – which would be a boon to privacy.

This is a change, for certain, but probably not the apocalypse for publishing it has been made out to be. There will be a rush to develop ad-placement technology for the server side as there was on the client, but when all settles down it will be pretty easy for the publishers to implement.

It’s even arguable that in that world of anonymous web surfing, the better web properties would be able to charge higher rates – absent spying on the readers, decisions about the value of ad placements would be based on the demographics of the readers of the site – just as for offline properties.

That being said, if you ever reveal your identity to a web site (for example by entering your e-mail address) that site could set a cookie so as to remember who you are. From that point on, information could quietly be sent to the ad server, perhaps storing all the URLs you visit on that site.

So, in the end, this change actually may be a boon for Google. If it’s really true that tracking users is so valuable for ad placement, Google has an advantage the other ad companies do not: many millions of users using Gmail and the Chrome browser, both of which Google controls. If you use Google’s e-mail, Google knows what links advertisers are sending you. If you click a link in a Gmail message going to a web site with Google serving ads on the back end, you can arrive at the site with Google already knowing who you are. (This can be done unobtrusively using the HTTP referer header.)

Even if you don’t use Gmail, you may sign in to Chrome to sync your data across devices. This uploads information to Google’s servers so it can be sent to other devices, such as your Android phone. One of the things that can be synced is the browser history. If this is done, Google – and no one else – will have the same information they would have collected with browser cookies.

If Apple is looking to damage Google, their plan may backfire. No one else, not even Facebook, has a chance of matching this.


VW = Voting Wulnerability

On Friday, the US Environmental Protection Agency (EPA) “accused the German automaker of using software to detect when the car is undergoing its periodic state emissions testing. Only during such tests are the cars’ full emissions control systems turned on. During normal driving situations, the controls are turned off, allowing the cars to spew as much as 40 times as much pollution as allowed under the Clean Air Act, the E.P.A. said.”  (NY Times coverage) The motivation for the “defeat device” was improved performance, although I haven’t seen whether “performance” in this case means faster acceleration or better fuel mileage.

So what does this have to do with voting?

For as long as I’ve been involved in voting (about a decade), technologists have expressed concerns about “logic and accuracy” (L&A) testing, which is the technique used by election officials to ensure that voting machines are working properly prior to election day.  In some states, such tests are written into law; in others, they are common practice.  But as is well understood by computer scientists (and doubtless scientists in other fields), testing can prove the presence of flaws, but not their absence.

In particular, computer scientists have noted that clever (that is, malicious) software in a voting machine could behave “correctly” when it detects that L&A testing is occurring, and revert to its improper behavior when L&A testing is complete.  Such software could be introduced anywhere along the supply chain – by the vendor of the voting system, by someone in an elections office, or by an intruder who installs malware in voting systems without the knowledge of the vendor or elections office.  It really doesn’t matter who installs it – just that it can be done.

It’s not all that hard to write software that detects whether a given use is for L&A or a real election.  L&A testing frequently follows recognizable patterns: it happens on dates other than election day (the Tuesday after the first Monday in November), and it often uses scripted sequences such as three Democratic votes, followed by two Republican votes, followed by one write-in vote, followed by closing the election.  And the malicious software doesn’t need to decide a priori whether a given series of votes is L&A or a real election – it can make the decision when the election is closed down, and erase any evidence of the real votes.
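
To make the “not all that hard” claim concrete, here is a deliberately trivial Python sketch, not drawn from any real voting system, of how software could use nothing but the calendar to guess whether it is being exercised in a test or in a real general election:

```python
# Hypothetical illustration only: a calendar check that distinguishes most L&A
# test runs from a real U.S. general election (the Tuesday after the first
# Monday in November).
from datetime import date

def looks_like_general_election(d: date) -> bool:
    # Tuesday (weekday() == 1) falling on Nov. 2-8 is general election day.
    return d.month == 11 and d.weekday() == 1 and 2 <= d.day <= 8

print(looks_like_general_election(date(2015, 10, 6)))   # False: probably an L&A test
print(looks_like_general_election(date(2015, 11, 3)))   # True: a real election day
```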

Such concerns have generally been dismissed in the debate about voting system security.  But with all-electronic voting systems, especially Direct Recording Electronic (DRE) machines (such as the touch-screen machines common in many states), this threat has always been present.

And now, we have evidence “in the wild” that the threat can occur.  In this case, the vendor (Volkswagen) deliberately introduced software that detected whether it was in test mode or operational mode, and adjusted behavior accordingly.  And since the VW software had to make its decision prospectively, while the engine was running, its task was actually harder than that facing malicious voting software, which can decide retrospectively, once the election is closed.

In the case of voting, the best solution today is optical scanned paper ballots.  That way, we have “ground truth” (the paper ballots) to compare to the reported totals.

The bottom line: it’s far too easy for software to detect its own usage, and change behavior accordingly.  When the result is increased pollution or a tampered election, we can’t take the risk.

Postscript: A colleague pointed out that malware has for years behaved differently when it “senses” that it’s being monitored, which is largely a similar behavior. In the VW and voting cases, though, the software isn’t trying to prevent being detected directly; it’s changing the behavior of the systems when it detects that it’s being monitored.


Freedom to Tinker on the Radio

Today on the Canadian Broadcasting Corporation’s CBC Radio show, “The Current”, a 20-minute segment about the freedom to tinker:

“Arrested, for tinkering.  Young Ahmed Mohamed likes to take things apart, cross wires, experiment… and put things back together again. It’s the kind of hobby that once led to companies like…say, Apple and Microsoft. But is a security-centric culture interfering with the freedom to tinker?”

Radio host Piya Chattopadhyay interviews three panelists:

  • Lindy Wilkins, community technologist and the co-founder of Make Friends, a monthly meet-up of makers and community organizers in Toronto,
  • Alexandra Samuel, independent technology researcher in Vancouver who is working on a book about tinkering and education for kids,
  • Andrew Appel, Professor of Computer Science at Princeton University and blogger at Freedom-to-Tinker.

When I was Ahmed’s age, back in 1973, I read this really cool article in Scientific American’s Amateur Scientist column, about how to use TTL integrated circuit components to make, for example, a clock.  So I went to Radio Shack to buy the parts, I learned how to use a soldering iron, and I built a clock.

Didn’t get arrested.  Was that because I was white, because I went to a school where the teachers had some sense, because it was before 9/11 and mass school shootings, or all of the above?


“Private blockchain” is just a confusing name for a shared database

Banks and financial institutions seem to be all over the blockchain. It seems they agree with the Bitcoin community that the technology behind Bitcoin can provide an efficient platform for settlement and for issuing digital assets. Curiously, though, they seem to shy away from Bitcoin itself. Instead, they want something they have more control over and that doesn’t require exposing transactions publicly. Besides, Bitcoin has too much of an association in the media with theft, crime, and smut — no place for serious, upstanding bankers. As a result, the buzz in the financial industry is about “private blockchains.”

But here’s the thing — “private blockchain” is just a confusing name for a shared database.

The key to Bitcoin’s security (and success) is its decentralization, which comes from its innovative use of proof-of-work mining. However, if you have a blockchain where only a few companies are allowed to participate, proof-of-work doesn’t make sense any more. You’re left with a system where a set of identified (rather than pseudonymous) parties maintain a shared ledger, keeping tabs on each other so that no single party controls the database. What is it about a blockchain that makes this any better than using a regular replicated database?

Supporters argue that the blockchain’s crypto, including signatures and hash pointers, is what distinguishes a private blockchain from a vanilla shared database. The crypto makes the system harder to tamper with and easier to audit. But these aspects of the blockchain weren’t Bitcoin’s innovation! In fact, Satoshi tweaked them only slightly from the earlier research that he cites in his whitepaper: research by Haber and Stornetta going all the way back to 1991!
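
To see how little of this machinery is specific to Bitcoin, here is a minimal Python sketch of a hash-pointer chain in the spirit of Haber and Stornetta: each entry includes the hash of the previous entry, so silently rewriting history invalidates every later hash. Per-entry signatures, which the paragraph above also mentions, are omitted for brevity, and none of this involves mining or proof-of-work.

```python
# Toy tamper-evident log built from hash pointers (no mining, no consensus).
import hashlib, json

GENESIS = "0" * 64

def entry_hash(record: str, prev_hash: str) -> str:
    body = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

def append(chain: list, record: str) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(record, prev)})

def verify(chain: list) -> bool:
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(entry["record"], prev):
            return False
        prev = entry["hash"]
    return True

ledger = []
append(ledger, "Alice pays Bob 10")
append(ledger, "Bob pays Carol 4")
assert verify(ledger)

ledger[0]["record"] = "Alice pays Bob 1000"   # any edit to history is detected
assert not verify(ledger)
```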

Here’s my take on what’s going on:

  • It is true that adding signatures and hash pointers makes a shared database a bit more secure. However, it’s qualitatively different from the level of security, irreversibility, and censorship-resistance you get with the public blockchain.
  • The use of these crypto techniques for building a tamper-resistant database has been known for 25 years. At first there wasn’t much impetus for Wall Street to pay attention, but gradually there has arisen a great opportunity in moving some types of financial infrastructure to an automated, cryptographically secured model.
  • For banks to go this route, they must learn about the technology, get everyone to the same table, and develop and deploy a standard. The blockchain conveniently solves these problems due to the hype around it. In my view, it’s not the novelty of blockchain technology but rather its mindshare that has gotten Wall Street to converge on it, driven by the fear of missing out. It’s acted as a focal point for standardization.
  • To build these private blockchains, banks start with the Bitcoin Core code and rip out all the parts they don’t need. It’s a bit like hammering in a thumb tack, but if a hammer is readily available and no one’s told you that thumb tacks can be pushed in by hand, there’s nothing particularly wrong with it.

Thanks to participants at the Bitcoin Pacifica gathering for helping me think through this question. 

Ancestry.com can use your DNA to target ads

With the reduction in the cost of genotyping technology, genetic genealogy has become accessible to more people. Various websites, such as Ancestry.com, offer genetic genealogy services. Users of these services are mailed a DNA collection kit, deposit their saliva in it, and mail it back to the service, which processes the sample. The genealogy company then tries to match the user’s DNA against other users in its genealogy and genetic database. As these services become more popular, we need more public discourse about the implications of releasing our genetic information to commercial enterprises.

Given that genetic information can be very sensitive, I found that the privacy policy of Ancestry’s DNA services has some surprising disclosures about how they could use your genetic information.

Here are some excerpts with the worrying parts in bold:

Subject to the restrictions described in this Privacy Statement and applicable law, we may use personal information for any reasonable purpose related to the business, including to communicate with you, to provide you information about Ancestry’s and AncestryDNA’s products and services, to respond to your requests, to update our product offerings, to improve the content and User experience on the AncestryDNA Website, to help you and others discover more about your family, to let you know about offers of interest from AncestryDNA or Ancestry, and to prepare and perform demographic, benchmarking, advertising, marketing, and promotional studies.

To distribute advertisements: AncestryDNA strives to show relevant advertisements. To that end, AncestryDNA may use the information you provide to us, as well as any analyses we perform, aggregated demographic information (such as women between the ages of 45-60), anonymized data compared to data from third parties, or the placement of cookies and other tracking technologies… In these ways, AncestryDNA can display relevant ads on the AncestryDNA Website, third party websites, or elsewhere.

The privacy policy gives Ancestry permission to use its users’ genetic information for advertising purposes. When I inquired with Ancestry, they pointed to the following part of their privacy policy:

We do not provide advertisers with access to individual account information. AncestryDNA does not sell, rent or otherwise distribute the personal information you provide us to these advertisers unless you have given us your consent to do so.

However, it is not clear how your personal information can be used to display “relevant ads” unless either Ancestry operates as an ad network itself or Ancestry communicates some personal information to third party advertisers in order to target the ads. Below, I expand on concerns raised by this privacy policy:

Users may “consent” to the use of their genetic data unknowingly. The privacy policy says Ancestry can distribute users’ private information if Ancestry gets permission first. That permission could be granted by a dialog that users click through without much thought. Research has shown that users are already desensitized to privacy and security warnings.

Even if only Ancestry is using the personal information to target ads, the data might accidentally find its way to third parties. Researchers have demonstrated how it can be difficult to avoid information leakage through URLs or cookies or more sophisticated attacks. If Ancestry categorizes its users according to their genetic traits and then stores and transfers these categories in cookies and URL parameters (a common practice for the analogous “behavioral segment” categories used for many targeted ads), then the genetic data can easily leak to third parties.

The genetic data collected by these services may endanger the privacy of users and their families. A genome is not something easily made unlinkable. Only 33 bits of entropy are necessary to uniquely identify a person. The DNA profiles used by law enforcement in the US today take samples from 13 locations on the genome, and have about 54 bits of entropy. The test that Ancestry uses samples 700,000 locations on the genome, which will likely have much more than 33 bits of entropy. In fact, I believe this is enough entropy to compromise not only an individual’s privacy, but also the privacy of family members. With the 13 CODIS locations, law enforcement can already do familial searches for close family members. I hope to touch on the familial aspects of DNA privacy at a later date. The compromise of familial privacy is in part what makes collecting and distributing DNA even more sensitive than just collecting an individual’s full name or address.
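
The 33-bit figure is simple arithmetic: distinguishing one person among a world population of roughly 7.3 billion requires about log2(7.3 billion) bits of identifying information, as this quick check shows.

```python
# Back-of-the-envelope check of the "33 bits" claim.
from math import log2

world_population = 7.3e9          # rough 2015 estimate
print(log2(world_population))     # ~32.8, i.e. about 33 bits
```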

Genetic data can be used to discriminate against people on the basis of characteristics they cannot control. More than identity, DNA data may allow someone to infer behavior and health attributes. Major concerns about the impact of genetic information on employment and health insurance led Congress to pass the Genetic Information Nondiscrimination Act, which makes it illegal to use genetics to decide hiring or health insurance pricing. However, GINA may not effectively deter people who 1) are not employers or insurers (e.g., landlords discriminating in their choice of tenants, which is prohibited by California state law but not by the federal provisions in GINA); 2) do not believe they will be caught; or 3) are not aware that they are discriminating, as discussed next.

Unintentional discrimination may occur. The big data report from the White House warns that the “increasing use of algorithms to make eligibility decisions must be carefully monitored for potential discriminatory outcomes for disadvantaged groups, even absent discriminatory intent.” An algorithm that takes genetic information as an input likely will lead to results that differ based on genes. This outcome already discriminates on the basis of genetics, and because genes are correlated with other sensitive attributes, it can also discriminate on the basis of characteristics such as race or health status. The discrimination occurs whether or not the algorithm’s user intended it.