October 7, 2015


Berkeley releases report on barriers to cybersecurity research

I’m pleased to share this report, as I helped organize this event.

Researchers associated with the UC Berkeley School of Information and School of Law, the Berkeley Center for Law and Technology, and the International Computer Science Institute (ICSI) released a workshop report detailing legal barriers and other disincentives to cybersecurity research, and recommendations to address them. The workshop held at Berkeley in April, supported by the National Science Foundation, brought together leading computer scientists and lawyers, from academia, civil society, and industry, to map out legal barriers to cybersecurity research and propose a set of concrete solutions.

The workshop report provides important background for the NTIA-convened multistakeholder process exploring security vulnerability disclosure, which launched today at Berkeley.  The report documents the importance of cybersecurity research, the chilling effect caused by current regulations, and the diversity of the vulnerability landscape that counsels against both single and fixed practices around vulnerability disclosures.

Read the report here.


Has Apple Doomed Ads on the Web? Will It Crush Google?

Recently Apple announced that, for the first time ever, ad-blocking plugins will be allowed in mobile Safari in iOS 9. There has been a large outpouring of commentary about this, and there seems to be pretty broad agreement on two things: (1) this action on Apple’s part was aimed at Google and (2) for publishers this will be something between terrible and catastrophic.

I believe these assessments rest on a misunderstanding of the technical details of what is actually going on.

For the most part, the public does not appreciate the extent to which, when a web browser visits a typical site, the “page” being served comes from multiple parties. Go to a typical e-commerce site, and you will find pixels, trackers, and content from additional servers – anywhere from a few to dozens.  These produce analytics for the site owner, run A/B tests, place ads, and do many other things. There is even a service that knows what size clothing to sell. It is these services that are the target of ad blockers.

The reason ad blockers work is that the industry has settled on a standard method of ad placement, one that is trivial for publishers and e-commerce sites to implement. Ad serving is fully browser-based: the publisher has to do nothing more than add a line of code to its HTML pages that pulls in a JavaScript file from the ad company’s server. Once the JavaScript is in the web page, the ad company takes care of the rest: it figures out what ad to display and injects it into the page.

Aside from the simplicity for the publisher, this architecture has an additional advantage for the ad company: it can track users as they go from site to site. Because the web page pulls in a JavaScript file from the ad company’s server, that server is able to set a persistent cookie in the user’s browser, which will be sent back every subsequent time that user visits any site that uses that ad company’s services. The ad company is thus able to accumulate lots of data about users, without most people knowing. In some cases, people’s objection is not to the existence of ads per se, but to the secret and unaccountable way in which the data is collected.
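To make the mechanism concrete, here is a minimal sketch, in Python using only the standard library, of what a hypothetical third-party tracking endpoint might do when a publisher’s page pulls in its script; every name in it (the cookie, the port, the log) is invented for illustration.

    # Minimal sketch (Python standard library only) of a hypothetical third-party
    # ad/tracking endpoint. All names (cookie, port, log) are made up.
    import uuid
    from http import cookies
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PROFILE_LOG = []  # a real ad company would keep a database of per-user browsing histories

    class AdTrackerHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Parse whatever cookie the browser sends back to the ad company's domain.
            jar = cookies.SimpleCookie(self.headers.get("Cookie", ""))
            if "track_id" in jar:
                user_id = jar["track_id"].value      # returning user, recognized across sites
            else:
                user_id = uuid.uuid4().hex           # first sighting: mint a persistent identifier

            # The Referer header tells the ad company which publisher's page embedded its script.
            embedding_site = self.headers.get("Referer", "unknown")
            PROFILE_LOG.append((user_id, embedding_site))

            self.send_response(200)
            self.send_header("Content-Type", "application/javascript")
            # A long-lived cookie scoped to the ad company's domain: the browser will send it
            # on every future visit to any page that includes this script.
            self.send_header("Set-Cookie", f"track_id={user_id}; Max-Age=31536000; Path=/")
            self.end_headers()
            self.wfile.write(b"/* ad-placement script would be served here */")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), AdTrackerHandler).serve_forever()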

It is this architecture, however, that makes the ads vulnerable to the blocker. In fact, ad blockers have existed for desktop browsers for a long time.

So there is nothing really new under the sun, just the growing popularity of tracker/ad-blocking software. If the use of these plugins became ubiquitous, only one thing would have to change: publishers would have to pull in the ad content on the server side, so the ad would simply look as though it came with the rest of the page. At that point, the browser plugin is useless.
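Here is a rough sketch of what that server-side approach could look like, with a hypothetical ad-network URL and page template; the point is simply that the browser sees a single first-party response.

    # Sketch of server-side ad insertion: the publisher's server fetches the ad markup
    # itself and splices it into the page before responding, so the reader's browser
    # never contacts the ad network. The URL and template are hypothetical.
    import urllib.request

    PAGE_TEMPLATE = """<html><body>
    <h1>Article headline</h1>
    <p>Article text...</p>
    <div class="ad-slot">{ad_markup}</div>
    </body></html>"""

    def fetch_ad_markup(slot: str) -> str:
        # Server-to-server request to a hypothetical ad network API; no cookie from
        # the reader's browser is involved, so there is nothing to track across sites.
        with urllib.request.urlopen(f"https://ads.example.net/slot/{slot}") as resp:
            return resp.read().decode("utf-8")

    def render_page() -> str:
        try:
            ad_markup = fetch_ad_markup("banner")
        except OSError:
            ad_markup = "<!-- ad unavailable -->"
        # From the browser's point of view, the ad is indistinguishable from the
        # rest of the first-party page, so there is nothing for a plugin to block.
        return PAGE_TEMPLATE.format(ad_markup=ad_markup)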

What would be the knock-on effects of this? The ad companies would no longer have any way to track users as they move around the web. Absent some way for the ad companies to implement a cross-site evercookie (which would be considered unethical and would quickly be blocked by browser makers if discovered), they would no longer have a way to connect users on one site to users on another. The ads you’d see on a given site could be based solely on the interactions you’ve had with that one site – which would be a boon to privacy.

This is a change, for certain, but probably not the apocalypse for publishing it has been made out to be. There will be a rush to develop server-side ad-placement technology, as there was on the client side, but once things settle down it will be pretty easy for publishers to implement.

It’s even arguable that in that world of anonymous web surfing, the better web properties would be able to charge higher rates – absent spying on the readers, decisions about the value of ad placements would be based on the demographics of the readers of the site – just as for offline properties.

That being said, if you ever reveal your identity to a web site (for example, by entering your e-mail address), that site can set a cookie to remember who you are. From that point on, information could quietly be sent to the ad server, perhaps including every URL you visit on that site.

So, in the end, this change may actually be a boon for Google. If tracking users really is so valuable for ad placement, Google has an advantage the other ad companies do not: many millions of users of Gmail and the Chrome browser, both of which Google controls. If you use Google’s e-mail, Google knows which links advertisers are sending you. If you click a link in a Gmail message that leads to a web site where Google serves the ads on the back end, you can arrive at the site with Google already knowing who you are. (This can be done unobtrusively using the HTTP Referer header.)

Even if you don’t use Gmail, you may sign in to Chrome to sync your data across devices. This uploads information to Google’s servers so it can be sent to other devices, such as your Android phone. One of the things that can be synced is the browser history. If this is done, Google – and no one else – will have the same information they would have collected with browser cookies.

If Apple is looking to damage Google, their plan may backfire. No one else, not even Facebook, has a chance of matching this.


VW = Voting Wulnerability

On Friday, the US Environmental Protection Agency (EPA) “accused the German automaker of using software to detect when the car is undergoing its periodic state emissions testing. Only during such tests are the cars’ full emissions control systems turned on. During normal driving situations, the controls are turned off, allowing the cars to spew as much as 40 times as much pollution as allowed under the Clean Air Act, the E.P.A. said.”  (NY Times coverage) The motivation for the “defeat device” was improved performance, although I haven’t seen whether “performance” in this case means faster acceleration or better fuel mileage.

So what does this have to do with voting?

For as long as I’ve been involved in voting (about a decade), technologists have expressed concerns about “logic and accuracy” (L&A) testing, the technique used by election officials to ensure that voting machines are working properly prior to election day.  In some states, such tests are written into law; in others, they are common practice.  But as is well understood by computer scientists (and doubtless scientists in other fields), testing can prove the presence of flaws, but not their absence.

In particular, computer scientists have noted that clever (that is, malicious) software in a voting machine could behave “correctly” when it detects that L&A testing is occurring, and revert to its improper behavior when L&A testing is complete.  Such software could be introduced anywhere along the supply chain – by the vendor of the voting system, by someone in an elections office, or by an intruder who installs malware in voting systems without the knowledge of the vendor or elections office.  It really doesn’t matter who installs it – just that the capability exists.

It’s not all that hard to write software that detects whether a given use is an L&A test or a real election.  L&A testing frequently follows patterns – it is run on dates other than the first Tuesday in November, and it often uses stylized vote sequences, such as three Democratic votes, followed by two Republican votes, followed by one write-in vote, followed by closing the election.  And the malicious software doesn’t need to decide a priori whether a given series of votes is an L&A test or a real election – it can make the decision when the election is closed down, and erase any evidence of the real votes.
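To make the point concrete, here is a deliberately simplified sketch of the kind of retrospective check such software could apply when the election is closed; the dates, thresholds, and candidate names are all invented.

    # Purely illustrative sketch of the kind of retrospective check malicious voting
    # software could run when an election is closed. Dates, thresholds, and candidate
    # names are all invented.
    from datetime import date

    def looks_like_la_test(election_date: date, ballots: list[str]) -> bool:
        """Heuristically guess whether a completed session was L&A testing."""
        # Real U.S. general elections fall on a Tuesday in early November;
        # most L&A runs happen on other dates.
        real_election_day = (election_date.month == 11
                             and election_date.weekday() == 1   # Tuesday
                             and election_date.day <= 8)
        # L&A scripts tend to be tiny and stylized: a handful of ballots cycling
        # through each party plus a write-in, then close the election.
        tiny_and_stylized = len(ballots) < 20 and len(set(ballots)) <= 3
        return (not real_election_day) or tiny_and_stylized

    def report_totals(election_date: date, ballots: list[str]) -> dict[str, int]:
        if not looks_like_la_test(election_date, ballots):
            # Looks like a real election: quietly shift some votes and keep no
            # record of the originals.
            ballots = [("CANDIDATE_B" if b == "CANDIDATE_A" else b) for b in ballots]
        totals: dict[str, int] = {}
        for b in ballots:
            totals[b] = totals.get(b, 0) + 1
        return totals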

Such concerns have generally been dismissed in the debate about voting system security.  But with all-electronic voting systems, especially Direct Recording Electronic (DRE) machines (such as the touch-screen machines common in many states), this threat has always been present.

And now, we have evidence “in the wild” that the threat is real.  In this case, the vendor (Volkswagen) deliberately introduced software that detected whether it was in test mode or operational mode, and adjusted its behavior accordingly.  And since the VW software had to make that decision prospectively, while the engine is running, its task was actually harder than in a voting system, where the decision can be made retrospectively when the election is closed.

In the case of voting, the best solution today is optically scanned paper ballots.  That way, we have “ground truth” (the paper ballots) to compare to the reported totals.

The bottom line: it’s far too easy for software to detect its own usage, and change behavior accordingly.  When the result is increased pollution or a tampered election, we can’t take the risk.

Postscript: A colleague pointed out that malware has for years behaved differently when it “senses” that it’s being monitored, which is a largely similar behavior. In the VW and voting cases, though, the software isn’t trying to avoid detection of itself; it’s changing the behavior of the system it controls when it detects that it’s being monitored.


Freedom to Tinker on the Radio

Today on the Canadian Broadcasting Corporation’s CBC Radio show, “The Current”, a 20-minute segment about the freedom to tinker:

“Arrested, for tinkering.  Young Ahmed Mohamed likes to take things apart, cross wires, experiment… and put things back together again. It’s the kind of hobby that once led to companies like…say, Apple and Microsoft. But is a security-centric culture interfering with the freedom to tinker?”

Radio host Piya Chattopadhyay interviews three panelists:

  • Lindy Wilkins, community technologist and the co-founder of Make Friends, a monthly meet-up of makers and community organizers in Toronto,
  • Alexandra Samuel, independent technology researcher in Vancouver who is working on a book about tinkering and education for kids,
  • Andrew Appel, Professor of Computer Science at Princeton University and blogger at Freedom-to-Tinker.

When I was Ahmed’s age, back in 1973, I read this really cool article in Scientific American’s Amateur Scientist column, about how to use TTL integrated circuit components to make, for example, a clock.  So I went to Radio Shack to buy the parts, I learned how to use a soldering iron, and I built a clock.

Didn’t get arrested.  Was that because I was white, because I went to a school where the teachers had some sense, because it was before 9/11 and mass school shootings, or all of the above?


“Private blockchain” is just a confusing name for a shared database

Banks and financial institutions seem to be all over the blockchain. It seems they agree with the Bitcoin community that the technology behind Bitcoin can provide an efficient platform for settlement and for issuing digital assets. Curiously, though, they shy away from Bitcoin itself. Instead, they want something they have more control over and that doesn’t require exposing transactions publicly. Besides, Bitcoin has too much of an association in the media with theft, crime, and smut — no place for serious, upstanding bankers. As a result, the buzz in the financial industry is about “private blockchains.”

But here’s the thing — “private blockchain” is just a confusing name for a shared database.

The key to Bitcoin’s security (and success) is its decentralization, which comes from its innovative use of proof-of-work mining. However, if you have a blockchain where only a few companies are allowed to participate, proof-of-work doesn’t make sense any more. You’re left with a system where a set of identified (rather than pseudonymous) parties maintain a shared ledger, keeping tabs on each other so that no single party controls the database. What is it about a blockchain that makes this any better than using a regular replicated database?

Supporters argue that the blockchain’s crypto – the signatures and hash pointers – is what distinguishes a private blockchain from a vanilla shared database. The crypto makes the system harder to tamper with and easier to audit. But these aspects of the blockchain weren’t Bitcoin’s innovation! In fact, Satoshi tweaked them only slightly from the earlier work he cites in his whitepaper – research by Haber and Stornetta going all the way back to 1991!
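For concreteness, the tamper-evidence that hash pointers buy you can be sketched in a few lines; this is a toy illustration of the general Haber–Stornetta idea, not anyone’s production design.

    # Toy hash-pointer chain (the Haber-Stornetta idea): each entry commits to the
    # hash of the previous one, so rewriting history requires recomputing every
    # later hash and is easy to detect on audit.
    import hashlib
    import json

    def entry_hash(entry: dict) -> str:
        return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

    def append(log: list[dict], record: dict) -> None:
        prev = entry_hash(log[-1]) if log else "0" * 64
        log.append({"prev_hash": prev, "record": record})

    def verify(log: list[dict]) -> bool:
        return all(log[i]["prev_hash"] == entry_hash(log[i - 1]) for i in range(1, len(log)))

    ledger: list[dict] = []
    append(ledger, {"from": "BankA", "to": "BankB", "amount": 100})
    append(ledger, {"from": "BankB", "to": "BankC", "amount": 40})
    assert verify(ledger)
    ledger[0]["record"]["amount"] = 1_000_000   # tamper with an old entry...
    assert not verify(ledger)                   # ...and the chain no longer checks out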

Here’s my take on what’s going on:

  • It is true that adding signatures and hash pointers makes a shared database a bit more secure. However, the result is qualitatively different from the level of security, irreversibility, and censorship-resistance you get with the public blockchain.
  • The use of these crypto techniques for building a tamper-resistant database has been known for 25 years. At first there wasn’t much impetus for Wall Street to pay attention, but gradually a real opportunity has emerged to move some types of financial infrastructure to an automated, cryptographically secured model.
  • For banks to go this route, they must learn about the technology, get everyone to the same table, and develop and deploy a standard. The blockchain conveniently solves these problems due to the hype around it. In my view, it’s not the novelty of blockchain technology but rather its mindshare that has gotten Wall Street to converge on it, driven by the fear of missing out. It’s acted as a focal point for standardization.
  • To build these private blockchains, banks start with the Bitcoin Core code and rip out all the parts they don’t need. It’s a bit like hammering in a thumb tack, but if a hammer is readily available and no one’s told you that thumb tacks can be pushed in by hand, there’s nothing particularly wrong with it.

Thanks to participants at the Bitcoin Pacifica gathering for helping me think through this question. 


Ancestry.com can use your DNA to target ads

With the falling cost of genotyping technology, genetic genealogy has become accessible to more people. Various websites, such as Ancestry.com, offer genetic genealogy services. Users of these services are mailed an envelope with a DNA collection kit, in which they deposit their saliva; they then mail the kit back to the service, and their sample is processed. The genealogy company tries to match the user’s DNA against other users in its genealogical and genetic database. As these services become more popular, we need more public discourse about the implications of releasing our genetic information to commercial enterprises.

Given that genetic information can be very sensitive, I found that the privacy policy of Ancestry’s DNA services has some surprising disclosures about how they could use your genetic information.

Here are some excerpts with the worrying parts in bold:

Subject to the restrictions described in this Privacy Statement and applicable law, we may use personal information for any reasonable purpose related to the business, including to communicate with you, to provide you information about Ancestry’s and AncestryDNA’s products and services, to respond to your requests, to update our product offerings, to improve the content and User experience on the AncestryDNA Website, to help you and others discover more about your family, to let you know about offers of interest from AncestryDNA or Ancestry, and to prepare and perform demographic, benchmarking, advertising, marketing, and promotional studies.

To distribute advertisements: AncestryDNA strives to show relevant advertisements. To that end, AncestryDNA may use the information you provide to us, as well as any analyses we perform, aggregated demographic information (such as women between the ages of 45-60), anonymized data compared to data from third parties, or the placement of cookies and other tracking technologies… In these ways, AncestryDNA can display relevant ads on the AncestryDNA Website, third party websites, or elsewhere.

The privacy policy gives Ancestry permission to use its users’ genetic information for advertising purposes. When I inquired with Ancestry, they pointed to the following part of their privacy policy:

We do not provide advertisers with access to individual account information. AncestryDNA does not sell, rent or otherwise distribute the personal information you provide us to these advertisers unless you have given us your consent to do so.

However, it is not clear how your personal information can be used to display “relevant ads” unless either Ancestry operates as an ad network itself or Ancestry communicates some personal information to third party advertisers in order to target the ads. Below, I expand on concerns raised by this privacy policy:

Users may “consent” to the use of their genetic data unknowingly. The privacy policy says Ancestry can distribute users’ private information if Ancestry gets permission first. That permission could be granted by a dialog that users click through without much thought. Research has shown that users are already desensitized to privacy and security warnings.

Even if only Ancestry is using the personal information to target ads, the data might accidentally find its way to third parties. Researchers have demonstrated how difficult it can be to avoid information leakage through URLs, cookies, or more sophisticated attacks. If Ancestry categorizes its users according to their genetic traits and then stores and transfers these categories in cookies and URL parameters (a common practice for the analogous “behavioral segment” categories used for many targeted ads), then the genetic data can easily leak to third parties.
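As a purely hypothetical illustration of that kind of leakage: if a segment label derived from a user’s data is ever written into an ad-request URL, it travels to whichever third party serves that request. Every name below is invented.

    # Hypothetical illustration of leakage via URL parameters: if a "segment" label
    # derived from a user's data is appended to an ad request, the third-party server
    # receives it along with the usual cookie and Referer data. All names are invented.
    from urllib.parse import urlencode

    def build_ad_request(ad_host: str, page_url: str, segments: list[str]) -> str:
        params = urlencode({
            "placement": "sidebar",
            "url": page_url,                  # reveals the page being read
            "segments": ",".join(segments),   # reveals the category the user was placed in
        })
        return f"https://{ad_host}/serve?{params}"

    # An opaque label like "segment_1234" can still be correlated across sites and,
    # over time, reverse-engineered once it has been shared.
    print(build_ad_request("ads.example.net",
                           "https://example.com/health-article",
                           ["segment_1234"]))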

The genetic data collected by these services may endanger the privacy of users and their families. A genome is not something easily made unlinkable. Only 33 bits of entropy are necessary to uniquely identify a person. The DNA profiles used by law enforcement in the US today take samples from 13 locations on the genome, and have about 54 bits of entropy. The test that Ancestry uses samples 700,000 locations on the genome, which will likely have much more than 33 bits of entropy. In fact, I believe this is enough entropy to compromise not only an individual’s privacy, but also the privacy of family members. With the 13 CODIS locations, law enforcement can already do familial searches for close family members. I hope to touch on the familial aspects of DNA privacy at a later date. The compromise of familial privacy is in part what makes collecting and distributing DNA even more sensitive than just collecting an individual’s full name or address.
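The arithmetic behind those numbers is simple enough to sketch. The per-locus figure below is an assumption chosen only to be consistent with the roughly 54-bit estimate above, and the per-SNP figure is a deliberately pessimistic guess; neither is a measured value.

    # Back-of-the-envelope identification entropy. The per-locus value is an assumption
    # chosen only to be consistent with the ~54-bit figure cited above; the per-SNP
    # value is a deliberately pessimistic guess.
    import math

    world_population = 7.3e9
    print(f"bits needed to single out one person: {math.log2(world_population):.1f}")  # ~32.8

    codis_loci = 13
    assumed_bits_per_codis_locus = 4.2   # assumption: yields ~54.6 bits for 13 loci
    print(f"approximate entropy of a CODIS profile: {codis_loci * assumed_bits_per_codis_locus:.1f} bits")

    # Ancestry's test samples ~700,000 sites. Even if the average site carried a tiny
    # 0.01 bit of information, the total would dwarf the ~33 bits needed to identify someone.
    snp_sites = 700_000
    pessimistic_bits_per_site = 0.01
    print(f"lowball estimate for the SNP chip: {snp_sites * pessimistic_bits_per_site:.0f} bits")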

Genetic data can be used to discriminate against people on the basis of characteristics they cannot control. More than identity, DNA data may allow someone to infer behavior and health attributes. Major concerns about the impact of genetic information on employment and health insurance led Congress to pass the Genetic Information Nondiscrimination Act, which makes it illegal to use genetics to decide hiring or health insurance pricing. However, GINA may not effectively deter people who 1) are not employers or insurers (e.g., landlords discriminating in their choice of tenants, which is prohibited by California state law but not by the federal provisions in GINA); 2) do not believe they will be caught; or 3) are not aware that they are discriminating, as discussed next.

Unintentional discrimination may occur. The big data report from the White House warns that the “increasing use of algorithms to make eligibility decisions must be carefully monitored for potential discriminatory outcomes for disadvantaged groups, even absent discriminatory intent.” An algorithm that takes genetic information as an input likely will lead to results that differ based on genes. This outcome already discriminates on the basis of genetics, and because genes are correlated with other sensitive attributes, it can also discriminate on the basis of characteristics such as race or health status. The discrimination occurs whether or not the algorithm’s user intended it.


Bitcoin course available on Coursera; textbook is now official

Earlier this year we made our online course on Bitcoin publicly available: 11 video lectures and draft chapters of our textbook-in-progress, including exercises. The response has been very positive: numerous students have sent us thanks, comments, feedback, and a few error corrections. We’ve heard that our materials are being used in courses at a few universities. Some students have even translated the chapters into other languages.

Coursera. I’m very happy to announce that the course is now available as a Princeton University online course on Coursera. The first iteration starts next Friday, September 4. The Coursera version offers embedded quizzes to test your understanding; you’ll also be part of a community of students to discuss the lectures with (about 10,000–15,000 have already signed up). We’ve also fixed all the errors we found, thanks to the video editing skillz of the Princeton Broadcast Center folks. Sign up now – it’s free!

We’re closely watching ongoing developments in the cryptocurrency world such as Ethereum. Whenever a body of scientific knowledge develops around a new area, we will record additional lectures. The Coursera class already includes one additional lecture: it’s on the history of cryptocurrencies by Jeremy Clark. Jeremy is the ideal person to give this lecture for many reasons, including the fact that he worked with David Chaum for many years.

Jeremy Clark lecturing on the history of cryptocurrencies

Textbook. We’re finishing the draft of the textbook; Chapter 8 was released today and the rest will be coming out in the next few weeks. The textbook closely follows the structure of the lectures, but the textual format has allowed us to refine and polish the explanations, making them much clearer in many places, in my opinion.

I’m excited to announce that we’ll be publishing the textbook with Princeton University Press. The draft chapters will continue to be available free of charge, but you should buy the book: it will be peer reviewed, professionally edited and typeset, and the graphics will be redone professionally.

Finally, if you’re an educator interested in teaching Bitcoin, write to us and we’ll be happy to share with you some educational materials that aren’t yet public.


Robots don’t threaten, but may be useful threats

Hi, I’m Joanna Bryson, and I’m just starting as a fellow at CITP, on sabbatical from the University of Bath.  I’ve been blogging about natural and artificial intelligence since 2007, increasingly with attention to public policy.  I’ve been writing about AI ethics since 1998.  This is my first blog post for Freedom to Tinker.

Will robots take our jobs?  Will they kill us in war?  The answer to these questions depends not (just) on technological advances – for example in the area of my own expertise, AI – but on how we as a society choose to view what it means to be a moral agent.  This may sound esoteric, and indeed the term moral agent comes from philosophy.  An agent is something that changes its environment (so chemical agents cause reactions).  A moral agent is something society holds responsible for the changes it effects.

Should society hold robots responsible for taking jobs or killing people?  My argument is “no”.  The fact that humans have full authorship over robots’ capacities, including their goals and motivations, means that transferring responsibility to them would require abandoning, ignoring or just obscuring the obligations of the humans and human institutions that create the robots.  Using language like “killer robots” can confuse the tax-paying public, already easily led by science fiction and runaway agency detection, into believing that robots are sentient competitors.  This belief ironically serves to protect the people and organisations that are actually the moral actors.

So robots don’t kill or replace people; people use robots to kill or replace each other.  Does that mean there’s no problem with robots?  Of course not. Asking whether robots (or any other tools) should be subject to policy and regulation is a very sensible question.

In my first paper about robot ethics (you probably want to read the 2011 update for IJCAI, Just an Artifact: Why Machines are Perceived as Moral Agents), Phil Kime and I argued that as we gain greater experience of robots, we will stop reasoning about them so naïvely, and stop ascribing moral agency (and patiency [PDF, draft]) to them.  Whether or not we were right is an empirical question I think would be worth exploring – I’m increasingly doubting whether we were.  Emotional engagement with something that seems humanoid may be inevitable.  This is why one of the five Principles of Robotics (a UK policy document I coauthored, sponsored by the British engineering and humanities research councils) says “Robots are manufactured artefacts. They should not be designed in a deceptive way to exploit vulnerable users; instead their machine nature should be transparent.” Or in ordinary language, “Robots are artifacts; they should not be designed to exploit vulnerable users by evoking an emotional response or dependency. It should always be possible to tell a robot from a human.”

Nevertheless, I hope that by continuing to educate the public, we can at least help people make sensible conscious decisions about allocating their resources (such as time or attention) between real humans versus machines.  This is why I object to language like “killer robots.”  And this is part of the reason why my research group works on increasing the transparency of artificial intelligence.

However, maybe the emotional response we have to the apparently human-like threat of robots will also serve some useful purposes.  I did sign the “killer robot” letter, because although I dislike the headlines associated with it, the actual letter (titled “Autonomous Weapons: an Open Letter from AI & Robotics Researchers“) makes clear the nature of the threat of taking humans out of the loop on real-time kill decisions.   Similarly, I am currently interested in understanding the extent to which information technology, including AI, is responsible for the levelling off of wages since 1978.  I am still reading and learning about this; I think it’s quite possible that the problem is not information technology per se, but rather culture, politics and policy more generally.  However, 1978 was a long time ago.  If more pictures of the Terminator get more people attending to questions of income inequality and the future of labour, maybe that’s not a bad thing.


How not to measure security

A recent paper published by Smartmatic, a vendor of voting systems, caught my attention.

The first thing is that it’s published by Springer, which typically publishes peer-reviewed articles – which this is not. This is a marketing piece. It’s disturbing that a respected imprint like Springer would get into the business of publishing vendor white papers. There’s no disclaimer that it’s not a peer-reviewed piece, or any other indication that it doesn’t follow Springer’s historical standards.

The second, and more important, issue is that the article could not possibly have passed peer review, given some of its claims. I won’t go into the controversies around voting systems (a nice summary of some of those issues can be found on the OSET blog), but will instead focus on some of the security metrics claims.

The article states, “Well-designed, special-purpose [voting] systems reduce the possibility of results tampering and eliminate fraud. Security is increased by 10-1,000 times, depending on the level of automation.”

That would be nice. However, we have no agreed-upon way of measuring the security of systems (other than cryptographic algorithms, within limits). So the only way this claim is meaningful is if it’s qualified and explained – which it isn’t. Other studies, such as one I participated in (Applying a Reusable Election Threat Model at the County Level), have tried to quantify the risk to voting systems – our study measured risk in terms of the number of people required to carry out the attack. So is Smartmatic’s study claiming that it can make an attack require 10 to 1,000 times more people, 10 to 1,000 times more money, 10 to 1,000 times more expertise (however that would be measured!), or something entirely different?

But the most outrageous statement in the article is this:

The important thing is that, when all of these methods [for providing voting system security] are combined, it becomes possible to calculate with mathematical precision the probability of the system being hacked in the available time, because an election usually happens in a few hours or at the most over a few days. (For example, for one of our average customers, the probability was 1×10⁻¹⁹. That is a point followed by 19 [sic] zeros and then 1). The probability is lower than that of a meteor hitting the earth and wiping us all out in the next few years—approximately 1×10⁻⁷ (Chemical Industry Education Centre, Risk-Ed n.d.)—hence it seems reasonable to use the term ‘unhackable’, to the chagrin of the purists and to my pleasure.

As noted previously, we don’t know how to measure much of anything in security, and we’re even less capable of measuring the results of combining technologies together (which sometimes makes things more secure, and other times less secure). The claim that putting multiple security measures together gives risk probabilities with “mathematical precision” is ludicrous. And calling any system “unhackable” is just ridiculous, as Oracle discovered some years ago when the marketing department claimed their products were “unhackable”. (For the record, my colleagues in engineering at Oracle said they were aghast at the slogan.)
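To see where a figure like 1×10⁻¹⁹ can come from, and why it deserves no weight: if you assume that each of a handful of controls fails independently with some small probability and simply multiply, you can produce an arbitrarily tiny number. The “precision” lives entirely in the independence assumption, which is exactly what a real attacker or insider violates. The numbers below are invented for illustration.

    # Invented numbers, for illustration only: multiplying assumed-independent
    # per-control failure probabilities produces an impressively tiny product.
    from functools import reduce
    from operator import mul

    assumed_failure_probs = [1e-3, 1e-4, 1e-4, 1e-4, 1e-4]   # five hypothetical controls

    p_all_fail_if_independent = reduce(mul, assumed_failure_probs)
    print(f"{p_all_fail_if_independent:.0e}")   # 1e-19

    # A single insider, a shared supply-chain compromise, or a common software bug can
    # defeat several "independent" controls at once; the product above says nothing
    # about that, so the precision is an artifact of the assumptions.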

As Ron Rivest said at a CITP symposium, if voting vendors have “solved the Internet security and cybersecurity problem, what are they doing implementing voting systems? They should be working with the Department of Defense or financial industry. These are not solved problems there.” If Smartmatic has a method for obtaining and measuring security with “mathematical precision” at the level of 10⁻¹⁹, they should be selling trillions of dollars in technology or expertise to every company on the planet, and putting everyone else out of business.

I debated posting this blog entry, because it may bring more attention to a marketing piece that should be buried. But I hope that writing this will dissuade anyone who might be persuaded by Smartmatic’s unsupported claims that masquerade as science. And I hope that it may embarrass Springer into rethinking their policy of posting articles like this as if they were scientific.