April 24, 2014

Sequoia Announces Voting System with Published Code

Sequoia Voting Systems, one of the major e-voting companies, announced Tuesday that it will publish all of the source code for its forthcoming Frontier product. This is great news–an important step toward the kind of transparency that is necessary to make today’s voting systems trustworthy.

To be clear, this will not be a fully open source system, because it won’t give users the right to modify and redistribute the software. But it will be open in a very important sense, because everyone will be free to inspect, analyze, and discuss the code.

Significantly, the promise to publish code covers all of the systems involved in running the election and reporting results, “including precinct and central count digital optical scan tabulators, a robust election management and ballot preparation system, and tally, tabulation, and reporting applications”. I’m sure the research community will be eager to study this code.

The trend toward publishing election system source code has been building over the last few years. Security experts have long argued that public scrutiny tends to increase security, and is one of the best ways to justify public trust in a system. Independent studies of major voting vendors’ source code have found code quality to be disappointing at best, and vendors’ all-out resistance to any disclosure has eroded confidence further. Add to this an increasing number of independent open-source voting systems, and secret voting technologies start to look less and less viable, as the public starts insisting that longstanding principles of election transparency be extended to election technology. In short, the time had come for this step.

Still, Sequoia deserves a lot of credit for being the first major vendor to open its technology. How long until the other major vendors follow suit?

DRM by any other name: The latest from Hollywood

Sunday’s New York Times had an article, Studios’ Quest for Life After DVDs. To nobody’s surprise, consumers want to have convenient access to “their” media, wherever they happen to be, without all the annoying restrictions that come into play when you add DRM to the picture. To many people’s surprise, sales of DVDs (much less Blu-ray) are in trouble.

In the third quarter, studios’ home entertainment divisions generated about $4 billion, down 3.2 percent from a year ago, according to the Digital Entertainment Group, a trade consortium. But digital distribution contributed just $420 million, an increase of 18 percent.

Given that DVDs are really a luxury good (versus, say, food or electricity), the 3.2 percent drop seems like Hollywood is getting off easy. The growth in digital distribution is clearly getting attention, though. What’s going on here? I imagine several things. People sometimes miss their shows. Maybe the cable went out. Maybe the TiVo crashed. Maybe they’re on the road. Drop $2 at the iTunes Store and you’re good to go. That’s attractive and it’s real money.

Still, the article goes on to talk about… yet more DRM.

Standing in the way are technology hurdles — how to let consumers play a video on various devices without letting them share it with 10,000 close friends on a pirate site — and the reluctance of studios to cooperate too closely with rivals for reasons of antitrust scrutiny and sheer competitiveness.

And piracy, at least conceptually, would be less of a worry. The technology [Disney's Keychest] rests on cloud computing, in which huge troves of data are stored on remote servers so users have access from anywhere. Movies would be streamed from the cloud and never downloaded, making them harder to pirate.

Of course, this is baloney. If it’s going to work on my iPhone while I’m sitting in an airplane, the entire video needs to be stored there in advance. Furthermore, if the video is supposed to be “high definition,” that’s a bare minimum of 5 megabits/sec. (Broadcast HD is 20 megabits/sec and Blu-ray is 48 megabits/sec.) Most home DSL or cable modem connections either can’t go that fast at all, or can’t sustain those speeds without hiccups, particularly when the line is shared with other users. To do high quality video, you either have to have a real broadcast medium (cable, over-the-air, or satellite) or you have to download in advance and store on a hard drive.
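To put numbers on the bandwidth argument, here is a quick back-of-the-envelope calculator. The bitrates are the ones quoted above; the two-hour running time and the 3 megabits/sec home connection are assumptions for illustration, not figures from the article.

```python
def gigabytes_for(bitrate_mbps: float, minutes: float) -> float:
    """Storage needed for video at a constant bitrate, in decimal gigabytes."""
    bits = bitrate_mbps * 1_000_000 * minutes * 60
    return bits / 8 / 1_000_000_000

def download_minutes(size_gb: float, link_mbps: float) -> float:
    """Minutes to fetch size_gb over a link that sustains link_mbps."""
    megabits = size_gb * 8_000
    return megabits / link_mbps / 60

for label, rate in [("minimal HD", 5), ("broadcast HD", 20), ("Blu-ray", 48)]:
    print(f"{label}: {gigabytes_for(rate, 120):.1f} GB for a two-hour film")

# Hypothetical 3 megabit/sec home connection fetching the minimal-HD file.
print(f"{download_minutes(gigabytes_for(5, 120), 3):.0f} minutes")
```

A two-hour film comes to 4.5 GB at 5 megabits/sec, 18 GB at 20, and 43.2 GB at 48. Over the hypothetical 3 megabits/sec link, even the smallest version takes 200 minutes to download, longer than the film itself.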

And, of course, once you’ve stored the video, it’s just not that hard to extract it. And it always will be. The challenge for Hollywood is to change the incentives of the game. Maybe sell me a flat-rate subscription. Maybe bundle it with my DSL provider. But make the experience compelling enough and cheap enough, and I’ll do it. I regularly extract video from my TiVo and copy it to my iPhone via third-party software. It’s practically painless and it happens to yield files that I could share with the world, but I don’t. Why? Because there’s real downside (I’d rather not get sued, thanks), and no particular upside.

So, dearest Hollywood executive, consider that selling your content for a reduced price, with no DRM, is not the same thing as “giving it away.” If you allow third parties to license your content and distribute it without DRM, you can still go after the “pirates”, yet you’ll allow normal people to enjoy your work without making them suffer for it. Yes, you may have kids copying content from one to the next, just like we used to do dubbing cassette tapes, but those incremental losses can and will be offset by the incremental gains of people enjoying your work and hitting the “buy” button.

There’s anonymity on the Internet. Get over it.

In a recent interview, prominent antivirus developer Eugene Kaspersky decried the role of anonymity in cybercrime. This is not a new claim (it is touched on in the Commission on Cybersecurity for the 44th Presidency Report and the Cybersecurity Act of 2009, among others), but it misses the mark. Any Internet design would allow anonymity. What renders our Internet vulnerable is primarily weak software security and authentication, not anonymity.

Consider a hypothetical of three Internet users: Alice, Bob, and Charlie. If Alice wants to communicate anonymously with Charlie, she may relay her messages through Bob. While Charlie knows Bob is an intermediary, Charlie does not know with whom he is ultimately communicating. For even greater anonymity Alice can pass her messages through multiple Bobs, and by applying cryptography she can ensure no individual Bob can piece together that she is communicating with Charlie. This basic approach to anonymity is remarkable in its independence of the Internet’s design: it only requires that some Bob(s) can and do run intermediary software. Even on an Internet where users could verify each other’s identity, this means of anonymity would remain viable.
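The multiple-Bobs idea is layered encryption, the idea behind onion routing. Below is a minimal sketch in Python. The relay names, the pre-shared keys, and the SHA-256 keystream "cipher" are all assumptions for illustration. The cipher is a toy with no real security; a real system would use a vetted authenticated cipher and public-key key exchange.

```python
import hashlib
import json
import os

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode. NOT secure; illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    ks = keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))

def build_onion(route, keys, recipient, message: str) -> bytes:
    """Alice wraps the message in one layer per relay, innermost (last relay) first."""
    next_hops = route[1:] + [recipient]
    blob = message.encode()
    for relay, next_hop in reversed(list(zip(route, next_hops))):
        layer = {"next": next_hop, "payload": blob.hex()}
        blob = toy_encrypt(keys[relay], json.dumps(layer).encode())
    return blob

def peel(relay, keys, blob: bytes):
    """A relay strips its own layer; it learns only where to forward the rest."""
    layer = json.loads(toy_decrypt(keys[relay], blob))
    return layer["next"], bytes.fromhex(layer["payload"])

# Hypothetical setup: keys Alice shares with each Bob.
keys = {name: os.urandom(32) for name in ["bob1", "bob2", "bob3"]}
blob = build_onion(["bob1", "bob2", "bob3"], keys, "charlie", "hello, Charlie")

hop, data = "bob1", blob
while hop != "charlie":
    hop, data = peel(hop, keys, data)   # each Bob forwards to the hop it just learned
print(data.decode())  # hello, Charlie
```

Each Bob sees only the previous and next hop. Charlie sees only that the message arrived from the last Bob. In this sketch the last relay also sees the message itself, which end-to-end encryption between Alice and Charlie would hide.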

The sad state of software security (the latest DHS weekly bulletin alone identified over 40 “high severity” vulnerabilities) is what enables malicious users to exploit the Internet’s indelible capacity for anonymity. Modifying the prior hypothetical, suppose Alice now wants to spam or phish Charlie, launch a denial-of-service (DoS) attack against him, or hack into his computer. After compromising Bob’s computer with malicious software (malware), Alice can send emails, host websites, and launch DoS attacks from it; Charlie can see that Bob’s computer is misbehaving, but has no means of discovering Alice’s role. Nearly all spam, phishing, and DoS attacks are now perpetrated with networks of compromised computers like Bob’s (botnets). According to a July 2009 private-sector report, just five botnets were the source of nearly 75% of spam at the time. Worse yet, botnets are increasingly self-perpetuating: spam and phishing websites propagate malware that compromises new computers for the botnet.

Shortcomings in authentication, the means of proving one’s identity either when necessary or at all times, are a secondary contributor to the Internet’s ills. Most applications rely on passwords, which are easily guessed or divulged through deception – the very mechanisms of most phishing and account hijacking. There are potential technical solutions that would enable a user to authenticate themselves without the risk of compromising accounts. But any approach will be undermined by weaknesses in underlying software security when a malicious party can trivially compromise a user’s computer.
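One family of such solutions is challenge-response authentication, where the user proves knowledge of a secret without ever sending the secret itself. The Python sketch below uses an HMAC over a random, single-use challenge. The class and function names are hypothetical, the scheme still assumes a key shared in advance, and, as the paragraph warns, it presumes the client machine is not compromised. It illustrates the general idea and is not necessarily the solution the author had in mind.

```python
import hashlib
import hmac
import secrets

class Server:
    """Hypothetical server that checks responses to single-use random challenges."""

    def __init__(self, shared_key: bytes):
        self.shared_key = shared_key
        self.pending = None

    def issue_challenge(self) -> bytes:
        self.pending = secrets.token_bytes(16)
        return self.pending

    def verify(self, response: bytes) -> bool:
        if self.pending is None:
            return False
        expected = hmac.new(self.shared_key, self.pending, hashlib.sha256).digest()
        self.pending = None  # a challenge can be answered only once
        return hmac.compare_digest(expected, response)

def client_respond(shared_key: bytes, challenge: bytes) -> bytes:
    """The client proves it knows the key without ever transmitting it."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()

key = secrets.token_bytes(32)  # hypothetical secret set up at enrollment
server = Server(key)
response = client_respond(key, server.issue_challenge())
print(server.verify(response))   # True
server.issue_challenge()
print(server.verify(response))   # False: replaying an old response fails
```

Because each challenge is used once, a response captured in transit cannot be replayed later. The sketch does nothing about malware on the user's machine, which could simply read the key.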

The policy community is already trending towards acceptance of Internet anonymity and refocusing on software security and authentication; the recent White House Cyberspace Policy Review in particular emphasizes both issues. To the remaining unpersuaded, I can only offer at last a truism: There’s anonymity on the Internet. Get over it.

Net Neutrality: When is Network Management "Reasonable"?

Last week the FCC released its much-awaited Notice of Proposed Rulemaking (NPRM) on network neutrality. As expected, the NPRM affirms past FCC neutrality principles, and adds two more. Here’s the key language:

1. Subject to reasonable network management, a provider of broadband Internet access service may not prevent any of its users from sending or receiving the lawful content of the user’s choice over the Internet.

2. Subject to reasonable network management, a provider of broadband Internet access service may not prevent any of its users from running the lawful applications or using the lawful services of the user’s choice.

3. Subject to reasonable network management, a provider of broadband Internet access service may not prevent any of its users from connecting to and using on its network the user’s choice of lawful devices that do not harm the network.

4. Subject to reasonable network management, a provider of broadband Internet access service may not deprive any of its users of the user’s entitlement to competition among network providers, application providers, service providers, and content providers.

5. Subject to reasonable network management, a provider of broadband Internet access service must treat lawful content, applications, and services in a nondiscriminatory manner.

6. Subject to reasonable network management, a provider of broadband Internet access service must disclose such information concerning network management and other practices as is reasonably required for users and content, application, and service providers to enjoy the protections specified in this part.

That’s a lot of policy packed into (relatively) few words. I expect that my colleagues and I will have a lot to say about these seemingly simple rules over the coming weeks.

Today I want to focus on the all-purpose exception for “reasonable network management”. Unpacking this term might tell us a lot about how the proposed rule would operate.

Here’s what the NPRM says:

Reasonable network management consists of: (a) reasonable practices employed by a provider of broadband Internet access to (i) reduce or mitigate the effects of congestion on its network or to address quality-of-service concerns; (ii) address traffic that is unwanted by users or harmful; (iii) prevent the transfer of unlawful content; or (iv) prevent the unlawful transfer of content; and (b) other reasonable network management practices.

The key word is “reasonable”, and in that respect the definition is nearly circular: in order to be “reasonable”, a network management practice must be (a) “reasonable” and directed toward certain specific ends, or (b) “reasonable”.

In the FCC’s defense, it does seek comments and suggestions on what the definition should be, and it does say that it intends to make case-by-case determinations in practice, as it did in the Comcast matter. Further, it rejects a “strict scrutiny” standard of the sort that David Robinson rightly criticized in a previous post.

“Reasonable” is hard to define because in real life every “network management” measure will have tradeoffs. For example, a measure intended to block copyright-infringing material would in practice make errors in both directions: it would block X% (less than 100%) of infringing material, while as a side-effect also blocking Y% (more than 0%) of non-infringing material. For what values of X and Y is such a measure “reasonable”? We don’t know.
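A small worked example of the tradeoff. Besides X and Y, the outcome depends on how much of the traffic is actually infringing. The figures below (10% infringing traffic, X = 90%, Y = 5%) are hypothetical and chosen only to show the arithmetic.

```python
def filter_outcome(infringing_share: float, x: float, y: float) -> dict:
    """Outcome of a filter that blocks fraction x of infringing and y of lawful traffic.

    Results are fractions of total traffic, except lawful_share_of_blocked."""
    blocked_infringing = infringing_share * x
    blocked_lawful = (1 - infringing_share) * y
    missed_infringing = infringing_share * (1 - x)
    total_blocked = blocked_infringing + blocked_lawful
    return {
        "blocked_infringing": blocked_infringing,
        "blocked_lawful": blocked_lawful,
        "missed_infringing": missed_infringing,
        "lawful_share_of_blocked": blocked_lawful / total_blocked if total_blocked else 0.0,
    }

# Hypothetical numbers: 10% of traffic infringing, X = 90%, Y = 5%.
print(filter_outcome(0.10, 0.90, 0.05))
```

With those numbers the filter blocks 9% of all traffic that is infringing and 4.5% that is lawful, so about one blocked item in three is lawful, even though Y looks small.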

Of course, declaring a vague standard rather than a bright-line rule can sometimes be good policy, especially where the facts on the ground are changing rapidly and it’s hard to predict what kind of details might turn out to be important in a dispute. Still, by choosing a case-by-case approach, the FCC is leaving us mostly in the dark about where it will draw the line between “reasonable” and “unreasonable”.

Intractability of Financial Derivatives

A new result by Princeton computer scientists and economists shows a striking application of computer science theory to the field of financial derivative design. The paper is Computational Complexity and Information Asymmetry in Financial Products by Sanjeev Arora, Boaz Barak, Markus Brunnermeier, and Rong Ge. Although computation has long been used in the financial industry for program trading and “the thermodynamics of money”, this new paper applies an entirely different kind of computer science: Intractability Theory.

A financial derivative is a contract specifying a payoff calculated by some formula based on the yields or prices of a specific collection of underlying assets. Consider the securitization of debt: a CDO (collateralized debt obligation) is a security formed by packaging together hundreds of home mortgages. The CDO is supposedly safer than the individual mortgages, since it spreads the risk (not every mortgage is supposed to default at once). Furthermore, a CDO is usually divided into “senior tranches”, which are guaranteed not to drop in value as long as the total number of defaults in the pool does not exceed some threshold, and “junior tranches”, which are supposed to bear all the risk.
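The senior/junior split can be sketched as a simple loss waterfall. This assumes every mortgage in the pool has the same principal, that a default wipes out that mortgage's principal entirely, and that there are just two tranches, with the junior tranche's share of the pool serving as the threshold. Real CDOs are considerably more complicated.

```python
def tranche_losses(defaults: int, pool_size: int, junior_share: float):
    """Fraction of value lost by the (junior, senior) tranches, junior absorbing losses first.

    Simplifying assumptions: equal-sized mortgages, zero recovery on default, two tranches."""
    pool_loss = defaults / pool_size                 # fraction of pool principal lost
    junior_loss = min(pool_loss, junior_share)       # junior takes the first losses
    senior_loss = pool_loss - junior_loss            # senior is hit only past the threshold
    return junior_loss / junior_share, senior_loss / (1 - junior_share)

print(tranche_losses(10, 100, 0.2))  # (0.5, 0.0)
print(tranche_losses(30, 100, 0.2))  # junior wiped out; senior loses roughly 12.5%
```

With a hypothetical pool of 100 mortgages and a junior tranche covering 20% of it, 10 defaults cost the junior tranche half its value and the senior tranche nothing. At 30 defaults the junior tranche is gone and the senior tranche starts taking losses.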

Trading in derivatives brought down Lehman Brothers, AIG, and many other buyers, based on mistaken assumptions about the independence of the underlying asset prices; they underestimated the danger that many mortgages would all default at the same time. But the new paper shows that in addition to that kind of danger, risks can arise because a seller can deliberately construct a derivative with a booby trap hiding in plain sight.

It’s like encryption: it’s easy to construct an encrypted message (your browser does this all the time), but it’s hard to decrypt without knowing the key (we believe even the NSA doesn’t have the computational power to do it). Similarly, the new result shows that the seller can construct the CDO with a booby trap, but even Goldman Sachs won’t have enough computational power to analyze whether a trap is present.

The paper gives the example of a high-volume seller who builds 1000 CDOs from 1000 asset-classes of home mortgages. Suppose the seller knows that a few of those asset classes are “lemons” that won’t pay off. The seller is supposed to randomly distribute the asset classes into the CDOs; this minimizes the risk for the buyer, because there’s only a small chance that any one CDO has more than a few lemons. But the seller can “tamper” with the CDOs by putting most of the lemons in just a few of the CDOs. This has an enormous effect on the senior tranches of those tampered CDOs.
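A quick simulation makes the contrast concrete. The parameters (1000 asset classes, 1000 CDOs of 100 classes each, 20 lemons, 10 tampered CDOs) are assumptions for illustration, not necessarily the paper's own model.

```python
import random

def build_cdos(n_classes, n_cdos, per_cdo, lemons, tampered_cdos=0, seed=0):
    """Return each CDO as a set of asset-class indices.

    The first `tampered_cdos` CDOs get every lemon; the rest are uniform random subsets."""
    rng = random.Random(seed)
    honest_classes = [c for c in range(n_classes) if c not in lemons]
    cdos = []
    for i in range(n_cdos):
        if i < tampered_cdos:
            filler = rng.sample(honest_classes, per_cdo - len(lemons))
            cdos.append(set(lemons) | set(filler))
        else:
            cdos.append(set(rng.sample(range(n_classes), per_cdo)))
    return cdos

def lemon_counts(cdos, lemons):
    return [len(cdo & lemons) for cdo in cdos]

lemons = set(range(20))  # hypothetical: the seller knows classes 0-19 are lemons
honest = build_cdos(1000, 1000, 100, lemons, seed=1)
rigged = build_cdos(1000, 1000, 100, lemons, tampered_cdos=10, seed=1)
print("most lemons in any honest CDO:", max(lemon_counts(honest, lemons)))
print("most lemons in any rigged CDO:", max(lemon_counts(rigged, lemons)))  # 20
```

Random assembly leaves each CDO with about 2 lemons on average; the ten rigged CDOs each hold all 20. The buyer, of course, sees only which asset classes went into which CDO, not which ones are lemons.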

In principle, an alert buyer can detect tampering even if he doesn’t know which asset classes are the lemons: he simply examines all 1000 CDOs and looks for a suspicious overrepresentation of some of the asset classes in some of the CDOs. What Arora et al. show is that this detection task is an NP-complete problem (“densest subgraph”). This problem is believed to be computationally intractable; thus, even the most alert buyer can’t have enough computational power to do the analysis.
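To see why the buyer's search is hard, here is the naive approach: try every group of k asset classes and check whether some m CDOs are unusually packed with them. This brute-force search illustrates the densest-subgraph formulation and is not the paper's analysis; the toy data is made up.

```python
import math
from itertools import combinations

def densest_bipartite(cdos, n_classes, k, m):
    """Exhaustive search: the k asset classes and m CDOs with the most memberships between them."""
    best_score, best_classes = -1, None
    for classes in combinations(range(n_classes), k):   # C(n_classes, k) candidates
        chosen = set(classes)
        # For a fixed set of classes, the best m CDOs are the ones overlapping it most.
        overlaps = sorted((len(cdo & chosen) for cdo in cdos), reverse=True)
        score = sum(overlaps[:m])
        if score > best_score:
            best_score, best_classes = score, chosen
    return best_score, best_classes

# Made-up toy instance: 8 asset classes, 6 CDOs of 3 classes each.
toy = [{0, 1, 2}, {0, 1, 3}, {0, 1, 4}, {2, 5, 6}, {3, 6, 7}, {4, 5, 7}]
print(densest_bipartite(toy, 8, k=2, m=3))   # (6, {0, 1})
print(math.comb(1000, 20) > 10**40)          # True: far too many subsets to try
```

On the toy instance it finds classes 0 and 1 shared by three CDOs. With 1000 asset classes and k = 20, there are more than 10^40 subsets to check. The slowness of brute force is not itself the hardness result; the NP-completeness result is what suggests no dramatically better method exists.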

Arora et al. show it’s even worse than that: even after the buyer has lost a lot of money (because enough mortgages defaulted to devalue his “senior tranche”), he can’t prove that that tampering occurred: he can’t prove that the distribution of lemons wasn’t random. This makes it hard to get recourse in court; it also makes it hard to regulate CDOs.

Intractability Theory forms the basis for several of the technologies discussed on Freedom-to-Tinker: cryptography, digital-rights management, watermarking, and others. Perhaps financial policy is now another one.

Sidekick Users' Data Lost: Blame the Cloud?

Users of Sidekick mobile phones saw much of their data disappear last week due to engineering problems at a Microsoft data center. Sidekick devices lose the contents of their memory when they don’t have power (e.g. when the battery is being changed), so all data is transmitted to a data center for permanent storage — which turned out not to be so permanent.

(The latest news is that some of the data, perhaps most of it, may turn out to be recoverable.)

A common response to this story is that this kind of danger is inherent in “cloud” computing services, where you rely on some service provider to take care of your data. But this misses the point, I think. Preserving data is difficult, and individual users tend to do a mediocre job of it. Admit it: You have lost your own data at some point. I know I have lost some of mine. A big, professionally run data center is much less likely to lose your data than you are.

It’s worth noting, too, that many cloud services face lower risk of this sort of problem. My email, for example, lives in the cloud–the “official copy” is on a central server, and copies are downloaded frequently to my desktop and laptop computers. If the server were to go up in flames, along with all of the server backups, I would still be in good shape, because I would still have copies of everything on my desktop and laptop.

For my email and similar services, the biggest risk to data integrity is not that the server will disappear altogether, but that the server will misbehave in subtle ways, causing my stored data to be corrupted over time. Thanks to the automatic synchronization between the server and my two clients (desktop and laptop), bad data could be replicated silently into all copies. In principle, some of the damage could be repaired later, using the server’s backups, but that’s a best case scenario.
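One way a client could catch this kind of silent corruption, sketched under the assumption that a stored email message should never change after it first arrives: record a hash of every message on first download, and refuse to let a later sync overwrite a message whose content no longer matches. The `Client` class is hypothetical, not how any particular mail program works.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Client:
    """Hypothetical mail client that mirrors the server but keeps a hash ledger."""

    def __init__(self):
        self.messages = {}  # message id -> content
        self.ledger = {}    # message id -> hash recorded on first download

    def sync_from(self, server: dict) -> list:
        """Copy server messages locally; return ids whose content changed unexpectedly."""
        suspicious = []
        for msg_id, data in server.items():
            h = digest(data)
            if msg_id in self.ledger and self.ledger[msg_id] != h:
                suspicious.append(msg_id)   # keep the local copy instead of replicating damage
                continue
            self.ledger.setdefault(msg_id, h)
            self.messages[msg_id] = data
        return suspicious

server = {"m1": b"Meeting at 3pm", "m2": b"Lunch tomorrow?"}
laptop = Client()
print(laptop.sync_from(server))        # []
server["m2"] = b"Lunch tom\x00rrow?"   # the server silently corrupts a stored message
print(laptop.sync_from(server))        # ['m2']
print(laptop.messages["m2"])           # b'Lunch tomorrow?'
```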

This risk, of buggy software corrupting data, has always been with us. The question is not whether problems will happen in the cloud — in any complex technology, trouble comes with the territory — but whether the cloud makes a problem worse.

PrivAds: Behavioral Advertising without Tracking

There’s an interesting new paper out of Stanford and NYU, about a system called “PrivAds” that tries to provide behavioral advertising on web sites, without having a central server gather detailed information about user behavior. If the paper’s approach turns out to work, it could have an important impact on the debate about online advertising and privacy.

Advertisers have obvious reasons to show you ads that match your interests. You can benefit too, if you see ads that are relevant to your needs, rather than ones you don’t care about. The problem, as I argued in my Congressional testimony, comes when sites track your activities, and build up detailed files on you, in order to do the targeting.

PrivAds tries to solve this problem by providing behavioral advertising without having any server track you. The idea is that your own browser will track you, and analyze your online activities to build a model of your interests, but your browser won’t reveal this information to anyone else. When a site wants to show you an interest-based ad, your browser will choose the ad from a portfolio of ads offered by the ad service.
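A minimal sketch of the browser-side half of this idea: the interest profile is built and used locally, and the only thing fetched from outside is a portfolio of ads that could be the same for everyone. The data shapes, category labels, and scoring rule are assumptions for illustration, not the algorithm from the PrivAds paper.

```python
from collections import Counter

def build_profile(history):
    """Interest profile built and kept inside the browser; nothing here is sent out."""
    return Counter(category for _url, category in history)

def choose_ad(profile, portfolio):
    """Pick the ad whose target categories best match the local profile."""
    return max(portfolio, key=lambda ad: sum(profile[c] for c in ad["categories"]))

# Hypothetical browsing history (URL, category) and ad portfolio.
history = [("shop.example/tents", "camping"),
           ("blog.example/trails", "hiking"),
           ("shop.example/stoves", "camping")]
portfolio = [{"id": "ad-1", "categories": ["cooking"]},
             {"id": "ad-2", "categories": ["camping", "hiking"]}]
print(choose_ad(build_profile(history), portfolio)["id"])   # ad-2
```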

The tricky part is how your browser can do all of this without incidentally leaking your activities to the server. For example, the ad service needs to know how many times each ad was shown. How can you report this to the ad service without revealing which ads you saw? PrivAds offers a solution based on fancy cryptography, so that the ad service can aggregate reports from many users without being able to see the users’ individual reports. Similarly, every interaction between your browser and the outside world must be engineered carefully so that behavioral advertising can occur but the browser doesn’t telegraph your actions.
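The paper's actual protocol isn't reproduced here. As a stand-in, here is a sketch of a different, well-known aggregation trick, pairwise cancelling masks: each report looks random on its own, but the masks cancel in the sum. Generating all masks in one place, as `pairwise_masks` does for brevity, would defeat the purpose; in a real deployment each pair of users would derive its mask privately. All names are hypothetical.

```python
import secrets

MODULUS = 2**32  # reports are integers modulo this value

def pairwise_masks(n):
    """Hypothetical: one random mask per pair of users, known only to that pair."""
    return {(i, j): secrets.randbelow(MODULUS) for i in range(n) for j in range(i + 1, n)}

def masked_report(i, count, masks, n):
    """User i adds masks shared with higher-numbered users and subtracts the rest."""
    r = count
    for j in range(n):
        if i < j:
            r += masks[(i, j)]
        elif j < i:
            r -= masks[(j, i)]
    return r % MODULUS

def aggregate(reports):
    """The server sums the reports; every mask appears once with + and once with -."""
    return sum(reports) % MODULUS

counts = [3, 0, 5, 1]   # hypothetical: times each of four users saw one particular ad
n = len(counts)
masks = pairwise_masks(n)
reports = [masked_report(i, c, masks, n) for i, c in enumerate(counts)]
print(aggregate(reports))   # 9
```

The server learns the total of 9 impressions, but no single report reveals an individual count. A real design would also have to handle users who drop out mid-round, which this sketch ignores.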

It’s not clear at this point whether the PrivAds approach will work, in the sense of protecting privacy without reducing the effectiveness of ad targeting. It’s clear, though, that PrivAds is asking an important question.

If the PrivAds approach succeeds, demonstrating that behavioral advertising does not require tracking, this doesn’t mean that companies will stop wanting to track you — but it does mean that they won’t be able to use advertising as an excuse to track you.

Chilling and Warming Effects

For several years, the Chilling Effects Clearinghouse has been cataloging the effects of legal threats on online expression and helping people to understand their rights. Amid all the chilling we continue to see, there are welcome rays of sunshine when bloggers stand up to threats, helping to stop the cycle of threat-and-takedown.

The BoingBoing team did this the other day when they got a legal threat from Ralph Lauren’s lawyers over an advertisement they mocked on the BoingBoing blog for featuring a stick-thin model. The lawyers claimed copyright infringement, saying “PRL owns all right, title, and interest in the original images that appear in the Advertisements.” Other hosts pull content “expeditiously” when they receive these notices (as Google did when notified of the post on Photoshop Disasters), and most bloggers and posters don’t counter-notify, even though Chilling Effects offers a handy counter-notification form.

Not BoingBoing: they posted the letter (and the image again) along with copious mockery, including an offer to feed the obviously starved models, and other sources picked up on the fun. The image has now been seen by many more people than would have discovered it in BoingBoing’s archives, in a pattern the press has nicknamed the “Streisand Effect.”

We use the term “chilling effects” to describe indirect legal restraints, or self-censorship, because most cease-and-desist letters don’t go through the courts. The lawyers (and non-lawyers) sending them rely on the in terrorem effects of threatened legal action, and often succeed in silencing speech for the cost of an e-postage stamp.

Actions like BoingBoing’s use the court of public opinion to counter this squelching. They fight legalese with public outrage (in support of legal analysis), and at the same time help other readers understand that they have similar rights. Further, they increase the “cost” of sending cease-and-desists, as potential claimants must weigh the publicity risk of being made to look foolish, bullying, or worse.

For those curious about the underlying legalities here, the Copyright Act makes clear that fair use, including for the purposes of commentary, criticism, and news reporting, is not an infringement of copyright. See Chilling Effects’ fair use FAQ. Yet the DMCA notice-and-takedown procedure encourages ISPs to respond to complaints with takedown, not investigation and legal balancing. Providers like BoingBoing’s Priority Colo should also get credit for their willingness to back their users’ responses.

As a result of the attention, Ralph Lauren apologized for the image: “After further investigation, we have learned that we are responsible for the poor imaging and retouching that resulted in a very distorted image of a woman’s body. We have addressed the problem and going forward will take every precaution to ensure that the caliber of our artwork represents our brand appropriately.”

May the warming (and proper attention to the health of fashion models) continue!

[cross-posted at Chilling Effects]

Privacy as a Social Problem, Not a Technology Problem

Bob Blakley had an interesting post Monday, arguing that technologists tend to frame the privacy issue poorly. (I would add that many non-technologists use the same framing.) Here’s a sample:

That’s how privacy works; it’s not about secrecy, and it’s not about control: it’s about sociability. Privacy is a social good which we give to one another, not a social order in which we control one another.

Technologists hate this; social phenomena aren’t deterministic and programmers can’t write code to make them come out right. When technologists are faced with a social problem, they often respond by redefining the problem as a technical problem they think they can solve.

The privacy framing that’s going on in the technology industry today is this:

Social Frame: Privacy is a social problem; the solution is to ensure that people use sensitive personal information only in ways that are beneficial to the subject of the information.

BUT as technologists we can’t … control peoples’ behavior, so we can’t solve this problem. So instead let’s work on a problem that sounds similar:

Technology Frame: Privacy is a technology problem; since we can’t make people use sensitive personal information sociably, the solution is to ensure that people never see others’ sensitive personal information.

We technologists have tried to solve the privacy problem in this technology frame for about a decade now, and, not surprisingly (information wants to be free!) we have failed.

The technology frame isn’t the problem. Privacy is the problem. Society can and routinely does solve the privacy problem in the social frame, by getting the vast majority of people to behave sociably.

This is an excellent point, and one that technologists and policymakers would be wise to consider. Privacy depends, ultimately, on people and institutions showing a reasonable regard for the privacy interests of others.

Bob goes on to argue that technologies should be designed to help these social mechanisms work.

A sociable space is one in which people’s social and antisocial actions are exposed to scrutiny so that normal human social processes can work.

A space in which tagging a photograph publicizes not only the identities of the people in the photograph but also the identities of the person who took the photograph and the person who tagged the photograph is more sociable than a space in which the only identity revealed is that of the person in the photograph – because when the picture of Jimmy holding a martini washes up on the HR department’s desk, Jimmy will know that Johnny took it (at a private party) and Julie tagged him – and the conversations humans have developed over tens of thousands of years to handle these situations will take place.

Again, this is an excellent and underappreciated point. But we need to be careful how far we take it. If we go beyond Bob’s argument, and we say that good design of the kind he advocates can completely solve the online privacy problem, then we have gone too far.

Technology doesn’t just move old privacy problems online. It also creates new problems and exacerbates old ones. In the old days, Johnny and Julie might have taken a photo of Jimmy drinking at the office party, and snail-mailed the photo to HR. That would have been a pretty hostile act. Now, the same harm can arise from a small misunderstanding: Johnny and Julie might assume that HR is more tolerant, or that HR doesn’t watch Facebook; or they might not realize that a site allows HR to search for photos of Jimmy. A photo might be taken by Johnny and tagged by Julie, even though Johnny and Julie don’t know each other. All in all, the photo scenario is more likely to happen today than in the pre-Net age.

This is just one example of what James Grimmelmann calls Accidental Privacy Spills. Grimmelmann tells the story of a private email message that was forwarded and re-forwarded to thousands of people, not by malice but because many people made the seemingly harmless decision to forward it to a few friends. This would never have happened with a personal letter. (Personal letters are sometimes publicized against the wishes of the author, but that’s very rare and wouldn’t have happened in the case Grimmelmann describes.) As the cost of capturing, transmitting, storing, and searching photos and other digital information falls to near-zero, it’s only natural that more capturing, transmitting, storing, and searching of information will occur.

Good design is not the whole solution to our privacy problem. But design has the huge advantage that we can get started on it right away, without needing to reach some sweeping societal agreement about what the rules should be. If you’re designing a product, or deciding which product to use, you can support good privacy design today.

Introducing FedThread: Opening the Federal Register

Today we are rolling out FedThread, a new way of interacting with the Federal Register. It’s the latest civic technology project from our team at Princeton’s Center for Information Technology Policy.

The Federal Register is “[t]he official daily publication for rules, proposed rules, and notices of Federal agencies and organizations, as well as executive orders and other presidential documents.” It’s published by the U.S. government, five days a week. The Federal Register tells citizens what their government is doing, in a lot more detail than the news media do.

FedThread makes the Federal Register more open and accessible. FedThread gives users:

  • collaborative annotation: Users can attach a note to any paragraph of the Federal Register; a conversation thread hangs off of every paragraph.
  • advanced search: Users can search the Federal Register (going back to 2000) on full text, by date, agency, and other fields.
  • customized feeds: Any search can be turned into an RSS feed. The resulting feed will include any new items that match the search query. Feeds can be delivered by email as well.
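A rough sketch of how a saved search could become a feed: filter the documents, then render the matches as RSS 2.0 with Python's standard library. The document fields, agency names, and base URL are hypothetical, not FedThread's actual data model.

```python
import xml.etree.ElementTree as ET
from datetime import date

def search(docs, text=None, agency=None, since=None):
    """Filter documents on full text, agency, and publication date (all optional)."""
    results = []
    for doc in docs:
        if text and text.lower() not in doc["text"].lower():
            continue
        if agency and doc["agency"] != agency:
            continue
        if since and doc["date"] < since:
            continue
        results.append(doc)
    return results

def to_rss(label, results, base_url):
    """Render search results as a minimal RSS 2.0 feed."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Search: {label}"
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = f"Federal Register items matching {label}"
    for doc in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = doc["title"]
        ET.SubElement(item, "link").text = f"{base_url}/{doc['id']}"
        ET.SubElement(item, "guid", isPermaLink="false").text = doc["id"]
    return ET.tostring(rss, encoding="unicode")

# Hypothetical documents and URL.
docs = [
    {"id": "doc-1", "title": "Proposed rule on broadband", "agency": "Agency A",
     "date": date(2009, 10, 22), "text": "... reasonable network management ..."},
    {"id": "doc-2", "title": "Notice of meeting", "agency": "Agency B",
     "date": date(2009, 10, 20), "text": "A public meeting will be held ..."},
]
print(to_rss("network management", search(docs, text="network management"),
             "https://example.org/doc"))
```

Re-running the same query later and regenerating the feed is what makes new matching items show up for subscribers.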

I think FedThread is a nice tool, but what’s most amazing to me is that the whole project took only ten days to create. Ten days ago we had no code, no HTML, no plan, not even a block diagram on a whiteboard. Today we launched a pretty good service.

How was this possible? Three things enabled it.

First, government provided the necessary data, for bulk download, in a format (XML) that’s easy for software to handle. This let us acquire and manipulate the underlying data (Federal Register contents) quickly. Folks at the Government Printing Office, National Archives and Records Administration, and Office of Science and Technology Policy all helped to make this possible. The roll-out of the government’s XML-based Federal Register site today is a significant step forward.
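A minimal sketch of that first step: turning bulk XML into addressable paragraphs that discussion threads could hang off. The element names (DOCUMENT, AGENCY, SUBJECT, P, E) and the paragraph-ID scheme are assumptions for illustration, not the actual Federal Register schema or FedThread's code. It uses Python's standard-library ElementTree; lxml, which the team used, offers a largely compatible etree API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document; not the actual Federal Register schema.
SAMPLE = """<DOCUMENT>
  <AGENCY>Example Agency</AGENCY>
  <SUBJECT>Example proposed rule</SUBJECT>
  <P>First paragraph of the rule.</P>
  <P>Second paragraph, with <E>emphasis</E> inside.</P>
</DOCUMENT>"""

def paragraphs_with_ids(xml_text, doc_id):
    """Return (paragraph_id, text) pairs; each id is a stable anchor for a discussion thread."""
    root = ET.fromstring(xml_text)
    return [(f"{doc_id}-p{i}", "".join(p.itertext()).strip())
            for i, p in enumerate(root.iter("P"), start=1)]

for pid, text in paragraphs_with_ids(SAMPLE, "example-doc"):
    print(pid, text)
```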

Second, we had great tools, such as Linux, Apache, MySQL, Python, Django, jQuery, Datejs, and lxml. These tools are capable, flexible, and free, and they fit together in useful ways. More than once we faced a challenging engineering problem, only to find an existing tool that did almost exactly what we needed. When we needed a tool for managing inline discussion threads within a document, Adrian Holovaty, Jacob Kaplan-Moss and Jack Slocum graciously let us use their code from djangobook.com, which served as the basis for our system. Tools like these help small teams build big projects quickly.

Third, we have an amazing team. A project like this needs people who are super-smart and tireless, who have great engineering judgment, and who know how to work as a team. Joe Calandrino, Ari Feldman, Harlan Yu, and Bill Zeller all did fantastic work building the site. We set an insane schedule (at the start we guessed we had a 50% chance of having anything at all ready by today), and they raced ahead of the schedule, to the point that we expanded the project’s scope more than once. Great job, guys! Now please get some sleep.

We hope FedThread is a useful tool that brings more people into contact with the operations of their government — one small step in a larger trend of using technology to make government more transparent.