April 23, 2014

Acceptance rates at security conferences

How competitive are security research conferences? Several people have been tracking this information. Mihai Christodorescu has a nice chart of acceptance and submission rates over time. The most recent data point we have is the 2009 Usenix Security Symposium, which accepted 26 of 176 submissions (a 14.8% acceptance ratio, consistent with recent years). Acceptance rates like that, at top security conferences, are now pretty much the norm.

With its deadline one week ago, ACM CCS 2009 got 317 submissions this year (up from 274 last year, and approx. 300 the year before) and ESORICS 2009, with a submission deadline last Friday night, got 222 submissions (up from about 170 last year).

Think about that: right now there are over 500 research manuscripts in the field of computer security fighting it out, and maybe 15-20% of those will get accepted. (And that’s not counting research in cryptography, or the security-relevant papers that regularly appear in the literature on operating systems, programming languages, networking, and other fields.) Ten years ago, when I first began as an assistant professor, there would be half as many papers submitted. At the time, I grumbled that we had too many security conferences and that the quality of the proceedings suffered. Well, that problem seems mostly resolved, except rather than having half as many conferences, we now have a research community that’s apparently twice as large. I suppose that’s a good thing, although there are several structural problems that we, the academic security community, really need to address.
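The arithmetic behind these figures is easy to check. The sketch below uses only the numbers quoted above; the function and variable names are illustrative, and the 15-20% band is this post's rough estimate, not reported data.

```python
def acceptance_rate(accepted: int, submitted: int) -> float:
    """Return the fraction of submissions that were accepted."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return accepted / submitted


# Usenix Security 2009: 26 accepted out of 176 submitted.
usenix_rate = acceptance_rate(26, 176)

# Manuscripts currently under review: CCS 2009 plus ESORICS 2009.
pending = 317 + 222

# Rough range of papers likely to be accepted at a 15-20% rate.
low_estimate = round(pending * 0.15)
high_estimate = round(pending * 0.20)

print(f"Usenix Security 2009 acceptance rate: {usenix_rate:.1%}")
print(f"Pending manuscripts: {pending}")
print(f"Expected acceptances: roughly {low_estimate} to {high_estimate}")
```

At those rates, well over 400 of the pending manuscripts would be rejected from this round alone.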

  • What are we supposed to do with the papers that are rejected, resubmitted, rejected again, and so on? Clearly, some of this work has value and never gets seen. Should we make greater use of the arXiv.org pre-print service? There’s a crypto and computer security section, but it’s not heavily used. Alternatively, we could join in on the IACR Cryptology ePrint Archive or create our own.
  • Should we try to make the conference reviewing systems more integrated across conferences, such that PC comments from one conference show up in a subsequent conference, and the subsequent PC can see both drafts of the paper? This would make conference reviewing somewhat more like journal reviewing, providing a measure of consistency from one conference to the next.
  • Low acceptance ratios don’t necessarily achieve higher quality proceedings. There’s a distinctive problem that occurs when a conference has a huge PC and only three of its members review any given paper. Great papers still get in and garbage papers are still rejected, but the outcomes for papers “on the bubble” become more volatile, depending on whether those papers get the right reviewers. Asking PC members to do more reviews is just going to lower the quality of the reviews or discourage people from accepting positions on PCs. Adding PC members could help, but it also can be unwieldy to manage a large PC, and there will be even more volatility.
  • Do we need another major annual computer security conference? Should more workshops be willing to take conference-length submissions? Or should our conferences raise their acceptance rates up to something like 25%, even if that means compressed presentations and the end of printed proceedings? How much “good” work is out there, if only there was a venue in which to print it?

About the only one of these ideas I don’t like is adding another top-level security conference. Otherwise, we could well do all-of-the-above, and that would be a good thing. I’m particularly curious if arbitrarily increasing the acceptance rates would resolve some of the volatility issues on the bubble. I think I’d rather that our conferences err on the side of taking the occasional bad/broken/flawed paper rather than rejecting the occasional good-but-misunderstood paper.

Maybe we just need to harness the power of our graduate students. When you give a grad student a paper to review, they treat it like a treasure and write a detailed review, even if they may not be the greatest expert in the field. Conversely, when you give an overworked professor a paper to review, they blast through it, because they don’t have the time to spend a full day on any given paper. Well, it’s not like our grad students have anything better to be doing. But does the additional time they can spend per paper make up for the relative lack of experience and perspective? Can they make good accept-or-reject judgements for papers on the bubble?

For additional thoughts on this topic, check out Matt Welsh’s thoughts on scaling systems conferences. He argues that there’s a real disparity between the top programs / labs and everybody else and that it’s worthwhile to take steps to fix this. (I’ll argue that security conferences don’t seem to have this particular problem.) He also points out what I think is the deeper problem, which is that hotshot grad students must get themselves a long list of publications to have a crack at a decent faculty job. This was emphatically not the case ten years ago.

See also, Birman and Schneider’s CACM article (behind a paywall, unless your university has a site license). They argue that the focus on short, incremental results is harming our field’s ability to have impact. They suggest improving the standing of journals in the tenure game and they suggest disincentivizing people from submitting junk / preliminary papers by creating something of a short-cut reject that gets little or no feedback and also, by virtue of the conferences not being blind-review, creates the possibility that a rejected paper could harm the submitter’s reputation.

Chinese Internet Censorship: See It For Yourself

You probably know already that the Chinese government censors Internet traffic. But you might not have known that you can experience this censorship yourself. Here’s how:

(1) Open up another browser window or tab, so you can browse without losing this page.

(2) In the other window, browse to baidu.com. This is a search engine located in China.

(3) Search for an innocuous term such as “freedom to tinker”. You’ll see a list of search results, sent back by Baidu’s servers in China.

(4) Now return to the main page of baidu.com, and search for “Falun Gong”. [Falun Gong is a dissident religious group that is banned in China.]

(5) At this point your browser will report an error — it might say that the connection was interrupted or that the page could not be loaded. What really happened is that the Great Firewall of China saw your Internet packets, containing the forbidden term “Falun Gong”, and responded by disrupting your connection to Baidu.

(6) Now try to go back to the Baidu home page. You’ll find that this connection is disrupted too. Just a minute ago, you could visit the Baidu page with no trouble, but now you’re blocked. The Great Firewall is now cutting you off from Baidu, because you searched for Falun Gong.

(7) After a few minutes, you’ll be allowed to connect to Baidu again, and you can do more experiments.

(Reportedly, users in China see different behavior. When they search for “Falun Gong” on Baidu, the connection isn’t blocked. Instead, they see “sanitized” search results, containing only pages that criticize Falun Gong.)
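The steps above can also be sketched as a small script. This is a minimal sketch under several assumptions, not a tested measurement tool: the search URL format (`/s?wd=`) is an assumption about Baidu's query interface, the names `probe` and `run_experiment` are hypothetical, and the experiment depends on the request travelling as plain HTTP so that the firewall can see the search term. It may not reproduce if the site redirects to HTTPS.

```python
import http.client
import urllib.parse
import urllib.request

HOME_URL = "http://www.baidu.com/"  # plain HTTP, as the experiment assumes


def search_url(term: str) -> str:
    """Build a search URL; the 's?wd=' query format is an assumption."""
    return HOME_URL + "s?wd=" + urllib.parse.quote(term)


def probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if the page loads, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read(1024)  # read a little to confirm data flows
        return True
    except (OSError, http.client.HTTPException):
        # Covers resets, timeouts, and URL errors (URLError is an OSError).
        return False


def run_experiment(probe_fn, sensitive_term: str,
                   harmless_term: str = "freedom to tinker") -> dict:
    """Run steps (3) through (6) in order and record which requests succeed."""
    harmless_ok = probe_fn(search_url(harmless_term))    # step 3
    sensitive_ok = probe_fn(search_url(sensitive_term))  # steps 4 and 5
    home_after_ok = probe_fn(HOME_URL)                   # step 6
    return {
        "harmless_search": harmless_ok,
        "sensitive_search": sensitive_ok,
        "home_after": home_after_ok,
    }
```

Calling `run_experiment(probe, "Falun Gong")` would perform the live requests. If the behavior described above still holds, the first entry would be `True` and the other two `False`. Passing the probe function in as an argument lets the sequencing logic be exercised without touching the network.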

If you do try more experiments, feel free to report your results in the comments.

Stimulus transparency and the states

Yesterday, I testified at a field hearing of the U.S. House Committee on Oversight and Government Reform. The hearing title was The American Recovery and Reinvestment Act of 2009: The Role of State and Local Governments.

My written testimony addressed plans to put stimulus data on the Internet, primarily at Recovery.gov. There have been promising signs, but important questions remain open, particularly about stimulus funds that are set to flow through the states. I was reacting primarily to the most recent round of stimulus-related guidance from the Office of Management and Budget (dated April 3).

Based on the probing questions about Recovery.gov that were asked by members from both parties, I’m optimistic that Congressional oversight will be a powerful force to encourage progress toward greater online transparency.

FBI's Spyware Program

Note: I worked for the Department of Justice’s Computer Crime and Intellectual Property Section (CCIPS) from 2001 to 2005. The documents discussed below mention a memo written by somebody at CCIPS during the time I worked there, but absolutely everything I say below reflects only my personal thoughts and impressions about the documents released to the public today.

Two years ago, Kevin Poulsen broke the news that the FBI had successfully deployed spyware to help catch a student sending death threats to his high school. The FBI calls the tool a CIPAV for “computer and internet protocol address verifier.”

We learned today that Kevin filed a Freedom of Information Act request (along with EFF and CNet News) asking for other information about CIPAVs. The FBI has responded, Kevin made the 152 pages available, and I just spent the past half hour skimming them.

Here are some unorganized impressions:

  • The 152 pages don’t take long to read, because they have been so heavily redacted. The vast majority of the pages have no substantive content at all.
  • Page one may be the most interesting page. Someone at CCIPS, my old unit, cautions that “While the technique is of indisputable value in certain kinds of cases, we are seeing indications that it is being used needlessly by some agencies, unnecessarily raising difficult legal questions (and a risk of suppression) without any countervailing benefit.”
  • On page 152, the FBI’s Cryptographic and Electronic Analysis Unit (CEAU) “advised Pittsburgh that they could assist with a wireless hack to obtain a file tree, but not the hard drive content.” This is fascinating on several levels. First, what wireless hack? The spyware techniques described in Poulsen’s reporting are deployed when a target is unlocatable, and the FBI tricks him or her into clicking a link. How does wireless enter the picture? Don’t you need to be physically proximate to your target to hack them wirelessly? Second, why could CEAU “assist . . . to obtain a file tree, but not the hard drive content”? That smells like a legal constraint, not a technical one. Maybe some lawyer was making distinctions based on probable cause?
  • On page 86, the page summarizing the FBI’s Special Technologies and Applications Office (STAO) response to the FOIA request, STAO responds that they have included an “electronic copy of ‘Magic Quadrant for Information Access Technology’” on cd-rom. Is that referring to this Gartner publication, and if so, what does this have to do with the FOIA request? I’m hoping one of the uber geeks reading this blog can tie FBI spyware to this phrase.
  • Pages 64-80 contain the affidavit written to justify the use of the CIPAV in the high school threat case. I had seen it back when Kevin first wrote about it, but if you haven’t seen it yet, you should read it.
  • It definitely appears that the FBI is obtaining search warrants before installing CIPAVs. Although this is probably enough to justify grabbing IP addresses and information packed in a Windows registry, it probably is not enough alone to justify tracing IP addresses in real time. The FBI probably needs a pen register/trap and trace order in addition to the warrant to do that under 18 U.S.C. 3123. Although pen registers are mentioned a few times in these documents–particularly in the affidavit mentioned above–many of the documents simply say “warrant.” This is probably not of great consequence, because if FBI has probable cause to deploy one of these, they can almost certainly justify a pen register order, but why are they being so sloppy?

Two final notes: First, I twittered my present sense impressions while reading the documents, which was an interesting experiment for me, if not for those following me. If you want to follow me, visit my profile.

Second, if you see anything else in the documents that bears scrutiny, please mention it in the comments of this post.

On open source vs. disclosed source voting systems

Sometimes, working on voting seems like running on a treadmill. Old disagreements need to be argued again and again. As long as I’ve been speaking in public about voting, I’ve discussed the need for voting systems’ source code to be published, as in a book, to create transparency into how the systems operate. Or, put another way, trade secrecy is anathema to election transparency. We, the people, have an expectation that our election policies and procedures are open to scrutiny, and that critical scrutiny is essential to the exercise of our Democracy. (Cue the waving flags.)

On Tuesday, the Election Technology Council (a trade association of four major American voting system manufacturers) put out a white paper on open-source and voting systems. It’s nice to see them finally talking about the issue, but there’s a distinctive cluelessness in this paper about what, exactly, open source is and what it means for a system to be secure. For example, in a sidebar titled “Disclosed vs. Open: Clarifying Misconceptions”, the report states:

… taking a software product that was once proprietary and disclosing its full source code to the general public will result in a complete forfeiture of the software’s security … Although computer scientists chafe at the thought of “security through obscurity,” there remains some underlying truths to the idea that software does maintain a level of security through the lack of available public knowledge of the inner workings of a software program.

Really? No. Disclosing the source code only results in a complete forfeiture of the software’s security if there was never any security there in the first place. If the product is well-engineered, then disclosing the software will cause no additional security problems. If the product is poorly-engineered, then the lack of disclosure only serves the purpose of delaying the inevitable.

What we learned from the California Top-to-Bottom Review and the Ohio EVEREST study was that, indeed, these systems are unquestionably and unconscionably insecure. The authors of those reports (including yours truly) read the source code, which certainly made it easier to identify just how bad these systems were, but it’s fallacious to assume that a prospective attacker, lacking the source code and even lacking our reports, is somehow any less able to identify and exploit the flaws. The wide diversity of security flaws exploited on a regular basis in Microsoft Windows completely undercuts the ETC paper’s argument. The bad guys who build these attacks have no access to Windows’s source code, but they don’t need it. With common debugging tools (as well as customized attacking tools), they can tease apart the operation of the compiled, executable binary applications and engineer all sorts of malware.

Voting systems, in this regard, are just like Microsoft Windows. We have to assume, since voting machines are widely dispersed around the country, that attackers will have the opportunity to tear them apart and extract the machine code. Therefore, it’s fair to argue that source disclosure, or the lack thereof, has no meaningful impact on the operational security of our electronic voting machines. They’re broken. They need to be repaired.

The ETC paper also seems to confuse disclosed source (published, as in a book) with open source (e.g., under a GPL or BSD license). For years, I’ve been suggesting that the former would be a good thing, and I haven’t taken a strong position on the latter. Even further, the ETC paper seems to assume that open source projects are necessarily driven by volunteer labor, rather than by companies. See, for example:

… if proprietary software is ripped open through legislative fiat, whatever security features exist are completely lost until such time that the process improvement model envisioned by the open source community has an opportunity to take place (Hall 2007).

There are plenty of open-source projects that are centrally maintained by commercial companies with standard, commercial development processes. There’s no intrinsic reason that software source code disclosure or open licensing makes any requirements on the development model. And, just because software is suddenly open, there’s indeed no guarantee that a community of developers will magically appear and start making improvements. Successful open source communities arise when distinct developers or organizations share a common need.

Before I go on, I’ll note that the ETC report has cherry-picked citations to support its cause, and those citations are being used neither honestly nor completely. The above citation to Joe Hall’s 2007 EVT paper distorts Hall’s opinions. His actual paper, which surveys 55 contracts between voting system manufacturers and the jurisdictions that buy them, makes very specific recommendations, including that these contracts should allow for source code review in the pre-election, post-election, and litigation stages of the election cycle. Hall is arguing in favor of source code disclosure, yet the citation to his paper would seem to have him arguing against it!

So, how would open source (or disclosed source) work in the commercial voting machine industry? The ETC paper suggests that it might be difficult to get an open-source project off the ground with distributed development by volunteers. This is perfectly believable. Consequently, that’s not how it would ever work. As I said above, I’ve always advocated for disclosure, but let’s think through how a genuine open-source voting machine might succeed. A likely model is that a state, coalition of states, or even the Federal government would need to step in to fund the development of these systems. The development organization would most likely be a non-profit company, along the lines of the many Federally Funded Research and Development Centers (FFRDCs) already in existence. Our new voting FFRDC, presumably sponsored by the EAC, would develop the source code and get it certified. It would also standardize the hardware interface, allowing multiple vendors to build compatible hardware. Because the code would be open, these hardware vendors would be free to make enhancements or to write device drivers, which would then go back to the FFRDC for integration and testing. (The voting FFRDC wouldn’t try to take the code to existing voting systems, so there’s no worry about stealing their IP. It’s easier to just start from scratch.) My hypothetical FFRDC model isn’t too far away from how Linux is managed, or even how Microsoft manages Windows, so this isn’t exactly science fiction.

The ETC paper asks who would develop new features as might be required by a customer and suggests that the “lack of a clear line of accountability for maintaining an open source project” would hinder this process. In my hypothetical FFRDC model, the customer could commission their own programmers to develop the enhancement and contribute this back to the FFRDC for integration. The customer could also directly commission the FFRDC or any other third-party to develop something that suits their needs. They could test it locally in mock elections, but ultimately their changes would need to pass muster with the FFRDC and the still-independent certification and testing authorities. (Or, the FFRDC could bring the testing/certification function in-house or NIST could be funded to do it. That’s a topic for another day.) And, of course, other countries would be free to adopt our hardware and customize our software for their own needs.

Unfortunately, such an FFRDC structure seems unlikely to occur in the immediate future. Who’s going to pay for it? Like it or not, we’re going to be stuck with the present voting system manufacturers and their poorly engineered products for a while. The challenge is to keep clarity on what’s necessary to improve their security engineering. By requiring source code disclosure, we improve election transparency, and we can keep pressure on the vendors to improve their systems. If the security flaws found two years ago in the California and Ohio studies haven’t been properly fixed by one vendor while another is making progress, that difference will be visible and we can recommend that the slow vendor be dropped.

A secondary challenge is to counter the sort of mischaracterizations that are still, sadly, endemic from the voting system industry. Consider this quote:

If policymakers attempt to strip the intellectual property from voting system software, it raises two important areas of concern. The first is the issue of property takings without due process and compensation which is prohibited under the United States Constitution. The second area of concern is one of security. The potential for future gains with software security will be lost in the short-term until such time that an adequate product improvement model is incorporated. Without a process improvement model in place, any security features present in current software would be lost. At the same time, the market incentives for operating and supporting voting products would be eliminated.

For starters, requiring the disclosure of source code does not represent any sort of “taking” of the code. Vendors would still own copyrights to their code. Furthermore, they may still avail themselves of the patent system to protect their intellectual property. Their only loss would be of the trade secret privilege.

And again, we’ve got the bogus security argument combined with some weird process improvement model business. Nobody says that disclosing your source requires you to change your process. Instead, the undisputed fact that these vendors’ systems are poorly engineered requires them to improve their processes (and what have they been doing for the past two years?), which would be necessary regardless of whether the source code is publicly disclosed.

Last but not least, it’s important to refute one more argument:

Public oversight is arguably just as diminished in an open source environment since the layperson is unable to read and understand software source code adequately enough to ensure total access and comprehension. … However, effective oversight does not need to be predicated on the removal of intellectual property protections. Providing global access to current proprietary software would undermine the principles of intellectual property and severely damage the viability of the current marketplace.

Nobody has ever suggested that election transparency requires the layperson to be able to understand the source code. Rather, it requires the layperson to be able to trust their newspaper, or political party, or Consumer Reports, or the League of Women Voters, to be able to retain their own experts and reach their own conclusions.

As to the “principles of intellectual property”, the ETC paper conflates and confuses copyright, patent, and trade secrets. Any sober analysis must consider these distinctly. As to the “viability of the current marketplace”, the market demands products that are meaningfully secure, usable, reliable, and affordable. So long as the present vendors fail on one or more of these counts, their markets will suffer.

Update: Gordon Haff chimes in at cnet, on how the ETC misconceptions about how open source development procedures work are far from atypical.

Thoughts on juries for intellectual property lawsuits

Here’s a thought that’s been stuck in my head for the past few days. It would never be practical, but it’s an interesting idea to ponder. David Robinson tells me I’m not the first one to have this idea, either, but anyway…

Consider what happens in intellectual property lawsuits, particularly concerning infringement of patents or misappropriation of trade secrets. Ultimately, a jury is being asked to rule on essential questions like whether a product meets all the limitations of a patent’s claims, or whether a given trade secret was already known to the public. How does the jury reach a verdict? They’re presented with evidence and with testimony from experts for the plaintiff and experts for the defendant. The jurors then have to sort out whose arguments they find most persuasive. (Of course, a juror who doesn’t follow the technical details could well favor an expert who they find more personable, or better able to handle the pressure of a hostile cross-examination.)

One key issue in many patent cases is the interpretation of particular words in the patent. If they’re interpreted narrowly, then the accused product doesn’t infringe, because it doesn’t have the specific required feature. Conversely, if the claims are interpreted broadly enough for the accused product to infringe the patent, then the prior art to the patent might also land within the broader scope of the claims, thus rendering the patent invalid as either anticipated by or rendered obvious by the prior art. Even though the court will construe the claims in its Markman ruling, there’s often still plenty of room for argument. How, then, does the jury sort out the breadth of the terms of a patent? Again, they watch dueling experts, dueling attorneys, and so forth, and then reach their own conclusions.

What’s missing from this game is a person having ordinary skill in the art at the time of the invention (PHOSITA). One of the jobs of an expert is to interpret the claims of a patent from the perspective of a PHOSITA. Our hypothetical PHOSITA’s perspective is also essential to understanding how obvious a patent’s invention is relative to the prior art. The problem I want to discuss today is that in most cases, nobody on the jury is a PHOSITA or anywhere close. What would happen if they were?

With a hypothetical jury of PHOSITAs, the jurors would be better equipped to read the patent themselves and directly answer questions that are presently left for experts to argue. Does this patent actually enable a PHOSITA to build the gadget (i.e., to “practice the invention”)? Would the patent in question be obvious given a description of the prior art at the time? Or, say in a trade secret case, is the accused secret something that’s actually well-known? A PHOSITA jury could reason about these questions from its own perspective. Imagine, in a software-related case, being able to put source code in front of a jury and have them be able to read it independently. This idea effectively rethinks the concept of a jury of one’s peers. What if juries on technical cases were “peers” with the technology that’s on trial? It would completely change the game.

This idea would never fly for a variety of reasons. First and foremost, good luck finding enough people with the right skill sets and lacking any conflict of interest. Even if our court system had enough data on the citizenry to be able to identify suitable jury candidates (oh, the privacy concerns!), some courts’ jurisdictions simply don’t have enough citizens with the necessary skills and lack of conflicts. What would you do? Move the lawsuit to a different jurisdiction? How many parts of the country have a critical mass of engineers/scientists with the necessary skills? Furthermore, a lot of the wrangling in a lawsuit boils down to controlling what information is and is not presented to the jury. If the jury shows up with their own knowledge, they may reach their own conclusions based on that knowledge, and that’s something that many lawyers and courts would find undesirable because they couldn’t control it.

Related discussion shows up in a recent blog post by Julian Sanchez and a followup by Eric Rescorla. Sanchez’s thesis is that it’s much easier to make a scientific argument that sounds plausible, while being completely bogus, than it is to refute such an argument, because the refutation could well require building up an explanation of the relevant scientific background. He’s talking about climate change scientists vs. deniers or about biologists refuting “intelligent design” advocates, but the core of the argument is perfectly applicable here. A PHOSITA jury would have a better chance of seeing through bogus arguments and consequently would be more likely to reach a sound verdict.

Fascinating New Blog: ComputationalLegalStudies.com

I was inspired to post the essay I discussed in the prior post by the debut of the best new law blog I have seen in a long time, Computational Legal Studies, featuring the work of Daniel Katz and Michael Bommarito, both graduate students in the University of Michigan’s political science department.

Every single post they have published has caused me to smack my head once for not having thought of the idea first, and a second time for not having their datasets and skillz. Their visualization of who has gotten TARP funds and how they’re connected to legislators deserves to be printed on posters and hung up in newsrooms across the country (not to mention in offices on Capitol Hill). They’ve also shown good taste by building a bridge to this blog, linking favorably back to the great CITP work led by David Robinson on government openness.

I will have more to say about Dan and Mike’s new blog in the weeks and months to come, but for now it is enough to welcome them to the blogosphere.

Computer Programming and the Law: A New Research Agenda

By my best estimate, at least twenty different law professors on the tenure track at American law schools once held a job as a professional computer programmer. I am proud to say that two of us work at my law school.

Most of these hyphenate lawprof-coders rarely write any code today, and this is a shame. There are many good reasons why the world would be a better place if we began to integrate computer programming into legal scholarship (and more generally, into law and policy).

Two years ago, I wrote a blog post for a lawprof blog exploring this idea. I promised a follow-up post, but never delivered. A year later, I expanded the idea into an essay, which the good people at the Villanova Law Review agreed to publish sometime later this year. With this post, I am releasing a slightly-outdated draft of the essay for the first time to the public. You can download it at SSRN.

In the abstract, I say:

This essay proposes a new interdisciplinary research agenda called Computer Programming and the Law. By harnessing the power of computer programming, legal scholars can develop better tools, data, and insights for advancing their research interests. This essay presents the case for this new research agenda, highlights some examples of those who have begun to blaze the trail, and includes code samples to demonstrate the power and potential of developing software for legal scholarship. The code samples in this essay can be run like a piece of software—thanks to a technique known as literate programming—making this the world’s first law review article that is also a working computer program.

If you have any interest in the intersection of technology and policy (in other words, if you read this blog), please read the essay and let me know what you think. Unlike many law review articles, this one is short. And how bad could it be? It contains 350 lines of perl! (Wait, don’t answer that!)