April 29, 2024

Federating the "big four" computer security conferences

Last year, I wrote a report about rebooting the CS publication process (Tinker post, full tech report; an abbreviated version has been accepted to appear as a Communications of the ACM viewpoint article). I talked about how we might handle four different classes of research papers (“top papers” which get in without incident, “bubble papers” which could well have been published if only there were capacity, “second tier” papers which are only of interest to limited communities, and “noncompetitive” papers that have no chance) and I suggested that we need to redesign how we handle our publication process, primarily by adopting something akin to arXiv.org on a massive scale. My essay goes into detail on the benefits and challenges of making this happen.

Of all the related ideas out there, the one I find most attractive is what the database community has done with Proceedings of the VLDB Endowment (see also, their FAQ). In short, if you want to publish a paper in VLDB, one of the top conferences in databases, you must submit your manuscript to the PVLDB. Submissions then go through a journal-like two-round reviewing process. You can submit a paper at any time and you’re promised a response within two months. Accepted papers are published immediately online and are also presented at the next VLDB conference.

I would love to extend the PVLDB idea to the field of computer security scholarship, but this is troublesome when our “big four” security conferences — ISOC NDSS, IEEE Security & Privacy (the “Oakland” conference), USENIX Security, and ACM CCS — are governed by four separate professional societies. Back in the old days (ten years ago?), NDSS and USENIX Security were the places you sent “systems” security work, while Oakland and CCS were where you sent “theoretical” security work. Today, that dichotomy doesn’t really exist any more. You pretty much just send your paper to the conference with the next deadline. Largely the same community of people serves on each program committee, and the same sorts of papers appear at every one of these conferences. (Although USENIX Security and NDSS may well still have a preference for “systems” work, the “theory” bias at Oakland and CCS is gone.)

My new idea: Imagine that we set up the “Federated Proceedings of Computer Security” (representing a federation of the four professional societies in question). It’s a virtual conference, publishing exclusively online, so it has no effective limits on the number of papers it might publish. Manuscripts could be submitted to FPCS with rolling deadlines (let’s say one every three months, just as we have now) and conference-like program committees would be assembled for each deadline. (PVLDB has continuous submissions and publications. We could do that just as well.) Each PC would operate much like a traditional conference PC: top papers would be accepted rapidly and “published” with the speed of a normal conference process. The “bubble” papers that would otherwise have been rejected by our traditional conference process would now have a chance to be revised and go through a second round of review with the same reviewers. Noncompetitive papers would continue to be rejected, as always.

How would we connect FPCS back to the big four security conferences? Simple: once a paper is accepted for FPCS publication, it would appear at the next of the “big four” conferences. Initially, FPCS would operate concurrently with the regular conference submission process, but it could quickly replace that process altogether, just as PVLDB quickly became the exclusive mechanism for submitting a paper to VLDB.

One more idea: there’s no reason that FPCS submissions need to be guaranteed a slot in one of the big four security conferences. It’s entirely reasonable that we could increase the acceptance rate at FPCS and have a second round of winnowing to decide which papers are presented at our conferences. This could either be designed as a “pull” process, where separate conference program committees pick and choose from the FPCS accepted papers, or as a “push” process, where conferences give a number of slots to FPCS, which then decides which papers to “award” with a conference presentation. Either way, any paper that’s not immediately given a conference slot is still published, and any such paper that turns out to be a big hit can always be awarded a conference presentation, even years after the fact.

This sort of two-tier structure has some nice benefits. Good-but-not-stellar papers get properly published, better papers get recognized as such, and the whole process operates with lower latency than our current system. Furthermore, we get many fewer papers going around the submit/reject/revise/resubmit treadmill, thus lowering the workload on successive program committees. It’s full of win.

Of course, there are many complications that would get in the way of making this happen:

  • We need a critical mass to get this off the ground. We could initially roll it out with a subset of the big four, and/or with more widely spaced deadlines, but it would be great if the whole big four bought into the idea all at once.
  • We would need to harmonize things like page length and other formatting requirements, as well as have a unified policy on single vs. double-blind submissions.
  • We would need a suitable copyright policy, perhaps adopting something like the USENIX model where authors retain their copyright while agreeing to allow FPCS (and its constituent conferences) the right to republish their work. ACM and IEEE would require arm-twisting to go along with this.
  • We would need a governance structure for FPCS. That would include a steering committee for selecting the editor/program chairs, but who watches the watchers?
  • What do we do with our journals? FPCS changes our conference process around, but doesn’t touch our journals at all. Of course, the journals could also reinvent themselves, but that’s a separate topic.

In summary, my proposed Federated Proceedings of Computer Security adapts many of the good ideas developed by the database community with their PVLDB. We could adopt it incrementally for only one of the big four conferences or we could go whole hog and try to change all four at once.

Thoughts?

Building a better CA infrastructure

As several Tor project authors, Ben Adida, and many others have written, our certificate authority infrastructure has a fundamental flaw: any one CA, anywhere on the planet, can issue a certificate for any web site, anywhere else on the planet. This was tolerable when the only game in town was VeriSign, but now it’s just untenable. So what solutions are available?

First, some non-solutions: Extended validation certs do nothing useful. Will users ever be trained well enough to notice the extra cues in the browser and to scream when those cues are absent because the site presented a normal cert? Fat chance. Similarly, certificate revocation lists buy you nothing if you can’t actually download them (a notable issue if you’re stuck behind the firewall of somebody who wants to attack you).

A straightforward idea is to track the certs you see over time and generate a prominent warning if you see something anomalous. This is available as a fully-functioning Firefox extension, Certificate Patrol. This should be built into every browser.

In addition to your first-hand personal observations, why not leverage other resources on the network to make their own observations? For example, while Google is crawling the web, it can easily save the SSL/TLS certificates it sees, and browsers could then use a real-time API, much like Google SafeBrowsing, to check whether the certificate they just received matches what Google’s crawler saw. A research group at CMU has already built something like this, which they call a network notary. In essence, you can have multiple network services, running from different vantage points in the network, all telling you whether the cryptographic credentials you got match what others are seeing. Of course, if you’re stuck behind an attacker’s firewall, the attacker will similarly filter out all these sites.

UPDATE: Google is now doing almost exactly what I suggested.

There are a variety of other proposals out there, notably trying to leverage DNSSEC to enhance or supplant the need for SSL/TLS certificates. Since DNSSEC gives you authenticated control over your own DNS records, it can also give you control over which SSL/TLS certificates clients should accept for your web site. If and when DNSSEC becomes universally supported, this would be a bit harder for attacker firewalls to filter without breaking everything, so I certainly hope this takes off.

Let’s say that future browsers properly use all of these tricks and can unquestionably determine for you, with perfect accuracy, when you’re getting a bogus connection. Your browser will display an impressive error dialog and refuse to load the web site. Is that sufficient? This will certainly break all the hotel WiFi systems that want to redirect you to an internal site where they can charge you to access the network. (Arguably, this sort of functionality belongs elsewhere in the software stack, such as through IEEE 802.21, notably used to connect AT&T iPhones to the WiFi service at Starbucks.) Beyond that, though, should the browser just steadfastly refuse to allow the connection? I’ve been to at least one organization whose internal WiFi network insists on proxying all of your https sessions and, in fact, issues fabricated certificates that you’re expected to configure your browser to trust. We need to support that sort of thing when it’s required, but again, it would perhaps best be supported by some kind of side-channel protocol extension, not by doing a deliberate MITM attack on the crypto protocol.

Corner cases aside, what if you’re truly in a hostile environment and your browser has genuinely detected a network adversary? Should the browser refuse the connection, or should there be some other option? And if so, what would that be? Should the browser perhaps allow the connection (with much gnashing of teeth and throbbing red borders on the window)? Should previous cookies and saved state be hidden away? Should web sites like Gmail and Facebook allow users to have two separate passwords, one for “genuine” login and a separate one for “Yes, I’m in a hostile location, but I need to send and receive email in a limited but still useful fashion?”

[Editor’s note: you may also be interested in the many prior posts on this topic by Freedom to Tinker contributors: 1, 2, 3, 4, 5, 6, 7, 8 — as well as the “Emerging Threats to Online Trust: The Role of Public Policy and Browser Certificates” event that CITP hosted in DC last year with policymakers, industry, and activists.]

The case of Prof. Cronon and the FOIA requests for his private emails

Prof. William Cronon, from the University of Wisconsin, started a blog, Scholar as Citizen, wherein he critiqued Republican policies in the State of Wisconsin and elsewhere. I’m going to skip the politics and focus on the fact that the Republicans used Wisconsin’s FOIA mechanism to ask for a wide variety of his emails and they’re likely to get them.

Cronon believes this is a fishing expedition to find material to discredit him and he’s probably correct. He also notes that he scrupulously segregates his non-work-related emails into a private account (perhaps Gmail) while doing his work-related email using his wisc.edu address, as well he should.

What I find fascinating about the Cronon case is that it highlights a threat model for email privacy that doesn’t get much discussion among my professional peers. Sophisticated cryptographic mechanisms don’t protect emails against a FOIA request (or, for that matter, a sufficiently motivated systems administrator).

In the past, when I’ve worked with lawyers and our communications weren’t privileged (i.e., opposing counsel would eventually receive every email we ever exchanged), we exchanged emails of the form “are you available for a phone call at 2pm?” and not much else. This is annoying when working on a lawsuit, and it would completely grind to a halt the regular business of a modern academic.

While Cronon doesn’t want to abandon his wisc.edu address, suppose he simply forwarded his email to Gmail and had the university system delete its local copy (which is certainly an option for me with my rice.edu email). At that point, it becomes an interesting legal question of whether a FOIA request can compel production of content from his “private” email service. (And future lawmaking could well explicitly extend the reach of FOIA to private accounts, particularly when many well-known politicians and others subject to FOIA deliberately conduct their professional business on private servers.)

Here’s another thing to ponder: When I send email from Gmail, it happily forges my rice.edu address in the from line. This allows me to use Gmail without most of the people who correspond with me ever knowing or caring that I’m using Gmail. By blurring the lines between my rice.edu and gmail.com email, am I also blurring the boundary of legal requests to discover my email? Since Rice is a private university, there are presumably no FOIA issues for me, but would it be any different for Prof. Cronon? Could or should present or future FOIA laws compel you to produce content from your “private” email service when you conflate it with your “professional” email address?

Or, leaving FOIA behind for the moment, could or should my employer have any additional privilege to look into my Gmail account when I’m using it for all of my professional emails and forging a rice.edu mail header?

One last alternative: Let’s say I appended some text like this at the bottom of my email:

My personal email is dwallach at gmail.com and my professional email is dwallach at rice.edu. Please use the former for personal matters and the latter for professional matters.

If I go to explicit lengths to separate the two email addresses, using separate services, and making it abundantly clear to all my correspondents which address serves which purpose, could or should that make for a legally significant difference in how FOIA treats my emails?

A public service rant: please fix your bibliography

Like many academics, I spend a lot of time reading and reviewing technical papers. I find myself continually surprised at the things that show up in the bibliography, so I thought it might be worth writing it all down in one place so that future conferences and whatnot might just hyperlink to this essay and say “Do That.”

Do not use BibTeX entries that are auto-generated from Citeseer, DBLP, the ACM Digital Library, or any other such thing. It’s stunning how many errors these contain. One glaring example: papers that appeared in the Symposium on Operating Systems Principles (SOSP) often turn out as citations to ACM Operating Systems Review. While that’s not incorrect, it’s also not the proper way to cite the paper. Another common error is that auto-generated citations inevitably have the wrong address, if they have one at all. (Hint: the ACM’s headquarters are in New York, but almost all of its conferences are elsewhere. If you have “New York” anywhere in your bib file, there’s a good chance it should be something else.)
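
To make that SOSP example concrete, here’s a sketch of the two forms, using a hypothetical paper and authors (the first entry is typical of what the auto-generated services emit; the second is how I’d actually want it cited):

    % Roughly what an auto-generated entry looks like (hypothetical paper):
    @article{doe03,
      author  = {J. Doe and R. Roe},
      title   = {A Hypothetical Storage System},
      journal = {ACM Operating Systems Review},
      volume  = {37},
      number  = {5},
      address = {New York, NY, USA},
      year    = {2003}
    }

    % The same paper, cited as the conference publication it really was:
    @inproceedings{doe03,
      author    = {J. Doe and R. Roe},
      title     = {A Hypothetical Storage System},
      booktitle = {Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP '03)},
      address   = {Bolton Landing, New York},
      month     = oct,
      year      = {2003}
    }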

Leave out LNCS volume numbers and such for conferences. Many, many conferences have their proceedings appear as LNCS volumes. That’s nice, but it consumes unnecessary space in your bibliography. All I need to know is that we’re looking at CRYPTO ’86. I don’t need to know that it’s also LNCS vol. 263.

For most any paper, leave out the editors. I need to know who wrote the paper, not who was the program chair of the conference or editor of the journal.

For most any conference paper, leave out the publisher or organization. I don’t need to see Springer-Verlag, USENIX Association, or ACM Press. For journal papers, you need to use your discretion. Sometimes the name of the association is part of the journal name, so there’s no real need to repeat it. The only places where I regularly include organization names are technical reports, technical manuals and documentation, and published books.

Be consistent with how you cite any conference. For space reasons, you may wish to contract a conference name, only listing “SOSP ’03” rather than “Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP ’03)”. That’s fine, at least for big conferences like SOSP where everybody should have heard of it, but use the same contraction throughout your bibliography. If you say “SOSP ’03” in one place and “Proceedings of SOSP ’03” somewhere else, that’s really annoying. Top tip: if you’re space constrained, the easiest thing to nuke is the string “Proceedings of”.
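
One low-tech way to keep yourself honest here (a sketch, not the only way) is to define the contraction once as a BibTeX @string and reuse it everywhere:

    @string{sosp03 = {SOSP '03}}

    % ...and then every entry for that conference says:
    booktitle = sosp03,

That way the name can’t drift between entries, and changing the contraction later is a one-line edit.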

Make sure you have the right author list, with the proper initials. When you use BibTeX and you plug in “D.S. Wallach”, what comes out is “D. Wallach”, since there’s no whitespace before the “S”. It’s damn near impossible to catch these things in your source file by eye, so you should do a regular-expression search ([A-Z]\.[A-Z]\.) or proofread the resulting bibliography. I’ve also seen citations to papers with co-authors missing entirely. Please double-check this sort of thing, perhaps by visiting the authors’ home pages or the conference home page.
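
In BibTeX terms, the fix is nothing more than the whitespace between the initials:

    author = {D.S. Wallach},    % BibTeX reads "D.S." as a single name token; abbrv prints "D. Wallach"
    author = {D. S. Wallach},   % two tokens; prints "D. S. Wallach" as intended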

Be consistent with spelling out names versus using initials. Most bib styles just use initials rather than whole names. However, if you’re using a style that uses whole names, make sure that you’ve got the whole name for every citation in your bibliography. (Or switch styles.)

Always include a URL for blogs, Wikipedia articles, and newspaper articles (or, at least, newspaper articles since the dawn of the web). Stock BibTeX styles don’t know what URLs are, so the easiest solution is to use the “note” field. Make sure you put the URL inside a \url{} command so it becomes a hyperlink in the resulting PDF. I’m less confident advising you to always include a string like “Accessed on 11/08/2010”, but if you do it, do it consistently. Top tip: if you say \usepackage{url} and \urlstyle{sf} in your LaTeX preamble, you’ll get more compact URLs than the stock typewriter font. See also: urlbst.
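
Here’s a minimal sketch of both halves, with a hypothetical blog post as the thing being cited:

    % in the LaTeX preamble:
    \usepackage{url}
    \urlstyle{sf}    % sans-serif URLs are more compact than the stock typewriter font

    % in the .bib file:
    @misc{doe-blog10,
      author = {J. Doe},
      title  = {A Hypothetical Blog Post},
      month  = nov,
      year   = {2010},
      note   = {\url{http://example.com/post}, accessed on 11/08/2010}
    }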

Don’t use a citation just to point to a software project. If you need to give credit to a software package you used, just drop a footnote and put the URL there. You only need a citation when you’re citing an actual paper of some sort. However, if there’s a research paper or book that was written by the authors of the software you used, and that paper or book describes the software, then you should cite the paper/book, and possibly include the URL for the software in the citation.
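
For example (hypothetical tool and URL), the footnote version looks like this:

    We collected our packet traces with the FooSniff tool.\footnote{\url{http://example.com/foosniff}}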

BibTeX sometimes fails when given a long URL in the note field. This manifests itself as a %-character and a newline inserted in the generated bbl file. (Why? I have no idea.) I have a short Perl script that I always work into my Makefile that post-processes the bbl file to fix this. So should you.

Eliminate the string “to appear” from your bibliography. Somebody years from now will look back in time and find these sorts of markers amusing. Worse, you can easily forget you put that in your bib file. It’s odd reading a manuscript in 2011 that cites a paper “to appear” in 2009.

For any conference, include the address, month, and year. For the month, use BibTeX’s three-letter macros (jan, feb, mar, apr, …) without quotation marks; the BibTeX style will deal with expanding them or using proper contractions. For the address, be consistent about how you handle it. Don’t say “Berkeley, CA” in one place and “Berkeley, California” in another. Also, this may be my U.S.-centric bias showing through, but you don’t need to add “U.S.A.” after “Berkeley, CA”. For international addresses, however, you should include the country; the state/region is optional. “Paris, France” is an easy one. I’ll have to defer to my Canadian readers to chime in about whether it’s better to cite “Vancouver, B.C., Canada”, “Vancouver, Canada”, or “Vancouver, B.C.”
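
Concretely, that means fields along these lines:

    month   = nov,              % a BibTeX month macro: no braces, no quotation marks
    address = {Berkeley, CA},   % spelled the same way in every entry
    % not: month = {November},  address = {Berkeley, California, U.S.A.}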

But not the page numbers. Back in the old days, I once got razzed by a journal editor for not including page numbers in all my citations. (And you think I’m pedantic!) Given how many conferences are ditching printed proceedings altogether, it’s acceptable to leave these out now, including for old references that you’re far more likely to dig up online than in the printed proceedings.

Double-check any author with accents in their name and try to get it right. BibTeX doesn’t seem to play nicely with Unicode characters, at least for me, so you have to use the LaTeX codes instead. I’m sure David Mazières appreciates it when you spell his name right.
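
For example, the name above gets a LaTeX accent code rather than a raw Unicode character:

    author = {David Mazi{\`e}res},    % renders as "Mazières"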

Double-check the capitalization of your paper titles. I tend to use the BibTeX “abbrv” style, which forces lower case for every word in your paper title, excepting the first word. You then have to put curly braces around words that truly need capital letters like BitTorrent or something. Some hand-written bib entries I’ve seen put curly braces around every word because they really, really want the entry with lots of capital letters. Don’t do that. Use a different bib style if you want different behavior, but then make sure your resulting bibliography has consistent capitalization for every entry. I don’t particularly care whether you go with lots of capital letters or not, but please be consistent about it. Also, double check that proper nouns are properly capitalized.
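
A quick sketch, with a made-up title:

    title = {Measuring {BitTorrent} Swarms at Scale},
    % not: title = {{Measuring} {BitTorrent} {Swarms} {At} {Scale}},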

When you post your own papers online, post a bib entry next to them. This might encourage people to cite your paper properly. For your personal web page, you might like the Exhibit API, which can turn BibTeX entries into HTML, dynamically. (See Ben Adida’s page for one example.) If you’re setting up something for your whole lab or department, Drupal Scholar seems pretty good. (See my colleague Lydia Kavraki’s lab page for one example. I’m expecting we will adopt this across our entire CS department.)

And, last but not least, a citation is not a noun. When you cite a paper, it’s grammatically the same as making a parenthetical remark. If you need to refer to a paper as a noun, use the author names (“Alice and Bob [23] showed that the halting problem is hard.”) or the name of the system (“The Chrome web browser [47,48] uses separate processes for each tab to improve fault isolation.”). If there are three or more authors, just use the first one with “et al.” (“Alice et al. [24] proved P is not equal to NP.” — note also the lack of italics for “et al.”). For the ACM journal style, there’s a \citeN command, rather than the usual \cite, which is worth using. You can get similar functionality in any LaTeX paper style with additional packages such as natbib.
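
As a sketch with natbib (the command names below are natbib’s, not the ACM style’s \citeN; the citation key is hypothetical):

    % in the preamble:
    \usepackage[numbers]{natbib}

    % in the body:
    \citet{alice23} showed that the halting problem is hard.   % "Alice and Bob [23] showed that..."
    The halting problem is hard \citep{alice23}.                % "...hard [23]."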

Obligatory caveat: A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. – Ralph Waldo Emerson

Things overheard on the WiFi from my Android smartphone

Today in my undergraduate security class, we set up a sniffer so we could run Wireshark and Mallory to listen in on my Android smartphone. This blog piece summarizes what we found.

  • Google properly encrypts traffic to Gmail and Google Voice, but they don’t encrypt traffic to Google Calendar. An eavesdropper can definitely see your calendar transactions and can likely impersonate you to Google Calendar.
  • Twitter does everything in the clear, but then your tweets generally go out for all the world to see, so there isn’t really a privacy concern. Twitter uses OAuth signatures, which appear to make it difficult for a third party to create forged tweets.
  • Facebook does everything in the clear, much like Twitter. My Facebook account’s web settings specify full-time encrypted traffic, but this apparently isn’t honored or supported by Facebook’s Android app. Facebook isn’t doing anything like OAuth signatures, so it may be possible to inject bogus posts as well. Also notable: one of the requests we saw going from my phone to the Facebook server included an SQL statement within it. Could Facebook’s server have an SQL injection vulnerability? Maybe it was just FQL, which is ostensibly safe.
  • The free version of Angry Birds, which uses AdMob, appears to preserve your privacy. The requests going to the AdMob server didn’t have anything beyond the model of my phone. When I clicked an ad, it sent the (x,y) coordinates of my click and got a response saying to send me to a URL in the web browser.
  • Another game I tried, Galcon, had no network activity whatsoever. Good for them.
  • SoundHound and ShopSavvy transmit your fine GPS coordinates whenever you make a request to them. One of the students typed the coordinates into Google Maps and nailed me to the proper side of the building I was teaching in.

What options do Android users have, today, to protect themselves against eavesdroppers? Android does support several VPN configurations, which you could set up before you hit the road. That won’t stop the unnecessary transmission of your fine GPS coordinates, which, to my mind, neither SoundHound nor ShopSavvy has any business knowing. If that’s an issue for you, you could turn off your GPS altogether, but you’d have to turn it on again later when you want to use maps or whatever else. Ideally, I’d like the Market installer to give me the opportunity to revoke GPS privileges for apps like these.

Instructor note: live demos where you don’t know the outcome are always a dicey prospect. Set everything up and test it carefully in advance before class starts.