June 28, 2022

Archives for February 2011

A public service rant: please fix your bibliography

Like many academics, I spend a lot of time reading and reviewing technical papers. I find myself continually surprised at the things that show up in the bibliography, so I thought it might be worth writing this down all in one place so that future conferences and whatnot might just hyperlink to this essay and say “Do That.”

Do not use BibTeX entries that are auto-generated from Citeseer, DBLP, the ACM Digital Library, or any other such thing. It’s stunning how many errors these contain. One glaring example: papers that appeared in the Symposium on Operating System Principles (SOSP) often turn out as citations to ACM Operating Systems Review. While that’s not incorrect, it’s also not the proper way to cite the paper. Another common error is that auto-generated citations inevitably have the wrong address, if they have it at all. (Hint: the ACM’s headquarters are in New York but almost all of their conferences are elsewhere. If you have “New York” anywhere in your bib file, there’s a good chance it should be something else.)

Leave out LNCS volume numbers and such for conferences. Many, many conferences have their proceedings appear as LNCS volumes. That’s nice, but it consumes unnecessary space in your bibliography. All I need to know is that we’re looking at CRYPTO ’86. I don’t need to know that it’s also LNCS vol. 263.

For most any paper, leave out the editors. I need to know who wrote the paper, not who was the program chair of the conference or editor of the journal.

For most any conference paper, leave out the publisher or organization. I don’t need to see Springer-Verlag, USENIX Association, or ACM Press. For journal papers, you need to use your discretion. Sometimes the name of the association is part of the journal name, so there’s no real need to repeat it. The only places where I regularly include organization names are technical reports, technical manuals and documentation, and published books.

Be consistent with how you cite any conference. For space reasons, you may wish to contract a conference name, only listing “SOSP ’03” rather than “Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP ’03)”. That’s fine, at least for big conferences like SOSP where everybody should have heard of it, but use the same contraction throughout your bibliography. If you say “SOSP ’03” in one place and “Proceedings of SOSP ’03” somewhere else, that’s really annoying. Top tip: if you’re space constrained, the easiest thing to nuke is the string “Proceedings of”.

Make sure you have the right author list and with the proper initials. When you use BibTeX and you plug in “D.S. Wallach”, what comes out is “D. Wallach” since there’s no whitespace before the “S”. It’s damn near impossible to catch these things in your source file by eye, so you should do a regular-expression search ([A-Z].[A-Z].) or proofread the resulting bibliography. I’ve sometimes seen citations to papers where there were co-authors missing. Please double-check this sort of thing, often by visiting the authors’ home pages or conference home pages.

Be consistent with spelling out names versus using initials. Most bib styles just use initials rather than whole names. However, if you’re using a style that uses whole names, make sure that you’ve got the whole name for every citation in your bibliography. (Or switch styles.)

Always include a URL for blogs, Wikipedia articles, and newspaper articles (or, at least, newspaper articles since the dawn of the web). Stock BibTeX styles don’t know what URLs are, so the easiest solution is to use the “note” field. Make sure you put the url in a url{} environment so it becomes a hyperlink in the resulting PDF. I’m less confident I can advise you to always include a string like “Accessed on 11/08/2010”. But if you do it, do it consistently. Top tip: if you say usepackage{url} urlstyle{sf} in your LaTeX header, you’ll get more compact URLs than the stock typewriter font. See also, urlbst.

Don’t use a citation just to point to a software project. If you need to give credit to a software package you used, just drop a footnote and put the URL there. You only need a citation when you’re citing an actual paper of some sort. However, if there’s a research paper or book that was written by the authors of the software you used, and that paper or book describes the software, then you should cite the paper/book, and possibly include the URL for the software in the citation.

BibTeX sometimes fails when given a long URL in the note field. This manifests itself as a %-character and a newline inserted in the generated bbl file. (Why? I have no idea.) I have a short Perl script that I always work into my Makefile that post-processes the bbl file to fix this. So should you.

Eliminate the string “to appear” from your bibliography. Somebody years from now will look back in time and find these sorts of markers amusing. Worse, you can easily forget you put that in your bib file. It’s odd reading a manuscript in 2011 that cites a paper “to appear” in 2009.

For any conference, include the address, month, and year. And for the month, use three letter codes in your BibTeX (jan, feb, mar, apr, …) without quotation marks. The BibTeX style will deal with expanding those or using proper contractions. For the address, be consistent about how you handle them. Don’t say “Berkeley, CA” in one place and “Berkeley, California” in another. Also, this may be my U.S.-centric bias showing through, but you don’t need to add “U.S.A.” after “Berkeley, CA”. For international addresses, however, you should include the country and the state/region is optional. “Paris, France” is an easy one. I’ll have to defer to my Canadian readers to chime in about whether it’s better to cite “Vancouver, B.C., Canada”, “Vancouver, Canada”, or “Vancouver, B.C.”

But not the page numbers. Back in the old days, I once got razzed by a journal editor for not including page numbers in all my citations. (And you think I’m pedantic!) Given how many conferences are ditching printed proceedings altogether, it’s acceptable to leave these out now, including for old references that you’re far more likely to dig up online than in the printed proceedings.

Double-check any author with accents in their name and try to get it right. BibTeX doesn’t seem to play nicely with Unicode characters, at least for me, so you have to use the LaTeX codes instead. I’m sure David Mazières appreciates it when you spell his name right.

Double-check the capitalization of your paper titles. I tend to use the BibTeX “abbrv” style, which forces lower case for every word in your paper title, excepting the first word. You then have to put curly braces around words that truly need capital letters like BitTorrent or something. Some hand-written bib entries I’ve seen put curly braces around every word because they really, really want the entry with lots of capital letters. Don’t do that. Use a different bib style if you want different behavior, but then make sure your resulting bibliography has consistent capitalization for every entry. I don’t particularly care whether you go with lots of capital letters or not, but please be consistent about it. Also, double check that proper nouns are properly capitalized.

When you post your own papers online, post a bib entry next to them. This might encourage people to cite your paper properly. For your personal web page, you might like the Exhibit API, which can turn BibTeX entries into HTML, dynamically. (See Ben Adida’s page for one example.) If you’re setting up something for your whole lab or department, Drupal Scholar seems pretty good. (See my colleague Lydia Kavraki’s lab page for one example. I’m expecting we will adopt this across our entire CS department.)

And, last but not least, a citation is not a noun. When you cite a paper, it’s grammatically the same as making a parenthetical remark. If you need to refer to a paper as a noun, you need to use the author names (“Alice and Bob [23] showed that the halting problem is hard.”) or the name of the system (“The Chrome web browser [47,48] uses separate processes for each tab to improve fault isolation.”) If there are three or more authors, then you just use the first one with “et al.” (“Alice et al. [24] proved P is not equal to NP.” — note also the lack of italics for “et al.”) For the ACM journal style, there’s something called citeN rather than the usual cite, which is worth using. You can also look into using various additional packages to get similar functionality in any LaTeX paper style like natbib.

Obligatory caveat: A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. – Ralph Waldo Emerson

Things overheard on the WiFi from my Android smartphone

Today in my undergraduate security class, we set up a sniffer so we could run Wireshark and Mallory to listen in on my Android smartphone. This blog piece summarizes what we found.

  • Google properly encrypts traffic to Gmail and Google Voice, but they don’t encrypt traffic to Google Calendar. An eavesdropper can definitely see your calendar transactions and can likely impersonate you to Google Calendar.
  • Twitter does everything in the clear, but then your tweets generally go out for all the world to see, so there isn’t really a privacy concern. Twitter uses OAuth signatures, which appear to make it difficult for a third party to create forged tweets.
  • Facebook does everything in the clear, much like Twitter. My Facebook account’s web settings specify full-time encrypted traffic, but this apparently isn’t honored or supported by Facebook’s Android app. Facebook isn’t doing anything like OAuth signatures, so it may be possible to inject bogus posts as well. Also notable: one of the requests we saw going from my phone to the Facebook server included an SQL statement within. Could Facebook’s server have a SQL injection vulnerability? Maybe it was just FQL, which is ostensibly safe.
  • The free version of Angry Birds, which uses AdMob, appears to preserve your privacy. The requests going to the AdMob server didn’t have anything beyond the model of my phone. When I clicked an ad, it sent the (x,y) coordinates of my click and got a response saying to send me to a URL in the web browser.
  • Another game I tried, Galcon, had no network activity whatsoever. Good for them.
  • SoundHound and ShopSaavy transmit your fine GPS coordinates whenever you make a request to them. One of the students typed the coordinates into Google Maps and they nailed me to the proper side of the building I was teaching in.

What options do Android users have, today, to protect themselves against eavesdroppers? Android does support several VPN configurations which you could configure before you hit the road. That won’t stop the unnecessary transmission of your fine GPS coordinates, which, to my mind, neither SoundHound nor ShopSaavy have any business knowing. If that’s an issue for you, you could turn off your GPS altogether, but you’d have to turn it on again later when you want to use maps or whatever else. Ideally, I’d like the Market installer to give me the opportunity to revoke GPS privileges for apps like these.

Instructor note: live demos where you don’t know the outcome are always a dicey prospect. Set everything up and test it carefully in advance before class starts.

What an expert on seals has to say

During the New Jersey voting machines lawsuit, the State defendants tried first one set of security seals and then another in their vain attempts to show that the ROM chips containing vote-counting software could be protected against fraudulent replacement. After one or two rounds of this, Plaintiffs engaged Dr. Roger Johnston, an expert on physical security and tamper-indicating seals, to testify about New Jersey’s insecure use of seals.

In his day job, Roger is a scientist at the Argonne National Laboratory, working to secure (among other things) our nation’s shipments of nuclear materials. He has many years of experience in the scientific study of security seals and their use protocols, as well as physical security in general. In this trial he testified in his private capacity, pro bono.

He wrote an expert report in which he analyzed the State’s proposed use of seals to secure voting machines (what I am calling “Seal Regime #2” and “Seal Regime #3”). For some of these seals, he and his team of technicians have much slicker techniques to defeat these seals than I was able to come up with. Roger chooses not to describe the methods in detail, but he has prepared this report for the public.

What I found most instructive about Roger’s report (including in version he has released publicly) is that he explains that you can’t get security just by looking at the individual seal. Instead, you must consider the entire seal use protocol:

Seal use protocols are the formal and informal procedures for choosing, procuring, transporting, storing, securing, assigning, installing, inspecting, removing, and destroying seals. Other components of a seal use protocol include procedures for securely keeping track of seal serial numbers, and the training provided to seal installers and inspectors. The procedures for how to inspect the object or container onto which seals are applied is another aspect of a seal use protocol. Seals and a tamper-detection program are no better than the seal use protocols that are in place.

He explains that inspecting seals for evidence of tampering is not at all straightforward. Inspection often requires removing the seal—for example, when you pull off an adhesive-tape seal that’s been tampered with, it behaves differently than one that’s undisturbed. A thorough inspection may involve comparing the seal with microphotographs of the same seal taken just after it was originally applied.

For each different seal that’s used, one can develop a training program for the seal inspectors. Because the state proposed to use four different kinds of seals, it would need four different sets training materials. Training all the workers who would inspect the State’s 10,000 voting machines would be quite expensive. With all those seals, just the seal inspections themselves would cost over $100,000 per election.

His report also discusses “security culture.”

“Security culture” is the official and unofficial, formal and informal behaviors, attitudes, perceptions, strategies, rules, policies, and practices associated with security. There is a consensus among security experts that a healthy security culture is required for effective security….

A healthy security culture is one in which security is integrated into everyday work, management, planning, thinking, rules, policies, and risk management; where security is considered as a key issue at all employee levels (and not just an afterthought); where security is a proactive, rather than reactive activity; where security measures are carefully defined, and frequently reviewed and studied; where security experts are involved in choosing and reviewing security strategies, practices, and products; where the organization constantly seeks proactively to understand vulnerabilities and provide countermeasures; where input on potential security problems are eagerly considered from any quarter; and where wishful thinking and denial is deliberately avoided in regards to threats, risks, adversaries, vulnerabilities, and the insider threat….

Throughout his deposition … Mr. Giles [Director of the NJ Division of Elections] indicates that he believes good physical security requires a kind of band-aid approach, where serious security vulnerabilities can be covered over with ad hoc fixes or the equivalent of software patches. Nothing could be further from the truth.

Roger Johnston’s testimony about the importance of seal use protocols—as considered separately from the individual seals themselves—made a strong impression on the judge: in the remedy that the Court ordered, seal use protocols as defined by Dr. Johnston played a prominent role.