April 19, 2024

A public service rant: please fix your bibliography

Like many academics, I spend a lot of time reading and reviewing technical papers. I find myself continually surprised at the things that show up in the bibliography, so I thought it might be worth writing this down all in one place so that future conferences and whatnot might just hyperlink to this essay and say “Do That.”

Do not use BibTeX entries that are auto-generated from Citeseer, DBLP, the ACM Digital Library, or any other such thing. It’s stunning how many errors these contain. One glaring example: papers that appeared in the Symposium on Operating System Principles (SOSP) often turn out as citations to ACM Operating Systems Review. While that’s not incorrect, it’s also not the proper way to cite the paper. Another common error is that auto-generated citations inevitably have the wrong address, if they have it at all. (Hint: the ACM’s headquarters are in New York but almost all of their conferences are elsewhere. If you have “New York” anywhere in your bib file, there’s a good chance it should be something else.)

Leave out LNCS volume numbers and such for conferences. Many, many conferences have their proceedings appear as LNCS volumes. That’s nice, but it consumes unnecessary space in your bibliography. All I need to know is that we’re looking at CRYPTO ’86. I don’t need to know that it’s also LNCS vol. 263.

For most any paper, leave out the editors. I need to know who wrote the paper, not who was the program chair of the conference or editor of the journal.

For most any conference paper, leave out the publisher or organization. I don’t need to see Springer-Verlag, USENIX Association, or ACM Press. For journal papers, you need to use your discretion. Sometimes the name of the association is part of the journal name, so there’s no real need to repeat it. The only places where I regularly include organization names are technical reports, technical manuals and documentation, and published books.

Be consistent with how you cite any conference. For space reasons, you may wish to contract a conference name, only listing “SOSP ’03” rather than “Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP ’03)”. That’s fine, at least for big conferences like SOSP where everybody should have heard of it, but use the same contraction throughout your bibliography. If you say “SOSP ’03” in one place and “Proceedings of SOSP ’03” somewhere else, that’s really annoying. Top tip: if you’re space constrained, the easiest thing to nuke is the string “Proceedings of”.

Make sure you have the right author list and with the proper initials. When you use BibTeX and you plug in “D.S. Wallach”, what comes out is “D. Wallach” since there’s no whitespace before the “S”. It’s damn near impossible to catch these things in your source file by eye, so you should do a regular-expression search ([A-Z].[A-Z].) or proofread the resulting bibliography. I’ve sometimes seen citations to papers where there were co-authors missing. Please double-check this sort of thing, often by visiting the authors’ home pages or conference home pages.

Be consistent with spelling out names versus using initials. Most bib styles just use initials rather than whole names. However, if you’re using a style that uses whole names, make sure that you’ve got the whole name for every citation in your bibliography. (Or switch styles.)

Always include a URL for blogs, Wikipedia articles, and newspaper articles (or, at least, newspaper articles since the dawn of the web). Stock BibTeX styles don’t know what URLs are, so the easiest solution is to use the “note” field. Make sure you put the url in a url{} environment so it becomes a hyperlink in the resulting PDF. I’m less confident I can advise you to always include a string like “Accessed on 11/08/2010”. But if you do it, do it consistently. Top tip: if you say usepackage{url} urlstyle{sf} in your LaTeX header, you’ll get more compact URLs than the stock typewriter font. See also, urlbst.

Don’t use a citation just to point to a software project. If you need to give credit to a software package you used, just drop a footnote and put the URL there. You only need a citation when you’re citing an actual paper of some sort. However, if there’s a research paper or book that was written by the authors of the software you used, and that paper or book describes the software, then you should cite the paper/book, and possibly include the URL for the software in the citation.

BibTeX sometimes fails when given a long URL in the note field. This manifests itself as a %-character and a newline inserted in the generated bbl file. (Why? I have no idea.) I have a short Perl script that I always work into my Makefile that post-processes the bbl file to fix this. So should you.

Eliminate the string “to appear” from your bibliography. Somebody years from now will look back in time and find these sorts of markers amusing. Worse, you can easily forget you put that in your bib file. It’s odd reading a manuscript in 2011 that cites a paper “to appear” in 2009.

For any conference, include the address, month, and year. And for the month, use three letter codes in your BibTeX (jan, feb, mar, apr, …) without quotation marks. The BibTeX style will deal with expanding those or using proper contractions. For the address, be consistent about how you handle them. Don’t say “Berkeley, CA” in one place and “Berkeley, California” in another. Also, this may be my U.S.-centric bias showing through, but you don’t need to add “U.S.A.” after “Berkeley, CA”. For international addresses, however, you should include the country and the state/region is optional. “Paris, France” is an easy one. I’ll have to defer to my Canadian readers to chime in about whether it’s better to cite “Vancouver, B.C., Canada”, “Vancouver, Canada”, or “Vancouver, B.C.”

But not the page numbers. Back in the old days, I once got razzed by a journal editor for not including page numbers in all my citations. (And you think I’m pedantic!) Given how many conferences are ditching printed proceedings altogether, it’s acceptable to leave these out now, including for old references that you’re far more likely to dig up online than in the printed proceedings.

Double-check any author with accents in their name and try to get it right. BibTeX doesn’t seem to play nicely with Unicode characters, at least for me, so you have to use the LaTeX codes instead. I’m sure David Mazières appreciates it when you spell his name right.

Double-check the capitalization of your paper titles. I tend to use the BibTeX “abbrv” style, which forces lower case for every word in your paper title, excepting the first word. You then have to put curly braces around words that truly need capital letters like BitTorrent or something. Some hand-written bib entries I’ve seen put curly braces around every word because they really, really want the entry with lots of capital letters. Don’t do that. Use a different bib style if you want different behavior, but then make sure your resulting bibliography has consistent capitalization for every entry. I don’t particularly care whether you go with lots of capital letters or not, but please be consistent about it. Also, double check that proper nouns are properly capitalized.

When you post your own papers online, post a bib entry next to them. This might encourage people to cite your paper properly. For your personal web page, you might like the Exhibit API, which can turn BibTeX entries into HTML, dynamically. (See Ben Adida’s page for one example.) If you’re setting up something for your whole lab or department, Drupal Scholar seems pretty good. (See my colleague Lydia Kavraki’s lab page for one example. I’m expecting we will adopt this across our entire CS department.)

And, last but not least, a citation is not a noun. When you cite a paper, it’s grammatically the same as making a parenthetical remark. If you need to refer to a paper as a noun, you need to use the author names (“Alice and Bob [23] showed that the halting problem is hard.”) or the name of the system (“The Chrome web browser [47,48] uses separate processes for each tab to improve fault isolation.”) If there are three or more authors, then you just use the first one with “et al.” (“Alice et al. [24] proved P is not equal to NP.” — note also the lack of italics for “et al.”) For the ACM journal style, there’s something called citeN rather than the usual cite, which is worth using. You can also look into using various additional packages to get similar functionality in any LaTeX paper style like natbib.

Obligatory caveat: A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. – Ralph Waldo Emerson

On kids and social networking

Sunday’s New York Times has an article about cyber-bullying that’s currently #1 on their “most popular” list, so this is clearly a topic that many find close and interesting.

The NYT article focuses on schools’ central role in policing their students social behavior. While I’m all in favor of students being taught, particularly by older peer students, the importance of self-moderating their communications, schools face a fundamental quandary:

Nonetheless, administrators who decide they should help their cornered students often face daunting pragmatic and legal constraints.

“I have parents who thank me for getting involved,” said Mike Rafferty, the middle school principal in Old Saybrook, Conn., “and parents who say, ‘It didn’t happen on school property, stay out of my life.’ ”

Judges are flummoxed, too, as they wrestle with new questions about protections on student speech and school searches. Can a student be suspended for posting a video on YouTube that cruelly demeans another student? Can a principal search a cellphone, much like a locker or a backpack?

It’s unclear. These issues have begun their slow climb through state and federal courts, but so far, rulings have been contradictory, and much is still to be determined.

Here’s one example that really bothers me:

A few families have successfully sued schools for failing to protect their children from bullies. But when the Beverly Vista School in Beverly Hills, Calif., disciplined Evan S. Cohen’s eighth-grade daughter for cyberbullying, he took on the school district.

After school one day in May 2008, Mr. Cohen’s daughter, known in court papers as J. C., videotaped friends at a cafe, egging them on as they laughed and made mean-spirited, sexual comments about another eighth-grade girl, C. C., calling her “ugly,” “spoiled,” a “brat” and a “slut.”

J. C. posted the video on YouTube. The next day, the school suspended her for two days.

“What incensed me,” said Mr. Cohen, a music industry lawyer in Los Angeles, “was that these people were going to suspend my daughter for something that happened outside of school.” On behalf of his daughter, he sued.

If schools don’t have the authority to discipline J. C., as the court apparently ruled, and her father is more interested in defending her than disciplining her for clearly inappropriate behavior, then can we find some other solution?

Of course, there’s nothing new about bullying among the early-teenage set. I will refrain from dredging such stories from my own pre-Internet pre-SMS childhood, but there’s no question that these kids are at an important stage of their lives, where they’re still learning important and essential concepts, like how to relate to their peers and the importance (or lack thereof) of their peers’ approval, much less understanding where to draw boundaries between their public self and their private feelings. It’s certainly important for us, the responsible adults of the world, to recognize that nothing we can say or do will change the fundamentally social awkwardness of this age. There will never be an ironclad solution that eliminates kids bullying, taunting, or otherwise hurting one other.

Given all that, the rise of electronic communications (whether SMS text messaging, Facebook, email, or whatever else) changes the game in one very important way. It increases the velocity of communications. Every kid now has a megaphone for reaching their peers, whether directly through a Facebook posting that can reach hundreds of friends at once or indirectly through the viral spread of embarrassing gossip from friend to friend, and that speed can cause salacious information to get around well before any traditional mechanisms (parental, school administrative, or otherwise) can clamp down and assert some measure of sanity. For possibly the ultimate example of this, see a possibly fictitious yet nonetheless illustrative girl’s written hookup list posted by her brother as a form of revenge against her ratting out his hidden stash of beer. Needless to say, in one fell swoop, this girl’s life got turned upside down with no obvious way to repair the social damage.

Alright, we invented this social networking mess. Can we fix it?

The only mechanism I feel is completely inappropriate is this:

But Deb Socia, the principal at Lilla G. Frederick Pilot Middle School in Dorchester, Mass., takes a no-nonsense approach. The school gives each student a laptop to work on. But the students’ expectation of privacy is greatly diminished.

“I regularly scan every computer in the building,” Ms. Socia said. “They know I’m watching. They’re using the cameras on their laptops to check their hair and I send them a message and say: ‘You look great! Now go back to work.’ It’s a powerful way to teach kids: ‘I’m paying attention, you need to do what’s right.’ ”

Not only do I object to the Big Brother aspect of this (do schools still have 1984 on their reading lists?), but turning every laptop into a surveillance device is a hugely tempting target for a variety of bad actors. Kids need and deserve some measure of privacy, at least to the extent that schools already give kids a measure of privacy against arbitrary and unjustified search and seizure.

Surveillance is widely considered to be more acceptable when it’s being done by parents, who might insist they have their kids’ passwords in order to monitor them. Of course, kids of this age will reasonably want or need to have privacy from their parents as well (e.g., we don’t want to create conditions where victims of child abuse can be easily locked down by their family).

We could try to invent technical means to slow down the velocity of kids’ communications, which could mean adding delays as a function of the fanout of a message, or even giving viewers of any given message a kill switch over it, that could reach back and nuke earlier, forwarded copies to other parties. Of course, such mechanisms could be easily abused. Furthermore, if Facebook were to voluntarily create such a mechanism, kids might well migrate to other services that lack the mechanism. If we legislate that children of a certain age must have technically-imposed communication limits across the board (e.g., limited numbers of SMS messages per day), then we could easily get into a world where a kid who hits a daily quota cannot communicate in an unexpectedly urgent situation (e.g., when stuck at an alcoholic party and needing a sober ride home).

Absent any reasonable technical solution, the proper answer is probably to restrict our kids’ access to social media until we think they’re mature enough to handle it, to make sure that we, the parents, educate them about the proper etiquette, and that we take responsibility for disciplining our kids when they misbehave.

Gymnastics Scores and Grade Inflation

The gymnastics scoring in this year’s Olympics has generated some controversy, as usual. Some of the controversy feel manufactured: NBC tried to create a hubbub over Nastia Liukin losing the uneven bars gold medal on the Nth tiebreaker; but top-level sporting events whose rules do not admit ties must sometimes decide contests by tiny margins.

A more interesting discussion relates to a change in the scoring system, moving from the old 0.0 to 10.0 scale, to a new scale that adds together an “A score” measuring the difficulty of the athlete’s moves and a “B score” measuring how well the moves were performed. The B score is on the old 0-10 scale, but the A score is on an open-ended scale with fixed scores for each constituent move and bonuses for continuously connecting a series of moves.

One consequence of the new system is that there is no predetermined maximum score. The old system had a maximum score, the legendary “perfect 10”, whose demise is mourned old-school gymnastics gurus like Bela Karolyi. But of course the perfect 10 wasn’t really perfect, at least not in the sense that a 10.0 performance was unsurpassable. No matter how flawless a gymnast’s performance, it is always possible, at least in principle, to do better, by performing just as flawlessly while adding one more flip or twist to one of the moves. The perfect 10 was in some sense a myth.

What killed the perfect 10, as Jordan Ellenberg explained in Slate, was a steady improvement in gymnastic performance that led to a kind of grade inflation in which the system lost its ability to reward innovators for doing the latest, greatest moves. If a very difficult routine, performed flawlessly, rates 10.0, how can you reward an astonishingly difficult routine, performed just as flawlessly? You have to change the scale somehow. The gymnastics authorities decided to remove the fixed 10.0 limit by creating an open-ended difficulty scale.

There’s an interesting analogy to the “grade inflation” debate in universities. Students’ grades and GPAs have increased slowly over time, and though this is not universally accepted, there is plausible evidence that today’s students are doing better work than past students did. (At the very least, today’s student bodies at top universities are drawn from a much larger pool of applicants than before.) If you want a 3.8 GPA to denote the same absolute level of performance that it denoted in the past, and if you also want to reward the unprecendented performance of today’s very best students, then you have to expand the scale at the top somehow.

But maybe the analogy from gymnastics scores to grades is imperfect. The only purpose of gymnastics scores is to compare athletes, to choose a winner. Grades have other purposes, such as motivating students to pay attention in class, or rewarding students for working hard. Not all of these purposes require consistency in grading over time, or even consistency within a single class. Which grading policy is best depends on which goals we have in mind.

One thing is clear: any discussion of gymnastics scoring or university grading will inevitably be colored by nostalgic attachment to the artists or students of the past.

Live Webcast: Future of News, May 14-15

We’re going to do a live webcast of our workshop on “The Future of News“, which will be held tomorrow and Thursday (May 14-15) in Princeton. Attending the workshop (free registration) gives you access to the speakers and other attendees over lunch and between sessions, but if that isn’t practical, the webcast is available.

Here are the links you need:

Sessions are scheduled for 10:45-noon and 1:30-5:00 on Wed., May 14; and 9:30-12:30 and 1:30-3:15 on Thur., May 15.

May 14-15: Future of News workshop

We’re excited to announce a workshop on “The Future of News“, to be held May 14 and 15 in Princeton. It’s sponsored by the Center for InfoTech Policy at Princeton.

Confirmed speakers include Kevin Anderson, David Blei, Steve Borriss, Dan Gillmor, Matthew Hurst, Markus Prior, David Robinson, Clay Shirky, Paul Starr, and more to come.

The Internet—whose greatest promise is its ability to distribute and manipulate information—is transforming the news media. What’s on offer, how it gets made, and how end users relate to it are all in flux. New tools and services allow people to be better informed and more instantly up to date than ever before, opening the door to an enhanced public life. But the same factors that make these developments possible are also undermining the institutional rationale and economic viability of traditional news outlets, leaving profound uncertainty about how the possibilities will play out.

Our tentative topics for panels are:

  • Data mining, visualization, and interactivity: To what extent will new tools for visualizing and artfully presenting large data sets reduce the need for human intermediaries between facts and news consumers? How can news be presented via simulation and interactive tools? What new kinds of questions can professional journalists ask and answer using digital technologies?
  • Economics of news: How will technology-driven changes in advertising markets reshape the news media landscape? Can traditional, high-cost methods of newsgathering support themselves through other means? To what extent will action-guiding business intelligence and other “private journalism”, designed to create information asymmetries among news consumers, supplant or merge with globally accessible news?
  • The people formerly known as the audience: How effectively can users collectively create and filter the stream of news information? How much of journalism can or will be “devolved” from professionals to networks of amateurs? What new challenges do these collective modes of news production create? Could informal flows of information in online social networks challenge the idea of “news” as we know it?
  • The medium’s new message: What are the effects of changing news consumption on political behavior? What does a public life populated by social media “producers” look like? How will people cope with the new information glut?

Registration: Registration, which is free, carries two benefits: We’ll have a nametag waiting for you when you arrive, and — this is the important part — we’ll feed you lunch on both days. To register, please contact CITP’s program assistant, Laura Cummings-Abdo, at . Include your name, affiliation and email address.