September 25, 2017

Open Access to Scholarly Publications at Princeton

In its September 2011 meeting, the Faculty of Princeton University voted unanimously for a policy of open access to scholarly publications:

“The members of the Faculty of Princeton University strive to make their publications openly accessible to the public. To that end, each Faculty member hereby grants to The Trustees of Princeton University a nonexclusive, irrevocable, worldwide license to exercise any and all copyrights in his or her scholarly articles published in any medium, whether now known or later invented, provided the articles are not sold by the University for a profit, and to authorize others to do the same. This grant applies to all scholarly articles that any person authors or co-authors while appointed as a member of the Faculty, except for any such articles authored or co-authored before the adoption of this policy or subject to a conflicting agreement formed before the adoption of this policy. Upon the express direction of a Faculty member, the Provost or the Provost’s designate will waive or suspend application of this license for a particular article authored or co-authored by that Faculty member.

“The University hereby authorizes each member of the faculty to exercise any and all copyrights in his or her scholarly articles that are subject to the terms and conditions of the grant set forth above. This authorization is irrevocable, non-assignable, and may be amended by written agreement in the interest of further protecting and promoting the spirit of open access.”

Basically, this means that when professors publish their academic work in the form of articles in journals or conferences, they should not sign a publication contract that prevents the authors from also putting a copy of their paper on their own web page or in their university’s public-access repository.

Most publishers in Computer Science (ACM, IEEE, Springer, Cambridge, Usenix, etc.) already have standard contracts that are compatible with open access. Open access doesn’t prevent these publishers from having a pay wall, it allows other means of finding the same information. Many publishers in the natural sciences and the social sciences also have policies compatible with open access.

But some publishers in the sciences, in engineering, and in the humanities have more restrictive policies. Action like this by Princeton’s faculty (and by the faculties at more than a dozen other universities in 2009-10) will help push those publishers into the 21st century.

The complete report of the Committee on Open Access is available here.

A public service rant: please fix your bibliography

Like many academics, I spend a lot of time reading and reviewing technical papers. I find myself continually surprised at the things that show up in the bibliography, so I thought it might be worth writing this down all in one place so that future conferences and whatnot might just hyperlink to this essay and say “Do That.”

Do not use BibTeX entries that are auto-generated from Citeseer, DBLP, the ACM Digital Library, or any other such thing. It’s stunning how many errors these contain. One glaring example: papers that appeared in the Symposium on Operating System Principles (SOSP) often turn out as citations to ACM Operating Systems Review. While that’s not incorrect, it’s also not the proper way to cite the paper. Another common error is that auto-generated citations inevitably have the wrong address, if they have it at all. (Hint: the ACM’s headquarters are in New York but almost all of their conferences are elsewhere. If you have “New York” anywhere in your bib file, there’s a good chance it should be something else.)

Leave out LNCS volume numbers and such for conferences. Many, many conferences have their proceedings appear as LNCS volumes. That’s nice, but it consumes unnecessary space in your bibliography. All I need to know is that we’re looking at CRYPTO ’86. I don’t need to know that it’s also LNCS vol. 263.

For most any paper, leave out the editors. I need to know who wrote the paper, not who was the program chair of the conference or editor of the journal.

For most any conference paper, leave out the publisher or organization. I don’t need to see Springer-Verlag, USENIX Association, or ACM Press. For journal papers, you need to use your discretion. Sometimes the name of the association is part of the journal name, so there’s no real need to repeat it. The only places where I regularly include organization names are technical reports, technical manuals and documentation, and published books.

Be consistent with how you cite any conference. For space reasons, you may wish to contract a conference name, only listing “SOSP ’03” rather than “Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP ’03)”. That’s fine, at least for big conferences like SOSP where everybody should have heard of it, but use the same contraction throughout your bibliography. If you say “SOSP ’03” in one place and “Proceedings of SOSP ’03” somewhere else, that’s really annoying. Top tip: if you’re space constrained, the easiest thing to nuke is the string “Proceedings of”.

Make sure you have the right author list and with the proper initials. When you use BibTeX and you plug in “D.S. Wallach”, what comes out is “D. Wallach” since there’s no whitespace before the “S”. It’s damn near impossible to catch these things in your source file by eye, so you should do a regular-expression search ([A-Z].[A-Z].) or proofread the resulting bibliography. I’ve sometimes seen citations to papers where there were co-authors missing. Please double-check this sort of thing, often by visiting the authors’ home pages or conference home pages.

Be consistent with spelling out names versus using initials. Most bib styles just use initials rather than whole names. However, if you’re using a style that uses whole names, make sure that you’ve got the whole name for every citation in your bibliography. (Or switch styles.)

Always include a URL for blogs, Wikipedia articles, and newspaper articles (or, at least, newspaper articles since the dawn of the web). Stock BibTeX styles don’t know what URLs are, so the easiest solution is to use the “note” field. Make sure you put the url in a url{} environment so it becomes a hyperlink in the resulting PDF. I’m less confident I can advise you to always include a string like “Accessed on 11/08/2010”. But if you do it, do it consistently. Top tip: if you say usepackage{url} urlstyle{sf} in your LaTeX header, you’ll get more compact URLs than the stock typewriter font. See also, urlbst.

Don’t use a citation just to point to a software project. If you need to give credit to a software package you used, just drop a footnote and put the URL there. You only need a citation when you’re citing an actual paper of some sort. However, if there’s a research paper or book that was written by the authors of the software you used, and that paper or book describes the software, then you should cite the paper/book, and possibly include the URL for the software in the citation.

BibTeX sometimes fails when given a long URL in the note field. This manifests itself as a %-character and a newline inserted in the generated bbl file. (Why? I have no idea.) I have a short Perl script that I always work into my Makefile that post-processes the bbl file to fix this. So should you.

Eliminate the string “to appear” from your bibliography. Somebody years from now will look back in time and find these sorts of markers amusing. Worse, you can easily forget you put that in your bib file. It’s odd reading a manuscript in 2011 that cites a paper “to appear” in 2009.

For any conference, include the address, month, and year. And for the month, use three letter codes in your BibTeX (jan, feb, mar, apr, …) without quotation marks. The BibTeX style will deal with expanding those or using proper contractions. For the address, be consistent about how you handle them. Don’t say “Berkeley, CA” in one place and “Berkeley, California” in another. Also, this may be my U.S.-centric bias showing through, but you don’t need to add “U.S.A.” after “Berkeley, CA”. For international addresses, however, you should include the country and the state/region is optional. “Paris, France” is an easy one. I’ll have to defer to my Canadian readers to chime in about whether it’s better to cite “Vancouver, B.C., Canada”, “Vancouver, Canada”, or “Vancouver, B.C.”

But not the page numbers. Back in the old days, I once got razzed by a journal editor for not including page numbers in all my citations. (And you think I’m pedantic!) Given how many conferences are ditching printed proceedings altogether, it’s acceptable to leave these out now, including for old references that you’re far more likely to dig up online than in the printed proceedings.

Double-check any author with accents in their name and try to get it right. BibTeX doesn’t seem to play nicely with Unicode characters, at least for me, so you have to use the LaTeX codes instead. I’m sure David Mazières appreciates it when you spell his name right.

Double-check the capitalization of your paper titles. I tend to use the BibTeX “abbrv” style, which forces lower case for every word in your paper title, excepting the first word. You then have to put curly braces around words that truly need capital letters like BitTorrent or something. Some hand-written bib entries I’ve seen put curly braces around every word because they really, really want the entry with lots of capital letters. Don’t do that. Use a different bib style if you want different behavior, but then make sure your resulting bibliography has consistent capitalization for every entry. I don’t particularly care whether you go with lots of capital letters or not, but please be consistent about it. Also, double check that proper nouns are properly capitalized.

When you post your own papers online, post a bib entry next to them. This might encourage people to cite your paper properly. For your personal web page, you might like the Exhibit API, which can turn BibTeX entries into HTML, dynamically. (See Ben Adida’s page for one example.) If you’re setting up something for your whole lab or department, Drupal Scholar seems pretty good. (See my colleague Lydia Kavraki’s lab page for one example. I’m expecting we will adopt this across our entire CS department.)

And, last but not least, a citation is not a noun. When you cite a paper, it’s grammatically the same as making a parenthetical remark. If you need to refer to a paper as a noun, you need to use the author names (“Alice and Bob [23] showed that the halting problem is hard.”) or the name of the system (“The Chrome web browser [47,48] uses separate processes for each tab to improve fault isolation.”) If there are three or more authors, then you just use the first one with “et al.” (“Alice et al. [24] proved P is not equal to NP.” — note also the lack of italics for “et al.”) For the ACM journal style, there’s something called citeN rather than the usual cite, which is worth using. You can also look into using various additional packages to get similar functionality in any LaTeX paper style like natbib.

Obligatory caveat: A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. – Ralph Waldo Emerson

On kids and social networking

Sunday’s New York Times has an article about cyber-bullying that’s currently #1 on their “most popular” list, so this is clearly a topic that many find close and interesting.

The NYT article focuses on schools’ central role in policing their students social behavior. While I’m all in favor of students being taught, particularly by older peer students, the importance of self-moderating their communications, schools face a fundamental quandary:

Nonetheless, administrators who decide they should help their cornered students often face daunting pragmatic and legal constraints.

“I have parents who thank me for getting involved,” said Mike Rafferty, the middle school principal in Old Saybrook, Conn., “and parents who say, ‘It didn’t happen on school property, stay out of my life.’ ”

Judges are flummoxed, too, as they wrestle with new questions about protections on student speech and school searches. Can a student be suspended for posting a video on YouTube that cruelly demeans another student? Can a principal search a cellphone, much like a locker or a backpack?

It’s unclear. These issues have begun their slow climb through state and federal courts, but so far, rulings have been contradictory, and much is still to be determined.

Here’s one example that really bothers me:

A few families have successfully sued schools for failing to protect their children from bullies. But when the Beverly Vista School in Beverly Hills, Calif., disciplined Evan S. Cohen’s eighth-grade daughter for cyberbullying, he took on the school district.

After school one day in May 2008, Mr. Cohen’s daughter, known in court papers as J. C., videotaped friends at a cafe, egging them on as they laughed and made mean-spirited, sexual comments about another eighth-grade girl, C. C., calling her “ugly,” “spoiled,” a “brat” and a “slut.”

J. C. posted the video on YouTube. The next day, the school suspended her for two days.

“What incensed me,” said Mr. Cohen, a music industry lawyer in Los Angeles, “was that these people were going to suspend my daughter for something that happened outside of school.” On behalf of his daughter, he sued.

If schools don’t have the authority to discipline J. C., as the court apparently ruled, and her father is more interested in defending her than disciplining her for clearly inappropriate behavior, then can we find some other solution?

Of course, there’s nothing new about bullying among the early-teenage set. I will refrain from dredging such stories from my own pre-Internet pre-SMS childhood, but there’s no question that these kids are at an important stage of their lives, where they’re still learning important and essential concepts, like how to relate to their peers and the importance (or lack thereof) of their peers’ approval, much less understanding where to draw boundaries between their public self and their private feelings. It’s certainly important for us, the responsible adults of the world, to recognize that nothing we can say or do will change the fundamentally social awkwardness of this age. There will never be an ironclad solution that eliminates kids bullying, taunting, or otherwise hurting one other.

Given all that, the rise of electronic communications (whether SMS text messaging, Facebook, email, or whatever else) changes the game in one very important way. It increases the velocity of communications. Every kid now has a megaphone for reaching their peers, whether directly through a Facebook posting that can reach hundreds of friends at once or indirectly through the viral spread of embarrassing gossip from friend to friend, and that speed can cause salacious information to get around well before any traditional mechanisms (parental, school administrative, or otherwise) can clamp down and assert some measure of sanity. For possibly the ultimate example of this, see a possibly fictitious yet nonetheless illustrative girl’s written hookup list posted by her brother as a form of revenge against her ratting out his hidden stash of beer. Needless to say, in one fell swoop, this girl’s life got turned upside down with no obvious way to repair the social damage.

Alright, we invented this social networking mess. Can we fix it?

The only mechanism I feel is completely inappropriate is this:

But Deb Socia, the principal at Lilla G. Frederick Pilot Middle School in Dorchester, Mass., takes a no-nonsense approach. The school gives each student a laptop to work on. But the students’ expectation of privacy is greatly diminished.

“I regularly scan every computer in the building,” Ms. Socia said. “They know I’m watching. They’re using the cameras on their laptops to check their hair and I send them a message and say: ‘You look great! Now go back to work.’ It’s a powerful way to teach kids: ‘I’m paying attention, you need to do what’s right.’ ”

Not only do I object to the Big Brother aspect of this (do schools still have 1984 on their reading lists?), but turning every laptop into a surveillance device is a hugely tempting target for a variety of bad actors. Kids need and deserve some measure of privacy, at least to the extent that schools already give kids a measure of privacy against arbitrary and unjustified search and seizure.

Surveillance is widely considered to be more acceptable when it’s being done by parents, who might insist they have their kids’ passwords in order to monitor them. Of course, kids of this age will reasonably want or need to have privacy from their parents as well (e.g., we don’t want to create conditions where victims of child abuse can be easily locked down by their family).

We could try to invent technical means to slow down the velocity of kids’ communications, which could mean adding delays as a function of the fanout of a message, or even giving viewers of any given message a kill switch over it, that could reach back and nuke earlier, forwarded copies to other parties. Of course, such mechanisms could be easily abused. Furthermore, if Facebook were to voluntarily create such a mechanism, kids might well migrate to other services that lack the mechanism. If we legislate that children of a certain age must have technically-imposed communication limits across the board (e.g., limited numbers of SMS messages per day), then we could easily get into a world where a kid who hits a daily quota cannot communicate in an unexpectedly urgent situation (e.g., when stuck at an alcoholic party and needing a sober ride home).

Absent any reasonable technical solution, the proper answer is probably to restrict our kids’ access to social media until we think they’re mature enough to handle it, to make sure that we, the parents, educate them about the proper etiquette, and that we take responsibility for disciplining our kids when they misbehave.