Archives for 2009

Debugging the Zune Blackout

On December 31, some models of the Zune, Microsoft’s portable music player, went dark. The devices were unusable until the following day. Failures like this are sometimes caused by complex chains of mishaps, but this particular one was due to a single programming error that is reasonably easy to understand. Let’s take a look.

Here is the offending code (reformatted slightly), in the part of the Zune’s software that handles dates and times:

year = 1980;

while (days > 365) {
    if (IsLeapYear(year)) {
        if (days > 366) {
            days -= 366;
            year += 1;
        }
    } else {
        days -= 365;
        year += 1;
    }
}

At the beginning of this code, the variable days is the number of days that have elapsed since January 1, 1980. Given this information, the code is supposed to figure out (a) what year it is, and (b) how many days have elapsed since January 1 of the current year. (Footnote for pedants: here “elapsed since” actually means “elapsed including”, so that days=1 on January 1, 1980.)

On December 31, 2008, days was equal to 10593. That is, 10593 days had passed since January 1, 1980. It follows that 10227 days had passed since January 1, 1981. (Why? Because there were 366 days in 1980, and 10593 minus 366 is 10227.) Applying the same logic repeatedly, we can figure out how many days had passed since January 1 of each subsequent year. We can stop doing this when the number of remaining days is less than a year — then we’ll know which year it is, and which day within that year.
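
A quick sanity check on that figure: counting January 1, 1980 as day 1, December 31, 2008 works out to day 10593, as a few lines of C confirm. (The IsLeapYear here is the standard Gregorian rule; the Zune’s own implementation isn’t shown in the excerpt, but it must agree with this.)

#include <stdio.h>

/* Standard Gregorian leap-year rule. */
static int IsLeapYear(int year) {
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

int main(void) {
    int days = 0;
    for (int year = 1980; year <= 2008; year++)
        days += IsLeapYear(year) ? 366 : 365;
    printf("%d\n", days);   /* prints 10593 */
    return 0;
}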

This is the method used by the Zune code quoted above. The code keeps two variables, days and year, and it maintains the rule that days days have passed since January 1 of year. The procedure continues as long as there are more than 365 days remaining (“while (days > 365)”). If the current year is a leap year (“if (IsLeapYear(year))”), it subtracts 366 from days and adds one to year; otherwise it subtracts 365 from days and adds one to year.

On December 31, 2008, starting with days=10593 and year=1980, the code would eventually reach the point where days=366 and year=2008, which means (correctly) that 366 days had elapsed since January 1, 2008. To put it another way, it was the 366th day of 2008.

This is where things went horribly wrong. The code decided it wasn’t time to stop yet, because days was more than 365. (“while (days > 365)”) It then asked whether year was a leap year, concluding correctly that 2008 was a leap year. (“if (IsLeapYear(year))”) It next determined that days was not greater than 366 (“if (days > 366)”), so that no arithmetic should be performed. The code had gotten stuck: it couldn’t stop, because days was greater than 365, but it couldn’t make progress, because days was not greater than 366. This section of code would keep running forever — leaving the Zune seemingly dead in the water.

The only way out of this mess was to wait until the next day, when the computation would go differently. Fortunately, the same problem would not occur again until December 31, 2012 (the last day of the next leap year), and Microsoft has ample time to patch the Zune code by then.
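
What would a patch look like? A minimal repair (a sketch of one possibility, not Microsoft’s actual fix) is to make the loop recognize the day-366-of-a-leap-year case and stop, since year and days are already correct at that point:

while (days > 365) {
    if (IsLeapYear(year)) {
        if (days > 366) {
            days -= 366;
            year += 1;
        } else {
            break;   /* day 366 of a leap year: we're done */
        }
    } else {
        days -= 365;
        year += 1;
    }
}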

What lessons can we learn from this? First, even seemingly simple computations can be hard to get right. Microsoft’s quality control process, which is pretty good by industry standards, failed to catch the problem in this simple code. How many more errors like this are lurking in popular software products? Second, errors in seemingly harmless parts of a program can have serious consequences. Here, a problem computing dates caused the entire system to be unusable for a day.

This story might help to illustrate why experienced engineers assume that any large software program will contain errors, and why they distrust anyone who claims otherwise. Getting a big program to run at all is an impressive feat of engineering. Making it error-free is too much to hope for. For the foreseeable future, software errors will be a fact of life.

[Hat tip: “itsnotabigtruck” at ZuneBoards.]

Predictions for 2009

Here are our predictions for 2009. These are based on input from Andrew Appel, Joe Calandrino, Will Clarkson, Ari Feldman, Ed Felten, Alex Halderman, Joseph Lorenzo Hall, Tim Lee, Paul Ohm, David Robinson, Dan Wallach, Harlan Yu, and Bill Zeller. Please note that individual contributors (including me) don’t necessarily agree with all of these predictions.

(1) DRM technology will still fail to prevent widespread infringement. In a related development, pigs will still fail to fly.

(2) Patent reform legislation will come closer to passage in this Congress, but will ultimately fail as policymakers wait to determine the impact of the Bilski case’s apparent narrowing of business method patentability.

(3) As lawful downloading of music and movies continues to grow, consumer satisfaction with lossy formats will decline, and higher-priced options that offer higher fidelity will begin to predominate. At least one major online music service will begin to offer music in a lossless format.

(4) The RIAA’s “graduated response” initiative will sputter and die because ISPs are unwilling to cut off users based on unrebutted accusations. Lawsuits against individual end-user infringers will quietly continue.

(5) The DOJ will bring criminal actions against big-time individual copyright infringers based on data culled from the server logs of a large “private” BitTorrent community.

(6) Questions over the enforceability of free / open source software licenses will move closer to resolution.

(7) NebuAd and the regional ISPs recently sued for deploying NebuAd’s advertising system will settle with the class action plaintiffs for an undisclosed sum. At least in part because of the lawsuit and settlement, no U.S. ISP will deploy a new NebuAd/Phorm-like system in 2009. Meanwhile, Phorm will continue to be successful with privacy regulators in the UK and will sign up reluctant ISPs there who are facing competitive pressure. Activists will raise strong objections to no avail.

(8) The federal Court of Appeals for the Ninth Circuit will hear oral argument in the case of U.S. v. Lori Drew, the Megan Meier/MySpace prosecution. By year’s end, the Ninth Circuit panel still will not have issued a decision, although after oral argument, the pundits will predict a 3-0 or 2-1 reversal of the conviction.

(9) As a result of the jury’s guilty verdict in U.S. v. Lori Drew, dozens of plaintiffs will file civil lawsuits in 2009 alleging violations of the federal Computer Fraud and Abuse Act premised on the theory that one can “exceed authorized access” or act “in excess of authorization” by violating Terms of Service. Thankfully, the Department of Justice won’t bring any other criminal cases premised on this theory, at least not until it sees how the Ninth Circuit rules.

(10) The Computer Fraud and Abuse Act (CFAA) will be the new DMCA. Many will argue that the law needs to be reformed, but this argument will struggle to gain traction with the lay public, notwithstanding the fact that lay users face potential liability for routine behaviors due to CFAA overbreadth.

(11) An academic security researcher will face prosecution under the CFAA, anti-wiretapping laws, or other computer intrusion statutes for violations that occurred in the process of research.

(12) An affirmative action lawsuit will be filed against a university, challenging the use of a software algorithm used in evaluating applicants.

(13) There will be lots of talk about net neutrality but no new legislation, as everyone waits to see how the Comcast/BitTorrent issue plays out in the courts.

(14) The Obama administration will bring an atmosphere of antitrust enforcement to the IT industry, but no major cases will be brought in 2009.

(15) The new administration will be seen as trying to “reboot” the FCC.

(16) One of the major American voting system manufacturers (Diebold/Premier, Sequoia, ES&S, or Hart InterCivic) will go out of business or be absorbed into one of its rivals.

(17) The federal voting machine certification regime will increasingly be seen as a failure. States will strengthen their own certification processes, and at least one major state will stop requiring federal certification. The failure of the federal process to certify systems or software patches in a timely fashion will be cited as a reason for this move.

(18) Estonia and other countries will continue experimenting in real elections with online or mobile phone voting. They will claim that these trials are successful because “nothing went wrong.” Security analysts will continue to claim that these systems are fundamentally flawed and will continue to be ignored. Exactly the same thing will continue to happen with U.S. overseas and military voters.

(19) We’ll see the first clear-cut evidence of a malicious attack on a voting system fielded in a state or local election. This attack will exploit known flaws in a “toe in the water” test and vendors will say they fixed the flaw years ago and the new version is in the certification pipeline.

(20) U.S. federal government computers will suffer from at least one high-profile compromise by a foreign entity, leaking a substantial amount of classified or highly sensitive information abroad.

(21) There will be one or more major Internet outages attributed to attacks on DNS, BGP, or other Internet plumbing, which will immediately be labeled acts of “cyber-warfare” or “cyber-terrorism.” The actual cause will be found to be the action of spammers or other professional Internet miscreants.

(22) Present flaws in the web’s Certification Authority process, such as the MD5 issue or the leniency of some CAs in issuing certificates, will lead to regulation of the CA process. Among other things, there will be calls for restrictions on which CAs can issue certs for which Top Level Domains.

(23) One or more major Internet services or top-tier network providers will experience prolonged failures and/or unrecoverable data loss severe enough that the company’s president ends up testifying before Congress about it.

(24) Shortly after the start of the new administration, the TSA will quietly phase out the ban on flying with liquids or stop enforcing it in practice. The color-coded national caution levels (which have remained at “orange” forever) will be phased out.

(25) All 20 of the top 20 U.S. newspapers by circulation will experience net reductions in their newsroom headcounts in 2009. At least 15 of the 20 will see weekday circulation decline by 15% or more over the course of the year. By the end of the year, at least one major U.S. city will lack a daily newspaper.

(26) Advertising spending in older media will plummet, but online ad spending will be roughly level, as advertisers warm to online ads whose performance is more easily measured. Traditional media will be forced to offer advertisers fire sale prices, and the ratio of content to advertising in many traditional media outlets will increase.

(27) An embarrassing leak of personal data will emerge from one or more of the social networking firms (e.g., Facebook), leading Congress to consider legislation that probably won’t solve the problem and will never actually reach the floor for a vote.

(28) Facebook will be sold for $4 billion and Mark Zuckerberg will step down as CEO.

(29) Web 2.0 startups will not be hammered by the economic downturn. In fact, web 2.0 innovation may prove to be countercyclical. Costs are controllable: today’s workstyles don’t require lavish office space, marketing can be viral, and pay-as-you-go computing services eliminate the need for big upfront investments in infrastructure. Laid-off big-company workers and refugees from the financial world will keep skilled wages low. The surge in innovation will be real, but its effects will mostly be felt in future years.

(30) The Blu-ray format will increasingly be seen as a failure as customers rely more on online streaming.

(31) Emboldened by Viacom’s example against Time Warner, TV network owners will increasingly demand higher payments from cable companies with the threat of moving content online instead. Cable companies will attempt to more heavily limit the content that network owners can host on Hulu and other sites.

(32) The present proliferation of incompatible set-top boxes that aim to connect your TV to the Internet will lead to the establishment of a huge industry consortium with players from three major interest groups (box builders, content providers, software providers), reminiscent of the now-defunct SDMI consortium, and with many of the same members. In 2009, they will generate a variety of press releases but will accomplish nothing.

(33) A hot Christmas item will be a cheap set-top box that allows normal people to download, organize, and view video and audio podcasts in their own living rooms. This product will work with all of the major free online sources of audio and video, and a few of the paid sources.

(34) Internet Explorer’s usage share will fall below 50 percent for the first time in a decade, spurred by continued growth of Firefox and Safari and deals with OEMs to pre-load Google Chrome.

(35) Somebody besides Apple will sell an iPod clone that’s a drop-in replacement for a real iPod, complete with support for iTunes DRM, video playback, and so forth. Apple will sue (or threaten to sue), but won’t be able to stop distribution of this product.

(36) Apple will release a netbook, which will be a souped-up iPhone with an 8″ screen and folding keyboard. It will sell for $899.

(37) No white space devices will be approved for use by the FCC. Submitted spectrum sensing devices will fare well in both laboratory and field tests, but approval will be delayed politically by the anti-white space lobby.

(38) More and more Internet traffic will be encrypted, as concern grows about eavesdropping, content modification, filtering, and security attacks.

Feel free to offer your own predictions in the comments.

2008 Predictions Scorecard

As usual, we’ll kick off the new year by reviewing the predictions we made for the previous year. Here now, our 2008 predictions, in italics, with hindsight in ordinary type.

(1) DRM technology will still fail to prevent widespread infringement. In a related development, pigs will still fail to fly.

We predict this every year, and it’s always right. This prediction is so obvious that it’s almost unfair to count it. Verdict: right.

(2) Copyright issues will still be gridlocked in Congress.

We could predict this every year, and it would almost always be right. History teaches that it usually takes a long time to build consensus for any copyright changes. Verdict: right.

(3) No patent reform bill will be passed. Baby steps toward a deal between the infotech and biotech industries won’t lead anywhere.

Verdict: right.

(4) DRM-free sales will become standard in the music business. The movie studios will flirt with the idea of DRM-free sales but won’t take the plunge, yet.

This was basically right. DRM-free music sales are much more common than before. Whether they’re “standard” is a matter for debate. The movie studios haven’t followed the record industry, yet. Verdict: mostly right.

(5) The 2008 elections will not see an e-voting meltdown of Florida 2000 proportions, but a bevy of smaller problems will be reported, further fueling the trend toward reform.

As predicted, there was no meltdown but we did see a bevy of smaller problems. Whether this fueled the trend toward reform is debatable. The problems that did occur tended to be ignored because the presidential election wasn’t close. Verdict: mostly right.

(6) E-voting lawsuits will abound, with voters suing officials, officials suing other officials, and officials suing vendors (or vice versa).

There were some lawsuits, but they didn’t “abound”. Verdict: mostly wrong.

(7) Second Life will jump the shark and the cool kids will start moving elsewhere; but virtual worlds generally will lumber on.

Second Life seems to have lost its cool factor, but then so have virtual worlds generally. Still, they’re lumbering on. Verdict: mostly right.

(8) MySpace will begin its long decline, losing customers for the first time.

I haven’t seen data that confirms or refutes this one. Comscore said that Facebook passed MySpace in user share, but that doesn’t imply that MySpace lost users. Verdict: unknown.

(9) The trend toward open cellular data networks will continue, but not as quickly as optimists had hoped.

This one is hard to call. The growth of Android and iPhone unlocking would seem to be steps toward open cellular data networks, but the movement has not been rapid. Verdict: mostly right.

(10) If a Democrat wins the White House, we’ll hear talk about reinvigorated antitrust enforcement in the tech industries. (But of course it will all be talk, as the new administration won’t take office until 2009.)

Verdict: right.

(11) A Facebook application will cause a big privacy to-do.

There were Facebook privacy flaps, but they mostly didn’t involve applications. Overall, interest in Facebook apps declined during the year. Verdict: mostly wrong.

(12) There will be calls for legislation to create a sort of Web 2.0 user’s bill of rights, giving users rights to access and extract information held by sites; but no action will be taken.

Verdict: right.

(13) An epidemic of news stories about teenage webcam exhibitionism will lead to calls for regulation.

Verdict: wrong.

(14) Somebody will get Skype or a similar VoIP client running on an Apple iPhone and it will, at least initially, operate over AT&T’s cellular phone network. AT&T and/or Apple will go out of their way to break this, either by filtering the network traffic or by locking down the iPhone.

Various VoIP clients did run on the iPhone. Apple said they would allow this over conventional WiFi networks but intended to prevent it on the cellular network, presumably by banning from the iPhone App Store any application that provided VoIP on the cell network. Verdict: right.

Our final scorecard: six right, four mostly right, two mostly wrong, one wrong, one unknown.

Stay tuned for our 2009 predictions.

More Privacy, Bit by Bit

Before the Holidays, Yahoo got a flurry of good press for the announcement that it would (as the LA Times puts it) “purge user data after 90 days.” My eagle-eyed friend Julian Sanchez noticed that the “purge” was less complete than privacy advocates might have hoped. It turns out that Yahoo won’t be deleting the contents of its search logs. Rather, it will merely be zeroing out the last 8 bits of users’ IP addresses. Julian is not impressed:

dropping the last byte of an IP address just means you’ve narrowed your search space down to (at most) 256 possibilities rather than a unique machine. By that standard, this post is anonymous, because I guarantee there are more than 255 other guys out there with the name “Julian Sanchez.”

The first three bytes, in the majority of cases, are still going to be enough to give you a service provider and a rough location. Assuming every address in the range is in use, dropping the least-significant byte just obscures which of the 256 users at that particular provider is behind each query. In practice, though, the search space is going to be smaller than that, because people are creatures of habit: You’re really working with the pool of users in that range who perform searches on Yahoo. If your not-yet-anonymized logs show, say, 45 IP addresses that match those first three bytes making routine searches on Yahoo (17.6% of the search market x 256 = 45) you can probably safely assume that an “anonymized” IP with the same three leading bytes is one of those 45. If different users tend to exhibit different usage patterns in search time, clustering of queries, expertise with Boolean operators, or preferred natural language, you can narrow it down further.

I think this isn’t quite fair to Yahoo. Dropping the last eight bits of the IP address certainly doesn’t protect privacy as much as deleting log entries entirely, but it’s far from useless. To start with, there’s often not a one-to-one correspondence between IP addresses and Internet users. Often a single user has multiple IPs. For example, when I connect to the Princeton wireless network, I’m dynamically assigned an IP address that may not be the same as the IP address I used the last time I logged on. I also access the web from my iPhone and from hotels and coffee shops when I travel. Conversely, several users on a given network may be sharing a single IP address using a technology called network address translation. So even if you know the IP address of the user who performed a particular search, that may simply tell you that the user works for a particular company or connected from a particular coffee shop. Hence, tracking a particular user’s online activities is already something of a challenge, and it becomes that much harder if several dozen users’ online activities are scrambled together in Yahoo!’s logs.
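
To make the mechanics concrete: the entire “anonymization” step amounts to a single bitmask that clears the low byte of the address. Here is a sketch of the idea in C (an illustration only; Yahoo hasn’t published its actual code, and 203.0.113.87 is a documentation example address):

#include <stdint.h>
#include <stdio.h>

/* Clear the low byte of an IPv4 address, as Yahoo's policy describes. */
static uint32_t anonymize_ipv4(uint32_t addr) {
    return addr & 0xFFFFFF00u;   /* e.g., 203.0.113.87 -> 203.0.113.0 */
}

int main(void) {
    uint32_t addr = ((uint32_t)203 << 24) | ((uint32_t)113 << 8) | 87u;
    uint32_t anon = anonymize_ipv4(addr);
    printf("%u.%u.%u.%u\n",
           (unsigned)(anon >> 24), (unsigned)((anon >> 16) & 0xFF),
           (unsigned)((anon >> 8) & 0xFF), (unsigned)(anon & 0xFF));
    /* prints 203.0.113.0 */
    return 0;
}

Every address in the block 203.0.113.0/24 collapses to the same logged value, which is precisely the 256-way ambiguity Julian describes.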

Now, whether this is “enough” privacy depends a lot on what kind of privacy problem you’re worried about. It seems to me that there are three broad categories of privacy concerns:

  • Privacy violations by Yahoo or its partners: Some people are worried that Yahoo itself is tracking their online activities, building an online profile about them, and selling this information to third parties. Obviously, Yahoo’s new policy will do little to allay such concerns. Indeed, as David Kravets points out, Yahoo will have already squeezed all the personal information it can out of those logs before it scours them. If you don’t trust Yahoo or its business partners, this move isn’t going to make you feel very much safer.
  • Data breaches: A second concern involves cases where customer data falls into the wrong hands due to a security breach. In this case, it’s not clear that search engine logs are especially useful to data thieves in the first place. Data thieves are typically looking for information such as credit card and Social Security numbers that can make them a quick buck. People rarely type such information into search boxes. Some searches may be embarrassing to users, but they probably won’t be so embarrassing as to enable blackmail or extortion. So search logs are not likely to be that useful to criminals, whether or not they are “anonymized.”
  • Court-ordered information release: This is the case where the new policy could have the biggest effect. Consider, for example, a case where the police seek a suspect’s search results. The new policy will help protect privacy in three ways. First, if Yahoo! can’t cleanly filter search logs by IP address, judges may be more reluctant to order the disclosure of several dozen users’ search results just to give police information about a single suspect. Second, scrubbing the last byte of the IP address will make searching through the data much more difficult. Finally, the resulting data will be less useful in a court of law, because prosecutors will need to convince a jury that a given search was performed by the defendant rather than by another user who happened to have a similar IP address. At the margin, then, Yahoo’s new policy seems likely to significantly enhance user privacy against government information requests. The same principle applies in the case of civil suits: the recording and movie industries, for example, will have a harder time using Yahoo!’s search logs as evidence that a user was engaged in illegal file-sharing.

So based on the small amount of information Yahoo has made available, it seems that the new policy is a real, if small, improvement in users’ privacy. However, it’s hard to draw any definite conclusions without more specific information about what Yahoo! is saving, because anonymizing data is a lot harder than people think. AOL learned this the hard way in 2006 when “anonymized” search results were released to researchers. People quickly noticed that you could figure out who various users were by looking at the contents of their searches. The data wasn’t so anonymous after all.

One reason AOL’s data wasn’t so anonymous is that AOL had “anonymized” the data set by assigning each user a unique ID. That meant people could look at all searches made by a single user and find searches that gave clues to the user’s identity. Had AOL instead stripped off the user information without replacing it, it would have been much harder to de-anonymize the data because there would be no way to match up different searches by the same user. If Yahoo’s logs include information linking each user’s various searches together, then even deleting the IP address entirely probably won’t be enough to safeguard user privacy. On the other hand, if the only user-identifying information is the IP address, then stripping off the low byte of the IP address is a real, if modest, privacy enhancement.

Taking Advantage of Citizen Contrarians

In my last post, I argued that sifting through citizens’ questions for the President is a job best done outside of government. More broadly, there’s a class of input that is good for government to receive, but that probably won’t be welcome at the staff level, where moment-to-moment success is more of a concern than long-term institutional thriving. Tough questions from citizens are in this category. So is unexpected, challenging or contrarian citizen advice or policy input. A flood of messages that tell the President “I’m in favor of what you already plan to do,” perhaps leavened with a sprinkling of “I respectfully disagree, but still like you anyway,” would make for great PR, and better yet, since such messages don’t offer action-guiding advice, they don’t actually drive any change whatsoever in what anyone in government—from the West Wing to the furthest corners of the executive branch—does.

Will the new administration set things up to run this way? I don’t know. Certainly, the cookie-cutter blandness of their responses to the first round of online citizen questions is not a promising sign. There’s no question that Obama himself sees some value in real, tough questions that come from the masses. But the immediate practical advantages of a choir that echoes the preacher may be a much more attractive prospect for his staff than the scrambling, searching, and actual policy rethinking that might have to follow tough questions or unexpected advice.

This outcome would be a lost opportunity precisely because there are pockets of untapped expertise, uncommon wisdom, and bright ideas out there. Surfacing these insights—the inputs that weren’t already going to be incorporated into the policy process, the thoughts that weren’t talking points during the campaign, the things we didn’t already know—is precisely what the new collaborative technologies have made possible.

On the other hand, in order for this to work, we need to be able to regard (at least some of) the surprising, unexpected or quirky citizen inputs as successes for the system that attracted them, rather than failures. We can already find out what the median voter thinks, without all these fancy new systems, and in any case, his or her opinion is unlikely to add new or unexpected value to the policy process.

Obamacto.org, a potential model for external sites that gather citizen input for government, has a leaderboard of suggested priorities for the new CTO, voted up by visitors to the site. The first three suggestions are net neutrality regulation, Patriot Act repeal and DMCA repeal—unsurprising major issues. Arguably, if enough people took part in the online voting, there would be some value in knowing how the online group had prioritized these familiar requests. But with the fourth item, things get interesting: it reads “complete the job on metrication that Ronald Reagan defunded.”

On the one hand, my first reaction to this is to laugh: Regardless of whether or not moving to the metric system would be a good idea, it’s something that doesn’t have nearly the political support today that would be needed in order for it to be a plausible priority for Obama’s CTO. Put another way, there’s no chance that persuading America to do this is the best use of the new administration’s political capital.

On the other hand, maybe that’s what these sorts of online fora are for: Changing which issues are on the table, and how we think about them. The netroots turned net neutrality into a mainstream political issue, and for all I know they (or some other constellation of political forces) could one day do the same for the drive to go metric.

Readers, commenters: What do you think? Are quirky inputs like the suggestion that Obama’s CTO focus on metrication a hopeful sign for the value new deliberative technologies can add in the political process? Or, are they a sign that we haven’t figured out how these technologies should work or how to use them?