
Archives for November 2008


Economic Growth, Censorship, and Search Engines

Economic growth depends on an ability to access relevant information. Although censorship prevents access to certain information, the direct consequences of censorship are well-known and somewhat predictable. For example, blocking access to Falun Gong literature is unlikely to harm a country’s consumer electronics industry. On the web, however, information of all types is interconnected. Blocking a web page might have an indirect impact reaching well beyond that page’s contents. To understand this impact, let’s consider how search results are affected by censorship.

Search engines keep track of what’s available on the web and suggest useful pages to users. No comprehensive list of web pages exists, so search providers check known pages for links to unknown neighbors. If a government blocks a page, all links from the page to its neighbors are lost. Unless detours exist to the page’s unknown neighbors, those neighbors become unreachable and remain unknown. These unknown pages can’t appear in search results — even if their contents are uncontroversial.
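The reachability argument can be made concrete with a toy crawler. In this sketch (invented page names, a three-page "web"; a simplification of what real crawlers do), blocking one page silently hides its undiscovered neighbors:

```python
from collections import deque

def crawl(links, seeds, blocked=frozenset()):
    """Toy crawler: discover pages by following links from the seeds.
    Pages in `blocked` cannot be fetched, so the crawler never sees
    their outgoing links."""
    known, frontier = set(seeds), deque(seeds)
    while frontier:
        page = frontier.popleft()
        if page in blocked:
            continue  # censored: its links to unknown neighbors are lost
        for neighbor in links.get(page, []):
            if neighbor not in known:
                known.add(neighbor)
                frontier.append(neighbor)
    return known - set(blocked)

# The only path to the uncontroversial page C runs through B.
web = {"A": ["B"], "B": ["C"], "C": []}
print(sorted(crawl(web, ["A"])))                 # ['A', 'B', 'C']
print(sorted(crawl(web, ["A"], blocked={"B"})))  # ['A']; C vanishes too
```

C's contents may be entirely innocuous, but with B blocked it can never appear in search results.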

When presented with a query, search engines respond with relevant known pages sorted by expected usefulness. Censorship also affects this sorting process. In predicting usefulness, search engines consider both the contents of pages and the links between pages. Links here are like friendships in a stereotypical high school popularity contest: the more popular friends you have, the more popular you become. If your friend moves away, you become less popular, which makes your friends less popular by association, and so on. Even people you’ve never met might be affected.

“Popular” web pages tend to appear higher in search results. Censoring a page distorts this popularity contest and can change the order of even unrelated results. As more pages are blocked, the censored view of the web becomes increasingly distorted. As an aside, Ed notes that blocking a page removes more than just the offending material: if censors block Ed’s site over an off-hand comment on Falun Gong, he also loses any influence he has on information security.
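The popularity contest described above is essentially PageRank-style link analysis. A toy power-iteration sketch (invented pages; not Google's actual algorithm) shows how censoring one page can flip the relative order of two pages that never touched the forbidden topic:

```python
def pagerank(links, damping=0.85, iters=200):
    """Toy PageRank via power iteration. Mass from dangling pages
    simply leaks away, which is fine for illustration."""
    n = len(links)
    rank = {p: 1.0 / n for p in links}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in links}
        for p, outs in links.items():
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# "news" links to "blog"; "shop" has no such patron. Censoring "news"
# removes the page and every link through it.
web = {"portal": ["news", "shop"], "news": ["blog"], "blog": [], "shop": []}
before = pagerank(web)
after = pagerank({"portal": ["shop"], "blog": [], "shop": []})

print(before["blog"] > before["shop"])   # True: blog outranks shop
print(after["blog"] > after["shop"])     # False: the order has flipped
```

Neither "blog" nor "shop" contains anything controversial, yet their relative ranking changes once the censor acts.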

These effects would typically be rare and have a disproportionately small impact on popular pages. Google’s emphasis on the long tail, however, suggests that considerable value lies in providing high-quality results covering even less-popular pages. To avoid these issues, a government could allow a limited set of individuals full web access to develop tools like search engines. This approach seems likely to stifle competition and innovation.

Countries with greater censorship might produce lower-quality search engines, but Google, Yahoo, Microsoft, and others can provide high-quality search results in those countries. These companies can access uncensored data, mitigating the indirect effects of censorship. This emphasizes the significance of measures like the Global Network Initiative, which has a participant list that includes Google, Yahoo, and Microsoft. Among other things, the initiative provides guidelines for participants regarding when and how information access may be restricted. The effectiveness of this specific initiative remains to be seen, but such measures may provide leading search engines with greater leverage to resist arbitrary censorship.

Search engines are unlikely to be the only tools adversely impacted by the indirect effects of censorship. Any tool that relies on links between information (think social networks) might be affected, and repressive states place themselves at a competitive disadvantage in developing these tools. Future developments might make these points moot: in a recent talk at the Center, Ethan Zuckerman mentioned tricks and trends that might make censorship more difficult. In the meantime, however, governments that censor information may increasingly find that they do so at their own expense.


Does Your House Need a Tail?

Thus far, the debate over broadband deployment has generally been between those who believe that private telecom incumbents should be in charge of planning, financing and building next-generation broadband infrastructure, and those who advocate a larger role for government in the deployment of broadband infrastructure. The latter proposals include municipally owned networks and a variety of federal subsidies and mandates for incumbents to deploy faster broadband.

Tim Wu and Derek Slater have a great new paper out that approaches the problem from a different perspective: that broadband deployments could be planned and financed not by government or private industry, but by consumers themselves. That might sound like a crazy idea at first blush, but Wu and Slater do a great job of explaining how it might work. The key idea is “condominium fiber,” an arrangement in which a number of neighboring households pool their resources to install fiber to all the homes in their neighborhoods. Once constructed, each home would own its own fiber strand, while the shared costs of maintaining the “trunk” cable from the individual homes to a central switching location would be managed in the same way that condominium and homeowners’ associations currently manage the shared areas of condos and gated communities. Indeed, in many cases the developer of a new condominium tower or planned community could lay fiber along with water and power lines, and the fiber would be just one of the shared resources that would be managed collectively by the homeowners.

If that sounds strange, it’s important to remember that there are plenty of examples where things that were formerly rented became owned. For example, fifty years ago in the United States no one owned a telephone. The phone was owned by Ma Bell and if yours broke they’d come and install a new one. But that changed, and now people own their phones and the wiring inside their homes, while the phone company owns the cable outside the home. One way to think about Slater and Wu’s “homes with tails” concept is that it’s just shifting that line of demarcation again. Under their proposal, you’d own the wiring inside your home and the line from you to your broadband provider.

Why would someone want to do such a thing? The biggest advantage, from my perspective, is that it could solve the thorny problem of limited competition in the “last mile” of broadband deployment. Right now, most customers have two options for high-speed Internet access. Getting more options using the traditional, centralized investment model is going to be extremely difficult because it costs a lot to deploy new infrastructure all the way to customers’ homes. But if customers “brought their own” fiber, then the barrier to entry would be much lower. New providers would simply need to bring a single strand of fiber to a neighborhood’s centralized point of presence in order to offer service to all customers in that neighborhood. So it would be much easier to imagine a world in which customers had numerous options to choose from.

The challenge is solving the chicken-and-egg problem: customer-owned fiber won’t be attractive until there are several providers to choose from, but it doesn’t make sense for new firms to enter this market until there are a significant number of neighborhoods with customer-owned fiber. Wu and Slater suggest several ways this chicken-and-egg problem might be overcome, but I think it will remain a formidable challenge. My guess is that at least at the outset, the customer-owned model will work best in new residential construction projects, where the costs of deploying fiber will be very low (because they’ll already be digging trenches for power and water).

But the beauty of their model is that unlike a lot of other plans to encourage broadband deployment, this isn’t an all-or-nothing choice. We don’t have to convince an entire nation, state, or even city to sign onto a concept like this. All you need is a neighborhood with a few dozen early-adopting consumers and an ISP willing to serve them. Virtually every cutting-edge technology is taken up by a small number of early adopters (who pay high prices for the privilege of being the first with a new technology) before it spreads to the general public, and the same model is likely to apply to customer-owned fiber. If the concept is viable, someone will figure out how to make it work, and their example will be duplicated elsewhere. So I don’t know if customer-owned fiber is the wave of the future, but I do hope that people start experimenting with it.

You can check out their paper here. You can also check out an article I wrote for Ars Technica this summer that is based on conversations with Slater, Wu, and other pioneers in this area.


Discerning Voter Intent in the Minnesota Recount

Minnesota election officials are hand-counting millions of ballots, as they perform a full recount in the ultra-close Senate race between Norm Coleman and Al Franken. Minnesota Public Radio offers a fascinating gallery of ballots that generated disputes about voter intent.

A good example is this one:

A scanning machine would see the Coleman and Franken bubbles both filled, and call this ballot an overvote. But this might be a Franken vote, if the voter filled in both slots by mistake, then wrote “No” next to Coleman’s name.

Other cases are more difficult, like this one:

Do we call this an overvote, because two bubbles are filled? Or do we give the vote to Coleman, because his bubble was filled in more completely?

Then there’s this ballot, which is destined to be famous if the recount descends into litigation:

[Insert your own joke here.]

This one raises yet another issue:

Here the problem is the fingerprint on the ballot. Election laws prohibit voters from putting distinguishing marks on their ballots, and marked ballots are declared invalid, for good reason: uniquely marked ballots can be identified later, allowing a criminal to pay the voter for voting “correctly” or punish him for voting “incorrectly”. Is the fingerprint here an identifying mark? And if so, how can you reject this ballot and accept the distinctive “Lizard People” ballot?

Many e-voting experts advocate optical-scan voting. The ballots above illustrate one argument against opscan: filling in the ballot is a free-form activity that can create ambiguous or identifiable ballots. This creates a problem in super-close elections, because ambiguous ballots may make it impossible to agree on who should have won the election.

Wearing my pure-scientist hat (which I still own, though it sometimes gets dusty), this is unsurprising: an election is a measurement process, and all measurement processes have built-in errors that can make the result uncertain. This is easily dealt with, by saying something like this: Candidate A won by 73 votes, plus or minus a 95% confidence interval of 281 votes. Or perhaps this: Candidate A won with 57% probability. Problem solved!
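To see where a statement like that could come from, here is a deliberately crude model (my own illustrative assumptions, not an analysis of the actual recount): treat each ambiguous ballot as a coin flip that swings the margin by one vote either way, so the noise in the margin has a standard deviation of roughly the square root of the number of ambiguous ballots.

```python
from math import erf, sqrt

def win_probability(margin, ambiguous_ballots):
    """P(the leader truly won) if each ambiguous ballot independently
    swings the margin by +1 or -1 with equal probability (normal
    approximation; a sketch, not real election statistics)."""
    sigma = max(sqrt(ambiguous_ballots), 1.0)
    z = margin / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF

# A 73-vote lead with 1,800 ambiguous ballots (hypothetical numbers):
print(round(win_probability(73, 1800), 3))
```

Real recount statistics would be far more careful about how ballots break, but the point stands: the winner is a probabilistic statement, not a certainty.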

In the real world, of course, we need to declare exactly one candidate to be the winner, and a lot can be at stake in the decision. If the evidence is truly ambiguous, somebody is going to end up feeling cheated, and the most we can hope for is a sense that the rules were properly followed in determining the outcome.

Still, we need to keep this in perspective. By all reports, the number of ambiguous ballots in Minnesota is minuscule compared to the total number cast. Let’s hope that, even if some individual ballots don’t speak clearly, the ballots taken collectively leave no doubt as to the winner.


Low Hit Rate Isn't the Problem with TSA Screening

The TSA, which oversees U.S. airport security, comes in for a lot of criticism — much of it deserved. But sometimes commentators let their dislike for the TSA get the better of them, and they offer critiques that don’t stand up logically.

A good example is yesterday’s USA Today article on TSA’s behavioral screening program, and the commentary that followed it. The TSA program trained screeners to look for nervous and suspicious behavior, and to subject travellers exhibiting such behavior to more stringent security measures such as pat-down searches or short interviews.

Commentators condemned the TSA program because fewer than 1% of the selected travellers were ultimately arrested. Is this a sensible objection? I think not, for reasons I’ll explain below.

Before I explain why, let’s take a minute to set aside our general opinions about the TSA. Forget the mandatory shoe removal and toiletry-container nitpicking. Forget that time the screener was rude to you. Forget the slippery answers to inconvenient Constitutional questions. Forget the hours you have spent waiting in line. Put on your blinders please, just for now. We’ll take them off later.

Now suppose that TSA head Kip Hawley came to you and asked you to submit voluntarily to a pat-down search the next time you travel. And suppose you knew, with complete certainty, that if you agreed to the search, this would magically give the TSA a 0.1% chance of stopping a deadly crime. You’d agree to the search, wouldn’t you? Any reasonable person would accept the search to save (by assumption) at least 0.001 lives. This hypothetical TSA program is reasonable, even though it only has a 0.1% arrest rate. (I’m assuming here that an attack would cost only one life. Attacks that killed more people would justify searches with an even smaller arrest rate.)
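The arithmetic behind that judgment is just expected value. A two-line sketch (the 0.1% and one-life figures are the post's hypotheticals; the second scenario's numbers are mine):

```python
def expected_lives_saved(stop_probability, lives_at_risk):
    # Expected lives saved per search: the crude utilitarian
    # figure the argument turns on.
    return stop_probability * lives_at_risk

print(expected_lives_saved(0.001, 1))     # the post's scenario: 0.001
# A larger attack justifies searches with an even smaller hit rate:
print(expected_lives_saved(0.00002, 50))  # same expected benefit
```

The arrest rate by itself tells you nothing; what matters is the expected harm prevented per search, weighed against the search's cost.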

So the commentators’ critique is weak — but of course this doesn’t mean the TSA program should be seen as a success. The article says that the arrests the system generates are mostly for drug charges or carrying a false ID. Should a false-ID arrest be considered a success for the system? Certainly we don’t want to condone the use of false ID, but I’d bet most of these people are just trying to save money by flying on a ticket in another person’s name — which hardly makes them Public Enemy Number One. Is it really worth doing hundreds of searches to catch one such person? Are those searches really the best use of TSA screeners’ time? Probably not.

On the whole, I’m not sure I can say whether the behavioral screening program is a good idea. It apparently hasn’t caught any big fish yet, but it might have positive effects by deterring some serious crimes. We haven’t seen the data to support it, and we’ve learned to be skeptical of TSA claims that some security measure is necessary.

Now it’s time for the professor to call on one of the diehard civil libertarians in the class, who by this point are bouncing in their seats with both hands waving in the air. They’re dying to point out that our system, for good reason, doesn’t automatically accept claims by the authorities that searches or seizures are justified, and that our institutions are properly skeptical about expanding the scope of searches. They’re unhappy that the debate about this TSA program is happening after it was in place, rather than before it started. These are all good points.

The TSA’s behavioral screening is a rich topic for debate — but not because of its arrest rate.


Can Google Flu Trends Be Manipulated?

Last week researchers from Google and the Centers for Disease Control unveiled a cool new research result, showing that they could gauge the level of influenza infections in a region of the U.S. by seeing how often people in those regions did Google searches for certain terms related to the flu and flu symptoms. The search-based predictions correlate remarkably well with the medical data on flu rates — not everyone who searches for “cough medicine” has the flu, but enough do that an increase in flu cases correlates with an increase in searches for “cough medicine” and similar terms. The system is called Google Flu Trends.

Privacy groups have complained, but this use of search data seems benign — indeed, this level of flu detection requires only that search data be recorded per region, not per individual user. The legitimate privacy worry here is not about the flu project as it stands today but about other uses that Google or the government might find for search data later.

My concern today is whether Flu Trends can be manipulated. The system makes inferences from how people search, but people can change their search behavior. What if a person or a small group set out to convince Flu Trends that there was a flu outbreak this week?

An obvious approach would be for the conspirators to do lots of searches for likely flu-related terms, to inflate the count of flu-related searches. If all the searches came from a few computers, Flu Trends could presumably detect the anomalous pattern and the algorithm could reduce the influence of these few computers. Perhaps this is already being done, but I don’t think the research paper mentions it.

A more effective approach to spoofing Flu Trends would be to use a botnet — a large collection of hijacked computers — to send flu-related searches to Google from a larger number of computers. If the added searches were diffuse and well-randomized, they would be very hard to distinguish from legitimate searches, and Flu Trends would probably be fooled.
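A defender's first instinct would be to measure how concentrated the suspect queries are. Here's a sketch of that idea (my own guess at a countermeasure; the paper describes nothing of the sort), which also shows why a well-randomized botnet slips through:

```python
from collections import Counter

def source_concentration(query_sources, top_k=10):
    """Fraction of flu-related queries coming from the top_k busiest
    source addresses: a crude anomaly signal."""
    counts = Counter(query_sources)
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(top_k))
    return top / total

# A few conspirators hammering from 5 machines stand out ...
naive_attack = ["attacker%d" % (i % 5) for i in range(10_000)]
print(source_concentration(naive_attack))   # 1.0, all from 5 hosts

# ... but a 10,000-node botnet sending one query each looks diffuse.
botnet = ["bot%d" % i for i in range(10_000)]
print(source_concentration(botnet))         # 0.001, like normal traffic
```

Simple concentration checks catch the naive attacker; the botnet's queries have to be filtered, if at all, by subtler statistical means.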

This possibility is not discussed in the Flu Trends research paper. The paper conspicuously fails to identify any of the search terms that the system is looking for. Normally a paper would list the terms, or at least give examples, but none of the terms appear in the paper, and the Flu Trends web site gives only “flu” as an example search term. They might be withholding the search terms to make manipulation harder, but more likely they’re withholding the search terms for business reasons, perhaps because the terms have value in placing or selling ads.

Why would anyone want to manipulate Flu Trends? If flu rates affect the financial markets by moving the stock prices of certain drug or healthcare companies, then a manipulator can profit by sending false signals about flu rates.

The most interesting question about Flu Trends, though, is what other trends might be identifiable via search terms. Government might use similar methods to look for outbreaks of more virulent diseases, and businesses might look for cultural trends. In all of these cases, manipulation will be a risk.

There’s an interesting analogy to web linking behavior. When the web was young, people put links in their sites to point readers to other interesting sites. But when Google started inferring sites’ importance from their incoming links, manipulators started creating links solely for their effect on Google’s rankings. The result was an ongoing cat-and-mouse game between search engines and manipulators. The more search behavior takes on commercial value, the more manipulators will want to change search behavior for commercial or cultural advantage.

Anything that is valuable to measure is probably, to someone, valuable to manipulate.


The future of photography

Several interesting things are happening in the wild world of digital photography as it’s colliding with digital video. Most notably, the new Canon 5D Mark II (roughly $2700) can record 1080p video and the new Nikon D90 (roughly $1000) can record 720p video. At the higher end, Red just announced cameras, shipping next year, that will be able to record full-motion video (as fast as 120 frames per second in some cases) at far greater than HD resolutions (for $12K, you can record video at a staggering 6000×4000 pixels). You can configure a Red camera as a still camera or as a video camera.

Recently, well-known photographer Vincent Laforet (perhaps best known for his aerial photographs, such as “Me and My Human”) got his hands on a pre-production Canon 5D Mark II and filmed a “mock commercial” called “Reverie”, which shows off what the camera can do, particularly its see-in-the-dark low-light abilities. If you read Laforet’s blog, you’ll see that he’s quite excited, not just about the technical aspects of the camera, but about what this means to him as a professional photographer. Suddenly, he can leverage all of the expensive lenses that he already owns and capture professional-quality video “for free.” This has all kinds of ramifications for what it means to cover an event.

For example, at professional sporting events, video rights are entirely separate from the “normal” still photography rights given to the press. It’s now the case that every pro photographer is every bit as capable of capturing full-resolution video as the TV crew covering the event. Will still photographers be contractually banned from using the video features of their cameras? Laforet investigated while he was shooting the Beijing Olympics:

Given that all of these rumours were going around quite a bit in Beijing [prior to the announcement of the Nikon D90 or Canon 5D Mark II] – I sat down with two very influential people who will each be involved at the next two Olympic Games. Given that NBC paid more than $900 million to acquire the U.S. Broadcasting rights to this past summer games, how would they feel about a still photographer showing up with a camera that can shoot HD video?

I got the following answer from the person who will be involved with Vancouver which I’ll paraphrase: Still photographers will be allowed in the venues with whatever camera they chose, and shoot whatever they want – shooting video, in and of itself, is not a problem. HOWEVER – if the video is EVER published – the lawsuits will inevitably be filed, and credentials revoked etc.

This to me seems like the reasonable thing to do – and the correct approach. But the person I spoke with who will be involved in the London 2012 Olympic Games had a different view, again I paraphrase: “Those cameras will have to be banned. Period. They will never be allowed into any Olympic venue” because the broadcasters would have a COW if they did. And while I think this is not the best approach – I think it might unfortunately be the most realistic. Do you really think that the TV producers and rights-owners will “trust” photographers not to broadcast anything they’ve paid so much for. Unlikely.

Let’s do a thought experiment. Red’s forthcoming “Scarlet FF35 Mysterium Monstro” will happily capture 6000×4000 pixels at 30 frames per second. If you multiply that out, assuming 8 bits per pixel (after modest compression), you’re left with the somewhat staggering data rate of 720MB/s (i.e., 2.6TB/hour). Assuming you’re recording that to the latest 1.5TB hard drives, that means you’re swapping media every 30 minutes (or you’re tethered to a RAID box of some sort). Sure, your camera now weighs more and you’re carrying around a bunch of hard drives (still lost in the noise relative to the weight that a sports photographer hauls around in those long telephoto lenses), but you manage to completely eliminate the “oops, I missed the shot” issue that dogs any photographer. Instead, the “shoot” button evolves into more of a bookmarking function. “Yeah, I think something interesting happened around here.” It’s easy to see photo editors getting excited by this. Assuming you’ve got access to multiple photographers operating from different angles, you can now capture multiple views of the same event at the same time. With all of that data, synchronized and registered, you could even do 3D reconstructions (made famous/infamous by the “bullet time” videos used in the Matrix films or the Gap’s Khaki Swing commercial). Does the local newspaper have the rights to do that to an NFL game or not?
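The arithmetic above is worth checking. Using decimal units (as drive makers do) and the one-byte-per-pixel assumption:

```python
# Back-of-the-envelope check of the Scarlet thought experiment:
# 6000x4000 pixels, 30 fps, 1 byte/pixel after modest compression.
width, height, fps, bytes_per_pixel = 6000, 4000, 30, 1
rate = width * height * fps * bytes_per_pixel  # bytes per second

print(rate / 1e6)            # 720.0 MB/s
print(rate * 3600 / 1e12)    # 2.592 TB/hour
print(1.5e12 / rate / 60)    # ~34.7 minutes to fill a 1.5 TB drive
```

About 35 minutes per drive, in the same ballpark as the half-hour figure above.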

Of course, this sort of technology is going to trickle down to gear that mere mortals can afford. Rather than capturing every frame, maybe you now only keep a buffer of the last ten seconds or so, and when you press the “shoot” button, you get to capture the immediate past as well as the present. Assuming you’ve got a sensor that lets you change the exposure on the fly, you can also now imagine a camera capturing a rapid succession of images at different exposures. That means no more worries about whether you over- or under-exposed your image. In fact, the camera could just glue all the images together into a high-dynamic-range (HDR) image, which sometimes yields fantastic results.
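That "capture the immediate past" idea is just a ring buffer. A sketch of the concept (not any real camera's firmware):

```python
from collections import deque

class RollingCapture:
    """Keep only the most recent `seconds` worth of frames; pressing
    'shoot' freezes that buffer, so the moment just before the button
    press is never lost."""
    def __init__(self, seconds, fps):
        self.buffer = deque(maxlen=seconds * fps)
    def on_frame(self, frame):
        self.buffer.append(frame)  # oldest frame falls off automatically
    def shoot(self):
        return list(self.buffer)   # the recent past, plus the present

cam = RollingCapture(seconds=10, fps=30)
for t in range(1000):              # about 33 seconds of frames arrive
    cam.on_frame(t)
clip = cam.shoot()
print(len(clip), clip[0], clip[-1])  # 300 700 999
```

The "shoot" button becomes a bookmarking function: it marks which slice of an always-running recording to keep.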

One would expect, in the cutthroat world of consumer electronics, that competition would bring features like this to market as fast as possible, although that’s far from a given. If you install third-party firmware on a Canon point-and-shoot, you get all kinds of functionality that the hardware can support but which Canon has chosen not to implement. Maybe Canon would rather you spend more money for more features, even if the cheaper hardware is perfectly capable. Maybe they just want to make common features easy to use and not overly clutter the UI. (Not that any camera vendors are doing particularly well on ease of use, but that’s a topic for another day.)

Freedom to Tinker readers will recognize some common themes here. Do I have the right to hack my own gear? How will new technology impact old business models? In the end, when industries collide, who wins? My fear is that the creative freelance photographer, like Laforet, is likely to get pushed out by the big corporate sponsor. Why allow individual freelancers to shoot a sports event when you can just spread professional video cameras all over the place and let newspapers buy stills from those video feeds? Laforet discussed these issues at length; his view is that “traditional” professional photography, as a career, is on its way out and the future is going to be very, very different. There will still be demand for the kind of creativity and skills that a good photographer can bring to the game, but the new rules of the game have yet to be written.


Total Election Awareness

Ed recently made a number of predictions about election day (“Election 2008: What Might Go Wrong”). In terms of long lines and voting machine problems, his predictions were pretty spot on.

On election day, I was one of a number of volunteers for the Election Protection Coalition at one of 25 call centers around the nation. Kim Zetter describes the OurVoteLive project, in which 100 non-profit organizations and ten thousand volunteers answered 86,000 calls through a 750-line call-center operation (“U.S. Elections — It Takes a Village”):

The Election Protection Coalition, a network of more than 100 legal, voting rights and civil liberties groups was the force behind the 1-866-OUR-VOTE hotline, which provided legal experts to answer nearly 87,000 calls that came in over 750 phone lines on Election Day and dispatched experts to address problems in the field as they arose.

Pam Smith of the Verified Voting Foundation made sure each call center had a voting technologist responsible for responding to voting machine reports and advising mobile legal volunteers how to respond on the ground. It was simply a massive operation. Matt Zimmerman and Tim Jones of the Electronic Frontier Foundation and their team get serious props as developers and designers of their Total Election Awareness (TEA) software behind OurVoteLive.

As Kim describes in the Wired article, the call data is all available in CSV, maps, tables, etc. I just completed a preliminary qualitative analysis of the 1800 or so voting equipment incident reports: “A Preliminary Analysis of OVL Voting Equipment Reports”. Quite a bit of data in there with which to inform future efforts.


How Fragile Is the Internet?

With Barack Obama’s election, we’re likely to see a revival of the network neutrality debate. Thus far the popular debate over the issue has produced more heat than light. On one side have been people who scoff at the very idea of network neutrality, arguing either that network neutrality is a myth or that we’d be better off without it. On the other are people who believe the open Internet is hanging on by its fingernails. These advocates believe that unless Congress passes new regulations quickly, major network providers will transform the Internet into a closed network where only their preferred content and applications are available.

One assumption that seems to be shared by both sides in the debate is that the Internet’s end-to-end architecture is fragile. At times, advocates on both sides of the debate seem to think that AT&T, Verizon, and Comcast have big levers in their network closets labeled “network neutrality” that they will set to “off” if Congress doesn’t stop them. In a new study for the Cato Institute, I argue that this assumption is unrealistic. The Internet has the open architecture it has for good technical reasons. The end-to-end principle is deeply embedded in the Internet’s architecture, and there’s no straightforward way to change it without breaking existing Internet applications.

One reason is technical. Advocates of regulation point to a technology called deep packet inspection as a major threat to the Internet’s open architecture. DPI allows network owners to look “inside” Internet packets, reconstructing the web page, email, or other information as it comes across the wire. This is an impressive technology, but it’s also important to remember its limitations. DPI is inherently reactive and brittle. It requires human engineers to precisely describe each type of traffic that is to be blocked. That means that as the Internet grows ever more complex, more and more effort would be required to keep DPI’s filters up to date. It also means that configuration problems will lead to the accidental blocking of unrelated traffic.
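To see the brittleness, picture DPI as a pile of hand-written signatures. This caricature (invented rules for illustration; real DPI matches binary protocol fingerprints, not plain text) blocks the traffic it was told to, and also some traffic it wasn't:

```python
import re

# One hand-written signature per protocol the operator wants blocked.
SIGNATURES = [re.compile(rb"BitTorrent protocol")]

def allowed(payload):
    """Pass the packet unless some signature matches its payload."""
    return not any(sig.search(payload) for sig in SIGNATURES)

print(allowed(b"GET /news HTTP/1.1"))                        # True
print(allowed(b"\x13BitTorrent protocol\x00peer-id"))        # False, as intended
print(allowed(b"an article about the BitTorrent protocol"))  # False: collateral damage
```

A new protocol, or an old one with a tweaked handshake, sails past until an engineer writes a fresh rule; meanwhile, innocent traffic that happens to match an existing rule gets dropped.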

The more fundamental reason is economic. The Internet works as well as it does precisely because it is decentralized. No organization on Earth has the manpower that would have been required to directly manage all of the content and applications on the Internet. Networks like AOL and Compuserve that were managed that way got bogged down in bureaucracy while they were still a small fraction of the Internet’s current size. It is not plausible that bureaucracies at Comcast, AT&T, or Verizon could manage their TCP/IP networks the way AOL ran its network a decade ago.

Of course what advocates of regulation fear is precisely that these companies will try to manage their networks this way, fail, and screw the Internet up in the process. But I think this underestimates the magnitude of the disaster that would befall any network provider that tried to convert their Internet service into a proprietary network. People pay for Internet access because they find it useful. A proprietary Internet would be dramatically less useful than an open one because network providers would inevitably block an enormous number of useful applications and websites. A network provider that deliberately broke a significant fraction of the content or applications on its network would find many fewer customers willing to pay for it. Customers that could switch to a competitor would. Some others would simply cancel their home Internet service and rely instead on Internet access at work, school, libraries, etc. And many customers that had previously taken higher-speed Internet service would downgrade to basic service. In short, even in an environment of limited competition, reducing the value of one’s product is rarely a good business strategy.

This isn’t to say that ISPs will never violate network neutrality. A few have done so already. The most significant was Comcast’s interference with the BitTorrent protocol last year. I think there’s plenty to criticize about what Comcast did. But there’s a big difference between interfering with one networking protocol and the kind of comprehensive filtering that network neutrality advocates fear. And it’s worth noting that even Comcast’s modest interference with network neutrality provoked a ferocious response from customers, the press, and the political process. The Comcast/BitTorrent story certainly isn’t going to make other ISPs think that more aggressive violations of network neutrality would be a good business strategy.

So it seems to me that new regulations are unnecessary to protect network neutrality. They are likely to be counterproductive as well. As Ed has argued, defining network neutrality precisely is surprisingly difficult, and enacting a ban without a clear definition is a recipe for problems. In addition, there’s a real danger of what economists call regulatory capture—that industry incumbents will find ways to turn regulatory authority to their advantage. As I document in my study, this is what happened with 20th-century regulation of the railroad, airline, and telephone industries. Congress should proceed carefully, lest regulations designed to protect consumers from telecom industry incumbents wind up protecting incumbents from competition instead.


Innovation vs. Safety in Self-driving Technologies

Over at Ars Technica, the final installment of my series on self-driving cars is up. In this installment I focus on the policy implications of self-driving technologies, asking about regulation, liability, and civil liberties.

Regulators will face a difficult trade-off between safety and innovation. One of the most important reasons for the IT industry’s impressive record of innovation is that the industry is lightly regulated and the basic inputs are cheap enough that almost anyone can enter the market with new products. The story of the innovative company founded in someone’s garage has become a cliche, but it also captures an important part of what makes Silicon Valley such a remarkable place. If new IT products were only being produced by large companies like Microsoft and Cisco, we’d be missing out on a lot of important innovation.

In contrast, the automobile industry is heavily regulated. Car manufacturers are required to jump through a variety of hoops to prove to the government that new cars are safe, have acceptable emissions, get sufficient gas mileage, and so forth. There are a variety of arguments for doing things this way, but one important consequence is that it makes it harder for a new firm to enter the market.

These two very different regulatory philosophies will collide if and when self-driving technologies mature. This software, unlike most other software, will kill people if it malfunctions. And so people will be understandably worried about the possibility that just anyone can write software and install it in their cars. Indeed, regulators are likely to want to apply the same kind of elaborate testing regime to car software that now applies to the rest of the car.

On the other hand, self-driving software is in principle no different from any other software. It’s quite possible that a brilliant teenager could produce dramatically improved self-driving software from her parents’ basement. If we limit car hacking to those engineers who happen to work for a handful of large car companies, we may be foregoing a lot of beneficial progress. And in the long run, that may actually cost lives by depriving society of potentially lifesaving advances in self-driving technology.

So how should the balance be struck? In the article, I suggest that a big part of the solution will be a layered architecture. I had previously made the prediction that self-driving technologies will be introduced first as safety technologies. That is, cars will have increasingly sophisticated collision-avoidance technologies. Once car companies have figured out how to make a virtually uncrashable car, it will be a relatively simple (and safe) step to turn it into a fully self-driving one.

My guess is that the collision-avoidance software will be kept around and serve as the lowest layer of a self-driving car’s software stack. Like the kernel in a modern operating system, the collision-avoidance layer of a self-driving car’s software will focus on preventing higher-level software from doing damage, while actual navigational functionality is implemented at a higher level.

One beneficial consequence is that it may be possible to leave the higher levels of the software stack relatively unregulated. If you had software that made it virtually impossible for a human being to crash, then it would be relatively safe to run more experimental navigation software on top of it. If the higher-level software screwed up, the low-level software should detect the mistake and override its instructions.
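To make the layered idea concrete, here is a minimal sketch in Python of how such an architecture might be organized. All of the names here — `CollisionAvoidanceLayer`, `NavigationLayer`, `collision_predicted` — are hypothetical illustrations, not any real car software; the point is only the structural relationship, in which the safety layer is the sole gatekeeper to the actuators.

```python
from dataclasses import dataclass

@dataclass
class Command:
    """A steering/throttle request produced by the navigation layer."""
    steering: float   # -1.0 (full left) to 1.0 (full right)
    throttle: float   # 0.0 (stopped) to 1.0 (full power)

class CollisionAvoidanceLayer:
    """Low-level safety layer: the only code allowed to actuate the car.

    It passes navigation commands through unchanged unless its sensors
    predict a collision, in which case it overrides them with a safe stop.
    """
    def __init__(self, sensors):
        self.sensors = sensors

    def execute(self, cmd: Command) -> Command:
        if self.sensors.collision_predicted(cmd):
            # Override the experimental high-level software with a safe action.
            return Command(steering=0.0, throttle=0.0)
        return cmd

class NavigationLayer:
    """Higher-level, potentially experimental routing logic."""
    def next_command(self) -> Command:
        return Command(steering=0.3, throttle=0.8)

class Sensors:
    """Stand-in sensor model that flags aggressive throttle as unsafe."""
    def collision_predicted(self, cmd: Command) -> bool:
        return cmd.throttle > 0.5

nav = NavigationLayer()
safety = CollisionAvoidanceLayer(Sensors())
actuated = safety.execute(nav.next_command())
print(actuated)  # the unsafe command has been replaced with a safe stop
```

The key design property is that the navigation layer never touches the actuators directly, so a buggy or experimental upper layer can misbehave without endangering anyone — just as a user-space process can crash without taking down the kernel.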

And that, in turn, leaves some hope that the self-driving cars of the future could be a hospitable place for the kind of decentralized experimentation that has made the IT industry so innovative. There are likely to be strict limits on screwing around with the lowest layer of your car’s software stack. But if that layer is doing its job, then it should be possible to allow more experimentation at higher layers without endangering people’s lives.

If you’re interested in more on self-driving cars, Josephine Wolff at the Daily Princetonian has an article on the subject. And next Thursday I’ll be giving a talk on the future of driving here at Princeton.


Bandwidth Needs and Engineering Tradeoffs

Tom Lee wonders about a question that Ed has pondered in the past: how much bandwidth does one human being need?

I’m suspicious of estimates of exploding per capita bandwidth consumption. Yes, our bandwidth needs will continue to increase. But the human nervous system has its own bandwidth limits, too. Maybe there’ll be one more video resolution revolution — HDTV2, let’s say (pending the invention of a more confusing acronym). But to go beyond that will require video walls — they look cool in Total Recall, but why would you pay for something larger than your field of view? — or three-dimensional holo-whatnots. I’m sure the latter will be popularized eventually, but I’ll probably be pretty old and confused by then.

The human fovea has a finite number of neurons, and we’re already pretty good at keeping them busy. Personally, I think that household bandwidth use is likely to level off sometime in the next decade or two — there’s only so much data that a human body can use. Our bandwidth expenses as a percentage of income will then start to fall, partly because the growth in demand will have slowed, partly because incomes continue to rise, and partly because the resource itself will continue to get cheaper as technology improves.
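A quick back-of-envelope calculation shows why video drives these estimates. The figures below are illustrative round numbers, not measurements, and the 200:1 compression ratio is an assumption in the general range achieved by modern video codecs:

```python
# Back-of-envelope: bandwidth needed for high-definition video.
width, height = 1920, 1080      # a 1080p frame
bits_per_pixel = 24             # uncompressed RGB color
frames_per_sec = 60

raw_bps = width * height * bits_per_pixel * frames_per_sec
print(f"raw: {raw_bps / 1e9:.1f} Gbit/s")   # roughly 3 Gbit/s uncompressed

# Modern codecs compress video by roughly two orders of magnitude.
compression_ratio = 200         # assumed, illustrative
compressed_bps = raw_bps / compression_ratio
print(f"compressed: {compressed_bps / 1e6:.0f} Mbit/s")
```

Even generous multiples of that compressed figure — several simultaneous streams, higher resolutions — land within a fixed order of magnitude, which is the intuition behind the claim that per-person demand eventually saturates.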

When thinking about this question, I think it’s important to remember that engineering is all about trade-offs. It’s often possible to substitute one kind of computing resource for another. For example, compression replaces bandwidth or storage with increased computation. Similarly, caching substitutes storage for bandwidth. We recently had a talk by Vivek Pai, a researcher here at Princeton who has been using aggressive caching algorithms to improve the quality of Internet access in parts of Africa where bandwidth is scarce.
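Both substitutions are easy to demonstrate. The sketch below uses Python’s standard `zlib` to trade computation for bandwidth, and a dictionary cache to trade storage for bandwidth; the `fetch` helper and its arguments are hypothetical stand-ins for a real network client.

```python
import zlib

# Compression: spend CPU cycles to shrink the bytes sent over the wire.
payload = b"Lorem ipsum dolor sit amet. " * 500   # a repetitive web payload
compressed = zlib.compress(payload, level=9)
print(len(payload), "->", len(compressed), "bytes on the wire")

# Caching: spend local storage to avoid repeating a fetch entirely.
cache = {}
fetch_count = 0

def fetch(url, network_fetch):
    """Return the resource, hitting the network only on a cache miss."""
    global fetch_count
    if url not in cache:
        fetch_count += 1
        cache[url] = network_fetch(url)
    return cache[url]

# Two requests for the same resource cost only one network round trip.
fetch("http://example.com/page", lambda u: b"response body")
fetch("http://example.com/page", lambda u: b"response body")
print("network fetches:", fetch_count)
```

The aggressive caching work mentioned above is this second trade taken to its logical extreme: when bandwidth is the scarcest resource, it pays to spend a great deal of cheap local storage to conserve it.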

So even if we reach the point where our broadband connections are fat enough to bring in as much information as the human nervous system can process, that doesn’t mean that more bandwidth wouldn’t continue to be valuable. Higher bandwidth means more flexibility in the design of online applications. In some cases, it might make more sense to bring raw data into the home and do calculations locally. In other cases, it might make more sense to pre-render data on a server farm and bring the finished image into the home.

One key issue is latency. People with cable or satellite TV service are used to near-instantaneous, flawless video content, which is difficult to stream reliably over a packet-switched network. So the television of the future is likely to be a peer-to-peer client that downloads anything it thinks its owner might want to see and caches it for later viewing. This isn’t strictly necessary, but it would improve the user experience. Likewise, there may be circumstances where users want to quickly load up their portable devices with several gigabytes of data for later offline viewing.
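The download-and-cache strategy can be sketched in a few lines. Everything here — `download`, `predict_next`, the episode names — is a hypothetical stand-in; the point is that a background thread fills a local buffer ahead of demand, so playback reads from local storage instead of waiting on the network.

```python
import threading
import time

def download(item):
    """Simulated network fetch with artificial latency."""
    time.sleep(0.01)
    return f"data:{item}"

def predict_next():
    """Guess what the viewer will want to watch next (illustrative)."""
    return ["show-ep1", "show-ep2", "show-ep3"]

buffer = {}

def prefetcher():
    # Fetch predicted content ahead of demand, during idle time.
    for item in predict_next():
        buffer[item] = download(item)

t = threading.Thread(target=prefetcher)
t.start()
t.join()

# When the viewer presses play, the content is already local:
print(buffer["show-ep1"])   # served from the buffer, not the network
```

A real client would prefetch continuously in the background and evict old content, but the essential trade is visible even in this toy: latency is hidden by paying for it in advance, with storage.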

Finally, and probably most importantly, higher bandwidth allows us to economize on the time of the engineers building online applications. One of the consistent trends in the computer industry has been towards greater abstraction. There was a time when everyone wrote software in machine language. Now, a lot of software is written in high-level languages like Java, Perl, or Python that run slower but make life a lot easier for programmers. A decade ago, people trying to build rich web applications had to waste a lot of time optimizing their code to achieve acceptable performance on the slow hardware of the day. Today, computers are fast enough that developers can use high-level frameworks that are much more powerful but consume a lot more resources. Developers spend more time adding new features and less time trying to squeeze better performance out of the features they already have, which means users get more and better applications.

The same principle is likely to apply to increased bandwidth, even beyond the point where we all have enough bandwidth to stream high-def video. Right now, web developers need to pay a fair amount of attention to whether data is stored on the client or the server and how to efficiently transmit it from one place to another. A world of abundant bandwidth will allow developers to do whatever makes the most sense computationally without worrying about the bandwidth constraints. Of course, I don’t know exactly what those frameworks will look like or what applications they will enable, but I don’t think it’s too much of a stretch to think that we’ll be able to continue finding uses for higher bandwidth for a long time.