April 18, 2014

Luis Villa

avatar

Maybe "Open Source" Cars Aren't So Crazy After All

I wrote last week about the case for open source car software and, lo and behold, BMW might be pushing forward with the idea- albeit not in self-driving cars quite yet. ;)

Tangentially, I put “open source” in scare quotes because the car scenario highlights a new but important split in the open source and free software communities. The Open Source Initiative’s open source definition allows use of the term ‘open source’ to describe code which is available and modifiable but not installable on the intended device. Traditionally, the open source community assumed that if source was available, you could make modifications and install those modifications on the targeted hardware. This was a safe assumption, since the hardware in question was almost always a generative, open PC or OS. That is no longer the case- as I mentioned in my original car article, one might want to sign binaries so that not just anyone could hack on their cars, for example. Presumably even open source voting machines would have a similar restriction.

Another example appears to be the new ‘google phone’ (G1 or Android). You can download several gigs of source code now, appropriately licensed, so that the code can be called ‘open source’ under the OSI’s definition. But apparently you can’t yet modify that code and install the modified binaries to your own phone.

The new GPL v3 tries to address this issue by requiring (under certain circumstances) that GPL v3′d code be installable on devices with which it is shipped. But to the best of my knowledge no other license is yet requiring this, and the v3 is not yet widespread enough to put a serious dent in this trend.

Exactly how ‘open’ code like this is is up for discussion. It meets the official definition, but the inability to actually do much with the code seems like it will limit the growth of constructive community around the software for these types of devices- phones, cars, or otherwise. This issue bears keeping in mind when thinking about openness for source code of closed hardware- you will certainly see ‘open source’ tossed around a lot, but in this context, it may not always mean what you think it does.

avatar

Hot Custom Car (software?)

I’ve found Tim’s bits on life post-driving interesting. I’ve sometimes got a one-track mind, though- so what I really want to know is if I’ll be able to hack on that self-driving car. I mentioned this to Tim, and he said he wasn’t sure either- so here is my crack at it.

We’re not very good at making choices like this. Historically, liability constrained software development at large institutions (the airlines had a lot of reasons not to let people to hack on their airplanes) and benign neglect was sufficient to regulate hacking of personal software- if you hacked your PC or toaster, no one cared because it had no impact (a form of Lessig’s regulation by architecture). The net result was that we didn’t need to regulate software very much, we got lots of innovation from individual developers, and we stayed bad at making choices like ‘how should we regulate people’s ability to hack?’

Individuals are now beginning to own hackable devices that can also harm the neighbors, though, so the space in between large institution and isolated hacker is filling up. For example, the FCC regulates your ability to modify your own wireless devices, so that you can’t interfere with other people’s spectrum. And some of Prof. Jonathan Zittrain’s analysis suggests that we might want to even regulate PCs, since they can now frequently be vectors for spam and viruses. Tim and I are normally fairly anti-regulation, and pro-open source, but even we are aware that cars running around all over the place driven by by potentially untested code might also fit in this gap- and be more worthy of regulation.

So what should happen? Should we be able to hack our cars (more than we already do), and if so, under what conditions?

It’d help if we could better measure the risks and benefits involved. Unfortunately, probably because we regulate software so rarely, our metrics for assessing the risks and benefits of software development aren’t very good. One such metric is Prof. Zittrain’s ‘generativity’; Dan Wallach’s proposal to measure the ‘O(n)’ of potential system damage is another. Neither are perfect fits here, but that only confirms that we need more such tools in our software policy toolkit.

This lack of tools shouldn’t stop us from some basic, common-sense analysis, though. On the pro side, the standard arguments for open source apply, though perhaps not as strongly as usual, since many casual hackers might be discouraged at the thought of hacking their own car. We probably would want car manufacturers to pool their safety expertise, which would be facilitated by openness. Finally, we might also want open code for auditing reasons- with millions of lives on the line, this seems like a textbook case for wanting ‘many eyes‘ to take a look at the code.

If we accept these arguments on the ‘pro’ hacking side, what then? First, we could require that the car manufacturers use test-driven development, and share those tests with the public- perhaps even allowing the public to add new tests. This would help avoid serious safety problems in the ‘original’ code, and home hackers might be blocked from loading new code into their cars unless the code was certified to have passed the tests. Second, we could treat the consequences very seriously- ‘driving’ with bad code could be treated similarly to DUI. Third, we could make sure that the safety fallbacks (emergency brake systems, etc.) are in separate, redundant (and perhaps only mechanical?) unhackable systems. Having such systems is going to be good engineering whether the code is open or not, and making them unhackable might be a good compromise. (Serious engineers, instead of compsci BAs now in law school, should feel free to suggest other techniques in the comments.)

Bottom line? First, we don’t really know- we just have pretty poor analytical tools for this type of problem. But if we take a stab at it, we can see that there are some potential solutions that might be able to harness the innovation and generativity of open source in our cars without significantly compromising our safety. At least, not compromising it any moreso than the already crazy core idea :)

[picture is 'Car Show 2', by starmist1, used under the CC-BY license.]

avatar

Cloud(s), Hype, and Freedom

Richard Stallman’s recent description of ‘the cloud’ as ‘hype’ and a ‘trap’ seems to have stirred up a lot of commentary, but not a lot of clear discussion of the problems Stallman raised. This isn’t surprising- the term ‘the cloud’ has always been vague. (It was hard to resist saying ‘cloudy.’ ;) When people say ‘the cloud’ they are really lumping at least four ‘cloud types’ together.

traditional applications, hosted elsewhere

Probably the most common type of ‘cloud’ is a service that takes a traditional software functionality and moves it to remotely hosted, (typically) web-delivered servers. Gmail and salesforce.com are like this- fairly traditional email and CRM applications, ‘just’ moved to the web.

If Stallman’s ‘hype’ claim is valid anywhere, it is here. Administration and maintenance costs are definitely lower when an expert like Google funds and runs the server, and reliability may improve as well. But the core functionality of these apps, and the ability to access data over a network, have been present since the dawn of networked computing. On average, this is undoubtedly a significant change in quality, but only rarely a change in type- making the buzz much harder to justify.

Stallman’s ‘trap’ charge is more complex. Computer users have long compromised on personal control by storing data remotely but accessing it via standardized protocols. This introduced risks- you had to trust the data host and couldn’t tinker with the server- but kept some controls- you could switch clients, and typically you could export the data. Some web apps still strike that balance- for example, most gmail features are accessible via good old POP and IMAP. But others don’t.

Getting your data out of a service like salesforce can be a ‘hidden cost’ of an apparently free service, and even with a relatively standards-based service like gmail you have no freedom to make changes to the server. These risks are what Stallman means when he talks about a ‘trap’, and regardless of your conclusion about them, understanding them is important.

services involving data that can’t (yet) be managed locally

Google Maps and Google Search are the canonical examples of this type of cloud service- heaps of data so large that one would need a large data center to host your own copy and a very, very fat pipe to keep it up-to-date.

Hype-wise, these are a mixed bag. These services definitely bring radical new functionality that traditionally can’t exist- I can’t store all of google maps on my phone. That hype is justified. At the same time, our personal ability to store and process data is still growing quickly, so the claims that this type of cloud service will always ‘require’ remote servers may be overblown.

‘Trap’-wise? Dependence on these services reminds me of ‘dependence’ on a library before the internet- you can work to make sure your library respects your privacy, prefer public libraries to private ones, or establish a personal library if your reading interests are narrow, but in the end eschewing large libraries is likely to be a case of cutting off your nose to spite your face. We’re in the same state with this type of cloud service. You can avoid them, but those concerned with freedom might be better off understanding and fixing them than condemning them altogether.

services that make creation of new data technically or economically feasible

Facebook and wikipedia are the canonical examples here. Unlike the first two types of cloud, where data was available but inconvenient before it ended up in the cloud, this class of cloud applications creates information that wasn’t previously feasible to collect at all.

There may well not be enough hype around this type of cloud. Replicating web scale collaborative facilities like these will be very difficult to do in a p2p fashion, and the impact of the creation of new information (even when it is as mundane as facebook’s data often is) is hard to understate.

Like the previous type of cloud, it is hard to call these a trap per se- they do make it hard to leave, but they do so by providing new functionality that is very hard to get with any traditional software model.

services offering computing and storage, rather than data

The most recent type of cloud service is remotely provisioned computing and storage, like Amazon’s EC2/S3 and Google’s App Engine. This is perhaps the most purely generative type of cloud, allowing individuals to create new services and scale them out to service millions of people without having to invest in their own physical infrastructure. It is hard to see any way in which this can reasonably be called ‘hype,’ given the reach it allows individuals and small or transient groups to have which might otherwise cost them many thousands of dollars.

From a freedom perspective, these can be both the best and worst of the cloud types. On the plus side, these services can be incredibly transparent- developers who use them directly have access to their own source code, and end users may not know they are using them at all. On the down side, especially for proprietary platforms like App Engine, these can have very deep lock-in- it is complicated, expensive, and risky to switch deployment platforms after achieving success. And they replace traditional, very open platforms- a tradeoff that isn’t always appreciated.

takeaways

‘The cloud’ isn’t going away, but hopefully we can clarify our thinking about it by talking about the different types of clouds. Hopefully this post is a useful step in that direction.

[This post is an extension of some ideas I've been playing around with on my own blog and at the autonomo.us group blog; readers curious about these issues may want to read further in those places. I also recommend reading this piece, which set me on the (very long) road to this particular post.]

avatar

Political Information Overload and the New Filtering

[We're pleased to introduce Luis Villa as a guest blogger. Luis is a law student at Columbia Law School, focusing on law and technology, including intellectual property, telecommunications, privacy, and e-commerce. Outside of class he serves as Editor-in-Chief of the Science and Technology Law Review. Before law school, Luis did great work on open source projects, and spent some time as "geek in residence" at the Berkman Center. -- Ed]

[A big thanks to Ed, Alex, and Tim for the invitation to participate at Freedom To Tinker, and the gracious introduction. I'm looking forward to my stint here. -- Luis]

A couple weeks ago at the Web 2.0 Expo NY, I more-or-less stumbled into a speech by Clay Shirky titled “It’s Not Information Overload, It’s Filter Failure.” Clay argues that there has always been a lot of information, so our modern complaints about information overload are more properly ascribed to a breakdown in the filters – physical, economic, and social- that used to keep information at bay. This isn’t exactly a shockingly new observation, but now that Clay put it in my head I’m seeing filters (or their absence) everywhere.

In particular, I’m seeing lots of great examples in online politics. We’ve probably never been so deluged by political information as we are now, but Clay would argue that this is not because there is more information- after all, virtually everyone has had political opinions for ages. Instead, he’d say that the old filters that kept those opinions private have become less effective. For example, social standards used to say ‘no politics at the dinner table’, and economics used to keep every Luis, Ed, and Alex from starting a newspaper with an editorial page. This has changed- social norms about politics have been relaxed, and ‘net economics have allowed for the blooming of a million blogs and a billion tweets.

Online political filtering dates back at least to Slashdot’s early attempts to moderate commenters, and criticism of them stretches back nearly as far. But the new deluge of political commentary from everyone you know (and everyone you don’t) rarely has filtering mechanisms, norms, or economics baked in yet. To a certain extent, we’re witnessing the birth of those new filters right now. Among the attempts at a ‘new filtering’ that I’ve seen lately:

  • The previously linked election.twitter.com. This is typical of the twitter ‘ambient intimacy‘ approach to filtering- everything is so short and so transient that your brain does the filtering for you (or so it is claimed), giving you a 100,000 foot view of the mumblings and grumblings of a previously unfathomably vast number of people.
  • fivethirtyeight.com: an attempt to filter the noise of the thousands of polls into one or two meaningful numbers by applying mathematical techniques originally developed for analysis of baseball players. The exact algorithms aren’t disclosed, but the general methodologies have been discussed.
  • The C-Span Debate Hub: this has not reached its full potential yet, but it uses some Tufte-ian tricks to pull data out of the debates, and (in theory) their video editing tool could allow for extensive discussion of any one piece of the debate, instead of the debate as a whole- surely a way to get some interesting collection and filtering.
  • Google’s ‘In Quotes’: this takes one first step in filtering (gathering all candidate quotes in one place, from disparate, messy sources) but then doesn’t build on that.

Unfortunately, I have no deep insights to add here. Some shallow observations and questions, instead:

  • All filters have impacts- keeping politics away from the dinner table tended to mute objections to the status quo, the ‘objectivity’ of the modern news media filter may have its own pernicious effects, and arguably information mangled by PowerPoint can blow up Space Shuttles. Have the designers of these new political filters thought about the information they are and are not presenting? What biases are being introduced? How can those be reduced or made more transparent?
  • In at least some of these examples the mechanisms by which the filtering occurs are not a matter of record (538′s math) or are not well understood (twitter’s crowd/minimal attention psychology). Does/should that matter? What if these filters became ‘dominant’ in any sense? Should we demand the source for political filtering algorithms?
  • The more ‘fact-based’ filters (538, inquotes) seem more successful, or at least more coherent and comprehensive. Are opinions still just too hard to filter with software or are there other factors at work here?
  • Slashdot’s nearly ten year old comment moderation system is still quite possibly the least bad filter out there. None of the ‘new’ politics-related filters (that I know of) pulls together reputation, meta-moderation, and filtering like slashdot does. Are there systemic reasons (usability, economics, etc.?) why these new tools seem so (relatively) naive?

We’re entering an interesting time. Our political process is becoming both less and more mediated- more ‘susceptible to software’ in Dave Weinberger’s phrase. Computer scientists, software interaction designers, and policy/process wonks would all do well to think early and often about the filters and values embedded in this software, and how we can (and can’t) ‘tinker’ with them to get the results we’d like to see.