April 26, 2024

Source Code and Object Code

[This item is long and geeky. Sorry about that. I hope that at least some of you will find it interesting. The rest of you can skip right to the (slightly) pithier items below.]

When lawyers discuss software, they typically draw distinctions between source code and object code. These distinctions often fail to account for the nature of real code, and so are less useful than they seem. Indeed, they are often misleading.

Here’s an example of a source/object code distinction. Larry Lessig proposes that object code be subject to copyright (as it is today), but only if the associated source code is put in escrow, to be released when the copyright expires (a requirement that does not exist today).

The source/object code distinction assumes that code is developed in a particular way, which I’ll call the Standard Scenario for short. In the Standard Scenario, the programmer writes code in a textual form (“source code”), and this code is translated (by a program called a compiler) into another form (“object code”) which can be executed directly by a computer. The code is then distributed in object code form.

In the Standard Scenario, there are indeed two forms of code. Source code is human-readable but cannot be executed directly. Object code is not human readable but can be executed by a computer. Much information is lost in the translation from source to object code, so the object code cannot be used to reconstruct fully the original source code.

The problem is that the Standard Scenario is becoming less and less standard.

As David Reed points out, sometimes there is only one form of code. In a scripting language like Perl, the programmer writes source code. However, rather than translating that code into object code, the code is fed directly to a program called an interpreter, which carries out the code’s instructions – but without ever translating the code into object code. So there is no object code, and the code is distributed and executed in the original source code form.

What Reed doesn’t say is that sometimes there are more than two forms of code. In Java, for instance, a programmer writes code in the Java language. This code is translated by a compiler into a form called “bytecode” which, contrary to the Standard Scenario, is not object code, i.e. it cannot be executed directly by a computer. The code is distributed in bytecode form, and it can be executed in one of two ways: by feeding it directly to an interpreter, or by having the consumer translate it into object code. (The latter is now more common.) Thus, in Java there are (usually but not always) three forms of code: one for writing the program, one for distributing it, and a third one for executing it.

But wait – it gets worse. Even object code can get translated into another form before being executed. For example, Intel’s popular Pentium II microprocessor, which is the heart of many PCs, takes the “object code” that it is given and translates it into another form called micro-op code. It is the micro-op code that actually gets executed by the processor. This translation happens right inside the microprocessor, and is done because the chip’s designers found it easier to execute the translated form of code than the original object code.

The upshot of all this is that the tidy assumptions made in many legal analysis often do not hold. Object code is not the only form of code that can be executed. Source code can sometimes be executed without an intervening translation to object code. Executable code can be human-readable. Code can be distributed in a form that is neither source code nor object code. Object code can be translated further. An analysis that simply says “Treat source code this way, and object code that way” is not complete.

This is not to say that the situation is hopeless. Sometimes you know you’re in the Standard Scenario, and none of these complications arise. Even if you’re in a different scenario, there are sensible distinctions to be drawn and sensible conclusions that can be reached. What you need is a rule that will apply to code in any of its myriad forms.

As an example, we can rewrite Lessig’s rule in a way that accounts for all of the different types of code. Where Lessig said, “You can copyright object code, as long as you escrow the source code,” we can say, “You can copyright any form of code, as long as you escrow that code in the form in which you customarily read and edited it.” This captures the spirit of Lessig’s rule, which is that once the copyright expires, anyone should be able to read and modify the code just as readily as the initial author could. (Lessig understands code pretty well – for a lawyer – so let’s give him the benefit of the doubt and assume that this is what he meant all along.)

Other legal rules might not fare so well. For example, a rule saying “source code is constitutionally protected speech, but executable code is not” would be incomplete and inconsistent. (It might also be wrong constitutionally, but let’s ignore that issue.) A more clearly stated rule might say “Code is constitutionally protected speech if it is human-readable,” which is essentially the same as saying “… if a human reader can extract meaning from it.” Now we have a rule that is complete and consistent, but we no longer have the illusion of a bright line rule.