February 2, 2025

Still More on Programs vs. Data

My previous postings on the program vs. data distinction have drawn quite a few comments. (To see them, click the “followups” links on my previous postings.) I’m going to let the conversation settle a bit before commenting again. But just to stir things up, here is another challenging case.

Some programs are never meant to be executed. They do instruct a computer to do something, but their author never intends them to be executed, and they never are executed. Why would somebody write a program that they know will never be executed? Let me explain.

Suppose you are studying mathematical logic, and you want to know whether a particular assertion can be proven from some set of initial axioms. It turns out that there is a deep equivalence between mathematical logic and programming languages, such that a given statement is provable in a logic if and only if a computer program of a certain type exists. It follows that you can show that a statement is provable by demonstrating the existence of a certain type of program. You don’t have to run the program; you just have to show that it is a valid program according to the rules of a particular programming language.
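
As a small illustration of the idea, here is a sketch in Lean, a language built around this equivalence. The theorem names are hypothetical, chosen only for this example; they are not the programs from my papers. Each statement is written as a type, and the term after `:=` is a program of that type. The type checker accepting the term is what establishes the statement; nothing is ever run.

```lean
-- Statement: from A, and from "A implies B", we may conclude B.
-- The proof is a program: apply the function `hab` to the argument `ha`.
theorem apply_implication (A B : Prop) (ha : A) (hab : A → B) : B :=
  hab ha

-- Statement: "A and B" implies "B and A".
-- The proof is a program that takes a pair apart and rebuilds it in the other order.
theorem swap_conjunction (A B : Prop) (h : A ∧ B) : B ∧ A :=
  ⟨h.2, h.1⟩
```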

This isn’t just a theoretical parlor trick. Researchers often use this equivalence to show the provability of logical statements by exhibiting the existence of programs. I myself have done this and have published two papers about it. In the course of doing this, I wrote lots of program code, but never once did I run any of it.

If you believe that the author’s intent is what makes something a program, then the code I wrote is not a program, since I never meant it to control a computer (and it never did). If, on the other hand, you believe that program-ness is inherent in the potential capability of the object itself, then my code is a program.

Spread of the Slammer/Sapphire Worm

A new paper by well-regarded networking researchers analyzes the spread of the recent Slammer/Sapphire worm. The worm spread at astonishing speed, doubling the number of infected hosts every 8.5 seconds, and infecting 90% of the susceptible machines on the Net within ten minutes. Researchers had predicted that such fast-spreading worms could exist, but this is the first one seen in the wild.
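
To get a feel for what an 8.5-second doubling time means, here is a rough back-of-the-envelope sketch. It assumes pure exponential growth from a single infected host, which is only a reasonable model early in an outbreak; the target of 75,000 hosts is an illustrative assumption, not a figure taken from this post.

```python
import math

DOUBLING_TIME_S = 8.5  # doubling time quoted above


def infected_after(seconds, initial=1):
    """Hosts infected after `seconds`, assuming unchecked exponential growth."""
    return initial * 2 ** (seconds / DOUBLING_TIME_S)


def seconds_to_reach(target, initial=1):
    """Time for the infection to grow from `initial` to `target` hosts."""
    return DOUBLING_TIME_S * math.log2(target / initial)


# Hypothetical population size, chosen only for illustration.
print(round(seconds_to_reach(75_000)))  # a little over two minutes
```

Real growth slows as the worm runs out of uninfected machines to find, so the observed time to saturate the susceptible population is longer than this idealized figure. Either way, the timescale is far shorter than any human response.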

The clear lesson is that a network attack can cause very widespread damage before human network operators can react. Only a widely implemented, automated shutdown procedure can hope to slam the door on a worm like this. Fortunately, Slammer/Sapphire did not carry a malicious payload. The next time we may not be so fortunate.

[Thanks to Sue Ferrara for the link.]

More on Programs vs. Data

Karl-Friedrich Lenz reacts to my previous posting on how to distinguish programs from data, by insisting on the importance of having a simple definition of “program.” He is right about the value of a simple definition. And he is right to observe that my previous posting doesn’t argue against the existence of such a definition, although it does imply that the definition might be difficult to apply in practice. Lenz suggests a simple definition: “If the object instructs a computer to do something, it is a program. The remaining cases are data.”

Maximillian Dornseif weighs in with a thought-provoking example to show the difficulty of applying this seemingly simple definition.

Another troublesome case arises in logic programming, a style of programming that is supported by programming languages like Prolog. In logic programming, you don’t tell the computer what to do or even how to do it. Instead, you specify the attributes of an object you want, and the computer figures out how to find or construct such an object. You state facts and relationships, and then you ask a question. At no point do you tell the computer what steps to execute or how to go about doing anything; that is all handled by a pre-packaged program called the Prolog interpreter. Computer scientists talk about Prolog programs, but a Prolog program doesn’t seem to meet Lenz’s definition.
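
To make the division of labor concrete, here is a minimal sketch in Python rather than Prolog. The facts and the question are plain data; the only thing that contains steps is a small generic solver, which stands in for the Prolog interpreter. The names `solve`, `unify`, `FACTS`, and `QUERY` are invented for this example, and the solver handles only facts and conjunctive questions, a tiny fraction of what real Prolog does. Following Prolog's convention, names starting with a capital letter are variables.

```python
def unify(goal, fact, bindings):
    """Match one goal against one fact; return extended bindings, or None."""
    if len(goal) != len(fact):
        return None
    new = dict(bindings)
    for g, f in zip(goal, fact):
        if g[:1].isupper():  # a variable: bind it, or check the existing binding
            if new.setdefault(g, f) != f:
                return None
        elif g != f:  # a constant: must match exactly
            return None
    return new


def solve(goals, facts, bindings=None):
    """Yield every set of variable bindings that satisfies all the goals."""
    bindings = bindings or {}
    if not goals:
        yield bindings
        return
    for fact in facts:
        new = unify(goals[0], fact, bindings)
        if new is not None:
            yield from solve(goals[1:], facts, new)


# The "program": facts only, with no steps to execute.
FACTS = [
    ("parent", "alice", "bob"),
    ("parent", "bob", "carol"),
    ("parent", "bob", "dave"),
]

# The question: who are alice's grandchildren?
QUERY = [("parent", "alice", "P"), ("parent", "P", "C")]

print([answer["C"] for answer in solve(QUERY, FACTS)])  # ['carol', 'dave']
```

Nothing in `FACTS` or `QUERY` says how to search. Whether that listing "instructs" the computer is exactly the question.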

Now, we might try to stretch Lenz’s definition by saying that the Prolog program, even if it is only a listing of facts, does “instruct” the computer to do something, because the programmer wrote it knowing that it would cause the computer to behave in a certain way. But such a definition is too broad. A Word document is written with the purpose of causing the computer to do something, but that doesn’t make it a program. Besides, it seems unsatisfactory to call something a program or not based on the state of mind of its author.

Still, I’m not giving up on the quest for a simple definition.

Standards, or Collusion?

John T. Mitchell at InteractionLaw writes about the potential antitrust implications of backroom deals between copyright owners and technology makers.

If a copyright holder were to agree with the manufacturers of the systems for making lawful copies and of the systems for playing them to eliminate all trade in lawful copies unless each transaction (each resale, trade, gift or rental) has the consent of the copyright holder, there is of course no doubt that such agreement would constitute a naked restraint of trade. If, instead, the copyright holder agreed with the manufacturers of copying and playing technologies to deploy a system which simply obeys the instructions of the copyright holder (including instructions which have the purpose and effect of eliminating the resale, trade, gift or rental of the copy, or of enlarging the copyright monopoly by charging for private performances), then the agreement to have technology automatically do the deed is certainly no better than the first. It is akin to a company saying to the prospective co-conspirator: “Listen, I can’t agree with you to do what you are asking because my lawyers tell me it would be illegal, so what I’ll do is program my machine to do what you tell it to do, but just don’t tell me.”

I understand that antitrust law is suspicious of backroom deals in which companies agree not to produce certain otherwise legal products, but that there are some exceptions for standard-setting. Perhaps that is why the various inter-industry groups try to dress up their agreements as “standards.” As I have written before, most of these agreements don’t look at all like technical standards, and to label them as such is misleading.

True technical standards are voluntary, and allow products to be more functional by giving them a way to interoperate (i.e., to work together). Most of the DRM “standards” are mandatory, and make products less functional by banning some kinds of interoperation.

Whether these agreements violate antitrust law is beyond my expertise, but I do know that a reasonable exemption for technical standard-setting ought not to apply to them.

Programs vs. Data

Maximillian Dornseif asks how one can draw the line between programs and data. This is important because the law often treats the two differently. He concludes that no clear line can be drawn.

This is a more difficult question than non-techies might think. A key attribute of the “von Neumann architecture” of today’s computers is that programs and data are in the same memory, so the computer can create, store, and process programs using the same facilities that are used for data. (Princetonians like to point out that this approach, named for a Princeton person, drove out an inferior alternative known as the “Harvard architecture.”)
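
Here is a toy sketch of what sharing one memory implies. The machine, its three instructions (`ADD`, `COPY`, `HALT`), and the memory layout are all invented for illustration and do not model any real processor. The point is only that a value stored as data can be copied into the instruction stream and then executed.

```python
def run(memory):
    """Run a toy stored-program machine and return its memory afterwards."""
    pc = 0  # program counter: index of the cell holding the next instruction
    while memory[pc] != "HALT":
        op, a, b, dst = memory[pc:pc + 4]
        if op == "ADD":
            memory[dst] = memory[a] + memory[b]
        elif op == "COPY":
            memory[dst] = memory[a]  # b is unused for COPY
        else:
            raise ValueError(f"cell {pc} does not hold an instruction: {op!r}")
        pc += 4
    return memory


memory = [
    "COPY", 8, 0, 4,   # cells 0-3: copy the value in cell 8 over cell 4
    "ADD", 9, 9, 10,   # cells 4-7: never runs, because cell 4 gets overwritten
    "HALT",            # cell 8: sits in memory like any other value
    21,                # cell 9: ordinary data
    0,                 # cell 10: would have received 21 + 21
]

result = run(list(memory))
print(result[4], result[10])  # HALT 0
```

The first instruction treats cell 8 as data to be copied. One step later the machine treats the copy as an instruction and stops. Nothing about the cell itself says which it is.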

Some cases are easy. The English text of this paragraph is data. Microsoft Word, considered as a whole, is a program. But some cases are trickier. What about the formula typed into a cell of a spreadsheet? I would call that a program. What about the commands you type in your computer’s command-line interface? Also a program.

If Microsoft Word is a program, what about a Word document? It’s mostly data, but it may contain programs, in the form of Word macros. You can’t tell, just by double-clicking a Word document, whether it contains macros.

Adobe PostScript documents are a really interesting case, too. PostScript is the predecessor to the PDF format. PostScript describes the layout of a page by giving a computer program for drawing the page. The program might contain pieces like, “move up one inch, then draw a 3-pixel-wide, quarter-inch-long line to the right” or “move 0.1 inches to the right, then display a capital ‘A’ in 12-point Times-Roman font.” PostScript does more than this. It really is a full-fledged programming language, letting you define new commands and everything.
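
To show the flavor of a page that is really a program, here is a sketch of an interpreter for a tiny PostScript-like language, written in Python. It is a toy: it understands only numbers, parenthesized strings without spaces, and the operators `add`, `mul`, `moveto`, and `show`, and it records what would be drawn instead of drawing it. Real PostScript is far richer and, among other things, requires a font to be selected before text is shown. The function name `run_page` and the sample page are made up for this example.

```python
def run_page(source):
    """Interpret a toy page description; return a list of (position, text)."""
    stack = []          # operand stack, as in PostScript
    position = (0.0, 0.0)
    drawn = []
    for token in source.split():
        if token == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "moveto":
            y, x = stack.pop(), stack.pop()
            position = (x, y)
        elif token == "show":
            drawn.append((position, stack.pop()))
        elif token.startswith("(") and token.endswith(")"):
            stack.append(token[1:-1])   # string literal such as (A)
        else:
            stack.append(float(token))  # anything else must be a number
    return drawn


# Coordinates are in points; there are 72 points to the inch.
PAGE = "72 72 moveto (A) show 72 36 2 mul add 72 moveto (B) show"

print(run_page(PAGE))
# [((72.0, 72.0), 'A'), ((144.0, 72.0), 'B')]
```

The second letter's position is never written down in the page. It is computed, by multiplying and adding, when the page is interpreted.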

People did use the programming features of PostScript. For example, a calendar-printing program would do computations to figure out whether it was a leap year, and which day of the week each month started on. If you were dedicated enough, you could write a PostScript program to balance your checkbook.

People tend to think of PostScript documents the way they think of PDF documents, as passive data displayed on a page, and that is what they look like. But under the covers, a PostScript page is a computer program. A full account of PostScript requires that we consider PostScript documents to be both data and program. Calling them one or the other is misleading.

The lesson of this is that it is an oversimplification to say that an object must be either data or program. Like a Word document, a single object can contain both program and data. Like a PostScript document, it can be both. A naive “I know it when I see it” approach to distinguishing programs from data will not be accurate.