December 3, 2024

Programs vs. Data

Maximillian Dornseif asks how one can draw the line between programs and data. This is important because the law often treats the two differently. He concludes that no clear line can be drawn.

This is a more difficult question than non-techies might think. A key attribute of the “von Neumann architecture” of today’s computers is that programs and data are in the same memory, so the computer can create, store, and process programs using the same facilities that are used for data. (Princetonians like to point out that this approach, named for a Princeton person, drove out an inferior alternative known as the “Harvard architecture.”)
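A small illustration may help here. The Python sketch below is not from Dornseif's piece; it is only an assumed example, with made-up variable names, of one string being handled first as data and then as a program.

```python
# One string, treated first as data and then as a program.
source = "result = 6 * 7"

# As data: we can measure it and edit it like any other text.
length_as_data = len(source)
edited = source.replace("6", "2")

# As a program: hand the edited text to the interpreter and run it.
namespace = {}
exec(edited, namespace)
value = namespace["result"]

print(length_as_data, repr(edited), value)
```

Nothing about the string itself changes between the two uses. What changes is what we ask the computer to do with it.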

Some cases are easy. The English text of this paragraph is data. Microsoft Word, considered as a whole, is a program. But some cases are trickier. What about the formula typed into a cell of a spreadsheet? I would call that a program. What about the commands you type in your computer’s command-line interface? Also a program.

If Microsoft Word is a program, what about a Word document? It’s mostly data, but it may contain programs, in the form of Word macros. You can’t tell, just by double-clicking a Word document, whether it contains macros.

Adobe PostScript documents are a really interesting case, too. PostScript is the predecessor to the PDF format. PostScript describes the layout of a page by giving a computer program for drawing the page. The program might contain pieces like, “move up one inch, then draw a 3-pixel-wide, quarter-inch-long line to the right” or “move 0.1 inches to the right, then display a capital ‘A’ in 12-point Times-Roman font.” But PostScript does more than this. It really is a full-fledged programming language, letting you define new commands and everything.
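To give a feel for what “define new commands” means, here is a toy sketch in Python of a PostScript-style stack machine. It is emphatically not a PostScript interpreter: the `run` function and the handful of operators it supports are assumptions made for illustration, and it does no drawing or error handling. The input does follow PostScript's real syntax for the few features it covers (postfix arithmetic, `/name { ... } def`).

```python
def run(program, stack=None, words=None):
    """Run a tiny PostScript-like program and return the resulting stack."""
    stack = [] if stack is None else stack
    words = {} if words is None else words   # user-defined commands
    tokens = program.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "{":
            # Collect the procedure body up to the matching "}" and
            # push it on the stack as plain data (a list of tokens).
            depth, j = 1, i + 1
            while depth:
                if tokens[j] == "{":
                    depth += 1
                elif tokens[j] == "}":
                    depth -= 1
                j += 1
            stack.append(tokens[i + 1 : j - 1])
            i = j
            continue
        if tok.startswith("/"):
            stack.append(tok)                 # a literal name, e.g. /double
        elif tok == "def":
            body = stack.pop()
            name = stack.pop()
            words[name[1:]] = body            # the data becomes a command
        elif tok in ("add", "sub", "mul"):
            b = stack.pop()
            a = stack.pop()
            stack.append({"add": a + b, "sub": a - b, "mul": a * b}[tok])
        elif tok in words:
            run(" ".join(words[tok]), stack, words)   # execute stored body
        else:
            stack.append(float(tok))          # anything else is a number
        i += 1
    return stack


result = run("/double { 2 mul } def 3 double 1 add")
print(result)
```

Notice that the body of `double` sits on the stack as ordinary data until `def` stores it, after which the same tokens are executed as a program.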

People did use the programming features of PostScript. For example, a calendar-printing program would do computations to figure out whether it was a leap year, and which day of the week each month started on. If you were dedicated enough, you could write a PostScript program to balance your checkbook.
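The calendar computations mentioned above are short. The sketch below uses Python rather than PostScript, and the function names are invented here; it applies the standard Gregorian leap-year rule and a well-known table-based day-of-week formula (often attributed to Tomohiko Sakamoto), not anything taken from an actual PostScript calendar program.

```python
def is_leap(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def weekday_of_first(year, month):
    """Day of the week on which a month starts: 0 = Sunday ... 6 = Saturday."""
    offsets = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]
    # January and February are counted as part of the previous year,
    # so that a leap day falls at the end of the counted year.
    y = year - 1 if month < 3 else year
    return (y + y // 4 - y // 100 + y // 400 + offsets[month - 1] + 1) % 7


names = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
print(is_leap(2024), names[weekday_of_first(2024, 12)])
```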

People tend to think of PostScript documents the way they think of PDF documents, as passive data displayed on a page – and that is what they look like. But under the covers, a PostScript page is a computer program. A full account of PostScript requires that we consider PostScript documents to be both data and program. Calling them one or the other is misleading.

The lesson is that it is an oversimplification to say that an object must be either data or program. Like a Word document, a single object can contain both program and data. Like a PostScript document, it can be both at once. A naive “I know it when I see it” approach to distinguishing programs from data will not be accurate.