August 24, 2016

Why Understanding Programs is Hard

Senator Sam Brownback has reportedly introduced a bill that would require the people rating videogames to play the games in their entirety before giving a rating. This reflects a misconception common among policymakers: that it’s possible to inspect a program and figure out what it’s going to do.

It’s true that some programs can be completely characterized by inspection, but this is untrue for many programs, including some of the most interesting and useful ones. Even very simple programs can be hard to understand.

Here, for example, is a three-line Python program I just wrote:

import sys, sha

h = sha.new(sha.new(sys.argv[1]).digest()[:9]).digest()

if h.startswith("abcdefghij"): print "Drat"

(If you don’t speak Python, here’s what the program does: it takes the input you give it, and performs some standard, complicated mathematical operations on the input to yield a character-string. If the first ten characters of that string happen to be “abcdefghij”, then the program prints the word “Drat”; otherwise it doesn’t print anything.)

Will this program ever print a four-letter word? There’s no practical way to tell. It’s obvious that the program’s behavior depends on what input you give it, but is there any input that causes h to start with the magic value abcdefghij? You can run the program and inspect the code, but you won’t be able to tell. Even I don’t know, and I wrote the program!
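
For readers on a current interpreter, here is a minimal Python 3 sketch of the same check. It rests on assumptions: `hashlib.sha1` stands in for the old `sha` module, the input is encoded as UTF-8 before hashing, and the function names `double_hash` and `prints_drat` are made up for illustration.

```python
import hashlib
import sys


def double_hash(text: str) -> bytes:
    """Hash the input, keep the first 9 bytes, then hash those bytes again."""
    # Assumption: the command-line string is encoded as UTF-8 before hashing.
    inner = hashlib.sha1(text.encode("utf-8")).digest()[:9]  # 72 bits survive
    return hashlib.sha1(inner).digest()  # 20-byte SHA-1 digest


def prints_drat(text: str) -> bool:
    """Return True if the program would print "Drat" for this input."""
    return double_hash(text).startswith(b"abcdefghij")


if __name__ == "__main__" and len(sys.argv) > 1:
    if prints_drat(sys.argv[1]):
        print("Drat")
```

Calling `prints_drat` on inputs one at a time is the only obvious way to look for a match, which is why running the program a few times tells you so little about what it can do.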

Now you might object that even if we can’t tell whether the program will output a four-letter word, the presence of print “Drat” in the code shows that the programmer at least wanted “Drat” to be a possible output.

Fair enough; but here’s a slightly more complicated program that might or might not print “Drat”.

import sys, sha

h = sha.new(sha.new(sys.argv[1]).digest()[:9]).digest()

if h.startswith("abcdef"): print h[6:9]

The behavior of this program again depends on its input. For some inputs, it will produce no output. For other inputs, it will produce an output that depends in a complicated way on the input it got. Can this program ever print “Drat”? You can’t tell, and neither can I.
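
To get a feel for how the output depends on the input, the sketch below searches for an input that passes a much shorter prefix test. It is a Python 3 illustration under stated assumptions: `hashlib.sha1` replaces the `sha` module, the helper names and the one-byte prefix `b"a"` are hypothetical, and the candidate inputs are simply the decimal strings "0", "1", "2", and so on.

```python
import hashlib
from typing import Optional, Tuple


def double_hash(text: str) -> bytes:
    """Hash the input, keep the first 9 bytes, then hash those bytes again."""
    inner = hashlib.sha1(text.encode("utf-8")).digest()[:9]
    return hashlib.sha1(inner).digest()


def find_input(prefix: bytes, limit: int) -> Optional[Tuple[str, bytes]]:
    """Try "0", "1", ... and return the first (input, output) that passes the test."""
    for i in range(limit):
        candidate = str(i)
        h = double_hash(candidate)
        if h.startswith(prefix):
            # Mirror the program's h[6:9] slice: the bytes it would print.
            return candidate, h[6:9]
    return None  # no candidate below the limit passed the prefix test


if __name__ == "__main__":
    print(find_input(b"a", 100_000))
```

If the hash behaves like a random function, a one-byte prefix should match after a few hundred tries on average, and each extra byte of prefix multiplies the expected work by 256. The bytes that come out for a matching input bear no visible relation to the input that produced them.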

Nonexperts are often surprised to learn that programs can do things the programmers didn’t expect. These surprises can be vexing; but they’re also the main reason computer science is fun.

Comments

  1. a bill that would require the people rating videogames to play the games in their entirety before giving a rating.

    One word: nethack.

  2. This should be obvious to anyone who knows what a bug is. After all, a bug is nothing more than unexpected behavior that happens to be unwanted. Unfortunately most people are unable to make the necessary inference, and legislators seem particularly dense in this regard.

  3. Michael,

    Some non-experts interpret bugs as cases where a programmer meant to type one perfectly clear piece of code but instead typed another perfectly clear piece of code by mistake. Like a typo: easily spotted and easily fixed.

  4. I’ve got my degree in CS, but my crypto knowledge is somewhat weak. That being said:

    sha1 (and all other hashing functions we know of) map any number of bits to some fixed number of bits (128 for sha1?). Given the desire for as few collisions as possible, is it a safe assumption that all possible combinations of 128 bits would thus be used as output? If that is a safe assumption, it is given that some input (of 128+ bits) would map onto a string starting with “abcdefdrat”, right?

    If I’m wrong on the number of bits for sha1, just change my numbers as needed, it doesn’t change the argument.

    Now, this doesn’t take into account how long it would take to generate such an output :)

  5. Sorry, left something out there:

    I still agree with your point. I’m just saying, hashing functions have fairly uniform output.

  6. Bill,

    That’s why I put in the “[:9]” part, which restricts the input to the second (outer) hashfunction to 72 bits. That means there are only 2^72 possible values of h, so (treating the hashfunction as “random”) the odds that h can ever start with the 80-bit value “abcdefdrat” are about one in 256.

  7. Awww, I thought you were going to actually answer the question of “Why understanding programs is hard”…

    I wonder if policymakers would understand the difficulties better if they realized that there are significant analogies to be made between computer code and legal code. Legal code is written for humans, about humans, by humans; computer code is written for computers, about humans, by humans. Both try to achieve a sort of precision, but neither can reach perfect precision given the unceasingly complex world in which the code is embedded.

    On the specific topic of that bill, though, that requires another analogy, between understanding a game in its entirety and understanding a program. While the analogy is somewhat obvious for programming practitioners, I can see how it would be hard for non-programmers to grasp.

  8. Brownback’s suggested “play through” method of videogame inspection has its faults. But when policymakers need to have code evaluated, isn’t the Brownback method preferable to the static analysis approach that has been used in other contexts (e.g. voting)?

    It is important for Brownback and his colleagues to understand that playing through a game may not reveal all its possible outputs. But the practical, action-guiding question they should be asking is: What’s the best available method for evaluating videogames?

  9. The problem with the legal analogy for policy makers seems obvious to me. You see, policy makers get to hand off their mistakes to some other guy down the road. Likely, years down the road. These people (judges) get to interpret the law, and are allowed significant latitude in that interpretation.

    Code isn’t really the same thing. The code does not grow by itself, it does not change itself (except in rare cases.) And there isn’t an intelligent being on the other end of the code interpreting it. When the coder goofs, it’s a permanent goof. No safety net.

    That being said, I’m not entirely sure that standardizing a review process for the rating of video games is a bad idea. While your point is compelling, the truth is that the game will do more or less what the game does. These people aren’t going to be sifting through lines of code. The raters won’t be drawing flow charts, much as the FAA doesn’t simply review the plans for a plane. They fly the plane. These people will play the game. While it isn’t a perfect solution, as so ingeniously illustrated, it’s better than what we have.

    The real concern here should not be whether a direct logical analysis of the problem proves that, technically, you can’t be sure if the game you’re playing is the game. Instead, it should be the concern that this board will end up something like the MPAA, stiffing little code shops and only catering to the big ones.

  10. I’m sure I could take the most innocent kids’ game and make it look like soft pr0n; likewise, I could take GTA and make it look like Sunday at Church.

  11. Ed,

    Fair enough.

  12. While you might be able to look at the code and see what was intended, there is no way to really be sure what will always happen. For example, what if someone finds a buffer overrun in the network code for a multiplayer game (or even the OS) and writes a program to cause the other player to see porn in the game by patching the game in memory? Also, what if doing certain things in the game (possibly involving cheat codes) causes the game to crash and the graphics to glitch in a way that it looks like there could be a dirty word in them (like the whole SEX in a stack of Coke cans thing or how you can see anything in clouds)? Finally, if it is a console game, what about someone removing the CD/DVD while the game is running and inserting a different game, which could cause the above glitches?

  13. Senator Sam Brownback may have a perfectly good understanding of computer science (even computability theory).

    You are assuming he doesn’t actually want to prohibit games whose output is unknowable and/or unpredictable, or games that take several years to play, if all possible outputs are to be collected and analysed.

    Some senators might prohibit display or imagery of pubic hair, having full knowledge that this then causes difficulty for purveyors of certain classes of art. This does not necessarily demonstrate ignorance… then again…

  14. Blah blah blah. Stuff stuff stuff. I defer to Anatoly’s wise comment, which said more in less space than any other comment before (including and after) this one.

  15. Wasn’t there an unintended planet Arse generated in Elite?

  16. Shane Selman says:

    The problem with Brownback’s bill is a simple one. The ‘Hot Coffee’ incident that garnered so much attention would never have come up even in an exhaustive play through. It was not part of the ‘standard’ play.

    Another issue with playthrough legislation is what degree of completeness you require. More and more games are growing more open ended, so that it can reasonably take hundreds of hours to complete every single objective and unlock every single toy and secret. What constitutes an unreasonable burden in those cases, and how can you define an objective standard of due diligence?

    The last objection I have to ‘playthrough’ is the issue of emergent behavior. As AIs get smarter, and our ability to simulate behavior in real time against complex models improves, we will see more and more GENUINE unpredictability in games. Black & White is the oft-quoted example. The behavior of your avatar utilized a decision model that was highly variable, occasionally resulting in comical or disturbing emergent behaviors.

    How do you reconcile the ability of a game to respond in novel and unpredictable ways with the notion of a ‘playthrough’ rating? This is not an idle concern, or one for ‘years down the road’. It’s coming now with Spore, and now that the path has been blazed, other games will not be far behind.

  17. David Phillip Oster says:

    http://perl.plover.com/yak/cs/samples/slide001.html and following is an entertaining presentation on telling if a computer program is going to do something. http://perl.plover.com/yak/cs/samples/slide020.html and following is specifically on the halting problem, which according to Rice’s theorem can be used to prove the unknowability of most properties of computer programs.

    My writing is dry. The presentation I point at is lively and fun. Give it a read to see why this bill is the equivalent of the state of Indiana legislating that “pi” is equal to 3.2. http://www.straightdope.com/classics/a3_341.html

  18. It seems to me that a more relevant class of examples are the “Easter Eggs” that get hidden in much commercial software, often to the distress of the QA departments that fail to find them.

  19. I don’t see how playing the game through and then giving it a rating is any worse than giving it a rating -without- playing it through.

    Requiring that the person giving the rating plays through the game does not directly indicate “a misconception […] that it’s possible to inspect a program and figure out what it’s going to do.”

  20. From the beautiful-post-slain-by-ugly-fact dept.

    “would require the people rating videogames to play the games”

    No. That’s incorrect. It’s false. The journo who paraphrased it got it wrong.

    Here’s what he actually said:

    brownback.senate.gov/pressapp/record.cfm?id=269277&&days=365&

    WASHINGTON – U.S. Senator Sam Brownback today reintroduced the Truth in Video Game Rating Act which would require a video game rating organization to review the entire playable content of a game before assigning a content rating.

    “Video game reviewers should be required to review the entire content of a game to ensure the accuracy of the rating,” said Brownback. “The current video game ratings system is not as accurate as it could be because reviewers do not see the full content of games and do not even play the games they rate.”

    Currently game reviewers do not play the games prior to determining ratings. Their reviews are based on taped segments of the game submitted by the game’s producer to the Entertainment Software Ratings Board. Such taped segments may or may not fully represent the game’s content. The bill would prohibit video game producers and distributors from withholding or hiding playable content from a ratings organization.

    Brownback continued, “Game reviewers must have access to the entire game for their ratings to accurately reflect a game’s content.”

    Which all makes complete sense. The sensation-mongering reporter seems to have read it as a mathematical impossibility in order to hype up the story.

  21. Mikko Parviainen says:

    There is a big problem with massively multiplayer online games. How do you find out what content other players make? (This might not be a problem depending on how the law is worded.) Also, you’d have to let somebody review entire updates.

    Also, how do you control what people will buy from the Internet? I play an MMOG (EVE Online) and I haven’t bought a thing off-line. Granted, I had to have a credit card which makes me an adult, and therefore most probably I’m permitted to play any legal game I want, but there’s still a problem in my opinion.

    This is also a problem in Europe: Germany has very stringent content laws about computer games and they are now pushing the same legislation to the whole EU.

  22. Interesting. I just don’t understand what playing the game all the way through would do. Generally, I’m guessing the people doing the rating are going to be taking the standard way through the game, rather than necessarily trying the types of edge-cases that bring up easter-eggs… Also, as another poster mentioned, there are many games where you could play them through repeatedly, and never have the same experience (nethack).
    Who’s paying these raters? For some games, playing all the way through may be a simple process, but for many games, that could take a *ton* of work!

  23. It seems to miss the point.

    Most people don’t care or object to the projectile trajectory algorithm, they object to the pornographic or bloody images.

    The simple point (which would have avoided the GTA easter-egg problem) is that if you have data/pictorial content in the game, you can assume there will be a way of intentionally accessing it. So instead of requiring a play-through (and do you go into the XXX theater or not if you don’t need to?), just looking at the image database should suffice. (The GTA fix was to remove the offending content entirely).

    The one applicable instance I can think of is Q-Bert. When something lands on him, he generates a purely random complaint sound with the comic balloon with the star, exclamation point, etc. representing a nasty word. But after a million or so trials, something which can be mistaken for a swear word will come up.

  24. I don’t think a video game has to display every single potential outcome in order to rate it sufficiently, so the above arguments about a program’s inherent “unknowability”, even to the programmer, are beside the point.

    As soon as a scary animal is shown, for example, there goes the “Early Childhood” rating. As soon as a curse is uttered, there goes the “Everyone” rating. As soon as a naked figure is shown, there goes the “Teen” rating, or whatever.

    So it’s just a matter of certain events taking place eliminating certain lower ratings, and not every event manifesting itself in order for an opinion to be formed. Once a game character shoots another character, the rating will be adjusted whether or not in other portions of the game the character does not shoot anyone.

    Now of course it could be possible for a game to show nothing but bunny rabbits eating grass for 10 hours, garnering a “C” rating and oops, every 1/1000 random moment show a murder scene picture, but I think there’s an amount of trust to expect.

  25. My first take, why this law will miss its goal of ‘protecting’ minors:
    The Sims is still one of the most successful games of all time. But how would you react to the glee of a kid who ‘tortures’ his little toy-beings, not by trying to ease their plights (the obvious goal of the game), but by creating a living hell for them, till they starve, burn, get electrocuted, drown. And these are only the physical horrors…

    In ‘The Sims’, there’s no graphic blood-n-gore, and only pixelated nudity, but I think a repeated kind of ‘misuse’ of the Sims in this way would be much more disturbing than seeing the little gamer blasting his – or her – Counter-Strike opponents to smithereens.
    I guess no rating commission would prohibit ‘The Sims’ – despite its inherent chance of becoming a virtual torture chamber. Misuse of power is not as graphic as most people think.

    And there’s another take to the problem of rating a game in its ‘entirety’, independent of the visual output of a game. The last line to decide whether something indecent or harmful to minors is inherent to a game are the human judges – and the cultural bias of the society they spring from.
    Thus this proposed law will have no absolute effect, like a digital XOR whether a game will be allowable or not. It will merely reflect the subjective viewpoint of those who define what will be allowable.

    Is ‘America’s Army’ recommended for children of 12 and older because of the omission of realistic effects of bullet and blast wounds? Or should it be rated 18+ because of this misleading illusion of a ‘clean’ battle with well defined enemies, tricking the kids to believe in ‘clean’, clear cut wars?
    I doubt that a Brownback-commission would rate ‘America’s Army’ as ‘Adult’, though for very different – sociocultural – reasons than ‘The Sims’.

    Which game is harmful? The one which can act as a mirror to a gamer’s sadistic or destructive impulses, or the one that pretends to mask these impulses with toned-down graphics, for a just cause?
    None of them shows overtly violent graphics… and nothing on code level would hint at ‘abusive’ use…

    A final word from LtCol. Wardynski, co-creator of AA:
    “Those gamers aren’t driven from ‘America’s Army’ because of the violence in Iraq, Wardynski said: ‘It’s just a backdrop.’”
    Get this man into a Brownback-rating commission – and GTA should pass.

  26. Chris,

    You seem to be assuming that after playing a game for a while, you can be sure that there isn’t something more extreme that hasn’t happened yet. Sure, all the animals so far have been cute and fuzzy, but will something scary jump out from behind the next tree?

  27. They can start by reviewing the “entire playable content” of any computer chess program that you care to name, and when they have finished that… Hmmm, by the time they have finished that we will all be well beyond caring.

    Worth noting that each and every attempt to legislate morality throughout history has not only failed but usually left the people concerned poorer and worse off than they were to begin with. I could give examples but certain groups of people are sensitive to having their foolishness pointed out in public.

  28. @Seth

    That’s fine in principle, but you’re missing 2 things:

    First, name one instance where the taped content was misleading and caused the rating to be wrongly applied to a game. I can’t think of one. The current controversy over ratings happened because of the hot coffee mod to GTA:SA, which unlocked content that would not have been visible whether the raters played the game or not. The M rating was perfectly appropriate for that game.

    Second, how much of the game do you play to choose the rating? Game publishers know the content and can give highlights of the most extreme elements of the game on the tape to ensure the rating is correct. If raters are forced to play the game, how long do they play for? They can play for 10 hours, 50 hours, 100 hours and not necessarily unlock the most extreme content.

  29. I have seen so many poor game ratings I understand the need to revise. However, I do not agree that it has anything to do with playing the game.

    Game content can be rated like anything else, you need not physically interact with the game in order to rate it.

    If there is profanity, nudity, gore etc. it is quite obvious. Why would ANYONE give a rating if they play the game and shoot someone – or if they clearly know the player interaction is to shoot someone.

    What difference does it make? Getting more intelligent and/or sensitive people to rate the games sounds like a better choice.

  30. The USA should perhaps get over itself a little. No other country seems to have such a weird world view. Take GTA, and the Hot Coffee mod. We are talking about a game where your first goal is to beat someone down to get armed, and further goals are to kill various people, run guns, drugs and cars, blow things up, etc., etc.

    Somehow, this is fine.

    But suddenly! Hot Coffee comes along. It adds a tiny (timewise) amount of porn to it, and suddenly!! Oh no! It should be banned.

    Something just doesn’t add up.

  31. print h[6:9]

    Unless I am mistaken, this slices 3 characters out of h, from position 6 up to (but not including) position 9; therefore it could never print out ‘Drat’ but may print ‘Dra’.

  32. Anonymous says:

    I fail to see the relevance of your examples to rating video games.

    To begin with, it is unlikely that there will be any production code that has the potential to do anything radically unexpected other than crash or glitch.

    Sure, some of your code might suddenly let you walk through walls. No, a typo somewhere will not suddenly fill your screen with NSFW images or print a bunch of swearwords.

    When your code fails, it simply fails. It doesn’t suddenly pop up mentally scarring images or knock your ESRB rating up two notches. And it most definitely doesn’t add a new level or more content to your game.

    I can say with 100% certainty that any unintentional coding mishaps will not change your game’s rating in any way. If you have the potential to show it, it must already be in use somewhere else anyways.
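
The one-in-256 estimate from comment 6 can be checked with a short calculation. This is only a sketch: like that comment, it assumes the hash behaves like a random function, and the variable names are illustrative. It also reports the SHA-1 digest size, which is 160 bits rather than the 128 guessed in comment 4.

```python
import hashlib
import math

digest_bits = hashlib.sha1().digest_size * 8  # SHA-1 output size in bits
inner_bits = 9 * 8    # the [:9] slice keeps 9 bytes, so 2**72 possible values of h
target_bits = 10 * 8  # a ten-character prefix such as "abcdefdrat"

# Assumption: each possible value of h matches the prefix independently
# with probability 2**-80, as if the hash were random.
p_single = 2.0 ** -target_bits
trials = 2.0 ** inner_bits

# Simple estimate: expected number of values of h that match.
estimate = trials * p_single

# Chance that at least one value matches, 1 - (1 - p)**n, computed stably.
at_least_one = -math.expm1(trials * math.log1p(-p_single))

print(digest_bits, estimate, at_least_one)
```

The simple estimate and the at-least-one probability agree to within about a hundredth of a percentage point, so "about one in 256" is a fair summary under the randomness assumption.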