May 28, 2024

Bio Analogies in Computer Security

Every so often, somebody gets the idea that computers should detect viruses in the same way that the human immune system detects bio-viruses. Faced with the problem of how to defend against unexpected computer viruses, it seems natural to emulate the body’s defenses against unexpected bio-viruses, by creating a “digital immune system.”

It’s an enticing idea – our immune systems do defend us well against the bio-viruses they see. But if we dig a bit deeper, the analogy doesn’t seem so solid.

The human immune system is designed to stave off viruses that arose by natural evolution. Confronted by an engineered bio-weapon, our immune systems don’t do nearly so well. And computer viruses really are more like bio-weapons than like evolved viruses. Computer viruses, like bio-weapons, are designed by people who understand how the defensive systems work, and are engineered to evade the defenses.

As far as I can tell, a “digital immune system” is just a complicated machine learning algorithm that tries to learn how to tell virus code apart from nonvirus code. To succeed, it must outperform the other machine learning methods that are available. Maybe a biologically inspired learning algorithm will turn out to be the best, but that seems unlikely. In any case, such an algorithm must be justified by performance, and not merely by analogy.


  1. Andrew Johnson says

    Not true. This is what intrusion detection systems like Tripwire and AIDE do.

    I don’t think they’re quite the same. Such systems detect the changes that a virus makes to files on disk, but they can’t find one that exists only in memory or hides in a file that they’re not monitoring. A stealth virus could be content to infect only binaries belonging to a user, installing itself as a browser plugin, say. My approach would detect this when the browser first loads the plugin, because the plugin won’t have been signed (unless it came in a Trojan horse).

  2. > There is another biologically-derived approach to virus detection which isn’t obviously being tried at the moment: Stop trying to identify all of the bad guys, concentrate on being able to recognize who your friends are.

    Not true. This is what intrusion detection systems like Tripwire and AIDE do. I don’t know of any systems like that for Windows, though, it’s true.

  3. Andrew Johnson says

    There is another biologically-derived approach to virus detection which isn’t obviously being tried at the moment: Stop trying to identify all of the bad guys, concentrate on being able to recognize who your friends are.

    Instead of trying to recognize some signature of all the viruses out there which you don’t want to run, why not at installation time sign (watermark) all pages of the executables and code libraries that you do want to be able to use? Any virus that manages to find a way into the machine through some buffer overflow or by modifying an existing executable would not match the watermark test (which would be cryptographic using a host or cluster-specific private key), so the OS would refuse to give execute permission to the memory page containing the bad code.

    This would need support from the OS kernel and probably some modification to the compiler to allow enough space for the watermarking (which has to be robust against relocation by the loader), but it should be readily achievable with current technology. There are various developments of this idea that could be introduced – having different levels of signing to give different permissions to the executable for example, so stuff run as root needs a higher level of signature etc.

    In some ways this is comparable to Microsoft’s Trusted Computing architecture ideas, but it’s the installer that does the signing instead of the code’s author, and the system administrator is thus always in full control of what is allowed to be run. Of course it only stops machine code viruses; anything written in an interpreted language could still contain a virus, and it does cause a problem for software developers, JIT compilers and anything that uses self-modifying code.

  4. I just blogged this… but realized people are working on it. It’s great to get your perspective, of course, Prof. Felten.

  5. Chris Tunnell says

    Yes, but viruses (in the computer sense) have the intelligence of an amoeba. I think you are picking apart the analogy.

    I disagree with your first statement. We design bio-weapons like we design most tangible things, by modelling them on some natural entity. For example, anthrax can be found in dead animals, but we engineer it into a usable weapon. I think you aren’t giving credit to the scientists who make them, since it does involve a lot of man-hours.

  6. Bio-weapons are not “engineered” with knowledge of how the human immune system works(*). They are just carefully selected agents that are known to quickly overwhelm the typical victim: by the time the humoral immune response kicks in, it’s too little, too late.

    For things that _are_ specifically “engineered” to take advantage of how the immune system works, well, we have a different name for that: cancer(**). This suggests an interesting (if somewhat disturbing) prediction should “digital immune systems” become viable…

    (*) It’s hard to imagine how it could be done anyways given our current ignorance of the system.

    (**) Cancer cells have “learned” (via continued attack by the immune system) how to avoid destruction. Some even have the ability to turn the tables and kill the immune system’s representatives that are trying to kill them…

  7. I think the analogy is flawed mainly because it isn’t scoped properly. In most cases the model is something like:

    Computer = Organism
    AV software = Immune system
    Virus = Virus

    Whereas I think it’s more like:
    Computer = Cell
    Collective security monitoring (both technical means and human community) = Immune system
    Internet = Organism (or maybe Society, with individual networks being Organisms)
    Virus = Virus

    A virus doesn’t infect an organism — it infects cells. Those cells are frequently forfeited in favor of stemming the spread of the virus to other cells.

    Similarly, individual computers are infected by viruses, but the response is to quash the spread first, then try to recover the individual machines. That’s not always possible, so regeneration…er…restoring from backup is sometimes necessary. Additionally, an individual network can quarantine itself from the rest of the Internet the same way a person with the flu can temporarily avoid society until they’re less contagious.

  8. Chris Tunnell says

    I agree that the argument for evolutionary learning programs is alluring, but it isn’t pragmatic. I think a better option would be to give people a better understanding of what is actually running on their machines, because even on Linux it isn’t clear-cut which program is which. That lack of clarity allows foreign programs to hibernate in memory.

    Even if the machine learning program did work, you would still have the problem of people not caring to use it. It seems as if most viruses today are benign, so there is no incentive to take them off a machine.

    Also, the way medicine works to cure viruses is that a doctor says, “I have seen this before, this is how you get cured.” Isn’t this what Norton Update does in its current state?
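
To make the install-time signing idea from the comments a bit more concrete (tag every page of an executable you trust with a host-specific key, then refuse to run any page that fails the check), here is a rough Python sketch. It is only an illustration under several assumptions that are not in the original comment: the 4096-byte page size, the function names, and the use of HMAC-SHA256 with a secret host key as a stand-in for whatever signature scheme a real design would pick. It also ignores the hard parts the commenter mentions, such as kernel support and staying robust against relocation by the loader.

```python
import hashlib
import hmac

PAGE_SIZE = 4096  # assumed page size; real systems vary


def split_pages(image: bytes, page_size: int = PAGE_SIZE) -> list[bytes]:
    """Split an executable image into fixed-size pages (the last may be short)."""
    return [image[i:i + page_size] for i in range(0, len(image), page_size)]


def _page_tag(host_key: bytes, index: int, page: bytes) -> bytes:
    """Keyed tag for one page; the index is mixed in so pages cannot be reordered."""
    message = index.to_bytes(8, "big") + page
    return hmac.new(host_key, message, hashlib.sha256).digest()


def sign_image(host_key: bytes, image: bytes) -> list[bytes]:
    """Install time: compute one tag per page of a trusted executable."""
    return [_page_tag(host_key, i, page) for i, page in enumerate(split_pages(image))]


def page_is_trusted(host_key: bytes, index: int, page: bytes, tags: list[bytes]) -> bool:
    """Load time: return True only if this page matches the tag recorded at install."""
    if index < 0 or index >= len(tags):
        return False  # a page the installer never saw is never trusted
    expected = _page_tag(host_key, index, page)
    return hmac.compare_digest(expected, tags[index])


if __name__ == "__main__":
    host_key = b"hypothetical-host-secret"  # placeholder; a real key would be random
    image = bytes(range(256)) * 40  # stand-in for an executable (10240 bytes)
    tags = sign_image(host_key, image)
    pages = split_pages(image)

    # Every untouched page passes the check.
    print(all(page_is_trusted(host_key, i, p, tags) for i, p in enumerate(pages)))

    # Flip one byte, as injected or modified code would, and the page is rejected.
    tampered = bytearray(pages[1])
    tampered[0] ^= 0xFF
    print(page_is_trusted(host_key, 1, bytes(tampered), tags))
```

In this sketch the tags live in an ordinary list; in the scheme as described they would have to be stored where the kernel can consult them before granting execute permission to a page. The key point it illustrates is that the check is a whitelist: code that was never tagged at install time fails, whether or not anyone has seen it before.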