Li Zhuang, Feng Zhou, and Doug Tygar have an interesting new paper showing that if you have an audio recording of somebody typing on an ordinary computer keyboard for fifteen minutes or so, you can figure out everything they typed. The idea is that different keys tend to make slightly different sounds, and although you don’t know in advance which keys make which sounds, you can use machine learning to figure that out, assuming that the person is mostly typing English text. (Presumably it would work for other languages too.)
Asonov and Agrawal had a similar result previously, but they had to assume (unrealistically) that you started out with a recording of the person typing a known training text on the target keyboard. The new method eliminates that requirement, and so appears to be viable in practice.
The algorithm works in three basic stages. First, it isolates the sound of each individual keystroke. Second, it takes all of the recorded keystrokes and puts them into about fifty categories, where the keystrokes within each category sound very similar. Third, it uses fancy machine learning methods to recover the sequence of characters typed, under the assumption that the sequence has the statistical characteristics of English text.
The third stage is the hardest one. You start out with the keystrokes put into categories, so that the sequence of keystrokes has been reduced a sequence of category-identifiers – something like this:
35, 12, 8, 14, 17, 35, 6, 44, …
(This means that the first keystroke is in category 35, the second is in category 12, and so on. Remember that keystrokes in the same category sound alike.) At this point you assume that each key on the keyboard usually (but not always) generates a particular category, but you don’t know which key generates which category. Sometimes two keys will tend to generate the same category, so that you can’t tell them apart except by context. And some keystrokes generate a category that doesn’t seem to match the character in the original text, because the key happened to sound different that time, or because the categorization algorithm isn’t perfect, or because the typist made a mistake and typed a garbbge charaacter.
The only advantage you have is that English text has persistent regularities. For example, the two-letter sequence “th” is much more common that “rq”, and the word “the” is much more common than “xprld”. This turns out to be enough for modern machine learning methods to do the job, despite the difficulties I described in the previous paragraph. The recovered text gets about 95% of the characters right, and about 90% of the words. It’s quite readable.
[Exercise for geeky readers: Assume that there is a one-to-one mapping between characters and categories, and that each character in the (unknown) input text is translated infallibly into the corresponding category. Assume also that the input is typical English text. Given the output category-sequence, how would you recover the input text? About how long would the input have to be to make this feasible?]
If the user typed a password, that can be recovered too. Although passwords don’t have the same statistical properties as ordinary text (unless they’re chosen badly), this doesn’t pose a problem as long as the password-typing is accompanied by enough English-typing. The algorithm doesn’t always recover the exact password, but it can come up with a short list of possible passwords, and the real password is almost always on this list.
This is yet another reminder of how much computer security depends on controlling physical access to the computer. We’ve always known that anybody who can open up a computer and work on it with tools can control what it does. Results like this new one show that getting close to a machine with sensors (such as microphones, cameras, power monitors) may compromise the machine’s secrecy.
There are even some preliminary results showing that computers make slightly different noises depending on what computations they are doing, and that it might be possible to recover encryption keys if you have an audio recording of the computer doing decryption operations.
I think I’ll go shut my office door now.