June 15, 2024

Acoustic Snooping on Typed Information

Li Zhuang, Feng Zhou, and Doug Tygar have an interesting new paper showing that if you have an audio recording of somebody typing on an ordinary computer keyboard for fifteen minutes or so, you can figure out everything they typed. The idea is that different keys tend to make slightly different sounds, and although you don’t know in advance which keys make which sounds, you can use machine learning to figure that out, assuming that the person is mostly typing English text. (Presumably it would work for other languages too.)

Asonov and Agrawal had a similar result previously, but they had to assume (unrealistically) that you started out with a recording of the person typing a known training text on the target keyboard. The new method eliminates that requirement, and so appears to be viable in practice.

The algorithm works in three basic stages. First, it isolates the sound of each individual keystroke. Second, it takes all of the recorded keystrokes and puts them into about fifty categories, where the keystrokes within each category sound very similar. Third, it uses fancy machine learning methods to recover the sequence of characters typed, under the assumption that the sequence has the statistical characteristics of English text.

The third stage is the hardest one. You start out with the keystrokes put into categories, so that the sequence of keystrokes has been reduced to a sequence of category-identifiers – something like this:

35, 12, 8, 14, 17, 35, 6, 44, …

(This means that the first keystroke is in category 35, the second is in category 12, and so on. Remember that keystrokes in the same category sound alike.) At this point you assume that each key on the keyboard usually (but not always) generates a particular category, but you don’t know which key generates which category. Sometimes two keys will tend to generate the same category, so that you can’t tell them apart except by context. And some keystrokes generate a category that doesn’t seem to match the character in the original text, because the key happened to sound different that time, or because the categorization algorithm isn’t perfect, or because the typist made a mistake and typed a garbbge charaacter.

The only advantage you have is that English text has persistent regularities. For example, the two-letter sequence “th” is much more common than “rq”, and the word “the” is much more common than “xprld”. This turns out to be enough for modern machine learning methods to do the job, despite the difficulties I described in the previous paragraph. The recovered text gets about 95% of the characters right, and about 90% of the words. It’s quite readable.
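To make the “th” versus “rq” point concrete, here is a minimal Python sketch of scoring text by how English-like its letter pairs are. It is not the method from the paper, which uses far more capable language models. The tiny sample corpus, the add-one smoothing, and the names `bigram_counts` and `avg_log_prob` are all assumptions made up for this illustration.

```python
import math
from collections import Counter

# Tiny stand-in corpus (an assumption for illustration; a real attack would
# train on a large body of English text).
SAMPLE = (
    "the quick brown fox jumps over the lazy dog while the other dogs "
    "watch from the path and then they all rest in the shade of the tree"
)

# Lowercase letters plus space: the symbols this toy model knows about.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def bigram_counts(text):
    """Count every pair of adjacent characters in text."""
    return Counter(zip(text, text[1:]))


def avg_log_prob(candidate, counts):
    """Average log of the add-one-smoothed bigram frequency in candidate.

    Higher (less negative) values mean the candidate looks more like
    the training text. The candidate needs at least two characters.
    """
    total = sum(counts.values())
    vocab = len(ALPHABET) ** 2  # number of possible bigrams
    pairs = list(zip(candidate, candidate[1:]))
    logp = sum(math.log((counts[p] + 1) / (total + vocab)) for p in pairs)
    return logp / len(pairs)


counts = bigram_counts(SAMPLE)
```

With these counts, `avg_log_prob("the", counts)` comes out higher than `avg_log_prob("xprld", counts)`, so a decoder that has to choose between two candidate readings of the same keystroke sounds can prefer the one that looks more like English.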

[Exercise for geeky readers: Assume that there is a one-to-one mapping between characters and categories, and that each character in the (unknown) input text is translated infallibly into the corresponding category. Assume also that the input is typical English text. Given the output category-sequence, how would you recover the input text? About how long would the input have to be to make this feasible?]
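One rough way to approach the “how long” part of the exercise is Shannon’s unicity distance for a simple substitution cipher, which is what the idealized one-to-one mapping amounts to. The Python sketch below assumes a 26-letter alphabet (no spaces or punctuation) and an entropy of about 1.5 bits per character for English. Both numbers are assumptions for illustration, not figures from the paper.

```python
import math

ALPHABET_SIZE = 26  # assumption: letters only, no space or punctuation

# Number of bits needed to pin down one of the 26! possible mappings.
key_entropy_bits = math.log2(math.factorial(ALPHABET_SIZE))

# Bits per character if every letter were equally likely.
max_bits_per_char = math.log2(ALPHABET_SIZE)

# Assumed estimate of the real information content of English text.
ENGLISH_BITS_PER_CHAR = 1.5

# Redundancy: how many bits per character English "wastes" on predictability.
redundancy = max_bits_per_char - ENGLISH_BITS_PER_CHAR

# Unicity distance: the text length at which, in principle, only one
# mapping yields plausible English.
unicity_distance = key_entropy_bits / redundancy
```

Under these assumptions the answer is a bit under 30 characters. That is an information-theoretic lower bound. A practical solver, for example one that hill-climbs over mappings using letter-pair statistics, generally wants a good deal more text than that.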

If the user typed a password, that can be recovered too. Although passwords don’t have the same statistical properties as ordinary text (unless they’re chosen badly), this doesn’t pose a problem as long as the password-typing is accompanied by enough English-typing. The algorithm doesn’t always recover the exact password, but it can come up with a short list of possible passwords, and the real password is almost always on this list.

This is yet another reminder of how much computer security depends on controlling physical access to the computer. We’ve always known that anybody who can open up a computer and work on it with tools can control what it does. Results like this new one show that getting close to a machine with sensors (such as microphones, cameras, power monitors) may compromise the machine’s secrecy.

There are even some preliminary results showing that computers make slightly different noises depending on what computations they are doing, and that it might be possible to recover encryption keys if you have an audio recording of the computer doing decryption operations.

I think I’ll go shut my office door now.


  1. 15 minutes, yep, and it was a journal entry too. I do know that, and what is the big deal about it? What it contained as typewritten info really isn’t anyone’s business because, if I remember correctly, it was very personal stuff that no one should know about other than whom it belongs to. It isn’t like the object in question is a big deal. What is a big deal is the fact that the “journal entry” was typed on a computer with basic programs. And “no”, passwords do not protect either.

  3. An episode of the BBC TV show “Spooks” (aka: MI-5 here in the US) used such a device in one episode. The target was given a set of cufflinks that had a transmitter built in and a training document (a resume that managed to incorporate every letter of the alphabet–not sure if special characters were included).

    As the target typed on the keyboard, the transmitter sent the sounds to a nearby surveillance van where the computer translated and reassembled what was being typed.

  4. grassroots2012 says

    play the keyboard as music
    the universe is recording everything you say, think, do
    say exactly everything you mean
    peace be with all

  5. Here are reports from a student project I supervised here at Chinese University of Hong Kong, on keyboard acoustic attacks using multiple microphones. It extends the above results from using one microphone and a lot of data mining to using multiple microphones and simple post-processing and/or data mining:




  6. […] 14.10.2005 A few guys from the University of Berkeley have developed a method for decoding text typed on a computer from an audio recording of the keyboard in use. Will we soon start working on computers in soundproof rooms? […]

  7. A few sound sources dispersing a stream of randomized keyboard clickety clax may not only fix the hole but induce a calm and easy mood in the protected individual.

    Trigger with a microphone built into the keyboard, for the faint-hearted.

  8. interesting but kind of obvious…. and smells a bit of some v rudimentary form of van eck phreaking… but erm… why not use van eck phreaking in the first place – it would probably be easier! 🙂

  9. when I think about my own typing, I must think of a lot of “backspacing”. Even marking (parts of) words with the mouse and deleting them must be taken into account.
    In phase 3, when trying to make sense, I reckon that is a not-so-simple detail to resolve…

  10. i need to know how to secretly know what a person is typing like a downloaded program or something

  11. I wish I had this system incorporated in to my knowledge, it is so useful, I often forget my passwords.


  13. Oops. not lpaster, it was for Reed

    I’m sorry 🙁

    [Will mod please correct it in my first message and delete this one. Thanks]

  14. lpaster, great idea!

    Looks like you should build a small speaker into each key and then make all speakers produce random key sound. This will help against your 3d mics system attack.

    Microsoft Natural SE (Secure Edition) 101-key/101-speaker keyboard. Just came to my mind…

  15. 1. three brothers from Israel were jailed several years ago for telecom fraud that involved the amazing capabilities of one brother who was blind, but was able to listen, analyze, remember and distinguish dial tone sounds of people making phone calls, and repeat them later (to dial international calls and obtain credit card numbers).
    they scammed millions from Israeli and other telecom corporates.

    2. Neal Stephenson wrote in Cryptonomicon about a method of tracking typed text on a screen by remotely capturing frequency changes of the monitor. it was described as Van Eck phreaking.

  16. It seems like you could further improve this using multiple mics, and comparing the arrival times of the sounds to get rough position information. The timing required to tell one side of the keyboard from the other is quite modest. This would help categorize the keys. It would also make the idea of using your computer speakers to play false keyboard sounds a much less effective defense.

  17. I still giggle about morons who dial using touchtones while being recorded. A very simple device is all that’s needed to decode phone numbers from TV shows etc. I did not capture it, but I was wondering when Tommy Lee was dialing one celeb after another on his current show if this would work or do cellphones not emit touchtone sounds?

  18. Edward W. Felten

    Yes, you’re right. It uses sound, but this is pretty much useless IMHO.
    Electromagnetic waves can be picked up as far as miles from the source.
    Sound can only be recorded near the source, and you must account
    for all the noise.

    You can create a virus to record the sound from your monitor
    microphone and have it analyse the data and grab whatever you
    write, but at that level you can keep it simple and just grab the keys
    using the OS API. Much simpler …

    So I guess they have targeted this at real spy situations where you
    need to know what people are doing on the computer without actually having any
    access. However, I have a pretty good solution (albeit annoying to the user)
    for this problem: just record some garbage keystrokes and have them played
    back at random to create some ambient sound 😉

    Something like http://www.lorem-ipsum.info/ should be enough

  19. It strikes me that this method (once successfully packaged as an application) could also be used by employers to keep an eye on exactly what their employees are doing (typing) during the day (granted it’d be expensive, but it’s a lot more covert than a keylogger). I mean, how much time do you spend emailing colleagues/friends instead of doing what you’re being paid to do??

  20. Computer Security Student says

    I always thought this should be possible because whenever I type a commonly used password on a keyboard, it always sounds the same. Amazing what people can do though.

  21. I suppose this is interesting and possibly scary, but let’s not forget that there are many, many easier ways to find out everything you type (or do) on your computer.

    For example, when’s the last time you checked your keyboard to see if there’s a key-logger on it?

    Probably not recently… And anyone can buy those for cheap.

  22. Rawwa,

    If you read the description, you’ll see that this is different from the older TEMPEST results you cite. Previous results have captured electromagnetic (i.e., radio) emanations from devices. These results capture sound waves, and analyze them in a new way.

    As far as I know, this IS new.

  23. This idea is old. I saw some research results on this 4 years ago, using a radio
    to tune into the keyboard’s working frequencies. For those who like this kind of stuff,
    it is even possible to see what is on other people’s screens within a mile. Just point an
    antenna in the right direction and do some DSP on the received signal.

    Take a look at:
    http://jya.com/emr.pdf (this is an article from 1985!!!)

    And a quick search on Google shows a lot more from the NSA, for example.

    Just search on Google:
    tempest site:www.nsa.gov

    As people can see, this is really nothing new …

  24. Record the clatter while on the phone to a call-center operative. You often know the text they are typing in: your address, phone and other details. Any passwords they type should be easy to pick out using this method.

  25. Well this idea of sniffing keystrokes by using a microphone DOES work
    well. The method I used to try out the idea was to reuse well known cryptanalytical tricks.

    As long as you are able to detect differences in the sound of each keypress, you can easily decipher what was actually typed. The problem of trying
    to “hear” the difference between each keypress is actually the biggest one.

    You can drastically simplify the analysis by having statistical data
    about timing distribution between keypresses given a certain keymap ( a system could have qwerty or dvorak map for instance , this changes the timing distribution of typing).
    This lessens the requirements of the microphones.

    Having this makes it more than just plausible to “crack” whatever you hear
    is being typed. The only difference between old cryptanalysis and this one
    is that we are dealing with sounds which represent letters.
    The techniques are still the same as for cracking a monoalphabetic cipher.

  26. Hmmm… Silencers! Computer Silencers!
    Not at all a bad idea for starting a new Business!
    Who is willing to invest!

  27. Get two canvas sacks, a pair of these (http://www.inition.co.uk/inition/product_glove_measurand_shapehand.php), and one of these (http://www.inition.co.uk/inition/product_hmd_cybermind_visette45sxga.php). Then RF generators, noise machines, may as well get a fog machine too….

  28. Thanks for shutting your door, there was too much background noise coming from down the hall and I got a much better recording with the door shut.

  29. It doesn’t matter what type of keyboard, only that you are typing a language with well known statistical distributions of one, two, and three letter combinations… and keys that make slightly differing sounds…

  30. Valdis Kletnieks says

    The person who is glad that they use a Dvorak keyboard shouldn’t be laughing – this technique assumes *nothing* about the keyboard layout. All it cares about is that ‘D’ ‘v’ and so on are the same key each time. In order to actually defeat this, you’d need to scramble the keyboard layout in a truly random fashion after each keypress, so each letter could be on any key each time. Of course, you’d need one of these keyboards so you could *find* the letter:


    (Note – Art Lebedev is a graphics designer, not an engineer. That is a keyboard that *should* exist, not one that actually does…)

    Alternate possible solutions include membrane keyboards and the projected keyboards available on some mobile units…

  31. Couldn’t they just make SILENT KEYBOARDS????


  32. …having worked on military software in a high security environment, I can tell you that these rooms where the programmers work emit an annoying humming sound to counter this kind of attack (how they do it exactly, I don’t know, but it IS annoying)… and of course the physical ability to enter the premises is restricted through routine means, including being accompanied by a guard… I think the walls are made with special soundproofing measures also.

  33. Hey I was wondering if I could get a copy of this program? It sounds fun to play around with 🙂 No I don’t intend on grabbing someone’s passwords.. Just seeing if it really works. It’s hard to believe. Thanks

  34. Glad I use a Dvorak keyboard! Hahahahaha

  35. if anyone from slashdot reads this far: Doug is not a student at Berkeley, but a professor.

  36. The obvious solution is to encrypt things in your head before typing them.

  37. This could be easily circumvented in high-security applications by installing a small module in the keyboard that emits random key sounds every time a key is struck. Also, was this test done on a buckling spring keyboard? Would those be easier or harder to crack with this method?

  38. Impressive.

    I recall a 1940’s typewriter my grandfather used. It required that you push on each key very hard. He could do 45 words a minute on the thing.

    When he typed, the thing was extremely loud. We could hear it through the whole house. And it was clear that each key made a slightly different sound when he typed.

    I never even thought of applying the concept to a keyboard for a computer.

  39. How does this approach deal with backspaces, punctuation marks, shortcut keys and modifiers? I imagine that in practice this approach gets to be considerably more difficult, given the unpredictability of error rates. This approach also assumes that the input represents a continuous stream, when in fact (such as when I am using the mouse to highlight and delete text and using arrow keys to move around within text, etc.) it is not.

  40. Meanwhile, towelie is tapping the memory to Funkytown into a keypad.

    True and somewhat related story: my old apartment had one of those access keypads (rather than keycards or something worthwhile) protecting the lock. It only had 4 digits.

    One day when I was getting home, I forgot the passcode. Just blanked. Our landlord thankfully wasn’t stupid enough to have made it our address.

    Luckily, the faceplate was about 5 years old and the passcode had never been changed; all the numbers on the keys you actually needed to hit were worn completely off – all three numbers. The keys you didn’t need to hit still had shiny black paint on them.

    Didn’t take me too long to re-guess the passcode at that point.

  41. I seem to remember this sounding pretty well-established when they were talking about the new (fiasco, spy-proofing-wise) American embassy in Moscow about 20 or so years ago. Apparently windows also make great speakers for this sort of thing, so you just train your mics on the windows of the appropriate room(s). This was then countered, at least partially, by sticking speakers by the windows for interference. I think there was also some deal where you could read IBM Selectrics (the big players at the time) by getting a read on electrical impulses or something.

  42. In one of Rex Stout’s Nero Wolfe novels, a crucial clue was the sound of
    a person dialing a rotary phone — not the clicks that would be heard on
    an extension phone, but the sounds as heard by people standing in the
    same room as the person dialing the number.

  43. And that’s what’s published in the open world. How long do you suppose the various intelligence agencies have been onto this?

  44. Can You Hear Me Now?

    Ed Felten reports on a new technique to go from a recording of typing to the sequence of keystrokes: Li Zhuang, Feng Zhou, and Doug Tygar have an interesting new paper showing that if you have an audio…

  45. george humphrey says

    This is similar to the technique used by the allies in world war II. Using microphones covertly installed in “axis-friendly” embassies they were able to record the sounds of typing being done on the teletype style consoles that were used to generate encrypted messages. The in-clear message was typed in and the console output the encrypted message.

    After recording the sounds, the allied intelligence experts displayed the sounds on oscilloscopes and were able to “read” the in-clear text that was typed in just by visually recognizing the different appearance of the sound made by each key.

    See the novel “Codebreakers” that was published in the, (I think), 1980’s

  46. Use caplocks/shift/alt/contr/+key combinations, and lots of coughing!

  47. I was wondering why this hadn’t gotten more attention… my comments here: http://josephhall.org/nqb2/index.php/2005/09/04/mic_strokes

  48. Well, i actually have read about a secretary who could tell the keys pressed by her boss only by hearing him typing. She even used his passwords to buy stuff on the net. A fairly recent slashdot story, I think.

    Amazing super power.

  49. Brian Srivastava says

    It would be interesting to know if the NSA/CIA types have known this was plausible for some time. It might further explain the banning of furbies.

    As a sort of practical matter it would be interesting to see how this applies to the ‘real world’. I doubt it’s very often that anyone sits down at a computer and just starts typing out a report. You can probably fairly easily pick out mouse clicks, but you completely lose the context of what is being typed: did that sentence go in notepad, instant messaging or a google search? Still, it sounds like this would have some fairly significant implications for companies that don’t want their documents leaked out all over the place. Good reason to keep ipods and computers without built-in microphones.

    P.S. the garbbge charaacter. was cute

  50. There was an episode of Due South in which Constable Fraser attempted to guess a password by listening to it being typed, and then attempting to match the sound with his own typing. He seemed to be aiming mostly for the rhythm of the typing rather than the sound of the individual keystrokes – and, of course, the show was fiction and not making much effort at realism.

  51. Good point. Using timing would probably make the attack much more effective, allowing higher accuracy and/or shortening the length of the recording needed.

  52. Florian Weimer says

    This attack could probably be improved significantly if the timing difference between different key pairs is taken into account. There are some results indicating that this is enough to recover interesting data, see the paper on SSH timing attacks by Song, Wagner and Tian.