March 24, 2018

Zfone Encrypts VoIP Calls

Phil Zimmermann, who created the PGP encryption software and faced a government investigation as a result, now offers a new program, Zfone, that provides end-to-end encryption of computer-to-computer (VoIP) phone calls, according to a story in yesterday’s New York Times.

One of the tricky technical problems in encrypting communications is key exchange: how to get the two parties to agree on a secret key that only they know. This is often done with a cumbersome “public key infrastructure” (PKI), which wouldn’t work well for this application. Zfone has a clever key exchange protocol that dispenses with the PKI and instead relies on the two people reading short character strings to each other over the voice connection. This will provide a reasonably secure shared secret key, as long as the two people recognize each other’s voices.

(Homework problem for security students: What does the string-reading accomplish? Based on just the information here, how do you think the Zfone key exchange protocol works?)

In the middle of the article is this interesting passage:

But Mr. Zimmermann, 52, does not see those fearing government surveillance — or trying to evade it — as the primary market [for Zfone]. The next phase of the Internet’s spyware epidemic, he contends, will be software designed to eavesdrop on Internet telephone calls made by corporate users.

“They will have entire digital jukeboxes of covertly acquired telephone conversations, and suddenly someone in Eastern Europe is going to be very wealthy,” he said.

Though the article doesn’t say so directly, this passage seems to imply that Zfone can protect against spyware-based eavesdropping. That’s not right.

One of the challenges in using encryption is that the datastream is not protected before it is encrypted at the source, or after it is decrypted at the destination. If you and I are having a Zfone-protected conversation, spyware on your computer could capture your voice before it is encrypted for transmission to me, and could also capture my voice after it is decrypted on your computer. Zfone is helpless against this threat, as are other VoIP encryption schemes.
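To make the point concrete, here is a minimal sketch in Python. It is not Zfone’s code: the toy cipher (a SHA-256 keystream XORed with the data) and every name in it (`keystream`, `xor_crypt`, `simulate_call`, `spyware_log`, `wire_log`) are assumptions made up for illustration. It shows only where the plaintext is exposed, with spyware assumed to be running on your computer.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: int, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, not a vetted cipher."""
    out = bytearray()
    block = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + nonce.to_bytes(8, "big") + block.to_bytes(8, "big")
        ).digest()
        block += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, nonce: int, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))


def simulate_call(key: bytes, your_words: bytes, my_words: bytes):
    """Return (spyware_log, wire_log) for one exchange, with spyware on your computer."""
    spyware_log = []  # what malware on your machine captures
    wire_log = []     # what a wiretapper on the network captures

    # You speak: the spyware reads the audio before it is encrypted.
    spyware_log.append(your_words)
    wire_log.append(xor_crypt(key, 0, your_words))

    # I speak: my packet crosses the network encrypted...
    packet = xor_crypt(key, 1, my_words)
    wire_log.append(packet)
    # ...and the spyware reads it after your machine decrypts it.
    spyware_log.append(xor_crypt(key, 1, packet))

    return spyware_log, wire_log


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    spy, wire = simulate_call(key, b"my PIN is 1234", b"thank you, noted")
    print("spyware sees:", spy)
    print("wiretap sees:", [p.hex() for p in wire])
```

In this sketch the wiretapper’s log holds only ciphertext, while the spyware’s log holds both sides of the conversation in the clear, no matter how strong the cipher is.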

All of this points to an interesting consequence of strong encryption. As more and more communications are strongly encrypted, would-be spies have less to gain from wiretapping and more to gain from injecting malware into their targets’ computers. Yet another reason to expect a future with even more malware.


  1. Brian Kemp says:

    How about an “anonymizer” bootable read-only media much like was done with AnonymOS (a fork of OpenBSD)?

    Reboot scrap machine with read-only media. However, it’s not foolproof–it relegates malware to the BIOS (which means it’s still possible), and a static trusted read only OS (which is hard to update and get new features) and what happens when one of your discs ends up in the wrong hands?

    Just a thought.


  2. Interesting. But if an eavesdropping party was listening from the beginning of the call and heard the short strings, would they be able to record the conversation then crack the encryption?

  3. V,

    No; listening to the recital and learning the short strings won’t help the adversary. To explain why this is so would require delving into the details of the cryptographic math that the protocol uses. The math isn’t too complicated, but I wanted to avoid digressing to explain it. Maybe someday I’ll write a blog post about it.

  4. This doesn’t help me call my bank. I don’t know my bank’s employees’ voices, and they don’t know mine.

  5. The phrase the callers read is the fingerprint of the keys the software has created on the fly. It sounds like Zfone is doing PKI dynamically behind the scenes and having the callers verify fingerprints (something that’s part of a good PKI exchange). This prevents the man-in-the-middle attack.

  6. Regarding the spyware comment, I believe Mr. Zimmermann was referring to a situation where a single infected computer might compromise the entire network by working as a sniffer, capturing VoIP packets. When using Zfone, all packets would be encrypted, and sniffing would become pointless. Of course, if all the computers on the network are infected, the attacker can not only eavesdrop, but also record passwords and steal private keys.

  7. J.B. Nicholson-Owens says:

    The Zfone source code can only be copied “a reasonable number” (section 1a) of times, one is not allowed to make the software do what the user needs it to do (section 2a disallows modifications not specified in section 1), and one is disallowed from copying the software beyond what is described in section 1 (section 2b). Sections 2d and 2f prohibit sharing copies of the source code except in one circumstance.

    Section 2e of Zfone’s source code license tries to set restrictions for merely running the compiled program (something the FSF once said couldn’t be done under American copyright law outside of a license manager or an encryption manager).

    Section 3 of Zfone’s source code license tries to prohibit users from discussing “any security-related bug, problem, deficiency, or weakness in the Zfone software on any web site or other public forum, or otherwise disclose or provide any such information to anyone else” without Zimmermann’s permission.

    Unlike PGP which at one time was considered semi-free software because it didn’t convey the freedoms to use, copy, distribute, and modify the program to all of its users, this program’s license tries to curtail one’s freedom of speech in addition to taking away one’s software freedom. Ironic that this should come from the man who was once under criminal investigation by the US Government (a time he refers to as “government persecution” on his website) in which he probably felt the loss of his civil liberties. I very much doubt that Zfone’s software would qualify as semi-free software. Zfone should be avoided. Instead it would be better to enhance free software VOIP (such as Ekiga) to do the job of sending and receiving strongly encrypted data, and making free software VOIP programs compatible with Zfone so that interoperability is possible without giving up valuable freedoms.

  8. bignose says:


    You don’t know the people managing or working at your bank? Then why do you trust them with your money?

    As the problems of enforced trust in faceless corporations become more evident, perhaps it’s time once again to support community-based institutions with which people can build a trust relationship. Surely financial institutions would be an obvious place to start.

    If someone is getting paid with profits from investing my money, and is empowered with holding and moving it around, it’s unacceptable to not know who they are. That’s the problem to fix: regain trust in your financial institution.

  9. There is a dark side to “community-based” anything, though — if someone takes a personal dislike to you you may not have a more impartial source for some service, and if someone influential in that community takes a personal dislike to you they may be able to blackball you with respect to something (traditionally, employment in a certain sector). Favoritism would also become a problem. The upside of faceless institutions is being part of the anonymous mass of their clients rather than being a specific name and face to them, ensuring impartial treatment.

    Of course, as these faceless institutions develop huge databases of marketing department fodder, it does become possible for them to start systematically singling someone out for special or for shittier treatment, but it is unlikely they’ll bother because of sheer weight of numbers. There would be a maximum of a dozen or so people singled out, diluted in millions rather than just the population of a village. And of course if someone is singled out for shoddier treatment they can probably fairly trivially trick the organization in question into interacting with them as it does with an anonymous member of the herd instead.

    As a general rule, I like my major organizations faceless and being only a number to them, and especially on the net I don’t like being forced to establish an “identity” to use a site or whatever. On the net, you’re a number most of the times; just an IP address which may be dynamic in which case it only tells someone maybe what geographic region you’re in (and even that is sometimes misused by Web sites to present different versions of themselves or restrict access in different areas, instead of obeying the end-to-end principle). When you’re not, you’re usually pseudonymous and can change name or start over at will if some relationship goes sour. Getting too much spam? Change your email address. Some jerk stalking you and flaming you wherever you go on usenet? Change name and other header info. (Also handy to compartmentalize, so nobody in your main haunt knows that the guy asking about OTC priapism remedies over there was you, and therefore they can’t make jokes about it at your expense…)

    Once an online site makes you “register”, though, the advantage shifts solidly to the faceless operators. To them, you may become a target for whatever reason, or another user may become a VIP; either may lead to differential and unfair treatment. It won’t be long before there’s a big scandal, I predict, in which some commerce site is found to have engaged in differential pricing based on purchasing histories; someone who establishes a persistent and hard-to-abandon-and-start-afresh identity with them might have their buying pattern profiled and a “what he’s willing to pay” calculation done, and prices adjusted (likely inflated) next time he logs on. Effective market segmentation and price discrimination at the scale of the individual buyer is a marketer’s wet dream — and a customer advocate’s nightmare.

    Of course, it can get even worse: a “universal logon” which some already are trying to invent (ex. MS passport) and market as a convenience would also make it harder to compartmentalize and harder to change identity and start over, making it harder to escape spammers, stalkers, being singled out for shoddy treatment someplace, etc.

  10. From Neo:
    “It won’t be long before there’s a big scandal, I predict, in which some commerce site is found to have engaged in differential pricing based on purchasing histories;”

    Not exactly what you wrote, but the recent Netflix case essentially fits the bill.

  11. Actually, Zimmermann demo’d this at Black Hat ’05 in Vegas. Unfortunately, the BH site doesn’t seem to have any of the presentation on it.

    As I recall (however much you want to trust that…), the string reading was to provide a side channel to confirm each other’s public keys (which, as usual, were only used to securely exchange the symmetric session key). Since the whole purpose of PKI is to prevent MiM attacks, this was a replacement. The short strings were derived from the other party’s public key in some way, ensuring that you had the correct public key.

    Phil also mentioned something about “public key chains” where reading back the strings would also validate all previous conversations, but I don’t remember the details from almost a year ago… assuming he even gave them.

  12. Neo, Cymbaline: Amazon got in hot water for setting prices based on customers’ profile info (zip code), and they’ve been accused of setting prices based on individual purchase history, though I don’t know how reliable that info is. This was back in 2000/2001ish, IIRC.

  13. DavidTC says:

    To g:

    First of all, note this is only important the first time. Think of it as SSH keys. Any time *past* the first, you should know what your bank’s keys look like.

    Anyway, if you don’t know the voice of the other person, it still protects against MiM attacks. Why? Because you’ll notice if you start talking to someone and they read off the numbers in one voice, and then start talking in another.

    This is how MiM works. They intercept your handshake, and talk to you instead, and they connect to where you want to connect, and talk to that person. Two securely encrypted conversations, with them in the middle decoding and re-encoding for the other person.

    With reading the numbers off, they can’t do that without some near-magical voice recognition and recreation. There is no software that lets me take someone saying ‘3 5 1 5 2 8’ and change it to ‘2 5 1 6 0 2’ in real time. They could try to use a database of voices to match it up, but that doesn’t work too well if you do it first thing in a conversation.

    Now, they could intercept your call, and merely *pretend* to be your bank, but that’s foilable with basic steps, like the customer asking ‘When did I open my account?’ or ‘How much was my deposit last week?’. (In addition, the bank has to authenticate you, but they already have to do that.)

    However, there is still one possible way to MiM the whole thing. They would have to sit in the middle and actually carry out the conversation with both of you. I.e., you tell them something, and they tell the bank, and the bank tells them and they tell you. That would almost certainly require at least two people, and is, frankly, a rather goofy idea.
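
The man-in-the-middle scenario discussed in the comments above can be sketched in a few lines of Python. This is not Zfone’s actual protocol, whose details the post does not give. The sketch assumes a plain Diffie-Hellman exchange over a toy group and a hypothetical short string made by truncating a hash of the shared secret. All names (`keypair`, `shared_secret`, `short_string`, `honest_call`, `intercepted_call`) and the parameters `P` and `G` are invented for illustration.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only: a 127-bit Mersenne prime
# is far too small (and too structured) for real use.
P = 2**127 - 1
G = 3


def keypair(private=None):
    """Return (private, public); pick a random private exponent if none is given."""
    if private is None:
        private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def shared_secret(my_private, their_public):
    """Diffie-Hellman: raise the other side's public value to my private exponent."""
    return pow(their_public, my_private, P)


def short_string(secret, length=4):
    """Hypothetical short string to read aloud: a truncated hash of the secret."""
    return hashlib.sha256(secret.to_bytes(16, "big")).hexdigest()[:length]


def honest_call(a_priv=None, b_priv=None):
    """Alice and Bob exchange public values directly."""
    a, a_pub = keypair(a_priv)
    b, b_pub = keypair(b_priv)
    return shared_secret(a, b_pub), shared_secret(b, a_pub)


def intercepted_call(a_priv=None, b_priv=None, m_priv=None):
    """Mallory replaces each public value in transit with her own."""
    a, _a_pub = keypair(a_priv)
    b, _b_pub = keypair(b_priv)
    _m, m_pub = keypair(m_priv)
    # Each victim unknowingly computes a secret shared with Mallory, not each other.
    return shared_secret(a, m_pub), shared_secret(b, m_pub)


if __name__ == "__main__":
    ka, kb = honest_call()
    print("honest:     ", short_string(ka), short_string(kb))
    ka, kb = intercepted_call()
    print("intercepted:", short_string(ka), short_string(kb))
```

In the honest call both sides compute the same secret, so the strings they read aloud match. With an interceptor in the middle, the two sides hold different secrets, so the strings will almost certainly differ, and the interceptor would have to fake the other person’s voice to hide that. A four-character string leaves a small chance of an accidental match, and a real design needs extra measures so that an attacker cannot search for a matching string; the sketch omits those.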