November 29, 2020

Apple's File Labeling: An Effective Anticopying Tool?

Recently it was revealed that Apple’s new DRM-free iTunes tracks come with the buyer’s name encoded in their headers. Randy Picker suggested that this might be designed to deter copying – if you redistribute a file you bought, your name would be all over it. It would be easy for Apple, or a copyright owner, to identify the culprit. Or so the theory goes.

Fred von Lohmann responded, suggesting that Apple should have encrypted the information, to protect privacy while still allowing Apple to identify the original buyer if necessary. Randy responded that there was a benefit to letting third parties do enforcement.

More interesting than the lack of encryption is the apparent lack of integrity checks on the data. This makes it pretty easy to change the name in a file. Fred predicts that somebody will make a tool for changing the name to “Steve Jobs” or something. Worse yet, it would be easy to change the data in a file to frame an innocent person – which makes the name information pretty much useless for enforcement.

If you’re not a crypto person, you may not realize that there are different tools for keeping information secret than for detecting tampering – in the lingo, different tools for ensuring confidentiality than for ensuring integrity.

[UPDATE (June 7): I originally wrote that Apple had apparently not put integrity checks in the files. That now appears to be wrong, so I have rewritten this post a bit.]

Apple apparently used crypto to protect the integrity of the data. Done right, this would let Apple detect whether the name information in a file was accurate. (You might worry that somebody could transplant the name header from one file to another, but proper crypto will detect that.) Whether to use this kind of integrity check is a separate question from whether to encrypt the information – you can do either, or both, or neither.
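To make the integrity idea concrete, here is a minimal sketch of a keyed integrity tag that covers both the buyer's name and the audio itself. Everything in it is an assumption made for illustration: the key, the function names, and the choice of HMAC-SHA256 are hypothetical, and nothing here is claimed to match what Apple actually does. The name stays readable (no confidentiality), but changing it, or moving the header onto a different track, makes the check fail.

```python
import hashlib
import hmac

# Hypothetical secret held only by the seller; not Apple's real key or format.
SECRET_KEY = b"example-secret-key"


def make_tag(name: str, audio: bytes) -> bytes:
    """Return a MAC that covers both the buyer's name and the audio bytes."""
    # Hash the audio first so the message has a fixed-length prefix,
    # which keeps the (audio, name) pairing unambiguous.
    audio_digest = hashlib.sha256(audio).digest()
    message = audio_digest + name.encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def check_tag(name: str, audio: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(name, audio), tag)


song_a = b"audio bytes of song A"
song_b = b"audio bytes of song B"
tag_a = make_tag("Alice Example", song_a)

print(check_tag("Alice Example", song_a, tag_a))  # True: untouched file
print(check_tag("Steve Jobs", song_a, tag_a))     # False: name was edited
print(check_tag("Alice Example", song_b, tag_a))  # False: header transplanted
```

With a shared-secret tag like this, only someone holding the key can run the check, which is one reason a public-key signature is the more natural fit if third parties are meant to verify.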

From a security standpoint, the best way to guarantee integrity in this case is to digitally sign the name data, using a key known only to Apple. There’s a separate key used for verifying that the data hasn’t been modified. Apple could choose to publish this verification key if they wanted to let third parties verify the name information in files.
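The split between a private signing key and a publishable verification key can be sketched with textbook RSA. This is a toy under loud assumptions: the primes are two small, well-known Mersenne primes, there is no padding, and the function names are made up for the example. It is not secure and is not a description of Apple's scheme; it only shows that whoever holds `D` can sign, while anyone given `N` and `E` can verify. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

# Toy parameters for illustration only: textbook RSA without padding,
# built from two known Mersenne primes. Far too small to be secure.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public verification exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private signing exponent (Python 3.8+)


def digest_as_int(name: str, audio: bytes) -> int:
    """Hash the audio and the name together and reduce to an integer below N."""
    inner = hashlib.sha256(audio).digest() + name.encode("utf-8")
    return int.from_bytes(hashlib.sha256(inner).digest(), "big") % N


def sign(name: str, audio: bytes) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(digest_as_int(name, audio), D, N)


def verify(name: str, audio: bytes, signature: int) -> bool:
    """Anyone who knows the public pair (N, E) can run this check."""
    return pow(signature, E, N) == digest_as_int(name, audio)


audio = b"audio bytes of the purchased track"
signature = sign("Alice Example", audio)

print(verify("Alice Example", audio, signature))  # True
print(verify("Steve Jobs", audio, signature))     # False
```

A real deployment would use a vetted library and a standard padded signature scheme rather than raw modular exponentiation.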

But there’s another problem – and a pretty big one. All a digital signature can do is verify that a file is the same one that was sold to a particular customer. If a file is swiped from a customer’s machine and then distributed, you’ll know where the file came from but you won’t know who is at fault. This scenario is very plausible, given that as many as 10% of the machines on the Net contain bot software that could easily be directed to swipe iTunes files.

Which brings us to the usual problem with systems that try to label files and punish people whose labels appear on infringing files. If these people are punished severely, the result will be unfair and no prudent person will buy and keep the labeled files. If punishments are mild, then users might be willing to distribute their own files and claim innocence if they’re caught. It’s unlikely that we could reliably tell the difference between a scofflaw user and one victimized by malware, so there seems to be no escape from this problem.

Comments

  1. There is a real privacy issue here.

    Whilst many fans of a musician would be happy to identify themselves as a promoter of the musician’s work (and the musician and others would be pleased to recognise them), it is the fan’s right to determine what information they publish and not Apple’s.

    Simply because it is a potential copyright infringement to make and distribute a copy, this does not trump a fan’s human right to privacy – (aka the right to publication).

    This is a big cock-up on the part of Apple – another example of copyright holders overreaching themselves.

  2. Incidentally, it is not enough to encrypt someone’s private data. It can be decrypted through brute force, so this is not safe.

    Even if Apple wished to give each customer a unique file (serial numbered), they’d still be obliged to inform the customer of this. It isn’t necessarily only Apple that will know which customer received which file.

  3. Because of the inherent weakness of the system for litigation, my first impression was that it was a simple deterrent to those who would not want (or not know it was possible) to go through the extra step of stripping the name information out of their files. The fact that Apple chose to use the customer’s name instead of their Apple ID seems to support this, since most people would be more wary of leaking their name than simply their Apple username.

    However, the fact that Apple didn’t announce this publicly flies in the face of that theory: the people who learned about the embedded name information through the blogosphere are the same demographic who would easily be able to use any name-stripping or masking tools.

  4. I don’t think it was an anti-copying tool. I think it was a market-research tool. If it’s not evidence that needs to stand up in court, the fact that 0.1% of the populace can change the tags is not very significant.

  5. But, remember Seth, the RIAA don’t need anything to stand up in court – they never get that far. They just have to collect supposedly incriminating circumstantial evidence that supports their litigation and leads to out of court settlements. So then it’s the victim’s problem to decide whether it would stand up in court and proceed on the basis it doesn’t.

    The poor punter is now their own judge&jury (and invariably poorly suited to that role).

    TANJ!

  6. Crosbie Fitch wrote:
    “Incidentally, it is not enough to encrypt someone’s private data. It can be decrypted through brute force, so this is not safe.”

    Actually, if the encryption were implemented properly it could not be broken through brute force. Strong encryption is pretty much unbreakable, at least with currently available technology.

    My question is, what’s to stop someone from simply removing the headers, encrypted or not? Are they going to force all tracks to have these headers before they will play? If so, how are they going to deal with all the existing music people have in their collections which they obtained from some source other than iTunes?

  7. What if there’s also a hidden label? It would be a lot less believable for a user to say “I don’t know how those files got there” if the obvious label had been modified.

  8. “Are they going to force all tracks to have these headers before they will play? If so, how are they going to deal with all the existing music people have in their collections which they obtained from some source other than iTunes?”

    That’s pure fantasy. Is it likely that EMI would have sufficient clout to insist on that? Moreover, since the customer might also play the files in WinAmp, Rhythmbox, Amarok, VLC, on a Zune, etc., etc., EMI would also need agreement from every other distributor of software players and every other maker of digital audio devices capable of playing MP4 audio. In addition, since EMI already said that the choice of format is down to the download store, they’d have to have the same functionality for every other format that any store might sell in.

    This whole story reeks of paranoia.

  9. Nick, I agree that Apple is highly unlikely to require the headers in all music files. My point is that as long as Apple doesn’t do this, it would be trivial to write a program that takes Apple’s DRM-less AAC files as input and outputs fully functional AAC files that have been sanitized of any personally identifying information. Of course, as Don Marti says, they may also be hiding some sort of personally identifying information in a less obvious place, but even in this case it’s probably only a matter of time before someone figures out how to strip this out too.*

    So whatever else the technology does, it is unlikely to catch sophisticated file-swappers, who will simply strip out the personally identifying information. As Ed mentioned in his post, the main danger is that these files will leak onto P2P networks without any direct involvement from the person who purchased them**. Even worse, once a file with your name embedded in it gets out on the P2P networks there’s no way for you to take it back. The file could hang around for months or years, just waiting for some RIAA minion to stumble across it.

    *According to the EFF, these files do not have an audio watermark, which is something that might actually be non-trivial to remove: http://www.eff.org/deeplinks/archives/005282.php

    **There are a number of ways this could happen. Ed mentions malware, but on a more low-tech level, it could just be that the purchaser lets the wrong person borrow his or her computer.

  10. Cryptography issues aside (and privacy for that matter), a name is a powerful thing. Sure, we can speculate on ways in which these files could be taken surreptitiously and added to p2p networks, but we can probably also agree that a lot of these files are added by the users themselves. If that is the case, then their name in the header, at least at a personal level, becomes an endorsement for potential piracy. Many people are on the fence concerning DRM technologies, and provided the act of sharing to a p2p network is anonymous and ostensibly benign, they may be fine with this “shadow activity”. At least from a philosophical point of view, the name-containing headers demand something from the ‘owner’ in return for submitting something to a collective: A realization that such an act is a willful and personal choice entailing all related responsibility.

    Even if this responsibility realization is an unintentional consequence, it nevertheless seems to be a strategy that operates on a more psychological level, perhaps helping users and file owners to confront the moral and ethical ether that tends to shroud discussions of distributed networks.

  11. Personally I think a mountain is being made from a mole hill…

    “If a file is swiped from a customer’s machine and then distributed, you’ll know where the file came from but you won’t know who is at fault. This scenario is very plausible, given that as many as 10% of the machines on the Net contain bot software that could easily be directed to swipe iTunes files.”

    Chances are there’s more valuable information to be swiped from somebody’s computer than iTunes songs. Most applications that are registered have personal info, and most likely if a machine is either compromised enough or simply stolen then accounts and passwords, serial numbers, or any digital wallet info is worth far more to a thief than a few iTunes songs…

  12. “If these people are punished severely, the result will be unfair and no prudent person will buy and keep the labeled files.”

    I don’t understand this at all. How is choosing to purchase a product legally not prudent if someone else infringes the copyright and gets punished? The way I see it, neither infringement nor punishment has any bearing on the quality of the product for non-infringing users.

  13. Tor,
    Choosing to purchase a product where your user info can easily be spoofed is not legally prudent. Everyone with access to your real name can distribute files that look as though they are yours. If you do not have an account with Apple’s iTunes Store, that kind of attack is useless.*

    A strange game. The only winning move is not to play.

    *still problematic, though. In the U.S. the standard of proof in civil cases is “the preponderance of evidence”, not the criminal standard of “beyond a reasonable doubt.”

  14. Jonah,

    “Actually, if the encryption were implemented properly it could not be broken through brute force.”

    Quote of the year! Tell that to AACS.

    These are my assumptions:
    1) Cyphertext of name exists in AAC file
    2) Common or garden cryptographic tools can generate this cyphertext given Apple’s public key
    3) Apple provides public key to RIAA so they can demonstrate to judge that “Joe Bloggs” has shared a file containing the name “Joe Bloggs”, i.e. the cyphertext is identical
    4) Public key and plaintext/salt format leaks out
    5) Brute force dictionary attack

    “Strong encryption is pretty much unbreakable” is not a safe statement.

    6) If Apple has to keep a corresponding database of more extensive plaintext, then there’s no point encrypting it in the AAC file, they would simply put its secure hash in the AAC file. And then we’re back to the equivalent of opaque serial numbers.

    So, either names appear in plaintext, or serial numbers are used. It is not safe to store encrypted names in the AAC file. QED
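The dictionary attack in assumptions 2 to 5 can be sketched as follows. This is a toy built on the assumptions stated in the comment: the name is encrypted deterministically under a public key, and that key (plus the plaintext format) has leaked. The textbook RSA parameters, the function names, and the candidate list are all hypothetical, chosen only so the example runs; nothing here describes a real Apple format.

```python
# Toy textbook RSA public key built from two known Mersenne primes.
# Deterministic (unpadded) encryption is assumed here on purpose, because
# that is what makes "the cyphertext is identical" comparisons possible.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q      # public modulus (assumed leaked)
E = 65537      # public exponent (assumed leaked)


def encrypt_name(name: str) -> int:
    """Deterministically encrypt a short name under the public key."""
    m = int.from_bytes(name.encode("utf-8"), "big")
    if m >= N:
        raise ValueError("name too long for this toy modulus")
    return pow(m, E, N)


def dictionary_attack(ciphertext: int, candidates):
    """Encrypt each guess with the public key and compare with the target."""
    for name in candidates:
        if encrypt_name(name) == ciphertext:
            return name
    return None


# Ciphertext as it might be found embedded in a shared file.
embedded = encrypt_name("Joe Bloggs")
guesses = ["Jane Doe", "John Smith", "Joe Bloggs"]

print(dictionary_attack(embedded, guesses))  # Joe Bloggs
```

Randomized padding would make two encryptions of the same name differ and so block this guess-and-compare loop, but it would equally break the "cyphertext is identical" demonstration in assumption 3, which is the tension the comment is pointing at.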

  15. Stuart Lynne says:

    Don’t worry about files getting swiped from your PC (or Mac)… Think more about the files on that iPod that got swiped from your car last week.

    Now all those tracks may become available for sharing somewhere with your name plastered all over them..

    Or at least that will be what anyone with a decent lawyer will claim if caught file sharing…

  16. AACS is different, because the content itself is encrypted, which means the users (potential attackers) have access to the decryption keys (and are now routinely publishing them on the net).

    A file with unencrypted content but an encrypted customer-ID is another matter; the decryption key can be kept truly secret by the vendor, as the user’s player software doesn’t need it.

    Given that, there are three ways to try to affix identifying information to, say, an audio file:
    * Embed discrete chunks of information in the file with the variable information, amid the chunks containing the fixed information (the audio). Each bit belongs to only one or the other sort of data. Defeated easily by taking several instances of the same tune, which you have reason to believe have different identifying information, and making a hybrid file where each bit is randomly and uniformly selected from the corresponding bit in one of the files. Made a bit trickier if there are variations in the placement or lengths of variable sections. Even then once you have enough versions, it’s easy to use some diffs to find the variable sections’ boundaries in each file and the constant sections and build a clean file from the latter.
    * Convolve the identifying information with the audio in some way, subject to the constraint that the audio still work with minimal distortion in player software unaware of the convolution. Essentially watermarking, requiring the high-entropy bits in the music (low-order bits, generally) be co-opted. With coding schemes like mp3 (or mpeg or jpeg) this doesn’t assign the identifying information to specific bits that you can just zero out. If the files are all the same length and the audio bitrate the same at every point, you can still use the “randomly mix several files” method to defeat it. Otherwise, you need to play back the audio from each, capture as WAV or whatever, average the WAV audio tracks, and convert back to a compressed format (e.g. ogg vorbis). Output is clean but may have a slight loss of quality due to lossy reencoding.
    * Convolve the identifying information with the audio in some way, NOT subject to the constraint that the audio still work with minimal distortion in player software unaware of the convolution. This is the only hard one. Hard to defeat, since the audio can be strongly encrypted with the variable information in some way, making it impossible to separate them without using the decryption key. It’s still crackable: it’s just as crackable as AACS, and for the same reasons. The WAV-averaging trick even still works, provided the DRM prevents audio capture and re-encoding (secure media pathway to the sound card?); Vista users won’t be able to use the WAV-averaging trick. But it’s also hard for the audio vendor, who has to make his customers use his proprietary player software and get them to purchase tracks that emphatically won’t work in an iPod, in Winamp, in Rhythmbox, in their car stereos …
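The first approach in the list above, and the diff-based way of defeating it, can be sketched like this. The file layout (a short header, a fixed-length ID chunk, then the audio) and all function names are invented for the example. It assumes equal-length copies, which is the easy case the comment describes; variable-length sections would need a real diff.

```python
def make_copy(audio: bytes, buyer_id: bytes) -> bytes:
    """Hypothetical layout: header, 16-byte buyer ID chunk, then the audio."""
    return b"HDR" + buyer_id.ljust(16, b"\x00") + audio


def find_variable_positions(copies):
    """Return the byte offsets at which the copies do not all agree."""
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("this sketch assumes equal-length copies")
    return {i for i in range(length) if len({c[i] for c in copies}) > 1}


def build_clean_file(copies):
    """Keep bytes that are identical everywhere; zero out the rest."""
    variable = find_variable_positions(copies)
    return bytes(0 if i in variable else b for i, b in enumerate(copies[0]))


audio = b"the same audio data in every copy"
copies = [
    make_copy(audio, b"alice"),
    make_copy(audio, b"bob"),
    make_copy(audio, b"carol"),
]
clean = build_clean_file(copies)

print(sorted(find_variable_positions(copies)))  # [3, 4, 5, 6, 7]
print(clean.endswith(audio))                    # True
```

Bytes on which every collected copy happens to agree are kept, so with only a few copies some fragment of the ID could survive; more copies make that less likely.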

  17. Er, the WAV-averaging trick no longer works, provided the DRM prevents audio capture. 😛

  18. Ord,

    Though I agree that proving you don’t even own an iTunes account is a better defense, it still doesn’t protect your name from being spoofed. If you are sued, you’ll be able to check Apple’s signature against the spoofed name.

    Abstaining from purchasing this music offers no effective protection from the headache of abusive litigation. If someone wants to spoof you to get you sued, they can do so regardless of whether you’ve purchased music or not, but the spoofing is ineffective/transparent.

  19. graphex says:

    I’m a bit on the fence about this. Buying a DRM-free song from Apple is useful to me because I run in to the 5-machine rule from time to time (3 macs, 2 pcs, various ipods). I don’t intentionally share my files, but sometimes people do use my computers without my supervision. I’d hate for a friend to get caught by the RIAA gestapo who then sends a MIB to my house to accuse me of piracy. I’d be forced to decide between calling my friend a criminal, or incriminating myself.

    Given this scenario, I would certainly partake in the use of software which easily removed the personally identifying information from DRM-free iTunes files. From what the EFF article says, that software shouldn’t be terribly difficult to create. Now the question in my mind is: would software to strip personally identifying information from “DRM-free” files be legal or not?

  20. Tomer Chachamu says:

    “If a file is swiped from a customer’s machine and then distributed, you’ll know where the file came from but you won’t know who is at fault.”

    The RIAA doesn’t care. If it’s your file, you’re at fault. If it’s your wireless router, you’re at fault.

    Even if it wasn’t signed or verifiable, they’d probably represent this data as “indisputable evidence”.

  21. Just for devil’s advocacy, how is this header information different in an interesting way from the “This program registered to Craig Shergold” line that pops up in the splash screen of so many programs that we buy and run? Obviously it’s less likely that someone or something will swipe your shareware or your old installation of Microsoft Word than that they’ll swipe your music, but I’m not sure how interesting that difference is.

  22. Paul, it’s not different – if copies of software are modified to embed the licensee’s name (without their consent/control).

    However, the thing is, software doesn’t tend to package itself up in single files that it then modifies with the licensee’s name. It’s rarely packaged for sharing/space-shifting purposes.

    It could be comparable to some software applications embedding the licensee’s name and address within each document they produce – without telling the user of the software that this is done, or providing facilities to control this.

    This issue isn’t necessarily new. And many music files purchased online may have been containing personal information for some time.

    The important question is: Is it ethical?

    It’s only a secondary question as to whether it is an effective anti-copying mechanism (and whether or not it was ever expected to behave as such).

  23. Like all these things, it is the sort of lock that keeps honest people out.

    Without an audio watermark you can very easily strip it back to analog, and re-encode it as an MP3 then buy a generic-brand MP3 player to listen to (for half the price of an iPod and with batteries that you can replace rather than throwing the whole thing away in a year’s time). Most modern mobile phones play MP3 and come with a headset so iPod isn’t the market-dominator that it once was, if Apple want to make iPods harder to use than mobile phone players then they can watch their market vanish.

    However, the deterrent only has to be good enough to make average Joe home-user think twice about sharing with his friends. As Richard Stallman points out, if you reinforce the message that helping other people is BAD then eventually it starts to sink in. This is just one of many ways to get that message across to the public. Eventually, abused people start to feel sympathetic towards their abuser and subjugated people start to see their overlord as their benefactor — it’s a survival instinct.

  24. However, the thing is, software doesn’t tend to package itself up in single files that it then modifies with the licensee’s name. It’s rarely packaged for sharing/space-shifting purposes.

    The traditional way is to make the software package a standard item which can be downloaded and then require a license key which contains the name and address of the licensee. Most software companies make no secret of it and even pop up a box to say “this software is licensed to Foo”. This has been standard practice for many years.

    Doing it without consent is another issue again, I think that we have privacy laws that require a declaration of this sort of stuff (i.e. an explicit Privacy Policy) and if the company uses information for something beyond their own declared policy then they are probably going to get into trouble. Most users don’t bother reading the policy.

  25. The intersection of the law and this technology is interesting to me. If Apple did have some information hidden in the file using steganography, presumably during any prosecution they would have to release the information of how that information was contained in the file.

    Does this suggest that, in order to keep the information from one trial from being useful in others, Apple would use a unique key for each file, embedding information in each file using that unique key? That would also suggest Apple is keeping a database to connect the key of each file to a user, perhaps with a serial number embedded into each file. Is there anything special in the iTunes agreement which would give us a clue that that is what Apple is doing?


  26. Don’t worry about files getting swiped from your PC (or Mac)… Think more about the files on that iPod that got swiped from your car last week.

    Now all those tracks may become available for sharing somewhere with your name plastered all over them..

    Or at least that will be what anyone with a decent lawyer will claim if caught file sharing…

    All the more reason to have part of your wifi network unencrypted, and keep a record of that, and make sure your network gets listed on several cracker sites that list wifi hot spots. Periodically purge your log files, so no one can tell who accessed your network when, then the defense simply is: Some very bad cracker stole those files off my network!

  27. There is a non-trivial processing cost associated with encrypting the private data and signing the track to ensure the integrity of the combined file. Simply adding an encrypted blob of data that uniquely and anonymously identifies the user is insufficient, as it has to have an integrity assertion added that proves the link between the specific user identification data used and the music/video track data for this specific instance. The actual processing effort is pretty trivial, but the system that signs the combined file has to be an entity that has control of an official Apple signing key. This is a very important secret, so it isn’t going to be shared out (Doh!). This makes the signing task very difficult to offload safely to a third party, and that makes it very hard to build a solution for this while retaining the option to use CDNs to provide your high-volume data delivery capability. iTunes certainly used Akamai for its CDN capability in the past, and although I’m not sure if that is still the case, it would be very surprising if they did not place a high business value on being able to do this when traffic volume peaks on iTunes. The same problem does not arise for DRM’ed iTunes tracks because the signing/encryption processes are already carried out by the iTunes client (using the customer’s own keys). This isn’t particularly “secure” (PyTunes made hay with that for a while), but it certainly makes it much easier to build a scalable delivery system, which must be a significant business priority for Apple.

  28. “UFOs are real, there are free energy technologies, and there is a secret government covering it all up.”

    Ed, can you send in some black helicopters to clean up some of the offtopic posts we’ve been getting lately? 🙂

    [Black helicopters? There are no black helicopters here. You must have seen, um, swamp gas or something. –Ed]

  29. Good point helvick. Things change when you start scaling things up.

  30. Can’t both the signature and the account information be stripped from the files while still keeping the file playable?
    And if not, there’s still the analogue loop.

  31. The watermark problem can be extended to scenarios other than your computer being stolen or hacked. In Norway it is legal to share your legally obtained music with close relatives (e.g. mother, father, siblings) and a few close friends. In addition to this, you are not legally obliged to identify members of your close family, even if the police are investigating them for criminal activity.

    In short: Norwegians can legally share a watermarked file bought from Apple with a few family members, and if that file ends up on a file-sharing system the buyer is not obliged to tell who the file was shared with.

    Thus: Apple’s file labeling is not an effective anticopying tool.

  32. Stefano says:

    Why should I expend extra money for “DRM-free” tracks, if I can’t share them anyway?

  33. Pardon my ignorance in all this, as I am truly an outsider to iTunes. I saw shops selling iTunes download cards. So why can’t people use fake names, and fake or disposable e-mail addresses?

    Creating a fake identity would be a far easier way to crack the protection, wouldn’t it?

  34. The money trail still leads directly back to you, unless you pay cash for some kind of preloaded “credit card” that can be used in online transactions somewhere where no-one knows you from Adam.

  35. Stefano… maybe some people like to choose their own player rather than living with vendor lock-in. Also, DRM is notoriously unreliable and the people pushing DRM keep pushing harder with additional “features” to make it worse and worse (e.g. sections in movies that you can’t skip through, next thing there will be adverts that you can’t switch off).

  36. ‘The money trail still leads directly back to you, unless you pay cash for some kind of preloaded “credit card” that can be used in online transactions somewhere where no-one knows you from Adam.’

    Gee, someone would be pretty dumb setting out on a scheme like this using a credit card.

    I am talking about those download cards, much like cell phone recharge cards. All they need from you is cash. As in many secret penetration cases, walking around is far easier than a direct attack and far more effective.

    Another question: Does this mean that one can no longer sell iTunes songs one gets sick of?

    If I buy a CD and give it to someone for a birthday, he/she can transfer it to the player without trouble. But if I buy the same number of tracks from iTunes, the tracks are branded with my name and information. Does that mean I cannot give them to another person, just like the CD?

    How is transfer of ownership dealt with in this scheme?

  37. Where would you be able to pay cash for a prepaid card like that, with no identification or credit-check required — and where online could it be used? I’ve yet to see an e-tailer that accepts payment methods other than credit cards, and, rarely, money orders or cheques. Credit cards need ID and a credit check to get. Money orders and cheques are drawn against a traceable account, even if you mail them from an anonymous postbox in the middle of nowhere instead of the one on your street or in the local mall. You’d need a “credit card” that looked and acted like one as far as the vendor was concerned when they entered its number, but acted more like a prepaid gift card as far as the buyer was concerned.

  38. Anonymous says:

    Walmart sells prepaid debit cards that work just like credit cards, and so do other companies and I know there is no ID check to buy them at Walmart. These cards are designed to replace travelers checks and should work just like a normal credit card.
    Also, I believe there are prepaid iTunes cards (similar to the Wii/Xbox 360 points cards) that you can buy at stores such as Walmart that will let you buy and download tracks without needing a credit card.

  39. Praise is nice, but actual discussion with original thought is better.