Netflix Cancels the Netflix Prize 2
March 12, 2010 by

Today, Netflix announced it is canceling its plans for a second Netflix Prize contest, one that reportedly would have involved the release of more information than the first. As I argued earlier, I feared that the new contest would have put the supposedly private movie viewing and rating habits of Netflix customers at great risk, and I applaud Netflix for making a very responsible decision. No doubt, pressure from the private lawsuit and FTC investigation helped Netflix make up its mind, and both are reportedly going away as a result of today’s action.
The Netflix Prize brought data mining out into the light of day and showed everyone what works, what doesn’t, and how good and how limited it is. All of that probably accelerated data-mining research by five to ten years.
The examples of loss of anonymity all seem to have been cases where the individual provided personal information about movie viewing and the researchers were then able to match that information to the Netflix database.
I am not aware of any examples “in the wild” where anonymity was broken or harm resulted. So we have an important and interesting piece of research stopped because of arguments that are hypothetical.
I agree that ZIP code information posed a problem, but there were ways that could have been dealt with, such as substituting a proxy. The biggest vulnerability in the database came from the names of the movies, but since they were not used there was no reason to include them in the Netflix Prize 2 dataset.
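The matching the comment above describes is essentially a linkage attack: an adversary who knows a few of a target’s ratings from an outside source scores every anonymized record against that auxiliary knowledge. A minimal toy sketch of the idea (all user IDs, movies, and ratings here are invented for illustration, not drawn from the actual dataset):

```python
# Anonymized "database": pseudonymous user id -> {movie: rating}
anon_db = {
    "user_001": {"Brazil": 5, "Memento": 4, "Heat": 3},
    "user_002": {"Brazil": 5, "Alien": 2, "Memento": 4, "Fargo": 5},
    "user_003": {"Shrek": 3, "Cars": 4},
}

# Auxiliary knowledge about the target, e.g. ratings they posted
# publicly elsewhere under their real name.
aux = {"Brazil": 5, "Memento": 4, "Fargo": 5}

def score(record, aux):
    """Count how many auxiliary (movie, rating) pairs the record matches."""
    return sum(1 for movie, rating in aux.items()
               if record.get(movie) == rating)

# The best-scoring pseudonymous record is the likely identity.
best = max(anon_db, key=lambda uid: score(anon_db[uid], aux))
print(best)  # user_002: it matches all three auxiliary ratings
```

A handful of uncommon movies rated on known dates is often enough to make one record stand out uniquely, which is why stripping names alone does not anonymize the data.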
This is a very sad day, when threats of lawsuits and fearmongering prevent real scientific progress…
Yes indeed, it’s a very sad day when concerns about privacy get in the way of a corporation’s desire to maximize its own profit.
Too bad that there was no way to recover the identity of the individuals, so there were no real privacy concerns. Just some lawyers looking for profit, hugely inflating the risks…
If we had to avoid scientific progress (and the Netflix challenge was a catalyst for very significant improvements) every time that there is a perceived risk, then we would not be able to have this conversation…
I’m sorry, but you don’t seem to have understood the research, Anonymous. It is *easily* possible to de-anonymize data like this. This has nothing to do with standing in the way of research. It has to do with the privacy of the people who borrowed films from Netflix believing that only Netflix knew what they watched.
It has been demonstrated over and over again that there is no such thing as truly “anonymized” data. So the right way to go about this is to *ask* every customer whether they would consent to having their purchasing and rating data disclosed, informing them that they could possibly be identified by this data. Then, people who feel it might be a problem if their employers could find out what kind of p*rn or Arabic movies they watch can opt out.
Just because lawyers have to spell it out for the company does not mean that the lawyers are hindering research. Biological research has ethics committees and guidelines on how to do research with personal data. We need this in computer science as well.