As part of a research project on web browser security we are currently taking a “census” of browser installations. We hope you’ll agree to participate.
If you do participate, a small snippet of JavaScript will collect your browser’s settings and send them to our server. We will record a cryptographic hash of those settings in our research database. We will also store a non-unique cookie (saying only that you participated) in your browser. We will do all of this immediately if you click this link.
(If you want to see in advance the JavaScript code we run on participants’ machines, you can read it here.)
[I revised this entry to be more clear about what we are doing. — Ed]
Not from my computer, Dan. I have adblocked urchin.js and other google analytics scripts, and many other tracking scripts. In fact, I have adblocked almost everything I commonly encounter that’s loaded third-party from many websites and that doesn’t add value to the site for me, ads being only a portion of that. Generally, if I notice intrusive advertising or just slow loading of a page I will click the adblock thingie and start looking for anything that looks like it doesn’t add value — being loaded from a third-party site being the number-one cause of suspicion, and being a SCRIPT rather than an IMG or BKGND being the number-two cause.
It’s amazing how much faster some pages load when this cruft is removed from them. If you ever see a page that slowly fills in while your browser’s status bar flickers back and forth between “transferring data from http://www.x.y.z.com” and “contacting host” for sixty different hostnames, it’s one that can be loaded in a third of the time or less if you aggressively filter out the chaff from the wheat.
I mostly leave non-third-party objects alone, though, unless they’re ads, or scripts that load ads or do something else annoying (such as try to deny access to ad-blockers!, or interfere with right-click-save-as or with navigation, though I mostly just avoid sites that are so disrespectful to their visitors as to try to trap them there).
That ChoicePoint and DoubleClick and Google and friends can’t track my every online movement is just a happy side-effect of all this.
(Actually, I have a Gmail account and suspect that my Google searches are associated with it, so Google can track quite a bit still. If I want to search for something I really don’t want tracked, say personal medical-related stuff, I can avoid that too. I could install Tor, or at least use a proxy. With a proxy, Google doesn’t know who performed the search, but the proxy knows who searched and for what; there’s a tradeoff there between the devil you know and the devil you don’t. Google is sure to log this information, and who knows under what circumstances it may leak or sell it; and it’s a high-profile target for anyone looking for such data on someone. The proxy is a much smaller, less well-known target, and someone would have to guess which one I used, but it might also log things, and expecting that a lot of what it logs will be “juicy”, it might run a lucrative sideline of selling the information to private eyes or the like. If I were worried about being blackmailed, I’d avoid a simple web anonymizer in favor of using Tor; if I’d just rather large corporations know less about me, the web anonymizer is easier to use.)
I’m angry that I had to click twice. Thanks to all the complainers for forcing me to do double the work, not to mention all the effort I’m having to put into complaining.
Some have complained that the link collects the information without warning or explaining first.
But any site you visit could have done this without even telling you. Unless you routinely disable JavaScript and other browser functionality, why be more distrusting of Freedom to Tinker than any other site you might visit?
Browsers (supposedly) enforce a strict sandbox, other than certain selected, non-identifying, non-private system information which web pages can access. I think when there is a platform deliberately designed to be a sandbox, the normal expectation is that it is fair game for applications to collect whatever information the system (browser in this case) will allow.
In contrast, when there is a platform which does not enforce full security sandboxes, like most PC desktop environments (even running as a non-administrator/non-root account, every app can still access all of your user account’s data), there is an expectation that applications behave in certain ways and require explicit user permission to access and modify some data.
If there is a consensus that web pages have access to too much data, then the proper response is not to play whack-a-mole and condemn every individual web site that attempts to collect it, but instead to urge browser makers to tighten default security (such as not passing referrer data, and not allowing cookies to be accessed except for explicitly typed URLs or local bookmarks), even if it breaks some web sites.
I was going to say something along the same lines. All JavaScript-based tracking and analytics (e.g., Google Analytics, Omniture) work invisibly and capture the same data submitted here. If this site had GA installed like many other blogs, they’d already be capturing the raw data (minus the hash).
It’s different in a research context. If you don’t understand that, you don’t understand human subjects and IRBs.
Participants didn’t necessarily see this page or the “Thanks for participating” page. It may be rendered inside a zero-height, zero-width iframe inside some blog, causing data to be collected via XSRF. So many teaching opportunities that I willingly gave my data anyway 😉
Lynx just took me to a page that said “Thanks for participating!”. I allowed cookies.
The JS looped apparently forever (until the browser timed it out and killed it) on Konqueror 3.5.9.
Do you have approval from Princeton’s IRB for this research? I find that many folks who deal primarily with technical issues (especially computer scientists) don’t feel they need to deal with IRBs, informed consent, etc. for data gathering such as this, or for user studies or testing of prototypes. Of course, that’s a violation of federal law, not to mention common standards of research ethics.
ok… so it looks like you’re counting md5 hashes of a unique string that you build on the client side (presumably to avoid transmitting this information back to your server).
That’s interesting… I can imagine you can enumerate (maybe) all the unique strings and compare their hashes to the ones you’re collecting in order to get information more interesting than just a count of hashes. I was more concerned about personally identifiable information (or things close to that), like IP address… although you do seem to add MAC addresses to the string for Windows boxes (`SELECT * FROM Win32_NetworkAdapterConfiguration`).
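To make the enumeration idea concrete, here is a minimal sketch in Python (not the survey’s JavaScript). Only the use of MD5 comes from the discussion above; the field names, the candidate values, the `key=value` string format, and the helper names `settings_hash` and `build_lookup` are all hypothetical and are not taken from scoop.js.

```python
import hashlib
from itertools import product


def settings_hash(settings):
    """MD5 over a canonical 'key=value;...' string (format is an assumption)."""
    canonical = ";".join(f"{k}={v}" for k, v in sorted(settings.items()))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def build_lookup(candidates):
    """Hash every combination of candidate values; map hash -> settings."""
    keys = sorted(candidates)
    table = {}
    for values in product(*(candidates[k] for k in keys)):
        settings = dict(zip(keys, values))
        table[settings_hash(settings)] = settings
    return table


# Hypothetical candidate values an observer might guess at.
CANDIDATES = {
    "browser": ["Firefox 3.0", "Konqueror 3.5.9", "Lynx"],
    "cookies": ["on", "off"],
    "javascript": ["on", "off"],
}

lookup = build_lookup(CANDIDATES)

# A hash as it might appear in the database, reversed by table lookup.
observed = settings_hash({"browser": "Lynx", "cookies": "on", "javascript": "off"})
recovered = lookup.get(observed)
```

The lookup only succeeds when the original string is among the guesses. A field with many possible values, such as a MAC address, would make the table far larger than this toy example.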
Seems like you could have had a splash page that explained the project and risks associated with participation, calculated the unique string, calculated the hash, displayed both of these things to the user (along with linked explanations of each element) emphasizing that no data has been submitted to your servers yet and then offered a link that would transmit the hash (and in such a way that users know that only the hash is being sent). Clicking the link would then act as an acknowledgment to participate.
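The flow suggested above could look roughly like the following Python sketch. It is an illustration under assumptions, not how the survey works: `preview`, `submit_if_consented`, the string format, and the `send` callback (standing in for whatever would transmit data to the server) are all hypothetical.

```python
import hashlib


def preview(settings):
    """Compute the string and its hash locally, for display to the user."""
    canonical = ";".join(f"{k}={v}" for k, v in sorted(settings.items()))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return {"string": canonical, "hash": digest}


def submit_if_consented(shown, consented, send):
    """Transmit only the hash, and only after explicit consent."""
    if not consented:
        return False
    send({"hash": shown["hash"]})  # the raw string is never passed to send()
    return True


# Usage: a list stands in for the server so the effect is visible.
sent = []
shown = preview({"browser": "Lynx", "cookies": "on"})
declined = submit_if_consented(shown, False, sent.append)
accepted = submit_if_consented(shown, True, sent.append)
```

Nothing leaves the client until the second step, so merely viewing the explanation page submits no data.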
Maybe you thought that this would affect your results enough that you got exemption from informed consent protocols from the Princeton IRB?
When I clicked on the link I expected to see a page that described the project in further detail and provided additional information about exactly what would be installed on my system. Instead I discovered that I had already become a participant in the survey.
Freedom to Tinker readers are probably a bit more security savvy than the average bear and would probably choose not to click on questionable links. Most of us probably assume, based on the content of the blog, that we won’t get sucker-punched here. I guess that’s what we call the Circle of Trust. Is that what this survey is *really* about?
You can read all the JavaScript (which lists all the data collected) here: http://scoop.princeton.edu/scoop.js (it also uses a separate md5 library).
But I agree, it’s a bit alarming to see it having already taken the data when you click the link, without an additional confirmation.
Yeah, from a research ethics point of view, I expected an informed consent statement or a debrief. I expected to see a statement about what data is collected, how it would be used and how it would be stored… along with an impact statement that examined the consequences of a breach. For example, if this database were to fall in the hands of Satan (or choose your favorite bad guy), what would be my exposure?
You make me nervous… it’s sneaky to put it at princeton.edu as if it makes you more trustworthy. But I read this blog because I trust it, so whatever.
Of course, my first thought is there’s a certain selection bias just in the fact that you’re mentioning it on here, as you tend to attract a particular crowd. Which then gets modified by the comments above, as I expect the crowd you attract is likely to be among the more paranoid browsers.
I’m even paranoid of clicking the link. What if that’s the real test: to see how many people will follow a blind link that could lead them heaven knows where…
Well, I’m only paranoid enough to think of it. I will actually click it and help out.
No informed consent?
The truly secure (paranoid?) surf with JavaScript disabled.
Is it helpful to the survey if users whitelist the survey site and permit the cookie? Or would that act intrinsically skew your data?
At least such NoScript users are already well accustomed to spending more than one click on a website. 🙂