In this second installment of the “No Boundaries” series, we show how a long-known vulnerability in browsers’ built-in password managers is abused by third-party scripts for tracking on more than a thousand sites.
by Gunes Acar, Steven Englehardt, and Arvind Narayanan
We show how third-party scripts exploit browsers’ built-in login managers (also called password managers) to retrieve and exfiltrate user identifiers without user awareness. To the best of our knowledge, our research is the first to show that login managers are being abused by third-party scripts for the purposes of web tracking.
The underlying vulnerability of login managers to credential theft has been known for years. Much of the past discussion has focused on password exfiltration by malicious scripts through cross-site scripting (XSS) attacks. Fortunately, we haven’t found password theft on the 50,000 sites that we analyzed. Instead, we found tracking scripts embedded by the first party abusing the same technique to extract email addresses for building tracking identifiers.
The image above shows the process. First, a user fills out a login form on the page and asks the browser to save the login. The tracking script is not present on the login page [1]. Then, the user visits another page on the same website that includes the third-party tracking script. The tracking script inserts an invisible login form, which is automatically filled in by the browser’s login manager. The third-party script retrieves the user’s email address by reading the populated form and sends hashes of the address to third-party servers.
You can test the attack yourself on our live demo page.
We found two scripts using this technique to extract email addresses from login managers on the websites that embed them. These addresses are then hashed and sent to one or more third-party servers. These scripts were present on 1110 of the Alexa top 1 million sites. The process of detecting these scripts is described in Appendix 1 (Methodology). We provide a brief analysis of each script in the sections below.
Why does the attack work? All major browsers have built-in login managers that save and automatically fill in username and password data to make the login experience more seamless. The set of heuristics used to determine which login forms will be autofilled varies by browser, but the basic requirement is that a username and password field be available.
Login form autofilling in general doesn’t require user interaction; all of the major browsers will autofill the username (often an email address) immediately, regardless of the visibility of the form. Chrome doesn’t autofill the password field until the user clicks or touches anywhere on the page. Other browsers we tested [2] don’t require user interaction to autofill password fields.
Thus, third-party JavaScript can retrieve the saved credentials by creating a form with username and password fields, which will then be autofilled by the login manager.
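The conditions described above can be summarized in a deliberately simplified model. The sketch below is our own illustration in Python, not any browser's actual logic: the `Form` type, the `autofilled_fields` function, and the rule that credentials are filled only on the exact origin they were saved for are hypothetical simplifications of the behavior we observed in the browsers we tested [2].

```python
from dataclasses import dataclass

@dataclass
class Form:
    origin: str          # origin of the page the form lives on
    has_username: bool
    has_password: bool
    visible: bool        # deliberately ignored by the model below

def autofilled_fields(form, saved_origins, browser, user_interacted):
    """Return the set of fields this simplified login manager would fill."""
    # Heuristics vary by browser, but a username field and a password field are
    # required, and (in this model) credentials must be saved for this exact origin.
    if not (form.has_username and form.has_password) or form.origin not in saved_origins:
        return set()
    # The username is filled immediately, regardless of the form's visibility.
    filled = {"username"}
    # Chrome waits for a click or touch before filling the password;
    # the other browsers we tested do not.
    if browser != "chrome" or user_interacted:
        filled.add("password")
    return filled
```

Note that `visible` never enters the decision; that is what allows an injected, invisible form to harvest the saved username.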
Why collect hashes of email addresses? Email addresses are unique and persistent, and thus the hash of an email address is an excellent tracking identifier. A user’s email address will almost never change — clearing cookies, using private browsing mode, or switching devices won’t prevent tracking. The hash of an email address can be used to connect the pieces of an online profile scattered across different browsers, devices, and mobile apps. It can also serve as a link between browsing history profiles before and after cookie clears. In a previous blog post on email tracking, we described in detail why a hashed email address is not an anonymous identifier.
Scripts exploiting browser login managers
List of sites embedding scripts that abuse login manager for tracking
“Smart Advertising Performance” and “Big Data Marketing” are the taglines used by the two companies who own the scripts that abuse login managers to extract email addresses. We have manually analyzed the scripts that contained the attack code and verified the attack steps described above. The snippets from the two scripts are given in Appendix 2.
Adthink (audienceinsights.net): After injecting an invisible form and reading the email address, the Adthink script sends MD5, SHA1, and SHA256 hashes of the email address to its server (secure.audienceinsights.net). Adthink then sends another request, containing the MD5 hash of the email address, to the data broker Acxiom (p-eu.acxiom-online.com).
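For illustration, the three digests can be computed as below. This is not Adthink's code: the normalization step (trimming and lowercasing the address) is our assumption about how such identifiers are commonly prepared, and the function name is hypothetical.

```python
import hashlib

def email_hashes(email: str) -> dict:
    """Return MD5, SHA-1 and SHA-256 hex digests of an email address."""
    # Assumption: the address is trimmed and lowercased first, so that
    # " Alice@Example.com" and "alice@example.com" yield the same identifier.
    normalized = email.strip().lower().encode("utf-8")
    return {
        "md5": hashlib.md5(normalized).hexdigest(),
        "sha1": hashlib.sha1(normalized).hexdigest(),
        "sha256": hashlib.sha256(normalized).hexdigest(),
    }
```

Because the digests depend only on the address, the same user produces the same identifiers in any browser, on any device, and after any cookie clear.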
The Adthink script contains very detailed categories for personal, financial, and physical traits, as well as intents, interests, and demographics. It is hard to comment on the exact use of these categories, but they give a glimpse of what our online profiles are made up of:
birth date, age, gender, nationality, height, weight, BMI (body mass index), hair_color (black, brown, blond, auburn, chestnut, red, gray, white), eye_color (amber, blue, brown, grey, green), education, occupation, net_income, raw_income, relationship states, seek_for_gender (m, f, transman, transwoman, couple), pets, location (postcode, town, state, country), loan (type, amount, duration, overindebted), insurance (car, motorbike, home, pet, health, life), card_risk (chargeback, fraud_attempt), has_car(make, model, type, registration, model year, fuel type), tobacco, alcohol, travel (from, to, departure, return), car_hire_driver_age, hotel_stars
OnAudience (behavioralengine.com): The OnAudience script is most commonly present on Polish websites, including newspapers, ISPs, and online retailers. 45 of the 63 sites that contain the OnAudience script have the “.pl” country-code top-level domain.
The script sends the MD5 hash of the email address back to its server after reading it through the login manager. The OnAudience script also collects browser features, including plugins, MIME types, screen dimensions, language, timezone information, user agent string, and OS and CPU information. It then generates a hash based on this browser fingerprint. OnAudience claims to use anonymous data only, but hashed email addresses are not anonymous. If an attacker wants to determine whether a user is in the dataset, they can simply hash the user’s email address and search for records associated with that hash. For a more detailed discussion, see our previous blog post.
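The following sketch shows why a hashed email is not anonymous. It assumes a hypothetical dataset keyed by the MD5 of a lowercased address; anyone who knows an email address, or holds a list of them, can recompute the hashes and match them against the dataset. The function names and data are illustrative only.

```python
import hashlib

def md5_hex(email: str) -> str:
    # Assumed normalization: trim and lowercase before hashing.
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()

# Hypothetical tracker dataset keyed by hashed email.
profiles = {
    md5_hex("alice@example.com"): {"interests": ["travel"], "age": 34},
}

def lookup(email: str, dataset: dict):
    """Check whether a known person is in the dataset: hash, then look up."""
    return dataset.get(md5_hex(email))

def reidentify(hashed: str, candidate_emails):
    """Invert a hash by trying a list of known addresses (e.g. a mailing list)."""
    for email in candidate_emails:
        if md5_hex(email) == hashed:
            return email
    return None
```

No secret key is involved at any step, so the hash offers no protection against anyone who already has candidate addresses.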
Is this attack new? This and similar attacks have been discussed in a number of browser bug reports and academic papers for at least 11 years. Much of the previous discussion focuses on the security implications of the current functionality, and on the security-usability tradeoff of the autofill functionality.
Several researchers have shown that it is possible to steal passwords from login managers through cross-site scripting (XSS) attacks [3,4,5,6,7]. Login managers and XSS are a dangerous mixture for two reasons: 1) passwords retrieved by XSS can have more devastating effects than stolen cookies, as users commonly reuse passwords across different sites; 2) login managers extend the attack surface for password theft, as an XSS attack can steal passwords on any page within a site, even those that don’t contain a login form.
How did we get here? You may wonder how a security vulnerability persisted for 11 years. That’s because from a narrow browser security perspective, there is no vulnerability, and everything is working as intended. Let us explain.
The web’s security rests on the Same Origin Policy. In this model, scripts and content from different origins (roughly, domains or websites) are treated as mutually untrusting, and the browser protects them from interfering with each other. However, if a publisher directly embeds a third-party script, rather than isolating it in an iframe, the script is treated as coming from the publisher’s origin. Thus, the publisher (and its users) entirely lose the protections of the same origin policy, and there is nothing preventing the script from exfiltrating sensitive information. Sadly, direct embedding is common — and, in fact, the default — which also explains why the vulnerabilities we exposed in our previous post were possible.
This model is a poor fit for reality. Publishers neither completely trust nor completely mistrust third parties, and thus neither of the two options (iframe sandboxing and direct embedding) is a good fit: one limits functionality and the other is a privacy nightmare. We’ve found repeatedly through our research that third parties are quite opaque about the behavior of their scripts, and at any rate, most publishers don’t have the time or technical know-how to evaluate them. Thus, we’re stuck with this uneasy relationship between publishers and third parties for the foreseeable future.
The browser vendor’s dilemma. It is clear that the Same-Origin Policy is a poor fit for trust relationships on the web today, and that other security defenses would help. But there is another dilemma for browser vendors: should they defend against this and other similar vulnerabilities, or view it as the publisher’s fault for embedding the third party at all?
There are good arguments for both views. Currently, browser vendors seem to adopt the latter for the login manager issue, viewing it as the publisher’s burden. In general, there is no principled way to prevent third parties that are present on some pages of a site from accessing sensitive data on other pages of the same site. For example, if a user simultaneously has two tabs from the same site open (one containing a login form but no third party, the other containing the third party but no login form), then the third-party script can “reach across” browser tabs and exfiltrate the login information under certain circumstances. By embedding a third party anywhere on its site, the publisher signals that it completely trusts the third party.
Yet, in other cases, browser vendors have chosen to adopt defenses even if they are necessarily imperfect. For example, the HttpOnly cookie attribute was introduced to limit the impact of XSS attacks by blocking script access to security-critical cookies.
There is another relevant factor: our discovery means that autofill is not just a security vulnerability but also a privacy threat. While the security community strongly prefers principled solutions whenever possible, when it comes to web tracking, we have generally been willing to embrace more heuristic defenses such as blocklists.
Countermeasures. Publishers, users, and browser vendors can all take steps to prevent autofill data exfiltration. We discuss each in turn.
Publishers can isolate login forms by putting them on a separate subdomain, which prevents autofill from working on non-login pages. This does have drawbacks, including an increase in engineering complexity. Alternatively, they could isolate third parties using frameworks like Safeframe. Safeframe makes it easier for publisher scripts and iframed scripts to communicate, thus blunting the effect of sandboxing. Any such technique requires additional engineering by the publisher compared to simply dropping a third-party script into the web page.
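The subdomain approach relies on origins being compared by scheme, host, and port. A minimal Python sketch of that comparison follows; it assumes, as a simplification, that the login manager only fills credentials on the exact origin where they were saved (browsers' actual matching rules differ in details). The host names are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that defines a web origin."""
    parts = urlsplit(url)
    # An omitted port means the scheme's default port.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)
```

With login forms on a hypothetical login.example.com, a tracking script running on www.example.com sits on a different origin, so under this assumption the login manager has nothing to autofill there.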
Users can install ad blockers or tracking protection extensions to prevent tracking by invasive third-party scripts. The domains used to serve the two scripts (behavioralengine.com and audienceinsights.net) are blocked by the EasyPrivacy blocklist.
Now we turn to browsers. The simplest defense is to allow users to disable login autofill. For instance, the Firefox preference signon.autofillForms can be set to false to disable autofilling of credentials.
A less crude defense is to require user interaction before autofilling login forms. Browser vendors have been reluctant to do this because of the usability overhead, but given the evidence of autofill abuse in the wild, this overhead might be justifiable.
The upcoming W3C Credential Management API requires browsers to display a notification when user credentials are provided to a page [8]. Browsers may display the same notification when login information is autofilled by the built-in login managers. Displays of this type won’t directly prevent abuse, but they make attacks more visible to publishers and privacy-conscious users.
Finally, the idea of “writeonly form fields” could be a promising direction for securing login forms in general. The proposal, which has been discussed only briefly, defines ways to deny read access to form elements and suggests the use of placeholder nonces to protect autofilled credentials [9].
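One way the placeholder-nonce idea could work is sketched below. This is our own toy model, not the mechanism specified in [9]: the `NonceVault` class and its methods are hypothetical. The page's DOM only ever holds a random nonce, and the browser swaps in the real password when the form is submitted to the origin the credential was saved for.

```python
import secrets

class NonceVault:
    """Hypothetical browser-side store: the page only ever sees a placeholder."""

    def __init__(self):
        self._entries = {}  # nonce -> (origin, real password)

    def autofill(self, origin: str, password: str) -> str:
        # Instead of the real password, the form field receives a random nonce.
        nonce = secrets.token_hex(16)
        self._entries[nonce] = (origin, password)
        return nonce

    def on_submit(self, target_origin: str, body: dict) -> dict:
        # On submission, replace nonces with real passwords, but only when the
        # request goes to the origin the credential was saved for.
        out = dict(body)
        for key, value in body.items():
            entry = self._entries.get(value)
            if entry and entry[0] == target_origin:
                out[key] = entry[1]
        return out
```

This toy model protects only the password. The email addresses harvested in our study come from the username field, which would need the same treatment.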
Conclusion
Built-in login managers have a positive effect on web security: they curtail password reuse by making it easy to use complex passwords, and they make phishing attacks harder to mount. Yet, browser vendors should reconsider allowing stealthy access to autofilled login forms in light of our findings. More generally, for every browser feature, browser developers and standards bodies should consider how it might be abused by untrustworthy third-party scripts.
End notes:
[1] We found that login pages contain 25% fewer third parties than pages without login forms. The analysis was based on our crawl of 300,000 pages from 50,000 sites.
[2] We tested the following browsers: Firefox, Chrome, Internet Explorer, Edge, Safari.
[3] https://labs.neohapsis.com/2012/04/25/abusing-password-managers-with-xss/
[4] https://www.honoki.net/2014/05/grab-password-with-xss/
[5] https://web.archive.org/web/20150131032001/http://ha.ckers.org:80/blog/20060821/stealing-user-information-via-automatic-form-filling/
[6] http://www.martani.net/2009/08/xss-steal-passwords-using-javascript.html
[7] https://ancat.github.io/xss/2017/01/08/stealing-plaintext-passwords.html
[8] “User agents MUST notify users when credentials are provided to an origin. This could take the form of an icon in the address bar, or some similar location.” https://w3c.github.io/webappsec-credential-management/#user-mediation-requirement
[9] Originally proposed in https://www.ben-stock.de/wp-content/uploads/asiacss2014.pdf
[10] https://jacob.hoffman-andrews.com/README/2017/01/15/how-not-to-get-phished.html
APPENDICES
Appendix 1 – Methodology
To study password manager abuse, we extended OpenWPM to simulate a user with saved login credentials and added instrumentation to monitor form access. We used Firefox’s nsILoginManager interface to add login credentials as if they were previously stored by the user. We did not otherwise alter the functionality of the password manager or attempt to manually fill login forms. This allowed us to capture actual abuses of the browser login manager, as any exfiltrated data must have originated from the login manager.
We crawled 50,000 sites from the Alexa top 1 million. We used the following sampling strategy: visit all of the top 15,000 sites, randomly sample 15,000 sites from the Alexa rank range [15,000, 100,000), and randomly sample 20,000 sites from the range [100,000, 1,000,000). This combination allowed us to observe the attacks on both high- and low-traffic sites. On each of these 50,000 sites we visited 6 pages: the front page and a set of 5 other pages randomly sampled from the internal links on the front page.
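The sampling strategy can be expressed in a few lines of Python. This is an illustrative sketch, not our crawler code; it assumes the ranking is a sequence where index i holds the site at rank i + 1, and the function names are hypothetical.

```python
import random

def sample_sites(ranked_sites, seed=0):
    """Stratified sample; index i of ranked_sites holds the site at rank i + 1."""
    rng = random.Random(seed)
    top = list(ranked_sites[:15_000])                           # all of the top 15,000
    mid = rng.sample(ranked_sites[15_000:100_000], 15_000)      # 15,000 from the next stratum
    tail = rng.sample(ranked_sites[100_000:1_000_000], 20_000)  # 20,000 from the long tail
    return top + mid + tail

def pages_to_visit(front_page, internal_links, rng, k=5):
    """The front page plus up to k internal links sampled at random."""
    links = sorted(set(internal_links))
    return [front_page] + rng.sample(links, min(k, len(links)))
```

Fixing the seed makes the sample reproducible, which is useful when a crawl has to be rerun.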
The fake login credentials acted as bait, allowing us to introduce an email and password to the page that could be collected by third parties without any additional interaction. Detection of email address collection was done by inspecting JavaScript calls related to form creation and access, and by the analysis of the HTTP traffic. Specifically, we used the following instrumentation:
1. Mutation events to monitor elements inserted into the page DOM. This allowed us to detect the injection of fake login forms. When a mutation event fires, we record the current call stack and serialize the inserted HTML elements.
2. Instrumentation of HTMLInputElement to intercept access to form input fields. We log the input field value that is being read to detect when the bait email (autofilled by the built-in password manager) was sniffed.
3. Storage of HTTP request and response data, including POST payloads, to detect the exfiltration of the email address or password.
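A simplified version of the exfiltration check on stored HTTP traffic might look like the following. It is a sketch under assumptions: the bait address is hypothetical, and we only search for the plain address, its URL-decoded form, and its MD5, SHA-1, and SHA-256 hex digests.

```python
import hashlib
from urllib.parse import unquote_plus

BAIT_EMAIL = "bait.user@example.com"  # hypothetical bait address

def bait_tokens(email):
    """Strings whose presence in a request indicates the bait email leaked."""
    raw = email.encode("utf-8")
    tokens = {email}
    for algo in ("md5", "sha1", "sha256"):
        tokens.add(hashlib.new(algo, raw).hexdigest())
    return tokens

def leaks_bait(url, post_body=""):
    """True if the URL or POST body contains the email or one of its hashes."""
    # URL-decode so that "bait.user%40example.com" is also caught.
    haystack = unquote_plus(url + " " + post_body).lower()
    return any(token.lower() in haystack for token in bait_tokens(BAIT_EMAIL))
```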
For both JavaScript (1, 2) and HTTP instrumentation (3) we store JavaScript stack traces at the time of the function call or the HTTP request. We then parse the stack trace to pin down the initiators of an HTTP request or the parties responsible for inserting or accessing a form.
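Attributing an action to a script amounts to scanning the recorded stack for the first frame that belongs to a third party. The sketch below assumes Firefox-style frames of the form `functionName@url:line:column`, one per line with the innermost frame first, and uses a naive host-suffix test in place of a proper registrable-domain (Public Suffix List) check; the host names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumed frame format: "functionName@https://host/path.js:line:column".
FRAME_RE = re.compile(r"^(?P<func>[^@]*)@(?P<url>.+):(?P<line>\d+):(?P<col>\d+)$")

def script_urls(stack: str):
    """Extract script URLs from a stack trace, innermost frame first."""
    urls = []
    for line in stack.strip().splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            urls.append(m.group("url"))
    return urls

def responsible_party(stack: str, first_party_host: str):
    """Host of the innermost third-party script on the stack, or None."""
    for url in script_urls(stack):
        host = urlsplit(url).hostname
        # Naive first-party test: same host or a subdomain of it.
        if host and host != first_party_host and not host.endswith("." + first_party_host):
            return host
    return None
```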
We then combine the instrumentation data to select scripts that:
- inject an HTML element containing a password field (recall that the password field is necessary for the built-in password manager to kick in)
- read the email address from the input field automatically filled by the browser’s login manager
- send the email address, or a hash of it, over HTTP
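Once each event has been attributed to a script, the selection above reduces to a set-containment test per script. In this sketch the event labels are hypothetical names for the three conditions, not identifiers from our instrumentation.

```python
from collections import defaultdict

REQUIRED = {"inject_password_form", "read_bait_email", "send_bait_email"}

def flag_scripts(events):
    """events: iterable of (script_url, kind) pairs, kind being one of REQUIRED.

    Returns the scripts that satisfy all three conditions, sorted by URL.
    """
    seen = defaultdict(set)
    for script, kind in events:
        seen[script].add(kind)
    return sorted(script for script, kinds in seen.items() if REQUIRED <= kinds)
```

Requiring all three behaviors from the same script keeps the automated candidate list small before the manual verification step.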
To verify the findings of the automated experiments, we manually analyzed sites that embed the two scripts that match these conditions. We verified that the forms the scripts inserted were not visible. We then opened accounts on the sites that allow registration and let the browser store the login information (by clicking yes to the dialog in Figure 1). We then visited another page on the site and verified that the browser’s password manager filled the invisible form injected by the scripts.
Appendix 2 – Code Snippets
Has this autofill vulnerability been addressed by the browser manufacturers yet? Anyone know?
Safari “[d]isabled Automatic AutoFill of user names and passwords at page load to prevent sharing information without user consent” in Technology Preview 48, released just two days ago:
https://webkit.org/blog/8084/release-notes-for-safari-technology-preview-48/
(We have not had a chance to test this release yet)
Chrome and Firefox are also considering deploying a fix:
https://bugs.chromium.org/p/chromium/issues/detail?id=798492
https://bugzilla.mozilla.org/show_bug.cgi?id=1427543
I’m not a website developer, so excuse my web-dev ignorance, but this scenario struck me as having potential for exploitation.
Using the exploit method described, would it be possible for a site that has embedded (say) PayPal into its web page (via a hidden iFrame, or via a window.open call where the source of the child window is manipulated somehow) to grab your PayPal login credentials, since the parent window is able to read (and modify?) elements of the child iFrame or child window?
I understand that PayPal detects when it is running in an iFrame and blocks rendering of any content. But does the same hold for a child window, and for other services?
See http://www.dyn-web.com/tutorials/iframes/refs/iframe.php which demonstrates controlling an iFrame from a parent window.
Hi AConcernedAussie,
That’d be a devastating attack but access to cross-origin frames will be prevented by the same-origin policy.
The post you linked to describes access to frames from the same origin.
Thanks Gunes
My instincts were telling me what you have said, but thought it worth asking 🙂
The link was more for my benefit than anyone else’s, when I started looking at the possible broader outcomes.
Hi,
My website is listed as a website using the audienceinsights JS script, but I find no trace of this script in my source code.
So I do not understand how you found this script on my website and why it is listed in your article.
Could you please contact me and explain?
Maybe another JS file is loading this script, but I am not able to find it.
Thanks for your help
David
Hi David,
Your site embeds media-clic.com, which loads a script from themoneytizer.com, which finally loads audienceinsights.net:
https://www.excel-downloads.com/
->
https://pub3.media-clic.com/www/delivery/asyncjs.js
->
https://ads.themoneytizer.com/s/requestform.js?siteId=10013&formatId=1
->
https://static.audienceinsights.net/t.js
Your site (excel-downloads.com) was crawled on 2017-09-13.
Thanks for your fine efforts on this subject.
Are there any legal minds in the group who can explain how this scripting and password capturing is NOT a violation of telecommunications intercept laws? Did we give away those privacy rights in a user agreement somewhere?
https://www.law.cornell.edu/uscode/text/18/2511
KeePass combined with the extension can autofill the password if the autofill is enabled (configurable through the extension).
For Gecko-based browsers, the signon.autofillForms option in about:config can prevent the attack.
I am a bit confused here. Is it possible to extract and send all credentials stored in the browser’s password manager, or only the credentials for the particular site the user has visited (where the tracking script is not present on the login page)?
Thanks
Ankur
Hi Ankur,
Only the credentials saved for that particular domain can be extracted.
This is just one way you can be tracked online by third-party services. Plugging this “hole” wouldn’t do much to stop the tracking.
The standard way to integrate with a third-party network is for the site owner to place HTML code (either JavaScript or even an IMG tag) that sends the site’s own cookie ID for a user to the third party. The site can then send all user information associated with that cookie ID to the third party, either via a server-side API call or via an offline batch process. The third-party code doesn’t need to “scrape” anything for this to work.
Of course, this requires cooperation of the primary website. But any site including tracking code is already working with the tracker.
Fixing this would also protect against malicious scripts that exploit the same vulnerability through XSS attacks or malvertising campaigns. The server-side communication is possible, but costly compared to including one line of JavaScript.
Keeper has written a response to this issue on our blog:
https://blog.keepersecurity.com/2018/01/02/response-princetons-center-information-technology-policy-article/
Per the 1Password comment above, would something like LastPass, with autofill enabled, also be vulnerable? I am guessing it would be.
I’ve tried the demo, but after sending the fake email/password I only got an error page (404)…
On the next try it worked, but the result was two question marks 🙂
(I use Firefox with the NoScript add-on.)
But you’re right: if I allow JS from rawgit to run, then it can steal my email/password pair.
It is a little bit scary… 🙁
I use LastPass on Firefox but I also got 2 question marks – even with auto fill enabled.
Have I done something wrong?
Disclosure: I work for AgileBits, the makers of 1Password.
1Password is not vulnerable to this attack specifically because we have never allowed for “automatic autofill”. (Despite strong user request for such a behavior.) 1Password will automatically fill a form on the user’s command, but never without some user action.
We’ve required user action precisely because we consider the web page to be a very hostile environment. What David Silver refers to as “sweep attacks” have been known about both in theory and practice for quite some time. But even prior to learning of those, we felt that user action should be required.
Here is something I wrote in 2014 in response to some of the many customer requests for more automated behavior.
https://discussions.agilebits.com/discussion/comment/153916/#Comment_153916
That was what I thought when I read about this on BGR, but they specifically called out 1Password and LastPass browser plugins:
“To quickly fill in usernames and passwords saved in a password management app like 1Password and LastPass, you have probably installed browser addons. It’s those tiny browser apps that are targeted by scripts.“
Agile Bits also addressed this directly on their blog:
https://blog.agilebits.com/2017/12/30/1password-keeps-you-safe-by-keeping-you-in-the-loop/
If the 3rd party script is on a login page (which happens less according to the analysis, but still happens), a user might use the 1Password keyboard shortcut to fill in the legitimate form. Will 1Password fill in the credentials into the fake form too?
1Password does indeed autofill some sites I go to, including banking sites. I don’t understand Goldberg’s comment.
You must have had some other password manager (most likely that built-in to the browser) save those passwords if they’re being filled with no interaction on your part. 1Password indeed does not fill anything until you ask it to. It has no option to autofill a password upon page load.
Here, the Firefox add-on Privacy Badger (PB) immediately flagged rawgit.com as a tracker and blocked the sniffer script. Perhaps it was already known? Sadly, recommending PB is not really an option: users do enjoy the faster page loads, but when a site breaks, which can happen, they are clueless at first and then annoyed that PB cannot read their minds and that they have to manage something, even though the PB interface is quite easy to use, IMO.
Just to clarify, we use RawGit (rawgit.com) to serve code from GitHub with the right content type.
https://github.com/rgrove/rawgit/blob/master/FAQ.md
RawGit is not responsible for the code it serves. Privacy Badger probably flagged it because some other sites embed code from GitHub through rawgit.com.
Anything that adds usability overhead to password manager auto-fill feels like a challenging proposal. (And user opt-out is always a relatively ineffective control to mitigate systemic issues like this.)
But what about auto-applying the write-only property to a form as soon as it’s been auto-filled? In other words, once the browser has auto-filled a field, the field is considered to be in a locked-down state with no further DOM access. That could create some publisher pain for those who are using JS to access the email field in legit ways to instrument a better login form, but that would be putting the burden on a small class of websites, and not on users using auto-fill.
That sounds like an interesting idea to explore. One can imagine that autofilled credentials don’t need to be checked for password strength or duplicate usernames, which are common cases of legitimate script access to login forms. Still, one would need telemetry or web measurement data to back this up.
The question is whether browsers will ever ship write-only elements or similar protections 🙂