In this installment of the “No Boundaries” series we show how wholesale collection of user interactions by third-party analytics and session replay scripts cause inadvertent collection of passwords.
By Steve Englehardt, Gunes Acar and Arvind Narayanan
Following the recent report that Mixpanel, a popular analytics provider, had been inadvertently collecting passwords that users typed into websites, we took a deeper look [1]. While Mixpanel characterized it as a “bug, plain and simple” — one that it had fixed — we found that:
- Mixpanel continues to grab passwords on some sites, even with the patched version of its code.
- The problem is not limited to Mixpanel; also affected are session replay scripts, which we revealed earlier to be scooping up various other types of sensitive information.
- There is no foolproof way for these third party scripts to prevent password collection, given their intended functionality. In some cases, password collection happens due to extremely subtle interactions between code from different entities.
Overall, we think that the approach of third-party scripts collecting the entirety of web pages or form inputs, and attempting to filter out sensitive information is incompatible with user security and privacy.
Password leaks are not limited to Mixpanel
In our research we found password leaks to four different third-party analytics providers across a number of websites. The sources are numerous: several variants of a “Show Password” feature added by site owners, an unexpected interaction with an unrelated third-party script, unanticipated changes to page structure by browser extensions, and even a bug in privacy tools of one of the analytics libraries. However, the underlying cause is the same: wholesale collection of user input data, with protection provided by set of blacklist-based heuristics to filter password fields. We argue that this heuristic approach is bound to fail, and provide a list of examples in which it does.
This summary is provided not as an exhaustive list of all possible vulnerabilities, but rather as examples of how things can go wrong. A detailed description of each vulnerability and the vendor response is available in the Appendix.
Show Password features place passwords in unprotected fields
Many websites implement mobile-friendly password visibility toggles which make it possible to “unmask” the entered password and check it for errors. In order to implement this, the user’s password must be placed in a field that doesn’t have its “type” property set to “password”, since browsers will automatically mask any text entered into those fields. We found leaks due to several variations of this feature, including to Mixpanel (A.1), to FullStory (A.2), and to SessionCam (A.3).
Video showing the inadvertent password collection by Mixpanel:
Subtle interactions with unrelated scripts can lead to leaks
Sites may include a number of third-party scripts which alter or annotate the page in unexpected ways. In the example given in (A.4), a third-party analytics script from Adobe stored the typed password in a cookie when the password field clicked. On the same page, session replay script from Userreplay collects all cookies, effectively causing the password leak to Userreplay.
Bugs in analytics scripts can lead to password leaks that go unnoticed
Even in cases where publisher sites take an active role to prevent leaks to third parties, things can go wrong. Passwords were leaking to FullStory (A.5) due to a bug in one of their redaction features. The feature was implemented in a such a way that, when applied to a password input, the password would leak to FullStory, but would not be displayed in the resulting session recording that the publisher could later review. Thus, it would be difficult for a publisher to discover the leak.
Browser extensions can alter the page in a way that leads to leaks
In their announcement, Mixpanel explained that “[the password leak] could happen in other scenarios where browser plugins (such as the 1Password password manager) and website frameworks place sensitive data into form element attributes.” The problem is not limited to a small set of browser extensions, but rather any extension which alters the page structure. Neither the site owner nor the analytics provider can be expected to anticipate all possible structural changes an extension might perform.
As an example, we examined browser extensions which automatically make password fields visible to the user. There are a ton of such extensions, which are collectively used by 120,000+ users. We found that users of the Unmask Password and Show Password Chrome extensions would, on some sites, have their passwords leaked to Mixpanel (A.6) and FullStory (A.7) respectively. In both cases the leaks were caused by a variant of the “Show Password” feature described above.
A better look at the Mixpanel’s Autotrack
The Autotrack feature allows sites to collect interaction events on a website, like clicks or form interactions, without needing to specify which elements to monitor. The automated collection of all interactions, including values entered into input fields, is the service’s main selling point: if a site owner ever wants to start monitoring a new input field, Mixpanel will already have the complete history of the various inputs provided by the visitors for this field.
Autotrack doesn’t appear to be designed to collect sensitive data. Instead, Mixpanel suggests sites can use the service to perform benign analytical tasks, like finding the commonly used search terms. However, sites collect all types of sensitive data through input elements: usernames and passwords, health information, banking information, and so on. Automatically determining which fields are sensitive is a difficult task, and an incorrect classification runs the risk of scooping up the sensitive user data — even if the user never submits the form [6].
Similar to the session replay scripts we studied in the past, Mixpanel implements several heuristics [7] in attempt to automatically exclude specific types of sensitive information from collection. The rules most relevant to password fields are: remove any input field which is of the password type or has a name property (or “id” property, if name does not exist) that contains the substring “pass”. Mixpanel also offers sites the ability to manually exclude parts of the page, which we discuss later in the post.
Not a bug, but a broken design
Mixpanel attributed the cause of their previous password leak to a change in another third-party library. The third-party library “placed copies of the values of hidden and password fields into the input elements’ attributes, which Autotrack then inadvertently received”. Indeed, the previous version of Autotrack only filtered password fields’ “value” property, which stores the password entered by the user. The attributes of the field, which can be used to add arbitrary metadata about the password field is left unfiltered. The fix deployed by Mixpanel changed this, filtering both the value property and all attributes from password fields. This plugs that specific hole, but as the Testbook example (A.1) shows, sites may handle sensitive data in other ways Mixpanel didn’t predict.
“Show Password” feature is commonly implemented by switching the “type” property of the password input field from “password” to “text”.
The majority of the leaks examined above were caused by the mobile-friendly “Show Password” option. This feature is commonly implemented by switching the “type” property of the password input field from “password” to “text” (see: 1, 2, 3). The feature can be implemented by the first party directly or by a browser extension. In fact, this is how the Unmask Password extension is implemented, and was the cause of the ECRent password leak (A.6). The third-party scripts we studied don’t filter generic text fields as strictly as password fields. This is also the primary cause of the leaks on Testbook (A.1), PropellerAds (A.2), and johnlewis.com (A.3).
It may be tempting to filter passwords stored in generic text fields based on other properties of the input field, such as the name, class, or id. Mixpanel’s relatively simple implementation of this filtering [8] provides a perfect case study of this mitigation: it excludes generic text fields which contain the substring “pass” in their “name” attribute (or “id”, if “name” does not exist). Using our crawl data we found that 15% of the 36,972 total password fields discovered will not match this substring filter. Indeed, the third most frequent password field name attribute “pwd” would be missed, as will common translations of “password”. A word cloud of the 50 most commonly missed terms is given in footnote [9].
Of course Mixpanel’s heuristic could be updated to include these new fields, but that would just continue the game of whack-a-mole. There will inevitably be another password field formatted or handled in a way that this new heuristic fails to handle and user passwords will continue leaking.
Manual redaction is not a silver bullet
Mixpanel offers developers a way to manually specify fields that should be excluded from collection. Developers can simply add the class `mp-no-track` or `mp-sensitive` to an element to prevent sensitive information leaks. Indeed, this was the solution Mixpanel recommended in their response to our disclosure [3]. At first glance this might seem to mitigate the problems outlined in this post — anything missed by the automatic filtering can simply be manually filtered by the site. However, our research into session replay scripts found that companies repeatedly failed to prevent data leaks through manual redaction.
In Mixpanel’s case, redaction feature is will negate the main benefit of Autotrack – collection without a manual review of the fields. We signed up for a Mixpanel account and enabled Autotrack in the dashboard [10]. During the process, we didn’t see any warnings about the risks of Autotrack, nor could we find a way to review all of the collected data. To discover the collected passwords, we needed to manually add an “event probe” to the password field or login form. This may explain why it took over 9 months for a Mixpanel user to discover the inadvertent password collection introduced back in March 2017, despite its presence on 4% of Mixpanel’s projects.
Where to go from here?
We focus on password fields in this post because they are the most constrained user input and should be the easiest to redact. Despite this, several of the major input scraping scripts are still unable to prevent password leaks. Financial, health, and other sensitive data are often collected using generic “text” fields which may have ambiguous input labels. We expect them even more difficult to filter in an automated way.
We show that the indiscriminate collection of form data is a security disaster waiting to happen. We’ve highlighted these risks before, and the analysis included in this post shows that the problem persists. We don’t view these issues as bugs to be fixed, but rather vulnerabilities inherent to this class of analytics scripts.
Appendix: Password leaks and disclosures
A1. Mixpanel continues to unintentionally collect passwords
The Autotrack feature allows sites to collect analytics on form interactions such as when checking out products or signing in to your account. Mixpanel Autotrack normally tries to exclude password fields from the collected data, but the filtering relies on fragile assumptions about page composition and markup.
Fig 1. Password collection by Mixpanel on testbook.com.
One of the password leaks occurs on testbook.com’s [2] login form when a user makes use of the “Show Password” feature, which causes the user’s password to be displayed in cleartext. Once the user takes a further action, such as editing the password or hiding it again, the password will be collected by Mixpanel. The collection happens regardless of whether the user ultimately submits the login form.
We reported the issue to Mixpanel. Their response can be found in [3].
Mixpanel currently lists the Autotrack feature as “on hold”, and appears to have disabled it for new projects. But sites that were already using Autotrack at the time of the incident are not affected by this change.
A2. Password collection due to a “Show password” feature: The PropellerAds’ login form contains an invisible text field, which also holds a copy of the typed password. When a user wants to display the password in cleartext, the password field is replaced by the text field. FullStory’s auto exclusion filters fail to recognize the password in the text field, which causes the password to be collected by FullStory.
We reported this issue to FullStory and PropellerAds. FullStory responded promptly and said that “[they] are in touch with the customer to ensure that all inappropriate data is deleted and that they update their exclusion rules to comply with our Acceptable Use Policy.” FullStory’s complete response can be found in [4].
Fig 2. Password is collected on PropellerAds’ login form by FullStory.
A3. Johnlewis leaks
The registration page of the johnlewis.com website implements the “Show password” feature in the same way as PropellerAds: a copy of the password is always stored in an invisible text field, which replaces the password field when users want to show their password. This time session replay script from sessioncam.com fails to filter out passwords in the text field and sends it to its servers. We reported the issue to SessionCam and johnlewis.com. Response from Sessioncam can be found in [5].
On both of these cases (johnlewis.com and propellerads.com) password is grabbed by the session replay scripts even if the user does not make use of the Show/Hide password feature.
Fig 3. Password leaks to Sessioncam on johnlewis.com.
A4. Password leaks due to interaction with other analytics scripts.
Passwords on Capella University’s admission login page leaks due to an unexpected interaction of different third-party scripts. When a user clicks on the password field, Adobe’s Analytics ActivityMap script stores the password in a cookie called “s_sq”. On the same page, session replay script from Userreplay collects and sends all cookies, which effectively cause passwords to be collected by Userreplay.
Fig 4. The password stored in the cookies are leaked to Userreplay on Capella.edu website.
A5. Password leaks due to a bug in redaction features
Fig 5: The WP Engine login page leaks passwords via FullStory’s keystroke logger. We decode the keyCode values sent to FullStory to demonstrate that they match what was typed into the password field.
In the screenshot above, we demonstrate how passwords entered on WP Engine’s login page leak to FullStory via their keystroke logger. From what we can tell, this leak is the result of WP Engine taking proactive steps to protect user data. Rather than relying on FullStory’s automatic exclusion of password fields, WP Engine added FullStory’s manual redaction tag (i.e., `fs-hide`) to the field.
FullStory’s documentation explains that fields hidden with the `fs-hide` tag will not be visible in recordings, and that “some raw input events (e.g., key and click events) are also redacted when they relate to an excluded element.” Through manual debugging, we observed that the use of `fs-hide` tag changed the way FullStory’s script classifies the password field, eventually causing it to collect the password. Following our disclosure, FullStory promptly fixed the issue and released a security notice to their customers, stating that the bug affected less than 1% of sites.
A6. Unmask Password browser extension causes leaks on ECRent
Fig 6. Mixpanel will collect passwords on ECrent registration page when the Unmask Password Chrome extension is in use. The password leaks in a base64 encoded query string to Mixpanel. ECrent is a rental platform that was in the Alexa top 10,000 sites at the time of measurement.
Mixpanel’s response to this issue was as the follows: “Unfortunately, there’s little we (or the website owners who are our customers) can do to detect if the end user modifies their browser to make unexpected changes to the DOM. In this case, the solution is the same as for the concerns you reported earlier: explicitly blacklist sensitive input fields using the mechanism we provide.”
A7. Show Password causes leaks on Lenovo
Fig 7. FullStory will inadvertently collect passwords when the Show Password Chrome extension is in use. Similar to the Propeller Ads example above, this extension implements the show password functionality by swapping the current password field with a new cleartext field. The new field is not excluded from FullStory’s recordings — any further edits to the cleartext will cause password to be collected by FullStory.
Endnotes
We thank Jonathan Mayer for his valuable feedback.
[1] Our rationale for publicizing these leaks is not to point fingers at specific first or third parties. Mixpanel, for example, handled their previous password incident quickly and with transparency. These aren’t bugs that need to be fixed, but rather insecure practices that should be stopped entirely. Even if the specific problems highlighted in this post were fixed, we suspect we’d be able to continue to find variants of the same leaks elsewhere. Thankfully these password leaks can’t be exploited publicly, since the analytics data is only available to first parties. Instead, these leaks expose users to an increased risk to data breaches, an increased potential for data access abuse, and to unclear policies regarding data retention and sharing.
[2] testbook.com is the 2360th most popular site globally according to Alexa; testbook.com mobile app has 1,000,000 – 5,000,000 downloads.
[3] Mixpanel’s complete response:
“Thank you for reporting this. Mixpanel takes security very seriously, and values the contributions of researchers to help our customers be more secure. Our Autotrack feature primarily relies upon the type of HTML input fields to determine whether it is sensitive or not. As you’ve noticed, it backstops that with a simple pattern to try to identify cases where a website is collecting sensitive information in non-password/hidden fields. Per our documentation, if a customer is collecting sensitive information in non-password fields, they should explicitly blacklist it for collection[.] ”
[4] FullStory’s responses:
- “Thank you for writing in. We are in touch with the customer to ensure that all inappropriate data is deleted and that they update their exclusion rules to comply with our Acceptable Use Policy.
Our Acceptable Use Policy makes clear in straightforward language that our goal is to avoid receiving sensitive data in the first place; we believe such data should never leave the end user’s device. Our engineering team is working on several techniques to do even more to help our customers avoid the sort of mistakes your research has highlighted. We welcome any and all additional effort that helps us protect our customers and their customers’ data.” - “Thanks for reaching back out and for sending over both disclosures. While a more complete reply is forthcoming, I wanted to send over a quick update.Our engineers have been actively looking into both disclosures throughout the morning. We expect to ship a code change to address the `.fs-hide` issue momentarily and we’ll be back in touch with a more detailed response to both disclosures after that code change ships.Again, we really appreciate your disclosing these issues to us.”
- “I wanted to follow up and let you know that we fixed the bug associated with the .fs-hide issue and the fix is currently in production as of 4:00 PM EST this afternoon. HTML elements containing the `.fs-hide` selector will no longer record keystrokes. Further, we changed the functionality of the recording code so that it will no longer record the actual keys alongside keystroke data. Thus, any future regression will not run the risk of subtly sending keystroke data to FullStory.
We have followed up with WPEngine and are working to follow up with any other customers who may have been affected by this particular issue. Thanks again for disclosing this issue to us. The level of detail in the disclosure was particularly helpful in bringing clarity to the diagnosis of the problem. Regarding the disclosure for the Show Password Chrome extension, we’re exploring mechanisms to mitigate this and other cases where sensitive fields are duplicated in the DOM.“
[5] SessionCam’s response: “Thank you for bringing this to our attention, we had been investigating this issue and are working on a fix. This fix will go live ASAP, in the meantime all affected sessions are being deleted. The pages in question are no longer live with SessionCam.”
[6] Through manual analysis, we found that Mixpanel sends user inputs on a field-by-field basis as soon as the field loses focus. This means that user data is sent to Mixpanel even when the user chooses not to submit a form.
[7] The most recent version of their heuristic, available on Mixpanel’s open source software repository, boils down to four main rules:
- Skip input elements with type “hidden” or “password”
- Skip input elements with a “name” attribute (or “id” attribute, if “name” does not exist) that matches the following regular expression after stripping non-alphanumeric characters:
- “sensitiveNameRegex = /^cc|cardnum|ccnum|creditcard|csc|cvc|cvv|exp|pass|seccode|securitycode|securitynum|socialsec|socsec|ssn/i;”
- Skip input element values that appear to be credit card numbers. This is done by checking if the value matches the following regular expression after stripping any spaces and dashes:
- “ccRegex = /^(?:(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(6(?:011|5[0-9]{2})[0-9]{12})|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0-9]{11})|((?:2131|1800|35[0-9]{3})[0-9]{11}))$/;”
- Skip input element values that appear to be Social Security Numbers. This is done by checking if the value matches the following regular expression:
- “ssnRegex = /(^\d{3}-?\d{2}-?\d{4}$)/;”
Input values that match these filters are excluded from collection by Mixpanel.
[8] Comparing Mixpanel’s password field detection heuristics to those of two popular password manager browser extensions (Lastpass and 1Password), we found that Mixpanel’s password detection heuristic is far less comprehensive compared to theirs. For instance, Lastpass and 1Password’s heuristics consider the translation of the word “password” in different languages such as “contraseña”, “passwort”, “mot de passe” or “密码”, when detecting password fields. This is true despite the incentives; if a password manager fails to detect a password field, it’s a usability problem; if Mixpanel fails to detect a password field, it’s a security problem.
[9] Wordcloud of 50 most common password field name/id attributes that don’t include the substring “pass”, and will thus not be excluded by Mixpanel. This data was collected from a crawl of ~50K sites (~300K page visits). “Undefined” stands for password fields without any name or id attributes. Many of these words are translations of the word “password”, such as: “senha”(Portugese), “sifre” (Turkish), or “kennwort” (German).
[10] Mixpanel’s dashboard for enabling Autotrack, before it was disabled for all new projects. Note that there are no warnings of possible sensitive data collection.
Credits: Wordcloud image is generated by https://worditout.com.
Is there any identifier in html or js code which could flag a website end USER if there is keylogging software or analytics scripts? Are there measures the user could take to confuse loggers, such as type into a notepad document between password keystrokes, or use the backspace key and do partial erasures?
Thanx for breaking prescient research! Please take care and take safety measures in real life, which I am sure you do, if you publicly take the lead in this sort of thing, because it is scary how it could threaten bad actors.
Sincerely,
Louise
“… we found that Mixpanel’s password detection heuristic is far less comprehensive compared to theirs…. This is true despite the incentives; if a password manager fails to detect a password field, it’s a usability problem; if Mixpanel fails to detect a password field, it’s a security problem.”
I’m not at all surprised that usability problems get more attention than this sort of hidden security problem. If a password manager misclassifies a password field, users will immediately notice, and some of them will file bug reports. If Mixpanel misclassifies a password field, probably nobody will notice for a long time.