by Steven Englehardt [0], Gunes Acar, and Arvind Narayanan
So far in the No boundaries series, we’ve uncovered how web trackers exfiltrate identifying information from web pages, browser password managers, and form inputs.
Today we report yet another type of surreptitious data collection by third-party scripts that we discovered: the exfiltration of personal identifiers from websites through “login with Facebook” and other such social login APIs. Specifically, we found two types of vulnerabilities [1]:
- seven third parties abuse websites’ access to Facebook user data
- one third party uses its own Facebook “application” to track users around the web.
Vulnerability 1: Third parties piggyback on Facebook access granted to websites
Facebook Login and other social login systems simplify the account creation process for users by decreasing the number of passwords to remember. But social login brings risks: Cambridge Analytica was found misusing user data collected by a Facebook quiz app which used the Login with Facebook feature. We’ve uncovered an additional risk: when a user grants a website access to their social media profile, they are not only trusting that website, but also third parties embedded on that site.
We found seven scripts collecting Facebook user data using the first party’s Facebook access [4]. These scripts are embedded on a total of 434 of the top 1 million sites (UPDATE: see clarification below). We detail how we discovered these scripts in Appendix 1 below. Most of them grab the user ID, and two grab additional profile information such as email and username. We are unable to determine whether first parties are aware of this particular data access [5].
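To illustrate why embedded scripts get this access, the following is a minimal sketch of what any script running in the page could do once the first party has initialized the Facebook JavaScript SDK. It is not the code of any of the scripts we found: `readFacebookIdentity` and `report` are hypothetical names, and the SDK object is passed in as `fb` (it would be `window.FB` in a browser) so the sketch can be run against a stub.

```javascript
// Hypothetical sketch, not taken from any observed script.
// `fb` is the Facebook SDK object (window.FB in a browser);
// `report` is a hypothetical callback that receives whatever was read.
function readFacebookIdentity(fb, report) {
  if (!fb || typeof fb.getLoginStatus !== "function") {
    return; // the SDK is not present on this page
  }
  fb.getLoginStatus(function (response) {
    if (!response || response.status !== "connected") {
      return; // the user has not logged in to this site with Facebook
    }
    // App-scoped user ID issued to the first party's application.
    const userID = response.authResponse.userID;
    // The query is made through the first party's SDK instance, so it
    // carries the access the user granted to the first party.
    fb.api("/me", { fields: "name,email" }, function (profile) {
      report({ userID: userID, name: profile.name, email: profile.email });
    });
  });
}
```

Nothing in the sketch requires the first party's App ID or any cooperation from the first party, which is the loose integration we describe in [4].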
The user ID collected through the Facebook API is specific to the website (or the “application” in Facebook’s terminology), which would limit the potential for cross-site tracking. But these app-scoped user IDs can be used to retrieve the global Facebook ID, user’s profile photo, and other public profile information, which can be used to identify and track users across websites and devices [6].
Company | Script Address | Facebook Data Collected |
OnAudience* | http://api.behavioralengine.com/scripts/be-init.js | User ID (hashed), Email (hashed), Gender |
Augur | https://cdn.augur.io/augur.min.js | Email, Username |
Lytics** | https://d3c…psk.cloudfront.net/opentag-54778-547608.js?_=[*] | User ID |
ntvk1.ru | https://p1.ntvk1.ru/nv.js | User ID |
ProPS^ | http://st-a.props.id/ai.js | User ID (has code to collect more) |
Tealium^ | http://tags.tiqcdn.com/utag/ipc/[*]/prod/utag.js | User ID |
Forter^ | https://cdn4.forter.com/script.js?sn=[*] | User ID |
* OnAudience stopped collecting this information after we released the results of a previous study in the No Boundaries series, which showed them abusing browser autofill to collect user email addresses.
**(Added 2018-04-22; 5:55pm): The script loaded by Opentag tag manager includes a code snippet which accesses the Facebook API and sends the user ID to an endpoint of the form https://c.lytics.io/c/1299?fbstatus=[...]&fbuid=[...]&[...]
This snippet appears to be a modified version of an example code snippet found on the lytics.github.io website with the description “Capturing Facebook Events”. The example code appears to provide instructions for first parties to collect Facebook user ID and login status.
^ Although we observe these scripts query the Facebook API and save the user’s Facebook ID, we could not verify that it is sent to their server due to obfuscation of their code and some limitations of our measurement methods.
While we can’t say how these trackers use the information they collect, we can examine their marketing material to understand how it may be used. OnAudience, Tealium AudienceStream, Lytics, and ProPS all offer some form of “customer data platform”, which collects data to help publishers better monetize their users. Forter offers “identity-based fraud prevention” for e-commerce sites. Augur offers cross-device tracking and consumer recognition services. We were unable to determine the company which owns the ntvk1.ru domain.
Vulnerability 2: Tracking users around the web with the Facebook Login service
Some third parties use the Facebook Login feature to authenticate users across many websites: Disqus, a commenting widget, is a popular example. However, hidden third-party trackers can also use Facebook Login to deanonymize users for targeted advertising. This is a privacy violation, as it is unexpected and users are unaware of it. But how can a hidden tracker get the user to Login with Facebook? When the same tracker is also a first party that users visit directly. This is exactly what we found Bandsintown doing. Worse, they did so in a way that allowed any malicious site to embed Bandsintown’s iframe to identify its users.
We discovered that the iframe injected by Bandsintown would pass the user’s information to the embedding script indiscriminately. Thus, any malicious site could have used their iframe to identify visitors. We informed Bandsintown of this vulnerability and they confirmed that it is now fixed.
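We do not know how Bandsintown implemented its fix. As a general illustration of the missing check, the sketch below assumes the iframe talks to the embedding page with `postMessage` and only answers requests from an allow-list of origins. The function name, the message format, and the origin in `TRUSTED_ORIGINS` are all hypothetical.

```javascript
// Hypothetical allow-list of sites permitted to receive user information.
const TRUSTED_ORIGINS = new Set(["https://www.example-partner.com"]);

// Handles a "message" event received inside the iframe.
// Returns true if user information was sent back, false if the request was ignored.
function handleIdentityRequest(event, userInfo) {
  // event.origin is filled in by the browser, so the sender cannot spoof it.
  if (!TRUSTED_ORIGINS.has(event.origin)) {
    return false; // unknown embedder: do not reveal anything
  }
  // Naming the target origin ensures the reply is only delivered to that origin.
  event.source.postMessage({ type: "identity", user: userInfo }, event.origin);
  return true;
}

// In a browser the handler would be registered inside the iframe, for example:
// window.addEventListener("message", (e) => handleIdentityRequest(e, currentUser));
```

An iframe that replies to every sender, or that replies with a target origin of `"*"`, would expose the user's information to any site that embeds it, which is the behavior described above.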
Conclusion
This unintended exposure of Facebook data to third parties is not due to a bug in Facebook’s Login feature. Rather, it is due to the lack of security boundaries between the first-party and third-party scripts in today’s web. Still, there are steps Facebook and other social login providers can take to prevent abuse: API use can be audited to review how, where, and which parties are accessing social login data. Facebook could also disallow the lookup of profile picture and global Facebook IDs by app-scoped user IDs. It might also be the right time to make Anonymous Login with Facebook available following its announcement four years ago.
Clarification (2018-04-19 2:25am): In a previous version of the post we listed three sites (fiverr.com, bhphotovideo.com, and mongodb.com) which embed scripts that match the URL patterns given above. Third-party scripts may contain different contents when loaded by different sites, even though the scripts are served from same or similar URLs. We confirmed that the Forter scripts embedded on fiverr.com and bhphotovideo.com do NOT include functionality to access Facebook data. On mongodb.com we only observed the presence of an Augur script. We have published an updated list of sites, marking the ones where we have confirmed the presence of functionality to access Facebook data.
Correction (2018-04-22; 05:55pm): In a previous version of this post, we listed a Lytics script (https://c.lytics.io/static/io.min.js) as the cause of Facebook API access. Although this script is used to send the Facebook user ID to Lytics (c.lytics.io), the code which accesses the Facebook API was served within an OpenTag script as described above. The code snippet in the OpenTag script responsible for accessing Facebook user data is likely configured by the first party, so we have removed our previous opinion that first parties are likely unaware of the data access.
Several companies stated that they do not use Facebook data for third-party tracking purposes.
[0] Steven Englehardt is currently working at Mozilla as a Privacy Engineer. He coauthored this post in his Princeton capacity, and this post doesn’t necessarily represent Mozilla’s views.
[1] We use the term “vulnerability” to refer to weaknesses arising from insecure design practices on today’s web, rather than its commonly understood sense in computer security of weaknesses arising due to software bugs.
[2] In this post we focus on websites which use Facebook Login, but the vulnerabilities we describe are likely to exist for most social login providers and on mobile devices. Indeed, we found scripts that appear to grab user identifiers from the Google Plus API and from the Russian social media site VK, but we limited our investigation to Facebook Login as it’s the most widely used social SDK on the web.
[3] When the user completes the Facebook login procedure on a website that uses Facebook’s Javascript SDK, the SDK stores an authentication token in the page. When the user navigates to a new page, the SDK automatically reestablishes the authentication token using the browser’s Facebook cookies. All third-party queries to the SDK automatically use this token.
[4] In order to better understand the level of integration a third party has with the first party, we categorize scripts based on their use of the first party’s Application ID (or AppId), which is provided to Facebook during the login initialization phase to identify the site. Inclusion of a site’s application ID and initialization code in the third-party library suggests a tighter integration—the first party was likely required to configure the third-party script to access the Facebook SDK on their behalf. While application IDs aren’t meant to be secrets, we take the lack of an App ID to imply loose integration—the first party may not be aware of the access. In fact, all of the scripts in this category take the same actions when embedded on a simple test page with no prior business relationship.
[5] The following could indicate the first party’s awareness of the Facebook data access: 1) the third party initiates the Facebook login process instead of passively waiting for the login to happen; 2) the third party includes the unique App ID of the website it is embedded on. The seven scripts listed above neither initiate the login process nor contain the App ID of the websites. Still, it is very hard to be certain about the exact relationship between the first parties and third parties.
[6] The application-scoped IDs can be resolved to real user profile information by querying Facebook’s Graph API, or used to retrieve the user’s profile photo (which does not even require authentication!). When security researchers showed that it is possible to map app-scoped IDs to Facebook IDs and download profile pictures, Facebook responded as follows: “This is intentional behavior in our product. We do not consider it a security vulnerability, but we do have controls in place to monitor and mitigate abuse.” A Facebook interface with similar controls was reportedly used to harvest 2 billion Facebook users’ public profile data. Note that although the endpoint found by the researchers does not work anymore, the following endpoint still redirects to users’ profile page: https://www.facebook.com/[app_scoped_ID].
APPENDIX:
Appendix 1 — Measurement Methods
To study the abuse of social login APIs we extended OpenWPM to simulate that the user has authenticated and given full permissions to the Facebook Login SDK on all sites. We added instrumentation to monitor the use of the Facebook SDK interface (`window.FB`). We did not otherwise inject the user’s identity into the page, so any exfiltrated personal data must have been queried from our spoofed API.
As in our previous measurements, we crawled 50,000 sites from the Alexa top 1 million in June 2017. We used the following sampling strategy: visit all of the top 15,000 sites, randomly sample 15,000 sites from the Alexa rank range [15,000, 100,000), and randomly sample 20,000 sites from the range [100,000, 1,000,000). This combination allowed us to observe the attacks on both high and low traffic sites. On each of these 50,000 sites we visited 6 pages: the front page and a set of 5 other pages randomly sampled from the internal links on the front page.
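The sampling strategy can be sketched as follows. This is an illustration rather than the crawler's actual code: it assumes the ranges are 0-based positions in a rank-ordered site list and that samples are drawn uniformly without replacement, and the function names are hypothetical.

```javascript
// Draw k distinct integers uniformly at random from [lo, hi)
// using a partial Fisher-Yates shuffle.
function sampleRange(lo, hi, k, rng = Math.random) {
  const n = hi - lo;
  if (k > n) throw new RangeError("sample is larger than the range");
  const pool = Array.from({ length: n }, (_, i) => lo + i);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rng() * (n - i)); // pick from the unshuffled tail
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, k);
}

// Positions (in the rank-ordered list) of the 50,000 sites to crawl.
function buildCrawlIndices(rng = Math.random) {
  const top = Array.from({ length: 15000 }, (_, i) => i); // every top-15,000 site
  const middle = sampleRange(15000, 100000, 15000, rng);  // 15,000 from [15,000, 100,000)
  const tail = sampleRange(100000, 1000000, 20000, rng);  // 20,000 from [100,000, 1,000,000)
  return top.concat(middle, tail);
}
```

Sampling the three strata separately keeps every high-traffic site in the crawl while still covering the long tail.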
To spoof that a user is logged in, we create our own `window.FB` object and replicate the interface of version 2.8 of the Facebook SDK. The spoofed API has the following properties:
- For method calls that normally return personal information we spoof the return values as if the user is logged in and call any necessary callback function arguments.
- These include `FB.api()`, `FB.init()`, `FB.getLoginStatus()`, `FB.Event.subscribe()` for the events `auth.login`, `auth.authResponseChange`, and `auth.statusChange`, and `FB.getAuthResponse()`.
- For the Graph API (`FB.api`), we support most of the profile data fields supported by the real Facebook SDK. We parse the requested fields and return a data object in the same format the real graph API would return.
- For method calls that don’t return personal information we simply call a no-op function and ignore any callback arguments. This helps minimize breakage if a site calls a method we don’t fully replicate.
- We fire `window.fbAsyncInit` once the document has finished loading. This function is normally called by the Facebook SDK.
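The properties above can be sketched in a few dozen lines. This is a simplified illustration, not the object used in our crawl: the profile values, the access token, the helper names, and the choice to return `id` and `name` when no fields are requested are all assumptions made for the example.

```javascript
// Placeholder identity for the simulated user (values are made up).
const SPOOFED_PROFILE = { id: "1000000000000001", name: "Jane Doe", email: "jane.doe@example.com" };
const SPOOFED_AUTH = {
  status: "connected",
  authResponse: { userID: SPOOFED_PROFILE.id, accessToken: "SPOOFED_TOKEN", expiresIn: 3600 }
};
const LOGIN_EVENTS = new Set(["auth.login", "auth.authResponseChange", "auth.statusChange"]);

function makeSpoofedFB() {
  const noop = function () {};
  return {
    init: noop, // accepts the site's options; nothing to set up in this sketch
    getLoginStatus(callback) {
      if (typeof callback === "function") callback(SPOOFED_AUTH);
    },
    getAuthResponse() {
      return SPOOFED_AUTH.authResponse;
    },
    // Supports FB.api(path, callback), FB.api(path, params, callback) and
    // FB.api(path, method, params, callback); only profile fields are answered.
    api(path, ...rest) {
      const callback = rest.find((a) => typeof a === "function");
      const params = rest.find((a) => a !== null && typeof a === "object") || {};
      const query = String(path).split("?")[1] || "";
      const fromQuery = new URLSearchParams(query).get("fields");
      const requested = String(params.fields || fromQuery || "id,name").split(",");
      const result = { id: SPOOFED_PROFILE.id }; // assumption: id is always returned
      for (const field of requested) {
        const key = field.trim();
        if (key in SPOOFED_PROFILE) result[key] = SPOOFED_PROFILE[key];
      }
      if (callback) callback(result);
    },
    Event: {
      subscribe(name, callback) {
        if (LOGIN_EVENTS.has(name) && typeof callback === "function") callback(SPOOFED_AUTH);
      },
      unsubscribe: noop
    },
    // Methods that return no personal information are no-ops to limit breakage.
    login: noop,
    logout: noop,
    ui: noop
  };
}

// Install the spoofed object on a window-like object and fire fbAsyncInit,
// as the real SDK would. In a browser this would run once the document has loaded.
function installSpoofedFB(win) {
  win.FB = makeSpoofedFB();
  if (typeof win.fbAsyncInit === "function") win.fbAsyncInit();
}
```

Because every value a script can read comes from `SPOOFED_PROFILE`, any of those values appearing in outgoing traffic can be attributed to a query against the spoofed API.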
The spoofed `window.FB` object is injected into every frame on every page load, regardless of the presence of a real Facebook SDK. We then monitor access to the API using OpenWPM’s Javascript call monitoring. All HTTP request and response data, including HTTP POST payloads, are examined to detect the exfiltration of any of the spoofed profile data (including that which has been hashed or encoded).
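The search for hashed or encoded values can be sketched as follows, using Node.js. The set of encodings and hash functions here (URL encoding, Base64, MD5, SHA-1, SHA-256) is an assumption chosen for illustration and is not the exact list used in our detection; `encodingsOf` and `findLeaks` are hypothetical names.

```javascript
const crypto = require("crypto");

// Encodings whose output is hexadecimal and may appear in either case.
const HEX_ENCODINGS = new Set(["md5", "sha1", "sha256"]);

// Compute the forms in which a spoofed value might appear in traffic.
function encodingsOf(value) {
  const hash = (alg) => crypto.createHash(alg).update(value, "utf8").digest("hex");
  return {
    plain: value,
    urlencoded: encodeURIComponent(value),
    base64: Buffer.from(value, "utf8").toString("base64"),
    md5: hash("md5"),
    sha1: hash("sha1"),
    sha256: hash("sha256")
  };
}

// Return every (value, encoding) pair found in a request URL or POST body.
function findLeaks(payload, spoofedValues) {
  const lowered = payload.toLowerCase();
  const hits = [];
  for (const value of spoofedValues) {
    for (const [encoding, needle] of Object.entries(encodingsOf(value))) {
      // Hex digests are matched case-insensitively; other forms exactly.
      const haystack = HEX_ENCODINGS.has(encoding) ? lowered : payload;
      if (haystack.includes(needle)) hits.push({ value, encoding });
    }
  }
  return hits;
}
```

A value without special characters, such as a numeric user ID, is identical in its plain and URL-encoded forms, so it would be reported under both encodings.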
For both calls to `window.FB` and HTTP data, we store the Javascript stack trace at the time of execution. We use this stack trace to understand which APIs scripts accessed and when they were sending data back. For some scripts our instrumentation only captured the API access, but not the exfiltration. In these cases, we manually debugged the scripts to determine whether the data was only used locally or if it was obfuscated before being transmitted. We explicitly note the cases where we could not make this determination.
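The idea of recording a stack trace on each API call can be sketched with a simple wrapper. OpenWPM's instrumentation is more involved than this; `instrumentMethod` and the shape of the log entries are hypothetical.

```javascript
// Replace obj[name] with a wrapper that records the call stack before
// delegating to the original method. `log` is an array of call records.
function instrumentMethod(obj, name, log) {
  const original = obj[name];
  obj[name] = function (...args) {
    // The stack of a freshly created Error shows which script made the call.
    log.push({ method: name, stack: new Error().stack });
    return original.apply(this, args);
  };
}
```

The script URLs in the recorded stack are what allow an API access to be attributed to a specific third-party script rather than to the page as a whole.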
Appendix 2 — Third parties which access the Facebook API on behalf of first parties
We also found a number of third-party scripts interacting with the Facebook API, which appear to be operating on behalf of the first party [4]. These companies offer a range of services, such as integrating multiple social login options, monitoring social media engagement, and aggregating customer data. As a specific example, BlueConic offers a Facebook Profile transfer service that copies information from the user’s Facebook profile to BlueConic’s data platform. Additional third-party services which access Facebook profile information on the first party’s behalf include: Zummy, Social Miner, Limespot (personalizer.io), Kissmetrics, Gigya, and Webtrends. (Update: Limespot informed us that they deactivated the relevant feature in November 2017.)
Image assets used in figures are from the Noun Project:
computer tower by Melvin, Female by SBTS, javascript file by Adnen Kadri, click by Aybige
As co-founder and CEO of LimeSpot I would like to point to a misleading statement and erroneous implications in this article.
For starters, some of the information on which this article is based is six months or more out of date and is incorrect. On being notified of this the authors have added an update, but, inaccurately and confusingly, LimeSpot is still included in a list of third-party services which are stated to “access Facebook profile information on the first party’s behalf.” We don’t.
LimeSpot infers patterns of consumer behaviour from large-scale data processing. We believe strongly in consumer privacy and reject the implication that we abuse consumer data. From its inception in 2013, LimeSpot has had a firm policy of keeping personal data private, for example by anonymizing it and processing it in real time. LimeSpot does not buy, sell or share personal data.
Finally, in a recent article (https://www.thestar.com/opinion/contributors/2018/03/27/who-owns-your-digital-footprints.html) I have argued explicitly for individual rights concerning digital footprints and privacy. LimeSpot supports giving consumers the power to grant or withhold approval from companies that want to use personal data. In our view, regulated data transparency would balance consumers’ right to privacy and companies’ ability to use personal data: it would enable online service providers to deliver a much better user experience, because consumers would not have to deal with intrusive content based on irrelevant signals or algorithms’ incorrect interpretations of online behaviour. We welcome informed debate on this subject.
The real issue here seems to be a complete lack of sandboxing. Seems like these ads/trackers could do anything on the page they are embedded in, like stealing session cookies and other private information, etc. The Facebook API would just be a common interface for stealing useful information from various sites.
Congrats to the WebTAP team for one more grass root approach to identify potential breaches of privacy, which as they suggest, is just a symptom of the lack of standards and privacy regulations. Also reassures the importance of website admin education (or “over” education?). These “civil society” actions, however, seem to be the only hope of accountability through denouncing, at least for now.
Additionally, there seems to be a certain degree of mystery, which is turning the work into a real investigative work. What would make website creators to include these scripts to the pages? The authors suggest two scenarios; however they quickly clear them out to be the instances they have found. Indeed, a little fuzziness in that regard.
In regards to the second vulnerability, the study also seems to reaffirm the fact that Facebook also tracks user’s visiting of websites that use FB as a third-party domain. I consider this even more scary, as it transcends the user understandings of his relationship and limits with Facebook.
Facebook is “tracking user’s visiting of websites that use FB as a third party domain” for what boils down to one simple reason – advertising revenue. Martin, you are correct, and I’m jumping in on your comment to add motive behind the disconnect between the user and how facebook uses the users info.
The more facebook understands about you, and believe me, there is not much that they don’t know, the better their advertising will work. Google was the first tech giant that truly made the online advertising model work with AdWords. However, with the recent shift towards mobile devices (for those who don’t know – mobile has overtaken desktop in searches as well as total time of use).
The more information facebook knows about you, the more successful a layman can be at running an advertising campaign and actually hitting the right target market and seeing ROAS (known as Return on Ad Spend). Facebook is really going above and beyond to help the tens of thousands of illegitimate ad firms popping up each day. Everyone has a cousin who is doing this, or knows someone who has a family member who is in the industry – seriously. And it’s scary to think that facebook is well on their way to possessing the psychodynamic info that is so personal and incredibly accurate – we may actually see facebook overtake Google in total monies spent on advertising.
It’s all sickening and the measures for safeguarding are pathetic – your next boss probably has hacked your facebook account, so be sure to clean it up sooner rather than later. I only speak from past experience, as well as my career in the internet advertising industry as a 10 year veteran.
Facebook clearly became abusive about gaining access to users and tracking them for gain. I am sure if you look closer Facebook is directly sharing what it collects to any third party willing to pay for it. Well what do people expect when they get something for nothing? Got to figure all those people who work on Facebook are not there as volunteers. Zuckerberg makes a good dollar as well as his other executives. I tolerated some of it until the recent events and Zuckerberg’s inability to apologize or take real action was enough to delete my Facebook account.
Looks like the company that owns the ntvk1.ru domain is Natimatica
As CMO of Tealium, I need to clarify some misleading statements that are included within this article.
Tealium acts as a JavaScript (tag) management and customer data management software for businesses. Tealium in no way, shape, or form uses Facebook data (or any customer data) in the way this article implies. Our software is used by companies to manage their own customers’ data, and we do not use that data for any other purpose, nor do we buy, share, or sell that data.
We are advocates of customer data privacy, strong data governance, and transparency. We are actively helping companies prepare for new, upcoming privacy regulations and welcome the opportunity to discuss this and help rectify the confusion contained within this article.
Thank you,
Adam
Nice work!
This is insanely complicated to utilize what is ultimately a form of entertainment. The Internet is the most powerful thing humanity has invented and I hope against hope we all eventually learn how to use it without compromising our personal security.