
Archives for March 2020

Vulnerability reporting is dysfunctional

By Kevin Lee, Ben Kaiser, Jonathan Mayer, and Arvind Narayanan

In January, we released a study showing the ease of SIM swaps at five U.S. prepaid carriers.  These attacks—in which an adversary tricks telecoms into moving the victim’s phone number to a new SIM card under the attacker’s control—divert calls and SMS text messages away from the victim. This allows attackers to receive private information such as SMS-based authentication codes, which are often used in multi-factor login and password recovery procedures. 

We also uncovered 17 websites that use SMS-based multi-factor authentication (MFA) and SMS-based password recovery simultaneously, leaving accounts open to takeover from a SIM swap alone; an attacker can simply reset a victim’s account password and answer the security challenge when logging in. We responsibly disclosed the vulnerabilities to those websites in early January, urging them to make changes to disallow this configuration. Throughout the process, we encountered two wider issues: (1) lack of security reporting mechanisms, and (2) a general misunderstanding of authentication policies. As a result, 9 of these 17 websites, listed below, remain vulnerable by default.
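To make the logical inconsistency concrete, the sketch below (Python, with hypothetical type and function names; it is not taken from our study or from any site's actual configuration) captures the condition at issue: an account is open to takeover via a SIM swap alone whenever SMS can both reset the password and satisfy the second factor.

  from dataclasses import dataclass

  @dataclass
  class AuthPolicy:
      password_recovery_methods: set  # e.g. {"sms", "email"}
      mfa_methods: set                # e.g. {"sms", "totp", "security_key"}

  def vulnerable_to_sim_swap_alone(policy: AuthPolicy) -> bool:
      """True if an attacker controlling the victim's phone number can both
      reset the password (SMS recovery) and pass the login challenge (SMS MFA)."""
      return ("sms" in policy.password_recovery_methods
              and "sms" in policy.mfa_methods)

  # The configuration we flagged: SMS for both recovery and MFA, by default.
  print(vulnerable_to_sim_swap_alone(
      AuthPolicy(password_recovery_methods={"sms"}, mfa_methods={"sms"})))  # True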

Disclosure Process. On each website, we first looked for email addresses dedicated to vulnerability reporting; if none existed, we looked for the companies on bug bounty platforms such as HackerOne. If we were unable to reach a company through a dedicated security email or through bug bounty programs, as a last resort, we reached out through customer support channels. Sixty days after our reports, we re-tested the configurations at the companies, except for those that reported that they had fixed the vulnerabilities.

Outcomes. Three companies—Adobe, Snapchat, and eBay—acknowledged and promptly fixed the vulnerabilities we reported. In one additional case, the vulnerability was fixed, but only after we exhausted the three contact options and reached out to company personnel via a direct message on Twitter. In three cases—Blizzard, Microsoft, and Taxact—our vulnerability report did not produce the intended effect (Microsoft and Taxact did not understand the issue, Blizzard provided a generic acknowledgment email), but in our 60-day re-test, we found that the vulnerabilities had been fixed (without the companies notifying us). As such, we do not know whether the fixes were implemented in light of our research.

Among the responses we received, there were several failure modes, which were not mutually exclusive. 

  • In five cases, personnel did not understand our vulnerability report, despite our attempts to make it as clear as possible (see Appendix B of our paper). Three of them—Microsoft, PayPal, and Yahoo—demonstrated knowledge of SIM swap attacks, but did not realize that their SMS authentication policies were leaving accounts vulnerable. PayPal, for instance, closed our report as out-of-scope, claiming that “the vulnerability is not in Paypal, as you mentioned this is an issue with the carriers and they need to fix it on their side.” While phone number hijackings are the result of poor customer authentication procedures at the carriers, account hijackings resulting from SMS passcode interception are the result of poor authentication policies at websites. The remaining two websites—Taxact and Gaijin Entertainment—misinterpreted our disclosure as a feature request and as feedback, respectively.
  • Three of the four reports we submitted to third-party bug bounty programs were disregarded due to the absence of a bug (our findings are not software errors, but rather logically inconsistent customer authentication policies). Reports are screened by employees of the program, who are independent of the website, and passed on to the website’s security team if determined to be in scope. These third-party platforms appear to be overly strict with their triage criteria, preventing qualified researchers from communicating with the companies. This issue is not unique to our study, either: a few weeks ago, security researchers also reported difficulties submitting vulnerability reports to PayPal, which uses HackerOne as its sole security reporting mechanism. HackerOne employs mechanisms that restrict users from submitting future reports after they have had too many reports closed, which could disincentivize users from reporting legitimate vulnerabilities.
  • In five cases, we received no response. 
  • All four attempts to report security vulnerabilities through customer support channels were fruitless: either we received no response or personnel did not understand the issue.   

We have listed all 17 responses in the table below. Unfortunately, nine of these websites use SMS-based MFA and SMS-based password recovery by default and remain vulnerable as of this writing. Among them are the payment services PayPal and Venmo. The vulnerable websites cumulatively have billions of users.

Recommendations

We recommend that companies make the following changes to their vulnerability response:

  1. Companies need to realize that policy-related vulnerabilities are very real, and should use threat modeling to detect them. There seems to be a general lack of awareness of vulnerabilities arising from weak authentication policies.
  2. Companies should provide direct contact methods for security reporting procedures. A bug bounty program is not a substitute for a robust security reporting mechanism, yet some companies are using it as such. Furthermore, customer support channels—whose personnel are unlikely to be trained to respond to security vulnerability disclosures—add a level of indirection and can lead to vulnerability reports being forwarded to inappropriate teams.  

Our paper, along with our dataset, is located at issms2fasecure.com.

Thanks to Malte Möser for providing comments on a draft.

Building a Bridge with Concrete… Examples

Thanks to Annette Zimmermann and Arvind Narayanan for their helpful feedback on this post.

Algorithmic bias is currently generating a lot of lively public and scholarly debate, especially amongst computer scientists and philosophers. But do these two groups really speak the same language—and if not, how can they start to do so?

I noticed at least two different ways of thinking about algorithmic bias during a recent research workshop on the ethics of algorithmic decision-making at Princeton University’s Center for Human Values, organized by political philosopher Dr. Annette Zimmermann. Philosophers are thinking about algorithmic bias in terms of things like the inherent value of explanation, the fairness and accountability rights afforded to humans, and whether groups that have been systematically affected by unfair systems should bear the burden of integration when transitioning to a uniform system. Computer scientists, by contrast, are thinking about algorithmic bias in terms of things like running a gradient backwards to visualize a heat map, projecting features into various subspaces devoid of protected attributes, and tuning hyperparameters to better satisfy a new loss function. Of course these are vast generalizations about the two fields, and there are plenty of researchers doing excellent work at the intersection, but it seems that, for the most part, while philosophers are still debating which sets of ethical axioms ought to underpin algorithmic decision-making systems, computer scientists are already deploying these systems into the real world.

In formulating loss functions, consequentialists might prioritize maximizing accurate outcomes for the largest possible number of people, even if that comes at the cost of fair treatment, whereas deontologists might prioritize treating everyone fairly, even if that comes at the cost of optimality. But there isn’t a definitive “most moral” answer, and if something like equalizing false positive rates were the key to fairness, we would not be seeing the alarming headlines about algorithmic bias that we see today.

Inundated with various conflicting definitions of fairness, scientists often optimize for the metrics they believe to be best and proceed onwards. For example, one might reasonably think that the way to ensure fairness of an algorithm between different racial groups is to enforce predictive parity (equal likelihood of accurate positive predictions), or to equalize false error rates, or simply to treat similar individuals similarly. However, it is mathematically impossible to simultaneously satisfy seemingly reasonable fairness criteria like these in most real-world settings. It is unclear how to choose amongst the criteria, and even more unclear how one would go about translating complex ideas that demand consideration, such as systematic oppression, into a world of optimizers and gradients.
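A small numerical illustration of why these criteria collide when base rates differ (the confusion matrices below are hypothetical, not drawn from any real system): if a classifier has identical false positive and false negative rates for two groups with different base rates, predictive parity must fail.

  # Hypothetical confusion matrices for two groups with different base rates
  # (50% vs. 20% positive). Error rates are equal, yet a positive prediction
  # is correct 80% of the time for one group and only 50% for the other.

  def rates(tp, fp, fn, tn):
      return {
          "FPR": fp / (fp + tn),  # false positive rate
          "FNR": fn / (fn + tp),  # false negative rate
          "PPV": tp / (tp + fp),  # precision, the quantity behind predictive parity
      }

  group_a = rates(tp=40, fp=10, fn=10, tn=40)  # base rate 0.5
  group_b = rates(tp=16, fp=16, fn=4,  tn=64)  # base rate 0.2

  print(group_a)  # {'FPR': 0.2, 'FNR': 0.2, 'PPV': 0.8}
  print(group_b)  # {'FPR': 0.2, 'FNR': 0.2, 'PPV': 0.5}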

Since concrete mappings between a mathematical loss function and moral concepts are likely impossible to dictate, and philosophers are unlikely to settle on an ultimate theory of fairness, perhaps for now we can adopt a strategy that is, at least, not impossible to implement: a purposefully created, context- and application-specific validation/test set. The motivation is that even if philosophers and ethicists cannot decisively articulate a set of general, static fairness desiderata, perhaps they can make more domain-specific, dynamic judgements: for instance, whether a system should grant person A, with a given set of attributes and features, a loan or not, and likewise for persons B, C, and so on. Of course there will not be unanimous agreement, but there may at least be a general consensus that one outcome is preferable to the other. One could then create a whole set of such examples. Concepts like the idea that similar people should be treated similarly in a given decision scenario—the ‘like cases maxim’ in legal philosophy—could be encoded into this test set by requiring that people who differ only in a protected attribute receive the same result, and even concepts like equal accuracy rates across protected groups could be encoded by composing the test set of equal numbers of people from each group rather than in proportion to real-world majority/minority representation. However, the test set is not a construct-valid way to enforce these fairness constraints, and it should not be, because the reason such a test set would exist is that the right fairness criteria are not actually known; otherwise they would simply be formulated explicitly in the loss function.
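As a rough illustration of what such a curated test set might look like (everything below is hypothetical: the cases, the labels, and the names), one could pair a small table of expert-labeled loan decisions with checks for both accuracy against the expert labels and consistency on ‘like cases’ pairs that differ only in the protected attribute.

  # Each case carries the features, the protected group, the expert-preferred
  # decision, and an id linking counterfactual pairs that differ only in group.
  # Groups appear in equal numbers rather than in real-world proportions.
  test_cases = [
      {"features": {"income": 55_000, "debt": 4_000}, "group": "A", "label": 1, "pair": 0},
      {"features": {"income": 55_000, "debt": 4_000}, "group": "B", "label": 1, "pair": 0},
      {"features": {"income": 18_000, "debt": 9_000}, "group": "A", "label": 0, "pair": 1},
      {"features": {"income": 18_000, "debt": 9_000}, "group": "B", "label": 0, "pair": 1},
  ]

  def evaluate(predict, cases):
      """Accuracy against expert labels, plus consistency on counterfactual pairs."""
      preds = [predict(c["features"], c["group"]) for c in cases]
      accuracy = sum(p == c["label"] for p, c in zip(preds, cases)) / len(cases)
      pairs = {}
      for p, c in zip(preds, cases):
          pairs.setdefault(c["pair"], set()).add(p)
      like_cases = sum(len(v) == 1 for v in pairs.values()) / len(pairs)
      return {"accuracy": accuracy, "like_cases_consistency": like_cases}

  # A trivial rule stands in for a trained model here.
  print(evaluate(lambda features, group: int(features["income"] > 30_000), test_cases))
  # {'accuracy': 1.0, 'like_cases_consistency': 1.0}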

At this juncture, ethicists and computer scientists could usefully engage in complementary work: ethicists could identify difficult edge cases that challenge what we think about moral questions and incorporate them into the test set, and computer scientists could work on optimizing accuracy rates on a given validation set. There are a few crucial differences, however, from similar collaborative approaches in other domains, such as when doctors are called on to provide expert labels on medical data so models can be trained to detect things like eye diseases. Here, the distribution of the test set, in addition to just the labels, is specifically decided upon by domain experts. Further, this collaboration would last beyond just the labeling of the data. Failure cases should be critically investigated earlier in the machine learning pipeline in an iterative and reflective way to ensure things like overfitting are not happening. Whether performing well on the hidden test set requires learning fairer representations in the feature space or thresholding different groups differently, scientists will build context-specific models that encompass certain moral values defined by ethicists, who ground the test set in examples of realizations of those values.
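As a concrete (and again entirely hypothetical) example of one such knob, the sketch below picks a separate decision threshold for each group so that the model's decisions best match the expert-curated validation set; a real pipeline would use a trained model's scores rather than the made-up ones here.

  import itertools

  # (model score, protected group, expert-preferred decision)
  val_set = [
      (0.81, "A", 1), (0.45, "A", 0), (0.66, "A", 1),
      (0.48, "B", 1), (0.30, "B", 0), (0.44, "B", 1),
  ]

  def accuracy(thresholds):
      return sum(int(score >= thresholds[group]) == label
                 for score, group, label in val_set) / len(val_set)

  candidates = [0.4, 0.5, 0.6]
  best = max(itertools.product(candidates, repeat=2),
             key=lambda t: accuracy({"A": t[0], "B": t[1]}))
  print({"A": best[0], "B": best[1]}, accuracy({"A": best[0], "B": best[1]}))
  # {'A': 0.5, 'B': 0.4} 1.0  -- the groups end up with different thresholds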

But does this proposal mean adopting a potentially dangerous, ethically objectionable “the ends justify the means” logic? Not necessarily. With algorithm developers working in conjunction with ethicists to ensure the means are not unsavory, this could be a way to bridge the divide between abstract notions of fairness and concrete ways of implementing systems.

This may not be an ideal long-term way to deal with the problem of algorithmic fairness: it is hard to generalize between applications, and in situations where creating an expert-curated test set is too expensive or not scalable, it may not be preferable to satisfying one of the many mathematical definitions of fairness. But it could be one possible way to incorporate philosophical notions of fairness into the development of algorithms. Because technologists are not going to hold off on deploying machine learning systems until they reach a state of fairness everyone agrees on, finding a way to incorporate philosophical views about central moral values like fairness and justice into algorithmic systems right now is an urgent problem.

Supervised machine learning has traditionally been focused on predicting based on historical and existing data, but maybe we can structure our data in a way that is a model not of the society we actually live in, but of the one we hope to live in. Translating complex philosophical values into representative examples is not an easy task, but it is one that ethicists have been doing a version of for centuries in order to investigate moral concepts—and perhaps it can also be the way to convey some sense of our morals to machines.

The CheapBit of Fitness Tracker Apps

Yan Shvartzshnaider (@ynotez) and Madelyn Sanfilippo (@MrsMRS_PhD)

Fitness trackers are “[devices] that you can wear that records your daily physical activity, as well as other information about your health, such as your heart rate” [Oxford Dictionary]. The increasing popularity of wearable devices offered by Apple, Google, and Nike has inadvertently led cheaper versions to flood the market, along with the emergence of alternative non-tech but fashionable brand devices. The cheaper versions ostensibly offer similar functionality for one-tenth of the price, which makes them very appealing to consumers. On Amazon, many of these devices receive overall positive feedback and an average of 4-5 star reviews. Some of them are even labeled as “Amazon’s Choice” or “Best Seller” (e.g., Figure 1), which reinforces their popularity.

In this blog post, we examine privacy issues around these cheaper alternative devices, specifically focusing on the ambiguities around the third-party apps they use. We report our preliminary findings on a few apps that seem to dominate this market. Note that fashion brands also employ third-party apps, such as WearOS by Google, but those apps tend to be more recognizable and subject to greater consumer protection scrutiny, which sets them apart from the lesser-known apps we examine here.

Figure 1: The LETSCOM fitness tracker uses VeryFitPro, has over 13K reviews, is labeled as Amazon’s Choice, and is marketed to children.

Do consumers in fact pay dearly for the cheaper versions of these devices?

Privacy issues are not unique to cheaper brands. Any “smart device” that has the ability to collect, process, and share information about you and your surrounding environment can potentially violate your privacy. Security issues also play an important role. Services like Mozilla’s Privacy Not Included and Consumer Reports help consumers navigate the treacherous landscape. However, even upholding the Minimum Security Standards doesn’t prevent privacy violations due to inappropriate use of information, as the Strava and Polar incidents showed.

Given that most of the analysis is typically done by an app paired with a fitness tracker, we decided to examine the “CheapBit” products sold on Amazon with large numbers of reviews and answered questions to see which apps they pair with. We found that the less-expensive brands are dominated by a few third-party apps, primarily developed by small teams (or individuals), that do not provide any real description of how data are used and shared.

But what do we know about these apps?   

The VeryFitPro app seems to be the choice of many of the users buying the cheaper fitness tracker alternatives. The app has 5,000,000+ installs according to Google Play, where the listing provides a developer email and a website consisting of just a QR code to download the app. The app has access to an extensive list of permissions: SMS, Camera, Location, Wifi information, Device ID & Call information, Device & app history, Identity, Phone, Storage, Contacts, and Photo/Media/Files! The brief privacy policy appears to have been translated into English using an automatic translation tool, such as Google Translate.

Surprisingly, what appears to be the same app on the Apple App Store points to a different privacy policy altogether, hosted on a Facebook page! The app provides a different contact email, and the policy is even shorter than the one on the Play Store. In a three-paragraph policy, we are reassured that “some of your fitness information and sports data will be stored in the app, but your daily activities data will never be shared without permission,” followed by the traditional “We reserve the right, in our decision to change, modify, add or remove portions of this policy at any time. Please check this page periodically for any changes. Publish any changes to these terms if you continue to use our App future will mean that you have accepted these adjustments. [sic]” No additional information is provided.

While we found VeryFitPro to be common among cheap fitness trackers, especially highly rated ones, it is not unique. Other apps, such as JYouPro, which has access to the same range of permissions, offer privacy policies that are just two paragraphs long and that also reassure users that “[they] don’t store personal information on our servers unless required for the on-going operation of one of our services.” The Apple version offers a slightly longer version of the policy. In it, we find that “When you synchronise the Band data, e.g. to JYouPro Cloud Service, we may collect data relating to your activities and functionalities of JYouPro, such as those obtained from our sensors and features on JYouPro, your sleeping patterns, movement data, heart rate data, and smart alarm related information.” Given that JYouPro is used by a large number of devices, their “Cloud Service” seems to be sitting on a very lucrative data set. The policy warns us: “Please note also that for the above, JYouPro may use overseas facilities operated and controlled by JYouPro to process or back up your personal data. Currently, JYouPro has data centres in Beijing and Singapore.”

These are however not the worst offenders. Developers behind apps like MorePro and Wearfit didn’t even bother to translate their privacy policies from Chinese!

Users’ privacy concerns

These third-party apps are incredibly popular and pervade the low-end wearable market: VeryFitPro (5,000,000+ installs), JYouPro (500,000+ installs), WearFit (1,000,000+ installs). With little oversight, they are able to collect and process lots of potentially sensitive information from a large number of users, thanks to access to contacts, camera, location, and other sensor data. Most of them are developed by small teams or little-known Chinese firms, which dominate the mHealth market.

A small portion of users on Amazon express privacy concerns. For one of the top-selling products, the LETSCOM Fitness Tracker (which uses VeryFitPro, has 4/5 stars, 14,420 ratings, and 1000+ answered questions, and is marketed towards “Kids Women and Men”), we were able to find only a few questions about privacy. Notably, none of the questions was upvoted, so we suspect they remain unseen by the typical buyer. For example, one user asked, “What is the privacy policy for the app? How secure is the personal information? [sic]”, to which another user (not the manufacturer) replied, “A: This connects to your phone by bluetooth. That being said, I guess you could connect it only when you are in a secure location but then you wouldn’t have the message or phone notifications.” A similar concern was raised by another user: “What is this company’s policy on data privacy? Will they share or sell the data to third parties?”

Another popular product, the Lintelek Fitness Tracker with Heart Rate Monitor, also uses VeryFitPro and has 4/5 stars and 4,050 ratings. Out of 1000+ answered questions, only a couple mentioned privacy. The first user gave the product 1 star with the ominous warning, “Be sure to read the privacy agreement before accepting this download”. Interestingly, the second user rated the product with 5 stars and gave a very positive review that ends with “Only CON: read the privacy statement if you are going to use the text/call feature. They can use your information. I never turned it on – I always have my phone anyway.”

The fact that buyers of these devices do not investigate the privacy issues is troubling. Previous research has shown that consumers tend to assume that if a company has a privacy policy, it protects their privacy. It seems clear that consumers need help from the platforms: Amazon, Google, and Apple ought to better inform consumers about potential privacy violations. In addition to the consumer protection obligations of these platforms, regulators ought to apply increased scrutiny. While such software is not a conventional medical device, and hence not covered by HIPAA, some medical apps do fall under FDA authority, including apps that correspond with wearables. Furthermore, as Figure 1 shows, these devices are marketed to children, so the apps should be subject to enforcement of children’s privacy standards like COPPA.

In conclusion, the lesser-known fitness tracking brands offer a cheaper alternative to high-end market products. However, as previous research showed, consumers of these devices are potentially paying a high privacy price, and they are left to fend for themselves. In many cases, the cheaper devices belong to firms outside of US jurisdiction, making US and European regulations difficult to enforce. Furthermore, global platforms like Amazon, Google, Apple, and others seem to turn a blind eye to privacy issues and help to promote these devices and apps. They offer unhelpful and possibly misleading signals to consumers, such as Amazon’s “Best Seller” and “Amazon’s Choice” labels and the Google Play Store’s download counts and star ratings, which exacerbate an already global and complex issue. Lasting protection of users’ privacy, one that incorporates established societal norms and expectations, requires proactive action on the part of all parties.


We would like to thank Helen Nissenbaum for offering her thoughts on the topic.