December 9, 2022

Archives for March 2020

Vulnerability reporting is dysfunctional

By Kevin Lee, Ben Kaiser, Jonathan Mayer, and Arvind Narayanan

In January, we released a study showing the ease of SIM swaps at five U.S. prepaid carriers.  These attacks—in which an adversary tricks telecoms into moving the victim’s phone number to a new SIM card under the attacker’s control—divert calls and SMS text messages away from the victim. This allows attackers to receive private information such as SMS-based authentication codes, which are often used in multi-factor login and password recovery procedures. 

We also uncovered 17 websites that use SMS-based multi-factor authentication (MFA) and SMS-based password recovery simultaneously, leaving accounts open to takeover from a SIM swap alone; an attacker can simply reset a victim’s account password and answer the security challenge when logging in. We responsibly disclosed the vulnerabilities to those websites in early January, urging them to make changes to disallow this configuration. Throughout the process, we encountered two wider issues: (1) lack of security reporting mechanisms, and (2) a general misunderstanding of authentication policies. As a result, 9 of these 17 websites, listed below, remain vulnerable by default.
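The configuration described above can be written down as a small policy check. This is only an illustrative sketch: the `AuthPolicy` type, its field names, and the method labels such as `"sms"` and `"totp"` are hypothetical and do not come from the study or from any website's actual settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthPolicy:
    """Hypothetical description of one website's login configuration."""
    password_recovery_methods: frozenset  # e.g. {"sms", "email"}
    second_factor_methods: frozenset      # e.g. {"sms", "totp"}


def sim_swap_alone_suffices(policy: AuthPolicy) -> bool:
    """True if controlling the phone number is enough to take over the account.

    The attacker first resets the password by SMS, then answers the
    SMS-based MFA challenge at login.
    """
    can_reset_password = "sms" in policy.password_recovery_methods
    can_pass_mfa = "sms" in policy.second_factor_methods
    return can_reset_password and can_pass_mfa


# Assumed example configurations, for illustration only.
vulnerable = AuthPolicy(frozenset({"sms", "email"}), frozenset({"sms"}))
safer = AuthPolicy(frozenset({"email"}), frozenset({"sms", "totp"}))
```

Under these assumptions, `sim_swap_alone_suffices(vulnerable)` is true and `sim_swap_alone_suffices(safer)` is false, because in the second case the SIM swap does not help with the password reset.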

Disclosure Process. On each website, we first looked for email addresses dedicated to vulnerability reporting; if none existed, we looked for the companies on bug bounty platforms such as HackerOne. If we were unable to reach a company through a dedicated security email or through bug bounty programs, as a last resort, we reached out through customer support channels. Sixty days after our reports, we re-tested the configurations at the companies, except for those that reported that they had fixed the vulnerabilities.

Outcomes. Three companies—Adobe, Snapchat, and eBay—acknowledged and promptly fixed the vulnerabilities we reported. In one additional case, the vulnerability was fixed, but only after we exhausted the three contact options and reached out to company personnel via a direct message on Twitter. In three cases—Blizzard, Microsoft, and Taxact—our vulnerability report did not produce the intended effect (Microsoft and Taxact did not understand the issue, Blizzard provided a generic acknowledgment email), but in our 60-day re-test, we found that the vulnerabilities had been fixed (without the companies notifying us). As such, we do not know whether the fixes were implemented in light of our research.

Among the responses we received, there were several failure modes, which were not mutually exclusive. 

  • In five cases, personnel did not understand our vulnerability report, despite our attempts to make it as clear as possible (see Appendix B of our paper). Three of them (Microsoft, Paypal, and Yahoo) demonstrated knowledge of SIM swap attacks, but did not realize that their SMS authentication policies were allowing for vulnerable accounts. Paypal, for instance, closed our report as out-of-scope, claiming that “the vulnerability is not in Paypal, as you mentioned this is an issue with the carriers and they need to fix it on their side.” While phone number hijackings are the result of poor customer authentication procedures at the carriers, account hijackings resulting from SMS passcode interception are the result of poor authentication policies at websites. The remaining two websites, Taxact and Gaijin Entertainment, misinterpreted our disclosure as a feature request and feedback, respectively.
  • Three of the four reports we submitted to third-party bug bounty programs were disregarded due to the absence of a bug (our findings are not software errors, but rather, logically inconsistent customer authentication policies). Reports are screened by employees of the program, who are independent of the website, and passed on to the website’s security teams if determined to be in scope. These third-party platforms appear to be overly strict with their triage criteria, preventing qualified researchers from communicating with the companies. This issue is not unique to our study, either. A few weeks ago, security researchers also reported difficulties with submitting vulnerability reports to Paypal, which uses HackerOne as its sole security reporting mechanism. HackerOne employs mechanisms that restrict users from submitting future reports after too many closed reports, which could disincentivize users from reporting legitimate vulnerabilities.
  • In five cases, we received no response. 
  • All four attempts to report security vulnerabilities through customer support channels were fruitless: either we received no response or personnel did not understand the issue.   

We have listed all 17 responses in the table below. Unfortunately, nine of these websites still allow both SMS-based MFA and SMS-based password recovery by default as of this writing. Among them are payment services PayPal and Venmo. The vulnerable websites cumulatively have billions of users.


We recommend that companies make the following changes to their vulnerability response:

  1. Companies need to realize that policy-related vulnerabilities are very real, and should use threat modeling to detect these. There seems to be a general lack of knowledge about vulnerabilities arising from weak authentication policies.
  2. Companies should provide direct contact methods for security reporting procedures. A bug bounty program is not a substitute for a robust security reporting mechanism, yet some companies are using it as such. Furthermore, customer support channels—whose personnel are unlikely to be trained to respond to security vulnerability disclosures—add a level of indirection and can lead to vulnerability reports being forwarded to inappropriate teams.  

Our paper, along with our dataset, is located at

Thanks to Malte Möser for providing comments on a draft.

Building a Bridge with Concrete… Examples

Thanks to Annette Zimmermann and Arvind Narayanan for their helpful feedback on this post.

Algorithmic bias is currently generating a lot of lively public and scholarly debate, especially amongst computer scientists and philosophers. But do these two groups really speak the same language—and if not, how can they start to do so?

I noticed at least two different ways of thinking about algorithmic bias during a recent research workshop on the ethics of algorithmic decision-making at Princeton University’s Center for Human Values, organized by political philosopher Dr. Annette Zimmermann. Philosophers are thinking about algorithmic bias in terms of things like the inherent value of explanation, the fairness and accountability rights afforded to humans, and whether groups that have been systematically affected by unfair systems should bear the burden for integration when transitioning to a uniform system. Computer scientists, by contrast, are thinking about algorithmic bias in terms of things like running a gradient backwards to visualize a heat map, projecting features into various subspaces devoid of protected attributes, and tuning hyperparameters to better satisfy a new loss function. Of course these are vast generalizations about the two fields, and there are plenty of researchers doing excellent work at the intersection, but it seems that for the most part, while philosophers are debating which sets of ethical axioms ought to underpin algorithmic decision-making systems, computer scientists are in the meantime already deploying these systems into the real world.

In formulating loss functions, consequentialists might prioritize maximizing accurate outcomes for the largest possible number of people, even if that is at the cost of fair treatment, whereas deontologists might prioritize treating everyone fairly, even if that is at the cost of optimality. But there isn’t a definitive “most moral” answer, and if something like equalizing false positive rates were the key to fairness, we would not be having the alarming headlines of algorithmic bias that we have today.

Inundated with various conflicting definitions of fairness, scientists are often optimizing for metrics they believe to be best and proceeding onwards. For example, one might reasonably think that the way to ensure fairness of an algorithm between different racial groups could be to enforce predictive parity (equal likelihood of accurate positive predictions), or to equalize error rates, or just to treat similar individuals similarly. However, it is actually mathematically impossible to simultaneously satisfy seemingly reasonable fairness criteria like these in most real world settings. It is unclear how to choose amongst the criteria, and even more unclear how one would go about translating complex ideas that may require consideration, such as systematic oppression, into a world of optimizers and gradients.
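To make the tension between these criteria concrete, here is a small sketch that computes three common per-group quantities: positive predictive value (the quantity behind predictive parity), false positive rate, and false negative rate. The function names and the toy counts are invented for illustration; they are not data from any real system.

```python
from collections import defaultdict


def group_rates(records):
    """Compute per-group PPV, FPR and FNR.

    records: iterable of (group, y_true, y_pred) with 0/1 labels.
    A rate is None when its denominator is zero.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_pred == 1:
            key = "tp" if y_true == 1 else "fp"
        else:
            key = "fn" if y_true == 1 else "tn"
        counts[group][key] += 1

    def ratio(num, den):
        return num / den if den else None

    return {
        g: {
            "ppv": ratio(c["tp"], c["tp"] + c["fp"]),
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),
            "fnr": ratio(c["fn"], c["fn"] + c["tp"]),
        }
        for g, c in counts.items()
    }


def make(group, tp, fp, fn, tn):
    """Build toy records from confusion-matrix counts (illustrative only)."""
    return ([(group, 1, 1)] * tp + [(group, 0, 1)] * fp
            + [(group, 1, 0)] * fn + [(group, 0, 0)] * tn)


# Group A has a 50% base rate (5 of 10); group B has a 25% base rate (5 of 20).
records = make("A", 4, 1, 1, 4) + make("B", 4, 1, 1, 14)
rates = group_rates(records)
```

In this toy data both groups have a PPV of 0.8 and a false negative rate of 0.2, yet the false positive rate is 0.2 for group A and about 0.067 for group B. With different base rates, matching some of the criteria forces the others apart, which is the kind of conflict the paragraph above refers to.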

Since concrete mappings between a mathematical loss function and moral concepts are likely impossible to dictate, and philosophers are unlikely to settle on an ultimate theory of fairness, perhaps for now we can adopt a strategy that is, at least, not impossible to implement: a purposefully created, context- and application-specific validation/test set. The motivation behind this is that even if philosophers and ethicists cannot decisively articulate a set of general, static fairness desiderata, perhaps they can make more domain-specific, dynamic judgements: for instance whether one should prefer a system that gives person A with a set of attributes and features a loan or not. And they can also say that for person B and C and so on. Of course there will not be unanimous agreement, but at least a general consensus towards a particular outcome as preferable over the other. One could then create a whole set of such examples. Concepts like the idea that similar people should be treated similarly in a given decision scenario—the ‘like cases maxim’ in legal philosophy—could be encoded into this test set by having groups of people that differ only in a protected attribute be given the same result, and even concepts like equal accuracy rates across protected groups could be encoded in by having the test set be represented by equal numbers of people from each group rather than proportional to the real world majority/minority representations. However, the test set is not a constructually valid way to enforce these fairness constraints, and it shouldn’t be either, because the reason why such a test set would exist is that the right fairness criteria are not actually known, otherwise it would just be explicitly formulated into the loss function.
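As a rough sketch of how such a curated test set might be used, the code below assumes two kinds of expert-provided items: individual cases with a preferred outcome, and pairs of cases that differ only in a protected attribute. All names here (`expert_agreement`, `like_cases_violations`, the `"group"` key, the toy loan rule) are hypothetical and only illustrate the idea.

```python
def expert_agreement(model, curated_cases):
    """Fraction of curated cases where the model matches the experts' preference.

    model: callable taking a dict of features and returning a decision.
    curated_cases: non-empty list of (features, preferred_outcome).
    """
    if not curated_cases:
        raise ValueError("curated_cases must not be empty")
    hits = sum(1 for features, preferred in curated_cases
               if model(features) == preferred)
    return hits / len(curated_cases)


def like_cases_violations(model, paired_cases, protected_key="group"):
    """Return the pairs where changing only the protected attribute
    changes the model's decision (the 'like cases maxim')."""
    violations = []
    for a, b in paired_cases:
        rest_a = {k: v for k, v in a.items() if k != protected_key}
        rest_b = {k: v for k, v in b.items() if k != protected_key}
        if rest_a != rest_b:
            raise ValueError("pair differs in more than the protected attribute")
        if model(a) != model(b):
            violations.append((a, b))
    return violations


# Toy loan rules, invented for illustration.
def income_only(case):
    return case["income"] >= 50


def income_and_group(case):
    return case["income"] >= 50 and case["group"] == "x"


pairs = [({"income": 60, "group": "x"}, {"income": 60, "group": "y"})]
curated = [({"income": 60, "group": "y"}, True),
           ({"income": 10, "group": "x"}, False)]
```

Here `income_only` agrees with every curated case and produces no like-cases violations, while `income_and_group` agrees with only half of the curated cases and fails the single pair. A real test set would of course be far larger and its composition would be decided by the domain experts.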

At this juncture, ethicists and computer scientists could usefully engage in complementary work: ethicists could identify difficult edge cases that challenge what we think about moral questions and incorporate this into the test set, and computer scientists could work on optimizing accuracy rates on a given validation set. There are a few crucial differences, however, from similar collaborative approaches in other domains like when doctors are called on to provide expert labels on medical data so models can be trained to detect things like eye diseases. There is now the new notion that the distribution of the test set, in addition to just the labels, is going to be specifically decided upon by domain experts. Further, this collaboration would last beyond just the labeling of the data. Failure cases should be critically investigated earlier in the machine learning pipeline in an iterative and reflective way to ensure things like overfitting are not happening. Whether performing well on the hidden test set requires learning fairer representations in the feature space or thresholding different groups differently, scientists will build context-specific models that encompass certain moral values defined by ethicists, who are grounding the test set in examples of realizations of such values.

But does this proposal mean adopting a potentially dangerous, ethically objectionable “the ends justify the means” logic? Not necessarily. With algorithm developers working in conjunction with ethicists to ensure the means are not unsavory, this could be a way to bridge the divide between abstract notions of fairness, and concrete ways of implementing systems.

This may not be an ideal long-term way to deal with the problem of algorithmic fairness, because it is difficult to generalize between applications, and in situations where creating an expert-curated test set is too expensive or not scalable it may not be preferable to satisfying one of the many mathematical definitions of fairness. But it could be one possible way to incorporate philosophical notions of fairness into the development of algorithms. Because technologists are not going to hold off and wait on deploying machine learning systems until they are in a state of fairness everyone agrees on, finding a way of incorporating philosophical views about central moral values like fairness and justice into algorithmic systems right now is an urgent problem.

Supervised machine learning has traditionally been focused on predicting based on historical and existing data, but maybe we can structure our data in a way that is a model not of the society we actually live in, but of the one we hope to live in. Translating complex philosophical values into representative examples is not an easy task, but it is one that ethicists have been doing a version of for centuries in order to investigate moral concepts—and perhaps it can also be the way to convey some sense of our morals to machines.

The CheapBit of Fitness Trackers Apps

Yan Shvartzshnaider (@ynotez) and Madelyn Sanfilippo (@MrsMRS_PhD)

Fitness trackers are “[devices] that you can wear that records your daily physical activity, as well as other information about your health, such as your heart rate” [Oxford Dictionary]. The increasing popularity of wearable devices offered by Apple, Google, and Nike inadvertently led cheaper versions to flood the market, along with the emergence of alternative non-tech, but fashionable brand devices. Cheaper versions ostensibly offer similar functionality for one-tenth of the price, which makes them very appealing to consumers. On Amazon, many of these devices receive overall positive feedback and an average of 4-5 star reviews. Some of them are even labeled as “Amazon’s choice” and “Best buyer” (e.g. Figure 1), which reinforces their popularity.

In this blog post, we examine privacy issues around these cheaper alternative devices, specifically focusing on the ambiguities around the third-party apps they use. We report our preliminary results on a few apps that seem to dominate the marketspace. Note that fashion brands also employ third-party apps like WearOS by Google, but they tend to be more recognizable and subject to greater consumer protection scrutiny. This makes them different than lesser-known devices.

Figure 1: LETSCOM, uses VeryFitPro, with over 13K reviews, labeled as Amazon’s Choice and is marketed to children.

Do consumers in fact pay dearly for the cheaper version of these devices?

Privacy issues are not unique to cheaper brands. Any “smart device” that has the ability to collect, process and share information about you and the surrounding environment can potentially violate your privacy. Security issues also play an important role. Services like Mozilla’s Privacy Not Included and Consumer Reports help navigate the treacherous landscape. However, even upholding the Minimum Security Standards doesn’t prevent privacy violations due to inappropriate use of information, as the Strava and Polar incidents show.

Given that most of the analysis is typically done by an app paired with a fitness tracker, we decided to examine the “CheapBit” products sold on Amazon,  with a large average number of reviews and answered questions, to see which apps they pair with. We found that the less-expensive brands are dominated by a few third-party apps primarily developed by small teams (or individuals) and do not provide any real description as to how data are used and shared. 

But what do we know about these apps?   

The VeryFitPro app seems to be the choice of many of the users buying the cheaper fitness tracker alternatives. The app has 5,000,000+ installs, according to Google Play, where it lists an email of the developer and a website with just a QR code to download the app. The app has access to an extensive list of permissions: SMS, Camera, Location, Wifi information, Device ID & Call information, Device & app history, Identity, Phone, Storage, Contacts, and Photo/Media/Files! The brief privacy policy appears to be translated into English using an automatic translation tool, such as Google Translate.

Surprisingly, what appears to be the same app on the Apple Store points to a different privacy policy altogether, hosted on a Facebook page! The app provides a different contact email, and the policy is even shorter than on the Play Store. In a three-paragraph policy, we are reassured that “some of your fitness information and sports data will be stored in the app, but your daily activities data will never be shared without permission.” and with a traditional “We reserve the right, in our decision to change, modify, add or remove portions of this policy at any time. Please check this page periodically for any changes. Publish any changes to these terms if you continue to use our App future will mean that you have accepted these adjustments. [sic]” No additional information is provided.

While we found VeryFitPro to be common among cheap fitness trackers, especially high-rated ones, it is not unique. Other apps such as JYouPro, which has access to the same range of permissions, offer a privacy policy that is just two paragraphs long and that also reassures users that “[they] don’t store personal information on our servers unless required for the on-going operation of one of our services.” The Apple version offers a slightly longer version of the policy. In it, we find that “When you synchronise the Band data, e.g. to JYouPro Cloud Service, we may collect data relating to your activities and functionalities of JYouPro, such as those obtained from our sensors and features on JYouPro, your sleeping patterns, movement data, heart rate data, and smart alarm related information.” Given that JYouPro is used by a large number of devices, their “Cloud service” seems to be sitting on a very lucrative data set. The policy warns us: “Please note also that for the above, JYouPro may use overseas facilities operated and controlled by JYouPro to process or back up your personal data. Currently, JYouPro has data centres in Beijing and Singapore.”

These are however not the worst offenders. Developers behind apps like MorePro and Wearfit didn’t even bother to translate their privacy policies from Chinese!

Users’ privacy concerns

These third-party apps are incredibly popular and pervade the low-end wearable market: VeryFitPro (5,000,000+ installs), JYouPro (500,000+ installs), WearFit (1,000,000+ installs). With little oversight, they are able to collect and process lots of potentially sensitive information from a large number of users through their access to contacts, camera, location, and other sensor data. Most of them are developed by small teams or unknown Chinese firms, which dominate the mHealth market.

A small portion of users on Amazon express privacy concerns. For one of the top-selling products, the LETSCOM Fitness Tracker, which uses VeryFitPro, has 4/5 stars, 14,420 ratings and 1000+ answered questions, and is marketed towards “Kids Women and Men”, we were able to find only a few questions on privacy. Notably, none of the questions was upvoted, so we suspect they remain unseen by the typical buyer. For example, one user was asking “What is the privacy policy for the app? How secure is the personal information? [sic]” to which another user (not the manufacturer) replied “A: This connects to your phone by bluetooth. That being said, I guess you could connect it only when you are in a secure location but then you wouldn’t have the message or phone notifications.” A similar concern was raised by another user “What is this company’s policy on data privacy? Will they share or sell the data to third parties?”

Another popular product, the Lintelek Fitness Tracker with Heart Rate Monitor, also uses VeryFitPro and has 4/5 stars and 4,050 ratings. Out of 1000+ answered questions, only a couple mentioned privacy. The first user gave the product 1 star with an ominous warning: “Be sure to read the privacy agreement before accepting this download”. Interestingly, the second user rated the product with 5 stars and gave a very positive review that ends with “Only CON: read the privacy statement if you are going to use the text/call feature. They can use your information. I never turned it on – I always have my phone anyway.”

The fact that buyers of these devices do not investigate the privacy issues is troubling. Previous research showed that consumers tend to assume that a company with a privacy policy protects their privacy. It seems clear that consumers need help from the platforms. Amazon, Google and Apple ought to better inform consumers about potential privacy violations. In addition to consumer protection obligations by these platforms, regulators ought to apply increased scrutiny. While software is not a conventional medical device, and hence is not covered by HIPAA, some medical apps do fall under FDA authority, including apps that correspond with wearables. Furthermore, as Figure 1 shows, these devices are marketed to children, so the apps should be subject to enforcement of children’s privacy standards like COPPA.

In conclusion, the lesser-known fitness tracking brands offer a cheaper alternative to high-end market products. However, as previous research showed, consumers of these devices are potentially paying a high privacy price. The consumers are left to fend for themselves. In many cases, the cheaper devices belong to firms outside of US jurisdiction, and thus US and European regulations are difficult to enforce. Furthermore, global platforms like Amazon, Google, Apple, and others seem to turn a blind eye to privacy issues and help to promote these devices and apps. They offer unhelpful and possibly misleading labels to consumers, such as Amazon’s “best seller” and “Amazon’s choice” and the Google Play Store’s download counts and star ratings, which exacerbate an already global and complex issue. It requires proactive action on behalf of all parties to offer lasting protection of users’ privacy, one that incorporates the notions of established societal norms and expectations.

We would like to thank Helen Nissenbaum for offering her thoughts on the topic.

Ballot-level comparison audits: BMD

In my previous posts, I’ve been discussing ballot-level comparison audits, a form of risk-limiting audit. Ballots are imprinted with serial numbers (after they leave the voter’s hands); during the audit, a person must find a particular numbered ballot in a batch of a thousand (more or less).

With CCOS (central-count optical scan) this works fine: the CCOS prints the serial numbers consecutively, and the human auditor can easily find the right ballot in a minute or two. With PCOS (precinct-count optical scan), we are reluctant to print the serial numbers consecutively, because the order in which people insert their ballots at the polling place is visible to the public, and (in theory) someone could learn how you voted by correlating with the CVR file.

What about ballot-marking devices (BMDs)? How do the serial numbers work for use in ballot-level comparison audits?

First of all, let’s remember that RLAs of BMD-marked ballots are not very meaningful, because the RLA can only assure that what’s marked on the paper is correctly tabulated. Because most voters don’t inspect what’s marked on the paper, the RLA cannot assure that what the voter indicated to the BMD (on the touchscreen) has been correctly tabulated, if the BMD had been hacked to make it cheat.

But suppose we set that concern aside. And indeed, some jurisdictions are conducting “RLAs” on BMD-marked ballots. So let’s examine how such “RLAs” should work.

If the BMD prints a serial number onto the marked ballot before presenting the ballot for the voter to examine, then the voter can see the serial number, and can make a note of it. Then the voter can sell their vote, by telling the criminal vote-buyer the serial number. Or the voter can be coerced to do so. You may think this is a far-fetched scenario, but voter coercion and vote selling were common in the 19th-century and early 20th-century United States, and occur now in some other countries.

Some “all-in-one” BMDs incorporate a scanning function, and don’t require a separate PCOS scanner. Suppose such a BMD prints a serial number onto the marked ballot after presenting the ballot for the voter to examine? That helps address the “voter-sees-the-number” problem. But it’s unpleasant to contemplate voting machines that can mark your ballot after the last time you see it. Any voting machine whose physical hardware can print votes onto the ballot after the last time the voter sees the paper is not a voter-verified paper ballot system, and is not acceptable. But even so, suppose we permit this: we are then in a similar situation to PCOS ballots. That is, the serial numbers should be in random order, not consecutive order, because otherwise observers in the polling place could calculate what serial number you’ll get.

And therefore, ballot-comparison audits of BMD-marked ballots run into just the same problem as audits of PCOS-scanned ballots, and maybe the same solutions would apply.
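To picture what “random order, not consecutive order” could mean in practice, here is a minimal sketch that draws distinct serial numbers from the operating system's random source. It is not how any real BMD or PCOS assigns numbers; the function name `random_serials` and the 10-digit width are assumptions made for the example.

```python
import secrets


def random_serials(n_ballots, digits=10):
    """Return n_ballots distinct serial numbers in unpredictable order.

    Numbers are drawn without replacement from the full range of
    `digits`-digit values, so a ballot's position in the insertion
    order says nothing about the number it receives.
    """
    rng = secrets.SystemRandom()  # OS-backed randomness, not a seeded PRNG
    picks = rng.sample(range(10 ** digits), n_ballots)
    return [f"{value:0{digits}d}" for value in picks]


serials = random_serials(1000)
```

An observer who counts voters in line learns nothing about which of these numbers a given ballot will carry. The cost, as the following post discusses, is that a human auditor can no longer find a ballot by leafing to the right position in the pile.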

Because of this problem, some manufacturers of BMDs have done the same as manufacturers of PCOS: omit serial numbers entirely. For example, the ExpressVote and ExpressVote XL do not print serial numbers on the ballot*, and therefore their ballots (like PCOS ballots) cannot be easily audited by ballot-level comparison audits (except by a cumbersome “transitive audit”).

*Based on information about the ExpressVote and ExpressVote XL as configured in 2019 and deployed in more than one state, including New Jersey.

Finding a randomly numbered ballot

In my previous posts, I’ve been discussing ballot-level comparison audits, a form of risk-limiting audit. Ballots are imprinted with serial numbers (after they leave the voter’s hands); during the audit, a person must find a particular numbered ballot in a batch of a thousand (more or less).

If the ballot papers are numbered consecutively, that’s not too difficult. But if the serial numbers are in random order, it’s very time-consuming.

An answer to the second puzzle.

So here’s my next idea. Likely I’m not the first to think of it, so I can’t claim much credit. And this idea may or may not be practical; it would need to be tested in practice.

Problem: You have a batch of serial-numbered ballots, like this, and you need to find the one numbered 0236000482.

Take the pile of ballots, and feed them through a high-volume scanner. Scanners that can do 140 pages per minute cost about $6000. The computer attached to the scanner can use OCR (optical character recognition) software just on the corners of the page, to find and recognize the serial number. When it finds the right number, the computer commands the scanner to stop.

Then the human auditor can pick up the last-scanned page, and examine it to make sure it’s the right number.

If the OCR software does not work perfectly (false positives), no harm done: the human sees that it’s the wrong number, and resumes the scanner. False negatives are more annoying, but still recoverable: the human would have to search through the entire pile. Because we don’t rely on the scanner to work perfectly, and because the scanner is not counting or tabulating votes, there’s no need to put this equipment through an EAC certification process.
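The scan-until-found procedure, including both failure modes, can be simulated in a few lines. This is a sketch only: the `ocr` callable stands in for the scanner plus OCR software, and `find_ballot` and its return values are names invented here rather than part of any real scanner's interface.

```python
def find_ballot(pile, target, ocr):
    """Scan pages in order until a page is confirmed to carry `target`.

    pile: list of the true serial numbers, in stack order.
    ocr: callable mapping a true serial to what the software reads
         (possibly wrong).
    Returns (index, pages_scanned, false_stops). index is None if the
    OCR never flagged the right page (a false negative), in which case
    the human auditor falls back to searching the pile by hand.
    """
    false_stops = 0
    for i, true_serial in enumerate(pile):
        if ocr(true_serial) == target:      # software tells the scanner to stop
            if true_serial == target:       # human checks the last-scanned page
                return i, i + 1, false_stops
            false_stops += 1                # false positive: resume scanning
    return None, len(pile), false_stops


target = "0236000482"
pile = ["0236000117", "0236000482", "0236000903"]

perfect = find_ballot(pile, target, ocr=lambda s: s)
misread_first = find_ballot(
    pile, target, ocr=lambda s: target if s == "0236000117" else s)
never_read = find_ballot(pile, target, ocr=lambda s: "")
```

With perfect OCR the ballot is found on the second page. When the first page is misread as the target, the human rejects it, resumes, and still finds the ballot, at the cost of one extra stop. When nothing is read correctly, the whole pile is scanned without a match. At the 140 pages per minute quoted above, scanning a full batch of a thousand would take roughly seven minutes.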

As you’ll notice, the serial number is printed in fairly low-quality, hard-to-read print. This might pose problems for the OCR software. Better-quality printing would help the OCR, but it would help the humans too, and might be worth doing in any case.

Another variant of this solution is to print the serial number as a barcode in addition to human-readable digits. That would be easier for the scanner to recognize. If the PCOS tries to cheat in some way by making the barcode mismatch the human-readable number, this will be detected immediately by the human auditor.

Puzzle number 3: The solution I propose in this article might work; but surely a creative person can find even better ways to support ballot-level comparison audits of PCOS machines.