October 6, 2022

Cross-Layer Security: A Holistic View of Internet Security 

By Henry Birge-Lee, Liang Wang, Grace Cimaszewski, Jennifer Rexford and Prateek Mittal

On February 3, 2022, attackers launched a highly effective attack against the Korean cryptocurrency exchange KLAYswap. We discussed the details of this attack in our earlier blog post “Attackers exploit fundamental flaw in the web’s security to steal $2 million in cryptocurrency.” However, in that post we only scratched the surface of potential countermeasures that could prevent such attacks. In this new post, we will discuss how we can defend the web ecosystem against attacks like these. This attack was composed of multiple exploits at different layers of the network stack. We term attacks like this “cross-layer attacks,” and offer our perspective on why they are so effective. Furthermore, we propose a practical defense strategy against them that we call “cross-layer security.”

As we discuss below, cross-layer security involves security technologies at different layers of the network stack working in harmony to defend against vulnerabilities that are difficult to catch at any single layer alone.

At a high level, the adversary’s attack affected many layers of the networking stack:

  • The network layer is responsible for providing reachability between hosts on the Internet. The first part of the adversary’s attack involved targeting the network layer with a Border Gateway Protocol (BGP) attack that manipulated routes to hijack traffic intended for the victim.
  • The session layer is responsible for secure end-to-end communication over the network. To attack the session layer, the adversary leveraged its attack on the network layer to obtain a digital certificate for the victim’s domain from a trusted Certificate Authority (CA). With this digital certificate the adversary established encrypted and secure TLS sessions with KLAYswap users.
  • The application layer is responsible for interpreting and processing data that is sent over the network. The adversary used the hijacked TLS sessions with KLAYswap customers to serve malicious JavaScript code that compromised the KLAYswap web application and caused users to unknowingly transfer their funds to the adversary.

The difficulty of fully protecting against cross-layer vulnerabilities like these is that they exploit the interactions between the different layers involved: a vulnerability in the routing system can be used to exploit a weak link in the PKI, and even the web-development ecosystem is implicated in this attack because of the way JavaScript is loaded. The cross-layer nature of these vulnerabilities often leads developers working at each layer to dismiss the vulnerability as a problem with other layers.

There have been several attempts to secure the web against these kinds of attacks at the HTTP layer. Interestingly, these technologies often ended up dead in the water (as was the case with HTTP public key pinning and Extended Validation certificates). This is because the HTTP layer alone does not have the routing information needed to properly detect these attacks and can rely only on information that is available to end-user applications. As a result, HTTP-only defenses can end up blocking connections when benign events take place, such as when a domain moves to a new hosting provider or changes its certificate configuration, because these events look very similar to routing attacks at the HTTP layer.

Due to the cross-layer nature of these vulnerabilities, we need a different mindset to fix the problem: people at all layers need to fully deploy any security solutions that are realistic at that layer. As we will explain below, there is no silver bullet that can be quickly deployed at any layer; instead, our best hope is more modest (but easier to deploy) security improvements for all the layers involved. Working under a “the other layer will fix the problem” attitude simply perpetuates these vulnerabilities.

Below are some short-term and ideal long-term expectations for each layer of the stack involved in these attacks. While in theory, any layer implementing one of these “long-term” security improvements could drastically reduce the attack surface, these technologies have still not seen the type of deployment needed for us to rely on them in the short term. On the other hand, all the technologies in the short-term list have seen some degree of production-level/real-world deployment and are something members of these communities can start using today without much difficulty.

Layer | Short-Term Changes | Long-Term Goals
Web apps (application layer) | Reduce the use of code loaded from external domains | Sign and authenticate all code being executed
The PKI/TLS (session layer) | Universally deploy multiple-vantage-point validation | Adopt a technology to verify identity based on cryptographically protected DNSSEC, which provides security in the presence of powerful network attacks
Routing (network layer) | Sign and verify routes with RPKI and follow the security practices outlined by MANRS | Deploy BGPsec for near-complete elimination of routing attacks

To elaborate:

At the application layer: Web apps are downloaded over the Internet and are completely decentralized. For the time being, there is no mechanism in place to universally vouch for the authenticity of code or content that is contained in a web app. If an adversary can obtain a TLS certificate for google.com and intercept your connection to Google, your browser (right now) has no way of knowing that it is being served content that did not actually come from Google’s servers. However, developers can remember that any third-party dependency (particularly one loaded from a different domain) can be a third-party vulnerability, and limit the use of third-party code on their websites (or host third-party code locally to reduce the attack surface). Furthermore, both locally hosted and third-party-hosted content can be secured with subresource integrity, where a cryptographic hash included on the webpage vouches for the integrity of dependencies. This lets developers pin the exact contents of the dependencies on their webpage. Doing so vastly reduces the attack surface, forcing attacks to target the single connection with the victim’s web server rather than the many different connections involved in retrieving different dependencies.
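For example, a developer can pin a dependency with subresource integrity by hashing the exact file contents. The short Python sketch below (our own illustration, with stand-in content and a hypothetical CDN URL) shows how such an integrity value could be generated:

```python
import base64
import hashlib

def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Return a Subresource Integrity string such as "sha384-<base64 digest>"."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode()}"

# Stand-in for the real file contents of a third-party script.
# The resulting value goes in the script tag's integrity attribute, e.g.
# <script src="https://cdn.example.com/lib.js" integrity="sha384-..." crossorigin="anonymous"></script>
print(sri_value(b"console.log('hello');"))
```

If the script served at that URL ever differs from the hashed contents, the browser refuses to execute it, which is exactly what limits the attack surface described above.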

At the session layer: CAs need to establish the identity of customers requesting certificates and, while there are proposals to use cryptographic DNSSEC to verify identity (like DANE), the status quo is to verify identity via network communications with the domains listed in certificate requests. Thus, global routing attacks are likely to be very effective against CAs unless we make more substantial changes to the way certificates are issued. But this does not mean all hope is lost. Many network attacks are not global but are actually localized to a specific part of the Internet. CAs are capable of mitigating these attacks by verifying domains from several vantage points spread throughout the Internet. This allows some of the CA’s vantage points to be unaffected by the attack and to communicate with the legitimate domain owner. Our group at Princeton designed multiple-vantage-point validation and worked with the world’s largest web PKI CA, Let’s Encrypt, to develop the first-ever production deployment of it. CAs can and should use multiple vantage points to verify domains, making them resistant to localized network attacks and ensuring that they see a global perspective on routing.
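To illustrate the idea (this is our own simplified sketch, not Let’s Encrypt’s actual implementation), multiple-vantage-point validation boils down to a quorum decision over independent challenge checks:

```python
from typing import Callable, Iterable

def multi_vantage_point_validate(
    perform_challenge: Callable[[str, str], bool],
    vantage_points: Iterable[str],
    domain: str,
    required_agreement: float = 0.75,
) -> bool:
    """Return True only if a quorum of vantage points confirms domain control.

    perform_challenge(vantage_point, domain) is assumed to contact the domain
    from that vantage point and check the ACME-style challenge response.
    A localized BGP hijack may fool some vantage points, but not a quorum.
    """
    results = [perform_challenge(vp, domain) for vp in vantage_points]
    agreement = sum(results) / len(results)
    return agreement >= required_agreement

# Hypothetical usage with stubbed challenge checks: one vantage point is
# caught inside a localized hijack, but the quorum still reflects reality.
vantage_points = ["us-east", "eu-west", "ap-south", "sa-east"]
checks = {"us-east": False, "eu-west": True, "ap-south": True, "sa-east": True}
ok = multi_vantage_point_validate(lambda vp, d: checks[vp], vantage_points, "example.com")
print("issue certificate" if ok else "refuse issuance")
```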

At the network layer: In routing, protecting against all BGP attacks is difficult. It requires expensive public-key operations on every BGP update using a protocol called BGPsec that current routers do not support. However, there has recently been significantly increased adoption of a technology called the Resource Public Key Infrastructure (RPKI) that prevents global attacks by establishing a cryptographic database of which networks on the Internet control which IP address blocks. Importantly, when properly configured, RPKI also specifies what size IP prefix should be announced, which prevents global and highly effective sub-prefix attacks. In a sub-prefix attack, the adversary announces a longer, more-specific IP prefix than the victim and benefits from longest-prefix-match routing to have its announcement preferred by the vast majority of the Internet. RPKI is fully compatible with current router hardware. The only downside is that RPKI can still be evaded with certain local BGP attacks where, instead of claiming to own the victim’s IP address (which is checked against the database), an adversary simply claims to be an Internet provider of the victim. The full map of which networks are connected to which other networks is not currently secured by the RPKI. This leaves a window for some types of BGP attacks, which we have seen in the wild. However, the impact of these attacks is significantly reduced and often affects only a part of the Internet. In addition, the MANRS project provides recommendations for best operational practices, including RPKI, that help prevent and mitigate BGP hijacks.
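For intuition, here is a greatly simplified sketch of route origin validation (our own illustration, not a real validator; production implementations follow RFC 6811 and handle many more details). Note how the max-length check is what rejects sub-prefix hijacks:

```python
from dataclasses import dataclass
from ipaddress import ip_network

@dataclass
class ROA:
    prefix: str      # e.g. "192.0.2.0/24"
    max_length: int  # longest prefix length the holder may announce
    origin_asn: int

def rov_status(announced_prefix: str, origin_asn: int, roas: list[ROA]) -> str:
    """Very simplified route origin validation in the spirit of RFC 6811."""
    announced = ip_network(announced_prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        if announced.subnet_of(roa_net):
            covered = True
            if roa.origin_asn == origin_asn and announced.prefixlen <= roa.max_length:
                return "valid"
    return "invalid" if covered else "not-found"

# Hypothetical ROA for a victim prefix originated by AS 64500.
roas = [ROA("192.0.2.0/24", 24, 64500)]
print(rov_status("192.0.2.0/24", 64500, roas))   # valid: legitimate announcement
print(rov_status("192.0.2.0/25", 64501, roas))   # invalid: sub-prefix hijack is rejected
```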

Using Cross-Layer Security to Defend Against Cross-Layer Attacks

Looking across these layers we see a common trend: in every layer there are proposed security technologies that could potentially stop attacks like the KLAYswap attack. However, these technologies all face deployment challenges. In addition, there are more modest technologies that are seeing extensive real-world deployment today. But each of these deployed technologies alone can be evaded by an adaptive adversary. For example, RPKI can be evaded by local attacks, multiple-vantage-point validation can be evaded by global attacks, etc. However, if we instead look at the benefit offered by all of these technologies together deployed at different layers, things look more promising. Below is a table summarizing this:

Security Technology (Layer) | Detects routing attacks affecting the entire Internet | Detects routing attacks affecting part of the Internet | Limits the number of potential targets for routing attacks
RPKI (network layer) | Yes | No | No
Multiple-vantage-point validation (session layer) | No | Yes | No
Subresource integrity and locally hosted content (application layer) | No | No | Yes

This synergy of security technologies deployed at different layers is what we call cross-layer security. RPKI alone can be evaded by clever adversaries (using attack techniques we are seeing more and more in the wild). However, the attacks that evade RPKI tend to be local (i.e., not affecting the entire Internet). This synergizes with multiple-vantage-point validation, which is best at catching local attacks. Furthermore, because even these two technologies working together do not fully eliminate the attack surface, improvements at the web layer that reduce the reliance on code loaded from external domains help to further reduce the attack surface. At the end of the day, the entire web ecosystem can benefit tremendously from each layer deploying security technologies that leverage the information and tools available exclusively to that layer. Furthermore, when working in unison, these technologies together can do something that none of them could do alone: stop cross-layer attacks.

Cross-layer attacks are surprisingly effective because no one layer has enough information about the attack to completely prevent it. Fortunately, each layer does have the ability to protect against a different portion of the attack surface. If developers across these different communities know what type of security is realistic and expected at their layer of the stack, we will see meaningful improvements.

Even though the ideal endgame is to deploy a security technology that is capable of fully defending against cross-layer attacks, we have not yet seen wide-scale adoption of any such technology. In the meantime, if we continue to focus defenses against cross-layer attacks at a single layer alone, these attacks will take significantly longer to protect against. Changing our mindset and seeing the strengths and weaknesses of each layer lets us protect against these attacks much more quickly by increasing the use of synergistic technologies at different layers that have already seen real-world deployment.

The anomaly of cheap complexity

Why are our computer systems so complex and so insecure?  For years I’ve been trying to explain my understanding of this question. Here’s one explanation–which happens to be in the context of voting computers, but it’s a general phenomenon about all our computers:

There are many layers between the application software that implements an electoral function and the transistors inside the computers that ultimately carry out computations. These layers include the election application itself (e.g., for voter registration or vote tabulation); the user interface; the application runtime system; the operating system (e.g., Linux or Windows); the system bootloader (e.g., BIOS or UEFI); the microprocessor firmware (e.g., Intel Management Engine); disk drive firmware; system-on-chip firmware; and the microprocessor’s microcode. For this reason, it is difficult to know for certain whether a system has been compromised by malware. One might inspect the application-layer software and confirm that it is present on the system’s hard drive, but any one of the layers listed above, if hacked, may substitute a fraudulent application layer (e.g., vote-counting software) at the time that the application is supposed to run. As a result, there is no technical mechanism that can ensure that every layer in the system is unaltered and thus no technical mechanism that can ensure that a computer application will produce accurate results. 

[Securing the Vote, page 89-90]

So, computers are insecure because they have so many complex layers.

But that doesn’t explain why there are so many layers, and why those layers are so complex–even for what “should be a simple thing” like counting up votes.

Recently I came across a really good explanation: a keynote talk by Thomas Dullien entitled “Security, Moore’s law, and the anomaly of cheap complexity” at CyCon 2018, the 10th International Conference on Cyber Conflict, organized by NATO.

Thomas Dullien’s talk video is here, but if you want to just read the slides, they are here.

As Dullien explains,

A modern 2018-vintage CPU contains a thousand times more transistors than a 1989-vintage microprocessor.  Peripherals (GPUs, NICs, etc.) are objectively getting more complicated at a superlinear rate. In his experience as a cybersecurity expert, the only thing that ever yielded real security gains was controlling complexity.  His talk examines the relationship between complexity and failure of security, and discusses the underlying forces that drive both.

Transistors-per-chip is still increasing every year; there are 3 new CPUs per human per year.  Device manufacturers are now developing their software even before the new hardware is released.  Insecurity in computing is growing faster than security is improving.

The anomaly of cheap complexity.  For most of human history, a more complex device was more expensive to build than a simpler device.  This is not the case in modern computing. It is often more cost-effective to take a very complicated device, and make it simulate simplicity, than to make a simpler device.  This is because of economies of scale: complex general-purpose CPUs are cheap.  On the other hand, custom-designed, simpler, application-specific devices, which could in principle be much more secure, are very expensive.  

This is driven by two fundamental principles in computing: universal computation, meaning that any computer can simulate any other; and Moore’s law, the observation that the number of transistors on a chip grows exponentially over time.  ARM Cortex-M0 CPUs cost pennies, though they are more powerful than some supercomputers of the 20th century.

The same is true in the software layers.  A (huge and complex) general-purpose operating system is free, but a simpler, custom-designed, perhaps more secure OS would be very expensive to build.  Or as Dullien asks, “How did this research code someone wrote in two weeks 20 years ago end up in a billion devices?”

Then he discusses hardware supply-chain issues: “Do I have to trust my CPU vendor?”  He discusses remote-management infrastructures (such as the “Intel Management Engine” referred to above):  “In the real world, ‘possession’ usually implies ‘control’. In IT, ‘possession’ and ‘control’ are decoupled. Can I establish with certainty who is in control of a given device?”

He says, “Single bitflips can make a machine spin out of control, and the attacker can carefully control the escalating error to his advantage.”  (Indeed, I’ve studied that issue myself!)

Dullien quotes the science-fiction author Robert A. Heinlein:

“How does one design an electric motor? Would you attach a bathtub to it, simply because one was available? Would a bouquet of flowers help? A heap of rocks? No, you would use just those elements necessary to its purpose and make it no larger than needed — and you would incorporate safety factors. Function controls design.” 

 Heinlein, The Moon Is A Harsh Mistress

and adds, “Software makes adding bathtubs, bouquets of flowers, and rocks, almost free. So that’s what we get.”

Dullien concludes his talk by saying, “When I showed the first [draft of this talk] to some coworkers they said, ‘you really need to end on a more optimistic note.’”  So Dullien gives optimism a try, discussing possible advances in cybersecurity research; but still he gives us only a 10% chance that society can get this right.


Postscript:  Voting machines are computers of this kind.  Does their inherent insecurity mean that we cannot use them for counting votes?  No. The consensus of election-security experts, as presented in the National Academies study, is: we should use optical-scan voting machines to count paper ballots, because those computers, when they are not hacked, are much more accurate than humans.  But we must protect against bugs, against misconfigurations, against hacking, by always performing risk-limiting audits, by hand, of an appropriate sample of the paper ballots that the voters marked themselves.

Toward Trustworthy Machine Learning: An Example in Defending against Adversarial Patch Attacks (2)

By Chong Xiang and Prateek Mittal

In our previous post, we discussed adversarial patch attacks and presented our first defense algorithm, PatchGuard. The PatchGuard framework (small receptive field + secure aggregation) has become the most popular defense strategy over the past year, subsuming a long list of defense instances (Clipped BagNet, De-randomized Smoothing, BagCert, Randomized Cropping, PatchGuard++, ScaleCert, Smoothed ViT, ECViT). In this post, we will present a different way of building robust image classification models: PatchCleanser. Instead of using small receptive fields to suppress the adversarial effect, PatchCleanser directly masks out adversarial pixels in the input image. This design makes PatchCleanser compatible with any high-performance image classifier and achieves state-of-the-art defense performance.

PatchCleanser: Removing the Dependency on Small Receptive Fields

The limitation of small receptive fields. We have seen that the small receptive field plays an important role in PatchGuard: it limits the number of corrupted features and lays a foundation for robustness. However, the small receptive field also limits the information received by each feature; as a result, it hurts clean model performance (when there is no attack). For example, the PatchGuard models (BagNet + robust masking) only achieve 55%-60% clean accuracy on the ImageNet dataset, while state-of-the-art undefended models, which all have large receptive fields, can achieve an accuracy of 80%-90%.

This huge drop in clean accuracy discourages the real-world deployment of PatchGuard-style defenses. A natural question to ask is:

Can we achieve strong robustness without the use of small receptive fields? 

YES, we can. We propose PatchCleanser with an image-space pixel masking strategy to make the defense compatible with any state-of-the-art image classification model (with larger receptive fields).

A pixel-masking defense strategy. The high-level idea of PatchCleanser is to apply pixel masks to the input image and evaluate model predictions on masked images. If a mask removes the entire patch, the attacker has no influence over the classification, and thus any image classifier can make an accurate prediction on the masked image. However, the challenge is: how can we mask out the patch, especially when the patch location is unknown? 

Pixel masking: the first attempt. A naive approach is to choose a mask and apply it to all possible image locations. If the mask is large enough to cover the entire patch, then at least one mask location can remove all adversarial pixels. 

We provide a simplified visualization below. When we apply masks to an adversarial image (top of the figure), the model prediction is correct as “dog” when the mask removes the patch at the upper left corner. Meanwhile, the predictions on other masked images are incorrect since they are influenced by the adversarial patch — we see a prediction disagreement among different masked images. On the other hand, when we consider a clean image (bottom of the figure), the model predictions usually agree on the correct label since both we and the classifier can easily recognize the partially occluded dog. 

visual examples of one-mask predictions

Based on these observations, we can use the disagreement in one-mask predictions to detect a patch attack; a similar strategy is used by the Minority Reports defense, which takes inconsistency in the prediction voting grid as an attack indicator. However, can we recover the correct prediction label instead of merely detecting an attack? Or equivalently, how can we know which mask removes the entire patch? Which class label should an image classifier trust — dog, cat, or fox?  
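Before turning to that question, here is a rough sketch of the detection step just described (our own illustration, not the paper’s code); the classifier `model`, the mask locations, and the mask size are assumed inputs:

```python
import numpy as np

def apply_mask(image: np.ndarray, y: int, x: int, size: int) -> np.ndarray:
    """Return a copy of the image with a square region occluded."""
    masked = image.copy()
    masked[y:y + size, x:x + size, :] = 0.0
    return masked

def detect_attack(model, image, mask_locations, mask_size) -> bool:
    """Flag an input as adversarial if one-mask predictions disagree."""
    predictions = {
        model(apply_mask(image, y, x, mask_size)) for (y, x) in mask_locations
    }
    # On a clean image, all one-mask predictions typically agree;
    # a disagreement is treated as evidence of a patch attack.
    return len(predictions) > 1
```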

Pixel masking: the second attempt. The solution turns out to be super simple: we can perform a second round of masking on the one-masked images (see visual examples below). If the first-round mask already removes the patch (top of the figure), then our second-round masking is applied to a “clean” image, and thus all two-mask predictions will have a unanimous agreement. On the other hand, if the patch is not removed by the first-round mask (bottom of the figure), the image is still “adversarial”. We will then see a disagreement in two-mask predictions; we shall not trust the prediction labels.

visual examples of two-mask predictions

In our PatchCleanser paper, we further discuss how to generate a mask set such that at least one mask can remove the entire patch regardless of the patch location. We also prove that if the model predictions on all possible two-masked images are correct, PatchCleanser can always make correct predictions.
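To make the procedure concrete, here is a simplified Python sketch of double-masking inference (our own condensed version; the algorithm in the paper uses a provably complete mask-set construction and a more careful case analysis):

```python
from collections import Counter

def double_masking_predict(model, image, masks):
    """Simplified sketch of PatchCleanser-style double masking.

    `model(image)` returns a class label, and `masks` is a list of masking
    functions, each returning a copy of the image with one region occluded.
    """
    first_round = [model(mask(image)) for mask in masks]
    majority, _ = Counter(first_round).most_common(1)[0]
    if all(label == majority for label in first_round):
        return majority  # unanimous agreement: likely a clean image

    # Second round: re-examine each one-masked image that disagreed.
    for mask, label in zip(masks, first_round):
        if label == majority:
            continue
        one_masked = mask(image)
        second_round = [model(m(one_masked)) for m in masks]
        if all(pred == label for pred in second_round):
            # Unanimous two-mask agreement suggests this first-round mask
            # removed the patch, so trust its label.
            return label
    return majority
```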

PatchCleanser performance. In the figure below, we plot the defense performance on the 1000-class ImageNet dataset (against a 2%-pixel square patch anywhere on the image). We can see that PatchCleanser significantly outperforms prior works (which are all PatchGuard-style defenses with small receptive fields). Notably, (1) the certified robust accuracy of PatchCleanser (62.1%) is even higher than the clean accuracy of all prior defenses, and (2) the clean accuracy of PatchCleanser (83.9%) is similar to vanilla undefended models! These results further demonstrate the strength of defenses that are compatible with any state-of-the-art classification models (with large receptive fields). 

Clean accuracy and certified robust accuracy on the ImageNet dataset; certified robust accuracy evaluated against a 2%-pixel square patch anywhere on the image

(certified robust accuracy is a provable lower bound on model robust accuracy; see the PatchCleanser paper for more details)

Takeaways. In PatchCleanser, we demonstrate that small receptive fields are not necessary for strong robustness. We design an image-space pixel masking strategy that is compatible with any image classifier. This compatibility allows us to use state-of-the-art image classifiers and achieve significant improvements over prior works.

Conclusion: Using Logical Reasoning for Building Trustworthy ML systems

In the era of big data, we have been amazed at the power of statistical reasoning/learning: an AI model can automatically extract useful information from a large amount of data and significantly outperform manually designed models (e.g., hand-crafted features). However, these learned models can behave unexpectedly when they encounter “adversarial examples” and thus lack reliability for security-critical applications. In our posts, we demonstrate that we can additionally apply logical reasoning to statistically learned models to achieve strong robustness. We believe the combination of logical and statistical reasoning is a promising and important direction for building trustworthy ML systems.


Toward Trustworthy Machine Learning: An Example in Defending against Adversarial Patch Attacks

By Chong Xiang and Prateek Mittal

Thanks to the stunning advancement of Machine Learning (ML) technologies, ML models are increasingly being used in critical societal contexts — such as in the courtroom, where judges look to ML models to determine whether a defendant is a flight risk, and in autonomous driving,  where driverless vehicles are operating in city downtowns. However, despite the advantages, ML models are also vulnerable to adversarial attacks, which can be harmful to society. For example, an adversary against image classifiers can augment an image with an adversarial pixel patch to induce model misclassification. Such attacks raise questions about the reliability of critical ML systems and have motivated the design of trustworthy ML models. 

In this 2-part post on trustworthy machine learning design, we will focus on ML models for image classification and discuss how to protect them against adversarial patch attacks. We will first introduce the concept of adversarial patches and then present two of our defense algorithms: PatchGuard in Part 1 and PatchCleanser in Part 2.

Adversarial Patch Attacks: A Threat in the Physical World

The adversarial patch attack, first proposed by Brown et al., targets image recognition models (e.g., image classifiers). The attacker aims to overlay an image with a carefully generated adversarial pixel patch to induce incorrect model predictions (e.g., misclassification). Below is a visual example of the adversarial patch attack against traffic sign recognition models: after attaching an adversarial patch, the model prediction incorrectly changes from “stop sign” to “speed limit 80 sign.”

a visual example of adversarial patch attacks taken from Yakura et al.

Notably, this attack can be realized in the physical world. An attacker can print and attach an adversarial patch to a physical object or scene. Any image taken from this scene then becomes an adversarial image. Just imagine that a malicious sticker attached to a stop sign confuses the perception system of an autonomous vehicle and eventually leads to a serious accident! This threat to the physical world motivates us to study mitigation techniques against adversarial patch attacks.

Unfortunately, security is never easy. An attacker only needs to find one strategy to break the entire system while a defender has to defeat as many attack strategies as possible. 

In the remainder of this post, we discuss how to make an image classification model as secure as possible: able to make correct and robust predictions against attackers who know everything about the defense and who might attempt to use an adversarial patch at any image location and with any malicious content.

This as-robust-as-possible notion is referred to as provable, or certifiable, robustness in the literature. We refer interested readers to the PatchGuard and PatchCleanser papers for its formal definitions and security guarantees.

PatchGuard: A Defense Framework Using Small Receptive Field + Secure Aggregation

PatchGuard is a defense framework for certifiably robust image classification against adversarial patch attacks. Its design is motivated by the following question:

How can we ensure that the model prediction is not hijacked by a small localized patch? 

We propose a two-step defense strategy: (1) small receptive fields and (2) secure aggregation. The use of small receptive fields limits the number of corrupted features, and secure aggregation on a partially corrupted feature map allows us to make robust final predictions.  

Step 1: Small Receptive Fields. The receptive field of an image classifier (e.g., CNN) is the region of the input image that a particular feature looks at (or is influenced by). The model prediction is based on the aggregation of features extracted from different regions of an image. By using a small receptive field, we can ensure that only a limited number of features “see” the adversarial patch. 

The example below illustrates that the adversarial patch can only corrupt one feature (the red vector on the right) when we use a model with small receptive fields, marked with red and green boxes over the images.


We provide another example below for large receptive fields. The adversarial pixels appear in the receptive fields (the area inside the red boxes) of all four features and lead to a completely corrupted feature map, making it nearly impossible to recover the correct prediction.

Step 2: Secure Aggregation. The use of small receptive fields limits the number of corrupted features and translates the defense into a secure aggregation problem: how can we make a robust prediction based on a partially corrupted feature map? Here, we can use any off-the-shelf robust statistics techniques (e.g., clipping and median) for feature aggregation. 
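As a toy illustration with made-up numbers (not taken from the paper), robust aggregation limits how much a single corrupted feature can shift the result:

```python
import numpy as np

# Hypothetical class evidence from four local features (small receptive fields);
# one feature has been corrupted by the patch and pushed to a huge value.
local_class_scores = np.array([1.2, 0.9, 250.0, 1.1])

# Mean aggregation is hijacked by the single corrupted feature.
print(local_class_scores.mean())                 # ~63.3

# Robust aggregation limits the corrupted feature's influence.
print(np.median(local_class_scores))             # 1.15
print(np.clip(local_class_scores, 0, 5).mean())  # ~2.05
```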

In our paper, we further propose a more powerful secure aggregation technique named robust masking. Its design intuition is to identify and remove abnormally large features. This mechanism introduces a dilemma for the attacker. If the attacker wants to launch a successful attack, it needs to either introduce large malicious feature values that will be removed by our defense, or use small feature values that can evade the masking operation, but are not malicious enough to cause misclassification. 

This dilemma further allows us to analyze the defense robustness. For example, if we consider a square patch that occupies 1% of image pixels, we can calculate the largest number of corrupted features and quantitatively reason about the worst-case feature corruption (details in the paper). Our evaluation shows that, for 89.0% of images in the test set of the ImageNette dataset (a 10-class subset of the ImageNet dataset), our defense can always make correct predictions, even when the attacker has full access to our defense setup and can place a 1%-pixel square patch at any image location and with any malicious content. We note that the result of 89.0% is rigorously proved and certified in our paper, giving the defense theoretical and formal security guarantees. 
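To give a flavor of this kind of calculation (our own simplified derivation and hypothetical numbers, not the paper’s exact statement): a feature whose r-by-r receptive field is placed every s pixels overlaps a p-by-p patch only if its window intersects the patch, so at most ceil((p + r - 1) / s) features per axis can be corrupted.

```python
import math

def max_corrupted_features(patch: int, receptive_field: int, stride: int) -> int:
    """Upper bound (simplified derivation) on the number of features whose
    receptive field overlaps a square patch."""
    per_axis = math.ceil((patch + receptive_field - 1) / stride)
    return per_axis ** 2  # square patch, square grid of features

# Hypothetical numbers: a 23x23-pixel patch (roughly 1% of a 224x224 image),
# BagNet-style 17x17 receptive fields spaced 8 pixels apart.
print(max_corrupted_features(23, 17, 8))  # 25 features (5 per axis)
```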

Furthermore, PatchGuard is also scalable to more challenging datasets like ImageNet. We can achieve certified robustness for 32.2% of ImageNet test images, against a 1%-pixel square patch. Note that the ImageNet dataset contains images from 1000 different categories, which means that an image classifier that makes random predictions can only correctly classify roughly 1/1000=0.1% of images.

Takeaways

The high-level contribution of PatchGuard is a two-step defense framework: small receptive field and secure aggregation. This simple approach turns out to be a very powerful strategy: the PatchGuard framework subsumes most of the concurrent (Clipped BagNet, De-randomized Smoothing) and follow-up works (BagCert, Randomized Cropping, PatchGuard++, ScaleCert, Smoothed ViT, ECViT). We refer interested readers to our robustness leaderboard to learn more about state-of-the-art defense performance.

Conclusion

In this post, we discussed the threat of adversarial patch attacks and presented our PatchGuard defense algorithm (small receptive field and secure aggregation). This defense example demonstrates one of our efforts toward building trustworthy ML models for critical societal applications such as autonomous driving.

In the second part of this two-part post, we will present PatchCleanser — our second example for designing robust ML algorithms.

Most top websites are not following best practices in their password policies

By Kevin Lee, Sten Sjöberg, and Arvind Narayanan

Compromised passwords have consistently been the number one cause of data breaches by far, yet passwords remain the most common means of authentication on the web. To help, the information security research community has established best practices for guiding users toward stronger passwords. These include:

  • Block weak passwords that have appeared in breaches or can be easily guessed.
  • Use a strength meter to give users helpful real-time feedback. 
  • Don’t force users to include specific character-classes in their passwords. 
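Taken together, these recommendations can be sketched as a simple policy check. The snippet below uses hypothetical thresholds and a tiny illustrative word list (a real deployment would check against a large breached-password corpus and use a research-backed strength meter):

```python
# Minimal sketch of a password policy that follows the three recommendations:
# block known-breached/common passwords, give strength feedback, and avoid
# character-class composition rules. The word list and thresholds are illustrative.
COMMON_PASSWORDS = {"123456", "password", "11111111", "qwerty", "password123"}

def evaluate_password(password: str) -> tuple[bool, str]:
    if password.lower() in COMMON_PASSWORDS:
        return False, "This password appears in breach data; please choose another."
    if len(password) < 8:
        return False, "Use at least 8 characters."
    # Feedback, not a hard requirement: longer passphrases are stronger.
    # (A production site would use a research-backed meter here.)
    strength = "strong" if len(password) >= 16 else "fair"
    return True, f"Accepted. Estimated strength: {strength}."

print(evaluate_password("11111111"))
print(evaluate_password("correct horse battery staple"))
```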

While these recommendations are backed by rigorous research, no one has thoroughly investigated whether websites are heeding the advice.

In a new study, we empirically evaluated compliance with these best practices. We reverse-engineered the password policies at 120 of the top English-language websites, like Google, Facebook, and Amazon. We found that only 15 of them were following best practices. The remaining 105 either leave users at risk of password compromise or frustrated from being unable to use a sufficiently strong password (or both). The following table summarizes our findings:

We compare our key findings with best practices from prior research.

We found that more than half of the websites allowed the most common passwords, like “123456”, to be used. Attackers can guess these passwords with minimal effort, which opens the door to account hijacking.

Amazon allowed us to change the password on our account to “11111111”, a common and easily-guessed password.

Few websites had adopted strength meters, and of those, we found websites misusing meters to encourage complex passwords over strong, hard-to-guess passwords (e.g., preferring the predictable “Password123” over “bdmt7gg82nkc”—which we had randomly generated on our password manager). This not only defeats the purpose of password strength meters, but can lead to more user frustration.

Facebook using its password strength meter as a nudge towards incorporating specific character types in passwords.

Finally, we found almost half of the websites requiring users to include specific character classes in their passwords, despite decades of research advising against the practice and outcry from users themselves.

Intuit requires passwords to include uppercase characters, lowercase characters, numbers, and symbols.

Our study reveals a huge gap between research and practice when it comes to password policies. Passwords have been heavily researched, yet few websites have implemented password policies that reflect the lessons learned. At the same time, research has not paid attention to practice. In our paper, we discuss ways for both sides to come together to address this disconnect. One idea for future research: directly engage with system administrators, in order to understand their mindset on password security. Perhaps password policy is meant to be security theater—giving users a sense of safety without actually improving security. Or maybe websites have shifted their attention to adopting other authentication technologies, like SMS-based multi-factor authentication (which also suffers from severe weaknesses, as we discovered in previous research on SIM swaps and number recycling). Perhaps websites have to deal with security audits from firms like Deloitte recommending outdated practices. Or maybe websites face other practical constraints that the information security community doesn’t know about. 

Our peer-reviewed paper is located at passwordpolicies.cs.princeton.edu.