January 20, 2018

How Have In-Flight Web Page Modification Practices Changed over the Past Ten Years?

When we browse the web, many parties and organizations can see which websites we visit, because they sit on the path between web clients (our computers and mobile devices) and the web servers hosting the sites we request. Most obviously, Internet Service Providers (ISPs) are responsible for transmitting our web traffic, but reports (e.g. [1], [2], [3]) have shown that they may also inject ads into users’ requested web pages to increase revenue. Other parties may also intercept our web traffic for a wide variety of reasons: content-distribution networks (CDNs) answer requests on behalf of websites that are geographically farther away to speed up response times; enterprise software and programs running on our devices may inspect incoming web pages for security or privacy reasons before passing them to our browser; and malicious adversaries may attempt to inject malware into requested web content before we receive it.

 

In 2007, a research group at the University of Washington conducted a study to measure how often these web page modifications occur in practice, and to determine who is responsible for the modifications. Web page modifications were identified using a small piece of software embedded in a test web page, a so-called “web tripwire”, that compared a known good representation of the web page with the version of the test web page users saw in their browsers. The researchers then attributed the modifications to ISPs, malicious attackers, and client software such as ad blockers, using IP addresses and by finding identifying keywords in the injected web content. They found that only about 1.3% of participating web clients saw page modifications. But much about how we interact with and browse the web has changed over the past ten years. More specifically, with the emergence of mobile technologies and new network parties such as CDNs, it is important to learn if and how these new developments have affected in-flight modification practices.
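The comparison at the heart of a tripwire can be sketched in a few lines. This is only an illustration of the idea, written in Python for readability; an actual tripwire runs inside the browser and may compare pages differently. The function names and the sample pages below are hypothetical.

```python
import hashlib


def fingerprint(page: str) -> str:
    """Return a SHA-256 hex digest of the page text."""
    return hashlib.sha256(page.encode("utf-8")).hexdigest()


def tripwire_triggered(known_good: str, received: str) -> bool:
    """Report whether the received page differs from the known-good copy."""
    return fingerprint(known_good) != fingerprint(received)


# Hypothetical sample pages: the second has an injected script tag.
known_good = "<html><body><p>Test page</p></body></html>"
modified = "<html><body><p>Test page</p><script src='ad.js'></script></body></html>"

print(tripwire_triggered(known_good, known_good))  # False
print(tripwire_triggered(known_good, modified))    # True
```

A hash comparison only says that something changed; attributing the change to a particular party requires looking at the modified content itself.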

 

We invite you to take part in our research study. Following the same setup as the UW study, we have created a test web page containing a “web tripwire”. If it detects any in-flight page modifications in our test page, it sends us a copy of the modified version of our web page that your browser received. We minimize the information that we collect to detect page modifications. In addition to page modification data, we only record information that web servers normally record, such as IP address, browser type, date and time of page request, and a cookie to differentiate between users. We will permanently remove any personal information found in the page modifications before sending the modification data to our servers.

 

By participating in this study, you are helping us gather information crucial for guiding research and building tools to improve web privacy. If you’re willing to contribute to our study, it’s as simple as visiting our test web page: http://stormship.cs.princeton.edu. If possible, we also ask you to visit our page through multiple different devices and browsers, as this will help diversify our collected data. Our test page contains more details about our study, and we will post our results there when we have completed our measurements.

Please reach out to us with any questions, concerns, or feedback. We greatly appreciate your help in our efforts to improve web privacy!

Routing Detours: Can We Avoid Nation-State Surveillance?

Since 2013, Brazil has taken significant steps to build out its networking infrastructure to thwart nation-state mass surveillance. For example, the country is deploying a 3,500-mile fiber cable from Fortaleza, Brazil to Portugal; it has switched its government email system from Microsoft Outlook to a state-built system called Expresso; and it now has the largest IXP ecosystem in the world. All of these measures aim to prevent the country’s Internet traffic from traversing the United States, thereby preventing the United States from conducting surveillance on Brazilian citizens’ data. But Brazil isn’t the only country with concerns about its Internet traffic passing through the United States. Deutsche Telekom has lobbied for tougher privacy protection through keeping German traffic within national borders. Canadian traffic has been found to routinely pass through the United States, which is a violation of Canadian network sovereignty. Russian president Putin has called for “better protection of communication networks” and passed a law that requires foreign companies to keep Russian users’ data on servers inside the country.

To quantify which countries Internet traffic passes through, and to measure how successful any particular country might be at detouring its traffic around known surveillance states, we actively measured and analyzed the traffic originating in five different countries: Brazil, the Netherlands, Kenya, India, and the United States.

  • First, to understand the current state of transnational routing (the “status quo”), we measured the country-level traffic paths for the Alexa Top 100 domains in each respective country using RIPE Atlas probes and the MaxMind geolocation service.  
  • Next, we measured how successful clients in Brazil, the Netherlands, Kenya, India, and the United States might be at avoiding other countries of interest, using open DNS resolvers and an overlay network.
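To make the first step concrete, here is a minimal sketch of how a list of traceroute hop addresses could be turned into a country-level path. It assumes a plain dictionary as a stand-in for a geolocation database; the table contents, the addresses (taken from documentation-only ranges), and the function name are all hypothetical and do not reflect the study's actual tooling.

```python
from typing import Dict, List

# Hypothetical IP-to-country table standing in for a geolocation database.
GEO_TABLE: Dict[str, str] = {
    "192.0.2.1": "BR",
    "192.0.2.2": "BR",
    "198.51.100.7": "US",
    "203.0.113.9": "BR",
}


def country_path(hops: List[str], geo: Dict[str, str]) -> List[str]:
    """Map hop IPs to countries, skipping unknown hops and collapsing repeats."""
    path: List[str] = []
    for ip in hops:
        country = geo.get(ip)
        if country is None:
            continue  # hop could not be geolocated
        if not path or path[-1] != country:
            path.append(country)
    return path


hops = ["192.0.2.1", "192.0.2.2", "198.51.100.7", "10.0.0.1", "203.0.113.9"]
print(country_path(hops, GEO_TABLE))  # ['BR', 'US', 'BR']
```

Skipping hops that cannot be geolocated is a simplification; it means a country can go unnoticed if none of its routers appear in the table.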

The rest of this post summarizes these two parts of the study and highlights some of the results.

The Status Quo: Even Local Traffic Can Detour through Surveillance States

Despite the extreme efforts of certain countries to “keep local traffic local”, and in particular to avoid having traffic traverse the United States, our measurement study indicates that these goals have not yet been reached, for two reasons: 1) lack of domain hosting diversity and 2) lack of geographic diversity in Internet paths.

Lack of Hosting Diversity. Hosting for many popular websites lacks diversity: about half of the Alexa Top 100 domains are hosted in a single country, and in these cases a user cannot avoid the domain’s hosting country when accessing it. In many cases, even popular local websites are hosted outside the country where citizens are trying to access them. For example, more than 50% of the top domains in Brazil and India are hosted in the United States; in total, about 50% of the .br domains are hosted outside Brazil. More hosting diversity, as could be enabled with CDNs, would make it possible to avoid more countries more often.
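The notion of an unavoidable hosting country can be illustrated with a small sketch. It assumes we already know the set of countries hosting each domain (for example, gathered by resolving the domain from many vantage points); the domain names, data, and function name are made up for illustration.

```python
from typing import Dict, Set


def single_country_domains(hosting: Dict[str, Set[str]]) -> Dict[str, str]:
    """Return the domains hosted in exactly one country, with that country."""
    return {
        domain: next(iter(countries))
        for domain, countries in hosting.items()
        if len(countries) == 1
    }


# Hypothetical hosting data: domain -> countries where replicas were observed.
hosting = {
    "example.com.br": {"BR", "US"},
    "example.org": {"US"},
    "example.net": {"US", "DE", "SG"},
    "example.in": {"US"},
}

locked = single_country_domains(hosting)
print(locked)                      # {'example.org': 'US', 'example.in': 'US'}
print(len(locked) / len(hosting))  # 0.5
```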

Lack of Geographic Diversity. Internet paths also lack geographic diversity: about half of the paths originating in Kenya to the most popular Kenyan websites traverse the United States or Great Britain. Much of this phenomenon is due to “tromboning,” whereby an Internet path starts and ends in a country, yet transits an intermediate country; for example, about 13% of the paths that we explored from RIPE Atlas probes in Brazil to the top domains in Brazil trombone through the United States. More than 50% of the paths from the Netherlands to its top domains transit the United States.
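Given country-level paths, tromboning as defined above is straightforward to check. The sketch below is an illustration under the assumption that each path is a list of country codes with consecutive repeats already collapsed; the sample paths and function names are hypothetical.

```python
from typing import List


def is_tromboning(path: List[str]) -> bool:
    """True if the path starts and ends in one country but leaves it in between."""
    return (
        len(path) >= 3
        and path[0] == path[-1]
        and any(country != path[0] for country in path[1:-1])
    )


def tromboning_fraction(paths: List[List[str]]) -> float:
    """Fraction of domestic paths (same first and last country) that trombone."""
    domestic = [p for p in paths if p and p[0] == p[-1]]
    if not domestic:
        return 0.0
    return sum(is_tromboning(p) for p in domestic) / len(domestic)


# Hypothetical country-level paths measured from a client in Brazil.
paths = [["BR"], ["BR"], ["BR", "US", "BR"], ["BR"], ["BR", "US"]]
print(tromboning_fraction(paths))  # 0.25
```

The last sample path ends in the United States, so it is not a domestic path and does not count toward the denominator.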

Towards User-Controlled Routing Detours

We next asked whether the fact that many popular websites are georeplicated, coupled with a client’s ability to selectively “bounce” packets through overlay nodes, might give some users opportunities to avoid certain countries. We studied whether users could exploit open DNS resolvers to discover hosting diversity, and overlay network relays to intentionally introduce routing detours. Previous work in overlay networks, such as RON, tries to route around failures, whereas our work tries to route around countries. Our results show that in some cases, users can select paths to specifically avoid certain countries; in cases of tromboning, where local traffic leaves the country only to return, the use of local relays can sometimes ensure that local traffic stays within the country. For example, without these techniques, Brazilian traffic transited Spain, Italy, France, Great Britain, Argentina, and Ireland, among others. Using even a modest overlay network deployment of 12 relays across 10 countries (Brazil, United States, Ireland, Germany, Spain, France, Singapore, Japan, South Korea, and Australia), clients in Brazil could completely avoid these countries for the top 100 domains. The overlay network can also be used to keep local traffic local; the percentage of tromboning paths from Brazil decreases from 13.2% of domestic paths to 9.7%.
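As a rough illustration of country avoidance with relays, the sketch below checks which routes to a server stay clear of a given country. It assumes the country-level paths for the direct route and for each relay leg have already been measured; the relay names, sample paths, and function name are hypothetical, and only the forward direction is modeled.

```python
from typing import Dict, List, Tuple


def usable_routes(
    direct: List[str],
    via_relay: Dict[str, Tuple[List[str], List[str]]],
    avoid: str,
) -> List[str]:
    """List the routes (direct or via a relay) that never enter `avoid`."""
    routes: List[str] = []
    if avoid not in direct:
        routes.append("direct")
    for relay, (to_relay, to_server) in via_relay.items():
        # Both legs must stay clear: client -> relay and relay -> server.
        if avoid not in to_relay and avoid not in to_server:
            routes.append(relay)
    return routes


# Hypothetical measurements for a client in Brazil and a server in Germany.
direct = ["BR", "ES", "DE"]
via_relay = {
    "relay-fr": (["BR", "FR"], ["FR", "DE"]),
    "relay-es": (["BR", "ES"], ["ES", "DE"]),
}

print(usable_routes(direct, via_relay, "ES"))  # ['relay-fr']
print(usable_routes(direct, via_relay, "DE"))  # []
```

The second call returns no routes because the server itself is in the country to avoid, which mirrors the hosting-diversity limitation: no relay helps when the only copy of a site is hosted there.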

Unfortunately, some of the more prominent surveillance states are also some of the least avoidable countries. Most countries depend heavily on the United States for connectivity to other locations on the Internet. None of Brazil, India, Kenya, or the Netherlands can completely avoid the United States with the country avoidance techniques. The inability of these techniques to avoid the United States typically results from the lack of hosting diversity for many websites, which are hosted solely in the United States. Using the overlay network, both Brazilian and Dutch clients were able to avoid the United States for about 65% of sites; even so, the United States remains completely unavoidable for about 10% of sites. Traffic from Kenya can avoid the United States for only about 40% of the top domains. On the other hand, the United States can avoid every other country for all sites, with the exception of France and the Netherlands, which the United States can nonetheless avoid for 99% of the top 100 domains.

More Information and Next Steps

A more detailed writeup is available on the RANSOM project website (https://ransom.cs.princeton.edu/). Encouraged by the ability to use overlay networks to avoid surveillance states in certain cases, we are in the process of designing and building a RANSOM prototype. We welcome feedback on this project as we embark on the next steps.

Security Audit of Safeplug "Tor in a Box"

Last month at the FOCI workshop, we presented a security analysis of the Safeplug, a $49 box that promised users “complete security and anonymity” online by sending all of their web traffic through the Tor onion routing network. Safeplug claims to offer greater usability, particularly for non-technical customers, than the state of the art in anonymous Internet browsing: the Tor Browser Bundle (TBB). However, we found that the hardened browser in the TBB is very important for security, and we found a number of usability and security problems with the Safeplug, including the ability for a local or remote attacker to silently turn off Tor or modify other device settings. Our research concluded that users should run the Tor Browser Bundle if they can; if not, then there is some value in a torifying proxy like Safeplug as long as users are aware of its limitations. For the rest of this post I’ll review our findings and highlight the differences and tradeoffs between the Tor Browser Bundle and a torifying proxy like the Safeplug.