July 18, 2019

Demystifying The Dark Web: Peeling Back the Layers of Tor’s Onion Services

by Philipp Winter, Annie Edmundson, Laura Roberts, Agnieszka Dutkowska-Żuk, Marshini Chetty, and Nick Feamster

Want to find US military drone data leaks online? Frolic in a fraudster’s paradise for people’s personal information? Or crawl through the criminal underbelly of the Internet? These are the images that come to mind for most people when they think of the dark web, and a quick Google search for “dark web” will yield many stories like these. Yet far less is said about how the dark web can actually enhance user privacy or overcome censorship by enabling anonymous browsing through Tor. Recently, for example, Brave, a browser dedicated to protecting user privacy, integrated Tor support to help users surf the web anonymously from a regular browser. This raises questions such as: is the dark web for illicit content and dealings only? Can it really be useful for day-to-day web privacy protection? And how easy is it to use anonymous browsing and dark web or “onion” sites in the first place?

To answer some of these pressing questions, we studied how Tor users use onion services. Our work will be presented at the upcoming USENIX Security conference in Baltimore next month and you can read the full paper here or the TLDR version here.

What are onion services?: Onion services were created by the Tor Project in 2004. They not only offer privacy protection for individuals browsing the web but also allow web servers, and thus websites themselves, to be anonymous. This means that any “onion site” or dark web site cannot be physically traced to identify those running the site or where the site is hosted. Onion services differ from conventional web services in four ways. First, they can only be accessed over the Tor network. Second, onion domains (akin to URLs for the regular web) are derived from a hash of the service’s public key and consist of a string of letters and numbers, which makes them long, complicated, and difficult to remember. These domains sometimes contain human-readable prefixes, but such prefixes are expensive to generate (e.g., torprojectqyqhjn.onion); we refer to these as vanity domains. Third, the network path between the client and the onion service is typically longer, meaning slower performance owing to longer latencies. Finally, onion services are private by default, meaning that to find and use an onion site, a user has to know the onion domain, presumably by finding this information organically rather than with a search engine.
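
To make the second point concrete, here is a minimal sketch (ours, not from the paper) of how a version-2 onion domain can be derived from a service’s public key; it glosses over key encoding details, but it shows why the resulting address looks like a random string of letters and numbers.

```python
# A minimal sketch (ours, not from the paper) of how a version-2 onion
# domain is derived: take the SHA-1 hash of the service's DER-encoded
# RSA public key, keep the first 80 bits, and base32-encode them.
import base64
import hashlib

def onion_domain_v2(der_encoded_public_key: bytes) -> str:
    digest = hashlib.sha1(der_encoded_public_key).digest()
    # 10 bytes = 80 bits, which base32-encodes to exactly 16 characters.
    return base64.b32encode(digest[:10]).decode("ascii").lower() + ".onion"

# A "vanity" domain is found by brute force: keep generating fresh keys
# until the resulting address starts with the desired readable prefix.
print(onion_domain_v2(b"example key material, not a real key"))
```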

What did we do to investigate how Tor users make use of onion services?: We conducted a large-scale survey of 517 Tor users and interviewed 17 Tor users in depth to determine how users perceive, use, and manage onion services and what challenges they face in using these services. We asked our participants how they used Tor’s onion services and how they managed onion domains. In addition, we asked users about their expectations of privacy and their privacy and security concerns when using onion services. To complement our qualitative data, we analyzed “leaked” DNS lookups to onion domains, as seen from a DNS root server. This data gave us insights into actual usage patterns to corroborate some of the findings from the interviews and surveys. Our final sample of participants was young and highly educated, and included journalists, whistleblowers, and everyday users, ranging from those simply wanting to protect their privacy to those doing competitive research on others and wanting to avoid being “outed”. Other participants included activists and those who wanted to avoid government detection for fear of persecution or worse.

What were the main findings? First, unsurprisingly, onion services were mostly used for anonymity and security reasons. For instance, 71% of survey respondents reported using onion services to protect their identity online. Almost two-thirds of the survey respondents reported using onion services for non-browsing activities such as TorChat, a secure messaging app built on top of onion services. 45% of survey participants had other reasons for using Tor, such as educating users about the dark web or hosting their personal blogs. Only 27% of survey respondents reported using onion services to explore the dark web and its content “out of curiosity”.

Second, users had a difficult time finding, tracking, and saving onion links. Finding links: Almost half of our survey respondents discovered onion links through social media such as Twitter or Reddit, or by randomly encountering links while browsing the regular web. Fewer survey respondents discovered links through friends and family. Challenges participants mentioned in finding onion services included:

  • Onion sites frequently change addresses, so onion domain aggregators often have broken and out-of-date links.
  • Unlike traditional URLs, onion links give no indication of a website’s content, so it is difficult to avoid potentially offensive or illicit content.
  • Again unlike traditional URLs, participants said it is hard to tell at a glance at the address bar whether a site is the authentic one they are trying to reach rather than a phishing site.

A frequent wish expressed by participants was for a better search engine that is more up to date and that indicates both the content behind a link and the authenticity of the site itself before one clicks.

Tracking and Saving links: To track and save complicated onion domains, many participants opted to bookmark links, but some did not want to leave a trace of the websites they visited on their machines. The majority of the other survey respondents had ad hoc measures for dealing with onion links. Some memorized a few links, precisely to protect their privacy by not writing the links down; in most cases this was only possible for a few vanity domains. Others simply navigated back to the places where they had found the links in the first place and opened the websites they needed from there.

Third, onion domains are also hard to verify as authentic. Vanity domains: Users appreciated vanity domains, where onion service operators have taken extra effort and expense to set up a domain that is at least partially readable, as in the case of Facebook’s onion site, facebookcorewwwi.onion. Many participants liked the fact that vanity domains give more indication of the content of the domain. However, our participants also felt vanity domains could lead to more phishing attacks, since people would not try to verify the entire onion domain but only the readable prefix. “We also get false expectations of security from such domains. Somebody can generate another onion key with same facebookcorewwwi address. It’s hard but may be possible. People who believe in uniqueness of generated characters, will be caught and impersonated.” – Participant S494
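
The concern behind this quote is easy to illustrate with a toy brute-force loop. The sketch below is ours and purely illustrative: it uses random bytes as stand-in “keys” rather than real onion keys, but it shows why an attacker can cheaply find an address that shares a short readable prefix with a target, and why verifying only that prefix is risky.

```python
# Illustrative only: brute-forcing an onion-style address that shares a
# short readable prefix with a target. Random bytes stand in for keys.
import base64
import hashlib
import os

def fake_onion_address(key_material: bytes) -> str:
    digest = hashlib.sha1(key_material).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower()

target_prefix = "fb"  # each extra character multiplies the work by 32
attempts = 0
while True:
    attempts += 1
    address = fake_onion_address(os.urandom(128))
    if address.startswith(target_prefix):
        print(f"found lookalike {address}.onion after {attempts} attempts")
        break
```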

Verification Strategies: Our participants had a variety of strategies, such as cutting and pasting links, using bookmarks, or inspecting the address in the address bar, to check the authenticity of a website. Some checked for a valid HTTPS certificate or familiar images on the website. However, over a quarter of our survey respondents (28%) reported that they could not tell whether a site was authentic, and 10% did not check for authenticity at all. Some lamented that this is innate to the design of onion services and that there is no real way to tell whether an onion service is authentic, as epitomized by a quote from Participant P1: “I wouldn’t know how to do that, no. Isn’t that the whole point of onion services? That people can run anonymous things without being able to find out who owns and operates them?”

Fourth, onion lookups suggest typos or phishing. In our DNS dataset, we found that some onion domains that were looked up far less frequently were strikingly similar to popular onion domains such as Facebook’s, suggesting that users were making typos or, potentially, that phishing sites exist. Of the top 20 onion domains we encountered in our data set, 16 were significantly similar to at least one other onion domain in the data set. More details are available in the paper.
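
As a rough sketch of the general idea (not the paper’s exact methodology), lookalike domains can be flagged by comparing them pairwise and reporting pairs whose string similarity is suspiciously high; the second domain below is a hypothetical lookalike we made up, not a domain from our data set.

```python
# Flag pairs of onion domains that are suspiciously similar, which may
# indicate typos or phishing lookalikes. A rough sketch of the idea only.
from difflib import SequenceMatcher
from itertools import combinations

def similar_pairs(domains, threshold=0.8):
    """Return pairs of domains whose similarity ratio is at least threshold."""
    return [
        (a, b)
        for a, b in combinations(domains, 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]

# "facebookc0rewwvi" is a made-up lookalike of Facebook's real onion domain.
print(similar_pairs(["facebookcorewwwi", "facebookc0rewwvi", "torprojectqyqhjn"]))
```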

What do these findings mean for Tor and onion services? Tor and onion services do have a part to play in helping users protect their anonymity and privacy for reasons other than those usually associated with a “nefarious” dark web: overcoming censorship, evading stalkers, and exposing wrongdoing through whistleblowing, for example. However, to better support these uses of Tor and onion services, our participants wanted several improvements: broader support for Tor in browsers generally, better performance, improved privacy and security, educational resources on how to use Tor and onion services, and better onion service search engines. Our results suggest that to enable more users to make use of onion services, users need:

  • better security indicators to help them confirm that Tor and onion services are working correctly;
  • automatic detection of phishing in onion services;
  • opt-in publishing of onion domains to improve search for legitimate and legal content; and
  • better ways to track and save onion links, including privacy-preserving onion bookmarking.

Future studies to further demystify the dark web are warranted, and in our paper we make suggestions for more work to understand the positive aspects of the dark web and how to support privacy protections for everyday users.

You can read more about our study and its limitations here (for instance, our participants were self-selected and may not represent those who use the dark web for illicit activities), or skim the paper summary.

When Terms of Service limit disclosure of affiliate marketing

By Arunesh Mathur, Arvind Narayanan and Marshini Chetty

In a recent paper, we analyzed affiliate marketing on YouTube and Pinterest. We found that on both platforms, only about 10% of all content with affiliate links is disclosed to users as required by the FTC’s endorsement guidelines.

One way to improve the situation is for affiliate marketing companies (and other “influencer” agencies) to hold their registered content creators to the FTC’s endorsement guidelines. To better understand affiliate marketing companies’ current practices, we examined the terms and conditions of eleven of the most common affiliate marketing companies in our dataset, specifically noting whether they required content creators to disclose their affiliate content and whether they mentioned the FTC’s guidelines upon registration.

Affiliate program        Requires disclosure?
AliExpress               No
Amazon                   Yes
Apple                    No
Commission Junction      No
Ebay                     Yes
Impact Radius            No
Rakuten Marketing        No
RewardStyle              N/A
ShopStyle                Yes
ShareASale               No
The table above summarizes our findings. All the terms and conditions were accessed May 1, 2018 from the affiliate marketing companies’ websites. We did not hyperlink those terms and conditions that were not available publicly. All the companies that required disclosure also mentioned the FTC’s endorsement guidelines.

Out of the top 10 programs in our corpus, only 3 explicitly instructed their creators to disclose their affiliate links to their users. In all three cases (Amazon, Ebay, and ShopStyle), the companies called out the FTC’s endorsement guidelines. Of particular interest is Amazon’s affiliate marketing terms and conditions (Amazon was the largest affiliate marketing program in our dataset).

Amazon’s terms and conditions: When content creators sign up on Amazon’s website, they are bound by the program’s terms and conditions, including Section 5, titled “Identifying Yourself as an Associate”.

Figure 1: The disclosure requirement in Section 5 of Amazon’s terms and conditions document.

As seen in Figure 1, the terms of Section 5 do not explicitly mention the FTC’s endorsement guidelines but constrain participants to add only the following disclosure to their content: “As an Amazon Associate I earn from qualifying purchases”. In fact, the terms go so far as to warn users that “Except for this disclosure, you will not make any public communication with respect to this Agreement or your participation in the Associates Program”.

However, if participants click on the “Program Policies” link in the terms and conditions—which they are also bound to by virtue of agreeing to the terms and conditions—they are specifically made responsible for complying with the FTC’s endorsement guidelines (Figure 2): “For example, you will be solely responsible for… all applicable laws (including the US FTC Guides Concerning the Use of Endorsement and Testimonials in Advertising)…”. Here, Amazon asks content creators to comply with the FTC’s guidelines without specifying exactly how. It is important to note that the FTC’s guidelines themselves do not impose any specific disclosure wording on content creators, but rather suggest that creators use clear, explanatory disclosures that convey to users the advertising relationship behind affiliate marketing.

Figure 2: The disclosure requirement from Amazon’s “Program Policies” page.

We learned about these clauses from the coverage of our paper on BBC’s You and Yours podcast (about 16 minutes in). A YouTuber on the show pointed out that he was constrained by Amazon’s clause not to disclose anything about the affiliate program publicly.

Indeed, as we describe above, Amazon’s terms and conditions seem to contradict its Program Policies. On the one hand, Amazon binds its participants to the FTC’s endorsement guidelines, but on the other, it severely constrains the disclosures content creators can make about their participation in the program.

Further, researchers are still figuring out which types of disclosures are effective from a user perspective. Content creators might want to adapt the form and content of disclosures based on the findings of such research and the affordances of the social platforms. For example, on YouTube, it might be best to call out the affiliate relationship in the video itself—when content creators urge viewers to “check out the links in the description below”—rather than merely in the description. The rigid wording mandated by Amazon seemingly prevents such customization, and may not make the affiliate relationship adequately clear to users.

Affiliate marketing companies wield strong influence over the content creators that register with their programs, and can hold them accountable to ensure they disclose these advertising relationships in their content. At the very least, they should not make it harder to comply with applicable laws and regulations.

New Jersey Takes Up Net Neutrality: A Summary, and My Experiences as a Witness

On Monday afternoon, I testified before the New Jersey State Assembly Committee on Science, Technology, and Innovation, which is chaired by Assemblyman Andrew Zwicker, who also happens to represent Princeton’s district.

On the committee agenda were three bills related to net neutrality.

Let’s quickly review the recent events. In December 2017, the Federal Communications Commission (FCC) rolled back the now-famous 2015 Open Internet Order, which required Internet service providers (ISPs) to abide by several so-called “bright line” rules, which can be summarized as: (1) no blocking lawful Internet traffic; (2) no throttling or degrading the performance of lawful Internet traffic; (3) no paid prioritization of one type of traffic over another; and (4) transparency about network management practices that may affect the forwarding of traffic. In addition to these rules, the FCC order also re-classified Internet service as a “Title II” telecommunications service—placing it under the jurisdiction of the FCC’s rulemaking authority—overturning the “Title I” information services classification that ISPs previously enjoyed.

The distinction of Title I vs. Title II classification is nuanced and complicated, as I’ve previously discussed. Re-classification of ISPs as a Title II service certainly comes with a host of complicated regulatory strings attached.  It also places the ISPs in a different regulatory regime than the content providers (e.g., Google, Facebook, Amazon, Netflix).

The rollback of the Open Internet Order reverted not only the ISPs’ Title II classification, but also the four “bright line” rules. In response, many states have recently been considering and enacting their own net neutrality legislation, including Washington, Oregon, California, and now New Jersey. Generally speaking, these state laws are far less complicated than the original FCC order. They typically involve re-instating the FCC’s bright-line rules, but entirely avoid the question of Title II classification.

On Monday, the New Jersey State Assembly considered three bills relating to net neutrality. Essentially, all three bills amount to providing financial and other incentives to ISPs to abide by the bright line rules.  The bills require ISPs to follow the bright line rules as a condition for:

  1.  securing any contract with the state government (which can often be a significant revenue source);
  2. gaining access to utility poles (which is necessary for deploying infrastructure);
  3. municipal consent (which is required to occupy a city’s right-of-way).

I testified at the hearing, and I also submitted written testimony, which you can read here. This was my first experience testifying before a legislative committee; it was an interesting and rewarding experience.  Below, I’ll briefly summarize the hearing and my testimony (particularly in the context of the other testifying witnesses), as well as my experience as a testifying witness (complete with some lessons learned).

My Testimony

Before I wrote my testimony, I thought hard about what a computer scientist with my expertise could bring to the table as a testifying expert. I focused my testimony on three points:

  • No blocking and no throttling are technically simple to implement. One of the arguments made by those opposed to the legislation is that state laws on blocking and throttling could become exceedingly difficult to implement, particularly if each state has its own laws. In short, the argument is that state laws could create a complex regulatory “patchwork” that is burdensome to implement. If we were considering a version of the FCC’s several-hundred-page Open Internet Order in each state, I might tend to agree. But the New Jersey laws are simple and concise: each law is only a couple of pages. The laws basically say “don’t block or throttle lawful content”, with clear carve-outs for illegal traffic, attack traffic, and so forth. My comments focused on the simplicity of implementation, and on the point that we need not fear a patchwork of laws if the default is a simple rule that prevents blocking or throttling. In my oral testimony, I added (mostly for color) that the Internet is already a patchwork of tens of thousands of independently operated networks across hundreds of countries, and that our protocols support carrying Internet traffic over a variety of physical media, from optical networks to wireless networks to carrier pigeon. I also took the opportunity to point out that ISPs are, in a relative sense, pretty good actors in this space right now, in contrast to content providers, who have regularly blocked access to content either for anti-competitive reasons or as a condition for doing business in certain countries.
  • Prioritization can be useful for certain types of traffic, but it is distinct from paid prioritization. Some ISPs have been arguing recently that prohibiting paid prioritization would prohibit (among other things) the deployment of high-priority emergency services over the Internet. Of course, anyone who has taken an undergraduate networking course will have learned about prioritization (e.g., Weighted Fair Queueing), as well as how prioritization (and even shaping) can improve application performance by ensuring that interactive, delay-sensitive applications such as gaming are not queued behind lower-priority bulk transfers, such as a cloud backup. Yet prioritization of certain classes of applications over others is a different matter from paid prioritization, whereby one customer might pay an ISP for higher priority over competing traffic. I discussed the differences at length. I also talked about prioritization and paid prioritization more generally: it’s not just about what a router does, but about who has access to what infrastructure. The bills address “prioritization” merely as a packet scheduling exercise—a router services one queue of packets at a faster rate than another queue. But there are plenty of other ways that some content can be made to “go faster” than other content; one example is deployment across a so-called Content Delivery Network (CDN)—a distributed network of content caches close to users. Some application or content providers may enjoy an unfair advantage (“priority”) over others merely by virtue of the infrastructure they have access to. Neither the repealed FCC rules nor the state bills say anything about this type of prioritization, which could be applied in anti-competitive ways. Finally, I talked about how prioritization is a bit of a red herring as long as there is spare capacity. Again, in an undergraduate networking course, we talk about resource allocation concepts such as max-min fairness, where every sender gets the capacity it requires as long as capacity exceeds total demand. Thus, it is also important to ensure that ISPs and application providers continue to add capacity, both in their networks and at the interconnects between their networks.
  • Transparency is important for consumers, but figuring out exactly what ISPs should expose, in a way that’s meaningful to consumers and not unduly burdensome, is technically challenging. Consumers have a right to know about the service that they are purchasing from their ISP, as well as whether (and how well) that service can support different applications. Disclosure of network management practices and performance certainly makes good sense on the surface, but here the devil is in the details. An ISP could be very specific in its disclosure. It could say, for example, that it has deployed a token bucket filter of a certain size, fill rate, and drain rate, and detail the places in its network where such mechanisms are deployed (a toy sketch of such a filter follows this list). This would constitute a disclosure of a network management practice, but it would be meaningless for consumers. On the other hand, other disclosures might be so vague as to be meaningless; a statement from the ISP that it might throttle certain types of high-volume traffic at times of high demand would not help a consumer figure out how particular applications will perform. In this sense, paragraph 226 of the Restoring Internet Freedom Order, which talks about consumers’ need to understand how the network is delivering service for the applications they care about, is spot on. There’s only one problem with that provision: technically, ISPs would have a hard time doing this without direct access to the client or server side of an application. In short: transparency is challenging. To be continued.
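
For the curious, here is the toy token-bucket sketch promised above. It is a minimal, illustrative policer with hypothetical parameter values, not any real ISP’s configuration; it shows that a precise disclosure of “size, fill rate, and drain rate” boils down to a few lines of arithmetic that say little to an ordinary consumer about how their applications will perform.

```python
# A toy token-bucket traffic policer: a minimal sketch with hypothetical
# parameters, not any real ISP's configuration.
import time

class TokenBucket:
    def __init__(self, fill_rate_bytes_per_sec: float, bucket_size_bytes: float):
        self.fill_rate = fill_rate_bytes_per_sec  # how fast tokens accumulate
        self.capacity = bucket_size_bytes         # maximum burst size
        self.tokens = bucket_size_bytes
        self.last_update = time.monotonic()

    def allow(self, packet_bytes: int) -> bool:
        """Return True if the packet conforms (enough tokens have accumulated)."""
        now = time.monotonic()
        elapsed = now - self.last_update
        self.tokens = min(self.capacity, self.tokens + elapsed * self.fill_rate)
        self.last_update = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False  # non-conforming packets could be dropped or throttled

# Example: police traffic to roughly 1 Mbit/s with a 64 KB burst allowance.
bucket = TokenBucket(fill_rate_bytes_per_sec=125_000, bucket_size_bytes=64_000)
print(bucket.allow(1_500))  # a single 1500-byte packet conforms
```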

The Hearing and Vote

The hearing itself was interesting. Several witnesses opposed the bills, including Jon Leibowitz of Davis Polk (retained by Internet service providers) and a representative from US Telecom. The arguments against the bills were primarily legal and business-oriented. Essentially, the legal argument against the bills is that the states should leave this problem to the federal government. The arguments are (roughly) as follows: (1) the Restoring Internet Freedom Order pre-empts state-level net neutrality rules; (2) the Federal Trade Commission has this well in hand now that ISPs are back in Title I territory (and as a former commissioner, Leibowitz would know well the types of authority the FTC has to bring such cases, as well as the many cases it has brought against Google, Facebook, and others); (3) the state laws will create a patchwork of laws and introduce regulatory uncertainty, making it difficult for the ISPs to operate efficiently and creating uncertainty for future investment.

The arguments in opposition to the bills are orthogonal to the points I made in my own testimony. In particular, I disclaimed any legal expertise on pre-emption. I was, however, able to comment on whether the second and third arguments held water from a technical perspective. While the second point about FTC authority is mostly a legal question, I understood enough about the FTC Act, and the circumstances under which the FTC brings cases, to comment on whether the bills in question technically give consumers more power than they would otherwise have with just the FTC rules in place. My view was that they do, although this point is a really interesting case of the muddy distinction between technology and the law: to really dive into arguments around this point, it helps to know a bit about both. I was also able to comment on the “patchwork” assertion from a technical perspective, as I discussed above.

At the end of the hearing, there was a committee vote on all three bills. It was interesting to see both the voting process, and the commentary that each committee member made with their votes.  In the end, there were two abstentions, with the rest in favor. The members who abstained did so largely on the legal question concerning state pre-emption—perhaps foreshadowing the next round of legal battles.

Lessons Learned

Through this experience, I once again saw the value in having technologists at the table in these forums, where the laws that govern the future of the Internet are being written and decided on. I learned a couple of important lessons, which I’ve briefly summarized below.

My job was to bring technical clarity, not to advocate policy. As a witness, technically I am picking a side. And, in these settings, even when making technical points, one is typically doing so to serve one side of a policy or legal argument. Naturally, given my arguments, I registered as a witness in favor of the legislation.

However, and importantly: that doesn’t mean my job was to advocate policy. As a technologist, my role as a witness was to explain to lawmakers technical concepts that could help them make better sense of the various arguments from others in the room. Additionally, I steered clear of rendering legal opinions, and where my comments did rely on legal frameworks, I made it clear that I was not an expert in those matters, but was speaking on technical points within the context of the laws as I understood them. Finally, when figuring out how to frame my testimony, I consulted many people: the lawmakers, my colleagues at Princeton, and even the ISPs themselves. In all cases, I asked these stakeholders about the topics I might focus on, as opposed to asking what, specifically, I should say. I thought hard about what a computer scientist could bring to the discussion, and about ensuring that what I said was technically accurate and correct.

A simple technical explanation is of utmost importance. In such a committee hearing, advocates and lobbyists abound (on both sides); technologists are rare. I suspect I was the only technologist in the room. Additionally, most of the people in the room have jobs that involve making arguments that serve a particular stakeholder. In doing so, they may muddy the waters, either accidentally or intentionally. To advance their arguments, some people may even say things that are blatantly false (thankfully that didn’t happen on Monday, but I’ve seen it happen in similar forums). Perhaps surprisingly, such discourse can fly by completely unnoticed, because the people in the room—especially the decision-makers—don’t have as deep an understanding of the technology as the technologists. Technologists need to be in the room, to shed light and to call foul—and, importantly, to do so using accessible language and examples that non-technical policy-makers can understand.