August 6, 2020

Can the exfiltration of personal data by web trackers be stopped?

by Günes Acar, Steven Englehardt, and Arvind Narayanan.

In a series of posts on this blog in 2017/18, we revealed how web trackers exfiltrate personal information from web pages, browser password managers, form inputs, and the Facebook Login API. Our findings resulted in many fixes and privacy improvements to browsers, websites, third parties, and privacy protection tools. However, the root causes of these privacy failures remain unchanged, because they stem from the design of the web itself. In a paper at the 2020 Privacy Enhancing Technologies Symposium, we recap our findings and propose two potential paths forward.

Root causes of failures

In the two years since our research, no fundamental fixes have been implemented that will stop third parties from exfiltrating personal information. Thus, it remains possible that similar privacy vulnerabilities have been introduced even as the specific ones we identified have mostly been fixed. Here’s why.

The web’s security model allows a website to either fully trust a third party (by including the third-party script in a first party context) or not at all (by isolating the third party resource in an iframe). Unfortunately, this model does not capture the range of trust relationships that we see on the web. Since many third parties cannot provide their services (such as analytics) if they are isolated, the first parties have no choice but to give them full privileges even though they do not trust them fully.
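To make the all-or-nothing model concrete, here is a minimal Python sketch that sorts a page's third-party embeds into the two trust levels described above: hosts included via a script tag (full privileges in the first-party context) and hosts loaded in an iframe (isolated). The names `EmbedClassifier` and `classify_embeds` and the example hosts are hypothetical, and the exact-hostname comparison is a simplification; a real scanner would likely compare registrable domains instead.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class EmbedClassifier(HTMLParser):
    """Collects third-party hosts by how the page embeds them (illustrative only)."""

    def __init__(self, first_party_host):
        super().__init__()
        self.first_party_host = first_party_host
        self.full_trust = set()  # hosts loaded via <script src=...>
        self.isolated = set()    # hosts loaded via <iframe src=...>

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if tag not in ("script", "iframe") or not src:
            return
        host = urlparse(src).hostname
        # Relative URLs have no hostname, so they belong to the first party.
        if host is None or host == self.first_party_host:
            return
        if tag == "script":
            self.full_trust.add(host)
        else:
            self.isolated.add(host)


def classify_embeds(html, first_party_host):
    """Return (fully trusted hosts, isolated hosts) for one HTML document."""
    parser = EmbedClassifier(first_party_host)
    parser.feed(html)
    return parser.full_trust, parser.isolated
```

Every host in the first set can read the whole page, including form inputs, which is why the choice between the two embedding styles matters so much.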

The model also assumes transitivity of trust. But in reality, a user may trust a website and the website may trust a third party, but the user may not trust the third party. 

Another root cause is the economic reality of the web that drives publishers to unthinkingly adopt code from third parties. Here’s a representative quote from the marketing materials of FullStory, one of the third parties we found exfiltrating personal information:

Set-up is a thing of beauty. To get started, place one small snippet of code on your site. That’s it. Forever. Seriously.

A key reason for the popularity of these third parties is that the small publishers that rely on them lack the budget to hire technical experts internally to replicate their functionality. Unfortunately, while it’s possible to effortlessly embed a third party, auditing or even understanding the privacy implications requires time and expertise.

This may also help explain the limited adoption of technical solutions such as Caja or ConScript that better capture the partial trust relationship between first and third parties. Unfortunately, such solutions require the publisher to reason carefully about the privileges and capabilities provided to each individual third party. 

Two potential paths forward

In the absence of a fundamental fix, there are two potential approaches to minimize the risk of large-scale exploitation of these vulnerabilities. One approach is to regularly scan the web. After all, our work suggests that this kind of large-scale detection is possible and that vulnerabilities will be fixed when identified.
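As a rough illustration of what one detection step in such a scan could look like, the Python sketch below assumes the scanner has typed a known "bait" value into a page and recorded the outgoing requests as (host, payload) pairs. It then looks for the bait in a few encodings inside requests to third-party hosts. The function names, the input format, and the list of encodings are assumptions made for this example, not a description of any particular tool.

```python
import base64
import hashlib
from urllib.parse import quote


def encoded_forms(bait):
    """Return the bait string in a few encodings a third party might transmit."""
    raw = bait.encode("utf-8")
    return {
        "plain": bait,
        "url-encoded": quote(bait, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(bait, requests, first_party_host):
    """Return (host, encoding) pairs for third-party requests containing the bait.

    `requests` is an iterable of (host, payload) string pairs; this format is
    an assumption of the sketch.
    """
    forms = encoded_forms(bait)
    hits = []
    for host, payload in requests:
        if host == first_party_host:
            continue  # data sent to the first party is not a third-party leak
        for name, value in forms.items():
            if value in payload:
                hits.append((host, name))
    return hits
```

A substring search like this only catches values encoded whole; layered or chunked encodings would need more work.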

The catch is that it’s not clear who will do the thankless, time-consuming, and technically complex work of running regular scans, detecting vulnerabilities and leaks, and informing thousands of affected parties. As researchers, our role is to develop the methods and tools to do so, but running them on a regular basis does not constitute publishable research. While we have undertaken significant efforts to maintain our scanning tool OpenWPM as a resource for the community and make scan data available, regular and comprehensive vulnerability detection requires far more resources than we can afford.

In fact, in addition to researchers, many other groups have missions that are aligned with the goal of large-scale web privacy investigations: makers of privacy-focused browsers and mobile apps, blocklist maintainers, advocacy groups, and investigative journalists. Some examples of efforts that we’re aware of include Mozilla’s maintenance of OpenWPM since 2018, Brave’s research on tracking protection, and DuckDuckGo’s dataset about trackers. However, so far, these efforts fall short of addressing the full range of web privacy vulnerabilities, and data exfiltration in particular. Perhaps a coalition of these groups can create the necessary momentum.

The second approach is legal, namely, regulation that incentivizes first parties to take responsibility for the privacy violations on their websites. This is in contrast to the reactive approach above which essentially shifts the cost of privacy to public-interest groups or other external entities.

Indeed, our findings represent potential violations of existing laws, notably the GDPR in the EU and sectoral laws such as HIPAA (healthcare) and FERPA (education) in the United States. However, it is unclear whether the first parties or the third parties would be liable. According to several session replay companies, they are merely data processors and not joint controllers under the GDPR. In addition, since there are a large number of first parties exposing user data and each violation is relatively small in scope, regulators have not paid much attention to these types of privacy failures. In fact, they risk effectively normalizing such privacy violations due to the lack of enforcement. Thus, either stepped-up enforcement of existing laws or new laws that establish stricter rules could shift incentives, resulting in stronger preventive measures and regular vulnerability scanning by web developers themselves. 

Either approach will be an uphill battle. Regularly scanning the web requires a source of funding, whereas stronger regulation and enforcement will raise the cost of running websites and will face opposition from third-party service providers reliant on the current lax approach. But fighting uphill battles is nothing new for privacy advocates and researchers.

Update: Here’s the video of Günes Acar’s talk about this research at PETS 2020.

Safely opening PDFs received by e-mail (or fax?!)

Many election administrators in U.S. states and counties need to receive and open PDF files from voters. Some of these administrators receive these PDFs as e-mail attachments. These may be filled-out voter registration forms, or even voted ballots from UOCAVA (overseas and military) voters. We all know that malware can lurk in e-mail attachments; how can those election officials protect themselves from being hacked?

Internet return of voted ballots is inherently insecure; that’s a separate issue and I’ll discuss it below. For now, how can one safely open a PDF attachment?

I discussed this question with Dan Guido, cybersecurity consultant and CEO of trailofbits.com. The safe way to view a PDF is inside the Chrome or Firefox browser. Printing a PDF directly from Chrome (or Firefox) to your printer is reasonably safe. The unsafe way to view a PDF is with your favorite PDF-viewer app such as Adobe Reader.

The reason is simple: Google (for Chrome) and Mozilla (for Firefox) have put enormous effort into making their PDF viewers safe, putting them inside a “sandbox” that the hackers can’t get out of — and they’ve largely succeeded.

The PDF file format has hundreds of obscure features and complex functionality that are not needed for simple documents. Chrome and Firefox don’t bother to understand the obscure features: they concentrate on getting the common features displayed safely. On the other hand, Adobe Reader does handle all the features of PDF; that’s a much larger thing to get perfectly right, and (perhaps) security is not Adobe’s highest priority.

Sometimes that means that Chrome or Firefox don’t render your document properly; but this is unlikely to be a problem for simple documents such as voter-registration forms or optical-scan ballots.

In some ways that’s a bit disappointing. I like Adobe Reader’s navigation and document-viewing facilities much more than I like the browser’s built-in PDF display. But I should be careful to use Adobe tools only for documents whose provenance I know, or that have been otherwise vetted.

If you do save your PDF to a file, and are tempted to open it later: again, you can use Chrome or Firefox to open it. (See also: PDF.js) If you want to open it in a full-featured (but less secure) tool, first use a PDF “triage tool” such as PDFid, which will scan the file and tell you if anything looks suspicious.
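To give a feel for what a triage pass looks for, here is a small Python sketch that counts a handful of PDF names commonly associated with active content (scripts, automatic actions, launching programs, embedded files). It is loosely in the spirit of keyword-counting triage and is not PDFid's implementation; the keyword list and the function names are assumptions chosen for illustration, and it does not handle obfuscated names or compressed object streams.

```python
import re

# PDF names that often indicate active content (illustrative, not exhaustive).
SUSPICIOUS_NAMES = [
    b"/JavaScript",
    b"/JS",
    b"/OpenAction",
    b"/AA",
    b"/Launch",
    b"/EmbeddedFile",
]


def triage_pdf_bytes(data):
    """Count occurrences of each suspicious name in the raw bytes of a PDF."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead keeps "/AA" from matching a longer name such as "/AAB".
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts


def looks_suspicious(data):
    """Return True if any suspicious name appears at least once."""
    return any(triage_pdf_bytes(data).values())
```

A nonzero count does not prove the file is malicious, and a zero count does not prove it is safe; it is only a hint about which files deserve the more cautious treatment.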

Is it safe to use Fax?

Many jurisdictions still permit (or require) forms and ballots to be sent to them by Fax. Is that safe?

Once upon a time, a “fax machine” was connected to a “land line” that went through the “phone network.” How safe that was in 1985 is no longer relevant today, when nobody has a “fax machine” and the “phone network” is the Internet.

Most voters, and many election administrators, use on-line fax services such as HelloFax. The voter logs in and uploads a PDF file; the fax service converts it to a fax-format bitstream and sends it into the part of the Internet called “the phone system”; the receiver logs in (perhaps to a different on-line fax service) and downloads a PDF file that has been converted from the bitstream.

This has so many points of insecurity: the sender’s online-fax service company may be more or less vulnerable to hackers (or insiders); the receiver’s online-fax service, ditto; and the fax-format bitstream is transmitted unencrypted and unauthenticated across the phone network.

In contrast, e-mail can be a lot more secure than that. If you use a major e-mail provider (such as gmail, Microsoft, or fastmail) that knows what it’s doing, and if the recipient also uses a reputable e-mail provider, then your e-mail is uploaded encrypted (and authenticated) to an SMTP server, relayed encrypted (and authenticated) to another SMTP server, and downloaded encrypted (and authenticated) to the recipient’s mail reader. The vast majority of Internet e-mail traffic is protected this way.
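One common mechanism for encrypting the hop between SMTP servers is STARTTLS, which a server advertises in its reply to the EHLO command. The Python sketch below shows how one might check for it, both from a captured EHLO reply and against a live server using the standard `smtplib` module. The function names and the sample reply are made up for illustration, and advertising STARTTLS does not by itself guarantee that a given message was delivered encrypted.

```python
import smtplib


def advertises_starttls(ehlo_reply):
    """Return True if a raw EHLO reply lists the STARTTLS extension."""
    for line in ehlo_reply.splitlines():
        # Reply lines look like "250-STARTTLS" or "250 STARTTLS".
        if not line.startswith("250"):
            continue
        keyword = line[4:].strip().split(" ")[0].upper()
        if keyword == "STARTTLS":
            return True
    return False


def server_offers_starttls(host, port=25, timeout=10):
    """Connect to a live SMTP server and ask whether it offers STARTTLS.

    Needs network access; many networks block outbound port 25.
    """
    with smtplib.SMTP(host, port, timeout=timeout) as server:
        server.ehlo()
        return server.has_extn("starttls")
```

The first function works on text alone, so it can be tried without touching the network.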

So e-mail your stuff, don’t fax it.

Is e-mail secure? Can we vote that way?

If e-mail is so much more secure than it was 30 years ago, can we safely vote by e-mail?

Unfortunately, no. Even if Internet messages (by e-mail or other protocols) are safe in transmission, the biggest security lapses are in the server computers and especially in the client’s (voter’s) computers. Hackers who can penetrate the security of those systems can change votes before they’re sent, or after they’re received (but before they’re counted).

Furthermore, e-mail is sent from the voter’s computer to the SMTP server (at Google, or Microsoft, or fastmail…) where it is decrypted and re-encrypted for sending to the receiver’s SMTP server (at Microsoft, or fastmail, or Google, …). It’s as if you mailed your absentee ballot to your landlord, who takes it out of its envelope, puts it in a fresh envelope, and mails it to an election official. Even if we trust our landlord (and I expect Google, Microsoft, and fastmail are doing a good job), should we need to trust this intermediary? The citizenry elect their government; we don’t entrust this process to a few big tech companies.

And finally, 6% of e-mail (counting messages either outbound from or inbound to gmail.com) is still unencrypted, that is, insecure. Six percent may not seem like a lot, but it’s millions of users.

Is e-mail voter-registration secure enough?

Internet return of voted ballots is not securable by any known technology. But voter registration can reasonably be done by e-mail: the voter sends in a form, perhaps a scan-to-PDF of their printed and signed registration form. The reason this can work, when it can’t work for voted ballots, is the ability to audit the individual transaction: after a few days, the voter can check the status of their registration with the election official, or the election official can contact the voter to check up. So even if there’s hacking in the client or server computer, it can be detected and corrected. With ballots, we have the secret ballot: nobody is supposed to learn how you voted. Without the ability to check and correct later (“did my ballot get counted for the person I voted for?”), internet voting is insecurable.

NJ agrees No Internet voting in July, vague about November

A formal settlement agreement has been submitted to the NJ Superior Court regarding online ballot access in the 2020 elections.

On May 4, 2020, New Jersey’s Division of Elections was caught trying to adopt vote-by-Internet by stealth, even though the law forbids it. That is, not only is Internet voting inherently insecurable, there’s a 2010 Court Order still in effect that says, “computers utilized for election-related duties shall at no time be connected to the Internet.” That’s based on the New Jersey Superior Court’s finding that “As long as computers, dedicated to handling election matters, are connected to the Internet, the safety and security of our voting systems are in jeopardy,” in the case of Gusciora v. Corzine.

Penny Venetis, attorney for the Gusciora plaintiffs, filed a motion (in early May) with the Court, to make the State abandon its plans for online voting, on the basis that receiving ballots e-mailed or uploaded on the Internet clearly violates this order.  The Court ordered the parties to reach a settlement by June 8, or report their separate positions.

The State’s initial position was that they would use Democracy Live’s “OmniBallot” online voting system, that permits the voter to choose (1) ballot download (for printing and marking at home), (2) ballot download and mark-on-home-computer (for the voter to print and physically mail), or (3) ballot upload through Democracy Live’s portal.   Democracy Live’s voting system is insecure in all sorts of unsurprising and surprising ways.  Even so, the State proposed to use this for disabled voters, overseas and military voters, and, basically, any voter who wanted to use it.   Doing so would leave New Jersey’s 2020 election extremely insecure.

Plaintiffs’ position was that (1) ballot download has several security problems and should therefore be limited to voters who absolutely need it, specifically, voters with disabilities and military/overseas voters;  (2) computerized ballot marking has even more security problems and should be limited to voters with disabilities that prevent them from hand-marking a paper ballot; (3) no votes should be transmitted over the internet; and (4) if the State is outsourcing ballot-delivery services to private companies, then those companies should not snarf and resell all sorts of personal information about voters and their browser-fingerprints.

The parties did reach a compromise settlement; in mid-June they agreed:

  1. An “electronic ballot access or delivery system” may be used only for public health purposes during the July 2020 primary and November 2020 general election, only for voters with disabilities and military/overseas voters.
  2. Unvoted ballots may be electronically delivered to those voters.
  3. Voters with disabilities may print the unvoted ballot for hand marking and return by U.S. mail or other nonelectronic means; the military or overseas voters may print the ballot for hand marking and then return it by the means specified for them in New Jersey law (N.J.S.A. 19:59).
  4. A voter with a disability who is unable to mark a ballot by hand may be given the choice to use accessible technology to indicate vote selections on the computer, then print and mail (or otherwise physically return) the paper ballot.
  5. Voters’ ballot selections (votes) are never to be transmitted over the internet.  No personal voter information (or information about the voter’s computer or browser) may be gathered, analyzed, or sold.
  6. The State will follow these rules (and write them into vendor contracts) for the July primary.  If the State contemplates using any system in November that does not satisfy these criteria, the State must notify the Plaintiffs no later than August 21st — and if they do so, the schedule is laid out for Plaintiffs and the State to file briefs in whatever lawsuit might ensue.

Although the State didn’t tell us until much later, on May 28th they put out a Request for Bids for a system satisfying our criteria; by June 7th they had already selected a vendor (Voting Works) whose product looks a lot more respectful, compared to Democracy Live, of basic election security principles and voters’ privacy (based on the bid document that Voting Works sent to the State, and on an interview with an executive at Voting Works).

Even so, during the settlement negotiations in early June the State vigorously resisted admitting that Internet voting is not permitted by New Jersey law.  That’s even though: New Jersey statutes clearly enumerate what kinds of voting systems are permissible, and Internet voting is not among them; the statutes clearly lay out the certification requirements for voting systems, and Internet voting is not certified; and the Court Order pretty clearly says that voting systems are not to be connected to the Internet.

Based on the compromise agreement, at least this time the State can’t covertly adopt Internet voting. If the State notifies Prof. Venetis on August 21 that they’re planning to use some sort of on-line voting system that does not satisfy the criteria enumerated above, then she will seek a court order to prevent any internet-based system from being used. Based on the language of the court’s 2010 order “computers utilized for election-related duties shall at no time be connected to the Internet” and the court’s 2010 opinion (quoted in the first paragraph above), the State will have an uphill battle defending an internet-based voting system.