June 23, 2017

Archives for February 2010

Mozilla Debates Whether to Trust Chinese CA

[Note our follow-up posts on this topic: Web Security Trust Models, and Web Certification Fail: Bad Assumptions Lead to Bad Technology]

Sometimes geeky technical details matter only to engineers. But sometimes a seemingly arcane technical decision exposes deep social or political divisions. A classic example is being debated within the Mozilla project now, as designers decide whether the Mozilla Firefox browser should trust a Chinese certification authority by default.

Here’s the technical background: When you browse to a secure website (typically at a URL starting with “https:”), your browser takes two special security precautions: it sets up a private, encrypted “channel” to the server, and it authenticates the server’s identity. The second step, authentication, is necessary because a secure channel is useless if you don’t know who is on the other end. Without authentication, you might be talking to an impostor.

Suppose you’re connecting to https://mail.google.com, to pick up your Gmail. To authenticate itself to you, the server will (1) do some fancy math to prove to you that it knows a certain encryption key, and (2) present you with a digital certificate (or “cert”) attesting that only Google knows that encryption key. The cert is created by a Certification Authority (“CA”), which asserts that it has done the necessary due diligence to establish that the designated encryption key is known only to Google Inc.

If the CA is competent and honest, then you can rely on the cert, and your connection will be secure. But a dishonest CA can trick you into talking to an impostor site, so you need to be cautious about which CAs you trust. Your browser comes preinstalled with a list of CAs whom it will trust. In principle you can change this list, but almost nobody does. So browser vendors effectively decide which CAs their users will trust.

With this background in mind, let’s unpack the Mozilla debate. What set off the debate was the addition of the China Internet Network Information Center (CNNIC) as a trusted CA in Firefox. CNNIC is not part of the Chinese government but many people assert that it would be willing to act in concert with the Chinese government.

To see why this is worrisome, let’s suppose, just for the sake of argument, that CNNIC were a puppet of the Chinese government. Then CNNIC’s status as a trusted CA would give it the technical power to let the Chinese government spy on its citizens’ “secure” web connections. If a Chinese citizen tried to make a secure connection to Gmail, their connection could be directed to an impostor Gmail site run by the Chinese government, and CNNIC could give the impostor a cert saying that the government impostor was the real Gmail site. The Chinese citizen would be fooled by the fake Gmail site (having no reason to suspect anything was wrong) and would happily enter his Gmail password into the impostor site, giving the Chinese government free run of the citizen’s email archive.

CNNIC’s defenders respond that any CA could do such a thing. If the problem is that CNNIC is too close to a government, what about the CAs already on the Firefox CA list that are governments? Isn’t CNNIC being singled out because it is Chinese? Doesn’t the country with the largest Internet population deserve at least one slot among the dozens of already trusted CAs? These are all good questions, even if they’re not the whole story.

Mozilla’s decision touches deep questions of fairness, trust, and institutional integrity that I won’t even pretend to address in this post. No single answer will be right for all users.

Part of the problem is that the underlying technical design is fragile. Any CA can certify to any user that any server owns any name, so the consequences of a misplaced trust decision are about as bad as they can be. It’s tempting to write this off as bonehead design, but in truth the available design options are all unattractive.

The Engine of Job Growth? Tracking SBA-backed Loans Through Recovery.gov

Last week at a Town Hall Meeting in New Hampshire, President Obama stated that “we’re going to start where most new jobs start—with small businesses,” and he encouraged Congress to transfer $30 billion from the Troubled Asset Relief Program to a new program called the Small Business Lending Fund. As this proposal was unveiled, the Administrator of the U.S. Small Business Administration (SBA) Karen Mills sat directly behind the President, reflecting the fact that the Administration’s proposal is a vote of confidence in the SBA and its existing loan programs.

The central role proposed for the SBA invites questions about existing SBA loans made with Recovery Act funds. These loans can be tracked through Recovery.gov, the official “user-friendly, public-facing website” that has evolved under the direction of the Recovery Accountability and Transparency Board, an agency created when the President signed into law the American Recovery and Reinvestment Act of 2009 (ARRA) on February 17, 2009.

Curious about how well Recovery.gov works, I analyzed a stimulus loan to a business in Red Lodge, Montana, where I live. First I accessed “Agency Reported” data through Recovery.gov, and then compared that information with what I could learn from field visits with the loan recipient and the community banker who made the loan.

What the drill-down map at Recovery.gov tells you: According to the map available at the official website, a local business called “Sheep Mountain Feed” received an $81,000 loan through the Small Business Administration’s (SBA) “Rural Lender Advantage.”

What the drill-down map at Recovery.gov doesn’t tell you: The official website does not specify how the loan proceeds were spent. Nor does the website explain if the $81,000 is the face value of the loan or the amount guaranteed by the SBA. For that matter, SBA’s role in making the loan is not clarified.

To learn more about these things, I called Sheep Mountain Feed and arranged a visit with the owner, a woman named Deb Padget who, before opening the store, had ranched 2,000 head of bison. I also met with the local banker who arranged the loan (the SBA relies on lenders to make the loans it guarantees), and an SBA employee based in Helena Montana. And for background I reviewed the June 8, 2009 Federal Register Notice relating to SBA’s temporary 90% guarantee (thanks to Princeton’s Fed Thread project).

Sheep Mountain Feed is a retail store catering to animal farmers and pet owners that sells animal feed, electric fencing, baby chicks, and other odds and ends such as buckets and horseshoes sold at any rural animal store. When Deb decided to buy the business in April of 2009, she had managed the retail store for three years, and she wanted to make some changes. Without abandoning the “large-animal” owners who had built the feed business, she saw an opportunity to focus more on pet owners. “Everybody in Red Lodge has a dog,” she told me. “Not everybody has a horse.”

She would need to buy pet supplies to take things in this new direction, and she would also need money to buy the business and remodel the interior of the store. This is how she spent the loan proceeds that she eventually received—buying and remodeling Sheep Mountain Feed, and purchasing inventory. However, the first bank she visited rejected her within ten minutes. At the second bank she tried out, she met with local loan officer and learned quickly that he was also from a North Dakota farming family. Here she got a warmer welcome, and was told that her timing was good: In March 2009, about one month before Deb’s visit, the SBA received $730 million in funding from the ARRA to offer increased loan guarantees and the temporary elimination of loan fees.

To get this “stimulus loan” Deb would need to submit a business plan with her loan application, but she’d never before needed a business plan and didn’t even have an executive summary. She was sent to an SBA employee in Billings for free counseling, and this employee helped Deb to prepare a business plan from scratch. (At one point, in order to develop Deb’s financial projections, the SBA contact called her own dog-groomer to find out about the going-rate for grooming sessions in Billings).

The U.S. Small Business Administration (SBA) was created in 1953 as an independent agency of the federal government to help people start and grow businesses. Even without the stimulus money, SBA’s so-called 7(a) loan program guarantees up to 85% of a qualifying loan made to a local business through a local bank. The guarantee is designed to induce local banks to lend more into the community by removing most of the risk of default. And as previously mentioned, in early 2009 the SBA received Recovery money to guarantee up to 90% of 7(a) loans. This is the kind of loan that Deb received.

In addition to subsidizing SBA’s temporary 90 percent guarantee, the Recovery Act also allowed SBA to temporarily waive certain fees that it charges. Usually the agency collects fees equal to three percent of the loan’s face value to cover delinquencies. Lenders and borrowers pay these fees. In this case, the community bank that made the loan and Deb would have had to pay $2,790 just to close the deal. We know this because the breakdown of the loan to Sheep Mountain Feed at USASpending.gov shows an “original subsidy cost” of $2,790. By studying the data at USASpending, and interviewing offline sources, it also emerged that $81,000 is the amount guaranteed by the SBA (Sheep Mountain Feed got $90,000).

The takeaway from this study is that Recovery.gov provides good data, but not always enough context (e.g. an explanation of SBA’s role) to understand the data. Yet in the absence of Recovery.gov, even learning that Sheep Mountain Feed received a government-backed loan would have been difficult, so the official website is a helpful starting point for people motivated to track stimulus money.

By disseminating information about a Montana-based loan to citizens in every state, including citizens not predisposed to support any specific local project, Recovery.gov provides the public with information about what the government is doing and invites feedback. How the government processes this feedback—and in general takes advantage of the insight of people inside and outside the Federal government—is an open question, but at least the Recovery Board is on it, and now it’s also the focus of a working group (pursuant to OMB’s December 8, 2009 Open Government Directive).

In that spirit, here are a few suggestions for making Recovery.gov more useful to people trying to track SBA-backed stimulus loans.

(1) Create web links to the SBA website where the agency explains how the standard and stimulus-enriched 7(a) loan program works (SBA itself does not make loans, but instead guarantees a portion of loans made and administered by banks);

(2) Create links to the Small Business Act (15 U.S.C. § 636, as amended), the relevant provisions of the American Recovery and Reinvestment Act of 2009 affecting the SBA, (ARRA, P. L. 111-5, §§501-502), and the provisions of the Department of Defense Appropriations Act, 2010 that extend the stimulus-enriched SBA program through the end of February 2010;

(3) Establish links from Recovery.gov to USASpending.gov, particularly targeted links showing the source of the stimulus loan information. Recovery.gov does explain that “Agency Reported Data” comes from three sources, including USAspending.gov, but there are no links from stimulus projects to USASpending.

This project was more about Recovery.gov than the SBA, but listening to President Obama urge the creation of a Small Business Lending Fund because it “will help small banks do even more of what our economy needs – and that’s ensure that small businesses are once again the engine of job growth in America,” there was the obvious question about the $90,000 loan to Sheep Mountain Feed: Would it create or retain any jobs? I put this question to Deb. She said that the loan “created” one full-time job, her job running the business. She’s also employing a dog-groomer part-time, and another part-time employee (a student) who works on weekends. Getting these facts is easier than knowing if the full $90,000 loan to Sheep Mountain Feed should be credited to the Recovery Act. Would the business have received the loan anyway, even without SBA’s extra 5% guarantee and the temporary elimination of $2,790.00 in fees? The only sure thing is that estimating the employment impact of the Recovery Act is complicated (it was the subject of a recent OMB Guidance Memorandum). That’s something everybody can agree on.

The Traceability of an Anonymous Online Comment

Yesterday, I described a simple scenario where a plaintiff, who is having difficulty identifying an alleged online defamer, could benefit from subpoenaing data held by a third party web service provider. Some third parties—like Facebook in yesterday’s example—know exactly who I am and know whenever I visit or post on other sites. But even when no third party has the whole picture, it may still be possible to identify me indirectly, by combining data from different third parties. This is possible because loading one webpage can potentially trigger dozens of nearly simultaneous web connections to various third party service providers, whose records can then be subpoenaed and correlated.

Suppose that I post an anonymous and potentially defamatory comment on a Boing Boing article, but Boing Boing for some reason is unable to supply the plaintiff with any hints about who I am—not even my IP address. The plaintiff will only know that my comment was posted publicly at “9:42am on Fri. Feb 5.” But as I mentioned yesterday, Boing Boing—like almost every other site on the web—takes advantage of a handful of useful third party web services.

For example, one of these services—for an article that happens to feature video—is an embedded streaming media service that hosts the video that the article refers to. The plaintiff could issue a subpoena to the video service and ask for information about any user that loaded that particular embedded video via Boing Boing around “9:42am on Fri. Feb 5.” There might be one user match or a few user matches, depending on the site’s traffic at the time, but for simplicity, say there is only one match—me. Because the video service tracks each user with a unique persistent cookie, the service can and probably does keep a log of all videos that I have ever loaded from their service, whether or not I actually watched them. The subpoena could give the plaintiff a copy of this log.

In perusing my video logs, the plaintiff may see that I loaded a different video, earlier that week, embedded into an article on TechCrunch. He may notice further that TechCrunch uses Google Analytics. With two more subpoenas—one to TechCrunch and one to Google—and some simple matching up of dates and times from the different logs, the plaintiff can likely rebuild a list of all the other Analytics-enabled websites that I’ve visited, since these will likely be noted in the records tied to my Analytics cookie.

The bottom line: From the moment I first load that video on Boing Boing, the plaintiff gains the power to traverse multiple silos of data, held by independent third party entities, to trace my activities and link my anonymous comment to my web browsing history. Given how heavily I use the web, my browsing history will tell the plaintiff a lot about me, and it will probably be enough to uniquely identify who I am.

But this is just one example of many potential paths that a plaintiff could take to identify me. Recall from yesterday that when I visit Boing Boing, the site quietly forwards my information to the servers of at least 17 other parties. Each one of these 17 is a potential subpoena target in the first round of discovery. The information culled from this first round—most importantly, what other websites I’ve visited and at what times—could inform a second round of subpoenas, targeted to these other now-relevant websites and third parties. From there, as you might already be able to tell, the plaintiff can repeat this data linking process and expand the circle of potentially identifying information.

A recent privacy study from Berkeley shows how far such a strategy might reach. The Berkeley researchers found that nearly all of the top 100 sites on the web contain some sort of “web bug,” another term for the hidden web connection that allows a third party to automatically track a user on the site. Some of these sites will load dozens of web bugs on each page visit, which will litter user data far and wide on third party servers. Moreover, the study found that Google Analytics—by far the most popular website statistics service—was used by more than 70% of all sites they surveyed in March 2009. Once they add other Google-run services like Doubleclick and Adsense into the calculation, this figure rises to 88% of all sites that use some Google service—an astonishingly broad and dominant ability to follow users as they browse the web. But even other smaller, but still popular, third party entities have significant reach across thousands of sites across the web.

The traceability of any given site visitor will still depend on context: the number of third party services used by the site, the popularity of each third party service across the web, the types of identifying data that these parties collect and store, whether the speaker used any online anonymity tools, and many other site-specific factors.

Despite the variability in third party tracing capabilities, the nearly simultaneous connections to a few third party services means that the results of tracing can be combined. By sleuthing through information held in third party dossiers, logs and databases, plaintiffs in John Doe lawsuits will have many more discovery options than they had ever previously imagined.