Archives for April 2011

Tracking Your Every Move: iPhone Retains Extensive Location History

Today, Pete Warden and Alasdair Allan revealed that Apple’s iPhone maintains an apparently indefinite log of its location history. To show the data available, they produced and demoed an application called iPhone Tracker for plotting these locations on a map. The application allows you to replay your movements, displaying your precise location at any point in time when you had your phone. Their open-source application works with the GSM (AT&T) version of the iPhone, but I added changes to their code that allow it to work with the CDMA (Verizon) version of the phone as well.

When you sync your iPhone with your computer, iTunes automatically creates a complete backup of the phone on your machine. This backup contains any new content, contacts, and applications that were modified or downloaded since your last sync. Beginning with iOS 4, this backup also includes a SQLite database containing tables named ‘CellLocation’, ‘CdmaCellLocation’ and ‘WifiLocation’. These correspond to the GSM, CDMA and WiFi variants of location information. Each of these tables contains latitude and longitude data along with timestamps. These tables also contain additional fields that appear largely unused on the CDMA iPhone that I used for testing — including altitude, speed, confidence, “HorizontalAccuracy,” and “VerticalAccuracy.”
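
For readers who want to poke at their own backup, here is a minimal sketch in Python (separate from the authors’ iPhone Tracker code) of how one might inspect these tables. It assumes you have already copied the SQLite database out of the backup (the path below is a placeholder), and it assumes column names such as Timestamp, Latitude, and Longitude and a 2001-based timestamp epoch; your copy of the database may differ.

```python
import sqlite3
from datetime import datetime, timedelta

# Placeholder path: inside an iTunes backup the database sits under a
# hashed filename, so copy it out and rename it first.
DB_PATH = "consolidated.db"

# Timestamps in these tables appear to count seconds from 2001-01-01
# (Apple's "absolute time"), not the Unix epoch.
APPLE_EPOCH = datetime(2001, 1, 1)

conn = sqlite3.connect(DB_PATH)

for table in ("CellLocation", "CdmaCellLocation"):
    try:
        rows = conn.execute(
            f"SELECT Timestamp, Latitude, Longitude FROM {table} "
            "ORDER BY Timestamp LIMIT 5"
        ).fetchall()
    except sqlite3.OperationalError:
        # A GSM phone has CellLocation, a CDMA phone CdmaCellLocation;
        # whichever table is absent raises an error we simply skip.
        continue
    print(f"-- {table} --")
    for ts, lat, lon in rows:
        print(APPLE_EPOCH + timedelta(seconds=ts), lat, lon)

conn.close()
```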

Interestingly, the WifiLocation table contains the MAC address of each WiFi network node you have connected to, along with an estimated latitude/longitude. The WifiLocation table in our two-month-old CDMA iPhone contains over 53,000 distinct MAC addresses, suggesting that this data is stored not just for networks your device has connected to but for every network your phone has seen (e.g., the network at the Starbucks you walked by but didn’t connect to).
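
Reproducing that distinct-MAC count takes only one query, again assuming a WifiLocation column named MAC (an assumption; adjust it to whatever your copy of the database actually uses):

```python
import sqlite3

conn = sqlite3.connect("consolidated.db")  # placeholder path, as above

# Count how many distinct WiFi access points the phone has recorded.
# "MAC" is assumed to be the column holding each access point's address.
(count,) = conn.execute(
    "SELECT COUNT(DISTINCT MAC) FROM WifiLocation"
).fetchone()
print(f"{count} distinct WiFi MAC addresses recorded")

conn.close()
```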

Location information persists across devices, including upgrades from the iPhone 3GS to iPhone 4, which appears to be a function of the migration process. It is important to note that you must have physical access to the synced machine (i.e. your laptop) in order to access the synced location logs. Malicious code running on the iPhone presumably could also access this data.

Not only was it unclear that the iPhone was storing this data, but the rationale behind storing it remains a mystery. To the best of my knowledge, Apple has not disclosed that this type or quantity of information is being stored, nor does Apple appear to be currently using it. In theory, Apple could combine the WiFi MAC addresses with GPS locations to build a highly accurate geolocation service.

The exact implications for mobile security (along with forensics and law enforcement) will be important to watch. What is most surprising is that this granularity of information is being stored at such a large scale on such a mainstream device.

Oak Ridge, spear phishing, and i-voting

Oak Ridge National Labs (one of the US national energy labs, along with Sandia, Livermore, Los Alamos, etc.) had a bunch of people fall for a spear phishing attack (see articles in Computerworld and many other descriptions). For those not familiar with the term, spear phishing is sending targeted emails to specific recipients, designed to get them to take an action (e.g., click on a link) that will install some form of software (e.g., to allow stealing information from their computers). This is distinct from spam, where the goal is primarily to get you to purchase pharmaceuticals, or maybe install software, but in any case is widespread and not targeted at particular victims. Spear phishing is the same technique used in the Google Aurora (and related) cases last year, the RSA case earlier this year, Epsilon a few weeks ago, and doubtless many others that we haven’t heard about. Targets of spear phishing might be particular people within an organization (e.g., executives, or people on a particular project).

In this posting, I’m going to connect this attack to Internet voting (i-voting), by which I mean casting a ballot from the comfort of your home using your personal computer (i.e., not a dedicated machine in a precinct or government office). My contention is that in addition to all the other risks of i-voting, one of the problems is that people will click links targeted at them by political parties, and will try to cast their vote on fake web sites. The scenario is that operatives of the Orange party send messages to voters who belong to the Purple party claiming to be from the Purple party’s candidate for president and giving a link to a look-alike web site for i-voting, encouraging voters to cast their votes early. The goal of the Orange party is to either prevent Purple voters from voting at all, or to convince them that their vote has been cast and then use their credentials (i.e., username and password) to have software cast their vote for Orange candidates, without the voter ever knowing.

The percentage of users who fall prey to targeted attacks has been a subject of some controversy. While the percentage of users who click on spam emails has fallen significantly over the years as more people are aware of them (and as spam filtering has improved and mail programs have improved to no longer fetch images by default), spear phishing attacks have been assumed to be more effective. The result from Oak Ridge is one of the most significant pieces of hard data in that regard.

According to an article in The Register, of the 530 Oak Ridge employees who received the spear phishing email, 57 fell for the attack by clicking on a link (which silently installed software on their computers by exploiting a security vulnerability in Internet Explorer that was patched earlier this week – but presumably the patch hadn’t yet been installed on their computers). Oak Ridge employees are likely to be well-educated scientists (but not necessarily computer scientists) – and hence not representative of the population as a whole. The fact that this was a spear phishing attack means that it was probably targeted at people with access to sensitive information, whether administrative staff, senior scientists, or executives (but probably not the person running the cafeteria, for example). Whether the level of education and access to sensitive information makes them more or less likely to click on links is something for social scientists to assess – I’m going to take it as a data point and assume that 5% to 20% of victims will click on a link in a spear phishing attack (i.e., that this estimate is not off by more than a factor of two).

So as a working hypothesis based on this actual result, I propose that a spear phishing attack designed to draw voters to a fake web site to cast their votes will succeed with 5-20% of the targeted voters. With UOCAVA voters (military and overseas voters) representing around 5% of the electorate, a target of impacting 0.25% to 1% of the votes is not an unreasonable assumption. Now if we presume that the race is close and half of those voters would have voted for the “preferred” candidate anyway, this allows a spear phishing attack to capture an additional 0.125% to 0.5% of the vote.

If i-voting were to become more widespread – for example, to be available to any absentee voter – then these numbers double, because absentee voters are typically 10% of all voters. If i-voting becomes available to all voters, then we can guess that 5% to 20% of ALL votes can be coerced this way. At that point, we might as well give up elections, and go to coin tossing.
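
To make the back-of-the-envelope arithmetic explicit, here is a small Python sketch that mechanizes the numbers above (the 57-of-530 click rate is from the Register article; the 5-20% click range and the electorate shares are the assumptions stated in the preceding paragraphs):

```python
# Back-of-the-envelope: what fraction of all votes could a spear phishing
# attack on i-voters capture or swing? Numbers are the assumptions from
# the text, not new data.

oak_ridge_click_rate = 57 / 530          # ~10.8%, the observed data point
click_rate_range = (0.05, 0.20)          # assumed 5% to 20% of targets click

# Share of the electorate that could vote over the Internet in each scenario.
scenarios = {
    "UOCAVA voters only":  0.05,
    "all absentee voters": 0.10,
    "all voters":          1.00,
}

print(f"Oak Ridge click rate: {oak_ridge_click_rate:.1%}")
for name, share in scenarios.items():
    captured_low = click_rate_range[0] * share
    captured_high = click_rate_range[1] * share
    # Half of the phished voters would have voted for the attacker's
    # candidate anyway, so only half of the captured votes move the margin.
    print(f"{name}: {captured_low:.2%} to {captured_high:.2%} captured, "
          f"{captured_low / 2:.3%} to {captured_high / 2:.3%} net swing")
```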

Considering the vast sums spent on advertising to influence voters, even for the very limited UOCAVA population, spear phishing seems like a very worthwhile investment for a candidate in a close race.

What We Lose if We Lose Data.gov

In its latest 2011 budget proposal, Congress makes deep cuts to the Electronic Government Fund. This fund supports the continued development and upkeep of several key open government websites, including Data.gov, USASpending.gov and the IT Dashboard. An earlier proposal would have cut the funding from $34 million to $2 million this year, although the current proposal would allocate $17 million to the fund.

Reports say that major cuts to the e-government fund would force OMB to shut down these transparency sites. This would strike a significant blow to the open government movement, and I think it’s important to emphasize exactly why shuttering a site like Data.gov would be so detrimental to transparency.

On its face, Data.gov is a useful catalog. It helps people find the datasets that government has made available to the public. But the catalog is really a convenience that doesn’t necessarily need to be provided by the government itself. Since the vast majority of datasets are hosted on individual agency servers—not directly by Data.gov—private developers could potentially replicate the catalog with only a small amount of effort. So even if Data.gov goes offline, nearly all of the data will still exist online, and a private developer could rebuild a version of the catalog, maybe with even better features and interfaces.

But Data.gov also plays a crucial behind-the-scenes role, setting standards for open data and helping individual departments and agencies live up to those standards. Data.gov establishes a standard, cross-agency process for publishing raw datasets. The program gives agencies clear guidance on the mechanics and requirements for releasing each new dataset online.

There’s a Data.gov manual that formally documents and teaches this process. Each agency has a lead Data.gov point-of-contact, who’s responsible for identifying publishable datasets and for ensuring that when data is published, it meets information quality guidelines. Each dataset needs to be published with a well-defined set of common metadata fields, so that it can be organized and searched. Moreover, thanks to Data.gov, all the data is funneled through at least five stages of intermediate review—including national security and privacy reviews—before final approval and publication. That process isn’t quick, but it does help ensure that key goals are satisfied.
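
Purely as an illustration of what a “well-defined set of common metadata fields” might look like in practice, here is a hypothetical record and a simple validation check; the field names below are invented for this sketch and are not Data.gov’s actual schema.

```python
# Purely hypothetical: a metadata record of the kind a catalog like
# Data.gov might require before publication. Field names are invented
# for this illustration and are not Data.gov's actual schema.
dataset_metadata = {
    "title": "Example Agency Widget Inspections, FY2010",
    "agency": "Example Agency",
    "description": "Results of widget inspections conducted in FY2010.",
    "keywords": ["widgets", "inspections", "safety"],
    "release_date": "2011-04-01",
    "update_frequency": "quarterly",
    "format": "CSV",
    "contact": "opendata@example.gov",
}

# A catalog can only be organized and searched if every record carries the
# same required fields, so a publishing workflow might validate them first.
REQUIRED_FIELDS = {"title", "agency", "description", "keywords", "format"}
missing = REQUIRED_FIELDS - dataset_metadata.keys()
if missing:
    raise ValueError(f"missing required metadata fields: {missing}")
```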

When agency staff have data they want to publish, they use a special part of the Data.gov website, which outside users never see, called the Data Management System (DMS). This back-end administrative interface allows agency points-of-contact to efficiently coordinate publishing activities agency-wide, and it gives individual data stewards a way to easily upload, view and maintain their own datasets.

My main concern is that this invaluable but underappreciated infrastructure will be lost when IT systems are de-funded. The individual roles and responsibilities, the informal norms and pressures, and perhaps even the tacit authority to put new datasets online would likely also disappear. The loss of structure would probably mean that sharply reduced amounts of data will be put online in the future. The datasets that do get published in an ad hoc way would likely lack the uniformity and quality that the current process creates.

Releasing a new dataset online is already a difficult task for many agencies. While the current standards and processes may be far from perfect, Data.gov provides agencies with a firm footing on which they can base their transparency efforts. I don’t know how much funding is necessary to maintain these critical back-end processes, but whatever Congress decides, it should budget sufficient funds—and direct that they be used—to preserve these critically important tools.

Federating the "big four" computer security conferences

Last year, I wrote a report about rebooting the CS publication process (Tinker post, full tech report; an abbreviated version has been accepted to appear as a Communications of the ACM viewpoint article). I talked about how we might handle four different classes of research papers (“top papers” which get in without incident, “bubble papers” which could well have been published if only there was capacity, “second tier” papers which are only of interest to limited communities, and “noncompetitive” papers that have no chance) and I suggested that we need to redesign how we handle our publication process, primarily by adopting something akin to arXiv.org on a massive scale. My essay goes into detail on the benefits and challenges of making this happen.

Of all the related ideas out there, the one I find most attractive is what the database community has done with Proceedings of the VLDB Endowment (see also, their FAQ). In short, if you want to publish a paper in VLDB, one of the top conferences in databases, you must submit your manuscript to the PVLDB. Submissions then go through a journal-like two-round reviewing process. You can submit a paper at any time and you’re promised a response within two months. Accepted papers are published immediately online and are also presented at the next VLDB conference.

I would love to extend the PVLDB idea to the field of computer security scholarship, but this is troublesome when our “big four” security conferences — ISOC NDSS, IEEE Security & Privacy (the “Oakland” conference), USENIX Security, and ACM CCS — are governed by four separate professional societies. Back in the old days (ten years ago?), NDSS and USENIX Security were the places you sent “systems” security work, while Oakland and CCS were where you sent “theoretical” security work. Today, that dichotomy doesn’t really exist any more. You pretty much just send your paper to the conference with the next deadline. Largely the same community of people serves on each program committee, and the same sorts of papers appear at every one of these conferences. (Although USENIX Security and NDSS may well still have a preference for “systems” work, the “theory” bias at Oakland and CCS is gone.)

My new idea: Imagine that we set up the “Federated Proceedings of Computer Security” (representing a federation of the four professional societies in question). It’s a virtual conference, publishing exclusively online, so it has no effective limit on the number of papers it might publish. Manuscripts could be submitted to FPCS with rolling deadlines (let’s say one every three months, just as we have now) and conference-like program committees would be assembled for each deadline. (PVLDB has continuous submissions and publications. We could do that just as well.) Like a conference PC, each committee would accept top papers rapidly, so they would be “published” with the speed of a normal conference process. The “bubble” papers that would otherwise have been rejected by our traditional conference process would now have a chance to be edited and go through a second round of review with the same reviewers. Noncompetitive papers would continue to be rejected, as always.

How would we connect FPCS back to the big four security conferences? Simple: once a paper is accepted for FPCS publication, it would appear at the next of the “big four” conferences. Initially, FPCS would operate concurrently with the regular conference submission process, but it could quickly replace it as well, just as PVLDB quickly became the exclusive mechanism for submitting a paper to VLDB.

One more idea: there’s no reason that FPCS submissions need to be guaranteed a slot in one of the big four security conferences. It’s entirely reasonable that we could increase the acceptance rate at FPCS, and have a second round of winnowing for which papers are presented at our conferences. This could either be designed as a “pull” process, where separate conference program committees pick and choose from the FPCS accepted papers, or it could be designed as a “push” process, where conferences give a number of slots to FPCS, which then decides which papers to “award” with a conference presentation. Either way, any paper that’s not immediately given a conference slot is still published, and any such paper that turns out to be a big hit can always be awarded with a conference presentation, even years after the fact.

This sort of two-tier structure has some nice benefits. Good-but-not-stellar papers get properly published, better papers get recognized as such, and the whole process operates with lower latency than our current system. Furthermore, we get many fewer papers going around the submit/reject/revise/resubmit treadmill, thus lowering the workload on successive program committees. It’s full of win.

Of course, there are many complications that would get in the way of making this happen:

  • We need a critical mass to get this off the ground. We could initially roll it out with a subset of the big four, and/or with more widely spaced deadlines, but it would be great if the whole big four bought into the idea all at once.
  • We would need to harmonize things like page length and other formatting requirements, as well as have a unified policy on single vs. double-blind submissions.
  • We would need a suitable copyright policy, perhaps adopting something like the USENIX model where authors retain their copyright while agreeing to allow FPCS (and its constituent conferences) the right to republish their work. ACM and IEEE would require arm-twisting to go along with this.
  • We would need a governance structure for FPCS. That would include a steering committee for selecting the editor/program chairs, but who watches the watchers?
  • What do we do with our journals? FPCS changes our conference process around, but doesn’t touch our journals at all. Of course, the journals could also reinvent themselves, but that’s a separate topic.

In summary, my proposed Federated Proceedings of Computer Security adapts many of the good ideas developed by the database community with their PVLDB. We could adopt it incrementally for only one of the big four conferences or we could go whole hog and try to change all four at once.

Thoughts?

Why seals can't secure elections

Over the last few weeks, I’ve described the chaotic attempts of the State of New Jersey to come up with tamper-indicating seals and a seal use protocol to secure its voting machines.

A seal use protocol can allow the seal user to gain some assurance that the sealed material has not been tampered with. But here is the critical problem with using seals in elections: Who is the seal user that needs this assurance? It is not just election officials: it is the citizenry.

Democratic elections present a uniquely difficult set of problems to be solved by a security protocol. In particular, the ballot box or voting machine contains votes that may throw the government out of office. Therefore, it’s not just the government—that is, election officials—that need evidence that no tampering has occurred, it’s the public and the candidates. The election officials (representing the government) have a conflict of interest; corrupt election officials may hire corrupt seal inspectors, or deliberately hire incompetent inspectors, or deliberately fail to train them. Even if the public officials who run the elections are not at all corrupt, the democratic process requires sufficient transparency that the public (and the losing candidates) can be convinced that the process was fair.

In the late 19th century, after widespread, pervasive, and long-lasting fraud by election officials, democracies such as Australia and the United States implemented election protocols in an attempt to solve this problem. The struggle to achieve fair elections lasted for decades and was hard-fought.

A typical 1890s solution works as follows: At the beginning of election day, in the polling place, the ballot box is opened so that representatives of all political parties can see for themselves that it is empty (and does not contain hidden compartments). Then the ballot box is closed, and voting begins. The witnesses from all parties remain near the ballot box all day, so they can see that no one opens it and no one stuffs it. The box has a mechanism that rings a bell whenever a ballot is inserted, to alert the witnesses. At the close of the polls, the ballot box is opened, and the ballots are counted in the presence of witnesses.

[Drawing of an 1890s polling place, from Elements of Civil Government by Alexander L. Peterman, 1891]

In principle, then, there is no single person or entity that needs to be trusted: the parties watch each other. And this protocol needs no seals at all!

Democratic elections pose difficult problems not just for security protocols in general, but for seal use protocols in particular. Consider the use of tamper-evident security seals in an election where a ballot box is to be protected by seals while it is transported and stored by election officials out of the sight of witnesses. A good protocol for the use of seals requires that seals be chosen with care and deliberation, and that inspectors have substantial and lengthy training on each kind of seal they are supposed to inspect. Without trained inspectors, it is all too easy for an attacker to remove and replace the seal without likelihood of detection.

Consider an audit or recount of a ballot box, days or weeks after an election. The box, which has been in the custody of election officials, reappears in the presence of witnesses from the political parties. The tamper-evident seals are inspected and removed, but by whom?

If elections are to be conducted by the same principles of transparency established over a century ago, the rationale for the selection of particular security seals must be made transparent to the public, to the candidates, and to the political parties. Witnesses from the parties and from the public must be able to receive training on detection of tampering of those particular seals. There must be (the possibility of) public debate and discussion over the effectiveness of these physical security protocols.

It is not clear that this is practical. To my knowledge, such transparency in seal use protocols has never been attempted.


Bibliographic citation for the research paper behind this whole series of posts:
Security Seals On Voting Machines: A Case Study, by Andrew W. Appel. Accepted for publication, ACM Transactions on Information and System Security (TISSEC), 2011.