March 24, 2017

Archives for April 2011

Tracking Your Every Move: iPhone Retains Extensive Location History

Today, Pete Warden and Alasdair Allan revealed that Apple’s iPhone maintains an apparently indefinite log of its location history. To show the data available, they produced and demoed an application called iPhone Tracker for plotting these locations on a map. The application allows you to replay your movements, displaying your precise location at any point in time when you had your phone. Their open-source application works with the GSM (AT&T) version of the iPhone, but I added changes to their code that allow it to work with the CDMA (Verizon) version of the phone as well.

When you sync your iPhone with your computer, iTunes automatically creates a complete backup of the phone to your machine. This backup contains any new content, contacts, and applications that were modified or downloaded since your last sync. Beginning with iOS 4, this backup also included is a SQLite database containing tables named ‘CellLocation’, ‘CdmaCellLocaton’ and ‘WifiLocation’. These correspond to the GSM, CDMA and WiFi variants of location information. Each of these tables contains latitude and longitude data along with timestamps. These tables also contain additional fields that appear largely unused on the CDMA iPhone that I used for testing — including altitude, speed, confidence, “HorizontalAccuracy,” and “VerticalAccuracy.”

Interestingly, the WifiLocation table contains the MAC address of each WiFi network node you have connected to, along with an estimated latitude/longitude. The WifiLocation table in our two-month old CDMA iPhone contains over 53,000 distinct MAC addresses, suggesting that this data is stored not just for networks your device connects to but for every network your phone was aware of (i.e. the network at the Starbucks you walked by — but didn’t connect to).

Location information persists across devices, including upgrades from the iPhone 3GS to iPhone 4, which appears to be a function of the migration process. It is important to note that you must have physical access to the synced machine (i.e. your laptop) in order to access the synced location logs. Malicious code running on the iPhone presumably could also access this data.

Not only was it unclear that the iPhone is storing this data, but the rationale behind storing it remains a mystery. To the best of my knowledge, Apple has not disclosed that this type or quantity of information is being stored. Although Apple does not appear to be currently using this information, we’re curious about the rationale for storing it. In theory, Apple could combine WiFi MAC addresses and GPS locations, creating a highly accurate geolocation service.

The exact implications for mobile security (along with forensics and law enforcement) will be important to watch. What is most surprising is that this granularity of information is being stored at such a large scale on such a mainstream device.

Oak Ridge, spear phishing, and i-voting

Oak Ridge National Labs (one of the US national energy labs, along with Sandia, Livermore, Los Alamos, etc) had a bunch of people fall for a spear phishing attack (see articles in Computerworld and many other descriptions). For those not familiar with the term, spear phishing is sending targeted emails at specific recipients, designed to have them do an action (e.g., click on a link) that will install some form of software (e.g., to allow stealing information from their computers). This is distinct from spam, where the goal is primarily to get you to purchase pharmaceuticals, or maybe install software, but in any case is widespread and not targeted at particular victims. Spear phishing is the same technique used in the Google Aurora (and related) cases last year, the RSA case earlier this year, Epsilon a few weeks ago, and doubtless many others that we haven’t heard about. Targets of spear phishing might be particular people within an organization (e.g., executives, or people on a particular project).

In this posting, I’m going to connect this attack to Internet voting (i-voting), by which I mean casting a ballot from the comfort of your home using your personal computer (i.e., not a dedicated machine in a precinct or government office). My contention is that in addition to all the other risks of i-voting, one of the problems is that people will click links targeted at them by political parties, and will try to cast their vote on fake web sites. The scenario is that operatives of the Orange party send messages to voters who belong to the Purple party claiming to be from the Purple party’s candidate for president and giving a link to a look-alike web site for i-voting, encouraging voters to cast their votes early. The goal of the Orange party is to either prevent Purple voters from voting at all, or to convince them that their vote has been cast and then use their credentials (i.e., username and password) to have software cast their vote for Orange candidates, without the voter ever knowing.

The percentage of users who fall prey to targeted attacks has been a subject of some controversy. While the percentage of users who click on spam emails has fallen significantly over the years as more people are aware of them (and as spam filtering has improved and mail programs have improved to no longer fetch images by default), spear phishing attacks have been assumed to be more effective. The result from Oak Ridge is one of the most significant pieces of hard data in that regard.

According to an article in The Register, of the 530 Oak Ridge employees who received the spear phishing email, 57 fell for the attack by clicking on a link (which silently installed software in their computers using to a security vulnerability in Internet Explorer which was patched earlier this week – but presumably the patch wasn’t installed yet on their computers). Oak Ridge employees are likely to be well-educated scientists (but not necessarily computer scientists) – and hence not representative of the population as a whole. The fact that this was a spear phishing attack means that it was probably targeted at people with access to sensitive information, whether administrative staff, senior scientists, or executives (but probably not the person running the cafeteria, for example). Whether the level of education and access to sensitive information makes them more or less likely to click on links is something for social scientists to assess – I’m going to take it as a data point and assume a range of 5% to 20% of victims will click on a link in a spear phishing attack (i.e., that it’s not off by more than a factor of two).

So as a working hypothesis based on this actual result, I propose that a spear phishing attack designed to draw voters to a fake web site to cast their votes will succeed with 5-20% of the targeted voters. With UOCAVA (military and overseas voters) representing around 5% of the electorate, I propose that a target of impacting 0.25% to 1% of the votes is not an unreasonable assumption. Now if we presume that the race is close and half of them would have voted for the “preferred” candidate anyway, this allows a spear phishing attack to capture an additional 0.12% to 0.50% of the vote.

If i-voting were to become more widespread – for example, to be available to any absentee voter – then these numbers double, because absentee voters are typically 10% of all voters. If i-voting becomes available to all voters, then we can guess that 5% to 20% of ALL votes can be coerced this way. At that point, we might as well give up elections, and go to coin tossing.

Considering the vast sums spent on advertising to influence voters, even for the very limited UOCAVA population, spear phishing seems like a very worthwhile investment for a candidate in a close race.

What We Lose if We Lose Data.gov

In its latest 2011 budget proposal, Congress makes deep cuts to the Electronic Government Fund. This fund supports the continued development and upkeep of several key open government websites, including Data.gov, USASpending.gov and the IT Dashboard. An earlier proposal would have cut the funding from $34 million to $2 million this year, although the current proposal would allocate $17 million to the fund.

Reports say that major cuts to the e-government fund would force OMB to shut down these transparency sites. This would strike a significant blow to the open government movement, and I think it’s important to emphasize exactly why shuttering a site like Data.gov would be so detrimental to transparency.

On its face, Data.gov is a useful catalog. It helps people find the datasets that government has made available to the public. But the catalog is really a convenience that doesn’t necessarily need to be provided by the government itself. Since the vast majority of datasets are hosted on individual agency servers—not directly by Data.gov—private developers could potentially replicate the catalog with only a small amount of effort. So even if Data.gov goes offline, nearly all of the data still exist online, and a private developer could go rebuild a version of the catalog, maybe with even better features and interfaces.

But Data.gov also plays a crucial behind the scenes role, setting standards for open data and helping individual departments and agencies live up to those standards. Data.gov establishes a standard, cross-agency process for publishing raw datasets. The program gives agencies clear guidance on the mechanics and requirements for releasing each new dataset online.

There’s a Data.gov manual that formally documents and teaches this process. Each agency has a lead Data.gov point-of-contact, who’s responsible for identifying publishable datasets and for ensuring that when data is published, it meets information quality guidelines. Each dataset needs to be published with a well-defined set of common metadata fields, so that it can be organized and searched. Moreover, thanks to Data.gov, all the data is funneled through at least five stages of intermediate review—including national security and privacy reviews—before final approval and publication. That process isn’t quick, but it does help ensure that key goals are satisfied.

When agency staff have data they want to publish, they use a special part of the Data.gov website, which outside users never see, called the Data Management System (DMS). This back-end administrative interface allows agency points-of-contact to efficiently coordinate publishing activities agency-wide, and it gives individual data stewards a way to easily upload, view and maintain their own datasets.

My main concern is that this invaluable but underappreciated infrastructure will be lost when IT systems are de-funded. The individual roles and responsibilities, the informal norms and pressures, and perhaps even the tacit authority to put new datasets online would likely also disappear. The loss of structure would probably mean that sharply reduced amounts of data will be put online in the future. The datasets that do get published in an ad hoc way would likely lack the uniformity and quality that the current process creates.

Releasing a new dataset online is already a difficult task for many agencies. While the current standards and processes may be far from perfect, Data.gov provides agencies with a firm footing on which they can base their transparency efforts. I don’t know how much funding is necessary to maintain these critical back-end processes, but whatever Congress decides, it should budget sufficient funds—and direct that they be used—to preserve these critically important tools.