Side-Channel Leaks in Web Applications

March 23, 2010 by Ed Felten

Popular online applications may leak your private data to a network eavesdropper, even if you’re using secure web connections, according to a new paper by Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang. (Chen is at Microsoft Research; the others are at Indiana.) It’s a sobering result — yet another illustration of how much information can be leaked by ordinary web technologies. It’s also really clever.

Here’s the background: Secure web connections encrypt traffic so that only your browser and the web server you’re visiting can see the contents of your communication. Although a network eavesdropper can’t understand the requests your browser sends, nor the replies from the server, it has long been known that an eavesdropper can see the size of the request and reply messages, and that these sizes sometimes leak information about which page you’re viewing, if the request size (i.e., the size of the URL) or the reply size (i.e., the size of the HTML page you’re viewing) is distinctive.

The new paper shows that this inference-from-size problem gets much, much worse when pages are using the now-standard AJAX programming methods, in which a web “page” is really a computer program that makes frequent requests to the server for information. With more requests to the server, there are many more opportunities for an eavesdropper to make inferences about what you’re doing — to the point that common applications leak a great deal of private information.

Consider a search engine that autocompletes search queries: when you start to type a query, the search engine gives you a list of suggested queries that start with whatever characters you have typed so far. When you type the first letter of your search query, the search engine page will send that character to the server, and the server will send back a list of suggested completions. Unfortunately, the size of that suggested completion list will depend on which character you typed, so an eavesdropper can use the size of the encrypted response to deduce which letter you typed. When you type the second letter of your query, another request will go to the server, and another encrypted reply will come back, which will again have a distinctive size, allowing the eavesdropper (who already knows the first character you typed) to deduce the second character; and so on. In the end the eavesdropper will know exactly which search query you typed. This attack worked against the Google, Yahoo, and Microsoft Bing search engines.
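
To make the keystroke-by-keystroke inference concrete, here is a minimal Python sketch. It is not the paper's code. The function `toy_size_of` is a made-up stand-in for the response sizes an eavesdropper would first measure by sending each prefix to the search engine themselves, and the names `build_size_table` and `infer_query` are hypothetical. The sketch assumes one request per keystroke and lowercase letters only.

```python
from typing import Callable, Dict, List, Optional

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def build_size_table(prefix: str, size_of: Callable[[str], int]) -> Dict[int, List[str]]:
    """Map each observable response size to the next letters that produce it."""
    table: Dict[int, List[str]] = {}
    for letter in ALPHABET:
        table.setdefault(size_of(prefix + letter), []).append(letter)
    return table


def infer_query(observed_sizes: List[int], size_of: Callable[[str], int]) -> Optional[str]:
    """Recover the typed query one keystroke at a time from response sizes.

    Returns None as soon as a size is unknown or matches more than one letter.
    """
    prefix = ""
    for size in observed_sizes:
        candidates = build_size_table(prefix, size_of).get(size, [])
        if len(candidates) != 1:
            return None
        prefix += candidates[0]
    return prefix


def toy_size_of(prefix: str) -> int:
    """Hypothetical size model (an assumption, not measured data).

    Each letter adds a position-weighted amount, so every next letter
    yields a different size for a given prefix.
    """
    return 500 + sum((i + 1) * (ord(c) - ord("a") + 1) for i, c in enumerate(prefix))


# The eavesdropper sees only the sizes of the three encrypted replies.
typed = "tax"
sizes_on_the_wire = [toy_size_of(typed[:i]) for i in range(1, len(typed) + 1)]
recovered = infer_query(sizes_on_the_wire, toy_size_of)
```

In this toy model every size is unique, so the query is always recovered. Against a real service some sizes would collide, and an attacker would have to keep several candidate prefixes alive instead of giving up.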

Many web apps that handle sensitive information seem to be susceptible to similar attacks. The researchers studied a major online tax preparation site (which they don’t name) and found that it leaks a fairly accurate estimate of your Adjusted Gross Income (AGI). This happens because the exact set of questions you have to answer, and the exact data tables used in tax preparation, will vary based on your AGI. To give one example, there is a particular interaction, relating to a possible student loan interest calculation, that happens only if your AGI is between $115,000 and $145,000. The presence or absence of the distinctively sized message exchange relating to that calculation therefore tells an eavesdropper whether your AGI falls in that range. By assembling a set of clues like this, an eavesdropper can get a good fix on your AGI, plus information about your family status, and so on.
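
The way such clues combine can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' method: only the $115,000 to $145,000 range comes from the paper as described above. The second clue, all message sizes, and the function name `narrow_agi` are invented for the example.

```python
from typing import Iterable, List, Tuple

# (distinctive message size in bytes, lowest AGI, highest AGI that triggers it)
Clue = Tuple[int, int, int]


def narrow_agi(
    observed_sizes: Iterable[int],
    clues: List[Clue],
    step: int = 5_000,
    max_agi: int = 300_000,
) -> List[int]:
    """Return the AGI values (in `step` increments) consistent with every clue.

    A clue's message must have been seen if and only if the AGI lies in
    the clue's range, so both presence and absence rule values out.
    """
    seen = set(observed_sizes)
    candidates = []
    for agi in range(0, max_agi + 1, step):
        if all((size in seen) == (low <= agi <= high) for size, low, high in clues):
            candidates.append(agi)
    return candidates


clues = [
    (1432, 115_000, 145_000),  # range from the article; the size 1432 is made up
    (987, 0, 130_000),         # entirely hypothetical second clue
]

# The eavesdropper saw the 1432-byte exchange but never the 987-byte one.
remaining = narrow_agi([310, 1432, 2200], clues)
```

Each additional clue can only shrink the candidate set, which is why a handful of distinctive exchanges is enough to get a good fix on the value.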

For similar reasons, a major online health site leaks information about which medications you are taking, and a major investment site leaks information about your investments.

The paper goes on to consider possible mitigations. The most obvious mitigation is to add padding to messages so that their sizes are not so distinctive — for example, every message might be padded to make its size a multiple of 256 bytes. This turns out to be less effective than you might expect — significant information can still leak even if messages are generously padded — and the padded messages are slower and more expensive to transmit.
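
A small Python sketch shows why rounding sizes up to a multiple of 256 bytes helps only partially. The page names and sizes below are hypothetical, as are the function names. The point is simply that messages landing in different 256-byte buckets remain distinguishable, while every message pays for the extra bytes.

```python
from typing import Dict, List


def padded_size(size: int, block: int = 256) -> int:
    """Round a message size up to the next multiple of `block` bytes."""
    return -(-size // block) * block  # ceiling division, then scale back up


def groups_after_padding(sizes: Dict[str, int], block: int = 256) -> Dict[int, List[str]]:
    """Group messages by the size an eavesdropper sees once padding is applied."""
    groups: Dict[int, List[str]] = {}
    for name, size in sizes.items():
        groups.setdefault(padded_size(size, block), []).append(name)
    return groups


# Hypothetical unpadded response sizes for four different pages.
responses = {"page_a": 300, "page_b": 480, "page_c": 900, "page_d": 2100}

groups = groups_after_padding(responses)
overhead = sum(padded_size(s) for s in responses.values()) - sum(responses.values())
```

Here `page_a` and `page_b` become indistinguishable, but `page_c` and `page_d` still stand out, and the four responses together cost 572 extra bytes. Larger blocks hide more and cost more.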

We don’t know which sites the researchers studied, but it seems like a safe bet that most, if not all, of the sites in these product categories have similar problems. It’s important to keep these attacks in perspective — bear in mind that they can only be carried out by someone who can eavesdrop on the network between you and the site you’re visiting.

It’s becoming increasingly clear that securing web-based applications is very difficult, and that the basic tools for developing web apps don’t do much to help. The industry, and researchers, will be struggling with web app security issues for years to come.

Comments

paranoid says

March 30, 2010 at 11:38 pm

I looked at this some time back, and it had already been fixed by Yahoo! then. Bing and Google haven’t addressed it.
Claude says

March 24, 2010 at 10:26 am

This is a very, very nice paper!
However, I am really wondering to what extent the “query work leaks” attack works. I wish they had done some more tests and provided experimental results.

There are at least 2 scenarios I can think of where the attack does not work well:
(1) Google signed-in users get personalized suggestions (from their web history)…and these entries would be hard(er) to predict (personalization in this case helps privacy ;-))…
(2) If a user types quickly, the number of AJAX requests can be reduced (i.e. a request might be sent for 2-3 letters)…and this, again, will make the guessing more difficult!

If you are interested in this type of work, please have a look at the paper “Private Information Disclosure from Web Searches (the case of Google Web History)”, available at:
http://planete.inrialpes.fr/projects/private-information-disclosure-from-web-searches/
This paper shows how a user’s web history can be inferred from his web searches and more…
- Anonymous says
  
  March 24, 2010 at 11:03 pm
  
  just type the search URL directly into your address bar…
  http://www.google.com/search?q=this is my query
  …
  and you avoid the silly suggestions…
  
  the suggestion capability is cute, but needlessly so.
  - Anonymous says
    
    March 26, 2010 at 12:30 pm
    
    Scroogle doesn’t pass all this traffic and has a side benefit of both encrypting your searches and not getting you personalized by Google.
  - Anonymous says
    
    April 2, 2010 at 2:34 pm
    
    you can turn it off in your google preferences
Jon-Michael C. Brook says

March 23, 2010 at 2:48 pm

The Government performs all sorts of manipulations to avoid side channel attacks. They primarily fall into two categories. Government agencies avoid timing attacks by adding random CPU cycles or network delays, just so an adversary cannot tell they are doing things like a large amount of encryption or ordering a bunch of pizzas. Chen, Wang, Wang and Zhang’s research points to sizing attacks, where memory usage or network packet size may be used to glean a bit more information. As mentioned in the article, random memory calls and network packets, or packet padding will circumvent many of these attacks.

Most of these only work when there is a large amount of information known about the system. Proprietary systems (those built from the ground up) have the security by obscurity aspect. One of the beauties of cloud computing is how well it is defined – I see this as yet another weakness to data storage/processing in the cloud. Then again, it’s probably easier just to look up the user’s data on a public records web site.
Anonymous says

March 23, 2010 at 9:40 am

I’m very far from an expert on this stuff, but…

It has been my understanding that military encryption systems (as of several decades ago) were designed to transmit a continuous stream of (pseudo???)-random “garbage”. The result being that any eavesdropper couldn’t tell when actual traffic was flowing by merely inspecting the data stream. As you point out, simply watching the bursts of data go by does itself leak info, so making it a continuous stream of bits closes that loophole.

Unfortunately, that method is probably not particularly practical for the current internet, as it would vastly increase the amount of traffic being transported.
- felten says
  
  March 23, 2010 at 11:42 am
  
  Yes, these approaches are probably too expensive for the web setting. Padding messages up to a fixed (or quantized) length is a milder version of this approach, adding some “cover traffic” but not too much.
  - rp says
    
    March 24, 2010 at 11:28 am
    
    Isn’t padding to a quantized length going to give out way too much information? You could pad to a random set of lengths without using significantly more bandwidth. (And somehow I think the bandwidth concerns are pretty meaningless here — unless the serving organization’s pipes are saturated, even doubling the amount of material transmitted is going to be a fraction of a typical video clip or flash graphic — which many of those sites also send out.)
Jeff S. says

March 23, 2010 at 9:14 am

Fascinating analysis, interesting topic.

My contribution is needlessly pedantic and contributes nothing to the discussion, but… can we please banish the phrase “search query” from the vernacular? I shall be sending a strongly worded memo to the President of the Internet.

Sorry to take the conversation off topic right away, but this issue is exceedingly annoying.

Please resume intelligent discourse now.
- felten says
  
  March 23, 2010 at 11:27 am
  
  I’m curious: What’s wrong with the term “search query”? And what would you replace it with?
  - Jeff S. says
    
    March 24, 2010 at 10:00 am
    
    “Search” and “Query” are synonyms. Perhaps not entirely interchangeable, but for most cases, they mean the same thing.
    
    “Search query” is needlessly redundant and wordified, like having a “hamburger sandwich” or driving a “motor car.”
    
    Said another way, how is a “search query” different from other types of queries? In fact, what other types of queries are there? Do they not involve searching?
    
    Alternatives:
    
    “In the end the eavesdropper will know exactly what search you typed.”
    
    -or-
    
    “In the end the eavesdropper will know exactly what query you typed.”
    - Anonymous says
      
      March 29, 2010 at 10:59 am
      
      A Search Query in programming could be a specific query object used to pass to a search provider. var searchQuery = new SearchQuery() { QueryText = "security" };
      Meanwhile, you can have a BlogCommentQuery that is used to query for a BlogComment. var commentQuery = new BlogCommentQuery() { BlogPostId = "9" }; The SearchQuery object is querying a searchable index. The BlogCommentQuery object is querying a table in a database.
      - Anonymous says
        
        April 2, 2010 at 2:33 pm
        
        I didn’t see any code examples in the article. You could also have SrchQuery as the object in programming, and I don’t think you could argue the correct spelling of search is now srch.
- Feto says
  
  March 23, 2010 at 11:53 am
  
  I’m curious… Who is The President of The Internet?
  - Jeff S. says
    
    March 24, 2010 at 9:52 am
    
    “President of the Internet” was supposed to be funny. Next time, I’ll have to use one of those punctuation smileys that all the kids are using these days.
