Tor & Why You May Have Something to Hide – Stuff Yaron Finds Interesting

[Updated to include instructions on how to configure web browsers to only use Tor for some websites but not others.]

Tor is an EFF-supported open source software project that makes it difficult for anyone to figure out who a Tor user is talking to on the Internet. For example, someone using Tor can pretty effectively hide which websites they visit, where they download content from, who they are sending e-mail to, etc. As I explain below, Tor is a tool everyone should be interested in, even those who don't think they have anything to hide.

Unfortunately, Tor can be quite slow. But using proxy configuration files (pac files), it is possible to configure browsers to use Privoxy/Tor for some websites but not others. This is not a perfect solution since, as I explain below, there are some trivial ways to get around this technique, but it is better than nothing.

But I have Nothing to Hide!

I can hear the responses now: "But I have nothing to hide, why do I care about Tor?"

You see, the funny part is, you probably don't know what you have to hide. The classic example of this was communism in America during the 1930s. With the Great Depression leading many to believe that capitalism had failed, the happy communist platitudes of 'from each according to their ability, to each according to their need' suddenly didn't sound so bad. Many people went to communist events, read communist literature and even joined the Communist Party. These folks weren't anti-American; they were just looking for a system that didn't produce as much misery as what they were experiencing in the U.S. at the time.

Later on, as the economy recovered and the facts of what communism looked like in practice came out, most people left behind their dalliances with communism. They realized it was a fundamentally flawed belief system that ignored the basics of human nature and created an idealistic fantasy land whose every implementation (think the Soviet Union or China) turned into a nightmare of death on a massive scale.

After World War II ended, the American economy took off and the Soviet Union became America's #1 enemy. This period heralded the "Red Scare". Any person who had so much as touched a communist pamphlet could now find themselves accused of being a communist sympathizer and thus facing the prospect of either ratting on their friends or having their lives ruined. One can just imagine how many people during this period, innocent upstanding Americans who had been curious about communism in the 1920s and 1930s, lay awake at night wondering if there was any record of what they had done or if one of their 'friends', desperate to avoid the destruction of their own livelihood, would rat them out.

The reality is that we don't know what we have to hide. As the government extends systems like Carnivore and Echelon, and as laws such as the USA PATRIOT Act make it easier for the government to collect 'traffic analysis' information (such as who people call, where they send e-mail, etc.) without having to provide any real justification, we can expect efforts such as "Terrorism Information Awareness" to bring us that happy day when each of us will have a file, in either a government or a private database, that attempts to identify every website we visit (how often and what we look at), every person we send an e-mail to, who we IM with, etc. In fact, according to CNET's Declan McCullagh, both Europe and the US are now actively working on rules and laws to require that ISPs keep records of all Internet activity for all of their users. As these rules/laws get passed, justified by the usual three horsemen of terrorism, drugs and child pornography, every e-mail, chat session and website that anyone visits will be logged and recorded for anywhere from months to years.

I can only imagine how many Arab and Muslim Americans are now lying awake at night wondering which of the charities they may have innocently given money to, or mosques they may have visited, or lectures they may have attended pre-9/11 involved people or organizations that are now on the U.S. Government's official terrorist organization watch list. "Do you now or have you ever supported terrorist activities?"

You don't know what you have to hide, and by the time you figure it out, it will likely be too late. This is where Tor comes in. It makes it much easier to hide. The reason to use Tor isn't so much that you have something to hide. The reason to use Tor is so that when you find out you had something to hide, you can rest a little easier knowing that your secret may be protected.

How Tor Works – The 30 Second Version

Tor is based on various folks volunteering computers which act as Tor routers. Someone using the Tor software on their client is able to bounce their messages around these various routers before their message reaches its final destination (for example, a website they want to retrieve content from). The trick to all of this is that when the message is bouncing around the Tor network it is encrypted. The only router that can view the content is the one at the end, which makes the actual request. Similarly, using further encryption tricks, each router the message bounces through knows only its immediate neighbors, not the rest of the path.

Imagine a message starting at an initial router A, going to router B, followed by router C followed by the final website (or e-mail server or IM server or whatever). What happens is that the client sends an encrypted message to router A. Router A can't read the message and only knows that it is to forward the message to router B. Router B also can't read the message but it knows to forward the message to router C. Router C can read the message, decrypts it and forwards it to the final destination, the website (or e-mail server or…). The website then sends the requested web page back to Router C who encrypts the response such that only the original client can read it. Router C then sends the message to B who sends it to A who sends it to the client who decrypts it.

Notice what happened. Router A knows the identity of the client but doesn't know what message the client sent or where it is going. Router B knows it is routing something (it can't read it) between A and C but doesn't know anything else. Router C can read the message but it doesn't know who sent it. Each router only knows enough to do its job, no more.

The end result is that someone listening at any one point in the system can't figure out anything useful. So someone listening between the client and router A knows that the client is sending out requests but it has no idea what's in the requests or where they are going. Someone listening between Router C and the final website (assuming SSL isn't being used) can see the requests and responses but has no idea who made the request.

The end result is a fairly robust (but not perfect, see below) privacy.
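
The A, B, C walk-through above can be sketched in a few lines of JavaScript. This is a toy model under loud assumptions: the function names `buildOnion` and `peel` are made up for illustration, and "sealing" a layer only tags it with the router allowed to open it. There is no real cryptography here and nothing in it reflects Tor's actual code; it only shows what each router gets to learn.

```javascript
// Toy model only: a "sealed" layer is just labeled with the router that may
// open it. Real Tor uses actual encryption; nothing below is secure.

// Wrap a request in one layer per router, innermost layer for the last router.
function buildOnion(path, destination, request) {
  // Innermost layer: only the exit router (C in the text) sees the request.
  var onion = { sealedFor: path[path.length - 1], next: destination, payload: request };
  // Wrap outward: each earlier router only learns which router comes next.
  for (var i = path.length - 2; i >= 0; i--) {
    onion = { sealedFor: path[i], next: path[i + 1], payload: onion };
  }
  return onion;
}

// A router opens the layer addressed to it and learns only the next hop.
function peel(routerName, onion) {
  if (onion.sealedFor !== routerName) {
    throw new Error(routerName + " cannot open a layer sealed for " + onion.sealedFor);
  }
  return { forwardTo: onion.next, payload: onion.payload };
}

var onion = buildOnion(["A", "B", "C"], "example.com", "GET /");
var atA = peel("A", onion);        // A learns: forward to B
var atB = peel("B", atA.payload);  // B learns: forward to C
var atC = peel("C", atB.payload);  // C learns: the destination and the request
console.log(atA.forwardTo, atB.forwardTo, atC.forwardTo, atC.payload);
```

In this model router B trying to open the outermost layer throws, which mirrors the point that each router only knows enough to do its own job.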

The More People Use Tor, The More Security Tor Provides

One of the more interesting aspects of Tor is that the more people use it, the more security it provides.

The first reason is that if the only people using Tor are people who currently have something to hide, then anytime a government or other interested entity sees Tor traffic they can use it as a flag that whoever the original client is, they are probably worth investigating further. But the more people use Tor, especially people who have nothing to hide (that they know about), the less useful that flag becomes. In other words, using Tor when you have nothing to hide will make it much more useful when you do.

The second reason is more technical but equally important. Organizations that can intercept Internet traffic at multiple points can perform certain kinds of analysis that would potentially allow them to break Tor's anonymity. For an extreme example imagine a Tor network that consists of a single router and only one client. If the observer can intercept all communications leaving the client and all communications leaving the router then they know who the client is talking to since they know that all the data the router is sending must have come from that client. But as more and more people use the router it gets harder and harder to figure out which requests came from which sources. In the general case this means that the more people use Tor the harder it is to use multi-point intercept analysis to figure out anything useful.
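
The single-router thought experiment can be made concrete with a small sketch. Everything here is hypothetical: the `candidateSenders` function, the client names, the timestamps and the 100 ms window are invented for illustration, and a real observer would use far more sophisticated statistics. The point is only that the set of plausible senders grows as more clients share the router.

```javascript
// Hypothetical model: the observer knows when each client sent data into the
// router and when the router sent a request out, and suspects every client
// that sent within windowMs before the outgoing request.
function candidateSenders(clientSendTimes, routerSendTime, windowMs) {
  return Object.keys(clientSendTimes).filter(function (client) {
    var delay = routerSendTime - clientSendTimes[client];
    return delay >= 0 && delay <= windowMs;
  }).sort();
}

// One client: the observer knows exactly who made the request.
var oneUser = candidateSenders({ alice: 1000 }, 1050, 100);

// Several clients active at about the same time: three equally good suspects.
var manyUsers = candidateSenders(
  { alice: 1000, bob: 990, carol: 1020, dave: 400 }, 1050, 100);

console.log(oneUser.length + " suspect vs " + manyUsers.length + " suspects");
```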

Tor's Main Downside – Speed (and how to work around it)

When I first wrote this article in January of 2005, Tor was so slow as to be unusable. Although its speed has significantly increased, it is still quite slow and I find that keeping it on for normal browsing is just too frustrating. But there are specific websites which pose such a significant privacy risk that I'm willing to use Tor with them. At the top of my list of 'privacy threat' websites is Google.

I love using Google but as a consequence it knows a lot more about me than I want it to. So what I wanted was a way to configure my web browser so that requests to Google would go to Tor. The easiest solution I have found is a pac file.

Pac files were invented by Netscape (the original documentation is available here), but they are now supported by all major web browsers. In essence they are JavaScript files that are called by the browser to find out if a particular website should be accessed via a proxy. Below I give the pac file I use. It routes all Google requests through privoxy/tor as well as requests to privoxy's 'reserved' addresses that give one local control over privoxy's behavior:

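function FindProxyForURL(url, host) {
        // Send Google hosts and privoxy's reserved control addresses
        // through the local privoxy proxy, which forwards to Tor.
        if (shExpMatch(host, "*google.*") ||
            shExpMatch(host, "config.privoxy.org") ||
            shExpMatch(host, "p.p"))
                return "PROXY 127.0.0.1:8118";
        // Everything else bypasses the proxy.
        return "DIRECT";
}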

Just save this text to a file and call it something like tor.pac and then point your browser's proxy pac file to it.
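
Before pointing a browser at the file, the rule can be sanity-checked outside the browser. The sketch below is an assumption-laden test harness, not part of the pac file itself: `shExpMatch` is normally supplied by the browser, so a simplified stand-in is defined here (it handles only `*` and `?`), and the sample host names are arbitrary.

```javascript
// Simplified stand-in for the browser's built-in shExpMatch().
// Assumption: only the * and ? wildcards need to be supported.
function shExpMatch(str, pattern) {
  // Escape regex metacharacters, then translate the shell wildcards.
  var escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  var regex = new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
  return regex.test(str);
}

// The pac function from the article, unchanged in behavior.
function FindProxyForURL(url, host) {
  if (shExpMatch(host, "*google.*") ||
      shExpMatch(host, "config.privoxy.org") ||
      shExpMatch(host, "p.p"))
    return "PROXY 127.0.0.1:8118";
  return "DIRECT";
}

// Arbitrary sample hosts, chosen only to exercise each branch.
var sampleHosts = ["www.google.com", "google.co.uk", "p.p", "config.privoxy.org", "example.com"];
sampleHosts.forEach(function (host) {
  console.log(host + " -> " + FindProxyForURL("http://" + host + "/", host));
});
```

Run under Node or any other JavaScript shell, this prints the proxy decision for each sample host, which makes it easy to spot a pattern that matches more or less than intended.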

To configure Firefox to use a pac file under OS X, go to Firefox -> Preferences -> General -> Connection Settings… -> Automatic proxy configuration URL. Enter a URL that points to your pac file (you can use file:// to point to a local file) and click Reload.

I would avoid using Safari under OS X because its pac file support is seriously buggy.

Note, however, that this technique is very far from secure. A trivial way to defeat this approach is for a website that is on your 'proxy' list to include in its pages a picture from a different domain that it owns. The domain the picture comes from won't be on the 'proxy' list, so that request bypasses Tor and the site can find out who you are. Yes, Firefox can be configured not to load pictures from domains other than the current one and to restrict cookies, but my guess is that using scripting or other tricks it will still be possible for a determined site to use 'allied' domains to find out who you are. Therefore this technique is only useful with sites like Google that aren't malicious but, due to their nature, can collect a lot of information you may not want them to.
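
One partial mitigation is to keep an explicit list of 'allied' domains and proxy all of them. The sketch below is only an illustration of that idea: the `TOR_DOMAINS` list and the `inDomain` helper are made up for this example, the list is deliberately incomplete, and a site can always add a domain you have not heard of. It uses only basic string operations so it needs no browser-supplied helpers.

```javascript
// Illustrative, incomplete list of domains believed to belong to the same
// operator. Keeping such a list current by hand is the weak point.
var TOR_DOMAINS = ["google.com", "gstatic.com", "doubleclick.net"];

// True when host is the domain itself or one of its subdomains.
function inDomain(host, domain) {
  if (host === domain) return true;
  var suffix = "." + domain;
  return host.length > suffix.length &&
         host.substring(host.length - suffix.length) === suffix;
}

function FindProxyForURL(url, host) {
  host = host.toLowerCase();
  for (var i = 0; i < TOR_DOMAINS.length; i++) {
    if (inDomain(host, TOR_DOMAINS[i])) return "PROXY 127.0.0.1:8118";
  }
  return "DIRECT";
}

// An image host on the list now goes through the proxy as well;
// anything not on the list still goes out directly.
console.log(FindProxyForURL("http://ssl.gstatic.com/x.png", "ssl.gstatic.com"));
console.log(FindProxyForURL("http://example.org/x.png", "example.org"));
```

Unlike the `*google.*` pattern, this variant matches exact domain suffixes, so country-specific Google domains and privoxy's reserved addresses would have to be added to the list explicitly.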

Using Tor in this way is clearly sub-optimal, but I think it's better than not using it at all, which, given its current performance, would be my only other choice.

How Tor Works with a discussion of a few issues (WARNING THIS SECTION IS FOR PROTOCOL GEEKS ONLY)

Tor uses a circuit-based design where a Tor client accesses the Tor directory (implemented using Tor's own directory protocol, we don't need no stinking standards) to download a list of available Tor onion routers and their encryption keys. The Tor client then selects a subset of the available onion routers (usually 3) and creates a circuit between them. I'll spare the reader the details of exactly how a circuit gets established and just say that once established each onion router only knows about its predecessor and successor. In addition, all data sent down the circuit is wrapped in layers of encryption, one per onion router in the circuit, so that only the final onion router (called the exit router) can recover the original request. The end result is that only the initial onion router knows who sent the data and only the exit onion router knows where the data is going. Tor only supports TCP connections but it supports both IPv4 and IPv6 addresses.

To use Tor one downloads the Tor client, which is a SOCKS proxy that runs on one's machine. One then configures either the OS or individual applications to use the local Tor SOCKS proxy. Tor supports SOCKS 4, 4a and 5. This is where things get interesting. It turns out that SOCKS 4 can only accept IP addresses. This means that if one uses a SOCKS 4 compatible web browser, for example, and navigates to www.cryptome.com, the local system will make a DNS request for cryptome's IP. The result is that anyone listening on the client's local network will now know that the client PC wants to contact cryptome. SOCKS 4a fixes this flaw by modifying SOCKS 4 to accept DNS names. In that case the DNS name will be resolved by the exit onion router. The only problem is that relatively few systems support SOCKS 4a. In theory SOCKS 5 solves the issue because it natively accepts DNS names. However, the Tor folks have found that in practice most SOCKS 5 clients will actually resolve the DNS name locally anyway and pass the IP address to the SOCKS 5 proxy. To be clear, this is not a flaw in the SOCKS 5 protocol; it's a flaw in how the clients are implemented. In fact, to date, the Tor folks have found exactly one SOCKS 5 client that doesn't do the wrong thing: Apple's Safari browser.
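
The difference between the three protocol versions is easiest to see in the bytes of the connect request. The sketch below builds those requests as plain arrays; the function names are made up for illustration, host names and user ids are assumed to be plain ASCII, and for SOCKS 5 it shows only the CONNECT request itself, not the method negotiation that precedes it on the wire.

```javascript
// Bytes of a string, assuming plain ASCII host names and user ids.
function asciiBytes(text) {
  var out = [];
  for (var i = 0; i < text.length; i++) out.push(text.charCodeAt(i) & 0xff);
  return out;
}

// SOCKS 4 CONNECT: the client must already have the IPv4 address, so the
// DNS lookup has happened locally before this request is sent.
function socks4Connect(ipv4, port, userId) {
  var ip = ipv4.split(".").map(Number);
  return [4, 1, port >> 8, port & 0xff].concat(ip, asciiBytes(userId), [0]);
}

// SOCKS 4a CONNECT: the address field holds the placeholder 0.0.0.x (x != 0)
// and the host name follows the user id, so the proxy does the lookup.
function socks4aConnect(hostName, port, userId) {
  return [4, 1, port >> 8, port & 0xff, 0, 0, 0, 1]
    .concat(asciiBytes(userId), [0], asciiBytes(hostName), [0]);
}

// SOCKS 5 CONNECT with address type 3 (domain name): the name is sent with
// a one-byte length prefix, again leaving resolution to the proxy.
function socks5ConnectByName(hostName, port) {
  var name = asciiBytes(hostName);
  return [5, 1, 0, 3, name.length].concat(name, [port >> 8, port & 0xff]);
}

// 192.0.2.1 is a documentation-only address standing in for a resolved IP.
console.log(socks4Connect("192.0.2.1", 80, "").length + " bytes, no host name inside");
console.log(socks4aConnect("www.cryptome.com", 80, "").length + " bytes, host name inside");
console.log(socks5ConnectByName("www.cryptome.com", 80).length + " bytes, host name inside");
```

A client that resolves the name itself and then sends a SOCKS 5 request with an IP address is valid as far as the protocol goes, which is exactly why the leak is an implementation problem rather than a protocol one.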

There is a workaround for the DNS problem, at least for web browsers (which is good since I use Camino, not Safari): use Privoxy. Privoxy is an HTTP proxy that runs locally and supports SOCKS 4a. It also throws in some other nice features like blocking banner ads and cleaning up HTTP requests to remove sensitive information like the referer (sic) field.

Fundamentally I think Tor should be able to scale. The circuit-based design is a bit nasty but it's a hack to deal with current bandwidth and processing speed limitations. Since bandwidth only increases while the size of the source routing list stays constant, I suspect bandwidth motivations to use circuit switching will quickly disappear. The circuit switching design is also nice for performance reasons, requiring less processing by the onion routers to route packets, but that too, I suspect, is just a passing phase that increasing processor capacity will make no longer compelling.

As such, my guess is that Tor will eventually become a true packet-switched network with every packet individually source routed. In other words, Tor will re-invent TCP over TCP. The only real long-term challenge I see to Tor is latency. Tor's security depends on routing packets randomly, which means intentionally sending them in sub-optimal (from a latency perspective) directions. It doesn't take a visionary to recognize that latency will increasingly be the single biggest performance issue for most Internet operations. Today latency is hidden by the relatively low bandwidths and slow processors currently in use. But over time, as bandwidth bloats to fantastic proportions (terabyte per second networks anyone?) and processors reach speeds that today we would only see in comic books, the constancy of the speed of light will increasingly stand out.
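
A back-of-the-envelope calculation shows why the speed of light matters here. The sketch assumes light in optical fiber covers roughly 200,000 km per second (about two thirds of its speed in a vacuum); the hop distances are invented purely for illustration, and the function name `minRoundTripMs` is hypothetical.

```javascript
// Light in optical fiber covers roughly 200,000 km per second,
// which is 200 km per millisecond.
var FIBER_KM_PER_MS = 200;

// Lower bound on round-trip time in milliseconds for a path made of the
// given hop distances (km), ignoring queuing, processing and encryption.
function minRoundTripMs(hopDistancesKm) {
  var totalKm = 0;
  for (var i = 0; i < hopDistancesKm.length; i++) totalKm += hopDistancesKm[i];
  return (2 * totalKm) / FIBER_KM_PER_MS;
}

// Hypothetical distances, chosen only for illustration.
var direct = minRoundTripMs([1000]);                    // client -> site
var viaTor = minRoundTripMs([8000, 9000, 7000, 6000]);  // client -> A -> B -> C -> site
console.log(direct + " ms direct, " + viaTor + " ms through three relays");
```

No amount of extra bandwidth or processor speed reduces either number; only shorter paths do.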

This will be, I suspect, Tor's greatest long-term threat. But even this threat can be dealt with. For example, by intentionally sending bogus data in various directions and hiding one's real requests amongst the bogus ones it should be possible to more optimally route one's requests. I also expect that ridiculous levels of effort will be put into caching and distributed processing technologies in order to push services and information as close as possible to users in order to minimize latency. The consequence of those distributed technologies is that it will be easier to send one's data in various random directions and still hit the service one wants since the service itself will be seeded all over the place. There are, as always, limits. Certain services will absolutely require global coordination, which means a global synchronization point, which means introducing lots of latency. But I suspect we will be amazed by our future selves' creativity in reducing such synchronization points to an absolute minimum.

In any case, my guess is that onion routing and its progeny are a workable approach which is why I keep playing with Tor.

3 thoughts on “Tor & Why You May Have Something to Hide”

Mark Nottingham says:

January 23, 2005 at 1:51 am

Sounds like the old ATT Research CROWDS project (link seems to be broken)…

Yaron says:

January 23, 2005 at 1:29 pm

The idea of an onion router is indeed an old one. This is just one project to make it practically workable. The theory is sound but the perf is just awful.

Brian says:

September 12, 2009 at 9:38 pm

If you want a simple http proxy to protect yourself, try looking here:

http://www.pxylst.info

Proxies are scanned hourly to make sure they work.