<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Stuff Yaron Finds Interesting &#187; Tech</title>
	<atom:link href="http://www.goland.org/category/technology/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.goland.org</link>
	<description>Technology, Politics, Food, Finance, etc.</description>
	<lastBuildDate>Mon, 23 Jan 2012 00:42:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Losing exceptions in C#, there has to be a better way!</title>
		<link>http://www.goland.org/losing_exception/</link>
		<comments>http://www.goland.org/losing_exception/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 00:24:36 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://www.goland.org/?p=971</guid>
		<description><![CDATA[A nasty problem I’ve been tangling with for a while now is that C# likes to eat exceptions. If one is already in an exception context and another exception gets thrown then the first exception, by default, is just lost. I explore below some ways to deal with this and honestly they all suck. Does [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">
A nasty problem I’ve been tangling with for a while now is that C# likes to eat exceptions. If one is already in an exception context and another exception gets thrown then the first exception, by default, is just lost. I explore below some ways to deal with this and honestly they all suck. Does anyone have a better idea?
</div>
<span id="more-971"></span>
<div class="Unindented">
So it all started with using. I am often in a situation where I create stateful objects, do something with them and then need to make sure they get cleaned up so I don’t run out of resources (threads, connections, memory, etc.).
</div>
<div class="Indented">
Typically the way this is handled is via using. Just wrap the object in using and one has a (sorta) guarantee that things will be cleaned up. But using has a well-known weakness: it can hide exceptions. See <a class="URL" href="http://msdn.microsoft.com/en-us/library/aa355056.aspx">here</a> for a description with examples of the two flavors of problems with using.
</div>

<div class="Indented">
The short description of the problem is that using is essentially syntactic sugar on top of try/finally. To see the issue it’s useful to understand that:
</div>
<pre class="LyX-Code">
using (SqlConnection sqlConnection = new SqlConnection(connectionString))
{
    doStuff(sqlConnection);
}
</pre>
<div class="Unindented">
Is, near as I can tell anyway, functionally identical to:
</div>
<pre class="LyX-Code">
// The resource is acquired before the try, as in the C# spec's expansion of using.
SqlConnection sqlConnection = new SqlConnection(connectionString);
try
{
    doStuff(sqlConnection);
}
finally
{
    if (sqlConnection != null)
    {
        sqlConnection.Dispose();
    }
}
</pre>
<div class="Unindented">
Now imagine that doStuff() throws an exception. Before the exception is caught the finally clause will run. Now imagine that sqlConnection.Dispose() also throws an exception. The Dispose() exception will then replace the doStuff() exception, which is simply lost. This is usually bad. In many cases whatever caused sqlConnection.Dispose() to fail will be explained by something bad that doStuff() did. So by losing the doStuff() exception we lose the reason why things went wrong in the first place.
</div>
<div class="Indented">
As far as I can tell these are the options for how to deal with this situation:

</div>
<div class="Description">
<span class="Description-entry">Lose one of the exceptions</span> I can set things up so that one of the two exceptions (either doStuff’s or Dispose’s) is simply thrown away. The easy one to get rid of is doStuff’s since I get that for ’free’ via finally. Getting rid of Dispose’s is also pretty easy, just wrap the Dispose call in a try/catch with an empty catch. It isn’t pretty but it would work. This approach sucks if both exceptions carry important data, and picking which one to get rid of is kind of random. After all, doStuff’s exception might have caused the Dispose exception, or it might not, and I won’t know if I get rid of doStuff’s exception.
</div>
<div class="Description">
<span class="Description-entry">Make one of the exceptions an inner exception</span> In essence I would have to create a new exception that tries to replicate one of the exceptions and puts the other exception in as an inner exception. But the newly created exception doesn’t carry the stack trace of the exception it replicates, which loses important information, and damned if I can tell which exception should be outer and which inner. In reality this is just abusing inner exception. The point of an inner exception is that it somehow caused the outer exception. But in this case the two exceptions might be unrelated so using inner/outer isn’t great.
</div>
<div class="Description">
<span class="Description-entry">Log one of the exceptions</span> Another option, which is really just slightly better than losing one of the exceptions, is to log something about the exception that we choose to lose. If one of the exceptions is a ’show stopper’ exception (e.g. the program is going to exit) then I only need to know about the other exception for investigative purposes. So I can just dump the hidden exception’s data into the log and let the ’killer’ exception go up the stack.
</div>
<div class="Description">
<span class="Description-entry">Use aggregate exceptions</span> This is a new feature introduced by .NET 4.0 that allows one to throw exceptions that consist of collections of exceptions. But unless the code was written from the ground up to deal with aggregate exceptions this pattern is crazy intrusive. Imagine we have handlers for both doStuff and Dispose. If they aren’t expecting an aggregate exception and pattern matching properly then the aggregate exception containing both doStuff and Dispose’s exceptions inside of it will just blow right past those handlers.
</div>

<div class="Unindented">
To be clear, all these options suck. Losing an exception offends me so I just can’t bring myself to do that. The inner exception trick is just wrong. It’s the kind of semantics that will drive other programmers crazy. &ldquo;I see this inner exception but I swear it couldn’t have caused the outer exception, WTF?!?!??&rdquo; Aggregates seem really wacky to me unless code was written from the ground up to deal with them and even then they are really painful. Here is an example of an aggregate exception just to drive home the point:
</div>
<pre class="LyX-Code">
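SqlConnection sqlConnection = new SqlConnection(connectionString);
try
{
    doStuff(sqlConnection);
}
catch (Exception e)
{
    try
    {
        sqlConnection.Close();
    }
    catch (Exception closeE)
    {
        // Both calls failed: bundle the two exceptions together.
        throw new AggregateException(new Exception[] { e, closeE });
    }
    // Only doStuff failed: rethrow its exception with the stack trace intact.
    throw;
}
// doStuff succeeded: any exception from Close() propagates on its own.
sqlConnection.Close();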
</pre>
<div class="Unindented">
Think about that code for a second. If doStuff screws up and Close doesn’t have a problem then a doStuff exception is thrown. If doStuff doesn’t throw an exception but the Close() outside the catch block throws an exception then a Close() exception is thrown. But if both doStuff and Close() throw then an aggregate exception gets thrown. Does anyone want to write the catch clauses for this stuff?!?! The only sane solution is to always throw Aggregate exceptions and then write handlers that pick through them to see what’s there. ICK. It means having to wrap all calls in all cases with a try/catch just to translate their non-aggregate exceptions into aggregate exceptions. No thanks, that’s just nuts.
</div>
<div class="Indented">
So I’m left with logging. But remember that isn’t free either. For example:
</div>
<pre class="LyX-Code">
SqlConnection sqlConnection = new SqlConnection(connectionString);
try
{
    doStuff(sqlConnection);
}
catch (Exception)
{
    try
    {
        sqlConnection.Dispose();
    }
    catch(Exception e)
    {
        log(e);
    }
    throw;
}
sqlConnection.Dispose();

</pre>
<div class="Unindented">
In most cases if I have to pick between doStuff’s exception and the Dispose exception I want doStuff’s. So this means I can’t use &ldquo;using&rdquo; (which forces me to hide doStuff’s exception if Dispose throws), and doing without it bloats my code. And I have to remember to repeat Dispose twice (once inside of the catch and once outside; how’s that for begging for bugs?). And I have to remember to wrap the Dispose inside the catch in its own try/catch and log. This sucks.
</div>
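<div class="Indented">
One way to at least stop repeating the pattern is to write it once in a helper. What follows is only a sketch of that idea: Disposal, UseAndDispose and ExceptionLogger are names I made up for illustration (they are not part of .NET), and the logger is whatever log() happens to be in your code. It doesn’t change the tradeoff, the Dispose exception still only ends up in the log, it just confines the duplication to one place.
</div>

```csharp
using System;

// Hypothetical helper, not part of .NET. All names here are illustrative.
public delegate void ExceptionLogger(Exception e);

public static class Disposal
{
    // Runs body and always disposes the resource afterwards.
    // If body threw, a failure in Dispose is logged instead of thrown,
    // so the exception from body is the one that reaches the caller.
    public static void UseAndDispose(IDisposable resource, Action body, ExceptionLogger log)
    {
        bool bodySucceeded = false;
        try
        {
            body();
            bodySucceeded = true;
        }
        finally
        {
            if (bodySucceeded)
            {
                // Nothing is in flight, so a Dispose failure can propagate normally.
                resource.Dispose();
            }
            else
            {
                try
                {
                    resource.Dispose();
                }
                catch (Exception disposeError)
                {
                    // The exception from body is still propagating; record this one
                    // rather than letting it replace the original.
                    log(disposeError);
                }
            }
        }
    }
}
```

<div class="Indented">
A call would then look something like Disposal.UseAndDispose(sqlConnection, () => doStuff(sqlConnection), log), with Dispose written exactly once.
</div>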
<div class="Indented">
So all the choices seem to suck. I think the least sucky is logging but it’s still pretty high on the sucky scale.
</div>
<div class="Indented">
Anyone have any better ideas?
</div>]]></content:encoded>
			<wfw:commentRss>http://www.goland.org/losing_exception/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>State diagrams for Paxos made simple</title>
		<link>http://www.goland.org/state_diagrams_for_paxos_made_simple/</link>
		<comments>http://www.goland.org/state_diagrams_for_paxos_made_simple/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 02:19:51 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Internet Protocols]]></category>

		<guid isPermaLink="false">http://www.goland.org/?p=958</guid>
		<description><![CDATA[I was reading through Paxos made simple and I really wished there were state diagrams to help explicate the protocol. So I wrote them up and share them below. Please keep in mind that the diagrams just explore naive Paxos, that is single value, no distinguished proposer or distinguished learner. So this version of Paxos [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">
I was reading through <a class="URL" href="http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf">Paxos made simple</a> and I really wished there were state diagrams to help explicate the protocol. So I wrote them up and share them below. Please keep in mind that the diagrams just explore naive Paxos, that is single value, no distinguished proposer or distinguished learner. So this version of Paxos is pretty useless in practice but it completely captures the core mechanisms that make Paxos work (with the exception of how to pick a distinguished learner). Please note that this article is intended to be used by someone going through Paxos made simple. It is an adjunct, not a replacement.
</div>
<span id="more-958"></span>
<h1 class="Section">
<a class="toc" name="toc-Section-1">1</a> Modeling assumptions
</h1>
<div class="Unindented">
Following the lead of the paper, messaging is modeled as one way unicast with guaranteed reliability and with no repeats. Guaranteed reliability, btw, only means that the message will get there, it says nothing about when. 
</div>
<div class="Indented">
One way messaging also means that unless the receiver of the message explicitly decides to respond to the message there is no way for the sender to know when the message arrived or what happened as a consequence of receiving it.
</div>
<div class="Indented">
Sending a message is executed as sendMessage(address, purpose of message, arguments...). Receiving a message is executed as messageReceived(purpose of message, arguments...).
</div>
<div class="Indented">
The state diagrams below are assumed to be single threaded. In cases where an entity is waiting for messages only one message will be processed at a time and any remaining messages that have arrived will be put into an infinitely big queue to be processed one by one. 
</div>
<div class="Indented">
Finally, any time a value is assigned it is assumed to be assigned in some suitably persistent way (e.g. written to a disk, stored on tape, chiseled into a stone tablet, etc.)
</div>
<div class="Indented">
These simplifications do not alter the algorithm, they just make the state diagrams easier to read.
</div>
<h1 class="Section">
<a class="toc" name="toc-Section-2">2</a> Proposer
</h1>
<div class="Unindented">
<div class="float">
<a class="Label" name="Figure-1"> </a><div class="figure">
<img class="embedded" src="paxos_made_simple_proposer.svg" alt="figure paxos_made_simple_proposer.svg"/>
<div class="caption">
Figure 1 State diagram for a proposer
</div>

</div>

</div>

</div>
<div class="Indented">
We begin by assuming that some client somewhere instantiates a proposer. How this happens is out of scope for the algorithm definition.
</div>
<div class="Indented">
Notice that there is no failure logic in the state machine. This is because the default algorithm doesn’t send error messages when requests are rejected. So if a quorum’s worth of the proposal requests aren’t accepted then the proposer will stay in the wait state forever.
</div>
<div class="Indented">
Similarly there is no obvious way for the proposer to know if their accept request was accepted by a quorum of acceptors. First, acceptors don’t send errors for rejected requests and second they send acknowledgements of accepted requests to learners, not proposers.
</div>
<div class="Indented">
None of these issues alter the algorithm and there are plenty of ways to deal with them in practice so we follow the Paxos made simple paper and ignore them.
</div>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-2.1">2.1</a> InitializeSystem
</h2>
<div class="Unindented">
The arguments passed to the proposer are:
</div>
<div class="Description">
<span class="Description-entry">acceptorList</span> This is the list of all acceptors in the cluster. Note that in multicast based systems this could just be the multicast address but in that case there would need to be an additional argument to record how many acceptors there are in the cluster.
</div>
<div class="Description">
<span class="Description-entry">proposalNumber</span> The proposal number that the proposer is supposed to start out proposing. Note that Paxos requires that different proposers use unique proposal numbers but doesn’t specify how this is to occur. There are a number of ways to achieve this but they aren’t important for understanding the core algorithm. The key thing to understand is that the initial proposalNumber will be unique amongst all other proposals.
</div>
<div class="Description">
<span class="Description-entry">proposalValue</span> This is the value that is supposed to be proposed.
</div>
<div class="Description">
<span class="Description-entry">proposerAddress</span> This is the address to which messages to the proposer can be sent, this is needed by acceptors to send in responses to successful prepare messages.
</div>
<div class="Unindented">
The proposer also has the following local variables:
</div>
<div class="Description">
<span class="Description-entry">acceptedPrepareCount</span> This records how many acceptors have accepted the proposer’s prepare request.
</div>
<div class="Description">
<span class="Description-entry">seenAcceptedProposalNumber</span> This is the highest proposal number that was returned in a response to a prepare message seen so far.
</div>
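<div class="Indented">
To show how these variables fit together, here is a rough sketch of what the proposer does with each response to its prepare request. It is based on my reading of Paxos made simple rather than on anything the diagram spells out in text, so treat the class, the method name OnPrepareResponse and the convention that proposal number 0 means &ldquo;nothing accepted yet&rdquo; as assumptions. Messaging and persistence are left out.
</div>

```csharp
using System;

// Illustrative sketch only. Field names follow the article; everything else is assumed.
public class Proposer
{
    public int AcceptorCount;                // size of acceptorList
    public long ProposalNumber;
    public string ProposalValue;
    public int AcceptedPrepareCount;
    public long SeenAcceptedProposalNumber;  // 0 means no acceptor has reported an accepted proposal

    // Called once per response to the prepare request.
    // acceptedNumber is 0 when the responding acceptor has not accepted anything.
    // Returns true once a majority of acceptors has responded, which is the point
    // at which the accept requests would be sent.
    public bool OnPrepareResponse(long acceptedNumber, string acceptedValue)
    {
        AcceptedPrepareCount++;
        if (acceptedNumber > SeenAcceptedProposalNumber)
        {
            // Adopt the value of the highest-numbered proposal reported so far.
            SeenAcceptedProposalNumber = acceptedNumber;
            ProposalValue = acceptedValue;
        }
        return AcceptedPrepareCount * 2 > AcceptorCount;
    }
}
```

<div class="Indented">
If no acceptor reports an accepted proposal then ProposalValue is never overwritten and the proposer goes ahead with the value it was created with.
</div>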
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-2.2">2.2</a> A question of identity
</h2>
<div class="Unindented">
There is an assumption that the proposerAddress is absolutely unique and used only once with a single proposal. Put another way, each time a proposer is instantiated it conceptually creates a universally unique address for itself that it will never re-use for any other proposals. That way when it receives a message it knows that the message is for it. This is important because otherwise a proposer could receive a message that got ’stuck’ in the system and was meant for an old version of itself, one that was proposing something else and whose address got re-used for this proposal.
</div>
<div class="Indented">
Of course in practice one doesn’t have to get anywhere near this fancy. Just sticking the proposalNumber being responded to in the ProposalAccepted message would go a long way toward taking care of the potential naming issues. But it would also make the diagram more complex.
</div>
<div class="Indented">
It’s interesting to speculate, however, as to what would happen if two instances of a proposer existed which had the same address. This is entirely plausible, especially in a distributed cloud environment. This isn’t something the algorithm was designed for; there is an assumption that messages find their way to the ’right’ place, eventually. As long as messages don’t get duplicated (which, for protocols like UDP, is also entirely plausible) it probably isn’t a big deal, but if messages can get duplicated then the protocol could enter an ’impossible’ state where two or more acceptors have accepted the same proposalNumber but with different values. That isn’t a legal state in the protocol.
</div>
<h1 class="Section">
<a class="toc" name="toc-Section-3">3</a> Acceptor
</h1>
<div class="Unindented">
<div class="float">
<a class="Label" name="Figure-2"> </a><div class="figure">
<img class="embedded" src="paxos_made_simpler_acceptor.svg" alt="figure paxos_made_simpler_acceptor.svg"/>
<div class="caption">
Figure 2 State diagram for an acceptor
</div>

</div>

</div>

</div>
<div class="Indented">
The guard conditions at the start of AcceptPrepare and AcceptAccept should really be on the links and not in the states. But doing it ’right’ made the picture enormous so I had to push them into the states for readability.
</div>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-3.1">3.1</a> InitializeSystem
</h2>
<div class="Unindented">
When an acceptor is created it is passed in:
</div>
<div class="Description">
<span class="Description-entry">learnersList</span> Specifies the list of entities to notify when a value is accepted. 
</div>
<div class="Description">
<span class="Description-entry">acceptorAddress</span> The acceptor’s own address; this is used when notifying learners.
</div>
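<div class="Indented">
For readers who prefer code to pictures, the two guard conditions can be sketched roughly as below. The rules are the ones from Paxos made simple as I understand them: answer a prepare only if its number is higher than any prepare already answered, and accept a proposal unless a higher-numbered prepare has already been answered. The class and member names are invented for this sketch, 0 is assumed to mean &ldquo;nothing yet&rdquo;, and the messages to proposers and learners are reduced to a boolean return value.
</div>

```csharp
using System;

// Illustrative sketch only; names are assumptions, not taken from the diagram.
public class Acceptor
{
    // Every assignment below is assumed to be persisted, per the modeling assumptions.
    public long PromisedNumber;   // highest prepare number responded to, 0 if none
    public long AcceptedNumber;   // number of the accepted proposal, 0 if none
    public string AcceptedValue;  // value of the accepted proposal, null if none

    // Guard for AcceptPrepare: true means a response goes back to the proposer,
    // carrying AcceptedNumber and AcceptedValue. False means silence.
    public bool AcceptPrepare(long proposalNumber)
    {
        if (proposalNumber <= PromisedNumber)
        {
            return false;
        }
        PromisedNumber = proposalNumber;
        return true;
    }

    // Guard for AcceptAccept: true means the value was accepted and the
    // learners would be notified. False means silence.
    public bool AcceptAccept(long proposalNumber, string value)
    {
        if (proposalNumber < PromisedNumber)
        {
            return false;
        }
        AcceptedNumber = proposalNumber;
        AcceptedValue = value;
        return true;
    }
}
```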
<h1 class="Section">
<a class="toc" name="toc-Section-4">4</a> Learner
</h1>
<div class="Unindented">
<div class="float">
<a class="Label" name="Figure-3"> </a><div class="figure">
<img class="embedded" src="paxos_made_simpler_learner.svg" alt="figure paxos_made_simpler_learner.svg"/>
<div class="caption">
Figure 3 State diagram for a learner
</div>

</div>

</div>

</div>
<div class="Indented">
Other than talking about distinguished learners in the context of scalability the Paxos Made Simple paper doesn’t actually say much about learners. So I’ve taken a very conservative approach and just assumed that learners just record what they learn, no more. There is one state variable, learnedValue. It’s a dictionary where the key is an acceptor’s address and the value is whatever value the acceptor has accepted. Presumably anyone who wants to use what a learner has learned would look at the dictionary and see if any value in the dictionary is quorate.
</div>
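<div class="Indented">
The &ldquo;is any value quorate&rdquo; check could look something like the following. This is a sketch under a few assumptions of mine: the values of the learnedValue dictionary are handed in as a plain array, the caller knows how many acceptors the cluster has, and quorate means a strict majority. Like the learner described above it only looks at values, not at proposal numbers.
</div>

```csharp
using System;

// Illustrative sketch only; LearnerQuery and FindQuorateValue are made-up names.
public static class LearnerQuery
{
    // learnedValues holds one entry per acceptor heard from (the dictionary's values).
    // Returns the value reported by a strict majority of the cluster, or null if none.
    public static string FindQuorateValue(string[] learnedValues, int acceptorCount)
    {
        foreach (string candidate in learnedValues)
        {
            int votes = 0;
            foreach (string value in learnedValues)
            {
                if (value == candidate)
                {
                    votes++;
                }
            }
            if (votes * 2 > acceptorCount)
            {
                return candidate;
            }
        }
        return null;
    }
}
```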

]]></content:encoded>
			<wfw:commentRss>http://www.goland.org/state_diagrams_for_paxos_made_simple/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wrapped or Native Paxos?</title>
		<link>http://www.goland.org/wrapped_vs_native_paxo/</link>
		<comments>http://www.goland.org/wrapped_vs_native_paxo/#comments</comments>
		<pubDate>Wed, 09 Nov 2011 22:49:59 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Internet Protocols]]></category>

		<guid isPermaLink="false">http://www.goland.org/?p=953</guid>
		<description><![CDATA[So let’s say I want to build a nice highly consistent multi-data center store, something like Megastore. Most everyone at this point has something like Bigtable already deployed in their data centers. What they typically don’t have is a way to keep different instances of their table stores guaranteed consistent with each other across DCs. [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">
So let’s say I want to build a nice highly consistent multi-data center store, something like <a class="URL" href="http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf">Megastore</a>. Most everyone at this point has something like Bigtable already deployed in their data centers. What they typically don’t have is a way to keep different instances of their table stores guaranteed consistent with each other across DCs. Megastore steps in to address this issue. But this raises a fundamental question - what’s better, to wrap a Paxos coordinator on top of existing table stores or to build a new Paxos native storage service?
</div>
<span id="more-953"></span>
<h1 class="Section">
<a class="toc" name="toc-Section-1">1</a> The architectures
</h1>
<div class="Unindented">
<div class="float">
<a class="Label" name="fig:Two-possible-Megastore"> </a><div class="figure">
<img class="embedded" src="wrapped_vs_native.svg" alt="figure wrapped_vs_native.svg"/>
<div class="caption">
Figure 1 Two possible Megastore style architectures
</div>

</div>

</div>
In the first architecture, the one on the left, each DC contains a machine that is running Paxos. Think of this as a combination of the replication server and coordinator in the Megastore paper. All reads and writes go through the wrapped Paxos instance. Writes must always be relayed to the local table store instance (which runs on the existing table store infrastructure in the same DC) for persistence. Reads, however, can often be handled directly out of the Wrapped Paxos Instance via its local cache.
</div>
<div class="Indented">
In the Native Paxos Instance case there is no existing table store that is being wrapped. Instead each Paxos instance has its own local table store with its own persistent storage (read: disk) physically on the same machine. There is no call out to a completely standalone table store service.
</div>
<div class="Indented">
So really the only difference between the two architectures is that the Wrapped Paxos architecture uses the Paxos boxes as caches with a separate table service and all of its infrastructure handling persistent storage. With the Native Paxos architecture there is no separate table service, no separate naming/replication/locking infrastructure, just a local single instance Table Store.
</div>
<div class="Indented">
An example of a wrapped Paxos instance could be a VM running Paxos in Windows Azure that talks to Windows Azure Table Store or a machine in a data center running Paxos talking to a local installation of HBase or Mongo or CouchDB. An example of a Native Paxos instance could be a machine running Paxos that records all data persistently to the local machine using MySQL or single instance (e.g. one box) deployments of HBase, Mongo or CouchDB.
</div>
<h1 class="Section">
<a class="toc" name="toc-Section-2">2</a> Issues to consider in choosing an architecture
</h1>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-2.1">2.1</a> Are local machines recycled with abandon?
</h2>
<div class="Unindented">
If running in a cloud that doesn’t provide good guarantees that machines (and their local disks) won’t be recycled with abandon then one has little choice but to use a wrapped approach that leverages some existing table store provided by the cloud provider. 
</div>
<div class="Indented">
It’s tempting to argue that the recycling issue isn’t that big a deal. Imagine, for example, that one’s cloud provider has five DCs in a particular region and one has deployed one’s Native Paxos Instances across all 5. In that case the probability of losing all five machines is really low so it’s o.k. if a few get recycled here and there (with total data loss), we can just resynch from the survivors. But to think this way is to ignore the very real possibility of what’s called a &ldquo;poisoned value.&rdquo; 
</div>
<div class="Indented">
A not uncommon bug in replicated systems is that one gets a request with some value that triggers a bug that causes the local machine processing the request to fail hard and possibly get fully recycled by the underlying cloud infrastructure. The problem is that replicated systems like to retry, so the message is likely to get repeated to the next instance of the system, which then crashes unrecoverably, and so on. No amount of replication can protect against this kind of systemic error. So it’s usually considered best practice to have some kind of durable storage, just in case. But if all the crashed machines can potentially be fully recycled with their local disk state lost then there is no real durable storage. So yes, this issue probably matters.
</div>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-2.2">2.2</a> Lowering cost
</h2>
<div class="Unindented">
The Wrapped Paxos Instance can be made potentially cheaper than the Native Paxos Instance. The reason for this is that with the Native Paxos Instance the capacity of the cluster is limited to the memory and disk space on the smallest machine in the cluster. In the case of the Wrapped Paxos Instance one can represent substantially more data than would fit in a single box’s RAM or disk. And, especially for cloud based solutions, the cost of storing data in a cloud provider’s table store is typically much less than running a VM and storing the data on the VM’s disk. So money is saved both by having fewer Paxos clusters and by storing the persistent data more cheaply. This assumes, of course, that one is willing to take the latency penalty of cache misses, but that is a configurable choice.
</div>
<div class="Indented">
Note that in theory the cost savings shouldn’t be there. For example, most table stores will replicate a value three times inside of a data center. So if one has one’s store replicated across three DCs then a Wrapped Paxos approach will end up with three copies in each DC for a total of nine copies. But a Native Paxos approach would just have three copies total (one per DC). But in reality one is highly unlikely to be happy with just three copies in a distributed system (for latency reasons, if nothing else: a single reboot of a machine due to a software or OS upgrade means a DC doesn’t have any local representatives). So what’s substantially more likely is that one will have five Paxos instances (two DCs having two representatives and one DC having one, mostly for quorum reasons). So in practice the replication costs of the two solutions aren’t as different as pure theory suggests, and given the practical reality of the lower cost of storing data in a cloud provider’s table store the Wrapped Paxos approach is likely to be cheaper.
</div>
<div class="Indented">
Note however that this all assumes that it is practical to ’under provision’ the Wrapped Paxos Instances. In other words it’s o.k. that they don’t have copies in their caches of all data. It’s o.k. that the write capacity of the system is limited to the capacity of the under provisioned cluster, etc. If latency and throughput requirements prevent under provisioning the Wrapped Paxos Instances then there is likely no real cost savings.
</div>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-2.3">2.3</a> Increased availability
</h2>
<div class="Unindented">
All things being equal (and when are they ever that?) the Wrapped Paxos approach should be less available than the Native Paxos approach. The reason is that the Wrapped Paxos approach introduces a whole level of extra complexity - a fully distributed (within a single DC anyway), fully replicated table store. This is an entire major subsystem whose problems will at best just kill all writes and can easily also kill all reads (if the Wrapped Paxos Instances are under provisioned). Now one of the advantages of a multi-DC approach is that if one DC’s table store is having a bad day at least the other DCs’ table stores are hopefully still working (unless one has a poisoned value). But losing a full DC is likely to do some pretty bad things to latency and throughput, thus reducing availability.
</div>
<div class="Indented">
How critical this factor is really depends upon the maturity and performance of the table store being used. If it’s known to be highly available and highly reliable then in practice this consideration may not necessarily apply.
</div>
<h1 class="Section">
<a class="toc" name="toc-Section-3">3</a> A non-issue - lowering latency
</h1>
<div class="Unindented">
In theory the Native Paxos Instance should be faster than the Wrapped Paxos Instance, for writes in particular. After all, the Native Paxos Instances only need to write to their local disk drives while the Wrapped Paxos Instances must go through a full write to the table store service, which is almost certainly on a separate machine. In the worst case a write to the table store will require a full name resolution on the table store service to figure out what machine currently is the table store master for the desired value and then sending a write to the table store master, which then has to replicate it to its two slaves. But given the latencies involved with doing cross data center writes as part of the Paxos algorithm it isn’t clear if this extra overhead on writes really makes all that much of a difference.
</div>
<div class="Indented">
Reads may or may not be faster depending on the cost tradeoff made in deploying the Wrapped Paxos Instances. If low latency is a priority then each Wrapped Paxos Instance can have enough cache to hold all the values it is responsible for overseeing (even if they have to spill over to disk).
</div>
<div class="Indented">
So one suspects that in practice latency is not a compelling reason to choose one architecture over the other.
</div>
<h1 class="Section">
<a class="toc" name="toc-Section-4">4</a> Conclusion
</h1>
<div class="Unindented">
In the end the choice is not eternal, one should be able to switch from one architecture to the other. So one could start with a wrapped approach since the table store infrastructure might already be available and then switch to native if that should prove to have useful advantages. My general guess is that anyone who can’t under provision will probably want to run native mostly because it removes a whole layer (the table store service) of things to go wrong.
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.goland.org/wrapped_vs_native_paxo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Average, percentiles and measuring service performance</title>
		<link>http://www.goland.org/average_percentile_services/</link>
		<comments>http://www.goland.org/average_percentile_services/#comments</comments>
		<pubDate>Thu, 27 Oct 2011 21:34:40 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Internet Protocols]]></category>

		<guid isPermaLink="false">http://www.goland.org/?p=923</guid>
		<description><![CDATA[Measuring the performance of services is tricky. There is an almost irresistible desire to measure average performance. But measuring service performance using averages is pretty much guaranteed to provide misleading results. The best way (I know of anyway) to get accurate performance results when measuring service performance is to measure percentiles, not averages. So Do [...]]]></description>
			<content:encoded><![CDATA[<div class="abstract">
Measuring the performance of services is tricky. There is an almost irresistible desire to measure average performance. But measuring service performance using averages is pretty much guaranteed to provide misleading results. The best way (I know of anyway) to get accurate performance results when measuring service performance is to measure percentiles, not averages. So Do Not use averages or standard deviations, Do use percentiles. See below for the details.
</div>
<span id="more-923"></span>
<h1 class="Section">
<a class="toc" name="toc-Section-1">1</a> Averages for service performance are typically wrong 
</h1>
<div class="Unindented">
So let’s say you have your shiny new service and you want to know how it’s performing. You likely set up some kind of test bench, start firing off a bunch of requests to the service and record the latency and throughput for the requests over some period of time. But how should one present a summary of the performance data?
</div>
<div class="Indented">
In most cases what I see people do next is calculate average latency and average throughput. If they are particularly fancy they might even throw in standard deviations for both.

</div>
<div class="Indented">
Unfortunately in most cases both the average and the standard deviation don’t accurately represent the performance of the system.
</div>
<div class="Indented">
The reason for this is pretty straightforward - the average and standard deviation only give a faithful summary of a set of data when that data roughly follows a normal curve. This is the famous bell shaped curve. But service performance is almost never normal. In fact the performance distribution tends to be pretty flat for most requests and then fall off a cliff. This is not what you would call a normal distribution. There are lots of reasons for this.
</div>
<div class="Indented">
For example, most services have some kind of caching and typically caching is a pretty good technique but cache misses are expensive. So while most requests will be serviced quickly out of a cache some number of requests will cause a cache miss and will be substantially more expensive to handle. This behavior isn’t really a curve, it’s more like a step function.
</div>
<div class="Indented">
Another reason is queuing behavior. Most services can be thought of as a series of queues. So long as the load is within the queues’ capacity everything is fine, but once that capacity is exceeded timeouts, failures, etc. will start to happen and the system can’t recover until incoming requests fall enough to let it catch up. So pretty much the system just shuts down until the queues are cleared.
</div>
<div class="Indented">
Now there are ways to torture normal distributions to get behavior that is closer to what one sees in service behavior. We can start to talk about kurtosis, skew, etc. But these are all attempts to force the data to be summarized by some model and if that model isn’t appropriate to the data set then the information the model is giving is just plain wrong. And if one is going to start playing around with different types of distributions then one still has to collect the same kind of data that percentiles (discussed below) require in order to prove that the distribution is accurate. Well, if one is going to do all the work to collect percentile data then why not just use the percentile data?
</div>
<div class="Indented">

But the punch line is this - characterizing a service’s performance across many requests using averages will almost certainly produce misleading data. So please, just say no to averages.
</div>
<h1 class="Section">
<a class="toc" name="toc-Section-2">2</a> Even small numbers of customers having a bad experience costs real money
</h1>
<div class="Unindented">
O.k. o.k. so averages are wrong. But they are really easy to calculate and as long as they are ‘close enough’ aren’t we happy? This is actually something that has been studied and the answer is - no. To help frame this discussion consider the following:
</div>
<div class="Indented">
<a class="URL" href="http://robotics.stanford.edu/~ronnyk/2007GuideControlledExperiments.pdf">Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO</a> – 2007 – 100 ms delays caused a 1% drop in sales at Amazon.
</div>
<div class="Indented">
<a class="URL" href="http://www.scribd.com/doc/16877297/Performance-Related-Changes-and-their-User-Impact">Performance Related Changes and their User Impact</a> – 2009 – ½ second delays saw a loss of 1.2% of revenue/user. This went to 2.8% at 1 second and 4.3% at 2 seconds.
</div>

<div class="Indented">
<a class="URL" href="http://www.gomez.com/pdfs/wp_why_web_performance_matters.pdf">Why Web Performance Matters: Is Your Site Driving Customers Away?</a> – 2010 – Between 2 and 4 seconds 8% of users abandon the site, from 2 to 6 seconds it’s 25% and by 10 seconds it’s 38%.
</div>
<div class="Indented">
With this data in mind let’s go back to look at those averages. If the average is say a 50 ms delay shouldn’t everyone be happy? Well not if say 20% of users are seeing 100 ms plus latencies. This wouldn’t show in the average and the standard deviation is largely meaningless anyway since it’s describing probability for the wrong curve. 
</div>
<div class="Indented">
In that case, using the Amazon number, the 20% of users seeing 100 ms plus latencies mean that 20% * 1% = 0.2% of sales just walked out the door. That isn’t a healthy way to run a business. In fact major service companies measure the experience of their users up to 99.9% (as will be explored below) because bad experiences for even small numbers of users have significant financial consequences.
</div>
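<div class="Indented">
To make the arithmetic concrete, here is a minimal Python sketch. The sample is made up to match the numbers above (80 requests at 37.5 ms and 20 at 100 ms, which is my own assumption about how a 50 ms average could come about), so treat it as an illustration rather than measured data.
</div>
<pre>
```python
from statistics import mean

# Made-up sample that matches the numbers in the text: 80 requests
# at 37.5 ms and 20 requests at 100 ms.
latencies_ms = [37.5] * 80 + [100.0] * 20

average_ms = mean(latencies_ms)  # 50.0, which looks healthy on its own
slow_fraction = latencies_ms.count(100.0) / len(latencies_ms)  # 0.2

# Amazon figure quoted above: roughly a 1% drop in sales at 100 ms of delay.
sales_drop_at_100ms = 0.01
lost_sales_fraction = slow_fraction * sales_drop_at_100ms  # about 0.002, i.e. 0.2%

print(average_ms, slow_fraction, lost_sales_fraction)
```
</pre>
<div class="Indented">
The average comes out at exactly 50 ms even though one request in five is twice as slow, which is the kind of detail a single summary number hides.
</div>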
<div class="Indented">
Put another way, it’s cheaper to create systems that have predictable performance into the 3 9s than to lose sales caused by bad performance at the end of the performance curve.
</div>
<h1 class="Section">
<a class="toc" name="toc-Section-3">3</a> Percentiles
</h1>

<div class="Unindented">
So typically the way we will accurately represent system performance is using percentiles. The idea behind percentiles is pretty straightforward - what percentage of users had a particular experience?
</div>
<div class="Indented">
Imagine, for example, that we ran a test 10 times and the latencies we got back were 1, 2, 1, 4, 50, 30, 1, 3, 2 &amp; 1 ms. The first thing we would do is order the latencies from smallest to largest - 1, 1, 1, 1, 2, 2, 3, 4, 30, 50.
</div>
<div class="Indented">
The median or 50th percentile is the latency that 50% of the requests met or beat. In this case we count off half the results, 10/2 = 5, and the 5th result in the ordered list is the median, which is 2 ms. So this means that 50% of the requests had a latency of 2 ms or better.
</div>
<div class="Indented">
The 90th percentile is the latency that 90% of the requests met or beat. In this case that’s the 9th result in the ordered list (0.9 * 10 = 9), which is 30 ms.
</div>
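<div class="Indented">
The counting described above can be sketched in a few lines of Python. This is the simple ‘nearest rank’ way of picking a percentile, which is what the worked example uses; the function name is just something I made up for the illustration.
</div>
<pre>
```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of the samples are at or below it."""
    ordered = sorted(samples)
    # 1-based position in the ordered list, e.g. 0.9 * 10 = 9 for p = 90.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

latencies_ms = [1, 2, 1, 4, 50, 30, 1, 3, 2, 1]
median_ms = percentile(latencies_ms, 50)  # 5th of 10 ordered results: 2
p90_ms = percentile(latencies_ms, 90)     # 9th of 10 ordered results: 30

print(median_ms, p90_ms)
```
</pre>
<div class="Indented">
Other percentile definitions interpolate between neighboring results, so a statistics library may report slightly different numbers for a sample as small as this one.
</div>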
<div class="Indented">
Typically the results will be shown as a graph running from the 1st percentile through the 90th in increments of 1%, followed by the 99.9th or higher if appropriate. In most cases the results have to be shown on a logarithmic scale to be easily viewable.
</div>

<div class="Indented">
Throughput is measured in a similar way. The big difference is that for throughput one is measuring the number of requests completed over some window of time. 1 second is a pretty typical window. I usually just measure how many requests completed during a particular window.
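</div>
<div class="Indented">
As a rough sketch of that bookkeeping in Python, assuming the test bench records a completion timestamp (in seconds since the start of the run) for every request; the timestamps and the function name below are invented for the example.
</div>
<pre>
```python
from collections import Counter

def completions_per_window(completion_times_s, window_s=1.0):
    """Count how many requests completed in each fixed-size time window."""
    counts = Counter(int(t // window_s) for t in completion_times_s)
    return dict(sorted(counts.items()))

# Hypothetical completion timestamps, in seconds since the test started.
completion_times_s = [0.1, 0.4, 0.9, 1.2, 1.3, 2.5, 2.6, 2.7, 2.8]
per_window = completions_per_window(completion_times_s)

print(per_window)  # window index mapped to requests completed in that window
```
</pre>
<div class="Indented">
Windows in which nothing completed don’t show up in the result, so a real harness would want to fill those in with zeros before taking percentiles over the per-window counts.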
</div>]]></content:encoded>
			<wfw:commentRss>http://www.goland.org/average_percentile_services/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Distributed Storage Reading List</title>
		<link>http://www.goland.org/distributed_storage_reading/</link>
		<comments>http://www.goland.org/distributed_storage_reading/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 23:26:28 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Internet Protocols]]></category>

		<guid isPermaLink="false">http://www.goland.org/?p=910</guid>
		<description><![CDATA[My technical wanderings of late at Microsoft have taken me into the realm of massively distributed storage. Of course, I've been here before but this time I need to bring some other folks along. So I was asked to put together suggested readings to help people come up to speed. I thought the list might [...]]]></description>
			<content:encoded><![CDATA[<p>My technical wanderings of late at Microsoft have taken me into
the realm of massively distributed storage. Of course, I've been here
<a HREF="http://www.goland.org/whatiscosmos/">before</a> but this
time I need to bring some other folks along. So I was asked to put
together suggested readings to help people come up to speed. I
thought the list might be of general interest so I'm posting it here.
</p>
<p>What do you think? Is this a good list? A bad one? What would you
suggest?</p>
<span id="more-910"></span>
<h2 CLASS="western">Required Reading</h2>
<p>Google's Stack - <a HREF="http://labs.google.com/papers/gfs.html">GFS</a>,
<a HREF="http://labs.google.com/papers/bigtable.html">Big Table</a> &amp;
<a HREF="http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf">Megastore</a></p>
<p>An alternate way of building large scale stores without
consistency, worth having in mind - <a HREF="http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf">Dynamo</a></p>
<p>Working in highly distributed storage systems almost certainly
means dealing with Paxos (or one of its variants) so - <a HREF="http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf">Paxos
made simple</a> (make sure to check out my <a href="http://www.goland.org/state_diagrams_for_paxos_made_simple/">companion
article</a> as well) &amp; an outstanding Google paper on Paxos called
<a HREF="http://labs.google.com/papers/paxos_made_live.html">Paxos
made live</a>.</p>
<h2 CLASS="western">Extra Credit</h2>
<p>I suspect that anyone who ends up working in this area for any
period of time will end up reading the following list of articles but
they aren't necessary from day one:</p>
<p>Wikipedia has a great <a HREF="http://en.wikipedia.org/wiki/Paxos_algorithm">summary</a>
of the various variants of Paxos that are running around.</p>
<p><a HREF="http://labs.google.com/papers/chubby.html">Chubby</a> is
Google's abstraction layer to make Paxos easier for developers to
handle.</p>
<p>A great summary of <a HREF="http://research.yahoo.com/node/3280">ZooKeeper</a>,
this is Hadoop's answer to Chubby.</p>
<p>A summary of <a HREF="http://research.yahoo.com/node/3274">ZAB</a>,
the 'Paxos Like' mechanism that is underneath ZooKeeper.</p>]]></content:encoded>
			<wfw:commentRss>http://www.goland.org/distributed_storage_reading/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How do I securely wipe my hard drive?</title>
		<link>http://www.goland.org/wiping_data_from_drives/</link>
		<comments>http://www.goland.org/wiping_data_from_drives/#comments</comments>
		<pubDate>Sun, 16 Oct 2011 22:32:54 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Home PC]]></category>

		<guid isPermaLink="false">http://www.goland.org/?p=907</guid>
		<description><![CDATA[Ever since Gutmann published his original paper [3] in 1996 there has been an assumption amongst security types that to ’securely’ delete a hard drive one had to overwrite it many times. While it’s not entirely clear that this claim was true when Gutmann made it, nevertheless, changes in magnetic hard drive technology appear to [...]]]></description>
			<content:encoded><![CDATA[<div class="Unindented">
Ever since Gutmann published his original paper <span class="bibcites">[<a class="bibliocite" name="cite-3" href="#biblio-3"><span class="bib-index">3</span></a>]</span> in 1996 there has been an assumption amongst security types that to ‘securely’ wipe a hard drive one had to overwrite it many times. While it’s not entirely clear that this claim was true when Gutmann made it, changes in magnetic hard drive technology appear to have made multiple overwrites unnecessary. As explained in gory detail in <span class="bibcites">[<a class="bibliocite" name="cite-1" href="#biblio-1"><span class="bib-index">1</span></a>]</span> there is no known economical way to recover data that has been overwritten just once on a modern magnetic hard drive. So a single pass writing zeros should more than handle things.
</div>
<div class="Indented">
As explored in the Wikipedia article on <a class="URL" href="https://secure.wikimedia.org/wikipedia/en/wiki/Data_remanence">data remanence</a> it is possible for data in bad sectors to be recovered because the zero pass wouldn’t touch them. But keep in mind that no matter how many times one wipes a drive those bad sectors won’t be written to. So if bad sectors are an issue then one will probably need to degauss the drive, physically destroy it or use whole disk encryption.

</div>
<div class="Indented">
Where things get more fun is with solid state drives (SSDs). As explained in <span class="bibcites">[<a class="bibliocite" name="cite-2" href="#biblio-2"><span class="bib-index">2</span></a>]</span> there are real problems with securely erasing SSDs. Right now there is really no good way for a normal person (i.e. someone who isn’t a storage expert) to know if they have successfully deleted everything off an SSD. Tricks like filling up the drive with data won’t work because the drives have more capacity than they advertise and, since flash cells wear out, the drive may have taken cells (with data) offline. The drives do support entire disk delete commands but as <span class="bibcites">[<a class="bibliocite" name="cite-2" href="#biblio-2"><span class="bib-index">2</span></a>]</span> points out, those commands aren’t always appropriately implemented. Overwriting sometimes works but sometimes not and using a pattern of zeros is particularly problematic because some SSDs compress contents.
</div>
<div class="Indented">
So if one wants to securely dispose of an SSD I suspect the only reasonable approach is software based disk encryption. Yes, some SSDs do implement hardware level encryption but given the lack of easy validation of the logic and of updates when there are issues I wouldn’t personally trust that approach. Of course this reduces the security of a ‘wipe’ to someone not being able to crack the password on the key file stored in the SSD. Personally I’d still do two wipes, at least one using some kind of random data, before disposing of an SSD just to be especially paranoid. Perhaps the only reasonable alternative with SSDs is physical destruction.
</div>

<div class="Indented">
<h1 class="biblio">
References
</h1>
<p class="biblio">
<span class="entry">[<a class="biblioentry" name="biblio-1"><span class="bib-index">1</span></a>] </span> <span class="bib-authors">Craig Wright, Dave Kleiman, et al</span>. <span class="bib-title">Overwriting Hard Drive Data: The Great Wiping Controversy</span>. <span class="bib-year">2008</span>. URL <a href="http://www.vidarholen.net/~vidar/overwriting_hard_drive_data.pdf"><span class="bib-url">http://www.vidarholen.net/~vidar/overwriting_hard_drive_data.pdf</span></a>.

</p>
<p class="biblio">
<span class="entry">[<a class="biblioentry" name="biblio-2"><span class="bib-index">2</span></a>] </span> <span class="bib-authors">Michael Wei, Laura M. Grupp, et al</span>. <span class="bib-title">Reliably Erasing Data From Flash-Based Solid State Drives</span>. <span class="bib-year">2011</span>. URL <a href="http://www.usenix.org/events/fast11/tech/full_papers/Wei.pdf"><span class="bib-url">http://www.usenix.org/events/fast11/tech/full_papers/Wei.pdf</span></a>.
</p>
<p class="biblio">
<span class="entry">[<a class="biblioentry" name="biblio-3"><span class="bib-index">3</span></a>] </span> <span class="bib-authors">Peter Gutmann</span>. <span class="bib-title">Secure Deletion of Data from Magnetic and Solid-State Memory</span>. <span class="bib-year">1996</span>. URL <a href="http://www.cs.auckland.ac.nz/~pgut001/pubs/secure_del.html"><span class="bib-url">http://www.cs.auckland.ac.nz/~pgut001/pubs/secure_del.html</span></a>.

</p>

</div>]]></content:encoded>
			<wfw:commentRss>http://www.goland.org/wiping_data_from_drives/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sharing sparse disk image bundles across OS X machines</title>
		<link>http://www.goland.org/shared_sparse_bundle/</link>
		<comments>http://www.goland.org/shared_sparse_bundle/#comments</comments>
		<pubDate>Mon, 10 Jan 2011 00:28:10 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Home PC]]></category>

		<guid isPermaLink="false">http://www.goland.org/?p=859</guid>
		<description><![CDATA[Normally using my Mac is a simple joy. But recently I created a sparse disk image bundle on my main OS X box and wanted to share it with other OS X boxes. This is quite possible but requires some very arcane commands to make work. I explore those commands below. My goal was to [...]]]></description>
			<content:encoded><![CDATA[<p>Normally using my Mac is a simple joy. But recently I created a
sparse disk image bundle on my main OS X box and wanted to share it
with other OS X boxes. This is quite possible but requires some very
arcane commands to make work. I explore those commands below.</p>
<span id="more-859"></span>
<p>My goal was to create a sparse disk image bundle file that I could
share out from my machine and access from other machines. But every
time I tried to load the bundle on a different machine I would get
read only access. When I checked the status of the images in the Disk
Utility their Disk Write Status would always show up as Read Only and
if I used Get Info from the Finder to check the files inside they
would show up as 'custom access'. This applied even if I had a brand
new bundle with &quot;Ignore ownership on this volume&quot; set. I
even set the permissions specifically to allow everybody to
read/write the files and it didn't matter.</p>
<p>I then found this <a HREF="http://discussions.apple.com/message.jspa?messageID=12296068">discussion</a>,
specifically the post from user KJK555 on 9/11/2010 at 11:55 PM. In
it he outlines a set of commands to give in order to fix the problem.
His commands are (reworked a little bit):</p>
<ol>
	<li><p>Unmount the sparse bundle (which we'll call
	bundle.sparsebundle in the directory /Path/To)</p>
	</li><li><p>sudo chown -R root:admin /Path/To/bundle.sparsebundle</p>
	</li><li><p>sudo chmod -R =rw,+X,g=u,o=u /Path/To/bundle.sparsebundle</p>
</li></ol>
<p>These commands are applied to the sparse bundle, not what's in the
sparse bundle. This is an important difference. The sparse bundle
itself is actually a directory that contains a bunch of files called
bands. These bands are where the contents of the data inside the
sparse bundle are kept. If the mounting machine doesn't have the
right permissions for these files then presumably write access isn't
possible.</p>
<p>Command 2 recursively changes the ownership of the sparse bundle
and all of its contents to the user root and the group admin. Command
3 recursively changes the permissions on the sparse bundle and all of
its contents. The command, I believe, says something like &quot;reset
the permission bits to read/write, then add execute/search rights
on directories, then set the group and others permissions to
be the same as the user's&quot;. The previous description presumes
the reader is familiar with UNIX permissions and user/group/other.</p>
<p>The instructions then say:</p>
<ol>
	<li><p>Mount the sparse bundle (we'll assume it's mounted to
	/Volumes/Bundle)</p>
	</li><li><p>sudo chown root:admin /Volumes/Bundle</p>
	</li><li><p>sudo chmod 1777 /Volumes/Bundle</p>
</li></ol>
<p>These commands apply to what's inside of the bundle. Command 2
just gives ownership of the root of the contents of the bundle to the
user root in the group admin. It doesn't do so recursively, however.
Command 3 also only applies to the root directory. The mode 1777
is an octal encoding of UNIX permission bits. The '1' says that the
directory should be 'sticky', which essentially means that a file in
the directory can only be deleted or renamed by the file's owner, the
directory's owner or root, even when others have write access to the
directory. 777 means give read/write/execute permissions to
owner/group/other.</p>
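<p>For anyone who wants to check the octal encoding for themselves,
here is a small Python sketch (my own illustration, not part of
KJK555's instructions) that uses the standard library's stat module to
render mode 1777 the way ls -l would show it for a directory.</p>
<pre>
```python
import stat

# 0o1777 is the mode passed to chmod in step 3. S_IFDIR marks it as a
# directory so that filemode() renders the leading "d".
mode = stat.S_IFDIR | 0o1777
symbolic = stat.filemode(mode)  # 'drwxrwxrwt'

# The leading 1 in 1777 is the sticky bit.
sticky_bit = stat.S_ISVTX

print(symbolic, oct(sticky_bit))
```
</pre>
<p>The trailing t in the output is the sticky bit. Without it (plain
777) the string would end in x instead.</p>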
<p>One can reasonably argue this is all fairly bad. The reason is
that essentially we are making the sparse bundle read/writeable to
everybody. In practice it isn't quite that simple. In my case, for
example, only machines that have been given explicit permission to
access my shared files directory can even get to the bundle. Second,
the bundle itself is encrypted and permission or no the only way to
access the files is with the password. In theory I should probably
follow KJK555's advice from 9/19/2010 at 2:34 PM where he shows how
to set the permissions to the sparse files to a named user instead of
to everyone. But simplicity is a virtue and the permissions above are
pretty robust and given the other protections in place I suspect I
can live with the previous steps.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.goland.org/shared_sparse_bundle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>User IDs &#8211; managing the mark of Cain</title>
		<link>http://www.goland.org/user_id_in_uri/</link>
		<comments>http://www.goland.org/user_id_in_uri/#comments</comments>
		<pubDate>Thu, 21 Oct 2010 21:31:51 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[SOA/Web/Etc.]]></category>

		<guid isPermaLink="false">http://www.goland.org/?p=842</guid>
		<description><![CDATA[Facebook’s latest privacy debacle was driven by their failure to properly manage user IDs. This is not a new problem area and as the EFF points out, Facebook has done this before. So while I don’t know if Facebook will be interested in this post, those who care about protecting their user’s privacy in an [...]]]></description>
			<content:encoded><![CDATA[<p>Facebook’s latest <a class="URL" href="http://online.wsj.com/article/SB10001424052702304772804575558484075236968.html">privacy debacle</a> was driven by their failure to properly manage user IDs. This is not a new problem area and as the EFF <a class="URL" href="https://www.eff.org/deeplinks/2010/10/facebooks-broken-promises-facebook-apps-leaking">points out</a>, Facebook has done this <a class="URL" href="http://online.wsj.com/article/SB10001424052748704513104575256701215465596.html">before</a>. So while I don’t know if Facebook will be interested in this post, those who care about protecting their user’s privacy in an age of data sharing may want to have a look at the threats and defenses needed to share user IDs across sites. Securing user IDs isn’t easy. 
</p>
<p>[Update 10/22/2010: Changed the title and intro and added three new sections at the end.]</p>
<span id="more-842"></span>
<div class="fulltoc">
<div class="tocheader">
<h1>Table of Contents</h1>

<div class="tocindent">
<div class="toc">
<p>
<a class="Link" href="#toc-Section-1">Section 1: Where should user IDs be put? URIs?</a>
</p>
<p>
<div class="toc">
<a class="Link" href="#toc-Section-2">Section 2: Referer [sic] Header </a>
</div></p>
<p>
<div class="toc">
<a class="Link" href="#toc-Section-3">Section 3: Man-in-the-Middle (MITM)</a>
</div></p>
<div class="toc">
<p>
<a class="Link" href="#toc-Section-4">Section 4: Browser History </a>
</p>
<p>
<div class="toc">
<a class="Link" href="#toc-Section-5">Section 5: Browser Links</a>
</div></p>
<p>
<a class="Link" href="#toc-Section-6">Section 6: Tracking Users Across Sites</a>
</p>

<p><div class="toc">
<a class="Link" href="#toc-Section-7">Section 7: Triangulating Services</a>
</div></p>
<p>
<div class="toc">
<a class="Link" href="#toc-Section-8">Section 8: Blowing user anonymity</a>
</div></p>
<p>
<div class="toc">
<a class="Link" href="#toc-Section-9">Section 9: A side note on implementing secure IDs</a>
</div></p>



<h1 class="Section">
<a class="toc" name="toc-Section-1">1</a> Where should user IDs be put? URIs?
</h1>
 <p>
pasta.example.com lets people post their favorite pasta types. Joe has an account on pasta.example.com. What should the URI to his favorite pasta shape be? It could be http://pasta.example.com/favorites/shape. The website knows to pull up Joe’s data because Joe’s ID is in a cookie or maybe an authentication token.
</p>
<p>

In that case, however, if Joe wants to send his friend a link to his favorite pasta he can’t, because when his friend clicks on the URI pasta.example.com doesn’t know which user’s data to pull up. By putting Joe’s identity into the URI, e.g. http://pasta.example.com/users/joe/favorites/shape, Joe can share the URI around and his friends can see Joe’s favorite pasta shape.
</p>
<p>
The point is that by putting user IDs into URIs we make it possible for services to reason about multiple users and understand how to pull up their data. So, for example, if a recipe service wants to show Jane what Joe’s and Jake’s favorite pasta shapes are the recipe service can just record http://pasta.example.com/users/joe/favorites/shape and http://pasta.example.com/users/jake/favorites/shape. If the user IDs weren’t in the URI then the recipe service would need to understand what magic the pasta site was using to identify users (e.g. cookies, auth tokens, etc.) and learn that special magic.
</p>
<p>
By putting the User ID into the URI normal HTTP logic just works.
</p>
<h1 class="Section">
<a class="toc" name="toc-Section-2">2</a> Referer [sic] Header 
</h1>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-2.1">2.1</a> Attack
</h2>
 <p>

Let’s say that Joe is accessing his medical records at https://cancer.example.com/users/joe. On that page is a link to an outside service with some relevant information. Joe clicks on that link and is taken to that outside site. In theory nobody needs to know where Joe came from but in reality browsers send referer [sic] headers that list the site the user came from. This HTTP header will contain the URI https://cancer.example.com/users/joe. This exposes Joe’s identity and his association with cancer.example.com. If Joe’s identity wasn’t in the URI then that information wouldn’t be leaked.
</p>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-2.2">2.2</a> Defense
</h2>
 <p>
This attack is as old as the hills but as the recent Facebook referer leak shows, that doesn’t mean people don’t still screw it up. The ’solution’ is that external links must come from a page that contains no user identifying information in the URI. There are a couple of ways of doing this but given the Facebook leak my guess is that oceans of technical ink will now be spilled on various tricks to avoid this problem so I don’t see the need to add to it. I couldn’t actually find a really solid article on the various ways to defuse the referer [sic] leak so if anyone has a good URI please let me know so I can add it here.
</p>
<h1 class="Section">
<a class="toc" name="toc-Section-3">3</a> Man-in-the-Middle (MITM)
</h1>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-3.1">3.1</a> Attack
</h2>

 <p>
A user may log into a website securely over HTTPS (thus hiding their login and their identity) but then be switched to HTTP for the rest of the interaction. This would let an observer see the URIs going back and forth and see that the person talking to cancer.example.com is Joe.
</p>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-3.2">3.2</a> Defense
</h2>
 <p>
SSL protects everything from the TCP payload on up. So if SSL is used for all communications, not just login, then all URIs will be protected against eavesdroppers. For any non-trivial data we should be using SSL to protect it.
</p>
<h1 class="Section">
<a class="toc" name="toc-Section-4">4</a> Browser History 
</h1>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-4.1">4.1</a> Attack

</h2>
 <p>
Browsers keep track of what URIs a user has visited, even if those URIs are over HTTPS. If the user’s identity is in the URIs then this opens up various attack vectors. One is simply accessing the machine after the user has left and viewing the history. Even if the cookies and caches have been cleared history tends to stick around, so the attacker can see what sites the user has been to and what identities were used just by looking at the browser history.
</p>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-4.2">4.2</a> Defense
</h2>
 <p>
The core of this attack is that an attacker basically takes control of the browser and can view the contents of the browser history at will. This is a fairly scary situation in and of itself. In the case that users are only expected to use the service from their own devices then this attack basically means that an attacker owns the browser and so can do much worse things than just look at history.
</p>
<p>
But what about scenarios where a user may access the site from someone else’s device such as a kiosk? In this case we need to think carefully about the threat model. If we assume the machine is compromised before the user shows up then the user’s identity is already compromised no matter what we do with URIs. So the only scenario where this attack really matters is one where the user is expected to use machines they don’t control, the machine they use is expected to generally be clean and only later on is the machine attacked and the user’s data compromised.
</p>
<p>
If we think that scenario is realistic (I don’t btw, I suspect in most cases if a machine is going to be compromised it will have happened before the user got there) then we need to rethink the whole design of the website. We will certainly need to run everything over SSL, ideally use something like Silverlight with heavily dynamic content and keep users’ IDs out of URIs.
</p>

<h1 class="Section">
<a class="toc" name="toc-Section-5">5</a> Browser Links
</h1>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-5.1">5.1</a> Attack
</h2>
 <p>
From Javascript one can choose what color visited links versus non-visited links will have. The attack then is to create a hidden IFrame, set the color of visited versus non-visited links to known values and then start sending down lots of anchor tags in HTML in the IFrame and use Javascript to examine the page as it loads to see what color the browser assigned to the links. This provides an oracle that can tell where the user has been. So, for example, if an attacker suspects that the user is Joe and Joe has an account at cancer.example.com then the attacker can (once they fool Joe into visiting their site) see if Joe has ever been to https://cancer.example.com/users/joe/mainpage.
</p>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-5.2">5.2</a> Defense
</h2>
 <p>
This one is just plain nasty. It’s a security hole plain and simple. But it’s there and we can’t ignore it. So we need to threat model it. The attack requires one to guess the exact URL, down to the last character, for it to work. In other words one can’t just say "Hey has the user been to any site that begins with https://cancer.example.com/?". Instead one has to have the exact URL with the user ID and all other path values. So for the attack to be launched the attacker has to know both which site they are trying to attack and which specific user they are looking for.

</p>
<p>
How to address this attack depends on one’s threat profile. In other words, a fairly small service dealing with fairly harmless information can probably ignore this attack. A major site dealing with secret data needs to be extremely concerned.
</p>
<p>
Unfortunately there really isn’t much one can do about this attack. For example, let’s say that a user goes to a bad guy’s site and is fooled into logging in. So the bad guy knows who the user is. Now the bad guy site sends down a bunch of test links and sees that https://cancer.example.com/mainpage was clicked on. Even though the user’s ID isn’t in the link, it doesn’t matter, the bad guy site knows the user has gone there.
</p>
<p>
The next step in the attack is for the bad guy to figure out what ID the user uses on cancer.example.com. This is where some defenses can come into play. One game one can play is to put cryptographically secure random numbers into all the URLs as query components. This will kill the link query attack dead since the attacker can’t guess those numbers and so can never provide a test URL that will match against a logged in URL. In other words if the user’s logged in URL is https://cancer.example.com/users/joe?id=[cryptographically secure random number] then the link guessing attack just fails.
</p>
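<p>
To make the random query component idea concrete, here is a minimal sketch in Python. The function names and the <code>id</code> parameter name are my own choices for illustration, and the token would have to be stored server side with the user's session, which the sketch leaves out.
</p>

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def make_user_url(base_url, user):
    """Return (url, token): a logged in URL carrying an unguessable query value."""
    # 32 bytes from the OS CSPRNG, encoded as URL-safe base64 text.
    token = secrets.token_urlsafe(32)
    url = f"{base_url}/users/{user}?" + urlencode({"id": token})
    return url, token


def token_matches(url, expected_token):
    """Check the id query component against the value issued for this session."""
    values = parse_qs(urlsplit(url).query).get("id", [])
    # compare_digest avoids leaking how much of the token matched via timing.
    return len(values) == 1 and secrets.compare_digest(values[0], expected_token)


url, token = make_user_url("https://cancer.example.com", "joe")
print(url)
```

<p>
Because the attacker can't reproduce the query value, no test link they send down will ever be an exact match for a URL in the user's history.
</p>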
<p>
Of course if Website A is going to legitimately redirect the user to cancer.example.com and the redirect URL doesn’t include some cryptographically secure random number then the protection fails. So where cross site redirects involving URLs pointing directly at the user’s ID are involved this is a fairly tenuous mechanism. Thankfully this is a reasonably rare scenario. Typically when sites legitimately share user specific data they do so on the back end.
</p>
<p>
Another technique when dealing with data that isn’t intended to be public is to use a cryptographically secure random number as the user’s identifier. This isn’t a foolproof technique since referer [sic] could still leak the identifier. But now the attack becomes substantially more complex since the attacker needs to be one of the referrers or have a relationship with one of the referrers. The attack is still possible, just more expensive.
</p>
<p>

One can up the ante even more by taking the real user ID and encrypting it with a known key. This keeps the conceptual model server side simple (decrypt, use ID) and provides the ability to change the visible ID as often as one wants. This approach can even work with partners if a shared key is used. In other words website A can redirect the user to website B with the user ID included in the URL as an encrypted value using a key shared by the websites.
</p>
<h1 class="Section">
<a class="toc" name="toc-Section-6">6</a> Tracking Users Across Sites
</h1>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-6.1">6.1</a> Attack
</h2>
 <p>
There are plenty of scenarios where a user authorizes another site to learn something about them. So Joe, in our previous example, might want cancer.example.com to share data about him with white.cell.tracker.example.com and maybe discount.drug.example.com. In both cases cancer.example.com will have to provide links to the two other sites and those links will contain Joe’s identity. This will then let the two sites exchange notes and realize they are both dealing with Joe. This allows everything from unwanted marketing to more nefarious scenarios.
</p>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-6.2">6.2</a> Defense
</h2>

 <p>
If two websites want to conspire to track a user there is literally nothing the user can do to stop it. An excellent proof of this is <a class="URL" href="http://xauth.org">xauth.org</a>. This is a group of popular sites (including my current employer) who publish information about their users to a central site which can then track the user across those sites. Of course the sites promise to only share identity data where appropriate given the user’s approval. No, really, they promise. That creating xauth.org isn’t a criminal conspiracy helps to illuminate just how backwards our laws are when it comes to electronic privacy.
</p>
<p>
But in any case, xauth.org is perfect proof that if sites want to conspire to rob you of your privacy then there is absolutely nothing you can do about it. You will just have to blindly hope that the sites don’t decide to go rogue. Of course you have no say in the matter.
</p>
<p>
That having been said it’s one thing for a group like xauth.org to exist ahead of time. That really can’t be defended against if the attackers are sufficiently determined. It’s another for a group of sites who weren’t previously cooperating to decide to cooperate in the future and to be able to share their old logs and compare users.
</p>
<p>
For example, site A shares information about Joe with site B and C. In both cases site A identified Joe by using the user ID "Joe". If, in the future, site B and C want to conspire to track users across them (this can be extremely profitable since it allows for targeted advertising) then they just look for common IDs from site A.
</p>
<p>
To prevent that specific kind of retroactive data sharing one can use what are called pairwise unique IDs. That is, when site A identifies Joe to site B they can use a different identifier than they use with site C. Pairwise unique IDs can be generated either using encryption (e.g. take the user’s actual ID and the site’s ID and encrypt them together) or by look up tables. Strictly speaking look up tables are probably more secure since encryption keys can eventually be broken. But they are more expensive to implement. Check with your local crypto guru to make the proper trade off for your situation.
</p>
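<p>
Here is a rough sketch of the look up table flavor of pairwise unique IDs, using an in-memory Python dictionary as a stand-in for a real table. The class and method names are made up for the example.
</p>

```python
import secrets


class PairwiseIds:
    """In-memory stand-in for a pairwise unique ID look up table."""

    def __init__(self):
        self._by_pair = {}      # (internal user ID, partner site) to external ID
        self._by_external = {}  # external ID to (internal user ID, partner site)

    def external_id(self, user_id, partner):
        """Return the ID this partner knows the user by, minting it on first use."""
        key = (user_id, partner)
        if key not in self._by_pair:
            ext = secrets.token_urlsafe(16)
            self._by_pair[key] = ext
            self._by_external[ext] = key
        return self._by_pair[key]

    def resolve(self, ext, partner):
        """Map an external ID back to the user, only for the partner it was issued to."""
        entry = self._by_external.get(ext)
        if entry is None or entry[1] != partner:
            return None
        return entry[0]


ids = PairwiseIds()
id_for_b = ids.external_id("Joe", "site B")
id_for_c = ids.external_id("Joe", "site C")
```

<p>
Sites B and C each hold a different random value for Joe, so comparing their old logs turns up no common ID.
</p>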

<h1 class="Section">
<a class="toc" name="toc-Section-7">7</a> Triangulating Services
</h1>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-7.1">7.1</a> Attack
</h2>
 <p>
Let’s tell what I’m sure is a completely hypothetical story (cough). Imagine, if you will, a service that lets people create their own websites. In the URL for each user’s website is a unique ID used to identify the owner of the website. The ID is just a jumble of numbers and there is no official mechanism to translate from the unique ID to something readily identifiable like an e-mail address or name.
</p>
<p>
Normally websites created with this service will display the person’s name, e-mail, etc. Thus, of course, providing a way to map from the supposedly anonymous unique ID to the person’s identity, but believe it or not, that isn’t the hole in this case. But the service decides to add the ability to create anonymous websites. What makes the websites anonymous is that they won’t automatically display the user’s name or e-mail. But their unique ID? It’s still in the path. But in theory this is ’o.k.’ because again, there is no ’official’ mechanism to go from the unique ID to more useful user identifiers.
</p>
<p>
However there was another, separate, service, that provided IM. This (hypothetical of course) service also used a unique ID for users. Now the IM service didn’t display the unique ID for a user but it was included in the (theoretically thoroughly hacked and well documented) protocol in messages sent down to the IM client.
</p>

<p>
All would have been fine if it weren’t for one tiny little issue. Both the IM and website service used the same identity provider and so used the same unique ID for the same user. So if user A had an anonymous website and sent IMs to user B then user B could pull out user A’s user ID from the IM client, do a quick Internet search, see if a website existed with the same user ID and, if the website was anonymous, user B now knew that user A was the owner of the website.
</p>
<p>
The issue here is ID triangulation. By exposing the same ID for the same user in two different contexts an attacker can triangulate and determine who the user is.
</p>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-7.2">7.2</a> Defense
</h2>
 <p>
We have already discussed encrypting IDs and using pairwise unique IDs. Now we need to throw in per service pairwise unique IDs. That way even if the same entity is accessing the same user’s data across two different services they will get two different IDs.
</p>
<p>
Still, one has to think carefully about going to this level of obfuscation. In many cases it’s a feature and not a bug that a service can track a user across multiple service front ends. So long as the service has been empowered to have this information (which may be by default in terms of public data) then there is no problem.
</p>
<p>

The problem comes when services are trying to hide this data. Note, btw, that at this point it doesn’t even matter if the ID is in the URL or not. We have taken the argument beyond that issue. If the ID is available via any mechanism to the caller (cookie, auth token, whatever) across two different services then this kind of triangulation is possible.
</p>
<p>
So the bottom line is - ID in the URL or not, if you are sharing an identity provider with anyone (even other services at your own company) then you have the triangulation problem and need to threat model it to decide if it requires remediation.
</p>
<h1 class="Section">
<a class="toc" name="toc-Section-8">8</a> Blowing user anonymity
</h1>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-8.1">8.1</a> Attack
</h2>
 <p>
Let’s say that there is a website called party.example.com. It tracks social and business events and can send out SMS messages notifying its users when an event interesting to them is going to happen. Sanford is something of a party animal but he doesn’t want his fellow CPAs to know that. To separate his two lives Sanford created two identities, partyanimal@live.com and sanford@gmail.com.
</p>
<p>

Sanford logs into party.example.com using partyanimal@live.com and asks to receive SMS messages about the loudest most insane parties where he lives. As part of this process Sanford is redirected using OAuth to his carrier BigTelco where Sanford logs in with his account sanford@big.telco.com. All party.example.com knows is that Sanford uses BigTelco, it doesn’t know anything about his identity at BigTelco. After granting party.example.com permission to send him SMS messages, BigTelco sends Sanford back to party.example.com with a refresh token that includes a BigTelco ID for Sanford.
</p>
<p>
Now Sanford again logs into party.example.com (using a different browser) but this time he logs in with the identity sanford@gmail.com. This time he subscribes to SMS notifications about business events in his area. Again Sanford is redirected to his carrier, BigTelco and again logs in as sanford@big.telco.com. At this point however the permission request is short circuited since party.example.com already has permission to send Sanford SMS messages. So BigTelco just sends Sanford automatically back to party.example.com with a refresh token containing BigTelco’s ID for Sanford.
</p>
<p>
The result is that party.example.com now has two different refresh tokens for two different accounts at party.example.com (partyanimal@live.com and sanford@gmail.com) which have the same BigTelco User ID. Bingo, party.example.com now knows that the same person owns both accounts, Sanford’s anonymity is lost.
</p>
<p>
This scenario involves OAuth but any situation where two websites are sharing data about a user will run into this issue. 
</p>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-8.2">8.2</a> Defense
</h2>
 <p>
In the previous section we talked about having to make pairwise per service unique IDs for users in order to protect their privacy. In other words an ID that is unique per calling application, per service being called, per user. Now we add another dimension which requires unique IDs, per third party account. In other words in order to protect a user’s privacy it’s necessary for third parties (like party.example.com) who are asking for a user ID to specify what user ID they know the user by. Then BigTelco needs to generate a new ID that is unique for the combination of [BigTelco User ID, BigTelco Service Type, calling party ID, ID by which the calling party knows the BigTelco User]. Only by generating a unique user ID any time any of these four values change can BigTelco be sure it won’t blow their user’s attempts to protect their own privacy.

</p>
<p>
Note, btw, that party.example.com, when it shares the ID it knows Sanford by, should apply all the techniques mentioned in this paper as well. What’s good for the goose is good for the gander.
</p>
<p>
The core of protecting a user’s privacy in a scenario like this is a four column table: Internal User ID, Internal Service ID, Calling Party ID &amp; Calling Party’s ID for the User. Any time any of these values change in an interaction with an external party a new external user ID is needed.
</p>
<h1 class="Section">
<a class="toc" name="toc-Section-9">9</a> A side note on implementing secure IDs
</h1>
 <p>
Securing user IDs means constantly generating new IDs for users to be handed out to different parties in different circumstances. These IDs must later be traced back to an internal ID in a way that can’t be hacked by outsiders. There are two sort of obvious ways to do this.
</p>
<h2 class="Subsection">

<a class="toc" name="toc-Subsection-9.1">9.1</a> Encryption
</h2>
 <p>
The one that comes to mind first is encryption. Just take the four values discussed in the previous section, concatenate them, encrypt them and use that as the user ID. But there are several practical problems with using encryption this way. First, encryption keys need to be rolled over. So what happens to old IDs when a new key is in use? Typically it’s o.k. to use an old key for validation purposes for a while but eventually it needs to be retired. But then what? Will every user ID generated with the old key have to be rolled over? That isn’t going to be much fun.
</p>
<p>
Also nothing ever gets forgotten on the Internet and some day our ’super duper strong cryptography’ will be ’weak cryptography’ that 12 year olds will amuse themselves by breaking on their neurally integrated processing implants. So by generating these IDs we create a situation where 10, 20, 30 years down the road the keys will be cracked and the secrets revealed. Having one’s past from decades gone by come back to haunt one isn’t a pleasant thought.
</p>
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-9.2">9.2</a> Tables v1
</h2>
 <p>
One could literally create a table with the four specified columns and a fifth column that contains a cryptographically secure random number to be used as the external user ID. When an ID is submitted it is looked up in the table to determine what user it maps to. By using a random number rather than encrypted content no secret information is allowed free onto the network. All the juicy stuff is strictly internal. This also means that user IDs once created for a particular quadruple don’t ever have to change.
</p>
<p>

Typically look ups on the table will go from user ID to the four columns which have to be matched to the current context to make sure the right ID is being used and to discover the desired internal user ID. 
</p>
<p>
But we will also need a reverse look up table if we are going to be nice and make sure that if someone asks us for a permission they already have then they will get the same ID they got last time. Building the reverse look up table isn’t much fun but it’s also par for the course for anyone who deals with denormalized databases. Alternatively, if one’s user base is small enough then a relational database can be used which makes the reverse look up trivial.
</p>
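<p>
For the small user base case, a sketch of the relational version might look like the following, using SQLite from Python's standard library. The table name, column names and function names are assumptions I picked for the example, not anything a real service uses.
</p>

```python
import secrets
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE external_ids (
           external_id   TEXT PRIMARY KEY,
           internal_user TEXT NOT NULL,
           service       TEXT NOT NULL,
           calling_party TEXT NOT NULL,
           party_user_id TEXT NOT NULL,
           UNIQUE (internal_user, service, calling_party, party_user_id)
       )"""
)


def get_or_create_external_id(internal_user, service, calling_party, party_user_id):
    """Reverse look up: return the ID already issued for this quadruple, else mint one."""
    quad = (internal_user, service, calling_party, party_user_id)
    row = db.execute(
        "SELECT external_id FROM external_ids WHERE internal_user = ? AND service = ? "
        "AND calling_party = ? AND party_user_id = ?",
        quad,
    ).fetchone()
    if row:
        return row[0]
    external_id = secrets.token_urlsafe(16)
    db.execute("INSERT INTO external_ids VALUES (?, ?, ?, ?, ?)", (external_id, *quad))
    return external_id


def resolve(external_id, service, calling_party):
    """Forward look up: map an external ID to the internal user if the context matches."""
    row = db.execute(
        "SELECT internal_user FROM external_ids "
        "WHERE external_id = ? AND service = ? AND calling_party = ?",
        (external_id, service, calling_party),
    ).fetchone()
    return row[0] if row else None


party_id = get_or_create_external_id(
    "sanford@big.telco.com", "sms", "party.example.com", "partyanimal@live.com")
work_id = get_or_create_external_id(
    "sanford@big.telco.com", "sms", "party.example.com", "sanford@gmail.com")
```

<p>
In Sanford's case the two party.example.com accounts get two unrelated external IDs, while asking again for the same quadruple hands back the ID issued the first time.
</p>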
<h2 class="Subsection">
<a class="toc" name="toc-Subsection-9.3">9.3</a> Tables v2
</h2>
 <p>
If one is willing to be a little less friendly to forgetful developers then there is a way to simplify the table, get rid of the reverse look up and almost certainly increase overall security. The trick is that any time a partner asks for a permission about a user they will get a different user ID. This means that if a site forgets it has a permission and later remembers it will now have two different IDs. But this simplification means, amongst other things, that one doesn’t need an ID for the user from the partner. Instead any time a permission is asked for the user will again be prompted and a new ID will be generated.
</p>

</div></div></div></div></div>]]></content:encoded>
			<wfw:commentRss>http://www.goland.org/user_id_in_uri/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>OAuth 2.0 Bearer tokens &#8211; unsafe at any speed?</title>
		<link>http://www.goland.org/are_bearer_tokens_unsafe_at_any_speed/</link>
		<comments>http://www.goland.org/are_bearer_tokens_unsafe_at_any_speed/#comments</comments>
		<pubDate>Sat, 02 Oct 2010 01:01:42 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[Internet Protocols]]></category>

		<guid isPermaLink="false">http://www.goland.org/?p=805</guid>
		<description><![CDATA[Eran’s latest article raises a number of specific security threats by way of arguing that bearer tokens are irredeemably insecure. In this article I examine the attacks Eran calls out and demonstrate that they are already addressed by OAuth 2.0. Eran’s article does bring up the interesting question of - do we need defense in [...]]]></description>
			<content:encoded><![CDATA[<p>
Eran’s <a class="URL" href="http://hueniverse.com/2010/09/oauth-bearer-tokens-are-a-terrible-idea/?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+Hueniverse+%28Hueniverse%29">latest article</a> raises a number of specific security threats by way of arguing that bearer tokens are irredeemably insecure. In this article I examine the attacks Eran calls out and demonstrate that they are already addressed by OAuth 2.0. Eran’s article does bring up the interesting question of - do we need defense in depth for the tamper resistance and confidentiality provided by SSL/TLS?
</p>
<span id="more-805"></span>
<h1 class="Section">
<a class="toc" name="toc-Section-1">1</a> The security threats Eran raises
</h1>
<p class="Unindented">
Below I list the specific security threats I found Eran raising in his article:
</p>
<p class="Description">
<span class="Description-entry">Developers turning off error checking in SSL</span> Some sites do not have up to date certificates and a few try to issue certificates that aren’t signed by a certifying authority trusted by the operating system. To deal with these errors client authors sometimes just turn off SSL cert error checking. Done in a blanket way this invalidates SSL security since any man in the middle can insert itself, send a bad cert and the client will accept it.
</p>
<p class="Description">
<span class="Description-entry">Bearer tokens can be replayed</span> Eran claims that if SSL is screwed up (such as above) then an attacker can take the bearer token and use it to make any request they want.
</p>
<p class="Description">
<span class="Description-entry">Typos</span> Eran claims that if the client developer puts in a typo (say http://foo.com instead of https://foo.com or https://boo.com instead of https://foo.com) then SSL’s security guarantees are compromised.
</p>
<p class="Description">
<span class="Description-entry">Client developers don’t do security</span> Eran believes that client developers can’t get security right and therefore must have libraries to protect them from themselves.
</p>
<p class="Unindented">
The first and last threats obviously contradict each other. Eran’s attack on SSL is about people using the libraries wrong and then he suggests that libraries can fix security? The reality is this - developers who do not pay attention to the security consideration sections and understand the security threats they are under will get it wrong. So either we educate developers or security will fail.
</p>
<p class="Indented">
That bearer tokens can be replayed is absolutely true but equally true is that properly designed bearer tokens significantly reduce the damage done. First, they are short lived. Second, they have an audience in them which in most interesting cases (as I discussed <a class="URL" href="http://www.goland.org/bearer-tokens-discovery-and-oauth-2-0/">previously</a>) kills replay attacks before they can even get started.
</p>
<p class="Indented">
The typo attack sounds the most scary. If a developer mistypes a single character the entire security of the system might be forfeit. But is this threat realistic? In OAuth 2.0 clients are required to go through a two-step process. In the first step they present their credentials to a token endpoint which issues an access token. In the second step they present the access token to an application endpoint to actually do something.
</p>
<p class="Indented">
Let’s say the developer put in the wrong token endpoint but the right application endpoint. If so then the access token the wrong token endpoint produces won’t work on the right application endpoint and the client will fail.
</p>
<p class="Indented">
Let’s say the developer put in the right token endpoint but the wrong application endpoint. A properly designed OAuth token endpoint request includes the URL of the application endpoint the token is going to be used for. This allows the token endpoint to validate that this is a supported application endpoint. Typically the way systems I’m involved with handle this situation is by putting the base URL for requests into the scope field. But, so long as the token endpoint checks the application endpoint URL and sees that it isn’t a supported endpoint then no access token will be issued and no damage is done.
</p>
<h1 class="Section">
<a class="toc" name="toc-Section-2">2</a> Security is insurance, please don’t buy more than you need
</h1>
<p class="Unindented">
OAuth 2.0 depends on SSL/TLS to provide two key features - message tampering protection and confidentiality. If SSL/TLS is broken then one or both of those features will be lost. As explored in the appendix below mechanisms like OAuth 1.0’s signature protocol don’t really provide much defense in depth against SSL/TLS failures. So if we are going to get additional protection it’s really only going to be by essentially re-inventing SSL/TLS like capabilities somewhere else in the stack. This would most likely occur by introducing a generic mechanism to sign/encrypt HTTP messages.
</p>
<p class="Indented">
But inventing such a mechanism is a non-trivial endeavor. Just look at the complexity of SSL/TLS itself to get some idea of how hard getting a HTTP level message signing/encrypting mechanism right will be. So if we are to invent such a mechanism we need one heck of a good use case. I haven’t seen such a use case but if someone has one I’d love to see it because I have some ideas on how to implement HTTP message signing/encrypting.
</p>
<h1 class="Section">
<a class="toc" name="toc-Section-A">A</a> Appendix - A quick look at what OAuth 1.0 signatures buy in terms of security
</h1>
<p class="Unindented">
Let’s say that a developer has, as Eran realistically describes in his article, turned off SSL cert error checking. Let’s further assume that the developer is using OAuth 1.0 signatures. Finally let’s assume that a man-in-the-middle (MITM) attack is underway. As soon as the client tries to connect to the server the MITM will redirect the request to their own machine and present a bad cert which will be accepted because cert checking is off. Now let’s examine what the attacker can do even though OAuth 1.0 signatures are being used.
</p>
<p class="Indented">
OAuth 1.0’s security considerations section already points out two things the attacker can do. In section 4.2 it points out that the attacker can silently pass on requests and responses thus allowing them to eavesdrop. But even more fun is section 4.3 which points out that since responses aren’t signed the attacker can change the response to be anything they want opening up a Pandora’s box of security threats. Trying to check on the status of your web service via its OAuth protected management interface? The attacker can make it look like everything is fine with your service even as the attacker is taking it down. Looking for the storage location to upload your secret document? The attacker can re-write the response to your query for the directory location to point at a URL they control.
</p>
<p class="Indented">
Or check out the warning in section 3.4.1, the request body is only protected by the signature if it’s a HTML form. In other words if the request protocol is JSON, XML, etc. then the attacker can not only change the response, they can change the request too without any fear of detection.
</p>
<p class="Indented">
In addition, as a practical matter, an attacker can repeat the same request message multiple times if it wants to. OAuth 1.0 tries to prevent replays by using nonces. These are unique values generated by the client that the server is supposed to record for each and every request received from each and every client (at least until the time stamp in the message has passed). The idea is that before processing a request the server will check the nonce in the request and see if it has been seen before and if so will reject the request.
</p>
<p class="Indented">
In reality distributed systems will do no such thing. This is because keeping a database of nonces for every single request received from all clients is so expensive to implement and so hard to keep consistent (we run right into the CAP theorem) that in practice scalable systems just won’t do it. Instead what they will do is check the time stamp and that is it. So as a practical matter attackers can in fact replay requests.
</p>
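<p class="Indented">
To show the difference between the two checks, here is a small Python sketch. The five minute window and all of the names are assumptions of mine, and the in-memory set is exactly the piece that is painful to share consistently across many servers.
</p>

```python
MAX_SKEW_SECONDS = 300  # assumed acceptance window for time stamps


class NonceChecker:
    """The check OAuth 1.0 asks for: remember every nonce until its time stamp expires."""

    def __init__(self):
        self._seen = set()  # (client ID, nonce, time stamp) triples already processed

    def accept(self, client_id, nonce, timestamp, now):
        if abs(now - timestamp) > MAX_SKEW_SECONDS:
            return False
        key = (client_id, nonce, timestamp)
        if key in self._seen:
            return False  # replay of a request we already handled
        self._seen.add(key)
        return True


def timestamp_only_accept(timestamp, now):
    """What scalable systems tend to do instead: look at the time stamp and nothing else."""
    return not abs(now - timestamp) > MAX_SKEW_SECONDS


checker = NonceChecker()
first = checker.accept("client-1", "abc123", 1000, 1010)
replayed = checker.accept("client-1", "abc123", 1000, 1020)
```

<p class="Indented">
With the nonce store the second copy of the request is refused. With the time stamp check alone the same request is accepted every time it arrives inside the window.
</p>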
]]></content:encoded>
			<wfw:commentRss>http://www.goland.org/are_bearer_tokens_unsafe_at_any_speed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bearer Tokens, Discovery and OAuth 2.0</title>
		<link>http://www.goland.org/bearer-tokens-discovery-and-oauth-2-0/</link>
		<comments>http://www.goland.org/bearer-tokens-discovery-and-oauth-2-0/#comments</comments>
		<pubDate>Thu, 23 Sep 2010 20:46:55 +0000</pubDate>
		<dc:creator>Administrator</dc:creator>
				<category><![CDATA[SOA/Web/Etc.]]></category>

		<guid isPermaLink="false">http://www.goland.org/?p=794</guid>
		<description><![CDATA[Part of my day job is working on adding discovery to OAuth 2.0. This article provides a summary of some of that work. So I was more than a little concerned when I saw a blog article from Eran Hammer-Lahav, the editor of OAuth 2.0, asserting that OAuth 2.0 couldn’t support secure discovery. Very worried [...]]]></description>
			<content:encoded><![CDATA[<p>Part of my day job is working on adding discovery to OAuth 2.0. <a class="URL" href="http://www.goland.org/oauthgenericdelegation/">This article</a> provides a summary of some of that work. So I was more than a little concerned when I saw a <a class="URL" href="http://hueniverse.com/2010/09/oauth-2-0-without-signatures-is-bad-for-the-web/">blog article</a> from Eran Hammer-Lahav, the editor of OAuth 2.0, asserting that OAuth 2.0 couldn’t support secure discovery. Very worried that something was terribly wrong I carefully read Eran’s article. I summarize below what I believe his concerns are and explain how I believe those concerns would be addressed by extensions to OAuth 2.0 to support discovery. I also explain how Eran’s article helped me find a flaw in my own proposal and how I propose fixing that flaw.</p>
<span id="more-794"></span>
<h1 class="Section">
<a class="toc" name="toc-Section-1">1</a> The replay attack - advertising the wrong token endpoint but the right protected resource
</h1>
<p>
The first part of Eran’s article deals with a general critique of bearer tokens. The issues raised are all well known and equally well understood to not apply to the scenarios that core OAuth 2.0 addresses. And, in fact, half way through his article in the section “Why None of this Matters Today” Eran agrees that his critiques don’t really apply to the core OAuth 2.0 use cases. However it is right after that section that Eran gets to what seems to be really bothering him - his contention that the use of bearer tokens with discovery will make clients susceptible to replay attacks.
</p>
<p>
He never gave a detailed example but I believe something like the following captures his concern. Let’s imagine a protected resource, we’ll call it https://evil.example.com, advertises that its token endpoint is Google. So the client should go to Google, get an access token and then give it to https://evil.example.com. Evil.example.com will then turn around and replay the token to other services thus successfully impersonating the client. What makes the attack possible is that in a discovery based scenario a client has to discover both the protected resource endpoint and the token endpoint. This provides an opportunity for a bad service to lie about its token endpoint. In this case the lie would give the attacker a Google access token.
</p>
<p>
In practice however I don’t believe this attack will work. The reason is that in discovery based systems (such as WS-Federation, SAML-P, etc.) one of the mandatory arguments is ’audience’. Audience defines the protected resource a requested token is going to be used with. Both the request for the access token and the access token itself will contain the audience value it is targeted at. The use of audience provides two levels of protection.
</p>
<p>
First, when the fooled client goes to Google’s token endpoint to ask for an access token it will have to specify that it intends to use the access token with the protected resource https://evil.example.com. The logic here is trivial, the client is to take the location it will send the access token to and slap that in its request to the token endpoint. Right away Google will see that https://evil.example.com is not one of its supported endpoints and so will reject the access token request.
</p>
<p>
Even if the attack somehow worked and Google issued the access token, the token would still include the audience it was intended for, https://evil.example.com. So if evil.example.com tries to replay the token somewhere else the replay will fail because the protected resource it gives the token to will see that the audience value isn’t addressed to it.
</p>
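<p>
Both levels of protection can be sketched in a few lines of Python. This is a toy token format I made up for illustration (HMAC over a JSON blob with a shared demo key), not the wire format of OAuth 2.0 or any real token service, and it leaves out expiry.
</p>

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key shared by the token endpoint and its protected resources.
SIGNING_KEY = b"demo-key-not-for-production"


def issue_token(subject, audience, supported_audiences):
    """Token endpoint: refuse unknown audiences, otherwise sign the claims."""
    if audience not in supported_audiences:
        raise ValueError("unsupported audience: " + audience)
    claims = json.dumps({"sub": subject, "aud": audience}).encode()
    sig = hmac.new(SIGNING_KEY, claims, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(claims).decode() + "." + base64.urlsafe_b64encode(sig).decode()


def accept_token(token, my_url):
    """Protected resource: check the signature, then check the audience is us."""
    claims_b64, sig_b64 = token.split(".")
    claims = base64.urlsafe_b64decode(claims_b64)
    expected = hmac.new(SIGNING_KEY, claims, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        return False
    return json.loads(claims)["aud"] == my_url
```

<p>
The first check lives in <code>issue_token</code>, which refuses an audience the token endpoint doesn't support. The second lives in <code>accept_token</code>, where a token addressed to https://evil.example.com is turned away by any other protected resource.
</p>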
<p>
There is no magic here btw. In essence the inclusion of audience in a signed token plays the same role that signatures played in OAuth 1.0a. It associates the target of a request with the request itself and so prevents replay attacks.
</p>
<p>
Thus to make OAuth 2.0 useful for discovery based contexts we have to define how to submit the protected resource’s URI in an access token request as well as require that the produced access token include that URI as an audience claim. This is exactly the sort of proposals I and others are working on in order to enable OAuth 2.0 to support discovery.
</p>
<h1 class="Section">
<a class="toc" name="toc-Section-2">2</a> Reversing the attack - advertising the right token endpoint and the wrong protected resource
</h1>
<p>
While I was reviewing Eran’s proposed attack I wondered what would happen if we reversed the attack. In the reverse scenario an evil doer is targeting users of calendar.live.com and launches a phishing attack on them. Thanks to the phishing attack a user of calendar.live.com is fooled into asking their local calendar client to do discovery on https://calendar.evil.example.com. The returned discovery document says that its calendar endpoint (i.e. the protected resource) is https://calendar.live.com and its token endpoint is https://sts.evil.example.com. This is the reverse of the previous attack, in this case the token endpoint being advertised is correct (e.g. it’s the one owned by evil.example.com), it’s the protected resource location that is false.
</p>
<p>
The user uses their Google identity to log into calendar.live.com (a man can dream, can’t he?) and so the client will present the user’s Google credentials (or a token representing the same, it doesn’t matter) to Google’s token endpoint to get what’s called an ’on-behalf-of’ token targeted at https://calendar.live.com. 
</p>
<p>
This part of the attack will work just fine because in a discovery based world Google is used to issuing on-behalf-of tokens for audiences it knows nothing about. So Google will happily produce the on-behalf-of token with an audience of https://calendar.live.com which the client will then send to the advertised token endpoint, https://sts.evil.example.com. And now the trap is sprung, evil.example.com now has a 100% genuine Google signed and issued ’on-behalf-of’ token that it can replay to calendar.live.com’s real token endpoint and so get an access token for https://calendar.live.com and do whatever it wants to the user’s account.
</p>
<p>
The way to prevent the attack is to require that the request to Google and the issued on-behalf-of token contain both the protected resource address and the token endpoint address. Now if evil.example.com tries to replay the on-behalf-of token to Live’s token endpoint the token will be rejected because while its audience value is good (e.g. it’s for a protected resource the token endpoint belongs to) the token endpoint value won’t match and so the request will be rejected.
</p>
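<p>
As a sketch, the check Live's token endpoint would run on an on-behalf-of token (after verifying Google's signature, which is omitted here) might look like this in Python. The claim names and the address https://sts.live.example.com are hypothetical.
</p>

```python
def accept_on_behalf_of(claims, my_token_endpoint, my_protected_resources):
    """Accept only tokens addressed to both this token endpoint and one of its resources."""
    return (
        claims.get("aud") in my_protected_resources
        and claims.get("token_endpoint") == my_token_endpoint
    )


# What the fooled client asked Google for after reading the evil discovery document.
stolen = {
    "aud": "https://calendar.live.com",
    "token_endpoint": "https://sts.evil.example.com",
}
replay_works = accept_on_behalf_of(
    stolen, "https://sts.live.example.com", {"https://calendar.live.com"})
```

<p>
The audience matches but the token endpoint doesn't, so the replayed token is rejected.
</p>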
<p>
Much thanks to Eran for helping to unearth this attack.
</p>]]></content:encoded>
			<wfw:commentRss>http://www.goland.org/bearer-tokens-discovery-and-oauth-2-0/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

