Stuff Yaron Finds Interesting

Technology, Politics, Food, Finance, etc.

Portable Reputations – Reputations Want to Be Free!

Imagine if you could find a random store on the Internet and get reputation data about it regardless of where it had previously sold goods. Maybe it has a great reputation on Yahoo Shopping but a lousy reputation on EBay? Maybe it has an outstanding reputation on MySimon but hasn't really established itself yet on MSN Shopping where you happened to find it? Shouldn't we be able to find all the reputation data without having to be held hostage to some reputation intermediary? The same applies to reputations about people. Imagine if a website with a comment section could collect reputation data about a user from every site they have posted at and use that to figure out how much moderation, if any, it should apply to that person's posts. Why should someone have to start building their reputation from scratch on every website they go to? It's our reputation data, it's about us or it's generated by us and it's bloody time we set it free!

What is reputation?

In the simplest sense reputation is an assertion by one party about some other party. X says Y is Z. A classic example would be Standard & Poor's (S&P). When S&P gives a credit rating to some company they are saying "S&P says Company Y has credit rating Z". But reputation on the Internet tends to be a little fancier than this.

For example, let's pretend that we create a new reputation system: whenever someone wins an EBay auction and purchases an item, the buyer posts a publicly available, digitally signed statement saying that they bought an item from seller S and how they rated the seller. Of course the first thing black hats (read: bad people) will do is create a ton of fake assertions. Since the average person evaluating seller S doesn't know who the buyers are anyway, how would they know real from fake buyer feedback?

The short answer to this question is - this kind of fraud actually occurs on EBay. People set up fake accounts, generate a lot of purchase activity with them, use them to run up their reputation scores and then use their inflated reputations to steal money. But doing all of this takes quite a bit of effort so in reality EBay adds real value because in its reputation system there are actually two assertions being made, not one. The first assertion is, in effect, "Buyer B says that Seller S did a good Sale for transaction T". The second assertion is "EBay says that Buyer B and Seller S engaged in transaction T". Because the auction had to go through EBay, EBay can actually add value to the reputation system by validating that the discussed transaction did appear to occur. Yes, the transaction could still be fraudulent, but by inserting itself into the transaction EBay makes fraud harder and thus increases the value of reputation assertions based on transactions it mediated.
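To make the two-assertion idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class names, the fields, the +1/0/-1 rating scale and the is_backed helper are invented here and are not part of any EBay interface. Signature checking is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatingAssertion:
    """Buyer B says Seller S did a good (or bad) sale for transaction T."""
    buyer: str
    seller: str
    transaction_id: str
    rating: int  # hypothetical scale: +1 good, 0 neutral, -1 bad


@dataclass(frozen=True)
class TransactionAssertion:
    """A mediator says Buyer B and Seller S engaged in transaction T."""
    mediator: str
    buyer: str
    seller: str
    transaction_id: str


def is_backed(rating, attestations):
    """Return True if some mediator attests to the rated transaction."""
    return any(
        a.buyer == rating.buyer
        and a.seller == rating.seller
        and a.transaction_id == rating.transaction_id
        for a in attestations
    )


# Illustrative data only: one transaction the mediator vouches for.
attestations = [TransactionAssertion("ebay.com", "buyer-b", "seller-s", "T1")]
real = RatingAssertion("buyer-b", "seller-s", "T1", +1)
fake = RatingAssertion("sock-puppet", "seller-s", "T999", +1)

print(is_backed(real, attestations))  # True
print(is_backed(fake, attestations))  # False
```

A rating with no matching second assertion isn't necessarily fake, but nobody who was in a position to see the transaction is vouching for it, so a consumer of the data can weight it accordingly.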

How do we free reputations?

It's tempting to argue that we should come up with some system whereby a random person on the Internet can go to any random reputation intermediary and say "Tell me your reputation score with regard to quality X for entity Z" where "quality X" could be Karma, feedback score, positive dating reviews, whatever and where entity Z could be a website, person, movie, etc. But I think this misses the point. First, it puts us at the mercy of the reputation intermediary. What happens if that intermediary decides to hold back information or only publish information for people whose accounts are paid up or if the reputation intermediary just decides to be 'slow' that day to hurt its members who are trying to use their reputations somewhere else? Second, it restricts our use of our own reputations to the things that the reputation intermediary decides to expose.

For example, let's pretend that Slashdot decided to expose an interface such as described above. Now a Slashdot user goes to an OS X fan website saying "Let me post, here's my Slashdot Karma score". But it turns out that the person earned their Karma by moderating and posting on Physics related issues. Is that really the best way to measure how good a poster they would be on OS X? Or imagine that a user is buying a book from seller S and goes to Yahoo and says "tell me about S". It turns out that S actually has a great score, except, oh, that's right, they got a bunch of negative reviews when they sold books, it was everything else that they did well on. With a single score from Yahoo it would be difficult for the buyer to figure this out.

What we really want is to decouple the reputation information from the reputation intermediary. We want each assertion to be available, in its raw form, directly to the person seeking reputation information. That way the person can mine the reputation data for exactly what they need to know, not what the reputation intermediary decides to give them.
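Here is a small Python sketch of what mining the raw data might look like for the bookseller case. The record layout (seller, category, rating on a 0-5 scale) and the sample numbers are assumptions made up for this illustration; no real site publishes data in this form.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw assertions about seller S, one record per transaction.
assertions = [
    {"seller": "S", "category": "electronics", "rating": 5},
    {"seller": "S", "category": "electronics", "rating": 5},
    {"seller": "S", "category": "electronics", "rating": 5},
    {"seller": "S", "category": "electronics", "rating": 5},
    {"seller": "S", "category": "toys", "rating": 5},
    {"seller": "S", "category": "toys", "rating": 4},
    {"seller": "S", "category": "books", "rating": 1},
    {"seller": "S", "category": "books", "rating": 2},
]


def overall_score(items):
    """The single number an intermediary might choose to expose."""
    return mean(a["rating"] for a in items)


def score_by_category(items):
    """What a buyer can compute when the raw assertions are available."""
    buckets = defaultdict(list)
    for a in items:
        buckets[a["category"]].append(a["rating"])
    return {category: mean(ratings) for category, ratings in buckets.items()}


print(overall_score(assertions))               # 4 out of 5, looks fine
print(score_by_category(assertions)["books"])  # 1.5 out of 5, not so fine
```

Someone about to buy a book cares about the second number, and only the raw assertions let them get at it.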

This calls for a system where reputation assertions are made publicly available. They can be published all over the place. It doesn't matter where they are hosted or in how many different places they are hosted because they are self contained digitally signed values.
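One consequence of self-contained assertions is that copies picked up from different hosts can be recognized as the same assertion. The Python sketch below shows one way that might work, assuming (purely for illustration) that an assertion is a JSON object and that its identity is the SHA-256 digest of a canonical serialization. The field names are made up, the signature is treated as an opaque string, and signature verification, which a real system would need, is not shown.

```python
import hashlib
import json


def assertion_id(assertion):
    """Digest of a canonical JSON form, so key order and host don't matter."""
    canonical = json.dumps(assertion, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def merge_copies(*sources):
    """Combine assertion lists from several hosts, dropping duplicate copies."""
    unique = {}
    for source in sources:
        for assertion in source:
            unique.setdefault(assertion_id(assertion), assertion)
    return list(unique.values())


# The same assertion as found on two hosts, with different key order.
from_host_a = [{"issuer": "buyer-b", "subject": "seller-s", "rating": 5,
                "signature": "opaque-signature-bytes"}]
from_host_b = [{"signature": "opaque-signature-bytes", "rating": 5,
                "subject": "seller-s", "issuer": "buyer-b"},
               {"issuer": "buyer-c", "subject": "seller-s", "rating": 2,
                "signature": "another-opaque-signature"}]

print(len(merge_copies(from_host_a, from_host_b)))  # 2
```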

How do we identify who the players are in an assertion?

On EBay we know who the buyer and seller are via their EBay IDs. On Slashdot by their Slashdot IDs. But if we are going to have an open reputation system then we need an open ID. Or, I should say, OpenID. Using OpenID we can give buyers, sellers, people, etc. a globally unique ID to track them with so we aren't held hostage to any particular website's ID system. And we can identify companies the old fashioned way - DNS.

How do people find reputation data?

In theory every site or person that generates reputation assertions can just publish them anywhere they want on the web and other folks can find them using Internet search engines. But I have some trouble believing this will happen in practice. If nothing else Internet Search engines are more focused on what to ignore than what to retrieve so it's likely that a lot of reputation data would never be seen. My guess is that we will quickly find aggregators to whom reputation assertions can be posted and those aggregators will then be searchable.

For example, I suspect that eventually OpenID's proposed attribute exchange mechanism will turn into a generic data store of information about an OpenID user. So it wouldn't be a big deal to include an OpenID attribute whose value is a collection of all public trust assertions that particular user has ever made. The OpenID standard could then be extended to define how someone on the Internet can walk up to an OpenID provider and say "Give me all the trust assertions made by all of your users about entity X".

I'm also guessing that companies like iKarma, Opinity, RapLeaf, iOffer, etc. will try to figure out how to make it easy for people to publish reputation assertions to them. For example, people who are their own OpenID provider (e.g., they run the authentication software on their own machine) or who use more obscure providers wouldn't want reputation assertions they have made to be lost (especially the ones in which they say bad things about some company or person) so their OpenID provider would likely republish the reputation assertions to some 'bigger' reputation aggregator such as the previously named companies.

The point is that because the reputation assertions are open, transportable and free there is likely to be a reasonable number of reputation aggregators who will collect the information and try to add value on top of it. Then sites that want to gain access to or publish the data can do so by talking to the aggregators. We will need some fairly light standards for how to submit and retrieve reputation assertions but this is a constrained enough problem space that I'm guessing POST plus a little URL query magic will do the job nicely.
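As a rough idea of how light "POST plus a little URL query magic" could be, here is a Python sketch that builds (but does not send) a submission request and a retrieval URL. The aggregator endpoint, the "subject" and "issuer" query parameters and the JSON body are all assumptions invented for this example; no such standard exists yet.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical aggregator endpoint, used only for illustration.
AGGREGATOR = "https://aggregator.example/assertions"


def build_submit_request(assertion: dict) -> Request:
    """Prepare a POST that would submit one assertion to the aggregator."""
    body = json.dumps(assertion).encode("utf-8")
    return Request(
        AGGREGATOR,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_query_url(subject: str, issuer: Optional[str] = None) -> str:
    """Build a GET URL asking for assertions about a subject."""
    params = {"subject": subject}
    if issuer is not None:
        params["issuer"] = issuer  # optionally narrow to one issuer
    return AGGREGATOR + "?" + urlencode(params)


request = build_submit_request(
    {"issuer": "buyer-b", "subject": "seller-s.example", "rating": 5}
)
print(request.get_method())  # POST
print(build_query_url("seller-s.example"))
```

Nothing is sent over the network here; passing the request to urllib.request.urlopen would do that against a real endpoint.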

Who the heck is going to analyze all this data?

Let's imagine a black hat who wants to take down the whole reputation system. The first step would be to create a fake company F who supposedly is an auction site. Then generate a ton of fake OpenID accounts and start issuing bogus reputation assertions about real companies. Let's say the Black Hat particularly hates seller S. In that case the Black Hat will use the fake OpenID accounts to create a bunch of bad reputation assertions about transactions with Seller S and then create a bunch of fake assertions from company F validating that the transactions between the fake OpenID accounts and Seller S actually happened. Who would know? Who would realize that company F is completely bogus?

The point is, someone has to not just parse through all the trust assertions but they have to do so intelligently. That isn't easy. Which leads me to believe there is a bright market for reputation aggregators. For example, a price comparison service (à la BestWebBuys) is going to have to validate who it accepts reputation assertions from. Either it will have to visit company F's website to see if they are real or use algorithms to evaluate them (Gosh, every transaction from company F involves someone with the same OpenID provider, maybe they are bogus?). But in either case it's likely to be an aggregator who is actually processing this data. This will lead to its own competition as different aggregators evaluate the data in different ways and compete on how well they can come up with useful results.
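The "same OpenID provider" check is easy to sketch in Python. This is a toy heuristic, not a real fraud detector. It assumes OpenIDs are URLs and uses the host name of the identifier as a stand-in for the provider (which isn't always true, since an identifier can delegate to a provider elsewhere). The 0.9 threshold and the minimum sample size are arbitrary numbers picked for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def provider_of(openid):
    """Approximate the provider by the host name of the OpenID URL."""
    return urlsplit(openid).hostname or ""


def provider_concentration(buyer_openids):
    """Fraction of buyers that share the single most common provider."""
    providers = Counter(provider_of(o) for o in buyer_openids)
    if not providers:
        return 0.0
    top_count = providers.most_common(1)[0][1]
    return top_count / sum(providers.values())


def looks_suspicious(buyer_openids, threshold=0.9, min_samples=5):
    """Flag a mediator whose buyers nearly all come from one provider."""
    ids = list(buyer_openids)
    return len(ids) >= min_samples and provider_concentration(ids) >= threshold


# Buyers in transactions vouched for by company F vs. a more typical mix.
company_f_buyers = [f"https://fake-idp.example/u{i}" for i in range(10)]
mixed_buyers = [
    "https://idp-one.example/alice",
    "https://idp-two.example/bob",
    "https://idp-three.example/carol",
    "https://idp-one.example/dave",
    "https://idp-four.example/erin",
]

print(looks_suspicious(company_f_buyers))  # True
print(looks_suspicious(mixed_buyers))      # False
```

A determined black hat would simply spread the fake accounts over several providers, which is why aggregators would end up competing on how good their analysis is.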

Making sense in the Tower of Babel

So we can identify people using OpenID. We can publish reputation assertions via OpenID or via reputation aggregators. But how do we make the assertions themselves? Probably SAML or something similar will be used for the assertion syntax but what about the exact definition of what is being asserted? How many different kinds of information should be captured about a sales transaction? Overall happiness with the seller? Happiness with their customer service? Price? Shipping time? What scale should be used? 0-5? 1-10? How about a floating point number to three digits precision between 0 and 1? These are issues that people can make careers out of arguing in standards bodies.

So I suspect we'll skip it. Here's my suspicion of what's really going to happen. First, the whole world is not going to suddenly run out and get an OpenID. Instead some relatively major but not dominant market player, probably one with an ecosystem of sites it works with, will decide to have a go and what they will do is simply take whatever their existing identity system is and map it to OpenID. This will give all of their users an OpenID. But they won't necessarily become an OpenID provider in the traditional sense of validating logins. Instead they will use the OpenIDs so they can start to make assertions. And here is where it gets really fun. Rather than user A saying "I assert that I had a transaction with user B and the result was X" (which would require user A to have not just an OpenID but also a digital certificate), user A will use the site's own rating system. Then the site will say "I, the website, hereby assert that user A had a transaction with user B and that user A said that the result was X."

Think about that for a second because it's important. What just happened is that the website brought its existing ID and reputation system into the open and its users had to do exactly nothing. Of course this means that everyone else better really trust that site but trust in the site is critical anyway so I don't see that as a big deal. This will be the first crack. The ecosystem of affiliated sites will then start to share those reputation assertions and try to bring new affiliates in on that basis. Eventually as OpenID and the reputation system get more popular people will start to make their own direct reputation assertions instead of sites making assertions on their behalf, but one step at a time.

Of course there will be a bunch of these sites doing the same thing, all of them coming up with their own trust assertions and every one of them will be different. They will use different questions with different ratings and it will be a mess. And that's O.K. because the first thing that is going to happen is that one group of these sites is going to want to use reputation assertions from the other group so they will figure out how to map. Mapping one star system to another is no big deal; Rotten Tomatoes does it all the time. Mapping one set of questions to another is trickier but not insuperable. For example, one site may decide to take 5 questions from some other site's reputation assertions and calculate a single overall reputation score from them. There are many ways to skin that particular semantic cat.
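Both kinds of mapping can be sketched in a few lines of Python. The simplest approach, and only one of many, is a linear rescale between rating ranges plus an unweighted average over a set of answers. The function names, the 1-5 question scale and the sample answers are assumptions made for this illustration.

```python
def rescale(value, src_min, src_max, dst_min=0.0, dst_max=1.0):
    """Linearly map a rating from one scale onto another."""
    if src_max == src_min:
        raise ValueError("source scale must have a non-zero range")
    fraction = (value - src_min) / (src_max - src_min)
    return dst_min + fraction * (dst_max - dst_min)


def combined_score(answers, scale=(1, 5)):
    """Collapse several per-question ratings into one score between 0 and 1."""
    low, high = scale
    return sum(rescale(a, low, high) for a in answers) / len(answers)


# Map a 0-5 star rating of 4 onto a 1-10 scale.
print(rescale(4, 0, 5, 1, 10))

# Five hypothetical questions from another site, each answered on a 1-5 scale.
print(combined_score([5, 4, 3, 4, 4]))  # 0.75
```

A real mapping would probably weight the questions differently, or treat a missing answer differently from a low one, which is where the semantic arguments start.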

Then, as new sites come on line, they will want to take reputation assertions from existing sites but to do that they either have to use the same assertions as the existing sites or they can invent their own system but then have to solve the mapping problem. Eventually, I suspect, the system will settle down into some fairly small set of assertion formats and reasonably consistent ways of mapping between them. No, it's not clean, but it should work just fine in practice.

Where to from here?

From a standards perspective the work to be done is quite modest. Mostly just integrating a bunch of existing things. The real work will be in adoption. My guess is that the initial target market is websites that accept OpenID and use some kind of moderated comment system. That market would gain more benefit from shared reputation assertions than it would ever lose from lock-in. Blogs that allow comments would be another natural target market. But I do have hope that websites that do price comparisons would also want to get on board. I use a bunch of these sites and none of them has managed to get a critical mass of ratings for the sites whose prices they provide. This makes sense since the rating sites aren't really in a position to gather information from the user; in most cases the price comparison site doesn't even know who the user is. So they too would probably get more benefit than loss from being able to pool reputation information (such as it is) among themselves. From there I would imagine that someone looking to break EBay's stranglehold on auctions would probably try to give it a go. All the smaller auction sites probably have more to gain from shared reputations than they stand to lose. Eventually one of the bigger but not biggest shopping sites, looking for any kind of advantage, will probably crack and go open. That's the thing about open systems. They don't have to take over overnight or all at once. They just need to keep going and growing and eventually the gravity of their user base becomes irresistible.

2 Responses to Portable Reputations – Reputations Want to Be Free!

  1. Your last paragraph, around adoption, is the most problematic. Many other issues can be figured out as we go (assertion format, aggregation, …), but if there is no adoption in the first place then we’re stuck.

    The problem with adoption as you describe it is: what incentive is there for any single website to start publishing this data?

    The data is definitely valuable to the group, as each benefits from the data published by others.
    It’s a prisoner’s dilemma.

    Tim O’Reilly argues that many web 2.0 businesses rely on “data inside”, using data as an advantage to lock the user in and attract more users. It seems counter-intuitive for such businesses (ebay, amazon, …) to release some of their control.

  2. Administrator says:

    I can imagine a number of ways adoption will happen. The first step is getting the standards in place and getting easy open source implementations that sites can adopt to make it easy to create and publish reputation assertions.

    The next step is to find a community with an incentive to publish assertions. I agree that the big players (e.g. Amazons, EBays, etc.) won’t use the system. But imagine you are one of the numerous smaller auction sites trying to solicit sellers. Being able to publish reputation data in a way that the seller can use to build a reputation on other auction sites would be a competitive advantage that would cost the small auction site nothing. And when you are competing against EBay you need every advantage you can possibly get.

    Similarly one can imagine price comparison websites also using reputation assertion publishing services since their goal is to give their users as much information as possible about the sellers and since each of them collects relatively little data banding together to share reputation data helps everyone. Again, this has no real cost to the comparison pricing websites and it benefits all of them since none of them yet has a very large reputation database.

    I can also see blogs adopting a reputation system. I can identify other blogs I trust and say “if someone got a post accepted by any of the following blogs then let that person post on my blog”. You can easily see services like Akismet using reputation data to mark bad comments as well.

    So I think there are a number of on-line communities that would be interested in adopting a portable reputation system but none of them is interested enough to do the work themselves. So the effort has to be kick started first.
