Personal Web Services – The Unnecessary Business

When I think of, say, e-mail or IM or web pages or personalized home pages or single sign-on or address books, I realize that there is really no good excuse, certainly not in the long run, for these services to be delivered in a centralized fashion. Why can't I just have a little box off in a corner of my house, hooked up to my Internet connection, that provides these services for me? Why can't I have my own little e-mail, IM and web server that hosts all of my content? At the very least I would maintain my privacy, something I have no hope of maintaining when a third party hosts my content for me (even if the hosting service doesn't peek, doesn't leak and isn't hacked, I can safely assume the government is looking). But in addition to privacy I would also get better functionality, since my little box would be 'over provisioned' compared to the processing power allocated to each user in a typical centralized service.

It's kinda silly, really

From a technical perspective, having massive centralized personal services, in the specific cases I mention above, really doesn't make much sense. It is substantially harder to write one scary mondo centralized hosting system that can handle all the load and security issues than it is to design radically smaller systems meant to handle just a few users' worth of load.

The smaller systems would also likely provide more features than the centralized systems. Most centralized systems depend on ad revenue to pay the bills, so the focus is on getting ads in front of people's faces and keeping costs to the bare minimum. One of the biggest costs in running a centralized system is the care and feeding of the hardware, so there is a strong tendency to cut per-user functionality to the minimum in order to reduce hosting costs. But if my servers were running on a local machine, thanks to the insane power of even the weakest home systems, the server would effectively be 'over provisioned' and so able to provide all sorts of functionality that just isn't economical in a centralized system.

But wait, I thought Software as a Service was supposed to win?

I often run into the idea that software as a service is supposed to be about running stuff on centralized servers. But this is to confuse the ends with the means. The end of 'software as a service' is to make the experience of using a piece of software pleasant by relieving the user of all the nastiness of keeping the software running: no more corrupt registries, no more conflicting DLLs, etc. Currently the easiest way to reach this end is via a centralized service, but that is only because of some historical baggage having to do with most modern OSes not properly enforcing process separation.

There is no inherent reason why software can't be run as a service just because the hardware it runs on happens to be in someone's house rather than in an enormous data center. Think of an application like Firefox. Today it updates itself automatically. The only reason I even know it is updating itself is that it has to close and reopen after an update. But even that experience will probably get better in Firefox 2.0, when they build in the ability to close and reopen windows/tabs in the same state. My point is that, to a large extent, my experience of Firefox (or Smultron or Cyberduck or many other apps on my Mac) is that the software is essentially a service. It just shows up and it just works. But everything is running on my machine.

Nice idea, but how do we get there from here?

If I had to put money on which technology was most likely going to enable the box in the corner to provide all these services, I would bet on hypervisors like Xen. Hypervisors let one make a single machine (which could have many CPUs or cores; that's actually a separate issue) look like multiple physically separate machines. Each application can get its own virtual machine and do whatever the heck it pleases: load any OS, configure it any way it wants, crash in the most horrendous ways, and not bother any of the other virtual machines. The use of hypervisors radically simplifies software development because one has so much more control over the execution environment. This helps to neutralize one of the biggest advantages of a centralized system, the reduced test matrix. In a centralized system you only have to test your software against the OS/installation configuration you chose to use. With a hypervisor system you can largely replicate that same experience.

The other contender for the box (or boxes) in the corner is the application-specific box, à la iPod or TiVo. The idea is that hardware is so nutso cheap that companies will build dedicated boxes for their server applications, and instead of there being one box in the corner there will be a whole mess of them.

My guess is that the real answer will be 'all of the above'. Just as companies today produce dedicated NAT/firewall boxes, some companies will produce dedicated all-in-one 'personal' servers and others will produce cheap boxes running hypervisors that are just perfect for hosting personal 'server' functionality. Inevitably the boxes will get so cheap and small that you'll probably just carry them around in your pocket, but that's another story altogether.

But what if my Internet connection goes down or is slow and anyway how do I deal with DNS and such?

Most of the world is moving quite quickly to broadband (hence AOL's recent troubles) and from what I can tell most broadband companies do a decent enough job of keeping the line up and working. In fact, compared to the problems certain centralized services have in keeping themselves up (no names mentioned) my guess is that a typical broadband connection is effectively as reliable as most centralized systems from the perspective of an individual user.

The stickier problem is DNS handling. It's really nice to be able to go to a specific address from a web browser and get your e-mail. Most of us (myself included) are too cheap to pay for a permanent IP address and associate a DNS name with it. We usually get our addresses via DHCP. Thankfully the IETF solved the problem of how to associate a DNS name with a floating IP address nearly ten years ago – it's called DNS update.

Just as most folks have figured out how to get a telephone number (and many have even figured out how to get both a landline and a cell phone, not to mention a cable connection and an Internet connection), so they will figure out that they need their own DNS name, and the company they get to host that name will figure out how to use DNS update.
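To make "DNS update" a little more concrete, here is a minimal sketch, using only Python's standard library, of the message a home box could send after DHCP hands it a new address. It follows the layout of DNS UPDATE as specified in RFC 2136: opcode UPDATE, a zone section naming the zone with type SOA, and an update section that deletes the old A record set and adds the new address. The zone, host name, TTL and message ID are hypothetical. A real server will almost certainly require the message to be authenticated (for example with TSIG), which this sketch omits, and production code would normally use an existing DNS library rather than hand-built packets.

```python
import socket
import struct

TYPE_A = 1        # IPv4 address record
TYPE_SOA = 6      # the zone section of an UPDATE must use type SOA
CLASS_IN = 1
CLASS_ANY = 255   # used with TTL 0 and empty RDATA to mean "delete this RRset"
OPCODE_UPDATE = 5


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_update(zone: str, host: str, ip: str,
                 ttl: int = 300, msg_id: int = 0x1234) -> bytes:
    """Build an unsigned DNS UPDATE that points `host` at `ip`."""
    flags = OPCODE_UPDATE << 11  # QR=0 (request), opcode=UPDATE
    # ID, flags, ZOCOUNT=1, PRCOUNT=0, UPCOUNT=2, ADCOUNT=0
    header = struct.pack("!6H", msg_id, flags, 1, 0, 2, 0)
    zone_section = encode_name(zone) + struct.pack("!2H", TYPE_SOA, CLASS_IN)
    name = encode_name(host)
    # Remove whatever A records currently exist for the host...
    delete_old = name + struct.pack("!HHIH", TYPE_A, CLASS_ANY, 0, 0)
    # ...and add the address we were just given.
    rdata = socket.inet_aton(ip)
    add_new = name + struct.pack("!HHIH", TYPE_A, CLASS_IN, ttl, len(rdata)) + rdata
    return header + zone_section + delete_old + add_new


msg = build_update("example.net", "home.example.net", "203.0.113.7")
```

With a real setup you would send `msg` over UDP or TCP port 53 to the zone's primary server; whether it accepts the change depends entirely on that server's update policy.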

So does this mean we never need centralized systems?

At a minimum we will need backup services so if the house burns down all of our data doesn't go with it. But really there are lots of services that need centralization. Generally anything that needs ridiculous scale (e.g. needs to survive a Slashdotting) probably needs to be centralized. So currently this means that services ranging from search to map serving to shopping to collaborative editing (e.g. wikis) to media sites, etc. all need to be centralized. But many, many services, such as the ones I mentioned at the start of the article, don't gain much from centralization and we all lose a lot in terms of privacy and functionality when we unnecessarily centralize things.
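At its simplest, a backup service for a home box means shipping off-site whatever has changed since the last run. A minimal sketch, assuming the box keeps a manifest of SHA-256 digests from its previous backup; the manifest format and the function names are hypothetical and not taken from any particular backup product. It walks the data directory and returns only the files that are new or modified, which are the ones that would need uploading.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(root: Path, previous: dict[str, str]) -> dict[str, str]:
    """Return {relative path: digest} for files that are new or modified
    compared with the manifest `previous` from the last backup."""
    changed = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digest = file_digest(path)
            if previous.get(rel) != digest:
                changed[rel] = digest
    return changed
```

After a successful upload the box would merge the returned entries into its manifest. Note that this sketch does not notice deleted files; a fuller version would also report manifest entries that no longer exist on disk.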

And, just to head off an obvious conversation at the pass, none of this applies to enterprise service scenarios. There are lots of legal and technical reasons why a company providing e-mail or IM or web or other services to its employees will want to do so using centralized systems. This article is exclusively about consumers and their individual services. Enterprise is a different ball of wax.

But don't people want things for free?

It so happens that at the moment most centralized services are ad based and so nominally free, although I'd argue that, given the feature limitations and annoying ads, 'free' is a misnomer. But it doesn't matter: the monetization model and the location of the hardware are completely independent of each other. Whether you are getting your web-based e-mail from a local box or across the Internet, either system can serve you an ad. We shouldn't confuse how the services get paid for with where they are physically located.

Don't Panic

I'm sure that, to my colleagues in the services world, all this talk of many of our services being unnecessary or even counterproductive (at least in their current form) is more than a little worrisome. But I'm reminded of an experience I had when I was doing software standards work. The developers I was working with asked when the standards group would finish the standard so they knew when to ship their product. My answer was "There are only two meaningful answers to the question of when a standards body will finish: less than six months and I don't know." In a similar vein, my suspicion is that when it comes to consumer behavior the only meaningful answer to "when will this happen?" is either "It already is happening" or "I don't know". Although I could wave my hands about some interesting things happening with cable boxes, I'll instead say "I don't know", which means it probably won't happen tomorrow or the day after. So we have time. But the point of this article isn't to counsel panic. Instead it is to point out what I believe is a desirable and likely future direction for many of the services that we bank on today. So the moral of the story is: be prepared.

Conclusion

Thanks to broadband Internet connections, cheap hardware and hypervisors there really isn't much advantage to individual users getting their e-mail or IM or web pages or portals or single sign on or address books from centralized systems. But getting these services from centralized systems does entail a real cost both in terms of lost privacy and in terms of available functionality. Therefore I believe that in the long run we will start to see these services migrate to the edge. This doesn't mean the software won't be delivered as a service, it just means the service will be running on user controlled hardware.

Appendix

Why is having the software on your box any better, in privacy terms, than having it on a central system?

Because you can watch the box. The easiest case, from a privacy perspective, is if the software on the box is open source. In that case lots of folks can tear the code apart to find problems, and if history is any guide, they will. But even closed source software isn't so bad since (at least if it's popular) lots of folks will tear it apart anyway. It is also possible to watch the box and what it does: if it makes network connections to strange places, you can notice and find out why. With a central service you have zero control.

It is true, of course, that the 'service' aspect of the box complicates things, since who knows what's coming down in those updates. But you still have network monitoring, and in any case the situation is certainly no worse from a privacy perspective than with a central service, and it offers the potential of being much better.
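Watching the box can be as simple as comparing the destinations it actually talks to against the ones you expect it to talk to. A minimal sketch, assuming the observed (host, port) pairs come from something like a router log or a packet capture (collecting them is outside this sketch) and that the allowlist is maintained by hand; the function name and all host names are hypothetical.

```python
from typing import Iterable


def unexpected_connections(observed: Iterable[tuple[str, int]],
                           allowed: set[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return observed (host, port) destinations that are not on the
    allowlist, without duplicates and in the order first seen."""
    seen = set()
    flagged = []
    for dest in observed:
        if dest not in allowed and dest not in seen:
            seen.add(dest)
            flagged.append(dest)
    return flagged


allowed = {("updates.example.org", 443)}
observed = [
    ("updates.example.org", 443),
    ("tracker.example.com", 80),
    ("tracker.example.com", 80),
]
suspicious = unexpected_connections(observed, allowed)
```

Anything in `suspicious` is a prompt to go find out why the box is talking to that destination, which is exactly the kind of check that is impossible against a central service.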

If hardware is so cheap why don't centralized services benefit from it as much as home users?

The reason is what's known in economics as "marginal cost". The marginal cost of adding a new box to a data center is higher than the marginal cost of adding a new box to your home. In a data center, rack space has a defined price that someone has to pay. So does having someone take care of the box (even if it is just to occasionally throw it away), providing power, cooling, etc. So someone is getting a bill for each and every box and, speaking from experience, that bill isn't small, even for huge data centers.

For someone in a home, the marginal cost of powering, cooling, providing space for and servicing (read: throwing out) a few bits of extra hardware is near zero, because the person is already powering, cooling and providing space for humans who need a lot more of all three of those things than a small server box does. Obviously the situation can get out of control if too many computers are brought in (i.e. the marginal cost changes as a function of the number of systems involved), but for most people that isn't an issue.
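The comparison in the two answers above can be summarized symbolically. The cost terms are illustrative labels for the items named in the text, not measured figures, and $n$ is the number of boxes already in the home:

```latex
\begin{aligned}
MC_{\text{dc}} &= c_{\text{space}} + c_{\text{power}} + c_{\text{cooling}} + c_{\text{ops}} > 0 \\
MC_{\text{home}}(n) &\approx 0 \quad \text{for small } n \text{, rising as } n \text{ grows}
\end{aligned}
```

Every term in the data center line shows up on someone's bill per box, while in the home those same costs are already being paid for the people living there.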

Why don't you use operating system X to run software Y to give you personal service Z?

There is no question that someone dedicated enough could set up their own server box and run their own e-mail server or IM server (although who would you talk to?) or certainly their own web server (although without all the cool spaces functionality that the hip cats like), etc. But today it's really painful. I'd love to run my own e-mail server, but all the software I've run up against seems designed for large installations and requires all sorts of annoying configuration. Even blog software is a relatively high-maintenance pain. I think the real reason for the complexity is that most people running this software have large installations, so the software accepts extra complexity in exchange for flexibility and power. A quite reasonable trade-off. But if you are running a tiny system, all the things you need to learn quickly become oppressive. The good news is that once people design software explicitly for tiny installations, there are numerous shortcuts they can take that can radically simplify things.

2 thoughts on “Personal Web Services – The Unnecessary Business”

  1. I posted some thoughts along the same lines.
    I think that you missed some important trade-offs though. Personal services do get better privacy and more resources per user, but there are downsides: there is a maintenance cost (power, backups, ...) and a functionality cost (it's difficult to aggregate social data and learn from the crowd).
    Also, you mention that centralized/hosted services can handle the slashdot effect better. I rather think that P2P or hybrid systems (minimal server involvement, large edge caching and computation) would be a better fit.

  2. Given that the power needed by your average headless server is in the range of a lightbulb, I don't see power as an issue. As for backing up, you need that in any case for your main machine, so the extra load from the server is effectively zero.

    As for functionality, I must admit that I’m having trouble coming up with any loss of “social” data by having the specific services I listed in the article served privately.

    Take an address book as an example. It knows who the owner’s friends are and the server itself can trivially aggregate data from those friends and use it for ‘friends of friends’ functionality. In fact, it can do queries across the friends of friends data that a centralized system could never afford to calculate.

    As for handling the slashdot effect, there’s nothing magical about P2P. If someone is worried about the slashdot effect then by definition they are not serving up private data and therefore the entire article no longer applies.

    I think the key point here is that I am only talking about services such as the specific list I gave at the start of the article that are inherently private (or very small scale). If data isn’t private (which beyond 1 or 2 hops in a friend of friends system, it’s not) or isn’t small scale then my article doesn’t apply. But it turns out, as I believe my list shows, that lots and lots and lots of very interesting and very private data does fit into the examples.
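The friends-of-friends idea in the reply above can be sketched in a few lines. This assumes, hypothetically, that each friend's box shares a list of contact IDs with the owner's box; the post doesn't say how that exchange would work, and the function and parameter names are illustrative. The owner's address book server ranks people two hops away by how many of the owner's friends know them.

```python
from collections import Counter


def friends_of_friends(owner_contacts: set[str],
                       friend_lists: dict[str, set[str]],
                       owner_id: str) -> list[tuple[str, int]]:
    """Rank people two hops from the owner by number of mutual friends.

    owner_contacts: the owner's direct contacts.
    friend_lists:   each friend's shared contact list, assumed to have been
                    fetched from that friend's own box.
    """
    counts = Counter()
    for friend in owner_contacts:
        for candidate in friend_lists.get(friend, set()):
            # Skip the owner and anyone the owner already knows directly.
            if candidate != owner_id and candidate not in owner_contacts:
                counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


contacts = {"alice", "bob"}
shared = {
    "alice": {"me", "bob", "carol", "dave"},
    "bob": {"me", "alice", "carol"},
}
ranking = friends_of_friends(contacts, shared, "me")
```

Here `ranking` puts carol (known to both alice and bob) ahead of dave (known only to alice). Because the box serves a single owner, it can afford to rerun this over every friend's full list on each query.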
