Thursday, October 30, 2008

Looking Up Data in P2P Systems

This article begins with a discussion of centralized vs. non-centralized indexing structures in P2P systems. Entirely centralized systems, such as Napster, contain a single point of failure. This is the fundamental problem that subsequent systems aimed to solve. The article then moves on to several symmetric, structured systems. The authors contrast Chord's circular, skiplist-like routing scheme with the tree-based routing schemes used by Pastry, Kademlia, and Tapestry, and with CAN's d-dimensional space-partitioning abstraction (which makes intuitive sense to those who are comfortable thinking spatially).
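To make the "circular skiplist" idea concrete, here is a rough Python sketch of Chord-style lookup. It is not code from the article: the names `Ring`, `ring_id`, and `M` are mine, the 32-bit identifier space is an arbitrary choice, and the sketch cheats by keeping a global sorted list of node ids, whereas a real Chord node knows only its own successor and finger table.

```python
import hashlib
from bisect import bisect_left

M = 32  # identifier bits (assumption for this sketch)


def ring_id(name: str) -> int:
    """Hash a node name or key onto the identifier circle [0, 2**M)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], allowing wrap-around."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # wrapped interval; a == b covers the whole ring


class Ring:
    """Global view of the ring, used only to simulate per-node state."""

    def __init__(self, node_names):
        self.ids = sorted({ring_id(n) for n in node_names})

    def successor(self, ident: int) -> int:
        """First node id at or clockwise after ident."""
        i = bisect_left(self.ids, ident % (2 ** M))
        return self.ids[i % len(self.ids)]

    def finger_table(self, node_id: int):
        """Finger k points at the successor of node_id + 2**k."""
        return [self.successor((node_id + 2 ** k) % 2 ** M) for k in range(M)]

    def lookup(self, start_id: int, key_id: int):
        """Route from start_id toward key_id; return (owner, hops)."""
        current, hops = start_id, 0
        while True:
            succ = self.successor((current + 1) % 2 ** M)
            if in_interval(key_id, current, succ):
                return succ, hops
            # Jump to the farthest finger that still precedes the key.
            nxt = succ
            for f in reversed(self.finger_table(current)):
                if f != key_id and in_interval(f, current, key_id):
                    nxt = f
                    break
            current, hops = nxt, hops + 1


ring = Ring([f"node{i}" for i in range(50)])
owner, hops = ring.lookup(ring.ids[0], ring_id("some-song.mp3"))
print(f"key is owned by node {owner}, reached in {hops} hops")
```

The fingers are the skiplist part: each hop at least halves the remaining distance to the key's predecessor, so the hop count grows with the logarithm of the ring size rather than with the number of nodes.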

The authors make a largely unfounded claim that P2P systems are useful for a lot more than illegal music sharing. However, for the most part, P2P systems haven't seen widespread adoption for problems other than illegal file sharing. (You nerds in the back of the room are quick to point out that BitTorrent is used to distribute various flavors of Linux, but that was an already solved problem. Sit back down.) Why?

To rephrase, why do the LARGEST distributed systems in the world, those run by Internet giants like Google and Yahoo, still run software stacks (such as MapReduce, Chubby, etc.) that are engineered to contain a small number of points of failure? I say it is because we have come to realize that the performance and complexity cost of making our systems entirely symmetrical is not justified. Instead, we use replication and pseudo-distribution of mission-critical centralized services to provide resiliency.

The only time we find value in a fully symmetric system is when we don't want, or can't have, a centralized authority. However, for any legitimate business use, we DO have a centralized authority: the business itself. This leaves a very narrow market for the justified use of these symmetrical systems, i.e., illegal or ad-hoc file sharing.

Before this, I hadn't read about any DHTs other than Chord, so I appreciated the abbreviated introduction to such a variety of other DHTs.

2 comments:

Ari Rabkin said...

That's a little harsh. If you need very high availability, and distrust configuration, symmetric ad-hoc systems make sense. That's why Dynamo does it.

Andy Konwinski said...

Yes, I concede that Dynamo is an example of a large computer engineering company really using a DHT.