Monday, October 5, 2009

Chord

As mentioned in the previous blog entry, searching for data items is a central issue in peer-to-peer systems. Several approaches have been suggested to deal with this problem. The paper “Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications”[1] focuses on one of them.

The article discussed a simple yet effective means for locating data items in a dynamic peer-to-peer system. Aside from its simplicity, the provability of its correctness and performance distinguish it from other lookup protocols.

Overview
The foundation of Chord’s simplicity lies in its consistent hash function that makes use of a base hash function such as SHA-1. Chord’s consistent hash function is both used in assigning to nodes and keys an m-bit identifier, where m is large enough to ensure that the event of having 2 nodes or keys sharing the same identifier is highly improbable (though the use of SHA-1 is already a measure for enforcing deterministic results, unless explicitly modified).

This approach can be viewed as a circle having 2^m identifiers, ranging from 0 to 2^m – 1, and arranged in ascending order. This circle is called the identifier circle. Nodes are plotted in the circle based on its IP address, while keys – which refer to their hash function values in this context – are assigned to nodes with the nearest identifier, starting from the value of the keys themselves (i.e. the identifiers having the same value as the keys). The node where a key is assigned is called the successor node of key k or simply successor(k). To maintain the consistency produced by the hashing function, certain keys are reassigned to a different node when nodes enter and leave the network. When a node enters a network, some of the keys belonging to its successor, specifically those whose values are less than or equal to the new node’s identifier, will be assigned to it. When a node leaves a network, all of its keys will be assigned to its successor.

Searching for a key, which is mapped to a data item or an address, is facilitated by maintaining routing information, stored in what is called as the finger table, in each node in the Chord circle. The finger table contains a list of nodes, their successors, and the range that they cover. Having a single entry in the finger table that denotes the successor of the current node would have already been sufficient for searching a key in the circle, but this may take up to n nodes to search for a key. To reduce the number of steps needed for searching, a node stores up to m entries in the finger table. This is the same as having logn entries in a node, thus routing information remains relatively small even for large networks. Having additional routing information provides the means to search for the nearest node if the current one doesn’t contain the key. This allows Chord to perform look up in just logn (application level) hops, which means its lookup also scales well for large networks.

Changes in the Network
It is understandable, however, that nodes may join and leave a network at any given time. To maintain its capacity to search for keys, Chord provides operations to ensure that each node’s successor is updated and successor(k) returns the correct value. Maintaining a correct finger table is also ideal, since this will enforce the same lookup speed of logn. These processes are simplified by adding a predecessor pointer to each node.

When a new node enters the network, its finger table and predecessor node are initialized. All other nodes update their finger tables and predecessors to reflect the existence of the new node. Key associations are also updated. Frequent corrections of finger tables, though, are costly, so an alternative is to use a stabilization procedure that runs periodically to verify each node’s successor and adjust its finger table.

When a node n fails, each node that contains n in its finger table must replace n with its successor. This is done by searching for the first active entry in each node’s successor list, which contain a node’s r nearest successors. Periodic stabilization will correct each node’s finger table.

Related Works
There are other similar approaches that use Distributed Hash Table (DHT) for searching just like Chord. I’ll cite two of them: Content Addressable Network (CAN)[2] and Tapestry[3].

CAN makes use of a d-dimensional coordinate system to plot network nodes and keys. Just like Chord, it makes use of a consistent hash function such as SHA-1 to determine the coordinate space of nodes and keys. Though, unlike Chord, the size of each node’s routing table is based on d, and it remains fixed even if the size of the network increases. For each node, it only has to maintain information about its immediate neighbors. The number of steps needed for searching scales relatively well also, as this is asymptotic to O(n^(1/d)). Note, however, that if d is 2 (resulting to a 2-dimensional coordinate space), the number of hops needed for searching is O(n^(1/2)) or O(√n), which grows faster than O(logn). In order for CAN to match the speed of Chord’s searching routine, it has to increase the number of dimensions of its coordinate space or the number of coordinate spaces (each one called a “reality”), resulting to a complex system.

In Tapestry, each node in the network is assigned a nodeID from a large namespace by also using a consistent hash function (e.g. SHA-1). The same mechanism is also used for assigning application-specific endpoints GUIDs. Each identifier G is assigned to a live node that has an ID equal to G. This node becomes the root of G or GR. To allow the routing of messages, each node maintains a routing table consisting of nodeIDs and the IP addresses of their neighbors. This routing table is divided into multiple levels, with each level matching a prefix up to a certain digit position. Messages are passed by selecting nodes whose ID’s have prefixes that are closer to G. With each hop, the number of digits that are similar to G are increasing until it reaches its destination. This method ensures that any existing nodes in the system can be reached in at most logBN hops (application level), where N is the size of the namespace and B is the base of the IDs. The structure of Tapestry can be viewed as a graph of connected nodes, with each node that is a root of an identifier having a spanning tree that allows messages from the leaf nodes to be routed towards the root. While Tapestry can perform lookups faster than Chord, it also has a more complex system than Chord.

Analysis
One property that stands out for Chord is its simplicity despite its facility for fast lookups and its relatively small routing tables. This makes Chord ideal for large networks not only because of its scalability, but because of its simple mechanisms that are suitable even for limited resources.

Though the size of its routing table increases with the network size, a size of logn is still small, even if the number of nodes in the network is almost equal to the current human population (31 entries for 4 billion nodes). Its load balancing also ensures that nodes will generally carry the same burden, reducing the risk of nodes having responsibilities that are more than their capacity.

Its lookup speed of logn steps is also impressive. Other P2P overlays can match or outperform Chord’s lookup speed, but at the expense of implementing a more complex system (e.g. CAN and Tapestry). From a reader’s perspective, Chord is a practical solution for the real world problems faced by P2P systems.

References:
[1] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. ACM SIGCOMM ‘01. August 2001.
[2] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A Scalable Content Addressable Network. ACM SIGCOMM ‘01. August 2001.
[3] Ben Zhao, Ling Huang, Jeremy Stribling, Sean Rhea, Anthony Joseph, and John Kubiatowicz. Tapestry: A Resilient Global-Scale Overlay for Service Deployment. IEEE Journals on Selected Areas in Communications, Vol. 22, No. 1. January 2004.

Looking Up Data in P2P Systems

These days, peer-to-peer systems are a staple for people who are constantly scouring the Internet for files that serve their interests (songs, movies, TV shows, EBooks, etc). They have become a universal tool for sharing and grabbing files, legal or otherwise. Their popularity is driven not only by their convenience, but also of their notorious capability to source some of the hardest to find data, like MP3s that one may not be able download even through the Web itself.

There are more to P2P systems, though, than what they are famous (or infamous) for. Taken for granted is the fact that they are also a product of rigorous research and development. The short article titled “Looking Up Data in P2P systems”[1] presents this other side of P2P systems, which is more academic in nature. Its focus is the problem that is at the heart of P2P systems: the lookup problem – how to search for data across distributed and decentralized systems in a scalable manner.

Previous Approaches to the Lookup Problem
The article presented some previous approaches to the lookup problem.

First off is the approach used by the once popular Napster. It made use of a central database for mapping file names to the location of the servers that store them. The problem here is that, if the central database fails, the entire system will also fail.

One approach that uses hierarchy is the name lookups for the Internet’s Domain Name System (DNS). Searching is done by the traversing the path from the root up to the node that contains the target data. This approach, though, also suffers from the similar problem faced by Napster. If the root or any of the nodes positioned at the higher levels of the hierarchy fails, then a large part of the system will also probably fail. These higher level nodes also carry more load than the leaf nodes.

Because of the problems associated with hierarchal structures, the article noted that some P2P systems went on to use symmetric lookup algorithms. In this kind of structure, each node carries the same amount of burden, which mitigates the effect of losing some nodes due to failure.

An example of this scheme is the forwarding of broadcast messages until a request reaches its destination. This approach, though, is costly and doesn’t scale well. Gnutella uses an approach similar to this, but with provisions to avoid request loops.

FastTrack’s P2P platform included “superpeers” in a hierarchal structure to handle the scaling problem. But then again, adaptability to failures is compromised just like in the case of DNS, aside from the fact that successful retrieval is not guaranteed. KaZaA is one of the applications that popularized this platform.

Freenet also forwards request node per node until the target object is reached. Its searching is based on unstructured routing tables that are created through caching. It avoids mapping of documents to a predictable server to provide anonymity. This approach, however, may cause unpopular documents to be left unassigned to any server. Searching may be costly also, and successful results are not guaranteed.

Distributed Hash Table (DHT)
As an alternative to structured and symmetric schemes, some of the recent P2P algorithms (e.g. CAN, Chord, Kademlia, Pastry, Tapestry, and Viceroy) use a combination of both these schemes, resulting to a system that offers guarantees, while not being vulnerable to some node failures.

These new P2P systems offer a simple approach for the lookup problem through the use of Distributed Hash Tables (DHTs).

DHTs use only a single function: lookup(key). This returns the network address that is mapped to the given key. Mapping is made possible through the use of a base hash function such as SHA-1. Since the result of using a hash function is unique for a given input, then mapping to network locations are consistent.

Here are some issues that have to be addressed when implementing DHTs to lookup algorithms:

Mapping keys to nodes in a load-balanced way. Keys are stored to one or two nodes whose IDs are “close” to the key, based from the given ID space.

Forwarding a lookup for a key to an appropriate node. Nodes must be able to forward queries to nodes that are “closer” to the key identifier being searched.

Distance function. Schemes that make use of DHTs must have a definition for “closeness” so requests can be forwarded until they reach their destination.

Building routing tables adaptively. Nodes must be able to update their routing tables during nodes joins and failures to maintain correct forwarding of messages.

Routing in One Dimension
One of the things that make the various algorithms that use DHTs different from one another is the structure that they implement to allow O(logN) lookups.

For one dimensional structures, examples are Chord, Kademlia, Pastry, Tapestry, and Viceroy. Chord uses a skiplist (or ring) like data structure. Kademlia, Pastry, and Tapestry have nodes that maintain a tree-like data structure. Viceroy uses a butterfly data structure.

Chord: Skiplist-like routing.
(Please refer to the succeeding blog for a more detailed discussion about Chord.)

Tree-like Routing
Nodes in algorithms that use tree-like structures maintain information about nodes under specific prefixes and their location. Pastry, Tapestry (please refer to the succeeding blog for a brief discussion about Tapestry), and Kademlia uses this kind of structure.

Pastry, for instance, provides its nodes with a randomly chosen ID that determines their position on an identifier circle. Each node maintains a set of nodes that are closest to it (called a leaf set). To search for a key, nodes are checked if the requested keys are under their leaf set. If not, messages are routed to the node with the ID that is numerically closest to the key being requested.

Routing in Multiple Dimensions
An example of this approach is CAN (please refer to the succeeding blog for a brief discussion about CAN).

Analysis and Feedback
The article that was discussed here provided a clear overview of the lookup problem in P2P systems. By citing previous approaches to this problem, including their shortcomings, it emphasized the relevance of addressing this issue. Indeed, with P2P systems’ growing user base, there’s no arguing that the feature that makes these systems appealing to mass consumers should be given its due attention.

An indication that the lookup problem is now being given a closer look is the recent wave of P2P lookup algorithms that were presented by the article. An interesting feature of these new approaches is their possession of both structure and symmetry. Even more interesting is the fact that at the core of their implementation is a simple concept called the DHT.

For me, I find DHTs an appealing tool to the said problem because it seems like the ideal abstraction for situations that concern the mapping of unique identities, such as in the case of assigning a data item to a specific network resource. This makes searching determinate: given a query, either you know exactly where to find it or you know that the query will fail.

Of course, the use of DHTs is definitely not the only way to solve the lookup problem. There might be more complex structures out there that can achieve the same thing. But with its simplicity, it seems like a perfect match for a problem that, conceptually speaking, is simple too.

References:
[1] Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. Looking Up Data in P2P Systems. Communications of the ACM, Vol. 46, No. 2. February 2003.