The Ping Panther

Monday, October 5, 2009

Chord

As mentioned in the previous blog entry, searching for data items is a central issue in peer-to-peer systems. Several approaches have been suggested to deal with this problem. The paper “Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications”[1] focuses on one of them.

The article discussed a simple yet effective means for locating data items in a dynamic peer-to-peer system. Aside from its simplicity, the provability of its correctness and performance distinguish it from other lookup protocols.

Overview
The foundation of Chord’s simplicity lies in its consistent hash function that makes use of a base hash function such as SHA-1. Chord’s consistent hash function is both used in assigning to nodes and keys an m-bit identifier, where m is large enough to ensure that the event of having 2 nodes or keys sharing the same identifier is highly improbable (though the use of SHA-1 is already a measure for enforcing deterministic results, unless explicitly modified).

This approach can be viewed as a circle having 2^m identifiers, ranging from 0 to 2^m – 1, and arranged in ascending order. This circle is called the identifier circle. Nodes are plotted in the circle based on its IP address, while keys – which refer to their hash function values in this context – are assigned to nodes with the nearest identifier, starting from the value of the keys themselves (i.e. the identifiers having the same value as the keys). The node where a key is assigned is called the successor node of key k or simply successor(k). To maintain the consistency produced by the hashing function, certain keys are reassigned to a different node when nodes enter and leave the network. When a node enters a network, some of the keys belonging to its successor, specifically those whose values are less than or equal to the new node’s identifier, will be assigned to it. When a node leaves a network, all of its keys will be assigned to its successor.

Searching for a key, which is mapped to a data item or an address, is facilitated by maintaining routing information, stored in what is called as the finger table, in each node in the Chord circle. The finger table contains a list of nodes, their successors, and the range that they cover. Having a single entry in the finger table that denotes the successor of the current node would have already been sufficient for searching a key in the circle, but this may take up to n nodes to search for a key. To reduce the number of steps needed for searching, a node stores up to m entries in the finger table. This is the same as having logn entries in a node, thus routing information remains relatively small even for large networks. Having additional routing information provides the means to search for the nearest node if the current one doesn’t contain the key. This allows Chord to perform look up in just logn (application level) hops, which means its lookup also scales well for large networks.

Changes in the Network
It is understandable, however, that nodes may join and leave a network at any given time. To maintain its capacity to search for keys, Chord provides operations to ensure that each node’s successor is updated and successor(k) returns the correct value. Maintaining a correct finger table is also ideal, since this will enforce the same lookup speed of logn. These processes are simplified by adding a predecessor pointer to each node.

When a new node enters the network, its finger table and predecessor node are initialized. All other nodes update their finger tables and predecessors to reflect the existence of the new node. Key associations are also updated. Frequent corrections of finger tables, though, are costly, so an alternative is to use a stabilization procedure that runs periodically to verify each node’s successor and adjust its finger table.

When a node n fails, each node that contains n in its finger table must replace n with its successor. This is done by searching for the first active entry in each node’s successor list, which contain a node’s r nearest successors. Periodic stabilization will correct each node’s finger table.

Related Works
There are other similar approaches that use Distributed Hash Table (DHT) for searching just like Chord. I’ll cite two of them: Content Addressable Network (CAN)[2] and Tapestry[3].

CAN makes use of a d-dimensional coordinate system to plot network nodes and keys. Just like Chord, it makes use of a consistent hash function such as SHA-1 to determine the coordinate space of nodes and keys. Though, unlike Chord, the size of each node’s routing table is based on d, and it remains fixed even if the size of the network increases. For each node, it only has to maintain information about its immediate neighbors. The number of steps needed for searching scales relatively well also, as this is asymptotic to O(n^(1/d)). Note, however, that if d is 2 (resulting to a 2-dimensional coordinate space), the number of hops needed for searching is O(n^(1/2)) or O(√n), which grows faster than O(logn). In order for CAN to match the speed of Chord’s searching routine, it has to increase the number of dimensions of its coordinate space or the number of coordinate spaces (each one called a “reality”), resulting to a complex system.

In Tapestry, each node in the network is assigned a nodeID from a large namespace by also using a consistent hash function (e.g. SHA-1). The same mechanism is also used for assigning application-specific endpoints GUIDs. Each identifier G is assigned to a live node that has an ID equal to G. This node becomes the root of G or GR. To allow the routing of messages, each node maintains a routing table consisting of nodeIDs and the IP addresses of their neighbors. This routing table is divided into multiple levels, with each level matching a prefix up to a certain digit position. Messages are passed by selecting nodes whose ID’s have prefixes that are closer to G. With each hop, the number of digits that are similar to G are increasing until it reaches its destination. This method ensures that any existing nodes in the system can be reached in at most logBN hops (application level), where N is the size of the namespace and B is the base of the IDs. The structure of Tapestry can be viewed as a graph of connected nodes, with each node that is a root of an identifier having a spanning tree that allows messages from the leaf nodes to be routed towards the root. While Tapestry can perform lookups faster than Chord, it also has a more complex system than Chord.

Analysis
One property that stands out for Chord is its simplicity despite its facility for fast lookups and its relatively small routing tables. This makes Chord ideal for large networks not only because of its scalability, but because of its simple mechanisms that are suitable even for limited resources.

Though the size of its routing table increases with the network size, a size of logn is still small, even if the number of nodes in the network is almost equal to the current human population (31 entries for 4 billion nodes). Its load balancing also ensures that nodes will generally carry the same burden, reducing the risk of nodes having responsibilities that are more than their capacity.

Its lookup speed of logn steps is also impressive. Other P2P overlays can match or outperform Chord’s lookup speed, but at the expense of implementing a more complex system (e.g. CAN and Tapestry). From a reader’s perspective, Chord is a practical solution for the real world problems faced by P2P systems.

References:
[1] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. ACM SIGCOMM ‘01. August 2001.
[2] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A Scalable Content Addressable Network. ACM SIGCOMM ‘01. August 2001.
[3] Ben Zhao, Ling Huang, Jeremy Stribling, Sean Rhea, Anthony Joseph, and John Kubiatowicz. Tapestry: A Resilient Global-Scale Overlay for Service Deployment. IEEE Journals on Selected Areas in Communications, Vol. 22, No. 1. January 2004.

Looking Up Data in P2P Systems

These days, peer-to-peer systems are a staple for people who are constantly scouring the Internet for files that serve their interests (songs, movies, TV shows, EBooks, etc). They have become a universal tool for sharing and grabbing files, legal or otherwise. Their popularity is driven not only by their convenience, but also of their notorious capability to source some of the hardest to find data, like MP3s that one may not be able download even through the Web itself.

There are more to P2P systems, though, than what they are famous (or infamous) for. Taken for granted is the fact that they are also a product of rigorous research and development. The short article titled “Looking Up Data in P2P systems”[1] presents this other side of P2P systems, which is more academic in nature. Its focus is the problem that is at the heart of P2P systems: the lookup problem – how to search for data across distributed and decentralized systems in a scalable manner.

Previous Approaches to the Lookup Problem
The article presented some previous approaches to the lookup problem.

First off is the approach used by the once popular Napster. It made use of a central database for mapping file names to the location of the servers that store them. The problem here is that, if the central database fails, the entire system will also fail.

One approach that uses hierarchy is the name lookups for the Internet’s Domain Name System (DNS). Searching is done by the traversing the path from the root up to the node that contains the target data. This approach, though, also suffers from the similar problem faced by Napster. If the root or any of the nodes positioned at the higher levels of the hierarchy fails, then a large part of the system will also probably fail. These higher level nodes also carry more load than the leaf nodes.

Because of the problems associated with hierarchal structures, the article noted that some P2P systems went on to use symmetric lookup algorithms. In this kind of structure, each node carries the same amount of burden, which mitigates the effect of losing some nodes due to failure.

An example of this scheme is the forwarding of broadcast messages until a request reaches its destination. This approach, though, is costly and doesn’t scale well. Gnutella uses an approach similar to this, but with provisions to avoid request loops.

FastTrack’s P2P platform included “superpeers” in a hierarchal structure to handle the scaling problem. But then again, adaptability to failures is compromised just like in the case of DNS, aside from the fact that successful retrieval is not guaranteed. KaZaA is one of the applications that popularized this platform.

Freenet also forwards request node per node until the target object is reached. Its searching is based on unstructured routing tables that are created through caching. It avoids mapping of documents to a predictable server to provide anonymity. This approach, however, may cause unpopular documents to be left unassigned to any server. Searching may be costly also, and successful results are not guaranteed.

Distributed Hash Table (DHT)
As an alternative to structured and symmetric schemes, some of the recent P2P algorithms (e.g. CAN, Chord, Kademlia, Pastry, Tapestry, and Viceroy) use a combination of both these schemes, resulting to a system that offers guarantees, while not being vulnerable to some node failures.

These new P2P systems offer a simple approach for the lookup problem through the use of Distributed Hash Tables (DHTs).

DHTs use only a single function: lookup(key). This returns the network address that is mapped to the given key. Mapping is made possible through the use of a base hash function such as SHA-1. Since the result of using a hash function is unique for a given input, then mapping to network locations are consistent.

Here are some issues that have to be addressed when implementing DHTs to lookup algorithms:

Mapping keys to nodes in a load-balanced way. Keys are stored to one or two nodes whose IDs are “close” to the key, based from the given ID space.

Forwarding a lookup for a key to an appropriate node. Nodes must be able to forward queries to nodes that are “closer” to the key identifier being searched.

Distance function. Schemes that make use of DHTs must have a definition for “closeness” so requests can be forwarded until they reach their destination.

Building routing tables adaptively. Nodes must be able to update their routing tables during nodes joins and failures to maintain correct forwarding of messages.

Routing in One Dimension
One of the things that make the various algorithms that use DHTs different from one another is the structure that they implement to allow O(logN) lookups.

For one dimensional structures, examples are Chord, Kademlia, Pastry, Tapestry, and Viceroy. Chord uses a skiplist (or ring) like data structure. Kademlia, Pastry, and Tapestry have nodes that maintain a tree-like data structure. Viceroy uses a butterfly data structure.

Chord: Skiplist-like routing.
(Please refer to the succeeding blog for a more detailed discussion about Chord.)

Tree-like Routing
Nodes in algorithms that use tree-like structures maintain information about nodes under specific prefixes and their location. Pastry, Tapestry (please refer to the succeeding blog for a brief discussion about Tapestry), and Kademlia uses this kind of structure.

Pastry, for instance, provides its nodes with a randomly chosen ID that determines their position on an identifier circle. Each node maintains a set of nodes that are closest to it (called a leaf set). To search for a key, nodes are checked if the requested keys are under their leaf set. If not, messages are routed to the node with the ID that is numerically closest to the key being requested.

Routing in Multiple Dimensions
An example of this approach is CAN (please refer to the succeeding blog for a brief discussion about CAN).

Analysis and Feedback
The article that was discussed here provided a clear overview of the lookup problem in P2P systems. By citing previous approaches to this problem, including their shortcomings, it emphasized the relevance of addressing this issue. Indeed, with P2P systems’ growing user base, there’s no arguing that the feature that makes these systems appealing to mass consumers should be given its due attention.

An indication that the lookup problem is now being given a closer look is the recent wave of P2P lookup algorithms that were presented by the article. An interesting feature of these new approaches is their possession of both structure and symmetry. Even more interesting is the fact that at the core of their implementation is a simple concept called the DHT.

For me, I find DHTs an appealing tool to the said problem because it seems like the ideal abstraction for situations that concern the mapping of unique identities, such as in the case of assigning a data item to a specific network resource. This makes searching determinate: given a query, either you know exactly where to find it or you know that the query will fail.

Of course, the use of DHTs is definitely not the only way to solve the lookup problem. There might be more complex structures out there that can achieve the same thing. But with its simplicity, it seems like a perfect match for a problem that, conceptually speaking, is simple too.

References:
[1] Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. Looking Up Data in P2P Systems. Communications of the ACM, Vol. 46, No. 2. February 2003.

Friday, July 10, 2009

Unlocking the Internet’s Secrets

The paper by Lixin Gao, titled “On Inferring Autonomous System Relationships in the Internet,”[1] discusses an algorithm that her group has developed for determining the relationships of various ASs. Based on her account, there are no available public records specifying inter-AS relationships. What exist, though, are publicly available BGP routing tables and these served as their source of data for inferring relationships.

Their approach uses an annotated graph to represent ASs and their relationships. Nodes represent ASs and edges represent relationships, which they classified as provider-to-customer, customer-to-provider, peer-to-peer, and sibling-to-sibling.

Export Policies
Vital to the design of their algorithm are their derived BGP export policies [1]:

“Exporting to a provider: In exchanging routing information with a provider, an AS can export its routes and its customer routes, but usually does not export its provider or peer routes.

Exporting to a customer: In exchanging routing information with a customer, an AS can export its routes and its customer routes, and as well as its provider or peer routes.

Exporting to a peer: In exchanging routing information with a peer, an AS can export its routes and its customer routes, but usually does not export its provider or peer routes.

Exporting to a sibling: In exchanging routing information with a sibling, an AS can export its routes and routes of its customers, and as well as its provider or peer routes.”

From these export policies, they formulated what they consider as the selective export rule, which summarizes the above policies. I’d like to note that, as this paper and the previously discussed lecture on AS have also emphasized, policies regarding exporting and importing of routes are only privy to ASs involved in a deal. So what they have here are mere descriptions of what’s likely to be the case, as suggested by the phrase “but usually does not export,” and these hardly qualify as scientific. From my perspective, I find it hard to image that such assumptions are capable of producing accurate results. But as their experimental results suggest, my perception is the one that is inaccurate.

From here, I’ll go straight (I’ll skip their proving parts) to their algorithm.

Algorithm
Based from BGP routing tables, an augmented graph can be derived using the policies described by the paper. This graph will serve as the input for their algorithm. The first part of the algorithm simply determines the degree of each AS in a graph. A higher degree suggests that a node is likely a provider, since customers are connected to them.

The next phase of the algorithm is meant for reducing the inconsistencies that are possibly caused by misconfigured routers. In some cases, two AS pairs appear to have both provider-to-customer and customer-to-provider relationships in various routing table entries. This could lead to mislabeling in some AS paths. To correct this, they used a variable L to serve as a threshold for determining the dominant relationship. If a relationship of two pairs appear in more than L entries, then that is used as the correct relationship. But if both the provider-to-customer and customer-to-provider relationships both appear in less than L entries or more than L entries, then they are considered to provide transit to one another.

The third part of the algorithm is the labeling of edges. For each AS path, the AS with the highest degree is considered as the “top provider.” Once a top provider is determined, ASs with customer-to-provider and provider-to-customer relationships can be easily identified. All transits before the top provider are customer-to-provider edges, since providers are generally known to have higher degrees, while transits appearing after are provider-to-customer edges. If two pairs both provide transit to each other (i.e. they both appear before the top provider and also after it in a reversed order), then this pair is classified as a sibling-to-sibling edge. This part of the algorithm simply traverses all consecutive pairs in a path, which means it has an efficient linear run time (O(n)).

The last part of the algorithm is for determining peering relationships. In each AS path, the top provider can have a peering relationship with at most one of its neighbors. The heuristic considered for determining a top provider’s peer is the tendency of a top provider to peer with a neighbor with a higher degree. However, this could erroneously produce many peering relationships. To handle this, a variable R is introduced as measure for the similarity of two peers in terms of their degree. The author said, though, that this value still needs to be fine-tuned, though its effects are less influential if there are more routing tables used as references.

Experimental Results
Their experimental results, to say the least, are impressive. Here are some of their results that were confirmed by AT&T internal information: 100% of their inferred customers, 100% of their inferred peers, 20% of their inferred siblings. 98.9% of all their inference results were confirmed by AT&T internal information. For an algorithm that is based on simple heuristics, including some variables that may need further adjustment, it is really surprising to see that they were able to come up with accurate results concerning data that are meant to be confidential. Even better, they were able to accomplish this through a series of sequential data processing, resulting to quite an efficient algorithm.

Aside from confirming the correctness of their algorithms, their results also prove the validity of certain assumptions made on routing policies. Although these assumptions are based on what makes sense, there are still no hard evidences to prove these. But now that a well known ISP has confirmed their results based on these notions, then maybe it’s time for ISPs to make their routing policies and relationships open for research purposes, since they’re slowly being exposed anyway.

References:
[1] L. Gao. On Inferring Autonomous System Relationships in the Internet. IEEE/ACM Transactions on Networking, vol.9, no.6, p.733-745. December 2001

The Economics of the Internet

Karl Marx postulated that society, for the most part, is economically motivated. Apparently, the Internet (or what drives it) is not exempted from this.

The lecture “Interdomain Internet Routing”[1] presents the different entities that collectively form the Internet and their varying relationships with one another. These entities are known as “Autonomous Systems” (AS), and these include Internet Service Providers (ISPs) and their customers. One may prefer that these ASs cooperate in a symbiotic fashion for optimum global connectivity, but this is generally not the case. The authors noted that, since most of them belong to commercial enterprises, relationships are dictated by business decisions, resulting to selective routing. I think this is understandable, though, as providing Internet traffic is not cheap, so you want to make sure that your transmission facilities are being utilized towards your own advantage.

ASs use the Border Gateway Protocol (BGP) to exchange reachability information among one another, and this in turn serve as their guide for determining whether they would share their routes with another AS or not. The paper explained that the two most common forms of inter-AS relationships are transit and peering.

Inter-AS Relationships
Transit relationships are the ones shared between providers and customers. Largely a result of business transactions, this gives customers the benefit of being able to access most, if not all, of the destinations available from a provider. Peering, on the other hand, gives two ASs access to only a number of entries in each other’s routing tables. After all, this type of relationship is usually not settled financially (as the saying goes, we always get our money’s worth), and it is more of an agreement between two parties. The connectivity that peering provides may seem limited, but its real value lies in the semi-mutual benefit that involved ASs may reap. An AS on one side of a peering relationship save on transmission cost by being able to send their packets using their peer’s routes that are connected to their customers, thus preventing the need to transit with a higher-tiered ISP. This, of course, in exchange for doing the same thing for their peer. I described this as semi-mutual because, as the paper indicated, the common traffic ratio between peering ASs is 4:1, and that is already considered as not highly asymmetrical. Maybe this is because it’s hard to find two AS pairs that share almost similar amount of customers belonging on the other end. Still, one is getting 75% more than the other! Well, if it works for them, then it works for them (but one is luckier).

Exporting and Importing Routes
The paper also discussed the exporting and importing rules applied by an ISP to its neighbors. In either case, the customer is always the king, since customer traffics provide ISPs revenues. Other rules for exporting and importing are simply a matter of making sure that they provide routes for traffic that would benefit them and that they deny any route that would only make them carriers of packets that solely serve the interests of their neighboring ISPs.

BGP
The succeeding sections of the paper discussed the BGP in a fairly detailed manner. I won’t reiterate everything here since things are already explained (and it’s a long one), so I’ll just mention some parts where I can place my insights.

One interesting section there is the communication of BGP speaking routers within an AS. In this setting, iBGP is used by the communicating routers (as opposed to eBGP, which is used for speaking with routers belonging to other ASs) in order to disseminate among themselves routes learned from other ASs. The problem lies, though, in the seemingly trivial task of spreading information. The first solution they used is the linking of every eBGP router with all other routers in an AS through iBGP sessions, resulting to a full mesh topography. In graph theory lingo, you can think of this as something similar to a clique graph, which contains |v|(|v| - 1)/2 edges, where |v| is the number of vertices in the graph. In this representation alone, n routers would mean Θ(n2) iBGP sessions maintained within an AS. Worse, clique graph is not a very accurate representation because there are internal nodes inside the graph, resulting to a larger value of e(e - 1)/2 + ei iBGP sessions, where e is the number of eBGP routers and i is the number of iBGP routers. If you’re concerned with scaling, this is not a good sign.

Other possible solutions involve the use of confederations and route reflectors. Confederation is a form of clustering of routers in such a way that a cluster is considered as a single entity[2]. Actually, based on rfc3065, these are mainly applied on ASs, but this model fits well within an AS too, plus there wouldn’t be any problems with regards to the AS_PATH value. Route reflectors, on the other hand, are routers that are responsible for passing routing information for their whole domain (an AS or confederation). Combining these two concepts, what happens is that information is passed among confederations instead of individual routers, then each confederation may have its own reflector that would spread information within its domain. This represents a cascading effect, somewhat similar to a tree structure. Sounds like an interesting idea. But come to think of it, isn’t this also the same way the Internet is structured, just on a larger scale?

Another interesting issue about BGP is the MED attribute. It can be used either as a “magnet” or a “turn-off” (sorry, I couldn’t think of a better term) for traffic. What if two neighboring ASs both wanted or disliked to carry certain packets, and they kept on adjusting their MED values until they reached to a point where they have similar values, would this result to a stalemate? Of course, I’m sure there’s a provision for this, and this attribute is not even always used. But imagine the hypothetical scenario of two Tier-1 ISPs not being able to come up with an “agreement” on a certain data transmission because of their MED values (or even other attributes) and this caused the Internet to temporarily hang. That would be, um, strange.

References:
[1] H. Balakrishnan and N. Feamster. Lecture 4: Interdomain Internet Routing.
[2] http://tools.ietf.org/html/rfc3065

Friday, June 26, 2009

The Internet Architecture’s Design Philosophy

The Web. E-mail. File sharing. Voice over Internet Protocol (VoIP). Stalker-friendly tools. These are just some of the services available to us through the Internet. What makes these services particularly useful is the nature of the Internet itself: interconnected networks. Through this setup, we are able to video conference with relatives on the other side of the globe, download a movie from a mirror in Germany, and follow Megan Fox on Twitter. All these, of course, we owe to the architecture that made the very concept of connecting different types of networks possible.

This architecture is the topic of David Clark’s paper, titled “The Design Philosophy of the DARPA Internet Protocols”[1]. It provides an insight on the Internet’s primitive architecture, as well as the rationalization behind it.

Before I proceed with details of the paper, I have to admit first that I got confused with the article’s constant shift in discussing between the Internet protocol suite and the Internet’s architecture. To me, it seemed that the author was using the concepts “Internet protocol suite” and “Internet’s architecture” interchangeably, that in some context, I may have misinterpreted them to be the same banana. But, of course, that is not the author’s fault (this is an MIT paper, by the way), and I attribute it to my shallow knowledge in networking. So if ever I get to mix-up or bungle some concepts, pardon me.

The Protocols that Paved the Way for LOLS (and Other Web Jargons)
Now, here comes the serious part. The paper started with a brief introduction on the Transmission Control Protocol (TCP) and the Internet Protocol (IP), collectively known as TCP/IP or the Internet protocol suite. It stated there that the TCP/IP was developed by the Defense Advanced Research Projects Agency (DARPA) some decades back. If I’m not mistaken, though, Vinton Cerf have been doing work on this project even before he became part of DARPA, that is why he is widely regarded as the “Father of the Internet” (not to be confused with Sir Tim Berners Lee, who is the “Father of the World Wide Web”).

As the author said, information about of these protocols is reasonably available, but the motivation and rationalization behind their design is not clear. To give a perspective on how these protocols came to be, the original objectives of the Internet architecture were discussed, starting from the most important goals to the least important ones.

Fundamental Goal
According to the author:

“The top level goal for the DARPA Internet Architecture was to develop an effective technique for multiplexed utilization of existing interconnected networks.”

This goal could very well be the Internet’s defining principle, as well as its claim to fame. To achieve this, the architects were given several options, including the use of circuit switching. But the networks that were to be connected and the applications that they support were based on packet switching, so the method chosen for interconnecting the networks is also packet switching. This became the foundation of today’s Internet structure. No further reason is given as to why this approach was used, so it seems they simply made the logical (and obvious) choice.

However, they also had an alternative to interconnecting existing networks, and that is the creation of a new system that would incorporate different transmission media, resulting to a multi-media network. The author described this as possibly having “a higher degree of integration” resulting to “better performance”, but the architects preferred the use of the existing networks, believing this was necessary if the Internet was going to be “useful in the practical sense.” If the worldwide success attained by the Internet today were any indication, the architects have indeed made the right decision. But the prospect of using a unified network system is enticing. I think this may lead to the standardization of the different transmission media catering to various types of networks.

Second Level Goals
A more detailed list of goals was also presented, this time focusing on how the Internet should behave:

1. Internet communication must continue despite loss of networks or gateways.
2. The Internet must support multiple types of communications service.
3. The Internet architecture must accommodate a variety of networks.
4. The Internet architecture must permit distributed management of its resources.
5. The Internet architecture must be cost effective.
6. The Internet architecture must permit host attachment with a low level of effort.
7. The resources used in the internet architecture must be accountable.

These goals are arranged in order of priority, and must be viewed relative to the context of military use (after all, this is a project of the US Department of Defense).

The top priority in this list is survivability. At first glance, the first goal may sound like continuous uninterrupted connection. Actually, it simply meant that there should be a provision that would allow communications to resume, without having to reestablish conversations, after services are temporarily interrupted due to some network failure. To accomplish this, the architects decided to preserve communication states by storing them in the entities using the network services. This approach is termed by the author as “fate-sharing” (a precursor to session data, perhaps?). This removed the burden of maintaining states from the packet switches, giving the Internet a “datagram” network design. I have to emphasize here that the author once again pointed out a possible advantage of using a multi-media network, saying that a more survivable technology might have been derived from this type of network design.

The support for multiple types of services is the second goal of the Internet architecture. Initially, it was thought that the TCP was general enough support any type of service. But as the nature of certain services became more defined, it became apparent that the reliable service provided by the TCP inappropriate for some them. The reason: some services prioritize other network characteristics, such as speed and latency, over reliability. This led to the separation of IP from TCP into a separate layer. IP provides the basic building block, in the form of a datagram, needed to allow to a variety of services to build upon it. Combined, the TCP/IP provides the flexibility needed to handle several types of services.

The third goal is the support for different types of networks. This was achieved through making a minimum set of assumptions about network behaviors and capabilities, and configuring the Internet architecture to ensure that it would cater to these assumptions. I view these assumptions as the interface that other networks must adhere to in order to be supported by the Internet.

The other goals in the list are considered to be of lower importance, so some of them may have only been partially satisfied, if not completely unsupported as of the paper’s writing. But these days, one may observe that the lesser prioritized goals are already satisfied to a certain extent, giving us a clue that changes might have already taken place in the architecture.

Like in the case of distributed management of resources, I’d say the modern server-client setup, wherein servers are allowed to control the resources given to clients, addresses this goal. Personally, I’d also say that the aim of allowing host attachment with low level effort have also been achieved, as we all could easily connect to the Internet through the simple configuration of our machine’s network properties (given that Internet access is available). Accountability, it seems, has also been addressed. I have encountered several websites that became inaccessible because they have reached the bandwidth limit provided by their hosts. The issue of cost effectiveness, though, is something that I cannot comment about. I’m not sure what qualifies as cost effective in networking terms, nor am I aware of what’s exactly going on beneath the surface.

One specific goal that I was expecting to be addressed, but surprisingly isn’t in the list is security. Well, I’m guessing that it’s either that the burden of security is placed on the individual networks connected to the Internet, or it doesn’t make sense to support it at the architecture. But if these weren’t the case, then no wonder the Internet is continuously being bothered by security issues. =)

Conclusion
Clark’s article provides a fairly detailed insight on the history of the Internet’s architecture. Interestingly, the design decisions made in order to come up with what was the initial Internet architecture were fairly straightforward (well, that is how it seems in paper, at least). In software engineering, there are no hard and fast rules for making software designs, just heuristics. This might have been the case for some of the design philosophy behind the Internet’s architecture. But nobody can argue with success.

References:
[1] D.D. Clark. The Design Philosophy of the DARPA Internet Protocols. ACM SIGCOMM Computer Communication Review, Vol. 18, Issue 4. August 1988.

End-to-End Argument

Sometimes, simple arguments can lead to profound effects.

Take the case of the paper titled “End-to-End Arguments in System Design”[1], authored by J.H. Saltzer, D.P. Reed and D.D. Clark. The paper offers a guide for the placement of functions in communication systems in the form of a principle called the “end-to-end argument”.

The argument goes as follows:
“The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the end points of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible. (Sometimes an incomplete version of the function provided by the communication system may be useful as a performance enhancement.)”

It is as an argument proposing the implementation of functions on the application level as much as it is an argument for discouraging (or at least regulating) the placement of functions in the lower-levels of a system. The reason is quite blunt: place functions on where they will matter, not on where they will just waste resources.

To emphasize the concept being suggested by the argument and to demonstrate the wide range of situations to which this argument applies, they’ve provided several scenarios. I’ll only discuss one of them here, since all of them more or less provide the same idea.

File Transfer Scenario
File transfer is a fairly basic service provided by networks, but this is not necessarily protected from errors. Here are some possible threats enumerated by the authors (based on the setup of two host exchanging files through a network):
1. The file, though originally written correctly onto the disk at host A, if read now may contain incorrect data, perhaps because of hardware faults in the disk storage system.
2. The software of the file system, the file transfer program, or the data communication system might make a mistake in buffering and copying the data of the file, either at host A or host B.
3. The hardware processor or its local memory might have a transient error while doing the buffering and copying, either at host A or host B.
4. The communication system might drop or change the bits in a packet, or lose a packet or deliver a packet more than once.
Given these threats, provisions for ensuring reliable file transfers must be put in place. Following the end-to-end argument, error checking should be implemented in the file transfer application on both hosts, since it is only at that level can the success of the operation can be truly verified.
The communication system may also enforce reliable data transfer using a variety of error checking functions, but this will not guarantee that the file that will arrive at the receiving end is correct. Other forms of errors may occur outside of the communication system before it arrives at its destination disk storage system. As a result, the file transfer application must still perform error checking whether the data coming from the communication system is reliable or not. Not only is the use of functions in the lower levels of the system in this case futile, they are also an instrument for wasteful consumption of resources.

Performance
The authors, however, didn’t completely rule out the implementation of functions on the lower-levels of a communication system. In some cases, particularly in performance issues, this may have beneficial effects too. For example, using functions for ensuring reliability on the lower-levels of a network may increase its performance if the frequency of retrying file transfers outweighs the delay caused by the implementation of these functions. But the authors suggest that such low-level implementations be used only for performance enhancements.

Remark
I agree on the principle being raised by the end-to-end argument. It’s a matter of avoiding the redundant use of resources, and putting functions where they only truly count. Their examples drive home the point of the argument.
I have a minor comment on their argument, though. If they can assume that faults may occur on areas that are expected to be reliable (such as disk storage systems), wouldn’t it be fair to say that the same thing could apply to the applications on the end points of a system, such as the file transfer program? In fact, the second threat in the file transfer scenario highlights this matter. Now, if a file transfer application can commit mistakes in the simple process of buffering and copying data, then much more in the act of error checking (such as the verification of checksums). In short, their end-to-end argument wouldn’t hold if the integrity of end point applications is also compromised.

Conclusion
The authors have presented an argument, which they claim to be similar to “Occam’s razor” (though their scenarios seem to be inspired by Murphy’s Law), for making decisions on the placement of functions in a system. They themselves have admitted that this is not a strict rule to be followed, but a just mere guideline. If applied to a proper context, this simple guide can provide significant benefits for a network.

References
[1] J. H. Saltzer, D.P. Reed and D. D. Clark. End-to-End Arguments in System Design. ACM Transactions in Computer Systems, Vol. 2, No. 4, pp. 277-288. November 1984.