PageRank Algorithm

  • PDF / 3,267,165 Bytes
  • 65 Pages / 547.087 x 737.008 pts Page_size
  • 31 Downloads / 237 Views

DOWNLOAD

REPORT


P

P

P2P 2001; Stoica, Morris, Karger, Kaashoek, Balakrishnan DAHLIA MALKHI Microsoft, Silicon Valley Campus, Mountain View, CA, USA Keywords and Synonyms Peer to peer; Overlay; Overlay network; DHT; Distributed hash table; CDN; Content delivery network; File sharing; Resource sharing Problem Definition This problem is concerned with efficiently designing a serverless infrastructure for a federation of hosts to store, index and locate information, and for efficient data dissemination among the hosts. The key services of peer-topeer (P2P) overlay networks are: 1. A keyed lookup protocol locates information at the server(s) that hold it. 2. Data store, update and retrieve operations maintain a distributed persistent data repository. 3. Broadcast and multicast support information dissemination to multiple recipients. Because of their symmetric, serverless nature, these networks are termed P2P networks. Below, we often refer to hosts participating in the network as peers. The most influential mechanism in this area is consistent hashing, pioneered in a paper by Karger et al. [21]. The idea is roughly the following. Frequently, a good way of arranging a lookup directory is a hash table, giving a fast O(1)-complexity data access. In order to scale and provide highly available lookup services, we partition the hash table and assign different chunks to different servers. So, for example, if the hash table has entries 1 through n, and there are k participating servers, we can have each server select a virtual identifier from 1 to n at random. Server i

will then be responsible for key values that are closer to i than to any other server identifier. With a good randomization of the hash keys, we can have a more or less balanced distribution of information between our k servers. In expectation, each server will be responsible for (n/k) keys. Furthermore, the departure/arrival of a server perturbs only one or two other servers with adjacent virtual identifiers. A network of servers that implement consistent hashing is called a distributed hash table (DHT). Many current-generation resource sharing networks, and virtually all academic research projects in the area, are built around a DHT idea. The challenge in maintaining DHTs is two-fold: Overlay routing Given a hash key i, and starting from any node r in the network, the problem is to find the server s whose key range contains i. The key name i bears no relation to any real network address, such as the IP address of a node, and therefore we cannot use the underlying IP infrastructure to locate s. An overlay routing network links the nodes, and provides them with a routing protocol, such that r can route toward s using the routing target i. Dynamic maintenance DHTs must work in a highly dynamic environment in which the size of the network is not known a priori, and where there are no permanent servers for maintaining either the hash function or the overlay network (all servers are assumed to be ephemeral). This is especially acute in P2P settings, where the servers are transient users wh