Graph sampling

  • PDF / 605,582 Bytes
  • 23 Pages / 439.37 x 666.142 pts Page_size
  • 13 Downloads / 219 Views

DOWNLOAD

REPORT


Graph sampling L.-C. Zhang1,2 · M. Patone1

Received: 30 June 2017 / Accepted: 21 September 2017 / Published online: 3 October 2017 © The Author(s) 2017. This article is an open access publication

Abstract We synthesise the existing theory of graph sampling. We propose a formal definition of sampling in finite graphs, and provide a classification of potential graph parameters. We develop a general approach of Horvitz–Thompson estimation to T -stage snowball sampling, and present various reformulations of some common network sampling methods in the literature in terms of the outlined graph sampling theory. Keywords Network · Finite-graph sampling · Multiplicity sampling · Indirect sampling · Adaptive cluster sampling

1 Introduction Many technological, social and biological phenomena exhibit a network structure that may be the interest of study; see e.g. Newman [20]. As an example of technological networks, consider the Internet as consisting of routers that are connected to each other via cables. There are two types of objects, namely routers and cables. A router must be connected to a cable to be included in the Internet, and a cable must have two routers at both ends. As another example, consider the social network of kinships. Again, there are two types of objects, namely persons and kinships. Each person must have two or more kinships, and each kinship must represent a connection between two persons. However, while it is obvious that any two routers must be connected by cables to each other either directly or via other routers in the Internet, it is not sure that any two persons can be connected to each other in the network of

B

L.-C. Zhang [email protected] M. Patone [email protected]

1

Department of Social Statistics and Demography, University of Southampton, Southampton, UK

2

Statistisk sentralbyrå, Oslo, Norway

123

278

L.-C. Zhang, M. Patone

kinships. The difference can be articulated in terms of some appropriate characterisation of their respective network structures. Following Frank [11,12,14], we refer to network as a valued graph, and graph as the formal structure of a network. The structure of a network, i.e. a graph, is defined as a collection of nodes and edges (between the nodes); measures may be attached to the nodes or the edges or both to form a valued graph, i.e. a network. For a statistical approach to networks one may choose to model the entire population network as a random realisation [16], or to exploit the variation over possible sample networks taken from a given fixed population network. Graph sampling theory deals with the structure of a network under the latter perspective. In comparison, finite-population sampling [3,21] can mostly be envisaged as sampling in a ‘graph’ with no edges at all. We shall refer to such a setting as list sampling. Ove Frank has undoubtedly made the most contributions to the existing graph sampling theory. See e.g. Frank [8,10,12–14] for his own summary. However, the numerous works of Frank scatter over several decades, and are not easily appre