Sorting and Selection
Telephone directories are sorted alphabetically by last name. Why? Because a sorted index can be searched quickly. Even in the telephone directory of a huge city, one can usually find a name in a few seconds. In an unsorted index, nobody would even try to
- PDF / 405,849 Bytes
- 27 Pages / 439.37 x 666.142 pts Page_size
- 11 Downloads / 180 Views
Telephone directories are sorted alphabetically by last name. Why? Because a sorted index can be searched quickly. Even in the telephone directory of a huge city, one can usually find a name in a few seconds. In an unsorted index, nobody would even try to find a name. To a first approximation, this chapter teaches you how to turn an unordered collection of elements into an ordered collection, i.e., how to sort the collection. However, sorting has many other uses as well. An early example of a massive data-processing task was the statistical evaluation of census data; 1500 people needed seven years to manually process data from the US census in 1880. The engineer Herman Hollerith,1 who participated in this evaluation as a statistician, spent much of the ten years to the next census developing counting and sorting machines for mechanizing this gigantic endeavor. Although the 1890 census had to evaluate more people and more questions, the basic evaluation was finished in 1891. Hollerith’s company continued to play an important role in the development of the information-processing industry; since 1924, it has been known as International Business Machines (IBM). Sorting is important for census statistics because one often wants to form subcollections, for example, all persons between age 20 and 30 and living on a farm. Two applications of sorting solve the problem. First, we sort all persons by age and form the subcollection of persons between 20 and 30 years of age. Then we sort the subcollection by home and extract the subcollection of persons living on a farm. Although we probably all have an intuitive concept of what sorting is about, let us give a formal definition. The input is a sequence s = e1 , . . . , en of n elements. Each element ei has an associated key ki = key(ei ). The keys come from an ordered universe, i.e., there is a linear order ≤ defined on the keys.2 For ease of notation, we extend the comparison relation to elements so that e ≤ e if and only 1 2
The photograph was taken by C. M. Bell (see US Library of Congress’s Prints and Photographs Division, ID cph.3c15982). A linear order is a reflexive, transitive, and weakly antisymmetric relation. In contrast to a total order, it allows equivalent elements (see Appendix A for details).
100
5 Sorting and Selection
if key(e) ≤ key(e ). The task is to produce a sequence s = e1 , . . . , en such that s is a permutation of s and such that e1 ≤ e2 ≤ · · · ≤ en . Observe that the ordering of equivalent elements is arbitrary. Although different comparison relations for the same data type may make sense, the most frequent relations are the obvious order for numbers and the lexicographic order (see Appendix A) for tuples, strings, and sequences. The lexicographic order for strings comes in different flavors. We may treat corresponding lower-case and upper-case characters as being equivalent, and different rules for treating accented characters are used in different contexts. Exercise 5.1. Given linear orders ≤A for A and ≤B for B, define a linea
Data Loading...