Deterministic Searching on the Line

  • PDF / 3,056,511 Bytes
  • 48 Pages / 547.087 x 737.008 pts Page_size
  • 75 Downloads / 203 Views

DOWNLOAD

REPORT


D

D

Data Migration 2004; Khuller, Kim, Wan YOO-AH KIM Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA Keywords and Synonyms File transfers; Data movements Problem Definition The problem is motivated by the need to manage data on a set of storage devices to handle dynamically changing demand. To maximize utilization, the data layout (i. e., a mapping that specifies the subset of data items stored on each disk) needs to be computed based on disk capacities as well as the demand for data. Over time as the demand for data changes, the system needs to create new data layout. The data migration problem is to compute an efficient schedule for the set of disks to convert an initial layout to a target layout. The problem is defined as follows. Suppose that there are N disks and  data items, and an initial layout and a target layout are given (see Fig. 1a for an example). For each item i, source disks Si is defined to be a subset of disks which have item i in the initial layout. Destination disks Di is a subset of disks that want to receive item i. In other words, disks in Di have to store item i in the target layout but do not have to store it in the initial layout. Figure 1b shows the corresponding Si and Di . It is assumed that S i ¤ ; and D i ¤ ; for each item i. Data migration is the transfer of data to have all Di receive data item i residing in Si initially, and the goal is to minimize the total amount of time required for the transfers. Assume that the underlying network is fully connected and the data items are all the same size. In other words, it takes the same amount of time to migrate an item from one disk to another. Therefore, migrations are performed

Data Migration, Figure 1 Left An example of initial and target layout and right their corresponding Si ’s and Di ’s

in rounds. Consider the half-duplex model, where each disk can participate in the transfer of only one item – either as a sender or as a receiver. The objective is to find a migration schedule using the minimum number of rounds. No bypass nodes1 can be used and therefore all data items are sent only to disks that desire them. Key Results Khuller et al. [11] developed a 9.5-approximation for the data migration problem, which was later improved to 6:5 + o(1). In the next subsection, the lower bounds of the problem are first examined. Notations and Lower Bounds 1. Maximum in-degree (ˇ): Let ˇ j be the number of data items that a disk j has to receive. In other words, ˇ j = jfij j 2 D i gj. Then ˇ = max j ˇ j is a lower bound on the optimal as a disk can receive only one data item in one round. 2. Maximum number of items that a disk may be a source or destination for (˛): For each item i, at least one disk in Si should be used as a source for the item, and this disk is called a primary source. A unique primary source s i 2 S i for each item i that minimizes 1 A bypass node is a node that is not the target of a move operation, but is used as an intermediate holding point for a data item.

217

218

D

Data Migration