From IEEE International Computer Performance and Dependability Symposium (IPDS '95), Erlangen, Germany, April 24-26, 1995, pp.

PERFORMANCE ANALYSIS OF THE RAID 5 DISK ARRAY

Anand Kuratti and William H. Sanders
Center for Reliable and High-Performance Computing
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
1308 W. Main St., Urbana, IL

Abstract

While the processor and memory performance of computers continues to improve, I/O performance has remained relatively constant. Strategies to achieve better I/O performance than current disk systems have been investigated to address the growing I/O bottleneck. One effort is the RAID (Redundant Arrays of Inexpensive Disks) Level 5 disk array. RAID 5 offers increased parallelism of I/O requests through the disk array architecture and fault tolerance through rotated parity. Although RAID 5 offers the promise of improved I/O performance, such gains must be quantified. Accurate analytical modeling can be used to quantify and predict the I/O request response time for different workloads. While previous analytical models to calculate the I/O request response time have been constructed, they often rely upon simplifying assumptions or computational bounds, which are accurate only for certain values of the possible workload parameters. This paper presents an analytical performance model to compute the mean steady-state response time of a RAID 5 I/O request under a transaction-processing workload. By carefully considering individual disk accesses, the arrival process of disk requests, and correlation between disk requests, an accurate model for a wide range of the workload parameters is developed.

1 Introduction

Over the past decade, it has been recognized that as CPU and memory performance of computers continue to dramatically increase, I/O performance has improved at a much slower pace. To address the growing gap between processor/memory and I/O performance, new strategies for mass data storage have begun to be developed. One way to increase I/O performance is to use an array of disks. By interleaving data across many disks, both throughput (megabytes per second) and the I/O rate (requests per second) are improved. However, with more disks, the reliability and availability of disk arrays is lower than for single disks. Because disk arrays hold the promise of improved performance, different ways to design and organize disk array architectures have been investigated. One effort is Redundant Arrays of Inexpensive Disks (RAID), introduced in [1, 2], where Patterson, Gibson, and Katz present five ways to introduce redundancy into an array of disks: RAID Level 1 to RAID Level 5. For each level, data is interleaved across multiple disks, but the incorporated redundancy ranges from traditional mirroring to rotated parity. Using a simple model of maximum throughput, the authors suggest that RAID 5 with rotated parity offers the best performance potential of the organizations considered.

Although it appears that RAID 5 can achieve lower I/O request response times than single disks, tools for modeling and analyzing response time under different operating conditions are important to be able to compare RAID 5 and current disk systems. In particular, analytical models combined with a realistic assessment of workload would allow a system architect to accurately design and predict the performance of a RAID 5 disk array. But like many parallel systems, disk arrays are difficult to model because of the effect of queuing and fork-join synchronization.
Since data is placed on multiple disks, each I/O request requires several disk requests. Each disk request waits to access a disk, then waits for the other disk requests to complete before the I/O request can complete. Under general conditions, exact analysis of the effect of these non-independent delays on the I/O request response time is not possible. However, approximate analysis, such as computation of upper and lower bounds on the mean I/O request response time, is possible by careful consideration of the characteristics of the system.

This paper presents an analytical model to calculate the mean steady-state response time for a RAID 5 I/O request under a transaction-processing workload. By systematically deriving the distribution of the time for a disk to service a request, the arrival process of requests to individual disks in the array, and the time for all disk requests in an I/O request to complete, a model which considers both queuing and fork-join synchronization is developed. Based on this model, analytical values for the mean overall, mean read, and mean write response times for I/O requests are computed over a wide range of the workload parameters and compared to results obtained from a detailed simulation model.

Previous analytical models of I/O request response time include [3, 4, 5, 6]. In particular, Lee and Katz [3] constructed a closed queuing model and compared several disk utilization approximations to results from a detailed simulation model. Although a formula for response time based on disk utilization is derived, it is based on the mean service time for a disk request, which is not determined. Chen and Towsley [4, 5] modeled RAID 5 performance using queuing analysis and determined the mean overall, mean read, and mean write I/O request response times. They considered write synchronization, the effect of different request size distributions, and disk access skewing. Thomasian and Menon [6] developed a model for I/O request response time under normal, degraded, and rebuild modes based on an M/G/1 queue with server vacations. In [4, 5, 6], the authors assume that arrivals to each disk in the array can be approximated as independent Poisson processes with a uniform rate. Although these models report accurate results for the mean I/O request response time for the workloads considered, we show how such assumptions can lead to inaccurate results when a larger range of the workload is considered. In our model, we illustrate how the workload influences the arrival and service of disk requests. As a result, we can analytically determine the regions of the workload where such assumptions are appropriate and verify these conclusions by detailed simulation.

The organization of this paper is as follows: section 2 briefly describes the RAID 5 architecture, including data and parity assignment and I/O methods. Using this description, the workload and assumptions used in developing the analytical models are presented in sections 3 and 4. Section 5 develops a model for individual disk accesses, including derivation and approximation of the time to service a disk request. In section 6, we show how disk requests are distributed to individual disks given Poisson arrivals of I/O requests and a transaction-processing workload. Based on the arrival process and disk service time for disk requests, section 7 demonstrates that the response time for an I/O request is the maximum of the completion times of the resulting disk requests. Using the completion time for an I/O request, formulas for the mean overall, mean read, and mean write response times are derived. Section 8 presents graphs of the analytical and simulated mean I/O request response times. Finally, section 9 gives conclusions and directions for future research.
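Before describing the architecture, the fork-join effect noted above can be illustrated with a toy calculation. The sketch below is hypothetical Python; the exponential completion times are chosen purely for illustration and are not the disk model developed in this paper. An I/O request that forks into k disk requests finishes only when the slowest one completes, so its mean response time grows with k even though each disk's mean completion time is unchanged.

    import random

    def io_response_time(k, mean_completion=0.02):
        # An I/O request forks into k disk requests; each disk request has
        # some completion time (exponential here, purely illustrative).
        # The I/O request completes only when the slowest disk finishes.
        completions = [random.expovariate(1.0 / mean_completion) for _ in range(k)]
        return max(completions)

    random.seed(1)
    trials = 100_000
    for k in (1, 2, 4):
        mean = sum(io_response_time(k) for _ in range(trials)) / trials
        print(f"disk requests per I/O: {k}, mean response: {mean * 1000:.1f} ms")

For exponential completion times the mean of the maximum grows like the harmonic number H_k, which is why the k = 4 case comes out roughly twice the k = 1 case; the analysis in this paper must additionally account for the fact that the per-disk completion times are dependent.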
2 RAID 5 Architecture

2.1 Data and Parity Placement

A RAID 5 disk array consists of N identical disks on which data is interleaved. The unit of data interleaving, or the amount of data that is placed on one disk before data is placed on the next disk, is a stripe unit. Since disks are organized into rows and columns, the set of stripe units with the same physical location on each disk in a row is a stripe. Each stripe contains data stripe units and a parity stripe unit. A parity stripe unit is the exclusive-or (XOR) of all the data stripe units in the stripe. The presence of a parity stripe unit in each stripe allows a RAID 5 disk array to tolerate single disk failures in each row of disks. When a single disk in a row fails, a data stripe unit can be reconstructed by reading the corresponding data and parity stripe units from the other disks in the stripe. The number of stripe units in a stripe is defined as the stripe width (W_s), where the number of data stripe units in each stripe is W_s - 1.

Parity stripe units in a RAID 5 disk array are rotated, i.e., distributed uniformly across all disks in the array. This helps to balance the increased load at each disk caused by requests that update parity. Another advantage of rotated parity is that data is also distributed more evenly. A more uniform distribution of data increases the probability that a disk will participate in an I/O operation, which increases the throughput and I/O rate of the array.

2.2 I/O Methods

Given a description of how data and parity are placed on a RAID 5 disk array, operations to read data from and write data to the array can be defined. A read request for data results in read requests for data stripe units at individual disks. When all stripe units have been accessed, the I/O request is complete. For a write request, data stripe units must not only be written, but the corresponding parity stripe units must be updated. Depending on how much of a stripe is written and how many stripes are accessed, three different cases can occur.

1. If the request starts at the first data disk in a stripe and the request size is W_s - 1 data stripe units, all data stripe units in the stripe are written. This is called a full stripe write. Since the parity stripe unit is the exclusive-or of all data stripe units in the stripe and all data stripe units are written, the new parity stripe unit is generated entirely from new data. The request completes when all data and parity stripe units are written.

2. If the request consists of fewer than W_s - 1 data stripe units and all stripe units belong to the same stripe, the stripe is partially modified. Since the new parity stripe unit is a combination of the new and old data stripe units, the new parity is computed by reading the old data and parity stripe units, then XORing the old and new data. The request completes after the new data and parity stripe units have been written.

3. If the request accesses two or more stripes, i.e., the data stripe units requested are allocated across stripe boundaries, two or more partial and/or full stripes must be updated. Because an operation on one stripe does not affect the data and parity stripe units in another stripe, a write I/O request that accesses multiple stripes is divided into several partial and/or full stripe writes. The I/O request is complete when all operations finish.
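The parity arithmetic behind these write methods reduces to bytewise XOR. The following sketch is illustrative Python (the stripe unit contents and sizes are made up): it computes a full-stripe parity, reconstructs a lost stripe unit after a single-disk failure, and performs the partial-stripe update, where the new parity is the old parity XORed with the old and new data.

    from functools import reduce

    def xor(a: bytes, b: bytes) -> bytes:
        # Bytewise XOR of two equal-length stripe units.
        return bytes(x ^ y for x, y in zip(a, b))

    # A stripe with W_s = 5: four data stripe units plus one parity unit.
    data = [bytes([i] * 8) for i in (1, 2, 3, 4)]
    parity = reduce(xor, data)            # full stripe write: parity from new data only

    # Single-disk failure: rebuild data[2] from the surviving units.
    survivors = data[:2] + data[3:] + [parity]
    assert reduce(xor, survivors) == data[2]

    # Partial stripe write: update data[1] without reading the whole stripe.
    new_unit = bytes([9] * 8)
    parity = xor(xor(parity, data[1]), new_unit)   # old parity XOR old data XOR new data
    data[1] = new_unit
    assert reduce(xor, data) == parity             # parity still covers the stripe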
A potential problem that arises during a partial stripe write is how the parity stripe unit is updated. If the old data stripe units are overwritten by new data before the new parity stripe unit is calculated, the parity stripe unit will be incorrect. To ensure that old data and parity stripe units are read before new data is written, the write request must be synchronized between read and write phases. Several methods have been proposed to ensure synchronization [7]. For example, Chen and Towsley [4] model the disk-first with priority scheme, in which parity requests are issued when the data accesses reach the head of the disk queues, but the parity access is given higher priority than non-parity accesses at the same disk. In our model, shown in Figure 1, synchronization is ensured by waiting for all reads of old data and parity, then issuing the write requests for the new data and parity. As opposed to [4], this method incurs a higher response time for the parity update requests by not using priority, but reduces the response times for other requests at the same disk.

[Figure 1: Write synchronization: read old data and parity, then write new data and parity.]

3 System Workload

The workload we consider is determined by the size and arrival pattern of I/O requests. Since the arrivals of I/O requests to the disk array depend on the characteristics of the application(s) which read data from and write data to the array, it is impossible to give a general model for the arrival of I/O requests. However, for many applications, the stream of I/O requests can be approximated by a Poisson process. In [4, 5, 6], the authors also assume that the arrival of I/O requests is Poisson, but further assume that arrivals to each disk are also Poisson and independent with a uniform rate. In this paper, we assume that the arrival of I/O requests is Poisson but compute the arrival process of requests to individual disks in section 6.

The second component of the system workload is the size of an I/O request. For many applications, request sizes can be classified as either supercomputer-based, where requests are large and infrequent, or transaction-based, where small amounts of data are frequently accessed. For this work, it is assumed that requests are transaction-based, where the number of data stripe units requested is less than or equal to the number of data stripe units in a stripe, W_s - 1. A distribution which reflects this type of workload is a quasi-geometric distribution [4, 5]:

    P{N = n} = α,                                                      n = 1
    P{N = n} = (1 - α)(1 - β) β^(n-1) / [(1 - β^(W_s-1)) - (1 - β)],   n = 2, ..., W_s - 1

where N is a random variable representing the request size (1 ≤ n ≤ W_s - 1), α is the probability that one data stripe unit is requested (0 ≤ α ≤ 1), and β is the parameter of the geometric distribution (0 < β < 1). Since we assume that the maximum number of data stripe units in an I/O request is W_s - 1 and a request for data can overlap stripe boundaries, at most two stripes can be accessed during any I/O request.
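As a sketch, the distribution above can be tabulated and sampled as follows (Python). The normalizing constant (1 - β^(W_s-1)) - (1 - β) = β - β^(W_s-1) is the reconstruction used above; it makes the probabilities over n = 1, ..., W_s - 1 sum to one, and should be read as an assumption rather than the paper's exact typesetting.

    import random

    def request_size_pmf(n, alpha, beta, ws):
        # Quasi-geometric request size: with probability alpha exactly one
        # stripe unit; otherwise geometric, truncated at ws - 1 stripe units.
        if n == 1:
            return alpha
        if 2 <= n <= ws - 1:
            return (1 - alpha) * (1 - beta) * beta ** (n - 1) / (beta - beta ** (ws - 1))
        return 0.0

    def sample_request_size(alpha, beta, ws):
        # Inverse-transform sampling over the finite support 1..ws-1.
        u, acc = random.random(), 0.0
        for n in range(1, ws):
            acc += request_size_pmf(n, alpha, beta, ws)
            if u < acc:
                return n
        return ws - 1

    ws, alpha, beta = 5, 0.5, 0.5          # illustrative parameter values
    total = sum(request_size_pmf(n, alpha, beta, ws) for n in range(1, ws))
    assert abs(total - 1.0) < 1e-12         # pmf is properly normalized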
4 Model Overview

Using a description of a RAID 5 disk array, including data and parity mapping, I/O methods, and system workload, we can develop a model to compute the mean response time of a RAID 5 I/O request. In doing so, the following assumptions are made:

1. Although our approach works with any RAID 5 configuration, we assume that the array contains 20 disks with a stripe width of 5 disks and a stripe unit size of 4 KB to obtain numerical results.

2. For many transaction-processing workloads, such as scientific databases, a majority of requests are queries to read data. The ratio of reads to writes for such systems is typically 2 or 3 to 1. In this paper, the probabilities for read and write requests are assumed to be 0.7 and 0.3, respectively.

3. Each disk can service only one request at a time. Other requests wait and are serviced in first-come, first-served (FCFS) order.

4. The arrival of I/O requests to the system is Poisson with rate λ.

5. I/O requests access data throughout the disk array in a uniform pattern. Since an I/O request requires multiple stripe units, this means that the starting stripe unit is assumed to be random and that each disk in the array is equally likely to contain the starting stripe unit in a request.

6. Since the focus of this paper is the performance of the array, we assume that the response time for I/O requests is only affected by the disk subsystem, i.e., memory and data paths are fast enough to have little impact on the response time of a request.

Based on a description of the data and parity placement, I/O methods, workload, and model assumptions, we can construct a model to calculate the response time for a RAID 5 I/O request. Recall that an I/O request generates several disk requests. Each disk request may wait for several other disk requests to complete before its service begins. Once the data for a request has been located and transferred to memory, it must wait for the remaining disk requests in the I/O request to complete before the I/O request can complete. In the following sections, we analyze each of the non-independent delays that contribute to the response time of an I/O request. To do this, we first develop a model of the time for a disk to service a disk request. Second, based on the Poisson arrival of I/O requests, we derive the arrival process of requests to individual disks and the probability that a disk is accessed by an I/O request, subject to the request size distribution. Third, using the arrival process and service time for a disk, the completion time for a disk request is viewed as the virtual waiting time of a queue with the same arrival and service time distributions. Finally, we compute the response time for an I/O request as the maximum of the dependent completion times of the accessed disks.
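A minimal simulation sketch of this view follows, in Python with placeholder values; the arrival rate, mean disk service time, and per-request fan-out are illustrative, not the distributions derived in sections 5 and 6. Each FCFS disk's backlog plays the role of the virtual waiting time, and the I/O response time is the maximum completion time over the disks the request accesses.

    import random

    random.seed(2)
    N_DISKS, ARRIVAL_RATE, MEAN_SERVICE = 20, 30.0, 0.012   # placeholder workload

    free_at = [0.0] * N_DISKS   # time at which each FCFS disk next becomes idle
    clock, responses = 0.0, []
    for _ in range(100_000):
        clock += random.expovariate(ARRIVAL_RATE)       # Poisson I/O request arrivals
        # Each I/O request forks into disk requests at a few chosen disks.
        disks = random.sample(range(N_DISKS), random.randint(1, 4))
        done = []
        for d in disks:
            # max(0, free_at[d] - clock) is the virtual waiting time at disk d.
            start = max(clock, free_at[d])
            free_at[d] = start + random.expovariate(1.0 / MEAN_SERVICE)
            done.append(free_at[d])
        responses.append(max(done) - clock)             # fork-join: slowest disk decides
    print(f"mean I/O response: {1000 * sum(responses) / len(responses):.2f} ms")

Because all disk requests of an I/O request enter the queues at the same instant, their completion times are dependent, which is exactly the correlation the analytical model must capture.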
5 Disk Model

Since an I/O request is composed of multiple disk requests, a model of the I/O request response time must consider the time for individual disk accesses. Although a disk access involves many complex electrical and mechanical interactions, the seek time, rotational latency, and transfer time dominate the time to service a disk request. At a high level, the time for a disk to service a request can be viewed as the time to locate the correct track and sector where the data is located and the time to transfer the data to memory. To precisely define the time needed for disk service, let S, R, T, and Y be random variables representing seek time, rotational latency, transfer time, and disk service time. As stated above, a disk request involves locating the correct track and sector, then transferring the data to memory, or Y = S + R + T. Because the track location is independent of sector placement, the density of Y, f_Y(y), can be written as the convolution f_S(s) * f_R(r) * f_T(t). Using previous results for the probability densities (pdfs) of seek time, rotational latency, and transfer time, we can evaluate the convolution to determine the pdf of the disk service time. The disk parameters assumed in this paper are listed in Table 1.

Table 1: Assumed Disk Parameters

    maximum disk rotation time (R_max)      16.7 ms
    number of disk cylinders (C)            1200
    total disk storage                      500 MB
    single cylinder seek time (a, S_min)    3 ms
    average seek time (S̄)                   11 ms
    maximum stroke time (S_max)             18 ms
    sustained transfer rate (T_r)           3 MB/s

5.1 Seek Time

In considering a model for seek time, there exists a possibility that the disk arm does not move during a disk access. This is called the sequential access probability p_s [4, 5]. This probability changes according to the data access pattern for disk requests and the disk scheduling policy. When requests access data uniformly and the disk scheduling is first-come, first-served, the disk arm tends to move to any other cylinder with equal probability during a request. Given these assumptions, the pmf of the seek distance D can be expressed as [4, 5]

    P{D = i} = p_s,                                  i = 0
    P{D = i} = (1 - p_s) 2(C - i) / [C(C - 1)],      i = 1, 2, ..., C - 1

where C is the total number of disk cylinders. To determine a relationship of seek time to seek distance, Chen and Katz [8] empirically derive a formula for the disk profile, s = a + b√i for i > 0, where s is the seek time, i is the seek distance in cylinders, a is the arm acceleration time, and b is the seek factor of the disk. In terms of the parameters listed in Table 1, a is equivalent to the single cylinder seek time. The seek factor b can be expressed as [3]

    b = (-10a + 15S̄ - 5S_max) / (3√C)

where a is the single cylinder seek time, S̄ is the average seek time, and S_max is the maximum seek time. Because seek time is a function of seek distance, the seek time distribution can be written as a transformation of the seek distance pmf:

    P{S = 0} = p_s
    f_S(s) = (1 - p_s) 2[C - ((s - a)/b)^2] / [C(C - 1)],    0 < s ≤ S_max

5.2 Rotational Latency

Under a variety of workloads and disk scheduling policies, rotational latency is commonly assumed to be uniformly distributed in [0, R_max], where R_max is the time for a full disk rotation. However, in [9], the authors note that the uniform distribution is an accurate approximation for rotational latency when requests at a disk are independent. Since we assume that the starting data stripe unit of each I/O request is random, the locations of data for successive requests at a disk will tend to be scattered across the disk. Thus, we use the uniform distribution to model the rotational latency of a disk.

5.3 Transfer Time

Once the location for the data has been determined, the data is transferred to memory. Because each d
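The seek-time model of section 5.1 can be exercised numerically. The following Python sketch uses the Table 1 parameters with p_s = 0 (an assumption for illustration), builds the seek distance pmf, maps distance to time through s = a + b√i, and checks that the resulting mean seek time comes out near the 11 ms average of Table 1.

    import math

    # Table 1 parameters (times in seconds).
    C, A, S_AVG, S_MAX = 1200, 0.003, 0.011, 0.018
    P_SEQ = 0.0   # sequential access probability p_s, assumed zero here

    # Seek factor from [3]: b = (-10a + 15*S_avg - 5*S_max) / (3*sqrt(C)).
    b = (-10 * A + 15 * S_AVG - 5 * S_MAX) / (3 * math.sqrt(C))

    def seek_distance_pmf(i: int) -> float:
        # P{D = 0} = p_s; P{D = i} = (1 - p_s) * 2(C - i) / (C(C - 1)).
        if i == 0:
            return P_SEQ
        return (1 - P_SEQ) * 2 * (C - i) / (C * (C - 1))

    def seek_time(i: int) -> float:
        # Disk profile from [8]: s = a + b * sqrt(i) for i > 0.
        return 0.0 if i == 0 else A + b * math.sqrt(i)

    mean_seek = sum(seek_distance_pmf(i) * seek_time(i) for i in range(C))
    print(f"mean seek time: {mean_seek * 1000:.2f} ms")   # close to the 11 ms of Table 1

That the computed mean matches the quoted average seek time is expected: the seek factor formula is constructed so that the profile reproduces the disk's published average under a uniform access pattern.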