Data replication in data intensive scientific applications with performance guarantee

Authors
Nukarapu, Dharma Teja
Advisors
Tang, Bin
Issue Date
2009-12
Type
Thesis
Abstract

Data replication is widely adopted in data intensive scientific applications to reduce data file transfer time and bandwidth consumption. However, the problem of data replication in Data Grids, an enabling technology for data intensive applications, has been proven to be NP-hard and even non-approximable. Previous research in this field is either theoretical investigation without practical consideration or heuristics-based work with little or no theoretical background. In this paper, we propose a data replication algorithm that not only has a provable theoretical performance guarantee, but can also be implemented in a distributed and practical manner. Specifically, we design a replication technique that reduces the total job execution time by at least half of the reduction achieved by the optimal solution. Our centralized replication algorithm is amenable to distributed implementation and can be easily adopted in a distributed environment such as the Data Grid. We have conducted extensive simulations to validate the proposed replication algorithms. Using our own simulator, we show that the centralized greedy replication algorithm performs comparably to the optimal algorithm under different network parameters. Using GridSim, a popular distributed Grid simulator, we demonstrate that the distributed replication technique significantly outperforms an existing replication technique; moreover, it adapts better to dynamic changes in file access patterns in Data Grids.
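
The abstract names a centralized greedy replication algorithm but does not spell out its steps. As a rough illustration only, the following Python sketch shows one generic way a greedy replica placement could work; it is not the thesis's algorithm. Everything in it is an assumption made for the example: the names `total_cost` and `greedy_replicate`, unit-size files, a per-site storage capacity counted in files, a pairwise distance matrix as the cost model, and the rule that the original copy at a file's origin site does not consume capacity.

```python
from itertools import product


def total_cost(dist, demand, holders):
    """Sum of demand[s][f] times the distance from site s to the nearest copy of file f."""
    return sum(
        demand[s][f] * min(dist[s][h] for h in holders[f])
        for s in range(len(dist))
        for f in range(len(holders))
    )


def greedy_replicate(dist, demand, capacity, origin):
    """Hypothetical greedy placement: repeatedly add the single replica
    (site, file) that lowers the total access cost the most."""
    n_sites, n_files = len(dist), len(origin)
    holders = [{origin[f]} for f in range(n_files)]  # original copies
    free = list(capacity)  # remaining replica slots per site
    while True:
        base = total_cost(dist, demand, holders)
        best_gain, best = 0, None
        for s, f in product(range(n_sites), range(n_files)):
            if free[s] == 0 or s in holders[f]:
                continue
            holders[f].add(s)  # try the replica
            gain = base - total_cost(dist, demand, holders)
            holders[f].remove(s)  # undo the trial
            if gain > best_gain:
                best_gain, best = gain, (s, f)
        if best is None:  # no slot left or no replica helps
            return holders
        s, f = best
        holders[f].add(s)
        free[s] -= 1


# Toy instance (made-up numbers): three sites on a line, two files at site 0.
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
demand = [[0, 0], [1, 0], [5, 3]]  # demand[site][file]
placement = greedy_replicate(dist, demand, capacity=[0, 1, 1], origin=[0, 0])
```

In this toy instance the first replica goes to the site with the heaviest demand, and the second slot is then used for whichever file still saves the most. A sketch like this makes no claim about the factor-of-two guarantee stated in the abstract; that guarantee depends on the thesis's own problem formulation and proof.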

Description
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and Computer Science
Publisher
Wichita State University