Integrated replication and scheduling in Data Grids with performance guarantee

Authors
Anikode, Lakshmi Ravi
Advisors
Tang, Bin
Issue Date
2011-05
Type
Thesis
Abstract

A Data Grid consists of geographically distributed computing and storage resources that are used in large-scale scientific applications such as high-energy physics, bioinformatics, and climate modeling. Scheduling and replication are two well-known techniques for boosting Data Grid performance. There has been research on integrating the two techniques in Data Grids to improve performance; however, most of this work is heuristic-based. In that work, data replication is used to minimize file transfer time, and thus the total job execution time across all sites, while scheduling is used to minimize the maximum job execution time among all sites (the so-called makespan). We propose to use both data replication and job scheduling to minimize the total job execution time in a Data Grid, and we formulate our Data Replication and Job Scheduling Problem. Unlike previous work, our problem seamlessly integrates both techniques into one framework. The problem is NP-hard. We first propose a Job Scheduling and Data Replication algorithm whose performance is theoretically provable and which also dramatically reduces time complexity compared to that of the optimal algorithm. We then design a series of heuristic algorithms to further reduce the time complexity of our Job Scheduling and Data Replication algorithm. Using simulations, we demonstrate that the heuristic algorithms perform comparably to the Job Scheduling and Data Replication algorithm.
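The abstract contrasts two objectives: minimizing total job execution time versus minimizing the makespan (the largest per-site execution time). The sketch below is a hypothetical toy illustration, not the thesis's model or algorithm: it assumes each job's execution time at each site is already known (for example, compute time plus the time to fetch its input file), and the job names, site names, and times are invented. Brute-forcing every job-to-site assignment shows that the two objectives can prefer different schedules.

```python
from itertools import product

# Hypothetical toy instance (assumption, not from the thesis): execution time of
# each job at each site, e.g. compute time plus input-file transfer time.
EXEC_TIME = {
    "j1": {"A": 2, "B": 3},
    "j2": {"A": 2, "B": 3},
}
SITES = ["A", "B"]


def total_time(assignment):
    """Sum of the execution times of all jobs under a job -> site assignment."""
    return sum(EXEC_TIME[job][site] for job, site in assignment.items())


def makespan(assignment):
    """Largest per-site load, i.e. when the busiest site finishes its jobs."""
    loads = {site: 0 for site in SITES}
    for job, site in assignment.items():
        loads[site] += EXEC_TIME[job][site]
    return max(loads.values())


def best(objective):
    """Brute-force the assignment minimizing `objective` (exponential; toy sizes only)."""
    jobs = sorted(EXEC_TIME)
    candidates = (dict(zip(jobs, sites)) for sites in product(SITES, repeat=len(jobs)))
    return min(candidates, key=objective)


if __name__ == "__main__":
    for name, objective in [("total time", total_time), ("makespan", makespan)]:
        a = best(objective)
        print(name, a, "total:", total_time(a), "makespan:", makespan(a))
```

In this toy instance, placing both jobs on site A minimizes total execution time (4 versus 5), while splitting them across sites minimizes the makespan (3 versus 4). Exhaustive search is used only to make the comparison explicit; since the abstract states the integrated problem is NP-hard, it does not scale.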

Description
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and Computer Science.
Publisher
Wichita State University