Credit scheduling and prefetching in hypervisors using Hidden Markov Models
Advances in storage technologies such as storage area networking (SAN) and the virtualization of servers and storage have revolutionized how the rapidly growing volume of modern data is stored. These technologies have made resource consolidation increasingly easy, which in turn has simplified access to remote data. Recent hardware research has boosted drive capacity, and hard disks have become far less expensive than before. However, these advances bring bottlenecks in performance and interoperability. In virtualization, especially server virtualization, many guest operating systems run on the same hardware. It is therefore important to schedule each guest at the right time and to decrease the latency of data access. Various hardware advances have made prefetching data into the cache easy and efficient. Nevertheless, interoperability between vendors must be assured, and more efficient algorithms need to be developed for these purposes. In virtualized environments, where hundreds of virtual machines may be running, scheduling algorithms must reduce the latency and the wait time of virtual machines in the run queue. Current algorithms are oriented toward providing fair access to the virtual machines and are less concerned with reducing latency. This can be a major bottleneck in time-critical applications, such as scientific applications that have started deploying SAN technologies to store their rapidly growing data. Also, when data must be extracted from these storage arrays for analysis and processing, the latency of a read operation has to be reduced to improve performance.
The research in this thesis aims to reduce the scheduling delay in a XEN hypervisor and to reduce the latency of reading data from disk using Hidden Markov Models (HMMs). An HMM is a statistical technique used to classify and predict data that follows a repetitive pattern over time. The scheduling and prefetching scenarios are modeled using a Gaussian HMM (GHMM) and a discrete HMM (DHMM), respectively, and the latency involved is evaluated. The proposed technique is mainly intended for virtualization scenarios involving hypervisors and storage arrays. Various data access patterns with different ratios of reads and writes are considered, and the DHMM is used to prefetch the next most probable block of data that a guest might read. The GHMM is used to classify the arrival times of requests in a XEN hypervisor and is incorporated into the credit scheduler to reduce the scheduling latency. The results show that using an HMM decreases both the scheduling and the access latencies. In particular, the numerical evaluation shows that scheduling the virtual machines (domains) at the correct time decreases the waiting times of the domains in the run queue.
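To make the prefetching idea concrete, the following is a minimal sketch of how a discrete HMM could pick the next most probable block, assuming the model parameters have already been estimated. It is an illustration only and not the implementation used in the thesis. The function names (`forward_filter`, `predict_next_block`), the two hidden access patterns, the three block symbols, and all probability values are hypothetical.

```python
def forward_filter(pi, A, B, obs):
    """Return P(hidden state at the last step | all observations so far).

    pi[i]   : initial probability of hidden state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of observing block symbol k in state i
    obs     : list of observed block symbols (integers)
    """
    n = len(pi)
    # Initialise with the first observation, then normalise.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    # Normalised forward recursion over the remaining observations.
    for o in obs[1:]:
        alpha = [
            B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
            for j in range(n)
        ]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    return alpha


def predict_next_block(pi, A, B, obs):
    """Return (most probable next block symbol, full predictive distribution)."""
    n, m = len(pi), len(B[0])
    belief = forward_filter(pi, A, B, obs)
    # Propagate the belief one step ahead through the transition matrix.
    next_state = [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]
    # Marginalise over hidden states to get a distribution over blocks.
    dist = [sum(next_state[j] * B[j][k] for j in range(n)) for k in range(m)]
    best = max(range(m), key=lambda k: dist[k])
    return best, dist


# Hypothetical parameters: two hidden access patterns, three block symbols.
PI = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.2, 0.8]]
B = [[0.7, 0.2, 0.1],
     [0.1, 0.2, 0.7]]

if __name__ == "__main__":
    block, dist = predict_next_block(PI, A, B, [0, 0, 0])
    print("prefetch candidate:", block, "distribution:", dist)
```

In this sketch the block with the highest predictive probability is the prefetch candidate. A real prefetcher would also need a confidence threshold and a policy for cache eviction, which are outside the scope of this example.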