Date of Award


Document Type

Union College Only


Computer Science

First Advisor

Nicholas Webb




machine learning, linear regression, cloud computing, Google Cluster Trace, data analysis


Cloud computing provides a range of computing utilities for which clients pay according to their requirements and consumption. The resources used by machines in a cloud datacenter fluctuate over time, which can affect application performance. As the number of cloud computing users has increased significantly, load balancing has become a crucial consideration for cloud service providers. Load balancing is a technique for dynamically shifting tasks between machines in order to reduce the workload of machines that have reached capacity and increase the workload of machines that are idle. Load balancing can be improved when combined with accurate predictions of future resource usage, because such predictions help identify potentially overloaded machines before the overload occurs. I present a machine learning-based approach to evaluate how accurately the future workload of these machines can be predicted, using the Google Cluster Trace 2011. The results show that a simple machine learning model such as linear regression can improve the accuracy of workload prediction over that of a baseline method by 7.5% for the next 5 minutes, 6% for the next 10 minutes, and 3.5% for the next 15 minutes.
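
The comparison described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the synthetic usage series (one sample per 5 minutes), the use of the last few samples as features, the persistence baseline (repeat the last observed value), the mean absolute error metric, and the helper names `make_windows`, `fit_least_squares`, `mae`, and `compare`. The abstract does not say which baseline, features, or metric were actually used.

```python
import math
import random


def make_windows(series, lags, horizon):
    """Pair the last `lags` samples with the value `horizon` steps ahead."""
    X, y = [], []
    for t in range(lags - 1, len(series) - horizon):
        # Leading 1.0 is the intercept term of the linear model.
        X.append([1.0] + list(series[t - lags + 1 : t + 1]))
        y.append(series[t + horizon])
    return X, y


def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (A[i][n] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


def mae(predicted, actual):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def compare(series, lags, horizon, train_fraction=0.7):
    """Return (baseline MAE, regression MAE) on the held-out tail of the series."""
    X, y = make_windows(series, lags, horizon)
    split = int(len(X) * train_fraction)
    w = fit_least_squares(X[:split], y[:split])
    model_pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X[split:]]
    # Assumed baseline: persistence, i.e. repeat the most recent observation.
    baseline_pred = [row[-1] for row in X[split:]]
    return mae(baseline_pred, y[split:]), mae(model_pred, y[split:])


if __name__ == "__main__":
    random.seed(0)
    # Synthetic CPU usage: a daily cycle (288 five-minute samples) plus noise.
    usage = [
        0.5 + 0.3 * math.sin(2 * math.pi * t / 288) + random.gauss(0, 0.05)
        for t in range(2000)
    ]
    for horizon in (1, 2, 3):  # 5, 10, and 15 minutes ahead
        base, model = compare(usage, lags=6, horizon=horizon)
        print(f"{5 * horizon:>2} min ahead: baseline MAE={base:.4f}, regression MAE={model:.4f}")
```

The numbers this prints come from synthetic data and say nothing about the results reported above; the sketch only shows the shape of such an evaluation, in which the same held-out windows are scored under both the baseline and the regression model for each prediction horizon.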



Rights Statement

In Copyright - Educational Use Permitted.