Investigating the impact of algorithms and hardware on machine learning models in HPC systems

Authors
Thompson, Christian C.
Asaduzzaman, Abu
Uddin, Md Raihan
Nawar, Fairuz
Issue Date
2025-10-16
Type
Conference Paper
Keywords
Training, Support vector machines, Accuracy, Machine learning algorithms, Graphics processing units, Random access memory, Hardware, Central processing unit, Resource management, Convolutional neural networks
Citation
Thompson, C. (2025, October 16). Investigating the impact of algorithms and hardware on machine learning models in HPC systems. 2025 IEEE High Performance Extreme Computing Conference (HPEC), Wakefield, MA, USA.
Abstract

The development and effectiveness of machine learning (ML) applications rely on support from the underlying computing systems. This project investigates the impact of algorithmic techniques and high-performance computing (HPC) system components on ML performance. Following standardized image data preprocessing, the Synthetic Minority Over-sampling Technique (SMOTE) is employed for class balancing, and Recursive Feature Elimination with Cross-Validation (RFECV) is employed for optimal feature selection. An HPC cluster featuring hundreds of central processing unit (CPU) cores, multiple graphics processing unit (GPU) accelerators, and several terabytes of random-access memory (RAM), running the CentOS Linux distribution, is used to investigate the training time and prediction accuracy of various ML models, including Support Vector Machine (SVM), Convolutional Neural Network (CNN), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Per the cluster's fair-share scheduling policy, this study uses up to four CPU cores, two GPU accelerators, and 150 gigabytes of RAM. Simulation results show that the CNN model outperforms the other models. Using the top 50% of balanced features reduces the CNN model's training time by up to 90.39%, with a slight increase in accuracy. Allocating four CPU cores and two GPU accelerators, rather than relying on a single CPU core without GPU support, cuts training time by up to 56.18% while maintaining comparable accuracy. This investigation of hardware support for ML models can be extended to study how resource allocation affects ML inference time.
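The preprocessing pipeline the abstract describes (class balancing with SMOTE, then RFECV feature selection before model training) can be sketched with scikit-learn. This is a minimal illustrative sketch, not the paper's actual configuration: the synthetic dataset, the linear SVM, and all parameters here are assumptions, and the SMOTE step is shown as a comment because it comes from the separate imbalanced-learn package.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the preprocessed image features.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           weights=[0.7, 0.3], random_state=0)

# Class-balancing step from the paper (requires the imbalanced-learn package):
# from imblearn.over_sampling import SMOTE
# X, y = SMOTE(random_state=0).fit_resample(X, y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# RFECV: recursively eliminate the weakest feature, choosing the final
# feature count by cross-validated score.
selector = RFECV(SVC(kernel="linear"), step=1, cv=3).fit(X_tr, y_tr)
X_tr_sel = selector.transform(X_tr)
X_te_sel = selector.transform(X_te)

# Train on the selected subset only; fewer features means shorter training.
clf = SVC(kernel="linear").fit(X_tr_sel, y_tr)
print(selector.n_features_, clf.score(X_te_sel, y_te))
```

In the paper's setting, the same selector output would feed each of the compared models (SVM, CNN, RF, XGBoost), so training time and accuracy are measured on an identical feature subset.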

Description
Click on the DOI link to access this conference paper at the publisher's website (may not be free).
Publisher
Institute of Electrical and Electronics Engineers Inc.
Series
2025 IEEE High Performance Extreme Computing Conference (HPEC)
ISSN
2643-1971
2377-6943