Reduction of Input Features from Machine Learning Datasets for Water Quality Analysis

Authors
Asaduzzaman, Abu
Uddin, Md Raihan
Nawal, Nowshin
Ang, Marcus
Issue Date
2024
Type
Conference paper
Keywords
Data points, Dataset, Dimensionality reduction, Machine learning, Water quality prediction
Citation
Asaduzzaman, A., Uddin, M.R., Nawal, N., Ang, M. Reduction of Input Features from Machine Learning Datasets for Water Quality Analysis. (2024). International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications, ACDSA 2024. DOI: 10.1109/ACDSA59508.2024.10467928
Abstract

Typical water quality testing methods used in water treatment organizations are complex, time-consuming, and expensive because they require a very large number of input features in the datasets. Studies show that machine learning has the potential to help analyze water quality. This study employs machine learning techniques to reduce the number of input features, allowing frequent water tests at a lower cost. First, recursive feature elimination with cross-validation (RFECV), permutation importance (PI), and random forest (RF) techniques are used to identify the most prominent features. Second, an artificial neural network (ANN) and a support vector machine (SVM) are used to verify that the accuracy obtained with the reduced features is acceptable. A dataset from Kaggle with nine features and 2011 data points is used in this study. Experimental results show that the dataset with five features produces less than 3% higher accuracy than the dataset with all features. The reduction in input features is observed to decrease the cost by about 65%. © 2024 IEEE.
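
The two-stage workflow described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation: the data below is synthetic (generated with `make_classification` to match only the stated shape of nine features and 2011 data points), and the function name `select_and_evaluate`, the hyperparameters, the train/test split, and the network size are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_and_evaluate(X, y, seed=0):
    """Hypothetical helper: pick features, then compare full vs. reduced accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    # Stage 1a: RFECV wrapped around a random forest chooses a feature subset.
    forest = RandomForestClassifier(n_estimators=50, random_state=seed)
    rfecv = RFECV(forest, step=1, cv=3, scoring="accuracy")
    rfecv.fit(X_train, y_train)
    selected = np.flatnonzero(rfecv.support_)

    # Stage 1b: permutation importance gives an independent ranking
    # (most important feature index first) from a forest fitted on all features.
    forest.fit(X_train, y_train)
    pi = permutation_importance(
        forest, X_test, y_test, n_repeats=5, random_state=seed
    )
    pi_ranking = np.argsort(pi.importances_mean)[::-1]

    # Stage 2: train an SVM and a small ANN on all features and on the subset.
    builders = {
        "SVM": lambda: make_pipeline(StandardScaler(), SVC()),
        "ANN": lambda: make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                          random_state=seed),
        ),
    }
    scores = {}
    for name, build in builders.items():
        full = build().fit(X_train, y_train).score(X_test, y_test)
        reduced = (
            build()
            .fit(X_train[:, selected], y_train)
            .score(X_test[:, selected], y_test)
        )
        scores[name] = (full, reduced)
    return selected, pi_ranking, scores


# Synthetic stand-in for the Kaggle dataset (assumption: binary labels).
X, y = make_classification(
    n_samples=2011, n_features=9, n_informative=5, n_redundant=0,
    random_state=0,
)
selected, pi_ranking, scores = select_and_evaluate(X, y)
print("RFECV-selected feature indices:", selected)
print("Permutation-importance ranking:", pi_ranking)
for name, (full, reduced) in scores.items():
    print(f"{name}: all features={full:.3f}, reduced={reduced:.3f}")
```

On real data, the RFECV subset and the permutation-importance ranking would be compared to settle on a final feature set. The numbers printed here come from synthetic data and say nothing about the accuracy or cost figures reported in the paper.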

Publisher
Institute of Electrical and Electronics Engineers Inc.
Journal
International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications, ACDSA 2024
Series
2024 International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications, ACDSA 2024
1 February 2024 through 2 February 2024