Reduction of Input Features from Machine Learning Datasets for Water Quality Analysis
Abstract
Typical water quality testing methods used by water treatment organizations are complex, time-consuming, and expensive because they require an enormous number of input features in the datasets. Studies show that machine learning has the potential to help analyze water quality. This study employs machine learning techniques to reduce the number of input features, allowing more frequent water tests at a lower cost. First, recursive feature elimination with cross-validation (RFECV), permutation importance (PI), and random forest (RF) techniques are used to identify the most prominent features. Second, an artificial neural network (ANN) and a support vector machine (SVM) are used to verify that the accuracy obtained with the reduced feature set is acceptable. A dataset from Kaggle with nine features and 2011 data points is used in this study. Experimental results show that the dataset with five features yields accuracy that is higher, by less than 3%, than that obtained with the dataset containing all features. The reduction in input features is observed to decrease the cost by about 65%. © 2024 IEEE.
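The two-stage idea in the abstract (rank features, then check that a reduced feature set keeps accuracy acceptable) can be illustrated with a minimal, standard-library-only sketch of permutation importance. This is not the authors' pipeline: the data are synthetic stand-ins (nine columns, of which five are made informative by assumption), a simple nearest-centroid classifier is swapped in for the RF, ANN, and SVM models, and all function names are hypothetical.

```python
import random

def make_data(n, n_features, informative, rng):
    """Synthetic stand-in data: the label shifts the mean of the informative columns only."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 1.0 * label  # class 1 is shifted by one standard deviation
        X.append(row)
        y.append(label)
    return X, y

def fit_centroids(X, y, cols):
    """Nearest-centroid 'model': per-class mean over the selected columns."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    return centroids

def predict(centroids, x, cols):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(cols, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(centroids, X, y, cols):
    hits = sum(predict(centroids, x, cols) == t for x, t in zip(X, y))
    return hits / len(y)

def permutation_importance(centroids, X, y, cols, rng, n_repeats=5):
    """Mean drop in held-out accuracy when one column is shuffled."""
    base = accuracy(centroids, X, y, cols)
    importances = {}
    for j in cols:
        drops = []
        for _ in range(n_repeats):
            shuffled = [x[j] for x in X]
            rng.shuffle(shuffled)  # breaks the link between column j and the label
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, shuffled)]
            drops.append(base - accuracy(centroids, X_perm, y, cols))
        importances[j] = sum(drops) / n_repeats
    return importances

rng = random.Random(0)  # fixed seed so the sketch is repeatable
all_cols = list(range(9))
X_train, y_train = make_data(1000, 9, informative=(0, 1, 2, 3, 4), rng=rng)
X_test, y_test = make_data(1000, 9, informative=(0, 1, 2, 3, 4), rng=rng)

# Stage 1: rank all nine features by permutation importance on held-out data.
model_full = fit_centroids(X_train, y_train, all_cols)
acc_full = accuracy(model_full, X_test, y_test, all_cols)
importances = permutation_importance(model_full, X_test, y_test, all_cols, rng)
ranked = sorted(importances, key=importances.get, reverse=True)
top5 = sorted(ranked[:5])

# Stage 2: refit on the five top-ranked features and compare accuracy.
model_reduced = fit_centroids(X_train, y_train, top5)
acc_reduced = accuracy(model_reduced, X_test, y_test, top5)

print("selected features:", top5)
print("accuracy, all features:", round(acc_full, 3))
print("accuracy, top five:", round(acc_reduced, 3))
```

Importance is measured on held-out data rather than the training set, so a feature scores highly only if the model genuinely relies on it for generalization. The numbers printed here come from synthetic data and say nothing about the accuracy or cost figures reported in the abstract.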
1 February 2024 through 2 February 2024