Application of machine learning models and feature engineering to predict genomic phenomena
Abstract
This dissertation investigates two topics toward the design of tiny machine learning-based
intelligent devices for biomedical applications. In the first part, feature engineering
and machine learning models are developed to predict two genomic phenomena: gene mutation
and differential gene expression. It is hypothesized that features encoding interactions
between genes improve gene mutation prediction performance. To test this,
additional training features are engineered from protein-coding gene cofunctional networks
and combined with a mutation dataset of E. coli exposed to different conditions. Also,
a feature-selection algorithm based on gene cofunctional networks is presented. Then, a
support vector classifier, an artificial neural network, and an ensemble of both models are
trained to predict gene mutation using the extended dataset. A sequential mutation modeling
approach to predicting gene mutation is also presented. In addition, the prediction of
differentially expressed genes (DEGs) in organisms exposed to conditions in space, using a
diverse set of engineered features, is investigated. DEGs and non-differentially expressed
genes (NDEGs) from house mouse (Mus musculus) experiments are collected, and a unique feature
engineering procedure is proposed to generate key training features for machine learning
models. The results show that the proposed feature engineering procedure generates features
that improve gene mutation prediction performance by up to 8.74% in the area under the
receiver operating characteristic curve (AUC). For DEG prediction, the engineered features
achieve AUCs ranging from 0.74 to 0.97.
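The first part's pipeline (gene-interaction features derived from a cofunctional network, then a support vector classifier, an artificial neural network, and their ensemble compared by AUC) can be sketched roughly as follows. This is a minimal, hypothetical illustration using scikit-learn: the synthetic data, the `network_features` helper (neighbour-averaged features plus weighted degree), the hyperparameters, and the soft-voting ensemble are all assumptions, not the dissertation's actual dataset, features, or ensembling method.

```python
# Hypothetical sketch: network-derived interaction features + SVC / ANN / ensemble,
# compared by ROC AUC. All data are synthetic stand-ins for the real E. coli data.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_genes, n_base = 400, 8

# Synthetic per-gene base features and a sparse, symmetric weighted
# "cofunctional network" (edge weights in [0, 1)).
X_base = rng.normal(size=(n_genes, n_base))
W = rng.random((n_genes, n_genes)) * (rng.random((n_genes, n_genes)) < 0.02)
W = np.triu(W, 1)
W = W + W.T


def network_features(X, W):
    """Hypothetical helper: append each gene's weighted mean of its neighbours'
    features and its weighted degree to the original features."""
    degree = W.sum(axis=1, keepdims=True)
    neighbour_mean = (W @ X) / np.maximum(degree, 1e-12)  # isolated genes get 0
    return np.hstack([X, neighbour_mean, degree])


X_ext = network_features(X_base, W)
# Synthetic labels depend on a gene's own feature AND its neighbours' feature,
# so the network-derived columns carry extra signal.
logits = X_base[:, 0] + 2.0 * X_ext[:, n_base]
y = (logits + rng.normal(scale=0.5, size=n_genes) > 0).astype(int)


def make_models():
    """SVC, a one-hidden-layer ANN, and a soft-voting ensemble of the two."""
    svc = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
    ann = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                      random_state=0))
    ens = VotingClassifier([("svc", svc), ("ann", ann)], voting="soft")
    return {"svc": svc, "ann": ann, "ensemble": ens}


def evaluate(X, y):
    """Train each model on a stratified split and return its test-set AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    scores = {}
    for name, model in make_models().items():
        model.fit(X_tr, y_tr)
        scores[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return scores


auc_base = evaluate(X_base, y)
auc_ext = evaluate(X_ext, y)
for name in auc_base:
    gain = 100 * (auc_ext[name] - auc_base[name]) / auc_base[name]
    print(f"{name}: AUC {auc_base[name]:.3f} -> {auc_ext[name]:.3f} ({gain:+.2f}%)")
```

The gain is printed here as a relative change in AUC; the abstract does not state whether its 8.74% figure is relative or absolute.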
In the second part, magnetic induction-based communication and powering are demonstrated
via simulation for a microscale mote. Then, low-power modulation, error-correction
coding, and suitable low-power media access control (MAC) schemes with evidence of feasible
implementation at the microscale are explored. Results of the performance analysis indicate
that the proposed design achieves communication over a range of at least 5 to 6 cm with an
acceptable bit error rate (BER). Finally, MAC-layer analysis reveals the
optimum number of motes to be deployed for various read delays and transmission rates.
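The dependence of the mote count on read delay and transmission rate can be illustrated with a deliberately simple model. The sketch below assumes a TDMA polling scheme in which a reader must collect one packet from every mote within the allowed read delay; the scheme itself, the packet size (128 bits), the guard time (50 µs), and the `max_motes` helper are hypothetical choices for illustration, not the dissertation's MAC design or parameters.

```python
# Hypothetical TDMA model: one slot per mote, all motes polled once per read delay.
import math


def slot_time_s(packet_bits: int, rate_bps: float, guard_s: float) -> float:
    """Duration of one TDMA slot: packet airtime plus a guard interval."""
    return packet_bits / rate_bps + guard_s


def max_motes(read_delay_s: float, rate_bps: float,
              packet_bits: int = 128, guard_s: float = 50e-6) -> int:
    """Largest number of motes the reader can poll once each within
    read_delay_s, assuming a single TDMA frame with one slot per mote.
    The default packet size and guard time are hypothetical."""
    return math.floor(read_delay_s / slot_time_s(packet_bits, rate_bps, guard_s))


if __name__ == "__main__":
    # Sweep a few read delays and bit rates.
    for delay_s in (0.01, 0.1):
        for rate_bps in (10e3, 100e3, 1e6):
            n = max_motes(delay_s, rate_bps)
            print(f"read delay {delay_s * 1e3:4.0f} ms, "
                  f"rate {rate_bps / 1e3:5.0f} kbps -> {n} motes")
```

A result of 0 means a single packet does not fit within the read delay at that rate (for example, a 128-bit packet takes 12.8 ms at 10 kbps, longer than a 10 ms read delay). A fuller analysis would also account for collisions, retransmissions caused by bit errors, and each mote's power budget.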
Description
Thesis (Ph.D.)-- Wichita State University, College of Engineering, Dept. of Electrical and Computer Engineering