Application of machine learning models and feature engineering to predict genomic phenomena
This dissertation investigates two topics toward the design of tiny, machine learning-based intelligent devices for biomedical applications. In the first topic, feature engineering and machine learning models are developed to predict two genomic phenomena: gene mutation and differential gene expression. The hypothesis tested is that features encoding interactions between genes improve gene mutation prediction performance. To test it, additional training features are engineered from protein-coding gene cofunctional networks and combined with a mutation dataset of E. coli exposed to different conditions. A feature-selection algorithm based on gene cofunctional networks is also presented. A support vector classifier, an artificial neural network, and an ensemble of both models are then trained to predict gene mutation using the extended dataset, and a sequential mutation modeling approach to predicting gene mutation is presented. In addition, the prediction of differentially expressed genes (DEGs) under exposure to conditions in space is investigated using a set of diverse engineered features. DEGs and non-differentially expressed genes (NDEGs) from house mouse (Mus musculus) experiments are collected, and a unique feature engineering procedure is proposed to generate key training features for machine learning models. The results show that the proposed feature engineering procedure generates features that boost gene mutation prediction performance by a maximum of 8.74% in the area under the receiver operating characteristic curve (AUC). In the prediction of DEGs, the generated features achieve maximum and minimum AUCs of 0.97 and 0.74, respectively. In the second topic, magnetic induction-based communication and powering are demonstrated via simulation for a microscale mote. Low-power modulation, error-correction coding, and suitable low-power media access control (MAC) schemes with evidence of feasible implementation at the microscale are then explored.
Results of the performance analysis indicate that the proposed design achieves communication at a range of at least a few centimeters (5 to 6 cm) with an acceptable bit error rate (BER). Finally, MAC-layer analysis identifies the optimum number of motes to deploy for various read delays and transmission rates.
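As a rough illustration of two ideas from the first topic, the sketch below derives a single interaction feature from a gene network and computes AUC from classifier scores. It is not the procedure used in the dissertation: the feature (the number of mutated network neighbors), the function names, the gene identifiers, and all numbers are hypothetical and chosen only to make the example concrete.

```python
def mutated_neighbor_counts(edges, mutated):
    """Hypothetical interaction feature: for each gene in an undirected
    network, count how many of its direct neighbors are mutated."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {gene: sum(1 for n in nbrs if n in mutated)
            for gene, nbrs in neighbors.items()}


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example
    receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy network with made-up gene names (assumption, not real data).
edges = [("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")]
mutated = {"geneA", "geneC"}
features = mutated_neighbor_counts(edges, mutated)

# Toy labels and scores (assumption, not results from the dissertation).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes, which is the scale on which the reported values of 0.97 and 0.74 should be read.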