Deep learning-based approaches for prediction of post-translational modification sites in proteins
Abstract
Protein post-translational modification (PTM) plays an important role in a myriad of biological processes. Computational prediction approaches complement experimental methods for characterizing PTM sites in proteins. One important problem is the computational prediction of N-linked glycosylation sites, which are confined to N-X-[S/T] sequons. This dissertation reports DeepNGlyPred, a deep neural network-based approach for predicting N-linked glycosylation sites. It encodes the positive and negative sequences in a human proteome dataset using sequence-based features (gapped-dipeptide), predicted structural features, and evolutionary information. The dissertation also presents LMNglyPred, a deep learning-based approach that predicts N-linked glycosylation sites in human proteins using embeddings from a pre-trained protein language model. To explore undiscovered ubiquitination sites more efficiently, it then studies a novel multimodal deep learning architecture for identifying ubiquitination sites in proteins, proposing UbiIDN, an integrated deep learning-based approach for general ubiquitination site prediction that extracts and combines sequence and physicochemical property information. Finally, it develops LMPhosSite, a novel integrated deep learning-based approach for general phosphorylation site prediction that extracts and combines sequence and protein language model information. On independent test sets of experimentally identified N-linked glycosylation, ubiquitination, and phosphorylation sites, each of the developed predictors outperformed existing state-of-the-art predictors. These results indicate that the developed predictors are robust computational tools for predicting PTM sites in proteins.
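The two sequence-level ideas in the abstract, restricting N-linked glycosylation candidates to N-X-[S/T] sequons and encoding each candidate with gapped-dipeptide features, can be sketched as follows. This is a minimal illustration under stated assumptions, not DeepNGlyPred's actual implementation: the window half-width (10), the maximum gap (2), the `-` padding character, the optional proline filter (the canonical sequon usually excludes proline at X), and every function name here are hypothetical choices, and the predicted structural and evolutionary features are omitted.

```python
import re
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Zero-width lookahead so overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[A-Z][ST])")

# All 400 ordered residue pairs, indexed for a fixed-length feature vector.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {p: k for k, p in enumerate(PAIRS)}


def find_sequon_sites(seq, exclude_proline=False):
    """Return 0-based indices of the N in each N-X-[S/T] sequon (hypothetical helper)."""
    sites = []
    for m in SEQUON.finditer(seq):
        i = m.start()
        if exclude_proline and seq[i + 1] == "P":
            continue  # optional: drop N-P-[S/T], which is usually not glycosylated
        sites.append(i)
    return sites


def extract_window(seq, center, half_width=10, pad="-"):
    """Return a (2 * half_width + 1)-residue window centered on `center`, padded at the ends."""
    left = seq[max(0, center - half_width):center]
    right = seq[center + 1:center + 1 + half_width]
    return (pad * (half_width - len(left)) + left + seq[center]
            + right + pad * (half_width - len(right)))


def gapped_dipeptide(window, gap):
    """Normalized counts of pairs (a, b) with exactly `gap` residues between a and b."""
    counts = [0.0] * len(PAIRS)
    total = 0
    for i in range(len(window) - gap - 1):
        k = PAIR_INDEX.get(window[i] + window[i + gap + 1])
        if k is not None:  # skip pairs that involve padding or non-standard residues
            counts[k] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def encode_site(seq, site, half_width=10, max_gap=2):
    """Concatenate gapped-dipeptide frequencies for gaps 0..max_gap (400 values per gap)."""
    window = extract_window(seq, site, half_width)
    features = []
    for gap in range(max_gap + 1):
        features.extend(gapped_dipeptide(window, gap))
    return features


if __name__ == "__main__":
    protein = "MKNVSLLNPTAGNETW"  # toy sequence, not real data
    sites = find_sequon_sites(protein)
    print(sites)
    print(find_sequon_sites(protein, exclude_proline=True))
    print(len(encode_site(protein, sites[0])))
```

Running the script prints `[2, 7, 12]`, then `[2, 12]`, then `1200` (400 pairs for each of the three gaps). In a real pipeline, each candidate's vector would be labeled positive or negative from experimental annotations and passed to a classifier; those steps are outside this sketch.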