Publication

LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model

Pakhrin, Subash C.
Pokharel, Suresh
Aoki-Kinoshita, Kiyoko F.
Beck, Moriah R.
Dam, Tarun K.
Caragea, Doina
KC, Dukka B.
Issue Date
2023-05-01
Type
Article
Keywords
Deep learning, N-linked glycosylation, Post-translational modification, Prediction, Protein language model
Citation
Subash C Pakhrin, PhD and others, LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model, Glycobiology, Volume 33, Issue 5, May 2023, Pages 411-422, https://doi.org/10.1093/glycob/cwad033
Abstract
Protein N-linked glycosylation is an important post-translational modification in Homo sapiens, playing essential roles in many vital biological processes. It occurs at the N-X-[S/T] sequon in amino acid sequences, where X can be any amino acid except proline. However, not all N-X-[S/T] sequons are glycosylated; the sequon is therefore a necessary but not sufficient determinant of protein glycosylation. Computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is thus an important problem that existing methods have not extensively addressed, particularly with respect to constructing negative sets and leveraging the distilled information in protein language models (pLMs). Here, we developed LMNglyPred, a deep learning-based approach that predicts N-linked glycosylation sites in human proteins using embeddings from a pre-trained pLM. On an independent benchmark test set, LMNglyPred achieves a sensitivity of 76.50%, a specificity of 75.36%, a precision of 60.99%, an accuracy of 75.74%, and a Matthews correlation coefficient (MCC) of 0.49. These results demonstrate that LMNglyPred is a robust computational tool for predicting N-linked glycosylation sites confined to the N-X-[S/T] sequon.
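The sequon rule stated in the abstract, and the standard confusion-matrix definitions behind the reported metrics, can be sketched in Python. This is a minimal illustration under stated assumptions, not LMNglyPred's code: the names `find_sequons` and `classification_metrics` are hypothetical, the scanner only applies the N-X-[S/T] (X ≠ P) rule and makes no glycosylation prediction, and the counts in the usage example are made up rather than taken from the paper.

```python
import math
import re

# Hypothetical helper, not from LMNglyPred.
# Pattern: N, then any residue except proline, then S or T.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 0-based positions of N in every N-X-[S/T] sequon (X != P).

    These are only candidate sites: the sequon is necessary but not
    sufficient, so a predictor must still classify each candidate.
    """
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes each denominator below is non-zero (a sketch, not a robust utility).
    """
    sensitivity = tp / (tp + fn)  # fraction of glycosylated sites recovered
    specificity = tn / (tn + fp)  # fraction of non-glycosylated sites rejected
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical toy sequence: N at 1 (N-G-S) and 9 (N-V-T) qualify;
    # N at 5 is followed by proline, so it is excluded.
    print(find_sequons("MNGSANPTQNVT"))  # [1, 9]
    # Hypothetical counts, not results from the paper.
    print(classification_metrics(tp=8, tn=6, fp=4, fn=2))
```

The lookahead is a design choice so that overlapping sequons such as `NNST` yield two candidates (positions 0 and 1). Note that MCC ranges from -1 to 1 and is conventionally reported as a plain coefficient, whereas sensitivity, specificity, precision, and accuracy are proportions usually expressed as percentages.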
Description
Click on the DOI to access this article (may not be free).
Publisher
Oxford University Press
Series
Glycobiology
Volume 33, No. 5
ISSN
1460-2423