Publication

SumoPred-PLM: human SUMOylation and SUMO2/3 sites Prediction using Pre-trained Protein Language Model

Palacios, Andrew Vargas
Acharya, Pujan
Peidl, Anthony Stephen
Beck, Moriah R.
Blanco, Eduardo
Mishra, Avdesh
Bawa-Khalfe, Tasneem
Pakhrin, Subash C.
Citation
Palacios, A.V., Acharya, P., Peidl, A.S., Beck, M.R., Blanco, E., Mishra, A., Bawa-Khalfe, T., & Pakhrin, S.C. (2024). "SumoPred-PLM: human SUMOylation and SUMO2/3 sites Prediction using Pre-trained Protein Language Model." NAR Genomics and Bioinformatics, 6(1). https://doi.org/10.1093/nargab/lqae011
Abstract
SUMOylation is an essential post-translational modification system with the ability to regulate nearly all aspects of cellular physiology. Three major paralogues, SUMO1, SUMO2 and SUMO3, form a covalent bond between the small ubiquitin-like modifier and lysine residues at consensus sites in protein substrates. Biochemical studies continue to identify unique biological functions for protein targets conjugated to SUMO1 versus the highly homologous SUMO2 and SUMO3 paralogues. Yet, the field has failed to harness contemporary AI approaches, including pre-trained protein language models, to fully expand and/or recognize the SUMOylated proteome. Herein, we present a novel, deep learning-based approach called SumoPred-PLM for human SUMOylation prediction with sensitivity, specificity, Matthews correlation coefficient, and accuracy of 74.64%, 73.36%, 0.48 and 74.00%, respectively, on the CPLM 4.0 independent test dataset. In addition, this novel platform uses contextualized embeddings obtained from a pre-trained protein language model, ProtT5-XL-UniRef50, to identify SUMO2/3-specific conjugation sites. The results demonstrate that SumoPred-PLM is a powerful and unique computational tool to predict SUMOylation sites in proteins and accelerate discovery.
Description
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Publisher
Oxford University Press
Journal
Book Title
Series
NAR Genomics and Bioinformatics
vol. 6, no. 1
ISSN
2631-9268