Predicting phosphoglycerylation sites by incorporating probabilistic sequence-coupling information into PseAAC and addressing data imbalance


Post-translational modification (PTM) is the covalent modification of a protein after its biosynthesis and plays a significant role in cell biology. Lysine phosphoglycerylation, a recently discovered reversible type of PTM, affects glycolytic enzyme activities and is responsible for a wide variety of diseases, such as heart failure, arthritis, and degeneration of the nervous system. Our aim is to characterize the annotated phosphoglycerylation sites in order to understand their functionality and causality more accurately. In this study, a novel computational tool, referred to as predPhogly-Site, has been developed to predict phosphoglycerylation sites. The system uses the sequence-coupling information among neighboring amino acid residues of a protein sequence, together with a variable cost adjustment for the skewed training dataset, to improve prediction performance.
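
Site predictors of this kind usually score each candidate lysine from the residues surrounding it. The sketch below shows one way such lysine-centered peptide windows could be extracted from a protein sequence. The function name `lysine_windows`, the flank width of 7 residues, and the padding character "X" are illustrative assumptions, not values taken from predPhogly-Site.

```python
def lysine_windows(sequence, flank=7, pad="X"):
    """Return (position, window) pairs for every lysine (K) in `sequence`.

    `position` is 1-based. `flank` and `pad` are assumed values chosen
    for illustration; they are not taken from predPhogly-Site.
    """
    # Pad both ends so lysines near the termini still get full-length windows.
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Index i in `sequence` is index i + flank in `padded`,
            # so the window centered on it starts at index i.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


example = lysine_windows("MKAK", flank=2)
```

Each window has length 2 * flank + 1 with the candidate lysine at its center, so features that couple neighboring residues can be computed on fixed-length inputs.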

Benchmark Dataset

In this study, verified annotations of phosphoglycerylation sites were obtained from the Compendium of Protein Lysine Modifications (CPLM) version 2.0, a reliable repository of post-translational modifications of lysine residues, and the corresponding protein sequences were retrieved from the UniProt knowledgebase to develop the prediction model. Redundant sequences were then discarded using CD-HIT with a 40% similarity cutoff, a widely accepted level of redundancy removal, to avoid bias in performance evaluation. As a result, a total of 91 non-redundant proteins were retained for constructing the benchmark dataset. It contains 111 experimentally annotated phosphoglycerylated sites and 3249 non-phosphoglycerylated sites, which is identical to the dataset of the most recent predictor, Bigram-PGK.
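
The site counts above show how skewed the benchmark is. As a rough illustration, the following sketch computes the negative-to-positive ratio and a pair of inverse-frequency misclassification costs from those counts. The inverse-frequency heuristic and the helper name `balanced_class_weights` are assumptions made for this example; the cost scheme actually used by predPhogly-Site may differ.

```python
def balanced_class_weights(n_pos, n_neg):
    """Inverse-frequency class weights (an assumed heuristic, not
    necessarily the cost adjustment used by predPhogly-Site)."""
    total = n_pos + n_neg
    # Each class receives total / (2 * class_size), so both classes
    # contribute the same summed weight to the training loss.
    return {1: total / (2 * n_pos), 0: total / (2 * n_neg)}


N_POS = 111   # phosphoglycerylated sites in the benchmark dataset
N_NEG = 3249  # non-phosphoglycerylated sites in the benchmark dataset

imbalance_ratio = N_NEG / N_POS
weights = balanced_class_weights(N_POS, N_NEG)

print(f"negatives per positive: {imbalance_ratio:.2f}")
print(f"cost for positive class: {weights[1]:.3f}")
print(f"cost for negative class: {weights[0]:.3f}")
```

With roughly 29 negative sites for every positive one, an unweighted classifier can reach high accuracy by predicting almost every site as negative, which is why a larger cost is placed on errors in the positive class.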

If you use any data obtained from this server, please cite the following paper:

  • Ahmed S, Rahman A, Hasan MAM, Islam MKB, Rahman J, Ahmad S (2021) predPhogly-Site: Predicting phosphoglycerylation sites by incorporating probabilistic sequence-coupling information into PseAAC and addressing data imbalance. PLoS ONE 16(4): e0249396.


