predPhogly-Site

Predicting phosphoglycerylation sites by incorporating probabilistic sequence-coupling information into PseAAC and addressing data imbalance



About

Post-translational modification (PTM) is the covalent modification of a protein after its biosynthesis and plays a significant role in cell biology. Lysine phosphoglycerylation is a recently discovered, reversible type of PTM that affects glycolytic enzyme activities and is responsible for a wide variety of diseases, such as heart failure, arthritis, and degeneration of the nervous system. Our motivation is to characterize the annotated phosphoglycerylation sites in order to understand their functionality and causality more accurately. In this study, a novel computational tool, referred to as predPhogly-Site, has been developed to predict phosphoglycerylation sites. The system utilizes the sequence-coupling information among nearby amino acid residues of a protein sequence, along with a variable cost adjustment for the skewed training dataset, to enhance prediction performance.


Submit Sequences

Please input query protein sequences in FASTA format (help). A maximum of 10 sequences can be submitted at a time (example).


 


Enter your e-mail address and upload the batch input file (example). The predicted results will be sent to you by e-mail once the job is completed.




Submission Guide

You can submit your protein sequences in FASTA format. The browser form currently accepts a maximum of 10 sequences. If you have more than 10 sequences, you can submit a batch file instead. The file must contain sequences in FASTA format.

NOTE: The FASTA format for the current predictor can be described as follows:

  • Each query protein must begin with a greater-than (">") symbol.
  • The identifier and description of the sequence may be placed after the ">" symbol (optional).
  • The sequence begins on a new line and ends when a ">" appears, which indicates the start of another query protein.

Example

>Example_1(P17036)  ✅
METQADLVSQEPQALLDSALPSKVPAFSDKDSLGDEMLAAALLKAKSQELVTFEDVAVYFIRKEWKRLEP
AQRDLYRDVMLENYGNVFSLDRETRTENDQEISEDTRSHGVLLGRFQKDISQGLKFKEAYEREVSLKRPL
GNSPGERLNRKMPDFGQVTVEEKLTPRGERSEKYNDFGNSFTVNSNLISHQRLPVGDRPHKCDECSKSFN
RTSDLIQHQRIHTGEKPYECNECGKAFSQSSHLIQHQRIHTGEKPYECSDCGKTFSCSSALILHRRIHTG
EKPYECNECGKTFSWSSTLTHHQRIHTGEKPYACNECGKAFSRSSTLIHHQRIHTGEKPYECNECGKAFS
QSSHLYQHQRIHTGEKPYECMECGGKFTYSSGLIQHQRIHTGENPYECSECGKAFRYSSALVRHQRIHTG
EKPLNGIGMSKSSLRVTTELNIREST

>Example_2(P47955)  ✅
MASVSELACIYSALILHDDEVTVTEDKINALIKAAGVSVEPFWPGLFAKALANVNIGSLICNVGAGGPAP
AAGAAPAGGAAPSTAAAPAEEKKVEAKKEESEESEDDMGFGLFD

>Example_3(D3YUN8)  ✅
MAECCVPVCPRPMCIPPPYADLGKAARDIFNKGFGFGLVKLDVKTKSCSGVEFSTSGSSNTDTGKVSGTL
ETKYK

>Example_4(Wrong input)  ❌
MSILRIHAREIFDSRGNPTVEVDLYTAKGLFRAKKKKKKSTGIYEALELRDNDKTRFMGKGVSRPVKYVN
EFLAPALCTQKVNVVEEEEEEKLMIEMDGTENKSKFGAGGGGGVSLAVCKAGAVEXXXXXYRHIADLAGN
PEVILPVPAFNVINGGSHAGNKLAMQEFMILP*****FREAMRIGAEVYHNLKNVIKEKYGKDATNVGDE
GGFAPNILENK
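
The format rules above can be checked locally before submitting. The following is a minimal sketch in Python, not the server's own validation code. It assumes that only the 20 standard amino acid letters are accepted, which would explain why Example_4 (containing "X" and "*") is marked as wrong input; the names parse_fasta, check_records, STANDARD_AA and MAX_SEQUENCES are hypothetical.

```python
# Assumption: the predictor accepts only the 20 standard amino acid letters.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MAX_SEQUENCES = 10  # browser submission limit stated in the guide


def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            # A new ">" line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_records(records):
    """Return a list of problems; an empty list means the input looks fine."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(
            f"{len(records)} sequences submitted; the limit is {MAX_SEQUENCES}"
        )
    for header, seq in records:
        name = header or "(no identifier)"
        if not seq:
            problems.append(f"{name}: empty sequence")
        bad = sorted(set(seq) - STANDARD_AA)
        if bad:
            problems.append(f"{name}: unexpected characters {''.join(bad)}")
    return problems
```

Calling check_records(parse_fasta(text)) on the pasted input returns one message per problem found, so a fragment of Example_4 would be reported for its "*" and "X" characters, while Example_3 would pass.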


Benchmark Dataset

In this study, verified annotations of phosphoglycerylation sites were obtained from the Compendium of Protein Lysine Modifications (CPLM) version 2.0, a reliable repository of post-translational modifications of lysine residues, and the corresponding protein sequences were retrieved from the UniProt Knowledgebase to develop the prediction model. Redundant sequences were then discarded with CD-HIT at a 40% similarity cutoff, a widely accepted level of redundancy removal, to avoid bias in performance evaluation. As a result, a total of 91 non-redundant proteins were retained to construct the benchmark dataset. The dataset contains 111 experimentally annotated phosphoglycerylated sites and 3249 non-phosphoglycerylated sites, and is identical to the dataset of the most recent predictor, Bigram-PGK.
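
To make the degree of class imbalance concrete, the short Python sketch below computes the negative-to-positive ratio from the two site counts given above and derives inverse-frequency class weights. This weighting is a common generic heuristic shown purely for illustration; it is an assumption and not necessarily the variable cost adjustment used by predPhogly-Site. The function names are hypothetical.

```python
N_POSITIVE = 111   # phosphoglycerylated sites (from the benchmark description)
N_NEGATIVE = 3249  # non-phosphoglycerylated sites (from the benchmark description)


def imbalance_ratio(n_pos, n_neg):
    """Number of negative samples per positive sample."""
    return n_neg / n_pos


def balanced_class_weights(n_pos, n_neg):
    """Inverse-frequency weights: total / (number_of_classes * class_count).

    Assumption: a generic heuristic, not the authors' published cost scheme.
    With these weights both classes contribute equally to a weighted loss.
    """
    total = n_pos + n_neg
    return {
        "positive": total / (2 * n_pos),
        "negative": total / (2 * n_neg),
    }


if __name__ == "__main__":
    print(f"negatives per positive: {imbalance_ratio(N_POSITIVE, N_NEGATIVE):.2f}")
    print(balanced_class_weights(N_POSITIVE, N_NEGATIVE))
```

With roughly 29 negative sites for every positive site, a classifier trained without any cost adjustment could reach high accuracy by predicting almost everything as negative, which is why the skew has to be addressed during training.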

Supporting Information

The supporting information S1 and S2 will be provided upon request (contact).


Citation

If you use any data obtained from this server, please cite the following paper:

  • Ahmed S, Rahman A, Hasan MAM, Islam MKB, Rahman J, Ahmad S (2021) predPhogly-Site: Predicting phosphoglycerylation sites by incorporating probabilistic sequence-coupling information into PseAAC and addressing data imbalance. PLoS ONE 16(4): e0249396. https://doi.org/10.1371/journal.pone.0249396

Contact

For any query, please contact:

  • sabit.a.sirat@gmail.com
  • afrida.r.samma@gmail.com