Predicting phosphoglycerylation sites by incorporating probabilistic sequence-coupling information into PseAAC and addressing data imbalance
Post-translational modification (PTM) is the covalent modification of a protein after its biosynthesis and plays a significant role in cell biology. Lysine phosphoglycerylation, a recently discovered reversible type of PTM, affects glycolytic enzyme activities and is responsible for a wide variety of diseases, such as heart failure, arthritis, and degeneration of the nervous system. Our aim is to characterize the annotated phosphoglycerylation sites in order to understand their function and causal role more accurately. In this study, a novel computational tool, referred to as predPhogly-Site, has been developed to predict phosphoglycerylation sites. The system uses the sequence-coupling information among nearby amino acid residues of a protein sequence, together with a variable cost adjustment for the skewed training dataset, to improve prediction performance.
Enter your e-mail address and upload the batch input file (example). The prediction results will be sent to you by e-mail once the job is complete.
You can submit your protein sequences in FASTA format. The browser form currently accepts a maximum of 10 sequences. If you have more than 10 sequences, submit a batch file instead. The batch file must contain sequences in FASTA format.
NOTE: The FASTA format for the current predictor can be described as follows:
>Example_1(P17036) ✅
METQADLVSQEPQALLDSALPSKVPAFSDKDSLGDEMLAAALLKAKSQELVTFEDVAVYFIRKEWKRLEP
AQRDLYRDVMLENYGNVFSLDRETRTENDQEISEDTRSHGVLLGRFQKDISQGLKFKEAYEREVSLKRPL
GNSPGERLNRKMPDFGQVTVEEKLTPRGERSEKYNDFGNSFTVNSNLISHQRLPVGDRPHKCDECSKSFN
RTSDLIQHQRIHTGEKPYECNECGKAFSQSSHLIQHQRIHTGEKPYECSDCGKTFSCSSALILHRRIHTG
EKPYECNECGKTFSWSSTLTHHQRIHTGEKPYACNECGKAFSRSSTLIHHQRIHTGEKPYECNECGKAFS
QSSHLYQHQRIHTGEKPYECMECGGKFTYSSGLIQHQRIHTGENPYECSECGKAFRYSSALVRHQRIHTG
EKPLNGIGMSKSSLRVTTELNIREST
>Example_2(P47955) ✅
MASVSELACIYSALILHDDEVTVTEDKINALIKAAGVSVEPFWPGLFAKALANVNIGSLICNVGAGGPAP
AAGAAPAGGAAPSTAAAPAEEKKVEAKKEESEESEDDMGFGLFD
>Example_3(D3YUN8) ✅
MAECCVPVCPRPMCIPPPYADLGKAARDIFNKGFGFGLVKLDVKTKSCSGVEFSTSGSSNTDTGKVSGTL
ETKYK
>Example_4(Wrong input) ❌
MSILRIHAREIFDSRGNPTVEVDLYTAKGLFRAKKKKKKSTGIYEALELRDNDKTRFMGKGVSRPVKYVN
EFLAPALCTQKVNVVEEEEEEKLMIEMDGTENKSKFGAGGGGGVSLAVCKAGAVEXXXXXYRHIADLAGN
PEVILPVPAFNVINGGSHAGNKLAMQEFMILP*****FREAMRIGAEVYHNLKNVIKEKYGKDATNVGDE
GGFAPNILENK
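The examples above suggest that a record is rejected when its sequence contains characters such as X or *. The following Python sketch shows one way to check a submission locally before uploading it. It is not the server's own validation code: the function names are hypothetical, and the set of accepted characters (the 20 standard amino acid letters) is an assumption inferred from Example_4, not a rule stated by the server.

```python
# Assumption: only the 20 standard amino acid letters are accepted.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
# Limit stated on this page for sequences submitted through the browser.
MAX_BROWSER_SEQUENCES = 10


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines may be wrapped; join them without spaces.
            chunks.append(line.replace(" ", "").upper())
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(sequence):
    """Return the sorted list of characters that are not standard residues."""
    return sorted(set(sequence) - VALID_RESIDUES)


def check_submission(text):
    """Map each header to its invalid characters (empty list = record passes)."""
    records = parse_fasta(text)
    if len(records) > MAX_BROWSER_SEQUENCES:
        raise ValueError("more than 10 sequences: submit a batch file instead")
    return {header: invalid_residues(seq) for header, seq in records}
```

Calling `check_submission` on the text of a FASTA file returns a dictionary keyed by header. Any record with a non-empty list, as Example_4 would have under this assumption, should be corrected before submission.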
In this study, verified annotations of phosphoglycerylation sites were obtained from the Compendium of Protein Lysine Modifications (CPLM) version 2.0, a reliable repository of post-translational modifications at lysine residues, and the corresponding protein sequences were retrieved from the UniProt Knowledgebase to develop the prediction model. Redundant sequences were then removed with CD-HIT at a 40% similarity cutoff, a widely accepted level of redundancy removal, to avoid bias in performance evaluation. As a result, 91 non-redundant proteins were retained to construct the benchmark dataset. The dataset contains 111 experimentally annotated phosphoglycerylated sites and 3249 non-phosphoglycerylated sites, and is identical to the dataset used by the most recent predictor, Bigram-PGK.
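To give a sense of how skewed the benchmark dataset is, the Python sketch below computes the imbalance ratio from the site counts given above, along with one common way of turning that ratio into per-class misclassification costs. The inverse-frequency ("balanced") weighting shown here is an illustrative assumption. It is not necessarily the variable cost adjustment used by predPhogly-Site, and the function and label names are hypothetical.

```python
def balanced_class_weights(counts):
    """Weight each class by total / (number_of_classes * class_count).

    Rarer classes receive larger weights, so that each class contributes
    the same total weight to a cost-sensitive loss.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Site counts taken from the benchmark dataset description.
site_counts = {"phosphoglycerylated": 111, "non_phosphoglycerylated": 3249}

# Number of negative sites per positive site (roughly 29 to 1).
imbalance_ratio = (
    site_counts["non_phosphoglycerylated"] / site_counts["phosphoglycerylated"]
)

# Illustrative costs: about 15.1 for positives and about 0.52 for negatives.
weights = balanced_class_weights(site_counts)
```

Under this scheme, misclassifying a phosphoglycerylated site costs about 29 times as much as misclassifying a non-phosphoglycerylated one, which mirrors the imbalance ratio of the dataset.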
The supporting information S1 and S2 will be provided upon request (contact).
If you use any data obtained from this server, please cite the following paper:
For any query, please contact: