DextMP: deep dive into text for predicting moonlighting proteins.
Khan IK, Bhuiyan M, Kihara D.
Abstract
Motivation: Moonlighting proteins (MPs) are an important class of proteins that perform more than one independent cellular function. MPs have been gaining attention in recent years as they have been found to play important roles in various systems, including disease development. MPs also have a significant impact on computational function prediction and on annotation in databases. Currently, MPs are not labeled as such in biological databases, even in cases where multiple distinct functions are known for a protein. In this work, we propose a novel method, DextMP, which predicts whether or not a protein is an MP based on textual features extracted from the scientific literature and the UniProt database.
Results: DextMP extracts three categories of textual information for a protein: titles and abstracts from the literature, and function descriptions in UniProt. Three language models were applied and compared: a state-of-the-art deep unsupervised learning algorithm, along with two language models of different types, Term Frequency-Inverse Document Frequency (TF-IDF) from the bag-of-words category and Latent Dirichlet Allocation (LDA) from the topic modeling category. Cross-validation on a dataset of known MPs and non-MPs showed that DextMP predicted MPs with over 91% accuracy, a significant improvement over existing MP prediction methods. Lastly, we ran DextMP with the best-performing combinations of language model and text-based features on three genomes (human, yeast and Xenopus laevis) and found that about 2.5-35% of the proteomes are potential MPs.
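The Term Frequency-Inverse Document Frequency (TF-IDF) model mentioned above weights each word by how often it occurs in one text relative to how many texts contain it. The following is a minimal pure-Python sketch, assuming a simple regex tokenizer and the common tf × log(N/df) variant; the `tokenize` and `tfidf` helpers and the example texts are hypothetical and are not taken from the DextMP code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df), one common
    TF-IDF variant; other weightings (smoothed idf, log tf) also exist.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Illustrative texts only, not from the DextMP dataset.
texts = [
    "glycolytic enzyme binds DNA",
    "glycolytic enzyme in the cytosol",
    "receptor anchoring protein",
]
vectors = tfidf(texts)
```

A word that appears in every text gets weight zero, so words specific to one text (such as "dna" in the first example) dominate its vector. Vectors like these would then be passed to a classifier.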
Availability and Implementation: Code is available at http://kiharalab.org/DextMP.
Contact: dkihara@purdue.edu.
Fig. 1. Distribution of the number of abstracts per protein. Black: MPs; gray: non-MPs in the control dataset. The first bar covers proteins with 1 or 2 abstracts, the second bar those with 3 or 4, and so on.
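The Fig. 1 bars group abstract counts in pairs. A small sketch of that binning, assuming every protein has at least one abstract; the function names are illustrative only.

```python
from collections import Counter


def bar_index(n_abstracts):
    """Fig. 1 binning: 1-2 abstracts -> bar 0, 3-4 -> bar 1, 5-6 -> bar 2, ..."""
    if n_abstracts < 1:
        raise ValueError("assumes every protein has at least one abstract")
    return (n_abstracts - 1) // 2


def bar_range(index):
    # Inclusive range of abstract counts covered by a bar.
    return (2 * index + 1, 2 * index + 2)


def histogram(abstract_counts):
    """Number of proteins falling into each bar, keyed by bar index."""
    return Counter(bar_index(n) for n in abstract_counts)
```

For example, proteins with 1, 2, 3 and 7 abstracts fall into bars 0, 0, 1 and 3.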
Fig. 2. Schematic diagram of DextMP. The upper panel shows the text prediction process, while the bottom panel shows the prediction model that uses the predicted text labels to make the final MP/non-MP classification. P1: Protein 1; CL: class label.
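One simple way to turn text-level labels into a protein-level label, in the spirit of the weighted and non-weighted majority votes compared in Fig. 4, is sketched below. This is an illustration under stated assumptions: the weighting scheme (one weight per text type), the tie-breaking rule toward non-MP and the `majority_vote` name are hypothetical and are not taken from DextMP.

```python
from collections import defaultdict


def majority_vote(text_predictions, weights=None):
    """Combine text-level labels into one label per protein.

    text_predictions: iterable of (protein_id, text_type, label),
        with label either "MP" or "non-MP".
    weights: optional {text_type: weight} (hypothetical scheme);
        None gives every text exactly one vote.
    Ties go to "non-MP", an arbitrary choice for this sketch.
    """
    scores = defaultdict(float)  # > 0 favours MP, <= 0 gives non-MP
    for protein_id, text_type, label in text_predictions:
        w = 1.0 if weights is None else weights.get(text_type, 1.0)
        scores[protein_id] += w if label == "MP" else -w
    return {pid: ("MP" if s > 0 else "non-MP") for pid, s in scores.items()}


# Illustrative predictions only.
preds = [
    ("P1", "abstract", "MP"),
    ("P1", "abstract", "MP"),
    ("P1", "title", "non-MP"),
    ("P2", "abstract", "non-MP"),
    ("P2", "function", "MP"),
]
```

With equal weights, P1 is called MP (two votes to one) and P2 ties, so it is called non-MP. Giving "function" texts a weight of 3.0 flips P2 to MP, which shows how weighting can change the protein-level call.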
Fig. 3. Word clouds of the text information in the moonlighting protein dataset. The size of a word is proportional to the number of times it appears in the input text. (A–C) Titles, function descriptions and abstracts, respectively. The images were generated at http://www.wordle.net/
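The caption's sizing rule (word size proportional to word count) can be sketched as follows. The regex tokenizer, the stop-word list and the `max_size` scale are assumptions for illustration; Wordle's own filtering and layout are not reproduced.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the figure's actual preprocessing is not described.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for"}


def word_sizes(texts, max_size=72.0):
    """Return {word: size}, with size proportional to the word's count
    and the most frequent word drawn at max_size."""
    words = [
        w
        for t in texts
        for w in re.findall(r"[a-z0-9]+", t.lower())
        if w not in STOP_WORDS
    ]
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    return {w: max_size * c / top for w, c in counts.items()}


sizes = word_sizes(["the enzyme binds the enzyme", "enzyme moonlighting"])
```

Here "enzyme" occurs three times and is drawn at the full size, while "binds" and "moonlighting" occur once and are drawn at one third of that size.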
Fig. 4. Protein-level cross-validation F-scores for weighted and non-weighted majority votes. Results are compared for 21 (text type)-(language model)-(classifier) combinations.
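The protein-level F-score in Fig. 4 can be illustrated with the standard F1 measure, the harmonic mean of precision and recall. This sketch assumes MP is the positive class. Whether the paper averages F-scores per fold or pools predictions across folds is not stated here, so the function simply scores one set of protein-level predictions.

```python
def f_score(true_labels, predicted_labels, positive="MP"):
    """F1 = 2PR / (P + R) for the positive class over protein-level labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only: one hit, one miss, one false alarm.
truth = ["MP", "MP", "non-MP", "non-MP"]
pred = ["MP", "non-MP", "MP", "non-MP"]
```

With one true positive, one false positive and one false negative, precision and recall are both 0.5, so `f_score(truth, pred)` is 0.5.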