FixPred: a resource for correction of erroneous protein sequences.
Nagy A, Patthy L.
Abstract
Protein databases are heavily contaminated with erroneous (mispredicted, abnormal and incomplete) sequences, and these erroneous data significantly distort the conclusions drawn from genome-scale protein sequence analyses. In our earlier work we described the MisPred resource, which serves to identify erroneous sequences; here we present the FixPred computational pipeline, which automatically corrects sequences identified by MisPred as erroneous. The current version of the associated FixPred database contains corrected UniProtKB/Swiss-Prot and NCBI/RefSeq sequences from Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Danio rerio, Fugu rubripes, Ciona intestinalis, Branchiostoma floridae, Drosophila melanogaster and Caenorhabditis elegans; future releases of the FixPred database will include corrected sequences of additional metazoan species. The FixPred computational pipeline and database (http://www.fixpred.com) are easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats. Database URL: http://www.fixpred.com.
Figure 2. Screenshot of an entry of the FixPred database. The figure shows the corrected version (upper part) of an erroneous protein sequence of G. gallus, deposited in the UniProtKB/Swiss-Prot database with the protein ID FZD3_CHICK (lower part). The FZD3_CHICK protein was identified as erroneous by MisPred tool 4 (domain size deviation) because it contains only a fragment of the Frizzled (PF01534) domain. The erroneous protein was corrected by the FixPred pipeline in Step 2 by identifying a full-length version of the frizzled-3 precursor (NP_001258869.1).
Figure 3. Correction of an erroneous protein sequence by the FixPred pipeline. (A) The upper part of the screenshot shows an H. sapiens protein sequence (NP_001184026.2, trypsin-3 isoform 3 preproprotein) that was identified as erroneous by MisPred tool 1 because it has an extracellular domain but lacks a secretory signal peptide. (B) The erroneous protein was corrected by the FixPred pipeline in Step 2 by identifying a version (NP_002762.2, trypsin-3 isoform 2 preproprotein) that does not suffer from this type of error (see lower part of the screenshot).