- Research article
- Open Access
Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome
© Buza et al; licensee BioMed Central Ltd. 2007
- Received: 07 August 2007
- Accepted: 19 November 2007
- Published: 19 November 2007
The chicken genome was sequenced because of its phylogenetic position as a non-mammalian vertebrate, its use as a biomedical model especially to study embryology and development, its role as a source of human disease organisms and its importance as the major source of animal derived food protein. However, genomic sequence data is, in itself, of limited value; generally it is not equivalent to understanding biological function. The benefit of having a genome sequence is that it provides a basis for functional genomics. However, the sequence data currently available is poorly structurally and functionally annotated and many genes do not have standard nomenclature assigned.
We analysed eight chicken tissues and improved the chicken genome structural annotation by providing experimental support for the in vivo expression of 7,809 computationally predicted proteins, including 30 chicken proteins that were only electronically predicted or hypothetical translations in human. To improve functional annotation (based on Gene Ontology), we mapped these identified proteins to their human and mouse orthologs and used this orthology to transfer Gene Ontology (GO) functional annotations to the chicken proteins. The 8,213 orthology-based GO annotations that we produced represent an 8% increase in currently available chicken GO annotations. Orthologous chicken products were also assigned standardized nomenclature based on current chicken nomenclature guidelines.
We demonstrate the utility of high-throughput expression proteomics for rapid experimental structural annotation of a newly sequenced eukaryote genome. These experimentally-supported predicted proteins were further annotated by assigning the proteins with standardized nomenclature and functional annotation. This method is widely applicable to a diverse range of species. Moreover, information from one genome can be used to improve the annotation of other genomes and inform gene prediction algorithms.
- Gene Ontology
- Functional Annotation
- Standardize Nomenclature
- Structural Annotation
- Mouse Orthologs
After genome sequencing, genome annotation is critical to denote and demarcate the functional elements in the genome (structural annotation) and to link these genomic elements to biological function (functional annotation). Structural annotation of newly sequenced genomes begins during the final stages of genome assembly with electronic prediction of open reading frames (ORFs) [1–3]. Sequencing consortiums typically release these predicted genes and their translated products into public databases, where they account for the majority of data for the newly sequenced species [4, 5] and are critical for high-throughput wet lab functional genomics (microarray and proteomics) experiments [4, 6]. The NCBI Non-Redundant Protein Database (NRPD) and the UniProt Archive (UniParc) do not directly provide functional annotation for these predicted ORFs. The highly curated UniProt Knowledgebase (UniProtKB) database  displays functional annotation from the European Bioinformatics Institute Gene Ontology Annotation (EBI-GOA) Project , but does not include predicted gene products until there is experimental evidence for their in vivo expression. Thus, despite being critical for functional genomics experiments, most data from a newly sequenced genome does not have even preliminary functional annotation. This problem is exacerbated as other public resources such as Ensembl . Entrez Gene  and Affymetrix Netaffx  use data from UniProtKB or the EBI-GOA Project as their functional annotation source.
Gene Ontology evidence codes
Direct experimental evidence codes
Inferred from Direct Assay
in vitro reconstitution
physical interaction/binding assay
Inferred from Genetic Interaction
"traditional" genetic interactions such as suppressors, synthetic lethals, etc.
inference about one gene drawn from the phenotype of a mutation in a different gene
Inferred from Mutant Phenotype
any gene mutation/knockout
overexpression/ectopic expression of wild-type or mutant genes
specific protein inhibitors
polymorphism or allelic variation
Inferred from Physical Interaction
ion/protein binding experiments
Inferred from Expression Pattern
transcript levels (e.g. Northerns, microarray data)
protein levels (e.g. Western blots)
Indirect evidence codes
Non-traceable Author Statement
Database entries that don't cite a paper
Traceable Author Statement
original experiments are traceable through that article
Inferred by Curator
inferred by a curator from other GO annotations
Inferred from Genomic Context
genome-scale analysis of processes
used for annotations done before curators began tracking evidence types, not used for new annotations
No biological Data available
"unknown" molecular function, biological process, cellular component
Inferred from Electronic Annotation
"hits" in sequence similarity searches, if they have not been reviewed by curators; transferred from database records, if not reviewed by curators
Inferred from Sequence or Structural Similarity
sequence similarity (homologue of/most closely related to)
protein features, predicted or observed (e.g. hydrophobicity, sequence composition)
Inferred from Reviewed Computational Analysis
predictions based on large-scale experiments (e.g. genome-wide two-hybrid)
predictions based on integration of large-scale datasets of several types
text-based computation (e.g. text mining)
Although most GO annotations for newly sequenced species are the IEA-based annotations provided by the EBI-GOA Project , these IEA annotations do not initially include the gene products predicted during sequence assembly. Moreover, while IEA annotations are based on functional motifs and sequences, the most rigorous way of assigning function when there is no direct experimental evidence available, is based on strict orthology. Orthology is one of the central concepts of comparative genome analysis. By definition orthologs are genes or proteins in two or more species that share significant similarity, and are thought to have diverged from a common ancestral gene that existed in their last common ancestor [13–17]. Since orthologous pairs have minimum level of evolutionary separation between them, they are more likely to retain a common function. Determination of orthology relations assists knowledge transfer between species and can be used to improve both structural and functional annotation in organisms that have less annotation.
A number of ortholog prediction methods and search tools are available [9, 18–20]. However, the number of proteins from one species that is considered to be part of the same orthologous group varies from one method to another due to different algorithms employed and species included in the methods . For example, Homologene  does orthology analyses by comparing protein sequences using the BLASTP tool and then matching the sequences using phylogenetic trees built from sequence similarity and synteny, where possible. Ensembl  first uses BLASTP and the Smith-Waterman algorithm to identify putative orthologs by reciprocal BLAST analysis and synteny evidence. Inparanoid  is based on pairwise similarity scores and it detects best-best hits between sequences from two different species to form the main orthologous group to which other sequences (in-paralogs) are added only if they are closely related. Treefam (Tree families)  uses phylogeny based on Ensembl datasets and clusters genes (and corresponding gene products) from multiple organisms into groups that are all descended from a single ancestor gene. In order to obtain good coverage and reliable predicted orthologs, various methods should be integrated .
Comparative genome analysis also requires standardized nomenclature. By identifying orthologs of experimentally supported proteins, standardized nomenclature can be added. Committees for standardized nomenclature exist for human and mouse gene and gene products  and chicken researchers have followed suit  and will use human nomenclature for orthologous chicken genes.
In this work we analysed nine chicken tissues using a three-stage combined high throughput proteomics and computational biology approach to derive "expressed protein sequence tags" (ePSTs) to improve structural annotation by experimentally supporting the in vivo expression of computationally predicted chicken proteins . We then used orthology to add standardized gene nomenclature and GO annotations (by transferring functional annotations based on direct experimental evidence for corresponding human and mouse orthologs).
Identification of predicted proteins
Not surprisingly, more predicted proteins were identified by mass spectrometry when Differential Detergent Fractionation (DDF) was used as the method for protein isolation, as previously reported . This means that muscle and brain tissues, two tissues which would normally be expected to have the highest number of identified proteins, had the fewest predicted proteins (61 and 36, respectively). We found that 52% of the identified proteins were expressed in more than one tissue (Figure 1B), and their independent identification in multiple tissues lends validity to their in vivo expression in chicken. The protein identification and mass spectrometry data has been submitted to the PRoteomic IDEntifications database (PRIDE; ), accession numbers 1621–1626, 1654 & 1655.
One of the most time consuming tasks in high-throughput experiments is navigating among different database identifiers. To assist researchers with their data analysis and facilitate data sharing we mapped all identified proteins to UniParc, IPI (International Protein Index), Entrez Gene and Ensembl identifiers (see additional file 2). Only 80% of the identified proteins were mapped to Ensembl IDs. This may be because Ensembl has a different gene prediction method  to that of NCBI and not all of the NCBI predicted proteins are represented in Ensembl.
The use of standardized nomenclature facilitates comparative biology and aids modelling of functional genomics data. We assigned 5,064 (65%) chicken predicted proteins with HGNC (Human Genome Organization (HUGO) Gene Nomenclature Committee) approved gene symbols and names based on their human or mouse orthologs (see additional file 3). Although it has been agreed to base chicken gene nomenclature on human nomenclature guidelines  it is only relatively recently that there has been a concerted effort to provide standardized nomenclature for chicken genes, and the majority of chicken gene products are not named according to standardized nomenclature guidelines. We have assigned standardized nomenclature to chicken genes on a large scale as part of a high-throughput experimental annotation effort.
Here we demonstrate a combined approach to provide experimental-based structural annotations and functional annotations based on orthology. The workflow we have developed relies on using proteomics to survey a range of tissues from the species of interest. Newer structural annotation pipelines include the use of ESTs and mRNA in their computational models. We are proposing an analogous method that would include experimental support at the protein level while providing information that can be used to improve structural annotation in the species being studied, provide information to improve annotation in other species and be used to improve open reading frame prediction algorithms. In addition, providing information about tissue specificity and preliminary functional information based on sequence analysis will facilitate analysis of future functional genomics studies.
The chicken genome was sequenced because of its importance as a non-mammalian vertebrate model, its use as a biomedical model to study embryology and [29, 30] development and its agricultural importance. A major step that follows after genome sequencing is structural and functional annotation (denoting and demarcating the functional elements in the genome and link these genomic elements to biological function, respectively). When we began the work described in this manuscript only 53% of chicken proteins were known to be expressed in vivo, with the remainder being electronically predicted using in silico methods. Moreover, only 52% of chicken gene products had any GO annotations and, although genes predicted during genome assembly may be the bulk of the data for a newly sequenced species, these predicted gene products are not automatically assigned any GO annotation.
The parameters we have used in this study provide strong support for protein expression in vivo. In particular, the parameter DeltaCn is a measure of specificity of the match within the database used and a DeltaCn value 0.1 ensures that a peptide is distinctly different from other peptides within the same database. However, a single peptide match to a predicted protein does not necessarily provide evidence that the annotation for the entire open reading frame is accurate; this can only be confirmed by accumulating more mass spectra data and accounting for the detectable peptides within the genome . While some of the predicted proteins we identified were identified on the basis of a single peptide, 44% of these proteins were expressed in more than one tissue, providing additional evidence for their in vivo expression. In a typical proteomics experiment 20–67%-of proteins are identified by a single peptide match [26, 32, 33]. Calculation of false discovery rate has been used to validate peptide or proteins identifications [32, 34–37], including proteins identified by a single peptide match. In one study, 90% of the proteins identified by a single peptide were validated by immunoassay detection .
By analysis of multiple tissues we maximize the number of predicted proteins identified and provide tissue expression data for these identified proteins. Also, identifying predicted proteins in more than one experiment (52% of the chicken proteins identified were detected in more than one tissue) provides additional confidence that the predicted protein is expressed in vivo. In addition, 30 proteins were only electronically predicted or hypothetical translations in human. Identifying these proteins in chicken is additional information to support, not only the expression of these proteins in chicken but also in human based on orthology.
The least number of proteins were identified from the muscle and brain tissues. However, this does not necessarily reflect the biological complexity of these tissues but is more likely a reflection of the different protein extraction method used for these two tissues and amount of sample analyzed.
In addition to providing experimental support for the in vivo expression of chicken predicted proteins, we used strict 1:1 orthology with human and mouse genes to provide the identified proteins with standardized gene nomenclature based on established nomenclature guidelines and functional annotations based on the best available data. Since by definition predicted proteins have no direct experimental evidence, assignation of GO annotation for these proteins can be done using either IEA or ISS. While IEA is provided for a large range of organisms by the EBI-GOA Project, this annotation effort does not include predicted proteins and IEA annotations tend to be broad descriptions of function (e.g. "protein binding"). The most rigorous way to assign function in the absence of direct experimental evidence is by strict orthology.
Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Orthologs are, by definition, more likely to share functional similarity  and orthology can be used to reliably infer function to their co-orthologs. We determined chicken orthologous genes that pair with human and mouse genes. Since there is no a 'gold standard' method for orthologs identification , we integrated different published orthology identification methods that could possibly increase the breadth of orthologs identified. We were able to identify human or mouse orthologs for 77% of the identified chicken proteins. This figure, however, is better than the number that could have been obtained when using only one method (see additional file 3). For example from the total number of identified chicken predicted proteins (7,809), only 71%, 57%, 57% and 23% could have been identified by Homologene, Inparanoid, Ensembl and Treefam, respectively. Each of these methods use different procedures and orthologs identified by more than one method have been reported to be more consistent and reliable .
In addition to the experimentally supported predicted proteins that have human or mouse orthologs, there are a further 1,780 predicted proteins that we identified in this study. We are in the process of providing GO functional annotation for these proteins based on sequence similarity to other GO annotated gene products and functional motifs and domains and this information will be also be made publicly available.
Standardized nomenclature is becoming increasingly important with the large amounts of data released by sequencing projects, gene expression microarrays and proteomics. This information will facilitate comparative and functional genomics studies in both avians and mammals. Moreover, assigning functional annotation based on orthology is more robust than using sequence similarity alone . This is because the higher level of functional conservation between orthologous proteins makes orthology highly relevant for protein function prediction. Thus our 8% increase in chicken GO annotated proteins is a significant improvement.
We demonstrate the value of proteomics to experimentally support the in-vivo expression of electronically predicted proteins of a newly sequenced genome. We assigned standardized nomenclature and GO functional annotations for these newly confirmed proteins. The approach we have developed facilitates comparative and functional genomics studies and may be applied to improve the annotations of a diverse range of newly sequenced genomes.
Tissues and protein extraction
Proteins were isolated from several different tissues in a series of experiments. Bursal B cells and stromal cells were isolated from bursas collected from five 21-day-old Ross 508 mixed sex chickens, muscle from the Pectoralis Major muscle of six 42 day old female chickens, brain from six 42 day old female chickens, spleen from eighteen 7- and 8-day-old advanced intercross Fayoumi and Leghorn mixed sex chickens, T cells from peripheral blood mononuclear cells (PBMC) obtained from adult Ross 508 mixed sex chickens, serum from 20-day-old Ross 508 male chickens. The disease virus-transformed cell line, MDCC-UA01 (obtained from Dr M. Parcells, University of Delaware) was grown as described . Proteins were isolated using Differential Detergent Fractionation (DDF)  for each of the tissues except muscle and brain. For the muscle and brain samples, the samples were immediately frozen at -80°C. The samples were then allowed to warm to -21°C and solubilized in lysis buffer (7 M urea, 2 M thiourea, 4% CHAPSO, 8 mM PMSF) with repetitive pulsed sonication on ice. Note that the DDF method has been shown to yield more proteins than a single step lysis of tissues (as used for muscle and brain) .
All solubilized proteins were identified by 2-dimensional liquid chromatography tandem mass spectrometry (2-DLCMS/MS) exactly as previously described [24, 27]. Briefly, protein mixtures are trypsin digested and the peptides desalted prior to strong cation exchange followed by reverse phase liquid chromatography coupled directly in line with ESI ion trap MS. A flow rate of 3 μL/min was used for both SCX and RP columns. A salt gradient was applied in steps of 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 57, 64, 71, 79, 90, 110, 300, and 700 mM ammonium acetate in 5% ACN, 0.1% formic acid and the resultant peptides loaded directly into the sample loop of a 0.18 × 100 mm BioBasic C18 reverse phase liquid chromatography column of a Proteome X workstation (ThermoElectron). The reverse phase gradient used 0.1% formic acid in ACN and increased the ACN concentration in a linear gradient from 5% to 30% in 30 min and then 30% to 65% in 9 min followed by 95% for 5 min and 5% for 15 min.
A database containing only chicken proteins that have been electronically predicted was prepared by parsing the chicken RefSeq entries (chicken gene build 2.1, 01/08/2007) for records with an XP prefix (14,676 proteins). The XP prefix is used to indicate proteins that have been predicted using the GNOMON pipeline. Redundancies were minimized by using the RefSeq dataset rather than the dataset from the Non-redundant Protein Database. The RefSeq database contained 19,500 chicken proteins but only the 14,676 GNOMON predicted proteins were used in this study. Trypsin digestion was applied in silico to the predicted protein database including mass changes due to cysteine-carboxyamidomethylation and methionine oxidation.
The MS2 spectra were then used to search the non-redundant predicted protein database using Cluster 3.2 (Bioworks Browser 3.2, Thermo Electron, San Jose, CA). The peptide (MS precursor ion) mass tolerance was set to 1.4 and the groups scan to 1.0. Peptide molecular range was set to 600–3500. Only peptides ≥ 6 amino acids in length that had cross correlation (Xcorr) scores of 1.5, 2.0 and 2.5 (for +1, +2, and +3 charge state, respectively) and DeltaCn of > 0.1 [25, 40, 41] were considered matches. To quantify the peptide false discovery rate (FDR), we used the reverse database function in Bioworks 3.2 to search all MS2 spectra against a reversed version of our predicted proteins database using the same search criteria described above. Prior to calculating the FDR, we calculated the probability of each peptide match from both real and reversed database based on the product of XCorr and DeltaCn and set a cut-off of P ≤ 0.05 for individual peptide identifications. With this probability as the cut-off, we calculated the FDR using the expected proportion E(V) of incorrect identifications from correct identifications (R) : FDR = E(V)/R. Proteins were identified based on the peptides that pass the above criteria.
Proteins identified by SEQUEST search algorithm have a Genbank identifier (gi) and RefSeq identifiers. In order to facilitate data sharing with public databases and ortholog determination we mapped the identified proteins to corresponding identifiers from UniProt Archive (UniParc), the International Protein Index (IPI), Entrez Gene and Ensembl protein identifiers using either different online tools for ID mapping [42–45] or an in-house Perl script (MapProtID.pl) to match different ID datasets. In cases where the program could not find an identifier, we used gi or RefSeq numbers to manually search co-identifiers in the UniParc , IPI , Entrez  or Ensembl  databases.
Chicken-human orthologs were downloaded from the HGNC (Human Genome Organization (HUGO) Gene Nomenclature Committee) Comparison of Orthology Predictions (HCOP) site  using the HCOP search tool [20, 51]. HCOP integrates and displays the orthology assertions made by different ortholog prediction methods such as Ensembl , Homologene [21, 52], Inparanoid , MGI (Mouse Genome Informatics)  and Treefam . In cases where we could not identify chicken-human orthologs we manually checked Homologene , Inparanoid  or Ensembl [49, 55] in order to obtain the most recent data. Chicken-mouse orthologs were downloaded only from Homologene, Inparanoid and Ensembl because HCOP does not predict chicken-mouse orthologs
Standardized gene nomenclature is vital for effective scientific communication  and chicken researchers have agreed to use human nomenclature for orthologous chicken genes . In this study we assigned chicken standardized nomenclature based on HGNC approved gene symbols and names that were associated with the human or mouse orthologs. We manually check the existence of each symbol and name in the HGNC nomenclature database before transferring it to chicken. In cases where the human or mouse gene symbol or name was not found or withdrawn from HGNC, no symbol or name was assigned to the chicken co-ortholog. To distinguish chicken from human genes the symbol assigned to chicken gene products are all in lowercases except for the first letter, as is the convention for mouse.
Since orthologs are presumed to have the same function, useful functional information can be extracted from other species when annotating orthologous gene products with unknown functions. To provide GO annotation for the identified chicken predicted proteins, we downloaded the human and mouse GO annotations from either the European Bioinformatics Institute GO annotation project (EBI-GOA: 03/12/2007) or searched Ensembl  using Biomart [43, 55]. We assigned the chicken predicted proteins the GO annotations of human and mouse orthologs that are only based on direct experimental evidence codes (Table 1) and each chicken GO annotation was assigned an ISS GO evidence code, as per usual GO annotation procedure.
Public Availability of Data
Experimentally supported predicted proteins will be shared with the NCBI database, standardized nomenclature made available to both the NCBI and UniProt databases and GO annotations made available publicly via AgBase, the EBI-GOA Project and the GO Consortium. Assigned GO annotations are publicly available via the AgBase database  and will be submitted to the EBI-GOA Project. A summary of these GO annotations was obtained by mapping the associated GO terms to the Generic GOSlim Sets  using GOSlimViewer [4, 5].
We wish to acknowledge Alan Shack, Joram Buza, Amanda Cooksey and Bart van den Berg assistance with collecting the tissue samples and protein extraction & Tibor Pechan for MS/MS analysis. The authors acknowledge financial support from Mississippi Agricultural and Forestry Experiment Station, Mississippi State University.
- Alexandersson M, Cawley S, Pachter L: SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Res. 2003, 13 (3): 496-502. 10.1101/gr.424203.PubMed CentralPubMedView ArticleGoogle Scholar
- Curwen V, Eyras E, Andrews TD, Clarke L, Mongin E, Searle SM, Clamp M: The Ensembl automatic gene annotation system. Genome Res. 2004, 14 (5): 942-950. 10.1101/gr.1858004.PubMed CentralPubMedView ArticleGoogle Scholar
- Wu JQ, Shteynberg D, Arumugam M, Gibbs RA, Brent MR: Identification of rat genes by TWINSCAN gene prediction, RT-PCR, and direct sequencing. Genome Res. 2004, 14 (4): 665-671. 10.1101/gr.1959604.PubMed CentralPubMedView ArticleGoogle Scholar
- McCarthy FM, Bridges SM, Wang N, Magee GB, Williams WP, Luthe DS, Burgess SC: AgBase: a unified resource for functional analysis in agriculture. Nucleic acids research. 2007, 35 (Database issue): D599-603. 10.1093/nar/gkl936.PubMed CentralPubMedView ArticleGoogle Scholar
- McCarthy FM, Wang N, Magee GB, Nanduri B, Lawrence ML, Camon EB, Barrell DG, Hill DP, Dolan ME, Williams WP, Luthe DS, Bridges SM, Burgess SC: AgBase: a functional genomics resource for agriculture. BMC genomics. 2006, 7: 229-10.1186/1471-2164-7-229.PubMed CentralPubMedView ArticleGoogle Scholar
- Azuaje F, Al-Shahrour F, Dopazo J: Ontology-driven approaches to analyzing data in functional genomics. Methods Mol Biol. 2006, 316: 67-86.PubMedGoogle Scholar
- The Universal Protein Resource (UniProt). Nucleic Acids Res. 2007, 35 (Database issue): D193-7.Google Scholar
- Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R: The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res. 2004, 32 (Database issue): D262-6. 10.1093/nar/gkh021.PubMed CentralPubMedView ArticleGoogle Scholar
- Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, et : Ensembl 2007. Nucleic acids research. 2007, 35 (Database issue): D610-7. 10.1093/nar/gkl996.PubMed CentralPubMedView ArticleGoogle Scholar
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2007, 35 (Database issue): D26-31. 10.1093/nar/gkl993.PubMed CentralPubMedView ArticleGoogle Scholar
- Cheng J, Sun S, Tracy A, Hubbell E, Morris J, Valmeekam V, Kimbrough A, Cline MS, Liu G, Shigeta R, Kulp D, Siani-Rose MA: NetAffx Gene Ontology Mining Tool: a visual approach for microarray data analysis. Bioinformatics. 2004, 20 (9): 1462-1463. 10.1093/bioinformatics/bth087.PubMedView ArticleGoogle Scholar
- Lewis S, Ashburner M, Reese MG: Annotating eukaryote genomes. Curr Opin Struct Biol. 2000, 10 (3): 349-354. 10.1016/S0959-440X(00)00095-6.PubMedView ArticleGoogle Scholar
- Chen F, Mackey AJ, Stoeckert CJ, Roos DS: OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic acids research. 2006, 34 (Database issue): D363-8. 10.1093/nar/gkj123.PubMed CentralPubMedView ArticleGoogle Scholar
- Hulsen T, Huynen MA, de Vlieg J, Groenen PM: Benchmarking ortholog identification methods using functional genomics data. Genome biology. 2006, 7 (4): R31-10.1186/gb-2006-7-4-r31.PubMed CentralPubMedView ArticleGoogle Scholar
- Li L, Stoeckert CJ, Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003, 13 (9): 2178-2189. 10.1101/gr.1224503.PubMed CentralPubMedView ArticleGoogle Scholar
- O'Brien KP, Westerlund I, Sonnhammer EL: OrthoDisease: a database of human disease orthologs. Human mutation. 2004, 24 (2): 112-119. 10.1002/humu.20068.PubMedView ArticleGoogle Scholar
- Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of molecular biology. 2001, 314 (5): 1041-1052. 10.1006/jmbi.2000.5197.PubMedView ArticleGoogle Scholar
- Li H, Coghlan A, Ruan J, Coin LJ, Heriche JK, Osmotherly L, Li R, Liu T, Zhang Z, Bolund L, Wong GK, Zheng W, Dehal P, Wang J, Durbin R: TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic acids research. 2006, 34 (Database issue): D572-80. 10.1093/nar/gkj118.PubMed CentralPubMedView ArticleGoogle Scholar
- O'Brien KP, Remm M, Sonnhammer EL: Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic acids research. 2005, 33 (Database issue): D476-80. 10.1093/nar/gki107.PubMed CentralPubMedView ArticleGoogle Scholar
- Wright MW, Eyre TA, Lush MJ, Povey S, Bruford EA: HCOP: the HGNC comparison of orthology predictions search tool. Mamm Genome. 2005, 16 (11): 827-828. 10.1007/s00335-005-0103-2.PubMedView ArticleGoogle Scholar
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Ostell J, Miller V, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2007, 35 (Database issue): D5-12. 10.1093/nar/gkl1031.PubMed CentralPubMedView ArticleGoogle Scholar
- Wright MW, Bruford EA: Human and orthologous gene nomenclature. Gene. 2006, 369: 1-6. 10.1016/j.gene.2005.10.029.PubMedView ArticleGoogle Scholar
- Crittenden LB, Bitgood JJ, Burt DW, Ponce de Leon FA, Tixier-Boichard M: Nomenclature for naming loci, alleles, linkage groups, and chromosomes to be used in poultry genome publications and databases. The Second International Workshop on Poultry Genome Mapping in Prague. 1994Google Scholar
- McCarthy FM, Cooksey AM, Wang N, Bridges SM, Pharr GT, Burgess SC: Modeling a whole organ using proteomics: the avian bursa of Fabricius. Proteomics. 2006, 6 (9): 2759-2771. 10.1002/pmic.200500648.PubMedView ArticleGoogle Scholar
- Balgley BM, Laudeman T, Yang L, Song T, Lee CS: Comparative Evaluation of Tandem MS Search Algorithms Using a Target-Decoy Search Strategy. Mol Cell Proteomics. 2007, 6 (9): 1599-1608. 10.1074/mcp.M600469-MCP200.PubMedView ArticleGoogle Scholar
- Higdon R, Kolker E: A predictive model for identifying proteins by a single peptide match. Bioinformatics. 2007, 23 (3): 277-280. 10.1093/bioinformatics/btl595.PubMedView ArticleGoogle Scholar
- McCarthy FM, Burgess SC, van den Berg BH, Koter MD, Pharr GT: Differential detergent fractionation for non-electrophoretic eukaryote cell proteomics. J Proteome Res. 2005, 4 (2): 316-324. 10.1021/pr049842d.PubMedView ArticleGoogle Scholar
- Martens L, Hermjakob H, Jones P, Adamski M, Taylor C, States D, Gevaert K, Vandekerckhove J, Apweiler R: PRIDE: the proteomics identifications database. Proteomics. 2005, 5 (13): 3537-3545. 10.1002/pmic.200401303.PubMedView ArticleGoogle Scholar
- Burt DW: Chicken genome: Current status and future opportunities . Genomes. Edited by: Sussman HE, Smit MA. 2006, Cold Harbor Laboratory Press , 221-236.Google Scholar
- McPherson JD, Dodgson J, R. K, Pourquié O: Proposal to sequence the genome of chicken. World Wide Web (http://www.nih.gov/science/models/gallus/ChickenGenomeWhitePaper.pdf). 2003Google Scholar
- Sanders WS, Bridges SM, McCarthy FM, Nanduri B, Burgess SC: Prediction of peptides observable by mass spectrometry applied at the experimental set level,. BMC Bioinformatics,. 2007, 8(Suppl 7) (S23):Google Scholar
- Gupta N, Tanner S, Jaitly N, Adkins JN, Lipton M, Edwards R, Romine M, Osterman A, Bafna V, Smith RD, Pevzner PA: Whole proteome analysis of post-translational modifications: Applications of mass-spectrometry for proteogenomic annotation. Genome Res. 2007, 17 (9): 1362-1377. 10.1101/gr.6427907.PubMed CentralPubMedView ArticleGoogle Scholar
- Lowenthal MS, Mehta AI, Frogale K, Bandle RW, Araujo RP, Hood BL, Veenstra TD, Conrads TP, Goldsmith P, Fishman D, Petricoin EF, Liotta LA: Analysis of albumin-associated peptides and proteins from ovarian cancer patients. Clinical chemistry. 2005, 51 (10): 1933-1945. 10.1373/clinchem.2005.052944.PubMedView ArticleGoogle Scholar
- Elias JE, Gygi SP: Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature methods. 2007, 4 (3): 207-214. 10.1038/nmeth1019.PubMedView ArticleGoogle Scholar
- Nesvizhskii AI, Aebersold R: Interpretation of shotgun proteomic data: the protein inference problem. Mol Cell Proteomics. 2005, 4 (10): 1419-1440. 10.1074/mcp.R500012-MCP200.PubMedView ArticleGoogle Scholar
- Nesvizhskii AI, Vitek O, Aebersold R: Analysis and validation of proteomic data generated by tandem mass spectrometry. Nature methods. 2007, 4 (10): 787-797. 10.1038/nmeth1088.PubMedView ArticleGoogle Scholar
- States DJ, Omenn GS, Blackwell TW, Fermin D, Eng J, Speicher DW, Hanash SM: Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nature biotechnology. 2006, 24 (3): 333-338. 10.1038/nbt1183.PubMedView ArticleGoogle Scholar
- Fitch WM: Distinguishing homologous from analogous proteins. Syst Zool. 1970, 19 (2): 99-113. 10.2307/2412448.PubMedView ArticleGoogle Scholar
- Dienglewicz RL, Parcells MS: Establishment of a lymphoblastoid cell line using a mutant MDV containing a green fluorescent protein expression cassette. Acta Virol. 1999, 43 (2-3): 106-112.PubMedGoogle Scholar
- Eng JK, McCormack AL, Yates JR, III: An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1994, 5: 976-989. 10.1016/1044-0305(94)80016-2.PubMedView ArticleGoogle Scholar
- Liu T, Qian WJ, Gritsenko MA, Xiao W, Moldawer LL, Kaushal A, Monroe ME, Varnum SM, Moore RJ, Purvine SO, Maier RV, Davis RW, Tompkins RG, Camp DG, Smith RD: High dynamic range characterization of the trauma patient plasma proteome. Mol Cell Proteomics. 2006, 5 (10): 1899-1913. 10.1074/mcp.M600068-MCP200.PubMed CentralPubMedView ArticleGoogle Scholar
- Alibes A, Yankilevich P, Canada A, Diaz-Uriarte R: IDconverter and IDClight: conversion and annotation of gene and protein IDs. BMC bioinformatics. 2007, 8: 9-10.1186/1471-2105-8-9.PubMed CentralPubMedView ArticleGoogle Scholar
- Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W: BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics (Oxford, England). 2005, 21 (16): 3439-3440. 10.1093/bioinformatics/bti525.View ArticleGoogle Scholar
- Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Mazumder R, O'Donovan C, Redaschi N, Suzek B: The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic acids research. 2006, 34 (Database issue): D187-91. 10.1093/nar/gkj161.PubMed CentralPubMedView ArticleGoogle Scholar
- Batch Retrieval:PIR - Protein Information Resource. [http://pir.georgetown.edu/pirwww/search/idmapping.shtml]
- UniProt Archive Database . [http://www.pir.uniprot.org/database/archive.shtml]
- International Protein Index database . [http://www.ebi.ac.uk/IPI/IPIhelp.html]
- Entrez cross-database search . [http://www.ncbi.nlm.nih.gov/sites/entrez]
- Ensembl Genome Browser . [http://www.ensembl.org/Gallus_gallus/index.html]
- HGNC Comparison of Orthology Predictions search tool . [http://www.genenames.org/cgi-bin/hcop.pl]
- Eyre TA, Wright MW, Lush MJ, Bruford EA: HCOP: a searchable database of human orthology predictions. Briefings in bioinformatics. 2007, 8 (1): 2-5. 10.1093/bib/bbl030.PubMedView ArticleGoogle Scholar
- Homologene: A homology resource. [http://www.ncbi.nlm.nih.gov/HomoloGene/]
- Eppig JT, Bult CJ, Kadin JA, Richardson JE, Blake JA, Anagnostopoulos A, Baldarelli RM, Baya M, Beal JS, et : The Mouse Genome Database (MGD): from genes to mice--a community resource for mouse biology. Nucleic acids research. 2005, 33 (Database issue): D471-5. 10.1093/nar/gki113.PubMed CentralPubMedView ArticleGoogle Scholar
- Inparanoid: Eukaryotic Ortholog Groups . [http://inparanoid.sbc.su.se]
- BioMart: Data mining tool. [http://www.ensembl.org/biomart/martview]
- Generic GOSlim set . [http://www.geneontology.org/GO_slims/goslim_generic.obo]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.