NovelFam3000 – Uncharacterized human protein domains conserved across model organisms
© Kemmer et al; licensee BioMed Central Ltd. 2006
Received: 18 November 2005
Accepted: 13 March 2006
Published: 13 March 2006
Despite significant efforts from the research community, a large proportion of the proteins encoded by human genes lacks an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, many of which appear in multiple proteins. Once a domain is characterized in one protein, the presence of a similar sequence in an uncharacterized protein serves as a basis for inference of function. Thus knowledge of a domain's function, or of the protein within which it occurs, can facilitate the analysis of an entire set of proteins.
From the Pfam domain database, we extracted uncharacterized protein domains represented in proteins from humans, worms, and flies. A data centre was created to facilitate the analysis of the uncharacterized domain-containing proteins. The centre both provides researchers with links to dispersed internet resources containing gene-specific experimental data and enables them to post relevant experimental results or comments. For each human gene in the system, a characterization score is posted, allowing users to track the progress of characterization over time or to identify for study uncharacterized domains in well-characterized genes. As a test of the system, a subset of 39 domains was selected for analysis and the experimental results posted to the NovelFam3000 system. For 25 human protein members of these 39 domain families, detailed sub-cellular localizations were determined. Specific observations are presented based on the analysis of the integrated information provided through the online NovelFam3000 system.
Consistent experimental results between multiple members of a domain family allow for inferences of the domain's functional role. We unite bioinformatics resources and experimental data in order to accelerate the functional characterization of scarcely annotated domain families.
The number of protein-encoding human genes identified has reached a plateau, leaving researchers with the challenging task of ascribing biochemical function(s) to each protein. Broad genome sequencing and functional genomics studies, partially motivated by the goal of discovering the functions of uncharacterized proteins, have provided a distributed set of data collections suitable for catalyzing the inference of protein function. While gene predictions and high-throughput genomics data can be of variable quality, studies have demonstrated that consistent results for interactions between homologous genes in multiple organisms, so-called Interolog Analysis, can be more reliable [3–5]. Therefore, human protein characterization efforts that focus on similar proteins across multiple organisms are expected to capitalize more effectively on the available genomics data.
The genome sequence annotation and functional genomics data of Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens (hereafter referred to as worm, fly, and human) provide the basis for the study of proteins conserved across metazoan species. In pursuing comparative genomics approaches for the inference of protein function, the initial selection of related proteins separated by great evolutionary distances can be a challenge. A choice must often be made between the study of homologous and orthologous proteins. In addition to the technical difficulties and controversies that can arise in ortholog identification, a conservative focus on orthologs greatly limits the number of proteins available for study. For homolog studies, grouping full-length protein sequences by similarity is not always feasible. The modular evolution of proteins presents a systematic complication: unrelated pairs of proteins can be linked through additional proteins sharing a domain with each member of the pair (e.g. a protein with domains A and B may be linked to a protein with domains C and D via an intermediary protein with domains B and C). This problem is ameliorated by placing the focus on modular protein domain families, in which proteins are linked by the presence of a common domain. Well-established resources describe protein domain families, including Pfam, InterPro, and Panther [7–9]. Those domains observed in proteins from multiple species are likely to be the most reliable.
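The chaining complication described above can be illustrated with a small sketch. The protein names P1 to P3 are hypothetical, the domain labels follow the A to D example in the text, and the union-find grouping merely stands in for any transitive, full-length similarity clustering; it is not the procedure used by Pfam or by NovelFam3000.

```python
from collections import defaultdict

# Hypothetical proteins and their domain content (labels follow the A-D example).
PROTEINS = {
    "P1": {"A", "B"},
    "P2": {"B", "C"},
    "P3": {"C", "D"},
}


def transitive_clusters(proteins):
    """Merge proteins transitively: any two proteins sharing a domain end up together."""
    parent = {name: name for name in proteins}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = sorted(proteins)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if proteins[a] & proteins[b]:  # at least one shared domain
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for name in names:
        clusters[find(name)].add(name)
    return sorted(sorted(c) for c in clusters.values())


def domain_families(proteins):
    """Group proteins per domain: a family holds only proteins carrying that domain."""
    families = defaultdict(set)
    for name, domains in proteins.items():
        for domain in domains:
            families[domain].add(name)
    return {d: sorted(members) for d, members in sorted(families.items())}


if __name__ == "__main__":
    print(transitive_clusters(PROTEINS))
    print(domain_families(PROTEINS))
```

In this toy case the transitive grouping places P1 and P3 in one cluster although they share no domain, whereas the per-domain grouping never puts them in the same family.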
Characterization of protein function remains a fundamental challenge in functional genomics research. We have created the NovelFam3000 data centre to accelerate the study of uncharacterized domains conserved across worm, fly, and human. Building on domains identified in Pfam, we systematically link domain-containing proteins to functional genomics data in online databases. The NovelFam3000 system allows users to post both comments and experimental data. For a selected subset of the uncharacterized domain-containing families, we generate and post expression profiles and proteomic sub-cellular localization images. Specific examples are presented showing how a combination of experimental approaches and bioinformatics resources may elucidate functional characteristics of uncharacterized domains.
Construction and content
Selection of uncharacterized domain families
The characterization state of each protein domain is dynamic, dependent both on the available experimental literature and on the perspective of the observing scientist. Using the Pfam database, we extracted approximately 3000 protein domain families for which we judged minimal biochemical annotation to be available (hence the name NovelFam3000). We limited our search to protein families present in genes from three metazoan genomes (worm, fly, and human) and having multiple human protein members. Applying these criteria, we extracted 2785 Pfam-B domain families and 127 families of Domains of Unknown Function (DUFs). The Pfam-B and DUF classes are distinguished by the level of human curation: Pfam-B domains represent purely computational analysis, whereas DUFs have been subjected to curator review. Of these domains, 892 (32%) of the selected Pfam-B domains and 59 (46%) of the selected DUFs included at least one yeast protein member.
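The species-based part of these selection criteria can be sketched as a simple filter. This is a minimal illustration, not the code used to build NovelFam3000: the `DomainFamily` record, its field names, and the accessions are hypothetical, "multiple human members" is assumed to mean at least two, and the judgment that a family has minimal annotation is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Species in which a family must be represented (from the text).
REQUIRED_SPECIES = ("worm", "fly", "human")


@dataclass
class DomainFamily:
    accession: str  # hypothetical identifier, not a real Pfam accession
    members: Dict[str, List[str]] = field(default_factory=dict)  # species -> protein IDs


def is_candidate(family: DomainFamily, min_human_members: int = 2) -> bool:
    """True if the family occurs in worm, fly, and human with multiple human members."""
    in_all_species = all(family.members.get(s) for s in REQUIRED_SPECIES)
    enough_human = len(family.members.get("human", [])) >= min_human_members
    return in_all_species and enough_human


def has_yeast_member(family: DomainFamily) -> bool:
    """True if at least one yeast protein carries the domain."""
    return bool(family.members.get("yeast"))


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


if __name__ == "__main__":
    families = [
        DomainFamily("FAM_X1", {"worm": ["w1"], "fly": ["f1"], "human": ["h1", "h2"], "yeast": ["y1"]}),
        DomainFamily("FAM_X2", {"worm": ["w2"], "fly": ["f2"], "human": ["h3"]}),
        DomainFamily("FAM_X3", {"worm": ["w3"], "human": ["h4", "h5"]}),
    ]
    selected = [f for f in families if is_candidate(f)]
    print([f.accession for f in selected])
    print(sum(has_yeast_member(f) for f in selected), "selected families with a yeast member")
```

With the counts reported above, `percent(892, 2785)` and `percent(59, 127)` reproduce the stated 32% and 46%.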
NovelFam3000 system overview
The NovelFam3000 annotation system
Users may view and post detailed information about each gene using four categories: i) resource links, linking to major bioinformatics resources, ii) news, highlighting the latest annotations submitted to the system, iii) comments, giving users the opportunity to view and post general comments regarding the domain-containing protein of interest, and iv) experimental evidence, displaying results submitted by individual researchers. At the bottom of each page displaying gene-specific information for one of the four categories, the user is prompted to submit new information. Submitted data are rendered accessible through the system within 24 hours, after brief editorial review to confirm relevance (i.e. to prevent posting of unrelated material).
For each gene, links to both diverse external resources and user-submitted experimental results and comments are provided via a "gene page". Organism-centric resource links for each gene include WormBase, FlyBase, and SGD. For human proteins, links are provided to genome browsers [19, 20] and the meta-database GeneLynx. For each protein, we provide links to the Biomolecular Interaction Network Database (BIND), as well as to the Interolog Analysis system Ulysses [5, 23], which displays protein-protein interactions observed for homologous proteins across fly, worm, human, and yeast.
Within the NovelFam3000 system, we report the Gene Characterization Index (GCI) for each human gene, providing users with a measure of the current knowledge of the gene's function. The GCI is a continuous score ranging from one (uncharacterized) to ten (fully characterized). The GCI system (Podowski et al., in preparation) is based on the results of a global survey of research biologists. Each participating scientist was given a sample of ten genes and returned an opinion on the characterization status of each. The survey covered a total set of 100 genes with at least three-fold redundancy. A machine learning procedure was then used to create a scoring function that automatically predicts the GCI score for all genes in the human genome. In this step, a Support Vector Machine was trained using the survey results as training data, with the number of links to common databases (e.g. links to abstracts in PubMed or domains in Pfam) as input.
Both the gene-specific news and user comment features allow researchers to highlight recent publications and observations. The experimental evidence pages enable the user to view and submit experimental results for individual proteins. The option to post and view comments related to protein-specific information forms a basis for a general discussion forum and motivates scientific exchange and discussion between researchers.
Posting of laboratory results to the NovelFam3000 system
Selection of sample set of genes
List of selected domain family members for experimental validation
Pfam domain family
NP_055268 (CHMP2A, BC-2)
NP_057385 (Protein x 0004)
RABGAP1L (HHL protein, EVI-5 homolog)
The function of proteins is, in part, defined by the cellular compartment within which they reside. Sub-cellular localization can be determined by visualization of recombinant proteins in amenable cell lines [24, 25]. We began the sub-cellular localization study by verifying that a set of predicted human genes was endogenously expressed in human cells. For this purpose, we screened the expression of the 25 selected genes in three human cell lines by reverse transcription polymerase chain reaction (RT-PCR) analysis [see Additional file 3]. The human cell lines, chosen for their suitability for microscopy studies, included the hepatocarcinoma cell line PLC/PRF/5, the glial cell line U333CG/343 MG, and the fibroblast line HF-SV80. Of the 25 candidate genes, 20 were expressed in all three cell lines, three were expressed in two of the three cell lines, and transcripts for two genes were detected in only a single cell line. These observations confirmed the physiological expression of the predicted human genes. For sub-cellular screening, full-length human cDNAs were amplified from mRNA and cloned in-frame with an N-terminal FLAG tag. The 25 cloned, FLAG epitope-tagged recombinant proteins were analyzed by immunofluorescence microscopy. Individual transfection of each construct into mammalian cells, followed by expression and immunolocalization with monoclonal FLAG-specific antibodies, revealed the sub-cellular localization of the fusion proteins.
We performed an initial screen to distinguish between cytoplasmic and nuclear localization. This initial classification was followed by counterstaining experiments with multiple sub-cellular markers. Each marker was specific to a sub-cellular compartment, thus allowing a refined interpretation of the previously determined coarse staining patterns. During the primary analysis, we observed six fusion proteins localized to the nucleus, nine in the cytoplasm, and six diffusely distributed over the entire cell. Four of the recombinant proteins did not give rise to any detectable staining pattern. All constructs were expressed in the three cell lines to confirm that the localization pattern observed for a given construct was identical irrespective of the cell type. In the second round of screening, this time limited to PLC/PRF/5 cells, we re-transfected those constructs that had previously given rise to distinct cellular localization patterns, and stained using either antibody markers or specific dyes for cellular structures to confirm co-localization.
All of the expression data and microscopy images from the sub-cellular localization profiling were posted through the laboratory results service of the NovelFam3000 system.
Inference of potential domain properties
Within the targeted domain families, we sought to identify intra-family consistencies.
Combining results from multiple sources via NovelFam3000
In addition to the analysis of paralogous human genes (derived by duplication), similarities between family members can be considered across species (ortholog analysis). For those selected proteins present in yeast, we extracted and reviewed sub-cellular localization data and interacting protein partners. We show in two examples how the integration of functional data from studies of homologous yeast proteins reveals the broad conservation of function.
Yeast proteins containing the brix domain (PF04427) and their interacting partners have been localized to the nucleolus. Imp4p is a specific component of the U3 snoRNP and is required for pre-18S rRNA processing. Brx1p is implicated in the biogenesis of the 60S ribosomal subunit. The functional differences of the human homologs, BRIX_HUMAN and IMP4, are reflected in their observed nucleolar, yet distinct, localization patterns (Figure 2).
Protein localization and interaction data from yeast studies complement the observed localization of human NP_057480 (HSPC129) and DULLARD, both from protein family PF03031 (Figure 4). A yeast homolog containing the NIF domain, nem1, is described as a trans-membrane protein localizing to the membranes of the ER and the nucleus. Nem1's specific molecular function is unknown. Protein interaction studies with nem1 have identified three interacting partners (nup84, nup85, nup120), all components of the yeast nuclear pore complex (NPC). Despite the strong links to the NPC and the localization to the nuclear membrane, we are not convinced that NP_057480 (HSPC129) is a direct component of the vertebrate NPC, since its nuclear rim staining does not show a punctate pattern, a general feature of NPC elements. Based on the consistency among yeast network members, we identified the human orthologs of the interacting partners. Human NUP107 (related to yeast nup84) supports the NPC link, as this protein is required for the assembly of a subset of "Nup" proteins into the NPC. From the analysis of NP_057480 (HSPC129), its homologs, and their interacting partners, we hypothesize that this protein is an uncharacterized NPC-associated protein.
Discussion and conclusion
Based on comparative genome analysis across multiple organisms, protein families have been identified containing domains for which minimal functional annotation is available. From the Pfam database we extracted uncharacterized domain families conserved across vast evolutionary distance, suggesting a well-defined and important cellular role. To elucidate the cellular function of individual proteins, we created the NovelFam3000 system to integrate links to diverse resources, provide an interface for scientific discourse and comments, and house relevant experimental data. As a demonstration, we explored the properties of several domain families and used the NovelFam3000 system to develop data-based inferences.
Existing data mining tools [36–38] collect information and provide ample annotation for predicted genes and gene products from scattered resources. These tools are generally species-specific or concentrate on specific gene properties such as gene expression or gene associations. The NovelFam3000 system is a powerful tool for internet-based information exchange and is unique in its focus on active community participation.
There are aspects of the NovelFam3000 system which are reminiscent of the popular WIKI group communication systems. In comparison, the BioWiki project promises to provide a system for shared content editing, which may be well suited for ontology development projects. While WIKI systems are predicated on user editing of posted information, NovelFam3000 was implemented without the community editing functions, as laboratory data should only be subject to corrections from the source investigator. However, NovelFam3000 does allow critiques of experimental results to be posted (subject to editorial review to ensure the relevance of postings). In combining a WIKI-like interface with a broad collection of hyperlinks to gene-centric and experimental databases, NovelFam3000 is a unique tool to facilitate inference of protein domain functions.
As the structure of the NovelFam3000 data centre is suitable for any number of projects predicated on the collaborative analysis of sets of genes, the underlying software has been made available on the website as an open-source program with no restrictions on the use or redistribution of the code. The software has already been adapted for use in a large genomics project (the Pleiades Project), with only modest revision required. Thus, the NovelFam3000 software stands as an important product of this research effort.
We populated the NovelFam3000 experimental data service with an initial panel of results for 25 genes from 39 domain families. The transcripts were detected by RT-PCR and cloned, confirming active transcription. We assigned the proteins to distinct sub-cellular compartments by epitope tagging followed by immunolocalization of the fusion proteins. Consistent localization across members of a protein domain family suggests that the function of the domain is directly linked to location. In some cases, the experimental localization data were complemented by the properties of interacting partners of model organism family members.
The race to functional annotation runs at full speed and the level of cellular characterization of genes is constantly changing. The Gene Characterization Index score displayed in NovelFam3000 provides a dynamic indicator of the status of annotation for each gene. As upward shifts in GCI scores are indicative of advances in the elucidation of the functions of genes in NovelFam3000, dramatic changes will be highlighted on the homepage of the system.
The NovelFam3000 system facilitates community-based curation of gene information.
NovelFam3000 is publicly available and can be accessed at http://www.cisreg.ca/novelfam3000/. The NovelFam3000 software is available for download without restrictions on the website.
Thanks to Mark Gurney who helped shape our approach to the study of novelty in the human genome. We are grateful for support and suggestions from Claes Wahlestedt, Luis Parodi, Ismail Kola, Michael Hsing, Qiaolin Deng, and Lars Arvestad. This project was funded with financial support from the Pharmacia Corp. to the Center for Genomics and Bioinformatics, and the software development was partially supported by funds from Merck-Frost to the Centre for Molecular Medicine and Therapeutics. W.W.W. acknowledges the support of the Canadian Institutes of Health Research and the Michael Smith Foundation for Health Research.
- Southan C: Has the yo-yo stopped? An assessment of human protein-coding gene number. Proteomics. 2004, 4: 1712-1726. 10.1002/pmic.200300700.
- Orchard S, Hermjakob H, Apweiler R: Annotating the human proteome. Mol Cell Proteomics. 2005.
- Stuart JM, Segal E, Koller D, Kim SK: A Gene Coexpression Network for Global Discovery of Conserved Genetic Modules. Science. 2003, 21: 21-
- Wiehe T, Gebauer-Jung S, Mitchell-Olds T, Guigo R: SGP-1: prediction and validation of homologous genes based on sequence alignments. Genome Res. 2001, 11: 1574-1583. 10.1101/gr.177401.
- Kemmer D, Huang Y, Shah SP, Lim J, Brumm J, Yuen MM, Ling J, Xu T, Wasserman WW, Ouellette BF: Ulysses - an application for the projection of molecular interactions across species. Genome Biol. 2005, 6: R106. 10.1186/gb-2005-6-12-r106.
- Copley RR, Doerks T, Letunic I, Bork P: Protein domain analysis in the era of complete genomes. FEBS Lett. 2002, 513: 129-134. 10.1016/S0014-5793(01)03289-6.
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res. 2004, 32: D138-41. 10.1093/nar/gkh121.
- Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJ, Vaughan R, Zdobnov EM: The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res. 2003, 31: 315-318. 10.1093/nar/gkg046.
- Mi H, Lazareva-Ulitsky B, Loo R, Kejariwal A, Vandergriff J, Rabkin S, Guo N, Muruganujan A, Doremieux O, Campbell MJ, Kitano H, Thomas PD: The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res. 2005, 33: D284-8. 10.1093/nar/gki078.
- Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, Hariharan IK, Fortini ME, Li PW, Apweiler R, Fleischmann W, Cherry JM, Henikoff S, Skupski MP, Misra S, Ashburner M, Birney E, Boguski MS, Brody T, Brokstein P, Celniker SE, Chervitz SA, Coates D, Cravchik A, Gabrielian A, Galle RF, Gelbart WM, George RA, Goldstein LS, Gong F, Guan P, Harris NL, Hay BA, Hoskins RA, Li J, Li Z, Hynes RO, Jones SJ, Kuehl PM, Lemaitre B, Littleton JT, Morrison DK, Mungall C, O'Farrell PH, Pickeral OK, Shue C, Vosshall LB, Zhang J, Zhao Q, Zheng XH, Lewis S: Comparative genomics of the eukaryotes. Science. 2000, 287: 2204-2215. 10.1126/science.287.5461.2204.
- The NovelFam3000 Data Center. [http://www.cisreg.ca/novelfam3000/]
- Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14: 755-763. 10.1093/bioinformatics/14.9.755.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1006/jmbi.1990.9999.
- Servant F, Bru C, Carrere S, Courcelle E, Gouzy J, Peyruc D, Kahn D: ProDom: automated clustering of homologous domains. Brief Bioinform. 2002, 3: 246-251. 10.1093/bib/3.3.246.
- ProDom. [http://protein.toulouse.inra.fr/prodom.html]
- Chen N, Harris TW, Antoshechkin I, Bastiani C, Bieri T, Blasiar D, Bradnam K, Canaran P, Chan J, Chen CK, Chen WJ, Cunningham F, Davis P, Kenny E, Kishore R, Lawson D, Lee R, Muller HM, Nakamura C, Pai S, Ozersky P, Petcherski A, Rogers A, Sabo A, Schwarz EM, Van Auken K, Wang Q, Durbin R, Spieth J, Sternberg PW, Stein LD: WormBase: a comprehensive data resource for Caenorhabditis biology and genomics. Nucleic Acids Res. 2005, 33 Database Issue: D383-9.
- Drysdale RA, Crosby MA, Gelbart W, Campbell K, Emmert D, Matthews B, Russo S, Schroeder A, Smutniak F, Zhang P, Zhou P, Zytkovicz M, Ashburner M, de Grey A, Foulger R, Millburn G, Sutherland D, Yamada C, Kaufman T, Matthews K, DeAngelo A, Cook RK, Gilbert D, Goodman J, Grumbling G, Sheth H, Strelets V, Rubin G, Gibson M, Harris N, Lewis S, Misra S, Shu SQ: FlyBase: genes and gene models. Nucleic Acids Res. 2005, 33 Database Issue: D390-5.
- Christie KR, Weng S, Balakrishnan R, Costanzo MC, Dolinski K, Dwight SS, Engel SR, Feierbach B, Fisk DG, Hirschman JE, Hong EL, Issel-Tarver L, Nash R, Sethuraman A, Starr B, Theesfeld CL, Andrada R, Binkley G, Dong Q, Lane C, Schroeder M, Botstein D, Cherry JM: Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res. 2004, 32 Database Issue: D311-4. 10.1093/nar/gkh033.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12: 996-1006. 10.1101/gr.229102. Article published online before print in May 2002.
- Curwen V, Eyras E, Andrews TD, Clarke L, Mongin E, Searle SM, Clamp M: The Ensembl automatic gene annotation system. Genome Res. 2004, 14: 942-950. 10.1101/gr.1858004.
- Lenhard B, Hayes WS, Wasserman WW: GeneLynx: a gene-centric portal to the human genome. Genome Res. 2001, 11: 2151-2157. 10.1101/gr.199801.
- Bader GD, Betel D, Hogue CW: BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 2003, 31: 248-250. 10.1093/nar/gkg056.
- Ulysses. [http://www.cisreg.ca/ulysses]
- Simpson JC, Wellenreuther R, Poustka A, Pepperkok R, Wiemann S: Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing. EMBO Reports. 2000, 1: 287-292. 10.1093/embo-reports/kvd058.
- Hoja MR, Wahlestedt C, Hoog C: A visual intracellular classification strategy for uncharacterized human proteins. Exp Cell Res. 2000, 259: 239-246. 10.1006/excr.2000.4948.
- Eisenhaber F, Wechselberger C, Kreil G: The Brix domain protein family -- a key to the ribosomal biogenesis pathway?. Trends Biochem Sci. 2001, 26: 345-347. 10.1016/S0968-0004(01)01851-5.
- Fujita H, Umezuki Y, Imamura K, Ishikawa D, Uchimura S, Nara A, Yoshimori T, Hayashizaki Y, Kawai J, Ishidoh K, Tanaka Y, Himeno M: Mammalian class E Vps proteins, SBP1 and mVps2/CHMP2A, interact with and regulate the function of an AAA-ATPase SKD1/Vps4B. J Cell Sci. 2004, 117: 2997-3009. 10.1242/jcs.01170.
- Hodges E, Redelius JS, Wu W, Hoog C: Accelerated discovery of novel protein function in cultured human cells. Mol Cell Proteomics. 2005, 4: 1319-1327. 10.1074/mcp.M500117-MCP200.
- Howard TL, Stauffer DR, Degnin CR, Hollenberg SM: CHMP1 functions as a member of a newly defined family of vesicle trafficking proteins. J Cell Sci. 2001, 114: 2395-2404.
- Stauffer DR, Howard TL, Nyun T, Hollenberg SM: CHMP1 is a novel nuclear matrix protein affecting chromatin structure and cell-cycle progression. J Cell Sci. 2001, 114: 2383-2393.
- Bogengruber E, Briza P, Doppler E, Wimmer H, Koller L, Fasiolo F, Senger B, Hegemann JH, Breitenbach M: Functional analysis in yeast of the Brix protein superfamily involved in the biogenesis of ribosomes. FEMS Yeast Res. 2003, 3: 35-43.
- Siniossoglou S, Santos-Rosa H, Rappsilber J, Mann M, Hurt E: A novel complex of membrane proteins required for formation of a spherical nucleus. Embo J. 1998, 17: 6449-6464. 10.1093/emboj/17.22.6449.
- Siniossoglou S, Lutzmann M, Santos-Rosa H, Leonard K, Mueller S, Aebi U, Hurt E: Structure and assembly of the Nup84p complex. J Cell Biol. 2000, 149: 41-54. 10.1083/jcb.149.1.41.
- Cronshaw JM, Krutchinsky AN, Zhang W, Chait BT, Matunis MJ: Proteomic analysis of the mammalian nuclear pore complex. J Cell Biol. 2002, 158: 915-927. 10.1083/jcb.200206106.
- Walther TC, Alves A, Pickersgill H, Loiodice I, Hetzer M, Galy V, Hulsmann BB, Kocher T, Wilm M, Allen T, Mattaj IW, Doye V: The conserved Nup107-160 complex is critical for nuclear pore complex assembly. Cell. 2003, 113: 195-206. 10.1016/S0092-8674(03)00235-6.
- Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS: The Universal Protein Resource (UniProt). Nucleic Acids Res. 2005, 33 Database Issue: D154-9.
- Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D: GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support. Bioinformatics. 1998, 14: 656-664. 10.1093/bioinformatics/14.8.656.
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2005, 33: D54-8. 10.1093/nar/gki031.
- Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A, Patapoutian A, Hampton GM, Schultz PG, Hogenesch JB: Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci U S A. 2002, 99: 4465-4470. 10.1073/pnas.012025199.
- Donaldson I, Martin J, de Bruijn B, Wolting C, Lay V, Tuekam B, Zhang S, Baskin B, Bader GD, Michalickova K, Pawson T, Hogue CW: PreBIND and Textomy--mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics. 2003, 4: 11. 10.1186/1471-2105-4-11.
- Sauer IM, Bialek D, Efimova E, Schwartlander R, Pless G, Neuhaus P: "Blogs" and "wikis" are valuable software tools for communication within research groups. Artif Organs. 2005, 29: 82-83. 10.1111/j.1525-1594.2004.29005.x.
- BioWiki. [http://www.biowiki.org]
- The Pleiades Project. [http://www.cisreg.ca/pleiades/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.