A customized Web portal for the genome of the ctenophore Mnemiopsis leidyi
© Moreland et al.; licensee BioMed Central Ltd. 2014
Received: 12 November 2013
Accepted: 31 March 2014
Published: 28 April 2014
Mnemiopsis leidyi is a ctenophore native to the coastal waters of the western Atlantic Ocean. A number of studies on Mnemiopsis have led to a better understanding of many key biological processes, and these studies have contributed to the emergence of Mnemiopsis as an important model for evolutionary and developmental studies. Recently, we sequenced, assembled, annotated, and performed a preliminary analysis on the 150-megabase genome of the ctenophore, Mnemiopsis. This sequencing effort has produced the first set of whole-genome sequencing data on any ctenophore species and is amongst the first wave of projects to sequence an animal genome de novo solely using next-generation sequencing technologies.
The Mnemiopsis Genome Project Portal (http://research.nhgri.nih.gov/mnemiopsis/) is intended both as a resource for obtaining genomic information on Mnemiopsis through an intuitive and easy-to-use interface and as a model for developing customized Web portals that enable access to genomic data. The scope of data available through this Portal goes well beyond the sequence data available through GenBank, providing key biological information not available elsewhere, such as pathway and protein domain analyses; it also features a customized genome browser for data visualization.
We expect that the availability of these data will allow investigators to advance their own research projects aimed at understanding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development. The overall approach taken in the development of this Web site can serve as a viable model for disseminating data from whole-genome sequencing projects, framed in a way that best-serves the specific needs of the scientific community.
Ctenophores are an important group of early-branching metazoans that are essential for understanding the evolution of multicellular animals, the relationship between genomic complexity and morphological complexity, and the molecular basis for the evolution of novel cell types such as epithelia, neurons, muscle, and stem cells. One ctenophore species that has received particular attention is Mnemiopsis leidyi, which is native to the coastal waters of the Atlantic Ocean. Studies in Mnemiopsis have advanced our understanding of a number of important biological processes such as regeneration, axial patterning, and bioluminescence [1–3]. As such, Mnemiopsis has emerged as an important model organism for understanding the immense diversity and complexity seen in the early evolution of animals.
Despite the importance of Mnemiopsis as an emerging model organism, there were no high-quality genome-scale sequence data available for any ctenophore species until recently. To address this dearth of genome-scale sequence data, we recently completed the sequencing, assembly, annotation, and preliminary analysis of the 150-megabase genome of Mnemiopsis leidyi; these data will serve as an invaluable resource for the growing community of developmental, evolutionary, and marine biologists studying important questions regarding early branching metazoan biology. Initial studies utilizing these sequence data have contributed to our understanding of the evolution of gene families [5–7], signaling pathways [8, 9], protein domains , miRNAs , and genes involved in the production and detection of light . The availability of these data also has provided a solid foundation for studies aimed at resolving the question of the phylogenetic position of the ctenophores .
In recent years, databases have been created to house whole-genome sequencing data from several emerging model organisms. Genomic data and annotation are typically made accessible via public genome portals at sequencing centers such as the US Department of Energy’s Joint Genome Institute (JGI)  and the Broad Institute , while other groups have developed Web-based genomic database resources that provide additional analysis tools  and browsing options  to increase the utility of the data. Still others have implemented genomic resources that offer the scientific community access to genomic annotation and actively seek user contributions . Ideally, for each organism with a sequenced genome, there would be a single centralized resource where most (if not all) data retrieval and analysis could take place; this kind of resource would include, at a minimum, the ability to search, browse, and download sequence and annotation data, visualize genomic data via a Web-based browser tool, and encourage the active engagement of the scientific community in maintaining a wiki-style resource for capturing supplementary annotations of gene models and predicted proteins. Moreover, this kind of resource would (and should) be developed and maintained by researchers in the model organism community who intimately understand the needs of their scientific colleagues, presenting the data in an intuitive, user-friendly and concise manner despite its sheer volume and complexity.
In our own experience advising groups who have undertaken whole-genome sequencing projects, we have found that many of these groups do not have ready access to the kind of programming resources needed to implement some of the more “advanced” database solutions currently available. With that in mind, and to facilitate the creation of the kind of centralized genomic data resource described above, we set out to develop a generalized framework that strikes a reasonable balance between ease of implementation and documented structure, without the additional constraints posed by some publicly available database schemas.
Here, we describe the development and features of a comprehensive Web-based data portal for navigating the recently completed genome sequence of Mnemiopsis leidyi (http://research.nhgri.nih.gov/mnemiopsis/). The Mnemiopsis Genome Project Portal (or “MGP Portal”) is a biologist-centric resource designed with a particular emphasis on usability, intuitive navigation, and clarity. Some key features of the MGP Portal include the ability to retrieve selected nucleotide and protein sequences, the availability of whole-genome datasets for download, an integrated BLAST utility for sequence comparisons, a genome browser tool, a gene-centric wiki, and “phylogenetically informed” gene ortholog clusters mapped to human KEGG pathways. Furthermore, the scope of data accessible through this Web site goes well beyond the sequence data available at GenBank, providing other key biological information such as Gene Ontology term assignments and data from pathway and protein domain analyses. In addition, we offer a set of Perl modules that can be utilized by other scientists as a generalized framework for implementing a gene page in MediaWiki, as well as a customizable genome browser for visualizing large-scale genomic data using JBrowse.
Construction and content
Displaying customized genome annotation
To build feature tracks, JBrowse requires properly formatted input files. While newer versions of JBrowse released subsequent to the creation of the Mnemiopsis Genome Browser are able to accept a slightly expanded number of inputs (including specific relational database dumps), the generic feature format version 3 (GFF3) flat file was the input type we adopted, a format that continues to be supported by JBrowse at the time of this writing (version 1.11.1). Adoption of the GFF3 format greatly facilitated the processing of multiple output file types produced by a number of data analysis programs. To begin, the scaffoldToGFF3.pl script (Additional file 1) can be used to reformat scaffold sequences from FASTA to GFF3. Two parameters (–i and –o) are required, specifying the input scaffold file and desired output directory, respectively. Two optional parameters (–l and –n) are also available. The first should be called if the gap-indication letter in the input is different from ‘N’ , accepting a single character (e.g., X) as a substitute, while the second parameter can be called in order to specify which feature-generating program was used. The script creates a GFF3 file for each sequence in the input scaffold FASTA file and is able to handle both gapped [e.g., the scaffold (SCF) track] and repeat-masked [e.g., the repeat-masked (MASK) track] regions in a scaffold. Next, evmToGFF3.pl (Additional file 2) parses a GFF3-formatted output file created by EvidenceModeler (EVM)  by collecting data about the start and end positions of predicted genes, using this information to create well-formed GFF3 files. The script accepts several additional parameters; the input EVM file (–i) and desired output directory (–o) must be set, while the third and optional parameter (–n) is, again, the name of the feature-generating program (e.g., EVM). The third module is called cufflinksToGFF3.pl (Additional file 3) and is used to parse predicted transcript assemblies from the GTF-formatted file created by Cufflinks . The cufflinksToGFF3.pl script has the same behavior for transcript location as evmToGFF3.pl has for predicted gene location, and accepts the same three parameters (–i, –o, and –n).
To import GFF3 data into JBrowse for display as custom tracks in the main genome window, a series of three JBrowse-supplied Perl scripts (prepare-refseqs.pl, biodb-to-json.pl, and generate-names.pl) need to be executed using appropriate parameters and a system-specific configuration file, the details of which can be found in the tutorials on the JBrowse Web site (http://www.jbrowse.org). A number of custom CGI scripts were written to create the hyperlinks connecting features in JBrowse to the various sources of gene data.
A Perl script named create_wiki_page.pl has been used to create MediaWiki pages for displaying genomic data (Additional file 4). Here, we provide a sample wiki page that takes FASTA-formatted nucleotide and protein sequences, GFF3 files containing exonic coordinates, and an hmmscan output file containing information on Pfam-A domains as input. The create_wiki_page.pl script requires five parameters. The first three parameters specify a set of input files, including a nucleotide FASTA file (-n), a protein FASTA file (-a), and a Pfam-A file (-p). The remaining parameters specify a directory containing the input GFF3 files containing the exonic coordinates (-d) and an output directory (-o). In addition, we show a PHP command line that imports a wiki page into MediaWiki:php/$WIKIHOME/maintenance/importTextFile.php - -user=USER wikipage.out The wikipage.out file is created by the create_wiki_page.pl.
Researchers interested in creating a customized genome browser or gene wiki for visualizing genome-related data are encouraged to utilize the aforementioned scripts (Additional files 1, 2, 3 and 4), as their implementation satisfies the fundamental requirements of both JBrowse and MediaWiki. Furthermore, these modules may serve as a useful framework for both the development of gene wikis and more advanced genome browser tracks as new JBrowse applications are created to visualize genomic data.
User interface and genome browser implementation
Utility and discussion
Browsing the Mnemiopsisgenome
The JBrowse display is organized by genomic scaffold. Scaffolds are named using a six-character convention (e.g., MLnnnn); the ML designates the species (Mnemiopsis leidyi), and the individual scaffolds are numbered from 0001 to 5100 (e.g., ML0001). Gene identifiers (e.g., ML000129a) start with a prefix indicating the scaffold on which the gene is located (in this example, ‘ML0001’), followed by a non-padded integer that is unique in combination with the scaffold identifier (in this case, ‘29’) and ending with a lower-case letter that specifies the gene isoform (in this case, ‘a’). Data displayed in the browser can be searched using a variety of Mnemiopsis identifiers. A scaffold-based query takes the user directly to that scaffold, while a gene-based search goes to that gene’s location on the appropriate scaffold. A user may also search the genome browser using a Mnemiopsis GenBank mRNA identifier, an EST identifier, or a Pfam-A domain name (e.g., AF293700.1, FC475136, or “Glyco hydro 20”, respectively). The PFAM2.2 track was created by running hmmscan against the Protein Models 2.2, the Unfiltered Protein Models, and the six-frame translations of the Mnemiopsis genome. Scaffold coordinates are displayed across the top of the browser window. Navigation options can be found beneath the coordinate bar, including the zoom tool and left-right arrows. Users can also refine the displayed region by entering the scaffold coordinates into the search box to the right of the navigation options.
Genome browser tracks are described in the Track Descriptions link above the left sidebar. The consensus Mnemiopsis gene prediction models (track 2.2) are presented by default. Additional track can be added by clicking on the appropriate track box on the left sidebar of the main view window. All track options are displayed in a given scaffold window even when there are no annotated features in that particular region. Features are represented as black arrows, with the direction of the arrow indicating the orientation. Exons are presented as light-colored solid bars (e.g., green for 2.2 and pink for 2.2UF), while untranslated regions (both 5’ and 3’) are rendered using darker-shaded colors. Assembled genomic scaffolds (SCF) are depicted as solid black tracks with intermittent bright pink gaps. The MASK track appears as a solid black bar, with blue highlighting the regions that were repeat-masked using VMatch (Figure 3). The Reference sequence track depicts the scaffold sequence, but only when the display is fully zoomed in.
Mnemiopsisgenes in KEGG pathways
For each KEGG pathway, each row in the ortholog cluster table represents a cluster of orthologous genes from our clustering analysis . For a cluster to be included in the table, at least one human gene from that cluster must be present in the KEGG pathway represented. In most cases, a row consists of one or more human genes that belong to the selected KEGG pathway, along with the computationally predicted orthologs from Mnemiopsis and 21 other model organisms. The ‘Cluster’ column indicates the most inclusive phylogenetic clade that encompasses all of the genes in the particular cluster (e.g., ‘Metazoa’ could indicate that the cluster contains genes from both bilaterians and cnidarians and thus cannot be characterized by a less inclusive clade, such as ‘Bilateria’). Each human gene is hyperlinked to its Entrez Gene entry. The ‘Ratio’ column represents the number of human genes in the particular cluster that are in a given pathway (numerator) over the number of total human genes in that cluster (denominator). The higher the ratio, the more likely the non-human orthologs in the cluster are involved in the pathway. The numbers in the columns under each species abbreviation indicate the number of genes from that species that are in that cluster. Each number in the ‘Ml’ (Mnemiopsis leidyi) column is linked to the appropriate Mnemiopsis gene ID(s) and corresponding Gene Wiki pages.
Pfam domains in Mnemiopsisproteins
Another way to characterize the Mnemiopsis genes is to determine the protein domains that they encode. We used hmmscan from the HMMER suite (HMMER 3.0; March 2010) to search the Protein Models 2.2 and Unfiltered Protein Models for domains from the Pfam-A database (version 25). The gathering threshold (cut_ga) option was used to ensure conservative domain prediction. The Pfam Domains link on the home page of the MGP Portal takes the user to a query page, where researchers can specify a given Pfam-A domain by name or accession number, then search for Mnemiopsis genes that contain that domain (Additional file 7: Figure S3). The results are displayed as a list of protein models, listed by gene identifier, and the number of query domains found in those protein models. Additionally, a user may download FASTA-formatted Pfam-A domain sequences from the resulting list by clicking on the check boxes next to the sequence(s) of interest, selecting either Pfam-A domain only or the full-length domain-containing protein from the pull-down menu, and clicking ‘Get’.
Pre-compiled BLAST hits are enumerated in tabular form. Each Mnemiopsis protein was compared to the UniProt and NCBI non-redundant protein databases (nr) using BLASTP. The results display the hit number, the accession numbers, E-values, and brief descriptions of the top four hits (lowest E-values). Accession numbers are linked to relevant corresponding entries at UniProt and GenBank. The E-values are hyperlinked to the pairwise BLAST alignments.
Each Mnemiopsis protein was also compared to sequence data from developmentally relevant organisms, including Homo sapiens, Drosophila melanogaster, Capitella teleta, Amphimedon queenslandica, Nematostella vectensis, Hydra magnipapillata, Trichoplax adhaerens, Monosiga brevicollis, Salpingoeca rosetta, Capsaspora owczarzaki, fungi, plants, and non-eukaryotes. The top hits for BLASTP and TBLASTN results, falling below an established E-value threshold (E-value ≤ 1 × 10-6), are displayed along with their gene or protein identifiers, E-values, and description of the best hits. For the Mnemiopsis organismal database search, the gene identifier of the top non-self hit is displayed (and linked to its corresponding Gene Wiki page) along with the E-value for that alignment.
The Gene Wiki also contains a section displaying the Pfam-A domains that are encoded by the protein. The Pfam identifier, domain architectures, sequence start and end coordinates, HMM start and end coordinates, E-values, and domain sequence are displayed for each Pfam-A domain. All Pfam identifiers are hyperlinked to their corresponding entries on the Pfam Web site . To further assist in classification, GO terms are presented for each gene, with GO terms assigned using the Argot2  method. Functional annotations derived using Blast2GO  are also presented for each gene.
Retrieving single scaffold sequences
Downloading full or partial datasets
Mnemiopsis leidyi complete sequence datasets available for download from the Mnemiopsis Genome Project Portal
Number of sequences
Genome assembly (scaffolds)
Unfiltered protein models
Demonstrating the MGP Portal’s utility: a worked example
The MGP Portal was developed to facilitate research that would benefit from the availability of genomic information from this emerging model organism and, to this end, it includes a number of intuitive data analysis tools. To illustrate this point, consider the case of a developmental biologist studying the human TALE class homeobox gene family (e.g., PBX3; [GenBank:NP_001128250.1]) who may be interested in comparing these sequences against (or predicting novel) Mnemiopsis homeodomain orthologs. A straightforward approach to addressing this question would be to run a BLASTP search of the PBX3 protein sequence against the Mnemiopsis Protein Models (2.2) database. The Mnemiopsis BLAST results display a number of high-scoring candidate proteins that can be further evaluated for properties characteristic of TALE class homeodomains (e.g., a TALE-type homeobox; Figure 1).
Similarly, a researcher may also want to use their own custom scripts or external computational tools to further-explore the available Mnemiopsis data sets. In such a case, the Download Sequence links in the MGP Portal can be used to download both the Protein Models and the Unfiltered Protein Models for analysis with tools from the HMMER suite  (e.g., hmmsearch Homeobox.hmm ML2.2aa > ML_novel_HDs). Predicted domains with E-values below an inclusion threshold (e.g., E-value < 0.05) could then be considered candidate homeodomains for further evaluation.
The Mnemiopsis Genome Project Portal is intended as a resource for investigators from the scientific community to obtain genomic information on Mnemiopsis through an intuitive and easy-to-use interface; it also serves as a model for researchers undertaking the development of such a customized genome portal themselves. There are a number of comprehensive data portals available for well-established model organisms (e.g., FlyBase). However, as we searched for model Web sites from which to draw inspiration for the MGP Portal, we found that many repositories for next-generation sequencing data are simply Web sites with lists of links to raw sequence data accompanied by minimal annotation, or were non-intuitive and difficult to navigate. Based on this experience, we felt that the selection and utilization of essential resources to systematically manage and disseminate the considerable amounts of data generated by these sequencing projects was imperative. Thus, the presentation and conveyance of such a genome Web portal should be intuitive, user-friendly, and concise. It is within this framework that we present the MGP Portal as such a resource for the recently completed genome sequence of Mnemiopsis leidyi, and we are hopeful that this resource will inspire other groups as they create Web portals of their own.
It was our intent during the development of the MGP Portal to develop a resource to maximize usability while presenting a comprehensive series of datasets not available elsewhere. Recognizing the difficulties and lessons learned from the development of such a resource, and in our continued effort to further communicate our shared experiences to the scientific community at large, we encourage other investigators to consider the proposed genome portal model and, as such, have included a series of scripts (Additional files 1, 2, 3 and 4) to facilitate the conversion of output files produced by various programs. Specifically, this series of scripts can be used to format annotation data for visualization within a customized genome browser and a wiki.
As described above, the MGP Portal contains sequence-based information and several customized utilities not available elsewhere, increasing the utility of the data generated by our group in the course of our Mnemiopsis whole-genome sequencing project . The genome browser tool provides an intuitive interface for users to visualize the various types of data available, including data resulting from our comprehensive annotation of the Mnemiopsis genome. Most importantly, many features of this site make it easy for users who do not have a background in bioinformatics to straightforwardly access information presented from a comparative genomics point-of-view, without having to perform many of the analyses themselves. For instance, our phylogenetically relevant gene clusters are mapped to human KEGG pathways, providing a clear phylogenetic perspective for any particular Mnemiopsis gene (or pathway of genes) of interest. In addition, users may contribute to our gene annotation efforts by adding isoforms, in situ images, or other notes to any Gene Wiki page using a secure login. We trust that the availability of these data will allow investigators from numerous fields (such as developmental, evolutionary, and marine biology) to advance their own research projects aimed at understanding phylogenetic diversity and the evolution of proteins that play a fundamental role in metazoan development.
Availability and requirements
The Mnemiopsis Genome Project Portal is freely available at http://research.nhgri.nih.gov/mnemiopsis, with no barriers to access. Registration is only required if users wish to contribute data to the Isoforms, In situ Images, References, or Notes sections of any of the Gene Wiki pages.
This research was supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. We would like to thank Steven Bond, Mark Fredriksen, Derek Gildea, and Evan Maxwell for their thoughtful, constructive comments during the development of the Portal. We also thank Steven Bond, Derek Gildea, and Evan Maxwell for their critical reading of the manuscript.
- Henry JQ, Martindale MQ: Regulation and regeneration in the ctenophore Mnemiopsis leidyi. Dev Biol. 2000, 227 (2): 720-733. 10.1006/dbio.2000.9903.PubMedView ArticleGoogle Scholar
- Martindale MQ, Finnerty JR, Henry JQ: The Radiata and the evolutionary origins of the bilaterian body plan. Mol Phylogenet Evol. 2002, 24 (3): 358-365. 10.1016/S1055-7903(02)00208-7.PubMedView ArticleGoogle Scholar
- Freeman G, Reynolds GT: The development of bioluminescence in the ctenophore Mnemiopsis leidyi. Dev Biol. 1973, 31 (1): 61-100. 10.1016/0012-1606(73)90321-7.PubMedView ArticleGoogle Scholar
- Ryan JF, Pang K, Schnitzler CE, Nguyen AD, Moreland RT, Simmons DK, Koch BJ, Francis WR, Havlak P, Smith SA, Putnam NH, Haddock SHD, Dunn CW, Wolfsberg TG, Mullikin JC, Martindale MQ, Baxevanis AD, NISC Comparative Sequencing Program: The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science. 2013, 342 (6164): 1242592-10.1126/science.1242592.PubMed CentralPubMedView ArticleGoogle Scholar
- Ryan JF, Pang K, Mullikin JC, Martindale MQ, Baxevanis AD: The homeodomain complement of the ctenophore Mnemiopsis leidyi suggests that ctenophora and porifera diverged prior to the ParaHoxozoa. Evodevo. 2010, 1 (1): 9-10.1186/2041-9139-1-9.PubMed CentralPubMedView ArticleGoogle Scholar
- Reitzel AM, Pang K, Ryan JF, Mullikin JC, Martindale MQ, Baxevanis AD, Tarrant AM: Nuclear receptors from the ctenophore Mnemiopsis leidyi lack a zinc-finger DNA-binding domain: lineage-specific loss or ancestral condition in the emergence of the nuclear receptor superfamily?. Evodevo. 2011, 2 (1): 3-10.1186/2041-9139-2-3.PubMed CentralPubMedView ArticleGoogle Scholar
- Liebeskind BJ: Evolution of sodium channels and the new view of early nervous system evolution. Commun Integr Biol. 2011, 4 (6): 679-683.PubMed CentralPubMedView ArticleGoogle Scholar
- Pang K, Ryan JF, Mullikin JC, Baxevanis AD, Martindale MQ: Genomic insights into Wnt signaling in an early diverging metazoan, the ctenophore Mnemiopsis leidyi. Evodevo. 2010, 1 (1): 10-10.1186/2041-9139-1-10.PubMed CentralPubMedView ArticleGoogle Scholar
- Pang K, Ryan JF, Baxevanis AD, Martindale MQ: Evolution of the TGF-beta signaling pathway and its potential role in the ctenophore, Mnemiopsis leidyi. PLoS One. 2011, 6 (9): e24152-10.1371/journal.pone.0024152.PubMed CentralPubMedView ArticleGoogle Scholar
- Koch BJ, Ryan JF, Baxevanis AD: The diversification of the LIM superclass at the base of the metazoa increased subcellular complexity and promoted multicellular specialization. PLoS One. 2012, 7 (3): e33261-10.1371/journal.pone.0033261.PubMed CentralPubMedView ArticleGoogle Scholar
- Maxwell EK, Ryan JF, Schnitzler CE, Browne WE, Baxevanis AD: MicroRNAs and essential components of the microRNA processing machinery are not encoded in the genome of the ctenophore Mnemiopsis leidyi. BMC Genomics. 2012, 13 (1): 714-10.1186/1471-2164-13-714.PubMed CentralPubMedView ArticleGoogle Scholar
- Schnitzler CE, Pang K, Powers ML, Reitzel AM, Ryan JF, Simmons D, Tada T, Park M, Gupta J, Brooks SY, Blakesley RW, Yokoyama S, Haddock SH, Martindale MQ, Baxevanis AD: Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes. BMC Biol. 2012, 10 (1): 107-10.1186/1741-7007-10-107.PubMed CentralPubMedView ArticleGoogle Scholar
- Srivastava M, Begovic E, Chapman J, Putnam NH, Hellsten U, Kawashima T, Kuo A, Mitros T, Salamov A, Carpenter ML, Signorovitch AY, Moreno MA, Kamm K, Grimwwod J, Schmutz J, Shapiro H, Grigoriev IV, Buss LW, Schierwater B, Dellaporta SL, Rokhsar DS: The Trichoplax genome and the nature of placozoans. Nature. 2008, 454 (7207): 955-960. 10.1038/nature07191.PubMedView ArticleGoogle Scholar
- Genome Index - Origins of Multicellularity; Broad Institute. http://www.broadinstitute.org/annotation/genome/multicellularity_project/GenomesIndex.html,
- Sullivan JC, Ryan JF, Watson JA, Webb J, Mullikin JC, Rokhsar D, Finnerty JR: StellaBase: the nematostella vectensis genomics database. Nucleic Acids Res. 2006, 34 (Database issue): D495-D499.PubMed CentralPubMedView ArticleGoogle Scholar
- Kreppel L, Fey P, Gaudet P, Just E, Kibbe WA, Chisholm RL, Kimmel AR: dictyBase: a new Dictyostelium discoideum genome database. Nucleic Acids Res. 2004, 32 (Database issue): D332-D333.PubMed CentralPubMedView ArticleGoogle Scholar
- Aiptasia Wiki. http://aiptasia.cs.vassar.edu/AiptasiaWiki/,
- Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH: JBrowse: a next-generation genome browser. Genome Res. 2009, 19 (9): 1630-1638. 10.1101/gr.094607.109.PubMed CentralPubMedView ArticleGoogle Scholar
- Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR: Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 2008, 9 (1): R7-10.1186/gb-2008-9-1-r7.PubMed CentralPubMedView ArticleGoogle Scholar
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Buren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010, 28 (5): 511-515. 10.1038/nbt.1621.PubMed CentralPubMedView ArticleGoogle Scholar
- Deng W, Nickle DC, Learn GH, Maust B, Mullins JI: ViroBLAST: a stand-alone BLAST web server for flexible queries of multiple databases and user's datasets. Bioinformatics. 2007, 23 (17): 2334-2336. 10.1093/bioinformatics/btm331.PubMedView ArticleGoogle Scholar
- PHP. http://www.php.net,
- The Perl Programming Language. http://www.perl.org,
- NCBI BLAST FTP site. ftp://ftp.ncbi.nlm.nih.gov/blast,
- Python Programming Language. http://www.python.org,
- Finn RD, Clements J, Eddy SR: HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011, 39 (Web Server issue): W29-W37.PubMed CentralPubMedView ArticleGoogle Scholar
- Pett W, Ryan JF, Pang K, Martindale MQ, Baxevanis AD, Lavrov DV, NISC Comparative Sequencing Program: Extreme mitochondrial evolution in the ctenophore Mnemiopsis leidyi. Mitochondrial DNA. 2011, 22 (4): 130-142. 10.3109/19401736.2011.624611.PubMed CentralPubMedView ArticleGoogle Scholar
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011, 29: 644-652. 10.1038/nbt.1883.PubMed CentralPubMedView ArticleGoogle Scholar
- Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R: REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001, 29 (22): 4633-4642. 10.1093/nar/29.22.4633.PubMed CentralPubMedView ArticleGoogle Scholar
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez gene: gene-centered information at NCBI. Nucleic Acids Res. 2005, 33 (Database issue): D54-D58.PubMed CentralPubMedView ArticleGoogle Scholar
- Hosack DA, Dennis G, Sherman BT, Lane HC, Lempicki RA: Identifying biological themes within lists of genes with EASE. Genome Biol. 2003, 4 (10): R70-10.1186/gb-2003-4-10-r70.PubMed CentralPubMedView ArticleGoogle Scholar
- Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR, Bateman A: The Pfam protein families database. Nucleic Acids Res. 2010, 38 (Database issue): D211-D222.PubMed CentralPubMedView ArticleGoogle Scholar
- Falda M, Toppo S, Pescarolo A, Lavezzo E, Di Camillo B, Facchinetti A, Cilia E, Velasco R, Fontana P: Argot2: a large scale function prediction tool relying on semantic similarity of weighted gene ontology terms. BMC Bioinforma. 2012, 13 (Suppl 4): S14-10.1186/1471-2105-13-S4-S14.View ArticleGoogle Scholar
- Conesca A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.