- Research article
- Open Access
Transcriptomics of diapause in an isogenic self-fertilizing vertebrate
BMC Genomicsvolume 16, Article number: 989 (2015)
Many vertebrate species have the ability to undergo weeks or even months of diapause (a temporary arrest of development during early ontogeny). Identification of diapause genes has been challenging due in part to the genetic heterogeneity of most vertebrate animals.
Here we take the advantage of the mangrove rivulus fish (Kryptolebias marmoratus or Kmar)—the only vertebrate that is extremely inbred due to consistent self-fertilization—to generate isogenic lineages for transcriptomic dissection. Because the Kmar genome is not publicly available, we built de novo genomic (642 Mb) and transcriptomic assemblies to serve as references for global genetic profiling of diapause in Kmar, via RNA-Seq. Transcripts unique to diapause in Kmar proved to constitute only a miniscule fraction (0.1 %) of the total pool of transcribed products. Most genes displayed lower expression in diapause than in post-diapause. However, some genes (notably dusp27, klhl38 and sqstm1) were significantly up-regulated during diapause, whereas others (col9a1, dspp and fmnl1) were substantially down-regulated, compared to both pre-diapause and post-diapause.
Kmar offers a strong model for understanding patterns of gene expression during diapause. Our study highlights the importance of using a combination of genome and transcriptome assemblies as references for NGS-based RNA-Seq analyses. As for all identified diapause genes, in future studies it will be critical to link various levels of RNA expression with the functional roles of the coded products.
Many organisms have the ability to enter a temporary state of suspended animation before resuming the normal progression of life. Various terms describe the phenomenon in specific situations in nature: (i) embryonic diapause, in which an embryo’s development is temporarily arrested during early ontogeny ; (ii) torpor, when body temperature and metabolic rate in endothermic animals are significantly reduced during certain times of the day ; (iii) hibernation or multiday torpor, as utilized by various animals to escape harsh winter conditions [3, 4]; (iv) aestivation, when multiday torpor occurs in hot or dry seasons in warm climates ; and (v) cryptobiosis, which entails a temporary absence of measurable metabolic activity [6, 7]. In this study, we will use the term “diapause” when it occurs during embryonic development. Other terms sometimes used to describe this phenomenon in Kmar are “delayed hatching” or “embryonic quiescence”.
Diapause was first described in roe deer in 1854, where blastocyst development and subsequent implantation in the uterus were delayed for 4–5 months . Similar phenomena have since been discovered in more than 130 species of mammals [1, 9, 10]. Remarkably, mammalian embryonic diapause can last up to 90 % of the total gestational period . Another form of diapause is displayed by some egg-laying fishes in which embryonic hatching can be delayed for a year or more. Such fish may display any of three different types of diapause: Diapause I, occurring during the dispersed-cell phase of early ontogeny; Diapause II, occurring mid-somite embryogenesis; and Diapause III, occurring just prior to hatching [12, 13].
Whereas the role of environmental cues in triggering diapause in killifish or more generally in vertebrates is well established [1, 13], biological mechanisms underlying the phenomenon have yet to be elucidated. For example, there is no consensus on how patterns of gene expression change during diapause. This problem promised to resolve following two major breakthroughs in the measurement of gene expression: (i) cDNA microarray technology in the mid-1990s and (ii) next generation sequencing (NGS) based RNA-Sequencing (RNA-Seq) technology in the mid-2000s [14, 15]. Indeed, about 40 studies used cDNA microarrays or RNA-Seq to assay biological samples undergoing some form of suspended animation, and many more papers have addressed the diapause syndrome more generally. RNA-Seq is superior to cDNA microarray in that it allows: (i) the capture of whole-genome expression (transcriptomics); (ii) execution without prior knowledge of annotated genes; and (iii) the detection and quantification of low-abundance genes or those with higher-fold changes of expression . The use of RNA-Seq to determine genes with distinct levels of expression during diapause previously has been confined to invertebrates (mainly insects) [17–19].
To address this issue, we take advantage of the unique mating/reproductive system of the mangrove rivulus fish (Kryptolebias marmoratus or Kmar) - the world’s only vertebrate species that is highly inbred due to consistent self-fertilization (selfing). We generate isogenic lineages for transcriptomic dissection of one form of diapause. In this hermaphroditic species, selfing repeated across successive generations leads to a rapid decay of heterozygosity and to the rise of isogenic lines that are effectively “clonal” [20, 21]. In captivity, Kmar survives well in 25 ppt saltwater, where >90 % of Kmar embryos enter diapause at stage 32 (the last stage prior to hatching when nutritional oil droplets are depleted) (Fig. 1a). In Kmar, the average duration of embryonic stages 1–32 is 24 days. However, in the laboratory we regularly observe Kmar embryos remaining in stage 32 (diapause) for up to 2.5 months. We suggest that Kmar offers an excellent model to understand the biology of diapause because of its genetic homogeneity, simple rearing environment, translucent externally developing embryos, and relatively straightforward form of diapause. Here, we report the use of next generation sequencing and RNA-Seq to distinguish the Kmar diapause transcriptome from those of pre-diapause and post-diapause (hatched) embryos (Fig. 1a, b). Since a Kmar genome sequence was not publicly available, we sequenced the whole genome of this species. We then built de novo genome assemblies to serve as a genomic reference for the RNA-Seq data. We also built de novo transcriptome assemblies to further capture and validate the RNA-Seq data. Differentially expressed genes unique to diapause in Kmar are herein identified and discussed.
Mapping of Kmar RNA-Seq reads to known reference genomes
By using TopHat software , we found that less than 10 % of Kmar RNA-Seq reads could be mapped against the annotated genomes of humans, mice, and several fishes (Amazon molly, platyfish, medaka, zebrafish, and fugu) (Table 1). To increase the number of mapped reads, we sequenced the whole Kmar genome and built a de novo reference genome using ABySS software . The genome size was determined to be 642 Mb (Fig. 2). Two additional references were built from RNA-Seq reads using Trans-ABySS and Trinity software [24, 25]. More than 89 % of the RNA-Seq reads were mapped against these three Kmar references (Table 1). The references were used as templates to identify transcripts. A computational program called “tuxedo suite” was utilized to identify, quantitate, and calculate differential transcript abundance between diapause and pre- or post-diapause RNA samples (Additional file 1: Table S1) .
Mitochondrial housekeeping genes had high and stable levels of gene expression across developmental stages
We found that the level of expression of mitochondrial housekeeping genes was high and consistent across the three developmental stages surveyed in Kmar (Fig. 3, average Log10(RPKM) = ~4.5, RPKM: reads per kilobase per million mapped reads). These mitochondrial genes include cytochrome oxidase, ribosomal RNAs, ATPase, and NADH dehydrogenase. The fact that the expression of such housekeeping genes was not affected by developmental stage makes comparison of the expression of other genes straightforward.
Genes with distinct levels of expression in diapause constitute a minority of surveyed genes
We observed that >97 % of the surveyed gene transcripts were detected in all Kmar samples (Table 2), including pre-diapause, three diapause replicates, and post-diapause. Of these detected transcripts, only 0.1 % were up- and down-regulated in diapause compared to both pre- and post-diapause. Most of these latter transcripts had lengths between 200 and 4000 nt and with expression abundances in the range 0.5 < Log10(RPKM) < 2.5 (Fig. 4). When diapause transcripts were compared to pre-diapause transcripts, the numbers of up- and down-regulated genes were approximately equal. However, for 67 % of surveyed genes, the level of gene expression in diapause was down-regulated compared to post-diapause (Table 3).
Most functional clusters of genes were down-regulated in diapause compared to post-diapause
Transcripts that showed distinct abundances in Kmar diapause compared to other developmental stages were annotated and their putative biological functions were identified from a gene ontology database . We identified 16 operational clusters of genes related to the following functions: extracellular matrix (ECM) (15.9 %), respiration (8.5 %), protein modification (7.7 %), development (7.7 %), nucleic acid modification (6.3 %), other cellular or enzymatic processes (4.3 %), carbohydrate modification (3.6 %), environmental responses (3.4 %), hormones (2.4 %), toxin-like proteins (2.2 %), cytoskeleton (2.0 %), cell signaling (2.0 %), ion binding (2.0 %), immune system (1.8 %), transporter functions (1.6 %), and cardiac and skeletal muscles (1.4 %). The remaining 25.7 % of transcripts with differential expression could not be assigned to any known gene. Most of these functional clusters were down-regulated in diapause compared to post-diapause, with the exception of four genes related to cardiac or skeletal muscle (myo18b, mylk2, myh2, and bag3) (Table 4, Fig. 5, Additional file 1: Table S2 ). Interestingly, an interactome analysis showed that those four loci are within the same interacting network of genes (Fig. 5) .
Few genes were up- or down-regulated in diapause
Our transcriptome profiling identified three genes (dusp27, klhl38, and sqstm1) that had higher expression in diapause than in both pre- and post-diapause in all three references (Table 5). Conversely, three other genes (col9a1, dspp, and fmnl1) were found to have lower expression in diapause than in both pre- and post-diapause (Table 5, Additional file 1: Table S2).
Environmental cues are known to contribute to the induction and termination of diapause in fishes, but any gene expression changes associated with the phenomenon are less well understood. Here we used NGS technologies to characterize patterns of gene expression during diapause in the world’s only vertebrate species that normally reproduces as a self-fertilizing hermaphrodite. Isogenic sets of Kmar embryos were screened for gene-expression profiles during diapause and compared against pre-diapause embryos and post-diapause larvae. To our knowledge, this is the first such study of naturally occurring diapause in any vertebrate species. Thus, massively parallel genetic expression analyses of vertebrate diapause via NGS-based RNA-Seq are unavailable for comparison with the current study.
Nevertheless, several diapause studies using an earlier technology (cDNA microarrays) can provide some useful perspectives. By using microarrays, 15,686 expression sequence tags (ESTs) were obtained from a time-course analysis of the Mummichog fish (Fundulus heteroclitus), including embryos with delayed hatching due to desiccation . The individual F. heteroclitus genes identified were different than those reported in the current study. However, the ontologies of various genes involved in energy metabolism, cellular stress response, cytoskeleton, ion regulation, extracellular matrices, transcriptional control, and growth may share some degree of similarity with those reported here . For invertebrates, several studies using RNA-Seq were published recently [17–19]. Again, particular genes found to be differentially regulated during diapause are not in obvious agreement with those identified in the current study. It may seem counterintuitive to observe that diapause occurs broadly across such animal taxa, yet few if any of the genes associated with the phenomenon were reportedly identical.
However, when we compare functional clusters of diapause genes (based on gene ontologies), we find some degree of similarity as exemplified by the following two summaries: (i) up-regulated transcripts during diapause in a leafcutting bee (Megachile rotundata) were involved in oxidative stress, neural activity, development, immune response, and ion homeostasis ; and (ii) differentially expressed diapause genes in a mosquito (Aedes albopictus) were involved in the cell cycle, oxidative phosphorylation, and DNA replication . Similar statements apply to our current study of Kmar. Overall, we have identified 16 functional clusters of genes associated with diapause in Kmar. Within a particular cluster (e.g. genes related to extracellular matrices), we found that multiple genes were differentially regulated during diapause. Another example involves cardiac muscle related genes that on average were up-regulated in diapause over both pre- and post-diapause (and which therefore might highlight the importance of maintaining a functional heart during diapause). Similarly, genes related to the immune system were up-regulated in diapause over pre-diapause (perhaps suggesting the importance of maintaining embryo resistance against pathogens). Conversely, numerous gene clusters were down-regulated in diapause versus post-diapause (as might generally be expected). Examples include clusters involved in extracellular matrices, respiration or energy metabolism, cellular or enzymatic processes, hormones, cytoskeleton, toxin-like, ion binding, and development-related functions. Our results clearly provide evidence that after diapause is terminated, such gene clusters are activated beyond levels that were expressed during diapause.
Kryptolebias marmoratus offers a strong model for understanding patterns of gene expression during diapause. Our study highlights the importance of using a combination of genome and transcriptome assemblies as references for NGS-based RNA-Seq analyses. For example, one gene cluster that proved to be up-regulated in Kmar diapause over both pre-diapause and post-diapause was related to cardiac function. Several other gene clusters important for various functions were on average similar in diapause and pre-diapause. However, numerous genes and gene clusters were down-regulated in diapause compared to post-diapause, suggesting that lower gene expression during delayed development is emblematic of the diapause phenomenon. Three individual diapause genes (dusp27, klhl38, and sqstm1) were highly up-regulated in diapause versus both pre-diapause and post-diapause. As for all identified diapause genes, in the future studies it will be critical to link various levels of RNA expression with the functional roles of the coded products.
Live Kmar were provided by Dr. Ryan Earley, University of Alabama, Tuscaloosa. Progenitors of Kmar lineages used in this study were: Rad2 and Rad7 (collected at N27°20'48.4", W080°14'16.9"), OSR2.5 and OSR7.8 (N24°36'49.4", W081°33'06.7"), Nuke 13 (N27°21'00.2", W080°14'22.5"), PLT03 (N24°59'24.8", W080°33'04.7"), SOB8 (N24°36'05.2", W081°34'34.1") and FDS08 (N27°37'42.1", W082°42'13.6"). The fish were kept in 25 ppt saltwater (Instant Ocean® Sea Salt, Cat. No. SS15-10, United Pet Group Inc., Cincinnati, OH) on a 12 h light and 12 h dark photoperiod at 26 °C. Embryonic stages were determined under inverted light microscopy following the descriptions by . Maintenance of Kmar (aquatic species) follows General Guidance for the Care and Maintenance of Ectothermic Species, Institutional Animal Care and Use Committee (IACUC) Protocol #2015-3161-0, University of California, Irvine.
Total RNA was extracted from pooled embryos at appropriate stages of development (Additional file 1: Table S3) following the protocol by Peterson and Freeman  using TRIzol® RNA isolation reagent (Thermo Fisher Scientific, http://www.thermofisher.com/us/en/home.html) and DNAseI treatment. Combining purified RNA minimizes the variance in gene expression across different lineages. We had three replicates of RNA-Seq samples representing diapause embryos (the prolonged stage 32), and one sample each for pre-diapause embryos (stage 31) and post-diapause (up to 48 h of hatched larvae) (Fig. 1). Quality control for the RNA was performed using NanoDrop 8000 (www.nanodrop.com) and Agilent 2100 Bioanalyzer (www.genomics.agilent.com) (Additional file 2: Figure S1). mRNA was enriched from a 500 ng subset of purified total RNA by using oligo(dT) magnetic beads and was sheared into ~200 nt fragments. The first strand of cDNA was produced by random hexamer priming followed by mRNA template removal using RNase H and the antisense strand was synthesized by DNA polymerase I. Sequencing primers and barcodes were ligated to the cDNA fragments, purified using agarose gel electrophoresis, and amplified by PCR. These libraries were then subjected to next generation sequencing (NGS) using Illumina HiSeqTM2000 at single-end, producing 50 bp-long reads. The library preparation and NGS were done at BGI America (bgiamericas.com). Image data generated by Illumina HiSeqTM2000 were transferred into sequence data in the form of fastq (.fq) files encoded by Illumina GA Pipeline v1.5. The relationship between sequencing error probabilities (P) and the quality value or Phred quality score (Q) is as follows: Q = -10Log10 P . Original reads were filtered to remove the following: adaptors; N (unresolved nucleotide positions); and sequences with low quality according to BGI’s in-house software (Additional file 1: Table S4). Filtered reads were then subjected to a further quality control check using a FastQC program (www.bioinformatics.babraham.ac.uk/projects/fastqc/) (Additional file 2: Figure S2–S3). Reads with Phred quality scores higher than 30 (>99.9 % base call accuracy) were used to create de novo transcriptome assemblies. Two RNA-Seq assemblers were utilized: Trinity (trinityrnaseq.github.io) and Trans-ABySS (www.bcgsc.ca/platform/bioinfo/software/trans-abyss) [24, 25]. K-mer size for Trans-ABySS assembly was k = 32 (such k-mer size offers a good tradeoff for assembling both rare and common transcripts) . Three different modules were run by Trinity: with k-mer sizes up to 32 for “Inchworm”; and fixed k-mer sizes of 25 for “Chrysalis” and “Butterfly”; these parameters were reported to perform well for both highly and lowly expressed transcripts .
Purified genomic DNA of Kmar FDS08 from our previous study was randomly selected for whole genome sequencing (WGS) (Additional file 1: Table S5)  and sheared using Adaptive Focused Acoustics™ S2 (Covaris Inc., Woburn, MA). Libraries were constructed using NEXTflex™ DNA Sequencing Kit (Bioo Scientific Corp., Austin, TX). Library preparation and WGS were performed at UCI Genomics High-Throughput Facility (ghtf.biochem.uci.edu) on an Illumina HiSeq2500 sequencing system using PE100 cycles. Reads with Q < 30 were trimmed by using Trimmomatic software , whereas reads with Q >30 were used to create de novo genome assembly via the ABySS assembler . N50 values as visualized by ABySS-Explorer [34, 35] serve as a quality control for the assemblies that were built successfully using various k-mers: k = 32, 40, 48, 56, 58, 60, and 64 .
Once a reference genome or transcriptome was built, RNA-Seq reads were processed into mRNA transcripts by using the approaches described in the following six steps: (i) index the reference genome or transcriptome for alignment (we used bowtie2, a Burrows-Wheeler indexer that keeps memory footprint small) ; (ii) download annotated genomes for the alignment from http://www.ensembl.org/info/data/ftp/index.html or http://hgdownload.soe.ucsc.edu/downloads.html (known reference genomes used in this study were from Poecilia formosa (amazon molly), Xiphophorus maculatus (platyfish), Oryzias latipes (medaka), Danio rerio (zebrafish), Takifugu rubripes (fugu), Mus musculus (mouse), and Homo sapiens (human)); (iii) align all RNA-Seq reads against an indexed reference genome, indexed reference transcriptome, or annotated genome by using the TopHat program (Table 1) . (iv) index and sort the alignments by using the SamTools suite of programs (the sorted and indexed TopHat mappings can be visualized by an integrative genomics viewer (IGV) program ; (v) assemble the transcripts, quantitate gene expression, and compare fold changes of gene expression between biological samples from the TopHat mappings file using the software Cufflinks ; and (vi) visualize the results using the cummeRbund program run in an R environment using Bioconductor (Additional file 2: Figure S4–S5) . These six steps were done for each reference genome or transcriptome.
The level of gene expression was measured with RPKM (reads per kilobase of transcript per million mapped reads) . A fold change in gene expression between samples was measured by Log2 of the RPKM ratio. Each measurement of differential gene expression between samples had to satisfy the following three requirements performed by the cuffdiff program : (i) enough alignments to perform a statistical test (‘OK’ status); (ii) the uncorrected p-value must be ≤0.0002 (in the test statistic used to compute significance of the observed change in RPKM), and (iii) the q value (the false discovery rate or FDR) after Benjamini Hochberg correction for multiple testing) must be ≤0.05 (Additional file 1: Table S1) . Genes with differential expression were selected when an average of the three diapause replicates differed from pre-diapause or post-diapause samples by at least 1.5-fold (Tables 1 and 4, Additional file 1: Table S2). Sequences from these genes were aligned against nucleotide databases using BLAST (www.ncbi.nlm.nih.gov/blast/) . Gene ontology was established by using AmiGO (amigo.geneontology.org) . Molecular interaction analysis and visualization of human orthologs of Kmar genes were done by using networks available from The Biological General Repository for Interaction Datasets (BioGRID) (thebiogrid.org) or the Unified Human Interactome (UniHI) (www.unihi.org) [27, 39].
The sequences reported in this paper have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database (www.ncbi.nlm.nih.gov) (accession nos. PRJNA282391, PRJNA288308, SRR1999414, SRR2001218, SRR2001221, SRR2001227, SRR2001231, SRR2079677, and SRR2080613).
Computational analyses were done using the University of California, Irvine (UCI) High Performance Computing (HPC) (hpc.oit.uci.edu).
Renfree MB, Shaw G. Diapause. Annu Rev Physiol. 2000;62:353–75.
Geiser F. Metabolic rate and body temperature reduction during hibernation and daily torpor. Annu Rev Physiol. 2004;66:239–74.
Abbott CC. Hibernation of the Lower Vertebrates. Science. 1884;4(75):36–9.
Geiser F. Hibernation. Curr Biol. 2013;23(5):R188–93.
Abbott CC. Aestivation of Mammals, What Is It? Science. 1885;6(144):402–4.
Keilin D. The problem of anabiosis or latent life: history and current concept. Proc R Soc Lond B Biol Sci. 1959;150(939):149–91.
Clegg JS. Cryptobiosis—a peculiar state of biological organization. Comp Biochem Physiol B Biochem Mol Biol. 2001;128(4):613–24.
Bischoff TLW. Entwickelungsgeschichte des Rehes. Giessen: J. Ricker’sche Buchlandlung; 1854.
Ptak GE, Tacconi E, Czernik M, Toschi P, Modlinski JA, Loi P. Embryonic diapause is conserved across mammals. PLoS ONE. 2012;7(3), e33027.
Fenelon JC, Banerjee A, Murphy BD. Embryonic diapause: development on hold. Int J Dev Biol. 2014;58(2–4):163–74.
Ptak GE, Modlinski JA, Loi P. Embryonic diapause in humans: time to consider? Reprod Biol Endocrinol. 2013;11:92.
Wourms JP. The developmental biology of annual fishes. 3. Pre-embryonic and embryonic diapause of variable duration in the eggs of annual fishes. J Exp Zool. 1972;182(3):389–414.
Furness AI. The evolution of an annual life cycle in killifish: adaptation to ephemeral aquatic environments through embryonic diapause. Biol Rev Camb Philos Soc. 2015. [Epub ahead of print]
Pease AC, Solas D, Sullivan EJ, Cronin MT, Holmes CP, Fodor SP. Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc Natl Acad Sci U S A. 1994;91(11):5022–6.
Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320(5881):1344–9.
Zhao S, Fung-Leung WP, Bittner A, Ngo K, Liu X. Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells. PLoS ONE. 2014;9(1), e78644.
Huang X, Poelchau MF, Armbruster PA. Global Transcriptional Dynamics of Diapause Induction in Non-Blood-Fed and Blood-Fed Aedes albopictus. PLoS Negl Trop Dis. 2015;9(4), e0003724.
Torson AS, Yocum GD, Rinehart JP, Kemp WP, Bowsher JH. Transcriptional responses to fluctuating thermal regimes underpinning differences in survival in the solitary bee Megachile rotundata. J Exp Biol. 2015;218(Pt 7):1060–8.
Dong Y, Desneux N, Lei C, Niu C. Transcriptome characterization analysis of Bactrocera minax and new insights into its pupal diapause development with gene expression analysis. Int J Biol Sci. 2014;10(9):1051–63.
Tatarenkov A, Earley RL, Taylor DS, Avise JC. Microevolutionary distribution of isogenicity in a self-fertilizing fish (Kryptolebias marmoratus) in the Florida Keys. Integr Comp Biol. 2012;52(6):743–52.
Harrington Jr RW, Crossman Jr RA. Temperature-induced meristic variation among three homozygous genotypes (clones) of the self-fertilizing fish Rivulus marmoratus. Can J Zool. 1976;54(7):1143–55.
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7(3):562–78.
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009;19(6):1117–23.
Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, et al. De novo assembly and analysis of RNA-seq data. Nat Methods. 2010;7(11):909–12.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9.
Kalathur RK, Pinto JP, Hernandez-Prieto MA, Machado RS, Almeida D, Chaurasia G, et al. UniHI 7: an enhanced database for retrieval and interactive analysis of human molecular interaction networks. Nucleic Acids Res. 2014;42(Database issue):D408–14.
Tingaud-Sequeira A, Lozano JJ, Zapater C, Otero D, Kube M, Reinhardt R, et al. A rapid transcriptome response is associated with desiccation resistance in aerially-exposed killifish embryos. PLoS ONE. 2013;8(5), e64410.
Mourabit S, Edenbrow M, Croft DP, Kudoh T. Embryonic development of the self-fertilizing mangrove killifish Kryptolebias marmoratus. Dev Dyn. 2011;240(7):1694–704.
Peterson SM, Freeman JL. RNA isolation from embryonic zebrafish and cDNA synthesis for gene expression analysis. J Vis Exp. 2009;30:1470.
Ewing B, Green P. Base-calling of automated sequencer traces using phred. II Error probabilities. Genome Res. 1998;8(3):186–94.
Mesak F, Tatarenkov A, Earley RL, Avise JC. Hundreds of SNPs vs. dozens of SSRs: which dataset better characterizes natural clonal lineages in a self-fertilizing fish? Front Ecol Evol. 2014;2:74.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
Yandell M, Ence D. A beginner’s guide to eukaryotic genome annotation. Nat Rev Genet. 2012;13(5):329–42.
Nielsen CB, Jackman SD, Birol I, Jones SJ. ABySS-Explorer: visualizing genome sequence assemblies. IEEE Trans Vis Comput Graph. 2009;15(6):881–8.
Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24–6.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621–8.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10.
Chatr-Aryamontri A, Breitkreutz BJ, Oughtred R, Boucher L, Heinicke S, Chen D, et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 2015;43(Database issue):D470–8.
Schartl M, Walter RB, Shen Y, Garcia T, Catchen J, Amores A, et al. The genome of the platyfish, Xiphophorus maculatus, provides insights into evolutionary adaptation and several complex traits. Nat Genet. 2013;45(5):567–72.
Kasahara M, Naruse K, Sasaki S, Nakatani Y, Qu W, Ahsan B, et al. The medaka draft genome and insights into vertebrate genome evolution. Nature. 2007;447(7145):714–9.
Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature. 2013;496(7446):498–503.
Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 2002;297(5585):1301–10.
Mouse Genome Sequencing C, Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420(6915):520–62.
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921.
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, et al. The sequence of the human genome. Science. 2001;291(5507):1304–51.
This study was supported by funds from the University of California at Irvine. We thank Dr. Ryan Earley of the University of Alabama, Tuscaloosa, for providing live Kmar fishes and Dr. Andrew Furness of the University of California, Irvine, for critical reading of the manuscript. We acknowledge the assistance of Katherine Tatarenkov and Jonathan Mesak in Kmar husbandry in the spring and summer of 2014.
The authors declare that they have no competing interests.
FM, AT and JCA conceived the study, interpreted data, and contributed to drafting the manuscript. FM performed experiments and data analysis. All authors read and approved the final manuscript.
Table S1. Abundances of transcripts identified from all three references. Listed transcripts are only those with p_values <0.0002 and FDR <0.05 (see text). Table S2. Differentially expressed genes in diapause versus pre- or post-diapause. Table S3. Total number of Kmar embryos used for RNA extraction. Table S4. Numbers of filtered RNA-Seq reads for downstream analyses. Table S5. NGS data used in this study and deposited at the NCBI SRA database. (PDF 915 kb)
Figure S1. Quantity and quality of Kmar pooled RNA samples. A high quality of the samples was obtained prior to RNA-Seq. (A) table showing RNA quantity and quality as measured by NanoDrop and Agilent bioanalyzer. [Note: RIN is an RNA integrity number from Agilent2100’s measurement.] (B) Chromatogram for each pooled RNA sample that shows high integrity for both 28S and 18S rRNA. A 500 ng subset of pooled RNA was used for RNA-Seq processing. Figure S2. Quality of RNA-Seq data. Graphs indicate the high quality of RNA-Seq data from each sample. Figure S3. Quality of Kmar pair-ended (read1 and read2) WGS data. (A) Graphs indicate quality of WGS reads before and after the trimming. (B) Both raw and trimmed WGS reads were assembled to build a de novo genome at k = 32 and k = 64. The graph shows a relatively similar N50 value achieved by either raw or trimmed WGS reads. Figure S4. Pairwise comparison between samples for all genome/transcriptome references. The scatter diagram shows a negative binomial data distribution for transcript abundance in all samples. The number of transcripts for ABySS is 67,374; Trans-ABySS is 206,747; and Trinity is 97,979. These scatter diagram indicate that most of the transcripts were present in all developmental stages of Kmar. Figure S5. Phylogenetics of the RNA-Seq data. The tree shows Jensen-Shannon distance analyses for the five RNA-Seq samples. Transcriptomes from all three diapause replicates clustered together and displayed a slightly closer distance to pre- than post-diapause. (PDF 729 kb)