Genome-wide identification, characterization, and expression analysis of lineage-specific genes within zebrafish
© Yang et al.; licensee BioMed Central Ltd. 2013
Received: 22 September 2012
Accepted: 29 January 2013
Published: 31 January 2013
The genomic basis of teleost phenotypic complexity remains obscure, despite increasing availability of genome and transcriptome sequence data. Fish-specific genome duplication cannot provide sufficient explanation for the morphological complexity of teleosts, considering the relatively large number of extinct basal ray-finned fishes.
In this study, we performed comparative genomic analysis to discover the Conserved Teleost-Specific Genes (CTSGs) and orphan genes within zebrafish and found that these two sets of lineage-specific genes may have played important roles during zebrafish embryogenesis. Lineage-specific genes within zebrafish share many of the characteristics of their counterparts in other species: shorter length, fewer exons, higher GC content, and less transcript support. Chromosomal location analysis indicated that neither the CTSGs nor the orphan genes were distributed evenly across the chromosomes of zebrafish. The significant enrichment of immunity proteins in CTSGs, whether annotated by gene ontology (GO) or predicted ab initio, suggests that defense against pathogens may have been an important factor in the diversification of teleosts. The evolutionary origin of the lineage-specific genes was examined, and a very high percentage of lineage-specific genes were found to have been generated via gene duplication. The temporal and spatial expression profiles of lineage-specific genes obtained from expressed sequence tag (EST) and RNA-seq data revealed two novel properties: in addition to showing highly tissue-preferred expression, lineage-specific genes are also highly temporally restricted, that is, they are expressed in narrower time windows than evolutionarily conserved genes and are specifically enriched in later-stage embryos and early larval stages.
Our study provides the first systematic identification of two different sets of lineage-specific genes within zebrafish and provides valuable information leading towards a better understanding of the molecular mechanisms of the genomic basis of teleost phenotypic complexity for future studies.
Teleosts, which constitute roughly 96% of all living fishes and half of the extant vertebrates, are the most phenotypically diversified and species-rich group of all the vertebrate species. The vast morphological and species diversity of teleosts has received intense attention worldwide because of their importance in both scientific research and aquaculture. However, the genomic basis of the complex phenotype of teleosts during evolution remains obscure, despite the increasing amount of genome and transcriptome sequence data available. One important mechanism for the phenotypic diversity of species is the duplication of genes and entire genomes. Evidence has recently accumulated to support a consensus that all teleosts experienced an additional whole genome duplication (fish-specific genome duplication, FSGD or 3R), which occurred after the basal ray-finned fishes separated from the actinopterygian stem lineage but before the teleosts began to radiate [2–19]. Combining the absolute dates and phylogenetic timing of the 3R duplication, some groups have proposed that the FSGD might be causally related to an increase in the number of species as well as their biological diversity [2, 4, 7, 9, 11–13, 15, 19–23]. However, if the FSGD (3R) was responsible for the evolutionary success and astounding biological diversification of teleosts, it must have occurred prior to the radiation of teleosts. Paleontological evidence from the fossil record suggests that most of the extant teleosts first appeared only about 235 million years ago [24, 25], considerably later than the FSGD, which occurred at least 320 million years ago. Given this gap, it would be inappropriate to think that the FSGD was a major driving force behind the rapid radiation of teleosts [3, 4, 11, 12, 19, 20].
Furthermore, considering the large amount of fossil data for basal ray-finned fishes, a consequential FSGD would not provide sufficient explanation for the morphological complexity of teleosts.
Besides the fish-specific genome duplication, alternative explanations for the increasing morphological complexity of teleosts include a higher rate of chromosomal rearrangements [27, 28] and a faster evolution of protein sequences and conserved noncoding elements (CNEs) compared to cartilaginous fishes and mammals. Their implications for the evolution and diversity of teleosts have been intensively discussed. However, conserved teleost lineage-specific genes have been poorly characterized.
Lineage-specific genes, also referred to as taxonomically restricted genes (TRGs), are defined as genes that are found in one particular taxonomic group but share no sequence similarity with genes from other lineages [31–37]. With the advent of large-scale genome sequencing projects for a wide range of species, lineage-specific genes have been extensively studied in mammals [34, 38, 39], insects [33, 40–42], plants [36, 43–45], and microbial species [46–50]. Lineage-specific genes are an abundant component of all genomes sequenced to date, which defies an early hypothesis that an increasing database size would eventually reduce the number of lineage-specific genes. Orphan genes were first discussed when analyzing the yeast genome; approximately one-third of the identified genes fell into this category [51, 52]. Likewise in Drosophila melanogaster, the most accurate and complete genome analyzed, lineage-specific genes were found to make up nearly 18.6% of the total genes. Apart from being abundant, lineage-specific genes have also been thought to be important for lineage-specific traits and adaptations. In Hydra, for example, interspecific differences in tentacle formation are closely related to changes in the expression of taxonomically restricted genes. In Drosophila, the flightin gene is specifically important for increasing the frequency of the flight muscle to deliver the maximum power to the wing, which is a rather specific adaptation for the Dipterans. Although lineage-specific genes are abundant and functionally important, their evolutionary origin is still enigmatic. Several hypotheses about the origin of lineage-specific genes have been proposed, including gene duplication followed by rapid sequence divergence, lateral gene transfer, accelerated evolutionary rate, artifacts from genome annotation, and de novo evolution from noncoding sequences.
Although the origin and evolution of lineage-specific genes are still poorly understood, the identification, characterization, function, and expression analysis of lineage-specific genes may provide a better understanding of lineage-specific adaptation, such as the successful diversification of teleosts.
In this study, we identified Conserved Teleost-Specific Genes (CTSGs) and orphan genes in zebrafish using comparative genomics. We then characterized each set of these genes by diverse features, including gene size, protein size, exon number, GC content, transcript support, and chromosomal locations. As a large portion of the CTSGs and orphan genes have no known function, ab initio predictions using ProtFun were performed to infer possible biological functions. We then explored the evolutionary origin of lineage-specific genes and performed a comprehensive analysis of their tissue-specific and developmental stage-specific expression patterns using the wealth of available expression data, including EST and RNA-seq data, which in turn provided important complementary datasets that may be used to uncover their functions in the future. Collectively, the identification of CTSGs and orphan genes, together with future studies of their function by means of targeted gene knockdown or knockout [57, 58], should help increase our understanding of the molecular basis of the successful diversification of teleosts.
Identification of CTSGs and orphan genes
Characterization of CTSGs and orphan genes
Gene characteristics of lineage-specific genes (values are mean ± SE)
Type of genes | Gene Size (nt) | Protein Size (aa) | Exon number | GC content (%)
Orphan genes | 1055.41 ± 117.51 | 131.27 ± 15.81 | 2.29 ± 0.13 | 40.63 ± 1.15
CTSGs | 13786.82 ± 11739.11 | 167.62 ± 7.44 | 3.65 ± 0.13 | 36.87 ± 0.40
EC genes | 27763.23 ± 289.38 | 497.86 ± 2.64 | 8.93 ± 0.04 | 37.67 ± 0.03
Our results showed a trend similar to that of previous studies, which found that younger genes exhibited lower expression on average. To determine whether a gene had evidence of expression from EST or full-length cDNA (FL-cDNA), we first downloaded the unigene data using BIOMART. If a zebrafish gene model was annotated with an EST or FL-cDNA in ENSEMBL, then we considered that gene to have transcript support. Combining these two sets of results, the percentage of genes with transcript support was determined (Table 1). The transcript support for orphan genes (36.4%) and for the CTSGs (64.4%) was significantly lower than that for EC genes (81.1%), with the orphan genes having the lowest transcript support (one-way ANOVA, p < 0.01).
Functional inference using ProtFun
It has been shown that few of the highly taxonomically restricted genes (lineage-specific genes) have been the focus of experimental work or could be characterized by GO categories [40, 49, 61]. In order to determine whether this is the case for the two sets of lineage-specific genes within zebrafish, the functional annotations of CTSGs and orphan genes available at Ensembl were explored using Biomart. As expected, a large percentage (34.8%) of orphan genes were annotated as uncharacterized proteins (31.8%) or had no description information (3%), and only one gene was annotated with a GO term accession. Likewise, about 34.1% of CTSGs were annotated as uncharacterized or hypothetical proteins or had no description information, and approximately 68.9% of CTSGs were without GO term annotations. These observations suggest that the functions of most genes in the two sets are unknown and that some of the annotated genes may be the result of incorrect annotations, considering that seven orphan genes encode proteins of less than ten amino acids. Among the remaining orphan genes with functional annotations in Ensembl, no bias towards specific functions was seen; however, eight CTSGs were involved in immunity, indicating that the immune response may have been very important in the radiation of teleosts.
High percentage of lineage-specific genes generated via gene duplications
Since the publication of the famous monograph by Susumu Ohno, gene duplication followed by rapid sequence divergence of one paralog, to the point that it falls beyond the threshold of similarity searches, has been thought to be the major mechanism providing raw material for the emergence of new genes. There are several other hypotheses regarding the origin of lineage-specific genes, such as horizontal gene transfer [71, 72], an accelerated evolutionary rate, de novo emergence from non-genic sequences, and artifacts from genome annotation. In order to determine the contribution of gene duplication to the lineage-specific genes within zebrafish, we sought to identify lineage-specific genes generated by the duplication-divergence mechanism using a simple method; that is, we determined whether any paralogs of the lineage-specific genes were widely evolutionarily conserved. To ascertain a minimum percentage of genes that may have been generated via gene duplication, we first downloaded the information regarding the paralogs of CTSGs and orphan genes annotated in Ensembl version 64 using Biomart. Second, lineage-specific genes with paralogs that were found to be evolutionarily conserved were considered to have been created via gene duplication followed by rapid sequence divergence.
Orphan genes are preferentially expressed in the reproductive system
The EST database is a collection of millions of ESTs gathered from thousands of RNA libraries covering dozens of zebrafish organs or tissues at different developmental stages. In order to elucidate the expression patterns of the lineage-specific genes in zebrafish, we first analyzed this comprehensive dataset to detect whether the lineage-specific genes were preferentially expressed in certain tissues or organs. One key step in achieving this goal was the reliable mapping of EST and full-length cDNA to genomic sequences, so we followed a relatively stringent pipeline to retain high-quality mappings (see Methods). We counted a gene as expressed in a tissue or organ as long as it was supported by at least one EST.
Tissue distribution of expressed lineage-specific genes
Temporal and spatial expression profiles of lineage-specific genes
Knowing the expression patterns of lineage-specific genes at different developmental stages and in different tissues or organs is essential for determining whether the lineage-specific genes have corresponding biological functions. Therefore, we exploited RNA-seq data to obtain a far more precise measurement of the temporal and spatial expression profiles of these lineage-specific genes. RNA-seq is a recently developed approach to transcriptome profiling that uses high-throughput sequencing technologies and is a powerful method to quantify the expression of genes. A time-series of RNA-seq data from 15 time-points during early zebrafish organogenesis that mark important developmental stages was obtained from previous studies [81–83]. RNA-seq data from 5 different zebrafish tissues were downloaded from NCBI. The RPKM, defined here as the number of uniquely mapped reads to the coding regions divided by one thousandth of the total length of all the exons of the gene, subsequently normalized by dividing by one millionth of the total number of valid reads, was calculated (see Methods).
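The RPKM definition given above can be written out as a short calculation. The following is a minimal Python sketch, not the authors' code; the function name, argument names, and example numbers are hypothetical and chosen only for illustration.

```python
def rpkm(unique_mapped_reads: int, exon_length_bp: int, total_valid_reads: int) -> float:
    """RPKM as defined in the text: reads divided by exon length in kb,
    then divided by the total number of valid reads in millions."""
    if exon_length_bp <= 0 or total_valid_reads <= 0:
        raise ValueError("exon length and total read count must be positive")
    reads_per_kb = unique_mapped_reads / (exon_length_bp / 1_000)
    return reads_per_kb / (total_valid_reads / 1_000_000)


# Hypothetical gene: 500 uniquely mapped reads, 2,000 bp of exons,
# in a library of 10 million valid reads.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

Dividing by exon length first and library size second makes the two normalizations explicit: the first removes the advantage of long genes, the second makes libraries of different depth comparable.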
Evidence for the expression of the lineage-specific genes in zebrafish, here defined as the mapped reads to their coding regions, was found in the transcriptome data for 54 of the 66 orphan genes (81.8%) and 129 of the 135 CTSGs (95.6%). These percentages were significantly higher than those observed using EST/FL-cDNA data. For the remaining 12 orphan genes not represented by RNA-seq data, one had evidence of expression in EST data and one had evidence of expression in protein data. Thus, the failure to find evidence for expression of these two genes with RNA-seq data may suggest that the two genes were expressed in other developmental stages and tissues or at a very low level in the analyzed developmental stages and tissues, or the expression evidence based on the EST and protein data may be incorrect (i.e. as a result of contamination by other samples). As for the other 10 orphan genes not represented by RNA-seq data, their gene models have probably been incorrectly annotated, considering that each of them had a transcript less than 60 bp. On the other hand, expression evidence for the 6 CTSGs not represented by RNA-seq data had all been found from EST data. Therefore, the failure to find evidence for expression using RNA-seq data may suggest that these genes were expressed in other developmental stages or tissues. For example, gene ENSDARG00000094271 was expressed in olfactory tissue at about 3–4 months old and gene ENSDARG00000089157 was expressed in kidney tissue, which was not included in our RNA-seq data.
Several individual lineage-specific genes were also found to have intriguing expression patterns (Additional file 6: Figure S2). For example, orphan gene ENSDARG00000095794 exhibited significantly high expression in the ovary; thus, we speculated that this gene may play a role in reproduction. Orphan gene ENSDARG00000090169 was highly, and nearly specifically, expressed in the post-MBT stages, suggesting that this gene may contribute to the development of later-stage embryos and early larval stages. One of the CTSGs, ENSDARG00000076244, was highly expressed in both female and male brains, which indicates that this gene may have a role in zebrafish brain development. Another CTSG, ENSDARG00000017163, exhibited significantly high expression in early-stage embryos, suggesting an important contribution to early embryonic development. A list of the number of uniquely mapped reads and the RPKM of each lineage-specific gene is provided (Additional file 7: Table S6_1 and Table S6_2).
In order to validate the expression pattern of these lineage-specific genes, a reverse transcription polymerase chain reaction (RT-PCR) assay was used. Primers for 16 lineage-specific genes (5 orphan genes and 11 CTSGs) were designed and all of these genes were amplified. The primer information and the RT-PCR results are provided (Additional file 8: Table S7 and Additional file 9: Figure S3).
The large numbers of lineage-specific genes with potentially important functions identified in other taxa [36–41, 55, 60, 85, 86] motivated our genome-wide search for lineage-specific genes within zebrafish. Here, we adopted BLAST, the preferred method for detecting homologs, and phylostratigraphy to identify two sets of lineage-specific genes within zebrafish. We then characterized these genes, predicted their functions ab initio, inferred their evolutionary origin, and analyzed their expression patterns, making this the most comprehensive study of lineage-specific genes within teleosts and zebrafish to date. The 135 CTSGs and 66 orphan genes obtained in this study are attractive targets for future experimental discovery, owing to their lineage specificity and to the fact that the majority encode proteins whose functions are yet to be determined (only one orphan gene and 42 CTSGs have a GO term accession). Compared with the lineage-specific genes identified in plants [36, 43, 44], the number of lineage-specific genes within zebrafish is significantly lower, which may reflect a basic difference between animals and plants, considering the similarly small number of lineage-specific genes identified in primates and insects [40, 41]. Although Yang et al. identified a relatively small number of lineage-specific genes in Arabidopsis, Oryza, and Populus, close to the number found in animals, the criteria they used to define sequence conservation were too relaxed, making the validity of their results questionable. For example, they restricted their analysis to genes with expression evidence and employed a very relaxed criterion to define sequence conservation (e-value cutoff of 0.1) that has not been used in other studies.
Taken together, the dramatic difference in number of lineage-specific genes observed between the genomes of animals and plants should not be the result of the method we used and may suggest that there is a remarkable genetic difference in terms of lineage-specific genes between the genomes of animals and plants. In addition, this difference may suggest that genome doubling followed by sequence divergence occurred in plants at a higher frequency , which may explain to some extent why there are many more lineage-specific genes in plant genomes.
Both the CTSGs and orphan genes had shorter gene lengths than the EC genes, probably owing to fewer exons per gene and a higher percentage of intronless genes. For example, nearly 28% of orphan genes contained only one exon, while the percentage of single-exon EC genes was only 6%. One reason for such a difference may be that intronless genes can arise via retroposition, which has been confirmed to create a large number of new genes in the zebrafish genome. Alternatively, this difference may be a result of the “introns late” hypothesis, which assumes that intron accretion into protein-coding genes is continuous throughout the evolutionary time of eukaryotes. Thus, the younger the genes are, the fewer exons they have. Additionally, since orphan genes are species specific, these genes may have arisen relatively recently. Collectively, these reasons may partly explain why young orphan genes often contain a single exon and why lineage-specific genes are shorter than older evolutionarily conserved genes.
Generally speaking, lineage-specific genes are thought to play significant roles in the evolution of lineage-specific phenotypes and adaptive innovation. Although there are a large number of lineage-specific genes whose functions have not been characterized, and only one orphan gene and 31% of CTSGs have a GO term accession, we were still able to find five orphan genes and eight CTSGs whose functions are closely related to immunity. The significant enrichment of immunity proteins in the lineage-specific genes within zebrafish indicates that defense against pathogens may have been an important factor in the successful diversification of fishes. Fishes are an extremely diverse group of aquatic vertebrates that also exhibit enormous diversity in the habitats they occupy. Fishes live in almost every conceivable type of aquatic habitat, from an elevation of up to 5,200 meters in Tibet to 7,000 meters below the surface of the ocean, and some species even make short excursions onto land. Some fishes live in almost pure freshwater, while others reside in very salty lakes. They can tolerate temperatures ranging from as high as 42.5°C down to −2°C under the Antarctic ice sheet. Thus, fishes are likely to be confronted with much more diverse pathogen invasion. Lineage-specific genes involved in immunity should therefore help fishes adapt to various pathogens and survive within their diverse habitats. In addition, the prediction of gene function is usually based on homology to proteins with known function in other species. Because lineage-specific genes lack homologs in other lineages, we predicted their function ab initio. Interestingly, proteins involved in the immune response were the most represented among the CTSGs, accounting for 30.65% of CTSGs, probably implying a significantly larger expansion of these genes in teleosts.
Therefore, function assignment both based on homology and prediction ab initio showed a significant enrichment in proteins related to immune response, suggesting that the successful adaptation of teleosts may be explained by their conserved lineage-specific genes.
Variation in gene number among different organisms suggests a general process of new gene origination. One basic question in biology concerns the molecular mechanisms involved in the creation of new genes. There have already been several hypotheses regarding the origin of lineage-specific genes. However, determining the exact mechanisms by which lineage-specific genes originate depends on comparative genome analysis of taxonomically closely related species, which has so far been extremely difficult to achieve for fish. Gene duplication followed by rapid sequence divergence in one of the paralogs is a well-explored source of lineage-specific genes [75, 91]. A simple method for identifying such genes is to determine whether any of the paralogs of lineage-specific genes are widely evolutionarily conserved. Through this analysis, we found that a large number of lineage-specific genes were generated by gene duplication followed by rapid sequence divergence of one of the paralogs. This was also supported by the observation that the similarity between the genes and their evolutionarily conserved paralogs was lower than the similarity between the genes and their paralogs that were not evolutionarily conserved. As for other mechanisms forming lineage-specific genes, we will explore these questions in the future when the genome sequence of the silver carp (Hypophthalmichthys molitrix), a relatively close relative of zebrafish, is released.
Previous studies have shown that young new genes generated by various mechanisms seem to have been preferentially endowed with testis-specific or testis-biased expression patterns. In accordance with this observation, a large number of new genes within zebrafish are expressed in the reproductive system, reflecting the expectation that the emergence of new, lineage-specific genes may accompany speciation or reproduction. This suggests that this expression pattern is a general phenomenon not only in mammals and Drosophila, but also in teleosts. There are several hypotheses that can explain this propensity. First, sex- and reproduction-related genes are generally recognized as a class of rapidly evolving genes and undergo adaptive evolution after speciation events involved in male reproduction. Furthermore, the testis is the most rapidly evolving organ owing to the strong selective pressures to which it is subjected because of its important roles in sperm competition, sexual conflict, reproductive isolation, germline pathogens, and mutations causing segregation distortion in the male germline. Second, the “hypertranscription” state caused by chromatin remodelling and RNA polymerase II complexes in the meiotic and postmeiotic spermatogenic cells would favor the initial, unprovoked transcription of newly arisen genes. As for the CTSGs, however, no significant enrichment of reproductive expression was found, which is consistent with the view that only young new genes are specifically expressed in the testis, since the CTSGs are relatively older than the orphan genes.
Expression analyses of lineage-specific genes using EST or microarrays have shown that lineage-specific novel genes are preferentially expressed in specific tissues or organs, such as the testis or brain [39, 90]. Although EST data cover a large number of samples, which can be used to compare expression between different samples, the coverage of individual genes is too low to quantify the expression level of genes. Microarrays also have some limitations, such as cross-hybridization and saturation of signals. Therefore, we used RNA-seq data from various developmental stages and tissues to quantify the expression of lineage-specific genes and highlight two novel properties of these genes. First, in addition to being highly tissue-specific, lineage-specific gene expression was highly temporally restricted. Second, lineage-specific genes were preferentially expressed in later-stage embryos and early larval stages compared with early embryos. The higher expression level of lineage-specific genes after the MBT suggests that lineage-specific genes are important components of zygotic transcription. Maternally deposited mRNAs direct early development before the initiation of zygotic transcription during the mid-blastula transition. However, zygotic transcription plays a more important role in the regulation of development after the MBT, since a high percentage of maternally stored mRNA has been degraded by the post-MBT stages. In addition, it has been shown that all vertebrate embryos must converge towards a narrow point, called the phylotypic stage, at which all vertebrates show high morphogenetic resemblance, to acquire the basic scheme on which subsequent differences will emerge. The observation that more lineage-specific genes are expressed after the phylotypic stage may be linked to the acquisition of species-specific morphological traits.
All vertebrates resemble each other at the phylotypic stage, so the crucial steps that form the morphological differences between species must result from genes expressed after the phylotypic stage. Therefore, lineage-specific genes within zebrafish are likely to be important for the marked morphological diversity of teleosts. In addition, lineage-specific genes showed relatively higher expression levels during early larval stages, making them candidates for functions in specific tissues and organs during organogenesis. Expression analysis using RNA-seq from different tissues and organs supported the observations from the EST data and further showed that orphan genes are preferentially expressed in reproductive tissues, which also supports the potential roles of lineage-specific genes during organogenesis.
In this study, we have identified two sets of lineage-specific genes, CTSGs and orphan genes, which are specific to teleosts and zebrafish, respectively. The Conserved Teleost-Specific Genes were found to be especially enriched in proteins with immunity functions, implying that defense against invasion by diverse pathogens was critical to the successful diversification of teleosts. We also revealed that, in addition to showing highly tissue-preferred expression, lineage-specific genes are also highly temporally restricted and are preferentially expressed in later-stage embryos and early larval stages compared with early embryos. This study provides valuable information for further analysis of the functions of these genes during zebrafish embryogenesis and will be helpful in improving the understanding of the successful diversification of teleosts.
Sequence data sets
Both the detection method and the reference set of genomes to be searched are important for identifying lineage-specific genes, so we used a method called ‘phylostratigraphy’ to obtain the lineage-specific genes within zebrafish [32, 61]. To identify CTSGs and zebrafish-specific genes (orphan genes), a total of 61 genomes and 59 proteomes were used in this study (Additional file 10: Table S1). Most of the proteome and genome data sets were downloaded from Ensembl version 64, while the genome of Salmo salar was downloaded from NCBI. The genome and protein sequences of Branchiostoma floridae were obtained from the website of the Joint Genome Institute (http://genome.jgi-psf.org/Brafl1/Brafl1.info.html). The genome of Callorhinchus milii was obtained from http://esharkgenome.imcb.a-star.edu.sg/. The protein data from UniProtKB were downloaded from the UniProt FTP site (ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/). In all cases, the genomes and protein sets used were the latest versions.
The two sets of lineage-specific genes within zebrafish were identified in a pipeline (Figure 1) based on a homolog search using BLASTp, tBLASTn, and BLASTx with an e-value cutoff of 10^-5 [36, 41, 44]. We classified the zebrafish genes into three categories: Evolutionarily Conserved genes (ECs), CTSGs, and orphan genes. Here, orphan genes refer to genes for which we could not find homologs in any other species. CTSGs include genes for which we could find at least one homolog in teleosts, but no homologs anywhere else. ECs were genes with at least one homolog outside the group of teleosts.
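The three-way classification described above reduces to a simple decision rule once the BLAST hits have been filtered. The sketch below is illustrative only and is not the authors' pipeline: it assumes that the hits passing the e-value cutoff have already been collapsed to a list of species names per zebrafish gene, and the function name, the default query species, and the example species set are hypothetical.

```python
from typing import Iterable, Set


def classify_gene(hit_species: Iterable[str],
                  teleost_species: Set[str],
                  query_species: str = "Danio rerio") -> str:
    """Classify one zebrafish gene from the species of its significant hits.

    Assumes hit_species contains only hits that passed the e-value cutoff.
    """
    # Hits back to the query species itself do not count as homologs elsewhere.
    others = {s for s in hit_species if s != query_species}
    if not others:
        return "orphan"   # no homolog in any other species
    if others <= teleost_species:
        return "CTSG"     # homologs found, but only in teleosts
    return "EC"           # at least one homolog outside teleosts


# Hypothetical teleost reference set for the example.
teleosts = {"Danio rerio", "Oryzias latipes", "Takifugu rubripes"}
print(classify_gene(["Oryzias latipes"], teleosts))                  # CTSG
print(classify_gene(["Oryzias latipes", "Homo sapiens"], teleosts))  # EC
print(classify_gene([], teleosts))                                   # orphan
```

Checking for the orphan case first, then for the teleost-only case, mirrors the order of the definitions in the text and guarantees that every gene falls into exactly one category.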
Gene characteristics and chromosomal localization
The gene information for the orphan genes, CTSGs, and ECs was downloaded from Ensembl version 64 using BIOMART (http://www.ensembl.org/). We then used Perl scripts to calculate the gene length, protein length, number of exons, and GC content of the genes. We used one-way ANOVA to test for significant differences between the different sets of lineage-specific genes and the ECs. The chromosomal localization of each lineage-specific gene was also downloaded using BIOMART. In order to determine whether a gene had transcript support, we used the results from the section “Expression analysis using EST and full-length cDNA.”
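As an illustration of the per-gene statistics mentioned above, the following minimal sketch computes GC content and the mean ± SE summary used for the gene characteristics. It is written in Python rather than the Perl used in the study, the function names are hypothetical, and the handling of ambiguous bases (ignored in the denominator) is an assumption, since the text does not state how they were treated.

```python
import math
import statistics


def gc_content(seq: str) -> float:
    """GC content in percent; non-ACGT characters such as N are ignored (assumption)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


def mean_and_se(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a standard error")
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Toy example with made-up sequences, not real zebrafish genes.
gc_values = [gc_content(s) for s in ("ATGCGC", "ATATGC", "GGCCAT")]
print(mean_and_se(gc_values))
```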
Protein function assignment and category
Since there were few homologs between the lineage-specific genes and the genes in public databases, the ProtFun 2.2 server [97, 98] was employed to predict the cellular role and the gene ontology (GO) category for all genes in the two sets of lineage-specific genes. The prediction of cellular function and GO category by ProtFun relies on a large number of sequence-derived protein features, including predicted post-translational modifications (PTMs), protein sorting signals, and physical/chemical properties, rather than on protein sequence similarity [97, 98]. Therefore, ProtFun allows the function to be predicted even for orphan proteins for which no homolog can be found. Here, we used the ProtFun 2.2 server (http://www.cbs.dtu.dk/services/ProtFun/) to determine the functional categories of these two sets of lineage-specific genes and then clustered these sequences according to their cellular roles and GO categories.
Gene duplication analysis
Gene duplication has long been regarded as a major mechanism providing raw material for the origin of new genes and for innovation in genome evolution. We therefore sought to determine which lineage-specific genes had paralogs in zebrafish that were more evolutionarily conserved than the lineage-specific genes themselves. Such a paralog may indicate that the corresponding lineage-specific gene was generated via gene duplication followed by rapid sequence divergence. To accomplish this, we first downloaded the paralogs of orphan genes and CTSGs annotated in Ensembl using BioMart. We then examined the lineage-specific genes with paralogs to determine whether any paralog was more evolutionarily conserved; that is, whether at least one paralog has homologs outside the teleosts (for CTSGs) or outside zebrafish (for orphan genes).
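The paralog check amounts to a set intersection per gene. The following Python sketch assumes the paralog annotation has already been loaded into a dictionary. The function name, the identifiers, and the data layout are hypothetical and do not reflect the Ensembl export format.

```python
def duplication_derived(lineage_specific, paralogs, conserved):
    """Return lineage-specific genes with at least one more-conserved paralog.

    lineage_specific: set of gene IDs (orphan genes or CTSGs).
    paralogs: dict mapping a gene ID to the set of its paralog gene IDs
              (hypothetical layout).
    conserved: set of gene IDs regarded as more evolutionarily conserved
               than the query set, i.e. with homologs beyond the lineage.
    """
    return {
        gene
        for gene in lineage_specific
        # A non-empty intersection means some paralog is more conserved.
        if paralogs.get(gene, set()) & conserved
    }
```

Under this reading, the `conserved` set would hold the ECs when the query genes are CTSGs, and the ECs plus CTSGs when the query genes are orphans.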
Expression analysis using EST and full-length cDNA
The EST and full-length cDNA (FL-cDNA) expression data for zebrafish were downloaded from UCSC (http://hgdownload.cse.ucsc.edu/downloads.html). EST and FL-cDNA data processing, including genomic mapping, alignment quality control, and assignment of ESTs or FL-cDNAs to zebrafish genes, followed a previously described procedure that imposes stringent quality control to retain only high-quality mappings. First, we mapped the 1,488,275 EST sequences and 29,480 FL-cDNA sequences to the zebrafish genome using BLAT with the default parameters, which could eliminate sequences shorter than 100 bp. Then, we applied the following criteria to discard low-quality mappings: mapping length ≥ 100 bp, identity ≥ 96%, coverage within mapping ≥ 97%, and coverage within whole transcript ≥ 75%. If a transcript mapped to multiple genomic loci, only the best mapping was retained; if more than one nearly identical best mapping existed (difference in BLAT scores < 5%), the transcript was discarded to avoid ambiguity. Finally, a transcript was considered to be transcribed from a gene only when it overlapped the gene by more than 100 bp and both had the same orientation. These relatively stringent quality controls were intended to ensure a reliable expression analysis. We counted a gene as expressed in a tissue if it was supported by at least one EST. Then, we downloaded and extracted the tissue information for the expressed lineage-specific genes from NCBI using Batch Entrez (http://www.ncbi.nlm.nih.gov/sites/batchentrez/).
Characterization of expression patterns for CTSGs and orphan genes by RNA-Seq
RNA-Seq, a recently developed approach to transcriptome profiling that uses high-throughput sequencing technologies, has been shown to be highly accurate for quantifying gene expression levels and is expected to revolutionize the manner in which eukaryotic transcriptomes are studied. RNA-Seq data were obtained from three developmental time series: 4 stages [1-cell (0.75 hours post fertilization, hpf), 16-cell (1.5 hpf), 512-cell (2.75 hpf), and 50% epiboly (5.25 hpf)], downloaded from NCBI with accession code ERP000635; 6 stages [unfertilized eggs, 1-cell (~0.7 hpf), 16-cell (~1.5 hpf), 128-cell (~2.5 hpf), mid-blastula transition (MBT; ~3.5 hpf), and post-MBT (~5.3 hpf)], with NCBI accession code GSE22830; and 8 stages [two- to four-cell, 1000-cell (3 hpf), dome (4.5 hpf), shield (6 hpf), bud (10 hpf), 28 hpf, 48 hpf, and 120 hpf], downloaded from NCBI with accession code GSE32898. RNA-Seq data from 5 zebrafish tissue samples (adult ovary, male adult head, female adult head, whole male adult without head or testis, and whole female adult without head or ovary) were downloaded from NCBI with accession code ERP000016. We then calculated gene-level expression as reads per kilobase of exon model per million mapped reads (RPKM). RNA-Seq data from the same developmental stage were pooled. Briefly, we mapped the reads from each time point independently to the zebrafish genome (Zv9) with TopHat (version 1.4.1), and read counts per gene were calculated using htseq-count. Only reads that mapped to a unique location in the zebrafish genome were considered in subsequent analyses.
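The mapping filter described above can be sketched in Python as follows. The thresholds come from the text, but the `Mapping` record, its field names, and the reading of "difference in BLAT scores < 5%" as a difference relative to the best score are assumptions. This is an illustration, not the original processing code.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    locus: str                  # genomic location label (hypothetical)
    score: float                # BLAT-style alignment score
    length: int                 # aligned length in bp
    identity: float             # percent identity
    mapping_coverage: float     # percent coverage within the mapped region
    transcript_coverage: float  # percent of the whole transcript covered


def passes_quality(m):
    """Apply the four thresholds stated in the text."""
    return (
        m.length >= 100
        and m.identity >= 96.0
        and m.mapping_coverage >= 97.0
        and m.transcript_coverage >= 75.0
    )


def best_unambiguous(mappings, tolerance=0.05):
    """Return the single best mapping, or None if absent or ambiguous."""
    good = sorted(
        (m for m in mappings if passes_quality(m)),
        key=lambda m: m.score,
        reverse=True,
    )
    if not good:
        return None
    if len(good) > 1:
        best, runner_up = good[0], good[1]
        # Assumed interpretation: ambiguous when the runner-up is within
        # 5% of the best score.
        if best.score - runner_up.score < tolerance * best.score:
            return None
    return good[0]
```

A transcript whose two best loci score 500 and 490 would be discarded as ambiguous under this reading, whereas scores of 500 and 300 would keep the first locus.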
The expression level of a gene in a developmental stage or tissue was defined as the number of uniquely mapped reads in the gene divided by one thousandth of the gene's total exon length, and then normalized by dividing by one millionth of the total number of valid reads in the respective sample.
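Written out as code, this normalization is a two-step division. The sketch below is a minimal Python rendering of the definition given in the text. The function and parameter names are illustrative, and the input validation is an addition.

```python
def rpkm(unique_reads, exon_length_bp, total_valid_reads):
    """Reads per kilobase of exon model per million mapped reads.

    unique_reads: uniquely mapped reads assigned to the gene.
    exon_length_bp: total exon length of the gene in base pairs.
    total_valid_reads: total number of valid reads in the sample.
    """
    if exon_length_bp <= 0 or total_valid_reads <= 0:
        raise ValueError("exon length and total reads must be positive")
    per_kilobase = unique_reads / (exon_length_bp / 1_000)
    return per_kilobase / (total_valid_reads / 1_000_000)
```

For instance, 500 uniquely mapped reads on a gene with 2,000 bp of exon sequence in a sample of 10 million valid reads gives 500 / 2 / 10 = 25 RPKM.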
To compare temporal specificity between the lineage-specific genes and the evolutionarily conserved genes, we calculated a temporal specificity score for each set of genes, defined as 1 - H(g)/log2(N), where H(g) is the Shannon entropy, a measure of uncertainty.
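A small Python sketch may help make the score concrete. The text does not spell out how H(g) is computed, so the sketch assumes the usual construction: N is the number of developmental stages, and H(g) is the entropy of the gene's expression fractions across those stages. Returning `None` for a gene with no expression is likewise an assumption.

```python
import math


def temporal_specificity(expression):
    """Return 1 - H(g)/log2(N) for one gene.

    expression: non-negative expression levels (e.g. RPKM), one per stage.
    Assumption: H(g) = -sum(p_t * log2(p_t)), where p_t is the stage's
    share of the gene's total expression and N = len(expression).
    """
    n = len(expression)
    if n < 2:
        raise ValueError("at least two stages are required")
    if any(x < 0 for x in expression):
        raise ValueError("expression levels must be non-negative")
    total = sum(expression)
    if total == 0:
        return None  # score is undefined for an unexpressed gene
    entropy = -sum(
        (x / total) * math.log2(x / total) for x in expression if x > 0
    )
    return 1.0 - entropy / math.log2(n)
```

Under these assumptions, a gene expressed equally at every stage scores 0, and a gene expressed at a single stage scores 1, so higher scores indicate narrower expression windows.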
In this study, we used one-way ANOVA followed by Duncan’s post hoc test (for equal variances) or Dunnett’s T3 test (for unequal variances) to test for significant differences between the characteristics of lineage-specific genes and those of ECs, as well as among the temporal specificity scores of the three categories of genes. Spearman’s correlation test was used to determine whether the number of lineage-specific genes on each chromosome correlated with chromosome length. The t-test was used to compare lineage-specific genes with their paralogs. Fisher’s exact test was used to test whether expression was enriched in specific tissues. The chi-square test was used to test whether lineage-specific genes were significantly enriched in later-stage embryos and early larval stages.
We thank Professor Yong E. Zhang of the Institute of Zoology, Chinese Academy of Sciences, for his critical reading of the manuscript. We are very grateful to Lihong Guan for performing the experiments. We are also grateful to Chengchi Fang and Zaixuan Zhong for their help with our experiments. This work was supported by grants from the National Natural Science Foundation of China (31090254 and U1036603) and the Chinese Academy of Sciences (KSCX2-YW-Z-0807).
- Nelson JS: Fishes of the world. 2006, New York: John Wiley and Sons, Fourth edition.
- Meyer A, Van de Peer Y: From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). BioEssays: news and reviews in molecular, cellular and developmental biology. 2005, 27 (9): 937-945. 10.1002/bies.20293.
- Meyer A, Schartl M: Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr Opin Cell Biol. 1999, 11 (6): 699-704. 10.1016/S0955-0674(99)00039-3.
- Vandepoele K, De Vos W, Taylor JS, Meyer A, Van de Peer Y: Major events in the genome evolution of vertebrates: paranome age and size differ considerably between ray-finned fishes and land vertebrates. Proc Natl Acad Sci USA. 2004, 101 (6): 1638-1643. 10.1073/pnas.0307968100.
- Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A: Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004, 431 (7011): 946-957. 10.1038/nature03025.
- Van de Peer Y, Taylor JS, Meyer A: Are all fishes ancient polyploids? J Struct Funct Genomics. 2003, 3 (1–4): 65-73.
- Taylor JS, Van de Peer Y, Braasch I, Meyer A: Comparative genomics provides evidence for an ancient genome duplication event in fish. Philos Trans R Soc Lond B Biol Sci. 2001, 356 (1414): 1661-1679. 10.1098/rstb.2001.0975.
- Taylor JS, Braasch I, Frickey T, Meyer A, Van de Peer Y: Genome duplication, a trait shared by 22000 species of ray-finned fish. Genome Res. 2003, 13 (3): 382-390. 10.1101/gr.640303.
- Christoffels A, Koh EG, Chia JM, Brenner S, Aparicio S, Venkatesh B: Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol Biol Evol. 2004, 21 (6): 1146-1151. 10.1093/molbev/msh114.
- Prohaska SJ, Stadler PF: The duplication of the Hox gene clusters in teleost fishes. Theory in biosciences = Theorie in den Biowissenschaften. 2004, 123 (1): 89-110. 10.1016/j.thbio.2004.03.004.
- Van de Peer Y: Tetraodon genome confirms Takifugu findings: most fish are ancient polyploids. Genome Biol. 2004, 5 (12): 250. 10.1186/gb-2004-5-12-250.
- Volff JN: Genome evolution and biodiversity in teleost fish. Heredity. 2005, 94 (3): 280-294. 10.1038/sj.hdy.6800635.
- Stellwag EJ: Are genome evolution, organism complexity and species diversity linked? Integrative and comparative biology. 2004, 44 (5): 358-365. 10.1093/icb/44.5.358.
- Hoegg S, Meyer A: Hox clusters as models for vertebrate genome evolution. Trends Genet. 2005, 21 (8): 421-424. 10.1016/j.tig.2005.06.004.
- Crow KD, Stadler PF, Lynch VJ, Amemiya C, Wagner GP: The “fish-specific” Hox cluster duplication is coincident with the origin of teleosts. Mol Biol Evol. 2006, 23 (1): 121-136.
- Hoegg S, Brinkmann H, Taylor JS, Meyer A: Phylogenetic timing of the fish-specific genome duplication correlates with the diversification of teleost fish. J Mol Evol. 2004, 59 (2): 190-203. 10.1007/s00239-004-2613-z.
- Semon M, Wolfe KH: Reciprocal gene loss between Tetraodon and zebrafish after whole genome duplication in their ancestor. Trends Genet. 2007, 23 (3): 108-112. 10.1016/j.tig.2007.01.003.
- Taylor JS, Van de Peer Y, Meyer A: Genome duplication, divergent resolution and speciation. Trends Genet. 2001, 17 (6): 299-301. 10.1016/S0168-9525(01)02318-6.
- Wittbrodt J, Meyer A, Schartl M: More genes in fish? BioEssays: news and reviews in molecular, cellular and developmental biology. 1998, 20 (6): 511-515. 10.1002/(SICI)1521-1878(199806)20:6<511::AID-BIES10>3.0.CO;2-3.
- Postlethwait J, Amores A, Cresko W, Singer A, Yan YL: Subfunction partitioning, the teleost radiation and the annotation of the human genome. Trends Genet. 2004, 20 (10): 481-490. 10.1016/j.tig.2004.08.001.
- Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A: Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 2002, 297 (5585): 1301-1310. 10.1126/science.1072104.
- Amores A, Force A, Yan YL, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang YL: Zebrafish hox clusters and vertebrate genome evolution. Science. 1998, 282 (5394): 1711-1714.
- Hoegg S, Boore JL, Kuehl JV, Meyer A: Comparative phylogenomic analyses of teleost fish Hox gene clusters: lessons from the cichlid fish Astatotilapia burtoni. BMC Genomics. 2007, 8: 317. 10.1186/1471-2164-8-317.
- Patterson C: An overview of the early fossil record of acanthomorphs. Bulletin of Marine Science. 1993, 52: 29-59.
- Benton MJ: Vertebrate paleontology. 2005, Oxford, UK: Blackwell Science Ltd, Third edition.
- Donoghue PC, Purnell MA: Genome duplication, extinction and vertebrate evolution. Trends Ecol Evol. 2005, 20 (6): 312-319. 10.1016/j.tree.2005.04.008.
- Kasahara M, Naruse K, Sasaki S, Nakatani Y, Qu W, Ahsan B, Yamada T, Nagayasu Y, Doi K, Kasai Y: The medaka draft genome and insights into vertebrate genome evolution. Nature. 2007, 447 (7145): 714-719. 10.1038/nature05846.
- Ravi V, Venkatesh B: Rapidly evolving fish genomes and teleost diversity. Curr Opin Genet Dev. 2008, 18 (6): 544-550. 10.1016/j.gde.2008.11.001.
- Robinson-Rechavi M, Laudet V: Evolutionary rates of duplicate genes in fish and mammals. Mol Biol Evol. 2001, 18 (4): 681-683. 10.1093/oxfordjournals.molbev.a003849.
- Venkatesh B, Kirkness EF, Loh YH, Halpern AL, Lee AP, Johnson J, Dandona N, Viswanathan LD, Tay A, Venter JC: Ancient noncoding elements conserved in the human genome. Science. 2006, 314 (5807): 1892. 10.1126/science.1130708.
- Khalturin K, Hemmrich G, Fraune S, Augustin R, Bosch TC: More than just orphans: are taxonomically-restricted genes important in evolution? Trends Genet. 2009, 25 (9): 404-413. 10.1016/j.tig.2009.07.006.
- Tautz D, Domazet-Loso T: The evolutionary origin of orphan genes. Nat Rev Genet. 2011, 12 (10): 692-702. 10.1038/nrg3053.
- Domazet-Loso T, Tautz D: An evolutionary analysis of orphan genes in Drosophila. Genome Res. 2003, 13 (10): 2213-2219. 10.1101/gr.1311003.
- Toll-Riera M, Bosch N, Bellora N, Castelo R, Armengol L, Estivill X, Alba MM: Origin of primate orphan genes: a comparative genomics approach. Mol Biol Evol. 2009, 26 (3): 603-612.
- Mazza R, Strozzi F, Caprera A, Ajmone-Marsan P, Williams JL: The other side of comparative genomics: genes with no orthologs between the cow and other mammalian species. BMC Genomics. 2009, 10: 604. 10.1186/1471-2164-10-604.
- Lin H, Moghe G, Ouyang S, Iezzoni A, Shiu SH, Gu X, Buell CR: Comparative analyses reveal distinct sets of lineage-specific genes within Arabidopsis thaliana. BMC Evol Biol. 2010, 10: 41. 10.1186/1471-2148-10-41.
- Fischer D, Eisenberg D: Finding families for genomic ORFans. Bioinformatics. 1999, 15 (9): 759-762. 10.1093/bioinformatics/15.9.759.
- Toll-Riera M, Castelo R, Bellora N, Alba MM: Evolution of primate orphan proteins. Biochem Soc Trans. 2009, 37 (Pt 4): 778-782.
- Tay SK, Blythe J, Lipovich L: Global discovery of primate-specific genes in the human genome. Proc Natl Acad Sci USA. 2009, 106 (29): 12019-12024. 10.1073/pnas.0904569106.
- Johnson BR, Tsutsui ND: Taxonomically restricted genes are associated with the evolution of sociality in the honey bee. BMC Genomics. 2011, 12: 164. 10.1186/1471-2164-12-164.
- Zhang G, Wang H, Shi J, Wang X, Zheng H, Wong GK, Clark T, Wang W, Wang J, Kang L: Identification and characterization of insect-specific proteins by genome data analysis. BMC Genomics. 2007, 8: 93. 10.1186/1471-2164-8-93.
- Schmid KJ, Aquadro CF: The evolutionary analysis of “orphans” from the drosophila genome identifies rapidly diverging and incorrectly annotated genes. Genetics. 2001, 159 (2): 589-598.
- Donoghue MT, Keshavaiah C, Swamidatta SH, Spillane C: Evolutionary origins of Brassicaceae specific genes in Arabidopsis thaliana. BMC Evol Biol. 2011, 11: 47. 10.1186/1471-2148-11-47.
- Campbell MA, Zhu W, Jiang N, Lin H, Ouyang S, Childs KL, Haas BJ, Hamilton JP, Buell CR: Identification and characterization of lineage-specific genes within the Poaceae. Plant Physiol. 2007, 145 (4): 1311-1322. 10.1104/pp.107.104513.
- Guo WJ, Li P, Ling J, Ye SP: Significant comparative characteristics between orphan and nonorphan genes in the rice (Oryza sativa L.) genome. Comparative and functional genomics. 2007, 2007: 21676.
- Amiri H, Davids W, Andersson SG: Birth and death of orphan genes in Rickettsia. Mol Biol Evol. 2003, 20 (10): 1575-1587. 10.1093/molbev/msg175.
- Ogata H, Audic S, Renesto-Audiffren P, Fournier PE, Barbe V, Samson D, Roux V, Cossart P, Weissenbach J, Claverie JM: Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science. 2001, 293 (5537): 2093-2098. 10.1126/science.1061471.
- Siew N, Fischer D: Analysis of singleton ORFans in fully sequenced microbial genomes. Proteins. 2003, 53 (2): 241-251. 10.1002/prot.10423.
- Daubin V, Ochman H: Bacterial genomes as new gene homes: the genealogy of ORFans in E. coli. Genome Res. 2004, 14 (6): 1036-1042. 10.1101/gr.2231904.
- Cai JJ, Woo PC, Lau SK, Smith DK, Yuen KY: Accelerated evolutionary rate may be responsible for the emergence of lineage-specific genes in ascomycota. J Mol Evol. 2006, 63 (1): 1-11. 10.1007/s00239-004-0372-5.
- Casari G, De Daruvar A, Sander C, Schneider R: Bioinformatics and the discovery of gene function. Trends Genet. 1996, 12 (7): 244-245. 10.1016/0168-9525(96)30057-7.
- Dujon B: The yeast genome project: what did we learn? Trends Genet. 1996, 12 (7): 263-270. 10.1016/0168-9525(96)10027-5.
- Zdobnov EM, von Mering C, Letunic I, Torrents D, Suyama M, Copley RR, Christophides GK, Thomasova D, Holt RA, Subramanian GM: Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science. 2002, 298 (5591): 149-159. 10.1126/science.1077061.
- Long M, Betran E, Thornton K, Wang W: The origin of new genes: glimpses from the young and old. Nat Rev Genet. 2003, 4 (11): 865-875.
- Khalturin K, Anton-Erxleben F, Sassmann S, Wittlieb J, Hemmrich G, Bosch TC: A novel gene family controls species-specific morphological traits in Hydra. PLoS Biol. 2008, 6 (11): e278. 10.1371/journal.pbio.0060278.
- Nasevicius A, Ekker SC: Effective targeted gene ’knockdown’ in zebrafish. Nat Genet. 2000, 26 (2): 216-220. 10.1038/79951.
- Wienholds E, van Eeden F, Kosters M, Mudde J, Plasterk RH, Cuppen E: Efficient target-selected mutagenesis in zebrafish. Genome Res. 2003, 13 (12): 2700-2707. 10.1101/gr.1725103.
- Wienholds E, Schulte-Merker S, Walderich B, Plasterk RH: Target-selected inactivation of the zebrafish rag1 gene. Science. 2002, 297 (5578): 99-102. 10.1126/science.1071762.
- Irie N, Sehara-Fujisawa A: The vertebrate phylotypic stage and an early bilaterian-related stage in mouse embryogenesis defined by genomic information. BMC Biol. 2007, 5: 1. 10.1186/1741-7007-5-1.
- Yang X, Jawdy S, Tschaplinski TJ, Tuskan GA: Genome-wide identification of lineage-specific genes in Arabidopsis, Oryza and Populus. Genomics. 2009, 93 (5): 473-480. 10.1016/j.ygeno.2009.01.002.
- Domazet-Loso T, Brajkovic J, Tautz D: A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends Genet. 2007, 23 (11): 533-539. 10.1016/j.tig.2007.08.014.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Cai JJ, Petrov DA: Relaxed purifying selection and possibly high rate of adaptation in primate lineage-specific genes. Genome Biol Evol. 2010, 2: 393-409. 10.1093/gbe/evq019.
- Lipman DJ, Souvorov A, Koonin EV, Panchenko AR, Tatusova TA: The relationship of protein conservation and sequence length. BMC Evol Biol. 2002, 2: 20. 10.1186/1471-2148-2-20.
- Wolf YI, Novichkov PS, Karev GP, Koonin EV, Lipman DJ: The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc Natl Acad Sci USA. 2009, 106 (18): 7273-7280. 10.1073/pnas.0901808106.
- Vishnoi A, Kryazhimskiy S, Bazykin GA, Hannenhalli S, Plotkin JB: Young proteins experience more variable selection pressures than old proteins. Genome Res. 2010, 20 (11): 1574-1581. 10.1101/gr.109595.110.
- Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S: Ensembl 2012. Nucleic Acids Res. 2012, 40 (Database issue): D84-D90.
- Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W: BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics. 2005, 21 (16): 3439-3440. 10.1093/bioinformatics/bti525.
- Sakharkar KR, Sakharkar MK, Culiat CT, Chow VT, Pervaiz S: Functional and evolutionary analyses on expressed intronless genes in the mouse genome. FEBS Lett. 2006, 580 (5): 1472-1478. 10.1016/j.febslet.2006.01.070.
- Ohno S: Evolution by gene duplication. 1970, New York: Springer.
- Daubin V, Lerat E, Perriere G: The source of laterally transferred genes in bacterial genomes. Genome Biol. 2003, 4 (9): R57. 10.1186/gb-2003-4-9-r57.
- Striepen B, Pruijssers AJ, Huang J, Li C, Gubbels MJ, Umejiego NN, Hedstrom L, Kissinger JC: Gene transfer in the evolution of parasite nucleotide biosynthesis. Proc Natl Acad Sci USA. 2004, 101 (9): 3154-3159. 10.1073/pnas.0304686101.
- Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ: Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proc Natl Acad Sci USA. 2006, 103 (26): 9935-9939. 10.1073/pnas.0509809103.
- Zhou Q, Zhang G, Zhang Y, Xu S, Zhao R, Zhan Z, Li X, Ding Y, Yang S, Wang W: On the origin of new genes in Drosophila. Genome Res. 2008, 18 (9): 1446-1455. 10.1101/gr.076588.108.
- Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290 (5494): 1151-1155. 10.1126/science.290.5494.1151.
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S: Database resources of the national center for biotechnology information. Nucleic Acids Res. 2008, 36 (Database issue): D13-D21.
- Zhang Y, Li J, Kong L, Gao G, Liu QR, Wei L: NATsDB: natural antisense transcripts DataBase. Nucleic Acids Res. 2007, 35 (Database issue): D156-D161.
- Pannetier M, Renault L, Jolivet G, Cotinot C, Pailhoux E: Ovarian-specific expression of a new gene regulated by the goat PIS region and transcribed by a FOXL2 bidirectional promoter. Genomics. 2005, 85 (6): 715-726. 10.1016/j.ygeno.2005.02.011.
- Dai H, Chen Y, Chen S, Mao Q, Kennedy D, Landback P, Eyre-Walker A, Du W, Long M: The evolution of courtship behaviors through the origination of a new gene in Drosophila. Proc Natl Acad Sci USA. 2008, 105 (21): 7478-7483. 10.1073/pnas.0800693105.
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10 (1): 57-63. 10.1038/nrg2484.
- Vesterlund L, Jiao H, Unneberg P, Hovatta O, Kere J: The zebrafish transcriptome during early development. BMC Dev Biol. 2011, 11 (1): 30. 10.1186/1471-213X-11-30.
- Aanes H, Winata CL, Lin CH, Chen JP, Srinivasan KG, Lee SG, Lim AY, Hajan HS, Collas P, Bourque G: Zebrafish mRNA sequencing deciphers novelties in transcriptome dynamics during maternal to zygotic transition. Genome Res. 2011, 21 (8): 1328-1338. 10.1101/gr.116012.110.
- Pauli A, Valen E, Lin MF, Garber M, Vastenhouw NL, Levin JZ, Fan L, Sandelin A, Rinn JL, Regev A: Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis. Genome Res. 2012, 22 (3): 577-591. 10.1101/gr.133009.111.
- Collins JE, White S, Searle SM, Stemple DL: Incorporating RNA-seq data into the zebrafish Ensembl genebuild. Genome Res. 2012, 22 (10): 2067-2078. 10.1101/gr.137901.112.
- Milde S, Hemmrich G, Anton-Erxleben F, Khalturin K, Wittlieb J, Bosch TC: Characterization of taxonomically restricted genes in a phylum-restricted cell type. Genome Biol. 2009, 10 (1): R8. 10.1186/gb-2009-10-1-r8.
- Wilson GA, Bertrand N, Patel Y, Hughes JB, Feil EJ, Field D: Orphans as taxonomically restricted and ecologically important genes. Microbiology. 2005, 151 (Pt 8): 2499-2501.
- Udall JA, Wendel JF: Polyploidy and crop improvement. Crop Sci. 2006, 46: S3-S14.
- Fu B, Chen M, Zou M, Long M, He S: The rapid generation of chimerical genes expanding protein diversity in zebrafish. BMC Genomics. 2010, 11: 657. 10.1186/1471-2164-11-657.
- Koonin EV: The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? Biol Direct. 2006, 1: 22. 10.1186/1745-6150-1-22.
- Kaessmann H: Origins, evolution, and phenotypic impact of new genes. Genome Res. 2010, 20 (10): 1313-1326. 10.1101/gr.101386.109.
- Conant GC, Wolfe KH: Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet. 2008, 9 (12): 938-950. 10.1038/nrg2482.
- Swanson WJ, Vacquier VD: The rapid evolution of reproductive proteins. Nat Rev Genet. 2002, 3 (2): 137-144.
- Schmidt EE: Transcriptional promiscuity in testes. Current biology: CB. 1996, 6 (7): 768-769. 10.1016/S0960-9822(02)00589-4.
- She X, Horvath JE, Jiang Z, Liu G, Furey TS, Christ L, Clark R, Graves T, Gulden CL, Alkan C: The structure and evolution of centromeric transition regions within the human genome. Nature. 2004, 430 (7002): 857-864. 10.1038/nature02806.
- Korzh V: Before maternal-zygotic transition … There was morphogenetic function of nuclei. Zebrafish. 2009, 6 (3): 295-302. 10.1089/zeb.2008.0573.
- Duboule D: Temporal colinearity and the phylotypic progression - a basis for the stability of a vertebrate bauplan and the evolution of morphologies through heterochrony. Development. 1994, 1994: 135-142.
- Jensen LJ, Gupta R, Blom N, Devos D, Tamames J, Kesmir C, Nielsen H, Staerfeldt HH, Rapacki K, Workman C: Prediction of human protein function from post-translational modifications and localization features. J Mol Biol. 2002, 319 (5): 1257-1265. 10.1016/S0022-2836(02)00379-0.
- Jensen LJ, Gupta R, Staerfeldt HH, Brunak S: Prediction of human protein function according to gene ontology categories. Bioinformatics. 2003, 19 (5): 635-642. 10.1093/bioinformatics/btg036.
- Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A: The UCSC genome browser database: update 2011. Nucleic Acids Res. 2011, 39 (Database issue): D876-D882.
- Kent WJ: BLAT–the BLAST-like alignment tool. Genome Res. 2002, 12 (4): 656-664.
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.
- Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25 (9): 1105-1111. 10.1093/bioinformatics/btp120.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.