De novo transcriptome assembly and hepatic gene expression analyses in overfed and ad libitum fed Muscovy and Pekin ducks and their reciprocal inter-specific Mule and Hinny hybrids


 Background Common Pekin and Muscovy ducks and their intergeneric hinny and mule hybrids have different abilities for fatty liver production. RNA-Seq analyses from the liver of these different genetic types fed ad libitum or overfed would help to identify genes with different response to overfeeding between them. However RNA-seq analyses from different species and comparison is challenging. The goal of this study was develop a relevant strategy for transcriptome analysis and comparison between different species. Results Transcriptomes were first assembled with a reference-based approach. Important mapping biases were observed when heterologous mapping were conducted on common duck reference genome, suggesting that this reference-based strategy was not suited to compare the four different genetic types. De novo transcriptome assemblies were then performed using Trinity and Oases. Assemblies of transcriptomes were not relevant when more than a single genetic type was considered. Finally, single genetic type transcriptomes were assembled with DRAP in a mega-transcriptome. No bias was observed when reads from the different genetic types were mapped on this mega-transcriptome and differences in gene expression between the four genetic types could be identified. Conclusions Analyses using both reference-based and de novo transcriptome assemblies point out a good performance of the de novo approach for the analysis of gene expression in different species. It also allowed the identification of differences in responses to overfeeding between Pekin and Muscovy ducks and hinny and mule hybrids.

involves Mule ducks, i.e. intergeneric hybrids between a female common Pekin duck (Anas plathyrynchos) and a male Muscovy duck (Cairina moschata), and in a far lesser extent Muscovy ducks or geese from the Landes grey breed (Anser anser). Conversely, Pekin ducks are not involved due to their lower ability to produce fatty liver. Some studies have been conducted to better characterize hepatic steatosis development in waterfowls and analyze differences between genotypes. However, they mainly focused on biochemical levels [1][2][3][4][5]. Some studies also focused on gene expression levels but were conducted on few candidate genes only [6][7][8][9]. This was mainly due to the lack of duck specific microarrays for transcriptome analyses at the genome level. Thus, genomewide analyses of gene expression are still needed to better characterize hepatic steatosis in ducks and their hybrids.
The recent development of high throughput RNA sequencing now allows characterization of expressed transcripts and quantification in samples of any species without the need of preexisting tools such as microarrays [10,11]. Different strategies were applied for transcriptome assembly and analyses of RNA sequences referred as reference-based or de novo assemblies [12][13][14][15]. Reference-based strategy is generally conducted when a reference genome for the target transcriptome is available. In such a situation, RNA sequences are aligned on the reference genome using mapping tools like Burrows-Wheeler transform (BWA) [16] or TopHat-Cufflinks [17]. This approach has also been applied in heterologous situations where reference genome and RNA sequences were from different species, for examples mapping horse, donkey and their hybrids transcriptomes on horse reference genome [18], sparrows transcriptome on zebra finch reference genome [19] or red deer transcriptome on cow reference genome [20]. However, in such heterologous situations, i.e. in the absence of reference genome from the same species or when different species and hybrids are compared, de novo approaches are more generally conducted using de novo transcriptome assembling tools such as Trinity [21] or Oases [22]. Some examples of transcriptome analyses in interspecific hybrids are the studies conducted in cyprinidae [23] or in brassica [24]. As different tools are available for reconstructing transcripts de novo, the choice of an optimal method is challenging. Surprisingly, different de novo assembly tools and methods were compared using common duck RNA sequences 4 [25] while common duck reference genome is available [26]. Although different strategies can be applied to analyze transcriptomes from different species, no one appears as the best one.
Interestingly, a de novo RNA-Seq Assembly Pipeline (DRAP) has been developed improving de novo transcriptome assemblies performed by Trinity and Oases in terms of compaction (number of contigs needed to represent the transcriptome) and quality (chimera and nucleotide error rates) [27].
In the present study, reference-based approach using Anas platyrhynchos genome as reference and de novo assembly approach were used and compared to analyze hepatic gene expression of overfed and ad libitum fed Muscovy and Pekin ducks and their reciprocal inter-specific Mule and Hinny hybrids and to identify differentially expressed genes between feeding and duck genetic types.

Animals and experimental design
Animals, experimental design, RNA preparation and sequencing have been described in previous publications [5,6,8,28]. Briefly, male ducks corresponding to common Pekin (Anas platyrhynchos) and Muscovy (Cairina Moschata) duck species and to their two reciprocal interspecific Mule and Hinny hybrids were fed ad libitum or overfed 14 days with corn and corn meal. RNA were extracted from liver samples, cloned in libraries and paired-ends sequenced. Transcriptome assembly using reference genome After adapter trimming, mapping of RNA sequences and transcripts reconstruction on the common duck (Anas platyrhynchos) reference genome (BGI_duck_1.0, INSDC Assembly GCA_000355885.1) [26] were performed using TopHat2, Cufflinks and Cuffmerge tools of the Tuxedo suite [17].
De novo transcriptome assemblies RNA-seq data were alternatively assembled using de novo approaches. The raw datasets for each of the 4 species were assembled independently using Trinity v2.3.2 [21,29], including a reads cleaning process with Trimmomatic (using defaults parameters) [30] and an in silico reads normalization step (provided within the Trinity suite). Parental transcriptome (mixing Pekin + Muscovy normalized reads) and hybrid transcriptome (mixing Hinny and Mule duck normalized reads) were assembled with Trinity and Oases (-m 25 -M 65 -s 10) [22]. Composite meta-transcriptome assembly of single genetic type transcriptomes was also conducted using DRAP v1.91 [27] for compacting and correcting Trinity assemblies and constructing a reference transcriptome. By default, DRAP produced 4 assemblies which were filtered for low coverage with Fragments Per Kilobase per Million (FPKM) thresholds set at 1, 3, 5 and 10 respectively.

Results
Transcriptome assembly using reference genome normalization. They were assembled de novo using Trinity in four independent "single genetic type" transcriptomes, in a "mixed parental" transcriptome (with Pekin and Muscovy reads) and in a "mixed hybrids" transcriptome (with Hinny and Mule reads). A mixed hybrids transcriptome was also assembled using Oases for comparison. As shown in Table 1, single parental species Pekin or Muscovy transcriptome assemblies with Trinity were of higher quality when compared to single hybrid Hinny or Mule transcriptome assemblies with greater average lengths, higher N50 values and lower numbers of transcripts. The quality of the mixed parental (Pekin + Muscovy) transcriptome assembly was also lower, similar to those of single hybrids transcriptome assemblies. The quality of the mixed hybrids (Hinny + Mule) transcriptome assembly was even lower. When mixed hybrids transcriptome assembly was conducted with Oases, very long transcripts with a great N50 value were produced suspected to be chimeras. Pseudo-alignments of sample reads on these de novo transcriptome assemblies and quantification were performed using Kallisto (Fig. 1). Annotation completeness in terms of gene content of transcriptome assemblies using mapping approach with Cufflinks and de novo approach with DRAP were assessed with BUSCO for the presence/absence of the conserved eukaryotic single copy orthologous genes. As shown in Table 2, DRAP de novo approach produced more orthologues (97.1% completeness), provided a more complete (0% missing) and less fragmented (only 3%) catalog of orthologues when compared to the mapping approach using Cufflinks (81.5%, 7.3% and 11.2%, respectively). Interestingly, when same BUSCO analyses were conducted on common duck reference genome, it appears that this genome contains more complete and single copy orthologous genes but less complete orthologues in total (82.9%) with more genes missing (7.2%) suggesting that our RNA sequencing and assembly allowed identifying new genes that are missing in the reference genome assembly.

Gene expression analyses
Gene expressions after DRAP de novo approach were analyzed with edgeR. Significant (p-value < 0.01) differentially expressed genes (fold change ≥ 2) were determined. In total, 13898 different genes were found up-or down-regulated by overfeeding in the four genetic types (   (Fig. 3B), two clusters were first defined: one including all Pekin samples, whatever the feeding was, and the other including all other samples. ducks indicating more differences in feeding effect between the two duck "pure" species.

Discussion
The aim of this study was to analyze and compare gene expressions in four different duck genetic types, common Pekin and Muscovy duck species and their reciprocal mule and hinny hybrids using a relevant approach. As mentioned by Moreton et al. [14], choice for using reference-based or de novo transcriptome assembly approach for gene expression analyses is generally based on the question whether or not a reference genome is available. Reference-based strategy involves mapping of RNA sequences on a reference genome to assemble transcripts and count reads. This strategy is generally conducted when reference genome in the species of interest or in a closely related species is available. For examples, this strategy was conducted in sparrows using zebra finch reference genome [19] or in red deer using cow reference genome [20]. Otherwise, when a reference genome from the species or a closely related species is not available, de novo assembly approach is generally recommended and many examples can be found in the literature. However, when gene expression from different species are to be analyzed and compared, the question of strategy is more complex and cannot be taken up with the question of availability of a reference-genome for one of these species. Few relevant results for such a situation have been published. Reference-based strategy was applied for transcriptome analysis of horse, donkey and their hybrids using the horse reference genome [18]. Conversely, de novo assembly approach was conducted for transcriptome analyses in interspecific cyprinidae [23] and brassica [24] hybrids. In support of these studies, neither strategy seemed a priori better than the other. Therefore, we decided to test both approach in our study.
Reference-based approach was first conducted using the common duck (Anas platyrhynchos) genome as reference for the four genetic types. As indicated, mapping rates were very different when homologous and heterologous mapping were considered (71% with Pekin duck reads and 41% with Muscovy duck reads). In the study of Wang et al. [18], homologous mapping rate of horse reads on horse reference genome (61.70%) was very similar to heterologous mapping rate of donkey reads on horse reference genome (62.07%). The reason of this difference of heterologous mapping rate between equids and ducks was not directly investigated but we can speculate that it is partly due to the difference of evolution distance between equids and duck species (estimated time according to http://www.timetree.org/, [43], 7.7 and 22.8 MYA, respectively), but also to different mapping parameters in the two studies. In our study, these parameters were set by defaults and were not optimized for heterologous mapping. As described in a previous study [28], this bias of mapping was not a problem to analyze differences in gene response to feeding in ducks from the same genetic type. It was however too important for gene expression comparisons between the four genetic types and to analyze genetic effect in gene expression.
Therefore, de novo transcriptome assembly approaches were then conducted. Many tools are available for de novo transcriptome assembly, with different efficiencies according to species assembled [44]. We chose to use and compare two of them, Trinity [21,29] and Oases [22]. As shown in Table 1 and Fig. 1A, Trinity "single" transcriptome assemblies from "pure" Pekin or Muscovy duck sequences were very similar and better in terms of length and alignments of reads than assemblies from hinny and mule hybrid sequences. This result is probably due to the fact that in hybrids transcripts are expressed from two different genomes making assembly less efficient according to the presence of some polymorphisms in transcripts from the same orthologs between the two species.
This was also observed when assemblies were performed with sequences from two different species (Fig. 1B). We have also shown that more reads were aligned on these de novo assembled transcriptomes than on common duck reference genome. Furthermore, homologous mapping of Pekin reads on de novo Pekin transcriptome assembly (around 85%) was more efficient than that on reference genome (71%). We can speculate that some reads correspond to genes that are missing in the reference genome, corresponding to new or misassembled genes, or to genes that are in unknown regions, i.e. unassembled and unassigned regions. Thus, these reads are not mapped on reference genome but are assembled together in transcripts by de novo approach and map on theses transcripts.
As previously said, mapping of reads on transcriptomes assembled from hybrids or two genetic types were less efficient, whatever the assembler Trinity or Oases used, indicating that these tools are not well suited for "complex" transcriptome assembly. Interestingly, Cabau et al. have described a de novo RNA-Seq Assembly Pipeline (DRAP) expected to improve Trinity and Oases RNA-Seq assemblies [27]. This prompted us to test DRAP, not only to improve "single" genetic type assemblies but also to perform a meta-transcriptome assembly of transcriptomes from the four duck genetic types. As shown in Fig. 2, this meta-transcriptome assembly with DRAP was very efficient, with a mapping rate of sample reads similar to the best rates previously observed, i.e. homologous mapping rate of Pekin or Muscovy duck reads on de novo assembled Pekin or Muscovy transcriptomes. Further analysis with BUSCO [33] confirmed the quality of this meta-transcriptome assembled with DRAP ( Table 2). As stated on the dedicated website, BUSCO provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs. In our study, more than 97% of single-copy orthologs were assembled by DRAP as complete (i.e. full expected transcript length), 3% were fragmented (assembled with a shorter length than expected) and finally no (0%) were missing. With reference-based approach using Cufflinks, only 81.5% single-copy orthologs were assembled to full length, 11.2% were fragmented and 7.3% were missing. This clearly indicates that de novo assembly of meta-transcriptome using DRAP is of higher quality when compared to assembly using reference-based approach. Again, this is probably due to the missing of genes in the reference genome. Moreover, when BUSCO was applied to reference genome, proportions of complete and missing orthologs were similar (82.9% and 7.2%, respectively) to those found after reference-based approach for transcriptome assembly using Cufflinks. It is interesting to note that lower proportions of complete and single copy orthologs were found in DRAP and Cufflinks assembled transcriptomes (59.1% and 42.9%) than in reference genome (81.2%). The fact that most of orthologs were complete and as expected in a single copy in the genome and not in transcriptomes whatever the approach used is probably the result of the presence of only one isoform for one orthologs in the genome (by definition because BUSCO applies only on single-copy orthologs in related species) but of multiple alternative transcripts from one single-copy orthologs making assembly in one transcript more tricky.
This clearly indicates that development of methods to improve transcriptome assembly and reduce duplication and more generally redundancy remains a concern today. Nevertheless, our results show that the de novo approach with DRAP is more relevant than the reference-based approach to assemble a meta-transcriptome from reads of the four duck genetic types.
Gene expression analyses and identification of differentially expressed genes was then conducted using this meta-transcriptome as reference, assuming that mapping bias were substantially reduced when compared to our previous study using a reference based approach [28] and allowed us to analyze differentially expressed genes within and between genetic types. Many genes were found upand down-regulated by overfeeding and some of them (903) were found in the four genetic types.
These results indicate some similarities but also some genetic type-specific responses to overfeeding.
This was further confirmed by principal component analyses and hierarchical clustering. Different conclusions can be drawn from these results. First, responses to overfeeding in hinny and mule hybrids were very similar and indicate that expectation that hinnies are not used for fatty liver production due to a different and lower response to overfeeding is wrong. In fact, the reason essentially lies in the difficulty to produce viable hinnies due to a lower efficiency of artificial insemination of Muscovy females and their lower fertility when compared to Pekin females. Second important conclusion is that many genes are differentially expressed in the four genetic types. This suggests that response to overfeeding and consequently development of hepatic steatosis is a complex trait involving many genes and functions and is not well described by the expression of few candidate genes [6][7][8][9], even if expressions of these genes are well correlated to fatty liver weight [6].
Third, although some differentially expressed genes were found in the four genetic types, some other were not found in Pekin ducks, making this species less similar and more distant to the three other genetic types. Furthermore, individual variability of responses in Pekin ducks was very important making some overfed Pekin samples near to Pekin ducks fed ad libitum samples. This is directly linked to the greater variability and lower ability of Pekin ducks to produce fatty liver in response to overfeeding when compared to hinny and mule hybrids and Muscovy ducks. To conclude on this point, response to overfeeding in ducks is very complex, different in some extent from a genetic type to the other and involves many genes suggesting that genetic improvement of duck selection for fatty liver production should be undertaken at the whole genome level.

Conclusion
We have shown that differential gene expression analyses in different duck genetic types and comparisons are more accurate when a meta-transcriptome is assembled with DRAP from single transcriptomes of the four different genetic types assembled by Trinity. Using this strategy, we have shown that differences in responses to overfeeding can be identified between the four genetic types.
This work provides new information to describe hepatic steatosis in different duck genetic types in a much more holistic way. This information could also be exploited in the context of genomic selection programs that will certainly be developed in ducks in the coming years.  Pseudo-alignment rates of sample reads on DRAP meta-assembly. Mapping rates of sample reads on DRAP meta-transcriptome assembly. Pekin assembly using Trinity is again indicated to ease the comparison.