RNA-Seq of three free-living flatworm species suggests rapid evolution of reproduction-related genes

Brand, Jeremias N.; Wiberg, R. Axel W.; Pjeta, Robert; Bertemes, Philip; Beisel, Christian; Ladurner, Peter; Schärer, Lukas

doi:10.1186/s12864-020-06862-x

Research article
Open access
Published: 06 July 2020

Jeremias N. Brand ORCID: orcid.org/0000-0001-6126-8279¹,
R. Axel W. Wiberg¹,
Robert Pjeta²,
Philip Bertemes²,
Christian Beisel³,
Peter Ladurner² &
Lukas Schärer¹

BMC Genomics volume 21, Article number: 462 (2020)

Abstract

Background

The genus Macrostomum consists of small free-living flatworms and contains Macrostomum lignano, which has been used in investigations of ageing, stem cell biology, bioadhesion, karyology, and sexual selection in hermaphrodites. Two types of mating behaviour occur within this genus. Some species, including M. lignano, mate via reciprocal copulation, where, in a single mating, both partners insert their male copulatory organ into the female storage organ and simultaneously donate and receive sperm. Other species mate via hypodermic insemination, where worms use a needle-like copulatory organ to inject sperm into the tissue of the partner. These contrasting mating behaviours are associated with striking differences in sperm and copulatory organ morphology. Here we expand the genomic resources within the genus to representatives of both behaviour types and investigate whether genes vary in their rate of evolution depending on their putative function.

Results

We present de novo assembled transcriptomes of three Macrostomum species, namely M. hystrix, a close relative of M. lignano that mates via hypodermic insemination, M. spirale, a more distantly related species that mates via reciprocal copulation, and finally M. pusillum, which represents a clade that is only distantly related to the other three species and also mates via hypodermic insemination. We infer 23,764 sets of homologous genes and annotate them using experimental evidence from M. lignano. Across the genus, we identify 521 gene families with conserved patterns of differential expression between juvenile vs. adult worms and 185 gene families with a putative expression in the testes that are restricted to the two reciprocally mating species. Further, we show that homologs of putative reproduction-related genes have a higher protein divergence across the four species than genes lacking such annotations and that they are more difficult to identify across the four species, indicating that these genes evolve more rapidly, while genes involved in neoblast function are more conserved.

Conclusions

This study improves the genus Macrostomum as a model system by providing resources for the targeted investigation of gene function in a broad range of species. Moreover, we show for the first time that reproduction-related genes evolve at an accelerated rate in flatworms.

Background

The genus Macrostomum (Platyhelminthes, Macrostomorpha) consists of small free-living flatworms and contains the model organism Macrostomum lignano, which has been used in numerous studies on topics ranging from sexual selection in hermaphrodites [1, 2, 3], ageing [4, 5], and stem cell biology [6] to bioadhesion [7, 8, 9] and karyology [10]. To enable this research, many state-of-the-art tools have been established, such as an annotated genome and transcriptome [11, 12], efficient transgenesis [12], in situ hybridisation (ISH) [7, 13], and gene knock-down through RNA interference (RNAi) [3, 14]. The wealth and breadth of research on M. lignano make this species unique among the microturbellarians, for which research is generally restricted to taxonomic and morphological investigations.

Given the success of using M. lignano as a model system, it is now desirable to produce genomic resources for more species within the genus to test if insights gained in M. lignano can be generalised. This is especially relevant since two contrasting types of mating behaviour occur within this genus [15]. Some species, including M. lignano (Fig. 1), show the reciprocal mating syndrome. They mate via reciprocal copulation, where, in a single mating, both partners insert their male copulatory organ (the stylet) into the female sperm storage organ (the antrum), and simultaneously donate and receive sperm [15]. In addition, these reciprocally mating species possess stiff lateral bristles on their sperm, which are thought to be a male persistence trait to prevent the removal of received sperm [17]. Sperm removal likely occurs since, after copulation, worms of these species are frequently observed to place their pharynx over their female genital opening and then appear to be sucking, most likely removing seminal fluids and/or sperm from the antrum [18]. The sperm bristles could thus anchor the sperm in the epithelium of the antrum during this post-copulatory suck behaviour [17]. Other species within the genus, such as M. hystrix, show the hypodermic mating syndrome (Fig. 1). They mate via hypodermic insemination, where worms use a needle-like stylet to inject sperm into the tissue of the partner and the sperm then move through the tissue to the site of fertilisation [15, 19, 20]. Sperm of hypodermically mating species lack bristles entirely [15]. As a consequence of these contrasting mating behaviours there likely are differences in the function of reproduction-related genes between reciprocally and hypodermically mating species. Genomic resources for species with contrasting mating syndromes could, therefore, be used to identify these genes and investigate their function.

A range of empirical gene annotations derived from RNA-Seq experiments in M. lignano are available, with candidate gene sets that are differentially expressed (DE) between body regions [21], stages of tissue regeneration [22], social environments [23], animals of different ages [5], and between somatic cells and somatic stem cells (called neoblasts in flatworms) [6]. Identifying the homologs of genes with such empirical annotations in other Macrostomum species will allow us to investigate their function and rate of evolution in a broader phylogenetic context. For example, it can be assessed whether genes identified as being involved in neoblast function are conserved, and this may identify genes that are particularly important in flatworm regeneration.

Moreover, insights into the biology of these species can be gained by identifying rapidly evolving genes, since there is evidence that in a range of organismal groups reproduction-related genes evolve faster than genes serving other functions (reviewed in [24, 25]). Among the fastest-evolving genes are those encoding proteins directly involved in molecular interaction with the mating partner, such as pheromone receptors (e.g. [26]), seminal fluid proteins (e.g. [27]), and proteins involved in gamete recognition and fusion (e.g. [28]). Groups of genes with biased expression in reproduction-related tissues, such as the testis and ovary, can also show elevated rates of evolution. Evidence for this comes both from sequence-based analyses of the rate of divergence and from the increased difficulty of detecting homologs of reproduction-related genes [29, 30].

Here we present transcriptomes and differential expression (DE) datasets of three Macrostomum species (Fig. 1; Additional file 1: “Amino acid alignment of one-to-one orthologs”; Additional file 2: “Maximum likelihood phylogeny” and Additional file 3: “IQ-TREE logfile”), namely i) M. hystrix, a close relative of M. lignano that mates via hypodermic insemination, ii) M. spirale, a somewhat more distantly related species that, like M. lignano, mates via reciprocal copulation, and finally iii) M. pusillum, which represents a clade that is deeply split from the other three species and which also mates via hypodermic insemination (see also [15, 16] for the broader phylogenetic context). All three species are routinely kept in the laboratory and studies have been published using cultures of M. hystrix [10, 19, 20, 31], M. pusillum [32], and M. spirale [10]. Since the comparison to M. pusillum represents one of the largest genetic distances within the genus, it is an ideal choice to identify genes that are either conserved or evolve rapidly. The inclusion of two species with hypodermic insemination furthermore allows candidate selection for genes involved in determining differences in sperm morphology.

In all three species, we produced RNA-Seq libraries for adults (A), hatchlings (H), and regenerants (R), in order to capture the expression of as many genes as possible and to allow for DE analyses between these biological conditions (Fig. 2a, red labels). Since hatchlings lack sexual organs, genes with higher expression in adults compared to hatchlings can serve as candidate genes that are specific for those organs. Conversely, genes with higher expression in hatchlings are candidates for genes regulating early development. Finally, comparing gene expression in adults vs. regenerants can identify regeneration-related candidate genes involved in the development of structures that are not actively forming in the adult steady state, such as the male genitalia (as demonstrated in [22]). Besides conducting the described DE analysis, we also determined groups of homologous genes (called orthogroups [OGs] throughout the text) between the three species presented here and M. lignano (Fig. 2). This allowed us to transfer the empirical annotations from three RNA-Seq experiments performed in M. lignano (Fig. 2b-d, red labels) to these inferred OGs and investigate whether OGs with particular annotations show signs of conservation or rapid evolution in patterns of protein sequence divergence and/or gene presence/absence.

Results

Transcriptome assembly and quality

We used > 300 million paired-end reads per species, derived from adults (A), hatchlings (H), and regenerants (R), to assemble the transcriptomes of M. hystrix, M. spirale, and M. pusillum (Table 1). All three transcriptomes were fairly complete in gene content when assessed using BUSCO, with more than 92.5% of all 978 core metazoan genes found either complete or as fragments in all species (Table 1). Moreover, the assemblies were a good representation of the reads used to infer them, with > 87% and > 46% of the reads mapping back to the raw and the (CD-HIT) reduced assembly, respectively (Table 2). TransRate scores were between 0.28 and 0.29 (Table 1), placing them above average when compared to 155 publicly available transcriptomes evaluated in [33] (which ranged from 0 to 0.52, with an average of 0.22). The M. spirale transcriptome contained almost twice as many transcripts as the other two, but although M. spirale had the highest absolute number of functional annotations (Table 1), it had the lowest percentage of transcripts with annotations. The M. spirale assembly could thus contain more redundant sequences, more poorly assembled contigs due to increased heterozygosity, or more non-coding transcripts than the others (see Discussion).

Table 1 Transcriptome assembly statistics per species. The initial number of reads used, the number of reads after Trimmomatic processing, the number of initially assembled transcripts, the empirical mean insert size of the RNA-Seq libraries, the number of distinct 21-mers, the number of transcripts removed by CroCo, and the final number of transcripts, as well as the mean transcript length and number of bases in the final assemblies are shown. The BUSCO score is given as the percentage of complete (C) genes—divided into present as single copies (S) or duplicates (D)—and fragmented (F) genes of the 978 metazoa gene set. The next three rows detail the TransRate score, the number of transcripts remaining after TransDecoder translation and CD-HIT clustering, and the number of transcripts considered in the DE analysis. Below this a summary of the results from the Trinotate annotations giving the number of transcripts (and the corresponding percentage of the whole transcriptome in brackets) with a given annotation: ORF, contains a predicted open reading frame; BLASTX, the predicted ORF and/or the entire transcript produced a hit in the protein database; Pfam, a protein family domain was found; SignalP, a signal peptide was detected; TMHMM, a transmembrane helix is predicted

Table 2 Read mapping statistics. The average percentage of reads per species and condition, which could be mapped back to the raw or reduced transcriptome assemblies, respectively

Orthology detection

We used OrthoFinder to infer 23,764 OGs, with 11,331 of those OGs containing sequences from all four species, and 1190 containing all species except for M. lignano (see Additional file 4: Table S1 for all inferred OGs). OGs were generally large with only 1263 single-copy orthologs identified between all four species (these orthologs were used for the species tree inference depicted in Fig. 1, see also below). OrthoFinder provides a summary of the number of gene duplications that occurred on each node of the species tree (Fig. 1), and this analysis indicated that most of the gene duplications occurred on the terminal branches, with the highest number occurring in M. lignano.
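
The tallying of OGs by species composition, and the selection of single-copy orthologs, can be illustrated with a short sketch. It is not the OrthoFinder implementation: the `orthogroups` mapping, the species labels, and the gene identifiers are hypothetical, and real OrthoFinder output would first have to be parsed into this shape.

```python
from collections import Counter

# Hypothetical short labels for the four species (not taken from the paper).
SPECIES = ("Mlig", "Mhys", "Mspi", "Mpus")


def classify(orthogroups):
    """Tally OGs by the set of species they contain and list single-copy OGs.

    orthogroups maps an OG identifier to a dict of species label -> gene list.
    """
    composition = Counter()
    single_copy = []
    for og_id, genes_by_species in orthogroups.items():
        counts = {sp: len(genes_by_species.get(sp, [])) for sp in SPECIES}
        present = tuple(sp for sp in SPECIES if counts[sp] > 0)
        composition[present] += 1
        # A single-copy ortholog has exactly one gene from every species.
        if all(counts[sp] == 1 for sp in SPECIES):
            single_copy.append(og_id)
    return composition, single_copy


# Toy input, invented purely for illustration.
toy = {
    "OG1": {"Mlig": ["a1"], "Mhys": ["b1"], "Mspi": ["c1"], "Mpus": ["d1"]},
    "OG2": {"Mlig": ["a2", "a3"], "Mhys": ["b2"], "Mspi": ["c2"], "Mpus": ["d2"]},
    "OG3": {"Mhys": ["b3"], "Mspi": ["c3"], "Mpus": ["d3"]},
}
composition, single_copy = classify(toy)
```

In this toy example two OGs contain all four species, but only one of them qualifies as single-copy, which mirrors why the set usable for species tree inference is much smaller than the set of OGs shared by all species.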

DE analysis

When comparing expression of adults vs. hatchlings (AvH), similar numbers of transcripts were DE in all three species, with about twice as many transcripts showing higher expression in adults than in hatchlings (Fig. 3a, see also Additional file 5: Table S2 for the DE results of the AvH comparison, and Additional file 6: Table S3 and Additional file 7: Table S4 for the AvR and RvH contrasts). M. pusillum showed slightly lower numbers of DE genes and a DE distribution that deviated from that of the other two species. Specifically, the distributions of DE genes in both M. hystrix and M. spirale show a cloud of off-diagonal points, representing transcripts with high expression in adults, but low expression in hatchlings. In M. pusillum, this cloud of adult-biased transcripts also exists, but it is shifted up on the y-axis because many of these transcripts also show substantial expression in hatchlings.

We identified a total of 634 OGs that had at least one transcript from every species DE in the AvH contrast (Fig. 3b). Of these, 404 showed higher expression in adults, 117 showed higher expression in hatchlings, and 113 did not have a consistent signal. Again, we observed differences between M. pusillum and the other two species. In M. pusillum, all but two of the transcripts in OGs with higher expression in adults also had expression in hatchlings, while in M. hystrix and M. spirale many transcripts had no expression in hatchlings (see points with red colour at the bottom of the y-axis in Fig. 3b). We explore possible reasons for these observations in the Discussion.
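
The three-way split of these OGs can be sketched as follows. This is a minimal illustration under an assumption: an OG is treated as consistent only when every DE transcript from every species points in the same direction. The exact rule used in the analysis may differ, and the function name, species labels, and input layout are hypothetical.

```python
def og_direction(calls_by_species, species):
    """Classify one OG from the AvH DE calls of its transcripts.

    calls_by_species maps a species label to a list of 'adult' or 'hatchling'
    calls, one per DE transcript of that species in the OG.
    Returns None if any species lacks a DE transcript, otherwise
    'adult', 'hatchling', or 'inconsistent'.
    """
    directions = set()
    for sp in species:
        calls = calls_by_species.get(sp, [])
        if not calls:
            return None  # OG does not have a DE transcript from every species
        directions.update(calls)
    if directions == {"adult"}:
        return "adult"
    if directions == {"hatchling"}:
        return "hatchling"
    return "inconsistent"


# Hypothetical labels for the three newly sequenced species.
THREE_SPECIES = ("Mhys", "Mspi", "Mpus")

adult_og = og_direction(
    {"Mhys": ["adult"], "Mspi": ["adult", "adult"], "Mpus": ["adult"]},
    THREE_SPECIES,
)
mixed_og = og_direction(
    {"Mhys": ["adult"], "Mspi": ["hatchling"], "Mpus": ["adult"]},
    THREE_SPECIES,
)
partial_og = og_direction({"Mhys": ["adult"], "Mspi": ["adult"]}, THREE_SPECIES)
```

Under this rule, the 404 adult-biased and 117 hatchling-biased OGs together give the 521 OGs with conserved expression patterns, and the remaining 113 fall into the inconsistent class.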

Orthogroup annotation

A total of 18,938 OGs contained transcripts from M. lignano and could thus potentially carry over empirical annotations. Out of these, 6119 OGs could be annotated with information from the positional (2495 OGs), neoblast (1924 OGs), or social (3717 OGs) RNA-Seq datasets (see Additional file 8: Table S5 for the full annotations). In the positional dataset, 173 OGs contain Mlig_37v3 transcripts with conflicting positional information (e.g. tail region and testis region). We categorised these as positional_mix and did not consider them further in the downstream analysis since they contain multiple small groups with non-intuitive annotations. Similarly, in the neoblast dataset, we categorised 20 OGs as neoblast_mix because they contained transcripts with the germline annotation (germline_FACS) and transcripts with one of the two neoblast annotations (neoblast_FACS and neoblast-strict). Finally, in the social dataset, we categorised 10 OGs as social_mix because they contained transcripts with the octets vs. isolated (OvI) annotation and transcripts with the octets vs. pairs (OvP) annotation, but no transcript annotated from both contrasts (BOTH). We also excluded both the neoblast_mix and the social_mix annotations from the downstream analysis.
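
The handling of conflicting annotations within an OG can be sketched in a few lines. This is an illustration under stated assumptions, not the annotation pipeline itself: the function names and input layout are hypothetical, unannotated transcripts are represented as `None`, and an OG containing any BOTH transcript is assumed to receive the BOTH label.

```python
def positional_label(labels):
    """Collapse the positional labels of an OG's M. lignano transcripts."""
    distinct = {label for label in labels if label is not None}
    if not distinct:
        return None  # no transcript in the OG carries a positional annotation
    if len(distinct) > 1:
        return "positional_mix"  # conflicting regions, e.g. tail and testis
    return next(iter(distinct))


def social_label(labels):
    """Collapse the social labels ('OvI', 'OvP', 'BOTH') of an OG."""
    distinct = {label for label in labels if label is not None}
    if not distinct:
        return None
    if "BOTH" in distinct:
        return "BOTH"  # assumption: a BOTH transcript takes precedence
    if distinct == {"OvI", "OvP"}:
        return "social_mix"  # OvI and OvP present, but no BOTH transcript
    return next(iter(distinct))


conflicting = positional_label(["tail region", "testis region", None])
uniform = positional_label(["testis region", "testis region"])
mixed_social = social_label(["OvI", "OvP"])
```

OGs that end up with a `_mix` label would then be dropped before the downstream comparisons.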

There was also overlap between the three RNA-Seq datasets, with several OGs being annotated from multiple sources. The most substantial overlap was between the germline_FACS and the testis region annotation, followed by the overlap between these two annotations and the octets vs. isolated (OvI) annotation (Fig. 4 and Additional file 9: Fig. S1). This overlap was expected since testis region transcripts likely contain mostly transcripts expressed in the testes. Since the neoblast annotation was independent from our reanalysis of the positional dataset, the considerable overlap it shows with the positional and social data supports that these annotations are indeed reflecting biological reality. However, this overlap also made them highly redundant, and we thus excluded the germline annotation from the downstream analysis, retaining only the neoblast annotations. Within the social dataset, most OGs were either annotated as OvI or as BOTH, while only 42 OGs carried the OvP annotation. We also excluded the OvP annotation due to small sample size, leaving us with seven DE annotations in total for the downstream analysis (testis region, ovary region, and tail region; neoblast_FACS and neoblast-strict; and OvI and BOTH; but see Additional file 10: Table S6 for a complete annotation of the Mlig_37v3 transcriptome).

The distribution of secretory signals, as estimated by SignalP, was not uniform across the different positional annotations (chi-squared = 18.0, df = 4, p-value = 0.001). The observed counts only differ substantially from the expected counts for the tail region OGs (54 observed vs. 32.9 expected, Table 3), indicating that OGs in the tail region are enriched in transcripts with a secretory signal.
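
The reported test values can be checked by hand. The sketch below uses the closed form of the chi-squared upper-tail probability, which holds only for even degrees of freedom, to recover the p-value from the reported statistic (18.0, df = 4), and computes the contribution of the tail region cell from the observed and expected counts given above. The function names are illustrative, and the full contingency table is not reproduced here.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable (even df only).

    For df = 2m: P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


def cell_contribution(observed, expected):
    """Contribution of a single cell to the chi-squared statistic."""
    return (observed - expected) ** 2 / expected


# Values taken from the text: chi-squared = 18.0 with df = 4,
# and 54 observed vs. 32.9 expected tail region OGs with a SignalP hit.
p_value = chi2_sf_even_df(18.0, 4)
tail_cell = cell_contribution(54, 32.9)
```

The p-value obtained this way rounds to the reported 0.001, and the single tail region cell accounts for roughly three quarters of the total statistic, in line with the statement that only the tail region counts deviate substantially from expectation.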

Table 3 SignalP enrichment analysis. The number of complete OGs that contain transcripts with a SignalP hit, split by the positional annotation. The expected number of OGs with a SignalP is derived from the chi-square test

Protein divergence and species composition of OGs differ by annotation

The majority (59.8%) of OGs with a transcript from M. lignano contained all four species and 19.1% contained all species except M. pusillum, while only a few (1.2%) were shared just between M. lignano and M. pusillum (Additional file 11: Table S7). The protein divergence of OGs containing all four species differed depending on their annotation, with higher divergence in OGs with a positional annotation (one-sample Wilcoxon: all p < 0.001, Fig. 5a) and lower divergence in OGs with the neoblast_FACS annotation (one-sample Wilcoxon: p < 0.001), but not the neoblast-strict annotation (one-sample Wilcoxon: p = 0.2, Fig. 5b) compared to OGs without an annotation from the respective sources. These patterns of divergence were also reflected in the species composition of OGs, with a smaller than expected percentage of OGs with a positional annotation containing all four species (Fig. 6), which is consistent with the more rapid evolution of these putative reproduction-related transcripts. Conversely, a substantially larger percentage of OGs with either of the neoblast annotations contained all four species (Fig. 6), suggesting that these genes are fairly conserved. Finally, while OGs annotated with the social dataset did not show a difference in protein divergence compared to OGs with no annotation (one-sample Wilcoxon: OvI: p = 0.34, BOTH: p = 0.34, Fig. 5c), they contained a larger than expected percentage of OGs with all four species (Fig. 6). The difference between the expected and observed proportions was, however, quite small for the ‘BOTH’ annotation (Fig. 6), indicating a small effect size. Moreover, OGs annotated as testis or tail region contained a higher than expected percentage of OGs that were shared only between M. lignano and M. spirale (Fig. 6). Since both of these species mate through reciprocal copulation and have a characteristic sperm morphology with lateral bristles [15], these OGs are possible targets in the search for the genes underlying these traits. We explore these observations in more detail in the Discussion.

OG validation using ISH

As a case study to show the relevance of the OGs across all four studied species, we analysed the expression of a gene that affects the sperm bristle phenotype in M. lignano (RNA815_7008 in the MLRNA110815 transcriptome) [21]. This transcript is exclusively expressed in the testes in M. lignano [21], and we thus expect its orthologs to also be expressed in the testes of the other species. We designed probes for the orthologs in M. hystrix, M. spirale, and M. pusillum and performed ISH experiments to test this prediction. In addition, we also repeated the ISH experiments in M. lignano. We detected a highly specific signal in the testes in all four species (Fig. 7; for sense control see Additional file 12: Fig. S2), which i) indicates that the tissue specificity of this transcript is conserved across the genus, and ii) demonstrates that our OGs can be used to identify orthologs and target them using molecular methods.

Discussion

In the following section, we will first highlight some differences in the transcriptome assemblies and the DE results between the three species and their possible influence on our conclusions. Then we will focus on the differences in protein sequence divergence and species composition of OGs by annotation and discuss their implications. Note that we were only able to arrive at these results because we spent considerable effort on the reannotation of the M. lignano transcriptome. We discuss the majority of this work in Additional file 13: ‘Reannotation of Mlig_37v3 transcriptome’ (which also makes reference to Additional file 14: Table S8; Additional file 15: Table S9; and Additional file 16: Table S10) and we direct the reader to this document for a detailed explanation of all annotations.