Genomics and transcriptomics of epizoic Seisonidea (Rotifera, syn. Syndermata) reveal strain formation and gradual gene loss with growing ties to the host

Background Seisonidea (also Seisonacea or Seisonidae) is a group of small animals living on marine crustaceans (Nebalia spec.) with only four species described so far. Its monophyletic origin with mostly free-living wheel animals (Monogononta, Bdelloidea) and endoparasitic thorny-headed worms (Acanthocephala) is widely accepted. However, the phylogenetic relationships inside the Rotifera-Acanthocephala clade (Rotifera sensu lato or Syndermata) are subject to ongoing debate, with consequences for our understanding of how genomes and lifestyles might have evolved. To gain new insights, we analyzed first drafts of the genome and transcriptome of the key taxon Seisonidea. Results Analyses of gDNA-Seq and mRNA-Seq data uncovered two genetically distinct lineages in Seison nebaliae Grube, 1861 off the French Channel coast. Their mitochondrial haplotypes shared only 82% sequence identity despite identical gene order. In the nuclear genome, distinct linages were reflected in different gene compactness, GC content and codon usage. The haploid nuclear genome spans ca. 46 Mb, of which 96% were reconstructed. According to ~ 23,000 SuperTranscripts, gene number in S. nebaliae should be within the range published for other members of Rotifera-Acanthocephala. Consistent with this, numbers of metazoan core orthologues and ANTP-type transcriptional regulatory genes in the S. nebaliae genome assembly were between the corresponding numbers in the other assemblies analyzed. We additionally provide evidence that a basal branching of Seisonidea within Rotifera-Acanthocephala could reflect attraction to the outgroup. Accordingly, rooting via a reconstructed ancestral sequence led to monophyletic Pararotatoria (Seisonidea+Acanthocephala) within Hemirotifera (Bdelloidea+Pararotatoria). Conclusion Matching genome/transcriptome metrics with the above phylogenetic hypothesis suggests that a haploid nuclear genome of about 50 Mb represents the plesiomorphic state for Rotifera-Acanthocephala. Smaller genome size in S. nebaliae probably results from subsequent reduction. In contrast, genome size should have increased independently in monogononts as well as bdelloid and acanthocephalan stem lines. The present data additionally indicate a decrease in gene repertoire from free-living to epizoic and endoparasitic lifestyles. Potentially, this reflects corresponding steps from the root of Rotifera-Acanthocephala via the last common ancestors of Hemirotifera and Pararotatoria to the one of Acanthocephala. Lastly, rooting via a reconstructed ancestral sequence may prove useful in phylogenetic analyses of other deep splits. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07857-y.


Assembly of Paraseison annulatus mitochondrial genes
DNA of Paraseison annulatus, a close relative of Seison nebaliae, was subjected to whole genome amplification (WGA) prior to library construction. Subsequent gDNA-Seq and processing steps were carried out as described in the main text. Assembling with MEGAHIT v. 1.2.9 [1] resulted in several contigs with mitochondrial sequences, which were used as seed for reference-based assembly with MITObim v. 1.9.1 [2] and NOVOPlasty v. 4.2 [3]. Nevertheless, the complete mitogenome could not be assembled from the data. Probably, initial WGA let to the underrepresentation of greater parts of the mitogenome in the gDNA-Seq data. Still, major parts of cox1 and nd1 were contained in several contigs, each. After having aligned the corresponding sequences with MAFFT v. 7 (https://mafft.cbrc.jp/alignment/server/) [4], we manually derived a consensus sequence for both genes.

Phylogeny of Seisonidea as inferred from cox1 and nd1
The aforementioned sequences were used to validate that present NGS data reflect strains within S. nebaliae and not contamination of our S. nebaliae sample with P. annulatus or their common crustacean host, Nebalia bipes. Corresponding tree reconstruction was carried out on alignments (MAFFT) including cox1 and nd1 sequences of P. annulatus in addition to their orthologues in S. nebaliae strains A and B (present study). The cox1 alignment additionally contained a S. nebaliae sequence (DQ297765.1), which had been reconstructed by another working group [5]. We further collected sequences of the crustaceans Nebalia pseudotroncosoi (Leptostraca; cox1: JX442539.1), Pacifastacus leniusculus (Astacoidea; nd1: NC_033509.1) and Penaeus (Litopenaeus) vannamei (Penaeoidea; cox1 and nd1: EF584003.1) and of Caenorhabditis elegans (NC_001328.1). All reconstructions show a monophylum including all sequences classified as S. nebaliae (see Table S1), when rooted by C. elegans. The Crustacea species used for the nd1 and cox1 trees were monophyletic in all phylogeny reconstructions 3 (Supplementary Table S1). Bayesian inference trees (MrBayes) are additionally depicted in Figure 2 (main text) and Supplementary Figure S1. For further details, see Materials and Methods in the main text. Table S1. Trees as inferred from cox1 and nd1 alignments.

Dataset
Program Newick annotation

Generation of a custom S. nebaliae de novo repeat database
For de novo generation of a custom database of S. nebaliae repeats, we ran RepeatModeler v. 2.0.1 [6], dnaPipeTE [7] and RepARK v. 1.3.0 [8]. The program dnaPipeTE [7] used a subset of trimmed and filtered forward reads for generating a Trinity assembly [9], which should result in a draft of only the repetitive genome sequences. The subset was determined by setting an estimated genome size and coverage. We used a pipeline developed by Schell et al. [10] to find the 5 best subset size to assemble the repetitive sequences as coherent as possible. While genome size was persistently set to 50 Mb, we implemented coverage values from 0.0001 to 1.3, in order to determine the best fitting one (22 increments). In doing so, we found that N50 values were highest for small coverage values ( Supplementary Fig. S2). executed with default settings on trimmed and filtered forward and reverse DNA reads. In a first step, the distribution of kmers (31 bp) was inferred using Jellyfish [11] implemented in RepARK. Based on the distribution observed, RepARK defined a coverage threshold for extracting more abundant repeat-derived kmers from their counterparts representing the non-repetitive portion. The threshold determined by the program was 204, which seems plausible considering a coverage of 6 the homozygote portion of 132 x for GA2 (see Fig. 3

Annotation of the S. nebaliae draft genome
Repeats in draft genome GA2 were annotated with RepeatMasker v. open-4.0.7, applying more sensitive "slow search". Slightly fewer bases (total of 6,995,276) were masked in GA2 than reported as total span of the repetitive portion in Supplementary Table S2. This discrepancy is probably due to short stretches (less than 10 bp) between identical simple repeats, which are overstretched by annotation, but not masked in the draft genome (see http://www.repeatmasker.org/webrepeatmaskerhelp.html). Phylogenetic trees as inferred from 100 concatenated proteins Table S3. Tabular survey of phylograms prior and after alignment editing and substitution of the outgroup by an ancestral sequence.

Attraction to the platyhelminth and its push-away from other rotifers
Seison nebaliae fell together with P. laevis upon combined removal of the playthelminth and singletons only. We see in this an indication of the other side of the coin to the attraction to the platyhelminth, namely the push-away from the other rotifers. According to this, alignment positions in which S. nebaliae has a private character state due to an exchange would tentatively support a clustering of the other OTUs. However, removal of such positions alone would not affect the