Gonadal transcriptomes associated with sex phenotypes provide potential male and female candidate genes of sex determination or early differentiation in Crassostrea gigas, a sequential hermaphrodite mollusc

Background In the animal kingdom, mollusca is an important phylum of the Lophotrochozoa. However, few studies have investigated the molecular cascade of sex determination/early gonadal differentiation within this phylum. The oyster Crassostrea gigas is a sequential irregular hermaphrodite mollusc of economic, physiological and phylogenetic importance. Although some studies identified genes of its sex-determining/−differentiating pathway, this particular topic remains to be further deepened, in particular with regard to the expression patterns. Indeed, these patterns need to cover the entire period of sex lability and have to be associated to future sex phenotypes, usually impossible to establish in this sequential hermaphrodite. This is why we performed a gonadal RNA-Seq analysis of diploid male and female oysters that have not changed sex for 4 years, sampled during the entire time-window of sex determination/early sex differentiation (stages 0 and 3 of the gametogenetic cycle). This individual long-term monitoring gave us the opportunity to explain the molecular expression patterns in the light of the most statistically likely future sex of each oyster. Results The differential gene expression analysis of gonadal transcriptomes revealed that 9723 genes were differentially expressed between gametogenetic stages, and 141 between sexes (98 and 43 genes highly expressed in females and males, respectively). Eighty-four genes were both stage- and sex-specific, 57 of them being highly expressed at the time of sex determination/early sex differentiation. These 4 novel genes including Trophoblast glycoprotein-like, Protein PML-like, Protein singed-like and PREDICTED: paramyosin, while being supported by RT-qPCR, displayed sexually dimorphic gene expression patterns. Conclusions This gonadal transcriptome analysis, the first one associated with sex phenotypes in C. gigas, revealed 57 genes highly expressed in stage 0 or 3 of gametogenesis and which could be linked to the future sex of the individuals. While further study will be needed to suggest a role for these factors, some could certainly be original potential actors involved in sex determination/early sex differentiation, like paramyosin and could be used to predict the future sex of oysters. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07838-1.

Conclusions: This gonadal transcriptome analysis, the first one associated with sex phenotypes in C. gigas, revealed 57 genes highly expressed in stage 0 or 3 of gametogenesis and which could be linked to the future sex of the individuals. While further study will be needed to suggest a role for these factors, some could certainly be original potential actors involved in sex determination/early sex differentiation, like paramyosin and could be used to predict the future sex of oysters.
Keywords: RNA-Seq, Gonad transcriptome, Early sex differentiation, Sex determination, Irregular hermaphrodite, Oyster Background Sex differentiation is a physiological process triggered by the activation of one/a few sex-determining gene(s). In mammals, sex is attributed to the presence of chromosomes and sex determination corresponds to the activation of a gene that is necessary and sufficient for testis or ovary determination (for review see [38]). This event sets in train a cascade of interactions of differentiatinggenes that, in the end, direct the differentiation into testis or ovary. In this paper sex determination and sex differentiation are considered together since we could not distinguish the sex-determining from the sexdifferentiating genes.
In sequential hermaphrodites, sex determination occurs during the embryonic development, as in gonochoric species. Sequential hermaphrodites are also able to change sex during their adulthood years [4,19,49]. However, it is unclear if these sex changes are only sex reversals (differentiated cells able to switch developmental fate: hypothesis 1) or if they involve a new sex determination (undetermined germ cells recruited within the gonad to initiate a new course of differentiation: hypothesis 2) [19]. The second hypothesis draws on the results of studies carried out in fish (for review see [19]). It is also often implicitly considered in the few existing studies on sequential hermaphrodite molluscs by mentioning sex determination mechanisms in adults [13, 42, 56-58, 62, 64]. This paper will also be based on the second hypothesis.
Mollusca is a phylum of lophotrochozoa encompassing around 85,000 species [50]. Although the number of whole-genome sequencing studies on molluscan species, especially among bivalves and gastropods, have greatly increased over the last decade (for more information, see the Genome Online Database GOLD, https://gold.jgi. doe.gov), only a few of them work on the understanding of the molecular mechanisms of reproduction, especially of sex determination and/or differentiation [7, 8, 13, 18, 20, 22, 28-30, 42, 48, 54, 57, 58, 61, 62, 64]. Yet, information at the molecular level is necessary to understand the evolution of these physiological mechanisms amongst the animal kingdom, and also to control the sex of species of aquaculture interest.
Among molluscs, Crassostrea gigas is the main bivalve product, with 639,030 tons produced in 2017. This species is known as a sequential hermaphrodite [10,15,31] that displays both genetic and environmental sex determination [12,24,31,33,45,52]. The time window of sex determination / early gonadal differentiation in adults C. gigas has been defined between the end of a cycle, when the animals are mature and after spawning, at the beginning of a new cycle (respectively stages 3 and 0 according to [6]), two stages presenting germinal stem/progenitor cells [14] which, under the impulse of sex-determining genes will initiate a new male or female gonadal differentiation [22,45,46,51,53]. However, at both stages, the future sex of the oysters is unknown, especially as the oyster is devoid of identifiable heteromorphic sex chromosomes. Few orthologs of classical genes involved in sex determination/differentiation have been identified by targeted or more recently by largescale studies, such as Foxl2 (Forkhead Box L2), Fem (Feminizer), Wnt-4, Gata-4 (Globin Transcription Factor 4), Run, β-catenin, SoxH (sex-determining region Y boxcontaining H/30, Dsx/Dmrt1-like (Doublesex-and mab-3-related transcription factor 1) and SoxE (Sex-determining region Y box-containing E) ( [20,22,45,46,51,53,63] and 2014 -C. gigas genome v9 GCA_ 000297895.1 [62];). Most of these genes can also be found in other bivalves, ecdysozoan or vertebrates, suggesting that sex-determining/−differentiating pathways share common genes among vertebrates and 'invertebrates' [42,51,52,62,64]. However, the role of most of those genes in molluscs still requires further investigations.
In C. gigas, while almost no functional approach is available, genome-wide molecular studies are very powerful tools to study sex determination/differentiation. They provide large-scale expression patterns, which can be associated to putative roles in these physiological processes. Thus, a microarray-based analysis [22] identified sex-related genes and highlighted the sexually dimorphic gene expression of Foxl2 at stage 3, as previously identified [45] as well as 511 genes expressed at stage 0. Later on, a genomic analysis on mature animals (stage 3) ( [64]; Crassostrea gigas genome v9 GCA_000297895.1) allowed for the identification of two new homologs of sex-determining/−differentiating genes (Dsx/Dmrt1-like and SoxH) and Cg-FoxN5 (Crassostrea gigas Forkhead Box N5), all presenting a malespecific expression. More recently, a gonadal RNA-Seq analysis [62] highlighted some genes clustered in five expression profiles: (i)-highly expressed at stage 1 [such as Dsx/Dmrt1-like, rather highly expressed in males at stage 3 by Zhang et al. [64]]; (ii) -decreasing or increasing expression throughout the oogenesis and spermatogenesis; (iii) -specific expression in female gonads [such as FoxL2 and Malrd1 (MAM and LDL-receptor class A domain-containing protein 1-like)]; (iv)higher expression in male gonads; and (v) increased expression throughout spermatogenesis [such as SoxH, Cg-Sh3kbp1 (C. gigas SH3 Domain Containing Kinase Binding Protein 1, lncRNA (LOC105321313; long non-coding RNA) and uncharacterized LOC105345697]. However, apart from a few rare disagreements on the expression patterns, most of these previous studies are limited as follows: (1) the scarcity of information on expression at stage 0, (2) the impossibility to associate expression patterns to the future sex as it is unknown in this sequential and irregular hermaphrodite species and (3) the lack of information about ploidy, while it is naturally variable and it is known to modify the oyster's gametogenesis [20].
A principal component analysis (PCA) was performed on all 10,061 genes of the 18 gonads sampled to assess the homogeneity of the whole dataset and the degree of correlation of gene expression patterns at sex and gametogenic stages. The first two principal components explained 43.1% of the total variance. The scatter plot ( Fig. 1) based on expression patterns of the expressed genes showed that transcriptional profiles are rather clustered by gametogenetic stage (determined by histology) than by sex (determined by long-term monitoring of sex phenotypes). A volcano plot used as a function of gametogenesis stages (Fig. 2) showed a good scattering of dots, thus demonstrating a significant differential gene expression according to gametogenetic stages.

Differential gene expression analysis
A Differential Expression Analysis (DEA) was conducted in order to identify sex-and stage-specific gene expression patterns. It revealed a total of 9723 genes (96.64% of the 10,061 gonadic genes) significantly differentially expressed either between gametogenetic stages (9582 genes; 95.24%) or males and females (141 genes; 1.40%) with a FDR < 0.05 (Fig. 3). Among genes with a sex-specific expression, 98 (69.50%) showed a significantly higher expression in females and 43 (30.50%) in males. When data from both DEA were coupled, 84 genes (0.86%) were significantly differentially expressed between gametogenetic stages and sexes (Additional file 1). These genes were classified by stage with 13, 27 and 44 genes showing a higher differential expression at stage 0, 1 and 3, respectively, and by sex with 46 and 38 genes showing a higher differential expression in females and males respectively ( Table 2). Fifty-seven of them (67.9%) were significantly highly expressed in one sex during the time-window of sex determination/early gonadal differentiation which extends from stage 3 of 1 cycle to stage 0 of the next cycle.
The heat map based on log 10 (FPKM; Fragments Per Kilobase of transcript per Million mapped reads) of DEA ( Fig. 4) revealed that hierarchical clustering grouped the genes into five clusters and samples into three clusters: gonads at early female gametogenetic stage (F_st0 and F_st1), at early male gametogenetic stage (M_st0 and M_st1) and gonad at late gametogenetic stage regardless of gender (F_st3 and M_st3). The expression profiles appeared more homogeneous between the first two stages of a sex than between the two sexes of the same stage.

Gene ontology analysis
A gene ontology (G0) annotation was performed at level 2 among the 84 genes resulting from the DEA (Fig. 5). A total of 47 genes (56%) had eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) assignments and 49 genes (58.3%) were assigned with at least one GO term. Within the "biological process" category, the main GO terms were grouped in metabolic (21%) and cellular processes (18%). Binding (23%) and catalytic activity (19%) were predominantly assigned to "molecular functions". In the category "cellular components", the largest proportion of GO terms referred to the membrane (40%). A GO enrichment analysis was conducted to identify the over-representation of GO terms based on sex-and stage-biased differentially expressed genes. However, no enrichment of GO terms was found.
Expression patterns of candidate genes identified from the DEA, measured by real-time qPCR during the entire gametogenetic cycle RT-qPCR were performed on seven genes (Table 3) selected from the 84 relevant ones (in addition to FoxL2; GenBank Accession Number: FJ768956.1) in order to support the RNA-Seq analysis and to explore their temporal expression during an entire gametogenetic cycle (stages 0, 1, 2 and 3 for males and females). They were chosen because, by RNA-Seq, they exhibited a significantly higher expression during the sex-determining period (i.e. stages 0 or 3) which also revealed a dimorphism between sexes. In other words, these seven genes emerged from the DEA of the RNA-Seq as having a significantly different expression between gametogenetic stages (0 versus 1, 0 versus 3 or 3 versus 1 ie exhibiting a significant peak of expression in stage 0 or 3, the sexdetermination period) and also between sexes (significantly higher in male or in female). According to these criteria and based on the literature, we selected (1) one gene significantly highly expressed in stage 0 female (Predicted: paramyosin), (2) three significantly highly expressed in stage 3 male (Trophoblast glycoprotein-like, Protein singed-like, Hypothetical protein) and (3) three significantly highly expressed in stage 3 female [protein PML-like (Protein ProMyelocytic Leukemialike), MAM and LDL-Receptor protein 1-like, Gata protein 14-like]. FoxL2 was used as a gold standard. EF1α, with a STD of 0.72 and with a r of 0.861, was considered as stable and was therefore chosen as housekeeping gene. As expected, for each primer pair, no DNA was amplified in the negative controls (sterile water and non-RT DNA) (Results not illustrated).
These seven genes exhibited a dimorphic expression by RT-qPCR, especially more significantly at stage 3 ( Fig. 6 and Additional file 2). Four of them increased their expression over the course of gametogenesis in females (by a factor of 3 to 150; CGI 10011004-FoxL2, CGI 10018971-Protein PML-like, CGI 10025872-MAM and LDL-receptor class A domain-containing protein 1like and CGI 10008094-GATA zinc finger domaincontaining protein 14-like) and four in males (by a factor of 40 to 2000; CGI 10006800-Trophoblast glycoproteinlike, CGI 10016132-Protein singed-like and CGI 10026009-Hypothetical protein). For the "female genes", the expression of CGI 10011004 (FoxL2) and CGI 10018971 (Protein PML-like) sharply increased between stages 2 and 3, while the increase was more progressive for CGI 10025872 (MAM and LDL-receptor class A domain-containing protein 1-like) and CGI 10008094 (GATA zinc finger domain-containing protein 14-like). For the "male genes", the expression sharply increased from stage 2 for CGI 10006800 (Trophoblast  Venn diagram within gonad transcriptome for genes significantly differentially expressed between gametogenetic stages and sexes. 9582 genes significantly differentially expressed between gametogenetic stages are indicated in blue. 141 genes significantly differentially expressed between sexes are indicated in orange glycoprotein-like) and CGI 10016132 (Protein singedlike) and from stage 1 for CGI 10026009 (Hypothetical protein) onwards. One gene (CGI 10028666 -PREDIC TED: paramyosin) was more highly expressed in females than in males at stage 0 (× 3900) and its expression collapsed from stage 1 onwards.

Discussion
The aim of our study was to shed light on the sexspecific molecular events that take place during sex determination/early gonadal differentiation in oysters C. gigas. For this purpose, we identified, by RNA-Seq analysis, genes that were significantly differentially expressed between "true males" and "true females" at the time of these physiological processes. To be able to select, for the first time, these "true males" and "true females" in such a sequential irregular hermaphrodite, it was necessary to work on animals whose sex phenotypes have been known for several consecutive years and in particular oysters, which never changed sex all along the course of study. This allowed us to develop the experimental strategy discussed below.
Experimental strategy: first association of molecular expression patterns over the entire time-window of sex determination/early gonadal differentiation to future sex phenotypes The challenge was compounded: -(1) highlight relevant genes covering the entire sex-determining/early gonadal differentiating period, which extends from stage 3 of one gametogenetic cycle when animals are matures to stage 0, just after spawning when the animals start a new cycle [22,45,46,51,53]. -(2) link each molecular expression pattern to a sex phenotype, i.e to the most statistically likely future sex of the animal, while this future sex is usually impossible to determine because C. gigas is a sequential irregular hermaphrodite.
Previous studies carried out on this topic in C. gigas are scarce and rather focused on mature animals [22,62,64] than on oysters at stage 0 [22]. Moreover, the gene expression patterns were never associated with the future sex of the animals. The same limitation exists for the very few large-scale molecular studies on sequential hermaphrodite oysters C. hongkongensis [58] and Pinctada margaritifera [56,57], conducted on animals whose future sex was unknown. These limits also exist for studies made in simultaneous hermaphrodites, such as  Argopecten purpuratus [7]. In Nodipecten subnosus, Galindo-Torres et al. [28] worked on undifferentiated juvenile gonads without discriminating against future sexes. They therefore adopted a targeted approach, looking for known homologs or genes containing key words "sex determination/differentiation", "development", "germ line", "spermatogenesis" or "oogenesis".
In the present study on C. gigas, transcriptomes were made from gonads of individuals (i) at stages 0 and 3, the time-window of sex determination/early gonadal differentiation and at stage 1, right afterwards, for comparative purposes and (ii) identified as males and females that had not demonstrated a sex change during the first four years of their life [10]. The challenge was to assign the most likely future sex phenotype to each oyster in order to better explain the expression patterns, at stages 0 and 3, when the future sex cannot be determined. To this purpose, we worked on 4-year-old oysters for two main reasons: (i) the older the oysters get, the less they change sex and (ii) less than 3% of the 4year-old oyster population changed sex [10]. To strengthen our results, we also made sure that oysters that went through the RNA-Seq pipeline were all diploids (results not shown), while it was never determined/ mentioned in the papers cited above. Yet in C. gigas, ploidy can vary naturally [1] and it is also known to modify gametogenesis and transcriptomes [20].
Genes differentially expressed between sexes during the sex-determining/−differentiating time window in C. gigas Our analysis provided a list of 57 genes significantly differentially expressed between gametogenetic stages and sexes, with a significantly higher expression during the sex-determination time-window (stages 0 and 3) ( Table 2 and Additional File 1). Amongst the 13 genes highly expressed in stage 0, 12 could be associated with a future female sex and 1 with a male sex. In the same way, amongst the 44 genes highly expressed at stage 3, 18 could be associated with a future female sex and 26 with a male sex. To our knowledge, this is the first time that genes could be linked with the future sex in C. gigas or in a hermaphrodite mollusc, thanks to our experimental strategy (see 1st paragraph). Amongst all these genes, some could certainly be original potential actors involved in sex determination/early sex differentiation, but not all of them. Indeed, in stage 0, when the gonad restructures after spawning, among the physiological processes that take place, we can count for instance the regression of gonadal tubules and the reformation of the storage tissue associated to an increase of circulating haemocytes [6,25]. The stage 3 is the last stage of gametogenesis before spawning. It corresponds to the mature reproductive stage where spermatids and spermatozoa in males or pre-vitellogenic and vitellogenic oocytes in females constitute the majority of the germ cells present in the tubules [6,27]. However, at this stage, putative germinal stem cells are also present at the edge of the tubules [14] which, under the impulse of sex-determining genes will initiate a new male or female gonadal differentiation while animals are still mature and ready to spawn [45,46,51,53]. Thus, further investigations will be necessary to precisely characterize the 57 factors we found and define their spatial and temporal expression in order to determine which ones are involved in the oyster's sex determination or early differentiation. The genes resulting from the DEA and studied by RT-qPCR were segregated on the basis of the gametogenetic stage presenting their highest expression. To our knowledge, most of them had not been previously identified in C. gigas or in other molluscs.
Among the statistically dimorphic genes, those expressed at mature stage (stage 3) highlighted two different patterns.
The first pattern showed a regular increase of expression following gametogenesis from stage 1 to stage 3, in males for CGI 10026009 (Hypothetical protein) and in females for CGI 10008094 (GATA zinc finger domaincontaining protein 14-like) and CGI 10025872 (MAM and LDL-receptor class A domain-containing protein 1like). The hypothetical protein (CGI 10026009) blasted with the mammalian protein TESK2 (Testis Associated Actin Remodelling Kinase 2), a testis-specific protein kinase 2 suggested to be involved in meiotic stages and/ or early stages of spermiogenesis in rat [26]. Its expression in C. gigas is in line with such a role. The femalespecific expression increasing over gametogenesis observed in C. gigas for CGI 10025872 (MALRD1, MAM and LDL-receptor class A domain-containing protein 1like) was similar to the one obtained by RT-qPCR for the so called ovarian gene Cg-MALRD1-like by Yue et al. [62]. Although this factor is rather expressed in the digestive tract in mammals, Yue et al. [62], based on their results in C. gigas gonad, suggested that the Cg-Malrd1like gene might mediate signal-cell interactions to activate the oogonia or oocyte DNA replication process. Concerning GATA zinc finger domain-containing protein 14-like (CGI 10008094), it blasted with the mammalian Gata-4. This latter protein is present in bipotential gonads of XX and XY mouse embryos and is markedly down-regulated shortly after the histological differentiation of the ovary, suggesting an involvement in early gonadal development and possibly sexual dimorphism [60]. Its role in C. gigas still needs to be elucidated. To conclude, a role of these genes in sex determination/ early differentiation seems unlikely given their patterns of expression during gametogenesis, which do not show a significant peak of expression in stage 3 compared to earlier stages, neither in males nor in females.
The 2nd pattern observed in C. gigas in the present study showed a sudden peak of expression at stage 3, for males for CGI 10016132 (Protein singed-like) and to a lesser extent for CGI_10,006,800 (Trophoblast glycoprotein-like) and in females for CGI 10018971 (PML-likeprotein) and CGI_10,011,004 (FoxL2), which is the gold standard which exhibits the same expression pattern as the one reported by Santerre et al. [53]. This female expression of gene encoding PML-like-protein is in line with previous studies. Indeed, in C. elegans, the PML protein (sub)-nuclear localization seems to be affected by SUMO (Small Ubiquitin-like Modifier), required for gonadal and uterine-vulval organogenesis [9]. In mice, the PML transcript isoform II was found in mature sperm and the isoforms I and II in oocytes, suggesting that mature gametes may carry the transcripts to the embryo [23]. In mice, once again, Hadjimichael et al. [32] showed that over-expression of the PML gene in embryonic stems cell lines delays cell differentiation, suggesting its essential role as regulator of stem cell pluripotency and somatic cell reprogramming. In C. gigas, in the current state of knowledge, it is impossible to discriminate, for the PML-like-protein, a role in late gametogenesis (spermiogenesis) or in very early events preparing the new cycle (sex determination/early sex differenciation), both occurring in mature gonads at stage 3. Now concerning the protein singed-like, whose gene is homologous to the mammalian fascin, it is required, in drosophila, for actin filament bundle formation in the cytoplasm of nurse cells during oogenesis [11]. In mice, Tubb et al. [59] reported a testis fascin (FSCN3, Fascin Actin-Bundling 3), specifically expressed in the elongated spermatids and remaining in mature spermatozoa. The expression found in the present work in C. gigas does not fit exactly with such localization as many elongated spermatids are already present at stage 2, but this gene expression only increased after this stage. The expression of CGI 10006800 (Trophoblast glycoprotein-like) although always dimorphic in favour of the males, also highlighted a peak of expression at stage 3. A homolog of the Trophoblast glycoprotein, called Waif1a (Wnt-activated inhibitory factor 1a), was reported, in zebrafish and xenopus male embryos as well as in mammalian cells, to inhibit the canonical Wnt (Winnt)/β-catenin signalling involved in female sex-determination [37]. The clear male dimorphic expression of this gene in C. gigas is in line with its likely role in the inhibition of the Wnt/β-catenin signalling, notably at stage 3, during the sex determination/early differentiation time-window. To conclude, further investigations need to be carried out on these three genes to elucidate their role in C. gigas gonads. However, an involvement in sex determination/ early differentiation cannot be ruled out considering their statistical dimorphic expression and their sudden and significant peak of expression in stage 3 compared to earlier stages. Foxl2, once again, appears as a very likely candidate for female sex determination in C. gigas.
Finally, one gene showed a peak of expression at stage 0 followed by a sharp decrease, CGI 10028666 (PRE-DICTED: paramyosin) in females only. In molluscs, paramyosin is a protein found in muscles [16]. In drosophila, it was suspected to up-regulate the expression of Tra2 (Transformer 2) in males, outside the sex determination pathway, as an off-target effect [2]. In cephalobidae nematods and in C. elegans, paramyosin is a structural component of the female gonad which plays an essential role in ovulation [5,47]. Both roles do not fit with the expression found in the present study, in females and at the very beginning of the gametogenesis. Interestingly, paramyosin of Schistosoma mansoni is homologous to Taenia solium antigen B, an immunogenic protein with anti-complement activity [40]. In vertebrates, a H-Y antigen is expressed in the heterogametic sex regardless of whether male or female. Its role, although still controversial, was notably suggested to be associated with a common denominator underlying the development of mammalian testes and avian ovaries which enhanced the growth rate of the dominant heterogametic gonad at a critical stage of development [44]. In mammals, such serologically detectable male antigens could also be associated with testis activity or spermatogenesis and may be antigenic when expressed in females [55]. In fish, some observations suggest that it may play a secondary role in primary sex determination but may well be very important for subsequent differentiation or re-differentiation [19]. Although it clearly appears that the role of paramyosin has to be elucidated in C. gigas, its female-specific expression at stage 0 during the oyster's sex determination/early differentiation time-window points to a possible important role in this process.

Conclusions
To conclude, 57 genes highly expressed in stage 0 or 3 of gametogenesis and which could be linked to the future sex of the individuals emerged from the present work. Several of them could certainly be original potential actors involved in sex determination/early sex differentiation, like PREDIC TED: paramyosin (CGI 10028666), Trophoblast glycoprotein-like (CGI_10,006,800), protein Singed-like (CGI 10016132) and PML-like-protein (CGI 10018971), also studied here by RT-qPCR. Amongst them, some may have direct effects in the sex determination/early differentiation pathway and others could indirectly be associated with sex determination/ early differentiation time-window in C. gigas. However, because little information is available on most of these factors for C. gigas and other molluscs, further investigations will be essential to clarify their homologies, as well as their spatial and temporal expression and their roles in the oyster's gonad.

Sampling and animal selection strategy
The C. gigas oysters are from a same full-sib family produced at the Ifremer facilities from a wild broodstock in March 2013 (for more information see [3]). Seven thousand four hundred and eighty-eight oysters were monitored in this environment and sexed since then (for more information see [10]). Twenty-four of them were individually selected according to their sex phenotype based on the 4th previous years and on their gametogenetic stage. They were sampled in January, February and June 2018, and their gonads were collected for RNA extractions (snap frozen in liquid nitrogen) and histology (fixed in a Davidson's fixative solution). Gills were also collected and conserved in 70% ethanol for flow cytometry analysis. This allowed us to verify that the chosen oysters were diploids, as ploidy can be very plastic in this species [41]. The gametogenetic stages were determined a posteriori based on the criteria described by Heude-Berthelin et al. [34]: Stage 0 (post-spawning undifferentiated cells), stage 1 (gonia proliferation), stage 2 (maturation; spermatogenesis or oogenesis) and stage 3 (ripe gonads before spawning; undifferentiated cells). Histologically, during one cycle, sex can traditionally only be determined from the end of stage 1 (therefore not in stage 0) and sex for the next cycle can never be predicted because C. gigas is a sequential irregular hermaphrodite. Thus, in order to be able to "discriminate between future males and future females", an asset of our study was to work on oysters which had had no sex change over the 4 previous years, with the aim to assign a "very likely" future sex phenotype to these "true males" and "true females". Indeed, two arguments support this choice: (i) the older the oysters get, the less they change sex; (ii) less than 3% of the 4-year-old oyster population changed sex ( [10], see supplementary material). Hence, nine males and nine females were chosen at stages 0, 3 and 1 (3 individuals per stage) for RNA-Seq analysis. The first two stages correspond to the time-window of sex determination/early sex differentiation in this species   22,45,46,51,53] and the last one is for comparison (Fig. 7). Quantitative Real time PCR (RT-qPCR) were also performed on twenty-four animals (3 individuals per stage and per sex) covering the entire gametogenetic cycle, i.e. stages 0, 1, 2 and 3 (Fig. 7).

RNA preparation, cDNA library construction and Illumina sequencing
The total RNA from 18 gonads (3 males and 3 females per stage; stages 0, 1 and 3) were individually extracted using TRI Reagent (Sigma®), treated with TURBO™ DNAse (Invitrogen) to remove genomic DNA and purified with Direct-zol™ RNA MiniPrep (Zymo Research) following the manufacturer's protocol. RNA concentration, integrity and purity were assessed on agarose gels (1%) and using NanoDrop 2000 (Thermo Scientific), Qubit® 2.0 Fluorometer (Invitrogen) and Bioanalyzer 2100 (Agilent Technologies). Complementary DNA libraries were made from the total RNA of the 18 individual samples, nine per sex, conformed to the required purity criteria (A260/A230 and A260/A280 > 1.8) and quality levels (RNA Integrity Number, RIN > 8) for cDNA library preparation. Complementary DNA libraries were constructed by means of the LIGAN-PM platform (Lille, France) using TruSeq RNA Library Prep kit v2 (Illumina) in line with the manufacturer's recommendations. Briefly, per library, approximately 1 μg of the total RNA sample was purified using oligo-dT beads, followed by a fragmentation with Elute Prime, Fragment Mix (Illumina). First-strand cDNA was generated by reverse transcription with First Strand Master Mix (25°C for 10 min, 42°C for 50 min and 70°C for 15 min) and the synthesis of second-strand cDNA was performed in the presence of Second Strand Master Mix and dATP, dGTP, dCTP, dUTP mix (16°C for 1 h). Purified fragmented cDNA was incubated in presence of End Repair Mix (30°C for 30 min), then adenylated (3′ end) by addition of A-Tailing Mix (37°C for 30 min) and mixed to RNA index adapter and ligation mix (30°C for 10 min). The final purification step was performed using AMPure XP beads (Beckman Coulter). Several rounds of PCR amplification were performed to enrich the cDNA fragments, prior to the purification of the PCR products. The libraries' quality was assessed by checking the distribution of the fragments size using the Agilent bioanalyzer DNA 1000 (Agilent Technologies) and the libraries were quantified by RT-qPCR (KAPA Library Quantification Kit, Roche). The resultant cDNA libraries were paired-end sequenced (150 bp paired-end reads generated) on an Illumina HiSeq™ 4000 using the LIGAN-PM platform (Lille, France). Using CLC Genomics Workbench 11.0.1 software (Qiagen bioinformatics), a main component scatter plot (PCA, "Principal Component Analysis") was performed based on the original expression values of the 10,061 genes expressed in the 18 gonadal samples before the differential expression analysis. Using the same metadata and software, a volcano plot was created to check for outlying samples of poor quality that need excluding from further analysis.

Sequence and differential expression analysis (DEA)
All the analyses were conducted using the CLC Genomics Workbench 11.0.1 software (Qiagen bioinformatics). First, the Illumina reads were trimmed at the ends by quality scores, on the basis of the presence of ambiguous nucleotides (poly-N) (maximum value set at 2) using a modified version of the Mott algorithm (the quality limit was set at 0.05), as default sets on CLC Genomics Workbench software. The "RNA-Seq analysis" option in the software was used to map reads to the genome reference of Crassostrea gigas (genome v9 [63];; Assembly GCA_ 000297895.1; https://www.ncbi.nlm.nih.gov/assembly/ GCF_000297895.1/) at the following settings: sequencing reads should be uniquely matched to the genome reference allowing up to two mismatches, without any insertion or deletion. The length fraction was at 0.8, minimum similarity was at 0.9 and the maximum number of hits pear read was 10. A differential gene expression analysis between different sexes and gametogenetic stages was performed using a model based on the negative binomial distribution. Only genes with a false discovery rate (FDR) < 0.05 and a fold change > 1.5 (p-value < 0.05) were considered as differentially expressed. All genes were functionally annotated by searching sequence similarities in the Swiss-Prot and Nr (NCBI, National Center for Biotechnology Information) databases using the CLC Genomics Workbench 11.0.1 software (Qiagen bioinformatics) with an e-value cut-off at 1e − 5 . A Venn diagram was created to identify genes significantly differentially expressed between gametogenetic stages and sexes within C. gigas gonad transcriptome and was supplemented by a hierarchical cluster analysis performed using the "pheatmap" package from R. Finally, the results were imported into Blast2GO [17] and Gene Ontology (GO) terms at level 2 and an eggNOG annotation [35] with hit max at 20 were assigned.

Quantitative real-time PCR (RT-qPCR)
In order to support the RNA-Seq data and to provide annual expression patterns for some relevant genes, RT-qPCR was undertaken. Seven genes that were differentially expressed based on DEA results, were selected for gene expression analysis in different stages of the gametogenetic cycle in both sexes. FoxL2 was also used here as a gold standard as its expression has already been studied in C. gigas by RT-qPCR ( [45,53]; GenBank Accession Number: FJ768956.1). Gene-specific primers were designed using Primer3 (v. 0.4.0) and custom synthesized by Eurogentec (France). Their sequences are listed in Table 4. The total RNA of twenty-four gonads (3 males and 3 females per stage; stages 0, 1, 2 and 3) were used for this experiment. They were first individually treated with DNAse I RQ1 (Promega) and purified with the NucleoSpin® RNA clean-up kit (Macherey-Nagel) in line with the manufacturer's recommendations. Then cDNA was synthesized using M-MLV Reverse Transcriptase (Promega) with oligo (dT) 15 Primer (Promega) in line with the manufacturer's recommendations. The amplifications were conducted on a CFX96 Real-Time PCR Detection System (Biorad) using GoTaq® qPCR Master Mix (Promega). The cycling parameters were: 95°C for 5 min, then 45 cycles of 95°C for 15 s and 60°C for 45 s. The PCR efficiency of each primer pair was determined based on a five-point standard curve generated from a two-fold dilution series (Additional File 3). Negative controls (total RNA not reversetranscribed), blank controls (sterile water) and interplate controls were performed on each microplate. All PCR reactions were performed in technical triplicates. A melting curve analysis was performed to verify the specificity of each primer. The elongation factor 1-alpha gene (EF1α, GenBank Accession Number: AB122066) was used as a reference gene for gene expression normalization of targeted genes. The high stability of this housekeeping gene was previously identified by Dheilly et al. [21,22] and was further confirmed in. the present study by using the software BestKeeper © . Generally, reference genes with raw standard deviation (BKSTD) < 1 and with a coefficient of correlation (r) close to 1.0 are considered the most stable genes [39]. Relative gene expression was calculated using the 2-ΔΔCT method [43]. Statistical bilateral Wilcoxon tests were performed at each gametogenetic stage, in order to test whether the differences in gene expression were significant depending on the sex. Analyses were performed on R® (× 64 version, 3.2.2) with significance at p < 0.05.
Additional file 1 Eighty-four genes identified in the C. gigas transcriptome by RNA-Seq Differential gene Expression Analysis (DEA): Name, NCBI gene ID, fold change (male vs female), p-value, FDR p-value, BLAST e-value, description. The 57 genes specifically differentially expressed between sexes in stage 0 or 3 are highlighted in bold.
Additional file 2 Statistical bilateral Wilcoxon tests performed for RT-qPCR experiments at each gametogenetic stage between sexes. Significance at p < 0.05.
Additional file 3. Primers used for RT-qPCR: list, sequences, number of bases, blasted name of the gene amplified, R 2 and % of PCR efficiency.