Organellar genome comparisons of Sargassum polycystum and S. plagiophyllum (Fucales, Phaeophyceae) with other Sargassum species

Sargassum polycystum C. Agardh and Sargassum plagiophyllum C. Agardh inhabit tropical coastal areas, where their populations are negatively affected by global warming and changes in the marine environment. The mitochondrial and chloroplast genomes of these species had not previously been sequenced. The mitochondrial genomes of S. polycystum and S. plagiophyllum were 34,825 bp and 34,862 bp, respectively, and their corresponding chloroplast genomes were 124,493 bp and 124,536 bp, respectively. The mitochondrial and chloroplast genomes of these species share conserved synteny, sequence regions and gene number with the organellar genomes of other Sargassum species. Based on sequence analysis of 35 protein-coding genes, we deduced that S. polycystum and S. plagiophyllum were closely related to S. ilicifolium; these species diverged approximately 0.3 million years ago (Ma; 0.1–0.53 Ma) during the Pleistocene period (0.01–2.59 Ma). Rates of synonymous and non-synonymous substitutions in the mitochondrial genome of the genus Sargassum were 3 times higher than those in the chloroplast genome. In the mitochondrial genome, rpl5, rpl31 and rps11 had the highest synonymous substitution rates. In the chloroplast genome, psaE, rpl14 and rpl27 had the highest synonymous substitution rates. Phylogenetic analysis confirms the close relationship between the two sequenced species and S. ilicifolium. Both synonymous and non-synonymous substitution rates differ significantly between the mitochondrial genomes and the chloroplast genomes. Deciphering complete mitochondrial and chloroplast genomes is significant because it advances our understanding of the evolutionary and phylogenetic relationships between species of brown seaweeds.

are economically and ecologically important in tropical and temperate coastal areas [2] as they can form seabed forests that play a critical role in the ecology of tropical marine environments [4,5]. Sargassum provides resources for seaweed fertilizers, alginate and cosmetics, among other products, and exhibits antibacterial, antiviral, anti-oxidant [6], and antifungal activities [7].
Deciphering complete mitochondrial and chloroplast genomes is a prerequisite for understanding the evolutionary and phylogenetic background of brown seaweeds [8,9]. Both mitochondria and chloroplasts are semi-autonomous organelles that preserve their own genetic information [10]. Mitochondria are the main site of respiration and energy supply, whereas chloroplasts are the site of photosynthesis [11]. Generally, organellar genomes reflect the evolutionary history of the algal nuclear genome [12,13]. Nevertheless, limited information on the organellar genomes of tropical Sargassum species hinders our understanding of their evolutionary history. Sargassum polycystum and S. plagiophyllum are morphologically distinct; they are considered two different species that occur in sympatry and belong to Sargassum subgenus Sargassum section Polycystae [14]. However, molecular phylogenetic analyses have so far not resolved the relationship between these two morphologically different species [15,16], reflecting their very recent divergence.
Through the sequencing of complete chloroplast and mitochondrial genomes of S. polycystum and S. plagiophyllum, and by comparing them with those of other Sargassum species, we detected variable regions that may be targets for future phylogenetic analysis.

Mitochondrial genome characteristics
The mitochondrial genomes (Fig. 1) of S. polycystum and S. plagiophyllum comprised 34,825 and 34,862 bp, respectively, and were within the size range of previously sequenced mitochondrial genomes (Table 1) of other Sargassum species [17] and Stramenopiles [18,19]. The mitochondrial genomes of S. polycystum and S. plagiophyllum each contained 25 tRNA (transfer RNA) genes, 3 rRNA (ribosomal RNA) genes and 37 protein-coding genes (PCGs). In S. plagiophyllum, there was an intergenic region located between rrn5 and rns. In S. plagiophyllum, only ORF (open reading frame) 129, rpl16, rps3, rps19, rpl2 and tatC were transcribed from the light strand (L-strand) (clockwise direction in Fig. 1), and the other genes from the complementary heavy strand (H-strand) (anticlockwise in Fig. 1). In S. polycystum, the genes rpl16, rps3, rps19, rpl2 and tatC were transcribed from the L-strand, and the other genes from the complementary H-strand. In both Sargassum species, the protein-coding regions accounted for 77.50% of the total mitochondrial genome, whereas the non-PCG regions accounted for only 22.50%. The total GC content of the mitochondrial genomes was similar to that of other Sargassum species (Table 1).

Fig. 1 The mitochondrial genome maps of S. polycystum and S. plagiophyllum. The annotated genes are colored according to the functional categories. Genes on the inside are transcribed in the clockwise direction, whereas genes on the outside are transcribed in the anticlockwise direction. The ring of bar graphs on the inner circle shows the GC content in dark gray.

Chloroplast genome characteristics
The chloroplast genomes of S. polycystum and S. plagiophyllum were 124,493 bp and 124,536 bp, respectively, whereas the size of the chloroplast genome in the other Sargassum species (namely S. horneri, S. thunbergii, S. fusiforme, S. confusum and S. muticum) (Table 2) ranged from 124,068 to 124,592 bp. The two assembled chloroplast genomes each had a total of 172 genes, which included 139 PCGs, 27 tRNA genes, and 6 rRNA genes (Fig. 2). In addition, no introns were observed in the tRNA regions of the chloroplast genomes of S. plagiophyllum and S. polycystum, and only one intron was observed in the five other Sargassum species compared here. The features of the chloroplast genomes of these two Sargassum species and the five other Sargassum species are summarized in Table 2.

IR boundary analysis
Although the chloroplast genomes of the Sargassum species studied were highly conserved, structural variations were prevalent in the IR boundary region (Fig. 3). In S. polycystum and S. plagiophyllum, the cbbx gene spanned the junction between the LSC and IRb regions; this gene extended 721 bp into the LSC and 185 bp into the IRb regions. At the junction of the SSC-IRb (JSB) regions, the rrn5 gene was located in the IRb region, 181 bp from the junction in both S. polycystum and S. plagiophyllum; the ycf19 gene was located in the SSC region, 416 bp (for S. polycystum) and 406 bp (for S. plagiophyllum) from the junction. The rpl21 gene spanned the junction of the
To determine the differences in the chloroplast genomes of Sargassum species, we compared the IR/LSC/SSC boundary regions with those in five other Sargassum species. We found that they all share the same IR/LSC/SSC regions, with only minor differences in the length of each region (Fig. 3). The IR regions of S. horneri, which has the shortest chloroplast genome, were 234 bp and 254 bp shorter than those of S. polycystum and S. plagiophyllum, respectively. Remarkably, the rpl21 gene in S. polycystum was much shorter than in the other species and overlapped with IRa by only 19 bp, whereas in the other species it overlapped with IRa by 87–89 bp.

Substitution rate estimations
We compared the non-synonymous (dN) and synonymous (dS) substitution rates of the sequences of 35 and tatC showed the highest dN (0.0996 and 0.0936, respectively) and moderate dS (0.6769 and 0.4832, respectively) values. The atp9 gene exhibited the lowest dN value (0.0028). The nad4L and nad5 genes had identical dS (0.4899) and dN (0.0029) values, which reflected a similar degree of selection acting upon these two genes among these Sargassum species. The dN/dS ratios differed among the 35 PCGs in the mitochondrial genome [see Additional file 5]; the low dN/dS ratios of nad4L and nad5 (dN/dS = 0.0069) implied a slow evolutionary rate. The higher dN/dS ratio in tatC (0.2175) reflected the unique selection process (dN/dS < 0.25) of the 35 PCGs in the mitochondrial genome. Moreover, we verified that the dN/dS ratios of five genes (namely petJ, rpl21, psaD, rpoC2 and thiS) in the chloroplast genomes were greater than one and that the dN/dS ratios of the remaining genes were lower than one. Among the 114 chloroplast PCGs [see Additional file 6], rpoC2 and thiS exhibited the highest dN/dS ratios (3.40 and 3.48, respectively), whereas psbH, psbJ, psbT, rpl29, atpH and petD exhibited the lowest and identical dN/
The dS values of the mitochondrial genes varied from 0.1999 to 0.6769 and those of the chloroplast genes varied from 0.0196 to 0.2615 (Fig. 4). The dN values of the mitochondrial genes varied from 0.0028 to 0.0996 and those of the chloroplast genes varied from 0.0001 to 0.118. Compared with the chloroplast genes, most mitochondrial genes had higher dS and dN values. The average dN and dS values of the mitochondrial PCGs were approximately 3.6 and 3.2 times higher than those of the chloroplast PCGs, respectively. In the chloroplast genome, 89.5% of all PCGs had non-synonymous substitution rates of 0.00–0.02.
The average dN/dS ratio of the mitochondrial genes was lower than that of the chloroplast genes (0.08 for the mitochondrion and 0.16 for the chloroplast). The dN/dS ratios of the two organellar genomes from the seven selected Sargassum species were less than 0.25.

Phylogenetic analysis and divergence time of brown algae
A 26,360 bp nucleotide alignment was obtained by concatenating 35 shared PCGs from the mitochondrial genomes of thirteen Sargassum species; in addition, Turbinaria ornata (Sargassaceae) and Fucus vesiculosus (Fucaceae) were selected as outgroups. This alignment was used to infer phylogenetic relationships between Sargassum species using Maximum Likelihood (ML) and Bayesian Inference (BI) approaches. The ML and BI trees were identical in topology; the ML bootstrap support percentages and BI posterior probabilities are shown in Fig. 5. The Sargassum clade had maximum support, and all infrageneric relationships had high to maximum support (ML bootstrap ≥94%, BI posterior probability = 1). All Sargassum subgenera and sections were monophyletic. The Polycystae clade (highlighted in Fig. 5), which consisted of the two species sequenced in this study, was sister to the Ilicifolia clade, which had only a single representative in the tree (i.e., S. ilicifolium).

Discussion
Although the number of rRNA genes and PCGs was identical across the five Sargassum species, the trnL-UAA and trnK-UUU tRNA genes were lost in the chloroplast genomes of S. plagiophyllum and S. polycystum, respectively. Plastid gene loss has been observed in Dictyochophyceae [20], Eustigmatophyceae [21] and Synurophyceae [22], indicating that gene loss was widespread in Stramenopiles and that these genes were lost during evolution.

Fig. 4 The difference in synonymous and non-synonymous divergence rates between chloroplast and mitochondrial protein-coding genes. The alignments of 114 chloroplast genes (orange) and 35 mitochondrial genes (yellow) from seven Sargassum species (five previously reported and two newly sequenced) were used to estimate the divergence values.

The chloroplast genome of S. plagiophyllum was 43 bp longer than that of S. polycystum. We suspected that this was caused by expansion of the IR regions, which enlarged the spacer regions between the trnA-TGC and rnl genes. In S. plagiophyllum, IRa and IRb were each 10 bp longer, the LSC region was 19 bp longer, and the SSC region was 4 bp longer. The different degrees of IR region expansion in the quadripartite chloroplast genomes may be related to species diversification.
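The 43 bp length difference described above can be checked against the per-region differences with simple arithmetic. The short sketch below uses only the genome sizes and region differences reported in the text; the variable names are illustrative.

```python
# Chloroplast genome sizes reported in the text (bp).
polycystum_bp = 124_493
plagiophyllum_bp = 124_536

# Additional length of each region in S. plagiophyllum, as stated in the text (bp).
extra_bp = {"IRa": 10, "IRb": 10, "LSC": 19, "SSC": 4}

total_difference = plagiophyllum_bp - polycystum_bp
regional_sum = sum(extra_bp.values())

print(total_difference)  # 43
print(regional_sum)      # 43
```

The four regional differences sum to the full 43 bp, so the quadripartite regions together account for the entire length difference between the two genomes.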
The dN/dS ratio can be used to estimate the type and degree of selective pressure acting on a protein-coding gene [23]. In our comparison of the organellar genome data, both synonymous and non-synonymous substitution rates for the mitochondria-encoded genes were more than 3 times higher than those for the chloroplast-encoded genes. This is consistent with reports that mitochondrial genes in brown algae exhibit a higher mutation rate than chloroplast genes [24]. Moreover, both nad4L and nad5 in the mitochondrial genome encode subunits of nicotinamide adenine dinucleotide (NADH) dehydrogenase. They showed similar dN/dS values (0.0069), possibly because they are involved in the same or similar functions. The dN/dS values of five genes (namely petJ, rpl21, psaD, rpoC2 and thiS) in the chloroplast genome were greater than one and considerably higher than those of the other genes from this organelle, suggesting that they were under positive selection.
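As a minimal sketch of how the dN/dS ratio is conventionally read, the snippet below labels a few of the ratios reported in this study. The function name and the labels are illustrative assumptions; a ratio above one is only a heuristic indicator of positive selection, not a statistical test, and this is not the procedure used by CodeML itself.

```python
def selection_regime(omega: float, tolerance: float = 1e-9) -> str:
    """Conventional reading of dN/dS (omega): >1 positive, <1 purifying, ~1 neutral."""
    if omega > 1 + tolerance:
        return "positive"
    if omega < 1 - tolerance:
        return "purifying"
    return "neutral"

# dN/dS ratios quoted in the text (cp = chloroplast, mt = mitochondrial).
reported_omega = {
    "rpoC2 (cp)": 3.40,
    "thiS (cp)": 3.48,
    "tatC (mt)": 0.2175,
    "nad4L (mt)": 0.0069,
}

regimes = {gene: selection_regime(w) for gene, w in reported_omega.items()}
for gene, regime in regimes.items():
    print(f"{gene}: omega = {reported_omega[gene]} -> {regime}")
```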
Phylogenetically, S. plagiophyllum and S. polycystum were sister species that exhibited very limited genetic divergence and a relatively recent divergence time (0.1–0.53 Ma, 95% HPD). The two species retained their morphological distinctness, yet Stankovic et al. [1] reported that sympatric populations of the two species in Phuket grew adjacent to each other: S. plagiophyllum grows in the low intertidal zone, and S. polycystum grows in the deeper part of the low intertidal zone. In the present study, S. plagiophyllum and S. polycystum were the most closely related sister taxa, with the least genetic divergence, whereas S. fusiforme and S. horneri, which grow in moderate seawater temperature conditions [25], were the most divergent sister taxa (5.86–8.83 Ma, 95% HPD). By comparison, S. plagiophyllum and S. polycystum, which grow in tropical temperature conditions [26], are the most closely related clades (diverging 0.1–0.53 Ma, 95% HPD). Decoding of the organellar genomes of S.

Conclusions
We report for the first time the mitochondrial and chloroplast genomes of S. plagiophyllum and S. polycystum, describing their genome size and structure. Phylogenetic analysis confirmed their close relationship to each other and to S. ilicifolium in the sister section Ilicifolia. Both synonymous and non-synonymous substitution rates reflected significant differences in the selective pressure acting on the mitochondrial and chloroplast genomes. Our study improves our understanding of the phylogenetic and evolutionary history of Sargassum.

Genome sequencing, assembly, and annotation
To obtain full-length mitochondrial and chloroplast sequences, we used an Illumina sequencing strategy. For the mitochondrion and chloroplast of each species, 1 μg of purified DNA was used to construct a paired-end library. DNA was purified using the TIANgel Midi Purification Kit (Tiangen Biotech Co., Ltd., Beijing, China) following the manufacturer's instructions; purified DNA was subsequently sheared by sonication into ~450-bp fragments using the Covaris M220 system (target size set to 450 bp). Sequencing libraries were constructed using the NEBNext Ultra™ DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA) following the manufacturer's instructions. The constructed libraries were sequenced using Illumina NovaSeq 6000 (Biozeron, Shanghai, China). We quality-controlled the raw reads and trimmed low-quality sequences using Trimmomatic-0.39 [27] with the parameters "SLIDINGWINDOW:4:15 MINLEN:75". All clean reads were used to generate de novo assemblies using SOAPdenovo v2.04 [28]. Potential contigs were extracted by aligning against the protein-coding genes from the plant chloroplast database (http://ftp.ncbi.nih.gov/refseq/release/plastid/) and the mitogenome database (http://ftp.ncbi.nlm.nih.gov/refseq/release/mitochondrion/) with BLAST v2.8.1+ [29]. GapCloser v1.12 [28] was employed to fill the gaps in the scaffolds. Finally, the mitochondrial and chloroplast genomes of S. polycystum and S. plagiophyllum were obtained. We used the online Dual Organellar GenoMe Annotator tool with default parameters [30] to annotate PCGs and ORFs. The cloverleaf structures of tRNAs were predicted using tRNAscan-SE v1.23 with default parameters [31], and rRNA genes were predicted using RNAmmer v1.2 [32]. The circular physical maps of the mitochondrial and chloroplast genomes were generated using OGDRAW v1.3.1 [33]. MEGA 7.0 [34] was used to align the sequences and determine the base composition.
Functional annotations were performed using sequence-similarity BLAST searches with a typical cut-off E-value of 10^-5 [35] against several publicly available protein databases, i.e., the NCBI non-redundant (Nr) protein database, Swiss-Prot, Clusters of Orthologous Groups (COGs), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO). The mitochondrial and chloroplast genomes of the two Sargassum species were deposited in the GenBank database under the accession numbers MW485977 and MW485983 (respectively) for S. polycystum, and MW485978 and MW485981 for S. plagiophyllum.
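To illustrate what the trimming parameters "SLIDINGWINDOW:4:15 MINLEN:75" mean, the sketch below applies a simplified sliding-window rule to a list of Phred quality scores. It is an approximation written for illustration, not Trimmomatic's own implementation: it cuts the read at the start of the first 4-base window whose mean quality falls below 15 and then discards reads shorter than 75 bases. The function name and the example quality profiles are hypothetical.

```python
def sliding_window_keep_length(quals, window=4, min_mean=15, min_len=75):
    """Return how many leading bases to keep, or 0 if the trimmed read is too short.

    Simplified rule: scan from the 5' end and cut at the start of the first
    window whose mean quality drops below `min_mean`.
    """
    keep = len(quals)
    for start in range(0, len(quals) - window + 1):
        window_mean = sum(quals[start:start + window]) / window
        if window_mean < min_mean:
            keep = start
            break
    # MINLEN step: drop reads that end up shorter than the minimum length.
    return keep if keep >= min_len else 0

# Hypothetical reads: high quality followed by a low-quality tail.
long_good_read = [30] * 80 + [2] * 20   # tail starts late, read survives
short_good_read = [30] * 50 + [2] * 50  # tail starts early, read is dropped

print(sliding_window_keep_length(long_good_read))   # 79
print(sliding_window_keep_length(short_good_read))  # 0
```

In the first read the window starting at base 78 still averages 16, so the cut falls one base later, at position 79.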

Chloroplast genome boundary region analysis
To investigate the contraction and expansion of inverted repeat (IR) boundary regions in the chloroplast genomes of Sargassum, we downloaded the chloroplast genome data of S. horneri, S. thunbergii, S. fusiforme, S. confusum and S. muticum from the GenBank database (Table 2), and compared and visualized the IR boundary positions and their adjacent genes using the IRscope online program [36]. The genome length, GC content, and gene length of the chloroplast genomes of all Sargassum species were compared using MEGA v7.0 and BioEdit software [37].
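GC content, one of the quantities compared above, is the percentage of G and C bases among the unambiguous bases of a sequence. The following sketch shows that calculation; the function name and the toy sequences are illustrative assumptions and are not taken from the genomes in this study, which were analyzed with MEGA and BioEdit.

```python
def gc_content(sequence: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(1 for b in bases if b in "GC")
    return 100.0 * gc_count / len(bases)

# Toy examples; ambiguous bases such as N are ignored.
print(gc_content("ATGCNN"))  # 50.0
print(gc_content("aatt"))    # 0.0
```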

Substitution rate estimation
We compared the sequence divergence rates of genes between the mitochondrial and chloroplast genomes of S. polycystum and S. plagiophyllum, and with those of other studied Sargassum species (namely S. horneri, S. thunbergii, S. fusiforme, S. confusum and S. muticum). The non-synonymous (dN) and synonymous (dS) substitution rates for the 35 mitochondrial PCGs and 114 chloroplast PCGs were analyzed. Codon alignments for each PCG were performed with MEGA, and the conserved blocks of the multiple sequence alignments were extracted using Gblocks v0.91b with default parameters [38]. The aligned sequence data were converted into PAML format using DAMBE5 [39]. Subsequently, dN, dS, and the dN/dS ratios were calculated using the CodeML program from the Phylogenetic Analysis by Maximum Likelihood (PAML) package v4.8 [40]. The model was run with the following settings in the codeml.ctl files: runmode = -2 and CodonFreq = 2. To ensure the accuracy and reliability of the data, synonymous substitution values > 5 were discarded from the subsequent analysis [41]. Bar plots and scatter plots of the dN, dS and dN/dS values were generated to compare mutation rates and to screen for genes under selection.
To build the divergence time estimate tree, five mitochondrial genes (cox1, cox3, nad1, nad4 and atp9) and three chloroplast genes (rbcL, psbA and atpB) from ten Sargassum species (representing two subgenera and seven sections) and Fucus vesiculosus were concatenated and aligned. The multi-gene alignment was partitioned into 1st, 2nd and 3rd codon positions using DAMBE5 [39]. The ML trees were reconstructed using PhyML v.3.1 under the optimal model of TIM3+F+I+G4 with 100 bootstrap replicates. MCMCTree of PAML v4.8 [48] was used to estimate the divergence times between the ten Sargassum species using approximate likelihood calculations and two fossil calibrations. We set a minimum age of 13 Ma for the divergence of Sargassaceae and Fucaceae [49,50] and a minimum age of 6.7 (3.4–11) Ma for the genus Sargassum, based on the divergence time estimate of the Sargassacean genera Turbinaria and Sargassum in Yip et al. [14]. The overall substitution rate of the ML tree was estimated with BASEML in PAML before the rgene_gamma calculations. The gradient and Hessian of the branch lengths were estimated by BASEML using the TIM3+F+I+G4 substitution model at the maximum likelihood estimates [51]. The gradient was calculated using a different method, and the Hessian parameter was adopted for calculating the estimated scores [52,53]. We used the program MCMCTree to estimate the divergence times under the independent rates model (clock = 2 in the mcmctree.ctl control file) and the TIM3+F+I+G4 nucleotide substitution model. The substitution rate per time unit (0.080406), rgene_gamma (1 12.5) and sigma2_gamma (1 4.5) were used as parameters. To determine whether convergence had been achieved, two independent MCMC chains were run with 5 × 10^6 steps after discarding 10^4 generations as burn-in.
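For readers unfamiliar with CodeML, the two settings named above select pairwise comparison of sequences (runmode = -2) and the F3X4 codon frequency model (CodonFreq = 2). The sketch below builds the text of a minimal control file containing those two settings. The file names are hypothetical placeholders, seqtype = 1 (codon sequences) is an assumption consistent with a dN/dS analysis, and any other options a real run would need (for example the genetic code) are not given in the text and are deliberately left out.

```python
def codeml_control(seqfile: str, outfile: str, runmode: int = -2, codon_freq: int = 2) -> str:
    """Build the text of a minimal codeml control file (illustrative, incomplete)."""
    lines = [
        f"seqfile = {seqfile}",        # hypothetical input alignment in PAML format
        f"outfile = {outfile}",        # hypothetical output file name
        f"runmode = {runmode}",        # -2: pairwise comparison (from the text)
        "seqtype = 1",                 # assumption: codon sequences
        f"CodonFreq = {codon_freq}",   # 2: F3X4 codon frequencies (from the text)
    ]
    return "\n".join(lines) + "\n"

control_text = codeml_control("gene_alignment.pml", "gene_alignment.out")
print(control_text)
```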
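Partitioning an alignment by codon position, as described above, amounts to taking every third column starting from the first, second and third base of the reading frame. The sketch below shows the idea on a single in-frame sequence; it is an illustration under the assumption that the alignment starts in frame and has a length divisible by three, and it is not the DAMBE5 procedure itself. The function name and toy sequence are hypothetical.

```python
def split_codon_positions(aligned_seq: str):
    """Split an in-frame coding sequence into its 1st, 2nd and 3rd codon positions."""
    if len(aligned_seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    first = aligned_seq[0::3]
    second = aligned_seq[1::3]
    third = aligned_seq[2::3]
    return first, second, third

# Toy sequence with codons ATG, GCC, TAA.
first, second, third = split_codon_positions("ATGGCCTAA")
print(first, second, third)  # AGT TCA GCA
```

Applying the same slicing to every row of a concatenated alignment yields the three partitions, each of which can then be assigned its own substitution parameters.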