Open Access

Nannochloropsis plastid and mitochondrial phylogenomes reveal organelle diversification mechanism and intragenus phylotyping strategy in microalgae

  • Li Wei1, 2,
  • Yi Xin1, 2,
  • Dongmei Wang1,
  • Xiaoyan Jing1,
  • Qian Zhou1,
  • Xiaoquan Su1,
  • Jing Jia1, 2,
  • Kang Ning1,
  • Feng Chen4,
  • Qiang Hu3 and
  • Jian Xu1Email author
Contributed equally
BMC Genomics201314:534

DOI: 10.1186/1471-2164-14-534

Received: 19 February 2013

Accepted: 31 July 2013

Published: 5 August 2013

Abstract

Background

Microalgae are promising feedstock for production of lipids, sugars, bioactive compounds and in particular biofuels, yet development of sensitive and reliable phylotyping strategies for microalgae has been hindered by the paucity of phylogenetically closely-related finished genomes.

Results

Using the oleaginous eustigmatophyte Nannochloropsis as a model, we assessed current intragenus phylotyping strategies by producing the complete plastid (pt) and mitochondrial (mt) genomes of seven strains from six Nannochloropsis species. Genes on the pt and mt genomes have been highly conserved in content, size and order, strongly negatively selected and evolving at a rate 33% and 66% of nuclear genomes respectively. Pt genome diversification was driven by asymmetric evolution of two inverted repeats (IRa and IRb): psbV and clpC in IRb are highly conserved whereas their counterparts in IRa exhibit three lineage-associated types of structural polymorphism via duplication or disruption of whole or partial genes. In the mt genomes, however, a single evolution hotspot varies in copy-number of a 3.5 Kb-long, cox1-harboring repeat. The organelle markers (e.g., cox1, cox2, psbA, rbcL and rrn16_mt) and nuclear markers (e.g., ITS2 and 18S) that are widely used for phylogenetic analysis obtained a divergent phylogeny for the seven strains, largely due to low SNP density. A new strategy for intragenus phylotyping of microalgae was thus proposed that includes (i) twelve sequence markers that are of higher sensitivity than ITS2 for interspecies phylogenetic analysis, (ii) multi-locus sequence typing based on rps11_mt-nad4, rps3_mt and cox2-rrn16_mt for intraspecies phylogenetic reconstruction and (iii) several SSR loci for identification of strains within a given species.

Conclusion

This first comprehensive dataset of organelle genomes for a microalgal genus enabled exhaustive assessment and searches of all candidate phylogenetic markers on the organelle genomes. A new strategy for intragenus phylotyping of microalgae was proposed which might be generally applicable to other microalgal genera and should serve as a valuable tool in the expanding algal biotechnology industry.

Keywords

Nannochloropsis Plastid phylogenomes Mitochondrial phylogenomes Intragenus phylotyping strategy

Background

Microalgae include many evolutionarily diverse lineages of unicellular photosynthetic eukaryotes that range in size from a few to several hundred micrometers. They contribute significantly to the primary production and the biogeochemical cycle of our biosphere [1]. They have also found increasing applications for production of lipids, sugars, bioactive compounds and in particular, biofuels [2].

Cellular functions of present-day microalgae are underpinned by plastid (pt), mitochondrial (mt) and nuclear (nc) genomes. Pt and mt play important roles in the evolution of microalgae and higher plants. The origin of pt has been traced to an endosymbiosis event between eukaryotic cell and cyanobacteria, which occurred around 1.2 Ga ago [3]. The engulfed photosynthetic unicellular cyanobacteria adapted to the environment inside the host cells and eventually became the present day eukaryotic pt [4]. The pt genome, in multiple copies, is inherited in a non-Mendelian fashion. Therefore, the genetic information from pt genome can provide an independent view of the phylogeny of its host organisms. Mt, according to the serial endosymbiosis theory, is the direct descendant of a bacterial endosymbiont (likely an alpha-proteobacterium) that became established in the early evolution of a nucleus-containing (but amitochondriate) host cell [5]. Analysis of microalgal organelle genomes has revealed their endosymbiotic origins [6], frequent gene transfers from organelles to nucleus [7] and the phylogeny among genera [8]. However, evolutionary dynamics of organelle genomes that drive microalgal speciation (i.e., within the genus) remains poorly understood.

Due to their asexual reproduction, slow evolution, few recombination, and relatively simple gene structure and dominance of single-copy genes, organelle genes have often been employed as phylogenetic markers [9], which are essential tools in algal research and biotechnology. Several molecular markers are frequently used for phylotyping algae, including the second internal transcribed spacer (ITS2) of nuclear ribosomal DNA (18S rRNA), mitochondrial cytochrome oxidase subunit I (cox1), and plastid ribulose-l-5-bisphosphate carboxylase/oxygenase (rbcL). However, limitations of the strategy are apparent: (i) different markers frequently gave different phylogenetic scenario (i.e., sub-specificity); (ii) most markers could not distinguish strains within a given species (i.e., sub-sensitivity); (iii) currently available markers could not be applied to microalgae of all kinds (i.e., sub-applicability) [10, 11]. For instance, cox1 is useful mainly for identification of red and brown algae [1215], whereas tufA (encoding plastid elongation factor Tu gene) and rbcL serve as the primary DNA barcodes for green algae and diatoms respectively [11, 16, 17]. However the genomic basis of such practices remains largely unknown. Exhaustive search and comparative assessment of phylogenetic markers have not been possible, largely due to the paucity of complete organelle genomes from phylogenetically closely related strains and species.

Nannochloropsis (Eustigmatophyceae) is a genus of unicellular photosynthetic microalgae, ranging in size from 2 to 5 μm and widely distributed in marine, fresh and brackish waters [1821]. It is an emerging model for photosynthetic production of oil (triacylglycerol; TAG) because of its ability to grow rapidly, synthesize large amounts of TAG and polyunsaturated fatty acids and tolerate a wide range of environmental conditions [2224]. Traditional approaches for identifying species in Nannochloropsis include morphology observation, pigment and fatty acid composition and 18S rRNA sequence analysis [25]. However previous analysis based on 18S (a nuclear gene) and rbcL (a pt gene) resulted in conflicting phylogenies among microalgae lineages that include Nannochloropsis[25]. Moreover, the intragenus relationship of Nannochloropsis spp. (especially among N. oculata, N. limnetica, N. granulata and N. oceanica) was inconsistent among 18S-based phylogenetic trees [20, 21]. In this study, using Nannochloropsis genus as a model, we assessed current intragenus phylotyping strategies by producing the complete pt and mt genomes of seven strains from six Nannochloropsis species. This first comprehensive dataset of organelle genomes for a microalgal genus was employed to dissect the evolutionary dynamics of organelle genomes at the genus, species and strain levels. Furthermore, the dataset enabled exhaustive exploration of novel phylogenetic markers suitable for inter-species and intra-species identification of microalgae. A new strategy for intragenus phylotyping of microalgae was therefore proposed.

Results and discussion

Global structural features of the organelle genomes in Nannochloropsis

To capture a comprehensive picture of microalgal organelle evolution at the strain-, species- and genus-levels, two N. oceanica strains (IMET1 and CCMP531) and one strain from each of other five known species in Nannochloropsis Genus: N. salina (CCMP537), N. gaditana (CCMP527), N. oculata (CCMP525), N. limnetica (CCMP505) and N. granulata (CCMP529) were chosen for sequencing (Methods). The pt and mt genomes of IMET1 were first assembled from whole-genome shotgun reads and then manually finished (Methods). Draft sequences of the other organelle genomes were extracted from whole-genome contigs by BLAST using IMET1 as a reference. Long-range PCR was used to test the orientation of large repeats and bridge the remaining gaps. The four junctions between the inverted repeats and single-copy segments were confirmed by sequencing PCR products. The seven sets of organelle genomes were manually inspected and completely finished (Table 1).
Table 1

Features of the Nannochloropsis organelle genomes (Plastid/Mitochondria)

 

N. oceanica IMET1

N. oceanica CCMP531

N. salina CCMP537

N. gaditana CCMP527

N. oculata CCMP525

N. limnetica CCMP505

N. granulata CCMP529

Size (bp)

117,548/38,057

117,634/38,057

114,883/41,907

114,867/42,206

117,463/38,444

117,806/38,543

117,672/38,791

LSC length (bp)

57,360/-

57,387/-

56,882/-

56,925/-

57,287/-

57,444/-

57,352/-

SSC length (bp)

45,235/-

45,240/-

47,364/-

47,698/-

45,227/-

45,259/-

45,247/-

IR length (bp)

7,485/-

7,496/-

5,320/-

5,122/-

7,476/-

7,549/-

7,527/-

Number of genes

160/63

160/63

156/64

156/64

160/63

160/63

160/63

Protein-coding genes

126/35

126/35

123/36

123/36

126/35

126/35

126/35

Structure RNAs

34/28

34/28

33/28

33/28

34/28

34/28

34/28

GC content (%)

33.6/31.9

33.6/31.9

33.1/31.4

33.0/31.4

33.4/31.8

33.5/31.7

33.4/32.0

Coding regions (%)

83.5/87.5

83.4/87.5

83.6/81.4

83.8/80.9

83.5/84.7

83.7/84.6

84.5/84.1

The circular pt genomes ranged in length from 114,867 to 117,806 bp, with an average GC content of 33.3%. Each pt genome was divided into four structural domains (Figure 1A): a large single copy (LSC), a small single copy (SSC), and inverted repeats (IR) which are present in pinpoint duplicate separated by the two single-copy regions. Such a quadripartite structure was previously found in many other algal pt genomes including the primary endosymbiotic Chlamydomonas reinhardtii and secondary endosymbiotic diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana[26, 27].
https://static-content.springer.com/image/art%3A10.1186%2F1471-2164-14-534/MediaObjects/12864_2013_Article_5240_Fig1_HTML.jpg
Figure 1

Plastid and mitochondrial genomes of seven Nannochloropsis strains. (A) Genome map of the complete pt sequence of N. oceanica IMET1. (B) Genome map of the complete mt sequence of N. oceanica IMET1. Genes shown outside the outer circle are transcribed clockwise and those inside are transcribed counter clockwise. Genes belonging to different functional groups are color-coded. Alignment of the Nannochloropsis plastid (C) and mitochondrial (D) genomes were also shown respectively. Genomic regions are color-coded as protein-coding (blue), rRNA/tRNA-coding (cyan) and conserved noncoding sequences (red). * CCMP527 and CCMP537 do not contain the region. **Two copies of cox1 are present in CCMP527 and CCMP537. ***In CCMP529, trnD-GUC was translocated to the interval between cox2 and rrn16.

Each pt genome encodes 152 unique genes including 26 tRNA, three rRNA and 123 proteins. In addition, eight genes (clpC-I, psbV, petJ, rrn16, trnI(gat), trnA(tgc), rrn23 and rrn5) were duplicated in the IR of CCMP505, CCMP525, CCMP531, CCMP529 and IMET1, while only five genes (rrn16, trnI(gat), trnA(tgc), rrn23 and rrn5) were duplicated in the IR of CCMP527 and CCMP537. The overall genome structure and gene content of Nannochloropsis pt are similar to those of T. pseudonana, P. tricornutum and Ectocarpus siliculosus[8, 26].

The circular mt genomes were 38,057 ~ 42,206 bp in length (Figure 1B), with an average GC content of 31%. The coding potential (for proteins and RNAs) was 80.9%-87.5%. Each consists of 63 genes and 5,422-9,600 bp non-coding sequences. The coding regions of the seven mt genomes were similar in size to those of T. pseudonana and P. tricornutum[28], yet the coding potential of Nannochloropsis mt genomes was higher, suggesting a relatively compact genome structure. Although most regions of the seven mt genomes were conserved, a pair of 3.5Kb-long, cox1-harboring repeats were found only in CCMP527 and CCMP537. Two segments of genes (rps8-rpl6-rps2-rps4, rpl2-rps19-rps3-rpl16) were conserved in previously reported stramenopiles including diatoms and brown algae. However in Nannochloropsis, the bacterial S10 operon block (rpl2-rps19-rps3-rpl16) was interrupted by rpl22 which inserted between rps19 and rps3.

Neither group I nor group II type introns were present in any of the Nannochloropsis organelle genes. Although the pt and mt genomes of CCMP529 and CCMP525 possessed increased numbers of small dispersed repetitive sequences compared to other Nannochloropsis pt and mt genomes, overall there were fewer repeats in the Nannochloropsis pt and mt genomes compared to those of diatoms. Moreover, the seven sets of pt and mt genomes were highly conserved in gene content and gene size (Figure 1A and B). In addition, the aligned regions (representing 96.89% and 97.16% of pt and mt genome lengths, respectively) showed high similarities (Figure 1C and D), with protein-coding regions generally more conserved than noncoding regions. Therefore, compactness in pt and mt genome organization is a shared feature among the seven Nannochloropsis strains.

Protein complements of the organelle genomes

Organelle genomes were thought to have undergone size- and functional reduction [29, 30], and frequent genetic exchange via endosymbiotic gene transfer (EGT) and homologous recombination [31, 32]. The present-day microalgal pt genomes mainly encode the components of photosystems, carbon assimilation, photosynthetic electron transport and gene translation machinery [33], while the mt genomes encode genes mostly involved in respiratory electron transport, oxidative phosphorylation, ATP synthesis and ribosome biosynthesis [5, 34]. In Nannochloropsis, brown algae and diatoms, nearly all the photosystem I and photosystem II genes encoded by the pt genomes were retained in a high degree of consistency (Figure 2). However, a photosystem I gene (psaM) was lost in Nannochloropsis pt genome. A photosystem II gene (psbM) was also absent in the pt genomes of Nannochloropsis as in other red algae, but was present in the green algae lineage [3539]. In addition, all of the cytochrome components found in other stramenopiles and the red lineage of algae (with the exception of petL) have been retained in Nannochloropsis pt genomes [4046].
https://static-content.springer.com/image/art%3A10.1186%2F1471-2164-14-534/MediaObjects/12864_2013_Article_5240_Fig2_HTML.jpg
Figure 2

Comparison of functional complements of organelle genomes. Among Nannochloropsis, brown algae and diatoms, shared and lineage-specific genes from plastid and mitochondrial genomes are compared via Venn diagrams. (A) Shared and lineage-specific genes of different plastid genomes. (B) Shared and lineage-specific genes of different mitochondrial genomes.

All of the ATP synthase genes (i.e.,atpA, atpB, atpD, atpE, atpF, atpG, atpH and atpI) were found in pt genomes of stramenopiles [8, 26, 47, 48], except the seven Nannochloropsis strains in which atpD was missing. The chlorophyll biosynthesis genes chlB, chlL and chlN were believed to be transferred to nucleus via EGT in Thalassiosira, Odontella and Heterosigma[26, 48], however four chlorophyll biosynthesis genes (chlB, chlI, chlL and chlN) are still present in the pt genomes of Nannochloropsis, Ectocarpus, Fucus, Vaucheria and Aureoumbra (Additional file 1: Table S1 and Figure 2) [8, 47]. RbcR (ycf30), which was usually encoded by pt genomes and autonomously governs transcription of Rubisco operon in red algae [49], is present in either pt or nuclear genomes of all known stramenopiles except Nannochloropsis. The organization of pt ribosomal-protein genes in Nannochloropsis was also similar to that of Thalassiosira, Odontella, Heterosigma, Ectocarpus and Fucus, although rpl9 and rpl24 were lost in the Nannochloropsis pt genomes. In addition, Synechococcus phage S-SM2 gene segment was found in the Nannochloropsis pt genomes, which is likely a signature of their cyanobacterial origin.

To identify the functional distinction of Nannochloropsis mt genomes, the gene repertoires of 25 algal mt genomes were compared (Additional file 1: Table S2). The protein profiles of Nannochloropsis mt genomes are largely similar to those of T. pseudonana and P. tricornutum. However, atp1 was retained only in Nannochloropsis mt genome (as are the cases in non-photosynthetic oomycetes such as Phytophthora spp. (another subgroup of stramenopiles) and Saprolegnia ferax) [50, 51]. In P. tricornutum and T. pseudonana, atp1 were thought to be transferred to the nuclear genome via endosymbiotic gene transfer [28]. Therefore Nannochloropsis exhibit an ancient feature, as is in the case of Phytophthora. On the other hand, the rrn5 gene which encodes the 5S rRNA component was lost in Nannochloropsis and Thalassiosira mt genomes (but present in other stramenopiles such as Heterosigma, Ectocarpus and Fucus), suggesting structural diversity in mitochondrial translation systems of stramenopiles.

One prominent feature shaping organelle evolution is the targeting of certain nuclear-encoded proteins to organelles, which functionally complement the reduced gene content of pt/mt genomes [52]. Analysis of subcellular localization (with PredAlgo; [53]) of 9,756 putative proteins in IMET1 suggested that 973 and 1,620 proteins were targeted to mt and pt, respectively. They mainly include tRNA synthetases, ribosomal proteins, DNA polymerases, eukaryotic translation factors, transcription factors, TATA-box binding proteins and ATP synthases. These proteins might participate in the transcription and translation of organelle-encoding genes. In addition, 26 pentatricopeptide repeat-containing proteins (PPRs) were annotated, with six (g707, g1422, g2743, g3644, g3813 and g10257) targeting to mt and five (g2850, g3634, g3565, g8976 and g9207) to pt. In higher plants PPRs were likely involved in RNA editing, a process of post-transcriptional modification of RNA primary sequences through nucleotide deletion, insertion, or modification [54, 55]. Thus in Nannochloropsis these proteins might participate in organelle RNA editing, which is an activity that has not been reported in microalgae.

Evolution of organelle genomes

Organelle-based phylogeny of Nannochloropsis

Phylogenetic trees based on pt genomes were constructed by Maximum-Likelihood (ML), Maximum Parsimony (MP) and Neighbor-Joining (NJ) methods using a dataset of 39 conserved proteins (7,406 amino-acid positions) encoded by the pt genomes of four taxa of red algae and 13 green-algal taxa (the green cladeViridiplantae as outgroup; Additional file 1: Figure S1A). Firstly, red- and green-algal pt genomes respectively formed a distinct cluster. Secondly, within the red algae lineage, stramenopile species formed a monophyletic cluster. Thirdly, Nannochloropsis as a representative of Eustigmatophyte was closely related to the diatom Thalassiosira. Similar analysis of the mt genomes using a dataset of seven protein-coding genes (2,101 amino-acid positions) present in the lineages of green and red algae revealed that the stramenopile lineages were clustered despite weak support among Nannochloropsis, Thalassiosira and Heterosigma (Additional file 1: Figure S1B). Thus both pt and mt genomes suggested that Nannochloropsis are phylogenetically close to diatoms and brown algae.

Evolution of conserved coding regions in organelle genomes

In the coding regions of the seven pt genomes, 11,749 SNPs were identified (6,856 transitions, 4,871 transversions and 22 indels), representing a density of 152 SNPs/Kb (Additional file 1: Figure S2). Each of these 22 indels was a triplet of bases, which may not disrupt the open reading frames, reflecting a mechanism by which the cells fine-tune structure and function of encoded proteins. Among the SNPs, 8,845 were synonymous and 2,904 nonsynonymous, with a nonsynonymous/synonymous rate of 0.326.

In the coding region of the seven mt genomes, 4,990 SNPs (2,985 transitions, 1,997 transversions and 8 indels) were identified. The SNP density was 200 SNPs/Kb (Additional file 1: Figure S2), which is about 1.3 times higher than that of their pt counterparts. Similar to pt, indels in mt coding regions did not disrupt the open reading frames. Several parameters describing SNPs were similar between pt and mt, including SNP density (0.152 in pt and 0.200 in mt) and transition/transversion (1.408 in pt and 1.495 in mt).

To test the selection pressure of organelle protein-coding genes, ratio of nonsynonymous (Ka) versus synonymous substitution (Ks) was analyzed, which suggested a strong negative selection might have occurred in Nannochloropsis organelles. Ka/Ks of most pt genes were below 0.09 (except psbK, psbN, psbW, atpF and ycf49; Additional file 1: Figure S3A). Among the 38 mt-encoded genes, Ka/Ks were mostly no more than 0.1 (except orf228, orf51, rps10, rpl5, atp8, rps14 and orf321, Additional file 1: Figure S3B). Notably, the mt orf228 (0.225) and pt psbK (0.098) were of the highest Ka/Ks ratios among all organelle genes. In Nannochloropsis, mean evolutionary rates of pt genes (at 0.031) and mt genes (at 0.064) were significantly lower than those of nuclear genes (at 0.093) (Additional file 1: Figure S3C; [56]), suggesting pt genomes have been evolving at a rate 50% and 33% of that of mt and nuclear genomes respectively.

Hotspots of structural and sequence polymorphism in plastid and mitochondrial genomes

Hotspot of structural polymorphism in plastid genomes

Despite the slow evolution of the Nannochloropsis organelle genomes, a single hotspot of structural polymorphism was found in the pt genomes. A large inverted repeat (IR), as a canonical structure of pt genome, was present in the vast majority of higher plants and algae studied so far [57]. In many stramenopile algae such as H. akashiwo, Thalassiosira oceanica and Skeletonema costatum, the IRs are large in size (22 kb, 18 kb and 20 kb, respectively) and include 17 ~ 20 genes (including rRNA genes such as rrn5, rrn16 and rrn23, ribosomal protein genes such as rpl32, rpl21 and rpl34, and photosynthetic genes such as psbA, psbY and psbC; [48, 58]). However, a pair of short IRs (IRa and IRb) each of 5,122 ~ 7,380 bp in size was found in each of the Nannochloropsis pt genomes (Figure 3), suggesting dramatic IR-size contraction. This may be due to the fewer number of genes harbored in the IRs: the ribosomal operon (rrn5, rrn16 and rrn23) was present while ribosomal protein and photosystem genes were absent in each of the Nannochloropsis strains; moreover, psbV, petJ and clpC-I (which were absent in the IRs of diatoms and brown algae [8, 26]) were present in only a subset of the strains (Figure 3A).
https://static-content.springer.com/image/art%3A10.1186%2F1471-2164-14-534/MediaObjects/12864_2013_Article_5240_Fig3_HTML.jpg
Figure 3

Hotspots of structural polymorphism that drive the diversification of organelle genomes. The phylogenetic tree on the left was based on whole-genome alignment of the seven complete mt genomes. (A) Structural and sequence polymorphism of inverted repeat in the pt genomes. (B) Structural and sequence polymorphism of the hotspot in the mt genomes. Within each sub-figure, genomic features were drawn proportionally to their actual length. Grey solid lines, inserted for alignment purposes, were not actual sequences.

Interestingly, among the different Nannochloropsis species, evolutionary patterns of the two IRs (IRa and IRb; Figure 3A) were distinct: IRb were highly conserved, while IRa were extraordinarily hypervariable. There were three types of IRa in Nannochloropsis (Figure 3A): (i) Type I, found in CCMP527 and CCMP537, did not contain a region of petJ-psbV-clpC-I. (ii) Type II, found in CCMP529, CCMP525 and CCMP505, possessed a petJ-psbV-clpC-I that was an exact duplicate of that in IRb. (iii) Type III, present in CCMP531 and IMET1, encompassed a fragmented petJ-psbV-clpC-I, which differed from that in IRb due to a disruption of open reading frame (Additional file 1: Figure S4). The particular type of IRa that a given strain carries appeared to correlate with its specific lineage in Nannochloropsis genus, suggesting ancient IRa-diversifying events that likely have driven the speciation from the common ancestor of present-day Nannochloropsis strains.

Alignment of Type II and Type III IRa (in the five strains) revealed that the structural polymorphism leading to different IRa types was mainly due to hyper variation of sequences in two of the genes: clpC-I_1 and psbV_1. Length of the two genes varied greatly as different start and stop codons were adopted among the strains (Additional file 1: Figure S4). Compared to Type II (CCMP529, CCMP525 and CCMP505), two bases were missing in Type III (IMET1 and CCMP531), resulting in a truncated psbV_1. Moreover, clpC-I_1 ORFs were altered due to several intragenic insertions and deletions.

In the pt genome of higher plants, the border between SSC, LSC and IR exhibited a large degree of variation, in that many genes located in the junction are often lost and thus IR is reduced [59, 60]. IR is important in higher plants because (1) it might stabilize ptDNA organizations [61], (2) it could mediate intra-molecular homologous recombination and (3) it may increase the relative copy number of rRNA genes [40]. The identification of a variable region located in the junction between IR and SSC in Nannochloropsis suggested an evolutionarily conserved link in hypervariable loci between higher plants and microalgae, however their differences are profound (Figure 3A): (i) Despite the presence of two IR copies in all higher plants and microalgae studied so far, the structural polymorphism is symmetric in higher plants (i.e., both IRs can be polymorphic; [57]) yet is strictly asymmetric in Nannochloropsis: only IRa were found as polymorphic while IRb were strictly conserved across all the seven strains tested. (ii) Unlike higher plants where the outer-most gene of IR underwent contraction [62, 63], the internal gene of IR underwent contraction in Nannochloropsis. (iii) In higher plants the mechanism driving IR expansion/contraction was believed to be gene conversion and double-strand DNA breaks based on the observation of recombination points and tRNA duplication in IR [57, 64]; however these observations were absent in any of the Nannochloropsis IR, suggesting a different and previously unappreciated mechanism for IR diversification in microalgae.

Single hotspot of structural polymorphism in mitochondrial genomes

A single hotspot of sequence variation was also discovered in mt genomes of the seven Nannochloropsis strains (Figure 1D). A pair of large repeats (~3,500 bp long), arranged as direct repeats, was found in N. gaditana CCMP527 and N. salina CCMP537. However only one such copy was present in each of the other strains (Figure 3B). Each of these regions was amplified by long-range PCR and fully sequenced to confirm the copy number variation.

Interestingly, in N. gaditana CCMP527 and N. salina CCMP537 mt genomes, each copy in the pair of large repeats harbors one intron-free cox1 (encoding cytochrome c oxidase I). Such a duplication producing two 99%-identical copies of cox1 was not found in either diatoms or brown algae. In diatom mt genomes (Synedraacus, T. pseudonana and P. tricornutum), a single copy of cox1 (not found within repeats) contained ORFs-harboring introns [28, 65], while in brown algae (Dictyota dichotoma, Fucus vesiculosus and Desmarestia viridis) a single intron-less cox1 was present [66]. Therefore the observed direct repeats that harbor cox1 was likely due to a duplication event in Nannochloropsis before the branching point of N. gaditana and N. salina (Figure 3B; the presence of two large repeats was also noted in N. gaditana CCMP526 mt genome [20]). Whether the event conveyed any biological consequence to the lineage is unknown.

Strategy for sensitive and reliable intragenus phylotyping

Inter-species markers

Molecular markers (DNA barcoding) are a powerful taxonomy tool as compared to morphology-based classification [67]. The seven pairs of complete pt and mt genomes in Nannochloropsis enable the first exhaustive search and full assessment of organelle genes for introgenus phylotyping in microalgae. A pt (and mt) genome based reference phylogenetic tree was first constructed based on the concatenated nucleotide sequences of all protein-coding genes on the pt (and mt) genomes of the seven Nannochloropsis strains. Then each orthologous set of the intragenic and intergenic sequences from mt and pt genomes was extracted (a total of 230 individual regions) for construction of individual sequence based phylogenetic trees (Methods). Those orthologous sequence-sets consistent with the reference trees were analyzed further for their sensitivity and specificity in phylotyping. The Euclidean distance between two trees (each represented by one p-distance matrix) was used to quantify the similarity of the encoded phylogeny (Methods).

A total of 54 candidate phylogenetic markers were identified whose nucleotide-sequence-based phylogenetic trees were consistent with the reference trees (Figure 4A). Forty-nine potential markers provided on average 1.5 times higher resolution (with SNP-density above 27%) than the seven commonly used phylogenetic markers (ITS2, cox1, cob, cox2, rbcL, rrn16_mt and 18S) in the interspecies taxonomy (Figure 4A; Table 2). Of these 49 candidates, twelve exhibited higher sensitivity than ITS2, which is the most commonly used microalgal phylogenetic marker at present and in effect provided the highest resolution among presently used phylogenetic markers in microalgae (Figure 4A; Table 2). Among these 12 markers, eight belonged to coding regions and another four to noncoding regions. Those encoded by mt included rps14, rps4, rpl6, rpl5, orf53, rpl14 and rps14-atp9 while those encoded by pt were ycf34, clpA, ycf34-psbD, trnQ(uug)-groEL and trnL(uag)-trnW(cca). Among them, rps14_mt shows the highest resolution with interspecies difference of 37.71% and the Euclidean distance of 0.403 (Table 2), representing a sensitivity of 36.3% higher than ITS2 (interspecies difference of 27.67%).
https://static-content.springer.com/image/art%3A10.1186%2F1471-2164-14-534/MediaObjects/12864_2013_Article_5240_Fig4_HTML.jpg
Figure 4

New inter-species phylogenetic markers for Nannochloropsis. (A) Comparison of the sensitivity and specificity among candidate regions/markers for inter-species phylogenetic reconstruction. The regions/markers derived from genic and intergenic sequences of pt and mt of the seven strains were listed on the X axe. The % nucleotide difference of each region/markers was calculated as the index for sensitivity. The average bootstrap values of all branches in the sequence-specific phylogenetic trees (maximum parsimony; MP) were shown as the index for specificity. *: presently used markers. Arrow: candidate phylogenetic markers that provided higher sensitivity than ITS2. (B) Phylogeny of Nannochloropsis strains reconstructed from the presently used and our proposed sequence markers. Phylograms were derived using MP analysis. *: reference trees based on the concatenated sequences of all pt/mt protein-coding genes. **: presently used markers. Blank box: presently used markers whose phylogenetic reconstructions are not consistent with the reference trees. CCMP537 was assigned as the root for each of the trees.

Table 2

Comparison of candidate markers for interspecies phylotyping in Nannochloropsis genus

Gene

Origin

Size

Difference*%

Euclidean distance***

SD****

Interspecies

Intraspecies**

rps14_mt

mt

297

37.71

0.34

0.397

0.046

ycf34

pt

252-261

33.72

0.00

0.409

0.038

rps14_mt-atp9

mt

102-195

32.83

0.00

0.500

0.048

rps4_mt

mt

726

32.09

0.41

0.260

0.031

rpl6_mt

mt

552

32.07

0.36

0.263

0.036

ycf34-psbD

pt

204-224

31.72

0.13

0.211

0.018

rpl5_mt

mt

525-540

31.67

0.57

0.299

0.037

clpA

pt

447-450

29.33

0.22

0.328

0.040

orf53

mt

156-162

29.01

0.00

0.258

0.041

rpl14_mt

mt

381

28.35

0.79

0.167

0.023

trnQ(uug)-groEL

pt

269-274

28.00

0.00

0.253

0.032

trnL(uag)-trnW(cca)

pt

648-673

27.99

0.46

0.234

0.021

ITS2

nc

385-499

27.67

0.52

-

-

rps2-rpoC2

pt

162-193

25.63

1.04

0.281

0.036

rpl16_mt

mt

432

24.54

0.00

0.098

0.018

nad4

mt

1578

23.76

0.63

0.038

0.008

atp9-rps13_mt

mt

215-233

23.50

0.43

0.107

0.022

ccs1

pt

1260-1272

22.90

0.00

0.111

0.010

rpl1

pt

687

22.56

0.00

0.114

0.012

ilvB

pt

1479

21.37

0.00

0.084

0.008

rpl6

pt

543

20.99

0.18

0.054

0.008

ccsA

pt

918-921

20.96

0.22

0.105

0.015

chlN

pt

1326-1335

20.37

0.00

0.062

0.008

ycf59

pt

1044

20.31

0.00

0.054

0.007

cbbX

pt

1011

19.88

0.10

0.053

0.008

chlL

pt

867

19.26

0.12

0.026

0.005

rps4

pt

627

19.14

0.00

0.029

0.005

rpoB

pt

3168

19.10

0.03

0.034

0.004

dnaK

pt

1809

19.02

0.11

0.030

0.004

clpC-II

pt

1155

18.87

0.09

0.020

0.004

atp8-orf228

mt

1294-1413

18.55

0.29

0.140

0.012

chlB

pt

1521-1524

18.37

0.00

0.023

0.005

ThiS-rbcL

pt

314-330

18.15

0.00

0.052

0.011

nad3

mt

369

17.89

0.27

0.108

0.009

ycf36-petN

pt

391-397

17.84

0.00

0.055

0.010

rps16

pt

255

17.65

0.00

0.119

0.025

cox1

mt

1521

17.55

0.13

0.144

0.018

ycf49

pt

294-297

16.84

0.00

0.035

0.007

ycf3

pt

504

16.47

0.00

0.059

0.009

rpl23

pt

360

16.11

0.00

0.062

0.007

cob

mt

1161

15.25

0.00

0.207

0.024

ilvB-rpl35

pt

504-512

15.04

0.00

0.091

0.019

nad2

mt

1482

14.71

0.34

0.039

0.005

hyp.protein

pt

645-699

14.57

0.14

0.086

0.009

rps13

pt

372

14.25

0.00

0.107

0.011

cox3

mt

813

13.65

0.12

0.221

0.021

cox2

mt

909

13.53

0.00

0.215

0.018

tufA

pt

1230-1275

13.36

0.08

0.067

0.009

rps7

pt

456-477

12.55

0.00

0.160

0.017

atpG

pt

477-483

11.18

0.00

0.171

0.014

rbcL

pt

1464

10.04

0.00

0.213

0.021

psaE

pt

204

9.80

0.00

0.199

0.017

rrn23_mt

mt

2235

8.18

0.18

0.365

0.038

rrn16_mt

mt

1491-1494

6.87

0.00

0.381

0.035

psbA

pt

1083

4.16

0.00

0.364

0.036

18S

nc

1790-1792

2.51

0.16

-

-

Note: “-” indicated that the gene was not encoded by the organelle genomes. *Difference = SNP/Size; **Difference between IMET1 and CCMP531; ***A measure of the similarity between two trees calculated from the two p-distance matrixes that each represents a phylogenetic tree; ****Square deviation of the corresponding p-distance between two matrixes.

Furthermore, these new sequence markers yielded a phylogeny consistent with the reference trees. Among those presently used markers, however, only cox1 (but not psbA, rbcL, cox2 and rrn16_mt) produced a phylogeny in consensus with the reference trees (Figure 4B). The psbA, rbcL, cox2 and rrn16_mt are not suitable for distinguishing closely related species due to their low SNP-density (4.16%-13.53% among the six Nannochloropsis species; Table 2), especially among N. oceanica (IMET1 and CCMP531), N.oculata CCMP525 and N.granulata CCMP529 (SNP-density ranging from 0.64% to 3.74%). Thus the newly identified candidate markers may be more suitable than current markers for species classification in Nannochloropsis.

To test their wider applicability, these new candidate markers (rps14, rps4, rpl6, rpl5, orf53, rpl14, ycf34 and clpA) were searched in available organelle genomes from other algal genera: they were rarely present in mt or pt genomes of the green lineage (e. g. Chlamydomonas, Volvox and Dunaliella). ITS2 and 18S rRNA are universally found and widely used for species-level identification in higher plants and algae, however their resolution is limited as shown in this study. Moreover, being localized on the nuclear genomes, ITS2 and 18S rRNA genes can become divergent paralogous copies as a result of incomplete concerted evolution and sexual incompatibility among individuals [68, 69]. Our proposed new organelle markers provide certain advantages: higher discriminatory power, clonal modes of evolution and non-Mendelian inheritance [70, 71]. Our analysis also suggested different microalgal lineages may require different sets of organelle marker genes for reliable and sensitive intragenus phylotyping.

Intra-species phylogenetic markers

Intraspecies divergence of microalgal genomes can be significant: despite their close phylogenetic relationship, the comparison of nuclear genomes revealed significant differences in coding sequences between the two N. oceanica strains IMET1 and CCMP531 (2.6% IMET1-specific genes; Methods). Therefore sensitive and reliable phylogenetic markers for intraspecies phylotyping are crucial. We tested the presently used markers and the candidate species-level markers identified above on the two N. oceanica strains IMET1 and CCMP531. All presently used phylogenetic markers were not sufficiently sensitive to distinguish IMET1 and CCMP531 due to their low SNP density (e.g. 0, 0, 2, 1 SNPs were respectively detected in cox1, rbcL, 18S and ITS2). In fact, between IMET1 and CCMP531, merely 87 and 129 SNPs were found in pt and mt genomes respectively. Moreover the SNP loci were physically distributed in a scattered manner, confounding their utilization via PCR followed by sequencing for phylogenetic analysis (Figure 1C, 1D).

To identify the most variable regions between IMET1 and CCMP531, the full lengths of IMET1 and CCMP531 organelle genomes were aligned. Only three highly variable regions (rps11_mt-nad4, rps3_mt and cox2-rrn16_mt) were found (Table 3), each with at least 5 SNPs per 1,000 bases. There were 8, 7 and 14 SNPs in rps11-nad4, rps3 and cox2-rrn16, respectively and all these SNPs were synonymous substitutions. On IMET1 and CCMP531, the combined sequences of these three regions provided at least two-fold higher resolution than the above-mentioned presently used and new markers (Figure 5A). Thus combination of the three regions, as Multiple-Locus Sequence Typing (MLST) markers, can provide higher resolution for intraspecies discrimination.
https://static-content.springer.com/image/art%3A10.1186%2F1471-2164-14-534/MediaObjects/12864_2013_Article_5240_Fig5_HTML.jpg
Figure 5

Multiple locus sequence tag for high-resolution phylotyping of three closely related N. oceanica strains . (A) Comparison of the sensitivity and specificity among candidate regions/markers for intra-species discrimination. The regions/markers derived from genic and intergenic sequences of pt and mt of the N.oceanica strains (IMET1 and CCMP531) were listed on the X axe. The % nucleotide difference of each region/markers was calculated as the index for sensitivity. The bootstrap values of the IMET1-CCMP531 branches in the sequence-specific phylogenetic trees (maximum parsimony; MP) were shown as the index for specificity. *: presently used markers. Arrow: the three regions (rps11_mt-nad4, rps3_mt and cox2-rrn16_mt) for Multiple-Locus Sequence Typing (MLST). (B) Phylogeny of IMET1, CCMP531 and CCMP1779 as reconstructed from three MLST loci. Grey background: identical loci in two strains. Blank box and *: A base that is different between CCMP1779 and IMET1.

Table 3

Intraspecies phylogenetic markers of the three Nannochloropsis oceanica strains of CCMP531, CCMP1779 and IMET1

Region

Location

CCMP531

IMET1

CCMP1779

Synonymous/nonsynonymous

rps11_mt-nad4

8512

G

A

A

-

 

8513

C

T

T

-

 

8555

-

G

G

-

 

8577

C

A

A

-

nad4

8710

T

C

C

synonymous

 

8720

G

A

A

synonymous

 

8941

C

A

A

synonymous

 

8956

C

T

T

synonymous

rps3_mt

21403

G

C

C

synonymous

 

21488

T

C

C

synonymous

 

21583

A

T

T

synonymous

 

21637

A

G

G

synonymous

 

21682

A

G

G

synonymous

 

21940

G

A

A

synonymous

cox2-rrn16_mt

34621

G

T

T

-

 

34727

A

G

G

-

 

34756

C

T

T

-

 

34801

A

G

G

-

 

34866

-

T

T

-

 

34867

-

T

T

-

 

34870

-

A

A

-

 

34871

-

A

A

-

 

34880

A

-

-

-

 

34885

A

-

-

-

 

34898

T

C

C

-

 

34927

-

A

A

-

 

34934

C

T

T

-

 

34971

G

A

A

-

 

34995

A

A

T

-

Note: “-” indicated that the mutation was located at a non-protein-coding region.

To further test whether these MLST markers can be used for intraspecies phylogenetic reconstruction, we PCR-amplified and sequenced the three candidate MLST loci in CCMP1779, another N.oceanica strain whose nuclear genome along with partial plastid and mitochondrial genomes was recently released [21]. Despite significant divergence in the encoded proteome between IMET1 and CCMP1779 (1.8% IMET1-specific genes and 7.2% CCMP1779-specific genes; Methods), one of the presently used markers and our newly proposed Nannochloropsis species-level markers were able to discriminate the two strains. However, one high-quality SNP (confirmed by re-sequencing on both directions) was found in the cox2-rrn16_mt region of the IMET1 and CCMP1779 mt genomes. Thus our proposed MLST marker-set consisting of rps11_mt-nad4, rps3_mt and cox2-rrn16_mt, were able to discriminate the three closely related N. oceanica strains (Figure 5B). Moreover the reconstructed phylogeny based on the marker-set was consistent with that based on the whole-genome comparison (Figure 5B). These findings thus suggested a strategy for high-resolution intra-species typing of microalgae.

On the other hand, a total of 26 simple short repeats (SSRs; or microsatellites) were identified in the organelle genomes of IMET1 and CCMP531. Eleven of these SSRs were from pt genomes and 15 from mt genomes (Table 4). Between the two strains, 11 of the SSRs were shared. However two strain-specific SSRs were found in IMET1 pt genome: one poly (G) 14 mononucleotide intergenic sequence between psbV and clpB and one multiple (TA) 7 dinucleotide sequence located in trnK(uuu)-trnG(gcc). Moreover a specific poly (A) 12 mononucleotide genic sequence located in rps3 was found specifically in CCMP531 mt genome. SSRs offer potential advantage for strain discrimination as they are locus-specific, PCR-friendly and highly polymorphic [72]. Thus the three specific SSRs identified can be used to identify CCMP531 and IMET1. As SSR loci can be strain-specific, a searchable database of microalgal SSRs such as those reported here can be established for high-resolution microalgal strain-typing.
Table 4

Simple sequence repeat (SSRs) for intra-species discrimination

Repeat

Length

Region

Locus

Organelle

Strain

A

10

Intergenic

psbB-petF

pt

IMET1, 531

T

10

Genic

rps12

pt

IMET1, 531

T

10

Intergenic

rpl16-rps3

pt

IMET1, 531

A

10

Intergenic

secA-rpl34

pt

IMET1, 531

G *

14

Intergenic

psbV-clpC

pt

IMET1

TA *

14

Intergenic

trnK(uuu)-trnG(gcc)

pt

IMET1

T *

10

Intergenic

tufA-rps7

pt

531

T

11

Genic

coxI

mt

IMET1, 531

A

11

Genic

atp1

mt

IMET1, 531

A

10

Intergenic

orf321

mt

IMET1, 531

A

10

Genic

rpl14

mt

IMET1, 531

T

10

Intergenic

trnD(gtc)-trnG(tcc)

mt

IMET1, 531

A

10

Genic

rps13

mt

IMET1, 531

T

10

Intergenic

trnK(ttt)-nad4L

mt

IMET1, 531

A *

12

Genic

rps3

mt

531

* SSRs that are specifically present in IMET1 or CCMP531.

Conclusion

The complete organelle genome sequences of seven strains from six Nannochloropsis species enabled the first systematic analysis of organelle evolution within a microalgal genus. Both pt and mt genomes of Nannochloropsis were among the most compact known in stramenopiles, with the absence of introns, tight packaging of genes and a paucity of disperse repeats. Being highly conserved in gene content, gene size and gene order and strongly negatively selected in protein-coding regions, the pt and mt genomes were evolving at a rate 33% and 66%, respectively, of that occurred in nuclear genomes.

In Nannochloropsis, the pt genome diversification was driven by asymmetric evolution of two copies of inverted repeats (IRa and IRb), while mt genome evolution was shaped by a single evolution hotspot varied in copy-number of a 3.5Kb-long, cox1-harboring repeat. Genomic engineering of plastids, the primary energy production site in the cell, offers many opportunities to improve algal feedstock productivity. Transgene integration into a plastid genome may occur via homologous recombination of flanking sequences used in vectors. However, plastid transformation vectors are usually species-specifically designed, leading to low efficiency and even intractability in other species [73]. The high degree of conservation of pt and mt genomes suggested the feasibility of “universal vector” based on the highly conserved intergenic spacer regions. On the other hand, discovery of the evolutionary hotspots (i.e. IR in the pt genomes and the large repeats harboring cox1 in the mt genomes) and the mechanism underlying the polymorphism should guide rational genetic engineering of plastids for possible phenotypic trait improvement and even for de novo design of organelle genomes for a synthetic algal cell [74].

This organelle phylogenome dataset, the most comprehensive for a microalgal genus to-date, also provided a first opportunity to evaluate existing phylogenetic reconstruction and strain-typing strategies in microalgae. Our analysis showed that, despite their wide uses in distinguishing among different microalgal genera, existing organelle gene markers (cox1, cox2, psbA, rbcL and rrn16_mt) and nuclear gene markers (ITS2 and 18S) have limited power in distinguishing closely related species due to the low SNP densities in these genes. Exhaustive searches and evaluation of all coding and non-coding sequences on the organelle genomes enabled us to propose the strategy for intra-genus phylotyping of microalgae: (i) twelve sequence markers of higher sensitivity than ITS2 (the most widely used microalgal phylogenetic marker at present) for interspecies phylogeny, (ii) genus-specific multi-locus sequence tag of rps11_mt-nad4, rps3_mt and cox2-rrn16_mt for intraspecies phylogenetic analysis, and (iii) several SSR loci for reliable strain identification. As a result, new community resources such as databases of genus-specific phylogenetic markers and strain-identifier sequences (e.g. SSRs) should be developed for microalgae. The intragenus analysis strategy developed in this study may be generally applicable to other microalgal genera. As screening, development and protection of microalgae frequently demand species-, strain- and even isolate-level resolution, our findings may be valuable to the expanding algal biotechnology community.

Methods

Algal culture and genomic DNA extraction

Nannochloropsis strains including N. oceanica CCMP531, N. salina CCMP537, N. gaditana CCMP527, N. oculata CCMP525, N. limnetica CCMP505 and N. granulata CCMP529 were from the Provasoli-Guillard National Center for Culture of Marine Phytoplankton (CCMP). Nannochloropsis oceanica strain IMET1 was from the University of Maryland Biotechnology Institute. All of them were cultivated in liquid modified f/2 medium containing sterilized seawater (salinity1.5%, w/v) at 25°C under light–dark cycles of 12 h:12 h at an exposure intensity of 40 μmol/m2sec. Genomic DNA was then extracted via a published protocol [75].

Sequencing and finishing of the 14 organelle genomes

All the organelle genomes were extracted from the whole-genome sequencing project of seven Nannochloropsis strains. Firstly, the high-quality draft genome sequence of Nannochloropsis oceanica strain IMET1 was generated using a hybrid sequencing and assembly strategy that combines the powers of pair-ended reads from 454 and Solexa. The pt and mt genomes of IMET1 were assembled from whole-genome shotgun reads using Newbler-v2.5.3 (Roche, Switzerland) and SOAPaligner-v2.21 [76] and then were manually finished using the Phred-Phrap-Consed package [7779]. The IMET1 pt and mt sequences were circled into complete genomes with the support of high-quality reads. The IMET1 organelle genomes then served as a reference for assembly of other organelle genomes. Secondly, draft sequences of the other six Nannochloropsis strains were extracted from their whole-genome assemblies by blast using IMET1 sequence as a reference. Long Range PCR Kit (Takara) was employed using total genomic DNA as template to identify, confirm or bridge the gaps. Direction of single- and large-copy segments were also confirmed using PCR. Moreover the four junctions between the single-copy segments and inverted repeats were confirmed based on PCR product sequencing. Sequences from PCR products were assembled into the shotgun assemblies using CodonCode Aligner-v3.7.1 (CodonCodeCorp., USA).

Sequence annotation and analysis

The organelle genomes were firstly annotated using DOGMA [80]. Genes not detected by DOGMA were identified by Blastx (http://www.ncbi.nlm.nih.gov/BLAST) and ORF Finder (http://www.ncbi.nlm.nih.gov/gorf). Ribosomal RNA genes were identified using RNAmmer [81]. Transfer RNA genes were identified using DOGMA and tRNAscan-SE 1.21 [82], and then confirmed by ERPIN [83] and TFAM Webserver-v1.3. Short repeat sequences including direct and inverted repeats in pt genome were discovered using REPuter [84] at repeat length of at least 30 bp and with a Hamming distance of 3. Tandem repeats were detected by Tandem Repeat Finder V4.0.4. Multiple sequence alignments of pt or mt genomes were performed via MEGA-v4.1-ClustalW [85]. Full alignments with annotations were visualized with VISTA [86]. The genetic divergence represented by p-distance was calculated by MEGA-v4.1. The circular gene maps of organelle genomes were drawn by GenomeVx [87] followed by manual modification.

Phylogenetic analysis

To reconstruct whole-organelle based phylogeny, pt and mt datasets were assembled on the basis of genomes available in public databases and those newly sequenced in this study. Deduced amino acid sequences of each set of orthologous protein-coding genes were aligned using MUSCLE 3.7 (multiple sequence alignment by log-expectation) [88]. The ambiguously aligned regions in each alignment were removed and optimized using GBLOCKS 0.91b [89] with the -b2 option (minimal number of sequences for a flank position) set to 13. The concatenated protein alignments were used to infer phylogenetic trees using PhyML 2.4.4 [90] with the approximate likelihood ratio test [91]. Maximum Parsimony (MP) and Neighbor-Joining (NJ) analysis was performed with MEGA4.1 [85].

Estimation of nucleotide substitution rate

A total of 37 mt and 110 pt protein-coding sequences among the seven Nannochloropsis strains were respectively aligned with MEGA-v4.1-ClustalW, using a maximum of 1,000 iterations for alignment refinement. Nonsynonymous substitutions rate (Ka), synonymous substitutions rate (Ks) and their ratio were estimated using the yn00 program of the PAML 4.4c [92] with the codon frequencies model F3 × 4 as substitution matrix. Ka and Ks were determined by the Nei-Gojobori method as implemented in yn00.

Identification of phylogenetic markers

To mine the SNPs, the two sets (pt and mt) of genome sequences were respectively aligned with MEGA4.1-ClustalW. The SNPs were validated manually. To construct the phylogeny based on individual sequences, a total of 230 pt and mt coding and non-coding regions were employed to reconstruct phylogenetic trees by Maximum Parsimony (MP) via Phylip-v3.69 [93]. CCMP537 was assigned as the root for each of the trees. Then each of the sequence-based phylogenetic trees was individually compared with the corresponding pt or mt reference trees by Topd (TOPological Distance; [94]). The Euclidean distance of p-distance matrixes was used as the quantitative measure of the similarity between two trees (e.g. the test tree and the reference tree). Those trees consistent with reference trees were extracted to further analyze their power of discrimination.

To screen for intra-species markers for the N. oceanica strains, the organelle sequences of IMET1 and CCMP531 were aligned with MEGA4.1-ClustalW. Scatter diagram of variable-site distribution was drawn by DnaSP 4.10.7 [95], with a window length of 500 sites and a step size of 25 sites. Those sections with S-value of at least 6 were selected as highly variable regions. SNPs were validated by manual inspection and if necessary via targeted sequencing.

Accession numbers

The complete sequences of the 14 plastid and mitochondrial genomes were deposited at GenBank: KC598086 and KC568456 for IMET1, KC598085 and KC568456 for CCMP529, KC598088 and KC568458 for CCMP537, KC598089 and KC568459 for CCMP505, KC598087 and KC568460 for CCMP525, KC598084 and KC568461 for CCMP527, and KC598090 and KC568462 for CCMP531.

Notes

Declarations

Acknowledgements

This work was supported by National Basic Research Program from Ministry of Science and Technology of China (2012CB721101), International Research Collaboration Program (31010103907) from National Natural Science Foundation of China and International Innovation Partnership Program from Chinese Academy of Sciences. Correspondence and requests for materials should be addressed to J.X. (xujian@qibebt.ac.cn).

Authors’ Affiliations

(1)
BioEnergy Genome Center and Shandong Key Laboratory of Energy Genetics, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences
(2)
University of Chinese Academy of Sciences
(3)
Laboratory for Algae Research and Biotechnology, Department of Applied Biological Sciences, Arizona State University
(4)
Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science

References

  1. Tirichine L, Bowler C: Decoding algal genomes: tracing back the history of photosynthetic life on Earth. Plant J. 2011, 66 (1): 45-57. 10.1111/j.1365-313X.2011.04540.x.View ArticlePubMedGoogle Scholar
  2. Georgianna DR, Mayfield SP: Exploiting diversity and synthetic biology for the production of algal biofuels. Nature. 2012, 488 (7411): 329-335. 10.1038/nature11479.View ArticlePubMedGoogle Scholar
  3. Dyall SD, Brown MT, Johnson PJ: Ancient invasions: from endosymbionts to organelles. Science. 2004, 304 (5668): 253-257. 10.1126/science.1094884.View ArticlePubMedGoogle Scholar
  4. Bodyl A, Mackiewicz P, Stiller JW: The intracellular cyanobacteria of Paulinella chromatophora: endosymbionts or organelles?. Trends Microbiol. 2007, 15 (7): 295-296. 10.1016/j.tim.2007.05.002.View ArticlePubMedGoogle Scholar
  5. Gray MW, Burger G, Lang BF: The origin and early evolution of mitochondria. Genome Biol. 2001, 2 (6): REVIEWS1018PubMed CentralView ArticlePubMedGoogle Scholar
  6. Gray MW: Origin and evolution of organelle genomes. Curr Opin Genet Dev. 1993, 3 (6): 884-890. 10.1016/0959-437X(93)90009-E.View ArticlePubMedGoogle Scholar
  7. Timmis JN, Ayliffe MA, Huang CY, Martin W: Endosymbiotic gene transfer: organelle genomes forge eukaryotic chromosomes. Nat Rev Genet. 2004, 5 (2): 123-135.View ArticlePubMedGoogle Scholar
  8. Le Corguille G, Pearson G, Valente M, Viegas C, Gschloessl B, Corre E, Bailly X, Peters AF, Jubin C, Vacherie B: Plastid genomes of two brown algae, Ectocarpus siliculosus and Fucus vesiculosus: further insights on the evolution of red-algal derived plastids. BMC Evol Biol. 2009, 9: 253-10.1186/1471-2148-9-253.PubMed CentralView ArticlePubMedGoogle Scholar
  9. Hollingsworth PM, Graham SW, Little DP: Choosing and using a plant DNA barcode. Plos One. 2011, 6 (5): e19254-10.1371/journal.pone.0019254.PubMed CentralView ArticlePubMedGoogle Scholar
  10. Robba L, Russell SJ, Barker GL, Brodie J: Assessing the use of the mitochondrial cox1 marker for use in DNA barcoding of red algae (Rhodophyta). Am J Bot. 2006, 93 (8): 1101-1108. 10.3732/ajb.93.8.1101.View ArticlePubMedGoogle Scholar
  11. Saunders GW, McDevit DC: Methods for DNA barcoding photosynthetic protists emphasizing the macroalgae and diatoms. Methods Mol Biol. 2012, 858: 207-222. 10.1007/978-1-61779-591-6_10.View ArticlePubMedGoogle Scholar
  12. Lane CE, Lindstrom SC, Saunders GW: A molecular assessment of northeast Pacific Alaria species (Laminariales, Phaeophyceae) with reference to the utility of DNA barcoding. Mol Phylogenet Evol. 2007, 44 (2): 634-648. 10.1016/j.ympev.2007.03.016.View ArticlePubMedGoogle Scholar
  13. Kucera H, Saunders GW: Assigning morphological variants of Fucus (Fucales, Phaeophyceae) in Canadian waters to recognized species using DNA barcoding. Botany-Botanique. 2008, 86 (9): 1065-1079. 10.1139/B08-056.View ArticleGoogle Scholar
  14. McDevit DC, Saunders GW: On the utility of DNA barcoding for species differentiation among brown macroalgae (Phaeophyceae) including a novel extraction protocol. Phycological Res. 2009, 57 (2): 131-141. 10.1111/j.1440-1835.2009.00530.x.View ArticleGoogle Scholar
  15. McDevit DC, Saunders GW: A DNA barcode examination of the Laminariaceae (Phaeophyceae) in Canada reveals novel biogeographical and evolutionary insights. Phycologia. 2010, 49 (3): 235-248. 10.2216/PH09-36.1.View ArticleGoogle Scholar
  16. Saunders GW, Kucera H: An evaluation of rbcL, tufA, UPA, LSU and ITS as DNA barcode markers for the marine green macroalgae. Cryptogamie Algologie. 2010, 31 (4): 487-528.Google Scholar
  17. Hall JD, Fucikova K, Lo C, Lewis LA, Karol KG: An assessment of proposed DNA barcodes in freshwater green algae. Cryptogamie Algologie. 2010, 31 (4): 529-555.Google Scholar
  18. Brown JS: Functional organization of chlorophyll-alpha and carotenoids in the alga, Nannochloropsis salina. Plant Physiol. 1987, 83 (2): 434-437. 10.1104/pp.83.2.434.PubMed CentralView ArticlePubMedGoogle Scholar
  19. Pan K, Qin JJ, Li S, Dai WK, Zhu BH, Jin YC, Yu WG, Yang GP, Li DF: Nuclear monoploidy and asexual propagation of Nannochloropsis oceanica (Eustigmatophyceae) as revealed by its genome sequence. J Phycol. 2011, 47 (6): 1425-1432. 10.1111/j.1529-8817.2011.01057.x.View ArticleGoogle Scholar
  20. Radakovits R, Jinkerson RE, Fuerstenberg SI, Tae H, Settlage RE, Boore JL, Posewitz MC: Draft genome sequence and genetic transformation of the oleaginous alga Nannochloropis gaditana. Nat Commun. 2012, 3: 686-PubMed CentralView ArticlePubMedGoogle Scholar
  21. Vieler A, Wu GX, Tsai CH, Bullard B, Cornish AJ, Harvey C, Reca IB, Thornburg C, Achawanantakun R, Buehl CJ: Genome, functional gene annotation, and nuclear transformation of the heterokont oleaginous alga Nannochloropsis oceanica CCMP1779. Plos Genetics. 2012, 8 (11): e1003064-10.1371/journal.pgen.1003064.PubMed CentralView ArticlePubMedGoogle Scholar
  22. Laurens LML, Quinn M, Van Wychen S, Templeton DW, Wolfrum EJ: Accurate and reliable quantification of total microalgal fuel potential as fatty acid methyl esters by in situ transesterification. Anal Bioanal Chem. 2012, 403 (1): 167-178. 10.1007/s00216-012-5814-0.PubMed CentralView ArticlePubMedGoogle Scholar
  23. Jinkerson RE, Radakovits R, Posewitz MC: Genomic insights from the oleaginous model alga Nannochloropsis gaditana. Bioengineered. 2013, 4 (1): 37-43. 10.4161/bioe.21880.PubMed CentralView ArticlePubMedGoogle Scholar
  24. Roleda MY, Slocombe SP, Leakey RJ, Day JG, Bell EM, Stanley MS: Effects of temperature and nutrient regimes on biomass and lipid production by six oleaginous microalgae in batch culture employing a two-phase cultivation strategy. Bioresour Technol. 2013, 129: 439-449.View ArticlePubMedGoogle Scholar
  25. Fawley KP, Fawley MW: Observations on the diversity and ecology of freshwater Nannochloropsis (Eustigmatophyceae), with descriptions of new taxa. Protist. 2007, 158 (3): 325-336. 10.1016/j.protis.2007.03.003.View ArticlePubMedGoogle Scholar
  26. Oudot-Le Secq MP, Grimwood J, Shapiro H, Armbrust EV, Bowler C, Green BR: Chloroplast genomes of the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana: comparison with other plastid genomes of the red lineage. Mol Genet Genomics. 2007, 277 (4): 427-439. 10.1007/s00438-006-0199-4.View ArticlePubMedGoogle Scholar
  27. Turmel M, Lemieux B, Lemieux C: The chloroplast genome of the green alga Chlamydomonas moewusii - localization of protein-coding genes and transcriptionally active regions. Mol Gen Genet. 1988, 214 (3): 412-419. 10.1007/BF00330474.View ArticlePubMedGoogle Scholar
  28. Oudot-Le Secq MP, Green BR: Complex repeat structures and novel features in the mitochondrial genomes of the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana. Gene. 2011, 476 (1–2): 20-26.View ArticlePubMedGoogle Scholar
  29. Danne JC, Gornik SG, MacRae JI, McConville MJ, Waller RF: Alveolate mitochondrial metabolic evolution: dinoflagellates force reassessment of the role of parasitism as a driver of change in apicomplexans. Mol Biol Evol. 2013, 30 (1): 123-139. 10.1093/molbev/mss205.View ArticlePubMedGoogle Scholar
  30. Odintsova MS, Iurina NP: Plastidic genome of higher plants and algae: structure and function. Mol Biol (Mosk). 2003, 37 (5): 768-783.View ArticleGoogle Scholar
  31. Aravind L, Anantharaman V, Zhang D, de Souza RF, Iyer LM: Gene flow and biological conflict systems in the origin and evolution of eukaryotes. Front Cell Infect Microbiol. 2012, 2: 89-PubMed CentralPubMedGoogle Scholar
  32. Hao W, Palmer JD: Fine-scale mergers of chloroplast and mitochondrial genes create functional, transcompartmentally chimeric mitochondrial genes. Proc Natl Acad Sci USA. 2009, 106 (39): 16728-16733. 10.1073/pnas.0908766106.PubMed CentralView ArticlePubMedGoogle Scholar
  33. Prihoda J, Tanaka A, de Paula WB, Allen JF, Tirichine L, Bowler C: Chloroplast-mitochondria cross-talk in diatoms. J Exp Bot. 2012, 63 (4): 1543-1557. 10.1093/jxb/err441.View ArticlePubMedGoogle Scholar
  34. Hancock L, Goff L, Lane C: Red algae lose key mitochondrial genes in response to becoming parasitic. Genome Biol Evol. 2010, 2: 897-910. 10.1093/gbe/evq075.PubMed CentralView ArticlePubMedGoogle Scholar
  35. Brouard JS, Otis C, Lemieux C, Turmel M: The chloroplast genome of the green alga Schizomeris leibleinii (Chlorophyceae) provides evidence for bidirectional DNA replication from a single origin in the chaetophorales. Genome Biol Evol. 2011, 3: 505-515. 10.1093/gbe/evr037.PubMed CentralView ArticlePubMedGoogle Scholar
  36. Turmel M, Otis C, Lemieux C: The chloroplast and mitochondrial genome sequences of the charophyte Chaetosphaeridium globosum: insights into the timing of the events that restructured organelle DNAs within the green algal lineage that led to land plants. Proc Natl Acad Sci USA. 2002, 99 (17): 11275-11280. 10.1073/pnas.162203299.PubMed CentralView ArticlePubMedGoogle Scholar
  37. Pombert JF, Otis C, Lemieux C, Turmel M: The chloroplast genome sequence of the green alga Pseudendoclonium akinetum (Ulvophyceae) reveals unusual structural features and new insights into the branching order of chlorophyte lineages. Mol Biol Evol. 2005, 22 (9): 1903-1918. 10.1093/molbev/msi182.View ArticlePubMedGoogle Scholar
  38. Turmel M, Otis C, Lemieux C: The chloroplast genomes of the green algae Pedinomonas minor, Parachlorella kessleri, and Oocystis solitatia reveal a shared ancestry between the Pedinomonadales and Chlorellales. Mol Biol Evol. 2009, 26 (10): 2317-2331. 10.1093/molbev/msp138.View ArticlePubMedGoogle Scholar
  39. Pombert JF, Lemieux C, Turmel M: The complete chloroplast DNA sequence of the green alga Oltmannsiellopsis viridis reveals a distinctive quadripartite architecture in the chloroplast genome of early diverging ulvophytes. BMC Biol. 2006, 4: 3-10.1186/1741-7007-4-3.PubMed CentralView ArticlePubMedGoogle Scholar
  40. Glockner G, Rosenthal A, Valentin K: The structure and gene repertoire of an ancient red algal plastid genome. J Mol Evol. 2000, 51 (4): 382-390.PubMedGoogle Scholar
  41. Moustafa A, Beszteri B, Maier UG, Bowler C, Valentin K, Bhattacharya D: Genomic footprints of a cryptic plastid endosymbiosis in diatoms. Science. 2009, 324 (5935): 1724-1726. 10.1126/science.1172983.View ArticlePubMedGoogle Scholar
  42. Li SL, Nosenko T, Hackett JD, Bhattacharya D: Phylogenomic analysis identifies red algal genes of endosymbiotic origin in the chromalveolates. Mol Biol Evol. 2006, 23 (3): 663-674.View ArticlePubMedGoogle Scholar
  43. Sanchez Puerta MV, Bachvaroff TR, Delwiche CF: The complete plastid genome sequence of the haptophyte Emiliania huxleyi: a comparison to other plastid genomes. DNA Res. 2005, 12 (2): 151-156. 10.1093/dnares/12.2.151.View ArticlePubMedGoogle Scholar
  44. Janouskovec J, Horak A, Obornik M, Lukes J, Keeling PJ: A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proc Natl Acad Sci U S A. 2010, 107 (24): 10949-10954. 10.1073/pnas.1003335107.PubMed CentralView ArticlePubMedGoogle Scholar
  45. Khan H, Parks N, Kozera C, Curtis BA, Parsons BJ, Bowman S, Archibald JM: Plastid genome sequence of the cryptophyte alga Rhodomonas salina CCMP1319: lateral transfer of putative DNA replication machinery and a test of chromist plastid phylogeny. Mol Biol Evol. 2007, 24 (8): 1832-1842. 10.1093/molbev/msm101.View ArticlePubMedGoogle Scholar
  46. Donaher N, Tanifuji G, Onodera NT, Malfatti SA, Chain PSG, Hara Y, Archibald JM: The complete plastid genome sequence of the secondarily nonphotosynthetic alga Cryptomonas paramecium: reduction, compaction, and accelerated evolutionary rate. Genome Biol Evol. 2009, 1: 439-448.PubMed CentralView ArticlePubMedGoogle Scholar
  47. Linne von berg KH, Kowallik KV: Structural organization of the chloroplast genome of the chromophytic alga Vaucheria bursata. Plant Mol Biol. 1992, 18 (1): 83-95. 10.1007/BF00018459.View ArticlePubMedGoogle Scholar
  48. Cattolico RA, Jacobs MA, Zhou Y, Chang J, Duplessis M, Lybrand T, Mckay J, Ong HC, Sims E, Rocap G: Chloroplast genome sequencing analysis of Heterosigma akashiwo CCMP452 (West Atlantic) and NIES293 (West Pacific) strains. BMC Genomics. 2008, 9: 211-10.1186/1471-2164-9-211.PubMed CentralView ArticlePubMedGoogle Scholar
  49. Minoda A, Weber APM, Tanaka K, Miyagishima S: Nucleus-independent control of the Rubisco operon by the plastid-encoded transcription factor Ycf30 in the red alga Cyanidioschyzon merolae. Plant Physiol. 2010, 154 (3): 1532-1540. 10.1104/pp.110.163188.PubMed CentralView ArticlePubMedGoogle Scholar
  50. Lamour KH, Mudge J, Gobena D, Hurtado-Gonzales OP, Schmutz J, Kuo A, Miller NA, Rice BJ, Raffaele S, Cano LM: Genome sequencing and mapping reveal loss of heterozygosity as a mechanism for rapid adaptation in the vegetable pathogen Phytophthora capsici. Mol Plant Microbe Interact. 2012, 25 (10): 1350-1360. 10.1094/MPMI-02-12-0028-R.PubMed CentralView ArticlePubMedGoogle Scholar
  51. Grayburn WS, Hudspeth DSS, Gane MK, Hudspeth MES: The mitochondrial genome of Saprolegnia ferax: organization, gene content and nucleotide sequence. Mycologia. 2004, 96 (5): 981-989. 10.2307/3762082.View ArticlePubMedGoogle Scholar
  52. Selosse MA, Albert BR, Godelle B: Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol Evol. 2001, 16 (3): 135-141. 10.1016/S0169-5347(00)02084-X.View ArticlePubMedGoogle Scholar
  53. Tardif M, Atteia A, Specht M, Cogne G, Rolland N, Brugiere S, Hippler M, Ferro M, Bruley C, Peltier G: PredAlgo: a new subcellular localization prediction tool dedicated to green algae. Mol Biol Evol. 2012, 29 (12): 3625-3639. 10.1093/molbev/mss178.View ArticlePubMedGoogle Scholar
  54. Mach J: Chloroplast RNA: editing by pentatricopeptide repeat proteins. Plant Cell. 2009, 21 (1): 17-10.1105/tpc.109.210114.PubMed CentralView ArticlePubMedGoogle Scholar
  55. Zehrmann A, Verbitskiy D, Hartel B, Brennicke A, Takenaka M: PPR proteins network as site-specific RNA editing factors in plant organelles. RNA Biol. 2011, 8 (1): 67-70. 10.4161/rna.8.1.14298.View ArticlePubMedGoogle Scholar
  56. Drouin G, Daoud H, Xia J: Relative rates of synonymous substitutions in the mitochondrial, chloroplast and nuclear genomes of seed plants. Mol Phylogenet Evol. 2008, 49 (3): 827-831. 10.1016/j.ympev.2008.09.009.View ArticlePubMedGoogle Scholar
  57. Goulding SE, Olmstead RG, Morden CW, Wolfe KH: Ebb and flow of the chloroplast inverted repeat. Mol Gen Genet. 1996, 252 (1–2): 195-206.View ArticlePubMedGoogle Scholar
  58. Lommer M, Roy AS, Schilhabel M, Schreiber S, Rosenstiel P, LaRoche J: Recent transfer of an iron-regulated gene from the plastid to the nuclear genome in an oceanic diatom adapted to chronic iron limitation. BMC Genomics. 2010, 11: 718-10.1186/1471-2164-11-718.PubMed CentralView ArticlePubMedGoogle Scholar
  59. Kim JS, Jung JD, Lee JA, Park HW, Oh KH, Jeong WJ, Choi DW, Liu JR, Cho KY: Complete sequence and organization of the cucumber (Cucumis sativus L. cv. Baekmibaekdadagi) chloroplast genome. Plant Cell Rep. 2006, 25 (4): 334-340. 10.1007/s00299-005-0097-y.View ArticlePubMedGoogle Scholar
  60. Rodriguez-Moreno L, Gonzalez VM, Benjak A, Marti MC, Puigdomenech P, Aranda MA, Garcia-Mas J: Determination of the melon chloroplast and mitochondrial genome sequences reveals that the largest reported mitochondrial genome in plants contains a significant amount of DNA having a nuclear origin. BMC Genomics. 2011, 12: 424-10.1186/1471-2164-12-424.PubMed CentralView ArticlePubMedGoogle Scholar
  61. Strauss SH, Palmer JD, Howe GT, Doerksen AH: Chloroplast genomes of two conifers lack a large inverted repeat and are extensively rearranged. Proc Natl Acad Sci USA. 1988, 85 (11): 3898-3902. 10.1073/pnas.85.11.3898.PubMed CentralView ArticlePubMedGoogle Scholar
  62. Lin CP, Wu CS, Huang YY, Chaw SM: The complete chloroplast genome of Ginkgo biloba reveals the mechanism of inverted repeat contraction. Genome Biol Evol. 2012, 4 (3): 374-381. 10.1093/gbe/evs021.PubMed CentralView ArticlePubMedGoogle Scholar
  63. Wu CS, Wang YN, Hsu CY, Lin CP, Chaw SM: Loss of different inverted repeat copies from the chloroplast genomes of Pinaceae and cupressophytes and influence of heterotachy on the evaluation of gymnosperm phylogeny. Genome Biol Evol. 2011, 3: 1284-1295. 10.1093/gbe/evr095.PubMed CentralView ArticlePubMedGoogle Scholar
  64. Wang RJ, Cheng CL, Chang CC, Wu CL, Su TM, Chaw SM: Dynamics and evolution of the inverted repeat-large single copy junctions in the chloroplast genomes of monocots. BMC Evol Biol. 2008, 8: 36-10.1186/1471-2148-8-36.PubMed CentralView ArticlePubMedGoogle Scholar
  65. Ravin NV, Galachyants YP, Mardanov AV, Beletsky AV, Petrova DP, Sherbakova TA, Zakharova YR, Likhoshway YV, Skryabin KG, Grachev MA: Complete sequence of the mitochondrial genome of a diatom alga Synedra acus and comparative analysis of diatom mitochondrial genomes. Curr Genet. 2010, 56 (3): 215-223. 10.1007/s00294-010-0293-3.View ArticlePubMedGoogle Scholar
  66. Oudot-Le Secq MP, Loiseaux-de Goer S, Stam WT, Olsen JL: Complete mitochondrial genomes of the three brown algae (Heterokonta: Phaeophyceae) Dictyota dichotoma, Fucus vesiculosus and Desmarestia viridis. Curr Genet. 2006, 49 (1): 47-58. 10.1007/s00294-005-0031-4.View ArticlePubMedGoogle Scholar
  67. Bruni I, De Mattia F, Martellos S, Galimberti A, Savadori P, Casiraghi M, Nimis PL, Labra M: DNA barcoding as an effective tool in improving a digital plant identification system: a case study for the area of Mt. Valerio, Trieste (NE Italy). Plos One. 2012, 7 (9): e43256-10.1371/journal.pone.0043256.PubMed CentralView ArticlePubMedGoogle Scholar
  68. Li DZ, Gao LM, Li HT, Wang H, Ge XJ, Liu JQ, Chen ZD, Zhou SL, Chen SL, Yang JB: Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proc Natl Acad Sci USA. 2011, 108 (49): 19641-19646.PubMed CentralView ArticlePubMedGoogle Scholar
  69. Moniz MBJ, Kaczmarska I: Barcoding of diatoms: nuclear encoded ITS revisited. Protist. 2010, 161 (1): 7-34. 10.1016/j.protis.2009.07.001.View ArticlePubMedGoogle Scholar
  70. Wolfe KH, Li WH, Sharp PM: Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci USA. 1987, 84 (24): 9054-9058. 10.1073/pnas.84.24.9054.PubMed CentralView ArticlePubMedGoogle Scholar
  71. Birky CW: Uniparental inheritance of mitochondrial and chloroplast genes - mechanisms and evolution. Proc Natl Acad Sci USA. 1995, 92 (25): 11331-11338. 10.1073/pnas.92.25.11331.PubMed CentralView ArticlePubMedGoogle Scholar
  72. Kuntal H, Sharma V, Daniell H: Microsatellite analysis in organelle genomes of Chlorophyta. Bioinformation. 2012, 8 (6): 255-259. 10.6026/97320630008255.PubMed CentralView ArticlePubMedGoogle Scholar
  73. Verma D, Daniell H: Chloroplast vector systems for biotechnology applications. Plant Physiol. 2007, 145 (4): 1129-1143. 10.1104/pp.107.106690.PubMed CentralView ArticlePubMedGoogle Scholar
  74. Wang D, Lu Y, Huang H, Xu J: Establishing oleaginous microalgae research models for consolidated bioprocessing of solar energy. Adv Biochem Eng Biotechnol. 2012, 128: 69-84.PubMedGoogle Scholar
  75. Varela-Alvarez E, Andreakis N, Lago-Leston A, Pearson GA, Serrao EA, Procaccini G, Duarte CM, Marba N: Genomic DNA isolation from green and brown algae (Caulerpales and Fucales) for microsatellite library construction. J Phycol. 2006, 42 (3): 741-745. 10.1111/j.1529-8817.2006.00218.x.View ArticleGoogle Scholar
  76. Li RQ, Li YR, Kristiansen K, Wang J: SOAP: short oligonucleotide alignment program. Bioinformatics. 2008, 24 (5): 713-714. 10.1093/bioinformatics/btn025.View ArticlePubMedGoogle Scholar
  77. Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8 (3): 175-185. 10.1101/gr.8.3.175.View ArticlePubMedGoogle Scholar
  78. Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8 (3): 186-194.View ArticlePubMedGoogle Scholar
  79. Gordon D, Abajian C, Green P: Consed: a graphical tool for sequence finishing. Genome Res. 1998, 8 (3): 195-202. 10.1101/gr.8.3.195.View ArticlePubMedGoogle Scholar
  80. Wyman SK, Jansen RK, Boore JL: Automatic annotation of organellar genomes with DOGMA. Bioinformatics. 2004, 20 (17): 3252-3255. 10.1093/bioinformatics/bth352.View ArticlePubMedGoogle Scholar
  81. Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW: RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 2007, 35 (9): 3100-3108. 10.1093/nar/gkm160.PubMed CentralView ArticlePubMedGoogle Scholar
  82. Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997, 25 (5): 955-964.PubMed CentralView ArticlePubMedGoogle Scholar
  83. Gautheret D, Lambert A: Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol. 2001, 313 (5): 1003-1011. 10.1006/jmbi.2001.5102.View ArticlePubMedGoogle Scholar
  84. Kurtz S, Schleiermacher C: REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics. 1999, 15 (5): 426-427. 10.1093/bioinformatics/15.5.426.View ArticlePubMedGoogle Scholar
  85. Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24 (8): 1596-1599. 10.1093/molbev/msm092.View ArticlePubMedGoogle Scholar
  86. Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I: VISTA: computational tools for comparative genomics. Nucleic Acids Res. 2004, 32: W273-W279. 10.1093/nar/gkh458.PubMed CentralView ArticlePubMedGoogle Scholar
  87. Conant GC, Wolfe KH: GenomeVx: simple web-based creation of editable circular chromosome maps. Bioinformatics. 2008, 24 (6): 861-862. 10.1093/bioinformatics/btm598.View ArticlePubMedGoogle Scholar
  88. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340.PubMed CentralView ArticlePubMedGoogle Scholar
  89. Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000, 17 (4): 540-552. 10.1093/oxfordjournals.molbev.a026334.View ArticlePubMedGoogle Scholar
  90. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52 (5): 696-704. 10.1080/10635150390235520.View ArticlePubMedGoogle Scholar
  91. Anisimova M, Gascuel O: Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst Biol. 2006, 55 (4): 539-552. 10.1080/10635150600755453.View ArticlePubMedGoogle Scholar
  92. Yang ZH: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13 (5): 555-556.PubMedGoogle Scholar
  93. Felsenstein J: PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics. 1989, 5: 164-166.Google Scholar
  94. Puigbo P, Garcia-Vallve S, McInerney JO: TOPD/FMTS: a new software to compare phylogenetic trees. Bioinformatics. 2007, 23 (12): 1556-1558. 10.1093/bioinformatics/btm135.View ArticlePubMedGoogle Scholar
  95. Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R: DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003, 19 (18): 2496-2497. 10.1093/bioinformatics/btg359.View ArticlePubMedGoogle Scholar

Copyright

© Wei et al.; licensee BioMed Central Ltd. 2013

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.