Transcriptome-wide identification of optimal reference genes for expression analysis of Pyropia yezoensis responses to abiotic stress

Background Pyropia yezoensis, a marine red alga, is an ideal research model for studying the mechanisms of abiotic stress tolerance in intertidal seaweed. Real-time quantitative polymerase chain reaction (RT-qPCR) is the most commonly used method to analyze gene expression levels. To accurately quantify gene expression, selection and validation of stable reference genes is required. Results We used transcriptome profiling data from different abiotic stress treatments to identify six genes with relatively stable expression levels: MAP, ATPase, CGS1, PPK, DPE2, and FHP. These six genes and three conventional reference genes, UBC, EF1-α, and eif4A, were chosen as candidates for optimal reference gene selection. Five common statistical approaches (geNorm, ΔCt method, NormFinder, BestKeeper, and ReFinder) were used to identify the stability of each reference gene. Our results show that: MAP, UBC, and FHP are stably expressed in all analyzed conditions; CGS1 and UBC are stably expressed under conditions of dehydration stress; and MAP, UBC, and CGS1 are stably expressed under conditions of temperature stress. Conclusion We have identified appropriate reference genes for RT-qPCR in P. yezoensis under different abiotic stress conditions which will facilitate studies of gene expression under these conditions. Electronic supplementary material The online version of this article (10.1186/s12864-018-4643-8) contains supplementary material, which is available to authorized users.


Background
Pyropia yezoensis (Ueda), previously known as Prophyra yezoensis [1], is a seaweed of economic importance. The gametophyte of this species has been widely cultivated and harvested in East Asia. P. yezoensis is an important seafood with annual production of over 1,100,000 t (in fresh weight) and an annual value of approximately US $1.5 billion (http://www.fao.org/fishery/statistics/en). P. yezoensis thrives in the upper intertidal zone. This is a harsh niche and during daily low tides, P. yezoensis is routinely exposed to high levels of light, dehydration, and extreme fluctuations in temperature and osmotic pressure due to the seawater to air transition. Blades can tolerate dehydration with water loss of up to 85%, but are metabolically active immediately upon rehydration [2]. These features make P. yezoensis an ideal model for studying the molecular mechanisms of intertidal seaweed stress-tolerance.
Precise quantification of expression fluctuations in genes involved in abiotic stress will further our understanding of stress-tolerance mechanisms in specific algae. RT-qPCR is widely used to assess gene expression and allows rapid and reliable quantification of transcripts expressed in low levels [3,4]. However, for accurate quantification of transcripts in different spatial-temporal conditions, the crucial first step involves the selection of optimal reference genes. Housekeeping genes including actin (ACT), ubiquitinbinding protein (UBC), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), translation initiation factor 4A (eif4A), elongation factor 1-α (EF1-α), and α-tubulin (TUA) are thought to be appropriate reference genes for the normalization of gene expression [5][6][7]. However, recent investigations have shown that these housekeeping genes may not be suitable for normalization of gene expression in all development stages or environmental conditions [8][9][10]. Moreover, it is difficult to completely normalize gene expression data from all types of samples using any single gene [11,12]. Therefore, it is recommended that multiple reference genes be used to improve the reliability and accuracy of RT-qPCR results.
With the development of high-throughput sequencing technologies, RNA-sequencing (RNA-seq) provides a means to profile spatio-temporal transcriptomes [13,14]. The rapid accumulation of transcriptome datasets presents a new strategy for the identification of novel sets of reference genes. For example, hundreds of candidates were mined from Lycoris aurea transcriptome datasets, and 14 were selected for further qPCR analysis after various abiotic stresses and from different tissues [15]. A similar approach was used to identify candidate reference genes from Oxytropis ochrocephala Bunge transcriptome datasets [16]. Twelve candidate genes were identified, and qPCR was used to analyze their expression levels following exposure to a range of abiotic stress conditions.
In P. yezoensis, the expression of seven housekeeping genes was quantified at different life-history stages and GAPDH was recommended as a potential internal control for gene expression studies [17]. In 2015, Kong validated the expression stability of six traditional housekeeping genes and recommended ACT3, eIF4A, and EF1-α as optimal P. yezoensis reference genes under conditions of stress [18]. Recently, accumulated P. yezoensis transcriptome data [19][20][21] has enabled us to identify sets of optimal reference genes. In this study, nine candidate reference genes were chosen for further analysis and optimal reference gene selection. These genes include three conventional reference genes (UBC, EF1-α, and eIF4A) and six genes (MAP, ATPase, CGS1, PPK, DPE2, and FHP) with relatively stable expression levels in transcriptome profiling data obtained under different abiotic stress conditions. Further, to validate the effectiveness of the selected reference genes, the expression levels of the Δ9 fatty acid desaturase (PyOLE-1) and oxygen-evolving enhancer protein 1 (PyOEE-1) target genes were quantified and compared with transcriptome profiling data.
These two genes, PyOLE-1 and PyOEE-1, represent stress-responsive genes in algae. Under chilling and freezing temperature stress, PyOLE-1, which encodes a fatty acid desaturase, is up-regulated, to increase membrane fluidity [19]. PyOEE-1 is a component of the oxygen evolving complex of photosystem II (PSII) and is, which were slowly down-regulated under drought stress in red algae [22].
Additionally, the best and worst reference genes (selected from P. yezoensis RZ58) were used to normalize the expression levels of these two target genes in two other genotypes of P. yezoensis, including a genetically pure line (P. yezoensis S21) and a cultural line (PyC-1), in order to demonstrate the applicability of our results within this species.

Plant materials and treatments
Pyropia yezoensis RZ58 is a genetically pure line established by clonal cultivation of an isolated single somatic cell and self-fertilization in the laboratory. Fresh leafy of RZ58 gametophytes were cultured in bubbling natural seawater with Provasoli's enrichment solution medium under 50 μmol photons m − 2 s − 1 at 8 ± 1°C and a 12:12 light:dark (L:D) photoperiod. Then the healthy gametophytes with sizes from 8 to 15 cm were treated with dehydration, rehydration, and cold and heat stress respectively. The same treatments were performed in S21 and PyC-1 too.
Gametophytes were subjected to dehydration and rehydration by exposing them to the air and then transferring them back to seawater. Algal samples under normal conditions were harvested before the dehydration treatment. Algal samples were also collected when the algae reached water loss levels of 20 ± 5%, 50 ± 5% and 70 ± 5% respectively. For rehydration, severely dehydrated algae were transferred back to normal conditions, and samples were collected after 0.5 h. Water loss was determined according to the method of Kim et al. [23]. All treatments were performed at 8 ± 1°C and 50 μmol photons m − 2 s − 1 .
For subjecting gametophytes to various temperature conditions, four temperature treatments were used: normal temperature (8 ± 1°C), high temperature (24 ± 1°C), chilling stress (0 ± 1°C) and freezing temperature (− 8 ± 1°C). Temperature stress was detected according to Sun et al. [19]. Three biological replicates were collected for each treatment and control, frozen in liquid nitrogen, and stored at − 80°C prior to RNA isolation.

RNA isolation and cDNA synthesis
Total RNA was extracted from samples using the Plant RNA Kit (Omega, USA). To eliminate DNA contamination, total RNA was digested with DNase I (Omega, USA) and purified according to the manufacturer's protocol. RNA integrity was evaluated by 1% (w/v) agarose gel electrophoresis, and RNA concentration and purity was determined with a NanoDrop 2000 Spectrophotometer (NanoDrop Technologies, Thermo Scientific, USA). RNA samples with concentrations above 150 ng/μl and an A 260 /A 280 ratio of 1.8-2.0 were used for cDNA synthesis.
An RNA aliquot of 1 μg was used for cDNA synthesis with PrimerScript™ RT regent Kit (TaKaRa Bio Inc., Dalian, China) according to the manufacturer's protocol. The cDNA was diluted 10-fold with nuclease-free water for RT-qPCR.

Selection of candidate reference genes and primer design
Due to a lack of P. yezoensis genome information, we generated a transcriptome for this species. Transcriptome sequencing of P. yezoensis (RZ58) was performed using Illumina paired-end sequencing technology on an Illumina Hi-Seq™ 2000 platform under the five treatments (control, dehydration, rehydration, cold, and heat). After assembly and annotation, expression profile data for each treatment was mapped to the transcriptome. The read counts of unigenes from different stress treatments were converted into fragments per kilobase of exon model per million mapped reads (FPKM values) using the RNA-Seq by Expectation Maximization software package [24].
The expression stability of each of the analyzed genes was calculated using the Pattern Gene Finder (PaGeFinder). PaGeFinder is a web-based server for the on-line detection of gene expression patterns from serial transcriptomic data generated by high-throughput technologies like microarray or next-generation sequencing. The dispersion measure (DPM) was introduced and implemented in PaGeFinder to evaluate the variability and degree of diversity of gene expression profiles. Most stable genes exhibit lower DPM values [25].
The transcriptome was screened for genes with credible protein annotation (Nr databases), appropriate expression levels (FPKM> 10), and a low dispersion measure (DPM ≤ 0.3) [25]; genes that met these criteria were deemed candidate reference genes (Table 1). Additionally, three commonly used reference genes, UBC, EF1-α, and eIF4A, were selected from the P. yezoensis transcriptome based on a previous study by Kong [18].
Specific primers were designed using Primer5 software based on the sequences of these unigenes ( Table 2). The criteria for primer design were as follows: primer lengths of 17-24 bp, GC content of 50-66%, melting temperature of 58-61°C, and amplicon lengths of 100-200 bp.

RT-qPCR analysis
RT-qPCR was conducted in 96-well plates in a LightCycler 480 (Roche Molecular Biochemicals, Mannheim, Germany). The reaction mix contained 2 μl diluted cDNA, 10 μl LightCycler®480 SYBR Green I Master (Roche, Germany), 0.6 μl of each primer, and ddH 2 O in a final volume of 20 μL. Three biological replicates were performed for each treatment. Three technical replicates of each biological replicate as well as a no-template control were also performed. RT-qPCR cycling parameters were as follows: 95°C for 5 min, followed by 45 cycles of 95°C for 10 s, 58°C for 10 s, and 72°C for 20 s. To confirm the specificity of each primer, a melting-curve analysis was included from 65°C to 95°C. The mean amplification efficiency of each primer pair was checked by the LightCycle®480 gene scanning software (version 1.5).

Data analysis
The three most commonly used software tools, geNorm, NormFinder, and BestKeeper, were used in conjunction with the comparative ΔCt method to calculate and identify the expression level stability of each candidate reference gene. The geNorm algorithm [26] calculates the expression stability value (M-value) and pairwise variation (Vn/n + 1) for all candidate genes. Lower M-values reflect a greater level of gene expression stability. The Vn/n + 1 value determines the optimal number of reference genes for accurate normalization. A cut-off value of Vn/n + 1 < 0.15 indicates that an additional reference genes make no significant contribution to the normalization.
The NormFinder program [27] is based on an ANOVA model, and calculates a stability value (SV) for evaluating expression variation when using reference genes for normalization with a lower SV indicating higher stability.
BestKeeper [28] is an Excel-based tool that uses pairwise correlations. BestKeeper calculates three variables for the expression level of all candidate genes: the standard deviation, coefficient of correlation (r), and coefficient of variance (CV). The BestKeeper index is established based on the combination of the mean of Ct values for each sample across all candidate genes. Subsequently, each candidate gene is tested in a pair-wise manner via Pearson correlation coefficients, the coefficient of determination (r 2 ), and the P-value. The most stable gene exhibits the lowest CV ± SD (standard deviation) value, and genes with an SD value greater than one are deemed unacceptable and should be excluded.
The comparative ΔCt method depends on pairwise comparisons [29], which calculate the mean and SD of each pair candidate genes and the average SD of each gene. The gene with lowest average SD is considered the most stable reference gene.

Comprehensively ranking the candidate genes
A comprehensive stability value for each gene was produced using ReFinder (http://150.216.56.64/referencegene.php) [30] based on the four computational programs, NormFinder, BestKeeper, GeNorm, and comparative ΔCt. The Ct value of each gene was input directly and the geometric mean of each gene was calculated to arrive at its overall final ranking. A lower geometric mean of ranking value indicates more stable expression.

Experimental validation of the reference genes
The expression patterns of the two target genes PyOLE-1 and PyOEE-1 were analyzed using the most and least stable reference gene sets after normalization across two experimental sets temperature stress (for PyOLE-1) and drought stress (for PyOEE-1). To validate the results, the expression levels of the target genes based on RT-qPCR were compared with the FPKM values derived from the RNA-seq data for each sample. Moreover, the relative expression levels of each target gene were compared using a single reference gene as well as the most stable reference genes to determine whether the inclusion of multiple reference genes improves the reliability and accuracy of RT-qPCR results. Finally, we also quantified the expression patterns of these target genes in two another genotypes of P. yezoensis (S21 and PyC-1) using the most and least stable reference gene sets.

Global transcriptome assembly and function annotation
A total of 1.72 × 10 7 quality paired-end reads were obtained after filtering out low-quality data (tags containing adaptors). The GC content of the transcriptome was 67.74%. After assembly and annotation, a total of 19,643 unigenes with a mean length of 779.5 bp and an N50 value of 1149 bp were obtained. To assign accurate annotation information to all unigenes, the NCBI nonredundant protein (Nr) database was interrogated, and a total of 13,160 unigenes (67%) were annotated.
Selection of candidate reference genes in P. yezoensis and specificity and efficiency of PCR amplification As shown in Additional file 1: Figure S1, the expression stabilities of all transcripts were evaluated by PaGeFinder the results showed that only 2060 unigenes (326 unigenes marked by asterisk whose DPM values were lower than 0.2 and all of novel reference genes derived from them; the DPM value of 1734 unigenes were located in the range from 0.2 to 0.3) can be used to further reference genes selection (DPM < 0.3). Then, we removed some transcripts which did not have a credible function annotation or whose expression level is too low (FPKM< 10) [15,16]. Finally, 865 unigenes were retained.
MAP, ATPase, CGS1, PPK, DPE2, and FHP were selected from these unigenes based on the ranked order of the DPM values from smallest to largest. Three commonly used reference genes, UBC, EF1-α, and eif4A, were selected from the transcriptome directly [19].
Primers were designed for each of the nine genes and their specificities were confirmed by agarose gel electrophoresis, and melting curves analysis, which showed single amplicon of the expected size and single peak melting curve (Additional file 2: Figure S2). Meanwhile, we also sequenced all PCR products to ensure that only the intended target was being amplified (Table 2). RT-qPCR products ranged from 137 to 213 bp and the mean PCR efficiency for each gene ranged from 1.946 to 2.014.

Cq values of candidate reference genes
RNA transcript levels of the nine reference genes were assessed in conditions of dehydration and temperature stress. The raw Cq values for the reference genes are shown in Additional file 3: Figure S3.
According to the summary showed in Additional file 3: Figure S3A, the raw Cq values of dehydration samples were between 18 Figure S3B). Of the nine candidate reference genes, CGS1, PPK, and DPE2 were observed to have the lowest expression levels with Ct values of 25.84, 25.36, and 28.56, respectively. Ct values for ATPase, UBC, and EF1-α were 21.54, 20.00, and 21.12, respectively, indicating that transcripts of these genes were abundant in samples under temperature stress. The least variable reference genes were MAP and ATPase with SD values of 0.29 and 0.47, respectively. Conversely, DPE2 and eif4A were the most variable genes with SD values of 1.30 and 1. 53, respectively.

Expression stability of candidate reference genes
To identify optimal reference genes for the experimental conditions used, four statistical approaches were employed. The M-values of nine reference genes were calculated and the stability of each candidate reference gene was ranked by the M-value calculated using geNorm. Genes with the lowest M-values are considered to have the most stable expression with an M-value less than 0.5 denoting stable gene expression [31]. geNorm analysis showed that CGS1 and FHP shared the lowest M-value of 0.219, and were regarded as the best reference genes for dehydration stress (Additional file 4: Figure S4A). Under conditions of temperature stress, MAP and ATPase were the reference genes with the greatest expression stability (Additional file 4: Figure S4B). When considering all treatments, the M-based ranking of the reference genes examined, from most (lowest M value) to least stable (highest M value), was: MAP, UBC, FHP, EF1-α, CGS1, ATPase, PPK, eifA, and DPE2 (Table 3 and Additional file 4: Figure S4C).
In the dehydration stress subset, the V2/3 value was 0. 115, suggesting that two reference genes should be used for normalization. In the temperature stress subset, the V3/4 value was lower than 0.15 indicating that only three reference genes were necessary. Additionally, when all samples were considered, the pairwise variation V3/4 value was the lowest (0.153) but still above 0.15 (Additional file 5: Figure S5 and Additional file 6: Table S1).
Based on normalization factor calculation, NormFinder ranked the candidate reference genes according to their minimal combined inter-and intra-treatment expression variation. According to the stability value calculated with the NormFinder algorithm, UBC, CGS1, and FHP were the most reliable references genes for dehydration treatments, and UBC, MAP, and CGS1 were the optimal reference genes for conditions of heat and cold stress. When both dehydration and temperature treatments were considered together, the three most reliable reference genes were UBC, MAP and EF1-α with stability values of 0.254, 0.373 and 0.406 respectively (Table 4).
BestKeeper analysis determined stable reference gene candidates based on the Ct values of each gene, SD, and CV. Genes with a SD greater than one are considered unstable. Under conditions of dehydration stress CGS1 (0.63 ± 0.16) and FHP (0.67 ± 0.16) were the most stable genes. While under conditions of temperature stress, MAP, ATPase, and UBC were the most suitable reference genes. Combining all abiotic stress treatment conditions revealed that UBC, MAP, and FHP had CV ± SD values of 1.68 ± 0.43, 2.08 ± 0.41, and 2.04 ± 0.49 respectively, and were regarded as the most appropriate reference genes for normalization (Table 5).
By comparing the differential expression of 'gene pairs' , the ΔCt method identifies stably co-expressed gene pairs when the ΔCt value of two genes remains constant across different samples [32]. The boxplot of ΔCt values for each 'gene pair' is shown in Additional file 7: Figure S6. Table 6, the results showed that UBC (mean SD = 0.948), MAP (mean SD =1.009), and FHP (mean SD = 1.048) were the most reliable reference genes under all analyzed conditions;

Comprehensive stability analysis of reference genes
ReFinder integrates the four statistical approaches used to compare and rank the candidate reference genes. In all abiotic stress treatments, ReFinder ranked the candidate reference genes from the highest to the lowest stability as: UBC > MAP > FHP > EF1-α > CGS1 > ATPase > eif4A > DPE2 (Table 7). Under conditions of temperature stress, MAP, UBC, and CGS1 were the three most stable reference genes analyzed, while under dehydration stress the most stable reference genes were CGS1, followed by UBC and FHP. The overall ranking showed that UBC and MAP were the most reliable reference genes in all different abiotic stress conditions, while DPE2 and eif4A were the least reliable.

Reference genes validation
Under conditions of temperature stress, at 24°C, 0°C, and − 8°C, PyOLE-1 expression was up-regulated 1.50fold, 4.05-fold, and 10.28-fold respectively, when using the most stable reference genes (MAP, UBC, and CGS1). Using the least stable reference genes, PPK, DPE2, and eif4A, resulted in overestimation of PyOLE-1 expression was overestimated at 5.97-fold, 18.78-fold, and 39.76fold for conditions of 24°C, 0°C, and − 8°C, respectively (Fig. 1a, b, c). Similarly, under conditions of dehydration, with water loss rates of 20%, 50%, and 70%, PyOEE-1 expression was down-regulated 0.81-fold, 0.96-fold, and 0. 82-fold, respectively, when normalized using the two stable genes (CGS1 and UBC). In contrast, the expression levels of PyOEE-1 were up-regulated 3.80fold, 7.72-fold, and 7.63-fold respectively, when the least stable reference genes (ATPase and DPE2) were used (Fig. 2a, b, c). Next, we compared these RT-qPCR results with those derived from the RNA-seq-based expression profiling. As shown in Fig. 3, the relative expression levels of PyOLE-1 quantified by the best reference genes were more consistent with the RNA-seq-based expression patterns of PyOLE-1 under temperature stress (upregulated 1.03 fold, 4.19 fold and 6.50 fold). Under drought conditions, the RT-qPCR results, when normalized by the best reference genes, were also more similar to the RNA-seq-based results (down-regulated 0.63 fold, 0.60 fold and 0.67 fold). Furthermore, we compared the difference between the relative expression levels when using a single reference gene and those obtained using the best reference genes. Under conditions of temperature stress, similar PyOLE-1 expression levels were observed when either a single reference gene or multiple reference genes were used (Fig. 4). At 24°C, 0°C and − 8°C, when only MAP was used as a reference gene, the relative expression of PyOLE-1 was upregulated by 1.17-fold, 4.21-fold and 10.92-fold respectively. Similarly, PyOLE-1 was up-regulated by 1.56-fold, 4.05-fold and 10.28-fold respectively, when several stable reference genes (MAP, UBC and CGS1) were employed to calculate the relative expression of PyOLE-1. Under conditions of 20%, 50% and 70% dehydration, however, the PyOEE-1 expression levels were down regulated by 0.72-fold, up-regulated 1.36-fold, and 1.21-fold when  using a single reference gene (CGS1), while relative expression levels of PyOEE-1 were down-regulated: 0.81fold, 0.96-fold and 0.82-fold respectively, when CGS1 and UBC were both used as reference genes (Fig. 5). Additionally, to confirm whether our reference genes could be applied to other experimental models of P. yezoensis under the same abiotic stress conditions, two other genotypes of this species (S21 and PyC-1) were treated with the same temperature and drought stress like conditions as RZ58. Then, we used the most and lest stable reference genes (MAP, UBC, CGS1, PPK, DPE2 and eif4A for temperature conditions and CGS1, UBC, DPE2 and ATPase for dehydration conditions) to normalize the relative expression levels of PyOLE-1 and PyOEE-1. For temperature stress, the similar with RZ58, relative expression levels of PyOLE-1 were observed in S21 and PyC-1 (Fig. 1d, e, f and g, h, i), when the best reference genes were used for quantification. In S21, PyOLE-1 was up regulated: 1.58-fold, 5.25-fold and 11.74-fold at 24°C , 0°C, and − 8°C, respectively, while, in PyC-1, PyOLE-1 was up-regualted 0.95-fold, 3.03-fold and 7.45-fold, respectively. In contrast, the expression level of PyOLE-1 was overestimated by 2.03-fold, 15.58-fold and 34.62- Fig. 1 Normalized expression level of PyOLE-1 gene in different genotypes in temperature stress. (a, b, c) Relative quantification of PyOLE-1 gene expression using the best stable reference genes (MAP, UBC and CGS1) and the least stable genes (PPK, DPE2 and eif4A) under temperature stress in RZ58. d, e, f Relative quantification of PyOLE-1 gene expression using the best stable reference genes (MAP, UBC and CGS1) and the least stable genes (PPK, DPE2 and eif4A) under temperature stress in S21. g, h, i Relative quantification of PyOLE-1 gene expression using the best stable reference genes (MAP, UBC and CGS1) and the least stable genes (PPK, DPE2 and eif4A) under temperature stress in PyC-1. The average Ct value was calculated from three biological and technical replicates and used for relative expression analyses. Error bars indicate standard errors. The statistical significance was showed by one or two asterisks (P value is lower than 0.05 or 0.01) respectively fold respectively, in PyC-1 when the least stable reference genes were used.

Discussion
RT-qPCR is a highly sensitive technique used in a wide range of applications. Therefore, the selection and use of suitable reference genes is a prerequisite for the accurate quantification of gene expression levels. Based on current researches, transcriptomic profile data is a reliable source for exploration of suitable reference genes for specific experimental conditions [15,16,33]. In this study, six candidate reference genes were selected from transcriptome of P. yezoensis by some criteria including d, e, f Relative quantification of PyOEE-1 gene expression using the best stable reference genes (CGS1 and UBC) and the least stable genes (DPE2 and ATPase) under dehydration stress in S21. g, h, i Relative quantification of PyOEE-1 gene expression using the best stable reference genes (CGS1 and UBC) and the least stable genes (DPE2 and ATPase) under dehydration in PyC-1. The average Ct value was calculated from three biological and technical replicates and used for relative expression analyses. Error bars indicate standard errors. The statistical significance was showed by one or two asterisks (P value is lower than 0.05 or 0.01) respectively credible protein annotation (Nr databases), appropriate expression levels (FPKM> 10), and a low dispersion measure (DPM ≤ 0.3).
On the other hand, traditionally, housekeeping genes have been widely used as reference genes due to their ubiquitous expression in different spatial-temporal conditions. However, recent studies show that expression levels of some classic reference genes are not as stable as previously thought [8,34]. Here, three housekeeping genes (UBC, EF1-α, and eif4A) were selected to evaluate their suitability as reference genes for qPCR following abiotic stress. UBC, an ubiquitin-conjugating enzyme gene, exhibited stable expression in each group examined, whilst the EF1-α was not one of the top three most a b Fig. 3 Comparison of normalized expression level of target genes in RT-qPCR and FPKM value under temperature and dehydration stress respextively. (a) For temperature conditions, FPKM values of PyOLE-1 were showed in this figure and evaluated it by the Y-axis of left, and the right Y-axis were used to indicate the relative quantifications of PyOLE-1, normalized by the most stable reference genes (MAP, UBC and CGS1) and the worst stable reference genes (PPK, DPE2 and eif4A). b For dehydration conditions, FPKM values of PyOEE-1 were showed in this figure and evaluated it by the Y-axis of left, and the right Y-axis were used to indicate the relative quantifications of PyOEE-1, normalized by the most stable reference genes (CGS1 and UBC) and the worst stable reference genes (DPE2 and ATPase) a b c Fig. 4 Comparison of Normalized expression level of PyOLE-1 gene between single and the best reference genes under temperature stress. (a) Relative quantification of PyOLE-1 gene expression using the single reference gene MAP and the best stable reference genes (MAP, UBC and CGS1) under high temperature (a), chilling temperature (b) and freezing temperature (c) respectively. The average Ct value was calculated from three biological and technical replicates and used for relative expression analyses stable genes in any of the experimental conditions. Nevertheless, eif4A expression was variable in all experimental subsets, and especially in the temperature subset. Therefore, we should take results with some caution when housekeeping genes have been directly used as reference genes for expression normalization in the absence of validation.
Gene expression can be highly tissue-specific and often varies based on the physiological status of the organism or experimental treatments. Therefore, the simultaneous use of several reference genes could decrease the probability of biased normalization [35,36]. In addition to identifying the most stable reference genes, geNorm results suggested the optimum pair of genes with the least amount of variation in their expression ratios.
To validate selected reference genes, the relative expression levels of the target genes according to RT-qPCR were compared with those derived from RNAseq-based gene expression profiling. As shown in Additional file 2: Figure S2, the RT-qPCR results quantified using best reference genes were more consistent with the RNA-seq-based target genes expression patterns.
Our results also indicated that multiple internal references are necessary for the accurate study of gene expression under various experimental conditions. Indeed, a single reference gene can be insufficient to accurately normalize expression data, or can lead to erroneous interpretation. The optimal number of reference genes required for RT-qPCR analysis has been ongoing discussion [26]. A threshold value of V < 0.15 was suggested for normalization. Our results demonstrated that two genes (CGS1 and UBC) were suitable for normalization under dehydration stress, whilst three genes (MAP, UBC, and CGS1) were optimal for temperature stress. In addition, geNorm showed that the V3/4 value was the lowest among all treatment samples but that it was still above 0. 15. According to several reports, the threshold V value (pairwise variation) of 0.15 should not be considered as an absolute cutoff but rather a suggested one. Some studies have even reported higher V values in some species, and the threshold used is thus dependent on a consideration of the research purpose [37,38]. Considering our results and the practical feasibility, three genes (UBC, MAP, and FHP) were shown to be appropriate for gene expression a b c Fig. 5 Comparison of Normalized expression level of PyOEE-1 between single and the best reference genes under dehydration stress. (a) Relative quantification of PyOEE-1 gene expression using the single reference gene CGS1 and the best stable reference genes (CGS1 and UBC) under water loss rate 20% (A), 50% (b) and 70%(c) respectively. The average Ct value was calculated from three biological and technical replicates and used for relative expression analyses normalization when all samples were analyzed in combination.
Additionally, in order to confirm that our results are more generally applicable across the species, other two genotypes of P. yezoensis were treated with the same abiotic stress conditions and the relative expression patterns of PyOLE-1 and PyOEE-1 were then quantified using the best reference and the worst reference genes. We obtained results similar to those obtained for line RZ58, and these results were also consistent with the characteristics of the target genes in algae under abiotic stress (Fig. 6).

Conclusions
In this study, six candidate genes were selected form the transcriptome of P. yezoensis to search several appropriate reference genes for using in RT-qPCR. And this study also demonstrated that the use of housekeeping genes as reference genes for normalization of expression data should be validated. Based on these results, we identified optimal sets of reference genes to accurately normalize and quantify gene expression under abiotic stress conditions in P. yezoensis. We also compared their difference of expression levels between RNA-seq data and RT-qPCR data in different treatments. The result showed that the RNA-seq data is reliable to valuing the expression stability of genes. Further, our results also indicated that multiple reference genes are necessary for accurate study of gene expression in different treatments, such as CGS1 and UBC were suitable for dehydration stress. And, the similarly expression patterns of PyOLE-1 and PyOOE-1 were observed in two other genotypes of P. yezoensis confirmed that our identified reference genes more generally applicable across the species. In summary, these reference genes will facilitate further research towards elucidating the molecular mechanisms of stress-tolerance in this economically important species.