Genome-wide survey of potato MADS-box genes reveals that StMADS1 and StMADS13 are putative downstream targets of tuberigen StSP6A

Background MADS-box genes encode transcription factors that are involved in many aspects of plant growth and development, especially floral organ specification. Although the potato genome has been sequenced, a comprehensive analysis of the potato MADS-box gene family is still lacking. In this study, we performed a genome-wide characterization, classification, and expression analysis of the MADS-box transcription factor gene family. Results A total of 153 MADS-box genes were identified and categorized into the MIKC subfamily (MIKCC and MIKC*) and the M-type subfamily (Mα, Mβ, and Mγ) based on their phylogenetic relationships to the Arabidopsis and rice MADS-box genes. The potato M-type subfamily had 114 members, almost three times the number of MIKC members (39), indicating that M-type MADS-box genes have had a higher duplication rate and/or a lower loss rate during potato genome evolution. Potato MADS-box genes were present on all 12 potato chromosomes, with substantial clustering that was mainly contributed by the M-type members. Chromosomal localization revealed that MADS-box genes, mostly MIKC members, were located on duplicated segments of the potato genome, whereas tandem duplications mainly contributed to the M-type gene expansion. The potato MIKC subfamily could be further classified into 11 subgroups, and the TT16-like, AGL17-like, and FLC-like subgroups found in Arabidopsis were absent in potato. Moreover, the expression of potato MADS-box genes in various tissues was analyzed using RNA-seq data and verified by quantitative real-time PCR, revealing that the MIKCC genes were mainly expressed in flower organs and that several of them were highly expressed in stolons and tubers. StMADS1 and StMADS13 were up-regulated in StSP6A-overexpression plants and down-regulated in StSP6A-RNAi plants, and their expression in leaves and/or young tubers was associated with high-level expression of StSP6A.
Conclusion Our study identifies the members of the potato MADS-box gene family and investigates the evolutionary history and functional divergence of the family. Moreover, we analyze the MIKCC expression patterns and screen for genes involved in tuberization. Finally, StMADS1 and StMADS13 are most likely downstream targets of StSP6A and are involved in tuber development. Electronic supplementary material The online version of this article (10.1186/s12864-018-5113-z) contains supplementary material, which is available to authorized users.

Based on phylogenetic relationships, the MADS-box gene family in plants has been divided into two major lineages, type I and type II, which resulted from an ancestral gene duplication [11,12]. Type I genes are also called M-type MADS-box genes and comprise three subgroups (Mα, Mβ, and Mγ). The classical structure of an M-type MADS-box protein is an N-terminal MADS domain followed by a relatively less conserved C-terminal domain [13]. In most plants, a higher frequency of segmental gene duplications and weaker purifying selection result in faster birth-and-death evolution of type I genes compared with type II genes [14]. Type II MADS-box genes are also known as MIKC-type genes, which encode MEF2-like proteins [15]. In addition to the MADS domain, type II MADS-box proteins contain three other domains, arranged from N-terminus to C-terminus: the intervening (I), keratin-like (K), and C-terminal (C) domains. The intervening (I) domain consists of approximately 30 amino acids and contributes to the dimerization of MADS-box proteins [16]. The keratin-like (K) domain is about 70 amino acids long and is more conserved than the intervening (I) domain. Its coiled-coil structure is important for regulating the dimerization of MADS-box proteins. The C-terminal (C) domain is a highly variable region related to transcriptional activation and the formation of protein complexes [17]. Type II MADS-box genes can be further classified into MIKC C (the 'C' stands for 'Classic') and MIKC* based on the variable intervening (I) domain [18]. The domain compositions of these two type II subfamilies are quite different: the MIKC* subfamily has a longer intervening (I) domain and a less conserved keratin-like (K) domain [19]. Therefore, in early studies, the MIKC* subfamily was assigned to the M-type MADS-box genes and named Mδ [12].
The first MADS-box gene identified in plants was found to be related to flower differentiation [3]. The ABCDE model has been successfully adopted to explain the determination of floral organ identity. Recent studies have also found that the MIKC C subfamily is related to photoperiod-regulated floral meristem identity, gametophyte development, sporophyte (diploid) generation, seed pigmentation, and embryo development [20-24]. Most of the genes in the ABCDE model belong to the MIKC C subfamily [6,25,26]. In addition, MADS-box genes in the MIKC C subfamily play important roles in stress-responsive processes; for instance, TaMADS2 was up-regulated in response to wheat stripe rust infection [27].
The functions of MIKC* MADS-box genes are less well understood than those of the MIKC C subfamily, although heterodimers of MIKC*-type proteins have been found to be essential for pollen maturation and pollen tube growth in Arabidopsis [28]. In potato, only three MADS-box genes have been reported previously: potato MADS-box 1-1 (POTM1-1), StMADS11, and StMADS16 [29,30]. POTM1-1 expression is temporally and spatially regulated in both vegetative and floral organs, and transcriptional suppression of POTM1-1 activates axillary meristem development by increasing cytokinin levels [29,31,32]. StMADS11 is expressed in all vegetative tissues of the potato plant, mainly in the stem, but not in flower organs [33]. Ectopic expression of StMADS16 modifies the inflorescence structure by increasing both internode length and flower proliferation of the inflorescence meristems, and confers vegetative features on the flower [29]. A recent study found that StSP6A, a potato homolog of Arabidopsis FLOWERING LOCUS T (FT), is a mobile signal for potato tuberization. StSP6A is very likely to control tuberization by regulating the expression of downstream MADS-box genes [34,35]. There is therefore a need to characterize the MADS-box gene family in potato and to screen for MADS-box candidates involved in tuberization. The completion of the potato genome sequence in 2011 enabled us to perform a genome-wide identification of MADS-box genes in potato [36].
In this study, multiple bioinformatics methods were applied to perform a comprehensive survey of MADS-box genes in potato. The gene structures, phylogenetic relationships, chromosomal locations, conserved motifs, and tissue-specific expression of potato MADS-box genes were investigated. Our work provides basic information on MADS-box genes in potato and identifies several MADS-box genes related to tuberization and subsequent tuber development.

Identification of MADS-box genes in potato
The potato genome sequence data used for the identification and annotation of StMADS genes were downloaded from the Potato Genome Sequencing Consortium (PGSC, http://potato.plantbiology.msu.edu/). BLASTP, InterPro ID, and keyword searches were performed to obtain the putative MADS-box genes in potato. First, the known Arabidopsis MADS-box protein sequences were used as queries in a local BLASTP search against the potato protein database (PGSC_DM_v3.4_pep_nonredundant.fasta) with an expected value cutoff of 1e-3. Then, InterPro ID (IPR003340) and keyword (MADS-box) searches were applied online to identify putative potato MADS-box proteins in the PGSC database. All putative MADS-box sequences were collected and redundant sequences were removed manually. The remaining candidate MADS-box sequences were submitted to the NCBI Conserved Domain (CD) search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) to confirm the presence of the MADS-box domain. The gene structures of the MADS-box genes were drawn with TBtools (http://cj-chen.github.io/tbtools/) using GFF3 files downloaded from PGSC.
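The candidate-collection step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: it assumes the BLASTP results are available in tabular form (`-outfmt 6`, twelve tab-separated columns with the subject ID in the second column and the e-value in the eleventh), and the function names and all identifiers in the example data are hypothetical.

```python
E_VALUE_CUTOFF = 1e-3  # expected value cutoff stated in the text


def blast_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return subject IDs from BLAST tabular (outfmt 6) lines that pass the cutoff."""
    hits = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id, e_value = fields[1], float(fields[10])
        if e_value <= cutoff:
            hits.add(subject_id)
    return hits


def merge_candidates(blast_ids, interpro_ids, keyword_ids, domain_confirmed):
    """Pool the three searches (the union removes redundancy) and keep only
    candidates whose MADS-box domain was confirmed."""
    candidates = set(blast_ids) | set(interpro_ids) | set(keyword_ids)
    return sorted(candidates & set(domain_confirmed))


# Hypothetical example data, for illustration only.
example_lines = [
    "AT_QUERY\tPEP_A\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-30\t120",
    "AT_QUERY\tPEP_B\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20",
]
from_blast = blast_hits(example_lines)
final = merge_candidates(from_blast, {"PEP_C"}, {"PEP_A", "PEP_D"}, {"PEP_A", "PEP_C"})
```

Taking the set union before the domain check mirrors the order given in the text: candidates are pooled and de-duplicated first, then confirmed by the domain search.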

Chromosomal location and gene duplication
MapChart 2.2 was used to draw the locations of potato MADS-box genes on the physical map [37]. Potato MADS-box genes were named based on the position information obtained from PGSC, taking into account the previously reported POTM1-1 (StMADS1), StMADS11, and StMADS16. Following the nomenclature used in rice, the remaining potato MADS-box genes were named StMADS2 to StMADS153 in the order MIKC C, MIKC*, Mα, Mβ, and Mγ. Potato MADS-box genes without chromosomal positions were placed at the end of the list. Tandem duplicated genes were identified in PGSC using the criterion that two genes with high homology (> 50%) are separated by no more than five genes. Segmentally duplicated genes of potato were obtained from the Plant Genome Duplication Database (PGDD, http://chibba.pgml.uga.edu/duplication/).
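The tandem-duplication criterion can be expressed as a short sketch. It assumes that the gene order along a chromosome and pairwise homology percentages are already available; the function name, the data layout, and the gene IDs in the test data are hypothetical and only illustrate the rule of "no more than five intervening genes and homology above 50%".

```python
from itertools import combinations

MAX_INTERVENING_GENES = 5  # "no more than five genes between two genes"
MIN_HOMOLOGY = 50.0        # percent; the text requires homology > 50%


def tandem_pairs(chrom_gene_order, family_members, homology):
    """Find tandem-duplicated pairs on one chromosome.

    chrom_gene_order: all gene IDs on the chromosome, in physical order.
    family_members:   the subset of IDs to test (e.g. MADS-box genes).
    homology:         dict mapping frozenset({gene_a, gene_b}) to percent homology.
    """
    position = {gene: i for i, gene in enumerate(chrom_gene_order)}
    members = sorted((g for g in family_members if g in position), key=position.get)
    pairs = []
    for a, b in combinations(members, 2):
        intervening = position[b] - position[a] - 1
        score = homology.get(frozenset((a, b)), 0.0)
        if intervening <= MAX_INTERVENING_GENES and score > MIN_HOMOLOGY:
            pairs.append((a, b))
    return pairs
```

Counting intervening genes from the full chromosome gene order, rather than from family members only, matters here: two family members can be adjacent in the family list yet far apart on the chromosome.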

Phylogenetic and conserved motif analyses of potato MADS-box genes
Potato, rice, and Arabidopsis MADS-box protein sequences were aligned using ClustalX version 1.83. The phylogenetic trees were generated using the neighbor-joining (NJ) method in MEGA6.06, with 1000 bootstrap replicates to evaluate the significance of the nodes. To ensure that the divergent domains could contribute to the topology of the NJ tree, the pairwise gap deletion mode was used to construct the tree. Moreover, the potato MADS-box protein sequences were submitted to MEME (http://meme-suite.org/) to determine the conserved motifs in these sequences.
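To illustrate what pairwise gap deletion means for the distances that feed an NJ tree, the sketch below computes a simple proportion of differing sites (p-distance) between two aligned sequences, dropping only those columns that contain a gap in one of the two sequences being compared. The choice of p-distance is an assumption made for illustration; the substitution model used in MEGA is not stated in the text, and the sequences in the example are invented.

```python
def p_distance_pairwise_deletion(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are dropped for this pair only,
    so sites that are gapped in other sequences still contribute.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: ignore this column for this pair
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites shared by this pair")
    return differing / compared


# Invented aligned fragments: seven comparable sites, one difference.
example_distance = p_distance_pairwise_deletion("MGRGK-VEL", "MGRAKIVE-")
```

With complete deletion, any column gapped in any sequence would be removed from every comparison, which would discard most of the divergent I, K, and C regions; pairwise deletion keeps them wherever a given pair allows.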

Phylogenetic analysis of MADS-box genes between potato and tomato
To investigate the phylogenetic relationships of MADS-box genes between potato and tomato, a genome-wide search was performed against the Solanum lycopersicum proteome (Solanum lycopersicum Annotation Release 103, ftp://ftp.ncbi.nlm.nih.gov/genomes/Solanum_lycopersicum/protein/) using the blastp program at the Windows command line. The e-value threshold was set to 1e-3. The candidate genes were submitted to InterPro to exclude genes without a MADS-box domain. A phylogenetic tree was generated to determine the relationships of MADS-box genes in these two species, using protein sequences aligned with the MUSCLE program in MEGA7.0 and the same method described above.

Expression analyses based on publicly available RNA-seq and microarray data
The expression patterns of potato MADS-box gene family members were determined using data deposited in the PGSC database, which were derived from Illumina RNA-seq of a wide range of developmental stages [36]. Expression profiling on POCI (Potato Oligo Chip Initiative) microarrays was previously performed in stolons of StSP6A-overexpression and StSP6A-RNAi plants [34]. The MADS-box genes were used as queries to search the DNA probes using BLASTN with an e-value cutoff of 1e-3, and the resulting hits were used for further study. Probes were selected if they were annotated as MADS-box family members by the microarray platform, and were assigned to the corresponding potato MADS-box genes if their identity was 100%, allowing for single nucleotide polymorphisms (SNPs). The FPKM values of these genes obtained from published data were log2-transformed for visualization in Pretty Heatmap (http://www.ehbio.com/ImageGP/index.php/Home/Index/PHeatmap.html).
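The log2 transformation of FPKM values before heatmap plotting can be sketched as below. The pseudocount of 1 is an assumption added here so that genes with an FPKM of zero remain defined; the text does not say whether or which pseudocount was used, and the tissue names and values are hypothetical.

```python
import math

PSEUDOCOUNT = 1.0  # assumption: avoids log2(0); not stated in the text


def log2_fpkm(fpkm_by_tissue, pseudocount=PSEUDOCOUNT):
    """Return log2(FPKM + pseudocount) for each tissue of one gene."""
    return {
        tissue: math.log2(value + pseudocount)
        for tissue, value in fpkm_by_tissue.items()
    }


# Hypothetical FPKM values for one gene.
transformed = log2_fpkm({"tuber": 7.0, "leaf": 0.0})
```

The transformation compresses the wide dynamic range of FPKM values so that a few very highly expressed genes do not dominate the color scale of the heatmap.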

Plant materials collection and qRT-PCR
Plants of the potato cultivar 'Désirée' were cultivated in a greenhouse at Northwest A&F University from March to June (23 ± 2°C, 16 h light/8 h dark). Different tissues and organs were collected at different times after sprouting. Stems, leaves, and flowers were sampled at flowering, whereas stolons and young tubers were collected ten days after flowering. In addition, mature tubers were taken 90 days after sprouting. All samples were immediately frozen in liquid nitrogen and stored at -80°C until use. Total RNA was extracted using a high-purity total RNA rapid extraction kit (BioTeke, RP1202, China), and first-strand cDNA was synthesized using a ReverTra Ace Kit (TOYOBO, FSK-100, Japan) following the manufacturer's instructions.
Primer 5.0 was used to design gene-specific primers for potato MADS-box genes (Additional file 1: Table S1). Real-time quantitative RT-PCR was performed using SYBR green mix (KAPA, KK4601, USA) in a real-time PCR machine (BioRad, CFX96, USA). The internal reference gene was ef1α, and three biological replicates were used to estimate the expression level by the two-standard-curve method as described previously [38].
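As a rough illustration of relative quantification with two standard curves, the sketch below fits one curve of Ct against log10 template quantity for the target gene and one for the reference gene, converts sample Ct values to quantities, and takes their ratio. This is the generic relative standard curve calculation, not necessarily the exact procedure of reference [38]; all function names and numbers are hypothetical.

```python
def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to get the template quantity for a Ct value."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(target_ct, target_curve, reference_ct, reference_curve):
    """Target quantity divided by reference-gene quantity for one sample."""
    target_qty = quantity_from_ct(target_ct, *target_curve)
    reference_qty = quantity_from_ct(reference_ct, *reference_curve)
    return target_qty / reference_qty


# Hypothetical ten-fold dilution series (log10 quantities 0..3) and Ct values.
curve = fit_standard_curve([0, 1, 2, 3], [30.0, 26.7, 23.4, 20.1])
ratio = relative_expression(26.7, curve, 23.4, curve)
```

Using a separate curve for each gene accounts for differences in amplification efficiency between the target and reference primer pairs, which a simple Ct difference would ignore.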

Identification and comparative analysis of MADS genes in potato
Three bioinformatics methods were used to identify the MADS-box genes in potato. A local BLASTP search with an e-value cutoff of 1e-3, using the Arabidopsis MADS-box proteins as queries, yielded 169 MADS-box candidates. The keyword and InterPro ID (IPR003340) searches against the PGSC website yielded 145 and 156 MADS-box candidates, respectively. These candidates were submitted to NCBI CDD to confirm the presence of the MADS-box domain. After removing redundant sequences, a total of 153 MADS-box genes were found in potato. The names StMADS1, StMADS11, and StMADS16 were introduced in previous reports, and the remaining genes were named from StMADS2 to StMADS153 (except for StMADS11 and StMADS16) according to their chromosomal locations and subfamily affiliation (Table 1). Based on the phylogenetic relationships of Arabidopsis, rice, and potato MADS-box proteins (Fig. 1), the 153 potato MADS-box proteins were classified into two subfamilies, MIKC and M-type. The MIKC subfamily contained 39 genes in two subgroups, with 30 genes in MIKC C and 9 genes in MIKC*. The M-type subfamily contained 114 genes in three subgroups, with 70 genes in Mα, 28 genes in Mβ, and 16 genes in Mγ.
MIKC subfamily members are about 200 amino acids in length and contain more exons than M-type subfamily members (Table 1). The MIKC members have an average of 6.4 exons, and 86.7% of them contain more than five. In contrast, most M-type genes (106 of 114) have only one exon (Table 1). These exon numbers are similar to those reported in Arabidopsis, rice, cucumber, and apple. The exon-intron structures of MIKC members are thus more complex than those of M-type members.
The total number of MADS-box genes previously reported in 10 species varied considerably (Table 2) [12,39-49]. The reported numbers ranged from 43 to 167 and were positively correlated with genome size, except in Arabidopsis (Table 2). Generally, the MIKC subfamily has been reported to contain more members than the M-type subfamily, but we found that the number of M-type MADS-box genes (114) is approximately three times that of MIKC MADS-box genes (39) in potato (Table 2). In the other nine species, the number of MIKC subfamily members was close to or greater than that of M-type subfamily members.

Chromosomal distribution and duplication events of StMADS genes
MapChart was used to map the physical positions of MADS-box genes on the 12 chromosomes of potato, which should facilitate further functional studies of potato MADS-box genes (Fig. 2). Seven genes could not be localized to the potato chromosomes, five of which belonged to the M-type subfamily (Table 1). The remaining 146 MADS-box genes were distributed across the 12 chromosomes, and the five chromosomes with the most MADS-box genes were Chr01 (31 genes), Chr04 (25 genes), Chr05 (15 genes), Chr11 (14 genes), and Chr03 (10 genes) (Fig. 2).
To further explore the distribution patterns of MADS-box genes, a radar chart was drawn to show the distribution of each subfamily across the 12 chromosomes. Substantial clustering was detected on at least four chromosomes and was mainly contributed by genes of the M-type subfamily rather than the MIKC subfamily, implying that a selective expansion may have occurred mainly in the M-type subfamily (Fig. 3). The MADS-box genes belonging to the MIKC subfamily were distributed on all chromosomes except Chr09 (Fig. 2). Among the M-type MADS-box genes, 52.9% of the Mα subgroup genes were clustered on Chr01 and Chr04.
Moreover, gene duplication events in the MADS-box gene family were analyzed, and 47.7% (73 of 153) of the MADS-box genes were found to be derived from gene duplications (Figs. 2 and 4a). Tandem duplicated genes were mainly located on chromosome 1 and chromosome 5, which together accounted for about 51.9% of tandem duplicated genes. Of the tandem duplicated genes, 78.9% (45 of 57) belonged to the M-type subfamily, indicating that tandem duplications played an important role in the expansion of M-type genes. In the MIKC subfamily, 31.5% (12 of 38) of the genes resulted from tandem duplications. Interestingly, tandem duplications could occur between different subgroups (e.g. StMADS44-46 belonged to Mα and StMADS111 belonged to Mβ), suggesting that gene duplication not only contributed to the expansion of the MADS-box gene family but also led to functional diversification.
Compared with tandem duplications, segmental duplications accounted for only 10.5% of the total MADS-box genes in potato (Fig. 4a). Of the MIKC C MADS-box genes, 26.7% (8 of 30) resulted from segmental duplications. Those genes were located on Chr02 (two genes), Chr03 (two genes), Chr04 (one gene), Chr05 (two genes), and Chr11 (one gene). Interestingly, we found three segmentally duplicated gene pairs (StMADS17 and 18, StMADS13 and 14, StMADS8 and 9), in which StMADS9, 14, and 17 are from the SEP group, while StMADS8, 13, and 18 are from the SQUA group. Moreover, the Mα subfamily members StMADS71-79 located on Chr04 were segmentally duplicated genes, which form a cluster on the physical map (Fig. 4b). This suggests that a duplication event probably occurred on Chr04 during potato evolution and contributed greatly to the expansion of Mα-type MADS-box genes.
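The duplication percentages reported in this section can be cross-checked with a few lines of arithmetic. The counts are taken from the text; the only assumption is that tandem and segmental duplicates do not overlap, so that the segmental count can be obtained by subtraction.

```python
TOTAL_GENES = 153       # potato MADS-box genes
DUPLICATED_GENES = 73   # genes derived from gene duplications
TANDEM_GENES = 57       # tandem duplicated genes
TANDEM_M_TYPE = 45      # tandem duplicated genes in the M-type subfamily


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Assumption: the tandem and segmental classes are disjoint.
segmental_genes = DUPLICATED_GENES - TANDEM_GENES

duplicated_share = percent(DUPLICATED_GENES, TOTAL_GENES)   # share of all genes
tandem_m_type_share = percent(TANDEM_M_TYPE, TANDEM_GENES)  # M-type share of tandem genes
segmental_share = percent(segmental_genes, TOTAL_GENES)     # share of all genes
```

Under this assumption, 16 genes are segmental duplicates, which reproduces the reported 10.5% of the 153 genes alongside the 47.7% and 78.9% figures.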

Phylogenetic relationships and conserved motifs of StMADS proteins
An unrooted tree was built based on the full-length amino acid sequences of 153 potato, 89 Arabidopsis, and 60 rice MADS-box proteins using MEGA6.0 software (Fig. 1). StMADS proteins can be classified into two major subfamilies, MIKC (also known as type II, 39 genes) and M-type (also known as type I, 114 genes), based on the phylogenetic tree. MIKC subfamily can be further divided into MIKC C (30 genes) and MIKC* (9 genes), whereas M-type contains Mα (70), Mβ (28) and Mγ (16). According to the classification method defined in Malus domestica, Oryza sativa, and Brassica rapa, MIKC C subgroup was organized into 13 clades. Interestingly, potato MADS-box genes were absent in the FLC-like, AGL15-like, and TT16-like clades.
Consequently, the potato MIKC C subgroup consisted of ten clades. The TM3-like clade was the largest, containing seven StMADS proteins. When the orthologous and paralogous relationships of MADS-box proteins were analyzed in potato, rice, and Arabidopsis, most of the M-type subfamily members were found to be concentrated in a cluster; that is, these homologous MADS-box proteins are paralogs. These results indicated that the MADS-box gene family was formed in an ancestral species before the divergence of monocotyledonous and dicotyledonous plants, which is consistent with the results of previous studies [12,39-52]. Moreover, orthologous pairs from the MIKC subfamily showed relatively high homology, indicating that the functions of MIKC genes have been relatively conserved during evolution. Similarly, an unrooted tree was built based on the full-length amino acid sequences of the 153 potato MADS-box proteins, which could be partitioned into MIKC C, MIKC*, Mα, Mβ, and Mγ with good support values (Fig. 5a). To further analyze the motif composition of potato MADS-box proteins, the MEME online software was used to identify conserved motifs (Fig. 5b). The number of conserved motifs was set to 20; motifs 1, 2, 3, and 20 were located in the MADS-box domain and motifs 6, 11, and 16 were located in the K domain. Moreover, motifs 4, 5, and 7-10 represented coil regions and low-complexity regions. The remaining motifs were less conserved and appeared in only a few MADS-box proteins. As shown in Fig. 5b, the MIKC C subgroup contains seven conserved motifs (1, 2, 3, 6, 11, 16, and 20), and motifs 11 and 16, which belong to the K domain, existed only in this subgroup. The MIKC* subgroup contains fewer motif types, mainly motifs 1 and 3, and some of its members had motifs 6 and 20, similar to MIKC C members.
Specifically, StMADS31 had motif 18 adjacent to motif 6, which was similar to members of the Mα subgroup. These results showed that MADS-box proteins in the same clade of the phylogenetic tree have almost identical motif distributions. Moreover, the structure of MIKC* had characteristics of both M-type and MIKC C, but in Arabidopsis the MIKC* subgroup was previously assigned to the M-type genes as Mδ [12,39-52].

Phylogenetic relationship of MADS-box genes of potato and tomato
Tomato is the most studied model plant in the Solanaceae family. Therefore, a comparison with tomato MADS-box genes could provide more clues to the functional differentiation of potato MADS-box genes (Fig. 6 and Additional file 2: Table S2). There was a total of 107 MADS-box genes in tomato, including 53 MIKC-type and 54 M-type MADS-box genes. Tomato comprised more MIKC genes (53) than potato (39), even though the total number of potato MADS-box genes (153) is greater than that in tomato (107). In contrast, there were more M-type MADS-box genes in potato than in tomato; in particular, the numbers of Mα (70) and Mβ (28) genes were higher than those in tomato (Mα, 34; Mβ, 6) (Fig. 6 and Table 2). This evidence suggests that the expansion of MADS-box genes differed considerably among Solanaceae species.
To infer the functions of potato MADS-box genes, we compared the MIKC C MADS-box genes with their closely related homologs. The orthologs of potato MIKC C MADS-box genes were screened using the following criteria: a BLASTP e-value of less than 10e-10 with more than 80% coverage in length, and the ortholog being the best-matching homolog among the candidates in tomato. Orthologs of most potato MIKC C MADS-box genes could be found in tomato, except for StMADS3, 5, and 20 (Additional file 3: Table S3). Orthologs in different species have evolved from a common ancestral gene via speciation and often retain the same function during evolution.
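The ortholog screening criteria can be sketched as a small filter over BLASTP hits. Two points are assumptions made for illustration and are not specified in the text: coverage is computed relative to the query length, and "best-matching" is taken to mean the highest bit score among hits that pass the filters. The `Hit` record and all IDs are hypothetical.

```python
from dataclasses import dataclass

MAX_E_VALUE = 10e-10  # threshold as written in the text (numerically 1e-9)
MIN_COVERAGE = 0.80   # "more than 80% coverage in length"


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    e_value: float
    aligned_length: int
    query_length: int
    bit_score: float


def best_ortholog(hits, max_e_value=MAX_E_VALUE, min_coverage=MIN_COVERAGE):
    """Return the subject ID of the best hit passing both filters, or None."""
    passing = [
        h for h in hits
        if h.e_value < max_e_value
        and h.aligned_length / h.query_length > min_coverage  # assumed: query coverage
    ]
    if not passing:
        return None
    return max(passing, key=lambda h: h.bit_score).subject  # assumed: rank by bit score
```

Returning `None` when no hit passes corresponds to the cases reported in the text where no tomato ortholog could be assigned.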

Tissue specific expression patterns of MIKC C StMADS genes
Illumina RNA-Seq transcriptome data of DM and RH were retrieved to explore the expression patterns of StMADS genes in vegetative organs (root, stem, petiole, and leaf), floral organs (flower, stamen, sepal, petal, and carpel), and storage organs (stolon and tuber) [36]. The MIKC C StMADS genes were selected for further expression analysis because they were probable downstream targets of the tuberigen StSP6A, based on previous studies of its homologue FLOWERING LOCUS T in Arabidopsis and rice [34]. Hierarchical clustering of MIKC C StMADS genes was performed using the transcriptome data of DM and RH, respectively (Fig. 7). The MIKC C StMADS genes in both DM and RH were similarly divided into two major clusters. The first group of MIKC C StMADS genes was mainly expressed in floral organs and the other group was expressed in vegetative and storage organs. More specifically, in DM, 21 genes, five genes, and four genes had their highest expression in floral organs, vegetative organs, and storage organs, respectively (Fig. 7a). In RH, 16 genes, five genes, and nine genes had their highest expression in floral organs, vegetative organs, and storage organs, respectively (Fig. 7b). Most MIKC C StMADS genes were expressed in floral organs, indicating their possible roles in controlling floral organ development. However, more MIKC C StMADS genes were expressed in storage organs in RH than in DM.
Fig. 6 Phylogenetic analyses of MADS-box genes between Solanum tuberosum and Solanum lycopersicum. The NJ-tree was generated using the method mentioned above
Moreover, the expression of MIKC C StMADS genes in storage organs was analyzed further. It was found that 17 and 26 MIKC C StMADS genes were expressed (FPKM > 1) in storage organs of DM and RH, respectively. And nine and eight and 12 MIKC C StMADS genes were highly expressed (FPKM > 10) in storage organs of DM and RH, respectively. Taken together, six genes (StMADS1, 3, 11, 13, 16, and 29) consistently showed high expression levels in storage organs of both DM and RH and may be involved in tuberization and subsequent tuber development. Based on the phylogenetic relationships, StMADS1 and StMADS13 were homologs of AGL8/FUL and OsMADS14/15, StMADS3 was a homolog of SOC1 and OsMADS56, StMADS11 and StMADS16 were homologs of AGL22/SVP, and StMADS29 was a homolog of AGL12. Among the genes expressed in potato stolons and tubers, StMADS1, StMADS3, and StMADS13 were the most likely downstream genes of the tuberigen StSP6A because their homologs AGL8/FUL, OsMADS14/15, and SOC1 have been shown to be downstream targets of FLOWERING LOCUS T in Arabidopsis and rice [53-57].
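The thresholding and intersection used to shortlist genes can be sketched as follows. The thresholds (FPKM > 1 for expressed, FPKM > 10 for highly expressed) come from the text; treating a gene as expressed in storage organs when it exceeds the threshold in at least one of stolon or tuber is an assumption, and the gene names and values are hypothetical.

```python
EXPRESSED = 1.0          # FPKM > 1
HIGHLY_EXPRESSED = 10.0  # FPKM > 10


def genes_above(fpkm_by_gene, threshold):
    """Genes whose FPKM exceeds the threshold in at least one listed tissue."""
    return {
        gene for gene, by_tissue in fpkm_by_gene.items()
        if max(by_tissue.values()) > threshold
    }


def consistently_high(dm, rh, threshold=HIGHLY_EXPRESSED):
    """Genes above the threshold in the storage organs of both genotypes."""
    return genes_above(dm, threshold) & genes_above(rh, threshold)


# Hypothetical storage-organ FPKM values for two genes in DM and RH.
dm_fpkm = {"geneA": {"stolon": 25.0, "tuber": 3.0},
           "geneB": {"stolon": 0.2, "tuber": 4.0}}
rh_fpkm = {"geneA": {"stolon": 12.0, "tuber": 40.0},
           "geneB": {"stolon": 15.0, "tuber": 2.0}}
shortlist = consistently_high(dm_fpkm, rh_fpkm)
```

Requiring a gene to pass in both DM and RH is what narrows the per-genotype lists down to a consistent set of candidates.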

qRT-PCR verification of tissue-specific MIKC C StMADS genes
To validate the results of the RNA-seq analysis, real-time PCR analysis was performed for 29 MIKC C StMADS genes. Real-time PCR was successful for 25 MIKC C StMADS genes (all except StMADS7, 10, 20, and 26) in tissues including roots, leaves, stolons, young tubers, mature tubers, and flowers. The expression patterns of most genes were in general agreement with the RNA-seq data. For example, eleven StMADS genes (StMADS4, 6, 9, 14, 15, 18, 19, 21, 22, 23, and 28) were expressed predominantly in flowers compared with any other tissue (Fig. 8a), which was consistent with their expression patterns in flowers of both DM and RH (Fig. 7). The StMADS genes specifically expressed in potato flowers are most likely to control floral organ formation, like their homologues in the ABCDE model of other species [58,59].
StMADS1, 12, 13, and 27 were expressed not only in flowers but also in stolons and young tubers (Fig. 8b), indicating that they might control both flower organ formation and tuberization. StMADS3, 11, 16, and 17 were highly expressed in stolons and/or young tubers, but their expression in flowers was relatively low (Fig. 8c). In addition, six StMADS genes were expressed in almost all examined tissues without obvious tissue-specific patterns (Fig. 8d; the expression of StMADS24 and StMADS29 is not shown).

Screening for downstream targets of tuberigen StSP6A
StSP6A, a FLOWERING LOCUS T homologue in potato, has been reported to be a mobile signal controlling not only flowering but also tuberization, while its homologue StSP3D is mainly involved in floral transition [34]. According to previous studies of flowering, MADS-box genes encoding proteins involved in floral identity determination are major targets of FLOWERING LOCUS T, and it was therefore speculated that StMADS genes are downstream targets of the tuberigen StSP6A. The whole-genome microarray data from stolon tissue of StSP6A-overexpression (StSP6A-OX) and StSP6A-RNAi plants were therefore used to screen for downstream StMADS genes. First, the DNA probe sequences on the POCI (Potato Oligo Chip Initiative) [34] microarrays were used as queries in BLASTN searches against the transcript sequences of StMADS genes. It was found that 22 probes corresponding to 17 StMADS genes (16 of which belonged to the MIKC C type) were present on the POCI microarray chip and four genes
Moreover, to verify whether the expression of StMADS1 and StMADS13 was associated with the expression of StSP6A, we investigated the expression of StSP6A, StMADS1, and StMADS13 in leaves 30 days after sprouting, leaves 60 days after sprouting, and young tubers. None of these three genes was expressed in potato leaves at the juvenile stage (30 days after sprouting), whereas StMADS1 and StSP6A were highly expressed in potato leaves at the early flowering stage (60 days after sprouting) and in young tubers (Fig. 9b). The expression of StMADS13 was not detected in 60d leaves but was observed in young tubers (Fig. 9b). These results indicated that StMADS1 expression was associated with StSP6A in both 60d leaves and young tubers, whereas StMADS13 expression was associated with StSP6A only in young tubers. Although both StMADS1 and StMADS13 were putative downstream genes of StSP6A, their regulatory mechanisms might differ depending on tissue type.

Discussion
Potato is one of the major food crops and feeds millions of people all over the world [60]. For this important tuber crop, yield improvement is a key issue for potato breeders in China, where production is far below the global average. However, the molecular mechanism of tuberization and tuber development remains unclear. Identifying candidate genes related to tuberization and tuber development would provide key resources for yield improvement through both genetic modification and traditional breeding. A previous study showed that the FT protein StSP6A functions as a mobile signal controlling tuberization under short-day conditions [34], and that its paralogue StSP3D is involved in day-neutral flowering control. FT is a classical upstream regulator of MADS-box genes in the conserved ABC model of floral organ identity [59]. Based on these clues, it was reasonable to hypothesize that some potato MADS-box genes are related to tuberization and tuber development.
In this study, a total of 153 members of the MADS-box gene family were characterized in potato (Table 1). To assess the gain and loss of MADS-box genes in potato, a phylogenetic tree was produced using the amino acid sequences of the 153 potato MADS-box genes and representative MADS-box genes of Arabidopsis and rice (Fig. 1). Most MADS-box genes had orthologs in potato, except for those in the TT16-like, AGL17-like, and FLC-like subgroups. In addition, the M-type members, specifically Mα, were far more numerous than in any other species studied so far (Table 2), implying that potato MADS-box genes might have followed a different evolutionary pattern [39-52]. The large potato MADS-box family might be due to its larger genome, which resulted from genomic duplication events that differed among species during the course of plant evolution [61,62]. To explore what lies behind the markedly different composition of the MADS-box family in potato, the effects of gene duplication events on the expansion of MADS-box genes were investigated (Fig. 2). The results suggested that segmental duplications mainly contributed to the expansion of the MIKC subfamily, whereas the expansion of the M-type subfamily was mostly derived from tandem duplications (Figs. 2 and 3). This pattern is thought to arise because M-type genes were mainly derived from site-specific duplications within the same chromosome, while MIKC genes mainly came from whole-genome duplication events [61,63-65]. Birth-and-death rates of MADS-box genes after gene duplication differed among species, which resulted in variable numbers of MADS-box genes in the same subfamily in different species [61,62]. This could be one explanation for the unusually large number of Mα genes in potato.
Besides gene duplication events, gene mutation and the loss of certain domains might also have played an important role in generating some of the Mα genes in potato. It has been reported that MIKC* was the intermediate form between MIKCC and M-type in the course of plant evolution [58,59,62]. Based on the intron-exon structures of potato MADS-box genes, MIKCC genes had the most introns, MIKC* genes had fewer introns than MIKCC genes, and most M-type MADS-box genes were intronless. It could be speculated that exon-intron loss mutations mainly occurred in the K-box domain of MIKCC genes during plant evolution, giving rise to a new group, MIKC*. MIKC* genes then lost further exons and introns corresponding to the K-box domain and produced the M-type, as has also been proposed in a previous study [49]. This evidence could be another explanation for the unusually large number of Mα genes in potato.
To compare the composition of MADS-box genes in potato and tomato, phylogenetic analysis was performed. Since these two species are genetically close, it was surprising that tomato contains many more MIKC genes, while potato contains more M-type genes. This suggested that the MADS-box families of these species might have undergone different evolutionary histories. For the candidate genes StMADS1 and StMADS13, which may be related to tuberization, the functions of their tomato orthologs, SlMADS26 and SlMADS12, remain unknown. Although potato and tomato are both Solanaceae plants, stolons and tubers are found only in potato. Thus, the functions of StMADS1 and StMADS13 need to be investigated in potato. Nevertheless, RIN, a tomato homolog of StMADS1 and StMADS13, has been shown to play important roles in the induction of tomato ripening [66,67]. Besides, compared with potato, most genes that had been studied in tomato have few homologs, indicating that a [68-70].
To investigate the possible roles of StMADS genes in tuberization, we used the RNA-seq data of DM and RH available from the PGSC. Most of the StMADS genes showed tissue-specific expression patterns. Among these genes, StMADS1, 3, 11, 13, 16, and 29 were highly expressed in the storage organs of both DM and RH (Fig. 7). Consistent with the RNA-seq data, the qRT-PCR results showed that StMADS1, 3, 11, and 16 were predominantly expressed in stolons and/or young tubers (Fig. 8), indicating that these genes are probably involved in tuberization and/or tuber development. Previous studies have provided a detailed regulatory map of MADS-box genes and their significant roles in floral organ differentiation in several model plants.
In Arabidopsis, the expression of FLOWERING LOCUS T (FT), a core flower development regulator, is suppressed by FLOWERING LOCUS C (FLC), a typical MADS-box gene, which binds to a CArG site between the first intron and promoter of FT [71-73]. Interestingly, the expression of MADS-box genes including APETALA1 (AP1) and SUPPRESSOR OF OVEREXPRESSION OF CO 1 (SOC1) is related to the promotion of flowering, which is controlled by two interacting flowering-related proteins, FT and FD (FLOWERING LOCUS D) [74]. In monocot plants, orthologs and paralogs of FT and MADS-box genes show nearly the same transcriptional regulation; for instance, a pair of FT genes, Heading-date 3a (Hd3a) and RICE FLOWERING LOCUS T 1 (RFT1), upregulate the expression of OsMADS15, which is crucial for floral initiation [75-78]. Given the highly conserved relationship between FT and MADS-box genes, it was reasonable to hypothesize that this model also operates in potato. A recent study showed functional diversification of FT proteins in potato: StSP3D is mainly involved in the floral transition, whereas StSP6A is involved in the tuberization transition [34]. Therefore, the StMADS genes (StMADS1, 3, 11-13, 17, and 27) mainly expressed in stolons and/or young tubers were possible downstream targets of StSP6A. Interestingly, the expression of StMADS1 and StMADS13 was strongly correlated with that of StSP6A in leaves and/or young tubers. Further evidence came from microarray data from stolon tissue of StSP6A-overexpression (StSP6A-OX) and StSP6A-RNAi plants: the expression of StMADS1 and StMADS13 was upregulated in StSP6A-OX plants and downregulated in StSP6A-RNAi plants. Given the evidence discussed above, StSP6A and several MADS-box genes probably share the same regulatory map as their homologs in model plants. However, how StSP6A regulates StMADS1 and StMADS13, whether through direct interaction or through the promoter region, remains unclear.
Because MADS-box proteins are master regulators of transcription, tracing the target genes of the potato MADS-box genes would also be a valuable subject for future study.
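The co-expression argument above, in which StMADS1 and StMADS13 track StSP6A across samples, can be illustrated with a Pearson correlation between two expression profiles. The sketch below is only an illustration under stated assumptions. The function name `pearson` is introduced here, the expression values are invented placeholders rather than data from this study, and the text does not state which correlation measure was actually used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (dev_x * dev_y)


# Hypothetical placeholder values (arbitrary units), NOT measurements from
# the paper: one value per sample for a regulator and a candidate target.
regulator_profile = [1.0, 2.0, 4.0, 8.0]
candidate_profile = [0.5, 1.0, 2.0, 4.0]
r = pearson(regulator_profile, candidate_profile)
```

A coefficient close to 1 across tissues would be consistent with co-regulation, but correlation alone cannot show whether a regulator acts directly on a candidate target.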