Development of a genome-wide multiple duplex-SSR protocol and its applications for the identification of selfed progeny in switchgrass

Background: Switchgrass (Panicum virgatum) is a herbaceous crop for cellulosic biofuel feedstock development in the USA and Europe. As switchgrass is a naturally outcrossing species, accurate identification of selfed progeny is important for producing inbreds, which can be used in the production of heterotic hybrids. A technically reliable, time-saving and easily used marker system is needed to quantify and characterize the breeding origin of progeny plants of targeted parents.
Results: Genome-wide screening of 915 mapped microsatellite (simple sequence repeat, SSR) markers was conducted, and 842 (92.0%) produced clear and scorable bands on a pooled DNA sample of eight switchgrass varieties. A total of 166 primer pairs were selected on the basis of their relatively even distribution in the switchgrass genome and their PCR amplification quality on 16 tetraploid genotypes. The mean polymorphic information content value for the 166 markers was 0.810, ranging from 0.116 to 0.959. From them, a core set of 48 loci, which had been mapped on 17 linkage groups, was further tested and optimized to develop 24 sets of duplex markers. Most (up to 87.5%) of the targeted non-allelic amplicons within each duplex were separated by more than 10 bp. Using the established duplex PCR protocol, the selfing ratio (i.e., selfed/all progeny × 100%) was identified as 0% for a randomly selected open-pollinated ‘Kanlow’ genotype grown in the field, 15.4% for 22 field-grown plants with bagged inflorescences, and 77.3% for a selected plant grown in a growth chamber.
Conclusions: The study developed a duplex SSR-based PCR protocol consisting of 48 markers, providing ample choices of non-tightly-linked loci across the switchgrass genome, and representing a powerful, time-saving and easily used method for the identification of selfed progeny in switchgrass. The protocol should be a valuable tool in switchgrass breeding efforts.


Background
Switchgrass (Panicum virgatum L.) is a C4 perennial grass native to the prairies of North America and being developed as a herbaceous crop for the biofuel feedstock production in the USA and Europe [1][2][3]. Recurrent selection procedures have been widely employed in genetic improvement of populations and development of cultivars in switchgrass [4]. Because of its wind facilitated pollination behavior and low fertility of bagged inflorescences, switchgrass is considered to be allogamous.
Although homozygous inbred lines of switchgrass have not yet been reported, some studies indicated that the rate of self-pollination ranged from less than 1% for bagged inflorescences [5][6][7] to higher than 50% in some specific individuals [8] or in a genotype grown in a controlled environment [9]. Through continuous selfing, development of inbred lines is potentially possible, and such lines could be used to produce single-cross hybrid cultivars in switchgrass [8][9][10]. Hybrid vigor (i.e., for biomass yield) has been reported in switchgrass [11]. Using tissue culture protocols to clonally propagate two heterozygous parents for hybrid seed production was suggested [11]. However, this approach to producing hybrid cultivars has not been applied because of the high costs associated with producing large quantities of switchgrass clones through tissue culture and transplanting the clones into field plantings for large-scale seed production. Identification of selfed progeny offers the potential for the development of switchgrass inbred lines to serve as parents of F1 hybrids. The proposed procedure has proven successful for large-scale seed production in maize (Zea mays) and is likely applicable in switchgrass.
Attempts to obtain switchgrass selfed seeds have been carried out by bagging inflorescences of selected plants [5][6][7]. However, bagging may not be fully effective in preventing pollen contamination [12]. It has been proposed that seeds from bagged panicles need to be genotyped with molecular markers to confirm parentage [8]. Morphological traits, such as pubescence on the adaxial surface of the leaf blade, foliage color, and seed size, were used to identify selfed and crossed progeny in previous experiments [13,14]. Although these phenotypic markers are simple and easily used, they are genotype-dependent and may also be environmentally sensitive. In contrast, simple sequence repeat (SSR) markers have many advantages due to their co-dominance, low cost, high polymorphism, and environmental independence, and they are available in switchgrass [15][16][17][18][19]. SSR markers have been used to study genetic diversity [20], cultivar classification [21], and evolution [22] in switchgrass. Using PCR amplifications of six individual SSR markers, one preliminary study reported the confirmation of selfed progeny in switchgrass [12]. A duplex PCR protocol, in which two markers are amplified in one reaction, would save 50% of the time and cost of the lab work. In addition, one prerequisite for molecular genetic-based inbred testing is the use of non-tightly linked markers [23]. Recently available SSR linkage maps [18,24] make it possible to select molecular markers that cover much of the switchgrass genome and are not tightly linked.
Because of inherent variation in selfing rates in switchgrass, a technically reliable and easily used multiplex marker system would be very useful to quantify and characterize selfing and crossing rates of switchgrass. However, no similar study has been reported in switchgrass. The objectives of this study were: (1) to select a set of polymorphic SSR markers based on genome-wide screening, (2) to develop a duplex PCR-based protocol, and (3) to apply this SSR system in the identification of self- and cross-fertilized progeny of selected switchgrass plants in different growth conditions.

Screening and evaluation of genome-wide mapped SSR markers
Of the 915 primer pairs (PPs) that were positioned on the published linkage maps [18,24], 842 (92.0%) produced clearly scorable bands with approximate sizes as reported previously [15][16][17][18][19]. The remaining 73 (8.0%) PPs produced either no amplicons or nonspecific or smear products. The number of alleles among the scorable SSR markers ranged from one to 20. The mean number of alleles per locus was 14.3 for dinucleotide, 10.5 for trinucleotide, and 8.3 for other SSR markers with repeat motifs ≥4 (Table 1). The SSRs with dinucleotide repeats produced a significantly greater number of alleles than those with trinucleotide repeats (t-test, p < 0.01).
From the 842 PPs, 166 well amplified SSR markers were selected due to their relatively high allele number (≥4) per locus. These markers were distributed on 18 linkage groups (LGs) and spanned 1751.4 cM (84.0% coverage) of the reference map [24]. The number of SSR markers in each LG ranged from 2 on LG 7b to 20 on LG 3b (Additional file 1). Average marker interval was  Figure 1). The raw data (e.g., SSR allele size range, heterozygosity, PIC value, frequency of each allele, etc.) for each SSR marker are presented in Additional file 2.

Development of a set of duplex PCR markers
Multiplex PCRs were developed by using the principles described by Edwards and Gibbs [26] and Hayden et al. [35,36]. PCR robustness, polymorphism and map position were used as the screening criteria to select 60 out of the 166 single-locus markers, and these markers were then empirically tested for duplex PCR quality. Simply combining two sets of primers under monoplex PCR conditions generally did not work in duplex PCR (Figure 3, set A), so several adjustments to the reaction components were tested to optimize the protocol. Increasing the dNTP, template DNA, buffer, IR-M13 dye, and primer concentrations did not significantly improve the amplification quality (Figure 3, set C, D, E, G and H). Increasing the Taq polymerase concentration from 0.25 to 0.5 units per 10 μl reaction partially increased amplicon quantity but did not correct uneven amplification or pull up unamplified alleles (Figure 3, set B). The most effective change that affected SSR primer compatibility was the increase in Mg2+ concentration (from 1× to 1.6×), which generally pulled up faintly amplified and unamplified loci (Figure 3, set F).
After an optimal duplex PCR protocol was identified, the quantities of individual SSR primers were adjusted as necessary to obtain appropriate fluorescent signals for the two SSR markers in each duplex (Figure 4). Primer quantities were calibrated by comparing fluorescent signal intensity. The relative ratio between the concentrations of the two SSR PPs was more important than the absolute quantities, which is consistent with previous results [34]. The duplex PCR protocol required adding 0.125 to 4 pmoles of each SSR primer in a 10 μl reaction volume (see details in Table 2).
Of the 60 tested PPs, 48 SSR markers were assembled into 24 duplexes (set #1-24) by testing them on eight individual DNA samples (Table 2). Twelve markers were discarded due to unsatisfactory amplifications in duplex PCRs. All duplex PCRs produced the same SSR alleles as monoplex PCRs ( Figure 4). The 48 SSR markers   Table 2). The mean PIC for dinucleotide and trinucleotide SSRs were 0.823 and 0.820, respectively ( Table 2). Eleven markers were eSSRs and the other 37 gSSRs ( Table 2). The 48 markers constituting the 24 duplexes are distributed on 17 LGs and the number of SSR markers per LG ranged from one (on LG 1a and 8a, respectively) to five (on LG 2b and 5b, respectively), based on a published linkage map [24] (see Additional file 1, in red). The mean distance of two immediate neighboring markers was 37.7 cM, and the nearest markers were on LG 2b with a mean distance of 8.5 cM. The only LG with no SSR marker loci contributing to the duplex PCR sets was LG 7b, one of the shortest and least polymorphic LGs in the mapping population [24].
The minimum, mean and maximum PIC values of the 48 markers were 0.622, 0.829 and 0.945, respectively. The mean non-exclusion probability of a marker when one parent is known (NE-1P) was 0.414, ranging from 0.191 to 0.709 (Table 2).

Validation of the duplex PCR in identification of selfed progeny using different populations
Three populations produced in different environments were used to determine selfing ratios (Figure 5).
Population 1: A maternal plant (K4) grown in the field condition and its 46 putative half-sib progeny were genotyped with 100 sets of randomly selected duplex markers (Figure 5A). Each set contained five duplexes with 10 loci on different LGs. One plant (K4-11) was identified as a contaminant because none of the alleles at six of its loci (PVCAG2361/2, 5211-B07, PVAAG3163/4, PVGA1143/4, PVCAG2397/8 and PVGA1963/4) were inherited from K4. The remaining 45 plants showed maternal relatedness to K4 and were included in further analysis. The results showed that, when genotyping with one random marker, the mean value of the selfing ratio was 46.7%. As more markers were tested and more polymorphisms were detected between the maternal plant and unknown pollen parents, the cumulative selfing ratio began to decline. When the number of marker loci reached eight or more, the cumulative selfing ratio dropped to 0 (Figure 6).
Population 2: Twenty-two different families with a total of 99 progeny were genotyped for 10 loci with five sets of duplex PCR (Sets 1, 9, 12, 14 and 18). These progeny plants were from seeds harvested in a field plot by bagging panicles. Unexpectedly, of the 99 progeny, 34 (34.3%) were identified as contaminants because each of them showed no alleles from their corresponding maternal parents in at least two loci (Figure 5B, indicated by lozenges). Of the remaining 65 progeny, 10 plants from four families (i.e., NS85-1, 2, 3, 5, 6, and 7; SN16-1 and -2; SN17-1; and SN44-1) shared the same alleles with their respective maternal parents and were therefore identified as selfed progeny (Figure 5B, indicated by asterisks). The other 55 progeny were identified as hybrids because they possessed alleles that were different from those of their seed parents (Table 3). Later, the 10 selfed progeny were further confirmed by genotyping with two additional SSR duplexes (Sets 2 and 24). The overall selfing ratio by the bagging method was only 15.4% (10/65) if contaminants were excluded.
Population 3: Forty-four progeny plants from seed of a breeding line ('SL93 7x15', abbreviated as SL93) grown in a controlled growth chamber, in which another breeding line ('NL94 LYE 16x13', NL94) was grown as a potential pollen donor, were genotyped with five sets of duplex PCR (Sets 1, 11, 12, 16 and 24, Figure 5C). The genotyping results were consistent across all 10 SSR loci. Of the 44 progeny from SL93, 34 (77.3%) were identified as originating from self-fertilization and 10 (22.7%) were hybrids between SL93 and NL94. Thus SL93, like NL94 as reported recently [9], was a self-compatible genotype in this specific environment.
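The scoring rules applied to the three populations can be summarized in a short sketch. The Python below is illustrative only and is not the software used in this study: the function names, locus names and allele sizes are hypothetical, and the two-locus threshold for flagging a contaminant is taken from the rule described for Population 2. A plant is treated as selfed when every allele it carries is also present in its seed parent, and as a hybrid when it carries at least one non-maternal allele.

```python
from collections import Counter


def classify_progeny(mother, child, min_mismatch_loci=2):
    """Classify one progeny plant against its seed parent.

    mother, child: dict mapping locus name -> set of allele sizes (bp).
    min_mismatch_loci: assumed threshold; a plant lacking any maternal
    allele at this many loci or more is called a contaminant.
    """
    shared = [locus for locus in mother if locus in child]
    # Loci at which the child carries no maternal allele at all.
    no_maternal = [l for l in shared if not (child[l] & mother[l])]
    if len(no_maternal) >= min_mismatch_loci:
        return "contaminant"
    # Loci at which the child carries an allele absent from the mother.
    foreign = [l for l in shared if child[l] - mother[l]]
    return "hybrid" if foreign else "selfed"


def selfing_ratio(labels):
    """Selfed / (selfed + hybrid) x 100, with contaminants excluded."""
    counts = Counter(labels)
    kept = counts["selfed"] + counts["hybrid"]
    return 100.0 * counts["selfed"] / kept if kept else float("nan")


# Hypothetical two-locus example.
mother = {"locA": {150, 162}, "locB": {201}}
print(classify_progeny(mother, {"locA": {150}, "locB": {201}}))
print(classify_progeny(mother, {"locA": {150, 170}, "locB": {201}}))
print(classify_progeny(mother, {"locA": {170}, "locB": {210, 215}}))

# Counts reported for Population 2 (10 selfed, 55 hybrids, 34 contaminants)
# and Population 3 (34 selfed, 10 hybrids).
pop2 = ["selfed"] * 10 + ["hybrid"] * 55 + ["contaminant"] * 34
pop3 = ["selfed"] * 34 + ["hybrid"] * 10
print(round(selfing_ratio(pop2), 1), round(selfing_ratio(pop3), 1))
```

With the reported counts, the sketch reproduces the selfing ratios of 15.4% for Population 2 and 77.3% for Population 3. A plant that lacks maternal alleles at a single locus falls below the contaminant threshold and is scored as a hybrid here; in practice such a case would warrant rescoring.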

Discussion
Switchgrass has become an important energy crop in the USA and Europe due to its high biomass yield and adaptability on marginal lands, low nutrient and water requirements, and powerful ability as a carbon sink [1][2][3][8].
[Figure caption fragment] Table 2. The DNA templates were the same as Figure 3.
[39] and soybean [40]. In other crops, such as rapeseed [27], cotton [31], and sorghum [30,41], multiplex PCR systems have been established, although the selected markers did not cover the whole genome.

SSR marker polymorphism
Initially, DNA samples from eight diverse switchgrass cultivars were pooled together, which not only kept the diversity of different ecotypes of switchgrass but also minimized the number of genotypes used for the preliminary screening of SSR markers. A similar strategy was used in a previous switchgrass study [19]. Generally, SSR markers with dinucleotide repeats have been more polymorphic than those with trinucleotide repeats in several plant species, such as barley [42], rice [43], wheat [44], maize [45], and soybean [46]. In this study, of all scorable PPs in the preliminary screening, the dinucleotide repeat SSRs produced a significantly greater number of alleles than those with trinucleotide repeats (Table 1), which was consistent with previous studies in switchgrass [19,24,47]. However, in the final 48 PPs selected for duplex PCR, we found that both classes of SSRs were equally polymorphic. The PIC values for the selected dinucleotide and trinucleotide SSRs were not significantly different (0.823 vs. 0.820), and the number of alleles per locus was nearly identical (12.3 for dinucleotide and 12.6 for trinucleotide repeats) (Table 2).
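For readers who wish to recompute PIC from allele frequencies such as those in Additional file 2, a minimal sketch follows. It assumes the widely used definition of Botstein et al., PIC = 1 − Σp_i² − ΣΣ 2p_i²p_j² (summed over allele pairs i < j); whether the software used in this study applies exactly this form is an assumption, and the function name is hypothetical.

```python
from itertools import combinations


def pic(allele_counts):
    """Polymorphic information content from allele counts or frequencies.

    Assumes the Botstein et al. form:
    PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    """
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive value")
    p = [c / total for c in allele_counts]  # normalize to frequencies
    homozygosity = sum(x * x for x in p)
    cross_term = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1.0 - homozygosity - cross_term


# Two equally frequent alleles versus ten equally frequent alleles.
print(pic([0.5, 0.5]))
print(pic([1] * 10))
```

Under this definition PIC rises with the number of alleles and with the evenness of their frequencies, which is why markers with four or more alleles were favored during screening.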

Duplex PCR
Due to the competition among primers, DNAs, Mg2+, and other reaction components, PCR multiplexing generally requires some optimization [26]. In this study, we found that increasing Taq and Mg2+ concentrations improved the duplex PCR quality (Figure 3). Previous studies showed that a Taq DNA polymerase concentration (with an appropriate increase in MgCl2 concentration) four to five times greater than that required in monoplex PCR was necessary to achieve optimal nucleic acid amplification [48]. In contrast, altering other PCR components such as PCR buffer constituents, dNTPs, and absolute primer concentrations in multiplex PCR relative to those reported for most monoplex PCRs usually resulted in little improvement in the sensitivity or specificity of the test [49]. However, another study showed that increasing only the buffer concentration markedly improved the quality of multiplex PCR [34]. It is evident that optimization is necessary in developing multiplex PCR [26,35,36]. Aside from the technical factors discussed above, the selection of SSR markers to create PCR duplexes in switchgrass also integrated information on marker map position, allele-length range, genotyping quality, and polymorphism. The 48 selected markers covered the major portion of the switchgrass genome based on the available genetic maps, although they were not evenly distributed in the genome. Although we tried to select unlinked markers before designing duplex combinations, the selected marker loci on LG 2b and 5b remained linked, although most of them were more than 10 cM apart. Tightly linked loci are not ideal in paternity analysis because they are usually inherited together, but these markers do provide more choices, especially in case some of them lack polymorphisms within and between tested parents.
The band-size separations of individual SSR markers in each duplex combination were mostly more than 10-bp, which should be wide enough to unequivocally score alleles amplified in major switchgrass lowland varieties. Even for the nearest distance of non-allelic bands (3-bp, duplex Set 15), it could be easily differentiated on frequently-used polyacrylamide gels [50]. The duplex marker system might also perform well on capillary electrophoresis instruments due to their similar resolutions with the LI-COR DNA analyzer used here [50].

SSRs needed for parentage analysis
A comprehensive review of 53 articles showed that an average of seven microsatellites (ranging from 3 to 11) was used for plant parentage studies [51]. In general, the number of markers required to resolve parentage with a given level of confidence depends on a number of factors. One of the main factors is the expected heterozygosity or polymorphism of each marker [52]. Of the 48 markers, the mean PIC value was 0.829. In an actual example (Population 1) with 46 individuals derived from natural wind pollination, four sets of randomly selected duplexes containing eight markers were enough to discriminate the breeding origin of each progeny (inbreeding vs. outcrossing). In another population (Population 3) harvested from a controlled environment, theoretically only one polymorphic locus could assign parentage to each progeny plant. Therefore, four duplex sets identified in this study would be recommended for the identification of self- or cross-fertilized progeny. Thus, the 24 sets of duplex SSRs should provide a reservoir of markers for breeding origin analysis in switchgrass.
In addition, from Population 2, using 14 markers, we found that the overall selfing ratio of switchgrass plants with bagged inflorescences was 15.4%, which was slightly lower than the results in a previous report [12]. The bagging method with pillow cases did not produce only selfed progeny, perhaps because openings in the pillow cases were bigger than switchgrass pollen grains. A previous study showed that the pollen size of switchgrass was in the range of 42.5 to 54.0 μm [53]. Therefore, bagging methods need to be improved if a large number of inbreds are needed in a hybrid breeding program in the future.
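The relationship between marker number and discriminating power can be illustrated with a rough calculation. The sketch below assumes that loci are independent, so that the combined non-exclusion probability is the product of the per-locus values, and it uses the mean NE-1P of 0.414 reported here for every locus; both are simplifications, since the actual markers differ in NE-1P and some are linked. The function names and the 1% target are hypothetical choices for illustration.

```python
import math


def combined_non_exclusion(per_locus_ne):
    """Combined non-exclusion probability, assuming independent loci."""
    return math.prod(per_locus_ne)


def loci_needed(ne_per_locus, target):
    """Smallest number of loci n such that ne_per_locus ** n <= target."""
    return math.ceil(math.log(target) / math.log(ne_per_locus))


mean_ne_1p = 0.414  # mean NE-1P of the 48 markers (Table 2)

# Eight loci, as in four duplex sets.
print(combined_non_exclusion([mean_ne_1p] * 8))

# Number of average loci needed to push non-exclusion below 1%.
print(loci_needed(mean_ne_1p, 0.01))
```

Under these assumptions, eight average markers give a combined non-exclusion probability below 0.1%, which is in line with the empirical observation that eight loci were sufficient in Population 1.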

Conclusions
Based on the genome-wide screening of a large set of SSR markers, we developed a multiple duplex PCR system including 48 polymorphic PPs. The applications of this SSR test system demonstrated its high discrimination capability and effectiveness for the identification of switchgrass selfed progeny, which were produced on multiple plants in different pollination conditions. The protocol provides ample SSR markers, which should be a powerful tool for the detection of inbreds in switchgrass.

Plant material
Plants of 'Alamo', 'Kanlow', 'Nebraska 28', 'Cave-in-Rock', 'Summer', 'Dacotah', 'Shelter', and 'Blackwell' [63] were grown in an Oklahoma State University (OSU) greenhouse, Stillwater, OK. These cultivars have been widely used in the USA and represent eco- and cyto-type diversity within the species [64]. For initial SSR marker screening, equimolar DNAs from the eight cultivars were mixed to form a pooled DNA sample. For each cultivar, the DNA sample was a mix from four to six plants.
In an OSU switchgrass nursery, 'Alamo', 'Kanlow', and 'Cimarron' plants were space planted on 3.5 feet × 3.5 feet centers in 2008. Four individuals from each cultivar were randomly selected. These 12 individual plants, together with four additional 'Summer' plants grown in the greenhouse, constituted a panel of 16 plants, which was used for marker polymorphism analysis.
Open-pollinated seeds from a randomly selected 'Kanlow' genotype (encoded K4) in the nursery were harvested in 2010, and then they were germinated on filter paper in petri dishes after pre-chilling treatment for two weeks [65]. The obtained seedlings were transplanted into conetainers and grown in the greenhouse for leaf collection. The obtained half-sib progeny population (Population 1) of 46 individuals (encoded as K4-1 to 46) was used to examine selfing and outcrossing rates of a plant grown in the open-pollinating, natural field condition.
In 2010, 22 first-generation selfed (S1) plants of two genotypes, NL94 and SL93 [9], were selected according to spring growth vigor, plant height, and crown size. Two inflorescences from each plant were then bagged with pillow cases [66] in the field before the inflorescences had fully emerged. The obtained seeds were germinated separately in a growth chamber in the spring of 2011. Surviving plants were transplanted into a field plot on August 1, 2011 and constituted 22 families with a total of 99 progeny plants (Population 2). The family size ranged from 1 to 13 in Population 2. Population 3 included 44 progeny collected from SL93 in a growth chamber, in which NL94 served as the pollen donor [9]. Genomic DNA was isolated from healthy leaf tissues using the CTAB method [67]. The DNA concentration was measured using an ND1000 spectrophotometer (NanoDrop Products, Wilmington, DE). The working solutions were adjusted to 10 ng/μl as PCR templates.

SSR primer screening
PCR PPs were obtained principally from two sources: those on two-sister linkage maps (585 PPs in total) [18], and the others from a recent linkage map (473 PPs in total) [24]. Primer sequence information was collected from previous studies [15][16][17][18][19]. After excluding SSR redundancy, unique PPs were used in this study. All forward primers were appended with an M13 sequence (5'-CACGACGTTGTAAAACGAC-3') at the 5' end to allow indirect labeling in PCR reactions. The initial screening for polymorphism was performed using the pooled DNA sample to determine the number of alleles at each locus. Candidate SSR PPs were selected based on the previous data generated in the mapping experiment [24] and on the screening results: amplifying clear bands, displaying four or more alleles per locus, and avoiding tight linkages (i.e., >10 cM between two neighboring loci). The selected PPs were then tested on the panel of 16 individual DNA samples to determine polymorphism. Monoplex PCR in 10 μl reaction mixtures with a 'Touchdown' thermal cycling program was used [68,69].
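Appending the M13 tail to each forward primer is a simple string operation, sketched below for convenience. The tail is the 19-nt M13 sequence given above; the function name and the example primer are hypothetical and are not taken from the primer sets used in this study.

```python
# 19-nt M13 tail appended to the 5' end of every forward primer.
M13_TAIL = "CACGACGTTGTAAAACGAC"


def tail_forward_primer(forward_seq, tail=M13_TAIL):
    """Return the forward primer with the M13 tail at its 5' end.

    Whitespace inside the input is removed and the sequence is upper-cased;
    anything other than A, C, G or T is rejected.
    """
    seq = "".join(forward_seq.split()).upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T sequence")
    return tail + seq


# Hypothetical forward primer, written 5' to 3'.
print(tail_forward_primer("acgtac gtacgt"))
```

Only the forward primer is tailed; the labeled IR-M13 primer then anneals to the complement of the tail in later PCR cycles, so one labeled primer serves all markers.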

Development and optimization of duplex PCR
Based on amplified allele size, map position and heterozygosity for each of the candidate SSR markers, a smaller set of SSRs was selected for testing in duplex PCR. The criteria used to combine SSR markers into duplex PCR were as the following: (1) Non-overlapping allele size for each pair of two markers; (2) Primer compatibility and genotyping quality in duplex PCR; (3) High polymorphism estimated by PIC value [70]; (4) Two high-quality bands in each genotype due to disomic inheritance identified in tetraploid switchgrass [9,18]; (5) Genetic distance between selected SSRs ≥ 10 cM; (6) SSRs with tri-, tetra-, or higher nucleotide repeats were preferred to lessen slippage during PCR [71].
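Criteria (1) and (5) can be checked automatically before any bench work, as sketched below. This is an illustration under stated assumptions rather than the selection procedure actually used: the `Marker` fields, the marker names and all numbers are hypothetical, and the 10 cM rule is applied here to the two members of a candidate pair only when they lie on the same linkage group. Primer compatibility and genotyping quality (criterion 2) still have to be tested empirically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str
    linkage_group: str
    position_cm: float
    min_bp: int  # smallest observed allele size
    max_bp: int  # largest observed allele size


def ranges_overlap(a, b):
    """True if the allele size ranges of two markers overlap."""
    return a.min_bp <= b.max_bp and b.min_bp <= a.max_bp


def far_enough(a, b, min_cm=10.0):
    """True if markers are on different LGs or at least min_cm apart."""
    if a.linkage_group != b.linkage_group:
        return True
    return abs(a.position_cm - b.position_cm) >= min_cm


def can_duplex(a, b, min_cm=10.0):
    """Screen a candidate pair on allele size range and map distance."""
    return not ranges_overlap(a, b) and far_enough(a, b, min_cm)


# Hypothetical markers for illustration.
m1 = Marker("SSR_A", "2b", 12.0, 150, 190)
m2 = Marker("SSR_B", "5b", 40.0, 210, 260)
m3 = Marker("SSR_C", "2b", 18.5, 230, 280)
print(can_duplex(m1, m2))  # different LGs, separate size ranges
print(can_duplex(m1, m3))  # same LG, only 6.5 cM apart
print(can_duplex(m2, m3))  # overlapping size ranges
```

A screen of this kind only narrows the list of candidate pairs; the final 24 duplexes in this study were assembled by testing combinations on eight individual DNA samples.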
An optimization procedure was carried out before the final PP combinations for duplex PCR were assembled. Eight switchgrass genotypes from Alamo (A) and Kanlow (K), i.e., A2, A4, A5, A10, K1, K3, K4 and K5, were used as amplification templates. The SSR PPs used here to optimize duplexes were PVCAG-2397/8 and 2517/8. The following adjustments of duplex PCR parameters were tested for their effect on amplification: increasing Taq polymerase (New England BioLabs Inc., USA, Catalog #M0273X) from 0.25 to 0.5 units, dNTPs from 0.2 to 0.4 mM, template DNAs from 15 to 30 ng, PCR buffer from 1× to 1.6×, Mg2+ concentration from 1.5 to 2.4 mM, IR-M13 forward primer (labeled with either 700 nm or 800 nm fluorescence) concentration from 0.02 to 0.04 μM, and PP quantity from 1.0 to 2.0 pmoles. Subsequently, the compatibilities of different SSR primer combinations were tested on the same eight genotypes.
Duplex PCRs were performed in 10 μl of reaction mixture containing 1× PCR buffer, 2.4 mM of Mg2+, 0.2 mM each of dNTPs, 0.125 to 4.0 pmoles of each primer (Table 2), 0.5 units of Taq polymerase (BioLabs®, USA), 0.02 μM IR-M13 forward primer, and 15 ng of genomic DNA. The cycling parameters were the same as for the monoplex PCR mentioned above. PCR products labeled with 700 and 800-nm dye were pooled and mixed thoroughly. After denaturation, they were separated using 6.5% KB plus polyacrylamide gels with a LI-COR 4300 DNA Analyzer (LI-COR Biosciences, Lincoln, NE, USA) [69].
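Because primer amounts are given in pmoles per reaction while the labeled M13 primer is given in μM, a short conversion helps when comparing them. The sketch below uses only quantities stated in this section (10 μl reaction volume, 10 ng/μl DNA working solution); the function names are hypothetical.

```python
def pmol_to_micromolar(pmol, volume_ul):
    """Convert an amount in pmol to a concentration in uM.

    1 pmol per ul equals 1 uM, so the conversion is a simple division.
    """
    return pmol / volume_ul


def template_volume_ul(mass_ng, stock_ng_per_ul):
    """Volume of DNA working solution needed to deliver mass_ng."""
    return mass_ng / stock_ng_per_ul


reaction_ul = 10.0

# Range of SSR primer amounts used in the duplex protocol.
print(pmol_to_micromolar(0.125, reaction_ul))  # lowest primer amount
print(pmol_to_micromolar(4.0, reaction_ul))    # highest primer amount

# 15 ng of template from the 10 ng/ul working solution.
print(template_volume_ul(15.0, 10.0))

# Fold increase in Mg2+ between the 1.5 mM and 2.4 mM conditions.
print(2.4 / 1.5)
```

The primer range of 0.125 to 4.0 pmoles thus corresponds to 0.0125 to 0.4 μM in the 10 μl reaction, and the Mg2+ increase from 1.5 to 2.4 mM is the 1.6-fold change referred to in the optimization results.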

Genotyping and data analysis
The gel bands were visually scored and band sizes were determined using Saga Generation 2 software, version 3.3 (LI-COR Biosciences, Lincoln, NE, USA). For Population 3, the scoring of PCR bands and identification of selfed progeny were the same as our former study [9]. For Populations 1 and 2, PCR bands were recorded as "ab" if two bands ("a" indicated upper band and "b" for lower band) or "aa" if only one band for the parent. In the progeny, if bands were from parents, they were