Identification, characterization, and gene expression analysis of nucleotide binding site (NB)-type resistance gene homologues in switchgrass

Background Switchgrass (Panicum virgatum L.) is a warm-season perennial grass that can be used as a second generation bioenergy crop. However, foliar fungal pathogens, like switchgrass rust, have the potential to significantly reduce switchgrass biomass yield. Despite its importance as a prominent bioenergy crop, a genome-wide comprehensive analysis of NB-LRR disease resistance genes has yet to be performed in switchgrass. Results In this study, we used a homology-based computational approach to identify 1011 potential NB-LRR resistance gene homologs (RGHs) in the switchgrass genome (v 1.1). In addition, we identified 40 RGHs that potentially contain unique domains including major sperm protein domain, jacalin-like binding domain, calmodulin-like binding, and thioredoxin. RNA-sequencing analysis of leaf tissue from ‘Alamo’, a rust-resistant switchgrass cultivar, and ‘Dacotah’, a rust-susceptible switchgrass cultivar, identified 2634 high quality variants in the RGHs between the two cultivars. RNA-sequencing data from field-grown cultivar ‘Summer’ plants indicated that the expression of some of these RGHs was developmentally regulated. Conclusions Our results provide useful insight into the molecular structure, distribution, and expression patterns of members of the NB-LRR gene family in switchgrass. These results also provide a foundation for future work aimed at elucidating the molecular mechanisms underlying disease resistance in this important bioenergy crop. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3201-5) contains supplementary material, which is available to authorized users.


Background
Switchgrass (Panicum virgatum L.) is a North American prairie grass that can be used as a second generation bioenergy feedstock. As a readily outcrossing species, native switchgrass germplasms have maintained a high level of genetic diversity over time [1,2]. Two ecotypes of switchgrass, lowland and upland, have emerged that are adapted to different growth habitats. Lowland ecotypes are generally tetraploid in nature and grow in warm, moist, southern climates whereas upland ecotypes of switchgrass can be tetraploid, hexaploid, or octoploid and are usually found growing in the northern part of the United States into southern Canada [3]. Lowland ecotypes also typically produce more biomass and are more tolerant to diseases than their upland counterparts; however, upland ecotypes are considered to be more tolerant to drought and cold stresses [3,4].
Currently, industrial scale breeding programs of switchgrass have focused on optimizing biomass yield and improving feedstock quality in order to produce more biofuel [5]. However, this practice is likely to reduce the genetic diversity in switchgrass that promotes disease resistance. Airborne foliar fungal pathogens, like switchgrass rust, have potential to cause nationwide epidemics on switchgrass and have been shown to cause significant biomass yield losses [6]. The causal agent of switchgrass rust, Puccinia emaculata Schw., is widespread and has been reported in Tennessee [7], Arkansas [8], Virginia [9], and Mississippi [10]. The majority of switchgrass cultivars, including lowland and upland ecotypes, have been shown to be moderate to highly susceptible to this rust pathogen [11,12].
In order to understand and improve disease resistance in switchgrass, the molecular mechanisms underlying tolerance to pathogen infection must be elucidated. A recent report found that many genes were differentially expressed during switchgrass rust infection and that some of these genes belonged to the NB-LRR gene family, which is the largest family of plant disease resistance (R) genes [13]. NB-LRR genes encode proteins that contain a C-terminal leucine rich repeat (LRR) domain, a highly conserved central nucleotide binding (NB) domain, and a variable N-terminal region [14]. The NB domain has been well characterized with regards to several preserved motifs including the Walker-A/P-loop, kinase 2, RNBS/kinase 3, and GLPL [15]. Recent studies have suggested that the LRRs within the C terminal region may contain either a 'LxxLxxLxxLxLxx' signature [16] or a 'LxxLxxL' signature [17]. Two major classes of NB-LRR genes have emerged based on conserved sequences within the N-terminal. These classes are characterized by the presence of either a coiled-coil (CC) motif or a domain homologous to the Drosophila Toll and mammalian Interleukin-1 Receptor (TIR) [15]. To date, no TIR-NB-LRR genes have been identified in cereal grass species [18].
Putative NB-containing R genes have been identified in numerous plant species by experimental methods, such as PCR cloning [19] which has been used to identify potential RGHs in species such as Arabidopsis [20], rice [18], and cotton [21]. Homology-based bioinformatics approaches have also been used to identify thousands of putative NB-containing R genes in plants, including several important crop species such as rice [22], potato [23], and soybean [24]. A recent study by Li et al. [25] used readily available genomes for four grass species to computationally identify 129, 245, 239, and 508 NB-LRR R genes in maize, sorghum, Brachypodia, and rice, respectively. Alternatively, advances in next generation sequencing technologies have allowed for preferential high throughput sequencing of R genes in a process known as Resistance gene enrichment Sequencing (Ren-Seq) [26]. This method has been successfully used in potato and tomato to identify NB-LRR resistance genes [26].
Depending on the size of the plant genome, it is estimated that NB-LRR genes comprise on average between 0.5 and 1.8 % of the protein coding genes [15,27]. A few plant species, such as cucumber [28] and papaya [29], have considerably less than 0.5 % NB-LRR genes.
Additionally, NB-LRR genes in grass species have been calculated to make up anywhere from 0.4 to 1.35 % the total number of predicted proteins [25]. Despite recent advances in determining potential R genes, few studies have been performed in switchgrass. One recent report by Zhu et al. [30] identified RGHs in switchgrass using a combination of PCR-cloning and EST database mining. The results of this study identified approximately 380 RGHs. Since there are an estimated 98,007 protein-coding loci in the switchgrass genome [31], the number of RGHs identified by Zhu et al. [30] is roughly 0.39 % of the total number of protein-coding transcripts in switchgrass. The recent release of the draft switchgrass genome (v 1.1) provides an excellent opportunity to comprehensively identify NB-LRR RGHs in this important bioenergy plant. Using a homology-based computational method, 1011 putative NB-containing RGHs were identified in the switchgrass genome (v 1.1). We also identified several switchgrass RGHs that contained unique domains, including a jacalinlike lectin binding domain, a calmodulin-like binding domain, and a major sperm protein domain. Additionally, RNA-sequencing (RNA-seq) data from a rust-resistant cultivar, ' Alamo' [9], and a rust-susceptible cultivar, 'Dacotah' [9], was used to identify variants within the RGHs and to examine basal expression differences of these RGHs in uninfected leaf tissue. An analysis of RNA-seq datasets from the flag leaves of field-grown cultivar 'Summer' plants indicated that the expression levels of some RGHs were developmentally regulated. The results of this study significantly improve our understanding of disease resistance genes in this important biofuel crop species.

Structural analysis of the newly identified RGHs
Coils (v 2.2) [34], a program embedded in the InterProScan software (v 5.11) [35], was employed to search the putative switchgrass RGHs for the presence of CC domains. SMART (Simple Modular Architecture Research Tool, http://smart.embl.de/), a widely used online resource that identifies and annotates protein domains and protein domain architectures [36], was used to search the RGHs for TIR domains. The SMART and Pfam databases were also used to detect other highly conserved unique domains that may be present in the RGHs. SignalP (v 4.0) [37], as part of InterProScan, was used to analyze the N-terminal region of the putative switchgrass RGHs for the presence of signal peptides. TargetP 1.1 (http://www.cbs.dtu.dk/services/Targ etP/) was then employed to analyze the subcellular localization of the switchgrass RGHs that contained a signal peptide [38]. Finally, NLStradamus [39] was used to determine if any of the RGHs may contain a putative nuclear localization signal.

Phylogenetic analysis of switchgrass RGHs containing fulllength NB domains
The amino acid sequences of the switchgrass RGHs were manually searched for the presence of 4 highly conserved motifs within the NB domain: the P-loop/WalkerA, the Kinase 2, the RNBS, and the GLPL [15]. For those RGHs in which all four motifs could be easily identified, the amino acid sequences were extracted and compiled together. Clustal Omega [40] was used to align the sequences and Skylign [41] was used to determine the consensus sequences for the four conserved motifs mentioned above. Sequences that closely followed the consensus sequence for each motif were kept for further analysis. To analyze the genetic diversity present in the switchgrass RGHs, the amino acid sequences of the full-length NB domains, starting from the beginning of the P-loop and ending at the GLPL motif, were compared to 116 full-length NB domains from Brachypodium distachyon [42]. All of the sequences were aligned using ClustalW, as part of the MEGA software (v 6) program, and the default parameters [43]. After alignment, a Maximum-likelihood phylogenetic tree was constructed in MEGA using all sites and the remaining default parameters and a bootstrap value of 100.

Plant materials
The seeds for the ' Alamo' and 'Dacotah' plants used in this study were obtained from the USDA Plant Genetic Resources Conservation Unit (Griffin, Georgia). The plants were maintained in the greenhouse at Virginia Tech. The source of the field grown switchgrass plants were maintained at the experimental farms of the University of Nebraska, near Mead, NE [44].

Plant growth conditions and tissue collection
One rust-resistant ' Alamo' plant and one rust-susceptible 'Dacotah' plant were used for this study. The plants were grown in the greenhouse facility at Virginia Tech with a 12-14 h photoperiod and day/night temperatures of 28 and 22°C, respectively. After 3 months of growth, each plant was clonally propagated into four biological replicates by planting single tillers in Miracle-Gro Potting Mix (Miracle-Gro Lawn Products, Inc., Marysville, OH, USA) in 1.1 × 10 -2 m3 pots. The plants were watered twice a week and maintained in the greenhouse. After 2 months of growth, the first fully expanded leaf of E2-E3 stage tillers was collected for each biological replicate, frozen in liquid nitrogen, and stored at −80°C until RNA isolation.

RNA isolation and sequencing
Total RNA was isolated from switchgrass leaf samples using a modified TRIzol combined with columns method. Briefly, frozen leaf samples were ground to a fine powder in liquid nitrogen using a mortar and pestle. TRIzol reagent was added to the frozen tissue and total RNAs were extracted following the manufacturer's protocol. After precipitation of the RNA with 100 % isopropanol, the mixture was applied to an RNeasy spin column that is part of the Qiagen RNeasy Plant Mini kit (Qiagen, Valencia, CA, USA). The remaining RNA isolation steps, including DNase treatment, were performed according to the manufacturer's protocol.
After RNA isolation, the quality and quantity of the RNA was assessed using a Nanodrop-D1000 (Nanodrop, Wilmington, DE, USA) and a bioanalyzer (Virginia Bioinformatics Institute, Blacksburg, VA, USA). RNA for one sample of ' Alamo' and one sample of 'Dacotah' was sent to the Genomics Resources Core Facility at Weill Cornell Medical College (Cornell Medical School, New York, NY, USA) for 101 bp paired-end RNA-sequencing. The remaining six RNA samples (3 biological replicates per cultivar) were sent to the Genomics Facility of Michigan State University (Lansing, MI, USA) for 50 bp single-end RNA-sequencing.

RNA-seq analysis and variant detection between 'Alamo' and 'Dacotah'
The 101 bp paired-end RNA-seq datasets for ' Alamo' and 'Dacotah' were imported and analyzed using CLC Genomics Workbench (v 7.5). Raw reads that passed default quality scores and were greater than 50 bp in length were mapped to the switchgrass (v 1.1) reference genome ( Table 2) using the RNA-seq analysis tool with the following parameters: mapping was also performed to intergenic regions, only 1 hit was allowed per read, a similarity fraction of 0.9, a length fraction of 1.0, a mismatch cost of 2, an insertion cost of 3, and a deletion cost of 3.
After read-mapping, the reads that mapped were locally realigned and variants were called using the Basic Variant Detection tool with the following parameters: broken pairs were not ignored, non-specific matches were ignored, minimum read coverage of 3, minimum variant count of 2, and a minimum variant frequency (%) of 25.0. Following variant detection, the variant tracks for both ' Alamo' and 'Dacotah' were filtered against the RNA-seq mapped reads of the other cultivar, respectively, to identify variants uniquely different between them. High-quality variants, including single nucleotide polymorphisms (SNPs), multiple nucleotide polymorphisms (MNPs), insertions/deletions, and replacements were selected for based on the following criteria: zygosity = homozygous, Frequency = 100 %, coverage ≥3, control count = 0, control coverage ≥3, and control frequency = 0 %. The resulting lists from both variant detections were compiled together for a comprehensive list of high quality variants between the two cultivars (Additional file 1: Table S1).
Gene expression analysis of switchgrass RGHs in 'Alamo', 'Dacotah', and 'Summer' In order to determine if any of the RGHs identified in this study are expressed basally in switchgrass leaves, the 50 bp single-end RNA-Seq datasets for the three biological replicates of ' Alamo' and 'Dacotah' were analyzed using CLC Genomics Workbench. Reads with a quality score (Q-score) >20 were mapped to the switchgrass reference genome (v 1.1) ( Table 2). After read-mapping, the gene expression output tracks for all replicates of ' Alamo' and 'Dacotah' were compared using readily available tools within the program. A gene was considered expressed if five or more total reads mapped to it. Differential expression analysis between the two cultivars was carried out using an Empirical analysis of Differential Gene Expression (EDGE) test that is available as part of the CLC Genomics Workbench software. The results were filtered based on a False Discovery Rate (FDR) p-value of < 0.05 and a corrected fold change >2.
We utilized the raw RNA-sequencing data obtained from flag leaves of field grown cultivar 'Summer' plants [44] to analyze developmental gene expression of the 1011 switchgrass RGHs over the course of a growing season. Only RGHs that exhibited detectable expression levels in all three biological replicates, and also in at least one of the five sampled time points, were kept for further analysis. Weighted Gene Co-expression Network Analysis (WGCNA, version 1.43) in R was then used to identify RGHs that demonstrated similar expression patterns [45][46][47]. Signed co-expression networks were identified using WGCNA with a soft threshold value of 16 and a minimum module size of 30. The module eigengene (ME), which corresponds to the first principal component of a specific module, represents the expression pattern of each co-expression module.

Isolation of DNA from 'Alamo' and 'Dacotah'
DNA was isolated from young leaf tissue using a modified CTAB method [48]. The DNA was re-suspended in 1x TE buffer (10 mM Tris pH 8.0, 10 mM EDTA pH 8.0) and the quantity and quality was measured using a Nanodrop ND-1000 (Wilmington, DE, USA).

Validation of SNPs using allele-specific PCR primers and DNA sequencing
Allele-specific SNP PCR primers for both ' Alamo' and 'Dacotah' were designed as previously described [49]. Both the forward and reverse primers for each locus accounted for a SNP at the 3′ end. The 3 rd base pair from the 3′ end of each primer was changed according to the recommendations from Hirotsu et al. [49] in order to maximize allele-specificity. Four sets of forward and reverse primers that detected 16 SNPs were designed and used for PCR amplification in a total reaction volume of 20 μL (Additional file 2: Table S2). After the reactions were completed, the products were separated by gel electrophoresis on a 1.0 % agarose gel and visualized using a GelDoc system (Bio-Rad, Hercules, CA, USA).
Conserved PCR primers were also designed to amplify DNA fragments containing SNPs between ' Alamo' and 'Dacotah' (Additional file 3: Table S3). These SNPs were validated using traditional DNA sequencing. PCR reactions were performed in a total volume of 30 μL and contained the following components: 15 μL of high fidelity iProof (Bio-Rad, Hercules, CA, USA), 11 uL of ddH 2 O, 2 μL of 200 ng/μL DNA, 1 μL of 10 μM forward primer, and 1 μL of 10 μM reverse primer. The conditions for the PCR reactions were: 98°C for 3 min, followed by 30 cycles of denaturation at 98°C for 30 s, annealing at 60°C for 45 s, extension at 72°C for 1 min and 30 s, and a final extension at 72°C for 7 min. Once the PCR reactions finished, 2 μL of each reaction was run on a 0.8 % agarose gel in order to verify amplification. Next, the remaining 28 μL of each reaction was purified. Finally, all purified PCR products were sent for DNA sequencing.

Availability of data and materials
The data and datasets supporting the conclusions of this article are included within the article and its supplemental files. In addition, the RNA-seq datasets used in this article have been deposited in the GenBank database and can be found under the following accession numbers: SRR3473343, SRR3473344, SRR3467193, SRR3467194, SRR3467195, SRR3467196, SRR3467197 and SRR3467198. The phylogenetic tree generated in this study has been deposited in TreeBASE (Submission 19789) and can be accessed at the following URL: http:// purl.org/phylo/treebase/phylows/study/TB2:S19789.

Identification of 1011 putative NB-containing RGHs in switchgrass
In this study, a total of 1542 switchgrass proteins were identified that contained R gene-like NB-ARC domains. After selecting for protein sequences with an NB-ARC e-value of <1E-10 and removing alternatively spliced transcripts, a final number of 1011 unique switchgrass proteins were collected that contained R gene-like NB-ARC domains ( Fig. 1, Additional file 1: Table S1). This accounts for approximately 1.03 % of the total protein-coding genes in switchgrass. From here on, these will be referred to as putative switchgrass RGHs. The putative RGH proteins varied in length from 98 amino acids (Pavir.J31808) to 1649 amino acids (Pavir.Ba02898). Of the 1011 RGH proteins, 695 are considered to be complete protein sequences in the switchgrass genome (v 1.1), meaning they are predicted to contain a start methionine (ATG) and a stop codon. The remaining 316 RGH proteins lacked one or both of these features. All of the 1011 putative switchgrass RGHs were manually annotated to validate the presence of the NB-ARC domain (Additional file 4: File S1).
Since NB-LRR disease resistance genes tend to cluster together in plant genomes [15], the availability of a draft reference genome for switchgrass provided an opportunity to analyze the chromosomal distribution of the 1011 RGHs. The current version of the switchgrass genome (v 1.1) is comprised of 18 main pseudomolecules that represent the A and B subgenomes [31]. An additional several hundred thousand sequences are located on unanchored contigs [31]. Of the 1011 RGHs identified in this study, 511 were assigned to one of the major pseudomolecules while the remaining 500 were dispersed among unanchored contigs (Fig. 2). Switchgrass chromosome 8 was found to contain the most RGHs with 92 genes located on Chr08a and 93 genes located on Chr08b. In contrast, chromosomes 4 and 7 contained the least total number of RGHs with 20 and 21, respectively (Fig. 2).

Structural analysis of the 1011 newly identified RGHs in switchgrass
The two major classes of NB-LRR disease resistance genes contain either a coiled-coil (CC) motif or a domain homologous to the intracellular signaling domain of Drosophila Toll and mammalian Interleukin-1 Receptors (TIR) in their N-terminals [14]. A total of 405 genes were discovered that contain putative CC domains (Fig. 1, Additional file 2: Table  S2). In contrast to the CC domains, no TIR domains were detected in any of the 1011 RGHs.
Using SignalP, 19 of the 1011 switchgrass RGHs were identified to contain an N-terminal signal peptide (Additional file 3: Table S3). None of these signal peptides were predicted to span a cellular membrane. Based on the amino acid sequence of the signal peptide, TargetP (v 1.1) was used to predict the subcellular localization of these proteins (Additional file 3: Table S3). Of the 19 proteins, 13 were predicted to enter the secretory pathway and 4 were predicted to go to the mitochondria. The remaining proteins, Pavir.Ea02039.1 and Pavir.Ba00602.1, were predicted to go to other subcellular locations that were not the mitochondria or the chloroplast. Several NB-LRR disease resistance proteins, such as RRS1 [50], have been shown to localize in the plant nucleus upon pathogen detection. NLStradamus was employed to search the 1011 RGHs for the presence of putative nuclear localization signals (NLSs). A total of 104 of the 1011 RGHs were identified to contain a putative NLS (Additional file 5: Table S4). Of these proteins, the NLS was predicted to be in the N-terminal region in 72 of the RGHs, while 32 of the proteins were predicted to contain an NLS in their C-terminal region.
The majority of NB-containing resistance genes contain highly variable leucine rich repeats (LRRs) in their C-terminals. LRRs are believed to function in protein-protein interactions, as well as in ligand binding [14]. The 1011 switchgrass RGH proteins were manually screened to identify a consensus sequence (LxxLxxLxx) that was predominant among the RGH proteins. A Perl  Table S5). The number of 'LxxLxxLxx' signatures identified ranged from 1 to 19 per protein with the majority of these proteins containing between 5 and 9 (445 or 65.2 %). Therefore, the majority of the switchgrass RGHs do in fact have the typical NB-LRR gene structure.

Identification of unique domains in the 1011 switchgrass RGHs
Previous reports have identified unique domains in the Nand C-terminals of NB-LRR resistance genes and have suggested that these domains may play a role in R gene function [51]. As summarized in Table 1, 40 switchgrass NB-LRR proteins were also predicted to contain other known functional domains. These unique domains can be classified into 7 different functional categories: protein modification, DNA binding/transcription, protein trafficking and vesicle movement, protein-protein interaction, sugar binding, signal transduction, and transposable element. The majority of switchgrass RGHs with unique domains (24 out of 40, or 60 %) fell into the protein modification category. Of these 24 proteins, two RGHs contained a thioredoxin domain (Pavir.Fa01782 and Pavir.Hb00484), one RGH contained a glutaredoxin domain (Pavir.Bb01048), and one RGH contained a phosphatase domain (Pavir.J24356). A putative C-terminal NLS was also detected for Pavir.Fa01782. Interestingly, the remaining 20 RGHs were found to contain a protein kinase domain (Table 1). Of these 20 proteins, ten are located on chromosome 8 (5 on Chr08a and the other 5 on Chr08b). Two of these protein kinase-containing RGHs, Pavir.Ha00561 and Pavir.Ha01108, were also found to contain a NLS in their C-terminal and N-terminal, respectively.
The next largest category included RGHs with domains that function in DNA binding and transcription. Two switchgrass RGHs, Pavir.Ba02315 and Pavir.J20878, were found to contain an N-terminal B3 DNA binding domain. Interestingly, Pavir.J20878 is also predicted to contain a C-terminal WRKY domain. Five other RGHs including Pavir.Fa02339, Pavir.Ga00028, Pavir.Gb00931, Pavir.J19380, and Pavir.J40131 were also found to have an N-terminal zinc finger-BED DNA binding domain (Table 1).
Finally, the remaining eight switchgrass RGHs fell into the last 5 categories. Three switchgrass RGHs (Pavir.Ib02384, Pavir.Ib02433, and Pavir.J18369) were predicted to contain an N-terminal domain homologous to major sperm protein (PF00635) ( Table 1). These proteins are classified in the protein trafficking and vesicle movement category. Two switchgrass RGHs were predicted to contain domains that may play a role in protein-protein interactions (Table 1). These include Pavir.J03445, which has a C-terminal WD40 domain, and Pavir.Fb01504, which contained a C-terminal hAT family dimerization region that is common to transposable elements. Pavir.J03445 was also predicted to contain an N-terminal nuclear localization signal. Another element that is found in some transposons, a gag-polypeptide of LTR copia-type domain, was predicted in the C-terminal of Pavir.Aa01444. The final two RGHs with unique domains, Pavir.Hb01174, which contains a

Phylogenetic analysis of switchgrass sequences containing a full-length NB domain
Of the 1011 switchgrass RGHs, 600 were found to contain a full-length NB domain in which four highly conserved motifs could be found: the Walker-A/P-loop, the Kinase 2, the Kinase 3/RNBS-B, and the GLPL. These 600 sequences were then aligned to determine the consensus sequences for these four motifs (Fig. 3).  [52] and Brachypodium [42]. After generating the consensus sequences for the four highly conserved motifs, only 578 switchgrass RGHs were found to closely match these sequences. The amino acid sequences for these RGHs, along with those of 116 Brachypodium NB-LRR genes [42], were extracted and then subjected to phylogenetic tree analysis. The un-rooted phylogenetic tree can be divided into 35 different groups, with the majority of groups containing both switchgrass and Brachypodium NB-LRR genes (Fig. 4, Additional file 7: Figure S1). Switchgrass and Brachypodium diverged about 50 million years ago [53]. However, most NB-LRR genes are still conserved in the two plant species, since no clear separation of switchgrass or Brachypodium genes clusters were observed (Fig. 4, Additional file 7: Figure S1). This is also supported by strong bootstrap values (>50) of many clades that included sequences from both species.
Switchgrass NB-LRR genes are enriched on chromosome 8 (Fig. 2). Therefore, we examined if chromosome 8 NB-LRR genes (highlighted in red) tended to cluster together in the phylogenetic tree (Fig. 4). As shown in Fig. 4, four small clusters of chromosome 8 NB-LRR genes are evident. These clusters were further analyzed to determine if the ten chromosome 8 RGHs that were predicted to contain a protein kinase domain were phylogenetically similar. Seven of the 10 protein kinase-containing RGHs (highlighted in green) were located in the same cluster, indicating that these genes are highly homologous. These genes could have rapidly evolved from the same ancestor and duplicated on chromosome 8. Interestingly, no Brachypodium genes were located in this cluster. The remaining three protein kinase-containing RGHs were dispersed across the phylogenetic tree with two of the genes clustering together in the same group as the bigger Fig. 3 Skylign output of the NB region for the 600 switchgrass sequences that contained a full-length NB domain. Four highly conserved motifs were identified in these sequences: the Walker-A/P-loop motif, the Kinase 2 motif, the Kinase 3/RNBS-B motif, and the GLPL motif. The height of each letter within a stack corresponds to the frequency and conservation of that letter at that position cluster. A similar trend of genes on the same chromosome clustering together was also observed for the Brachypodium genes (Fig. 4, Additional file 7: Figure S1).

Identification of variants between RGHs in 'Alamo', a rustresistant switchgrass cultivar, and 'Dacotah', a rustsusceptible switchgrass cultivar
The RNA-seq mapping statistics obtained from the analysis of these datasets is shown in Table 2 Table 2). The slightly higher number of ' Alamo' reads mapping to the reference genome could be attributed to the fact that the switchgrass reference genome is a plant from the cultivar ' Alamo' (AP13).
After alignment to the switchgrass reference genome (v 1.1), the RNA-seq reads for both ' Alamo' and 'Dacotah' were analyzed in order to identify variants, including SNPs, MNPs, insertions/deletions (indels), and replacements in the RGHs. A total of 23,156 variants were found in 781 RGHs between ' Alamo' and 'Dacotah'. After filtering for high quality homozygous variants, 2634 variants were found in 344 RGHs. These variants include 2347 SNPs, 136 MNPs, 145 indels, and 6 replacements (Additional file 8: Table S6).
Allele-specific PCR is a high throughput and costeffective way of distinguishing SNPs at a particular locus without the need for DNA sequencing [49]. A total of 8 primer pairs, 4 for ' Alamo' and 4 for 'Dacotah' (Additional file 9: Table S7), were designed to validate 16 SNPs identified by RNA-seq analysis. The results showed that the primers designed to detect the 'Dacotah' alleles were more specific to 'Dacotah' DNA than the primers designed to detect the ' Alamo' alleles (Fig. 5). The presence of unwanted PCR products using these allele-specific primers suggests that the primers may be amplifying unwanted PCR products. Alternatively, the other allele may actually be present in each DNA sample but is either not expressed or expressed at a level not captured by the sequencing depth of this experiment and thus, was not detected in our RNAsequencing data.
Traditional PCR amplification and DNA sequencing with conserved primers was also used to validate SNPs identified from RNA-seq analysis. A total of 8 different primer sets were designed to detect 12 SNPs within 8 putative RGHs (Additional file 10: Table S8). Using this method, DNA sequencing confirmed the presence of 11 out of the 12 SNPs (Additional file 11: Figure S2). For the one SNP in Pavir.Ib01513 that could not be validated, the PCR products for the ' Alamo' and 'Dacotah' sequences differed greatly from the expected DNA sequence, indicating that this primer set may have amplified unwanted targets.  Analysis of gene expression of the 1011 switchgrass RGHs between 'Alamo' and 'Dacotah' It has been suggested that NB-LRR disease resistance genes are basally expressed in plant tissues [54]. Upon pathogen detection, the expression of these genes is up-regulated in order to initiate defense responses [55]. RNA-sequencing reads from three biological replicates of ' Alamo' and 'Dacotah' ( Table 2) were used to determine if any of the 1011 switchgrass RGHs were basally expressed in un-inoculated leaf tissue. Of the 1011 RGHs, 338 were expressed in both cultivars (Additional file 12: Table S9). For the individual cultivars, 117 RGHs were found to be expressed only in ' Alamo' and 134 RGHs were found to be expressed only in 'Dacotah' (Additional file 13: Table S10). Differential expression of NB-LRR resistance genes has been shown to play a role in plant disease response. The results of gene expression analysis were filtered such that the 338 genes expressed in both cultivars were considered significantly differentially expressed if their RPKM values had a fold change greater than 2 and an FDR p-value less than 0.05. Using these criteria, 21 genes were found to be significantly differentially expressed between the two cultivars (Table 3). Overall, the results of this analysis suggest that these two different cultivars basally express RGHs.

Expression of specific switchgrass NB-LRR genes are regulated by leaf developmental stages
Recent studies have suggested that NB-LRR resistance genes are also differentially regulated during various developmental stages [56,57]. In order to determine if this occurs in switchgrass, RNA-seq data from the flag leaves of field grown cultivar 'Summer' switchgrass plants, collected over five time points from July to September of 2012 [44], were obtained and analyzed for NB-LRR gene expression. A total of 755 of the 1011 RGHs were found to be expressed in all 3 biological replicates from at least 1 time point. These 755 RGHs were selected for Weighted Gene Correlation Network Analysis (WGCNA) which classified these genes into eight co-expression modules (Fig. 6). Modules 1, 2, and 7 ( Fig. 6a, b, and g) had clear harvest/developmental stage specific expression at anthesis (7/27), seed set (8/16), and at both anthesis and seed set, respectively. Modules 4 and 6 were comprised of RGHs that were expressed at high levels at heading (7/03; Fig. 6d and f, respectively); however, module 4 showed additional Fold Change = expression level difference between the two samples when using 'Dacotah' as the reference sample 3 FDR corrected p-value = p-value that the Fold Change is significant based on the corrected False Discovery Rate expression at seed maturation (8/31) whereas module 6 showed additional expression at seed set (8/16). Coexpressed RGHs found in module 3 showed primary expression at seed maturation (8/31; Fig. 6c) and additional minor expression at heading (7/03). Finally, RGHs in modules 5 and 8 were expressed mainly at the onset of senescence (9/19; Fig. 6e and h, respectively). In general, the RGHs belonging to module 8 were expressed at a higher level than those belonging to module 5, although some variation was observed among the biological replicates.

Discussion
The identification and validation of plant disease resistance genes is a major focus in the molecular investigations of plant-pathogen interactions. While other studies have aimed to understand the molecular mechanisms controlling switchgrass resistance to switchgrass rust [13,30], none of these studies have mined the currently available draft switchgrass genome (v 1.1) for potential NB-LRR resistance gene homologs. In this research, a homology-based computational approach was used to   Approximately 266 % more RGHs were identified than in a similar study that detected switchgrass RGHs from EST sequences and PCR-based cloning [30]. However, the total number of RGHs in switchgrass may change with further refinement of the switchgrass genome; although, the percentage of RGHs identified in this study is similar to rice [58]. Structural analysis of the 1011 switchgrass RGHs provided useful insights into their putative molecular functions. We identified several of the major features expected in plant NB-LRRs [20,54] in switchgrass. Additionally, the large number of putative RGHs indicates that there is a substantial repertoire of disease resistance genes in the switchgrass genome and that a robust immunity potential is present in the event of pathogen invasion. However, further studies are needed to validate the sub-cellular localization and functions of these proteins.
The NB domain of plant R protein has been shown to act like a molecular switch and function in signal transduction pathways. In analyzing the 1011 switchgrass RGHs, 600 were found to contain a full-length NB domain while the remaining sequences were missing one or more highly conserved motifs (P-loop/WalkerA, Kinase 2, RNBS, and GLPL; [15]. This could be explained by the assembly of the switchgrass genome (v 1.1), which consists of 18 pseudomolecules and several thousand additional contigs ranging in size from 1000 to 88,021 bp [31]. Incomplete NB-LRR gene sequences in switchgrass could be a result of incomplete duplications or transversions, incomplete assembly or annotations, or may actually be pseudogenes. Some NB-LRR pseudogenes have been shown to code for non-functional or truncated protein products [59]. Interestingly, the evolution of a pseudogene at the Pid3 gene locus has been found to promote disease in susceptible rice cultivars [60].
Sequence duplication and divergence is also prominent in NB-LRR genes. Phylogenetic analysis of the switchgrass and Brachypodium NB-LRRs found that the majority of the Brachypodium NB-LRRs have been conserved in switchgrass. Several switchgrass RGHs, including ten that were identified to contain a protein kinase domain, however, appear to have emerged after the two species diverged.
Unique domains other than the NB-LRRs identified in the switchgrass RGHs have also been identified in other species, and could play roles similar to ones reported recently in Arabidopsis. In Arabidopsis the RRS1-R gene encodes NB-LRR protein that contains a WRKY domain that acts as a decoy to intercept effector molecules secreted by Pseudomonas syringae pv. pisi and Ralstonia solanacearum [61].
Two switchgrass RGHs, Pavir.Fa01782 and Pavir.Hb00484, were found to have a C-terminal domain homologous to thioredoxin proteins. One switchgrass RGH, Pavir.Bb01048, was predicted to have a C-terminal glutaredoxin domain. Thioredoxin and glutaredoxin proteins participate in oxidation/reduction reactions and have been associated with increased disease resistance in tobacco [62,63] and increased disease susceptibility in Arabidopsis [64]. Therefore, the NB-LRR containing a thioredoxin domain may function in disease resistance by reducing pathogen-induced oxidative stresses. Since glutaredoxin has been shown to promote disease resistance [64], the presence of a glutaredoxin domain in a NB-LRR disease resistance gene may function as a decoy similar to the data described by Sarris et al. [61].
A total of 5 switchgrass RGHs were predicted to contain unique domains that are involved in DNA binding. One switchgrass gene, Pavir.J03445, was found to contain a WD40 domain in the C-terminal. Plant genes that contain WD40 domains have been shown to be differentially regulated during pathogen infection, suggesting that these genes may be important regulators of defense-related responses [65,66]. Another switchgrass gene, Pavir.J20878, which contains an N-terminal B3 DNA-binding domain, was also found to contain a C-terminal WRKY DNA-binding domain. This further supports a role for Pavir.J20878 in DNA binding and transcription regulation.
Aside from DNA-binding, several other smaller categories were identified. One switchgrass RGH, Pavir.Hb01174, is predicted to function in sugar binding. This particular RGH contains a C-terminal jacalin-like lectin binding domain. Jacalin-like lectin domains bind carbohydrates, mainly mannose and galactose, and have been shown to play an important role in disease resistance [67]. For example, the RTM1 gene of Arabidopsis encodes a protein that contains a jacalin-like lectin domain and this protein is critical for inhibiting long-distance movement of the tobacco etch virus [68]. Additionally, three switchgrass RGHs, Pavir.Ib02384, Pavir.Ib02433, and Pavir.J18369, are predicted to function in protein trafficking and vesicle movement, a function that has not yet been demonstrated by plant disease resistance genes. These proteins contain a domain in their Nterminals that shows strong homology to a major sperm protein of nematodes. A previous report has identified a similar domain in the N-terminal region of the VAP27 protein of tomato, which has been shown to interact with the Cf9 resistance protein; however, no direct role for VAP27 in disease response has been established [69]. To our knowledge, this is the first report of a major sperm protein domain attached to the N-terminal of a NBcontaining resistance gene.
The switchgrass cultivars ' Alamo' and 'Dacotah' exhibit significantly different disease responses after exposure to switchgrass rust (Puccinia emaculata). ' Alamo' is relatively resistant to the rust pathogen whereas 'Dacotah' is highly susceptible [9]. Polymorphisms within RGHs may contribute to the disease resistance phenotype that is observed between ' Alamo' and 'Dacotah'. In our study, we identified 2634 variants between ' Alamo' and 'Dacotah' , including SNPs, MNPs, indels, and replacements. Approximately 89 % of the variants detected were SNPs. The predominance of SNPs could be attributed to the fact that single nucleotide changes in the coding regions of genes are less likely to disrupt the reading frame, which often times results in nonsense mutations. SNPs could also explain the disease response phenotypes of ' Alamo' and 'Dacotah' , as they could alter the amino acid sequence of resistance genes and potentially disrupt gene function. Since these SNPs are associated with defenserelated genes, they could be further developed into molecular markers for use in breeding of disease resistance.
In addition to polymorphisms within the RGHs, differential expression of NB-LRR disease resistance genes may contribute to the different phenotypes observed between the two cultivars. It is believed that R genes are expressed at relatively low levels in unchallenged plant cells in anticipation of pathogen attack [14]. Indeed, we found that 338 RGHs displayed expression evidence in both cultivars in an unchallenged state, supporting the idea that R genes are basally expressed in healthy plant cells [14]. There could be several reasons for the genes that we could not find expression evidence for. First, these genes could be expressed at extremely low levels in healthy plant cells and thus, they escaped detection at the sequencing coverage used in this study. Second, the expression of these genes could be induced upon pathogen detection and they are not basally expressed in healthy plant cells. Finally, some of these genes may be pseudogenes and may not be expressed under any conditions. Further studies are needed to evaluate the expression patterns of the switchgrass RGHs and to determine the exact role, if any, that these genes play in switchgrass disease response.
Developmental regulation of specific RGHs could also contribute to disease resistance phenotypes at different stages of plant growth. RNA-sequencing from the flag leaves of field grown cultivar 'Summer' provided a unique opportunity to examine switchgrass RGH expression over the course of a growing season [44]. The first group of developmentally regulated RGHs (module 1, Fig. 6a) is of particular interest since these genes are up-regulated at the end of July. The end of July and the beginning of August are optimal times for switchgrass rust infection and thus, these genes may play an important role in immediate defense responses against foliar pathogens like switchgrass rust. Correspondingly, the transcripts for these RGHs decreased over the remaining harvests, supporting the idea that these genes are involved in the earlier stages of disease response. Fieldgrown switchgrass plants appear to be more susceptible to switchgrass rust as they begin to flower and set seed (data not published). As displayed in modules 5 and 8, 119 of the 755 RGHs (16 % of the genes) exhibited peak gene expression in at least one biological replicate during the last sampling point (9/19). The remaining 84 % of the genes displayed peak gene expression over the first four sampled time points. Since fewer RGHs showed preferential expression during the later stages of the growing season, these results support the likelihood that switchgrass plants may utilize resources towards other processes, such as flowering and nutrient remobilization, rather than disease resistance in the later stages of development.

Conclusion
The results of this study provide useful insight into the molecular structure, distribution, and expression patterns of members of the NB-LRR gene family in switchgrass, an important biofuel crop. Switchgrass molecular breeding is relatively recent and has been hindered by a lack of informative genomic resources. The 2347 SNPs that we identified by RNA-sequencing can potentially be developed into molecular markers, which may assist switchgrass breeders in creating new cultivars with improved disease resistance. In addition, we also identified 40 RGHs that are predicted to contain unique domains including major sperm protein domain, jacalin-like binding domain, calmodulin-like binding, and thioredoxin. NB-LRR proteins that contain unique domains may have arisen in switchgrass as a method of molecular adaptation to plant pathogens. Thus, this research can be used for homology-based methods that could identify NB-LRR disease resistance genes with these unique domains in other plant species Taken together, the results of this study provide novel findings that will aid in the identification of NB-LRR disease resistance genes and breeding for durable disease resistance in the biomass crops.

Additional files
Additional file 1: Table S1. List of 1011 unique switchgrass proteins that are predicted to contain an NB-ARC domain (1E < -10). Table S1 lists the protein gene IDs for the 1011 putative switchgrass RGHs, the length of the predicted proteins according to Phytozome, and the amino acid sequences. (XLSX 433 kb) Additional file 2: Table S2. List of switchgrass RGHs that were predicted to contain one or more coiled-coil (CC) domains. Table S2 contains a list of switchgrass RGHs that were predicted to contain one