Evolution of the Rdr1 TNL-cluster in roses and other Rosaceous species
© Terefe-Ayana et al.; licensee BioMed Central Ltd. 2012
Received: 18 April 2012
Accepted: 6 August 2012
Published: 20 August 2012
The resistance of plants to pathogens relies on two lines of defense: a basal defense response and a pathogen-specific system, in which resistance (R) genes induce defense reactions after detection of pathogen-associated molecular patterns (PAMPs). In the specific system, a so-called arms race has developed in which the emergence of new races of a pathogen leads to the diversification of plant resistance genes to counteract the pathogens’ effect. The mechanism of resistance gene diversification has been well elucidated for short-lived annual species, but data are mostly lacking for long-lived perennial and clonally propagated plants, such as roses. We analyzed the rose black spot resistance gene, Rdr1, in five members of the Rosaceae: Rosa multiflora, Rosa rugosa, Fragaria vesca (strawberry), Malus x domestica (apple) and Prunus persica (peach), and we present the deduced possible mechanism of R-gene diversification.
We sequenced a 340.4-kb region from R. rugosa orthologous to the Rdr1 locus in R. multiflora. Apart from some deletions and rearrangements, the two loci display a high degree of synteny. Additionally, less pronounced synteny is found with an orthologous locus in strawberry but is absent in peach and apple, where genes from the Rdr1 locus are distributed on two different chromosomes. An analysis of 20 TIR-NBS-LRR (TNL) genes obtained from R. rugosa and R. multiflora revealed illegitimate recombination, gene conversion, unequal crossing over, indels, point mutations and transposable elements as mechanisms of diversification.
A phylogenetic analysis of 53 complete TNL genes from the five Rosaceae species revealed that with the exception of some genes from apple and peach, most of the genes occur in species-specific clusters, indicating that recent TNL gene diversification began prior to the split of Rosa from Fragaria in the Rosoideae and peach from apple in the Spiraeoideae and continued after the split in individual species. Sequence similarity of up to 99% is obtained between two R. multiflora TNL paralogs, indicating a very recent duplication.
The mechanisms by which TNL genes from perennial Rosaceae diversify are mainly similar to those from annual plant species. However, most TNL genes appear to be of recent origin, likely due to recent duplications, supporting the hypothesis that TNL genes in woody perennials are generally younger than those from annuals. This recent origin might facilitate the development of new resistance specificities, compensating for longer generation times in woody perennials.
Plants are constantly challenged by a large number of different pathogens with diverse infection strategies. To avert these attacks, plants use different mechanisms consisting of active and passive defense lines. Among the active defense mechanisms of plants, specific resistance genes (R-genes) are key factors involved in so-called gene-for-gene interactions. Plants harboring a resistance gene recognize specific avirulence (Avr) gene products that characterize particular genotypes of the pathogen [1, 2].
Several R-genes have been isolated from a variety of plant species. The majority of R-genes encode nucleotide-binding site (NBS) and leucine-rich repeat (LRR) proteins [2, 4, 5]. On the basis of their N-terminal domains, the NBS-LRR resistance genes can be subdivided into two classes. The first class encodes proteins with an N-terminal TIR domain (homology to the Drosophila Toll and mammalian Interleukin-1 receptors), whereas the second class encodes proteins with coiled-coils (CC), sometimes in the form of a leucine zipper (LZ) at the N-terminus of the protein [3, 6, 7]. Two basic strategies for pathogen recognition are currently thought to exist: direct recognition of Avr-gene products by R-proteins and indirect recognition via sensing perturbations of host proteins (the so-called guard hypothesis) [2, 8]. Different domains of the NBS-LRR R-genes have been shown to be involved in pathogen recognition, but most studies indicate that the LRR domain plays the most important role in pathogen recognition.
Most of the R-genes described to date are organized in clusters (reviewed in [3, 10]). This clustering may facilitate R-gene diversity in the course of adaptation to counteract newly emerging Avr-protein variants in newly evolving virulent races of a pathogen.
Extensive studies have been conducted to understand the mechanism of R-gene diversification, mainly in herbaceous annual plants, such as Rp1 in maize [11–15], Cf4/Cf9 and Mi-1 in tomato [16–18], Xa21 in rice, Dm3 (RGC2) in lettuce [20–23], RPP5 in Arabidopsis, N in flax and R1 in potato. Sequence analyses from these studies indicate that R-genes display significantly higher rates of sequence evolution than other plant genes. Furthermore, LRR domains generally evolve more rapidly than the other domains of NBS-LRR genes and often display signs of positive selection. Tandem and segmental gene duplications, recombination, unequal crossing over, point mutations and diversifying selection have been shown to contribute to R-gene diversity. Recent R-gene sequence analyses in Arabidopsis, maize, tomato, barley, lettuce, rice and wheat further indicated illegitimate recombination (IR) as a major source of duplications and deletions. Illegitimate recombination is a type of recombination between two DNA molecules that are not necessarily homologous to each other but share a few short identical sequences. These identical sequences are called illegitimate recombination signatures. Illegitimate recombination may result in duplications or deletions.
Unlike herbaceous annuals, woody perennial species are characterized not only by long-lived individual plants but also by longer average generation times than annuals. Therefore, the nature of R-gene diversification could differ from that of annual plants. Some perennial plant species, such as roses, propagate clonally as well as sexually via seeds. These different forms of reproduction could also contribute to a possible deviation in R-gene diversity in perennials, as differences in evolutionary rates between annuals and perennials have been noted several times. The mechanisms underlying such differences are still unknown. More frequent and recent duplications of R-genes have been described in poplar and grapevine compared with rice and Arabidopsis, indicating different evolutionary patterns of R-genes between perennial and annual plants.
Roses are attacked by a number of pathogens and pests, among which black spot is the most severe disease of field-grown roses. It is caused by the hemibiotrophic ascomycete Diplocarpon rosae, for which a number of pathogenic races have been identified. Resistance to black spot has been found to be caused by both quantitative and qualitative resistance genes, with the single dominant R-gene Rdr1 from R. multiflora being the best studied rose R-gene thus far.
Recently, Rdr1 was finely mapped to a telomeric position in rose linkage group 1 in a contig of four overlapping BAC clones and isolated via map-based cloning [33–35]. The Rdr1 gene is a TIR-NBS-LRR (TNL) type resistance gene and a member of a multigene family of nine highly similar genes clustered in a region of 265.5 kb in R. multiflora.
Here, we present sequence information from the Rdr1 locus of a second rose species, R. rugosa, and analyze the sequence conservation of this locus and the TNL family within roses. Furthermore, we analyze synteny with other Rosaceae, represented by sequences from strawberry, peach and apple, and with members of other plant families.
In addition to the previously published sequence of a 265.5-kb region spanning the Rdr1 locus of R. multiflora, a set of four overlapping BAC clones spanning the Rdr1 region in R. rugosa was sequenced with Roche 454 sequencing. The sequences were assembled to a total length of 340,415 bp, with individual sizes of 96.3, 144.9, 75.4 and 78.6 kb for the BAC clones 31C14, 95G17, 78F5 and 35D6, respectively. The complete sequence has been deposited in GenBank [accession number GenBank: JQ791545]. The first 67,036 bp from the R. rugosa sequence extended beyond the left end of the corresponding R. multiflora homologous BAC-clone 29O3.
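As a rough consistency check on the figures above, the summed clone sizes can be compared with the assembled length. The sketch below assumes that the four reported sizes are complete insert lengths and that the contig is simply their union, so any sequence counted more than once lies in clone overlaps; the variable names are illustrative only.

```python
# Reported BAC sizes (kb) and assembled contig length (bp), taken from the text.
bac_sizes_kb = {"31C14": 96.3, "95G17": 144.9, "78F5": 75.4, "35D6": 78.6}
contig_length_kb = 340_415 / 1000

summed_kb = sum(bac_sizes_kb.values())

# Assumption: the difference between the summed clone sizes and the
# assembled length is sequence shared between neighbouring clones.
redundant_kb = summed_kb - contig_length_kb

print(f"sum of clone sizes:    {summed_kb:.1f} kb")
print(f"shared between clones: {redundant_kb:.1f} kb")
```

Under these assumptions, roughly 55 kb of the sequenced clone length is redundant, which is plausible for a tiling path of four overlapping clones.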
List of predicted genes from the 340,415-bp contig of R. rugosa orthologous to the Rdr1 locus of R. multiflora
Position on the contig (bp)
Similarity as revealed by BLASTp (similar to GenBank accession number)
Retrotransposon protein, Ty1-copia (ABF96803.1)
Retrotransposon protein, Ty1-copia (ABA98286.2)
Neuroblastoma-amplified sequence (XP_003602296.1)
Major facilitator superfamily domain (XP_003526731.1)
rhodanese-like domain-containing protein (NP_567785.1)
Vacuolar protein sorting-associated (XP_002274585.1)
Transcription factor B3 (ABN06173.1)
Gag-pol polyprotein (BAK64102.1)
Transcription factor B3 (XP_003517920.1)
Mutator-like transposase (BAB10320.1)
Gag-pol polyprotein (AAO73527.1)
Shikimate dehydrogenase (EEF45470)
Non-LTR retroelement reverse transcriptase (AAG13524)
Transcription factor B3 (XP_003535137.1)
Phospholipase C (ACF93733.1)
Transcription factor B3 (XP_003517920.1)
Transcription factor B3 (XP_003517920.1)
Retrotransposon protein, Ty1-copia (ABF96803.1)
Non-LTR retroelement reverse transcriptase (AAB82639)
Copia-type polyprotein (AAG51247.1)
ATP binding protein (XP_002515676.1)
AAA domain-containing protein (XP_003544721.1)
Yellow stripe-like protein (XP_003602315.1)
GTPase-activating protein (XP_003526739.1)
Aldo-keto reductase (XP_003602320.1)
Homeobox leucine zipper protein (AAD38144.1)
Hypothetical protein (XP_003602325.1)
TOPLESS-RELATED protein (XP_002275116.1)
Serine/threonine protein kinase (NP_001234146.1)
UDP-N-acetylglucosamine transporter (XP_003531350.1)
F-box protein (XP_003610959.1)
Unnamed protein product (CBI23069.3)
The additional sequence that extends the R. rugosa contig beyond the borders of the R. multiflora contig contains two TNL elements (ruRdr1A and ruRdr1B) as well as two transposable elements and sequences with similarity to a neuroblastoma amplified gene, a major facilitator superfamily domain and a rhodanese-like domain-containing protein.
GATA alignment and dot plot comparison of the 265.5-kb region from R. multiflora and the 340.4-kb region from R. rugosa indicate a high degree of synteny between the two species (Figure 1).
A group of nine sequences (ATP binding, AAA type ATPase, Yellow stripe-like, GTPase activator, 6-phosphogluconolactonase, Ubiquitin fusion, Homeobox leucine zipper, TOPLESS-RELATED and Serine/threonine protein kinase) at the right end of the R. multiflora contig and a sequence stretch comprising a predicted gene for a vacuolar protein sorting-associated protein and transcription factor at the left end of the contig are perfectly conserved between the two species. However, the region between these sequences exhibits several copy number changes, inversions and deletions/insertions. Of the 11 TNL genes located on the R. rugosa contig, nine are in the same and two are in a reverse orientation compared with the R. multiflora contig in which all of the TNL genes are in the same orientation. For some of the non-TNL genes, differences are observed in terms of the relative location and the number of homologs within each cluster. Furthermore, some sequences are completely missing in one cluster and present in another. For example, a 23-kb region with similarity to prolyl 4-hydroxylase alpha, aminotransferase-like, WUSCHEL protein terminator and inosine-5'-monophosphate dehydrogenase is present in the homologous locus in R. multiflora but absent in the R. rugosa locus.
In addition, the transposable elements distributed over the two contigs differ both in their position and sequence. The ten transposable elements in the R. multiflora locus belong to the Ty1/copia type retroelements, whereas the R. rugosa locus contains Ty1/copia as well as other different retroelements.
Among the plant species with sequenced genomes, the strawberry is the closest relative to the genus Rosa. We therefore compared the R. multiflora contig to sequences from Fragaria vesca. We subjected individual sequences from the Rdr1 contig to BLAST searches against the Fragaria genome sequence and located a stretch of similar sequences of approximately 354 kb from strawberry chromosome 7 between positions 19,798,478 bp and 20,152,477 bp. Several insertions, deletions and large rearrangements in the form of inversions and translocations were found (Additional file 1). The cluster of conserved genes from the right and the left sides of the R. multiflora contig are also found in Fragaria. Twelve TNL genes with high similarity to the Rdr1 gene family (71% to 87% identity at the DNA level and 61% to 81% similarity at the amino acid level) are also located in the selected Fragaria sequence region. In the following analyses, we designate these genes as FvTNL1 through FvTNL12. Unlike at the R. multiflora locus, the orientation of the Fragaria TNLs varies, in that the majority of the genes are inverted relative to the R. multiflora copies. Further differing from the R. multiflora locus, the genes within the conserved cluster from the right side of the contig, such as the yellow stripe-like gene and the ubiquitin fusion gene, are in an inverted position, inserted within the TNL genes.
The 354-kb Fragaria sequence contains a stretch of 14.2 kb of ambiguous sequence (represented by stretches of Ns) resulting from problems in the assembly. This may lead to changes in the Fragaria locus structure in the future, although this is unlikely.
In contrast to what is observed in Fragaria, the Rdr1 homologous locus is located on two different chromosomes in P. persica (Additional file 2). The closest relatives to the Rdr1 gene family are found in a cluster of 15 genes in linkage group 8 (scaffold no. 8 from bases 2,050,000 bp to 2,510,000 bp), whereas the genes flanking the TNL cluster at the right and left margins in Rosa and Fragaria are located in linkage group 2 (scaffold no. 2 from 26,050,000 bp to 26,110,000 bp) in Prunus. The 15 Prunus TNLs are designated PpTNL1 through PpTNL15. The cluster is characterized by large differences in terms of the non-TNL genes and the orientation of the TNLs. In contrast, the flanking genes are highly conserved between Rosa and Prunus.
The similarity between rose and apple is comparable to the above-mentioned situation in peach, in which TNLs and flanking genes are located in two different linkage groups (Additional file 3). The closest relatives to the Rdr1 TNLs are located in a cluster of 11 genes in apple linkage group 15 spanning a position from 41,166,396 bp to 41,719,891 bp. Hereafter, they are designated MdTNL1 through MdTNL11. The flanking genes are located in linkage group 1 (position 35,200,000 bp to 35,294,999 bp).
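The sizes of the TNL-containing regions reported above for strawberry, peach and apple follow directly from their coordinates. The short sketch below recomputes them, assuming that the published positions are 1-based and inclusive; the function name and the dictionary labels are illustrative and not part of the original analysis.

```python
def interval_length(start, end):
    """Length in bp of a closed interval, assuming 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1

# Coordinates as reported in the text for the TNL-containing regions.
regions = {
    "Fragaria vesca, chromosome 7": (19_798_478, 20_152_477),
    "Prunus persica, scaffold 8": (2_050_000, 2_510_000),
    "Malus x domestica, linkage group 15": (41_166_396, 41_719_891),
}

for name, (start, end) in regions.items():
    print(f"{name}: {interval_length(start, end) / 1000:.1f} kb")
```

With inclusive coordinates, the strawberry interval is exactly 354,000 bp, matching the approximately 354 kb stated for that region.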
The strong synteny of the Rdr1 locus between Rosa and Fragaria and the low synteny with Prunus and Malus raise the question of whether the group of Rdr1 TNL genes is present at a similar locus in other plant families. We defined synteny simply as close linkage of a TNL cluster with similarity to the Rdr1 family to the flanking genes conserved among the Rosoideae species.
The first species investigated was Medicago truncatula, as the Fabaceae are a family closely related to the Rosaceae. We found some of the flanking genes of the right side of the contig distributed among more than three chromosomes in M. truncatula, but no TNL genes related to the Rdr1 family are located close to that locus (Additional file 4). We also did not detect the vacuolar protein sorting-associated protein or transcription factor from the left side of the R. multiflora locus, indicating a lack of synteny in Medicago. Related TNL genes are found on several of the Medicago chromosomes, but the similarity to the Rdr1 family is too low to infer orthology relationships. The M. truncatula sequences utilized in these analyses were downloaded from http://www.medicagohapmap.org/.
We performed the same analysis in the Arabidopsis thaliana genome, and again, no syntenic block of sequences could be detected (Additional file 5). The flanking genes in this case also do not form a cluster at one location but are distributed on more than two chromosomes in more than two copies. The A. thaliana sequences utilized in these analyses were downloaded from http://www.arabidopsis.org/.
The tree shows four major groups that are highly supported by bootstrap values of 99% to 100%. There is a distinct cluster (I) formed by Rosa and Fragaria, representing the subfamily Rosoideae of the Rosaceae, which is separated from a single Malus sequence (MdTNL1, II), a cluster (III) in which sequences from Malus and Prunus each form distinct subclusters and a cluster (IV) comprised mainly of Prunus sequences and two Malus sequences in one subcluster (Figure 3). Within the largest Rosoideae cluster, sequences from Rosa and Fragaria each form distinct highly supported subclusters, indicating recent evolution of the Rdr1 TNLs after the genera diverged. Within the subcluster comprised of the rose sequences, there is no separation of R. multiflora and R. rugosa sequences. Instead, highly supported clusters with pairs of sequences from both species (e.g., muRdr1F/ruRdr1F, muRdr1A/ruRdr1H) and mixed subclusters with low bootstrap support indicate that some of the sequences evolved before the species separated. As an exception to this, the pair muRdr1B/muRdr1G shows almost no divergence, indicating a recent gene duplication in Rosa multiflora.
In contrast to the divergence of sequences from Rosa and Fragaria, the Rdr1 homologues in Prunus and Malus are mixed in clusters III and IV. Cluster III is comprised of two subclusters (IIIa and IIIb), each including genes only from Prunus or Malus, in contrast to cluster IV, which consists of genes from both species. The branch lengths of subclusters IIIa and IIIb are shorter than the branch lengths of cluster IV, indicating that they evolved after Prunus and Malus diverged.
To obtain additional information about the processes that led to the diversity of the Rdr1 TNL genes, we analyzed the sequence variability of the genes in more detail.
Approximately 1270 polymorphic sites and 87 indels varying in length from 1 bp to 160 bp are observed. The longest indel is in the microsatellite repeat region in exon 4. Some indels are duplications flanked by IR signatures resulting from unequal crossing over. The majority of indels are in the noncoding region of the TNL sequences.
The 26 gene conversion tracts detected in the 20 TNL homologs of the two rose species, R. multiflora and R. rugosa
Sequence tract begin (bp)
Sequence tract end (bp)
Sequence length (bp)
We sequenced a genomic region of 340.4 kb from R. rugosa orthologous to the recently published Rdr1 gene cluster from R. multiflora. Comparison of the two regions reveals a high degree of conservation of genes flanking a cluster of 11 TNL genes, but we also found rearrangements within the group of TNLs. This corresponds to other studies in which major rearrangements, including deletions, gene duplications and inversions, have been found among groups of NBS-LRR genes. The structure of the Rdr1 locus indicates several mechanisms that have led to the above-mentioned structural differences between the two orthologous regions. The close relationships between the TNL genes, with DNA sequence identities between 88% and 95%, might have led to unequal crossing over, resulting in some of the observed duplications/deletions. Another factor promoting recombination at the Rdr1 locus is the presence of transposable elements belonging to the Ty1/copia class as well as non-LTR classes, which are present on both contigs. Several authors have hypothesized that both the repetitive nature of various copies of retroposons and their capacity to transpose neighboring genes have likely contributed to the diversity of R-gene clusters [10, 40]. The high variability detected among the TNL genes at the locus is also reflected in the large number of alleles for a microsatellite from the LRR region of the Rdr1 gene family that has been analyzed in rose varieties, species and individuals from natural populations of the diploid species Rosa arvensis.
Comparing the rose Rdr1 locus with a syntenic region of the Fragaria genome revealed conservation of the flanking genes as well as the presence of the Rdr1 TNL gene family. This finding is in agreement with synteny studies conducted with molecular markers showing that the region on rose chromosome 1 to which we mapped Rdr1 is syntenic to Fragaria chromosome 7 [42, 43]. It also indicates that a cluster of TNL genes with similarity to the Rdr1 TNLs existed prior to the separation of the tribe Roseae (to which roses belong) from the Potentilleae (to which strawberries belong). Apart from the flanking genes and the presence of TNLs with high similarity to the Rdr1 family, large structural changes occurred, in that many non-TNL genes found in rose do not match similar genes in strawberry and vice versa. As the two species belong to separate tribes, this is not surprising, given the variability that we detected between the two rose species. The mechanisms leading to these differences are likely the same mechanisms that shaped the differences within the genus Rosa.
In contrast to the conservation of the Rdr1 locus in Fragaria, no conservation of the TNL cluster is found in Malus and Prunus. Although the flanking genes are conserved as a tightly linked group in both Malus and Prunus, there are no closely linked TNLs, as observed in Fragaria. Rather, the closest relatives to the rose Rdr1 family are detected in clusters on separate chromosomes. As conservation is also lacking in Medicago and Arabidopsis, one possible conclusion is that the TNLs were inserted at their present location after the Rosoideae and Spiraeoideae split. Although errors during whole genome sequencing and assembly cannot be excluded, the cause of this transposition is difficult to infer. However, the presence of mobile genetic elements is one possible explanation. Comparative analysis of an NBS-LRR cluster in Phaseolus vulgaris and other Fabaceae indicates that subtelomeric positions are prone to transpositions of repeated DNA elements. This might be the case for the Rdr1 locus split in Spiraeoideae, as the Rdr1 TNL clusters are located at a telomeric position in the rose chromosome 1 map. In contrast to the situation in Phaseolus, we have no evidence that any sequence closely related to the Rdr1 family maps outside the cluster on linkage group 1. Mapping experiments in two diploid and one tetraploid mapping population reveal that all polymorphic fragments are linked in one region on linkage group 1, although they spread over a distance of up to 18 cM [42].
Phylogenetic analyses (Figure 3) led to a dendrogram that was highly supported by bootstrap values above 80%, in which the rose and Fragaria genes form separate clusters, whereas the Malus and Prunus genes are located in mixed clusters. None of the Fragaria genes clustered with any of the rose genes, which form clusters according to the two different rose species.
This indicates that although the TNL genes most likely translocated to their current positions after the separation of the Rosoideae and the Spiraeoideae, the locus underwent further independent evolution in each of the genera. This is consistent with the observation of high similarity among the DNA sequences within each cluster, indicating that individual members of the gene families arose relatively recently. Explanations for this situation have been proposed in a number of other studies on R-genes and involve the evolution of R-gene clusters via duplications and deletions of family members and through gene conversion. The discrepancy between TNL genes in woody plants forming clusters of highly similar genes and TNL genes being generally phylogenetically old, predating the split between gymnosperms and angiosperms, has been noted previously. Studies on the sequenced genomes of grapevine and poplar compared with rice and Arabidopsis indicate that most clustered TNL genes from woody perennials are of a more recent origin than genes from annual plant species. One explanation for this observation is the lower number of generations in woody perennials compared with annuals, which is also held responsible for the generally lower rate of molecular evolution in protein-coding genes. Within cluster I of the dendrogram, the rose subclusters include genes from both species (Figure 3), indicating that these copies existed before the species separated. There are only three exceptions to this pattern, among which the gene pair muRdr1B and muRdr1G indicates a very recent duplication event, as the two genes are almost identical. Subclusters IIIa and IIIb harboring genes from Malus (IIIa) and Prunus (IIIb) indicate phylogenetic relationships similar to the Rosa and Fragaria subclusters in that they only include genes from one of the species, which indicates that the genes evolved after Prunus and Malus diverged. The shorter average branch lengths of these clusters compared with all other subclusters indicate that these two groups of genes are the ones that evolved most recently.
Enhanced rates of nucleotide diversity are a major factor in the evolutionary dynamics of NB-LRR resistance genes. We therefore analyzed both TNL and non-TNL genes across rose, strawberry, apple and peach. In line with the topology of the phylogenetic tree, low values for nucleotide diversity were observed among the Rosoideae (Rosa and Fragaria) compared with the other Rosaceae (Malus and Prunus, Figure 4). However, if we consider the higher rates of evolution among the TNL compared with the non-TNL genes, it is somewhat surprising how closely related the Fragaria TNL genes are to the Rosa TNL genes (Figure 4). This emphasizes the very close phylogenetic relationship between roses and strawberries, which belong to the same subfamily and the same supertribe of the Rosaceae, which is also reflected in the high degree of macrosynteny found in their chromosome structure [42, 43]. Although the Rosaceae TNL genes seem to be of relatively recent origin, their nucleotide diversity is more than two-fold higher when compared with the values for the cluster of flanking genes and an arbitrarily chosen gene (Figure 5). This has been observed in several NBS-LRR gene clusters and most likely reflects selective advantages due to higher rates of sequence evolution, which accelerate the evolution of new resistance specificities. For many R-gene clusters, positive selection, indicated by Ka/Ks ratios of greater than 1.0, particularly in the LRR regions, has been postulated [3, 23, 47, 48]. We also found increased Ka/Ks ratios in the LRR regions of rose Rdr1 TNLs. However, in contrast to other studies in which a dramatic increase of Ka/Ks values has been observed in LRR regions, we found only moderate increases in the LRR region. This difference might be due to the overall lower rate of sequence evolution observed in perennial plants because of more recent duplications. Alternatively, it might be caused by a lower rate of positive selection on the rose Rdr1 gene family. There is initial evidence that the Rdr1-TNL cluster includes several resistance genes against black spot and that the evolution of new pathogenic races in black spot is slow due to mostly asexual reproduction and low gene flow.
Our analyses indicate illegitimate recombination, gene conversion, unequal crossing over, indels, point mutations and transposable elements as mechanisms involved in the evolution of the Rdr1 locus in rose. It is well documented that similar factors play a role in resistance gene diversity in several annual plant species. Therefore, the diversifying mechanisms associated with the Rdr1 locus of the perennial, clonally propagated rose are principally comparable to those of annual plant species, although the Rdr1 TNLs are further characterized by recent duplication.
Analyses of other TNL-type resistance genes in Rosaceae, for example, the recently cloned Ma gene from Prunus cerasifera, may provide additional information on whether the pattern of Rdr1-TNL evolution is universal for all TNL genes of perennials or specific to the rose Rdr1 alone.
The Rdr1 locus is highly conserved within the genus Rosa and somewhat less conserved between Rosa and Fragaria; nevertheless, the synteny is disturbed when compared with Malus and Prunus, possibly due to recombination and chromosome translocation aided by transposable elements.
In the Rosoideae, TNL gene diversification occurred before and after the split of Rosa and Fragaria. A similar phenomenon took place in the Spiraeoideae between Prunus and Malus TNL genes, indicating that most TNL genes in the Rosaceae arose relatively recently.
DNA sequence viewing, editing and basic manipulations were performed using BioEdit.
In R. rugosa, sequences orthologous to the Rdr1 region have been located in four overlapping BAC clones (31C14, 95G17, 78F5 and 35D6). Sequencing of the four overlapping R. rugosa BAC clones was performed following the procedures described in Terefe-Ayana et al., with a few modifications. Escherichia coli DH10B cells carrying the BAC clones were delivered in stab agar to Cogenics (Cogenics Ltd, Morrisville, NC) for 454 FLX sequencing with 50% of a full run. The sequences were automatically clipped for adaptor and primer sequences and de novo assembled with Newbler assembler software by Cogenics. Because the BAC clone sequences did not completely assemble into a single contig, subclones were generated from DNA of each BAC clone as described in Terefe-Ayana et al. and Sanger-sequenced using commercial sequencing services.
The sequences generated via Sanger sequencing and the first contigs generated using 454 sequencing were assembled with SeqMan (DNASTAR, Madison, WI) and ContigExpress (Invitrogen, La Jolla, CA). Gene prediction and annotation based on the completely assembled sequence was carried out using the gene prediction program FGENESH (http://www.softberry.com). To identify coding sequences in the contig and determine sequence similarity, the whole sequence was fragmented in silico into 1-kb fragments with 200-bp overlaps by the EMBOSS program Splitter (http://emboss.bioinformatics.nl/) and subjected to BLASTn and BLASTx searches against the GenBank database. Domains among the putative protein coding genes were analyzed using Pfam version 23.0 (http://pfam.sanger.ac.uk/) and SMART 6 (http://smart.embl-heidelberg.de/). Sequences that were identical to a known gene in GenBank were assigned that gene name.
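The in silico fragmentation step can be illustrated with a plain sliding window. The sketch below is not a reimplementation of the EMBOSS program; it assumes that "1-kb fragments with 200-bp overlaps" means windows of 1,000 bp whose start positions advance by 800 bp, and the function name and parameters are hypothetical.

```python
def fragment(seq, size=1000, overlap=200):
    """Yield (start, end, subsequence) windows over seq.

    start is 0-based and end is exclusive. Consecutive windows share
    `overlap` bases; the last window may be shorter than `size`.
    Assumption: the step between window starts is size - overlap.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not seq:
        return
    step = size - overlap
    start = 0
    while True:
        end = min(start + size, len(seq))
        yield start, end, seq[start:end]
        if end == len(seq):
            break
        start += step

# Placeholder sequence with the length of the R. rugosa contig (340,415 bp).
windows = list(fragment("N" * 340_415))
print(len(windows), windows[-1][:2])
```

Each window could then be written out as a separate FASTA record and used as a query in similarity searches; how the original pipeline named and submitted the fragments is not stated in the text.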
The complete genomic sequences of apple, peach and strawberry were downloaded from the Genome Database for Rosaceae (http://www.rosaceae.org). Regions orthologous to the Rdr1 locus and flanking genes were identified using local BLAST searches with the help of BioEdit. Gene predictions for apple, peach and strawberry provided by the respective authors of the genome sequences [53–55] were employed directly, with some additional predictions made using FGENESH (http://www.softberry.com).
The complete contig sequences of R. multiflora, R. rugosa, Fragaria vesca, Malus x domestica and Prunus persica were aligned using GATA with the default parameters, then compared to determine patterns of gene clusters, the position and orientation of genes and the absence or presence of certain sequence regions.
The predicted Rdr1 homologs (TNL homologues) and their flanking genes from each contig of R. multiflora, R. rugosa, strawberry, peach and apple were aligned using ClustalW with the default options. For the TNL homologues, alignments were carried out for the open reading frame (ORF) and the derived amino acid sequences of the complete gene and separately for each of the TIR, NBS, LRR, exon and intron regions. Alignment of the flanking genes was performed using the complete gene sequences.
For the aligned TNL homologues from the five Rosaceae species, phylogenetic trees were constructed in MEGA5 using the Maximum Likelihood (ML) method based on the Kimura 2-parameter model [58, 59]. Phylogenetic analysis of the amino acid sequence was performed using the Jones-Taylor-Thornton (JTT) matrix-based model in MEGA5. Initial trees for the heuristic search were obtained automatically with MEGA5. The tree topology was tested via a bootstrap analysis with 500 replicates. The majority rule bootstrap consensus tree was taken to represent the evolutionary history of the taxa analyzed. Branches corresponding to partitions reproduced in fewer than 50% of the bootstrap replicates were collapsed.
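To make the substitution model concrete, the sketch below computes the pairwise Kimura 2-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. This is only an illustration of the model: the trees in this study were inferred by maximum likelihood in MEGA5, not from pairwise distances. Skipping sites with gaps or ambiguous bases pair by pair is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: ignore gaps and ambiguity codes
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)

# Toy alignment: one transition (A<->G) in ten sites.
print(round(k2p_distance("AAAAACCCCC", "AAAAGCCCCC"), 4))
```

The model distinguishes transitions from transversions because the former typically occur more often; with no differences the distance is zero, and it grows faster than the raw proportion of differences as sequences diverge.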
The ratio between the nonsynonymous nucleotide substitutions per nonsynonymous site (Ka) and synonymous nucleotide substitutions per synonymous site (Ks) was evaluated for TIR, NBS and LRR domains separately. The TIR, NBS and LRR domains were determined based on Pfam v23 (http://pfam.sanger.ac.uk/). The amino acid sequences of the different protein domains were aligned in MEGA5 using ClustalW and employed to guide the corresponding cDNA sequence alignment. The resulting cDNA alignments were used to calculate Ka and Ks with DnaSP following Nei and Gojobori. The selection pattern was characterized by the ratio of Ka to Ks substitution, in which Ka/Ks > 1 indicates positive selection or Darwinian adaptive evolution, Ka/Ks < 1 indicates purifying or stabilizing selection, and Ka/Ks = 1 indicates neutral evolution.
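The interpretation rule for Ka/Ks can be written down as a small helper. The sketch below only classifies a ratio that has already been estimated (for example by DnaSP); it does not implement the Nei and Gojobori counting procedure. The function name, the tolerance around 1 and the example values are hypothetical and are not taken from the study.

```python
def classify_selection(ka, ks, tolerance=0.05):
    """Interpret a Ka/Ks ratio.

    tolerance is a hypothetical band around 1.0 within which the ratio
    is treated as neutral; the text itself states only the exact rule.
    """
    if ka < 0 or ks < 0:
        raise ValueError("Ka and Ks must be non-negative")
    if ks == 0:
        return "undefined"  # the ratio cannot be formed without synonymous change
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"

# Illustrative values only.
for domain, ka, ks in [("TIR", 0.05, 0.20), ("NBS", 0.10, 0.10), ("LRR", 0.30, 0.15)]:
    print(domain, round(ka / ks, 2), classify_selection(ka, ks))
```

In practice, a formal test of whether the ratio differs from 1 is preferable to a fixed tolerance; the band here merely keeps the example simple.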
The aligned TNL homologues and flanking genes from the five Rosaceae species were analyzed for DNA polymorphisms using nucleotide diversity (π) with the Jukes and Cantor correction in DnaSP. This DNA polymorphism analysis indicates the average number of nucleotide substitutions per site between two sequences, and the nucleotide diversity value is the average of all comparisons.
Recombination and sequence exchange between Rdr1 homologues from R. multiflora and R. rugosa were determined with the programs Geneconv and DnaSP. The default parameters were used in both the Geneconv and DnaSP analyses. Events detected with Geneconv were examined and confirmed visually.
The study was supported by the German Research Foundation (DFG) grant numbers DE 511/4-1 and DE 511/4-2.
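The calculation described above can be sketched as the mean of Jukes and Cantor corrected pairwise distances, d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites between two sequences. This is a minimal illustration and not the DnaSP implementation; in particular, excluding gapped or ambiguous sites pair by pair is an assumption of this sketch, and the function names are hypothetical.

```python
import math
from itertools import combinations

def jc_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = diffs = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":  # assumption: skip gaps/ambiguities
            sites += 1
            diffs += x != y
    if sites == 0:
        raise ValueError("no comparable sites")
    p = diffs / sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def nucleotide_diversity(seqs):
    """Average JC-corrected distance over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(jc_distance(a, b) for a, b in pairs) / len(pairs)

# Toy alignment of three sequences, one of which differs at a single site.
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAAAA"]
print(round(nucleotide_diversity(toy), 4))
```

Comparing this value between the TNL genes and the flanking genes is what allows statements such as a more than two-fold higher diversity in one group than in the other.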
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.