The vast majority of the cloned disease resistance genes from plants encode nucleotide-binding site (NBS) and leucine-rich repeat (LRR) domains. The NBS-LRR proteins are often referred to as R proteins and their encoding genes as R-genes. R proteins can be further divided into two subclasses, the TIR (Toll/interleukin-1 receptor-like) subclass and the non-TIR subclass. The TIR subclass proteins have a TIR domain at their N-termini, while most R proteins from the non-TIR subclass have a coiled-coil (CC) domain instead.
The R-genes in plants belong to a large gene family and tend to be clustered in genomes. For instance, approximately 66% of the 149 R-genes in Arabidopsis thaliana (Col-0) and 76% of the 623 R-genes in rice (Oryza sativa cultivar Nipponbare) are located in clusters [2, 3]. Many R-genes within a cluster belong to the same subfamily and may have undergone frequent sequence exchanges (either by gene conversion or by recombination such as unequal crossover), resulting in chimeric structures [4–16]. Those chimeras, termed Type I R-genes, are highly diverse among different genotypes of a species, and consequently a large number of R-genes with distinct sequences are predicted in a population or species [12, 13, 17]. The frequent sequence exchanges among some Type I R-genes did not homogenize their coding sequences (i.e. there was no concerted evolution), though their intron sequences may be homogenized. The lack of concerted evolution in the coding sequences of R-genes was likely due to diversifying selection after sequence exchanges.
In contrast to the extensively chimeric R-genes, other R-genes (termed Type II) evolved independently and did not exchange sequences with homologues. The sequences of Type II R-genes, when present, are highly conserved among different genotypes of the same or closely related species. Surprisingly, these highly "conserved" R-genes are frequently absent in some genotypes, showing presence/absence (P/A) polymorphism [3, 12, 17–20]. For example, 124 R-genes in the two rice cultivars 93-11 and Nipponbare exhibit P/A polymorphism. In the absence haplotypes, the entire Type II R-gene sequence is missing. Balancing selection may have played an important role in maintaining such P/A polymorphism [20, 21]. The mechanism for such balancing selection remains poorly understood, but it is likely that the presence of some R-genes carries a fitness cost, such as reduced viability or reduced seed production.
The number of R-genes varies dramatically among plant genomes. Some genomes, such as those of apple and wheat, contain approximately 1,000 R-genes [23, 24]. In contrast, fewer than 100 R-genes are present in each of the sequenced genomes of papaya, cucumber, watermelon and melon [25–28]. It remains unclear why the number of R-genes varies so considerably among genomes while the total number of coding genes in a genome is relatively stable. Interestingly, the number of R-genes in a genome is significantly correlated with the number of genes encoding LRR-RLKs (leucine-rich repeat receptor-like kinases), which may also be involved in disease resistance. The identification and annotation of R-genes in a genome are challenging because they are highly diverse and a considerable proportion of them are pseudogenes [2, 30, 31]. Large deletions (i.e. partial genes), frameshift indels and nonsense point mutations in R-genes make automated annotation problematic. Consequently, many R-genes (the vast majority, in some cases) may be mis-annotated by gene prediction programs, and manual annotation is recommended to correct the errors.
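To make the annotation problem concrete, the following is a minimal Python sketch of how a predicted R-gene coding sequence might be screened for two of the disruptions mentioned above, frameshift indels and nonsense mutations. It is an illustration only, not the procedure used in this study: the function name flag_cds_problems and the labels it returns are hypothetical, and the check assumes the standard genetic code and an already spliced coding sequence.

```python
# Hypothetical screening helper; assumes the standard stop codons and
# an input that is a spliced coding sequence (no introns).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def flag_cds_problems(cds):
    """Return a list of reasons why a coding sequence looks disrupted."""
    seq = cds.upper()
    problems = []

    # A length that is not a multiple of three suggests a frameshift indel
    # or a truncated (partial) gene.
    if len(seq) % 3 != 0:
        problems.append("length_not_multiple_of_3")

    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    # A stop codon before the last codon is a candidate nonsense mutation.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"premature_stop_at_codon_{index + 1}")
            break

    # An intact gene is expected to end with a stop codon.
    if codons and codons[-1] not in STOP_CODONS:
        problems.append("no_terminal_stop")

    return problems
```

A sequence that returns an empty list passes these basic checks, but it may still be a pseudogene for other reasons (for example, a large in-frame deletion), which is one reason manual inspection remains necessary.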
The Cucurbitaceae family includes several agriculturally important crops such as melon (Cucumis melo), cucumber (Cucumis sativus), pumpkin (Cucurbita moschata) and watermelon (Citrullus lanatus). Disease is one of the main factors limiting their yields, and it drives heavy use of chemical sprays. Only one R-gene, Fom-2 in melon, has been cloned from Cucurbitaceae species, while a candidate gene, Ccu, encoding resistance against cucumber scab has been identified [32, 33]. Recently, the genomes of cucumber, melon and watermelon have been sequenced [26–28]. Only 61, 81 (R-genes plus genes encoding TIR only) and 44 R-genes were reported in the genomes of cucumber (9930), melon and watermelon, respectively. A low copy number of R-genes was also found in cucumber cultivar Gy14. The genetic mechanisms underlying such a low copy number of R-genes in Cucurbitaceae species remain unclear. The R-genes from Cucurbitaceae genomes (except watermelon) were annotated using computer programs and were not verified manually. Although a previous study investigated the distribution of R-genes on cucumber chromosomes, R-gene sequences from other Cucurbitaceae species, and the phylogenetic relationships between R-genes from Cucurbitaceae and Arabidopsis thaliana, the evolution of R-genes and the genetic mechanisms underlying their low copy number in Cucurbitaceae remain poorly understood.
In this study, R-genes in the sequenced genomes of cucumber, melon and watermelon were identified and annotated de novo. The structure (exons and introns) of each R-gene lineage in Cucurbitaceae was determined. The R-gene loci and R-gene sequences in different Cucurbitaceae species were compared. Degenerate primers were used to amplify R-genes from nine species of Cucurbitaceae. The diversity of R-genes in cucumber and in a wild Cucurbitaceae species, Trichosanthes kirilowii, was studied in detail. The genetic mechanisms underlying the low copy number of R-genes were investigated through phylogenetic comparison of R-genes in Cucurbitaceae with those from poplar (Populus trichocarpa) and soybean (Glycine max). The evolutionary mechanisms behind the large variation in R-gene copy number among species are discussed.
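As background on the degenerate primers mentioned above, a degenerate primer is a mixture of oligonucleotides written with IUPAC ambiguity codes, so that one primer design can match diverged homologues. The short Python sketch below shows how the degeneracy of such a primer can be computed and how it expands into concrete sequences. The function names are hypothetical and the example primers in the tests are toy strings, not primers from this study.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand_primer(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]
```

Because degeneracy multiplies across positions, it grows quickly with the number of ambiguous bases, so it is usually worth checking the count before expanding a long primer.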