Identification of a BAC clone corresponding to the Rfp/Rfn region
A B. rapa cosmid clone containing a polymorphism tightly linked to Rfp, a SNP in the gene corresponding to A. thaliana At1g12910 at chromosome 1 coordinate 4.395 Mb [35], was recovered, sequenced and used as a source of probes to screen a B. napus bacterial artificial chromosome (BAC) library derived from a line with the Rfn genotype at the restorer locus (see Methods). A single BAC of approximately 180 kb, designated NO202E11, which contained a sequence identical to the corresponding region of the cosmid probe, was selected and sequenced. Sequence analysis showed that the BAC contained a sequence highly similar to the At1g12910 gene and carried the Rfn-linked allele of the At1g12910-orthologous SNP. Since our previous studies indicated that Rfp and Rfn are different haplotypes of the same genomic region, we considered this BAC likely to be anchored in the Rfn region. The BAC sequence was found to be collinear with the region of the A. thaliana genome extending from chromosome 1 coordinates 4.27 to 4.47 Mb and with the region of the B. rapa genome extending from chromosome A09 coordinates 42.18 Mb to 41.79 Mb. Dot matrix visualization of the synteny between the sequenced BAC and the B. rapa and A. thaliana genomic regions is shown in Figs. 1a and 1b, respectively.
High resolution mapping of the Rfn gene
The premise that Rfp and Rfn are alleles or closely linked alternative haplotypes of the same genetic locus is based on relatively rough genetic mapping and on genetic crosses in which the transcript modification activities associated with the two genes have been found to be mutually exclusive [8]. We would therefore expect higher resolution genetic mapping to confirm that Rfn does, indeed, co-localize with the region in the selected BAC. We constructed a BC1 mapping population and genotyped individuals with SNPs from the region of B. napus chromosome A09 corresponding to B. rapa coordinates 39.75 Mb to 43.27 Mb. One of these SNPs, localized at coordinate 42.13 Mb, fell within the BAC sequence. Specific information on the SNPs that were polymorphic between the parents of the mapping cross is provided in Additional file 1: Table S1.
The nap CMS phenotype is leaky in a temperature-dependent, cultivar-specific manner [36, 37], a challenge for the precise mapping of Rfn. The choice of the Rfn parent for the mapping cross is therefore critical. Based on our previous work, we knew that utilizing the cultivar “Karat” could provide a BC1 mapping population that would allow us to satisfactorily distinguish between CMS and fertility restored progeny [8]. To derive our BC1 population, two F1 individuals generated by pollinating CMS plants (rf/rf [nap]) with Karat (Rfn/Rfn [nap]) were crossed back as females to the Karat parent. Of 293 individual BC1 plants, 146 were unambiguously scored as fertile (Rfn/rf [nap]) and 147 as male sterile (rf/rf [nap]). The floral phenotypes of the parental and fertile and sterile progeny are illustrated in Fig. 2a. Genotyping of the population, as illustrated in Fig. 2b, allowed us to map the Rfn gene to the region of B. napus chromosome BnA09 containing the selected BAC, confirming that Rfp and Rfn were closely linked alleles. The single marker positioned within the BAC was perfectly linked to Rfn. Because both Rfp and Rfn are located within this chromosomal region we henceforth refer to it as the B. napus Rf locus.
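As a quick check on the segregation reported above, the fertile and sterile counts can be compared with the 1:1 ratio expected for a single locus segregating in a backcross. The sketch below is illustrative only and is not taken from the Methods; the function name is hypothetical, and the choice of a Pearson chi-square test with one degree of freedom is our assumption.

```python
import math

def chi_square_1to1(n_a, n_b):
    """Pearson chi-square statistic and p-value (1 d.f.) for a 1:1 expectation."""
    expected = (n_a + n_b) / 2
    stat = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # With one degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts reported in the text: 146 fertile and 147 male sterile BC1 plants.
fertile, sterile = 146, 147
stat, p_value = chi_square_1to1(fertile, sterile)
print(f"chi-square = {stat:.4f}, p = {p_value:.3f}")
```

A statistic this close to zero gives no evidence against 1:1 segregation, which is what would be expected if fertility restoration in this cross is governed by a single locus.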
To localize Rfn more precisely, we identified additional polymorphic molecular markers mapping within the region delimited by the SNPs. Primers designed to amplify genomic regions extending across introns in genes located between the SNPs most proximal to Rfn identified two intron length polymorphisms (ILPs, [38]), amplification products that differed in size between the two parents of the mapping population. When no noticeable amplicon length difference was evident, the products were further subjected to restriction cleavage to reveal cleaved amplified polymorphisms (CAPS, [39]). In this way, four additional polymorphic markers anchored in the targeted region were identified. These markers were then used to genotype individuals in which recombination had occurred between the closest flanking SNPs, which allowed us to delimit the Rfn-containing region to the segment corresponding to B. rapa chromosome A09 coordinates 41.68–42.58 Mb.
Characteristics of the B. napus Rf locus
Several interesting features were revealed through dot matrix visualization of the synteny between the sequenced BAC and the B. rapa and A. thaliana genomic regions (Figs. 1a and 1b, respectively). Regions at each end of the BAC, roughly located at positions 10–30 and 120–140 kb, showed similarity to sets of short repeated sequences, shown as boxed regions on the figures. Detailed comparative annotation of the genes in the BAC with the corresponding portions of the B. rapa and A. thaliana genomes (Additional file 2: Table S2) indicated that the repeated sequences corresponded to sets of directly repeated genes and/or pseudogenes encoding thionin (PR-13) proteins (positions 10–13 kb) in one case and Cytochrome P450 Cyp2 proteins (positions 120–140 kb) in the other. Comparative annotation with the more recently released B. napus cv. “Darfur” chromosome BnA09 [40] indicated that a sequence inversion has taken place in B. napus by which the sequence extending from B. rapa chromosome A09 coordinates 41.80–42.04 Mb is inverted (Additional file 3: Table S3). The inverted region is flanked by sequence spans encoding Cytochrome P450 Cyp2 and F-box domain encoding genes. This rearrangement appears as a gap in synteny from BAC coordinates 120–140 kb in both the B. rapa and A. thaliana plots. At around coordinate 160 kb, near the very end of the BAC, sequence similarity is observed between the BAC and the Cytochrome P450 Cyp2 genes, but in reverse orientation, as expected in the case of an inversion. Key features of the differences between the BAC, B. rapa and B. napus cv. “Darfur” genomes are illustrated in Fig. 3.
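To illustrate why an inversion shows up in a dot matrix as similarity in reverse orientation, the following minimal sketch computes exact k-mer matches between two sequences on both strands. It is not the software used to produce Fig. 1; the function names, the word size and the toy sequences are all hypothetical and serve only to show the principle.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def kmer_matches(query, target, k):
    """Return (query_pos, target_pos) pairs whose k-mers are identical."""
    index = {}
    for j in range(len(target) - k + 1):
        index.setdefault(target[j:j + k], []).append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits

def dot_matrix(query, target, k=4):
    """Return forward-strand and reverse-strand dots for a dot plot."""
    forward = kmer_matches(query, target, k)
    rc_target = reverse_complement(target)
    # Convert positions on the reverse-complemented target back to
    # forward-strand start coordinates.
    reverse = [(i, len(target) - k - j)
               for i, j in kmer_matches(query, rc_target, k)]
    return forward, reverse

# Toy example (hypothetical sequences): the middle block is inverted in the query.
block_a, block_b, block_c = "ACGTTGCAAG", "CTTAGGACCA", "TGCCATAGTC"
target = block_a + block_b + block_c
query = block_a + reverse_complement(block_b) + block_c
forward, reverse = dot_matrix(query, target, k=4)
print(len(forward), "forward dots;", len(reverse), "reverse-strand dots")
```

In the toy example the flanking blocks produce dots along the main diagonal, whereas the inverted block produces reverse-strand dots running along an anti-diagonal, the pattern described above for the Cytochrome P450 Cyp2 genes near the end of the BAC.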
Rf-like PPR genes in the B. napus Rf locus
Another striking feature of the dot matrix comparisons was the presence of sequences located near BAC coordinates 34, 38, 52 and 118 kb (illustrated by double headed arrows in Figs. 1a and 1b) that mapped to four corresponding sites in B. rapa chromosome A09. In the Arabidopsis genome, only three of the four sites were located at corresponding positions; no Brassica sequence was found to correspond to the site located near the 4.295 Mb coordinate on Arabidopsis chromosome 1. Inspection of the Brassica repeat sequences indicated that they all corresponded to regions encoding highly similar RFL PPR genes predicted to be targeted to the mitochondria, which we designated as PPR1-4.
Because the genetically defined limits of the Rfn region extended beyond the boundaries of the BAC, we searched the corresponding region of B. rapa chromosome A09 for additional RFL PPRs that could serve as candidates for Rfn. We were able to identify two more such genes: one, Brara.I05036, located between coordinates 41.777 and 41.776 Mb, and the other, Brara.I05115, between coordinates 42.211 and 42.213 Mb. Orthologous PPR genes at corresponding locations were found to be present on B. napus “Darfur” chromosome BnA09 and are designated Bn036 (coordinates 31.334–31.341 Mb) and Bn115 (31.797–31.806 Mb). Both of these genes encode products predicted to be targeted to the mitochondrion. Notably, Bn036 is located roughly 100 kb from the flanking marker 4.4BB but 300 kb from the RFL genes in the BAC. No genes encoding PPR proteins other than RFL genes were detected in the region. A preliminary phylogenetic analysis (Additional file 4: Figure S1) indicated that five of the six genes clustered with three A. thaliana RFL genes located on the long arm of chromosome 1, AtRFL2 (At1g12300), AtRFL3 (At1g12620) and At1g12775 (re-annotated At1g12770 or AtRFL25). The sixth gene, PPR2, clustered within a neighboring branch with its ortholog AtRFL4 (At1g12700). PPR3 and AtRFL25, like PPR2 and AtRFL4, are located at matching positions in their corresponding chromosomes.
A comparison of the relative positions of the genes in the A. thaliana and B. rapa/B. napus A genomes is presented in Fig. 4. The phylogenetic analysis indicated that all of the Brassica genes represented Rf-like PPRs, as did three of the four Arabidopsis genes; a non-RFL A. thaliana PPR gene, At1g13030, is found at a site close to but not precisely matching the Brassica RFL PPR4 gene at coordinate 118 kb in the BAC. One of the genes, At1g12700 (AtRFL4, [41]) located at a matching position in both genomes, is known to encode a mtRNA processing factor, RPF1, which confers nuclease cleavage events on nad4 transcripts, which are also a target of the Brassica Rfn gene. These observations revealed six RFL B. napus PPR genes that could serve as candidates for Rfn. One of the genes, PPR4, has been previously proposed as a candidate for Rfp on the basis of fine mapping data [42]. None of the Brassica PPR genes were predicted to contain an intron, an observation confirmed by RT-PCR analysis of floral transcripts (see below).
Expression of the Rf-region RFL genes in nap CMS and fertility restored plants
The observation that rf-PPR592, the non-restoring allele of the petunia restorer, Rf-PPR592, is not expressed but is otherwise similar to the restorer suggested that expression differences among different candidate PPR genes could be used as a tool to prioritize candidates for further analysis of restoration function. We used RT-PCR to examine the expression of the six candidate RFL genes located within the Brassica Rf-locus. As shown in Fig. 5, we did not detect expression of Bn115 in floral buds of either CMS or nuclear restored plants. Of the remaining five B. napus Rf-region RFL genes, expression of one, PPR4, was detected in the buds of nuclear fertility restored but not CMS plants; PPR4 is thus seen as a strong candidate for Rfn. PPR1 and PPR3 both also show higher levels of expression in restored than in CMS flowers.
The sequences of the RT-PCR products, as shown in Additional file 5: Figure S2, provided further information relevant to the structures of the genes and their possible roles in nuclear fertility restoration. The sequences of the transcripts were co-linear with the corresponding genomic DNA sequence, indicating that, as expected, these genes lacked introns. Interestingly, a termination codon was found at nucleotide position 1119 in the coding sequence of the RT-PCR products of PPR2 from both the CMS and nuclear restored lines. This termination codon was not detected in genomic sequences of either B. rapa or the BAC NO202E11, and resulted from a one nucleotide insertion in the CMS and restorer sequences prior to the termination codon, followed by a two nucleotide insertion in the same sequences.
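The effect described for PPR2, in which a one nucleotide insertion brings a termination codon into frame and a later two nucleotide insertion restores the downstream reading frame, can be illustrated with a toy example. The sequences and insertion positions below are hypothetical and bear no relation to the actual PPR2 sequence; the sketch only shows how such a pair of insertions behaves.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_in_frame_stop(cds):
    """Return the 0-based nucleotide position of the first in-frame stop codon."""
    for pos in range(0, len(cds) - 2, 3):
        if cds[pos:pos + 3] in STOP_CODONS:
            return pos
    return None

def insert_bases(seq, pos, bases):
    """Return seq with bases inserted before position pos."""
    return seq[:pos] + bases + seq[pos:]

# Hypothetical reference reading frame: ATG GCT GAC CGT AAA TAA
reference = "ATGGCTGACCGTAAATAA"
# A single inserted base shifts the frame so that an out-of-frame TGA
# becomes an in-frame stop: ATG CGC TGA ...
one_insertion = insert_bases(reference, 3, "C")
# A further two-base insertion downstream of the new stop makes the total
# insertion three bases, so the original frame resumes after it.
both_insertions = insert_bases(one_insertion, 9, "GG")

print(first_in_frame_stop(reference))        # stop at the natural end of the frame
print(first_in_frame_stop(one_insertion))    # premature stop after the frameshift
print(both_insertions[12:] == reference[9:]) # downstream codons back in frame
```

In this toy case the premature stop lies between the two insertions, which matches the order of events described above for the PPR2 transcripts.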
Expansion of a family of Rf-like PPR genes within Brassica genomes
The B. napus genome is derived from a recent interspecific hybridization event between the C genome species B. oleracea and the A genome species B. rapa, two species that diverged approximately 3.7 Mya. Because RFL PPR genes are known to be variable in chromosomal position between closely related genomes [22, 23], it was of interest to determine the position of the Rfn candidates and close paralogs in the B. oleracea C genome. To accomplish this, we first identified close homologs of the different B. napus Rf region PPR genes in the B. oleracea genome using the blastn resource of the Brassica database (BRAD, [43]). We found a cluster of highly similar sequences between chromosome 8 coordinates 38.54 and 38.75 Mb, a region over which synteny was maintained with the corresponding regions of A. thaliana and B. rapa. The annotation of the region indicated that the homologous sequences corresponded to 10 highly related RFL-PPR genes. Eight of these genes lacked predicted introns, whereas two, Bol31351 and Bol31388, were each predicted to contain a single intron.
The relative positions of PPR genes in the A. thaliana, B. rapa/napus (A) and B. oleracea (C) genomes in the region over which synteny is conserved among the three genomes are illustrated in Fig. 4. Seven of the 10 C genome B. oleracea PPR genes are located at positions corresponding to their locations in the A genome. In three cases, involving Bol31370/Bol31371, Bol31387/Bol31388 and Bol313407/Bol313408, a tandem pair of B. oleracea PPR genes is found at a site occupied by only a single PPR gene in the A genome. In one case a single B. oleracea gene, Bol31384, was found at a site containing two adjacent PPRs in the A genome. These observations indicated that the relative locations of the majority of Rf region PPR genes have not changed since the A/C genome divergence.
Positional variation of RFL genes in the Rf-orthologous regions of two Arabidopsis genomes
Because we observed some conservation of location between some A. thaliana RFL genes and Brassica RFL genes in the Rf-region, it was of interest to determine to what extent this positional conservation could also be observed between the A. thaliana and A. lyrata genomes, which diverged from each other between 4 and 5 Mya. Fujii et al. [22] observed that the RFL genes in the A. thaliana region orthologous to the Brassica Rf-region (Fig. 1a) fell into RFL subgroup 1, most of which are located between the 4.18 and 4.33 Mb coordinates of chromosome 1. A. lyrata subgroup 1 genes similarly cluster within a 221 kb segment of the scaffold 1 genome assembly unit.
Dot matrix visualization of the similarities between these two Arabidopsis genomic regions revealed several interesting features of the positional relationships among these genes (Fig. 6). At the position corresponding to AtRFL2 (At1g12300), two highly similar genes (indicated by the double-headed arrows in Fig. 6a), designated in Fig. 6b as AlyRFL1 and AlyRFL2, are found in the A. lyrata genome. The 5’ and 3’ non-coding regions surrounding these genes are similar to one another, indicating that the two lyrata genes arose from a tandem duplication of a region encoding an AtRFL2-like gene. At the position corresponding to AtRFL3 (At1g12620), a tandem triplication of a similar gene was found at two different positions in the A. lyrata genome. This arrangement arose from the duplication of an approximately 14 kb region spanning the triplication and extending in the direction matching the centromere proximal side of A. thaliana chromosome 1 (Fig. 6a). Thus six RFL genes, which we designate as AlyRFL3-9, are found within a duplication spanning the region around AtRFL3. Another tandem duplication with sequence similarity to the AlyRFL1/AlyRFL2 pair is found at a site corresponding to AtRFL4 (At1g12700). These observations suggest that segmental duplication is the primary mechanism behind the proliferation of this family of RFL genes in this region of the A. lyrata genome.
Phylogenetic relationships among Brassica and Arabidopsis RFL proteins
We constructed a maximum-likelihood phylogeny to examine how the positional relationships among the various RFL genes reflected the sequence relatedness of the encoded proteins. As shown in Fig. 7, all of the RFL proteins encoded in the Rf-syntenic regions of the different genomes formed a single monophyletic cluster encompassing the Arabidopsis subgroup 1 RFL proteins [22], and excluding radish Rfo/PPRA, their closest Arabidopsis homolog, AtRFL18, as well as the petunia Rf-PPR592 restorer protein and its non-restoring homolog (bootstrap support values of 0.90 and 1.00, respectively). Most of the Brassica proteins fell into a distinct cluster most closely related to the Arabidopsis branch containing AtRFL1-3. The exceptions were the proteins encoded by genes located at the same position as AtRFL4 (At1g12700), namely PPR2 and Brara.I05097; these formed a distinct phylogenetic cluster (bootstrap support 1.00), suggesting that these Arabidopsis and Brassica proteins have descended from a common ancestor located at the same position in the ancestral genome.
Within the major Brassica clade, proteins in a common genomic location generally clustered together in the tree. The major exception concerned the B. oleracea genes Bol03187 and Bol03188. These proteins formed a cluster distinct from that of their positional counterparts, the B. napus PPR3 proteins, which clustered with PPR4 and its positional B. oleracea counterpart, Bol031408. An interesting situation is observed among the proteins encoded by Brara.I05115 and the genes at the corresponding positions in the B. oleracea and B. napus genomes, B. oleracea Bol03170 and Bol03171 and B. napus Bn115. Although the Rf-region is derived from a B. rapa (A genome) ancestor, the B. napus and B. rapa 115 proteins each group with a different B. oleracea protein. Conceivably, orthologs of both genes were present in the common ancestor of the sequenced varieties of B. napus and B. oleracea, and different orthologs were lost during the subsequent evolution of the two A genome forms.
Our manual annotation of the portion of the A. lyrata genome enriched in subgroup 1 RFL genes [22] led to the identification of 10 distinct genes, AlyRFL1-10, one of which (AlyRFL10) had too few PPR domains to merit inclusion in the phylogenetic tree. The two A. lyrata genes located at the position of AtRFL2/At1g12300 formed a distinct clade most closely related to the group of A. thaliana RFL genes encompassing AtRFL2 as well as the closely related genes AtRFL3/At1g12620 and At1g12775. The position of these two lyrata genes in the tree is consistent with a model in which they arose through a tandem duplication of an AtRFL2-like gene in an ancestral Arabidopsis genome, as proposed above. Similarly, AlyRFL3, AlyRFL4 and AlyRFL5 formed a monophyletic group with AlyRFL6 and AlyRFL8, as would be predicted if a gene triplication had been followed by duplication of the three gene set. The exception to this model concerns AlyRFL7, which was predicted to cluster with AlyRFL6 and AlyRFL8 but instead groups with AlyRFL9. Interestingly, although the coding sequence of AlyRFL7 is more closely related to AlyRFL9 than to AlyRFL6/8, the regions upstream and downstream of the gene are more similar to the corresponding sequences in the AlyRFL3-5 region. Conceivably, AlyRFL7 underwent a gene conversion event involving an AlyRFL9-like sequence following duplication of the three gene region.