Comparative sequence analysis of Solanum and Arabidopsis in a hot spot for pathogen resistance on potato chromosome V reveals a patchwork of conserved and rapidly evolving genome segments

Background Quantitative phenotypic variation of agronomic characters in crop plants is controlled by environmental and genetic factors (quantitative trait loci = QTL). To understand the molecular basis of such QTL, the identification of the underlying genes is of primary interest and DNA sequence analysis of the genomic regions harboring QTL is a prerequisite for that. QTL mapping in potato (Solanum tuberosum) has identified a region on chromosome V tagged by DNA markers GP21 and GP179, which contains a number of important QTL, among others QTL for resistance to late blight caused by the oomycete Phytophthora infestans and to root cyst nematodes. Results To obtain genomic sequence for the targeted region on chromosome V, two local BAC (bacterial artificial chromosome) contigs were constructed and sequenced, which corresponded to parts of the homologous chromosomes of the diploid, heterozygous genotype P6/210. Two contiguous sequences of 417,445 and 202,781 base pairs were assembled and annotated. Gene-by-gene co-linearity was disrupted by non-allelic insertions of retrotransposon elements, stretches of diverged intergenic sequences, differences in gene content and gene order. The latter was caused by inversion of a 70 kbp genomic fragment. These features were also found in comparison to orthologous sequence contigs from three homeologous chromosomes of Solanum demissum, a wild tuber bearing species. Functional annotation of the sequence identified 48 putative open reading frames (ORF) in one contig and 22 in the other, with an average of one ORF every 9 kbp. Ten ORFs were classified as resistance-gene-like, 11 as F-box-containing genes, 13 as transposable elements and three as transcription factors. Comparing potato to Arabidopsis thaliana annotated proteins revealed five micro-syntenic blocks of three to seven ORFs with A. thaliana chromosomes 1, 3 and 5. 
Conclusion Comparative sequence analysis revealed highly conserved collinear regions that flank regions showing high variability and tandem duplicated genes. Sequence annotation revealed that the majority of the ORFs were members of multiple gene families. Comparing potato to Arabidopsis thaliana annotated proteins suggested fragmented structural conservation between these distantly related plant species.


Background
The potato (Solanum tuberosum) is the most important crop of the Solanaceae. It is a tetraploid, non-inbred, annual plant species that is vegetatively propagated by tubers. Polyploidy and inbreeding depression prevent the generation of homozygous lines. When the ploidy level is reduced from 4n to 2n, the diploid potatoes are self-incompatible. Potato genotypes at all ploidy levels are therefore heterozygous [1]. The basic chromosome number of potato is twelve and its genome size is on the order of 800 to 1000 megabases, similar to the closely related tomato (Solanum lycopersicum). Detailed RFLP (restriction fragment length polymorphism) linkage maps have been constructed for the twelve chromosomes [2][3][4][5][63], which were subsequently used to locate in the potato genome factors controlling monogenic and polygenic traits of agronomic relevance such as resistance to pests and pathogens or tuber quality (e.g. starch and sugar content) (reviewed in [1,6]). When the same locus-specific DNA-based markers are used in different mapping populations, the positional information of the mapped factors controlling qualitative and quantitative traits can be compared and integrated. This comparison showed that a number of the factors which control qualitative (R genes) or quantitative resistance (QRL = quantitative resistance loci) to different types of pathogens map to similar positions. These chromosomal regions are so-called hot-spots for pathogen resistance. One of the most conspicuous resistance hot-spots in the potato genome is located on potato chromosome V, in a chromosome segment tagged by the DNA-based markers GP21 and GP179. The 3 cM interval between GP21 and GP179 [7] includes the R genes Rx2 and Nb, both for resistance to Potato Virus X [8,9], and the R1 gene for race-specific resistance to the oomycete Phytophthora infestans causing late blight [10,7]. The same markers are also linked to QRL for P. infestans [11][12][13][14] and QRL for the root cyst nematodes Globodera rostochiensis and G. pallida [15][16][17][18]. As shown by QTL mapping [12][13][14][19,20], this region on potato chromosome V not only contains genes for resistance to various pathogens but also genes controlling plant vigor, plant maturity (the time the plant needs from planting to reach maturity under long day conditions), tuber yield, tuber starch and tuber sugar content.
Two R genes from the chromosome V resistance hot-spot have been functionally characterized, Rx2 for extreme resistance to Potato Virus X [21] and R1 for resistance to P. infestans [22]. Both R genes are members of the superfamily of plant resistance genes characterized by a coiled coil (CC), a nucleotide binding (NB) and a leucine rich repeat (LRR) domain [23], but otherwise share low sequence similarity. R1 has been introgressed from the allo-hexaploid wild potato species Solanum demissum into the cultivated potato germplasm pool [1,25] and is one member of a clustered gene family in the GP21-GP179 interval [22,25]. The molecular basis of the QRL for late blight and root cyst nematodes in the same region is unknown. One possibility is that alleles of the R1 and/or the Rx2 gene, or other members of the R1 gene family and/or another resistance-gene-like (RGL) family in this genome region, encode the factors for the quantitative resistance phenotypes. This would parallel classical plant genetic studies, where resistance loci with multiple specificities to different races of a pathogen may be alleles of the same gene, or tightly linked genes [11,26]. However, the resolution of QTL mapping in this region of the potato genome, even when based on large populations related by descent [27], is low. Genes physically linked to the R1 family, but structurally and functionally unrelated to R1 or other RGLs, cannot be excluded as candidates for the QRL of interest. High resolution mapping and map-based cloning of QTL based on recombinant inbred populations or near isogenic lines [28] is not feasible in potato due to self-incompatibility.
As an alternative approach, genomic sequencing and annotation can provide information on all putative genes present in the whole region, which then might be further examined for function as quantitative trait loci, first in silico by functional annotation and then experimentally by analysis of natural allelic diversity of positional candidates and complementation analysis using candidate gene allelic variants [28]. Powerful bioinformatic tools for gene annotation [29] and functional analysis of sequence related genes in model plants such as Arabidopsis thaliana and rice can facilitate the selection of functional candidate genes among all the positional candidates in a genome segment harboring QTL.
Parts of the genomic region corresponding to the GP21-GP179 interval in S. demissum were sequenced and three different haplotypes A, B and C equivalent to the three homeologous chromosome pairs of S. demissum were identified [25]. They demonstrated substantial structural variation among the haplotype sequences. In this paper, we report the genomic sequence analysis of two orthologous chromosome segments of the heterozygous, diploid Solanum tuberosum genotype P6/210 in the same GP21-GP179 interval. Two independent bacterial artificial chromosome (BAC) contigs, corresponding to the homologous chromosomes of P6/210 were constructed, sequenced and annotated, thereby extending the genomic sequence information available in this functionally important region of the potato genome. Comparative sequence analysis revealed regions with severe structural distortions and deviations from gene by gene co-linearity, which are flanked by conserved regions showing microsynteny with the A. thaliana genome.

Contig construction and BAC sequencing
The potato clone P6/210 used to construct the BAC libraries was heterozygous for R1 (R1/r1). Physical mapping in the context of cloning the R1 resistance gene had identified overlapping BAC insertions that originated from the homologous chromosomes carrying either the R1 resistance allele or an r1 allele for susceptibility [22]. In order to obtain two contigs, one for each homologous chromosome in the R1 genomic region, the physical map was extended. Subsequently, we refer to the two contigs as the R1-contig and the r1-contig.
The R1-contig was extended proximally, starting from BA27c1, by clone BC93c12 (Figure 1). This BAC was identified by PCR screening of the BC library using specific primers that were developed based on the sequence of the proximal BAC end of BA27c1. An additional 34 BACs from this region, all containing one or more members of the R1 gene family, were identified by colony filter hybridization using as probe a 1.4 kb DNA fragment amplified with R1 gene specific oligonucleotides 76-2sf2 and 76-2SR [22]. Among the clones containing R1 homologous genes were BA132h9 and BA213c14. Clone BA132h9 overlaps with BC93c12, and BA213c14 overlaps with BA87d17 (Figure 1). The gap between BA132h9 and BA87d17 was bridged by clone BC136j5 (Figure 1). Of the seven BAC clones that constitute a minimal tiling path including the R1 locus, six BAC inserts were fully sequenced and one (BC93c12) was partially sequenced (Table 1). The r1-contig consisted of clones BA122p13 and BA76o11 and was extended proximally by clone BA111o5 (Figure 1). All three clones were fully sequenced (Table 1).

Genomic sequence analysis
The 620,226 bp of genomic sequence obtained from seven R1- and three r1-BAC insertions was assembled into two distinct, unambiguous stretches of DNA sequence corresponding to the R1- and r1-contig of 417,445 base pairs [GenBank:EF514212] and 202,781 base pairs [GenBank:EF514213], respectively, using MegaMerger. Overall GC content in the R1- and r1-contig is 33.26% and 34.11%, respectively.
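The totals above can be checked with a few lines of Python. This is a minimal sketch, not the pipeline used for the assembly: the function name `gc_content` and the toy sequence are illustrative assumptions, and only the two contig lengths are taken from the text.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence.

    Ambiguous bases (e.g. N) are ignored; only A, C, G and T are counted.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


# Contig lengths in base pairs, as reported in the text.
CONTIG_LENGTHS = {"R1": 417_445, "r1": 202_781}

total_bp = sum(CONTIG_LENGTHS.values())
print(total_bp)                    # 620226
print(gc_content("ATGCGCATTA"))    # 40.0 (toy sequence, not from the contigs)
```

Applied to the two GenBank records, a function of this kind would be expected to give values close to the reported 33.26% and 34.11%, depending on how ambiguous bases are treated.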
The two contig sequences share highly similar and collinear regions, but these are interrupted by more variable regions (Figure 2). In addition, an inversion and an inverted repeat were identified. Tandem repeats corresponding to six copies of R1 homologous genes in the R1 contig and three copies in the r1 contig were identified. These repeats are embedded in a region of non-alignable sequence.
For easier reference, we label regions along the R1 contig from A to F based on features revealed by comparison of the R1 with the r1 contig using MUMmer (Figure 2). For region A, no sequence was obtained from r1. Region B is highly similar to the start of contig r1. In region C, co-linearity and alignment are disturbed, and similarity is primarily detected in three tandem repeats. No similarity to r1 is detected in region D. Region E (69,850 bp) again aligns well with r1, but in reverse orientation, indicating a genomic inversion. Region F, resembling region C, is not aligned well but contains two tandem repeats that are highly similar to the tandem repeats in region C, but in inverse orientation. Region B contains a palindromic structure discussed in more detail below.
Physical map and position of the R1- and r1-contig in the genetic interval GP21-GP179 on potato chromosome V and their annotated gene content

Gene content
We predict that the R1 contig contains 48 genes, seven of which are transposons and three are pseudogenes. Thirteen protein coding genes on the r1 contig can be identified as orthologs of collinear genes on R1. However, on R1 there are two additional protein coding genes. Six transposon genes are specific to r1, and for one pseudogene, we could not identify a partner on R1. On average, one ORF is annotated in every 9 kbp of genomic sequence. The average GC content of exons is 39%.
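The "one ORF every 9 kbp" figure follows directly from the contig lengths and ORF counts. The short calculation below is a sketch under the assumption that the counts are the 48 and 22 putative ORFs stated in the abstract and conclusion; the helper name `kbp_per_orf` is illustrative.

```python
# (length in bp, number of annotated ORFs) per contig; ORF counts as
# stated in the abstract and conclusion of this paper.
CONTIGS = {"R1": (417_445, 48), "r1": (202_781, 22)}


def kbp_per_orf(length_bp: int, n_orfs: int) -> float:
    """Average spacing between ORFs in kbp."""
    return length_bp / n_orfs / 1000.0


for name, (length_bp, n_orfs) in CONTIGS.items():
    print(name, round(kbp_per_orf(length_bp, n_orfs), 2))
# R1 8.7
# r1 9.22

total_bp = sum(length for length, _ in CONTIGS.values())
total_orfs = sum(n for _, n in CONTIGS.values())
print(round(kbp_per_orf(total_bp, total_orfs), 2))  # 8.86
```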
A putative function was assigned based on similarity to proteins of known function with an E-value smaller than 10^-10 (Table 2, details in Additional file 5). Thirteen genes are related to retrotransposons. Two multigene families, F-box proteins and NBS (NBS-LRR) type resistance genes, are represented by several members. Six proteins on contig R1 and three on r1 are members of the R1 gene family. At least eleven F-box proteins are found on R1, but none of these are found in the r1-contig.
Where sequence from both contigs was available, most proteins and some transposon-related genes are conserved between the R1 and r1 contig (Figure 1 and Additional file 5). Notable exceptions are 13 retroelements, six of which were found only in the r1 contig (ORF 14/1, ORF 19/1, ORF 21, ORF 49, ORF 51 and ORF 53) and seven only in the R1 contig (ORF 7, ORF 9, ORF 11, ORF 12, ORF 25, ORF 28 and ORF 40). The palindromic structure in region B is formed by an inverted repeat of two highly similar RNA-directed RNA polymerases (ORFs 14 and 16), separated by a hypothetical protein (ORF 15). The r1 specific retrotransposon 14/1 is inserted between the inverted repeats. Region E contains five genes that are conserved between R1 and r1 but in reverse order and orientation, indicating a genomic inversion. The proximal two genes are members of the R1 family, ORF 44 being the functional R1 resistance gene (accession AF447489) and the following ORF 45 a tandem duplicate. ORFs 44 and 45 are conserved in sequence, but in inverse order and orientation, relative to resistance gene homologues 52 and 54 on contig r1, suggesting that the inversion includes these two genes. However, proteins 44 and 45 are more similar to each other than to 52 or 54, indicating that they may have arisen through tandem duplication after the inversion event. Another resistance gene homologue (ORF 46) follows on contig R1, but is less similar to the other R1-related genes. In addition, two genes (ORFs 47 and 48) could not be related to any locus on contig r1. This suggests that the proximal breakpoint of the genomic inversion in contig R1 is after gene 44 or 45.
Dot-plot comparison, done using MUMmer, of the DNA sequences of the R1- and r1-contig
To map the distal breakpoint of the inversion event on contig R1, we used sequence from S. demissum BAC PGEC093P17 (accession AC149290), as the sequence of contig r1 did not extend sufficiently far in proximal direction. The PGEC093P17 sequence contains orthologs of proteins 43, 42, 41, 39 and 38 in the same order and orientation as found on contig r1, therefore inverted with respect to R1. As in r1, the R1 specific transposon ORF 40 is not found in PGEC093P17 (Fig. 1). Proximal to protein 38 followed probable orthologs of proteins 37, 36 and 35 in reverse order and orientation compared to contig R1, indicating that these three genes are part of the inversion. The remaining PGEC093P17 sequence proximal to ORF 35 contains only transposon fragments and hypothetical proteins, which are unrelated to genes distal to ORF 35 (mainly F-box genes, Table 2) in the R1-contig. This maps the distal inversion breakpoint in contig R1 between genes 34 and 35. As a consequence of the inversion, genes 23 to 34 in the R1 contig are probably part of a region that is, at least at this genomic location, R1 specific. This region includes two R1 homologous genes (ORF 23 and 24) in tandem orientation to ORF 22, two transposon-related genes and a series of eight F-box genes. For the flanking regions of contig R1, we found almost perfect gene-by-gene co-linearity to r1 or PGEC093P17. We could not assign three genes in contig r1 to any collinear region, two of which are transposons and one resembles a fragmented resistance gene.
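The reasoning used here, that a run of orthologous genes in reversed order and flipped orientation marks an inversion, can be sketched in code. The example below is only an illustration of the idea and not the procedure used for the annotation: the function `inverted_runs`, the tuple layout and the strand values are hypothetical, and only the gene order 38, 39, 41, 42, 43 versus 43, 42, 41, 39, 38 is taken from the text.

```python
from typing import List, Tuple

# (gene name, rank in contig A, strand in A, rank in contig B, strand in B)
Pair = Tuple[str, int, int, int, int]


def inverted_runs(pairs: List[Pair], min_len: int = 3) -> List[List[str]]:
    """Find runs of orthologs that are reversed in order AND orientation.

    Walking along contig A, a run is extended while each gene has the
    opposite strand in B and a strictly smaller rank in B than the
    previous gene of the run.
    """
    runs: List[List[str]] = []
    current: List[Pair] = []
    for p in sorted(pairs, key=lambda q: q[1]):
        flipped = p[2] != p[4]
        if flipped and (not current or p[3] < current[-1][3]):
            current.append(p)
            continue
        if len(current) >= min_len:
            runs.append([q[0] for q in current])
        current = [p] if flipped else []
    if len(current) >= min_len:
        runs.append([q[0] for q in current])
    return runs


# Toy data: strands are invented for illustration only.
toy = [
    ("ORF17", 1, +1, 1, +1),
    ("ORF18", 2, +1, 2, +1),
    ("ORF19", 3, -1, 3, -1),
    ("ORF38", 4, +1, 9, -1),
    ("ORF39", 5, +1, 8, -1),
    ("ORF41", 6, -1, 7, +1),
    ("ORF42", 7, +1, 6, -1),
    ("ORF43", 8, -1, 5, +1),
]
print(inverted_runs(toy))
# [['ORF38', 'ORF39', 'ORF41', 'ORF42', 'ORF43']]
```

Breakpoints are then bracketed by the last collinear gene before a run and the first gene of the run, which is the gene-level resolution described in the text.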

Microsynteny with the A. thaliana genome
Five microsyntenic blocks were identified in A. thaliana based on the arbitrary criterion of finding at least three pairs of homologous protein sequences within a comparable physical distance (Figure 1). The largest syntenic region identified in both species is block III.

Genome structure
We found extensive co-linearity of protein-coding genes, interrupted by unilateral insertions of retrotransposons and a region of highly diverged DNA sequence in the vicinity of clusters of tandem duplicated genes. This corresponds to findings in the orthologous genomic region of hexaploid S. demissum [25]. We show that a 70 kb region containing ten protein-coding genes is inverted in the R1 contig and presumably also in the A haplotype of S. demissum. While sequence similarity at the nucleotide level was not sufficient to precisely map the inversion breakpoints in the intergenic regions, order and orientation of homologous gene pairs were consistent without exception and allowed us to estimate the position of the inversion breakpoints. This inversion is not evident in the available sequence of the S. demissum A haplotype, as BAC PGEC472P22 (accession AC151815) mainly contains genes from the inversion and only a few beyond the inversion breakpoint. The gap in the sequence of haplotype A presumably led to BAC PGEC472P22 being oriented to achieve co-linearity with the B and C haplotypes. Kuang et al. [25] do not present any data, e.g. mapping of BAC ends or overlaps with neighboring BAC clones, to verify the orientation. Thus, we take the high sequence similarity, uninterrupted across the presumed inversion breakpoint, between the R1 contig and PGEC472P22 to indicate that the inversion also exists in the A haplotype (Additional files 3 and 4).
Currently, there are only a few examples where comparative structural analysis of orthologous genome segments was performed in crop plants over several hundred kb. In all cases reported, micro-structural diversity was found, e.g. in Zea mays [30,31]. In maize the major contribution to diversity stems from LTR retrotransposons. In our analysis transposon-related genes are less conserved, also suggesting recent insertions and deletions. In tomato (S. lycopersicum), a segment on chromosome 6 containing the Mi-1 gene for resistance to root knot nematodes, which has been introgressed from S. peruvianum, was shown to be inverted in resistant when compared to susceptible tomato genotypes [32]. The situation at the potato R1 locus described here remarkably resembles this finding. The R1-contig is part of a genome fragment of unknown size introgressed from S. demissum into S. tuberosum, whereas the r1-contig originated either from S. tuberosum or S. spegazzinii, another closely related tuber bearing Solanum species. P40, the parental donor of the r1 allele, was an inter-specific hybrid between S. tuberosum and S. spegazzinii [33]. The structural differences between the homologous chromosomes could interfere with chromosome pairing and crossing-over during meiosis, explaining the low frequency of recombination observed in this region [22]. Similarly, in regions on tomato chromosomes 6 and 11, where the resistance loci Mi and Tm2a, respectively, have been introgressed from the wild species S. peruvianum, a high degree of recombination suppression was observed [32,34].
In the R1 contig, the inversion seems to have separated the R1 resistance gene from a tandem array of R1 homologs (proteins 22, 23 and 24). We attempted to date the inversion relative to the duplications of tandem resistance genes by phylogenetic analysis of the protein sequences, but the results were inconclusive (data not shown). Comparison of the S. demissum haplotypes with the R1- and r1-contig identified haplotypes A and B/C as most similar but not identical to the R1- and r1-contig, respectively. Similarity of individual homologous protein pairs between B and r1 or C and r1 ranges between 90 and 99% identity on the amino acid level, with most, but not all, proteins slightly more similar between C and r1 than between B and r1. On the other hand, the B haplotype is structurally more similar to r1, as C shows no tandem repeated R1 homologous genes but only one single copy. Remarkable is the discovery of highly conserved and seemingly rapidly evolving genome regions in close vicinity. The distinguishing feature, aside from the observed breakdown of nucleotide sequence similarity in non-coding regions and the lack of gene-by-gene co-linearity, is the presence of tandem repeated genes, namely R1 homologs and F-box containing genes. Such tandem arrays have been found in other hypervariable genome regions [35] and may have led to a greatly enhanced rate of evolution due to relaxed selection pressure on duplicated genes. Neighboring unique sequences are, in contrast, highly conserved. In the variable region, the R1 and r1 contigs and the three S. demissum haplotypes show striking structural variation, ranging from lack of the variable region in S. demissum haplotype C to the expanded set of F-box proteins found in the R1 contig. The latter are missing from r1 and B. Instead, in the B haplotype the R1 homologous gene tandem array is more expanded (Figure 3).

Encoded proteins
We found a gene density of one gene every 9 kb, which is similar to previous findings in S. demissum (7.6 kb, [24]) and tomato (8 kb, [36,37]), but lower than in A. thaliana (5 kb, [38]) and rice (6 kbp, [39]), and higher than in barley, where only three genes were found in a stretch of 60 kbp genomic DNA [40]. The overall GC content was 37%, and 39.5% within the putative gene coding regions. These values are comparable to tomato (37% overall GC content and 42% in coding regions, [36]) and A. thaliana (36% overall GC content and 44% in coding regions, [38]) but lower than in rice (44% overall GC content and 54% in coding regions, [39]) and maize (47% overall GC content and 55% in coding regions).
[...] (Figures 1 and 3). Allelic relationships between the R1 homologues could not be deduced with certainty, but proteins 22 and 22/1 might be allelic based on collinear positions in the R1- and r1-contig, whereas proteins 44 and 45 might be allelic to 54 and 52, respectively, as they are collinear under the assumption that they are part of the proposed genomic inversion.
Comparison between gene loci of the R1- and r1-contig as well as of the A, B and C haplotypes of S. demissum

The R1 gene family and disease resistance QTL
Most of the molecularly characterized plant R genes are members of tightly linked gene families [41][42]. Furthermore, an F-box domain was identified in the SGT1 protein, which was shown to play a role as co-chaperone in the stabilization of R-proteins [43,44]. With the annotation of the sequenced region, the list of positional candidate genes for the QTL is certainly not complete, as the sequence covers only part of the GP21-GP179 interval. To ultimately validate the role of any candidate gene for a QTL, complementation analysis with allelic variants is required. Unless high-throughput methods for complementation analysis become available, strategies to reduce the number of candidate genes to be considered for complementation analysis are necessary. For example, we perform functional testing by expression studies and by down-regulation of candidate gene expression by antisense or RNAi approaches [45]. The model plant A. thaliana may also be used to study the function of genes that are most closely sequence-related to potato positional candidate genes. Unfortunately, this approach may not be applicable to the F-box family, as this is also a highly expanded gene family in A. thaliana with diverse cellular roles.

Synteny with A. thaliana
We identified at least five microsyntenic relationships between the R1 contig and A. thaliana. These cover varying stretches of the genome, ranging from just four consecutive genes within 7 kb of A. thaliana and 25 kb of potato to basically the entire R1 contig, covering 405 kb. Frequent insertion-deletion events can be detected. Similar patterns of interrupted co-linearity on the DNA sequence level were found among cereals [46] and between A. thaliana and rice [47]. In the highly collinear tomato genome [2,5], genomic sequences of 57 kbp and 106 kbp on chromosome 2 [48] and 7 [37], respectively, have been compared to the A. thaliana genomic sequence. These studies revealed syntenic blocks of comparable redundancy and size with respect to the A. thaliana syntenic regions. In contrast to these previous studies, the contiguous potato sequence compared was 4 to 8-times longer. This revealed that the syntenic potato genes in the R1-contig were organized in three clusters (ORF 2 to 5, ORF 17 to 19 and ORF 38 to 48) that were separated by two non-syntenic regions (ORF 6 to 16 and ORF 20 to 37). In the r1-contig, a non-syntenic region (ORF 20 to 54) separated two syntenic regions (ORF 17 to 19 and ORF 38 to 43).
The most notable of the syntenic relationships spans almost the complete R1 contig (405 kbp) and 54 kbp of A. thaliana chromosome 1. Seven genes are conserved in sequence, order and orientation, except for two from region E that show reverse order and orientation compared to A. thaliana. This could indicate that the genomic inversion occurred in the R1 lineage after the divergence of A. thaliana and potato, with r1 and S. demissum B and C haplotypes showing the ancestral orientation. In the R1 contig, a large number (18) of genes do not show synteny, whereas in the A. thaliana region this only applies to five genes. The discrepancy is less pronounced if the 17 tandemly duplicated genes in potato are ignored.
The non-syntenic regions correspond to the highly divergent regions between R1 and r1 and included all but one (ORF 40) transposon sequences, all F-box-containing genes and six of the ten resistance-gene-homologues. The annotation of the A. thaliana syntenic regions identified, besides the sequence related ORFs, some transposon sequences but only one F-box-containing gene and no resistance gene homolog. Moreover, the non-syntenic regions in the R1-and r1-contigs coincided with regions II and IV in S. demissum, which showed the highest divergence between the homeologous chromosome segments A, B and C [25]. This suggests that the genome of potato and related species in the sequenced region consists of a patchwork of faster and more slowly evolving segments.
The sequenced potato genomic segment covers a genetic distance of only 0.1 centimorgan. At a hundred times larger scale, when genome-wide genetic maps of potato, sunflower, sugar beet and Prunus were compared to the A. thaliana physical map (macrosynteny), syntenic blocks from 1 to 20 centimorgans were identified. A common fraction of the genomes of these distantly related plant species appears to have been conserved throughout the evolution of the dicots, when compared to the rest of the genome [49]. The GP21-GP179 interval was not part of a macrosyntenic block between potato and A. thaliana [4].

Conclusion
Two contiguous sequences of 417,445 and 202,781 base pairs were assembled and annotated for a region on potato chromosome V, which contains genes controlling several agronomic traits. Comparative sequence analysis revealed highly conserved collinear regions that flank regions showing high variability and tandem duplicated genes. The co-linearity between the homologous chromosomes was disrupted by non-allelic insertions of retrotransposon elements, stretches of diverged intergenic sequences, differences in gene content and gene order. The latter was mainly caused by inversion of a 70 kbp genomic fragment.
Annotation of the genomic sequence identified 48 putative open reading frames (ORF) in one contig and 22 in the other, with an average of one ORF every 9 kbp. The majority of the ORFs were members of multiple gene families. Ten ORFs were classified as resistance-gene-like, 11 as F-box-containing genes, 13 as transposable elements and three as transcription factors. Comparing potato to Arabidopsis thaliana annotated proteins revealed five micro-syntenic blocks of three to seven ORFs with A. thaliana chromosomes 1, 3 and 5, suggesting fragmented structural conservation between these distantly related plant species.

BAC plasmid DNA isolation
A single colony was pre-cultured in 250 μl LB medium including 12.5 mg/l tetracycline for clones from the "BA" library and 12.5 mg/l chloramphenicol for clones from the "BC" library. The pre-culture was used to inoculate 50 ml LB medium containing the corresponding antibiotic. Plasmid DNA was isolated from 50 ml overnight culture using the QIAGEN Plasmid Midi Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions.

BAC-library screening
High-density colony filters of the libraries were prepared and screened by colony-hybridization as described [22]. The 'BC' library was also screened by the polymerase chain reaction (PCR) after isolating plasmid DNA from bacterial cells pooled at three levels: mini-pools, maxi-pools and super-pools. A total of 1,056 mini-pools were made from 96 clones each, 264 maxi-pools were prepared from 384 clones each (four mini-pools) and the 88 super-pools consisted of 1,152 clones each (three maxi-pools). To identify a single positive clone, four rounds of PCR screening were performed. First, the DNA of the 88 super-pools was used as template. Second, the three maxi-pools constituting a positive super-pool were screened. Third, the four mini-pools of a positive maxi-pool were amplified and last, the 96 clones of the positive mini-pool were screened individually.
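The pooling hierarchy can be written down as simple integer arithmetic. The sketch below assumes, purely for illustration, that clones are assigned to pools in consecutive index order; the function `pool_address` and this numbering scheme are hypothetical, while the pool sizes and counts are those given above.

```python
CLONES_PER_MINI = 96     # clones per mini-pool
MINIS_PER_MAXI = 4       # mini-pools per maxi-pool (384 clones)
MAXIS_PER_SUPER = 3      # maxi-pools per super-pool (1,152 clones)
N_MINI = 1056            # number of mini-pools


def pool_address(clone_index: int):
    """Map a 0-based clone index to (super-pool, maxi-pool, mini-pool, well).

    Assumes consecutive assignment of clones to pools (an illustrative
    convention, not stated in the text).
    """
    mini, well = divmod(clone_index, CLONES_PER_MINI)
    maxi = mini // MINIS_PER_MAXI
    super_pool = maxi // MAXIS_PER_SUPER
    return super_pool, maxi, mini, well


n_maxi = N_MINI // MINIS_PER_MAXI          # 264
n_super = n_maxi // MAXIS_PER_SUPER        # 88
total_clones = N_MINI * CLONES_PER_MINI    # 101376

# PCR reactions needed to isolate one positive clone, assuming exactly
# one positive pool at every level: 88 + 3 + 4 + 96.
reactions = n_super + MAXIS_PER_SUPER + MINIS_PER_MAXI + CLONES_PER_MINI

print(n_maxi, n_super, total_clones, reactions)  # 264 88 101376 191
print(pool_address(total_clones - 1))            # (87, 263, 1055, 95)
```

Under these assumptions, four rounds of screening need 191 reactions per target instead of one reaction for each of the 101,376 clones.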

Physical mapping
Overlapping BACs were identified and ordered in two contigs corresponding to the two homologous chromosomes of genotype P6/210 as described previously [22]. In short, BAC insertion ends were sequenced using T3 and T7 oligonucleotides as sequencing primers. The end sequences were used to detect overlaps, either based on 100% sequence identity with already sequenced BACs or by generating amplicons with identical sequences in different BACs. BAC contigs were assigned to either one or the other homologous chromosome by identifying DNA polymorphisms in insertion end sequences that were specific either for the allele inherited from parent P41 or parent P40.
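The first overlap criterion, 100% identity between a BAC end sequence and an already sequenced BAC, amounts to an exact substring search on both strands. The following sketch illustrates that check only; the function names and the toy sequences are assumptions, and real end reads would be several hundred bases long.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_matches(end_seq: str, bac_seq: str) -> bool:
    """True if the end read occurs with 100% identity on either strand."""
    end_seq = end_seq.upper()
    bac_seq = bac_seq.upper()
    return end_seq in bac_seq or reverse_complement(end_seq) in bac_seq


# Toy example: the end read matches the reverse strand of the BAC.
print(end_matches("ATGC", "TTGCATAA"))  # True
print(end_matches("AAAA", "CCCC"))      # False
```

Exact matching is deliberately strict: because the two homologous chromosomes differ by polymorphisms, a mismatch in an end read can also indicate that two clones derive from different homologs rather than that they do not overlap.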

Genomic DNA Sequencing and Assembly
Whole BAC clones were sequenced by the shotgun sequencing strategy. Custom sub-libraries of the BACs were prepared by GATC-Biotech AG (Konstanz, Germany). After physical fractionation of the BAC DNA, the randomly sheared fragments were blunt-ended by using T4 DNA polymerase and then ligated into the pCR4Blunt-TOPO vector (Invitrogen, California, USA). Approximately 1300 clones containing ca. 1.5 kbp insertions and 300 to 400 clones with 4 to 5 kbp insertions were produced for each BAC. The smaller inserts were amplified by colony-PCR using as primers TO2f (5'-agcggataacaatttcacacagga-3') and TO2r (5'-gacgttgtaaaacgacggccagtg-3').
The PCR was performed in a volume of 100 μl containing 10 pmol of each primer, 0.2 mM dNTPs and 1.5 mM MgCl2, 0. [...] was used for sequence assemblies, comparisons and alignments. The sequences of overlapping BAC insertions were assembled after trimming any vector sequences using the SeqMan module of Lasergene and the MegaMerger program, which is part of the EMBOSS package [52].

Sequence analysis
Dotter and MUMmer [54] were used to align and compare genomic sequences. Putative exons and open reading frames (ORFs) were predicted by the programs GeneMark.hmm [54] and FGeneSH [55] and by alignment of EST (Expressed Sequence Tags) and protein sequences using GenomeThreader [56]. Genes were annotated by combining predicted ORFs from these gene finder programs with alignments of homologous sequences in public databases using the Apollo Genome Annotation Curation Tool [57] (Additional files 1 and 2). All genes were also manually annotated for putative function. Functional descriptions of homologous genes in the SWISSPROT database [58] were compared with homologous protein domains and patterns in the InterPro database [59]. Homologous genes in the SWISSPROT database were identified using BlastP [60] and homologous protein domains and patterns were identified by InterProScan [61]. Inparanoid [62] and BlastX [60] were used for the identification of similar genes in A. thaliana. The deduced amino acid sequences of the annotated ORFs were compared to A. thaliana annotated proteins from The A. thaliana Information Resource (TAIR) Release 6 [64]. The threshold criterion for accepting sequence similarity as significant was an E-value < 10^-10 for BLASTP searches. Polyproteins and transposable elements were excluded from the comparison because of their limited information value. For the same reason, ORFs with more than 30 hits in the A. thaliana genome were also excluded [4]. A two-dimensional array was generated with the potato physical map of 420,000 bp in one dimension and the A. thaliana physical map of 121 Mb in the other [4]. Hits of the putative potato proteins with A. thaliana annotated proteins were positioned in the array according to their base pair coordinates on the local potato and genome-wide A. thaliana physical maps. Syntenic blocks were identified based on the criterion that at least three different ORFs within the potato contigs found hits within an A. thaliana genome fragment of similar size.
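The filtering and block-finding criteria can be illustrated with a small sketch. It is not the two-dimensional array procedure described above: blocks are approximated here by single-linkage clustering of hits along each A. thaliana chromosome, and the gap threshold `max_gap_bp`, the `Hit` record and all toy coordinates are assumptions introduced for illustration. Only the E-value cut-off, the 30-hit limit and the minimum of three different ORFs come from the text.

```python
from collections import Counter, defaultdict
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    orf: str         # potato ORF name
    at_chr: int      # A. thaliana chromosome
    at_pos: int      # position of the hit on that chromosome (bp)
    evalue: float    # BLASTP E-value


def filter_hits(hits: List[Hit], max_evalue: float = 1e-10,
                max_hits_per_orf: int = 30) -> List[Hit]:
    """Keep significant hits and drop ORFs with too many hits."""
    significant = [h for h in hits if h.evalue < max_evalue]
    counts = Counter(h.orf for h in significant)
    return [h for h in significant if counts[h.orf] <= max_hits_per_orf]


def syntenic_blocks(hits: List[Hit], max_gap_bp: int = 50_000,
                    min_orfs: int = 3) -> List[Tuple[int, List[str]]]:
    """Cluster hits per chromosome; keep clusters with >= min_orfs ORFs."""
    by_chr = defaultdict(list)
    for h in hits:
        by_chr[h.at_chr].append(h)
    blocks = []
    for chrom in sorted(by_chr):
        ordered = sorted(by_chr[chrom], key=lambda h: h.at_pos)
        cluster = [ordered[0]]
        for h in ordered[1:]:
            if h.at_pos - cluster[-1].at_pos <= max_gap_bp:
                cluster.append(h)
            else:
                blocks.append((chrom, cluster))
                cluster = [h]
        blocks.append((chrom, cluster))
    result = []
    for chrom, cluster in blocks:
        orfs = sorted({h.orf for h in cluster})
        if len(orfs) >= min_orfs:
            result.append((chrom, orfs))
    return result


# Toy hits; coordinates and E-values are invented.
toy_hits = [
    Hit("ORF2", 1, 1_000, 1e-50),
    Hit("ORF3", 1, 5_000, 1e-30),
    Hit("ORF4", 1, 9_000, 1e-20),
    Hit("ORF5", 3, 2_000, 1e-40),
    Hit("ORF6", 1, 12_000, 1e-5),   # fails the E-value threshold
]
print(syntenic_blocks(filter_hits(toy_hits)))
# [(1, ['ORF2', 'ORF3', 'ORF4'])]
```

Because the potato contigs span only about 420 kbp, the sketch clusters on the A. thaliana coordinate alone; a fuller version would also require the potato ORFs of a block to lie within a fragment of comparable size.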