Construction of the BAC and PAC contig and analysis of end sequences
We previously reported the construction of a 1.2 Mb BAC/PAC contig on SSC 6q1.2 [8]. To extend the existing contig, the porcine TAIGP714 PAC and RPCI-44 BAC libraries were screened with new probes derived either from end fragments of previously isolated porcine genomic clones or from human HSA 19q13.1 genes. Assembly of all 171 isolated BAC and PAC clones according to STS content, insert sizes and fingerprinting data resulted in the expansion of the existing 1.2 Mb contig [8] to 2.0 Mb and the generation of a new 1.2 Mb contig (Fig. 1). End sequences from all clones of the contig were generated and submitted to the EMBL database under accessions AJ514457-AJ514832. In total, 292 end sequences from SSC 6q1.2 with an average read length of 708 bp, totaling 207 kb of genomic survey sequence, were generated. Thus, the BAC/PAC end sequences cover approximately 6 % of the studied genomic region. The end sequences have an average GC content of 47 %, exceeding the value of 41 % that is generally accepted as the average GC content of mammalian genomes [9]. The GC content analysis further confirms that SSC 6q1.2 is indeed closely related to HSA 19q13.1, which has a GC content of 46 % in the corresponding 4 Mb region. An analysis of repetitive elements revealed that 39.8 % of the end sequences consisted of repetitive DNA. Of the 39.8 % repetitive DNA, 20.5 % were SINE, 13.3 % were LINE, 2.4 % were of retroviral origin (LTRs), and 2.0 % represented DNA transposons. The predominance of SINEs is another typical hallmark of GC-rich and gene-rich genome segments [10]. The analysis of the end sequences also revealed three dinucleotide microsatellites and one tetranucleotide microsatellite (AJ514594, AJ514613, AJ514706, AJ514795).
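The coverage estimate can be rechecked with simple arithmetic. The sketch below is illustrative only: it assumes that the studied region is the sum of the two contigs (2.0 Mb + 1.2 Mb) and that the end sequences do not overlap, and the function name is hypothetical.

```python
def end_sequence_coverage(n_reads, mean_read_bp, region_bp):
    """Return (total sequenced bp, fraction of the region covered).

    Assumes reads do not overlap, so the fraction is an upper bound.
    """
    total_bp = n_reads * mean_read_bp
    return total_bp, total_bp / region_bp


# Values from the text: 292 end sequences, 708 bp average read length.
# Assumption: the region is the two contigs combined (2.0 Mb + 1.2 Mb).
region_bp = 2_000_000 + 1_200_000
total_bp, fraction = end_sequence_coverage(292, 708, region_bp)
print(f"{total_bp / 1000:.0f} kb sequenced, {fraction:.1%} of the region")
```

Under these assumptions the result is about 207 kb and roughly 6 % coverage, in line with the figures given above.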
The availability of the end sequences allowed the continuous verification of the contig assembly by comparative mapping. In BLAST searches against the human draft genome sequence, approximately 15 % of the BAC/PAC end sequences showed significant (E < 10^-5) matches to HSA 19q13.1, which allowed the precise comparative mapping of 27 % of the tested BAC/PAC clones. Of the investigated clones, 73 % had no match in the human genome sequence, 23 % had matches with one end sequence, and 4 % had matches with both end sequences.
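The per-clone and per-end-sequence percentages can be related to each other as follows. This is a minimal sketch that assumes every clone contributes exactly two end sequences, which is a simplification; the function name is hypothetical.

```python
def end_fraction_from_clone_fractions(one_end, both_ends):
    """Fraction of end sequences with a match, given per-clone fractions.

    Assumes every clone contributes exactly two end sequences.
    """
    return (one_end * 1 + both_ends * 2) / 2


# Per-clone fractions from the text: 23 % one end, 4 % both ends.
clones_mapped = 0.23 + 0.04  # clones with at least one matching end
ends_matched = end_fraction_from_clone_fractions(0.23, 0.04)
print(f"clones mapped: {clones_mapped:.0%}, end sequences matched: {ends_matched:.1%}")
```

With these inputs, 27 % of clones are mapped and about 15.5 % of end sequences have a match, consistent with the approximately 15 % reported.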
Physical mapping and comparative analysis
During the contig construction many gene-specific STSs were used, which allowed the unequivocal assignment of genes to individual clones. Further genes were localized by hybridization of heterologous cDNA probes to the individual BAC/PAC clones and by BLAST analysis of the clone end sequences. Using these approaches, 33 genes in total were localized. Furthermore, the microsatellite SW193 was also localized by STS content analysis, thus anchoring the physical clone-based map to the linkage map of this region [11].
The gene assignments were compared with human and mouse maps and a comparative map for SSC 6q1.2, HSA 19q13.1 and MMU 7 was developed (Fig. 2). The gene order in this region of the pig genome corresponds exactly to the gene order of the NCBI HSA 19 map (http://www.ncbi.nlm.nih.gov, build 31). The gene order of MMU 7 (http://www.ncbi.nlm.nih.gov, MGSCv3) also corresponds exactly to the gene order of SSC 6 and HSA 19, but the orientation is inverted. The perfect synteny conservation between mouse and the two other species has only been observable since the latest update of the mouse maps; in the previous mouse genome assembly, a major rearrangement of the gene order in this genome region was observed [8].
Whereas the gene order is perfectly conserved between human, mouse and pig, the physical distances between genes vary somewhat between the three species. Within the investigated region the gene-poor stretch between COX7A1 and NEUD4 accounts for the largest part of these size deviations. The cloned region has a very uneven gene density. At the top and at the bottom of the map (Fig. 2) genes are clustered extremely densely with very short intergenic regions, while in the middle of the map, between the COX7A1 and NEUD4 genes, the gene content is very low.
RH mapping
In this study, we were able to build two comprehensive RH maps for SSC 6q1.2. On the 7000 rad IMpRH panel 15 STS markers were genotyped, while on the 12 000 rad IMNpRH2 panel 35 STS markers were analyzed. Retention frequencies of markers ranged from 18.1 % to 32.8 % (average 22.9 %) on the IMpRH panel and from 27.8 % to 44.3 % (average 37.5 %) on the IMNpRH2 panel.
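The retention frequency of a marker is the fraction of hybrid clones in which the marker was retained. The following sketch illustrates the calculation on toy data. The vector encoding ('1' present, '0' absent, '?' ambiguous) and the marker names are assumptions made for illustration and are not taken from this study.

```python
def retention_frequency(vector):
    """Fraction of scored hybrids that retained the marker.

    `vector` is a string with one character per hybrid clone:
    '1' = marker present, '0' = absent, '?' = ambiguous (ignored here).
    """
    scored = [c for c in vector if c in "01"]
    if not scored:
        raise ValueError("no scored hybrids in vector")
    return scored.count("1") / len(scored)


# Hypothetical toy vectors, not data from this study.
vectors = {
    "marker_A": "1001000100",
    "marker_B": "110?010010",
}
for name, vec in vectors.items():
    print(name, f"{retention_frequency(vec):.1%}")
```

Ignoring ambiguous scores is one possible convention; counting them as absent would give slightly lower frequencies.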
During the building of these two contigs, we simultaneously analyzed data obtained on both the IMpRH and IMNpRH2 panels using the Carthagene program. Intermediate rough analyses of the RH data allowed us to monitor the construction of the contig. In particular, they allowed us to orient a subcontig in the gene-poor region from ITZ002 to ITZ004 as well as to estimate the size of the remaining gaps.
When the full RH data set was available for both panels, it became apparent that at the scale of 10–100 kb the resolution of the IMpRH panel is not high enough, and furthermore that the gene order determined on this panel is very sensitive to small genotyping errors. To produce a final reference map we thus computed a 1000:1 framework map using only the 35 vectors produced on the IMNpRH2 panel. The framework status of the map was tested by calculating the likelihood of the maps produced after all local permutations in a sliding window of 6 markers, and by global local inversions. We confirmed that no alternate order could be identified with a difference of log likelihood of less than 3 compared to the proposed order. The framework map contained 24 of the 35 IMNpRH2 markers. Using this framework map, comprehensive maps were produced on each panel. In order to avoid inflation of the map size, we chose to project additional markers at their most likely location, without altering the multipoint distance between framework markers (Fig. 3).
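The sliding-window test of framework status can be sketched as follows. This is not the Carthagene implementation: the function names are hypothetical, and the likelihood function is a toy stand-in that penalizes non-adjacent neighbours, used only to make the example runnable. The sketch assumes that log likelihoods are expressed in log10 units, so that a difference of 3 corresponds to odds of 1000:1.

```python
from itertools import permutations


def window_permutations(order, window=6):
    """Yield every alternative order obtained by permuting the markers
    inside each sliding window of `window` consecutive positions."""
    order = tuple(order)
    seen = {order}
    for start in range(len(order) - window + 1):
        head = order[:start]
        mid = order[start:start + window]
        tail = order[start + window:]
        for perm in permutations(mid):
            candidate = head + perm + tail
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


def is_framework(order, log10_likelihood, window=6, threshold=3.0):
    """True if every locally permuted order is at least `threshold`
    log10-likelihood units worse than `order` (1000:1 for threshold 3)."""
    best = log10_likelihood(tuple(order))
    return all(
        best - log10_likelihood(alt) >= threshold
        for alt in window_permutations(order, window)
    )


def toy_log10_likelihood(order, penalty):
    """Toy stand-in for a real RH likelihood: markers are integers at
    their true positions, and each unit of extra neighbour distance
    costs `penalty` log10 units."""
    return -penalty * sum(abs(a - b) - 1 for a, b in zip(order, order[1:]))


proposed = list(range(8))  # hypothetical 8-marker map
print(is_framework(proposed, lambda o: toy_log10_likelihood(o, 4.0)))
print(is_framework(proposed, lambda o: toy_log10_likelihood(o, 1.0)))
```

With a strong penalty every alternative order is rejected at 1000:1 and the map qualifies as a framework; with a weak penalty some alternative orders lie within 3 log10 units and the test fails.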
As shown in Fig. 3, the gene orders on the RH and physical maps are generally in good agreement. This agreement is perfect between the physical map and the 1000:1 framework RH map produced on the IMNpRH2 panel. It demonstrates that at the 50–100 kb scale, fully accurate maps can be produced on this panel provided that 1000:1 framework maps are drawn.
Some minor discrepancies can be found when comprehensive maps are drawn. For instance, the location of SPTBN4 on the IMNpRH2 map seems incorrect. However, a difference of log likelihood of only 1.57 is found between the maps constructed under the most likely order and the expected order. We thus think that our RH data do not sufficiently support the hypothesis of a very small rearrangement of this region. It should be pointed out that even if additional markers are added at their most likely location on this kind of comprehensive map, their mapping does not affect the distance calculated between framework markers.
We also compared the resolution of both panels on the framework map established between COX7A1 and BLVRP. On IMpRH the distance is 146 cR7000, whereas the same fragment is 438 cR12000 long on IMNpRH2. In this region the ratio between the resolutions is thus 3.01, which is slightly higher than the value of 2.77 observed in the PRKAG3-RN region [3] and of 2.43 observed in a QTL region close to the centromere of SSC 7 [12]. In the gene-rich region between RYR1 and BLVRP, which is precisely mapped on the reported clone contig, a ratio of 6.6 kb / cR12000 (1370 kb / 207 cR12000) is observed on the IMNpRH2 panel.
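The two ratios in this paragraph follow directly from the reported distances. The sketch below recomputes them from the rounded values given in the text; the function names are hypothetical. Note that the rounded distances give a resolution ratio of 3.00, so the reported 3.01 presumably reflects unrounded map distances.

```python
def resolution_ratio(dist_high_res_cr, dist_low_res_cr):
    """Ratio of map lengths of the same fragment on two RH panels."""
    return dist_high_res_cr / dist_low_res_cr


def kb_per_cr(physical_kb, rh_cr):
    """Physical size represented by one centiRay on a given panel."""
    return physical_kb / rh_cr


# Rounded values from the text.
ratio = resolution_ratio(438, 146)  # cR12000 / cR7000, COX7A1 to BLVRP
density = kb_per_cr(1370, 207)      # kb per cR12000 in the gene-rich region
print(f"resolution ratio: {ratio:.2f}")
print(f"kb per cR12000: {density:.1f}")
```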
The RH map allowed us to confirm the close link between the two contigs we produced. The distance between the extremity markers of the contigs (ITZ002 and ITZ014) was estimated at 43.2 cR12000. Considering a ratio of 6.6 kb/cR12000 in this region, we can estimate that the physical distance between both contigs could be around 285 kb, which is roughly similar to the 360 kb distance that would be estimated from the human-pig comparative map.
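The gap estimate is a simple unit conversion from RH distance to physical distance. The sketch below reproduces it under the assumption, stated in the text, that the ratio of 6.6 kb/cR12000 also holds in the gap region; the function name is hypothetical.

```python
def gap_size_kb(rh_distance_cr, kb_per_cr):
    """Convert an RH map distance to an approximate physical distance."""
    return rh_distance_cr * kb_per_cr


# Values from the text: 43.2 cR12000 between ITZ002 and ITZ014,
# and 6.6 kb per cR12000 (assumed to apply to the gap as well).
rh_estimate_kb = gap_size_kb(43.2, 6.6)
comparative_estimate_kb = 360  # from the human-pig comparative map
print(f"RH-based gap estimate: {rh_estimate_kb:.0f} kb")
print(f"ratio to comparative estimate: {rh_estimate_kb / comparative_estimate_kb:.2f}")
```

Because the kb/cR ratio was measured in a gene-rich region, it may not transfer exactly to the gap, which could explain part of the difference between the two estimates.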