An update of the goat genome assembly using dense radiation hybrid maps allows detailed analysis of evolutionary rearrangements in Bovidae
- Xiaoyong Du†1, 6,
- Bertrand Servin†2,
- James E Womack3,
- Jianhua Cao1,
- Mei Yu1,
- Yang Dong4, 5,
- Wen Wang4Email author and
- Shuhong Zhao1Email author
© Du et al.; licensee BioMed Central Ltd. 2014
Received: 24 October 2013
Accepted: 10 July 2014
Published: 23 July 2014
The domestic goat (Capra hircus), an important livestock species, belongs to a clade of Ruminantia, Bovidae, together with cattle, buffalo and sheep. The history of genome evolution and chromosomal rearrangements on a small scale in ruminants remain speculative. Recently completed goat genome sequence was released but is still in a draft stage. The draft sequence used a variety of assembly packages, as well as a radiation hybrid (RH) map of chromosome 1 as part of its validation.
Using an improved RH mapping pipeline, whole-genome dense maps of 45,953 SNP markers were constructed with statistical confidence measures and the saturated maps provided a fine map resolution of approximate 65 kb. Linking RH maps to the goat sequences showed that the assemblies of scaffolds/super-scaffolds were globally accurate. However, we observed certain flaws linked to the process of anchoring chromosome using conserved synteny with cattle. Chromosome assignments, long-range order, and orientation of the scaffolds were reassessed in an updated genome sequence version. We also present new results exploiting the updated goat genome sequence to understand genomic rearrangements and chromosome evolution between mammals during species radiations. The sequence architecture of rearrangement sites between the goat and cattle genomes presented abundant segmental duplication on regions of goat chromosome 9 and 14, as well as new insertions in homologous cattle genome regions. This complex interplay between duplicated sequences and Robertsonian translocations highlights the rearrangement mechanism of centromeric nonallelic homologous recombination (NAHR) in mammals. We observed that species-specific shifts in ANKRD26 gene duplication are coincident with breakpoint reuse in divergent lineages and this gene family may play a role in chromosome stabilization in chromosome evolution.
We generated dense maps of the complete whole goat genome. The chromosomal maps allowed us to anchor and orientate assembled genome scaffolds along the chromosomes, annotate chromosome rearrangements and thereby get a better understanding of the genome evolution of ruminants and other mammals.
KeywordsGoat Radiation hybrid map Genomic rearrangements Gene duplication
The domestic goat (Capra hircus) is an important source of food and wool (cashmere) especially important in low input production systems. The ability of goat to consume a wide range of vegetation explains its significance for the agricultural economy. In 2007, the FAO estimated the world population of goats to be around 800 million animals. Despite this economic importance, goat genomic resources are not extensive compared to other livestock species, such as cattle, pig, chicken, or sheep. Goat is a ruminant and a member of the Caprinae order, a clade distinct from Bovinae in the Bovidae family. Its last common ancestor with cattle is dated between 19 and 40 million year (Myr) ago, and with sheep between 6.2 and 8.5 Myr ago . Comparative mapping between cattle and goat using fluorescent in situ hybridization (FISH) showed a strong synteny conservation between the two genomes with a few large-scale rearrangements, including the well-known 9–14 translocation differentiating Bovinae and Caprinae and differences in intra-chromosomal organization of the X chromosome [3, 4]. However, the history of genome evolution and chromosomal rearrangements on a small scale in ruminants remain speculative, mostly due to limited number of mapped loci.
High-resolution gene maps provide basic but crucial linear information on the physical organization of the genome and typically serve as the backbone for further genomic research. For example, robust information about marker organization on chromosomes is crucial for linkage analysis and marker assisted animal breeding. Although assembling next-generation sequencing data into a draft genome comprising scaffolds is relatively straightforward, constructing a physical map of the structure of chromosomes is still difficult and costly. Recent years have witnessed a slow increase in gene mapping data in the goat since the first genetic and cytogenetic maps [2, 5]. To date, approximately only 550 loci have been mapped to the goat genome using linkage maps of low resolution . The initial release of the goat genome sequence is based entirely on short-read de novo sequencing of a Yunnan Black goat that yielded a ~2.66 Gb assembly of 284,683 scaffolds (N50 = 3.06 Mb). Anew improved optical mapping technology was used to joined 2,090 scaffolds into 315 larger super-scaffolds (N50 = 16.3 Mb). Finally, super-scaffolds were assembled into chromosomes based on conserved synteny with cattle . The goat genome sequence could yield a wealth of information about gene content and putative regulatory elements sequences; however, it is still lacking long-range continuity and its released form only gives a fragmented view of the genome.
Another genomic resource, radiation hybrid (RH) maps, played a pivotal role in the process of mapping animal genomes and validation of draft assemblies of the genome sequence, especially in mammals [8–11]. For obtaining dense and accurate next-generation RH maps, new methods were developed recently exploiting the increased availability of comparative genomic resources  and have proved successful for the production of RH maps based on SNP array genotyping data [13, 14].
As a part of the goat genome project, we built RH maps of chromosome 1 that helped to validate the draft genome assembly. In this study, using a goat RH panel that we generated recently , we constructed robust physical maps of 45,953 loci for all of goat chromosomes except chromosome Y. The whole-genome goat RH map, as an independent chromosomal map, allowed us to anchor and orientate assembled genome scaffolds along the chromosomes thus permitting the annotation of chromosome rearrangements and the study of their evolutionary history using ruminants and other mammalian genomes.
Results and discussion
Genotype calling of SNPs in the RH clones
In building the RH maps, we sought to provide accurate, detailed and reliable physical maps at the whole-genome scale to aid in the validation and improvement of the goat genome assembly. Genotyping SNP arrays from other species on the goat RH panel turned out to be a successful strategy for the genotype calling of RH clones. A similar strategy of marker selection from close related species had been applied for constructing buffalo RH maps . Because sequence conservation between the three species considered here is high, a number of bovine and sheep SNPs presented a positive signal in the goat. Moreover, the SNPs that did not provide positive signals were particularly useful for the genotype calling procedure as they gave an internal control, within each clone, of the signal exhibited by non-retained SNPs. We believe it highly increased the robustness of the genotype calling procedure, which is ultimately proven by the quality of the RH maps produced and for future studies we would recommend this approach. The feasibility depends on the availability of a SNP array in a closely related species. To test that whether the approach can be carried out in a species, a simple experiment would consist in genotyping of whole genome sample of the species of interest with the related species array and check that (1) a number of SNPs can provide signal and (2) a number of SNPs do not provide signal. Optimally both the categories should be in similar proportion on the array. Recall that the fact that SNPs are actually polymorphic in the species of interest is not relevant for RH mapping.
Summary of the genotype calling results
Ret. rate estimate
Even low levels of error result in large losses of information about breakage probabilities, markedly increase the uncertainty in the marker ordering, and inflate estimates of inter-marker distances and total map length . The high error rate of genotypes for constructing certain maps prevents from recovering sufficient signal for ordering the markers. The ability to produce useful robust maps in comparative RH mapping is impaired when the error rate in the data increases above 10% . Before constructing the integrated robust map, we also checked the vector data from bovine SNPs and Ovine SNPs, respectively, and constructed two robust maps for each chromosome. These maps (data not shown) presented a very low error rate in the RH data, in consistent with the low false negative rate estimated by the Q-value R package . However, we note that the slight differences in experimental conditions between genotyping RH clones on the BovineSNP50K array and on the OvineSNP50K array. It could bring certain systematic errors when merging the two datasets, which we cannot assess, since our RH vectors have not been validated by replicated genotyping. However, the availability of negative control SNPs within each clone offers a replication across clones. For increased robustness, we applied three approaches to remove possible genotyping errors before constructing the integrated robust maps.
Genome map construction and characterization
We portioned 53,075 mapped SNPs (mSNPs) according to their assigned chromosome on CHIR_1.0. For each of the 30 chromosomes, a linkage group was established using RH data alone, which removed 13 unlinked markers.
We then built comparative RH maps with all remaining markers, using prior information on their ordering from assembly CHIR_1.0. We further excluded 1,918 markers, which possibly have errors or have poor information in RH vector using three criteria (see Methods for details). This resulted in 51,144 SNPs being positioned on 30 chromosomal LKH maps.
We extracted a subset of markers from the LKH maps for which the ordering was strongly supported by RH data. This procedure removed about 5,191 SNPs, which could not be confidently ordered, and led to RH robust maps comprising 45,953 SNPs.
The main drawback of current high throughput sequencing technologies is the short length of the reads they produce. Many of the novel assemblies including the current goat draft assembly obtained with these approaches are composed of a very large number of scaffolds. This fragmentation does not affect gene discovery, polymorphism analysis and sequence comparison between closely related species but it greatly limits the study of genome structure and chromosomal evolution. The comparative RH approach incorporated the prior knowledge of marker orders in CHIR_1.0 into the statistical model. It has the key property that a different ordering than that of the reference order will be accepted as an alternative order only when the RH data is informative enough. As a consequence, the robust maps allow pinpointing the regions where the assembly order disagrees with an RH map order that is strongly supported by the RH data, so this approach allowed improvements over classical a framework-based mapping strategy . The RH maps presented here offer a powerful complement to the available genetic/cytogenetic maps with very limited resolutions [2, 6, 20] and the genome sequence map . Our RH panel and RH maps also revealed the potential ability to map the unplaced genes or scaffolds and to check possible misassembled regions in further research.
Validation of the draft genome assembly and anchoring scaffolds on chromosomes
The resolution of discrepancies directly addresses the question of the reliability of the order defined on one side by the genome map and on the other by the assembly. The construction of robust maps was precisely designed to address the reliability of RH maps . In the process of assembling CHIR_1.0, a first draft was produced by standard in silico assembly pipelines yielding numerous scaffolds, followed by an automated optical mapping procedure to assemble scaffolds into super-scaffolds and finally conservation of synteny with cattle was assumed in unassembled regions to organize super-scaffolds into whole chromosomes. For assembling CHIR_1.0, we had constructed two RH maps of chromosome 1 using bovine SNPs and ovine SNPs separately. CHI1 is the longest chromosome and carries many long super-scaffolds. These two maps suggested that the assemblies of super-scaffolds and chromosomes in CHIR_1.0 were accurate .
In this study, we merged all the candidate markers from bovine SNPs and ovine SNPs to construct comprehensive RH maps for the 29 autosomes and the X chromosome. The RH marker positions were compared to their sequence positions, providing a comprehensive link between the two physical maps. Based on this link, we aimed at resolving the discrepancies between the two maps to possibly improve the goat genome sequence. Our approach was to consider contigs and scaffolds construction (a small-scale assembly problem out of range given the RH map resolution) as accurate but to allow modifying their organization along chromosomes. More specifically, to resolve discrepancies and update the goat genome sequence using RH maps, we set the priority, from highest to lowest, to the scaffolds, the RH map, the super-scaffolds and the cattle genome successively. Doing so we consider (1) the scaffold data provide ultimate resolution in local region of genome in comparison of any other map, and the assembly of goat scaffold is comprehensively correct at the local level, (2) while the automated whole-genome optical mapping efficiently generated supper-scaffold data for facilitating the anchoring of scaffolds onto chromosomes of CHIR_1.0, the quality control of the maps have not fully been verified, and (3) assuming conserved synteny with cattle should only be required when no other information is available.
For most markers considered here we were able to assign a position on the genome sequence but still suggest that gaps remain in CHIR_1.0. In chromosome 5 (89.1 Mb between scaffold740 and scaffold666), we underlie a region spanned 72.6 cR haven’t aligned to any goat scaffold data, which mean that a continuous region of over 4.2 Mb missed there in do novo assembling of short sequence. Based on predictions using the 17-mer method, the current genome is ~91% of the estimated ~2.92 Gb size. Filling gaps in the goat genome will be a challenge for both technologies of genome mapping and genome sequencing. Since, all maps have some errors in them then it will be important to integrate information from various sources to allow for these checks to be consistent, given that they are generated essentially independently of one another. An approach to resolve contradicting orders is the exploitation of additional and independent source of information. (1) Inconsistency in marker order between the RH maps and other maps is likely due to inaccuracies in independently mappings. (2) Some of the remaining inconsistencies could be biologically grounded, reflecting breed structural variations. Indeed, the reference sequence and the RH panel were generated using the DNA from different individuals and from different breeds (a Yunnan black for the reference sequence and Boer samples for the RH panels), as has already been evidenced in the goat .
Mapping of unplaced scaffolds on the goat genome
As an application of the RH maps, we predicted the positions of SNPs with no localization on the assembly CHIR_1.0. Herein we call these unmapped SNPs “uSNPs”. On the assembly CHIR_1.0, 391 uSNPs assigned to unplaced scaffolds longer than 10Kbhad RH vectors. Firstly, the most likely chromosome was determined based on the linkage with mSNPs and then the best possible position was determined in the LKH map. We excluded uSNPs on tiny scaffolds from map construction because of the limited resolution of our RH maps (we noticed that essentially all SNPs assigned to a given scaffold had consistent predicted localization on the same RH map). We identified 32 uSNPs (2.2%) on the X chromosome, more than expected compared to autosome maps. This could imply that for CHIX more genome sequence is missing in the current assembly than for other chromosomes. We set the same principle for mapping unplaced scaffolds as we used to validate the draft genome: unplaced scaffolds could only be inserted between two contiguous placed scaffolds and we aligned the candidate scaffolds and their neighbors based on the cattle genome order.
Among the uSNPs for which we could assign a position on the goat genome, we were able to align 95 scaffolds present in the goat genome sequence but currently unassigned. Of these scaffolds 52 only could be assigned a position but not orientated, either because they carried a single uSNP or because they contained multiple tightly linked SNPs whose map order was questionable due to insufficient radiation-induced breakpoints. The orientations of the 53 scaffolds were obtained using comparative information with the cattle genome (UMD3.1) assuming conserved synteny. In total, 19.45 Mb of previously unplaced genome sequences were incorporated into the new version of the genome assembly. The comparison between the RH map with super-scaffolds and the CHIR_1.1 assembly are presented in Additional file 5. Of 1,910 scaffolds that can be aligned to the 30 robust RH maps, 1,884 were used to reassembling chromosome sequences. Only a few rearrangements exist between RH map and the new version of genome assembly. The updated assembly is available as an agp file (the CHIR_1.1 assembly, at http://goat.kiz.ac.cn)
Genome rearrangements between caprine and bovine genome
The major rearrangement sites between goat and cattle
The ancestral Bovidae diploid number of 30 pairs of chromosomes has been retained in all but one (gaur) of the previous studied species of Bovina. Robertsonian fusions or centric translocations (ROBs) involving the fusions of two nonhomologous acrocentric or telocentric chromosomes at their centromeres (producing a single biarmed product with distinguishable internal centromere) is frequent in ruminant chromosome evolution and particularly in Bovidae[25–27]. Several hypotheses have been advanced to explain the formation of ROBs in mammals usually invoking the organization of pericentromeric satellite DNA and nonallelic homologous recombination at meiosis . Breakpoints are considered to occur in regions of SDs  and high concentrations of repetitive elements . The sequence architectures of genomic breakpoints what we identified for ROBs are found to be complex with enriched genes and other active elements, which refine the evidences of these previous studies. The fact that SDs are enriched in breakpoints compared to synteny blocks and that breakpoint specific SD have a different evolutionary history in goat and cattle suggests that in these cases duplications promoted nonallelic homologous recombination (NAHR), and thus a chromosome rearrangement.
ANKRD26 gene cluster associated with genome rearrangements
Large-scale genome rearrangements caused by chromosome breaks underlie numerous inherited diseases and are associated with species evolution. Evolutionary breakpoints are nonrandomly distributed (reused) throughout mammalian genomes  and many breakpoints are coincident with ancient centromere activity as well as new centromere formation [32–36]. It has been hypothesized that centromeric illegitimate recombination between nonhomologous chromosomes led to chromosome fusions and synteny breakpoints during rapid speciation. This hypothesis initiated from the analysis of karyotypic evolution in mammals  and was supported by similar studies in marsupials  and in plants .
Evolutionary breakpoints are supposed to contribute importantly to new genetic variation and novel genes at these fragile sites . However, the dynamics between evolutionary breakpoints and genomic elements locally have not been completely understood yet . A current hypothesis is that genetic innovation and change at these sites may, as a driving force of local adaptation , have influences in subsequent chromosome stabilization. Indeed, most of the proteins belonging to this gene cluster conserved the Ankyrin domains near to N-terminal and the structural maintenance of chromosomes protein (SMC) domains near the C-terminal (Additional file 9). SMC proteins interact with DNA in chromosome condensation, sister-chromatic cohesion, recombination, DNA double-strand break repair and epigenetic silencing of gene expression. It may indicate that ANKRD26 is responsive to DNA damage , pericentromeric cohesin and condensing , and even centromere formation. The members of ANKRD26 usually linked with LINEs near the C-terminal  may indicate their duplication by retrotransposition.
Although little is known about the regulation of ANKRD26, previous studies demonstrated that ANKRD26–involved in chromosome rearrangements are responsible for several inherited disease: a gene conversion of ANKRD26 is associated with type-2 diabetes , which were supported by the studies of ANKRD26-inactivation  and familial thrombocytopenia, associated with a chromosome breakage  on human chromosome 10p  appears to be due to an ANKRD26 mutation [52, 53].
In the present study, we constructed dense goat RH maps of SNP markers from closely related species via a novel method of genotype calling and recent methodologies. Overall, there is good agreement between the current genome assembly and the robust RH maps. We were able to propose correction to the goat genome assembly and add previously unplaced scaffolds. We combined this information in the form of a new goat genome sequence CHI_1.1.We think this updated sequence, with a more accurate ordering of scaffolds will can be useful for many genetic studies on goat, for instance, linkage disequilibrium analysis and QTL mapping. We used our new dense chromosomal maps to annotate genomic rearrangements and chromosome evolution of ruminants. We compared sequence architecture of the evolutionary breakpoints including representative repeat elements and genes in ruminant rearrangements. Breakpoint intervals from two neighboring synteny blocks in goat gnome reveal a mosaic of new insertions at rearrangement sites. The Breakpoints in the goat genome presents more complex architectures than breakpoints of the cattle genome suggesting that goat centromeres of CHI9 and CHI14 underwent much shorter evolutionary time than those two cattle centromeres. Our findings highlight the complex interplay between duplicated sequences and chromosomal rearrangements, which favored a rearrangement mechanism of centromeric non-allelic homologous recombination in mammals. We detected a gene cluster, ANKRD26, which was expanded specifically in species as well as lineage specific evolutionary breakpoints and in centromeres. Their genomic distribution associated with the biological function of their conserved protein domains suggested it might play a role in chromosome stabilization during chromosome evolution.
RH panel and marker genotyping
We generated a 5000 rads goat-hamster RH panel . In total, 108,242 Bovidae-derived SNP markers from BovineSNP50K BeadChip (54,001) and OvineSNP50K BeadChip (54,241) were genotyped against 94 RH clones and 2 controls (one positive Boer goat sample and one negative hamster A23 sample) using the IlluminaBeadStation 500G genotyping system.
We performed the genotype calling procedure described below independently for each SNP array. We called genotypes of clones at SNPs based on the raw signal intensities obtained from the Illumina genotyping system. Specifically, the statistic used for genotype calling of a given SNP in a given clone was the maximum observed intensity over the two possible alleles (herein called Imax). The distribution of Imax in a given clone can be seen as a mixture of two underlying distributions, where one for the non-retained SNPs in the clone and one for the retained SNPs (e.g. Figure 1B). The difficulty of the genotype calling procedure is to evaluate from which of these two distributions a given data point (Imax) comes from. More precisely, we aimed at evaluating the p-value of the observed Imax under the hypothesis H0 that the SNP is not retained in the clone (i.e. P (T > =Imax |H0)).To calculate this p-value an estimate of the distribution of Imax for a non-retained SNP is needed. For this, we exploited the fact that the SNP array was designed for a different species than the goat (either cattle or sheep). Looking at the intensity distribution in the positive whole genome goat sample, we observed that a number (Nneg) of SNPs showed very low intensities for both alleles in the goat (Figure 1A). We speculate that the sequences corresponding to these negative SNPs are not conserved enough in the goat genome to provide a signal with the SNP array. Finally, the p-value that a (non-negative) SNP is not retained was calculated as the proportion of Imax in the negative SNPs that exceeds the observed Imax. If none was found, we set the p-value at 1/Nneg.
After calculating p-values for all observations, we estimated their corresponding q-values using the Q-value R package . This approach allows controlling the false positive and false negative rates and provides an estimate of the proportion of true null hypotheses in the data, which is in our case an estimate of one minus the panel retention fraction. We called "retained" data points at an FDR threshold of 1% and removed SNPs which retention fraction was either greater than 50% or lower than 10%. Furthermore, one RH clone (#73) showed a very high rate of retention (>90%) and was therefore removed from the analysis. Among the remaining data points, we called “missing” data points with q-values lower than 10% for the sheep array, and 5% for the bovine array. Finally, data points with q-values greater than 10% were called “un-retained”. The overall approach with these thresholds allowed us to control both the missing data rate and the false negative rate (see Results).
Our pipeline of RH mapping is very similar to that recently used for recently building porcine RH maps . The Carthagene software [12, 14, 54] was used to compute efficiently marker ordering for candidate maps with reference maps. This approach is based on a probabilistic Bayesian model integrating the usual RH probabilistic model with a probabilistic model of breakpoint occurrences with the reference order. In this probabilistic model, breakpoints induced by chromosomal rearrangements are considered as rare events, following a Poisson law. Meaning, the model considers that genome assembly errors create rare spurious breakpoints between the RH map order and the current goat genome assembly order .
The map construction process was conducted in parallel to the goat genome assembly. As our first goal to build RH maps was to help ordering scaffolds of the goat genome, only SNPs that could be mapped unambiguously to a working draft of the goat genome were used to build RH maps. These summed up to 54,318 SNPs, with 35,203 SNPs from the Ovine array and 19,115 SNPs from the bovine array. We portioned 53,075 mapped SNPs (mSNPs) according to their assigned chromosome and 1243 unassigned SNPs (uSNPs) remained on assembly CHIR_1.0. For each autosome, we determined linkage groups among mSNPs, based on RH data alone. Of the 53,075 SNPs that generated RH vectors, 53,062were assigned to 30 single linkage groups based on two-point analysis at a LOD threshold of 10. In these linkage groups 13 markers did not show enough evidence for linkage.
Each RH linkage group was assigned to a chromosome based on homologous comparison between RH markers and the goat genome. (1) We manually increased the LOD threshold in a two-point analysis to remove minor marker groups (<5 markers) to remove possible error vector. (2) A primary map was first generated with Carthagene by converting the RH data into a TSP and solved using a LKH heuristic method (lkh1 1). The initial RH maps were compared to the assembly to observe the general discrepancies, and were used to inspect inter-markers intervals with extremely long distances (>25 cR). In this step, we manually removed possible questionable markers, most of which were located on map ends. (3) The command of find_errors in RHMAPPER was used to flags marker/hybrid assay results that are likely to be the result of laboratory error, where the order for these results set to the primary map. A threshold of 3 was specified in the command i.e. reporting that this result is 1000:1 more likely to represent a laboratory error than a true result. Once again, we converted the RH data into a TSP and solved using the LKH heuristic method, and constructed the LKH map.
A novel approach  was applied next to build maps with a highly reliable ordering, called robust maps. Briefly, the principle of the method is to estimate a posterior distribution of marker ordering using an MCMC approach and then to extract from this distribution a subset of markers that show the same ordering across maps of high probability. We performed 5000 MCMC iterations and discarded the first 1000 as burning iterations in the command of Carthagene software, mcmc, using the LKH maps obtained above as starting orderings. We extracted robust maps from the posterior distribution using the metamap software . An inclusion tree called metamap that summarizes the uncertainty lying in the map distribution was calculated. In the metamap tree, groups in which the best order have posterior probability higher than 95% in the robust map were include the final robust map. The distances of the robust maps were evaluated using the diploid equal retention model with an EM convergence set to 10−9 (command cgtolerance). This approach is closely related to the idea of constructing framework maps. In the case of framework maps, an order is accepted based on a maximum likelihood ratio, also called LOD (the logarithm of odds between the best order and the second best order must exceed a preset ratio, such as 1,000:1 for example (LOD 3)). In contrast, the construction of robust maps falls into the Bayesian paradigm .
For mapping the unplaced scaffolds on the goat genome, we constructed primary LKH maps with mSNP and uSNP together. We removed 852 uSNPs, which assigned to scaffolds shorter than 10 Kb in RH mapping from 1,243 unassigned SNPs (uSNPs) on assembly CHIR_1.0. Then 391 uSNPs were assigned to chromosomes based on their linkage with each of 30 linkage groups. The LKH maps were constructed as above. We aligned both of the robust maps of only mSNP and the LKH map of all SNPs to the goat scaffold data, then computed the orientation of scaffolds positions on the chromosomes and the orientation of the scaffold. A comparison of the two results showed that the assignments of placed scaffolds are very similar, so we proposed that the assignments of placed scaffolds using the LKH maps are accurate.
Genome map comparisons and gene comparisons
Map discrepancies were determined by comparing the RH map with target genome data including goat scaffold sequence, goat assemblies (CHIR_1.0 and CHIR_1.1) and cattle genomes (BTA4.2 and UMD3.1). The probe sequences of markers in RH maps aligned with these target genome maps using BLASTN (version 2.2.21) with E-values < 1 × 10−10. We also conducted synteny-based comparisons between the goat genome sequence and other mammalian sequences. The comparisons were performed using NUCmer  with parameter “-c 500” (with cattle) or “-c 150” (with other mammals). The assignment of assembled sequences to chromosomes was done based on a best alignment and majority rule. The Interquartile Range method was used to filter out suspicious markers and compute the scaffold positions on chromosomes. The Least Square method was used to compute the orientation of the scaffold that had multiple markers aligned to them. Orientation was considered forward if the slope value was positive, reverse if the slope value was negative and unknown if the slope was close to be 0.
For the reconstruction of the evolutionary breakpoints, the Evolution Highway program [33, 39, 40] was used to compare more mammalian genomes including human, mouse, dog, horse, cattle, rat and goat RH maps. The data of orthologous gene pairs between cattle genome and other mammalian genome for generating homologous synteny block was downloaded from the Biomart database. We used lastz program and lastz-based Cassis program  to precisely detect sequence homology and genomic rearrangement breakpoints. RepeatMasker was run against both libraries separately to identify homologous repeats, which were classified into known classes of repeats . SDs was identified using the identical whole-genome sequence comparison method . The characterized elements of sequence were graphed by circus .
For phylogenetic analysis of gene family, the sequence of goat ANKRD26 was blasted (BlastP) against the NCBI-nr database. In the results, genes with less than 40% homology or 40% coverage were filtered out. The annotated genes of ANKRD26 in public protein databases also were retrieved. Multiple sequence alignment was performed and the Neighbor-joining tree were constructed based on Dayhoff PAM matrix method with a bootstrap value of 1000 .
Homologous synteny block
- LOD scores:
Log odds score method
Markov Chain Monte Carlo
Nonallelic homologous recombination
Single nucleotide polymorphism
Traveling Salesman problem.
We thank DNA LandMarks, Inc. for its contributions of experimental supplement for genotyping the RH panel and its technical assistance form Dr. Tunping Yu. We thank Ruheena Javed for reviewing the manuscript. This work has been supported by National High-Tech Research and Development Program of China (No. 2013AA102506), the International collaborating grants from National ministry of science and technology projects of China (NO. 1111), the 973 program (2013CB835200) of China, the IAEA’s Coordinated Research Project (Contract No. 16087/R0), the FP7-3SR (FP7-KBBE-2009-3: No. 245140), and the Fundamental Research Funds for the Central Universities.
- Hedges SB, Dudley J, Kumar S: TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics. 2006, 22 (23): 2971-2972.PubMedView ArticleGoogle Scholar
- Vaiman D, Schibler L, Bourgeois F, Oustry A, Amigues Y, Cribiu EP: A genetic linkage map of the male goat genome. Genetics. 1996, 144 (1): 279-305.PubMed CentralPubMedGoogle Scholar
- de Leon Ponce FA, Ambady S, Hawkins GA, Kappes SM, Bishop MD, Robl JM, Beattie CW: Development of a bovine X chromosome linkage group and painting probes to assess cattle, sheep, and goat X chromosome segment homologies. Proc Natl Acad Sci U S A. 1996, 93 (8): 3450-3454.View ArticleGoogle Scholar
- Piumi F, Schibler L, Vaiman D, Oustry A, Cribiu EP: Comparative cytogenetic mapping reveals chromosome rearrangements between the X chromosomes of two closely related mammalian species (cattle and goats). Cytogenet Cell Genet. 1998, 81 (1): 36-41.PubMedView ArticleGoogle Scholar
- Schibler L, Vaiman D, Oustry A, Giraud-Delville C, Cribiu EP: Comparative gene mapping: a fine-scale survey of chromosome rearrangements between ruminants and humans. Genome Res. 1998, 8: 901-915.PubMedGoogle Scholar
- Schibler L, Di Meo GP, Cribiu EP, Iannuzzi L: Molecular cytogenetics and comparative mapping in goats (Capra hircus, 2n = 60). Cytogenet Genome Res. 2009, 126 (1–2): 77-85.PubMedView ArticleGoogle Scholar
- Dong Y, Xie M, Jiang Y, Xiao N, Du X, Zhang W, Tosser-Klopp G, Wang J, Yang S, Liang J, Chen W, Chen J, Zeng P, Hou Y, Bian C, Pan S, Li Y, Liu X, Wang W, Servin B, Sayre B, Zhu B, Sweeney D, Moore R, Nie W, Shen Y, Zhao R, Zhang G, Li J, Faraut T, et al: Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus). Nat Biotechnol. 2013, 31 (2): 135-141.PubMedView ArticleGoogle Scholar
- Lewin HA, Larkin DM, Pontius J, O'Brien SJ: Every genome sequence needs a good map. Genome Res. 2009, 19 (11): 1925-1928.PubMed CentralPubMedView ArticleGoogle Scholar
- Jiang Y, Xie M, Chen W, Talbot R, Maddox JF, Faraut T, Wu C, Muzny DM, Li Y, Zhang W, Stanton JA, Brauning R, Barris WC, Hourlier T, Aken BL, Searle SM, Adelson DL, Bian C, Cam GR, Chen Y, Cheng S, DeSilva U, Dixen K, Dong Y, Fan G, Franklin IR, Fu S, Fuentes-Utrilla P, Guan R, Highland MA, et al: The sheep genome illuminates biology of the rumen and lipid metabolism. Science. 2014, 344 (6188): 1168-1173.PubMed CentralPubMedView ArticleGoogle Scholar
- Groenen MA, Archibald AL, Uenishi H, Tuggle CK, Takeuchi Y, Rothschild MF, Rogel-Gaillard C, Park C, Milan D, Megens HJ, Li S, Larkin DM, Kim H, Frantz LA, Caccamo M, Ahn H, Aken BL, Anselmo A, Anthon C, Auvil L, Badaoui B, Beattie CW, Bendixen C, Berman D, Blecha F, Blomberg J, Bolund L, Bosse M, Botti S, Bujie Z, et al: Analyses of pig genomes provide insight into porcine demography and evolution. Nature. 2012, 491 (7424): 393-398.PubMed CentralPubMedView ArticleGoogle Scholar
- Hitte C, Madeoy J, Kirkness EF, Priat C, Lorentzen TD, Senger F, Thomas D, Derrien T, Ramirez C, Scott C, Evanno G, Pullar B, Cadieu E, Oza V, Lourgant K, Jaffe DB, Tacher S, Dréano S, Berkova N, André C, Deloukas P, Fraser C, Lindblad-Toh K, Ostrander EA, Galibert F: Facilitating genome navigation: survey sequencing and dense radiation-hybrid gene mapping. Nat Rev Genet. 2005, 6: 643-648.PubMedView ArticleGoogle Scholar
- Faraut T, de Givry S, Chabrier P, Derrien T, Galibert F, Hitte C, Schiex T: A comparative genome approach to marker ordering. Bioinformatics. 2007, 23 (2): e50-e56.PubMedView ArticleGoogle Scholar
- Servin B, Faraut T, Iannuccelli N, Zelenika D, Milan D: High-resolution autosomal radiation hybrid maps of the pig genome and their contribution to the genome sequence assembly. BMC Genomics. 2012, 13: 585-PubMed CentralPubMedView ArticleGoogle Scholar
- Servin B, de Givry S, Faraut T: Statistical confidence measures for genome maps: application to the validation of genome assemblies. Bioinformatics. 2010, 26 (24): 3035-3042.PubMedView ArticleGoogle Scholar
- Du X, Womack JE, Owens KE, Elliott JS, Sayre B, Bottcher PJ, Milan D, Podesta MG, Zhao S, Malek M: A whole-genome radiation hybrid panel for goat. Small ruminat research. 2012, 105: 114-116.View ArticleGoogle Scholar
- Amaral ME, Grant JR, Riggs PK, Stafuzza NB, Filho EA, Goldammer T, Weikard R, Brunner RM, Kochan KJ, Greco AJ, Jeong J, Cai Z, Lin G, Prasad A, Kumar S, Saradhi GP, Mathew B, Kumar MA, Miziara MN, Mariani P, Caetano AR, Galvão SR, Tantia MS, Vijh RK, Mishra B, Kumar ST, Pelai VA, Santana AM, Fornitano LC, Jones BC, et al: A first generation whole genome RH map of the river buffalo with comparison to domestic cattle. BMC Genomics. 2008, 9: 631-PubMed CentralPubMedView ArticleGoogle Scholar
- Lunetta KL, Boehnke M, Lange K, Cox DR: Experimental design and error detection for polyploid radiation hybrid mapping. Genome Res. 1995, 5 (2): 151-163.PubMedView ArticleGoogle Scholar
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003, 100 (16): 9440-9445.PubMed CentralPubMedView ArticleGoogle Scholar
- Stewart EA, McKusick KB, Aggarwal A, Bajorek E, Brady S, Chu A, Fang N, Hadley D, Harris M, Hussain S, Lee R, Maratukulam A, O'Connor K, Perkins S, Piercy M, Qin F, Reif T, Sanders C, She X, Sun WL, Tabar P, Voyticky S, Cowles S, Fan JB, Mader C, Quackenbush J, Myers RM, Cox DR: An STS-based radiation hybrid map of the human genome. Genome Res. 1997, 7 (5): 422-433.PubMedGoogle Scholar
- Fontanesi L, Martelli PL, Beretti F, Riggio V, Dall'Olio S, Colombo M, Casadio R, Russo V, Portolano B: An initial comparative map of copy number variations in the goat (Capra hircus) genome. BMC Genomics. 2010, 11: 639-PubMed CentralPubMedView ArticleGoogle Scholar
- Cribiu EP, Di Berardino D, Di Meo GP, Eggen A, Gallagher DS, Gustavsson I, Hayes H, Iannuzzi L, Popescu CP, Rubes J, Schmutz S, Stranzinger G, Vaiman A, Womack J: International System for Chromosome Nomenclature of Domestic Bovids (ISCNDB 2000). Cytogenet Cell Genet. 2001, 92 (3–4): 283-299.PubMedView ArticleGoogle Scholar
- Maddox JF: A presentation of the differences between the sheep and goat genetic maps. Genet Sel Evol. 2005, 37 (Suppl 1): S1-S10.PubMed CentralPubMedView ArticleGoogle Scholar
- Buckland RA, Evans HJ: Cytogenetic aspects of phylogeny in the Bovidae. I. G-banding. Cytogenet Cell Genet. 1978, 21 (1–2): 42-63.PubMedView ArticleGoogle Scholar
- Samonte RV, Eichler EE: Segmental duplications and the evolution of the primate genome. Nat Rev Genet. 2002, 3 (1): 65-72.PubMedView ArticleGoogle Scholar
- Gallagher DS, Womack JE: Chromosome conservation in the Bovidae. J Hered. 1992, 83 (4): 287-298.PubMedGoogle Scholar
- Iannuzzi L, King WA, Di Berardino D: Chromosome evolution in domestic bovids as revealed by chromosome banding and FISH-mapping techniques. Cytogenet Genome Res. 2009, 126 (1–2): 49-62.PubMedView ArticleGoogle Scholar
- Robinson TJ, Ropiquet A: Examination of hemiplasy, homoplasy and phylogenetic discordance in chromosomal evolution of the Bovidae. Syst Biol. 2011, 60 (4): 439-450.PubMedView ArticleGoogle Scholar
- Hernandez Fernandez M, Vrba ES: A complete estimate of the phylogenetic relationships in Ruminantia: a dated species-level supertree of the extant ruminants. Biol Rev Camb Philos Soc. 2005, 80 (2): 269-302.PubMedView ArticleGoogle Scholar
- Kehrer-Sawatzki H, Cooper DN: Molecular mechanisms of chromosomal rearrangement during primate evolution. Chromosome Res. 2008, 16 (1): 41-56.PubMedView ArticleGoogle Scholar
- Ruiz-Herrera A, Castresana J, Robinson TJ: Is mammalian chromosomal evolution driven by regions of genome fragility?. Genome Biol. 2006, 7 (12): R115-PubMed CentralPubMedView ArticleGoogle Scholar
- Pevzner P, Tesler G: Human and mouse genomic sequences reveal extensive breakpoint reuse in mammalian evolution. Proc Natl Acad Sci U S A. 2003, 100 (13): 7672-7677.PubMed CentralPubMedView ArticleGoogle Scholar
- Bulazel KV, Ferreri GC, Eldridge MD, O'Neill RJ: Species-specific shifts in centromere sequence composition are coincident with breakpoint reuse in karyotypically divergent lineages. Genome Biol. 2007, 8 (8): R170-PubMed CentralPubMedView ArticleGoogle Scholar
- Murphy WJ, Larkin DM, Everts-van der Wind A, Bourque G, Tesler G, Auvil L, Beever JE, Chowdhary BP, Galibert F, Gatzke L, Hitte C, Meyers SN, Milan D, Ostrander EA, Pape G, Parker HG, Raudsepp T, Rogatcheva MB, Schook LB, Skow LC, Welge M, Womack JE, O'brien SJ, Pevzner PA, Lewin HA: Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps. Science. 2005, 309 (5734): 613-617.PubMedView ArticleGoogle Scholar
- Han Y, Zhang Z, Liu C, Liu J, Huang S, Jiang J, Jin W: Centromere repositioning in cucurbit species: implication of the genomic impact from centromere activation and inactivation. Proc Natl Acad Sci U S A. 2009, 106 (35): 14937-14941.PubMed CentralPubMedView ArticleGoogle Scholar
- Ventura M, Antonacci F, Cardone MF, Stanyon R, D'Addabbo P, Cellamare A, Sprague LJ, Eichler EE, Archidiacono N, Rocchi M: Evolutionary formation of new centromeres in macaque. Science. 2007, 316 (5822): 243-246.PubMedView ArticleGoogle Scholar
- Wade CM, Giulotto E, Sigurdsson S, Zoli M, Gnerre S, Imsland F, Lear TL, Adelson DL, Bailey E, Bellone RR, Blöcker H, Distl O, Edgar RC, Garber M, Leeb T, Mauceli E, MacLeod JN, Penedo MC, Raison JM, Sharpe T, Vogel J, Andersson L, Antczak DF, Biagi T, Binns MM, Chowdhary BP, Coleman SJ, Della Valle G, Fryc S, Guérin G: Genome sequence, comparative analysis, and population genetics of the domestic horse. Science. 2009, 326 (5954): 865-867.PubMed CentralPubMedView ArticleGoogle Scholar
- Ferreri GC, Brown JD, Obergfell C, Jue N, Finn CE, O'Neill MJ, O'Neill RJ: Recent amplification of the kangaroo endogenous retrovirus, KERV, limited to the centromere. J Virol. 2011, 85 (10): 4761-4771.PubMed CentralPubMedView ArticleGoogle Scholar
- Murat F, Xu JH, Tannier E, Abrouk M, Guilhot N, Pont C, Messing J, Salse J: Ancestral grass karyotype reconstruction unravels new mechanisms of genome shuffling as a source of plant evolution. Genome Res. 2010, 20 (11): 1545-1557.PubMed CentralPubMedView ArticleGoogle Scholar
- Larkin DM, Pape G, Donthu R, Auvil L, Welge M, Lewin HA: Breakpoint regions and homologous synteny blocks in chromosomes have different evolutionary histories. Genome Res. 2009, 19 (5): 770-777.PubMed CentralPubMedView ArticleGoogle Scholar
- Donthu R, Lewin HA, Larkin DM: SyntenyTracker: a tool for defining homologous synteny blocks using radiation hybrid maps and whole-genome sequence. BMC research notes. 2009, 2: 148-PubMed CentralPubMedView ArticleGoogle Scholar
- Hahn Y, Bera TK, Pastan IH, Lee B: Duplication and extensive remodeling shaped POTE family genes encoding proteins containing ankyrin repeat and coiled coil domains. Gene. 2006, 366 (2): 238-245.PubMedView ArticleGoogle Scholar
- Bera TK, Zimonjic DB, Popescu NC, Sathyanarayana BK, Kumar V, Lee B, Pastan I: POTE, a highly homologous gene family located on numerous chromosomes and expressed in prostate, ovary, testis, placenta, and prostate cancer. Proc Natl Acad Sci U S A. 2002, 99 (26): 16975-16980.PubMed CentralPubMedView ArticleGoogle Scholar
- Fungtammasan A, Walsh E, Chiaromonte F, Eckert KA, Makova KD: A genome-wide analysis of common fragile sites: what features determine chromosomal instability in the human genome?. Genome Res. 2012, 22 (6): 993-1005.PubMed CentralPubMedView ArticleGoogle Scholar
- Yeaman S: Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 2013, 110 (19): E1743-E1751.PubMed CentralPubMedView ArticleGoogle Scholar
- Matsuoka S, Ballif BA, Smogorzewska A, McDonald ER, Hurov KE, Luo J, Bakalarski CE, Zhao Z, Solimini N, Lerenthal Y, Shiloh Y, Gygi SP, Elledge SJ: ATM and ATR substrate analysis reveals extensive protein networks responsive to DNA damage. Science. 2007, 316 (5828): 1160-1166.PubMedView ArticleGoogle Scholar
- Stephens AD, Haase J, Vicci L, Taylor RM, Bloom K: Cohesin, condensin, and the intramolecular centromere loop together generate the mitotic chromatin spring. J Cell Biol. 2011, 193 (7): 1167-1180.PubMed CentralPubMedView ArticleGoogle Scholar
- Wang Y, Leung FC: Discovery of a long inverted repeat in human POTE genes. Genomics. 2009, 94 (4): 278-283.PubMedView ArticleGoogle Scholar
- Ross KA: Evidence for somatic gene conversion and deletion in bipolar disorder, Crohn's disease, coronary artery disease, hypertension, rheumatoid arthritis, type-1 diabetes, and type-2 diabetes. BMC Med. 2011, 9: 12-PubMed CentralPubMedView ArticleGoogle Scholar
- Bera TK, Liu XF, Yamada M, Gavrilova O, Mezey E, Tessarollo L, Anver M, Hahn Y, Lee B, Pastan I: A model for obesity and gigantism due to disruption of the Ankrd26 gene. Proc Natl Acad Sci U S A. 2008, 105 (1): 270-275.PubMed CentralPubMedView ArticleGoogle Scholar
- Helmerhorst FM, Heaton DC, Crossen PE, von dem Borne AE, Engelfriet CP, Natarajan AT: Familial thrombocytopenia associated with platelet autoantibodies and chromosome breakage. Hum Genet. 1984, 65 (3): 252-256.PubMedView ArticleGoogle Scholar
- Savoia A, Del Vecchio M, Totaro A, Perrotta S, Amendola G, Moretti A, Zelante L, Iolascon A: An autosomal dominant thrombocytopenia gene maps to chromosomal region 10p. Am J Hum Genet. 1999, 65 (5): 1401-1405.PubMed CentralPubMedView ArticleGoogle Scholar
- Pippucci T, Savoia A, Perrotta S, Pujol-Moix N, Noris P, Castegnaro G, Pecci A, Gnan C, Punzo F, Marconi C, Gherardi S, Loffredo G, De Rocco D, Scianguetta S, Barozzi S, Magini P, Bozzi V, Dezzani L, Di Stazio M, Ferraro M, Perini G, Seri M, Balduini CL: Mutations in the 5' UTR of ANKRD26, the ankirin repeat domain 26 gene, cause an autosomal-dominant form of inherited thrombocytopenia, THC2. Am J Hum Genet. 2011, 88 (1): 115-120.PubMed CentralPubMedView ArticleGoogle Scholar
- Noris P, Perrotta S, Seri M, Pecci A, Gnan C, Loffredo G, Pujol-Moix N, Zecca M, Scognamiglio F, De Rocco D, Punzo F, Melazzini F, Scianguetta S, Casale M, Marconi C, Pippucci T, Amendola G, Notarangelo LD, Klersy C, Civaschi E, Balduini CL, Savoia A: Mutations in ANKRD26 are responsible for a frequent form of inherited thrombocytopenia: analysis of 78 patients from 21 families. Blood. 2011, 117 (24): 6673-6680.PubMedView ArticleGoogle Scholar
- de Givry S, Bouchez M, Chabrier P, Milan D, Schiex T: CARHTA GENE: multipopulation integrated genetic and radiation hybrid mapping. Bioinformatics. 2005, 21 (8): 1703-1704.PubMedView ArticleGoogle Scholar
- Delcher AL, Phillippy A, Carlton J, Salzberg SL: Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002, 30 (11): 2478-2483.PubMed CentralPubMedView ArticleGoogle Scholar
- Baudet C, Lemaitre C, Dias Z, Gautier C, Tannier E, Sagot MF: Cassis: detection of genomic rearrangement breakpoints. Bioinformatics. 2010, 26 (15): 1897-1898.PubMed CentralPubMedView ArticleGoogle Scholar
- Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, Paux E, SanMiguel P, Schulman AH: A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007, 8 (12): 973-982.PubMedView ArticleGoogle Scholar
- Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE: Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 2001, 11 (6): 1005-1017.PubMed CentralPubMedView ArticleGoogle Scholar
- Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA: Circos: an information aesthetic for comparative genomics. Genome Res. 2009, 19 (9): 1639-1645.PubMed CentralPubMedView ArticleGoogle Scholar
- Lin CY, Lin FK, Lin CH, Lai LW, Hsu HJ, Chen SH, Hsiung CA: POWER: PhylOgenetic WEb Repeater--an integrated and user-optimized framework for biomolecular phylogenetic analysis. Nucleic Acids Res. 2005, 33 (Web Server issue): W553-W556.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.