Volume 15 Supplement 9
Whole genome sequence and analysis of the Marwari horse breed and its genetic origin
- JeHoon Jun†1,
- Yun Sung Cho†1, 2,
- Haejin Hu1,
- Hak-Min Kim1,
- Sungwoong Jho1,
- Priyvrat Gadhvi1,
- Kyung Mi Park3,
- Jeongheui Lim4,
- Woon Kee Paek4,
- Kyudong Han5, 6,
- Andrea Manica7,
- Jeremy S Edwards8 and
- Jong Bhak1, 2, 3
© Jun et al.; licensee BioMed Central Ltd. 2014
Published: 8 December 2014
The horse (Equus ferus caballus) is one of the earliest domesticated species and has played an important role in the development of human societies over the past 5,000 years. In this study, we characterized the genome of the Marwari horse, a rare breed with unique phenotypic characteristics, including inwardly turned ear tips. It is thought to have originated from the crossbreeding of local Indian ponies with Arabian horses beginning in the 12th century.
We generated 101 Gb (~30 × coverage) of whole genome sequences from a Marwari horse using the Illumina HiSeq2000 sequencer. The sequences were mapped to the horse reference genome at a mapping rate of ~98% and with ~95% of the genome having at least 10 × coverage. A total of 5.9 million single nucleotide variations, 0.6 million small insertions or deletions, and 2,579 copy number variation blocks were identified. We confirmed a strong Arabian and Mongolian component in the Marwari genome. Novel variants from the Marwari sequences were annotated, and were found to be enriched in olfactory functions. Additionally, we suggest a potential functional genetic variant in the TSHZ1 gene (p.Ala344>Val) associated with the inward-turning ear tip shape of the Marwari horses.
Here, we present an analysis of the Marwari horse genome. This is the first genomic data for an Asian breed, and is an invaluable resource for future studies of genetic variation associated with phenotypes and diseases in horses.
Keywords: Marwari Horse, Equus ferus caballus, Whole-genome sequencing, Genome
The horse (Equus ferus caballus) was one of the earliest domesticated species and has played numerous important roles in human societies: serving as a source of food and a means of transport, and being used for draught and agricultural work, sport, hunting, and warfare. Horse domestication is believed to have started in the western Asian steppes approximately 5,500 years ago, and quickly spread across the Eurasian continent, with herds being augmented by the recruitment of local wild horses. Domestication in the Iberian Peninsula might have represented an additional independent episode, involving horses that survived in a steppe refuge following the reforestation of Central Europe during the Holocene.
The horse reference genome has provided fundamental genomic information on the equine lineage and has been used for improving the health and performance of horses [1, 4]. Horses exhibit 214 genetic traits and/or diseases that are similar to those of humans. To date, several horse whole genomes have been sequenced and analyzed [4, 6]. In 2012, the first whole genome re-sequencing analysis was conducted on the Quarter Horse breed to identify novel genetic variants. In 2013, divergence times among horse fossils, donkey, Przewalski's horse, and several domestic horses were estimated, together with their demographic history. However, currently available whole genome sequences of modern horses only comprise western Eurasian breeds.
Over the centuries, more than 400 distinct horse breeds have been established by genetic selection for a wide range of desired phenotypic traits. The Marwari (also known as Malani) horse is a rare breed from the Marwar region of India, and is one of six distinct horse breeds of India. Marwari horses are believed to be descended from native Indian ponies, which were crossed with Arabian horses beginning around the 12th century, possibly with some Mongolian influence [8–10]. They were trained to perform complex prancing and leaping movements for ceremonial purposes [11, 12]. The Marwari population in India deteriorated in the early 1900s due to improper management of the breeding stock, and only a few thousand purebred Marwari horses remain.
Here, we report the first whole genome sequence of a male Marwari horse, representing an Asian breed, and characterize its genetic variations, including single nucleotide variations (SNVs), small insertions/deletions (indels), and copy number variations (CNVs). To investigate relationships among different horse breeds, we carried out a genome-wide comparative analysis using previously reported whole genome sequences of six western Eurasian breeds [4, 6], and single nucleotide polymorphism (SNP) array data of 729 horses from 32 worldwide breeds. Our results provide insights into the genetic background and origin of the Marwari horse, and identify genotypes associated with Marwari-specific phenotypes.
Results and discussion
Whole genome sequencing and variation detection
Genomic DNA was obtained from a blood sample of a male Marwari horse (17 years old) and was sequenced using an Illumina HiSeq2000 sequencer. A total of 112 Gb of paired-end sequence data were produced with a read length of 100 bp and insert sizes of 456 and 462 bp from two genomic libraries (Additional file 2: Figure S1, Figure S2). A total of 1,013,642,417 reads remained after filtering, and 993,802,097 reads were mapped to the horse reference genome (EquCab2.0 from the Ensembl database) with a mapping rate of 98.04% (Additional file 2: Figure S3, Figure S4). A total of 133,091,136 reads were identified as duplicates and were removed from further analyses (Additional file 1: Table S1). To enhance the mapping quality, we applied the IndelRealigner algorithm to the de-duplicated reads. A total of 44,835,563 (5.2%) reads were realigned, and the average mapping quality increased from 53.11 to 53.16 (from 29.33 to 43.32 in the realigned reads). The whole genome sequences covered 95.6% of the reference genome at 10 × or greater depth.
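The read counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the variable names are illustrative, the realigned fraction is assumed to be taken over mapped reads minus duplicates (the text says realignment was applied to the de-duplicated reads), and the data volume assumes every filtered read is the full 100 bp.

```python
# Read counts as reported in the text.
filtered_reads = 1_013_642_417
mapped_reads = 993_802_097
duplicate_reads = 133_091_136
realigned_reads = 44_835_563
read_length_bp = 100  # assumption: filtered reads keep their full length


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Mapping rate: mapped reads over reads that passed filtering.
mapping_rate = percent(mapped_reads, filtered_reads)

# Assumed denominator for the realigned fraction: de-duplicated mapped reads.
dedup_reads = mapped_reads - duplicate_reads
realigned_pct = percent(realigned_reads, dedup_reads)

# Approximate data volume after filtering, in gigabases.
filtered_gb = filtered_reads * read_length_bp / 1e9

print(f"mapping rate: {mapping_rate:.2f}%")
print(f"realigned reads: {realigned_pct:.1f}%")
print(f"filtered data: {filtered_gb:.0f} Gb")
```

Under these assumptions the filtered data volume comes to about 101 Gb, which matches the figure quoted in the abstract, while the 112 Gb figure refers to the data before filtering.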
To identify novel genomic sequences, we performed a de novo assembly using the reads (1.8 Gb) that did not map to the horse reference genome. A total of 120,159 contigs (24,781,670 bases in total, with a contig N50 of 227 bp) were assembled. After mapping the contigs to the reference genome, we found that 25,614 contigs (4,855,119 bases in total, with a contig N50 of 196 bp) did not match the reference sequences, indicating that they may be novel regions specific to the Marwari horse breed (Additional file 1: Table S2). To identify the biological functions of these novel regions, the unmatched contigs were further analyzed by BLAST searches against the NCBI protein database. However, none of the contigs significantly matched the known protein database (Additional file 2: Figure S5).
Table 1. Variants in the Marwari horse genome (table body not recovered; surviving row labels: noncoding exon variant, indels in coding region).
The Marwari genome consisted of 2,383,702 (40.2%) homozygous and 3,539,864 (59.8%) heterozygous SNVs (Table 1). Among them, 18,195 were found to be nonsynonymous SNVs (nsSNVs). When the Marwari variants were compared to those previously reported from the genomes of other breeds [4, 6] and the horse SNP database from the Broad Institute, 1,577,725 SNVs and 249,609 indels were novel variants. Of these, 4,716 variants (4,413 nsSNVs and 303 indels in coding regions) represented amino acid changes, which were found in 2,770 genes (2,584 genes with nsSNVs, 279 genes with indels in coding regions, and 93 genes with both nsSNVs and indels in coding regions). To annotate the variants using well-known functional databases, human orthologs were retrieved from the Ensembl BioMart utility. A total of 1,970 of the 2,770 genes had human orthologs, and 1,896 genes were annotated using the DAVID Bioinformatics Resource 6.7. The genes with nsSNVs and/or indels in coding regions were highly enriched in olfactory functions (Additional file 1: Tables S5 and S6).
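The counts in this paragraph are mutually consistent, which can be verified with a short calculation. The sketch below is illustrative only (the variable names are not from the original study); it recomputes the homozygous fraction and applies inclusion-exclusion to the gene counts, since genes carrying both variant types would otherwise be counted twice.

```python
# Counts as reported in the text.
homozygous_snvs = 2_383_702
heterozygous_snvs = 3_539_864
novel_nssnvs = 4_413
novel_coding_indels = 303
genes_with_nssnv = 2_584
genes_with_coding_indel = 279
genes_with_both = 93

# Total SNVs and the homozygous share.
total_snvs = homozygous_snvs + heterozygous_snvs
homozygous_pct = 100.0 * homozygous_snvs / total_snvs

# Novel variants that change the amino acid sequence.
amino_acid_changing = novel_nssnvs + novel_coding_indels

# Inclusion-exclusion: subtract genes counted in both categories.
affected_genes = genes_with_nssnv + genes_with_coding_indel - genes_with_both

print(total_snvs, round(homozygous_pct, 1), amino_acid_changing, affected_genes)
```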
Copy number variations (CNVs) were identified using the R package ReadDepth. A total of 2,579 CNVs, including 869 gain and 1,710 loss blocks, were identified in the Marwari genome. The sizes ranged from 3 Kb to 6.43 Mb with an average length of 56 Kb. The CNV region (140 Mb in length) contained 2,504 genes which were duplicated (1,138 genes) or deleted (1,366 genes) (Additional file 1: Table S7). From the functional enrichment analysis, we found that the duplicated genes were enriched in olfactory function, whereas the deleted genes were enriched in immune regulation and metabolic processes (Additional file 1: Table S8, Table S9, Table S10, and Table S11).
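Calling a block a gain or a loss amounts to comparing its estimated copy number with two thresholds; the Methods report 2.642 for gain and 1.380 for loss. The sketch below illustrates that comparison only and is not the ReadDepth implementation: the function names and example segments are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
from collections import Counter

GAIN_THRESHOLD = 2.642  # copy number gain threshold reported in the Methods
LOSS_THRESHOLD = 1.380  # copy number loss threshold reported in the Methods


def classify_segment(copy_number):
    """Label one segment by its estimated copy number (inclusive bounds assumed)."""
    if copy_number >= GAIN_THRESHOLD:
        return "gain"
    if copy_number <= LOSS_THRESHOLD:
        return "loss"
    return "neutral"


def summarize(segments):
    """Count gain, loss, and neutral segments."""
    return Counter(classify_segment(cn) for cn in segments)


# Hypothetical copy number estimates, for illustration only.
example_segments = [3.1, 2.0, 0.9, 2.7, 1.2, 1.9]
print(summarize(example_segments))
```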
Relatedness to other horse breeds
We constructed a phylogenetic tree using SNVs found in the whole genome data of the seven horse breeds (Arabian, Icelandic, Marwari, Norwegian Fjord, Quarter, Standardbred, and Thoroughbred) [4, 6]. We identified 11,377,736 nucleotide positions that were commonly found in the seven horse genomes. A total of 25,854 nucleotide positions were used for phylogenetic analysis after filtering for minor allele frequency (MAF), genotyping rate, and linkage disequilibrium (LD). We found that the Marwari horse is most closely related to the Arabian breed (Additional file 2: Figure S6), while the Icelandic horse and Norwegian Fjord were the most distinct from the other breeds, all of which are known to descend from Arabian horses [17, 18].
Phenotype association of the identified variants
Table: Genetic variants for known traits and diseases (table body not recovered). Traits listed: leopard complex spotting and congenital stationary night blindness; hereditary equine regional dermal asthenia; lavender foal syndrome; chestnut coat color; tobiano spotting pattern; large body size; junctional epidermolysis bullosa; silver coat color; severe combined immunodeficiency; polysaccharide storage myopathy; equine hyperkalemic periodic paralysis; lethal white foal syndrome; optimum racing performance; cream coat color; black and bay color; pattern of locomotion (altered gait); gray coat color. Surviving variant entries: Del 1 bp, Del 5 bp, Del 11 bp.
Selection in the equid lineage
We assessed the signatures of selection in the equid lineage using the dN/dS (nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site) ratio. A consensus horse (equid) sequence was constructed by integrating all of the available breed genomes (Arabian, Icelandic, Marwari, Norwegian Fjord, Quarter, Standardbred, and Thoroughbred) in an attempt to remove breed specificity and to include an Asian breed component via the central Asian heritage of the Marwari (in contrast to the western Eurasian breeds for which whole genomes had been previously sequenced). A total of 7,711 out of 22,305 genes in the horse reference genome were substituted by the consensus sequences. Using the protein sequences of seven non-horse genomes (camel, pig, cow, minke whale, dog, mouse, and human), 5,459 orthologous gene families were constructed using OrthoMCL. Using alignments of these gene families to estimate dN/dS, we identified 188 genes under selection in the horse genome (Additional file 1: Table S13). The selected genes were particularly enriched in immune response (immune effector process, leukocyte mediated immunity, positive regulation of immune system process, and defense response) and possible motor ability (T-tubule, muscle contraction, and regulation of heart contraction) functions (Additional file 1: Table S14). Over evolutionary time, the horse has developed increased speed and its musculature has become specialized for efficient strides [58, 59]. It is therefore possible that the motor activity-associated genes we identified to be under positive selection have contributed to the muscular efficiency seen in modern horses.
Our study provides the first whole genome sequences and analyses of the Marwari, an Asian horse breed. Comparing the Marwari genome to the horse reference genome, approximately 5.9 million SNVs and 0.6 million indels, including 4,716 variants that cause amino acid changes, were identified. We found a clear Arabian and Mongolian component in the Marwari genome, although further work is needed to confirm whether modern Marwari horses also descended from Indian ponies. We analyzed the Marwari variants and found a candidate SNV determining its characteristic inward-turning ear tips. Additionally, we investigated selection in the horse genome through comparisons with other mammalian genomes. By creating a consensus sequence that included information on an Asian breed, we found a number of genetic signatures of selection, providing insights into possible evolutionary and environmental adaptations in the equid lineage. The whole genome sequencing data from the Marwari horse provides a rich and diverse genomic resource that can be used to improve our understanding of animal domestication and will likely be useful in future studies of phenotypes and disease.
Sample preparation and whole genome sequencing
Genomic DNA was extracted from the blood of a 17 year old male Marwari horse with the XcelGene Blood gDNA Mini Kit (Xcelris Labs Ltd, Gujarat, India) following the manufacturer's protocol. Two genomic libraries with insert sizes of 456 and 462 bp were constructed at Theragen BiO Institute (TBI), TheragenEtex, Korea. The genomic DNA was sheared using Covaris S series (Covaris, MS, USA). The sheared DNA was end-repaired, A-tailed, and ligated to paired-end adapters, according to the manufacturer's protocol (Truseq DNA Sample Prep Kit v2, Illumina, San Diego, CA, USA). Adapter-ligated fragments were then size selected on a 2% Agarose gel, and the 520-620 bp band was extracted. Gel extraction and column purification were performed using the MinElute Gel Extraction Kit (Qiagen, CA, USA) following the manufacturer's protocol. The ligated DNA fragments containing adapter sequences were enhanced via PCR using adapter-specific primers. Library quality and concentration were determined using an Agilent 2100 BioAnalyzer (Agilent, CA, USA). The libraries were quantified using a KAPA library quantification kit (Kapa Biosystems, MA, USA) according to Illumina's library quantification protocol. Based on the qPCR quantification, the libraries were normalized to 2 nM and denatured using 0.1 N NaOH. Cluster amplification of denatured templates was performed in flow cells according to the manufacturer's protocol (Illumina, CA, USA). Flow cells were paired-end sequenced (2 × 100 bp) on an Illumina HiSeq2000 using HiSeq Sequencing kits. A base-calling pipeline (Sequencing Control Software, Illumina) was used to process the raw fluorescent images and the called sequences.
Filtering and mapping processes
Before the mapping step, raw reads were filtered using NGS QC toolkit version 2.3 (cutoff read length for high quality, 70%; cutoff quality score, 20). After the filtering step, clean reads were mapped to the horse reference genome (Ensembl EquCab2.0, release 72) with BWA version 0.7.5a, using a minimum seed length of 15 (-k 15) and marking shorter split hits as secondary (-M). We realigned the reads using the GATK IndelRealigner algorithm to enhance the mapping quality, and marked duplicate reads using MarkDuplicates from picard-tools version 1.92 (http://broadinstitute.github.io/picard/).
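The read filter described above can be read as follows: a read is kept when at least 70% of its bases have a quality score of 20 or higher. The sketch below illustrates that rule under stated assumptions and is not the NGS QC toolkit code: the function name is hypothetical, qualities are assumed to be Phred+33 encoded, and the toolkit's exact handling of edge cases may differ.

```python
def passes_quality_filter(quality_string, min_quality=20, min_fraction=0.70,
                          phred_offset=33):
    """Return True if enough bases in the read reach the quality cutoff.

    quality_string is the FASTQ quality line; Phred+33 encoding is assumed.
    """
    if not quality_string:
        return False  # an empty read cannot pass
    high_quality = sum(
        1 for ch in quality_string if ord(ch) - phred_offset >= min_quality
    )
    return high_quality / len(quality_string) >= min_fraction


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
print(passes_quality_filter("I" * 7 + "#" * 3))  # 70% high-quality bases
print(passes_quality_filter("I" * 6 + "#" * 4))  # 60% high-quality bases
```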
De novo assembly of unmapped reads
We extracted unmapped sequences from the aligned Marwari BAM files. To find Marwari-specific genomic regions, we assembled the unmapped reads using SOAPdenovo2 in "all" mode with multiple K values (ranging from 23 to 63). A total of 120,159 contigs were obtained, with an N50 length of 227 bp. To determine whether these contigs lie in non-reference regions, we aligned them to the horse reference genome. A total of 25,614 contigs did not align to the reference genome. The non-reference sequences were further analyzed by BLAST against the NCBI protein and DNA sequence databases with the criteria E ≤ 10^-5 and identity ≥ 70%.
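The N50 statistic quoted for the assembly is the contig length at which contigs of that length or longer contain at least half of all assembled bases. A minimal sketch of the standard calculation follows; the function name and the toy contig lengths are illustrative and not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half the total is the N50.
        if running >= half_total:
            return length


# Toy example: total is 54, and the three longest contigs (10 + 9 + 8) reach 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```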
Variant detection and annotation
Putative variant calls were made using the SAMtools version 0.1.16 mpileup command. In this step, we used the -E option to minimize the noise resulting from pairwise read alignments, and the -A option to include read pairs regardless of insert size constraints and/or orientation within pairs. Variants were called using bcftools and then filtered using vcfutils varFilter (minimal depth of 8, maximal depth of 100, Phred scores of SNP call ≥ 30, and no indel present within a 2 bp window) as previously reported. SnpEff was used to annotate the variants. To find variants unique to the Marwari horse, SNVs and small indels were further compared with the horse SNP database that was identified by the Horse Genome Project (http://www.broadinstitute.org/mammals/horse/snp), and with other previously reported horse breed genomes [4, 6]. Copy number variants based on differences in sequencing depth were detected using the R package ReadDepth with default options. ReadDepth calculated the thresholds for copy number gain (2.642) and loss (1.380).
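The filtering criteria listed above (depth between 8 and 100, SNP quality of at least 30, and no indel within a 2 bp window) can be illustrated with a small stand-alone filter. This is a simplified sketch, not the vcfutils varFilter code: SNPs are represented as hypothetical (position, depth, quality) tuples on a single chromosome, and indels are reduced to single positions rather than spans.

```python
from bisect import bisect_left


def near_indel(position, sorted_indel_positions, window=2):
    """Return True if any indel position lies within `window` bp of `position`."""
    i = bisect_left(sorted_indel_positions, position - window)
    return (i < len(sorted_indel_positions)
            and sorted_indel_positions[i] <= position + window)


def filter_snps(snps, indel_positions, min_depth=8, max_depth=100,
                min_qual=30.0, window=2):
    """Keep SNPs that pass the depth, quality, and indel-proximity criteria."""
    indels = sorted(indel_positions)
    kept = []
    for position, depth, qual in snps:
        if not (min_depth <= depth <= max_depth):
            continue  # too little or too much coverage
        if qual < min_qual:
            continue  # low-confidence call
        if near_indel(position, indels, window):
            continue  # possible alignment artifact next to an indel
        kept.append((position, depth, qual))
    return kept


# Hypothetical calls: only the first passes every criterion.
example_snps = [(100, 30, 50.0), (200, 5, 50.0), (300, 150, 50.0),
                (400, 30, 10.0), (500, 30, 50.0)]
print(filter_snps(example_snps, indel_positions=[501]))
```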
Phylogenetic tree construction
Genotype data were extracted from a total of 11,377,736 single nucleotide positions, which were shared and sufficiently covered (> 8 × depth) in the whole genome data of the seven horse breeds (Arabian, Icelandic, Marwari, Norwegian Fjord, Quarter, Standardbred, and Thoroughbred) [4, 6]. The genotyping data were merged, and then filtered to remove SNPs with a genotyping rate of < 0.05 and allele frequency > 0.2 using PLINK. SNPs that were in linkage disequilibrium (LD) were also removed: the merged files were pruned for r² < 0.1 in PLINK, considering 100 SNP windows and moving 25 SNPs per set (--indep-pairwise 100 25). After the filtering and pruning process, 25,854 SNPs remained and were used for the phylogenetic analysis. RAxML version 7.28-ALPHA was used to generate the parsimony starting trees, and RAxML-Light version 1.0.9 was used to carry out tree inference with the GTRGAMMA model of nucleotide substitutions. A total of 100 bootstrap trees were generated for each phylogeny. The resulting tree was drawn using MEGA6.
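LD pruning removes SNPs whose genotypes are strongly correlated with those of nearby SNPs. The sketch below illustrates the idea with a squared Pearson correlation on genotype dosages (0, 1, 2) and a simple greedy pass over a 100 SNP window with a 0.1 cutoff. It is a simplification and does not reproduce PLINK's windowed procedure; the function names and toy genotypes are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no correlation signal
    return cov * cov / (var_x * var_y)


def greedy_ld_prune(genotypes, window=100, threshold=0.1):
    """Return indices of SNPs kept after a simplified greedy LD pruning pass.

    A SNP is kept only if its r^2 with every previously kept SNP that lies
    within `window` SNPs of it is at or below `threshold`.
    """
    kept = []
    for index, snp in enumerate(genotypes):
        recent = [k for k in kept if index - k < window]
        if all(r_squared(snp, genotypes[k]) <= threshold for k in recent):
            kept.append(index)
    return kept


# Toy dosages for seven individuals; SNP 1 duplicates SNP 0 and is pruned.
toy_genotypes = [
    [0, 1, 2, 0, 1, 2, 0],
    [0, 1, 2, 0, 1, 2, 0],
    [2, 0, 1, 0, 2, 1, 0],
]
print(greedy_ld_prune(toy_genotypes))
```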
MDS and population structure analyses
Equine SNP array data of 729 individuals belonging to 32 horse breeds were obtained from a previous report. The Marwari horse data used in this analysis were selected from 54,330 nucleotide positions that were derived from the SNP array data. The SNP array and Marwari data were filtered and pruned to remove SNPs with the same cutoffs described above, except that the MAF option was set to --maf < 0.05. A total of 10,554 single nucleotide positions were used for the following comparative analyses.
The MDS plot was drawn in R using the "MASS" library and the "canberra" distance metric. STRUCTURE version 2.3.4 [23, 24] was used to cluster Asian breeds based on genetic similarity, investigating K values from 2 to 5. Each run for a given K value consisted of a 15,000-step burn-in and 35,000 MCMC repetitions. We applied the default admixture model and the default assumption that allele frequencies were correlated. The convergence of STRUCTURE runs was evaluated by the equilibrium of alpha. Individual and population clump files were produced with Structure Harvester and visualized in Distruct1.1.
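The Canberra distance sums, over all markers, the absolute difference between two individuals divided by the sum of their absolute values. The sketch below shows that calculation on genotype dosages and builds the pairwise matrix that an MDS routine would take as input. It is illustrative only: the function names and toy data are hypothetical, and terms with a zero denominator are simply skipped, which may differ from how the R code used in the study handles them.

```python
def canberra(x, y):
    """Canberra distance between two equal-length numeric vectors."""
    total = 0.0
    for a, b in zip(x, y):
        denominator = abs(a) + abs(b)
        if denominator:  # skip 0/0 terms (assumption, see lead-in)
            total += abs(a - b) / denominator
    return total


def distance_matrix(samples):
    """Symmetric matrix of pairwise Canberra distances."""
    return [[canberra(a, b) for b in samples] for a in samples]


# Toy genotype dosages for three individuals at three markers.
toy_samples = [[0, 1, 2], [2, 1, 0], [0, 1, 2]]
for row in distance_matrix(toy_samples):
    print(row)
```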
Orthologous gene family
Protein sequences of cow (Bos taurus), dog (Canis familiaris), human (Homo sapiens), mouse (Mus musculus), and pig (Sus scrofa) were downloaded from the Ensembl database version 72. Protein sequences of minke whale (Balaenoptera acutorostrata) and camel (Camelus bactrianus) were obtained from the original publications. A total of eight species, including horse, were used to identify orthologous gene clusters with OrthoMCL 2.0.9. Pairwise sequence similarities between all protein sequences were calculated using BLASTP with an e-value cutoff of 1E-05. On the basis of the BLASTP results, OrthoMCL was used to perform Markov clustering with an inflation value (-I) of 1.5. OrthoMCL was run with an e-value exponent cutoff of -5 and a percent match cutoff of 50%. In total, 5,501 orthologous groups were shared by all eight species. The representative sequences for each gene cluster were selected using the longest horse transcript and the corresponding protein sequences of the other species. BLASTP searches (E-value cutoff of 1E-5) between horse and all the other species were used in this process. Finally, we identified 5,459 1:1:1:1:1:1:1:1 orthologs.
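A 1:1:1:1:1:1:1:1 ortholog group is one in which each of the eight species contributes exactly one gene. The sketch below shows how such groups could be selected from clustering output; it is an illustration rather than the pipeline used in the study, and the group representation, species labels, and gene identifiers are all hypothetical.

```python
from collections import Counter


def single_copy_groups(groups, species):
    """Return the groups that contain exactly one gene from every species.

    `groups` maps a group ID to a list of (species, gene_id) pairs.
    """
    required = set(species)
    selected = {}
    for group_id, members in groups.items():
        counts = Counter(sp for sp, _gene in members)
        if set(counts) == required and all(n == 1 for n in counts.values()):
            selected[group_id] = members
    return selected


species = ["horse", "cow", "dog", "human", "mouse", "pig", "minke_whale", "camel"]

# Toy clusters: OG1 is single-copy, OG2 has a cow paralog, OG3 lacks camel.
toy_groups = {
    "OG1": [(sp, f"{sp}_g1") for sp in species],
    "OG2": [(sp, f"{sp}_g2") for sp in species] + [("cow", "cow_g2b")],
    "OG3": [(sp, f"{sp}_g3") for sp in species if sp != "camel"],
}
print(sorted(single_copy_groups(toy_groups, species)))
```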
Molecular evolutionary analysis
The phylogenetic tree was constructed from 5,459 single copy ortholog genes. CODEML in PAML 4.5 was used to estimate the dN/dS ratio, where dN indicates the nonsynonymous substitution rate and dS indicates the synonymous substitution rate. The dN/dS ratio along the horse branch (free-ratio model) and the dN/dS ratio for all branches (one-ratio model) were calculated as the branch model. We also applied the branch-site model to further examine potential positive selection. Likelihood ratio tests (LRTs) were applied to assess the statistical significance of the branch-site model. We considered a gene to be positively selected if it had a higher dN/dS ratio under the free-ratio model than under the one-ratio model and a p-value < 0.05 from the branch-site model.
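The selection criterion combines a comparison of dN/dS ratios with a likelihood ratio test. The sketch below shows one common way to compute such a test: the statistic is twice the difference in log-likelihood between the alternative and null models, compared against a chi-square distribution. The use of one degree of freedom is an assumption, since the text does not state the null distribution, and the function names and example values are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test assuming a chi-square null with 1 degree of freedom.

    Returns the test statistic and its p-value.
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def is_candidate(omega_free, omega_one, p_value, alpha=0.05):
    """Apply the two criteria described in the text.

    omega_free: dN/dS on the horse branch under the free-ratio model.
    omega_one: dN/dS under the one-ratio model.
    """
    return omega_free > omega_one and p_value < alpha


# Hypothetical log-likelihoods and dN/dS values for a single gene.
statistic, p_value = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-998.0)
print(statistic, round(p_value, 4), is_candidate(0.8, 0.2, p_value))
```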
Availability of supporting data
Whole genome sequence data were deposited in the SRA database at NCBI under BioSample accession number SAMN02767683. The whole genome sequencing data can be accessed via SRA accession number SRX535352 and through BioProject accession number PRJNA246445.
This work was supported by the National Research Foundation of Korea (2008-2004707 and 2013M3A9A5047052), the Industrial Strategic Technology Development Program, 10040231, 'Bioinformatics platform development for next generation bioinformation analysis' funded by the Ministry of Knowledge Economy (MKE, Korea), and the National Science Museum (NMK, Korea). The Equestrian Club of Gujarat and its office-bearer Mr. Virendra Kankariya provided samples of a registered Marwari horse for the study. We also thank Xcelris Genomics, Ahmedabad, India, and Dr Surendra Chikara, Ms. Arpita Ghosh, and the technical team of Xcelris for their work in DNA extraction, library preparation and shipment to GRF, South Korea. The authors thank TheragenEtex for supporting the research by providing GRF with computational and experimental resources for the NGS analyses. Publication costs were supported by the National Science Museum (NMK, Korea).
This article has been published as part of BMC Genomics Volume 15 Supplement 9, 2014: Thirteenth International Conference on Bioinformatics (InCoB2014): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/15/S9.
- Wade CM, Giulotto E, Sigurdsson S, Zoli M, Gnerre S, Imsland F, Lear TL, Adelson DL, Bailey E, Bellone RR, Blocker H, Distl O, Edgar RC, Garber M, Leeb T, Mauceli E, MacLeod JN, Penedo MC, Raison JM, Sharpe T, Vogel J, Andersson L, Antczak DF, Biagi T, Binns MM, Chowdhary BP, Coleman SJ, Della Valle G, Fryc S, Guerin G, et al: Genome sequence, comparative analysis, and population genetics of the domestic horse. Science. 2009, 326: 865-867. 10.1126/science.1178158.
- Warmuth V, Eriksson A, Bower MA, Barker G, Barrett E, Hanks BK, Li S, Lomitashvili D, Ochir-Goryaeva M, Sizonov GV, Soyonov V, Manica A: Reconstructing the origin and spread of horse domestication in the Eurasian steppe. Proc Natl Acad Sci USA. 2012, 109: 8202-8206. 10.1073/pnas.1111122109.
- Warmuth V, Eriksson A, Bower MA, Cañon J, Cothran G, Distl O, Glowatzki-Mullis ML, Hunt H, Luís C, do Mar Oom M, Yupanqui IT, Ząbek T, Manica A: European Domestic Horses Originated in Two Holocene Refugia. PLoS One. 2011, 6: e18194. 10.1371/journal.pone.0018194.
- Doan R, Cohen ND, Sawyer J, Ghaffari N, Johnson CD, Dindot SV: Whole-Genome Sequencing and Genetic Variant Analysis of a Quarter Horse Mare. BMC Genomics. 2012, 13: 78. 10.1186/1471-2164-13-78.
- Online Mendelian Inheritance in Animals: [http://omia.angis.org.au/home]
- Orlando L, Ginolhac A, Zhang G, Froese D, Albrechtsen A, Stiller M, Schubert M, Cappellini E, Petersen B, Moltke I, Johnson PL, Fumagalli M, Vilstrup JT, Raghavan M, Korneliussen T, Malaspinas AS, Vogt J, Szklarczyk D, Kelstrup CD, Vinther J, Dolocan A, Stenderup J, Velazquez AM, Cahill J, Rasmussen M, Wang X, Min J, Zazula GD, Seguin-Orlando A, Mortensen C, et al: Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature. 2013, 499: 74-81. 10.1038/nature12323.
- Hendricks B: International Encyclopedia of Horse Breeds. 1995, Norman: University of Oklahoma Press.
- Elwyn Hartley Edwards: The Encyclopedia of the Horse. 1994, New York: Dorling Kindersley.
- Wendy Doniger: The Hindus: An Alternative History. 2009, New Delhi: Penguin Books.
- Behl R, Behl J, Gupta N, Gupta SC: Genetic relationships of five Indian horse breeds using microsatellite markers. Animal. 2007, 4: 483-488.
- Dutson Judith: Storey's Illustrated Guide to 96 Horse Breeds of North America. 2005, North Adams: Storey Publishing.
- Gupta AK, Chauhan M, Tandon SN, Sonia: Genetic diversity and bottleneck studies in the Marwari horse breed. J Genet. 2005, 84: 295-301. 10.1007/BF02715799.
- Petersen JL, Mickelson JR, Rendahl AK, Valberg SJ, Andersson LS, Axelsson J, Bailey E, Bannasch D, Binns MM, Borges AS, Brama P, da Câmara Machado A, Capomaccio S, Cappelli K, Cothran EG, Distl O, Fox-Clipsham L, Graves KT, Guérin G, Haase B, Hasegawa T, Hemmann K, Hill EW, Leeb T, Lindgren G, Lohi H, Lopes MS, McGivney BA, Mikko S, Orr N, et al: Genome-wide analysis reveals selection for important traits in domestic horse breeds. PLoS Genet. 2013, 9: e1003211. 10.1371/journal.pgen.1003211.
- IBM Corp: IBM SPSS Statistics for Windows, Version 22.0. 2013, NY: IBM.
- Huang da W, Sherman BT, Zheng X, Yang J, Imamichi T, Stephens R, Lempicki RA: Extracting biological meaning from large gene lists with DAVID. Curr Protoc Bioinformatics. 2009, Chapter 13: Unit 13.11.
- Miller CA, Hampton O, Coarfa C, Milosavljevic A: ReadDepth: a parallel R package for detecting copy number alterations from short sequencing reads. PLoS One. 2011, 6: e16327. 10.1371/journal.pone.0016327.
- John F, Wall: Famous Running Horses: Their Forebears and Descendants. 2013, Whitefish: Literary Licensing.
- Robert Moorman Denhardt: The Quarter Horse Running: America's Oldest Breed. 2003, Norman: University of Oklahoma Press.
- Llamas: This is the Spanish Horse. 1999, London: J A Allen & Co Ltd.
- Milner: Godolphin Arabian: Story of the Matchem Line. 1990, London: J. A. Allen.
- Breed of Livestock: [http://www.ansi.okstate.edu/breeds/horses/]
- International Museum of the HORSE: [http://www.imh.org/exhibits/online/breeds-of-the-world]
- Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003, 164: 1567-1587.
- Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000, 155: 945-959.
- Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR: A method and server for predicting damaging missense mutations. Nat Methods. 2010, 7: 248-249. 10.1038/nmeth0410-248.
- Altmann F: Congenital atresia of the ear in man and animals. Ann Otol Rhinol Laryngol. 1955, 64: 824-858. 10.1177/000348945506400313.
- Yellon RF, Branstetter BF 4th: Prospective blinded study of computed tomography in congenital aural atresia. Int J Pediatr Otorhinolaryngol. 2010, 74: 1286-1291. 10.1016/j.ijporl.2010.08.006.
- Coré N, Caubit X, Metchat A, Boned A, Djabali M, Fasano L: Tshz1 is required for axial skeleton, soft palate and middle ear development in mice. Dev Biol. 2007, 308: 407-420. 10.1016/j.ydbio.2007.05.038.
- Hill EW, Gu J, McGivney BA, MacHugh DE: Targets of selection in the Thoroughbred genome contain exercise-relevant gene SNPs associated with elite racecourse performance. Anim Genet. 2010, 41: 56-63.
- Bellone RR, Forsyth G, Leeb T, Archer S, Sigurdsson S, Imsland F, Mauceli E, Engensteiner M, Bailey E, Sandmeyer L, Grahn B, Lindblad-Toh K, Wade CM: Fine-mapping and mutation analysis of TRPM1: a candidate gene for leopard complex (LP) spotting and congenital stationary night blindness in horses. Brief Funct Genomics. 2010, 9: 193-207. 10.1093/bfgp/elq002.
- Tryon RC, White SD, Bannasch DL: Homozygosity mapping approach identifies a missense mutation in equine cyclophilin B (PPIB) associated with HERDA in the American Quarter Horse. Genomics. 2007, 90: 93-102. 10.1016/j.ygeno.2007.03.009.
- Brooks SA, Gabreski N, Miller D, Brisbin A, Brown HE, Streeter C, Mezey J, Cook D, Antczak DF: Whole-genome SNP association in the horse: identification of a deletion in myosin Va responsible for Lavender Foal Syndrome. PLoS Genet. 2010, 6: e1000909. 10.1371/journal.pgen.1000909.
- Marklund L, Moller MJ, Sandberg K, Andersson L: A missense mutation in the gene for melanocyte-stimulating hormone receptor (MC1R) is associated with the chestnut coat color in horses. Mamm Genome. 1996, 7: 895-899. 10.1007/s003359900264.
- Wagner HJ, Reissmann M: New polymorphism detected in the horse MC1R gene. Anim Genet. 2000, 31: 289-290. 10.1046/j.1365-2052.2000.00655.x.PubMedView ArticleGoogle Scholar
- Brooks SA, Bailey E: Exon skipping in the KIT gene causes a Sabino spotting pattern in horses. Mamm Genome. 2005, 16: 893-902. 10.1007/s00335-005-2472-y.PubMedView ArticleGoogle Scholar
- Brooks SA, Lear TL, Adelson DL, Bailey E: A chromosome inversion near the KIT gene and the Tobiano spotting pattern in horses. Cytogenet Genome Res. 2007, 119: 225-230. 10.1159/000112065.PubMedView ArticleGoogle Scholar
- Makvandi-Nejad S, Hoffman GE, Allen JJ, Chu E, Gu E, Chandler AM, Loredo AI, Bellone RR, Mezey JG, Brooks SA, Sutter NB: Four loci explain 83% of size variation in the horse. PLoS One. 2012, 7: e39929-10.1371/journal.pone.0039929.PubMedPubMed CentralView ArticleGoogle Scholar
- Signer-Hasler H, Flury C, Haase B, Burger D, Simianer H, Leeb T, Rieder S: A genome-wide association study reveals loci influencing height and other conformation traits in horses. PLoS One. 2012, 7: e37282-10.1371/journal.pone.0037282.PubMedPubMed CentralView ArticleGoogle Scholar
- Spirito F, Charlesworth A, Linder K, Ortonne JP, Baird J, Meneguzzi G: Animal models for skin blistering conditions: absence of laminin 5 causes hereditary junctional mechanobullous disease in the Belgian horse. J Invest Dermatol. 2002, 119: 684-691. 10.1046/j.1523-1747.2002.01852.x.
- Brunberg E, Andersson L, Cothran G, Sandberg K, Mikko S, Lindgren G: A missense mutation in PMEL17 is associated with the Silver coat color in the horse. BMC Genet. 2006, 7: 46.
- Graves KT, Henney PJ, Ennis RB: Partial deletion of the LAMA3 gene is responsible for hereditary junctional epidermolysis bullosa in the American Saddlebred Horse. Anim Genet. 2009, 40: 35-41. 10.1111/j.1365-2052.2008.01795.x.
- Shin EK, Perryman LE, Meek K: A kinase-negative mutation of DNA-PK(CS) in equine SCID results in defective coding and signal joint formation. J Immunol. 1997, 158: 3565-3569.
- Aleman M, Riehl J, Aldridge BM, Lecouteur RA, Stott JL, Pessah IN: Association of a mutation in the ryanodine receptor 1 gene with equine malignant hyperthermia. Muscle Nerve. 2004, 30: 356-365. 10.1002/mus.20084.
- Gu J, MacHugh DE, McGivney BA, Park SD, Katz LM, Hill EW: Association of sequence variants in CKM (creatine kinase, muscle) and COX4I2 (cytochrome c oxidase, subunit 4, isoform 2) genes with racing performance in Thoroughbred horses. Equine Vet J. 2010, 42: 569-575.
- McCue ME, Valberg SJ, Miller MB, Wade C, DiMauro S, Akman HO, Mickelson JR: Glycogen synthase (GYS1) mutation causes a novel skeletal muscle glycogenosis. Genomics. 2008, 91: 458-466. 10.1016/j.ygeno.2008.01.011.
- Cannon SC, Hayward LJ, Beech J, Brown RH: Sodium channel inactivation is impaired in equine hyperkalemic periodic paralysis. J Neurophysiol. 1995, 73: 1892-1899.
- Orr N, Back W, Gu J, Leegwater P, Govindarajan P, Conroy J, Ducro B, Van Arendonk JA, MacHugh DE, Ennis S, Hill EW, Brama PA: Genome-wide SNP association-based localization of a dwarfism gene in Friesian dwarf horses. Anim Genet. 2010, 41: 2-7.
- Cook D, Brooks S, Bellone R, Bailey E: Missense mutation in exon 2 of SLC36A1 responsible for champagne dilution in horses. PLoS Genet. 2008, 4: e1000195. 10.1371/journal.pgen.1000195.
- Hansen M, Knorr C, Hall AJ, Broad TE, Brenig B: Sequence analysis of the equine SLC26A2 gene locus on chromosome 14q15→q21. Cytogenet Genome Res. 2007, 118: 55-62. 10.1159/000106441.
- Yang GC, Croaker D, Zhang AL, Manglick P, Cartmill T, Cass D: A dinucleotide mutation in the endothelin-B receptor gene is associated with lethal white foal syndrome (LWFS); a horse variant of Hirschsprung disease. Hum Mol Genet. 1998, 7: 1047-1052. 10.1093/hmg/7.6.1047.
- Hill EW, McGivney BA, Gu J, Whiston R, Machugh DE: A genome-wide SNP association study confirms a sequence variant (g.66493737C > T) in the equine myostatin (MSTN) gene as the most powerful predictor of optimum racing distance for Thoroughbred racehorses. BMC Genomics. 2010, 11: 552. 10.1186/1471-2164-11-552.
- Mariat D, Taourit S, Guérin G: A mutation in the MATP gene causes the cream coat colour in the horse. Genet Sel Evol. 2003, 35: 119-133. 10.1186/1297-9686-35-1-119.
- Rieder S, Taourit S, Mariat D, Langlois B, Guérin G: Mutations in the agouti (ASIP), the extension (MC1R), and the brown (TYRP1) loci and their association to coat color phenotypes in horses (Equus caballus). Mamm Genome. 2001, 12: 450-455. 10.1007/s003350020017.
- Andersson LS, Larhammar M, Memic F, Wootz H, Schwochow D, Rubin CJ, Patra K, Arnason T, Wellbring L, Hjälm G, Imsland F, Petersen JL, McCue ME, Mickelson JR, Cothran G, Ahituv N, Roepstorff L, Mikko S, Vallstedt A, Lindgren G, Andersson L, Kullander K: Mutations in DMRT3 affect locomotion in horses and spinal circuit function in mice. Nature. 2012, 488: 642-646. 10.1038/nature11399.
- Rosengren Pielberg G, Golovko A, Sundström E, Curik I, Lennartsson J, Seltenhammer MH, Druml T, Binns M, Fitzsimmons C, Lindgren G, Sandberg K, Baumung R, Vetterlein M, Strömberg S, Grabherr M, Wade C, Lindblad-Toh K, Pontén F, Heldin CH, Sölkner J, Andersson L: A cis-acting regulatory mutation causes premature hair graying and susceptibility to melanoma in the horse. Nat Genet. 2008, 40: 1004-1009. 10.1038/ng.185.
- Nielsen R, Bustamante C, Clark AG, Glanowski S, Sackton TB, Hubisz MJ, Fledel-Alon A, Tanenbaum DM, Civello D, White TJ, Sninsky JJ, Adams MD, Cargill M: A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 2005, 3: e170.
- Li L, Stoeckert CJ, Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003, 13: 2178-2189. 10.1101/gr.1224503.
- Macfadden BJ: Fossil Horses: Systematics, Paleobiology, and Evolution of the Family Equidae. 1994, Cambridge: Cambridge University Press.
- Macfadden BJ: Evolution. Fossil horses--evidence for evolution. Science. 2005, 307: 1728-1730. 10.1126/science.1105458.
- Patel RK, Jain M: NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data. PLoS One. 2012, 7: e30619. 10.1371/journal.pone.0030619.
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics. 2009, 25: 1754-1760. 10.1093/bioinformatics/btp324.
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20: 1297-1303. 10.1101/gr.107524.110.
- Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung DW, Yiu SM, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam TW, Wang J: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012, 1: 18. 10.1186/2047-217X-1-18.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup: The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.
- Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM: A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012, 6: 80-92.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC: PLINK: a toolset for whole-genome association and population-based linkage analysis. Amer J Hum Genet. 2007, 81: 559-575. 10.1086/519795.
- Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006, 22: 2688-2690. 10.1093/bioinformatics/btl446.
- Stamatakis A, Aberer AJ, Goll C, Smith SA, Berger SA, Izquierdo-Carrasco F: RAxML-Light: a tool for computing terabyte phylogenies. Bioinformatics. 2012, 28: 2064-2066. 10.1093/bioinformatics/bts309.
- Tamura K, Stecher G, Peterson D, Filipski A, Kumar S: MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0. Mol Biol Evol. 2013, 30: 2725-2729. 10.1093/molbev/mst197.
- Ihaka R, Gentleman R: R: A Language for Data Analysis and Graphics. J Comput Graph Stat. 1996, 5: 299-314.
- Earl DA, Vonholdt BM: STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Resour. 2012, 4: 359-361. 10.1007/s12686-011-9548-7.
- Rosenberg NA: DISTRUCT: a program for the graphical display of population structure. Mol Ecol Notes. 2004, 4: 137-138.
- Yim HS, Cho YS, Guang X, Kang SG, Jeong JY, Cha SS, Oh HM, Lee JH, Yang EC, Kwon KK, Kim YJ, Kim TW, Kim W, Jeon JH, Kim SJ, Choi DH, Jho S, Kim HM, Ko J, Kim H, Shin YA, Jung HJ, Zheng Y, Wang Z, Chen Y, Chen M, Jiang A, Li E, Zhang S, Hou H, et al: Minke whale genome and aquatic adaptation in cetaceans. Nat Genet. 2014, 46: 88-92.
- Ji R, Cui P, Ding F, Geng J, Gao H, Zhang H, Yu J, Hu S, Meng H: Monophyletic origin of domestic bactrian camel (Camelus bactrianus) and its evolutionary relationship with the extant wild camel (Camelus bactrianus ferus). Anim Genet. 2009, 40: 377-382. 10.1111/j.1365-2052.2008.01848.x.
- Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. 10.1093/molbev/msm088.
- Zhang J, Nielsen R, Yang Z: Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005, 22: 2472-2479. 10.1093/molbev/msi237.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.