Whole genome sequencing of the black grouse (Tetrao tetrix): reference guided assembly suggests faster-Z and MHC evolution
© Wang et al.; licensee BioMed Central Ltd. 2014
Received: 21 June 2013
Accepted: 26 February 2014
Published: 6 March 2014
The different regions of a genome do not evolve at the same rate. For example, comparative genomic studies have suggested that the sex chromosomes and the regions harbouring the immune defence genes in the Major Histocompatibility Complex (MHC) may evolve faster than other genomic regions. The advent of next generation sequencing technologies has made it possible to study which genomic regions are evolutionarily liable to change and which are static, and has enabled an increasing number of genome studies of non-model species. However, de novo sequencing of the whole genome of an organism remains non-trivial. In this study, we present the draft genome of the black grouse, which was developed using a reference-guided assembly strategy.
We generated 133 Gbp of sequence data from one black grouse individual on the SOLiD platform and used a combination of de novo assembly and chicken reference genome mapping to assemble the reads into 4572 scaffolds with a total length of 1022 Mb. The draft genome provides good coverage of the main chicken chromosomes 1 ~ 28 and Z, which have a total length of 1001 Mb. The draft genome is fragmented but covers the homologous chicken genes well. In particular, 33.0% of the coding regions of the homologous genes have more than 90% of their sequence covered. In addition, we identified ~1 M SNPs in the genome and 106 genomic regions with high nucleotide divergence between black grouse and chicken or between black grouse and turkey.
Our results support the hypothesis that the X (Z) chromosome evolves faster than the autosomes, and our data are consistent with the MHC regions being more liable to change than the genome average. Our study demonstrates how a moderate sequencing effort can be combined with existing genome references to generate a draft genome for a non-model species.
Next generation sequencing (NGS) has spurred a revolution in the development of genomic tools for non-model organisms [1]. In particular, sequencing complete transcriptomes [2] or complexity-reduced fractions of genomes [3] has enabled the identification of genome-wide molecular markers such as single nucleotide polymorphisms (SNPs) and microsatellites (SSRs). Such investigations have also addressed fundamental questions in molecular ecology and evolution, such as the genomic basis for speciation [4, 5], morphological variation [6, 7], disease resistance [8] and selection on life history traits [9, 10].
A complete genome sequence is the ultimate genomic tool for a species. If such a sequence is available, it is possible to conduct large-scale, in-depth studies of many important molecular biology processes such as gene expression, transcription regulation, alternative splicing, epigenetic modifications and gene-protein interactions [11–14], which are important in ecological studies. However, applying NGS technologies to the de novo sequencing of a large eukaryotic genome is still rare, as it represents a considerable investment. The sheer volume of data generated and the computational facilities needed to assemble and analyse it may limit the number of non-specialized labs that are currently able to embark on such a project. Nevertheless, more whole genome studies are needed to address fundamental questions on the evolution of genome organisation, such as which regions are conserved and which regions change when taxa diverge and become separate species. Published whole NGS genomes of non-model organisms include giant panda [15], cod [16], naked mole rat [17], macaque [18], Tasmanian devil [19], budgerigar [20], Puerto Rican parrot [21], Heliconius butterfly [22], Aye-aye [23], collared flycatcher [24], as well as the 29 mammalian genomes recently sequenced at the Broad Institute [25].
The large number of publicly available whole genome sequences from both model and non-model organisms can be used to aid genomic investigations in related organisms. One approach is to directly transfer the genomic resources from a model organism to the study species, which would then be called ‘genome enabled taxa’ [26]. This strategy has been used successfully to develop resources such as microsatellite markers [27], SNPs [28], microarrays [29] and exon capture arrays [30]. Alternatively, the genome sequence from a related model organism can be used in the assembly of short read data from the focal species, a process known as reference guided (or reference assisted) assembly [31, 32].
Results and discussion
The raw sequencing data comprised 793 M reads with a read length of 75 bp from the single-end library, 1642 M reads with a read length of 60 bp × 60 bp from the 2 Kb mate-paired library, and 1548 M reads with a read length of 60 bp × 60 bp from the 5 Kb mate-paired library. The raw reads are deposited in the NCBI sequence read archive (SRA) under the accession number SRA061602. After quality and length filtering, 423 M reads (53.3%) were retained for the single-end library, 320 M (75.7%) of which were 75 bp in length. For the 2 Kb mate-paired library, 857 M reads (52.2%) were retained after filtering, and 663 M (77.4%) of them were 60 bp in length. Of these filtered reads, 519 M (31.6% of the raw reads) were properly paired, and the rest were retained only as unpaired reads. For the 5 Kb mate-paired library, 847 M reads (54.7%) were retained after filtering, 648 M (76.5%) of which were 60 bp in length. Of these filtered reads, 520 M (33.6% of the raw reads) were properly paired, and the rest were retained only as unpaired reads. In total, 2127 M high quality sequencing reads with a total length of approximately 133 Gb were kept for downstream analysis. If we assume that the genome size of black grouse is similar to that of chicken (1.05 Gb), the estimated mean sequencing coverage of the black grouse genome was 127X.
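The coverage estimate above is a simple ratio of sequenced bases to assumed genome size. The following minimal sketch reproduces that arithmetic from the figures reported in the text; the function and variable names are illustrative and not part of the original analysis.

```python
def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Filtered read counts (in millions) reported for the three libraries.
filtered_reads_m = {"single_end": 423, "mate_pair_2kb": 857, "mate_pair_5kb": 847}
total_reads_m = sum(filtered_reads_m.values())

# 133 Gb of filtered sequence; genome size assumed equal to chicken (1.05 Gb).
coverage = mean_coverage(total_bases=133e9, genome_size=1.05e9)

print(total_reads_m)    # 2127
print(round(coverage))  # 127
```

The result depends directly on the assumed genome size, so a different size estimate for black grouse would shift the coverage figure proportionally.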
Reference guided assembly
The reference guided assembly comprises several steps: de novo assembly, reference mapping and the merging of these results (Figure 2). In the first step, all 2127 M filtered high quality reads were de novo assembled with SOAPdenovo. We generated 1298366 preliminary contigs with a total length of 937 Mb. As expected, the de novo assembly was more fragmented than in some other studies which also used short-read sequencing technologies [15, 16, 22]. This is because we only had three sequencing libraries with a maximum insert size of 5 Kbp, and the sequencing reads produced by the SOLiD technology were relatively short. The SOLiD platform is believed to produce high quality reads. All the filtered data we used in our analyses had an error rate no larger than 0.1%. However, the short read length seriously affects the platform's performance in pure de novo assembly. Longer sequencing reads produced by platforms such as 454, Ion Torrent or PacBio usually produce larger contigs, and such data could be used to improve our assembly in the future.
In the next step, we aligned all long contigs to the chicken genome (Figure 2) and were able to map 277501 of them. The total mapped length was 438 Mb. At the same time, we also aligned the filtered and properly paired reads from the mate-paired libraries to the chicken genome, resulting in 451 M successfully mapped reads. These two sets of alignments were merged, which resulted in an 805 Mb black grouse genome backbone scaffold. Finally, we mapped the de novo assembled contigs back to the black grouse backbone scaffolds, and 1175021 of them mapped. In total, we covered 833 Mb (79.6%) of the 1046 Mb chicken genome and 4572 of the 15932 chicken scaffolds (version galGal4). We covered 826 Mb (82.5%) of the 1001 Mb main chicken chromosomes (chromosomes 1-28 and chromosome Z). In addition, we also retained 41098 unmapped contigs (after discarding 265 contigs as likely contamination) with a total length of 16.6 Mb.
[Table: Number of genes from other bird genomes found to be homologous to the black grouse draft genome. Columns: Number of genes; Black grouse homologs.]
Looking at the distribution of the annotated genes across the scaffolds, we found that the majority of the genes were identified on the 29 main chromosome scaffolds. Interestingly, 634 genes were identified on the unmapped contigs, suggesting that those genes could either be absent from the reference chicken genome or be highly divergent between black grouse and chicken. The average gene density of the 29 main chromosome scaffolds was 1.41E-5 genes/nucleotide. Chromosome 1, the longest chromosome, had the highest number of genes (2017). Chromosome 16 had the highest gene density (1.03E-4 genes/nucleotide), while chromosome Z had the lowest (7.99E-6 genes/nucleotide).
[Table: Information on repeat elements identified from sequenced bird genomes. Columns: Total length of repeats (Mb); Percentage in genome (%); Number of specific elements.]
Identification of SNPs
[Table: Number and density of single nucleotide polymorphisms (SNPs) identified in the genome sequence from one outbred black grouse individual. Columns: Number of SNPs; SNP density (%). Rows: Macro-chromosomes (1 ~ 5); Intermediate-chromosomes (6 ~ 10); Micro-chromosomes (11 ~ 28).]
We further investigated the SNPs on the 29 large chromosome scaffolds (Table 3, Additional file 1). We classified the chromosomes into four categories: macro-chromosomes (chromosomes 1 ~ 5), intermediate-chromosomes (chromosomes 6 ~ 10), micro-chromosomes (chromosomes 11 ~ 28) and the sex chromosome (chromosome Z). We found that the macro-chromosomes had the highest heterozygosity while the sex chromosome had the lowest. The heterozygosity of the micro-chromosomes was also low, which might be because the micro-chromosomes have a higher gene density in the black grouse. In contrast, the sex chromosome had the lowest gene density but also a low heterozygosity. Similar patterns have been observed in a wide variety of organisms and are explained by the fact that the effective population size of chromosome Z is theoretically 0.75 times that of the autosomes. In addition, the reduced variation on the Z (corresponding to X in mammals and flies) could also be interpreted as the result of faster evolution and purifying selection [50–52].
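The theoretical factor of 0.75 follows from a simple copy-counting argument: in birds, males carry two Z chromosomes and females carry one, whereas every individual carries two copies of each autosome. The sketch below illustrates this under the simplifying assumptions of an equal sex ratio and equal variance in reproductive success between the sexes; the function name and the population sizes are hypothetical.

```python
from fractions import Fraction


def z_to_autosome_ratio(n_males: int, n_females: int) -> Fraction:
    """Ratio of Z-chromosome copies to autosome copies in a ZW population.

    Males (ZZ) contribute two Z copies, females (ZW) contribute one.
    Every individual contributes two copies of each autosome.
    """
    z_copies = 2 * n_males + n_females
    autosome_copies = 2 * (n_males + n_females)
    return Fraction(z_copies, autosome_copies)


# Hypothetical population with an equal sex ratio.
ratio = z_to_autosome_ratio(n_males=500, n_females=500)
print(ratio, float(ratio))  # 3/4 0.75
```

With a skewed breeding sex ratio the copy-number ratio departs from 3/4, which is one reason the observed reduction in diversity on the Z need not match the theoretical value exactly.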
Since the scaffolds of the black grouse draft genome were developed using the chicken genome as a reference, we could not investigate genomic variation among black grouse, chicken and other species from a genomic rearrangement perspective. However, the sequences allowed us to conduct a comprehensive comparative genomic analysis at the level of nucleotide variation. For this analysis, we focused on the main chromosomes (chromosomes 1-28 and chromosome Z) and examined the nucleotide divergence (number of variable sites per unit) between black grouse, chicken and turkey. The downloaded chicken genome was split into 187307 sequences, of which 181105 (96.7%) could be mapped to the black grouse main chromosome scaffolds (chromosomes 1-28 and chromosome Z). This alignment covered 795 M (96.2%) of the sequenced sites of the main black grouse chromosomes. The downloaded turkey genome was split into 336344 sequences, of which 328727 (97.7%) could be mapped to the main black grouse chromosome scaffolds. This alignment covered 703 M (85.1%) of the sequenced sites of the main black grouse chromosomes. The turkey genome had a higher mapping percentage but a much lower coverage of the sequenced sites of the black grouse genome, as the turkey genome sequences were of lower quality (containing many unresolved nucleotides, ‘N’) compared to those of chicken.
In this study, using the chicken genome as a reference, we successfully assembled the whole draft genome of black grouse. The draft genome consists of 4572 scaffolds with a total length of 1022 Mb (833 Mb sequenced), and an additional 41098 unscaffolded contigs with a total length of 16.6 Mb. This corresponds to a high coverage of chicken chromosomes 1 ~ 28 and chromosome Z, which have a total length of 1001 Mb (826 Mb sequenced). Although the continuously sequenced blocks on the scaffolds are fragmented, the draft genome has good coverage of the homologous chicken genes, and 14826 (82.7%) of the chicken genes were identified in the black grouse draft genome. Notably, 33.0% of the coding regions of the homologous genes have more than 90% of their sequence covered. To our knowledge, this is the first time a large eukaryote genome has been developed using SOLiD short-read sequencing technology and a reference guided assembly bioinformatic pipeline. Our study demonstrates how a moderate sequencing effort can be combined with existing genome references to accomplish a large genome project. We identified a large number (949254) of SNPs and identified the genomic regions we suggest are important for the lineage specific evolution of black grouse. From the above analysis, we note that the sex chromosome (chromosome Z) had lower reference assembly efficiency and lower SNP density but a higher nucleotide divergence between black grouse and other galliform species. These multiple lines of evidence support the faster X (Z) hypothesis, which states that the X (Z) chromosome evolves faster than the autosomes due to its lower effective population size and recombination rate. We also observed that microchromosome 16, which harbours the MHC region in galliforms, was highly divergent among species, which may indicate faster evolution in this genomic region.
DNA sampling, extraction and sequencing
The black grouse individual used in this study was a male collected by a licensed hunter in the winter hunting season of 2011 in Hundhamaren, Norway, where a large and continuously distributed black grouse population resides. Fresh blood from the sample was immediately stored in RNAlater (Ambion). DNA extraction was performed using the DNeasy Blood & Tissue Kit (Qiagen) following the manufacturer’s instructions. The library preparation and genome sequencing were performed at the Uppsala Genome Centre (http://www.igp.uu.se/facilities/genome_center/) using the Applied Biosystems SOLiD 5500xl platform. One single-end library with a read length of 75 bp, one mate-paired library with an insert size of 2 Kb and read length of 60 × 60 bp, and one mate-paired library with an insert size of 5 Kb and read length of 60 × 60 bp were constructed. Each library was sequenced on a full flowchip which contained six lanes. Both versions (colour-space/base-space) of the sequencing reads were obtained from the sequencing centre.
Preliminary de novo assembly
To make the best use of existing NGS analysis tools, we employed the widely used base-space version of data in all our bioinformatic analysis. The raw reads were first quality and size filtered using FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). The threshold of the FASTQ quality score was set at 30; the thresholds of the length of the trimmed reads were 60 bp for the single-end library and 50 bp for the two mate-paired libraries. The filtered mate-paired reads were paired again using a custom made script.
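The filtering and re-pairing steps can be illustrated with a small sketch. This is not the FASTX-Toolkit implementation or the authors' custom script: the trimming rule (removing low-quality bases from the 3' end), the in-memory read representation and all names below are assumptions made for illustration. The quality threshold of 30 comes from the text, while the toy example uses a short minimum length instead of the 50 bp and 60 bp thresholds.

```python
from typing import List, Optional, Tuple

# (read id, sequence, per-base Phred quality scores); a hypothetical layout.
Read = Tuple[str, str, List[int]]


def trim_and_filter(read: Read, min_quality: int, min_length: int) -> Optional[Read]:
    """Trim low-quality bases from the 3' end, then drop reads that are too short."""
    read_id, seq, quals = read
    keep = len(seq)
    while keep > 0 and quals[keep - 1] < min_quality:
        keep -= 1
    if keep < min_length:
        return None
    return read_id, seq[:keep], quals[:keep]


def re_pair(forward: List[Read], reverse: List[Read]):
    """Split filtered mates into proper pairs and singletons by shared read id."""
    reverse_by_id = {r[0]: r for r in reverse}
    forward_ids = {f[0] for f in forward}
    pairs = [(f, reverse_by_id[f[0]]) for f in forward if f[0] in reverse_by_id]
    singletons = [f for f in forward if f[0] not in reverse_by_id]
    singletons += [r for r in reverse if r[0] not in forward_ids]
    return pairs, singletons


# Toy mates: r2 loses its forward mate during filtering and becomes a singleton.
raw_forward = [("r1", "ACGTAC", [35, 35, 35, 35, 35, 10]),
               ("r2", "ACGTAC", [35, 35, 10, 10, 10, 10])]
raw_reverse = [("r1", "TTTTTT", [35] * 6), ("r2", "GGGGGG", [35] * 6)]

fwd = [r for r in (trim_and_filter(x, 30, 5) for x in raw_forward) if r is not None]
rev = [r for r in (trim_and_filter(x, 30, 5) for x in raw_reverse) if r is not None]
pairs, singletons = re_pair(fwd, rev)
print(len(pairs), [s[0] for s in singletons])  # 1 ['r2']
```

Keeping the singletons rather than discarding them matches the text, where reads that lost their mate were retained as unpaired reads.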
All the sequencing reads were initially de novo assembled using SOAPdenovo v 1.05 (63mer version) with default settings (Additional file 4). The assembly was performed on the Uppmax Halvan cluster with 64 parallel threads and 2048 GB memory (http://www.uppmax.uu.se/halvan). We tested K-mer sizes exhaustively from 15 to 55 in steps of 2, and accepted the result with the longest N50 for the downstream mapping analysis. A K-mer size of 31 gave the best result in this regard. Using it, we generated 1298366 preliminary contigs with a length of at least 100 bp. The longest contig was 12574 bp in length. The average length of the contigs was 722 bp, and the contig N50 size was 1238 bp. The depth of coverage of the preliminary contigs ranged from 10 to 153, with an average of 35.1. The de novo assembly scaffolds had an average length of 6010 bp. The longest was 53114 bp, and the N50 size was 2065 bp.
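The K-mer selection criterion described above can be sketched as follows. N50 is computed with its standard definition (the length of the contig at which the cumulative length of the contigs, sorted from longest to shortest, first reaches half of the total assembly length). The contig lengths in the example are made up, and running the assembler itself is outside the scope of the sketch.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def best_kmer(lengths_by_k: Dict[int, List[int]]) -> int:
    """Pick the K-mer size whose assembly has the longest N50."""
    return max(lengths_by_k, key=lambda k: n50(lengths_by_k[k]))


# K-mer sizes tested in the text: 15 to 55 in steps of 2.
kmer_sizes = list(range(15, 56, 2))

# Hypothetical contig lengths for three of the tested K-mer sizes.
toy_assemblies = {29: [100, 200, 300], 31: [100, 500, 600], 33: [150, 150, 150]}
print(len(kmer_sizes), best_kmer(toy_assemblies))  # 21 31
```

N50 alone rewards contiguity and says nothing about correctness, so in practice it is often inspected alongside total assembly length and contig count.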
Reference guided assembly and mapping
In order to improve the preliminary assembly we developed a reference guided approach (Figure 2). The well-established chicken genome (ICGSC Gallus_gallus-4.0/galGal4), which was downloaded from the UCSC genome browser database, was used as the reference genome. To avoid incorrect mapping of short sequences onto genome regions rearranged between black grouse and chicken, only preliminary contigs of 1 Kb or larger (335884 contigs with a mean length of 1817 bp) were mapped. The mapping was performed using the BWA-SW algorithm [60, 61] implemented in the Burrows-Wheeler Aligner (BWA) package v0.6.2. The BWA-SW algorithm was designed to enable the alignment of long sequences (up to 1 Mb) against a large sequence database at a relatively fast speed. To customize the algorithm to our needs, we decreased the gap extension penalty (-r) to 1, as long stretches of insertions and deletions had been observed between the sequences of black grouse and chicken.
In parallel, we mapped the filtered and properly paired sequencing reads from the 2 Kb mate-paired library and the 5 Kb mate-paired library onto the reference chicken genome. We only used the mate-paired libraries because we wanted, as much as possible, to avoid incorrect mapping caused by genomic rearrangements between black grouse and chicken. The Burrows-Wheeler Aligner (BWA) program v0.6.2 was used to conduct the mapping, with custom alignment settings of maximum edit distance (-n) 5, maximum number of gap opens (-o) 2, maximum number of gap extensions (-e) 10, gap open penalty (-O) 8, and gap extension penalty (-E) 2, configured to make the program more tolerant of the indel variation between black grouse and chicken [39, 40]. The alignments were then summarized using the ‘bwa sampe’ command. The program automatically estimated the insert size and orientation of the paired reads and discarded pairs inferred to be incorrectly mapped. The coverage of the alignment was estimated, and sites with excessively low or high coverage were discarded by a custom made script to avoid incorrect mapping introduced by random factors or by the piling up of reads from duplicated genomic regions.
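The coverage-based filtering step can be illustrated with a minimal sketch. The custom script used in the study is not described in detail, so the depth thresholds, the masking with ‘N’ and the function names below are assumptions chosen only to show the idea.

```python
from typing import List


def depth_mask(depths: List[int], min_depth: int, max_depth: int) -> List[bool]:
    """Flag sites whose read depth lies inside the accepted range."""
    return [min_depth <= d <= max_depth for d in depths]


def apply_mask(consensus: str, keep: List[bool], masked_char: str = "N") -> str:
    """Replace sites outside the accepted depth range with an unresolved base."""
    if len(consensus) != len(keep):
        raise ValueError("consensus and mask must have the same length")
    return "".join(base if ok else masked_char for base, ok in zip(consensus, keep))


# Hypothetical per-site depths; 5 and 100 are illustrative cut-offs only.
depths = [0, 3, 12, 15, 400]
keep = depth_mask(depths, min_depth=5, max_depth=100)
print(apply_mask("ACGTA", keep))  # NNGTN
```

The lower bound guards against sites supported by only a few, possibly misplaced, reads, while the upper bound targets pile-ups from duplicated regions.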
Reference guided assembly, merging and finalising
The BAM format alignment files of the contig mapping and the mate-pair read mapping were subsequently merged using the SAMtools suite v0.1.18. Then, the consensus sequences of black grouse were extracted from the merged alignment file by the ‘samtools mpileup’, ‘bcftools’ and ‘vcfutils.pl’ (vcf2fq) pipeline from the SAMtools suite. We used the consensus sequences of the black grouse scaffolds as a backbone to map all the contigs (not shorter than 100 bp) generated from the de novo assembly in order to further close gaps in the scaffolds and extend the sequenced regions (non-N) of the draft genome. The mapping was performed using the BWA-SW program with its default configuration. To make use of the SAMtools consensus generating pipeline, the backbone scaffolds were split into 10 Kb fragments and mapped back onto themselves, also using the BWA-SW program. The resulting alignment was merged with the contig mapping alignment using SAMtools. This merged alignment was used to generate the final black grouse draft genome using the SAMtools pipeline. The remaining 41363 unmapped contigs (not smaller than 200 bp) were extracted and aligned to the NCBI Nucleotide collection (nt) and Genome survey sequence (gss) databases using BLASTN of the NCBI BLAST 2.2.27+ package. We discarded sequences of non-avian origin according to the BLAST search, as they might be contamination. The remaining contig sequences were kept separately as parts of the black grouse draft genome.
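Splitting scaffolds into 10 Kb fragments can be sketched as below. The text does not state whether the fragments overlapped or how they were named, so the sketch assumes non-overlapping fragments and a hypothetical `name:start-end` naming scheme with 1-based coordinates.

```python
from typing import List, Tuple


def split_into_fragments(name: str, sequence: str,
                         fragment_size: int = 10_000) -> List[Tuple[str, str]]:
    """Cut a sequence into consecutive, non-overlapping fragments."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    fragments = []
    for start in range(0, len(sequence), fragment_size):
        chunk = sequence[start:start + fragment_size]
        # 1-based, inclusive coordinates in the fragment name.
        fragments.append((f"{name}:{start + 1}-{start + len(chunk)}", chunk))
    return fragments


# A toy 25 Kb scaffold yields two full fragments and one 5 Kb remainder.
toy_fragments = split_into_fragments("chr1", "A" * 25_000)
print([frag_name for frag_name, _ in toy_fragments])
```

Recording the coordinates in the fragment name makes it possible to check later that each fragment maps back to its position of origin.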
The annotation of genes and genomic repeats was conducted by comparative methods. To identify genes, we downloaded the chicken genes (WASHUC2) from the Ensembl database and followed a reciprocal BLAST approach to align the chicken genes and the black grouse draft genome. We first aligned the chicken cDNA sequences to the black grouse genome using the BLASTN program from the NCBI BLAST 2.2.27+ package. The E-value cut-off was set as 10E-10. We then extracted the aligned sequences from the black grouse genome and aligned them to the chicken proteins using the BLASTX program. The BLAST results were compared using a custom script to keep only the reciprocal BLAST hits. Using the same BLAST protocol, we also searched the black grouse draft genome for homologs of turkey and zebra finch genes. The entire sets of the turkey proteins (UMD2) and the zebra finch proteins (taeGut3.2.4) were also downloaded from the Ensembl database. Since the chicken genome was released earliest and has the most direct molecular biology support for the genes, we accepted the BLAST result of chicken as the annotation of the black grouse genes.
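The reciprocal filtering step can be sketched as follows. This is not the script used in the study: it assumes that the best hit of each search has already been parsed from the BLAST output into a dictionary, and that cDNA and protein identifiers have been mapped to a shared gene identifier. All identifiers in the example are hypothetical.

```python
from typing import Dict


def reciprocal_hits(forward_best: Dict[str, str],
                    reverse_best: Dict[str, str]) -> Dict[str, str]:
    """Keep gene -> region pairs confirmed by the reverse search.

    forward_best: reference gene -> best-matching genomic region (e.g. BLASTN).
    reverse_best: genomic region -> best-matching reference gene (e.g. BLASTX).
    """
    return {gene: region
            for gene, region in forward_best.items()
            if reverse_best.get(region) == gene}


# geneB's region points back to a different gene, and geneC's region has no
# reverse hit, so only geneA survives.
forward_best = {"geneA": "region1", "geneB": "region2", "geneC": "region3"}
reverse_best = {"region1": "geneA", "region2": "geneX"}
confirmed = reciprocal_hits(forward_best, reverse_best)
print(confirmed)  # {'geneA': 'region1'}
```

Requiring agreement in both directions reduces false assignments among paralogous genes, at the cost of discarding some true homologs.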
To identify genomic repeats, we used the RepeatMasker program (http://www.repeatmasker.org/) to scan the black grouse draft genome sequence. RMBlast (RepeatMasker compatible version of NCBI BLAST) (http://www.repeatmasker.org/RMBlast.html) was used as the alignment engine. The RepeatMasker library v20120418 was downloaded from RepBase (http://www.girinst.org/server/RepBase/index.php) and we specified the species library as ‘aves’ for the black grouse. For comparative purposes, we also ran the RepeatMasker analysis for the latest versions of the chicken genome (galGal4), the turkey genome (melGal1) and the zebra finch genome (taeGut1), which were downloaded from the UCSC genome browser database.
Identification of SNPs
To identify SNPs present as heterozygous sites in our one outbred male black grouse, we first mapped all the filtered reads, including those from the single-end library and the paired reads and singletons from the two mate-paired libraries, to the black grouse draft genome using BWA v0.6.2. The alignment was performed using the ‘bwa aln’ command with default settings. ‘bwa samse’ with default settings was subsequently used for the reads of the single-end library and the singletons from the mate-paired libraries, and ‘bwa sampe’ with default settings was used for the paired reads of the two mate-paired libraries. The alignment files generated from the mapping were then merged using SAMtools utilities v0.1.18. The average depth of coverage of the mapped sites was estimated from the SAM file and was used to determine the coverage cut-off for SNP calling. The SNP calling followed the ‘samtools mpileup’, ‘bcftools’ and ‘vcfutils.pl’ (varFilter) pipeline. The Bayesian inference of the variants (-b) was enabled in ‘bcftools’. The statistics of the identified SNPs were calculated and evaluated using custom made scripts.
For the comparative genomic analysis at the level of nucleotide divergence, we focused on the chromosome scaffolds (chromosomes 1-28 and chromosome Z). The chromosome sequences of chicken (galGal4) and turkey (melGal1) were downloaded from the UCSC genome browser database. Since directly aligning large genomic sequences is a cumbersome and time-consuming task, we split the genomic sequences of chicken and turkey into 10 Kb pieces, and then aligned these short sequences to the black grouse draft genome (chromosomes 1-28 and chromosome Z) using the BWA-SW program with settings of gap open penalty (-q) 1 and gap extension penalty (-r) 1. Sequences with an alignment depth of coverage greater than 1 were excluded from downstream analysis. All the nucleotide variants were summarized using the ‘samtools mpileup’ and ‘bcftools’ pipeline, with probabilistic realignment for the computation of base alignment quality (BAQ) disabled (-B). The statistics of the nucleotide divergence (percentage of variable sites per sequence) were calculated from the Variant call format (VCF) file by custom made scripts. We also used a sliding window (50 Kb) approach to scan for highly divergent regions across the genomes between black grouse and chicken and between black grouse and turkey, in order to identify the genomic regions which might be important in the lineage specific evolution of black grouse.
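A window-based divergence scan of the kind described above can be sketched as follows. The step size of the sliding window and the exact denominator are not given in the text, so this sketch assumes non-overlapping 50 Kb windows and divides the variant count by the window length; a real analysis would more likely divide by the number of aligned, sequenced sites. The positions in the example are made up.

```python
from typing import List, Tuple


def window_divergence(variant_positions: List[int], chrom_length: int,
                      window: int = 50_000) -> List[Tuple[int, int, float]]:
    """Return (start, end, variants per site) for consecutive windows.

    Positions are 1-based, as in VCF files.
    """
    if chrom_length <= 0 or window <= 0:
        raise ValueError("chrom_length and window must be positive")
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in variant_positions:
        if not 1 <= pos <= chrom_length:
            raise ValueError(f"position {pos} lies outside the chromosome")
        counts[(pos - 1) // window] += 1
    result = []
    for i, count in enumerate(counts):
        start = i * window + 1
        end = min((i + 1) * window, chrom_length)
        result.append((start, end, count / (end - start + 1)))
    return result


# Toy chromosome of 120 Kb with five variable sites.
windows = window_divergence([10, 49_999, 50_000, 50_001, 119_999], 120_000)
most_divergent = max(windows, key=lambda w: w[2])
print(most_divergent[:2])  # (1, 50000)
```

Windows can then be ranked by divergence, and the upper tail of the distribution taken as candidate regions for closer inspection.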
Raw sequencing reads: NCBI sequence read archive (SRA) SRA061602
Genome assembly: NCBI whole genome shotgun (WGS) database JDSL00000000
We thank Christopher Wheat and Jochen Wolf for comments on previous versions of the manuscript, Eleanor Jones for proofreading, and Henrik Lantz and Yu Sun for bioinformatic discussions. The sequencing was performed by the Uppsala Sequencing Centre and the SNIC-UPPMAX high-performance computing cluster was utilized for computations. Funding for this project was received from the research council of Sweden (VR) and SciLifeLab (Uppsala) to JH and the Finnish Academy to HS.
- Ekblom R, Galindo J: Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity. 2011, 107: 1-15. 10.1038/hdy.2010.152.
- Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH: Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol Ecol. 2008, 17 (7): 1636-1647. 10.1111/j.1365-294X.2008.03666.x.
- van Bers NEM, Oers KV, Kerstens HHD, Dibbits BW, Crooijmans RPMA, Visser ME, Groenen MAM: Genome-wide SNP detection in the great tit Parus major using high throughput sequencing. Mol Ecol. 2010, 19 (s1): 89-99.
- Wolf JBW, Bayer T, Haubold B, Schilhabel M, Rosenstiel P, Tautz D: Nucleotide divergence vs. gene expression differentiation: comparative transcriptome sequencing in natural isolates from the carrion crow and its hybrid zone with the hooded crow. Mol Ecol. 2010, 19 (s1): 162-175.
- Schwarz D, Robertson H, Feder J, Varala K, Hudson M, Ragland G, Hahn D, Berlocher S: Sympatric ecological speciation meets pyrosequencing: sampling the transcriptome of the apple maggot Rhagoletis pomonella. BMC Genomics. 2009, 10 (1): 633. 10.1186/1471-2164-10-633.
- Hohenlohe PA, Bassham S, Etter PD, Stiffler N, Johnson EA, Cresko WA: Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet. 2010, 6 (2): e1000862. 10.1371/journal.pgen.1000862.
- Galindo J, Grahame JW, Butlin RK: An EST-based genome scan using 454 sequencing in the marine snail Littorina saxatilis. J Evol Biol. 2010, 23 (9): 2004-2016. 10.1111/j.1420-9101.2010.02071.x.
- Barakat A, DiLoreto DS, Zhang Y, Smith C, Baier K, Powell WA, Wheeler N, Sederoff R, Carlson JE: Comparison of the transcriptomes of American chestnut (Castanea dentata) and Chinese chestnut (Castanea mollissima) in response to the chestnut blight infection. BMC Plant Biol. 2009, 9 (1): 51. 10.1186/1471-2229-9-51.
- Hecht BC, Thrower FP, Hale MC, Miller MR, Nichols KM: Genetic architecture of migration-related traits in rainbow and steelhead trout, Oncorhynchus mykiss. G3: Genes|Genomes|Genetics. 2012, 2 (9): 1113-1127.
- Bruneaux M, Johnston SE, Herczeg G, Merilä J, Primmer CR, Vasemägi A: Molecular evolutionary and population genomic analysis of the nine-spined stickleback using a modified restriction-site-associated DNA tag approach. Mol Ecol. 2013, 22 (3): 565-582. 10.1111/j.1365-294X.2012.05749.x.
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10: 57-63. 10.1038/nrg2484.
- Werner T: Next generation sequencing in functional genomics. Brief Bioinform. 2010, 11 (5): 499-511. 10.1093/bib/bbq018.
- Huss M: Introduction into the analysis of high-throughput-sequencing based epigenome data. Brief Bioinform. 2010, 11 (5): 512-523. 10.1093/bib/bbq014.
- Pepke S, Wold B, Mortazavi A: Computation for ChIP-seq and RNA-seq studies. Nat Methods. 2009, 6 (11s): S22-S32. 10.1038/nmeth.1371.
- Li R, Fan W, Tian G, Zhu H, He L, Cai J, Huang Q, Cai Q, Li B, Bai Y, Zhang Z, Zhang Y, Wang W, Li J, Wei F, Li H, Jian M, Li J, Zhang Z, Nielsen R, Li D, Gu W, Yang Z, Xuan Z, Ryder OA, Leung FC-C, Zhou Y, Cao J, Sun X, Fu Y, et al: The sequence and de novo assembly of the giant panda genome. Nature. 2010, 463 (7279): 311-317. 10.1038/nature08696.
- Star B, Nederbragt AJ, Jentoft S, Grimholt U, Malmstrom M, Gregers TF, Rounge TB, Paulsen J, Solbakken MH, Sharma A, Wetten OF, Lanzen A, Winer R, Knight J, Vogel JH, Aken B, Andersen O, Lagesen K, Tooming-Klunderud A, Edvardsen RB, Tina KG, Espelund M, Nepal C, Previti C, Karlsen BO, Moum T, Skage M, Berg PR, Gjoen T, Kuhl H, et al: The genome sequence of Atlantic cod reveals a unique immune system. Nature. 2011, 477 (7363): 207-210. 10.1038/nature10342.
- Kim EB, Fang X, Fushan AA, Huang Z, Lobanov AV, Han L, Marino SM, Sun X, Turanov AA, Yang P, Yim SH, Zhao X, Kasaikina MV, Stoletzki N, Peng C, Polak P, Xiong Z, Kiezun A, Zhu Y, Chen Y, Kryukov GV, Zhang Q, Peshkin L, Yang L, Bronson RT, Buffenstein R, Wang B, Han C, Li Q, Chen L, et al: Genome sequencing reveals insights into physiology and longevity of the naked mole rat. Nature. 2011, 479 (7372): 223-227. 10.1038/nature10533.
- Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, Batzer MA, Bustamante CD, Eichler EE, Hahn MW, Hardison RC, Makova KD, Miller W, Milosavljevic A, Palermo RE, Siepel A, Sikela JM, Attaway T, Bell S, Bernard KE, Buhay CJ, Chandrabose MN, Dao M, Davis C, Delehaunty KD, Ding Y, et al: Evolutionary and biomedical insights from the rhesus macaque genome. Science. 2007, 316 (5822): 222-234.
- Miller W, Hayes VM, Ratan A, Petersen DC, Wittekindt NE, Miller J, Walenz B, Knight J, Qi J, Zhao F, Wang Q, Bedoya-Reina OC, Katiyar N, Tomsho LP, Kasson LM, Hardie R-A, Woodbridge P, Tindall EA, Bertelsen MF, Dixon D, Pyecroft S, Helgen KM, Lesk AM, Pringle TH, Patterson N, Zhang Y, Kreiss A, Woods GM, Jones ME, Schuster SC: Genetic diversity and population structure of the endangered marsupial Sarcophilus harrisii (Tasmanian devil). Proc Natl Acad Sci U S A. 2011, 108 (30): 12348-12353. 10.1073/pnas.1102838108.
- Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G, Wang Z, Rasko DA, McCombie WR, Jarvis ED, Phillippy AM: Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol. 2012, 30 (7): 693-700. 10.1038/nbt.2280.
- Oleksyk T, Pombert J-F, Siu D, Mazo-Vargas A, Ramos B, Guiblet W, Afanador Y, Ruiz-Rodriguez C, Nickerson M, Logue D, Dean M, Figueroa L, Valentin R, Martinez-Cruzado J-C: A locally funded Puerto Rican parrot (Amazona vittata) genome sequencing project increases avian data and advances young researcher education. GigaScience. 2012, 1 (1): 14. 10.1186/2047-217X-1-14.
- Dasmahapatra KK, Walters JR, Briscoe AD, Davey JW, Whibley A, Nadeau NJ, Zimin AV, Hughes DST, Ferguson LC, Martin SH, Salazar C, Lewis JJ, Adler S, Ahn SJ, Baker DA, Baxter SW, Chamberlain NL, Chauhan R, Counterman BA, Dalmay T, Gilbert LE, Gordon K, Heckel DG, Hines HM, Hoff KJ, Holland PWH, Jacquin-Joly E, Jiggins FM, Jones RT, Kapan DD, et al: Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature. 2012, 487 (7405): 94-98.
- Perry GH, Reeves D, Melsted P, Ratan A, Miller W, Michelini K, Louis EE, Pritchard JK, Mason CE, Gilad Y: A genome sequence resource for the Aye-aye (Daubentonia madagascariensis), a nocturnal lemur from Madagascar. Genome Biol Evol. 2012, 4 (2): 126-135. 10.1093/gbe/evr132.
- Ellegren H, Smeds L, Burri R, Olason PI, Backström N, Kawakami T, Künstner A, Mäkinen H, Nadachowska-Brzyska K, Qvarnström A, Uebbing S, Wolf JBW: The genomic landscape of species divergence in Ficedula flycatchers. Nature. 2012, 491: 756-760.
- Lindblad-Toh K, Garber M, Zuk O, Lin MF, Parker BJ, Washietl S, Kheradpour P, Ernst J, Jordan G, Mauceli E, Ward LD, Lowe CB, Holloway AK, Clamp M, Gnerre S, Alfoldi J, Beal K, Chang J, Clawson H, Cuff J, Di Palma F, Fitzgerald S, Flicek P, Guttman M, Hubisz MJ, Jaffe DB, Jungreis I, Kent WJ, Kostka D, Lara M, et al: A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011, 478: 476-482. 10.1038/nature10530.
- Kohn MH, Murphy WJ, Ostrander EA, Wayne RK: Genomics and conservation genetics. Trends Ecol Evol. 2006, 21 (11): 629-637. 10.1016/j.tree.2006.08.001.
- Dawson DA, Horsburgh GJ, Küpper C, Stewart IRK, Ball AD, Durrant KL, Hansson B, Bacon IDA, Bird S, Klein Á, Krupa AP, Lee J-W, Martín-Gálvez D, Simeoni M, Smith G, Spurgin LG, Burke T: New methods to identify conserved microsatellite loci and develop primer sets of high cross-species utility – as demonstrated for birds. Mol Ecol Resour. 2010, 10 (3): 475-494. 10.1111/j.1755-0998.2009.02775.x.
- Miller JM, Kijas JW, Heaton MP, McEwan JC, Coltman DW: Consistent divergence times and allele sharing measured from cross-species application of SNP chips developed for three domestic species. Mol Ecol Resour. 2012, 12 (6): 1145-1150. 10.1111/1755-0998.12017.
- Bar-Or C, Czosnek H, Koltai H: Cross-species microarray hybridizations: a developing tool for studying species diversity. Trends Genet. 2007, 23 (4): 200-207. 10.1016/j.tig.2007.02.003.
- Cosart T, Beja-Pereira A, Chen S, Ng S, Shendure J, Luikart G: Exome-wide DNA capture and next generation sequencing in domestic and wild species. BMC Genomics. 2011, 12 (1): 347. 10.1186/1471-2164-12-347.
- Gnerre S, Lander E, Lindblad-Toh K, Jaffe D: Assisted assembly: how to improve a de novo genome assembly by using related species. Genome Biol. 2009, 10 (8): R88-10.1186/gb-2009-10-8-r88.PubMed CentralPubMedView Article
- Schneeberger K, Ossowski S, Ott F, Klein JD, Wang X, Lanz C, Smith LM, Cao J, Fitz J, Warthmann N, Henz SR, Huson DH, Weigel D: Reference-guided assembly of four diverse Arabidopsis thaliana genomes. Proc Natl Acad Sci U S A. 2011, 108 (25): 10249-10254. 10.1073/pnas.1107739108.PubMed CentralPubMedView Article
- Crowe TM, Bowie RCK, Bloomer P, Mandiwana TG, Hedderson TAJ, Randi E, Pereira SL, Wakeling J: Phylogenetics, biogeography and classification of, and character evolution in, gamebirds (Aves: Galliformes): effects of character exclusion, data partitioning and missing data. Cladistics. 2006, 22 (6): 495-532. 10.1111/j.1096-0031.2006.00120.x.View Article
- Pereira SL, Baker AJ: A molecular timescale for galliform birds accounting for uncertainty in time estimates and heterogeneity of rates of DNA substitutions across lineages and sites. Mol Phylogenet Evol. 2006, 38 (2): 499-509. 10.1016/j.ympev.2005.07.007.PubMedView Article
- Hillier LW, Miller W, Birney E, Warren W, Hardison RC, Ponting CP, Bork P, Burt DW, Groenen MAM, Delany ME, Dodgson JB, Chinwalla AT, Cliften PF, Clifton SW, Delehaunty KD, Fronick C, Fulton RS, Graves TA, Kremitzki C, Layman D, Magrini V, McPherson JD, Miner TL, Minx P, Nash WE, Nhan MN, Nelson JO, Oddy LG, Pohl CS, Randall-Maher J, et al: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004, 432 (7018): 695-716. 10.1038/nature03154.View Article
- Höglund J, Piertney SB, Alatalo RV, Lindell J, Lundberg A, Rintamäki PT: Inbreeding depression and male fitness in black grouse. Proc R Soc Lond B Biol Sci. 2002, 269: 711-715. 10.1098/rspb.2001.1937.View Article
- Alatalo RV, Höglund J, Lundberg A: Lekking in the black grouse- a test of male viability. Nature. 1991, 352 (6331): 155-156. 10.1038/352155a0.View Article
- Höglund J: Evolutionary Conservation Genetics. 2009, Oxford: Oxford University PressView Article
- Wang B, Ekblom R, Castoe TA, Jones EP, Kozma R, Bongcam-Rudloff E, Pollock DD, Höglund J: Transcriptome sequencing of black grouse (Tetrao tetrix) for immune gene discovery and microsatellite development. Open Biology. 2012, 2 (4): 120054-10.1098/rsob.120054.PubMed CentralPubMedView Article
- Wang B, Ekblom R, Strand TM, Portela-Bens S, Höglund J: Sequencing of the core MHC region of black grouse (Tetrao tetrix) and comparative genomics of the galliform MHC. BMC Genomics. 2012, 13: 553-10.1186/1471-2164-13-553.PubMed CentralPubMedView Article
- Martin JA, Wang Z: Next-generation transcriptome assembly. Nat Rev Genet. 2011, 12 (10): 671-682. 10.1038/nrg3068.PubMedView Article
- Kim J, Larkin DM, Cai QL, Asan , Zhang YF, Ge RL, Auvil L, Capitanu B, Zhang GJ, Lewin HA, Ma J: Reference-assisted chromosome assembly. Proc Natl Acad Sci U S A. 2013, 110 (5): 1785-1790. 10.1073/pnas.1220349110.PubMed CentralPubMedView Article
- Cerdeira LT, Pinto AC, Schneider MPC, de Almeida SS, dos Santos AR, Barbosa EGV, Ali A, Barbosa MS, Carneiro AR, Ramos RTJ, de Oliveira RS, Barh D, Barve N, Zambare V, Belchior SE, Guimaraes LC, Soares SD, Dorella FA, Rocha FS, de Abreu VAC, Tauch A, Trost E, Miyoshi A, Azevedo V, Silva A: Whole-genome sequence of corynebacterium pseudotuberculosis PAT10 strain isolated from sheep in Patagonia, Argentina. J Bacteriol. 2011, 193 (22): 6420-6421. 10.1128/JB.06044-11.PubMed CentralPubMedView Article
- Umemura M, Koyama Y, Takeda I, Hagiwara H, Ikegami T, Koike H, Machida M: Fine de novo sequencing of a fungal genome using only SOLiD short read data: verification on aspergillus oryzae RIB40. Plos One. 2013, 8: 5-
- Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotechnol. 2008, 26 (10): 1135-1145. 10.1038/nbt1486.PubMedView Article
- Li S, Wang C, Yu W, Zhao S, Gong Y: Identification of genes related to white and black plumage formation by RNA-Seq from white and black feather bulbs in ducks. PLoS ONE. 2012, 7 (5): e36592-10.1371/journal.pone.0036592.PubMed CentralPubMedView Article
- Warren WC, Clayton DF, Ellegren H, Arnold AP, Hillier LW, Kunstner A, Searle S, White S, Vilella AJ, Fairley S, Heger A, Kong LS, Ponting CP, Jarvis ED, Mello CV, Minx P, Lovell P, Velho TAF, Ferris M, Balakrishnan CN, Sinha S, Blatti C, London SE, Li Y, Lin YC, George J, Sweedler J, Southey B, Gunaratne P, Watson M, et al: The genome of a songbird. Nature. 2010, 464 (7289): 757-762. 10.1038/nature08819.PubMed CentralPubMedView Article
- Dalloul RA, Long JA, Zimin AV, Aslam L, Beal K, Blomberg L, Bouffard P, Burt DW, Crasta O, Crooijmans RPMA, Cooper K, Coulombe RA, De S, Delany ME, Dodgson JB, Dong JJ, Evans C, Frederickson KM, Flicek P, Florea L, Folkerts O, Groenen MAM, Harkins TT, Herrero J, Hoffmann S, Megens HJ, Jiang A, de Jong P, Kaiser P, Kim H, et al: Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol. 2010, 8: 9-View Article
- Haldane JBS: A mathematical theory of natural and artificial selection Part I. Proc Camb Philos Soc. 1924, 23: 19-41.
- Hogner S, Sæther SA, Borge T, Bruvik T, Johnsen A, Sætre G-P: Increased divergence but reduced variation on the Z chromosome relative to autosomes in Ficedula flycatchers: differential introgression or the faster-Z effect?. Ecol Evol. 2012, 2 (2): 379-396. 10.1002/ece3.92.PubMed CentralPubMedView Article
- Begun DJ, Whitley P: Reduced X-linked nucleotide polymorphism in Drosophila simulans. Proc Natl Acad Sci U S A. 2000, 97 (11): 5960-5965. 10.1073/pnas.97.11.5960.PubMed CentralPubMedView Article
- Borge T, Webster MT, Andersson G, Saetre GP: Contrasting patterns of polymorphism and divergence on the Z chromosome and autosomes in two Ficedula flycatcher species. Genetics. 2005, 171 (4): 1861-1873. 10.1534/genetics.105.045120.PubMed CentralPubMedView Article
- Axelsson E, Webster MT, Smith NGC, Burt DW, Ellegren H: Comparison of the chicken and turkey genomes reveals a higher rate of nucleotide divergence on microchromosomes than macrochromosomes. Genome Res. 2005, 15 (1): 120-125. 10.1101/gr.3021305.PubMed CentralPubMedView Article
- Axelsson E, Smith NGC, Sundstrom H, Berlin S, Ellegren H: Male-biased mutation rate and divergence in autosomal, Z-linked and W-linked introns of chicken and turkey. Mol Biol Evol. 2004, 21 (8): 1538-1547. 10.1093/molbev/msh157.PubMedView Article
- Charlesworth B, Coyne JA, Barton NH: The relative rates of evolution of sex-chromosomes and autosomes. Am Nat. 1987, 130 (1): 113-146. 10.1086/284701.View Article
- Mank JE, Nam K, Ellegren H: Faster-Z evolution is predominantly due to genetic drift. Mol Biol Evol. 2010, 27 (3): 661-670. 10.1093/molbev/msp282.PubMedView Article
- Mank JE, Vicoso B, Berlin S, Charlesworth B: Effective population size and the faster-X effect: empirical results and their interpretation. Evolution. 2010, 64 (3): 663-674. 10.1111/j.1558-5646.2009.00853.x.PubMedView Article
- Li RQ, Zhu HM, Ruan J, Qian WB, Fang XD, Shi ZB, Li YR, Li ST, Shan G, Kristiansen K, Li SG, Yang HM, Wang J, Wang J: De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 2010, 20 (2): 265-272. 10.1101/gr.097261.109.PubMed CentralPubMedView Article
- Karolchik D, Hinrichs AS, Kent WJ: The UCSC genome browser. Current Protocols in Bioinformatics. 2009, Chapter 1:Unit1 4
- Li H, Durbin R: Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010, 26 (5): 589-595. 10.1093/bioinformatics/btp698.PubMed CentralPubMedView Article
- Kalbfleisch T, Heaton M: Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes. F1000Research. 2013, 2: 244-PubMed CentralPubMed
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324.PubMed CentralPubMedView Article
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Proc GPD: The sequence alignment/map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.PubMed CentralPubMedView Article
- Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.PubMed CentralPubMedView Article
- Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T, Durbin R, Eyras E, Gilbert J, Hammond M, Huminiecki L, Kasprzyk A, Lehvaslaiho H, Lijnzaad P, Melsopp C, Mongin E, Pettett R, Pocock M, Potter S, Rust A, Schmidt E, Searle S, Slater G, Smith J, Spooner W, Stabenau A, et al: The Ensembl genome database project. Nucleic Acids Res. 2002, 30 (1): 38-41. 10.1093/nar/30.1.38.PubMed CentralPubMedView Article
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.