- Research article
- Open Access
Insights into the evolution of Darwin’s finches from comparative analysis of the Geospiza magnirostris genome sequence
- Chris M Rands1,
- Aaron Darling†2,
- Matthew Fujita†1, 3,
- Lesheng Kong†1,
- Matthew T Webster†4,
- Céline Clabaut2,
- Richard D Emes5, 6,
- Andreas Heger1,
- Stephen Meader1,
- Michael Brent Hawkins3,
- Michael B Eisen7, 8,
- Clotilde Teiling9,
- Jason Affourtit9, 10,
- Benjamin Boese9,
- Peter R Grant11,
- Barbara Rosemary Grant11,
- Jonathan A Eisen2, 12, 13Email author,
- Arhat Abzhanov3Email author and
- Chris P Ponting1Email author
© Rands et al.; licensee BioMed Central Ltd. 2013
Received: 19 October 2012
Accepted: 23 January 2013
Published: 12 February 2013
A classical example of repeated speciation coupled with ecological diversification is the evolution of 14 closely related species of Darwin’s (Galápagos) finches (Thraupidae, Passeriformes). Their adaptive radiation in the Galápagos archipelago took place in the last 2–3 million years and some of the molecular mechanisms that led to their diversification are now being elucidated. Here we report evolutionary analyses of genome of the large ground finch, Geospiza magnirostris.
13,291 protein-coding genes were predicted from a 991.0 Mb G. magnirostris genome assembly. We then defined gene orthology relationships and constructed whole genome alignments between the G. magnirostris and other vertebrate genomes. We estimate that 15% of genomic sequence is functionally constrained between G. magnirostris and zebra finch. Genic evolutionary rate comparisons indicate that similar selective pressures acted along the G. magnirostris and zebra finch lineages suggesting that historical effective population size values have been similar in both lineages. 21 otherwise highly conserved genes were identified that each show evidence for positive selection on amino acid changes in the Darwin's finch lineage. Two of these genes (Igf2r and Pou1f1) have been implicated in beak morphology changes in Darwin’s finches. Five of 47 genes showing evidence of positive selection in early passerine evolution have cilia related functions, and may be examples of adaptively evolving reproductive proteins.
These results provide insights into past evolutionary processes that have shaped G. magnirostris genes and its genome, and provide the necessary foundation upon which to build population genomics resources that will shed light on more contemporaneous adaptive and non-adaptive processes that have contributed to the evolution of the Darwin’s finches.
"“ The most curious fact is the perfect gradation in the size of the beaks in the different species of Geospiza, from one as large as that of a hawfinch to that of a chaffinch, and… even to that of a warbler… Seeing this gradation and diversity of structure in one small, intimately related group of birds, one might really fancy that from an original paucity of birds in this archipelago, one species had been taken and modified for different ends .”"
Charles R. Darwin, The Voyage of the Beagle 
Since their collection by Charles Darwin and fellow members of the HMS Beagle expedition from the Galápagos Islands in 1835 and their introduction to science, these birds have been subjected to intense research. Many biology textbooks use Darwin’s finches (formerly known as Galápagos finches) to illustrate a variety of topics in evolutionary theory, including speciation, natural selection, and niche partitioning [2–4]. Darwin’s finches continue to be a very valuable source of biological discovery. Several unique characteristics of this clade have allowed multiple important recent breakthroughs in our understanding of changes in island biodiversity, mechanisms of repeated speciation coupled with ecological diversification, evolution of cognitive behaviours, principles of beak/jaw biomechanics as well as the underlying developmental genetic mechanisms in generating morphological diversity [5, 6].
Beak size and shape, as well as body size, are the principal phenotypic traits that have diversified in Darwin’s finches . The most studied group within the Darwin’s finches is the monophyletic genus Geospiza, which includes three distinct bill shapes: the basal sharp-billed finch G. difficilis has a small and symmetrical beak used to feed on a mixed diet of insects and seeds; cactus finches G. scandens and G. conirostris feature an elongated and pointed bill suitable for probing cactus flowers and fruit; and ground finches possess deep and broad bills adapted for cracking seeds . Among the ground finches, which include small, medium and large species, the large ground finch G. magnirostris has the most modified beak that it uses to crack (and then consume) large and hard seeds (Figure 1). Importantly, beak shapes develop during early embryogenesis and finch hatchlings show species-specific features. Recent molecular analysis has shown that the ground finch bill morphology correlates with a developmentally earlier and broader gene expression of Bone morphogenetic protein 4 (Bmp4), especially in the large ground finch. Functional experiments mimicking such changes in Bmp4 expression using laboratory chicken embryos are consistent with its role in this Geospiza beak trait . Similar experiments elucidated the roles of three further developmental factors, Transforming Growth Factor beta Receptor Type II (TGFβRII), beta-Catenin (βCat) and Dickkopf-3 (Dkk3), at later stages of beak development that help in forming the bill shapes that are unique to ground finches . Other analyses revealed an important role of change in Calmodulin (CaM) expression pattern for the development of elongated bills of cactus finches .
In 2008 we initiated a project to sequence the genomes of some of the Darwin’s finches (Additional file 1). In particular, we were motivated to perform a whole genome analysis of the large ground finch G. magnirostris because of the evolutionary importance of the entire clade of Darwin’s finches to the fields of ecology and evolutionary biology, the potential of genomic analysis for uncovering the genetic basis of key phenotypic traits and the scarcity of genomic studies of birds (especially when compared to mammals). The species was chosen because it arose relatively recently and it has one of the most adapted and distinctive bill shapes. The embryonic individual chosen for genome sequencing was sampled from a population from the small and well isolated island of Genovesa which exhibit the largest bills of all existing Darwin’s finches, with an estimated effective population size of 75–150 individuals .
The field of evolutionary and comparative genomics will benefit more broadly from analyzing an additional species of passerine. G. magnirostris diverged from the first sequenced passerine, the zebra finch (Taeniopygia guttata) , approximately 25 my ago , which is comparable to the divergence time separating mouse and rat . The G. magnirostris genome assembly has not been assembled into chromosomes or long contigs so we cannot investigate whether this interval of time has seen radical changes in its karyotype; however, such changes are unlikely since avian karyotypes typically are stable [18, 19]. Nevertheless, we can investigate a variety of other evolutionary processes, such as whether episodes of positive selection have occurred along the G. magnirostris terminal lineage and whether there has been rapid gains and losses (evolutionary ‘turnover’ ) of functional sequence across the avian clade.
The genome assembly and analysis presented here should permit population genetics approaches to be applied to Darwin’s finch species and subpopulations in order to identify the genetic basis of their recent adaptations.
Results and discussion
A G. magnirostrisgenome assembly
A DNA sample was taken from a G. magnirostris individual embryo collected during a field trip to the island of Genovesa (Galápagos) in 2009. Sequencing was performed using the Roche 454 technology with both long read and mate-pairs libraries, and then assembled using Roche’s algorithm Newbler, as described in the Materials and Methods. The resulting assembly contains 991.0 Mbp across 12,958 scaffolds with a scaffold N50 of 382kbp and a median read coverage of 6.5-fold.
Completeness of the G. magnirostris genome assembly was estimated using two approaches. First, we determined the amount of euchromatic sequence that aligns between zebra finch and chicken, but that does not align to G. magnirostris. Since chicken is an outgroup to both zebra finch and G. magnirostris, we can assume that most sequence present in both the zebra finch and chicken genome assemblies will also be present in the G. magnirostris assembly, with rare exceptions where lineage-specific deletions have occurred along the Darwin's finch lineage. Thus, the 122 Mb of chicken sequence aligned to zebra finch that is absent from the G. magnirostris assembly provides an estimate of the G. magnirostris euchromatic genome assembly’s incompleteness. Second, the assembly consists of approximately 7.529 Gb of sequence data, and the depth of coverage for reads on assembled contigs peaks at 6.0. Consequently, under a simplifying assumption that all regions of the genome are equally represented in libraries and among successful sequencing runs, an estimate of the true genome size is 7.529/6.0 or 1.25 Gb. In summary, the G. magnirostris genome assembly is estimated to cover approximately 89% of the euchromatic genome or approximately 76% of the complete genome. The estimated 1.250 Gb size of the G. magnirostris genome is similar to the mean avian genome size (1.38Gb, . Animal Genome Size Database. http://www.genomesize.com).
We expect this G. magnirostris genome assembly to be most incomplete within highly repetitive sequence. Use of either a library of transposable element sequences constructed from the G. magnirostris genome (using RepeatScout ) or a zebra finch repeat library resulted in the identification of 3.3% or 4.1% of the assembly as being repetitive, respectively. This proportion is over two-fold lower than observed for zebra finch or chicken genomes [15, 23], and it is clear that there is a deficit of closely-related transposable elements present in the G. magnirostris assembly (Additional file 2). Highly repetitive sequence in the G. magnirostris genome is thus likely to be disproportionately missing from the assembly.
Assembly sequence quality was assessed first by examining whether GT-AG dinucleotide splice sites in 6,188 chicken genes, each with a single orthologue in zebra finch and G. magnirostris, exhibited apparently substituted nucleotides in aligned G. magnirostris sequence. 515 of 168,849 (0.31%) of these nucleotides showed sequence changes, providing an estimate of the assembly’s nucleotide substitution errors. Although this is higher than error rates inferred in other sequenced avian genomes, such as the 0.05% rate estimated for zebra finch , it is likely to overestimate the true error rate, because some substitutions will reflect mis-alignments or genuine point mutations. In a second approach, we counted the number of insertions or deletions (‘indels’) that are present in the three-way alignment of zebrafinch with G. magnirostris and a G. fortis sequence that was recently released (GenBank entry: AKZB00000000.1 ). If one conservatively assumes that there have been no G. magnirostris lineage-specific indels then the upper-bound estimate for the indel error is 1.98 indels per Kb of aligned sequence. These errors will have led to a lowering of the number of protein-coding gene models that we predict for G. magnirostris.
Amount of sequence aligning between G. magnirostris and genome assemblies from other avian species
Genome Size (Mb)
Aligning sequence (Mb)
Percentage of theG. magnirostrisgenome aligning (%)
G. gallus – G. magnirostris
M. gallopavo – G. magnirostris
T. guttata – G. magnirostris
The G. magnirostris genome assembly has a G+C proportion of 40.08%, which is similar to all other evaluated amniote genomes. Medium-sized scaffolds (sizes between 2398 bp and 46677 bp) were more G+C-rich (44.6%) than small or large scaffolds (41.2% and 39.8%, respectively). Visual inspection of the G. magnirostris genome reveals that it exhibits substantial spatial heterogeneity in its base composition; similarly to all other amniotic genomes, but unlike that of the Anolis lizard , genic G+C content of genomic regions has remained relatively constant (Additional file 3).
Neutral indel model analysis
The Neutral Indel Model (NIM) of Lunter et al. provides an estimate of the amount of sequence that has been functionally constrained in one or both members of a species pair since their last common ancestor. The method takes advantage of an expectation that autosomal indels in a genome-wide pairwise sequence alignment occur randomly once account has been taken of fluctuations in G+C content. Where their density is relatively low it is assumed that there is a greater likelihood that additional insertion or deletion variants have been preferentially purged in functional sequence. The NIM first constructs histograms of the lengths of inter-gap segments (IGSs; defined as ungapped segments of aligned sequence between a species pair) from whole genome pairwise alignments, and then measures the departure of the observed IGS frequency distribution from the random distribution expected under neutral evolution. The excess of long IGSs compared to the neutral expectation allows the quantity of constrained, indel-purified, sequence shared between the two species to be inferred.
G. magnirostrispredicted genes and orthologues
We predicted 13,291 protein-coding genes in the G. magnirostris genome assembly. To do so we aligned protein-coding sequences from three amniote species, human, chicken, and zebra finch, to the G. magnirostris genome assembly, and reconciled overlapping transcript predictions using the Gpipe pipeline . To analyse the evolution of G. magnirostris protein-coding genes, the orthologues and paralogues among G. magnirostris and seven other Euteleostomi (human, mouse, chicken, turkey, zebra finch, Anolis lizard and tetraodon) were assigned using the OPTIC pipeline [30, 31]. We then produced a high quality set of 1,452 simple orthologue sets (genes that have been spared from duplication or deletion in the bird, reptile and mammalian lineages since their last common ancestor) among the seven amniote species. These 1,452 gene sets represent a stringent set of evolutionarily conserved “core” protein-coding genes in vertebrates.
Examining the completeness of these gene sets, we noted that there were 10,222 simple 1:1 orthologue sets between human and zebra finch, while there were only 7,416 simple 1:1 orthologue sets between human and G. magnirostris. The smaller gene orthologue set between human and G. magnirostris could imply that 27% of genes are missing from the gene set, and thus the gene set could be 73% complete. A similar proportion (71%) of 1,109 metazoan single copy orthologues curated by Creevey et al. have orthologues among our predicted G. magnirostris genes. Our approaches ensure that each gene in these orthologue sets has at least one transcript that covers at least 80% of the human, chicken or zebrafinch template transcript. We note that these gene set completeness estimates are lower-bound estimates for assembly completeness since this orthology analysis will exclude some partially, imperfectly or fragmentary predicted G. magnirostris gene models.
Evolutionary rate analysis
The lineage-specific median d N /d S value is slightly smaller for Geospiza than it is for zebra finch (Figure 3). Smaller d N /d S values are expected for lineages with larger effective population sizes N e , which implies that since the last common ancestor of zebra finch and G. magnirostris historic N e values have been high, far higher than the very low N e values of 38–60 of extant Geospiza species  and closer to the current effective population size of zebra finch (25,000 – 7,000,000) .
Positively selected genes along the Darwin’s finch lineage
Short gene name
Ensembl gene ID of chicken 1:1 ortholog
P-value that gene is under positive selection
Number of codon sites inferred to be under positive selection with p<0.1
Two G. magnirostris PSGs are of particular note: POU1F1 (POU domain, class 1, transcription factor 1; also known as Pit1, growth hormone factor 1) and IGF2R (insulin-like growth factor 2 receptor). These genes’ putatively adaptive amino acid substitutions were confirmed using sequence data from G. fortis (medium ground finch)  and from G. difficilis (sharp-beaked ground finch) (Figure 4d, e). Disruption of either gene in the mouse is known to result in craniofacial abnormalities [49, 50] and POU1F1, despite its description as a pituitary-specific transcription factor in mammals , is differentially expressed in the developing beaks of ducks, quails and chickens . There is a functional link between these two genes since POU1F1 regulates prolactin and growth hormone genes in mammals and birds , and decreased growth hormone results in a decrease in activity of the insulin/IGF-1 signalling pathway . In mouse bone, growth hormone is known to regulate many genes of the insulin/IGF-1 or Wnt signaling pathways, as well as Bmp4 whose gene expression change is linked to bill morphology in G. magnirostris. Moreover, a key member of the IGF pathway (IGF binding protein, a molecule that controls ligand-receptor interaction) was identified in Darwin’s finches as one of the top differentially expressed candidate genes in a microarray screen in species with divergent beak shapes . Positive selection acting on POU1F1 and IGF2R may thus have contributed to the evolution of beak morphology in the G. magnirostris lineage. Experiments that misexpress POU1F1 or IGF2R variants during avian craniofacial development will be required to further investigate this hypothesis.
We also predicted 47 genes to have been under positive selection on the passerine branch prior to the split of the zebra finch and G. magnirostris lineages (Additional file 5). Performing an enrichment analysis to test whether any Gene Ontology (GO) terms  were overrepresented among genes with positively selected sites along the passerine branch identified ‘cilium’ (GO:0005929) as the most significantly enriched term (p = 8.1×10-20; Additional file 6). This term is annotated to three passerine PSGs: coiled-coil domain containing 40 (CCDC40), axonemal dynein intermediate chain 2 (DNAI2), and cytoplasmic dynein 2 light intermediate chain 1 (DYNC2LI1). DNAI2 protein is a component of respiratory ciliary axonemes and sperm flagella, and human DNAI2 mutations are associated with respiratory tract dysfunction and infertility . DYNC2LI1 is present in the mammalian ciliary axoneme . Two further passerine PSGs, namely coiled-coil domain containing 147 (CCDC147) and its paralogous gene, coiled-coil domain containing 146 (CCDC146), are likely to possess functions related to cilia and spermatazoan flagella (see below), although this is not reflected in current GO annotations.
This first genome sequence of a Darwin’s finch has utility beyond the purview of Darwin’s finch biology. Avian species are currently under-sampled as a taxonomic group compared with mammals. Moreover, the passerine order contains over half of all bird species, which equates to approximately 5,000 identified species, almost as many as the total number of mammalian species [64, 65]. However, passerines were only represented previously by the genomes of the zebra finch  and the flycatcher , and our range of genome-scale resources should now facilitate further research into the evolution of this unusual group of passerine birds. Our identification of positively selected genes on the passerine branch not mentioned in previous studies that used only the zebra finch genome sequence [15, 44] demonstrates the extra power this additional passerine sequence provides for investigating wider avian biology.
In addition to providing the G. magnirostris reads (SRA is SRA061447) and the genome assembly (BioProject accession PRJNA178982), we are now providing gene predictions, orthology relationships, and gene phylogenies generated by this project to browse and to download from http://genserv.anat.ox.ac.uk/clades/vertebrates_geospiza_v3. High quality multiple sequence alignments and regions predicted to have been subject to indel-purifying selection have also been made available from http://wwwfgu.anat.ox.ac.uk/~chrisr/Gmag_data/. Whilst the G. magnirostris genome assembly remains incomplete, like many vertebrate genome sequences, it should be finished to high quality once the cost of high quality sequencing is sufficiently reduced. Despite, its draft status the genome assembly provides an important foundation for genetic studies of single genes, and for population genomic studies of most of the genes not just for G. magnirostris but also for all other, closely-related, Darwin’s finches. These population approaches should assist in providing an accurate and detailed picture of the demography and phylogenetic history of these finches before and since they arrived on the Galápagos islands approximately 2–3 my ago . Considering the rapid and dramatic morphological and ecological evolution of Darwin’s finches, the comparative study of their genomes will provide valuable insights for speciation genomics, an emerging field of genomics studying genomic-level alterations that accompany processes of divergence and speciation in natural populations .
DNA samples were taken from individual late stage embryos representing three species of Darwin’s finches (G. magnirostris, G. conirostris and G. difficilis) collected during a field trip to the island of Genovesa (Galápagos) in 2009. The embryonic trunk tissue was preserved in RNAlater solution (Ambion) and treated as fresh tissue with a commercial genomic DNA preparation kit (QIAGEN Genomic DNA Purification Kit). The quality of the obtained gDNA was checked with a NanoDrop Spectrophotometer (ThermoScientific) and Agilent 2100 Bioanalyzer.
Library construction and sequencing
DNA library construction and sequencing was done at 454-Corporation under the coordination of Timothy Harkins, Jason Affourtit, Clotilde Teiling and Benjamin Boese. DNA libraries were constructed using standard techniques for Roche-454 sequencing. In summary: 3 μg of purified genomic DNA was fractionated into fragments of the targeted size ranges; short adaptors were ligated to each fragments; single stranded fragments were created and immobilized onto specifically designed DNA capture beads; the bead-bound library was emulsified with amplification reagents in water in oil mixture resulting in microreactors containing just (ideally) one bead with one unique sample-library fragment; emulsion beads were submitted to PCR amplification; the emulsion mixture was then broken while the amplified fragments remained bound to their beads; and the DNA-carrying capture beads were loaded onto a PicoTiterPlate device for sequencing. The device was then loaded into the Genome Sequencer system where individual nucleotides are flowed in a fixed order across the open wells and DNA capture beads; complementary nucleotides to the template strand results in a chemiluminescent signal recorded by the CCD camera of the instrument. Roche-454 software was then used to determine the sequence of ~900,000 reads per instrument run – this is done by analyzing a combination of signal intensity and positional information generated across the PicoTiterPlate device.
In total twenty-eight long read runs, six runs on 2.5 kbp mate-pair libraries, and six runs on 5 kbp mate-pair libraries were generated. Both Titanium and Titanium XL chemistries were used. Mate-pair libraries in each size range were constructed multiple times, yielding six mate-pair libraries of approximately 5 kbp insert size and an additional five libraries at about 2.5 kbp. Further details on the sequencing data are provided in Additional file 7.
Data were obtained from the following 454 runs: 28 small insert “fragment” runs, 6 mate-pair runs covering 5 different ‘3 kbp’ libraries (mean insert size: 2.5 kbp ± 620nt) and 3 mate-pair runs covering 6 ‘8 Kbp’ libraries (mean insert size: 4.9 kbp ± 1.2 kbp). Pyrosequencing reads in SFF format were assembled by the Newbler software version 2.3 using the vendor recommended protocol. Briefly, contigs were generated using the long read data, and mate-pair reads were mapped to the contigs and used to link contigs into scaffolds. In total, 24.4 million reads comprising 7.0 Gbp were used to form contigs and an additional 4.1 million read pairs were used for scaffolding.
The resulting assembly contains 12,958 scaffolds in an estimated genome size of 1254.6Mbp, with a scaffold N50 of 382kbp. The scaffolds comprise 394409 contigs spanning 958.3 Mbp. The coverage distribution has a median at 6.5-fold with a long tail to higher values, which further suggests that some repeat regions may not be fully resolved.
Whole genome alignments
Chicken and zebra finch genome assemblies (galGal3 and taeGut1 assembly versions) were obtained from UCSC Genome Informatics at http://genome.ucsc.edu (Santa Cruz). The Turkey_2.01 assembly (September 2010) was acquired from Ensembl release 61 at http://www.ensembl.org. LASTZ, available from http://www.bx.psu.edu/miller_lab/, was used to construct the whole genome pairwise alignments, which were subsequently chained and netted using various UCSC utilities .
The target genome sequences (chicken, turkey, or zebra finch) when not placed on specific chromosomes were discounted when calculating amounts of aligning sequence; such amounts are thus likely to be conservative estimates. These unplaced sequences were ignored because some sequence in the zebra finch genome assembly is artificially present in two copies, both in assembled chromosomes and in sequence not placed on chromosomes.
Using MULTIZ , we combined our zebra finch – G. magnirostris whole genome alignments with whole genome alignments between zebra finch – G. fortis obtained from UCSC. This resulted in the generation of multiple sequence alignments across zebra finch, G. magnirostris, and G. fortis.
Neutral indel model
The Neutral Indel Model (NIM) quantifies the amount of indel-purified sequence (IPS) shared between a species pair. The NIM uses whole genome pairwise alignments to identify inter-gap segments (IGSs) across the genome, and then compares the true distribution of IGSs to the expected neutral geometric distribution that is extrapolated from the distribution of short IGSs inferred from the alignment which are considered to be free from selective constraint. The excess of long IGSs over the neutral expectation are indicative of IPSs containing functional elements. The amount of functional sequence shared between the two species is then estimated by calculating the cumulative lengths of all the IPSs, and then subtracting a correction factor to account for the contribution of neutral sequence to each IPS. Each indel-purified segment is assumed to contain somewhere between K and 2 K bases of neutral sequence depending on the degree of clustering of functional elements, where K is the mean number of bases between indels in neutral sequence, which is simply p-1, where p is the indel mutation rate. The upper and lower bound estimates of the amount of IPS are derived using the K and 2 K corrections respectively. The genome is partitioned during the analysis to account for G+C content and sex chromosome biases. Further details of this approach are provided in .
The estimated quantities of aligning and indel-purified sequence, and the estimated synonymous divergence between the species are shown in Additional file 8. Frequency histograms of the IGS lengths calculated between the different avian species pairs across all GC content bins are displayed in Additional file 9, Additional file 10, and Additional file 11.
Gene predictions and orthologue/paralogue assignment
Gene predictions from the G. magnirostris genome assembly were made by a computational pipeline, Gpipe, using protein-coding genes from human, chicken, zebra finch as templates . Gene sets for all other seven species were downloaded from Ensembl release 61 (February 2011). Orthologues and paralogues were subsequently assigned using OPTIC . This consists of four steps: (1) orthologues are assigned between pairs of genomes using PhyOP  based on a distance metric derived from BLASTP alignments, (2) pairwise orthologues are grouped into clusters, (3) sequences within a cluster are aligned using MUSCLE , and (4) phylogenetic tree topologies are estimated using TreeBeST  with clusters being split into orthologous groups using the pufferfish Tetraodon as the outgroup.
The completeness of these gene sets was examined in two ways. Firstly, the number of simple 1:1 orthologues between human and zebra finch was compared to the number between human and G. magnirostris. Secondly, we calculated the number of genes with orthologues predicted in G. magnirostris from a set of metazoan single copy genes from Creevey et al.. Fifteen of the metazoan single copy genes were excluded from the analysis, since they were retired from the current Ensembl release.
From the OPTIC ortholog sets, a refined ortholog set was constructed of simple 1:1 orthologues shared across human, mouse, chicken, turkey, G. magnirostris, zebra finch, and the Anolis lizard. False positive predictions of positive selection will be more frequent in poorly aligned or sequence error-prone sequence . Multiple sequence alignments (MSAs) of protein-coding sequence were thus very stringently filtered to remove poorly aligning regions using SEG, GBLOCKS, GUIDANCE [74, 75], and further approaches that we describe below. Strict GBLOCKS settings were used (minimum number of sequences for a conserved position=5, minimum number of sequences for a flanking position=6, maximum number of contiguous nonconserved positions=6, minimum length of block=10), only alignment columns with a GUIDANCE score of 1 were kept, and no gaps were allowed. All codons containing a base with a phred quality score of 30 or less, which equates to a 0.1% probability of the base being falsely called, were also excluded. Alignment columns in 15 bp windows were removed when these windows contained greater than 5 substitutions between aligned G. magnirostris and zebra finch. Such runs of substitutions may represent sequence or alignment errors. Further alignment columns that lie within 7 codons of previously filtered sequence were also removed since otherwise such codons are enriched in predicted positively predicted sites. Finally, we discarded all genes whose remaining alignment columns numbered fewer than 10% of their predicted numbers of codons, or were less than 100 codons in length. This procedure resulted in a set of “strict” 1:1 orthologues containing 1,452 genes.
Evolutionary rate analyses
d S , d N , and d N /d S values were inferred from the filtered MSAs by applying the PAML M2a Maximum-likelihood branch model [76, 77]. The branch lengths were then calculated by taking the median values across all genes in the strict orthologue set.
The filtered MSAs and guide trees were also provided as input for the branch-site test for positive selection of Zhang et al.. The test identifies genes with particular codons showing evidence of positive selection by comparing a null model, where d N /d S (ω) is never allowed to exceed 1 (so only negative or neutral evolution is considered), to an alternative model in which some sites on the G. magnirostris lineage are allowed to have ω >1 (implying positive selection) (Figure 4a). The test was run twice, and only cases where the two tests converged to within log-likelihood values at or within 0.01 were taken forward for downstream analysis. Subsequently, a likelihood ratio test (LRT) was used to compare the null and alternative model, and a Chi-squared test applied to compare the significance of the LRT scores. The number of positively selected sites in genes inferred to have evolved under positive selection was estimated using a Bayes Empirical Bayes (BEB) approach .
It has been suggested that the branch site test of Zhang et al. is not statistically robust when the number of substitutions in the MSAs is small . However, this criticism is largely based on the study of Bakewell et al. who apply a branch site model across three very closely related primate species. Additionally, it has been suggested that branch-site methods are susceptible to high false positive error when branches assumed to have d N /d S values less than 1 are in fact evolving rapidly . However, the validity of these criticisms has been challenged [81–83]. The application of the test here across seven diverse amniotes should be robust, since the large number of species, considerable divergence between many species pairs, and the fact that only filtered sequences greater than 100 codons long were tested, mean that there are relatively large numbers of substitutions in each alignment.
Gene Ontology (GO) annotations for chicken genes were downloaded from http://www.geneontology.org/. GO terms were interpolated to ensure that for each GO term assigned to a gene, all “parental” terms of the GO term were also assigned to that gene. For each GO term, the number of positively selected and non-positively selected sites in genes assigned with that GO term was calculated. A hypergeometric test was then applied in R  to calculate a P value for each GO term that represents the probability that the number of positively selected sites observed to be associated with a GO term (or greater number than this) would be seen by chance if positively selected sites were distributed randomly across the genes. A Bonferroni correction was then applied to account for multiple testing , producing the adjusted P value that is quoted in the text and in Additional file 6.
Homologues of human CCDC147 were predicted using profile-based iterative searches with the HMMer3 , and later the more sensitive HMMer2 , algorithms. The algorithms searched for significant sequence similarity between the CCDC147 sequence and protein sequences in the UniRef50 database . Sequences with significant E-value similarity to CCDC147 where kept, and the G. magnirostris and T. guttata (and later G. fortis) CCDC147 predicted sequences were added to multiple sequence alignments that were aligned using T-Coffee . Alignments were inspected manually, and lower quality aligning sequences removed, before a phylogenetic tree of the relationship between the sequences was inferred using a Neighbor-joining tree approach .
We thank the Galapagos National Parks Service and Charles Darwin Research Station for permission to obtain and export material and for logistical support. We would like to thank Greg Wray from the Duke Institute for Genome Sciences and Policy for helpful suggestions during the initial planning of the project and for providing significant funds to help cover the sequencing. We express our gratitude to Luis Sánchez-Pulido for helping with the protein sequence analysis, and we thank Wilfried Haerty and Yang Li for helpful discussions.
CR, AH, and CP are funded by the Medical Research Council, UK. MF was supported by the National Research Foundation Postdoctoral Research Fellowship in Biological Informatics (Award No. DBI-0905714). LK is funded by the European Research Council (ERC). AA and CC were supported by a grant from the NSF (10B-0616127). PG and RG are funded by National Science Foundation, USA.
The parent accession for the G. magnirostris reads in SRA is SRA061447 whilst the BioProject accession for WGS data is PRJNA178982.
- Darwin C: The Voyage of the Beagle. 1988, New York: New American LibraryGoogle Scholar
- Freeman S, Herron J: Evolutionary Analysis. 2003, Upper Saddle River, N.J.: CramOutline&Highlight101Google Scholar
- Barton NH: Evolution. 2007, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory PressGoogle Scholar
- Futuyma DJ: Evolution. 2009, Sunderland, Mass: Sinauer Associates, 2Google Scholar
- Grant BR, Grant PR: Evolutionary dynamics of a natural population: the large cactus finch of the Galápagos. 1989, Chicago: University of Chicago PressGoogle Scholar
- Abzhanov A: Darwin's Galapagos finches in modern biology. Philos Trans R Soc Lond B Biol Sci. 2010, 365: 1001-1007.PubMed CentralView ArticlePubMedGoogle Scholar
- Sato A, Tichy H, O'HUigin C, Grant PR, Grant BR, Klein J: On the origin of Darwin's finches. Mol Biol Evol. 2001, 18: 299-311.View ArticlePubMedGoogle Scholar
- Burns KJ, Hackett SJ, Klein NK: Phylogenetic relationships and morphological diversity in Darwin's finches and their relatives. Evolution. 2002, 56: 1240-1252.View ArticlePubMedGoogle Scholar
- Petren K, Grant PR, Grant BR, Keller LF: Comparative landscape genetics and the adaptive radiation of Darwin's finches: the role of peripheral isolation. Mol Ecol. 2005, 14: 2943-2957.View ArticlePubMedGoogle Scholar
- Grant PR, Grant BR: How and Why Species Multiply: The Radiation of Darwin's Finches. 2008, Princeton, N.J.: Princeton University PressGoogle Scholar
- Grant PR, Grant BR: Conspecific versus heterospecific gene exchange between populations of Darwin's finches. Philos Trans R Soc Lond B Biol Sci. 2010, 365: 1065-1076.PubMed CentralView ArticleGoogle Scholar
- Abzhanov A, Protas M, Grant BR, Grant PR, Tabin CJ: Bmp4 and morphological variation of beaks in Darwin's finches. Science. 2004, 305: 1462-1465.View ArticlePubMedGoogle Scholar
- Mallarino R, Grant PR, Grant BR, Herrel A, Kuo WP, Abzhanov A: Two developmental modules establish 3D beak-shape variation in Darwin's finches. Proc Natl Acad Sci USA. 2011, 108: 4057-4062.PubMed CentralView ArticlePubMedGoogle Scholar
- Abzhanov A, Kuo WP, Hartmann C, Grant BR, Grant PR, Tabin CJ: The calmodulin pathway and evolution of elongated beak morphology in Darwin's finches. Nature. 2006, 442: 563-567.View ArticleGoogle Scholar
- Warren WC, Clayton DF, Ellegren H, Arnold AP, Hillier LW, Kunstner A, Searle S, White S, Vilella AJ, Fairley S, et al: The genome of a songbird. Nature. 2010, 464: 757-762.PubMed CentralView ArticlePubMedGoogle Scholar
- Cracraft J, Barker FK: The Timetree of Life. 2009, Oxford: Oxford University PressGoogle Scholar
- Arnason U, Adegoke JA, Gullberg A, Harley EH, Janke A, Kullberg M: Mitogenomic relationships of placental mammals and molecular estimates of their divergences. Gene. 2008, 421: 37-51.View ArticlePubMedGoogle Scholar
- Tegelstrom H, Ebenhard T, Ryttman H: Rate of karyotype evolution and speciation in birds. Hereditas. 1983, 98: 235-239.View ArticlePubMedGoogle Scholar
- Griffin DK, Robertson LB, Tempest HG, Skinner BM: The evolution of the avian genome as revealed by comparative molecular cytogenetics. Cytogenet Genome Res. 2007, 117: 64-77.View ArticleGoogle Scholar
- Meader S, Ponting CP, Lunter G: Massive turnover of functional sequence in human and other mammalian genomes. Genome Res. 2010, 20: 1335-1343.PubMed CentralView ArticlePubMedGoogle Scholar
- Gregory TR, Nicol JA, Tamm H, Kullman B, Kullman K, Leitch IJ, Murray BG, Kapraun DF, Greilhuber J, Bennett MD: Eukaryotic genome size databases. Nucleic Acids Res. 2007, 35: D332-D338.PubMed CentralView ArticlePubMedGoogle Scholar
- Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics. 2005, 21 (Suppl 1): i351-i358.View ArticlePubMedGoogle Scholar
- Consortium ICGS: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004, 432: 695-716.View ArticleGoogle Scholar
- Zhang G, Parker P, Li B, Li H, Wang J: The genome of Darwin’s Finch (Geospiza fortis). GigaScience. 2012, 1: 13-View ArticleGoogle Scholar
- Dalloul RA, Long JA, Zimin AV, Aslam L, Beal K, Blomberg Le A, Bouffard P, Burt DW, Crasta O, Crooijmans RP, et al: Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol. 2010, 8 (9): e1000475-PubMed CentralView ArticlePubMedGoogle Scholar
- Fujita MK, Edwards SV, Ponting CP: The Anolis lizard genome: an amniote genome without isochores. Genome Biol Evol. 2011, 3: 974-984.PubMed CentralView ArticlePubMedGoogle Scholar
- Lunter G, Ponting CP, Hein J: Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput Biol. 2006, 2: e5-PubMed CentralView ArticlePubMedGoogle Scholar
- Smith NG, Brandstrom M, Ellegren H: Evidence for turnover of functional noncoding DNA in mammalian genome evolution. Genomics. 2004, 84: 806-813.View ArticlePubMedGoogle Scholar
- Hardison RC, Roskin KM, Yang S, Diekhans M, Kent WJ, Weber R, Elnitski L, Li J, O'Connor M, Kolbe D, et al: Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution. Genome Res. 2003, 13: 13-26.PubMed CentralView ArticlePubMedGoogle Scholar
- Heger A, Ponting CP: Evolutionary rate analyses of orthologs and paralogs from 12 Drosophila genomes. Genome Res. 2007, 17: 1837-1849.PubMed CentralView ArticlePubMedGoogle Scholar
- Heger A, Ponting CP: OPTIC: orthologous and paralogous transcripts in clades. Nucleic Acids Res. 2008, 36: D267-D270.PubMed CentralView ArticlePubMedGoogle Scholar
- Creevey CJ, Muller J, Doerks T, Thompson JD, Arendt D, Bork P: Identifying single copy orthologs in Metazoa. PLoS Comput Biol. 2011, 7: e1002269-PubMed CentralView ArticlePubMedGoogle Scholar
- Pereira SL, Baker AJ: A molecular timescale for galliform birds accounting for uncertainty in time estimates and heterogeneity of rates of DNA substitutions across lineages and sites. Mol Phylogenet Evol. 2006, 38: 499-509.View ArticlePubMedGoogle Scholar
- Nabholz B, Kunstner A, Wang R, Jarvis ED, Ellegren H: Dynamic evolution of base composition: causes and consequences in avian phylogenomics. Mol Biol Evol. 2011, 28: 2197-2210.PubMed CentralView ArticlePubMedGoogle Scholar
- Li WH, Ellsworth DL, Krushkal J, Chang BH, Hewett-Emmett D: Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol Phylogenet Evol. 1996, 5: 182-187.View ArticlePubMedGoogle Scholar
- Keightley PD, Eyre-Walker A: Deleterious mutations and the evolution of sex. Science. 2000, 290: 331-333.View ArticlePubMedGoogle Scholar
- Grant PR, Grant BR: Demography and the genetically effective sizes of two populations of Darwin's Finches. Ecology. 1992, 73: 766-784.View ArticleGoogle Scholar
- Price TD, Grant PR, Gibbs HL, Boag PT: Recurrent patterns of natural selection in a population of Darwin's finches. Nature. 1984, 309: 787-789.View ArticlePubMedGoogle Scholar
- Ellegren H: A selection model of molecular evolution incorporating the effective population size. Evolution. 2009, 63: 301-305.View ArticlePubMedGoogle Scholar
- Grant PR, Grant BR: Hybridization of bird species. Science. 1992, 256: 193-197.View ArticlePubMedGoogle Scholar
- Balakrishnan CN, Edwards SV: Nucleotide variation, linkage disequilibrium and founder-facilitated speciation in wild populations of the zebra finch (Taeniopygia guttata). Genetics. 2009, 181: 645-660.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang J, Nielsen R, Yang Z: Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005, 22: 2472-2479.View ArticlePubMedGoogle Scholar
- Yang Z, Wong WS, Nielsen R: Bayes empirical bayes inference of amino acid sites under positive selection. Mol Biol Evol. 2005, 22: 1107-1118.View ArticlePubMedGoogle Scholar
- Nam K, Mugal C, Nabholz B, Schielzeth H, Wolf JB, Backstrom N, Kunstner A, Balakrishnan CN, Heger A, Ponting CP, et al: Molecular evolution of genes in avian genomes. Genome Biol. 2010, 11: R68-PubMed CentralView ArticlePubMedGoogle Scholar
- Duret L, Galtier N: Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet. 2009, 10: 285-311.View ArticlePubMedGoogle Scholar
- Vorbach C, Harrison R, Capecchi MR: Xanthine oxidoreductase is central to the evolution and function of the innate immune system. Trends Immunol. 2003, 24: 512-517.View ArticlePubMedGoogle Scholar
- Hyde BB, Liesa M, Elorza AA, Qiu W, Haigh SE, Richey L, Mikkola HK, Schlaeger TM, Shirihai OS: The mitochondrial transporter ABC-me (ABCB10), a downstream target of GATA-1, is essential for erythropoiesis in vivo. Cell Death Differ. 2012, 19: 1117-1126.PubMed CentralView ArticlePubMedGoogle Scholar
- Pappas CT, Bliss KT, Zieseniss A, Gregorio CC: The Nebulin family: an actin support group. Trends Cell Biol. 2011, 21: 29-37.PubMed CentralView ArticlePubMedGoogle Scholar
- Wang ZQ, Fung MR, Barlow DP, Wagner EF: Regulation of embryonic growth and lysosomal targeting by the imprinted Igf2/Mpr gene. Nature. 1994, 372: 464-467.View ArticlePubMedGoogle Scholar
- Snell GD: Dwarf, a New Mendelian Recessive Character of the House Mouse. Proc Natl Acad Sci USA. 1929, 15: 733-734.PubMed CentralView ArticlePubMedGoogle Scholar
- Ingraham HA, Chen RP, Mangalam HJ, Elsholtz HP, Flynn SE, Lin CR, Simmons DM, Swanson L, Rosenfeld MG: A tissue-specific transcription factor containing a homeodomain specifies a pituitary phenotype. Cell. 1988, 55: 519-529.View ArticlePubMedGoogle Scholar
- Brugmann SA, Powder KE, Young NM, Goodnough LH, Hahn SM, James AW, Helms JA, Lovett M: Comparative gene expression analysis of avian embryonic facial structures reveals new candidates for human craniofacial disorders. Hum Mol Genet. 2010, 19: 920-930.PubMed CentralView ArticlePubMedGoogle Scholar
- Weatherly KL, Ramesh R, Strange H, Waite KL, Storrie B, Proudman JA, Wong EA: The turkey transcription factor Pit-1/GHF-1 can activate the turkey prolactin and growth hormone gene promoters in vitro but is not detectable in lactotrophs in vivo. Gen Comp Endocrinol. 2001, 123: 244-253.View ArticlePubMedGoogle Scholar
- Hsieh CC, DeFord JH, Flurkey K, Harrison DE, Papaconstantinou J: Effects of the Pit1 mutation on the insulin signaling pathway: implications on the longevity of the long-lived Snell dwarf mouse. Mech Ageing Dev. 2002, 123: 1245-1255.View ArticlePubMedGoogle Scholar
- Govoni KE, Lee SK, Chadwick RB, Yu H, Kasukawa Y, Baylink DJ, Mohan S: Whole genome microarray analysis of growth hormone-induced gene expression in bone: T-box3, a novel transcription factor, regulates osteoblast proliferation. Am J Physiol Endocrinol Metab. 2006, 291: E128-E136.PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29.PubMed CentralView ArticlePubMedGoogle Scholar
- Loges NT, Olbrich H, Fenske L, Mussaffi H, Horvath J, Fliegauf M, Kuhl H, Baktai G, Peterffy E, Chodhari R, et al: DNAI2 mutations cause primary ciliary dyskinesia with defects in the outer dynein arm. Am J Hum Genet. 2008, 83: 547-558.PubMed CentralView ArticlePubMedGoogle Scholar
- Perrone CA, Tritschler D, Taulman P, Bower R, Yoder BK, Porter ME: A novel dynein light intermediate chain colocalizes with the retrograde motor for intraflagellar transport at sites of axoneme assembly in chlamydomonas and Mammalian cells. Mol Biol Cell. 2003, 14: 2041-2056.PubMed CentralView ArticlePubMedGoogle Scholar
- Carvalho AB, Dobo BA, Vibranovski MD, Clark AG: Identification of five new genes on the Y chromosome of Drosophila melanogaster. Proc Natl Acad Sci USA. 2001, 98: 13225-13230.PubMed CentralView ArticlePubMedGoogle Scholar
- Tam LW, Lefebvre PA: The Chlamydomonas MBO2 locus encodes a conserved coiled-coil protein important for flagellar waveform conversion. Cell Motil Cytoskeleton. 2002, 51: 197-212.View ArticlePubMedGoogle Scholar
- Mummery-Widmer JL, Yamazaki M, Stoeger T, Novatchkova M, Bhalerao S, Chen D, Dietzl G, Dickson BJ, Knoblich JA: Genome-wide analysis of Notch signalling in Drosophila by transgenic RNAi. Nature. 2009, 458: 987-992.PubMed CentralView ArticlePubMedGoogle Scholar
- Kapushesky M, Adamusiak T, Burdett T, Culhane A, Farne A, Filippov A, Holloway E, Klebanov A, Kryvych N, Kurbatova N, et al: Gene Expression Atlas update–a value-added database of microarray and sequencing-based functional genomics experiments. Nucleic Acids Res. 2012, 40: D1077-D1081.PubMed CentralView ArticlePubMedGoogle Scholar
- Swanson WJ, Vacquier VD: The rapid evolution of reproductive proteins. Nat Rev Genet. 2002, 3: 137-144.View ArticlePubMedGoogle Scholar
- Wilson DE, Reeder DM: Mammal Species of the World. 2005, A Taxonomic and Geographic Reference: Johns Hopkins University PressGoogle Scholar
- Mayr E: The number of species of Birds. The Auk. 1946, 63: 67-Google Scholar
- Ellegren H, Smeds L, Burri R, Olason PI, Backstrom N, Kawakami T, Kunstner A, Makinen H, Nadachowska-Brzyska K, Qvarnstrom A, et al: The genomic landscape of species divergence in Ficedula flycatchers. Nature. 2012, 491: 756-760.PubMedGoogle Scholar
- Feder JL, Egan SP, Nosil P: The genomics of speciation-with-gene-flow. Trends Genet. 2012, 28: 342-350.View ArticlePubMedGoogle Scholar
- Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D: Evolution’s cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA. 2003, 100: 11484-11489.PubMed CentralView ArticlePubMedGoogle Scholar
- Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al: Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004, 14: 708-715.PubMed CentralView ArticlePubMedGoogle Scholar
- Goodstadt L, Ponting CP: Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human. PLoS Comput Biol. 2006, 2: e133-PubMed CentralView ArticlePubMedGoogle Scholar
- Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004, 5: 113-PubMed CentralView ArticlePubMedGoogle Scholar
- Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009, 19: 327-335.PubMed CentralView ArticlePubMedGoogle Scholar
- Mallick S, Gnerre S, Muller P, Reich D: The difficulty of avoiding false positives in genome scans for natural selection. Genome Res. 2009, 19: 922-933.PubMed CentralView ArticlePubMedGoogle Scholar
- Talavera G, Castresana J: Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol. 2007, 56: 564-577.View ArticlePubMedGoogle Scholar
- Penn O, Privman E, Landan G, Graur D, Pupko T: An alignment confidence score capturing robustness to guide tree uncertainty. Mol Biol Evol. 2010, 27: 1759-1767.PubMed CentralView ArticlePubMedGoogle Scholar
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13: 555-556.PubMedGoogle Scholar
- Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591.View ArticlePubMedGoogle Scholar
- Nozawa M, Suzuki Y, Nei M: Reliabilities of identifying positive selection by the branch-site and the site-prediction methods. Proc Natl Acad Sci USA. 2009, 106: 6700-6705.PubMed CentralView ArticlePubMedGoogle Scholar
- Bakewell MA, Shi P, Zhang J: More genes underwent positive selection in chimpanzee evolution than in human evolution. Proc Natl Acad Sci USA. 2007, 104: 7489-7494.PubMed CentralView ArticlePubMedGoogle Scholar
- Suzuki Y: False-positive results obtained from the branch-site test of positive selection. Genes Genet Syst. 2008, 83: 331-338.View ArticlePubMedGoogle Scholar
- Yang Z, dos Reis M: Statistical properties of the branch-site test of positive selection. Mol Biol Evol. 2011, 28: 1217-1228.View ArticlePubMedGoogle Scholar
- Yang Z, Nielsen R, Goldman N: In defense of statistical methods for detecting positive selection. Proc Natl Acad Sci USA. 2009, 106: E95-Author reply E96PubMed CentralView ArticlePubMedGoogle Scholar
- Fletcher W, Yang Z: The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Mol Biol Evol. 2010, 27: 2257-2267.View ArticlePubMedGoogle Scholar
- R: A Language and Environment for Statistical Computing. [http://www.R-project.org]
- Bonferroni CE: Teoria statistica delle classi e calcolo delle probabilità. 1936, Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di FirenzeGoogle Scholar
- Eddy SR: A new generation of homology search tools based on probabilistic inference. Genome Inform. 2009, 23: 205-211.PubMedGoogle Scholar
- Eddy SR: Hidden Markov models. Curr Opin Struct Biol. 1996, 6: 361-365.View ArticlePubMedGoogle Scholar
- Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, et al: The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006, 34: D187-D191.PubMed CentralView ArticlePubMedGoogle Scholar
- Notredame C, Higgins DG, Heringa J: T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000, 302: 205-217.View ArticlePubMedGoogle Scholar
- Sonnhammer EL, Hollich V: Scoredist: a simple and robust protein sequence distance estimator. BMC Bioinformatics. 2005, 6: 108-PubMed CentralView ArticlePubMedGoogle Scholar
- Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0. http://www.repeatmasker.org (1996–2011)
- Axelsson E, Webster MT, Ratnakumar A, Ponting CP, Lindblad-Toh K: Death of PRDM9 coincides with stabilization of the recombination landscape in the dog genome. Genome Res. 2012, 22: 51-63.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.