A house finch (Haemorhous mexicanus) spleen transcriptome reveals intra- and interspecific patterns of gene expression, alternative splicing and genetic diversity in passerines
© Zhang et al.; licensee BioMed Central Ltd. 2014
Received: 4 December 2013
Accepted: 15 April 2014
Published: 24 April 2014
With its plumage color dimorphism and unique history in North America, including a recent population expansion and an epizootic of Mycoplasma gallisepticum (MG), the house finch (Haemorhous mexicanus) is a model species for studying sexual selection, plumage coloration and host-parasite interactions. As part of our ongoing efforts to make available genomic resources for this species, here we report a transcriptome assembly derived from genes expressed in spleen.
We characterize transcriptomes from two populations with different histories of demography and disease exposure: a recently founded population in the eastern US that has been exposed to MG for over a decade and a native population from the western range that has never been exposed to MG. We utilize this resource to quantify conservation in gene expression in passerine birds over approximately 50 MY by comparing splenic expression profiles for 9,646 house finch transcripts and those from zebra finch and find that less than half of all genes expressed in spleen in either species are expressed in both species. Comparative gene annotations from several vertebrate species suggest that the house finch transcriptomes contain ~15 genes not yet found in previously sequenced vertebrate genomes. The house finch transcriptomes harbour ~85,000 SNPs, ~20,000 of which are non-synonymous. Although not yet validated by biological or technical replication, we identify a set of genes exhibiting differences between populations in gene expression (n = 182; 2% of all transcripts), allele frequencies (76 F ST ouliers) and alternative splicing as well as genes with several fixed non-synonymous substitutions; this set includes genes with functions related to double-strand break repair and immune response.
The two house finch spleen transcriptome profiles will add to the increasing data on genome and transcriptome sequence information from natural populations. Differences in splenic expression between house finch and zebra finch imply either significant evolutionary turnover of splenic expression patterns or different physiological states of the individuals examined. The transcriptome resource will enhance the potential to annotate an eventual house finch genome, and the set of gene-based high-quality SNPs will help clarify the genetic underpinnings of host-pathogen interactions and sexual selection.
KeywordsHouse finch Mycoplasma gallisepticum Gene expression Transcriptome Assembly
Understanding the hereditary components underlying trait variation in natural populations is key to answering a range of fundamental questions in evolutionary biology, but advancing basic insight into evolutionary relevant genotype-phenotype interactions is not trivial. Two essentials of this endeavor are quantification of phenotypic variation in traits that influence individual fitness in natural settings  and collection of information on DNA sequences, linkage maps, gene expression profiles or other genomic resources spanning the genome of the focal organism [eg. [2, 3]. The recent progress in data collection in evolutionary genetics research, mediated predominantly by advancements in DNA sequencing, allows one to rapidly generate vast amounts of genomic data at reasonable cost, even for organisms that are genetically poorly known . These advances facilitate detailed analyses of DNA sequence evolution in almost any species of interest, whether to scan for signs of positive selection, characterize genome-wide divergence between lineages or identify associations between genomic regions and phenotypic traits ([5, 6], The Heliconius Genome Sequence Consortium, ). Consequently, the main limiting factor for developing model systems for evolutionary genomics is acquisition not of genomic resources but rather of sufficient evolutionarily relevant phenotypic and fitness data. The implications of this trend are that taxa for which long-term ecological data have already been collected (ecological model species) will be at the forefront of evolutionary genomics research on natural populations .
A long-term model species for studying sexual selection and host-pathogen interactions in the wild is the house finch (Haemorhous mexicanus) . The species is native to western North America , but shipping of native birds from the western US to pet traders in the east resulted in the establishment of a feral house finch population in the New York City area around 1940 . Over the subsequent decades, this eastern population of house finches expanded its distribution across half of the continent and the population size increased exponentially. The house finch became one of the most common bird species in the eastern US and census estimates suggested a population size in the range of hundreds of millions . As a result of a Mycoplasma gallisepticum (MG) epizootic, which emerged in the Washington D.C. area in 1994 and subsequently spread rapidly over the eastern range [12, 13], populations of house finches in the eastern US and eastern Canada declined by approximately 50% between 1994 and 1997 . After the initial precipitous decline, the population remained stable and there were indications of increasing MG resistance in eastern populations [15–17]. The unique demographic history of the house finch, distinct and evolutionary important carotenoid-based color variation between males [11, 18], and the selection regime brought about by the MG epizootic have made it a model species for many issues in natural selection, sexual selection, and evolutionary genomics, including morphological evolution in response to regional climate [19, 20] plumage coloration and its role in sexual selection [18, 21, 22], gene expression evolution [17, 23–25], the genomic effects of founder events [26–28], and patterns of molecular evolution in candidate genes [29, 30], and across the genome . These are merely a handful of examples of the substantial knowledge about phenotypes of relevance to evolutionary biology that is represented in this study system .
Here we present high coverage spleen transcriptomes from two populations of the house finch, one representing birds from a population in Arizona (AZ) that has never been exposed to MG and the other representing birds from a population in Alabama (AL) that had been exposed to MG for 15 years at the time of collection. So far, genomic comparisons between these two populations have included expression profiling based on macro- and micro-arrays [17, 23, 32] and small-scale, partial candidate gene sequences  or anonymous genetic marker data . The transcriptome assembly provided here is a novel resource for forthcoming comparative and functional studies within house finches and among birds in general. Briefly, we describe our sequencing and assembly procedure and the results that we obtained using the zebra finch as a resource for comparative study of gene expression evolution in avian spleen. We follow up with functional annotation and analysis of genetic differentiation in gene expression between the two focal house finch populations studied here to identify genes exhibiting extensive differentiation between these populations. Such differentially expressed genes are candidate genes for MG resistance, and these genes will be important targets for subsequent detailed functional analyses to understand host-pathogen interactions in this system and in general. A preliminary report of this transcriptome (n = 4,398 genes), pooled across both populations, helped to clarify long-term patterns of protein-evolution in birds and other amniotes . The present study focuses on a transcript set over twice as large (n = 9,646 genes) and on expression differentiation between house finch populations and between house finch and zebra finch, the only other passerine bird for which spleen expression data is available.
Summary of the transcriptome assembly
Different splenic expression patterns between house finch and zebra finch
Novel genes in the house finch
A question of interest was to identify potential novel (unique) protein-coding genes that might have evolved unique functions in the house finch or related lineages. To investigate this, we first used the CPC program  to predict the coding potential for each of the transcripts assembled de novo in the house finch. Out of the initial 222,678 reconstructed transcripts, 33,767 (15%) were predicted as coding and these included as many as 27,241 (81%) transcripts that had no identifiable zebra finch ortholog. These 27,241 transcripts were mapped against all known chicken, Anolis and human coding cDNAs, and we identified an additional 617 transcripts with known chicken orthologs, 46 with Anolis orthologs and 13 with human orthologs. Among the remaining 26,565 transcripts, we identified house finch genes not yet found in birds by first calculating the median coding potential score (6.39) and the median size (2,604 bp) of the 7,202 transcripts with an identified ortholog and compared these values to the set of genes with no identified ortholog in any of the other four species. This analysis revealed that 4,940 (18.6%) of the 26,565 transcripts were longer and had larger coding potential than the median values of the known coding transcripts. Next, we calculated the median expression level of known coding genes (3.9 and 3.7 TPM (transcripts per million total reads) for AL and AZ respectively), and found that 511 and 502 of the novel transcripts in AL and AZ, respectively, had higher expression than the median. Of them, 436 (AL) and 425 (AZ) were highly similar isoforms of transcripts with known orthologs, i.e. false positives generated through the de novo assembly process. Excluding these we had 75 (AL) and 77 (AZ) transcripts of which 20 and 19 partially overlapped (≥80% identity but <50% query length coverage) known zebra finch protein-coding genes. All the transcripts that were identified as potential orthologs in the latter steps had low query length coverage and that is likely the reason why we failed to detect them in the initial BLAST search. This could possibly be explained by partial duplications of the orthologous genes or substantial differences in splicing isoforms between the two species. The remaining 55 (AL) and 58 (AZ) transcripts that could not be mapped to cDNAs from zebra finch were further aligned to the zebra finch genomic sequence. For certain transcripts that were different splice forms of the same genes, only the longest splicing form was used in the alignment. This resulted in significant alignment scores for 45 query transcripts in each population. Of these, 31 (69%) and 32 (71%) were aligned at various length coverage with > =80% sequence similarity and exhibited clear exon-intron structures in zebra finch, implying that these are actually functional genes with incomplete annotation in the zebra finch. 14 and 13 transcripts (12 in common, 15 transcripts in total) were not aligned at all and these constitute a set of novel genes in the lineage leading to house finch at some point after the split from zebra finch.
Expression differences between historically exposed and unexposed house finch populations
Gene ontology terms for genes differentially expressed between the two house finch populations when comparing expression profiles of the spleen between populations
Small molecule metabolic process
Carboxylic acid metabolic process
Oxoacid metabolic process
Organic acid metabolic process
Cellular ketone metabolic process
Cellular amino acid metabolic process
Cellular aromatic compound metabolic process
Ribonucleoside monophosphate biosynthetic process
Nucleoside monophosphate biosynthetic process
Purine-containing compound biosynthetic process
Ribonucleoside monophosphate metabolic process
Purine nucleoside biosynthetic process
Purine ribonucleoside biosynthetic process
L-serine metabolic process
Nucleobase metabolic process
Nucleoside monophosphate metabolic process
Compound biosynthetic process
Purine nucleobase biosynthetic process
Carbon-nitrogen lyase activity
Transferase activity, transferring one-carbon groups
Pyridoxal phosphate binding
Vitamin B6 binding
SNP frequencies and nucleotide diversity in the house finch transcriptome
Genetic differentiation between AL and AZ populations
Fixed differences between AL and AZ populations
Five genes with Ensembl entries available in the zebra finch containing at least two fixed non-synonymous differences between Alabama (AL, exposed) and Arizona (AZ, naïve) house finch populations
Ensembl transcript ID
Heat-shock PB associated protein 1
Ligase IV, DNA, ATP-dependent
Thymocyte selection associated
Patterns of alternative splicing
Transcriptomes are a valuable genomic resource for species of ecological and evolutionary significance because they contain likely targets of natural selection and can provide a catalog of protein-coding regions of interest. Transcriptomes are likely enriched for targets of natural and sexual selection not only because the genes themselves may harbor mutations influencing relevant phenotypes but also because regulatory sequences are enriched in the vicinity of coding regions ; hence any signals of selection on a regulatory mutation or associations with phenotypic variation may be detectable in genetic variants in nearby coding regions. Here we report a draft spleen transcriptome assembly of the house finch, a species of importance for sexual selection and host-pathogen interactions in the wild [8, 17, 18, 23, 25, 27, 31, 41]. Besides being one of the first non-normalized RNA sequencing efforts available for non-model avian taxa [42–44], our transcriptome is an important resource for comparative genomics studies within birds  and for detailed analyses of the genetic basis of pathogen resistance in this particular system. Because we sampled only two populations, our sampling design does not allow us to distinguish many interlocking events that could shape patterns of expression and polymorphism in these birds, including the introduction and adaptation to a novel environment in the eastern US and the effect of the MG epizootic on levels of polymorphism. Still, after stringent quality filtering using information from zebra finch and chicken, the best-annotated avian genome assemblies, we can state that the core set of transcribed house finch sequences comprises almost 10,000 unique genes. This set of genes is an invaluable resource for forthcoming efforts aimed at pinpointing the genetic basis of evolutionarily important traits in the house finch, such as resistance to Mycoplasma gallisepticum[17, 24, 25, 31] and the intensity of plumage redness in males [8, 18, 21, 22].
Expression differences between species
When comparing the set of genes expressed in the house finch spleen to a comparable set of genes expressed in the zebra finch spleen  we found that only 3,555 of the total set of 8,415 genes were expressed in both species. The spleen is a relatively small organ in birds and, although thorough studies of spleen function are uncommon, it plays an obvious role in the universal immune response . It was therefore somewhat surprising to find that a large proportion of the entire coding gene set for zebra finches (n = 17,488 genes) is expressed in the spleen and that such a large proportion of these genes are uniquely expressed in one species and enriched for different gene ontology terms. By contrast, only 185 and 128 transcripts were uniquely expressed in one of the two house finch populations studied here. However, most of these genes that were differentially expressed between house finch populations have an average level of expression that is low compared with genes that shared expression between populations (mean TPM of 0.41 versus 0.46 in AL and 0.36 versus 0.42 in AZ). This low level of expression overlap, combined with our small sample sizes per population, means that the probability of failing to detect such genes in one population or the other is high. These factors make the expression differences that we observed between zebra finch and house finch spleen even more striking. In addition to evolutionary divergence in gene expression, undoubtedly many of the differences found between zebra finch and house finch spleen are attributable to overall methodology and the physiological state of the individual birds used for RNA-seq.
Novel genes in the house finch transcriptome
Of particular value to the community of researchers studying house finches is the establishment of a catalog of transcripts and genes, including those that are not previously characterized in other organisms. Therefore, we attempted to identify transcripts with high coding potential that had not yet been found in birds. By using a strict set of filters and information about coding potential and reading frames from zebra finch, we identified 15 genes that were uniquely present in the house finch transcriptome data set compared to other birds. These genes might be genes that evolved novel function in the house finch lineage after divergence from the zebra finch around 50 million years ago  or they may represent ancestral genes that lost function in chicken and zebra finch and rapidly accumulated mutations so that they no longer can be identified as potential homologs in standard reciprocal BLAST analyses. This set of novel genes lacking orthologs in zebra finch or chicken should be primary targets in subsequent analyses aimed at characterizing the genetic basis of novel functions in the lineage leading to the house finch. Increased taxon sampling, which presumably will find these genes in additional relatives of house finches, will increase the power of our inferences about novel genes and their functions.
Expression differences between populations
In addition to the small set of genes uniquely expressed in one of the two populations (see above), we also found significant expression-level differences for a set of genes when comparing the historically exposed (AL) and unexposed (AZ) population of house finches. Using a strict filtering pipeline we collected a high-quality set of 8,981 genes with CPM >1, 182 (~2%) of which were significantly differentially expressed between house finch populations. This gene set was enriched for genes related to metabolic processes, vitamin binding, transferase- and catalytic activity. Because the sampling was not designed to make explicit conclusions about globally different patterns of gene expression in the spleen between these two populations - for example by keeping all birds in a common environment for a considerable time period before sampling – the overrepresentation of metabolic processes likely partly reflects differences in the home environment for the different populations. Differential expression of metabolic genes could, for example, be in part due to a difference in ambient temperature at the time of sampling in Arizona as compared to Alabama . It should also be stressed that biological replicates and/or validation of gene expression differences with alternative methods will be needed to verify differential expression patterns between the house finch populations.
Genetic variation and rates of non-synonymous substitution
Using a series of stringent filtering criteria we identified a set of high-quality SNPs in the transcriptome sequence of the house finch, allowing us to estimate the SNP density for the transcribed part of the genome and to assess within- and between-population genetic variation. Overall, we identified roughly 85,000 high-quality SNPs in each of the populations. The average SNP density was hence very similar between populations; in fact, although not significant (p-value = 0.2941, Fisher’s exact test) after correcting for the number of expressed nucleotides (22,950,549 and 22,434,784), the number of SNPs discovered in the recently founded AL population (n = 86,668) was slightly higher than in the native AZ population (n = 84,292). This pattern is consistent with the idea that neither the artificial introduction of house finches into eastern North America nor the Mycoplasma epizootic has dramatically reduced levels of heterozygosity in coding regions in the eastern population. Population genetic considerations suggest that the estimates for each population are unlikely to be strongly affected by our relatively small sample size (n = 12 chromosomes per population), which is large enough to capture most variation [48, 49], especially given that these populations are not strongly structured . Sample sizes are also unlikely to explain the pattern because our estimates of genetic diversity were if anything slightly larger for the eastern than for the western US, and the error in these estimates is influenced much more so by the number of loci than the number of individuals. The putative bottleneck in the eastern house finch population was short and the population has been rapidly growing for the last 60 years, perhaps resulting in lower levels of drift and a higher incidence of polymorphisms, albeit at low frequency. Additionally, although the selection event as a result of the Mycoplasma epizootic may not have been strong, resulting in a negligible effect on levels of polymorphism, any explanation must also reconcile the fact that several surveys of genetic variation in pre- and post- epizootic finches have found significantly lower levels of variation in the east than in the west. Such studies include analyses of mitochondrial DNA, microsatellites and MHC genes [27, 41]. Aside from the mtDNA and MHC results, which are likely to be idiosyncratic due to issues of linkage, ploidy and balancing selection, the contrasting patterns in these studies and our results suggest that the differences may lie largely in differences in evolutionary dynamics of coding versus noncoding (microsatellite) loci [50–52]. Hawley et al. found for example increased diversity in MHC class II genes in post-epizootic birds from eastern United States, potentially reflecting disease mediated balancing selection.
We observed that only 40% of the SNPs were shared between populations. Given the short divergence time and the high level of heterozygosity in the introduced population, the fraction of shared polymorphisms is expected to be high; hence, the lower-than-expected level of shared polymorphism observed here is likely a result of relatively small sample sizes from each range resulting in that many, especially low-frequency, polymorphisms that are indeed segregating in both populations are detected in only one of the populations. As a complementary test to assess if the introduced AL population might have been affected by either the demographic history or the Mycoplasma epizootic (or both), we looked at the p N /p S ratios calculated for each population separately using both the unfiltered and the filtered set of SNPs. The underlying hypothesis was that the historically exposed and potentially bottlenecked AL population should show a higher ratio of non-synonymous to synonymous polymorphisms as a consequence of less efficient selection against slightly deleterious non-synonymous alleles drifting to higher frequency and hence more easily detectable in our sample. In neither of the two data sets did we observe a difference in p N /p S between the native AZ population and the introduced AL population, again suggesting that neither the introduction itself nor the exposure to the epizootic have considerably affected the drift of functional polymorphisms in the introduced AL population.
Genetic differentiation between populations
As expected given the extremely short time of divergence separating the two house finch populations, the vast majority of genes showed no substantial genetic differentiation between AZ and AL. However, a few genes (n = 76) showed considerable allele frequency differences reflected in F ST values > 3 standard deviations higher than the arithmetic mean and these could potentially constitute targets for directional selection in the disease exposed AL population (Figure 7). This set of highly differentiated genes was, however, not enriched for any particular functional category. There were five genes in which two or more non-synonymous SNPs were fixed between the house finch populations, of which 2 have unknown functions in birds, one is associated with double-strand break repair and two were associated with immune response. The latter two were a heat-shock protein and a T-cell precursor which corresponds well with the assumed strong selection for the immune response to adapt to the encounter of the novel pathogen MG [17, 24, 25, 27, 53]. It should, however, be noted that inference of differentiation from pooled RNA samples might be biased by instances of allele-specific expression and verification experiments are needed to establish candidate genes detected from differentiation scans of this type.
Functional polymorphisms may involve primary gene and protein sequences or regulatory changes that affect the expression  and recently it has been recognized that, in addition, traits can be controlled via alternative splicing, spatial or temporal variation in the use of different gene transcripts , a process that considerably increases gene product complexity [56, 57]. Recent analyses suggest that most eukaryotic genes have at least two splice variants [55, 58]. Alternative splicing seems to be more common in higher eukaryotes [56, 59] but results are not consistent and few taxonomic groups have been thoroughly investigated [60–62]. In order to assess the prevalence of alternative splice variation in the house finch spleen transcriptome we focused on a set of genes with a > 300 bp long uninterrupted ORF and reciprocal blast-supported 1:1 orthology to zebra finch genes (>90% identity required), resulting in a gene set of 9,167 and 9,007 annotated zebra finch genes in the AL and the AZ populations, respectively. More than half of these showed evidence of only one splice variant in both populations and the remaining genes varied between two to 23 splice isoforms. This observation is within the range of splice variation described from other vertebrates. However, it should be noted that splice variant detection using software designed for assembly may misestimate the number of isoforms  and additional biases might be introduced by transcript redundancy among compared datasets [60, 61]. Evolutionarily novel splice variants may constitute an important source for evolution of novel functions because they might be under relatively low constraints ; for example, species-specific splice variants are positively correlated with non-synonymous substitution rate  and minor alternative exons evolve faster than obligate exons [66, 67]. Our analysis revealed more than 150 unique splice variants across both populations. A conservative interpretation is that rather than reflecting divergence in inter-population splice variation, this result may reveal the challenges with reliably inferring splice variation in previously uncharacterized genomes using short-read data . In order to more rigorously assess the patterns of splice variation among populations, we therefore suggest that future transcriptomes in house finches should be conducted with technologies with longer read lengths, a feature that will be increasingly feasible given the rapid developments in sequencing technology . Still, it is intriguing to speculate on the evolutionary significance of alternative splicing in this system. In the first study of differential gene expression between experimentally infected and uninfected house finches , one of the most highly up-regulated genes as a result of infection was an alternative splicing factor, now called SREK1 (splicing regulatory glutamine/lysine-rich protein 1, chicken ortholog: ENSGALG00000014775), whose function is to regulate alternative splicing. It is fascinating to speculate that this protein might marshal an array of functionally relevant alternatively spliced variants upon MG infection, the results of which we may be detecting in our study. In general we suspect that splice variants could be of interest for forthcoming studies on microevolutionary change in house finches and other birds.
The characterization of the spleen transcriptome of the house finch will facilitate forthcoming genomic efforts in this species. By using SNPs in a large set of genes, association analyses and QTL mapping efforts can provide insight into the short-term evolutionary processes governing allele frequency changes as a consequence of putative bottlenecks, disease exposure and/or sexual selection. In addition, this resource will enhance the power of comparative genomics approaches to identify genes of importance for lineage specific adaptations and to investigate molecular evolutionary patterns across diverging lineages. In summary, this resource has paved the way to take a model species for host pathogen interactions and sexual selection into the realm of genomics.
Sampling, library construction and sequencing
In February 2010, three male and three female house finches were sampled from each of two populations (n = 24 chromosomes in total); one population from Green Valley, AZ (hereafter AZ), which is within the native range and historically unexposed to Mycoplasma gallisepticum, and one population descended from the introduced, eastern US population and previously exposed in nature to MG from Auburn, AL (hereafter AL; see  for a description of the MG epizootic in Auburn Alabama). Wild birds with no symptoms of MG from both populations were caught in the field using feeder traps. Immediately after capture, birds were sacrificed and the spleen was sampled and stored in RNA later (Ambion Inc., Austin, TX) at room temperature for one day and then subsequently at -80°C. We used the RNeasy Mini Kit (Qiagen, Inc., Valencia, CA) to extract total RNA from each individual spleen. However, because a single spleen did not generate enough RNA for constructing an Illumina HiSeq sequencing library we pooled individuals from each population in equimolar concentrations based on the individual RNA concentrations as measured on a Bioanalyzer (Agilent Technologies, Inc., Clara, CA). The manufacturer’s mRNA sequencing sample preparation guide (Illumina, Inc., San Diego, CA) was used to prepare two pools of total RNA for paired-end sequencing (101 bp read length). cDNA library preparation qualities were assessed with Bioanalyzer runs (Agilent Technologies, Inc., Clara, CA). To evaluate the frequency of erroneous adapter constructs we cloned (pGEM®-T Systems, Promega, Inc., Madison, WI) and sequenced 48 sample clones from each pool on a 96 capillary ABI 3730xl instrument (Life Technologies Corp., Carlsbad, CA). Seven of the 96 sequenced clones showed inaccurate adapter constructs, always in the form of one incomplete adapter. To optimize the sample concentrations and volumes for HiSeq sequencing we used a standard qPCR protocol for Illumina library preparations (KAPA library Quant Kit, Kapa Biosystems, Woburn, MA). Both pools were sequenced with Illumina HiSeq technology (Illumina, Inc., San Diego, CA) at the core facility of the FAS Center for Systems Biology at Harvard University. All sequence reads have been deposited in the sequence reads archive under accession number SRP018959 .
Quality control and read filtering
We assessed overall quality and sequence read statistics using Unix shell, perl (http://www.perl.org/) and python (http://www.python.org/) scripts developed in-house in addition to the FastQC (available from http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and FastX (available from http://hannonlab.cshl.edu/fastx_toolkit/index.html) packages. Read trimming and purging of low quality bases (phred score < 25) was done using the ConDeTri program (available from http://code.google.com/p/condetri/) version 2.2  (Figure 1).
Quality-filtered reads were assembled using Trinity, a de novo assembler designed to efficiently and robustly reconstruct a transcriptome . Trinity partitions the sequence data into individual de Bruijn graph clusters and processes them in parallel, making the assembly process relatively inexpensive computationally. In addition to generating the longest transcript sequence, possible splice isoforms were automatically reconstructed for each gene using default settings in Trinity (Figure 1).
Filters of assembled transcripts
Because the primary aim of the study was to assess gene expression differences and genetic differentiation between house finch populations, we focused primarily on protein-coding genes and a series of filters were therefore applied to remove potentially confounding sequences like non-coding RNA species, expressed pseudogenes and transcripts of unknown function . First, transcripts with short length were excluded. To find a reasonable size threshold for including a transcript in the data set, the mean size (μ = 956 bp) and standard deviation (s.d. = 494 bp) of all known cDNA sequences in zebra finch Ensembl release [69, 72] was calculated. We used μ - 1.s.d. (462 bp) as the cutoff and applied the threshold to assembled house finch transcripts. Second, we searched for intact open reading frames (ORFs) in these filtered house finch transcripts, using the Trinity tool suite and only transcripts with predicted ORFs longer than 300 bp were kept; we define this set as the unfiltered set. Finally, we mapped the unfiltered set to all known zebra finch cDNAs Ensembl release [69, 72] by BLAT , and positive hits were denoted as high-confidence house finch coding transcripts, and retained for primary analyses, if the alignment covered ≥ 60% of the total zebra finch transcript length with ≥ 80% identity. We define this set as the filtered set. The corresponding transcripts of the house finch filtered set in zebra finch we designate as zebra finch orthologs. Our filtering steps inevitably biased the dataset towards conserved genes (high-degree of similarity between species) of intermediate length since rapidly evolving genes (>20% divergence between species) and short (<300 bp coding sequence) or long (lower chance of spanning > 60% of the entire gene length) genes are more likely filtered out using these criteria (Figure 1). We also identified orthologs of the filtered gene set in chicken, Anolis lizard and in humans using similar protocols, but requiring at least 60%, 50% and 30% sequence identities in chickens, Anolis and humans, respectively, to account for their increasing evolutionary divergence from house finches.
Analysis of differential gene expression between populations
We mapped all trimmed reads to the de novo assembled house finch transcripts and calculated both raw read counts and transcripts per million reads (TPM) for each transcript using the RSEM software . Subsequently, the DESeq package  was used to identify differentially expressed genes (DEG) between the two house finch populations (AZ and AL). Transcripts at extremely low expression level (<1 read count per million reads (CPM) identified using edgeR: a bioconductor package for differential expression analysis of digital gene expression data) were excluded. We used CPM here instead of TPM to filter minimally expressed transcripts because with TPM some short transcripts with questionably low expression may not be detected due to the normalization based on transcript length. DEGs were defined as transcripts with a p-value < 0.05 after applying the Benjamini-Hochberg adjustment of the significance level  (Figure 1).
Identifying single nucleotide polymorphisms (SNPs) and estimating nucleotide diversity
where S n is the number of segregating sites, n is the number of sequences, and L is the length of a given sequence.
Estimating rates of nonsynonymous substitution (pN/pS)
SNPs were classified as non-coding, synonymous or non-synonymous, according to their positions in the predicted ORFs, and synonymous and non-synonymous sites were identified for each ORF using the method of Nei and Gojobori  on the filtered data set. These sites were used to estimate the ratio of non-synonymous (p N ) to synonymous polymorphisms (p S ) for each transcript and each population, by using perl scripts developed in house and calculating the number of non-synonymous SNPs divided by the number of non-synonymous sites and the number of synonymous SNPs divided by the number of synonymous sites (Figure 1).
Estimating genetic differentiation (FST)
To enhance the reliability of transcript-specific estimates we excluded transcripts with less than three SNPs.
Gene ontology analysis
Since there is no annotation for house finch genes currently available, we used the gene ontology  annotations of zebra finch orthologs retrieved using Ensembl’s Biomart release 69 . Tests for functional enrichment in sets of transcripts either differentially expressed or exhibiting high genetic differentiation between house finch populations were then conducted using TopGO , a software package that compares the difference in occurrences in a given functional category between foreground and background sets of transcripts and assesses significance using Fisher’s exact test (Figure 1). Correction for multiple tests was again performed using the Benjamini-Hochberg approach .
We thank Margarita Rios and Rusty Ligon for help with sampling of house finches in Arizona. Thanks also to Robert Ekblom for providing the zebra finch spleen expression data. NB acknowledges funding from the Swedish Research Council for postdoctoral research (VR Grant: 2009-693). Work was supported by NSF grant DEB-IOS 0923088 to GEH and SVE. Sampling was approved by Auburn University Institutional Animal Care and Use Committee (ref: 2010-1762).
- Clutton-Brock TH, Sheldon BC: Individuals and populations: the role of long-term, individual-based studies in ecology and evolutionary biology. Trends Ecol Evol. 2010, 25: 562-573. 10.1016/j.tree.2010.08.002.PubMedView ArticleGoogle Scholar
- Ellegren H, Sheldon BC: Genetic basis of fitness differences in natural populations. Nature. 2008, 452: 169-175. 10.1038/nature06737.PubMedView ArticleGoogle Scholar
- Bonneaud C, Burnside J, Edwards SV: High-speed developments in avian genomics. Bioscience. 2008, 58: 587-595. 10.1641/B580706.View ArticleGoogle Scholar
- Ekblom R, Galindo J: Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity. 2011, 107: 1-15. 10.1038/hdy.2010.152.PubMed CentralPubMedView ArticleGoogle Scholar
- Ellegren H, Smeds L, Burri R, Olason P, Backström N, Kawakami T, Nadachowska-Brzyska K, Qvarnström A, Uebbing S, Wolf JBW: The genomics of species differentiation in Ficedula flycatchers. Nature. 2012, 491: 756-760.PubMedGoogle Scholar
- Jones FC, Grabherr MG, Chan YF, Russell P, Mauceli E, Johnson J, Swofford R, Pirun M, Zody MC, White S, Birney E, Searle S, Schmutz J, Grimwood J, Dickson MC, Myers RM, Miller CT, Summers BR, Knecht AK, Brady SD, Zhang H, Pollen AA, Howes T, Amemiya C, Lander ES, Di Palma F, Lindblad-Toh K, Kingsley DM, Broad Institute Genome Sequencing Platform & Whole Genome Assembly Team: The genomic basis of adaptive evolution in threespine sticklebacks. Nature. 2012, 484: 55-61. 10.1038/nature10944.PubMed CentralPubMedView ArticleGoogle Scholar
- THGSC: Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature. 2012, 487: 94-98.Google Scholar
- Hill GE: A red bird in a brown bag: the function and evolution of colorful plumage in the house finch. 2002, New York: Oxford University PressView ArticleGoogle Scholar
- del Hoyo J, Elliot A, Christie DA: Handbook of the Birds of the World. Weavers to New World Warblers, Volume 15. 2010, Barcelona: Lynx EdicionsGoogle Scholar
- Elliott JJ, Arbib RS: Origin and status of the house finch in the eastern United States. Auk. 1953, 70: 31-37.View ArticleGoogle Scholar
- The house finch (Carpodacus mexicanus). The birds of North America online. [http://bna.birds.cornell.edu/bna.html/species/046]
- Ley DH, Berkhoff JE, McLaren JM: Mycoplasma gallisepticum isolated from house finches (Carpodacus mexicanus) with conjunctivitis. Avian Dis. 1996, 40: 480-483. 10.2307/1592250.PubMedView ArticleGoogle Scholar
- Hawley DM, Osnas EE, Dobson AP, Hochachka WM, Ley DH, Dhondt AA: Parallel patterns of increased virulence in a recently emerged wildlife pathogen. PLoS Biol. 2013, 11: e1001570-10.1371/journal.pbio.1001570.PubMed CentralPubMedView ArticleGoogle Scholar
- Hochachka WM, Dhondt AA: Density-dependent decline of host abundance resulting from a new infectious disease. Proc Natl Acad Sci U S A. 2000, 97: 5303-5306. 10.1073/pnas.080551197.PubMed CentralPubMedView ArticleGoogle Scholar
- Dhondt AA, Tessglia DL, Slothower RL: Epidemic mycoplasmal conjunctivitis in house finches from eastern North America. J Wildl Dis. 1998, 34: 265-280. 10.7589/0090-3558-34.2.265.PubMedView ArticleGoogle Scholar
- The house finch disease survey. [http://www.birds.cornell.edu/hofi/news.html]
- Bonneaud C, Balenger SL, Russell AF, Zhang J, Hill GE, Edwards SV: Rapid evolution of disease resistance is accompanied by functional changes in gene expression in a wild bird. Proc Natl Acad Sci U S A. 2011, 108 (19): 7866-7871. 10.1073/pnas.1018580108.PubMed CentralPubMedView ArticleGoogle Scholar
- Hill GE: Plumage coloration is a sexually selected indicator of male quality. Nature. 1991, 350: 337-339. 10.1038/350337a0.View ArticleGoogle Scholar
- Badyaev AV, Hill GE: The evolution of sexual dimorphism in the house finch. I. Population divergence in morphological covariance structure. Evolution. 2000, 54: 1784-1794. 10.1111/j.0014-3820.2000.tb00722.x.PubMedView ArticleGoogle Scholar
- Badyaev AV, Hill GE, Beck ML, Dervan AA, Duckworth RA, McGraw KJ, Nolan PM, Whittingham LA: Sex-biased hatching order and adaptive population divergence in a passerine bird. Science. 2002, 295: 316-318. 10.1126/science.1066651.PubMedView ArticleGoogle Scholar
- Hill GE, Farmer KL: Carotenoid-based plumage coloration predicts resistance to a novel parasite in the house finch. Naturwissenschaften. 2005, 92: 30-34. 10.1007/s00114-004-0582-0.PubMedView ArticleGoogle Scholar
- Hill GE, Farmer KL, Beck ML: The effect of mycoplasmosis on carotenoid plumage coloration in male house finches. J Exp Biol. 2004, 207: 2095-2099. 10.1242/jeb.00998.PubMedView ArticleGoogle Scholar
- Wang Z, Farmer K, Hill GE, Edwards SV: A cDNA macroarray approach to parasite-induced gene expression changes in a songbird host: genetic response of house finches to experimental infection by Mycoplasma gallisepticum. Mol Ecol. 2006, 15: 1263-1273.PubMedView ArticleGoogle Scholar
- Bonneaud C, Balenger SL, Zhang J, Edwards SV, Hill GE: Innate immunity and the evolution of resistance to an emerging infectious disease in a wild bird. Mol Ecol. 2012, 21: 2628-2639. 10.1111/j.1365-294X.2012.05551.x.PubMedView ArticleGoogle Scholar
- Backström N, Shipilina D, Blom MPK, Edwards SV: Cis-regulatory sequence variation and association with Mycoplasma load in natural populations of the house finch (Carpodacus mexicanus). Ecol Evol. 2013, 3: 655-666. 10.1002/ece3.484.PubMed CentralPubMedView ArticleGoogle Scholar
- Wang Z, Baker AJ, Hill GE, Edwards SV: Reconciling actual and infected population histories in the house finch (Carpodacus mexicanus) by AFLP analysis. Evolution. 2003, 37: 2852-2864.View ArticleGoogle Scholar
- Hawley DM, Fleischer RC: Contrasting epidemic histories reveal pathogen-mediated balancing selection on class II MHC diversity in a wild songbird. PLoS One. 2012, 7 (1): e30222-10.1371/journal.pone.0030222.PubMed CentralPubMedView ArticleGoogle Scholar
- Hawley DM, Hanley D, Dhondt AA, Lovette IJ: Molecular evidence for a founder effect in invasive house finch (Carpodacus mexicanus) populations experiencing an emergent disease epidemic. Mol Ecol. 2006, 15: 263-275.PubMedView ArticleGoogle Scholar
- Alcaide M, Bonneaud C, Backström N, Liu M, Edwards SV: Geographic and diachronic analysis of Toll-like receptor variation in an invasive species, the house finch (Haemorhous mexicanus). [In Prep]
- Alcaide M, Edwards SV: Molecular evolution of the Toll-like receptor multigene family in birds. Mol Biol Evol. 2011, 28: 1703-1715. 10.1093/molbev/msq351.PubMedView ArticleGoogle Scholar
- Backström N, Zhang Q, Edwards SV: Evidence from a house finch (Haemorhous mexicanus) spleen transcriptome for adaptive evolution and biased gene conversion in passerine birds. Mol Biol Evol. 2013, 30: 1046-1050. 10.1093/molbev/mst033.PubMedView ArticleGoogle Scholar
- Davis AK, Hood WR, Hill GE: Prevalence of blood parasites in eastern versus western house finches: are eastern birds resistant to infection?. Ecohealth. 2013, In PressGoogle Scholar
- Kent WJ: BLAT–the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664. 10.1101/gr.229202. Article published online before March 2002.PubMed CentralPubMedView ArticleGoogle Scholar
- Ekblom R, Balakrishnan CN, Burke T, Slate J: Digital gene expression analysis of the zebra finch genome. BMC Genomics. 2010, 11: 219-10.1186/1471-2164-11-219.PubMed CentralPubMedView ArticleGoogle Scholar
- Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, Gao G: CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007, 35 (Web Server issue): W345-W349.PubMed CentralPubMedView ArticleGoogle Scholar
- Li B, Dewey CN: RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC bioinformatics. 2011, 12: 323-10.1186/1471-2105-12-323.PubMed CentralPubMedView ArticleGoogle Scholar
- Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biol. 2010, 11: R106-10.1186/gb-2010-11-10-r106.PubMed CentralPubMedView ArticleGoogle Scholar
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, et al: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80-10.1186/gb-2004-5-10-r80.PubMed CentralPubMedView ArticleGoogle Scholar
- Watterson GA: On the number of segregating sites in genetic models without recombination. Theor Pop Biol. 1975, 7: 256-276. 10.1016/0040-5809(75)90020-9.View ArticleGoogle Scholar
- Phillips T: Regulation of transcription and gene expression in eukaryotes. Nat Edu. 2008, 1 (1): 199-Google Scholar
- Hawley DM, Briggs J, Dhondt AA, Lovette IJ: Reconciling molecular signatures across markers: mitochondrial DNA confirms founder effect in invasive North American house finches (Carpodacus mexicanus). Cons Genet. 2008, 9: 637-643. 10.1007/s10592-007-9381-4.View ArticleGoogle Scholar
- Peterson MP, Whittaker DJ, Ambreth S, Sureshchandra S, Buechlein A, Podicheti R, Choi J-H, Lai Z, Mockatis K, Colbourne J, Tang H, Ketterson ED: De novo transcriptome sequencing in a songbird, the dark-eyed junco (Junco hyemalis) - genomic tools for an ecological model system. BMC Genomics. 2012, 13: 305-10.1186/1471-2164-13-305.PubMed CentralPubMedView ArticleGoogle Scholar
- Santure AW, Gratten J, Mossman JA, Sheldon BC, Slate J: Characterisation of the transcriptome of a wild great tit Parus major population by next generation sequencing. BMC Genomics. 2011, 12: 283-10.1186/1471-2164-12-283.PubMed CentralPubMedView ArticleGoogle Scholar
- Wang B, Ekblom R, Castoe TA, Jones EP, Kozma R, Bongcam-Rudloff E, Pollock DD, Hoglund J: Transcriptome sequencing of black grouse (Tetrao tetrix) for immune gene discovery and microsatellite development. Open Biol. 2012, 2: 120054-10.1098/rsob.120054.PubMed CentralPubMedView ArticleGoogle Scholar
- John JL: The avian spleen: a neglected organ. Quart Rev Biol. 1994, 69: 327-351. 10.1086/418649.PubMedView ArticleGoogle Scholar
- Brown JW, Rest JS, Garcia-Moreno J, Sorenson MD, Mindell DP: Strong mitochondrial DNA support for a Cretaceous origin of modern avian lineages. BMC Biol. 2008, 6: e6-10.1186/1741-7007-6-6.View ArticleGoogle Scholar
- Hill GE, Fu X, Balenger S, McGraw KJ, Giraudeau M, Hood WR: Changes in concentrations of circulating heat-shock proteins in House Finches in response to different environmental stressors. J Field Ornithol. 2013, 84: 416-424. 10.1111/jofo.12040.View ArticleGoogle Scholar
- Felsenstein J: Accuracy of coalescent likelihood estimates: do we need more sites, more sequences, or more loci?. Mol Biol Evol. 2001, 23: 691-700.View ArticleGoogle Scholar
- Carling MD, Brumfield RT: Gene sampling strategies for multi-locus population estimates of genetic diversity. PLoS One. 2007, 2: e160-10.1371/journal.pone.0000160.PubMed CentralPubMedView ArticleGoogle Scholar
- Ohta T: The nearly neutral theory of molecular evolution. Ann Rev Ecol Evol Syst. 1992, 23: 263-286. 10.1146/annurev.es.23.110192.001403.View ArticleGoogle Scholar
- Nei M: Bottlenecks, genetic polymorphism and speciation. Genetics. 2005, 170: 1-4.PubMed CentralPubMedGoogle Scholar
- Gemayel R, Vinces MD, Legendre M, Verstrepen KJ: Variable tandem repeats accelerate evolution of coding and regulatory sequences. Ann Rev Genet. 2010, 44: 445-477. 10.1146/annurev-genet-072610-155046.PubMedView ArticleGoogle Scholar
- Alcaide M, Bonneaud C, Backström N, Liu M, Edwards SV: Geographic and diachronic analysis of Toll-like receptor variation in an invasive species, the house finch (Carpodacus mexicanus). 2011, [Manuscript In Preparation]Google Scholar
- Carroll SB: Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell. 2008, 134: 25-36. 10.1016/j.cell.2008.06.030.PubMedView ArticleGoogle Scholar
- Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CP: Alternative isoform regulation in human tissue transcriptomes. Nature. 2008, 456: 470-476. 10.1038/nature07509.PubMed CentralPubMedView ArticleGoogle Scholar
- Kim H, Klein R, Majewski J, Ott J: Estimating rates of alternative splicing in mammals and invertebrates. Nat Genet. 2004, 36: 915-916. 10.1038/ng0904-915.PubMedView ArticleGoogle Scholar
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Meth. 2008, 5: 621-628. 10.1038/nmeth.1226.View ArticleGoogle Scholar
- Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, Armour CD, Santos R, Schadt EE, Stoughton R, Shoemaker DD: Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science. 2003, 302: 2141-2144. 10.1126/science.1090100.PubMedView ArticleGoogle Scholar
- Kim E, Magen A, Ast G: Different levels of alternative splicing among eukaryotes. Nucleic Acids Res. 2007, 35: 125-131. 10.1093/nar/gkm529.PubMed CentralPubMedView ArticleGoogle Scholar
- Harrington ED, Boue S, Valcarcel J, Reich JG, Bork P: A reply to: Estimating rates of alternative splicing in mammals and invertebrates by Kim et al. Nat Genet. 2004, 36: 916-917. 10.1038/ng0904-916.View ArticleGoogle Scholar
- Brett D, Pospisil H, Valcárcel J, Reich J, Bork P: Alternative splicing and genome complexity. Nature. 2002, 30: 29-30.Google Scholar
- Barbosa-Morais NL, Irimia M, Pan Q, Xiong HY, Gueroussov S, Lee LJ, Slobodeniuc V, Kutter C, Watt S, Colak R, Kim T, Misquitta-Ali CM, Wilson MD, Kim PM, Odom DT, Frey BJ, Blencowe BJ: The evolutionary landscape of alternative splicing in vertebrate species. Science. 2012, 338: 1587-1593. 10.1126/science.1230612.PubMedView ArticleGoogle Scholar
- Vijay N, Poelstra JW, Kunstner A, Wolf JB: Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments. Mol Ecol. 2013, 22: 620-634. 10.1111/mec.12014.PubMedView ArticleGoogle Scholar
- Blencowe BJ: Alternative splicing: new insights from global analyses. Cell. 2006, 126: 37-47. 10.1016/j.cell.2006.06.023.PubMedView ArticleGoogle Scholar
- Cusack BP, Wolfe KH: Changes in alternative splicing of human and mouse genes are accompanied by faster evolution of constitutive exons. Mol Biol Evol. 2005, 22: 2198-2208. 10.1093/molbev/msi218.PubMedView ArticleGoogle Scholar
- Nurtdinov RN, Neverov AD, Favorov AV, Mironov AA, Gelfand MS: Conserved and species-specific alternative splicing in mammalian genomes. BMC Evol Biol. 2007, 7: 249-10.1186/1471-2148-7-249.PubMed CentralPubMedView ArticleGoogle Scholar
- Chen FC, Pan CL, Lin HY: Independent effects of alternative splicing and structural constraint on the evolution of mammalian coding exons. Mol Biol Evol. 2012, 29: 187-193. 10.1093/molbev/msr182.PubMedView ArticleGoogle Scholar
- Nolan PM, Hill GE, Stoehr AM: Sex, size, and plumage redness predict house finch survival in an epidemic. Proc Roy Soc B. 1998, 265: 961-965. 10.1098/rspb.1998.0384.View ArticleGoogle Scholar
- Smeds L, Künstner A: ConDeTri - a content dependent read trimmer for Illumina data. PLoS One. 2011, 6: e26314-10.1371/journal.pone.0026314.PubMed CentralPubMedView ArticleGoogle Scholar
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, Di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotech. 2011, 29: 644-652. 10.1038/nbt.1883.View ArticleGoogle Scholar
- Frith MC, Pheasant M, Mattick JS: The amazing complexity of the human transcriptome. Eur J Hum Genet. 2005, 13: 894-897. 10.1038/sj.ejhg.5201459.PubMedView ArticleGoogle Scholar
- Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri AK, Keefe D, Keenan S, Kinsella R, Komorowska M, Koscielny G, Kulesha E, Larsson P, Longden I, McLaren W, Muffato M, Overduin B, Pignatelli M, Pritchard B, Riat HS, et al: Ensembl 2012. Nucleic Acids Res. 2012, 40 (Database issue): D84-D90.PubMed CentralPubMedView ArticleGoogle Scholar
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B. 1995, 57: 12-Google Scholar
- Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Meth. 2012, 9: 357-359. 10.1038/nmeth.1923.View ArticleGoogle Scholar
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The sequence alignment/map format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.PubMed CentralPubMedView ArticleGoogle Scholar
- Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.PubMedGoogle Scholar
- Holsinger KE, Weir BS: Genetics in geographically structured populations: defining, estimating and interpreting F(ST). Nat Rev Genet. 2009, 10: 639-650. 10.1038/nrg2611.PubMedView ArticleGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.PubMed CentralPubMedView ArticleGoogle Scholar
- Kasprzyk A: BioMart - driving a paradigm change in biological data management. Database. 2011, 2011: bar049-PubMed CentralPubMedView ArticleGoogle Scholar
- Alexa A, Rahnenfuhrer J, Lengauer T: Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006, 22: 1600-1607. 10.1093/bioinformatics/btl140.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.