Gene expression data
Eight different chicken tissues were used to analyze whole-genome gene expression profiles with chicken 20K oligonucleotide microarrays (GEO [24] accession GPL8861, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=tjwjpscyceqawjk&acc=GPL8861). All array probes were designed from known transcripts and ESTs based on the chicken genome assembly WASHUC1 (Dec. 2004), and a stringent selection of probes was performed before the analysis. A total of 7,477 probes that failed to map to unique chicken Ensembl genes were excluded to avoid introducing additional noise into the analysis. In total, 11,361 chicken Ensembl gene IDs located on 27 chromosomes were included in the expression study. These 27 chromosomes cover over 90% of the chicken genome and include all macro-chromosomes and many of the micro-chromosomes. The number of Ensembl genes on each of these chromosomes is shown in Figure 1. On average, about 70% of all known Ensembl genes on each of these 27 chromosomes were included in this analysis.
In this study, we define the chicken transcriptome map as the median expression levels, across the eight tissues, of the 11,361 chicken Ensembl genes on the 27 chromosomes. The start position of the first Ensembl gene and the end position of the last Ensembl gene on each chromosome were taken as the start and end of that chromosome. The combined size of the chromosomal sequences analyzed in this study is 1,022,830,111 bp, which covers 97% of the total length of build 2 (WASHUC2, May 2006) of the chicken (Gallus gallus) genome.
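As a minimal sketch of how the analyzed chromosome lengths could be computed, the Python snippet below assumes gene coordinates are 1-based and inclusive (as in Ensembl) and takes each chromosome's span from the smallest gene start to the largest gene end. The function names and the toy coordinates are hypothetical and not part of the original analysis.

```python
def chromosome_span(genes):
    """Span (bp) from the first gene start to the last gene end.

    genes: list of (start, end) tuples, assumed 1-based and inclusive.
    """
    first_start = min(start for start, _ in genes)
    last_end = max(end for _, end in genes)
    return last_end - first_start + 1


def total_analyzed_length(genes_by_chrom):
    """Sum of per-chromosome spans over all analyzed chromosomes."""
    return sum(chromosome_span(genes) for genes in genes_by_chrom.values())


# Toy coordinates (made up, not chicken data)
toy = {
    "1": [(1_000, 5_000), (20_000, 31_000)],
    "Z": [(500, 900), (2_000, 2_999)],
}
print(total_analyzed_length(toy))
```

Dividing such a total by the assembly length would give the coverage percentage quoted above.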
Regional differences of transcription in the chicken genome
To create the chicken transcriptome map, the Ensembl genes on each chromosome were ordered by the midpoints of their genomic positions, and a robust scatter plot smoothing (running median) technique was applied to the median expression values of the genes on each chromosome (see Materials and Methods for details). The resulting transcriptome map revealed clusters of highly expressed genes on all chicken chromosomes (Figure 2). Overall expression levels differed markedly between chromosomes, with GGA2, GGA14 and GGAZ showing lower overall gene expression than the other chromosomes. Furthermore, gene expression levels on the micro-chromosomes were higher than those on the intermediate- and macro-chromosomes, and the median expression level of a chromosome decreased with increasing chromosome size (Figure 3). Interestingly, the sex chromosome GGAZ shows an extremely low median expression level.
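The smoothing step can be sketched in Python as follows. This is a minimal illustration, assuming each gene is summarized by its median across tissues and that windows are simply truncated at chromosome ends; the end rule actually used is described in Materials and Methods and may differ. The tuple layout and function names are hypothetical.

```python
from statistics import median


def running_median(values, window=39):
    """Centered running median; windows are truncated at the ends (assumed end rule)."""
    if window % 2 == 0:
        raise ValueError("window size must be odd")
    half = window // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def chromosome_map(genes, window=39):
    """Smoothed transcriptome map for one chromosome.

    genes: list of (start, end, per_tissue_expression) tuples (hypothetical layout).
    """
    # Order genes by the midpoint of their genomic coordinates.
    ordered = sorted(genes, key=lambda g: (g[0] + g[1]) / 2)
    # Summarize each gene by its median expression across tissues.
    per_gene = [median(expr) for _, _, expr in ordered]
    return running_median(per_gene, window)


# Toy example with a 3-gene window (made-up values)
toy = [(100, 200, [1, 2, 3]), (0, 50, [5, 5, 5]), (300, 400, [7, 8, 9])]
print(chromosome_map(toy, window=3))
```

A running median rather than a running mean keeps a single extreme gene from creating or erasing a cluster on its own.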
To further investigate the unequal distribution of transcriptional activity along chicken chromosomes, we selected regions containing clusters of the most highly expressed genes and regions containing clusters of the most weakly expressed genes, such that each region type covered approximately ten percent of the chicken genome. For consistency with previous studies in humans [8, 9], we use the terms "RIDGE" and "anti-RIDGE" to refer to regions showing the highest and lowest expression levels, respectively, in the chicken genome. Following Caron et al. [8], we define RIDGEs in the chicken genome as genomic regions with at least 10 consecutive running medians larger than 1.19 times the median expression of the chicken transcriptome, i.e. of all 11,361 Ensembl genes. With a running-median window size of 39 genes, we identified 64 RIDGEs covering approximately 10% of the chicken genome. Using the same window size, we identified 27 anti-RIDGEs, defined as genomic regions with at least 10 consecutive running medians smaller than 0.78 times the median expression of the chicken transcriptome; these also cover approximately 10% of the genome. A total of 3,260 and 1,051 Ensembl genes are located in RIDGEs and anti-RIDGEs, respectively. Across the tissue panel, the mean of the median expression values of genes in RIDGEs is approximately 1.8 times that of genes in anti-RIDGEs. More detailed information on RIDGEs and anti-RIDGEs can be found in Additional file 1.
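The RIDGE and anti-RIDGE definitions above amount to finding runs of consecutive smoothed values beyond a threshold. A minimal Python sketch follows, assuming the smoothed map and the genome-wide median are on the same scale (whether the ratios were applied to linear or log2 values is not stated here); the function names are hypothetical. In practice, `genome_median` would be the median of the across-tissue medians of all 11,361 genes.

```python
def find_runs(smoothed, passes, min_run=10):
    """Return (first, last) indices of runs of at least min_run consecutive
    values for which passes(value) is True."""
    runs, start = [], None
    for i, value in enumerate(smoothed):
        if passes(value):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last position.
    if start is not None and len(smoothed) - start >= min_run:
        runs.append((start, len(smoothed) - 1))
    return runs


def call_regions(smoothed, genome_median, min_run=10, ridge_ratio=1.19, anti_ratio=0.78):
    """RIDGEs: runs above ridge_ratio * genome median.
    Anti-RIDGEs: runs below anti_ratio * genome median."""
    ridges = find_runs(smoothed, lambda v: v > ridge_ratio * genome_median, min_run)
    anti_ridges = find_runs(smoothed, lambda v: v < anti_ratio * genome_median, min_run)
    return ridges, anti_ridges
```

Each returned pair indexes genes in chromosomal order, so a region's genomic extent follows from the coordinates of its first and last gene.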
The distributions of expression values for genes located in RIDGEs and anti-RIDGEs are shown in Figure 4. Most genes in anti-RIDGEs have expression values below 7 (log2-transformed green-channel intensities). This contrasts strongly with genes in RIDGEs, which show a much broader distribution, with the majority expressed above 7.
Transcriptome maps in different tissues are highly correlated
Next, to evaluate transcriptome maps of different tissue types, we created a transcriptome map for each individual tissue by applying a running median, with a window size of 39 genes, to the expression values within that tissue. Chromosome 1 is shown as an example in Figure 5; the transcriptome maps of the different tissues were very similar. We tested the correlation between the transcriptome map based on the median expression values across the eight tissues and each of the tissue-specific transcriptome maps. All transcriptome maps were highly correlated, with an average correlation of 0.88, and all pairwise correlations were highly significant (p < 2.2 × 10^-16). All pairwise correlations between the tissue-specific transcriptome maps are shown in Additional file 2.
Random permutation tests of RIDGE identification
To test the significance of the number of RIDGEs identified, we performed random permutation tests using the same window size and threshold for RIDGE identification. In total, 10,000 random transcriptome maps were generated by permuting the gene order throughout the genome. The permutation tests (Additional file 3) show that the number of RIDGEs identified in our analysis is higher than expected by chance: only 4.7% of the random permutations yielded more RIDGEs than observed (empirical p = 0.047).
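A minimal sketch of such a permutation test is shown below, assuming that "permuting the gene order throughout the genome" means shuffling per-gene values genome-wide while keeping the number of genes per chromosome fixed, and that the empirical p-value is the fraction of permutations with strictly more RIDGEs than observed. All names are hypothetical, and this pure-Python version is illustrative only; at the scale of 11,361 genes and 10,000 permutations a vectorized implementation would be much faster.

```python
import random
from statistics import median


def running_median(values, window):
    half = window // 2
    return [median(values[max(0, i - half):i + half + 1]) for i in range(len(values))]


def count_ridges(smoothed, threshold, min_run=10):
    """Number of runs of at least min_run consecutive smoothed values above threshold."""
    count, run = 0, 0
    for value in smoothed:
        if value > threshold:
            run += 1
        else:
            if run >= min_run:
                count += 1
            run = 0
    if run >= min_run:
        count += 1
    return count


def genome_ridge_count(chromosomes, threshold, window, min_run=10):
    return sum(count_ridges(running_median(c, window), threshold, min_run) for c in chromosomes)


def permutation_test(chromosomes, n_perm=10_000, window=39, ratio=1.19, min_run=10, seed=0):
    """chromosomes: per-chromosome lists of per-gene median expression, in genomic order."""
    rng = random.Random(seed)
    pooled = [x for chrom in chromosomes for x in chrom]
    threshold = ratio * median(pooled)  # the genome median is unchanged by permutation
    observed = genome_ridge_count(chromosomes, threshold, window, min_run)
    sizes = [len(chrom) for chrom in chromosomes]
    n_higher = 0
    for _ in range(n_perm):
        # Shuffle gene order genome-wide, keeping the number of genes per chromosome fixed.
        rng.shuffle(pooled)
        shuffled, pos = [], 0
        for size in sizes:
            shuffled.append(pooled[pos:pos + size])
            pos += size
        if genome_ridge_count(shuffled, threshold, window, min_run) > observed:
            n_higher += 1
    return observed, n_higher / n_perm
```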
RIDGEs are relatively conserved between chicken and human
The observation that highly expressed genes tend to cluster within RIDGEs in both the chicken and the human genome suggests a conserved functional organization of these vertebrate genomes. We therefore assessed whether genes in RIDGEs remain associated during evolution, considering two possible forms of functional constraint. The first possibility is that specific genes within a particular RIDGE need to be co-regulated; in this case, one would expect relatively few syntenic breaks within RIDGEs. The other possibility is that genes do not need to co-localize with specific genes, but rather remain spatially associated with other highly expressed genes in general; in this case, one would expect syntenic breaks to occur specifically between two different RIDGEs. Random rearrangements of RIDGEs and anti-RIDGEs, on the other hand, would reduce the clustering of genes and therefore abolish the effect of regional regulation of transcription. First, we tested whether the observed RIDGEs were less prone to being broken up during evolution since the divergence of the chicken and human lineages. Previous studies comparing the human, mouse, rat, and chicken genomes identified a total of 586 conserved synteny blocks [25]. Because these synteny blocks were identified on chicken genome assembly WASHUC1 (Dec. 2004), we mapped the ends of the blocks to the current chicken genome assembly (WASHUC2, May 2006) (Additional file 4) and considered each end an evolutionary break point. In total, we mapped 1,130 break points on the WASHUC2 assembly: 253 within RIDGEs and 50 within anti-RIDGEs. Chi-square tests showed a significantly higher average number of break points in RIDGEs than in regions outside RIDGEs (p < 2.2 × 10^-16) and a significantly lower number of break points in anti-RIDGEs than in regions outside anti-RIDGEs (p = 4.18 × 10^-10) (Additional file 5).
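One way to frame the break point comparison is a one-degree-of-freedom chi-square goodness-of-fit test, with the expected number of break points inside a region type proportional to the fraction of the genome it covers. The sketch below makes exactly that assumption; it is not necessarily the authors' exact procedure. It uses only the standard library: for one degree of freedom, the chi-square survival function equals `erfc(sqrt(x / 2))`.

```python
from math import erfc, sqrt


def chi_square_in_vs_out(observed_in, total, fraction_in):
    """1-df goodness-of-fit test of break points inside vs. outside a region type.

    Expected counts are assumed proportional to the genome fraction covered.
    Returns (chi-square statistic, p-value).
    """
    observed = [observed_in, total - observed_in]
    expected = [total * fraction_in, total * (1 - fraction_in)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = erfc(sqrt(stat / 2))  # survival function of chi-square with 1 df
    return stat, p_value


# Counts from the text, with an assumed coverage of exactly 10%
print(chi_square_in_vs_out(253, 1130, 0.10))  # RIDGEs
print(chi_square_in_vs_out(50, 1130, 0.10))   # anti-RIDGEs
```

With these inputs the anti-RIDGE p-value comes out at about 4.2 × 10^-10, and the RIDGE p-value falls far below 2.2 × 10^-16, in line with the values reported above. Because "approximately 10%" is only approximate, this agreement is indicative rather than a reproduction.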
To compare the transcriptome maps of chicken and human, we downloaded human gene expression data for the same tissue types (see Materials and Methods) from the Human Transcriptome Map website [26]. Using the median expression value of each human gene across the seven human tissues, we applied the same analysis used for the chicken data to identify RIDGEs and anti-RIDGEs in the human genome. As in the chicken, RIDGEs and anti-RIDGEs each cover about ten percent of the human genome. Defining syntenic break points in the human genome using the data described by Bourque et al. [25], we found 143 break points in RIDGEs and 86 in anti-RIDGEs. Consistent with the chicken results, chi-square tests showed a higher average number of break points in RIDGEs than outside RIDGEs (p = 0.01) and a lower number of break points in anti-RIDGEs than outside anti-RIDGEs (p = 0.002) (Additional file 5).
We identified 46 RIDGE-to-RIDGE break points and 11 anti-RIDGE-to-anti-RIDGE break points between the chicken and human genomes. Chi-square tests showed that the number of RIDGE-to-RIDGE break points was significantly higher than expected by chance (p < 2.2 × 10^-16), whereas the number of anti-RIDGE-to-anti-RIDGE break points did not differ significantly from chance expectation (p = 0.8).
Genomic characteristics of RIDGEs and anti-RIDGEs in chicken
Next, we evaluated whether RIDGEs and anti-RIDGEs are associated with other genomic characteristics. The chicken transcriptome map was positively correlated with gene density (p < 2.2 × 10^-16), GC content (p < 2.2 × 10^-16) and average intron length (p < 2.2 × 10^-16). As an example, whole-chromosome views of the transcriptome map, gene density, GC content, gene length, average intron length and recombination rate are shown for chromosome 1 (Figure 6); these various parameters were similar in RIDGEs and anti-RIDGEs. To further investigate the specific genomic characteristics of RIDGEs and anti-RIDGEs, we compared average intron length (the intron length averaged over all transcripts of a gene), gene length (genomic length), gene density (number of genes per 100 kb) and GC content between genes located in RIDGEs and anti-RIDGEs (Figure 7). Compared with the entire chicken genome, RIDGEs on average harbor genes with shorter average intron length (p < 2.2 × 10^-16), shorter gene length (p < 2.2 × 10^-16) and higher GC content (p < 2.2 × 10^-16). Anti-RIDGEs show the opposite trends, harboring genes with longer average intron length (p < 2.2 × 10^-16), longer gene length (p < 2.2 × 10^-16) and lower GC content (p < 2.2 × 10^-16). Furthermore, RIDGEs have a significantly higher gene density than anti-RIDGEs (p = 1.29 × 10^-9).
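The per-gene and per-region features compared above can be computed with simple helpers like the ones below. This is a hedged sketch. Ignoring ambiguous bases such as N in the GC calculation is an assumption. The phrase "averaged intron length of all transcripts per gene" is read here as pooling all introns of a gene's transcripts, which is also an assumption. The function names are hypothetical.

```python
def gc_content(sequence):
    """Fraction of G+C among unambiguous A/C/G/T bases (ignoring N etc. is assumed)."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gene_density_per_100kb(n_genes, region_length_bp):
    """Number of genes per 100 kb of sequence."""
    return n_genes / (region_length_bp / 100_000)


def average_intron_length(transcripts):
    """Mean intron length (bp) for one gene.

    transcripts: one list of intron lengths per transcript. All introns are pooled
    across transcripts (assumed reading); returns None if the gene has no introns.
    """
    introns = [length for transcript in transcripts for length in transcript]
    return sum(introns) / len(introns) if introns else None
```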
Gene Ontology term enrichment analysis for genes in RIDGEs and anti-RIDGEs
Our results indicate that RIDGEs are relatively conserved between human and chicken. If RIDGEs result from evolutionary events favoring the clustering of highly expressed genes, one can hypothesize that genes within RIDGEs share similar functions or biological pathways. To investigate this possibility, we performed Gene Ontology (GO) [27] term enrichment analysis on genes located in RIDGEs and anti-RIDGEs using the R package GOstats [28]. However, after correcting for multiple testing, no significantly enriched GO Biological Process (GOBP) terms were found for genes in RIDGEs or anti-RIDGEs (the minimum FDR across all three tests was 0.4; Additional file 6).
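GOstats' standard enrichment test for a GO term is a hypergeometric test of the number of term-annotated genes in the selected set. The Python sketch below is a stand-alone illustration of that idea, not GOstats itself. It assumes Benjamini-Hochberg adjustment as the FDR procedure, and all counts it receives are hypothetical inputs.

```python
from math import comb


def hypergeom_upper_tail(k, n_selected, n_annotated, n_universe):
    """P(X >= k), where X is the number of term-annotated genes in a random
    set of n_selected genes drawn from n_universe genes (n_annotated annotated)."""
    total = comb(n_universe, n_selected)
    upper = min(n_selected, n_annotated)
    hits = sum(
        comb(n_annotated, x) * comb(n_universe - n_annotated, n_selected - x)
        for x in range(max(k, 0), upper + 1)
    )
    return hits / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In a GO analysis, one upper-tail p-value would be computed per term, with the genes in RIDGEs (or anti-RIDGEs) as the selected set and all analyzed genes as the universe, and the resulting p-values would then be adjusted together.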