Regional regulation of transcription in the chicken genome
© Nie et al. 2010
Received: 6 August 2009
Accepted: 14 January 2010
Published: 14 January 2010
Over the past years, the relationship between gene transcription and chromosomal location has been studied in a number of different vertebrate genomes. Regional differences in gene expression have been found in several different species. The chicken genome, as the closest sequenced genome relative to mammals, is an important resource for investigating regional effects on transcription in birds and studying the regional dynamics of chromosome evolution by comparative analysis.
We used gene expression data to survey eight chicken tissues and create transcriptome maps for all chicken chromosomes. The results reveal the presence of two distinct types of chromosomal regions characterized by clusters of highly or lowly expressed genes. Furthermore, these regions correlate highly with a number of genome characteristics. Regions with clusters of highly expressed genes have higher gene densities, shorter genes, shorter average intron length and higher GC content compared to regions with clusters of lowly expressed genes. A comparative analysis between the chicken and human transcriptome maps constructed using similar panels of tissues suggests that the regions with clusters of highly expressed genes are relatively conserved between the two genomes.
Our results revealed the presence of a higher order organization of the chicken genome that affects gene expression, confirming similar observations in other species. These results will aid in the further understanding of the regional dynamics of chromosome evolution.
The microarray data used in this analysis have been submitted to NCBI GEO database under accession number GSE17108. The reviewer access link is: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=tjwjpscyceqawjk&acc=GSE17108
Gene expression in eukaryotes is regulated on two different levels: the level of the individual gene and the regional level in the genome. The best studied, and generally considered the major level of regulation, is the regulation at the level of individual genes. Although several well-studied exceptions involve tightly co-regulated gene clusters, such as the globin, MHC and Hox gene clusters [1–4], the common model for eukaryotic gene transcription involves the binding of several transcription factors (TFs) to promoter regions and enhancers, resulting in activation of the individual genes. It has become increasingly evident that in addition to gene regulation by TF binding to regulatory sequences, eukaryotic gene expression is also regulated at a higher level, and several studies have demonstrated the dependency of gene expression on the location of the gene within the genome [5–7].
Over the past years, the relationship between gene transcription and chromosomal location has been studied in a number of different vertebrate genomes. Analysis of the human transcriptome map based on SAGE (serial analysis of gene expression) data from 12 human tissues revealed the clustering of highly expressed genes within specific chromosomal regions; these regions were termed "RIDGEs", or "Regions of IncreaseD Gene Expression". Genomic regions containing genes expressed at much lower levels were termed anti-RIDGEs, and these regions exhibit characteristics opposite those of RIDGEs [8, 9]. A similar region-wide regulation of gene expression was later reported in the Drosophila genome [10, 11]. RIDGEs were also found in the mouse genome and are reported to be relatively conserved between the mouse and human genomes. A later study showed gene expression to be regulated at a region-wide level in the human genome. Insertion of green fluorescent protein (GFP) reporter constructs at 90 different chromosomal positions in the human genome showed that gene transcription was regulated through a novel region-wide regulatory mechanism as well as via specific transcription factors, thereby demonstrating dual mechanisms in the regulation of gene transcription.
Regional differences in gene expression have been found in two distinct clades (mammals and flies) of the metazoan phylogeny, suggesting a common mechanism of regulation of transcription in all animals. Other characteristics of eukaryotic genomes, such as gene density and recombination, have also been suggested to exhibit domain-like features. In addition, levels of gene expression have been found to correlate with the time of chromatin replication during the cell cycle, i.e. actively expressed regions of the genome replicate early. Striking in this respect is the observed location of gene-dense and highly expressed chromosomes towards the center of the nucleus and the location of gene-poor and weakly expressed chromosomes towards the nuclear envelope in both human and chicken cells. Furthermore, in chicken, this spatial organization seems to correlate with chromosome size.
The chicken genome sequence, published in 2004, was the first non-mammalian amniote genome to become available; its karyotype (2n = 78) consists of 38 pairs of autosomes and one pair of sex chromosomes, with the female being the heterogametic sex (ZW female, ZZ male). Thus far, 31 chromosomes have been assembled in the chicken genome, including six macro-chromosomes (GGA1-5, Z), five intermediate chromosomes (GGA6-10) and twenty micro-chromosomes (GGA11-28, 32, W). The existence of micro-chromosomes is one of the interesting features of the chicken genome; micro-chromosomes are also found in some primitive amphibians [20, 21] and most reptiles. Besides the large differences in size, micro-chromosomes also exhibit higher gene density, smaller gene size, and higher recombination rates than macro-chromosomes [18, 23]. As the best-studied bird genome currently available, and the closest sequenced genome relative to mammals, the chicken genome is an important resource for comparative genomics, including comparative studies on gene transcription.
To investigate regional effects on transcription in birds, we analyzed chicken gene expression data across a number of different tissues to address three major questions: (i) whether there are regional differences in the regulation of transcription in the chicken genome, (ii) whether these regions are conserved during evolution, and (iii) what the characteristics of these genomic regions are in the chicken.
In this study, we define the chicken transcriptome map as the median expression levels of the 11,361 chicken Ensembl genes across eight tissues on 27 chromosomes. The start position of the first Ensembl gene and the end position of the last Ensembl gene on each chromosome were considered the start and end of each chicken chromosome. The combined size of the chromosomal sequences analyzed in this study is 1,022,830,111 bp, which covers 97% of the total length of build 2 (WASHUC2, May 2006) of the chicken (Gallus gallus) genome.
To further investigate the unequal distribution of gene transcription activity along chicken chromosomes, we selected regions with clusters of the most highly expressed genes and regions with clusters of the most lowly expressed genes, such that each region type covered approximately ten percent of the chicken genome. To be consistent with previous studies in humans [8, 9], here we use the terms "RIDGE" and "anti-RIDGE" to refer to regions showing the highest and lowest expression levels, respectively, in the chicken genome. Similar to Caron et al., we define RIDGEs in the chicken genome as genomic regions with at least 10 consecutive running medians larger than 1.19 times the median expression of the chicken transcriptome, i.e. all 11,361 Ensembl genes. With a running median of a window size of 39 genes, we identified 64 RIDGEs in the chicken genome that cover approximately 10% of the genome. Using the same window size, we identified 27 anti-RIDGEs, which cover approximately 10% of the chicken genome; these anti-RIDGEs are defined as genomic regions with at least 10 consecutive running medians smaller than 0.78 times the median expression of the chicken transcriptome. The total number of Ensembl genes located in RIDGEs and anti-RIDGEs is 3,260 and 1,051, respectively. The mean of the median expression values of genes located in RIDGEs across the tissue panel is approximately 1.8 times higher than that of genes in anti-RIDGEs. More detailed information on RIDGEs and anti-RIDGEs can be found in Additional file 1.
To test the significance of the number of RIDGEs identified in our analysis, we performed random permutation tests using the same window size and threshold for RIDGE identification. In total, 10,000 random transcriptome maps were generated by permuting the gene order throughout the genome. The permutation tests, shown in Additional file 3, show that the number of RIDGEs identified in our analysis is higher than would be expected by chance: only 4.7% of random permutations gave a higher number of RIDGEs than that observed.
The observation that highly expressed genes tend to be clustered within RIDGEs in the chicken as well as the human genome suggests a conserved functional organization of the genome of these vertebrates. We therefore decided to assess whether genes in RIDGEs remain associated during evolution, and considered two different forms of functional constraint. The first possibility is that specific genes within a particular RIDGE need to be co-regulated; in this case, one would expect relatively few syntenic breaks to occur within the RIDGEs. The other possibility is that genes do not need to co-localize with specific genes, but rather remain spatially associated with other highly expressed genes in general. In this case, one would expect syntenic breaks to occur specifically between two different RIDGEs. Random rearrangements of RIDGEs and anti-RIDGEs, on the other hand, would reduce the clustering of genes, and therefore abolish the effect of regional regulation of transcription.
First, we tested whether the observed RIDGEs were less prone to being broken up since the divergence of the chicken and human lineages. Previous studies comparing the human, mouse, rat, and chicken genomes identified a total of 586 conserved synteny blocks. Because the identification of these synteny blocks was based on chicken genome assembly WASHUC1 (Dec. 2004), we mapped the ends of these syntenic blocks to the current chicken genome assembly (WASHUC2, May 2006) (Additional file 4), and considered each end as an evolutionary break point. In total, we mapped 1130 break points on the WASHUC2 chicken genome assembly; we found 253 break points within RIDGEs, and 50 break points within anti-RIDGEs. Chi-square tests showed a significantly higher number of break points in RIDGEs compared to regions outside RIDGEs (p value < 2.2 × 10^-16) and a significantly lower number of break points in anti-RIDGEs compared to regions outside anti-RIDGEs (p value = 4.18 × 10^-10) (Additional file 5).
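As a rough check of the chicken break point counts, the comparison can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test. The sketch assumes that, by chance, break points would fall inside a region type in proportion to its genome coverage, taken here as exactly 10% (the text only states approximately ten percent). The function name and this null model are illustrative assumptions, not necessarily the exact test design behind Additional file 5.

```python
from math import erfc, sqrt

def chi_square_gof(inside, total, expected_fraction):
    """Chi-square goodness-of-fit test (1 df) for a count inside a region type.

    Returns (statistic, p_value). The null model, an assumption of this
    sketch, is that each break point lands inside the region type with
    probability `expected_fraction`.
    """
    outside = total - inside
    exp_in = total * expected_fraction
    exp_out = total - exp_in
    stat = (inside - exp_in) ** 2 / exp_in + (outside - exp_out) ** 2 / exp_out
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value

TOTAL_BREAK_POINTS = 1130  # break points mapped on WASHUC2 (from the text)
ASSUMED_COVERAGE = 0.10    # assumption: "approximately ten percent" taken as 0.10

ridge_stat, ridge_p = chi_square_gof(253, TOTAL_BREAK_POINTS, ASSUMED_COVERAGE)
anti_stat, anti_p = chi_square_gof(50, TOTAL_BREAK_POINTS, ASSUMED_COVERAGE)
print(f"RIDGEs:      chi2 = {ridge_stat:.1f}, p = {ridge_p:.3g}")
print(f"anti-RIDGEs: chi2 = {anti_stat:.1f}, p = {anti_p:.3g}")
```

Under these assumptions, 113 break points are expected in each region type, so the 253 observed in RIDGEs are an excess and the 50 observed in anti-RIDGEs a deficit.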
To compare the transcriptome maps between chicken and human, we downloaded human gene expression data for the same types of tissues (see Materials and Methods) from the Human Transcriptome Map website. Using the median of the expression values across the seven human tissues for each human gene, we performed the same analysis on the human data as on the chicken expression data to identify RIDGEs and anti-RIDGEs in the human genome. As in the chicken, RIDGEs and anti-RIDGEs each cover about ten percent of the human genome. Defining the syntenic break points in the human genome using data described by Bourque et al., we found a total of 143 and 86 break points in RIDGEs and anti-RIDGEs, respectively. Again, similar to the results seen in the chicken, chi-square tests show a higher number of break points in RIDGEs compared to regions outside RIDGEs (p value = 0.01) and a lower number of break points in anti-RIDGEs compared to regions outside anti-RIDGEs (p value = 0.002) (Additional file 5).
We identified 46 RIDGE-to-RIDGE break points and 11 anti-RIDGE-to-anti-RIDGE break points between the chicken and human genomes. Chi-square tests showed a significantly higher number of RIDGE-to-RIDGE break points between the chicken and human genomes (p value < 2.2 × 10^-16) compared to that expected by chance, and no significant difference in the number of anti-RIDGE-to-anti-RIDGE break points (p value = 0.8).
Our results indicate that RIDGEs are relatively conserved between human and chicken. Assuming RIDGEs are the result of evolutionary events favoring the clustering of genes with higher expression levels, one can hypothesize that genes within RIDGEs may share similar functions or biological pathways. To investigate this possibility, we performed Gene Ontology (GO) term enrichment analysis on genes located in RIDGEs and anti-RIDGEs using the R package GOstats. However, no significant GO biological process (GOBP) terms were found for genes in RIDGEs and anti-RIDGEs after correcting for multiple testing (the minimum FDR across all three tests was 0.4) (Additional file 6).
The annotated genes on the array platform used in this study cover most of the current chicken genome assembly. The number of genes analyzed on each chromosome is also in good proportion to chromosome length (Figure 1), which argues against a bias in the analysis due to uneven distribution of the genes in the chicken genome. We chose to exclude chromosomes 16 and 25 from our analysis, as only 24 and 59 Ensembl genes, respectively, are represented on the array; these numbers are too low to identify any meaningful high or low expressing regions with the window size of 39 genes used in this analysis.
We observed high correlations (average correlation = 0.88) among the different transcriptome maps based on the expression data from the eight different individual tissues as well as between these transcriptome maps and the transcriptome map of the combined expression data of all eight tissues. This indicates that use of the median expression value or the expression values from individual tissues only has a minor effect on the transcriptome maps and on the identification of RIDGEs and anti-RIDGEs. This shows that regional differences in transcription are a general trend in the chicken genome, even among different tissue types.
This is the first study in birds to construct a transcriptome map and to confirm the existence of regional differences in transcription regulation in the chicken genome. RIDGEs have been discovered in several animal species from phylogenetically distinct groups, suggesting that the existence of RIDGEs may be universal in the animal kingdom [8, 10–14].
Gierman et al. showed that RIDGEs contain up to 80 genes and can exert an eightfold difference on the expression levels of integrated genes. They found that gene expression levels are not highly correlated with those of adjacent genes, but instead are more correlated with the entire block of up to 80 genes, demonstrating regional effects on gene transcription. The exact mechanism underlying how gene expression occurs in RIDGEs is still unknown. One hypothesis is that evolution favors highly expressed genes to be physically close to each other, as transcription of one gene would help the chromatin of neighboring genes to "open up" during transcription. This hypothesis is in agreement with our observation of no apparent evolutionary constraint on the co-localization of specific genes, whereas we observed specific localization of specific genes within RIDGEs (see below). Goetze et al. showed that RIDGEs in general are less condensed, more irregularly shaped, and located closer to the nuclear center than anti-RIDGEs. Furthermore, the chromatin structures of RIDGEs and anti-RIDGEs are largely independent of tissue-specific variations in gene expression and differentiation state. Their findings further support the hypothesis that the different regional effect of gene transcription in RIDGEs and anti-RIDGEs is, at least in part, explained by the chromatin structure of the two types of genomic regions.
Many studies have shown that chicken genome characteristics such as recombination frequency, gene density and GC density correlate with chromosome size [18, 23]. Our results show a similar trend with regard to the level of gene expression and density of RIDGEs. In the chicken, the median expression values decrease with increased chromosome length (Figure 3), which can only be partly explained by the higher gene density of the micro-chromosomes. Our permutation analysis shows that the organization of genes in clusters of highly expressed genes is not random and suggests a functional mechanism. This is further strengthened by our observation that the same distribution of RIDGEs is seen when both strands of the same chromosome are analyzed separately (Figure 6). This is additional confirmation of region-like regulation of transcription during gene expression, since the opening of chromatin structures during gene expression will affect both strands by facilitating the access of transcription factors to target genes, thus enhancing gene expression in that region. Furthermore, we also found a correlation between the transcriptome maps and gene density, GC content, gene length, average intron length, "minimal intron" density, and recombination rate in the chicken genome (Figure 6). A correlation between recombination rate and GC content in the chicken genome has been recently reported, which, via GC content, links recombination rate with the transcriptome map reported in the current study. This can be explained by the more open chromatin structure of the transcriptionally active RIDGEs, which would also facilitate recombination within these regions. Furthermore, "minimal introns" have been reported to be GC-rich and to enhance the rate at which mRNA is exported from the cell nucleus (Yu et al. 2002). These findings link the "minimal intron" distribution via GC content with the transcriptome map in the current study.
This can be explained, at least in part, by the need for efficient export of highly expressed mRNA from the nucleus. Many genomic characteristics in eukaryotic genomes, such as RIDGEs, early replication and recombination, appear to be linked. RIDGEs are associated with higher expression, higher gene density, higher GC content, shorter introns, shorter genes, higher "minimal intron" density, and higher recombination rate (Figure 6). This is consistent with human studies, in which similar correlations were found. Shorter introns and shorter genes in RIDGEs may indicate the need for increased transcription efficiency. Castillo-Davis et al. showed that introns in highly expressed genes are substantially shorter than those in genes that are expressed at low levels in the human genome, and the authors hypothesized that transcription efficiency is enhanced when intron length is shorter. The clustering of highly expressed genes in RIDGEs therefore would result in clustering of genes with, on average, shorter introns. Although GC content, gene density, gene length, average intron length, "minimal intron" distribution and recombination rate are all correlated with gene transcriptional activity in the chicken genome, the exact causative mechanisms of these relationships are still unknown.
In comparing evolutionary break points between RIDGEs and anti-RIDGEs, we found a higher number of break points within RIDGEs than within anti-RIDGEs in both the chicken and the human genome. As with recombination, it is possible that the more open chromatin structure within RIDGEs increases the likelihood of rearrangement events, and thus the number of observed syntenic breaks.
Although RIDGEs clearly show an increase in the number of evolutionary break points, we also showed a significantly higher number of RIDGE-to-RIDGE break points between the chicken and human genomes. Hence, although RIDGEs are more prone to be interrupted by evolutionary break points, there still seems to be an evolutionary constraint that favors recombination between RIDGEs, i.e. the resulting parts of a broken RIDGE from one species were more likely to stay together with a part of another broken RIDGE during genome evolution, thereby keeping specific genes together within RIDGEs. In other words, genes within a RIDGE in one species are likely to end up in a RIDGE in another species even when syntenic rearrangements occur. In total, 11,407 1-to-1 human-chicken homologous genes were downloaded via biomaRt. Of these genes, 1,351 are located in RIDGEs and 857 are located in anti-RIDGEs in the human genome. Of the 1-to-1 homologs located in human RIDGEs, 27% (361 out of 1,351 genes) are also located in chicken RIDGEs (p value < 2.2 × 10^-16, chi-square test). This again supports our hypothesis that genes within a RIDGE in one species are likely to end up in a RIDGE in another species.
This result suggests that it is not so much the clustering of specific genes that is important, but rather the clustering of any genes that are highly expressed. The relatively low number of syntenic breaks within anti-RIDGEs, on the other hand, might be linked to another feature of vertebrate chromosomes, namely the occurrence of regions with a relatively low number of genes, so-called "gene deserts". In particular, the so-called "stable gene deserts" colocalize with developmentally active genes and genes coding for transcription factors, both gene types that generally show relatively low levels of expression. These "stable gene deserts" showed extremely low numbers of syntenic breaks.
Our results clearly show the existence of a higher level organization of the vertebrate genome affecting not only the expression of genes but also other features such as recombination and genome rearrangements during evolution.
This is the first study describing a transcriptome map in birds. This study has revealed regional regulation of gene expression in chicken that is consistent with previous studies in flies and mammals [8, 10, 12]. Features correlating with high regional transcription are more pronounced in the microchromosomes, leading to overall higher expression compared to genes on the macrochromosomes. Our analysis of evolutionary break points shows that the regional regulation of gene transcription is relatively conserved between chicken and human. Given the evolutionary position of chicken on the phylogenetic tree, our results provide a unique perspective for future comparative studies on transcriptome maps between vertebrate species.
The gene expression data used in this analysis were obtained from a gene expression survey in chicken brain, bursa of Fabricius, kidney, liver, lung, small intestine, spleen and thymus, using the chicken 20 k oligonucleotide microarray (see below). Five biological replicates were used for each tissue type, resulting in a total of 40 arrays. Each individual sample was compared to the pooled reference, and data were normalized using the R package limma. The mean expression value for each Ensembl gene was calculated for each tissue type, and the overall expression value of each Ensembl gene was taken as the median of these tissue means across all eight tissues.
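The two-step summary described above (mean over replicates within a tissue, then median across tissues) can be sketched as follows. This is a minimal illustration; the function name, tissue labels and values are hypothetical, and normalization with limma is assumed to have happened beforehand.

```python
from statistics import mean, median

def summarize_gene(replicates_by_tissue):
    """Collapse one gene's normalized values into a single expression value.

    `replicates_by_tissue` maps a tissue name to that gene's replicate values.
    Step 1: mean across replicates within each tissue.
    Step 2: median of the tissue means across tissues.
    """
    tissue_means = [mean(values) for values in replicates_by_tissue.values()]
    return median(tissue_means)

# Hypothetical log2 values for one gene in three tissues (the study used eight
# tissues with five replicates each).
example_gene = {
    "liver": [1.0, 2.0, 3.0],
    "brain": [4.0, 4.0, 4.0],
    "lung": [10.0, 10.0, 10.0],
}
print(summarize_gene(example_gene))
```

Taking the median across tissues keeps a single tissue with extreme expression, such as the hypothetical lung values here, from dominating the gene's summary value.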
The microarray data have been deposited in the Gene Expression Omnibus (GEO) public repository. The accession number for the series is GSE17108, and the samples can be retrieved with accession numbers GSM427873 to GSM427912. The sample records contain the raw data (median signal) of both the Cy5 (red) and Cy3 (green) channels as well as the normalized data for each microarray.
The chicken 20 k array was obtained from ARK-Genomics. The array design has been published in Gene Expression Omnibus under the platform accession GPL8861 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=tjwjpscyceqawjk&acc=GPL8861).
The probe sequences of the chicken 20 k oligonucleotide microarray used in this study were designed based on chicken genome assembly WASHUC1 (Dec. 2004), and all sequences were mapped to the chicken genome assembly WASHUC2. An updated array re-annotation file based on Ensembl 50 is available at the EADGENE Oligo Set Annotation Files homepage. Of the total 20,460 oligonucleotide probes on the chicken 20 k array, 13,431 mapped to unique locations in the chicken genome. All probes for genes that mapped to chromosome "unknown" were excluded from the analysis, and all probes for genes on chromosomes 16, 25, and W were excluded due to the very low number of probes that mapped to those chromosomes. For probes that mapped to the same known Ensembl gene ID, the expression data were averaged and assigned to the Ensembl gene. In total, 12,983 oligo probes were used in this study; they mapped to 11,361 unique chicken Ensembl gene IDs located on 27 chromosomes.
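The probe-to-gene averaging step can be sketched as a simple group-by-mean. The function name and the probe and gene identifiers below are hypothetical placeholders, and probes without a gene assignment are simply skipped, which is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean

def average_probes_per_gene(probe_values, probe_to_gene):
    """Average the expression values of all probes mapping to the same gene.

    `probe_values` maps probe ID -> expression value.
    `probe_to_gene` maps probe ID -> gene ID; unmapped probes are skipped.
    """
    per_gene = defaultdict(list)
    for probe_id, value in probe_values.items():
        gene_id = probe_to_gene.get(probe_id)
        if gene_id is not None:
            per_gene[gene_id].append(value)
    return {gene_id: mean(values) for gene_id, values in per_gene.items()}

# Hypothetical identifiers, for illustration only.
probe_values = {"probe1": 1.0, "probe2": 3.0, "probe3": 5.0, "probe4": 9.0}
probe_to_gene = {"probe1": "geneA", "probe2": "geneA", "probe3": "geneB"}
print(average_probes_per_gene(probe_values, probe_to_gene))
```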
Individual gene expression data were ordered according to the middle position of the gene. A robust scatter plot smoothing technique (function runmed in R package stats) was applied to each chromosome separately, with a window size of 39 genes, i.e. the expression value of each gene was replaced by the median expression value of a window of 39 genes centered on that gene. Similar to the definition for RIDGEs in humans, here we defined a RIDGE by the window size for calculating median expression, the minimum length of the run, and the threshold for the lower limit of the median. The selection of a window size of 39 genes was based on the following two points: 1) permutation analyses performed by Caron et al. and by us indicated that a window size of 39 genes gives a reasonable number of RIDGEs; 2) to be able to compare the results of RIDGE identification between human and chicken, we decided to use the same threshold as described by Caron et al. The larger the window size, the smaller the number of RIDGEs identified, as indicated by the permutation results in Additional file 3.
The threshold for RIDGEs was set to 1.19 times the genomic median value (the data are log2 transformed, and the values used here are the running median values with a window size of 39 genes) along the length of a run of at least 10 median values. The threshold used for anti-RIDGEs was a median expression of 0.78 times the genomic median. The thresholds used for the classification of the RIDGEs and anti-RIDGEs were chosen such that RIDGEs and anti-RIDGEs each cover 10% of the genome.
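The RIDGE and anti-RIDGE definitions above can be sketched in a few lines. This is a minimal illustration, not the original R code: the function names are hypothetical, and window ends are handled by shrinking the window symmetrically, which is an assumption and may differ from the end rule applied by R's runmed.

```python
from statistics import median

def running_median(values, window=39):
    """Centered running median; windows shrink symmetrically near the ends."""
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        k = min(half, i, n - 1 - i)  # largest symmetric half-width that fits
        smoothed.append(median(values[i - k:i + k + 1]))
    return smoothed

def find_runs(smoothed, threshold, min_run=10, above=True):
    """Return inclusive (start, end) index pairs of qualifying runs.

    A run is a stretch of at least `min_run` consecutive running medians
    above the threshold (RIDGE) or below it (anti-RIDGE, above=False).
    """
    runs, start = [], None
    for i, value in enumerate(smoothed):
        hit = value > threshold if above else value < threshold
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(smoothed) - start >= min_run:
        runs.append((start, len(smoothed) - 1))
    return runs

# Toy chromosome: 30 highly expressed genes flanked by 60 baseline genes each side.
toy = [1.0] * 60 + [2.0] * 30 + [1.0] * 60
genomic_median = median(toy)
smoothed = running_median(toy)
ridges = find_runs(smoothed, 1.19 * genomic_median, above=True)
anti_ridges = find_runs(smoothed, 0.78 * genomic_median, above=False)
print(ridges, anti_ridges)
```

On this toy chromosome the 30-gene block is recovered as a single RIDGE and no anti-RIDGE is called, because a 39-gene window centered on any gene in the block contains at least 20 highly expressed genes.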
A Spearman rank correlation test was performed to test for pairwise correlations among the transcriptome maps on all the chromosomes (applied to the running median with a window size of 39 genes). Because the running median expression values are not normally distributed, the non-parametric Spearman correlation test was used on the ranks of the paired transcriptome maps.
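For reference, the Spearman coefficient is the Pearson correlation computed on ranks. The sketch below implements it with the standard library only; the function names are illustrative, ties receive average ranks, and no p-value is computed.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical running-median values of two tissue maps along one chromosome.
map_a = [1.0, 1.2, 1.5, 2.0, 2.4]
map_b = [0.8, 0.9, 1.6, 1.7, 3.0]
print(spearman_rho(map_a, map_b))
```

Because only ranks enter the calculation, any monotonic relationship between two maps yields a coefficient of 1 regardless of the distribution of the underlying values.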
Random permutation tests were done in R by permuting the genomic locations of Ensembl genes and repeating the RIDGE analysis 10,000 times to create 10,000 random transcriptome maps. The number of RIDGEs identified in these 10,000 random transcriptome maps was compared to the actual number of identified RIDGEs in this analysis using the same threshold.
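The permutation procedure can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the original R code: function names are hypothetical, genes are shuffled across the whole genome and re-split into chromosomes of the original sizes, window ends shrink symmetrically, and the reported fraction counts permutations with strictly more RIDGEs than observed.

```python
import random
from statistics import median

def running_median(values, window=39):
    """Centered running median; windows shrink symmetrically near the ends."""
    half, n = window // 2, len(values)
    smoothed = []
    for i in range(n):
        k = min(half, i, n - 1 - i)
        smoothed.append(median(values[i - k:i + k + 1]))
    return smoothed

def count_ridges(chromosomes, threshold, window=39, min_run=10):
    """Count runs of at least `min_run` consecutive running medians > threshold."""
    total = 0
    for genes in chromosomes:
        run = 0
        for value in running_median(genes, window):
            if value > threshold:
                run += 1
                if run == min_run:  # count each qualifying run exactly once
                    total += 1
            else:
                run = 0
    return total

def permutation_test(chromosomes, factor=1.19, n_perm=10000, seed=0):
    """Return (observed RIDGE count, fraction of permutations with more RIDGEs)."""
    rng = random.Random(seed)
    pooled = [value for genes in chromosomes for value in genes]
    threshold = factor * median(pooled)
    observed = count_ridges(chromosomes, threshold)
    sizes = [len(genes) for genes in chromosomes]
    higher = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # permute gene order across the whole genome
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if count_ridges(shuffled, threshold) > observed:
            higher += 1
    return observed, higher / n_perm

# Toy genome with one clustered block of highly expressed genes.
toy_genome = [[1.0] * 60 + [2.0] * 30 + [1.0] * 60, [1.0] * 100]
observed, fraction_higher = permutation_test(toy_genome, n_perm=50, seed=1)
print(observed, fraction_higher)
```

The toy run uses 50 permutations only to keep the example fast; the analysis described above used 10,000.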
Human-chicken synteny block data from Bourque et al. were used in this study, and genomic locations of synteny blocks from assembly WASHUC1 (Dec. 2004) were mapped to assembly WASHUC2 (May 2006) using BLAT (see Additional file 4). Each end of every syntenic block was considered a break point, and the number of break points in RIDGEs and anti-RIDGEs was subsequently summarized.
Human Transcriptome Map data were downloaded from the HTM website. We selected Affymetrix U133A human whole genome array data from seven tissues (thymus, spleen, lung, small intestine, brain, liver, and kidney) from a healthy individual; the normalized data were log2 transformed and the median expression value across the seven different tissues was used to build the transcriptome map. RIDGEs and anti-RIDGEs were identified using the same approach as for the chicken data.
Genomic location, transcript length, exon number and GC content for the individual Ensembl chicken genes were downloaded from the Ensembl genome database using biomaRt. The average intron length was calculated by averaging the intron length of all transcripts per gene. The statistical test for differences in average intron length, gene length, gene density, and GC content between RIDGEs and anti-RIDGEs was performed using the Wilcoxon rank-sum test (function wilcox.test in R package stats).
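To illustrate the kind of comparison made with the Wilcoxon rank-sum test, the sketch below computes the Mann-Whitney U statistic and a two-sided p-value from the normal approximation. It is a simplified stand-in for R's wilcox.test: it applies no continuity or tie correction and never switches to an exact test, so its p-values will not match R exactly. The function name and the intron lengths are hypothetical.

```python
from math import erfc, sqrt

def rank_sum_test(x, y):
    """Mann-Whitney U for sample x versus y with a normal-approximation p-value.

    Simplifications (assumptions of this sketch): no tie correction,
    no continuity correction, no exact small-sample distribution.
    """
    n1, n2 = len(x), len(y)
    # U counts pairs where x exceeds y; ties contribute one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2.0))
    return u, p_two_sided

# Hypothetical average intron lengths (bp) for genes in the two region types.
ridge_introns = [300, 450, 500, 620, 700, 810, 900, 950]
anti_ridge_introns = [1500, 2100, 2600, 3000, 3400, 4100, 5200, 6000]
u_stat, p_value = rank_sum_test(ridge_introns, anti_ridge_introns)
print(u_stat, p_value)
```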
GO term enrichment analysis was performed using the R package GOstats. The conditional algorithm was used for the hypergeometric test. The gene annotation package for the GOstats analysis was built using the R package AnnotationDbi. Mapping of chicken Ensembl gene IDs and other genomic information (e.g. entrezgene) was performed using the R package biomaRt.
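The basic hypergeometric enrichment test underlying this analysis can be sketched as below. This is the plain, unconditional test for a single GO term; it does not reproduce the conditional algorithm of GOstats, which additionally accounts for the GO hierarchy, and it applies no multiple-testing correction. The function name and the annotation counts in the example are hypothetical; only the gene universe (11,361) and the number of RIDGE genes (3,260) come from the text.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    population: genes in the universe
    annotated:  universe genes carrying the GO term
    sample:     genes in the tested set (e.g. genes in RIDGEs)
    hits:       tested-set genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical GO term: 200 annotated genes in the universe, 80 of them in RIDGEs.
p_term = hypergeom_enrichment_p(population=11361, annotated=200, sample=3260, hits=80)
print(p_term)
```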
MHC: Major Histocompatibility Complex
SAGE: Serial Analysis of Gene Expression
RIDGE: Regions of Increased Gene Expression
GFP: Green Fluorescent Protein
EST: expressed sequence tag
This work was financed by European Union grant FOOD-CT-2004-506416 (EADGENE). We would like to acknowledge Aart Lammers and Basav Hangalapura from Wageningen University for helping with collection of chicken tissues, Richard Talbot from ARK-Genomics for providing the microarrays, and Pieter Neerincx and Jack Leunissen from Wageningen University and Christophe Klopp and Pierrot Casel from INRA for help with updating the annotation for the chicken 20 k oligo array.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.