  • Research article
  • Open access

Transcriptome map of mouse isochores

Abstract

Background

The availability of fully sequenced genomes and the implementation of transcriptome technologies have increased the number of studies investigating expression profiles across a variety of tissues, conditions, and species. In this study, using RNA-seq data for three distinct tissues (brain, liver, and muscle), we investigate how base composition affects mammalian gene expression, an issue of prime practical and evolutionary interest.

Results

We present the transcriptome map of the mouse isochores (DNA segments with a fairly homogeneous base composition) for the three different tissues and the effects of isochores' base composition on their expression activity. Our analyses also cover the relations between the genes' expression activity and their localization in the isochore families.

Conclusions

This is the first study in which next-generation sequencing data are used to relate both genomic and genic compositional properties to the corresponding expression activity. Our findings confirm previous results and further support the existence of a relationship between isochores and gene expression. This relationship supports the view that isochores are primarily a product of evolutionary adaptation rather than a simple by-product of neutral evolutionary processes.

Background

The genomes of vertebrates are mosaics of isochores, long regions (from 0.2 Mb up to several Mb) that are fairly homogeneous in base composition. The isochores belong to a small number of families characterized by different GC levels (the molar ratio of guanine plus cytosine over the total number of bases in the region) [1–4]. In the human genome, a typical mammalian genome, five isochore families can be found (L1, L2, H1, H2, and H3, in order of increasing GC level) that cover a wide GC range (30-60%) [2–4]. The GC-richest families, H2 and H3, represent approximately 15% of the genome and contain about 50% of the protein-coding genes. This high gene density is accompanied by other striking properties, such as an open chromatin structure, localization at the center of the nucleus, a high density of short interspersed elements (SINEs), a low density of long interspersed elements (LINEs), early replication, a high level of recombination, a high mutation rate, and higher expression levels, while the GC-poorer families have the opposite properties [2]. In the mouse genome, which is the focus of this study, the L1 isochore family is under-represented compared to other vertebrates, and the H3 family is almost absent [5]. This narrow isochore distribution in the mouse genome has been interpreted as the result of a higher substitution rate [6, 7] and weak repair mechanisms [8], both of which reduce compositional heterogeneity (see also [5]). Despite these differences, the distribution of genes is similar to that of other vertebrates (gene density increases as GC level increases), and the average GC levels of the different families are remarkably conserved across species, reflecting a functional relation to chromatin structure [5].
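As a concrete illustration of the GC-level definition above, the following minimal sketch computes the GC level of a sequence and maps it to an isochore family. The family cut-offs in `FAMILY_BOUNDS` are assumed values (the boundaries commonly used for human isochore families); this article itself only states the H3 threshold (GC > 53%). Excluding undetermined bases (N) from the denominator is also our choice, and the function names are hypothetical.

```python
# Assumed upper GC bounds (%) for each family; anything at or above the
# last bound is classed as H3. Only "H3: GC > 53%" comes from the article.
FAMILY_BOUNDS = [(37.0, "L1"), (41.0, "L2"), (46.0, "H1"), (53.0, "H2")]


def gc_level(seq: str) -> float:
    """GC level (%) = (G + C) / (A + C + G + T); N and other symbols ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def isochore_family(gc_percent: float) -> str:
    """Map a GC level (%) to an isochore family using FAMILY_BOUNDS."""
    for upper, name in FAMILY_BOUNDS:
        if gc_percent < upper:
            return name
    return "H3"


print(round(gc_level("GGCCNNAT"), 1))  # 66.7 (4 G/C out of 6 determined bases)
print(isochore_family(54.0))           # H3
```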

The origin of isochores is an open debate of considerable evolutionary importance: in addition to the selectionist model (functional advantage [4]), other models attempt to explain isochore evolution, namely mutational bias [9], GC-biased gene conversion [10, 11], and a unifying model [12]. Although this debate is important, our study focuses on how base composition affects mammalian gene expression. Such a relationship would provide additional evidence for a functional role of the isochores, supporting the view that they are mainly a product of evolutionary adaptation [2, 4] rather than a simple by-product of neutral evolutionary processes [9–11].

Previous studies have investigated the effects of base composition on gene expression in both human and mouse tissues, making exhaustive use of expression data from sequencing-based (ESTs, SAGE, MPSS) and/or hybridization-based techniques (microarrays, single-arrays, cDNA arrays) [13–21]; despite some quantitative differences, they agree that gene expression levels are positively correlated with GC level. Two recent studies [22, 23], through in silico compositional analysis of expression vectors and DNA carriers, showed that, besides the GC3 level (GC level at the third codon position) of the coding sequences, the genomic compositional context in which a gene is embedded affects its expression. Additionally, the Human Transcriptome Map (HTM), built from SAGE data, revealed domains of highly and weakly expressed genes [24], the "RIDGES" and "anti-RIDGES", respectively. The former were found in gene-dense, GC-rich, and SINE-rich genomic regions, the latter in regions with the opposite properties [15, 25]. These observations reflect the partitioning of vertebrate genes into two types of genomic regions: the gene-rich regions ("genome core"), which correspond to the GC-rich isochores, and the gene-poor regions ("genome desert"), which correspond to the GC-poor isochores [2, 3, 26, 27]. In addition, when a transcriptome map similar to the HTM was established for the mouse genome, its expression patterns were found to be conserved with respect to those of the human genome [28, 29].

Next-generation sequencing (NGS) techniques have revolutionized transcriptome analyses and, compared to previous transcriptome technologies, offer several advantages: a better dynamic range (absence of background noise and signal saturation phenomena, although misaligned reads could be considered background), better quantification of transcript levels and of their isoforms (no upper limit to quantification, detection of lowly expressed transcripts), and identification of previously unknown coding and non-coding RNA species [30–32]. Moreover, NGS has reduced the processing time and cost of sequencing by orders of magnitude, making it an attractive tool in a broad range of research, for both DNA and RNA sequencing and for the detection and analysis of genetic variability [33–36].

In this study, we took advantage of publicly available NGS data from three distinct mouse tissues [37] to investigate the expression patterns across the isochores of the mouse chromosomes and the effects of the isochores' compositional properties on their expression activity. In the second part, we investigated the relations between genes' expression levels and their localization in the five isochore families for the three transcriptomes considered (brain, liver, and muscle).

Results

The results of aligning each tissue's reads to the reference mouse genome and to the coding sequences are shown in Table 1.

Table 1 Aligned Reads

The transcriptome map of the mouse isochores and the effects of their GC level on their expression activity

Additional file 1 shows the isochores' expression profiles for the three tissues along the whole genome and illustrates a rough agreement between expression level and GC level. One clear example is chromosome 10 (Figure 1). This chromosome was chosen because it also includes one of the very few H3 isochores of the mouse genome, 10 Mm62 (GC > 53%, marked with a vertical line in the red box in Figure 1). In the boxed areas in Figure 1, peaks in expression clearly coincide with peaks in GC level, an agreement that can also be seen along most of the chromosome. To quantify this relation, we computed the correlation between the overall expression activity of each isochore and its GC level, and found it to be quite strong (R_brain = 0.72, R_liver = 0.62, and R_muscle = 0.65; see Additional file 2).
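The coefficients above are consistent with a linear correlation computed across isochores, although the text does not name the estimator. The sketch below assumes Pearson's R and uses made-up per-isochore values purely for illustration; `pearson_r` is a hypothetical helper written from the textbook formula rather than a library call.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-isochore values (GC %, E_L); not data from this study.
gc = [36.5, 39.0, 42.1, 45.3, 48.8, 53.6]
expr = [-21.0, -20.4, -19.9, -19.1, -18.7, -18.2]
print(round(pearson_r(gc, expr), 2))
```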

Figure 1

Expression profiles of the isochores for the three tissues on chromosome 10. The Y axis measures the isochores' GC levels (positive values, light blue line) and their respective expression levels (E_L, Equation (1)) for the brain, liver, and muscle tissues (negative values; red, dark blue, and green lines, respectively). High expression corresponds to peaks in the lines. The red and black boxes highlight areas where high GC level is clearly accompanied by high expression. The black vertical line in the red box marks the location of the 10 Mm62 H3 isochore.

It is well known that in vertebrates, including the mouse, GC-richer isochores have higher gene densities than GC-poorer ones (see the Background Section). This is confirmed by the positive linear correlation we found between the gene density of the isochores and their GC level (R = 0.42). Having shown that GC level is positively related both to isochore expression and to gene density, we also examined the direct relation between the gene density and the expression level of individual isochores. We found a positive correlation, with similar coefficients for all tissues (R_brain = 0.57, R_liver = 0.57, and R_muscle = 0.58).

To isolate the effect of GC level on the expression activity of the isochores, it was necessary to remove the effect of gene density. To this end, the per-tissue normalized count of reads aligned within each isochore was further normalized by the gene density of the isochore, and log2 values were calculated (Additional file 3). This approach limited our analysis to isochores containing at least one CDS (1,902 of the 2,319 isochores). As expected, the percentage of isochores containing at least one CDS increased with the GC level of the isochore family (more than 60% of the L1 isochores contain no CDS, against only 6% of the H2 isochores; see Additional file 4). A notable exception to this trend is the H3 family, where the proportion of isochores without any CDS rises again. However, this is due to the fact that in the mouse genome the H3 family consists of just nine isochores, two of which have no CDS.
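The per-family bookkeeping behind Additional file 4 (the percentage of isochores in each family that contain no CDS) can be sketched as follows. The `(family, n_cds)` input format and the helper name are hypothetical; the usage example reproduces only the H3 figures stated above (nine isochores, two without any CDS), with placeholder CDS counts for the other seven.

```python
from collections import Counter


def percent_without_cds(isochores):
    """Return {family: % of isochores in that family with no CDS}.

    `isochores` is an iterable of (family, n_cds) pairs (hypothetical format).
    """
    total, empty = Counter(), Counter()
    for family, n_cds in isochores:
        total[family] += 1
        if n_cds == 0:
            empty[family] += 1
    return {fam: 100.0 * empty[fam] / total[fam] for fam in total}


# H3 as described in the text: nine isochores, two with no CDS.
# The CDS counts of the other seven (1 each) are placeholders.
h3 = [("H3", 0)] * 2 + [("H3", 1)] * 7
print(round(percent_without_cds(h3)["H3"], 1))  # 22.2

# Only isochores with at least one CDS enter the density-normalized analysis.
kept = [iso for iso in h3 if iso[1] > 0]
print(len(kept))  # 7
```

With only nine H3 isochores, two empty ones already amount to about 22%, which is why the H3 value departs from the trend of the larger families.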

We then looked at the correlation between the expression level of the isochores, normalized by their gene density, and their GC levels, and found it to be positive for all tissues (Figure 2).

Figure 2

Effect of GC level alone on the expression activity of each isochore. Correlation between the expression level of each isochore (normalized by gene density, E_D, Equation (2)) and its GC level (red plot for brain, blue plot for liver, and green plot for muscle).

In summary, in this section we first presented the transcriptome map of the mouse isochores and demonstrated an agreement between the isochores' GC levels and their expression levels. Then, after removing gene density effects from the isochores' expression levels, we found a tissue-dependent correlation between the isochores' GC levels and their expression activity.

Isochoric localization of genes and their expression activity

In this section, we first investigated the relation between the isochoric localization of genes and their expression level. Figure 3 shows each tissue's average genic expression level per isochore family. The average genic expression increases with the GC level of the isochore family (statistically significant: p < 0.001, except for two cases with p < 0.01; Cochran test, non-parametric). The only exceptions, found not to be significant (p > 0.05), were the differences in average genic expression between the H2 and H3 families in the liver and muscle, and between the L1 and L2 families in the brain. Additionally, the average genic expression of each isochore family in the brain differs significantly from that of the corresponding family in the muscle and liver (p < 0.001), while between the two latter tissues a significant difference was detected only for the L2 (p < 0.001) and H1 families (p < 0.005). This suggests that expressed genes located in L1, H2, and H3 isochores maintain similar expression activity in the liver and muscle.
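Binning genes by isochore family and averaging their expression, as in Figure 3, can be sketched as follows. The `(family, e_cds)` input format and the function name are hypothetical, the values are toy numbers, and the Cochran test used for significance is not reproduced here.

```python
from collections import defaultdict
from statistics import mean

FAMILIES = ["L1", "L2", "H1", "H2", "H3"]  # increasing GC level


def mean_expression_by_family(genes):
    """Average E_CDS per isochore family for one tissue.

    `genes` yields (family, e_cds) pairs for expressed genes (hypothetical
    format). Families without any gene are left out of the result.
    """
    groups = defaultdict(list)
    for family, e_cds in genes:
        groups[family].append(e_cds)
    return {fam: mean(groups[fam]) for fam in FAMILIES if groups[fam]}


# Toy values, not data from this study.
toy = [("L2", -20.0), ("L2", -22.0), ("H1", -19.0), ("H2", -17.5)]
print(mean_expression_by_family(toy))  # {'L2': -21.0, 'H1': -19.0, 'H2': -17.5}
```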

Figure 3

Average genic expression within each isochore family for the three tissues. Average genic expression levels after the genes have been binned into the five isochore families. Larger negative values (tall coloured bars) indicate low expression, and small negative values (short coloured bars) indicate high expression.

We then looked for differences between the distribution of expressed genes across the isochore families and that of genes that are not expressed. We considered a gene expressed if it had at least 10 aligned reads, to avoid possible noise from misalignments, and non-expressed if it had no aligned reads.
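The two criteria above can be written as a small classifier. The label `"ambiguous"` for CDSs with 1 to 9 reads, the function names, and the per-tissue dictionary format are hypothetical; the text only defines the expressed (at least 10 reads) and non-expressed (0 reads) categories. The `min_reads` parameter also covers the stricter 100-read threshold mentioned later in this section.

```python
def classify(read_count: int, min_reads: int = 10) -> str:
    """Label a CDS in one tissue from its number of aligned reads."""
    if read_count >= min_reads:
        return "expressed"
    if read_count == 0:
        return "non-expressed"
    # 1 .. min_reads - 1 reads: too few to call expressed, but not zero
    return "ambiguous"


def silent_in_all(counts_by_tissue: dict) -> bool:
    """True if the CDS has no aligned reads in any tissue."""
    return all(c == 0 for c in counts_by_tissue.values())


print(classify(10), classify(3), classify(0))  # expressed ambiguous non-expressed
print(classify(50, min_reads=100))             # ambiguous
print(silent_in_all({"brain": 0, "liver": 0, "muscle": 0}))  # True
```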

First, we identified genes with no detectable expression in any of the three tissues covered by the dataset (1,925 CDSs, 10.88% of the total coding sequences), and found a very strong preference for them to be located in the L2 family (over 50% of these genes), with decreasing presence in families of progressively higher GC (black bars in the upper panel of Figure 4). This preference for lower-GC isochores clearly differs from the distribution of all coding sequences across the isochore families (lower panel of Figure 4). It appears to agree with the proposition that low-GC isochores and GC-poor genes may be active during development and subsequently silenced in the adult stage (see the Discussion Section). For the remaining 13,382 CDSs (15,765 CDSs minus the 2,383 CDSs with fewer than 10 aligned reads), we looked into the isochoric distribution of genes not detected as expressed in only one of the three tissues (968 in the brain, 3,589 in the liver, and 2,633 in the muscle). Overall, their distributions were quite similar: centred on the H1 family, and slightly skewed towards L1 for the brain and towards H2 for the liver (upper panel of Figure 4).
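The CDS counts quoted above are mutually consistent, as this short arithmetic check shows. All numbers come from the text (the 17,690 total is given in the Methods); only the variable names are ours.

```python
total_cds = 17_690   # CDSs retained from CCDS (Methods)
silent_all = 1_925   # CDSs with no aligned reads in any tissue
low_count = 2_383    # further CDSs with fewer than 10 aligned reads

share = 100 * silent_all / total_cds
remaining = total_cds - silent_all
analysed = remaining - low_count

print(f"{share:.2f}%")  # 10.88%
print(remaining)        # 15765
print(analysed)         # 13382
```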

Figure 4

Isochoric distributions of the non-detected genes and of the total number of CDSs. Top: distribution (%) across the isochore families of the genes not detected as expressed in any of the three tissues (black bars), and of the genes not detected as expressed in a single tissue only (red bars for brain, blue bars for liver, and green bars for muscle). Bottom: distribution (%) of the total number of coding sequences across the five isochore families (each coloured bar corresponds to an isochore family).

Looking at the distribution of the expressed genes across the isochore families, we found no differences among the three tissues (Additional file 5). The percentage of expressed genes (12,414 CDSs in the brain, 9,793 in the liver, and 10,749 in the muscle) progressively increases from low- to high-GC families and peaks at the H2 family. The sharp drop observed for the H3 family is related to the extreme under-representation of this family in the mouse genome. Repeating the analysis with a higher expression threshold (at least 100 reads per CDS) mostly affects the lower-GC families but does not change the overall trend (data not shown). With either threshold, the distribution differs from that observed for the non-expressed genes.

In this section, we showed that genes located in GC-richer isochores have higher expression levels than genes located in GC-poor isochores. Moreover, we observed that, between liver and muscle, expressed genes located in L1, H2, and H3 isochores appear to maintain similar expression activity, unlike those located in L2 and H1 isochores. We also presented evidence that, in three adult mouse tissues, genes not detected as expressed are preferentially located in GC-poor isochores, while expressed genes are preferentially located in GC-rich isochores.

Discussion

As mentioned in the Background Section, the way base composition affects mammalian gene expression is an issue of prime practical and evolutionary interest and, although it has been a matter of debate, most studies agree that there is a positive correlation. The transcriptome map of the mouse isochores for the three tissues (Additional file 1, Figure 1), the positive correlation between the isochores' GC levels and their expression activity (Figure 2), and the increase in the average expression level of genes with increasing isochore GC level (Figure 3) support the existence of a relationship between expression level and base composition.

The correlation coefficients reported here between the expression activity of the isochores and their GC levels (Figure 2) are slightly higher than those reported in previous studies on mouse [16, 19], where gene expression was correlated with GC3 levels. Moreover, the order in which the expression level of the three tissues is affected by GC level (brain > muscle > liver) agrees with that reported in [16]. Finally, despite the virtual absence of H3 isochores and the small number of L1 isochores in the mouse genome, our coefficients were similar to those reported for human, whose genome contains both L1 and H3 isochores [16, 18–21].

Regarding the GC-poor localization of the genes that are not expressed in any of the three adult mouse tissues considered here, the notion that they may be implicated in developmental processes is supported by several studies. Indeed, two recent studies [38, 39] identified, in the genome deserts of vertebrates, long-range conserved systems composed of highly conserved non-coding elements and their developmental regulatory gene targets. Similarly, although in a different context, it has been shown that during mouse brain development most expression changes occur in GC-poor, LINE-rich regions [40], and that genes expressed in the early developmental stages of the mouse have AT-ending codons, unlike genes expressed in later developmental stages [41]. Genes rich in AT-ending codons are expected to be found typically in GC-poor isochore families [42].

Conclusions

This is the first study in which NGS data have been used to establish the transcriptome map of the mouse isochores for three different tissues and to investigate the effects of base composition on expression activity. Our results are consistent with previous ones and further support a functional role of the isochores in gene expression. We conclude by proposing that similar compositional approaches, using NGS data from carefully designed experiments, may shed more light on the role of genomic (in terms of isochores) and genic compositional properties in gene expression, in the context of specific tissues or biological processes, and reveal valuable information on the regulatory mechanisms involved.

Methods

Data and alignment

To produce the transcriptome map of the isochores, we used publicly available RNA-seq data from three distinct mouse tissues (brain, liver, and muscle), obtained in a recent study by Mortazavi et al. [37] using the standard Solexa pipeline (version 0.2.6). The initial 32-mer reads were subsequently truncated to a length of 25 base pairs. The data come from pooled adult C57BL/6 individuals. We aligned the reads against the reference mouse genome (UCSC release mm9) [43] using REad ALigner (REAL) [44, 45]. REAL is based on a new, relatively simple algorithm for aligning short reads to a reference sequence, and uses a two-bits-per-base encoding of the DNA alphabet for both the reference and the read sequences. We set its arguments to allow up to two mismatches per read with no gaps, and to report the unique alignment with the fewest mismatches. In this mode, REAL splits each read into four fragments and applies the pigeonhole principle [46] to approximate string matching, as a means of quickly filtering out many of the alignments that have more than two mismatches. The remaining candidate alignment locations are then examined, and those with more than two mismatches are discarded. Unlike other current fast aligners such as Bowtie [47] and SOAP2 [48], REAL is not hindered by the very short length of the reads in this dataset. This gapless alignment method will necessarily miss reads that span splice sites; however, these should represent only a small fraction of the total reads. Since the study is aimed at the bigger picture, rather than at the exact quantification of individual mRNAs and alternative splicing variants, this loss of sensitivity will have little impact. In any case, gapped alignment of such short single-end reads has its own perils.
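The fragment-based filter described above can be illustrated with a small, unoptimized sketch. It is not REAL's implementation (there is no two-bit encoding or index, only plain string search); it only shows the pigeonhole argument: with four fragments and at most two mismatches, at least two fragments of any valid alignment must match the reference exactly, so candidates supported by fewer exact fragment hits can be discarded before verification. The unique-best-hit rule mirrors the alignment settings stated above, and all function names are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split4(read: str):
    """Split a read into four near-equal fragments as (offset, fragment)."""
    n = len(read)
    bounds = [round(i * n / 4) for i in range(5)]
    return [(bounds[i], read[bounds[i]:bounds[i + 1]]) for i in range(4)]


def occurrences(text: str, pattern: str):
    """Start positions of all (possibly overlapping) exact matches."""
    pos = text.find(pattern)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + 1)


def align(read: str, ref: str, k: int = 2):
    """Unique best gapless alignment with <= k mismatches, else None.

    Returns (start, mismatches). Filter: k mismatches can fall in at most
    k of the 4 fragments, so a true hit has >= 4 - k exact fragment matches.
    """
    if not 0 <= k < 4 or len(read) < 4:
        raise ValueError("need 0 <= k < 4 and a read of length >= 4")
    support = Counter()
    for offset, frag in split4(read):
        for pos in occurrences(ref, frag):
            start = pos - offset
            if 0 <= start <= len(ref) - len(read):
                support[start] += 1
    hits = []
    for start, n_exact in support.items():
        if n_exact >= 4 - k:  # pigeonhole filter
            mm = hamming(read, ref[start:start + len(read)])
            if mm <= k:  # verification of surviving candidates
                hits.append((mm, start))
    if not hits:
        return None
    best = min(mm for mm, _ in hits)
    starts = [s for mm, s in hits if mm == best]
    return (starts[0], best) if len(starts) == 1 else None


read = "GATTACAGGCTTAACGTCCGATAGC"  # 25 bp, like the reads used here
ref = "T" * 10 + read + "G" * 10
print(align(read, ref))              # (10, 0)
```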

Expression level of isochores

To investigate the expression levels of the mouse isochores, the aligned reads were assigned to the isochores containing their mapped locations. The locations and GC spans of the isochores were taken from [5]. To eliminate the effects of the different number of reads aligned for each tissue and of the different length of each isochore, the number of reads aligned to each isochore was normalized by the total count of aligned reads for the respective tissue and by the length of the isochore. A scaling factor can be applied at this stage to lift the normalized values, after which the log2 of each normalized read count was calculated as a representation of the expression level. This is expressed by Equation (1), where E_L is the expression level normalized over the length L of the isochore, R_i the read count of the isochore, R_t the read count of the tissue, and f the scaling factor.

$$E_L = \log_2\left(\frac{R_i}{R_t \times L} \times f\right) \qquad (1)$$

Because the normalized counts are very small, the logarithm produces negative values; higher expression, however, still corresponds to peaks. Details on the isochores' coordinates, GC levels, aligned reads, and expression levels for each of the three tissues can be found in Additional file 6.
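The two steps above, assigning reads to isochores and applying Equation (1), can be sketched as follows. The sketch assumes non-overlapping half-open [start, end) isochore coordinates on a single chromosome and places each read by its leftmost mapped position; the article specifies neither convention, and the function names are hypothetical. The lookup uses binary search over the sorted isochore start positions.

```python
import bisect
import math


def count_reads(read_positions, isochores):
    """Count reads per isochore.

    isochores: (start, end, name) tuples on one chromosome, sorted by start,
    non-overlapping, half-open [start, end). Reads in gaps are ignored.
    """
    starts = [s for s, _, _ in isochores]
    counts = {name: 0 for _, _, name in isochores}
    for pos in read_positions:
        i = bisect.bisect_right(starts, pos) - 1  # last isochore starting <= pos
        if i >= 0 and pos < isochores[i][1]:
            counts[isochores[i][2]] += 1
    return counts


def expression_level(r_i, r_t, length, f=1.0):
    """Equation (1): E_L = log2(R_i / (R_t * L) * f); needs R_i > 0."""
    return math.log2(r_i / (r_t * length) * f)


isochores = [(0, 100, "a"), (100, 250, "b"), (300, 400, "c")]
counts = count_reads([5, 99, 100, 260, 399, 400], isochores)
print(counts)                                    # {'a': 2, 'b': 1, 'c': 1}
print(expression_level(counts["a"], 1000, 100))  # negative: small normalized count
```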

As reported in the Results Section, the expression levels were also normalized by the respective gene densities to account for the higher concentration of genes in isochores with higher GC levels. Denoting by D the gene density of the isochore and by E_D the isochoric expression normalized over the gene density, Equation (1) becomes Equation (2).

$$E_D = \log_2\left(\frac{R_i}{R_t \times D} \times f\right) \qquad (2)$$
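Equation (2) differs from Equation (1) only in the normalizing term. The sketch below assumes that gene density is measured as CDSs per megabase, which the text does not state, and makes explicit why isochores without any CDS (D = 0) had to be excluded. The function names are hypothetical.

```python
import math


def gene_density(n_cds: int, length_bp: int) -> float:
    """Assumed definition: number of CDSs per megabase of isochore."""
    return n_cds / (length_bp / 1_000_000)


def density_normalized_level(r_i, r_t, density, f=1.0):
    """Equation (2): E_D = log2(R_i / (R_t * D) * f).

    Undefined when D = 0 (isochore without CDSs) or R_i = 0 (no reads).
    """
    if density <= 0 or r_i <= 0:
        raise ValueError("E_D requires R_i > 0 and D > 0")
    return math.log2(r_i / (r_t * density) * f)


d = gene_density(4, 2_000_000)                      # 2.0 CDSs per Mb
print(density_normalized_level(50, 100, d, f=8.0))  # 1.0
```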

Expression level of genes

To investigate expression at the gene level, the mouse coding sequences were retrieved from the Consensus Coding Sequence Database (CCDS) [49]. Of the 17,704 CDSs, 14 were found to lack a start codon and were eliminated. The remaining 17,690 CDSs were assigned to isochores based on the coordinates of their exons, as given in the CCDS database.
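The filtering and assignment steps might be sketched as follows. Two points are assumptions: that "lacking a start codon" means the CDS does not begin with ATG, and that a CDS spanning an isochore boundary goes to the isochore sharing the most exonic bases (the text only says that exon coordinates were used). Coordinates are treated as half-open intervals, and the helper names and toy records are hypothetical.

```python
def has_start_codon(cds_seq: str) -> bool:
    """True if the CDS begins with ATG (assumed meaning of 'start codon')."""
    return cds_seq[:3].upper() == "ATG"


def assign_cds(exons, isochores):
    """Name of the isochore sharing the most exonic bases with a CDS.

    exons: (start, end) half-open intervals; isochores: (start, end, name).
    Returns None if no exon overlaps any isochore.
    """
    best_name, best_overlap = None, 0
    for iso_start, iso_end, name in isochores:
        overlap = sum(
            max(0, min(ex_end, iso_end) - max(ex_start, iso_start))
            for ex_start, ex_end in exons
        )
        if overlap > best_overlap:  # ties keep the first isochore listed
            best_name, best_overlap = name, overlap
    return best_name


cds_records = {"cds1": "ATGGCC", "cds2": "GTGGCC"}  # toy sequences
kept = {k: s for k, s in cds_records.items() if has_start_codon(s)}
print(sorted(kept))                                 # ['cds1']
isochores = [(0, 100, "a"), (100, 250, "b")]
print(assign_cds([(90, 110), (120, 130)], isochores))  # b
```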

As for the expression levels of isochores, the expression level of a CDS (E_CDS) was computed with Equation (3), where R_CDS is the count of reads aligned to the exons of the CDS, R'_t the total number of reads aligned to coding sequences for the tissue, and ℓ the length of the CDS.

$$E_{CDS} = \log_2\left(\frac{R_{CDS}}{R'_t \times \ell} \times f\right) \qquad (3)$$

Details on the expression levels of the CDSs, for each of the three tissues, can be found in Additional file 7.
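Equation (3) has the same form as the RPKM measure (reads per kilobase of exon per million mapped reads) introduced in [37], RPKM = 10^9 × R / (N × ℓ). Assuming, purely for illustration, f = 10^9 and N = R'_t, E_CDS is simply the log2 of an RPKM computed over CDS-aligned reads; the article does not state which value of f was used. The function names and the toy counts are hypothetical.

```python
import math


def cds_expression(exon_reads, total_cds_reads, cds_length, f=1.0):
    """Equation (3): E_CDS = log2(R_CDS / (R'_t * l) * f).

    exon_reads: reads counted in each exon of the CDS (summed to R_CDS);
    total_cds_reads: R'_t; cds_length: l in bases. None if R_CDS == 0.
    """
    r_cds = sum(exon_reads)
    if r_cds == 0:
        return None
    return math.log2(r_cds / (total_cds_reads * cds_length) * f)


def rpkm(reads, total_reads, length_bp):
    """Reads per kilobase of exon per million mapped reads."""
    return reads / (length_bp / 1_000) / (total_reads / 1_000_000)


e = cds_expression([30, 20], 2_000_000, 1_500, f=1e9)
print(e, math.log2(rpkm(50, 2_000_000, 1_500)))  # equal up to rounding
```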

References

  1. Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G, Meunier-Rotival M, Rodier F: The mosaic genome of warm-blooded vertebrates. Science. 1985, 228: 953-958. 10.1126/science.4001930.

  2. Bernardi G: Structural and Evolutionary Genomics: Natural Selection in Genome Evolution. 2005, Elsevier Science Publishers Ltd

  3. Costantini M, Clay O, Auletta F, Bernardi G: Isochore Map of Human Chromosomes. Genome Research. 2006, 16: 536-541. 10.1101/gr.4910606.

  4. Bernardi G: The neoselectionist Theory of Genome Evolution. PNAS. 2007, 104 (20): 8385-8390. 10.1073/pnas.0701652104.

  5. Costantini M, Cammarano R, Bernardi G: The evolution of isochore patterns in vertebrate genomes. BMC Genomics. 2009, 10: 146.

  6. Wu C, Li W: Evidence for higher rates of nucleotide substitution in rodents than in man. PNAS. 1985, 82: 1741-1745. 10.1073/pnas.82.6.1741.

  7. Gu X, Li W: Higher rates of amino acids substitution in rodents than in human. Mol Phylogenet Evol. 1992, 1: 211-214. 10.1016/1055-7903(92)90017-B.

  8. Holliday R: Understanding Ageing. 1995, Cambridge University Press, Cambridge, U.K

  9. Eyre-Walker A, Hurst LD: The evolution of isochores. Nature Reviews Genetics. 2001, 2 (7): 549-555. 10.1038/35080577.

  10. Galtier N, Piganeau G, Mouchiroud D, Duret L: GC-Content Evolution in Mammalian Genomes: The Biased Gene Conversion Hypothesis. Genetics. 2001, 159 (2): 907-911.

  11. Duret L, Galtier N: Biased Gene Conversion and the Evolution of Mammalian Genomic Landscapes. Annual Review of Genomics and Human Genetics. 2009, 10: 285-311. 10.1146/annurev-genom-082908-150001.

  12. Chojnowski J, Franklin J, Katsu Y, et al: Patterns of Vertebrate Isochore Evolution Revealed by Comparison of Expressed Mammalian, Avian, and Crocodilian Genes. Journal of Molecular Evolution. 2007, 65 (3): 259-266. 10.1007/s00239-007-9003-2.

  13. Duret L: Evolution of synonymous codon usage in metazoans. Current Opinion in Genetics & Development. 2002, 12 (6): 640-649. 10.1016/S0959-437X(02)00353-2.

  14. Konu O, Li M: Correlations between mRNA expression levels and GC contents of coding and untranslated regions of genes in rodents. Journal of Molecular Evolution. 2002, 54: 35-41. 10.1007/s00239-001-0015-z.

  15. Versteeg R, van Schaik B, van Batenburg M, et al: The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Research. 2003, 13 (9): 1998-2004. 10.1101/gr.1649303.

  16. Vinogradov A: Isochores and tissue specificity. Nucleic Acids Research. 2003, 31 (17): 5212-5220. 10.1093/nar/gkg699.

  17. Arhondakis S, Auletta F, Torelli G, D'Onofrio G: Base composition and expression level of human genes. Gene. 2004, 325: 165-169.

  18. Comeron J: Selective and Mutational Patterns Associated With Gene Expression in Humans: Influences on Synonymous Composition and Intron Presence. Genetics. 2004, 167 (3): 1293-1304. 10.1534/genetics.104.026351.

  19. Semon M, Mouchiroud D, Duret L: Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance. Human Molecular Genetics. 2005, 14 (3): 421-427.

  20. Vinogradov A: Dualism of gene GC content and CpG pattern in regard to expression in the human genome: Magnitude versus breadth. Trends in Genetics. 2005, 21 (12): 639-643. 10.1016/j.tig.2005.09.002.

  21. Arhondakis S, Clay O, Bernardi G: Compositional properties of human cDNA libraries: Practical implications. FEBS Letters. 2006, 580 (24): 5772-5778. 10.1016/j.febslet.2006.09.034.

  22. Arhondakis S, Clay O, Bernardi G: GC level and expression of human coding sequences. Biochemical and Biophysical Research Communications. 2008, 367 (3): 542-545. 10.1016/j.bbrc.2007.12.155.

  23. Mahmud A, Amore G, Bernardi G: Compositional Genome Contexts Affect Gene Expression Control in Sea Urchin Embryo. PLoS ONE. 2008, 3 (12): e4025-10.1371/journal.pone.0004025.

  24. Caron H, van Schaik B, van der Mee M, et al: The Human Transcriptome Map: Clustering of Highly Expressed Genes in Chromosomal Domains. Science. 2001, 291 (5507): 1289-1292. 10.1126/science.1056794.

  25. Lercher M, Urrutia A, Pavlicek A, Hurst L: A unification of mosaic structures in the human genome. Human Molecular Genetics. 2003, 12 (19): 2411-2415. 10.1093/hmg/ddg251.

  26. Mouchiroud D, D'Onofrio G, Aissani B, et al: The distribution of genes in the human genome. Gene. 1991, 100: 181-187.

  27. Zoubak S, Clay O, Bernardi G: The gene distribution of the human genome. Gene. 1996, 174: 95-102. 10.1016/0378-1119(96)00393-9.

  28. Mijalski T, Harder A, Halder T, et al: Identification of coexpressed gene clusters in a comparative analysis. PNAS. 2005, 102 (24): 8621-8626. 10.1073/pnas.0407672102.

  29. Singer G, Lloyd A, Huminiecki L, Wolfe K: Clusters of Co-expressed Genes in Mammalian Genomes Are Conserved by Natural Selection. Molecular Biology and Evolution. 2005, 22 (3): 767-775.

  30. Wang Z, Gerstein M, Snyder M: RNA-seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics. 2009, 10: 57-63. 10.1038/nrg2484.

  31. Metzker M: Sequencing technologies -- the next generation. Nature Reviews Genetics. 2010, 11: 31-46. 10.1038/nrg2626.

  32. Ozsolak F, Milos P: RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics. 2011, 12 (2): 87-98. 10.1038/nrg2934.

  33. Dalca A, Brudno M: Genome variation discovery with high-throughput sequencing data. Briefings in Bioinformatics. 2010, 11: bbp058-14.

  34. Ng S, Buckingham K, Lee C, et al: Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics. 2010, 42: 30-35. 10.1038/ng.499.

  35. Wu T, Nacu S: Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 2010, 26 (7): 873-881. 10.1093/bioinformatics/btq057.

  36. Xiang H, Zhu J, Chen Q, et al: Single-base resolution methylome of the silkworm reveals a sparse epigenomic map. Nature Biotechnology. 2010, 28 (5): 516-520. 10.1038/nbt.1626.

  37. Mortazavi A, Williams B, McCue K, et al: Mapping and quantifying mammalian transcriptomes by RNA-seq. Nature Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.

  38. Kikuta H, Laplante M, Navratilova P, et al: Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. Genome Research. 2007, 17 (5): 545-555. 10.1101/gr.6086307.

  39. Navratilova P, Becker T: Genomic regulatory blocks in vertebrates and implications in human disease. Briefings in Functional Genomics & Proteomics. 2009, 8 (4): 333-342. 10.1093/bfgp/elp019.

  40. Hiratani I, Leskovar A, Gilbert D: Differentiation--induced replication-timing changes are restricted to AT--rich/long interspersed nuclear element (LINE)--rich isochores. Proceedings of the National Academy of Sciences of the United States of America. 2004, 101 (48): 16861-16866. 10.1073/pnas.0406687101.

  41. Ren L, Gao G, Zhao D: Developmental stage related patterns of codon usage and genomic GC content: Searching for evolutionary fingerprints with models of stem cell differentiation. Genome Biology. 2007, 8 (3):

  42. Clay O, Bernardi G: GC3 of Genes Can Be Used as a Proxy for Isochore Base Composition: A Reply to Elhaik et al. Molecular Biology and Evolution. 2011, 28: 21-23. 10.1093/molbev/msq222.

  43. UCSC Genome Browser: 2011, [http://genome.ucsc.edu]

  44. Frousios K, Iliopoulos CS, Mouchard L, et al: REAL: an efficient REad ALigner for next generation sequencing reads. Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology (BCB '10). 2010, New York, NY, USA: ACM, 154-159.

  45. REad ALigner (REAL): 2011, [http://www.inf.kcl.ac.uk/pg/real/]

  46. Navarro G, Raffinot M: Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences. 2002, Cambridge University Press

  47. Langmead B, Trapnell C, Pop M, Salzberg S: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009, 10 (3): R25.

  48. Li R, Li Y, Kristiansen K, Wang J: SOAP: short oligonucleotide alignment program. Bioinformatics. 2008, 24 (5): 713-714. 10.1093/bioinformatics/btn025.

  49. National Center for Biotechnology Information (NCBI): 2011, [ftp://ftp.ncbi.nlm.nih.gov]

Acknowledgements

We thank Prof. Giorgio Bernardi and Oliver Clay for reading the manuscript and giving valuable comments. SA and SK are supported by institutional funds. KF is funded by the Greek State Scholarships Foundation. This work is also partially supported by the SeqAhead COST action.

Author information

Corresponding authors

Correspondence to Stilianos Arhondakis or Sophia Kossida.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

SA and SK designed the study. KF, CSI, SPP, and GT processed the data, and did the computational work. SA and KF did the analysis. SA, KF, and SPP wrote the manuscript with the contribution of all authors. The final version of the manuscript is approved by all authors.

Electronic supplementary material

12864_2011_3666_MOESM1_ESM.PDF

Additional file 1: Transcriptome profiles of the mouse isochores along the chromosomes. The Y axis shows the isochores' GC levels (positive values, light blue line) and their respective expression levels (E_L, Equation (1)) for the brain, liver, and muscle tissues (negative values; red, dark blue, and green lines, respectively). Peaks in the lines correspond to high expression. (PDF 395 KB)

12864_2011_3666_MOESM2_ESM.TIFF

Additional file 2: Correlations between GC level and expression activity of the isochores. Correlation between the isochoric expression level (normalized over isochore length; E_L, Equation (1)) and the isochores' GC level. The red plot is for brain, the blue plot for liver, and the green plot for muscle tissue. (TIFF 190 KB)

12864_2011_3666_MOESM3_ESM.XLS

Additional file 3: Isochoric expression levels for each tissue normalized over gene density. This table reports the name of each isochore, its GC level (GC, %), its length (Length, Mb), the number of genes (CDS-count), the gene density (GeneDensity, the number of genes within an isochore divided by its length), the count of aligned reads for each tissue (Brain Count, Liver Count, and Muscle Count), the ratio between each tissue's count of aligned reads within the isochore and that tissue's total number of reads (#Br/TotBr, #Liv/TotLiv, and #Mus/TotMusc), and finally the isochoric expression level normalized over gene density (LogBr(GeneDens), LogLiv(GeneDens), and LogMusc(GeneDens)). (XLS 498 KB)
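The gene density and read-fraction columns described for Additional file 3 are simple ratios. The minimal Python sketch below illustrates them under stated assumptions: the `Isochore` record and its field names are hypothetical stand-ins for the table columns, and the tissue's total read count is passed in explicitly because the caption does not say whether it includes unaligned reads. The log-scaled normalized levels (e.g. LogBr(GeneDens)) are deliberately not reproduced, since their exact formula is not given in this section.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Isochore:
    """One row of a per-isochore table (hypothetical field names)."""
    name: str
    length_mb: float  # Length, Mb
    cds_count: int    # CDS-count
    reads: int        # aligned reads for one tissue, e.g. Brain Count


def gene_density(iso: Isochore) -> float:
    """GeneDensity: number of genes in the isochore divided by its length (genes per Mb)."""
    return iso.cds_count / iso.length_mb


def read_fractions(isochores: list[Isochore], total_reads: int) -> dict[str, float]:
    """Fraction of one tissue's reads aligned to each isochore (e.g. #Br/TotBr).

    total_reads is supplied by the caller (assumption: the captions do not
    state whether unaligned reads are counted in the tissue total).
    """
    return {iso.name: iso.reads / total_reads for iso in isochores}
```

Multiplying a fraction by 100 gives the percentage form reported in Additional file 6.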

12864_2011_3666_MOESM4_ESM.TIFF

Additional file 4: Distribution of the coding sequences across the five isochore families. Within each isochore family, the percentage of isochores containing at least one gene (grey bars) and of isochores containing no genes (light grey bars). (TIFF 112 KB)

12864_2011_3666_MOESM5_ESM.TIFF

Additional file 5: Distribution of the expressed CDSs in the isochore families. For each tissue, the percentage of expressed genes within each isochore family (histogram, upper panel) and the corresponding counts (table, lower panel), using an expression threshold of ≥10 aligned reads per gene. In the histogram, red bars indicate genes expressed in brain, blue bars genes expressed in liver, and green bars genes expressed in muscle. (TIFF 159 KB)
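The expression call behind Additional file 5 (a gene counts as expressed in a tissue when it has at least 10 aligned reads) can be sketched as follows. This is an illustrative Python sketch, not the authors' pipeline: it assumes each gene is given as a hypothetical (isochore family, aligned read count) pair for a single tissue, and it assumes the plotted percentage is taken over all genes in each family, which the caption leaves open.

```python
from collections import Counter

# Expression threshold from the caption: >= 10 aligned reads per gene.
EXPRESSION_THRESHOLD = 10


def expressed_by_family(genes, threshold=EXPRESSION_THRESHOLD):
    """Count expressed genes per isochore family for one tissue.

    genes: iterable of (family, read_count) pairs, e.g. ("H1", 42).
    Returns {family: (expressed_count, percent_of_genes_in_family)}.
    """
    total = Counter()
    expressed = Counter()
    for family, reads in genes:
        total[family] += 1
        if reads >= threshold:
            expressed[family] += 1
    # Counter returns 0 for families with no expressed genes.
    return {
        fam: (expressed[fam], 100.0 * expressed[fam] / total[fam])
        for fam in total
    }
```

Running the function once per tissue (brain, liver, muscle) yields the three series compared in the histogram and table.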

12864_2011_3666_MOESM6_ESM.XLS

Additional file 6: Isochoric expression levels for each tissue normalized over length. This table reports the name of each isochore, its GC level (GC, %), its length (Length, Mb), the number of genes (CDS-count), the gene density (GeneDensity, the number of genes within an isochore divided by its length), the count of aligned reads within each isochore for each tissue (Brain Count, Liver Count, and Muscle Count), the ratio (%) between each tissue's count of aligned reads within the isochore and that tissue's total number of reads (#Br/TotBr, #Liv/TotLiv, and #Mus/TotMusc), and finally the global isochoric expression level normalized over isochore length (LogBr(Length), LogLiv(Length), and LogMusc(Length)). (XLS 519 KB)

12864_2011_3666_MOESM7_ESM.XLS

Additional file 7: Genic expression levels for each tissue. This table reports the isochoric localization of each coding sequence. The first column gives the chromosome and the second the isochore in which the gene is embedded, followed by the isochore's GC level and genomic coordinates (Start (Mb) and End (Mb)). These are followed by the ID of each coding sequence, its genomic coordinates (cds_from and cds_to), its GC level (GC_ccds), its GC3 (GC3_ccds), its length (Length_ccds), and the count of aligned reads for each tissue (brain, liver, and muscle) within the coding sequence. The last three columns report the genic expression level for each tissue (LogBr(genic), LogLiv(genic), and LogMusc(genic)). (XLS 4 MB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Arhondakis, S., Frousios, K., Iliopoulos, C.S. et al. Transcriptome map of mouse isochores. BMC Genomics 12, 511 (2011). https://doi.org/10.1186/1471-2164-12-511

  • DOI: https://doi.org/10.1186/1471-2164-12-511

Keywords