Global transcriptome analysis of the C57BL/6J mouse testis by SAGE: evidence for nonrandom gene order
© Divina et al; licensee BioMed Central Ltd. 2005
Received: 06 December 2004
Accepted: 05 March 2005
Published: 05 March 2005
We generated the gene expression profile of the total testis from adult C57BL/6J male mice using serial analysis of gene expression (SAGE). Two high-quality SAGE libraries containing a total of 76 854 tags were constructed. An extensive bioinformatic analysis and comparison of SAGE transcriptomes of the total testis, testicular somatic cells and other mouse tissues was performed, and the theory of male-biased gene accumulation on the X chromosome was tested.
We sorted out 829 genes predominantly expressed from the germinal part and 944 genes from the somatic part of the testis. The genes preferentially and specifically expressed in total testis and testicular somatic cells were identified by comparing the testis SAGE transcriptomes to the available transcriptomes of seven non-testis tissues. We uncovered chromosomal clusters of adjacent genes with preferential expression in total testis and testicular somatic cells by a genome-wide search and found that the clusters encompassed a significantly higher number of genes than expected by chance. We observed a significant 3.2-fold enrichment of the proportion of X-linked genes specific for testicular somatic cells, while the proportions of X-linked genes specific for total testis and for other tissues were comparable. In contrast to the tissue-specific genes, an under-representation of X-linked genes in the total testis transcriptome but not in the transcriptomes of testicular somatic cells and other tissues was detected.
Our results provide new evidence in favor of the theory of male-biased gene accumulation on the X chromosome in testicular somatic cells and indicate an opposing effect of meiotic X-inactivation in testicular germ cells.
From the selfish DNA perspective [1, 2], gonads are fundamentally important organs of an organism. During the first meiotic division of gametogenesis, crossing-over enhances the re-assortment of information carried in parental DNA molecules, and virtually immortal genetic information is then transferred to the next generations of mortal individuals via the final products of gametogenesis, spermatozoa and eggs. Moreover, testes and ovaries are the only niches where the paternal and maternal DNA interacts with a different environment. The dissimilar gonadal environment enables sex-dependent epigenetic modifications of paternal and maternal DNA such as reactivation of the X chromosome in female germ cells [3, 4], inactivation of a single X chromosome in pachytene spermatocytes [5–7] or differential establishment of imprinting marks on paternally or maternally imprinted genes [8, 9]. Spermatogenesis also serves as an important checkpoint filtering out many de novo occurring gene mutations [10, 11] and chromosomal rearrangements [12, 13] by making their carriers sterile. A special form of meiotic checkpoint is represented by hybrid sterility, which facilitates the creation of new species. Obeying Haldane's rule, hybrid sterility preferentially affects gametogenesis in the testis in species with heterogametic (XY) sex [13–15]. Molecular analyses of these phenomena are hindered by the fact that the testis is a complex organ with many types of intimately intermingled somatic and germline cells. Moreover, spermatogenic differentiation is almost impossible to achieve ex vivo in a cell culture system. The main cell types can be fractionated by gravity sedimentation, centrifugal elutriation or fluorescence-activated cell sorting, but these procedures can take long enough that artificial changes in mRNA levels cannot be excluded.
In the present work we used Serial Analysis of Gene Expression (SAGE) to characterize the transcriptome of mouse total testis. We created a catalogue of genes expressed in the adult mouse testis of the C57BL/6J (abbreviated here B6) inbred strain. The B6 inbred strain was chosen because its genome has recently been sequenced and because it has been selected as the recipient strain for the creation of two sets of Chromosome Substitution Strains, C57BL/6J-Chr#A/J and C57BL/6J-Chr#PWD/Ph [Gregorova S, Forejt J et al., in progress]. In addition to characterizing the total testis transcriptome, we compared our data with the publicly available SAGE library from adult testis somatic cells and with other SAGE libraries constructed from normal mouse tissues. Furthermore, we were interested in the organization of testicular genes in the mouse genome, and we present here a detailed bioinformatic analysis of the distribution of testicular genes between the X chromosome and autosomes and of the positional clustering of genes with preferential expression in testis.
Characterization of the SAGE libraries of B6 mouse testis
[Table: Parameters of constructed SAGE libraries from B6 mouse total testis. Columns: Total testis 1 (TT 1); Total testis 2 (TT 2). Surviving row labels: single copy tags; average tags per clone; linker derived tags. Surviving values, in source order: 157 (1.2 %), 276 (1.0 %); 147 (0.6 %), 223 (0.4 %).]
Tag-to-gene identification in the B6 testis transcriptome
[Table: Identification of tags in the combined total testis SAGE library (TT 1+2). The tags matching the mitochondrial genome were omitted in this summary. Tags in the group "Unreliable matches" (*) are considered not reliable according to the Mouse SAGE Site and RNA evidence mappings, because they are not supported by the required number of mRNA and EST sequences. These tags are, however, included in the reliable single/multiple match groups in the SAGEmap reliable mapping, which results in a highly increased number of reliable multiple matches and a slightly increased number of reliable single matches. Columns: NCBI SAGEmap reliable mapping; Mouse SAGE Site reliable mapping; RNA evidence mapping. Surviving row labels: reliable single match; reliable multiple matches; total tags (tag count > 1).]
Functional categories of genes expressed in total testis
Comparing the transcriptomes of total testis and adult testis somatic cells
[Table: Parameters of the SAGE libraries constructed from total testis and somatic cells of adult testis. Surviving column label: adult testis somatic cells. Surviving row labels: unique tags with count > 1; proportions of unique tags with count > 1 (% of total tags, % of unique tags).]
Genes with predominant expression in the germinal or somatic component of testis
To sort out subsets of genes with predominant expression in either germinal or somatic cells of testis, we applied tentative criteria that account for the presence of somatic cells in TT 1+2 and for residual germ cells in ATSC. A gene was considered predominantly expressed if the corresponding tag was significantly more frequent in one of the libraries (p < 0.05, Monte Carlo simulations) and exhibited at least fivefold enrichment of tag counts (fold factor < -5 or > 5). According to these criteria, 829 genes are expressed predominantly in germ cells and 944 genes are expressed mainly in the somatic part of the testis (see Additional file 3). Moreover, we identified 12 tags corresponding to 8 genes encoded in the mitochondrial genome (1 gene with increased tag counts in TT 1+2 and 6 genes with increased tag counts in ATSC). A gene coding for cytochrome c oxidase III (mt-Co3) displayed two tags separated by 87 bp in the mt-Co3 mRNA. One isoform was predominantly present in the ATSC library and the other was observed exclusively in the TT 1+2 library. Substantial over-expression of mitochondrial cytochrome c oxidase complexes I, II, III and NADH dehydrogenase 3 and 4 was noted in testicular somatic cells (see Additional file 3).
Exploring the dissimilarity of testis transcriptomes and transcriptomes of other mouse tissues
Nonrandom representation of testis-expressed genes on the X chromosome
Previous work has shown a significant enrichment of prostate- and spermatogonia-specific genes on the X chromosome when compared to autosomes [29, 30]. We asked what proportion of testis-expressed genes maps to the X chromosome and compared it with the proportion of X-linked genes expressed in somatic (non-testis) tissues. Furthermore, we examined whether the proportion of testis-specific genes on the X chromosome differs from the proportion of X-linked tissue-specific genes in somatic tissues.
Out of the 14 222 genes expressed in SAGE libraries from total testis, adult testis somatic cells and 7 somatic tissues (brain, eye, heart, liver, kidney, limbs and adipose tissue) (see Additional file 4), we considered only genes identified by a corresponding tag count > 1. The proportion of genes expressed from the X chromosome in the pool of 7 somatic tissues was 3.1 % (374 of 11 903 genes). Although the proportions of X-linked genes in somatic tissues were uneven, there were no significant differences among the tissues (3.2 % in brain, 2.7 % in limbs and eye, 2.6 % in liver, 2.5 % in kidney and adipose tissue, 2.4 % in heart; p > 0.05, Chi-square test for brain vs. heart). In testicular somatic cells, we observed 3.2 % X-linked genes (133 of 4 216 genes), while in total testis only 1.4 % of genes (48 of 3 338 genes) were expressed from the X chromosome (p < 10^-6, Chi-square test). We conclude that expressed X-linked genes are under-represented in the transcriptome of total testis.
[Table: Distribution of testis-specific genes on autosomes and the X chromosome. A total of 14 222 LocusLink genes were identified in the total testis, adult testis somatic cells and non-testis tissue SAGE libraries (see Additional file 4) using RNA evidence mapping (tags matching multiple LocusLink genes were discarded). Genes identified by a total tag count = 1 were then excluded from the analysis. Genes expressed in only one tissue type (total testis, adult testis somatic cells, brain, eye, heart, liver, kidney, limbs and adipose tissue) were considered tissue-specific. The chromosomal distribution of genes specific for total testis (a) and testis somatic cells (b) was evaluated in comparison to the non-testis tissue-specific genes. Significance was tested by permutations (100 000 random shufflings of the chromosomes while keeping the sum of genes on autosomes and the X chromosome fixed) and confirmed by Fisher's exact test. Abbreviations: total t. = total testis; t. somatic = testicular somatic cells; other = non-testis tissues; ChrA = autosomes; ChrX = X chromosome.
a) Total testis: 395 genes specific for the combined total testis SAGE library (TT 1+2). Other tissues: 877 genes specific for one tissue type in the pool of other SAGE libraries. Row labels: observed gene counts; gene counts in randomized genome; % observed gene counts; ratio of observed proportions; permutations yielding ≤ observed gene counts in total t. on ChrX; permutations, p-value (two tailed); Fisher's exact, p-value (two tailed); confidence interval (0.95): 0.70 – 2.68.
b) Testis somatic cells: 81 genes specific for the adult testis somatic cells SAGE library (ATSC). Other tissues: 924 genes specific for one tissue type in the pool of other SAGE libraries. Row labels: observed gene counts; gene counts in randomized genome; % observed gene counts; ratio of observed proportions; permutations yielding ≥ observed gene counts in t. somatic on ChrX; permutations, p-value (two tailed); Fisher's exact, p-value (two tailed); confidence interval (0.95): 0.13 – 0.64.]
Chromosomal clustering of genes with preferential expression in testis
Based on the data from testis and other publicly available SAGE libraries (see Additional file 4), we identified genes with preferential expression in testis using the Preferential Expression Measure (PEM). The PEM score controls for genes that are highly expressed in many tissues (housekeeping genes) and reports positive values for over-expressed genes and negative values for under-expressed genes in a given tissue. A large positive PEM score for a gene in a particular tissue indicates that the gene is unusually highly expressed in that tissue relative to its expression in other tissues. We considered a gene to be preferentially expressed if its PEM score reached at least 50 % of the maximum PEM value encountered in that tissue. Using this criterion, we scored the expression of genes in total testis or testicular somatic cells in conjunction with their expression in 7 other tissues (brain, eye, heart, liver, kidney, limbs and adipose tissue).
[Table: Number of preferentially expressed genes in testis located in clusters within the tandem duplicate-free mouse genome. Out of the 19 684 known genes (LocusLink) mapped on the mouse genome assembly (NCBI build 32), 16 858 genes remained in the tandem duplicate-free genome, including 1 300 and 1 050 preferentially expressed genes in total testis and testicular somatic cells, respectively. The chromosome search found clusters containing at least three adjacent preferentially expressed genes (tight clusters) or at least three preferentially expressed genes among six adjacent genes (loose clusters). The tight clusters therefore form a subset of the loose clusters. Observed gene counts were evaluated using permutations (100 000 random shufflings of the expression status of genes while keeping the gene positions constant), and the average number of genes located in clusters in the randomized genomes was computed.
Columns: total testis (TT 1+2), in tight clusters and in loose clusters; adult testis somatic cells (ATSC), in tight clusters and in loose clusters.
Row labels: observed gene counts; proportion of preferentially expressed genes; gene counts in randomized genomes (mean ± std. dev.): 21.9 ± 8.1, 168.4 ± 20.1, 11.7 ± 5.9, 94.2 ± 15.6; ratio observed/mean in randomized genomes; permutations yielding ≥ observed gene counts; p-value (one tailed).]
Comparing the B6 and BDF1 total testis transcriptomes
In a recent study focused on senescence changes in testis, a modified SAGE method was used to generate digital gene expression profiles of total testis from 3- and 29-month-old mice of the BDF1 strain and from 14-month-old mice of the SAMP1 strain, which exhibits accelerated senescence. Because of the different anchoring enzyme (Rsa I) used in construction of the libraries and the limited availability of data from the BDF1 testis transcriptome, we could perform only a rough manual comparison of our B6 testis transcriptome (76 854 tags) and the combined BDF1 testis transcriptome from 3- and 29-month-old BDF1 mice (41 221 tags). We focused on the most highly expressed testicular genes in GNF Mouse Atlas v2 [33, 34] that were detected by Affymetrix GeneChips. A set of 35 highly expressed genes in testis (average difference > 9 000) was tabulated together with SAGE tag counts from B6 and BDF1 testis (see Additional file 9). In the B6 total testis, we detected 33 out of 35 genes (the Serf1 gene could not be distinguished because its low-complexity tag matches multiple genes, and the Cox7a2 gene was not detected because its transcript lacks an Nla III restriction site). In contrast, only 9 genes were detected in the BDF1 testis library; 13 genes were missing due to the absence of an Rsa I restriction site in the transcript, and for 13 other genes the expression data from BDF1 testis were not publicly available. Furthermore, out of the 35 highly expressed genes in testis, 21 genes were among the top 100 most expressed genes in the B6 total testis library, but only 9 genes were among the top 100 most expressed genes in the BDF1 total testis library. It appears that our SAGE data from the B6 testis transcriptome show better correspondence to the microarray data than do the data from the BDF1 testis transcriptome.
Serial analysis of gene expression is a high-throughput method for building a catalogue of expressed genes and their expression levels in "normal" as well as diseased or genetically variant tissues and organs. The digital character of SAGE data enables addition and direct comparison of different SAGE libraries, provided they were built with the same anchoring enzyme and originate from individuals of the same species. The uses of such global transcriptome databases are manifold, including positional cloning of mutations or quantitative trait loci [35, 36], functional genome annotation [37, 38] and analysis of nonrandom gene order. Admittedly, SAGE as used in this work has several limitations, including a significant proportion of repetitive and low-complexity tags. SAGE is also more labor-intensive than transcriptome analysis based on microarrays. At present, some of these inconveniences can be addressed by applying LongSAGE or massively parallel signature sequencing technologies [38, 40].
In this study we constructed a SAGE library of the total testis of the C57BL/6J (B6) mouse inbred strain, compared it with other publicly available mouse SAGE libraries and analyzed the localization of testis-expressed genes within the mouse genome. The B6 strain was favored for the availability of its high-quality draft genomic sequence and because series of congenic and, recently, also consomic strains have used the B6 strain as a background strain [Gregorova S, Forejt J, personal communication]. The combined total testis SAGE library, TT 1+2, consisted of 76 854 total tags representing 24 529 unique tags. The reliable tag-to-gene identification method used in Mouse SAGE Site was applied to tags with frequency ≥ 2. Out of these tags, 47.5 % (3 553) revealed a reliable match to a single UniGene cluster and 15.5 % (1 157) to multiple UniGene clusters. Considering the size of the total testis SAGE library, medium to highly expressed genes are present in the expression profile. The library size is comparable to the recently published SAGE library of somatic cells of the mouse testis and almost twice the size of a library constructed from the total testis of BDF1 hybrid mice using a modified SAGE method.
In contrast to microarray data, SAGE data are platform independent, which permits the use of unrelated datasets coming from various sources to compare gene expression patterns. We analyzed the mouse testis transcriptome by comparing our total testis SAGE library to the adult testis somatic cells library and to additional publicly available SAGE libraries from 7 different tissues. We recognized three different modes of differential expression. (1) Predominant expression of genes in the germinal or somatic part of the testis, which did not consider expression in other tissues. (2) Preferential expression in testis, which was defined by comparing expression in testis to that in 7 somatic tissues for which SAGE data were available. (3) Testis-specific expression, which was defined by null expression (at the resolution of a particular SAGE library) in SAGE libraries of seven tissues or organs other than testis. Complete lists of genes predominantly expressed in germinal or testis-somatic cells, as well as the catalogues of genes preferentially expressed in testis and of testis-specific genes, are available online in Additional files 3, 5 and 7.
Conflicting results have been reported on the representation of male-biased genes on the X chromosome in various species. Spermatogonia-specific genes were found to be an order of magnitude more abundant on the mouse X chromosome. In human, the prostate-specific genes were twice as frequent on the X chromosome, but the female mammary gland- and ovary-specific X-linked genes were not enriched in the respective SAGE libraries. On the contrary, under-representation or absence of male-biased genes on the X chromosome was reported in Caenorhabditis elegans and in Drosophila [42, 43]. In the mouse, an under-representation of testis-expressed and testis-enriched genes on the X chromosome was also revealed by the analysis of microarray and EST data [5–7]. Our present data favor under-representation of X-linked genes in the total testis transcriptome but not in testis-somatic cells. Because the germ cells in different stages of differentiation constitute about 90 % of the total cell mass of testis, the data indicate that the deficit of X-linked testis-expressed genes may reflect the lack of transcription from the X chromosome in meiotic cells. These results are in agreement with the idea of X-chromosome silencing during the first meiotic division, a phenomenon based mostly on circumstantial evidence in flies and mice [7, 44–46]. Thus, transcription at the haploid stage of spermatogenesis is expected for most of the X-linked genes expressed in total testis. The meiotic X chromosome inactivation seems to be restricted to primary spermatocytes, but Sertoli cells, which form the somatic part of seminiferous tubules, may have the X chromosome in the active state. Indeed, in the transcriptome of adult testis somatic cells the proportion of expressed X-linked genes (3.2 %) was more than twice as high as in total testis (1.4 %) and did not differ from the proportion of X-linked genes expressed in non-testis (somatic) tissues.
Testis-specific genes belong to a wider category of sex-biased genes, which according to the hypothesis of sexually antagonistic genes are more likely to spread on the X chromosome than on autosomes. This is because on the X chromosome they will express their favorable effect in the hemizygous state (XY) while their deleterious effect will be masked by their recessivity in the other sex (XX). Consequently, accumulation of male-specific genes on the X chromosome will be possible by the effect of modifiers that narrow the expression of sex-biased genes only to the male sex. Thus, the evolution of sexually antagonistic genes and X inactivation may act as opposing forces on the germline lineage of testis while accumulation of male-specific genes could be expected in somatic cells of testis. In accord with these assumptions, the proportion of X-linked genes specific for total testis did not significantly differ from the proportion of genes specific for other tissues, while we observed a significant 3.2-fold enrichment of the proportion of X-linked genes specific for testicular somatic cells.
Eukaryotic gene order is evidently nonrandom, not only because of the shifting of sex-biased genes to and from the X chromosome, but also owing to a nonrandom clustering of genes within chromosomes. This somewhat unexpected conclusion (taking into account the relative autonomy of transgene regulation) is gaining gradual support from global transcriptome analyses of various eukaryotic species (see Hurst et al. for review). The observed examples of clustering are apparently a mixture of several unrelated phenomena, including large domains of similarly expressed genes in Drosophila and humans [48, 49], clustering of housekeeping genes, clustering of highly expressed genes, or clustering of genes with similar expression breadth in regions of similar GC content. In Drosophila melanogaster one third of testes-specific genes occur in clusters, a phenomenon not reported in any other species. Using PEM to define preferentially expressed genes, we were able to demonstrate that in the mouse, the genes preferentially expressed in germ cells as well as in somatic cells of testis occur in tight clusters with a frequency 2.0-fold and 3.1-fold higher, respectively, than the expected average frequency in randomized genomes. Moreover, our results indicate that this phenomenon is not merely a consequence of tandem duplications. Further analysis of the clustering of testis-expressed genes may reveal new insights into the functional organization of the mammalian genome.
We identified chromosomal clusters of adjacent genes with preferential expression in testis that contain a significantly higher number of genes than expected by chance. This phenomenon is not merely a consequence of tandem duplication. The genes with specific expression in testicular somatic cells are more abundant on the X chromosome, which favors the theory of accumulation of male-biased genes on the X chromosome. In contrast, the X-linked genes are under-represented in the transcriptome of total testis, which is in accordance with the idea of X-chromosome inactivation during the first meiotic division.
Tissue collection and RNA isolation
Mice were housed in a specific pathogen-free environment and were handled in accordance with the Czech Animal Protection Act No. 246/92, 162/93, and decrees No. 311/97, fully compatible with the NIH Publication No. 85-23, revised 1985. Testes were obtained from 9-week-old males of the C57BL/6J mouse strain. The animals were killed by cervical dislocation; the testes were quickly removed from the body and freed from the tunica. Total RNA was extracted from homogenized testes using TRIzol (Invitrogen) according to the manufacturer's protocol. SAGE libraries were constructed from the total RNA isolated from both testes of a single male (TT 1) and from a pool consisting of equal weight amounts of total RNA isolated from both testes of three male littermates (TT 2).
Construction of SAGE libraries, sequencing and tag extraction
SAGE libraries were constructed as described in the MicroSAGE protocol version 1.0e available from the SAGE homepage, using Nla III as the anchoring enzyme and Bsm FI as the tagging enzyme. Two minor modifications of the MicroSAGE protocol were employed: the first strand cDNA synthesis reaction was incubated at 42°C and the amount of linkers used in the linker ligation step was decreased to ~10 ng. Sequencing was performed on a Beckman Coulter CEQ 2000 DNA Analysis System. The sequence files were processed for tag extraction using a custom Perl script. Tags were extracted only from clones containing > 2 ditags. Duplicated ditags, linker tags and all 1-bp linker variations were removed. Data of the total testis SAGE libraries are available in the GEO repository under accession numbers GSM34767 (TT 1) and GSM34768 (TT 2).
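The tag extraction rules stated above (clones with more than 2 ditags, removal of duplicated ditags, removal of linker tags and their 1-bp variants) can be illustrated with a minimal Python sketch. It is not the custom Perl script used in this work. The 10-bp tag length, the accepted ditag length range of 20 to 26 bp and all function names are assumptions made for illustration only.

```python
from collections import Counter

ANCHOR = "CATG"                  # Nla III recognition site
TAG_LEN = 10                     # assumed tag length (not stated in the text)
MIN_DITAG, MAX_DITAG = 20, 26    # assumed range of accepted ditag lengths

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def ditags_from_clone(sequence):
    """Split one clone insert at the anchoring sites and keep plausible ditags."""
    pieces = sequence.upper().split(ANCHOR)
    inner = pieces[1:-1]         # outermost pieces are flanks, not ditags
    return [p for p in inner
            if MIN_DITAG <= len(p) <= MAX_DITAG and set(p) <= set("ACGT")]


def near_linker(tag, linker_tags):
    """True if the tag equals a linker tag or differs from one at a single base."""
    return any(sum(a != b for a, b in zip(tag, linker)) <= 1
               for linker in linker_tags)


def extract_tags(clones, linker_tags=(), min_ditags=3):
    """Count tags over all clones, applying the filters described in the text."""
    seen_ditags = set()
    counts = Counter()
    for sequence in clones:
        ditags = ditags_from_clone(sequence)
        if len(ditags) < min_ditags:          # keep only clones with > 2 ditags
            continue
        for ditag in ditags:
            key = min(ditag, reverse_complement(ditag))   # strand-independent
            if key in seen_ditags:            # duplicated ditag: counted once
                continue
            seen_ditags.add(key)
            # one tag from each end of the ditag
            for tag in (ditag[:TAG_LEN], reverse_complement(ditag)[:TAG_LEN]):
                if not near_linker(tag, linker_tags):
                    counts[tag] += 1
    return counts
```

In this sketch a duplicated ditag is recognized regardless of the strand on which it was sequenced, which is one possible reading of "duplicated ditags".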
Identification of SAGE tags
Tag identification to UniGene clusters was done using three methods: SAGEmap reliable mapping, Mouse SAGE Site reliable mapping and RNA evidence mapping. The SAGEmap reliable mapping uses a reliability score to classify tag-to-gene associations, and tag-to-gene associations with the top two reliability scores are considered reliable. The Mouse SAGE Site reliable mapping is based on the SAGEmap full mapping file and considers reliable the tag-to-gene associations that are supported by tags extracted from at least one mRNA sequence (from RefSeq, Mammalian Gene Collection, GenBank) or at least 3 ESTs with a poly(A) signal or at least 8 ESTs with no poly(A) signal. The RNA evidence mapping is also based on the SAGEmap full mapping file and considers reliable only tag-to-gene associations supported by tags extracted from at least one mRNA sequence. Mitochondrial tags were identified using all possible tags extracted from the mouse mitochondrial genome reference sequence [GenBank:NC_005089].
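The two evidence-based reliability rules described above reduce to simple threshold checks. The following Python sketch restates them; the function and parameter names are hypothetical, and the way the supporting sequences are counted from the SAGEmap full mapping file is not shown.

```python
def mouse_sage_site_reliable(n_mrna, n_est_polya, n_est_no_polya):
    """Mouse SAGE Site rule: >= 1 mRNA, or >= 3 ESTs with a poly(A) signal,
    or >= 8 ESTs without a poly(A) signal supporting the tag-to-gene link."""
    return n_mrna >= 1 or n_est_polya >= 3 or n_est_no_polya >= 8


def rna_evidence_reliable(n_mrna):
    """RNA evidence rule: the link must be supported by at least one mRNA."""
    return n_mrna >= 1
```

Every association accepted by the RNA evidence rule is also accepted by the Mouse SAGE Site rule, so the former is the stricter of the two.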
Comparison of testis SAGE libraries
Tags significantly different between SAGE libraries were determined by Monte Carlo simulations. Using the described algorithm, a set of 100 000 random tables was generated keeping the row and column totals of the observed data fixed. For each tag, the proportion of simulations that produced a difference equal to or greater than the observed difference (p-chance) was computed. The set of 100 000 random tables was generated six times and the average p-chance was calculated. The fold factor was computed as the ratio of normalized tag counts in two SAGE libraries, with values < 1 converted to reciprocal negatives. For the tags absent in one library, a normalized tag count of single copy tags was assumed.
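The fold factor and the p-chance for a single tag can be sketched in Python as follows. This is an illustration under stated assumptions, not the scripts used in this work: the "difference" is taken here as the absolute difference of normalized tag counts, the random tables are reduced to one tag at a time (with both margins fixed, the tag count in one library follows a hypergeometric distribution), and the default number of simulations is lowered for speed.

```python
import random


def fold_factor(count_a, total_a, count_b, total_b):
    """Ratio of normalized tag counts (A over B); ratios < 1 are reported as
    negative reciprocals. A tag absent from a library is given a count of 1."""
    norm_a = max(count_a, 1) / total_a
    norm_b = max(count_b, 1) / total_b
    ratio = norm_a / norm_b
    return ratio if ratio >= 1 else -1.0 / ratio


def p_chance(count_a, total_a, count_b, total_b, n_sim=2000, seed=0):
    """Proportion of simulated tables whose difference is >= the observed one."""
    rng = random.Random(seed)
    observed = abs(count_a / total_a - count_b / total_b)
    n_tag = count_a + count_b
    pool = range(total_a + total_b)      # all tag positions of both libraries
    hits = 0
    for _ in range(n_sim):
        # place the n_tag copies at random; positions < total_a fall in library A
        sim_a = sum(1 for i in rng.sample(pool, n_tag) if i < total_a)
        sim_b = n_tag - sim_a
        if abs(sim_a / total_a - sim_b / total_b) >= observed:
            hits += 1
    return hits / n_sim
```

Under the criteria used for predominant expression, a tag would be retained when `p_chance(...) < 0.05` and the fold factor is below -5 or above 5.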
The SAGE library from somatic cells of the adult testis was obtained from the GEO repository, accession number GSM5435. Other SAGE libraries were obtained from the GEO repository or downloaded from Internet sources (see Additional file 4). The data from the BDF1 testis SAGE library were obtained from a printed table in the publication (only the top 100 genes expressed in BDF1 testis are listed in the publication; the whole library is currently not publicly available). Microarray data of mouse testis, generated by the GNF Mouse Atlas v2 project, were obtained from the hgFixed database of the UCSC Genome Browser [55, 56].
Hierarchical clustering of mouse SAGE libraries
Thirty-two mouse SAGE libraries constructed from bulk tissues (including normal and diseased) that were publicly available as of July 1, 2004 were selected (see Additional file 4). For each pair of SAGE libraries a distance based on differences between normalized tag counts was computed. The average agglomeration method was used in hierarchical clustering because it gave the highest cophenetic correlation (Pearson correlation between the observed distances and the distances calculated from the dendrogram).
Selection and preparation of mouse SAGE libraries for genomic analysis
Twenty-seven SAGE libraries created from bulk tissues (excluding tumors) were organized into 7 groups by tissue type and tag counts from SAGE libraries within each group were combined (see Additional file 4). The groups of SAGE libraries include: brain (9 libraries, 329 745 tags), eye (6 libraries, 336 399 tags), heart (1 library, 84 275 tags), liver (2 libraries, 37 118 tags), kidney (6 libraries, 87 810 tags), limbs (2 libraries, 136 650 tags) and adipose tissue (1 library, 44 974 tags). These groups were analyzed in parallel with total testis (2 libraries, 76 854 tags) and adult testis somatic cells (1 library, 81 478 tags). All tags from prepared tissue groups, total testis and adult testis somatic cells SAGE libraries were identified to UniGene clusters using RNA evidence mapping (tag-to-gene association is supported by at least one mRNA sequence) and linked to LocusLink genes. Only tags with identification to a single LocusLink gene were subjected to further analysis. Tag counts from multiple tags matching the same LocusLink gene were combined.
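The pooling of libraries within a tissue group and the reduction of tag counts to gene-level counts can be sketched as below. This is a minimal Python illustration; the data structures (a dictionary of tag counts per library, and a dictionary mapping each tag to the set of LocusLink genes it identifies under RNA evidence mapping) and the function names are assumptions.

```python
from collections import Counter


def combine_libraries(libraries):
    """Sum tag counts over all SAGE libraries of one tissue group."""
    combined = Counter()
    for tag_counts in libraries:
        combined.update(tag_counts)
    return combined


def gene_level_counts(tag_counts, tag_to_genes):
    """Keep tags that identify exactly one gene and sum counts per gene."""
    genes = Counter()
    for tag, count in tag_counts.items():
        matches = tag_to_genes.get(tag, ())
        if len(matches) == 1:            # tags matching several genes are dropped
            (gene,) = tuple(matches)
            genes[gene] += count
    return genes
```

Tags without any mapping and tags matching multiple genes contribute nothing to the gene-level counts in this sketch.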
Distribution of tissue-specific genes on chromosomes
The analysis was done in parallel for testis-specific genes in total testis and in somatic cells of adult testis. Tissue-specific genes were selected according to tag counts in the testis tissue and the 7 non-testis tissues (see Additional file 4). A gene was considered tissue-specific if it was expressed in only one tissue and its expression was supported by a tag count > 1. Each tissue-specific gene was then assigned to a chromosome class (autosome or X chromosome) according to the LocusLink database and to a group (testis or non-testis). The permutation algorithm performed 100 000 random shufflings of the chromosome assignments while keeping the sums of genes on autosomes and on the X chromosome constant. The two-tailed p-value was derived from twice the number of permutations yielding X-linked gene counts in the testis group either greater than or equal to, or less than or equal to, the observed count (whichever number was lower).
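The permutation scheme can be sketched in Python as follows. The sketch assumes that the doubled count is divided by the number of permutations and capped at 1 to give the p-value; the function name and the example counts in the usage note are hypothetical and are not data from this study.

```python
import random


def x_linkage_permutation_test(testis_x, testis_total, other_x, other_total,
                               n_perm=10000, seed=0):
    """Two-tailed permutation p-value for the number of X-linked genes in the
    testis-specific group. Chromosome labels are shuffled over all genes, so
    the totals of autosomal and X-linked genes stay fixed."""
    rng = random.Random(seed)
    n_x = testis_x + other_x
    n_genes = testis_total + other_total
    labels = [1] * n_x + [0] * (n_genes - n_x)   # 1 = X-linked, 0 = autosomal
    at_least = at_most = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        simulated = sum(labels[:testis_total])   # X-linked genes in testis group
        at_least += simulated >= testis_x
        at_most += simulated <= testis_x
    return min(1.0, 2 * min(at_least, at_most) / n_perm)
```

For example, `x_linkage_permutation_test(10, 50, 10, 950)` asks how often 10 or more of 20 X-linked genes fall by chance into a group of 50 genes out of 1 000.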
Identification of chromosomal clusters of genes with preferential expression in testis
The preferential expression measure (PEM) was used to score differential expression of genes in testis tissues. PEM values for total testis (PEM_TT) and adult testis somatic cells (PEM_ATSC) were calculated for each gene. A gene was considered to be preferentially expressed in total testis if PEM_TT ≥ 1/2 PEM_TT(max), and in somatic cells of adult testis if PEM_ATSC ≥ 1/2 PEM_ATSC(max). The PEM(max) values represent the maximum PEM value encountered in the tissue: PEM_TT(max) = 1.169 and PEM_ATSC(max) = 1.145.
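The PEM formula itself is not spelled out in the text. The Python sketch below assumes the commonly used definition, PEM = log10(observed / expected), where the expected count of a gene in a tissue is its total count over all tissues multiplied by that tissue's share of all tags. Under this assumption a gene detected only in the testis library attains log10(total tags / testis tags). With the library sizes given for the seven tissue groups plus the respective testis library, this evaluates to about 1.169 for total testis and 1.145 for adult testis somatic cells, which agrees with the reported PEM(max) values. The function names are hypothetical.

```python
import math

# Combined library sizes (total tags) of the seven non-testis tissue groups
OTHER_TISSUES = {
    "brain": 329745, "eye": 336399, "heart": 84275, "liver": 37118,
    "kidney": 87810, "limbs": 136650, "adipose": 44974,
}
TOTAL_TESTIS_TAGS = 76854
SOMATIC_CELLS_TAGS = 81478


def pem(gene_counts, library_sizes, tissue):
    """Assumed definition: log10(observed / expected) count in one tissue."""
    observed = gene_counts.get(tissue, 0)
    if observed == 0:
        return float("-inf")             # gene not detected in this tissue
    gene_total = sum(gene_counts.values())
    expected = gene_total * library_sizes[tissue] / sum(library_sizes.values())
    return math.log10(observed / expected)


def preferentially_expressed(pem_value, pem_max):
    """Threshold from the text: PEM at least half of the tissue maximum."""
    return pem_value >= 0.5 * pem_max
```

A gene spread over the tissues in proportion to library size scores 0 under this definition, and such a gene is therefore far below the threshold of half the maximum.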
To prepare a tandem duplicate-free mouse genome we considered 19 684 known genes from the LocusLink database that were mapped on the mouse genome assembly (NCBI build 32). For each LocusLink gene, we obtained a known protein sequence (NP_ accessions) from the mouse RefSeq collection and performed protein BLAST (standard settings) against the RefSeq known protein collection. The hits with an expectation value < 1e-10 and with an alignment of at least 50 % length and 30 % identity of the query sequence were processed and identified to LocusLink genes. If a LocusLink gene located in the vicinity of the original LocusLink gene was found among the hits (considering 10 adjacent genes in both directions), both genes were considered a tandem duplicate pair and were excluded from the genome. As a result, a tandem duplicate-free genome with 16 858 LocusLink genes was obtained.
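The filtering step can be sketched in Python as below. The sketch assumes that the BLAST hits have already been parsed and mapped to LocusLink genes, and it reads the alignment criteria as an aligned fraction of the query of at least 0.5 and a sequence identity of at least 0.3. The input structures and names are hypothetical.

```python
def tandem_duplicate_free(gene_order, hits, window=10):
    """Remove both members of every tandem duplicate pair.

    gene_order: dict mapping chromosome -> list of gene ids in map order.
    hits: iterable of (query_gene, hit_gene, evalue, aligned_fraction, identity).
    """
    position = {}
    for chrom, genes in gene_order.items():
        for index, gene in enumerate(genes):
            position[gene] = (chrom, index)

    excluded = set()
    for query, hit, evalue, aligned_fraction, identity in hits:
        if query == hit or query not in position or hit not in position:
            continue
        if evalue >= 1e-10 or aligned_fraction < 0.5 or identity < 0.3:
            continue                     # hit too weak to count as a duplicate
        (q_chrom, q_index), (h_chrom, h_index) = position[query], position[hit]
        # "vicinity": within 10 adjacent genes in either direction
        if q_chrom == h_chrom and abs(q_index - h_index) <= window:
            excluded.update((query, hit))

    return {chrom: [g for g in genes if g not in excluded]
            for chrom, genes in gene_order.items()}
```

Gene order is preserved for the remaining genes, so the filtered genome can be passed directly to a search for clusters of adjacent genes.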
Two sets of clusters of preferentially expressed genes were identified: one for total testis and one for somatic cells of adult testis. Every LocusLink gene in the tandem duplicate-free mouse genome was assigned an expression status (preferentially expressed, expressed, or unknown). Each chromosome was searched using a sliding window of three adjacent genes, and three consecutive preferentially expressed genes were considered a cluster (tight clusters). A second search used a sliding window of six adjacent genes; at least three preferentially expressed genes within the window were required to form a cluster, which spanned from the first to the last preferentially expressed gene (loose clusters). Overlapping clusters were merged into a single cluster encompassing all genes involved (separately for tight and loose clusters). The permutation test performed 100 000 random shufflings of the expression status across the genome while keeping the gene positions constant. A search with the sliding windows defined above determined the number of preferentially expressed genes located in clusters in each randomized genome. The one-tailed p-value was computed as the proportion of permutations in which the number of preferentially expressed genes located in clusters was greater than or equal to the observed number.
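The sliding-window search and the accompanying permutation test can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' R code: the expression status is reduced to a boolean (preferentially expressed or not), each chromosome is a list of such values in gene order, clusters are merged only when they share at least one gene, the status values are shuffled across the whole genome, and all function names are hypothetical.

```python
import random


def find_clusters(status, window, min_hits=3):
    """Return merged clusters as inclusive (start, end) index pairs for one chromosome."""
    spans = []
    for start in range(len(status) - window + 1):
        hits = [i for i in range(start, start + window) if status[i]]
        if len(hits) >= min_hits:
            # A cluster spans from the first to the last preferentially expressed gene.
            spans.append((hits[0], hits[-1]))
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:  # shares a gene with the previous cluster
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def genes_in_clusters(chromosomes, window, min_hits=3):
    """Count preferentially expressed genes that fall inside any cluster."""
    return sum(
        sum(status[start:end + 1])
        for status in chromosomes
        for start, end in find_clusters(status, window, min_hits)
    )


def cluster_permutation_pvalue(chromosomes, window, n_perm=1000, seed=0):
    """One-tailed p-value: share of shuffled genomes with at least the observed count."""
    rng = random.Random(seed)
    observed = genes_in_clusters(chromosomes, window)
    flat = [value for status in chromosomes for value in status]
    n_ge = 0
    for _ in range(n_perm):
        rng.shuffle(flat)  # gene positions stay fixed, only the status moves
        shuffled, offset = [], 0
        for status in chromosomes:
            shuffled.append(flat[offset:offset + len(status)])
            offset += len(status)
        n_ge += genes_in_clusters(shuffled, window) >= observed
    return observed, n_ge / n_perm
```

In this sketch, tight clusters correspond to `window=3` and loose clusters to `window=6`, both with `min_hits=3`.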
All statistical analyses, including Monte Carlo simulations, hierarchical clustering, and chromosomal and gene permutations, were conducted in the R statistical environment using custom scripts.
The following database versions were used in all analyses: Mouse UniGene build #136 (March 26, 2004), mouse SAGEmap (April 3, 2004) corresponding to the mouse UniGene #136, LocusLink (April 3, 2004), mouse genome assembly NCBI build 32 (November 2003), mouse Reference Sequence collection (April 3, 2004) and Gene Ontology database (July, 2004).
We thank Laurence D. Hurst, Adam Pavlíček and Jan Pačes for helpful comments and suggestions, Radka Storchová, Zdeněk Trachtulec and Šárka Takáčová for critically reading the manuscript. This work is supported by the project of the Czech Ministry of Education, Youth and Sports No. LN00A079 – Center for Integrated Genomics and by the project of the Academy of Sciences of the Czech Republic No. K5052113. J.F. is supported as an International Scholar of the Howard Hughes Medical Institute.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.