The mammalian TMC gene and protein family
We used human TMC1 and TMC2, murine Tmc1 and Tmc2, and human EVER1 and EVER2 sequences in conjunction with database search algorithms to identify homologous sequences in genomic and expressed sequence tag (EST) databases. We iteratively assembled contiguous theoretical coding sequences for individual human and mouse proteins. Where regions of individual murine cDNAs could not be predicted unequivocally from database information, we verified them by RT-PCR and sequencing. This strategy identified the murine orthologues of EVER1 and EVER2, and four hitherto uncharacterized human and mouse proteins (Figure 1, Table 1). All identified proteins have eight predicted membrane-spanning domains (TM1–TM8) and share the completely conserved amino acid triplet C (cysteine) – W (tryptophan) – E (glutamic acid), predicted to be located in the extracellular loop upstream of TM6 (Figure 2). This triplet is part of a longer conserved sequence, CWETXVGQEly(K/R)LtvXD, which we name the TMC signature sequence motif; its presence is our defining criterion for this novel TMC protein family (Figures 1, 2, additional file 2).
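As an illustration of how the signature motif could be used to screen candidate sequences, the following minimal Python sketch translates the motif into a regular expression. The reading of the notation is an assumption, not something stated in the text: uppercase letters are treated as conserved residues, X as any residue, (K/R) as either residue, and lowercase letters as less strictly conserved positions that can be matched either loosely (any residue) or strictly (the indicated residue). The function names are hypothetical.

```python
import re

# Motif exactly as printed in the text. The interpretation of its notation
# below is an assumption made for this sketch.
TMC_MOTIF = "CWETXVGQEly(K/R)LtvXD"


def motif_to_regex(motif, strict_lowercase=False):
    """Translate the motif notation into a compiled regular expression.

    Uppercase letters match themselves, X matches any residue, and a
    parenthesised group such as (K/R) matches any listed residue.
    Lowercase positions match any residue unless strict_lowercase is True,
    in which case they must match the indicated residue.
    """
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            close = motif.index(")", i)
            options = motif[i + 1:close].split("/")
            parts.append("[" + "".join(options) + "]")
            i = close + 1
            continue
        if ch == "X":
            parts.append(".")
        elif ch.islower():
            parts.append(ch.upper() if strict_lowercase else ".")
        else:
            parts.append(ch)
        i += 1
    return re.compile("".join(parts))


def find_tmc_motif(protein, strict_lowercase=False):
    """Return (start, matched_text) for every motif hit in a protein sequence."""
    pattern = motif_to_regex(TMC_MOTIF, strict_lowercase)
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]
```

Calling `find_tmc_motif(sequence)` on a one-letter-code protein string returns the zero-based start position and matched text of each hit. The loose mode is the more permissive screen. The strict mode only accepts sequences that carry the indicated residues at the lowercase positions.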
The eight mammalian TMC proteins can be grouped into three subfamilies, A, B, and C, based on sequence homology and on similarities in the genomic structure of their respective genes (Figures 1, 3, and 4). The murine TMC protein subfamily A consists of three proteins, Tmc1, Tmc2, and the novel Tmc3. All TMC subfamily A proteins are between 757 and 1130 amino acids in length. Tmc3 bears a long carboxyl-terminal tail that is unlike that of any other TMC protein (Figure 1). The overall identity within TMC subfamily A is 36–56%, and >73% of the intron positions within the conserved core region of the genes are conserved (Figure 3A, 3C, additional file 1).
The murine TMC subfamily B consists of Tmc5 and Tmc6 (mouse orthologue of EVER1), proteins of 810 and 757 amino acid residues that are 31% identical and share >92% conservation of the corresponding genes' intron locations within the conserved core region (Figure 3A, 3C, additional file 1). A significant structural difference between subfamily A and subfamily B proteins is that the long presumptive extracellular loop between TM5 and TM6 of subfamily A proteins is much shorter in subfamily B proteins, where it consists mainly of the TMC signature sequence motif (Figures 1, 2).
Finally, the three members of the murine TMC subfamily C, Tmc4, Tmc7, and Tmc8 (mouse orthologue of EVER2), are the shortest TMC proteins, with 694, 726, and 722 amino acid residues. The overall identity within the murine TMC protein subfamily C is 29–33%, and the genes share a common structure, with >92% of intron locations within the conserved core region conserved (Figure 3A, 3C, additional file 1).
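The identity and intron-conservation percentages quoted for the three subfamilies can be illustrated with a short Python sketch. The text does not state how either value was calculated, so the definitions here are assumptions: identity is counted over alignment columns in which neither sequence has a gap, and intron conservation is the share of intron positions (expressed as alignment columns) present in every gene of a subfamily, relative to all intron positions seen in any of them. Function names and inputs are hypothetical.

```python
from typing import Dict, Set


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues between two pre-aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are ignored,
    so the denominator is the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def conserved_intron_percent(intron_positions: Dict[str, Set[int]]) -> float:
    """Percent of intron positions shared by all genes in a group.

    Assumption: positions are alignment-column indices within the conserved
    core region, and the denominator is the union of positions across genes.
    """
    position_sets = list(intron_positions.values())
    if not position_sets:
        return 0.0
    union = set().union(*position_sets)
    shared = set.intersection(*position_sets)
    return 100.0 * len(shared) / len(union) if union else 0.0
```

Other conventions, such as dividing by the full alignment length or by the intron count of a single reference gene, would give different percentages, so the choice of denominator should be reported alongside any such figure.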
Analysis of the human TMC gene and protein family yielded essentially identical results, likely because of the high degree of conservation between human and murine TMC proteins [5] (Figure 4). The new TMC classification designates EVER1 as TMC6 and EVER2 as TMC8 (Table 1).
The TMC genes map to six chromosomal locations in both human and mouse (Table 1). In each species, two of these locations harbor a pair of neighboring TMC genes: Tmc5 and Tmc7 on murine chromosome 7, and Tmc6 and Tmc8 on murine chromosome 11; the human orthologues are located in the syntenic regions of chromosomes 16 and 17, respectively [7]. An additional murine locus on chromosome 8 represents a partial gene fragment, likely a pseudogene of Tmc2.
Expression of murine TMC transcripts
To demonstrate that all predicted murine TMC genes are transcribed, we performed RT-PCR experiments. Primer pairs specific for each TMC transcript amplified products of predicted length and sequence (Figure 5).
Tmc1 and Tmc3 mRNAs were detectable in most neuronal organs, and we also found expression in some non-neuronal organs. Tmc2 transcripts were detectable only in testis; RT-PCR did not reveal Tmc2 expression in the cochlea. However, we were able to verify that Tmc2 is expressed in the cochlea by using an organ of Corti cDNA library as a template for PCR, which corroborates the results presented by Kurima et al. (2002).
We observed that mRNAs encoding Tmc5, Tmc6, Tmc4 and Tmc7 are expressed in most murine organs tested. Tmc8 mRNA is detectable in thymus and lung; we also found expression of Tmc8 mRNA in spleen (not shown). We did not detect Tmc8 transcripts in any other organs investigated.
Our expression analysis results are corroborated by an analysis of applicable TMC ESTs obtained from GenBank http://www.ncbi.nlm.nih.gov/dbEST/ (Figure 5).
Non-mammalian vertebrate and invertebrate TMC genes and proteins
The high degree of similarity between the corresponding human and murine TMCs suggests a conserved function of these proteins. This role of TMC proteins may also be conserved among other species. We therefore decided to investigate the TMC genes of other vertebrates and invertebrates.
We identified eight TMC loci in the Japanese pufferfish (Torafugu, Fugu rubripes, Fr) genomic database http://fugu.hgmp.mrc.ac.uk/. Whereas the genome of Fugu rubripes is about one order of magnitude smaller than the human genome, the total number of genes is estimated to be approximately the same [8, 9]. Because of the low homology of amino- and carboxyl-terminal sequences, we were not able to determine the complete coding sequences of the eight pufferfish TMC proteins unequivocally; nevertheless, we obtained sufficient sequence information for the central parts of the proteins, which bear the transmembrane domains, to classify the eight pufferfish TMCs into the three subfamilies. This subfamily assignment of individual TMC proteins was further substantiated by an analysis of the degree of conservation of intron positions within the Fugu rubripes TMC gene family (Figure 3B). The Fugu rubripes genome contains three TMC subfamily A genes, Tmc2-rs1 (Tmc2-related sequence 1), Tmc2-rs2 (Tmc2-related sequence 2), and Tmc3; three TMC subfamily B genes, Tmc5, Tmc6-rs1 (Tmc6-related sequence 1), and Tmc6-rs2 (Tmc6-related sequence 2); and two subfamily C genes, Tmc4 and Tmc7 (Table 1). The nomenclature of the pufferfish TMC genes and proteins is derived from the phylogenetic relation of the corresponding sequences to the mammalian TMCs (Figure 4). Notably, the pufferfish genome lacks orthologues of mammalian TMC1 and TMC8. Fugu rubripes Tmc5 and Tmc7 are clustered, as are their mammalian orthologues [7].
TMC genes also exist in invertebrates. In GenBank, we identified two mRNA sequences encoding Caenorhabditis elegans (Ce) TMC proteins. These mRNAs are transcribed from TmcAh1 (Tmc subfamily A homologue1) and TmcAh2 (Tmc subfamily A homologue2) (Table 1).
Whereas the C. elegans genome appears to lack TMC subfamily B and C genes, some insects have genes for all three TMC subfamilies. For example, the mosquito Anopheles gambiae has three TMC genes, TmcAh (Tmc subfamily A homologue), TmcBh (Tmc subfamily B homologue), and TmcCh (Tmc subfamily C homologue) (Figure 4, Table 1). We did not find deposited cDNA sequences for TmcAh and TmcCh, but TmcBh appears to be transcribed (ESTs BM645887, BM621478, BM605758, and BM636384). A search of the Drosophila melanogaster genome database revealed only a single TMC gene, TmcAh (Figure 4, Table 1).
We did not find evidence for TMC genes in genomes and cDNA databases of yeast and plants.