In silico identification and expression of SLC30 family genes: An expressed sequence tag data mining strategy for the characterization of zinc transporters' tissue expression

Background Intracellular zinc concentration and localization are strictly regulated by two main protein components, metallothioneins and membrane transporters. In mammalian cells, two membrane transporter families are involved in intracellular zinc homeostasis: the uptake transporters, called the SLC39 or Zip family, and the efflux transporters, called the SLC30 or ZnT family. ZnT proteins are members of the cation diffusion facilitator (CDF) family of metal ion transporters. Results From analysis of genomic databanks, we identified the full-length sequences of two novel SLC30 genes, SLC30A8 and SLC30A10, extending the SLC30 family to ten members. We used an expressed sequence tag (EST) data mining strategy to determine the pattern of ZnT gene expression in tissues. In silico results obtained for previously studied ZnT sequences were compared with published experimental data. We found an overall good correlation with expression patterns obtained by RT-PCR or immunomethods, particularly for highly tissue-specific genes. Conclusion The method presented herein provides a useful tool to complete gene families from sequencing programs and to produce preliminary expression data to select the proper biological samples for laboratory experimentation.


Background
Zinc is involved in many cellular processes as a cofactor of numerous enzymes, nuclear factors and hormones and as an intra- and intercellular signal ion [1,2], and is hence essential for cell viability. However, since both zinc excess and zinc deficiency can be toxic, local intracellular zinc concentrations must be strictly regulated. The two main protein components involved in zinc homeostasis are metallothioneins and zinc transporters [3], alongside specific, gated, zinc-permeable membrane-spanning channels [4,5]. Metallothioneins play an important role in zinc transport, storage and distribution [6]. Zinc transporters are transmembrane proteins that carry zinc ions across biological membranes. Some transporters allow intracellular uptake of zinc, while others permit cellular efflux of zinc. Proteins involved in cellular uptake of zinc have been characterized in plants, yeast and mammals [7]. In mammalian cells, seven homologous zinc export proteins, named ZnT-1 to -7, have been discovered (for review see [3]). These proteins are members of the SLC30 solute carrier subfamily of the CDF (Cation Diffusion Facilitator) family and share the same predicted structure, with six membrane-spanning domains and a histidine-rich intracellular loop between helices IV and V, except for ZnT-6, which has a serine-rich loop instead [8]. It is still controversial whether mammalian ZnT proteins are truly transporters or proteins controlling zinc transport through other channels [9]. However, recent work demonstrated that the bacterial ZitB and CzcD proteins, two members of the CDF family, are antiporters catalyzing the obligatory exchange of Zn2+ or Cd2+ for K+ and H+ with a 1:1 stoichiometry [10,11].
ZnT-1 is a ubiquitous zinc transporter located in the plasma membrane, where it ensures zinc efflux from the cell [12]. ZnT-2 also confers zinc resistance, although it is located in acidic endosomal/lysosomal vesicles and allows vesicular zinc accumulation inside the cell [13]. ZnT-3 and ZnT-4 are more closely related to ZnT-2 than to ZnT-1. ZnT-3 is tissue specific and mainly located in brain, in the membranes of zinc-rich synaptic vesicles within mossy fiber boutons of the hippocampus [14], and in testis [15]. Conversely, ZnT-4 is expressed ubiquitously [16], but higher levels of ZnT-4 are found in brain, mammary glands and epithelial cells [6]. This transporter has been shown to be essential in mammary epithelia for regulating milk zinc content in mice [17]. ZnT-5 is a ubiquitous zinc transporter localized in intracellular non-acidotropic vesicles and abundantly expressed in pancreatic beta cells [18]. A sixth member of the ZnT family, ZnT-6, has been described and is responsible for the relocation of cytoplasmic zinc into the trans-Golgi network and the vesicular compartment [19]. Recently, ZnT-7 was also described as a Golgi apparatus protein involved in the accumulation of zinc [20].
From analysis of genomic databanks, we identified two novel SLC30 genes, SLC30A8 and SLC30A10. During the preparation of this article, another SLC30 gene, SLC30A9, appeared in GenBank [21,22] under the accession number BC016949, extending the family to 10 genes. However, the homology of this latter gene to the other SLC30 sequences is very low. To further characterize these new genes and to validate the method, we took advantage of the ever-increasing wealth of information available through the human expressed sequence tag database (dbEST). Assuming that cDNA libraries used for EST sequencing are representative of all mRNA transcripts in a given tissue [23], we determined SLC30 family mRNA transcript levels in different tissues by EST database analysis for all the known ZnTs (except ZnT-9) and compared their in silico expression profiles with experimental data on human tissues. In most cases, the experimental data correlate with the in silico analysis. Hence, this strategy provides valuable information, and the method presented herein is a useful tool to complete gene families from sequencing programs and to produce preliminary expression data before selecting the proper biological samples for laboratory experimentation.

Results and Discussion
One approach for discovering new genes is to search the whole human genome sequence in silico for sequences homologous to known genes or known gene families. Recent publications demonstrate the efficiency of this technique for finding new genes [24,25]. Using the known human, mouse or rat ZnT cDNA and protein sequences as bait for BLASTN or TBLASTN searches of the human genome databanks, we discovered two DNA sequences encoding new putative zinc transporters belonging to the ZnT family. These new genes were named SLC30A8 and SLC30A10, encoding the proteins designated ZnT-8 and ZnT-10, respectively. Human SLC30A8 cDNA was found in the contig AC027419, which allowed us to localize the SLC30A8 gene to human chromosome 8 at position q24.11 (Table 1). The gene contains 8 exons, spans 37 kb and is predicted to code for a 40.8 kDa protein (Fig. 1). The sequence data reported for human ZnT-8 mRNA were submitted to GenBank under the accession number AY117411. Human SLC30A10 cDNA was found in the contig AC093562, which allowed us to localize the SLC30A10 gene to human chromosome 1 at position q41. The gene contains 4 exons, spans 15 kb and is predicted to code for a 52.7 kDa protein (Fig. 1). The sequence for ZnT-10 mRNA was submitted to GenBank under the accession number BK004163. We also localized the ZnT-2 gene (SLC30A2) in the human genome to chromosome 1 at position p36.11 (Table 1) by homology with the rat ZnT-2 sequence [13]. The predicted cDNA and protein sequences are identical to the NM_032513 nucleotide and NP_115902 protein entries of the Entrez database.
We then aligned the predicted sequences of the nine human ZnT proteins with the ClustalW program. When compared, all the proteins of the family are predicted to have a conserved structure, with a common pattern composed of 6 transmembrane helices and a histidine-rich domain between helices IV and V. Both the N- and C-termini are predicted to be located on the cytoplasmic side of the plasma membrane. Alignments of the amino acids composing the fifth and sixth transmembrane domains illustrate this homology (Fig. 2A). However, despite very well conserved residues, the homology between amino acid sequences can differ from one protein to another. For example, ZnT-5 is known to have 15 transmembrane domains, but the region homologous to the other members of the family is located in the carboxyl-terminal portion and is predicted to adopt six membrane-spanning domains. The histidine-rich loop is replaced by a serine-rich loop in ZnT-6. We report the presence of a loop rich in basic residues in ZnT-10, while ZnT-8 keeps the characteristic histidine-rich loop (Fig. 2B). The histidine content of the loop also varies widely, from no histidine residue for ZnT-10 to 20 histidine residues for ZnT-7 (Table 1). Using the amino acid alignment, a phylogenetic tree for the 10 ZnT sequences was calculated by the neighbour-joining method (Fig. 3). Zip-2, a zinc membrane transporter belonging to the SLC39 family, was used as an outgroup. From this analysis, we can delineate three subfamilies: ZnT-1 and ZnT-10; ZnT-5 and ZnT-7; ZnT-2, ZnT-3, ZnT-4 and ZnT-8. This result was confirmed by similarity analysis of the amino acid sequences. The subfamily ZnT-2, -3, -4, -8 exhibited the highest homologies, with the highest score of 53.5 % between ZnT-2 and ZnT-8. The homology between ZnT-1 and ZnT-10 is high, with a score of 48.3 %. In contrast, ZnT-5, -6 and -7 are less homologous, with a highest score of 27.8 % between ZnT-5 and ZnT-7. ZnT-9 has the lowest homology with the other ZnTs.
Despite an overall shared topological structure, the similarity between the subfamilies is relatively low.
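Pairwise similarity scores of the kind quoted above can be illustrated with a short sketch. It computes percent identity between two rows of a multiple alignment. The choice of denominator (columns in which both sequences carry a residue) is an assumption, since the scoring scheme behind the reported percentages is not specified here, and the toy fragments are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two equal-length aligned sequences.

    Columns where either sequence has a gap are ignored (an assumption;
    other conventions divide by the alignment length or by the length
    of the shorter sequence).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy fragments, not real ZnT sequences.
    print(percent_identity("HLAVD-GSHL", "HLSVDAGAHL"))
```

Applying such a function to every pair of rows in the alignment gives a similarity matrix from which the subfamily groupings can be read off.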

Partial alignment of ZnT proteins
The in silico characterization of the SLC30 tissue expression pattern was performed by an expressed sequence tag (EST) data mining strategy. The predicted SLC30 transcripts (ORF, 5' and 3' UTRs) were queried against the human EST database using BLASTN. We obtained a total of 426 significant hits with a bit score >150 and an E-value <0.001.
Figure 3. Dendrogram of ZnT proteins. Bootstrapping (2000 replicate sets) and calculation of the consensus tree by the neighbour-joining method were performed with the DAMBE program. The numbers indicate bootstrapping values as a percentage at internal nodes. The scale of the branch length is given in amino acid substitutions per site. Accession numbers in Entrez databanks are indicated for protein sequences, except for ZnT-10, whose accession number corresponds to the cDNA sequence. Zip-2 protein sequence was used as an outgroup.
We found SLC30A1 expression in 18 tissues out of 36, indicating a very wide expression pattern (Fig. 4). ZnT-1 has been shown to display a broad tissue distribution. It is particularly abundant in intestine, liver [26] and brain [27]. On the other hand, no ZnT-1 transcripts were detected in lamina propria intestinal cells or in many kidney cells. The ZnT-1 gene is controlled at the transcriptional level by zinc status. Elevation of the extracellular zinc concentration results in a rapid and dramatic increase of SLC30A1 mRNA levels, mediated by the transcription factor MTF-1, a sensor of zinc level [28]. Therefore, the basal estimate of SLC30A1 expression by EST analysis may not reflect actual expression, which depends on extracellular conditions.
For SLC30A2, we calculated the highest expression level in placenta and high levels in eye, kidney and ovary. Experimental results indicate expression of ZnT-2 in intestine, kidney, seminal vesicles and testis [13]. In rats, SLC30A2 mRNA expression is limited to small intestine, kidney, placenta and liver, while SLC30A2 mRNA levels were increased several fold only in small intestine, liver and kidney upon a single oral dose of zinc [16]. The very high level of SLC30A2 expression in placenta presumably indicates an important role of the ZnT-2 transporter in zinc exchange between maternal tissues and the foetus.
The results for SLC30A3 display a good correlation between in silico analysis and experimental data. We determined a restricted expression pattern with very high levels in brain and testis, as previously identified by Northern blot and reverse transcriptase-PCR analysis [15]. In brain, SLC30A3 mRNA is most abundant in the cerebral cortex and in synaptic vesicle membranes within mossy fiber boutons in the hippocampus [14]. Zinc is secreted from these vesicles in response to high-frequency stimulation [29,30].
ESTs for SLC30A4 were found in only a few tissues (B-cells, muscle, ovary, parathyroid gland, stomach and testis), and the correlation with experimental data is very poor. The highest level was calculated for the testis tissue sample. ZnT-4 was first thought to play an important role in milk secretion. A nonsense mutation, leading to a truncated form of ZnT-4, is responsible for the inherited zinc deficiency in the lethal milk (lm) mouse [17,31]. In the lm mouse, the maternal milk does not contain enough zinc for the newborn mice to live. ZnT-4 is constitutively expressed in human breast epithelial cells [32]. However, in humans no difference in ZnT-4 expression levels was observed between lactating and resting breasts. In rats, ZnT-4 is expressed ubiquitously and is refractory to changes in zinc intake [16]. ZnT-4 is also expressed in polarized enterocytes, in which it is localized in the membrane of intracellular vesicles, the majority of which are concentrated in the basal cytoplasmic region. The protein was not found in proliferating cells of the crypt, but was detected in differentiated enterocytes of the villi, its appearance coinciding with the crypt/villus junction.
From the EST analysis, SLC30A5 is ubiquitously expressed, with high levels in kidney, liver, pancreas, brain, skin, bone marrow and T-cells. We determined the presence and the level of SLC30A5 mRNA by PCR amplification of cDNA libraries prepared from different human tissues [see additional file 1]. As expected from the calculated data, an SLC30A5-specific product was detected at a high level in nearly all kinds of tissue, thus confirming the previously published results [18].
From in silico analysis, SLC30A6 displayed the highest levels in germinal B-cells and colon and high levels in eye and lung. In vivo, SLC30A6 mRNA has been detected in liver, brain, small intestine and kidney. Western blot analysis indicated that ZnT-6 is present in mouse brain, small intestine, kidney and lung [19].
Low levels of expression were calculated for SLC30A7, except in the colon and the eye. SLC30A7 mRNA (Northern blot) or PCR products were detected in the heart, liver, spleen, plasma blood leukocytes, small intestine, kidney, brain, lung, ovary, prostate and testis at a very low level ([20] and see additional file 1). Recently, we demonstrated an induction of SLC30A7 expression by extracellular zinc deficiency [33].
SLC30A8 showed very high expression restricted to the pancreas. We detected a faint signal in four other tissues. We then analyzed SLC30A8 gene expression by PCR using a panel of 24 cDNAs prepared from different tissues. A specific PCR product was detected only in the pancreatic tissue sample (see additional file 2). This result agrees well with the in silico analysis.
From the EST analysis, SLC30A10 expression was restricted to fetal liver and fetal brain. It is the first zinc transporter predicted to have fetal-restricted expression. SLC30A10 and SLC30A1 are highly homologous. At birth, ZnT-1 protein is nearly undetectable, and ZnT-1 expression increases at the end of the first postnatal week [34]. We therefore speculate that ZnT-10 could play a role comparable to that of ZnT-1 during fetal development.

Conclusions
From genomic databanks analysis, we identified two novel SLC30 genes, SLC30A8 and SLC30A10, extending the SLC30 family to ten members. We determined an overall good correlation of ZnT in silico gene expression with expression patterns obtained by RT-PCR or immunomethods, particularly for highly tissue-specific genes.
As the average number of ESTs recovered per library was relatively low (few copies of ZnT sequence per library), we cannot definitively conclude that tissues without ESTs for a given ZnT do not express this gene at all. We must also keep in mind that the zinc status of the cells and, hence, the adaptive mechanisms to extracellular zinc concentrations were usually unknown for the sample tissues used for RT-PCR experiments or EST sequencing programs. In conclusion, this method provides a useful tool to complete gene families from sequencing programs and to produce preliminary expression data to select the proper biological samples for laboratory experimentation.
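The caveat about tissues without ESTs can be made concrete with a small calculation. The sketch below assumes that each EST is an independent draw from the mRNA pool, a simplification that ignores library normalisation and cloning bias. The function name and the numbers in the example are hypothetical.

```python
def prob_no_est(transcript_fraction, library_size):
    """Probability of sampling zero ESTs for an expressed transcript.

    Assumes each EST is an independent draw and that the transcript
    makes up `transcript_fraction` of the mRNA pool (illustrative model,
    not taken from the article).
    """
    if not 0.0 <= transcript_fraction <= 1.0:
        raise ValueError("transcript_fraction must lie between 0 and 1")
    if library_size < 0:
        raise ValueError("library_size must be non-negative")
    return (1.0 - transcript_fraction) ** library_size


if __name__ == "__main__":
    # Hypothetical case: transcript at 1 in 20,000 mRNAs, 5,000 ESTs sequenced.
    print(round(prob_no_est(1 / 20_000, 5_000), 3))
```

Under these assumed numbers, a transcript that is genuinely expressed would be missed in roughly three libraries out of four, which is why the absence of ESTs is not evidence of absence of expression.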

EST analysis for in silico determination of SLC30 genes tissue expression
The SLC30 sequences (ORF, 5' and 3' UTRs) were used for a BLASTN search of the human EST database through the NCBI BLAST web service [36]. The significant ESTs (bit score >150 and E-value <0.001) were sorted, and information regarding each cDNA library was retrieved from either the human Unigene databank [40] or the respective company catalogue. Libraries prepared from pooled tissue samples or derived from other libraries were excluded from the analysis. The frequency of each mRNA transcript for a given tissue was calculated. The ESTs were also analyzed with the Gene2EST program [41] to precisely locate the 5' and 3' ends of the transcript and the splice variants [42].
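The filtering and counting steps described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the record layout, the field names (`library`, `tissue`, `pooled`, `total_ests`) and the normalisation to hits per 10,000 ESTs sequenced per tissue are assumptions, since the text states only that a frequency per tissue was calculated.

```python
from collections import defaultdict

# Thresholds taken from the text: bit score > 150 and E-value < 0.001.
MIN_BIT_SCORE = 150.0
MAX_E_VALUE = 1e-3


def tissue_frequencies(hits, libraries, per=10_000):
    """Return significant EST hits per `per` sequenced ESTs for each tissue.

    hits: iterable of dicts with keys 'library', 'bit_score', 'e_value'
    libraries: dict mapping library id -> dict with keys
               'tissue', 'pooled' (bool), 'total_ests' (int)
    All field names are hypothetical; adapt them to the annotation source.
    """
    hit_counts = defaultdict(int)
    for hit in hits:
        lib = libraries.get(hit["library"])
        if lib is None or lib["pooled"]:
            continue  # unknown or pooled/derived libraries are excluded
        if hit["bit_score"] > MIN_BIT_SCORE and hit["e_value"] < MAX_E_VALUE:
            hit_counts[lib["tissue"]] += 1

    # Total ESTs sequenced per tissue, summed over all retained libraries.
    totals = defaultdict(int)
    for lib in libraries.values():
        if not lib["pooled"]:
            totals[lib["tissue"]] += lib["total_ests"]

    return {
        tissue: per * hit_counts[tissue] / total
        for tissue, total in totals.items()
        if total > 0
    }


if __name__ == "__main__":
    # Invented example data.
    libraries = {
        "lib1": {"tissue": "pancreas", "pooled": False, "total_ests": 5000},
        "lib2": {"tissue": "brain", "pooled": False, "total_ests": 20000},
        "lib3": {"tissue": "mixed", "pooled": True, "total_ests": 8000},
    }
    hits = [
        {"library": "lib1", "bit_score": 420.0, "e_value": 1e-50},
        {"library": "lib1", "bit_score": 310.0, "e_value": 1e-30},
        {"library": "lib2", "bit_score": 120.0, "e_value": 1e-10},  # fails bit-score cutoff
        {"library": "lib3", "bit_score": 500.0, "e_value": 1e-80},  # pooled library
    ]
    print(tissue_frequencies(hits, libraries))
```

Normalising by the number of ESTs sequenced per tissue keeps deeply sequenced tissues from dominating the profile; a different normalisation would change the absolute values but not the filtering logic.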

Sequence alignment and phylogeny
Predicted Homo sapiens ZnT protein sequences were aligned using the ClustalW program [43]. For phylogenetic analysis, bootstrapping (2000 replicate sets) and calculation of the consensus tree were performed with the DAMBE program by the neighbour-joining method [44]. Bootstrap analysis is based on multiple re-sampling of the original data and is the most common method of estimating the degree of confidence in the topology of phylogenetic trees. Zip-2 protein sequence (NP_055394) was used as an outgroup.
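The re-sampling step underlying bootstrap analysis can be sketched as below. The sketch only generates pseudo-replicate alignments by sampling alignment columns with replacement; tree building on each replicate and the consensus calculation, performed here with DAMBE, are not reproduced. The function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths)
    rng: a random.Random instance, so that runs can be made reproducible
    Columns are sampled with replacement, and each sampled column is kept
    intact across all sequences.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=2000, seed=0):
    """Yield `n_replicates` pseudo-replicate alignments (2000 in the text)."""
    rng = random.Random(seed)
    for _ in range(n_replicates):
        yield bootstrap_alignment(alignment, rng)


if __name__ == "__main__":
    toy = {"seqA": "HLAVDG", "seqB": "HLSVDA", "seqC": "HMSIDA"}  # invented data
    first = next(bootstrap_replicates(toy, n_replicates=1))
    print(first)
```

The percentage shown at an internal node of the consensus tree is then the share of replicate trees in which that grouping is recovered.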

Expression in human tissues
The presence and the level of SLC30A5, SLC30A7 and SLC30A8 mRNA were determined by PCR amplification of cDNA prepared from different human tissues. The products were analyzed by agarose gel electrophoresis, stained with ethidium bromide and photographed under UV light with a CCD camera.