The Anopheles gambiae glutathione transferase supergene family: annotation, phylogeny and expression profiles

Background Twenty-eight genes putatively encoding cytosolic glutathione transferases have been identified in the Anopheles gambiae genome. We manually annotated these genes and then confirmed the annotation by sequencing of A. gambiae cDNAs. Phylogenetic analysis with the 37 putative GST genes from Drosophila and representative GSTs from other taxa was undertaken to develop a nomenclature for insect GSTs. The epsilon class of insect GSTs has previously been implicated in conferring insecticide resistance in several insect species. We compared the expression level of all members of this GST class in two strains of A. gambiae to determine whether epsilon GST expression is correlated with insecticide resistance status. Results Two A. gambiae GSTs are alternatively spliced resulting in a maximum number of 32 transcripts encoding cytosolic GSTs. We detected cDNAs for 31 of these in adult mosquitoes. There are at least six different classes of GSTs in insects but 20 of the A. gambiae GSTs belong to the two insect specific classes, delta and epsilon. Members of these two GST classes are clustered on chromosome arms 2L and 3R respectively. Two members of the GST supergene family are intronless. Amongst the remainder, there are 13 unique introns positions but within the epsilon and delta class, there is considerable conservation of intron positions. Five of the eight epsilon GSTs are overexpressed in a DDT resistant strain of A. gambiae. Conclusions The GST supergene family in A. gambiae is extensive and regulation of transcription of these genes is complex. Expression profiling of the epsilon class supports earlier predictions that this class is important in conferring insecticide resistance.


Background
Glutathione transferases (GSTs) are a diverse family of dimeric proteins found in almost all living organisms. Originally studied for their role in detoxification of endogenous and xenobiotic compounds, they have since been found to have additional important roles as transport proteins and in protection against oxidative stress [1]. Each GST subunit consists of two domains, each containing two binding sites, the G site and the H site. The highly conserved G site binds the tripeptide glutathione and is largely composed of amino acid residues found in the Nterminal domain. The H-site or substrate binding site is more variable in structure and is largely formed from residues at the C-terminal [2].
Purification of independent homogenous GST preparations with differing substrate specificities indicated the presence of multiple forms of GSTs [3]. Subsequently, the availability of N-terminal sequence data led to the recognition of five classes of cytosolic GSTs in mammals, the alpha, mu, pi, theta and sigma classes [2,4,5] and an additional, structurally unrelated membrane bound microsomal class [6]. Recently the advent of large scale EST and full genome sequencing projects has led to a marked increase in the number of GST classes recognized. Some of these, such as the omega and zeta classes are represented in a wide range of species [7,8], whereas others, such as the mammalian kappa class [9], the insect epsilon class [10] and the plant tau and phi clases [11] have a more restricted distribution.
Most of these GST classes are encoded by multigene families. Alternative splicing [12,13] and the formation of heterodimers [14], can add a further level of heterogeneity to this enzyme family. With this level of diversity, assigning physiological functions to individual GSTs is a complex task, but progress towards this goal can be greatly facilitated by the process of cataloguing the number of genes within the supergene family. Armed with this information, details of expression profiles, induction mechanisms, tissue distribution etc. can be accurately obtained enabling biologically important questions to be addressed. Automatic annotation algorithms applied to assembled eukaryotic genomes provide projections of the sizes of gene families within a species. Using these tools the numbers of GST genes is estimated at 10 in Saccharomyces cerevisiae, 57 in Caenorhabditis elegans, 43 in Drosophila melanogaster, 37 in Anopheles gambiae, 46 in Arabidopsis thaliana and 40 in Homo sapiens [15]. Careful manual annotation is essential to confirm these predicted numbers. This process has led to revised sizes of the GST supergene family in A. thaliana and A. gambiae to 48 and 31 respectively [11,16]. (This gene count in A. gambiae includes three genes encoding putative microsomal GSTs but these will not be discussed further in this report).
To facilitate the functional characterization of insect GSTs, we have studied the annotation of each member of this supergene family in the mosquito A. gambiae. RT-PCR experiments demonstrate that all but one of the predicted GST genes are actively transcribed in adult mosquitoes and that alternative splicing of two GST genes contributes additional diversity. We compare this GST supergene family with that of a second Diptera, D. melanogaster and identify classes of GSTs that are conserved between the species and other classes that have undergone independent radiation.
The majority of studies on insect GSTs have focused on their role in conferring insecticide resistance (e.g. [17][18][19]) and, more recently, in protecting against cellular damage by oxidative stress [20,21]. Reactive oxygen species can be produced in response to infection by pathogens, and this phenomenon has been implicated in the defense mechanism of mosquitoes against malaria parasites [22]. A. gambiae, is therefore an ideal species in which to study the role of GSTs in both of these biological processes. This species is the major vector of malaria in Africa and as such is responsible for over 1 million deaths each year [23]. Efforts to control the disease by targeting the mosquito populations have relied on treatment of Anopheles breeding and resting sites with insecticides. The organochlorine DDT was the insecticide of choice for malaria control for much of the latter half of the 20 th century and is still employed in public health campaigns today but, in many malarious regions, resistance has rendered DDTbased control programmes ineffectual. In A. gambiae, DDT resistance is associated with increased GST activity [24]. Genetic mapping using microsatellite markers has located two loci associated with DDT resistance in A. gambiae [25]. By aligning the cytogenetic position of these resistance loci with the in situ position of physically mapped GST genes we previously identified two candidate resistance-associated GSTs [10]. Analysis of the draft genome sequence of A. gambiae identified a further six GST genes within this region of the genome [16]. We now report that the expression of multiple members of this gene cluster is elevated in DDT resistant insects.

Results and Discussion
Classification of A. gambiae GSTs Twenty eight genes putatively encoding cytosolic GSTs were identified in the A. gambiae genome (Table 1). Nomenclature guidelines originally proposed for vertebrate GSTs and later expanded to incorporate invertebrate classes [26], were employed when classifying the A. gambiae genes. This led to the renaming of nine genes previously described in the literature (changes are shown in Table 1).
The putative amino acid sequences of the A. gambiae GSTs were aligned using ClustalW [27] and GST-1 from C. elegans (Accession number CAA78471) as an outgroup (Figure 1). The alignment was manually truncated by the removal of twenty positions at both the N-terminal and Cterminal that contained excessive gaps and two small internal regions of poor alignment, one consisting of six residues in the linker region between the N and C domains and the second consisting of 21 residues in the C-terminal domain of which over 75% were gaps in the majority of sequences ( Figure 1). This manually truncated alignment, consisting of 218 residues, was used to determine the phylogenetic relationship between the A. gambiae GSTs by both distance and parsimony methods. Two of the GST genes (GSTd1 and GSTs1) are alternatively spliced to produce multiple transcripts with identical Ntermini but differing C-termini (see below). To avoid distortion of the phylogeny that may have resulted from inclusion of conserved N-termini, a second alignment was generated, in which a further 85 residues from the N-terminal, were removed and this alignment was also subjected to parsimony and distance methods. Although several different optimal trees were generated from the two alignments and two phylogenetic methods, their topology was essentially the same, differing only in the placement of four GSTs, GSTu1, GSTu2, GSTu3 and GSTe8, discussed below. A representative distance tree, derived from the N-terminal truncated alignment is shown in Figure 2.
The two largest GST classes in A. gambiae are the insect specific delta and epsilon classes with 12 and 8 members respectively. Support for the monophyly of these two classes is low in the tree shown although, when the 218 residue alignment is used, the bootstrap values are more supportive (data not shown). Criteria for inclusion in a particular class is based primarily on amino acid sequence identity and phylogenetic relationship, but chromosomal location and immunological properties, where known, were taken into account. Thus GSTe8, although sharing less than 29% amino acid identity with other members of the epsilon class, is found immediately adjacent to the seven epsilon GSTs on chromosome 3R, is immunologically related [28] and, in the majority of the trees, formed a weakly supported monophyletic group with the seven bona fide epsilon GSTs. Thus this GST was classified as the eighth member of the epsilon class [28]. Three GSTs, GSTu1, GSTu2 and GSTu3, are outliers from the major delta GST clade ( Figure 2). The phylogenetic relationship of these GSTs to the remainder of the family was not consistent between the different trees. Furthermore, these GSTs share less than 37% amino acid identity with other members of the A. gambiae delta GST class (pairwise amino acid identities between the three tentative delta class GSTs and the remainder of the class range from 22.1% between GSTd4 and GSTu2 to 36.3% identity Alignment of A. gambiae GSTs  - LMEKDDAVKQSFISTEDHTKFLQSRKNGENNYDILA------------between GSTd11 and GSTu3) and are not physically clustered with the majority of the delta class GSTs on chromosome 2L. Thus it is possible that these GSTs may belong to an as yet unrecognised GST class (or classes) and in the absence of clarifying immunological or biochemical data these GSTs have been designated as unclassified (denoted by a 'u').
Of the remaining, five A. gambiae GSTs, two belong to the theta class, originally thought of as the progenitor class of all GSTs [29]. The ubiquitous omega and zeta classes [7,8] are each represented by a single gene in A. gambiae, and a single sigma GST (alternatively spliced to produce two different transcripts (see below)) is also present. Support for the classification of these five GSTs is provided by a phylogenetic comparison with their orthologous classes from other organisms (Figure 3).
With the exception of the sigma class, all of the non-insect specific classes are expanded in D. melanogaster relative to A. gambiae. Neither species has any sequence related to the mitochondrial kappa class found in mammals [9]. The endogenous function of these mammalian GST classes in insects has not been clearly resolved and therefore the significance of the difference in size of these classes in the two Diptera examined, is, at present, unknown.

Amplification of A. gambiae cDNAs
Transcripts from 27 of the 28 A. gambiae GST genes were detected in fourth instar larvae or one-day-old adults by RT-PCR. Expression of GSTd9 was not detected in any life stage and thus it is possible that this gene represents a silent pseudogene, as suggested earlier [10]. A second putative pseudogene is GSTd6. Utilisation of the first inframe stop codon for this gene (as present in the genome sequence database) would generate a transcript with 885 bp of coding sequence (> 130 bp longer than any of the other cytosolic GST genes in this species) and the putative translation would encompass a string of 14 glutamine residues. We therefore sequenced the intergenic region between GSTd6 and the neighbouring gene, GSTd11 to detect any possible frame shifts or sequencing errors that may have masked the stop codon but none were detected. A GSTd6 transcript of 666 bp was amplified by RT-PCR but attempts at 3' RACE have so far failed to detect the 3' end of this transcript.

Intron positions and sizes
The cDNA sequences of the A. gambiae GSTs were aligned with the genomic DNA sequences retrieved from the genome database and the position and sizes of the introns noted. Intron size ranged from 64 bp in GSTe1 and GSTd4 to 13,937 bp in GSTz1 with the majority of introns ranging from 50 to 100 bp (Figure 4). Only two of the GSTs are intronless, GSTd2 and the putative pseudogene, GSTd9.
Neighbour-joining tree illustrating the relationship between representative insect and mammalian GST classes Figure 3 Neighbour-joining tree illustrating the relationship between representative insect and mammalian GST classes. GSTs from the non-insect specific classes from A. gambiae, D. melanogaster were aligned with representatives from mammalian classes using ClustalW. The tree was constructed by the neighbour-joining method from a similarity matrix of pairwise comparisons made using the Jukes-Cantor algorithm. Sequences shown in blue are from A. gambiae. Dm = D. melanogaster, Hs = Homo sapiens, Rn = Rattus norvegicus. Nodes with distance bootstrap values (500 replicates) of > 70% are marked by*. There are thirteen unique intron sites and 42 introns within the A. gambiae GST supergene family. The majority of the introns (28) are phase 0 introns, i.e. the intron does not interrupt a codon. There is a considerable conservation of intron positions within the different GST classes ( Figure 2) and one intron is found in 17 GST genes spanning three different classes. Interestingly this conserved intron, found approximately 50 amino acid residues from the N-terminal, is also the splice site for the alternative transcripts of GSTs1 and GSTd1 (see below). It has been proposed, by proponents of the 'introns-early' hypothesis, that different exons correspond to different domains in a protein [31]. The highly conserved GST intron however, splits the N-terminal domain (roughly residue 1-80). In addition, the phase of this intron is not conserved between the different classes, and thus the classification of this as an ancient intron is not well supported.

Clustering of GST genes in the genome
The A. gambiae GST genes are located on all three of the mosquito's chromosomes but two large, gene-specific clusters exist ( Figure 5). The eight members of the epsilon GST class are clustered on chromosome 3R division 33B. In D. melanogaster, a cluster of 10 epsilon GSTs are located on chromosome 2R, division 55C [30] (Figure 6B). A search of the D. melanogaster genome identified a further four members of this family also present on chromosome 2R but at a distant location. These four singletons belonging to the D. melanogaster epsilon GST class do not form a monophyletic clade with the cluster of 10 DmGSTE genes ( Figure 6A). Instead, one of these Drosophila GSTs (CG4688) is a probable ortholog of A. gambiae GSTe8.
Histogram showing the frequency of intron sizes in the A. gambiae GST supergene family  Figure 6A and 6B).
The A. gambiae delta class GSTs are located in two closely linked clusters of six genes on chromosome 2R divisions 18B and 19D ( Figure 7B). Ten members of the D. melanogaster delta class are sequentially arranged on chromosome 3R, division 87B. We have identified an eleventh putative member of this class (CG17639) in D. melanogaster that is a clear ortholog of A. gambiae GSTd7 [10] ( Figure 7B). A detailed analysis of the phylogenetic relationship between members of the A. gambiae delta GST class, supports the suggestion by Holt et al [15] that the two clusters of paralogous delta GSTs are partly a result of segmental duplication. The clustering of GSTd8 with GSTd11, GSTd6 with GSTd9 and GSTd5 with GSTd10 is consistent with a duplication of a cluster of three paralogous genes, with subsequent further local duplications in one block to give rise to GSTd4 and GSTd3 ( Figure 7A and 7B).
In both A. gambiae and D. melanogaster local duplications in the epsilon and delta GST families have led to independent expansions of these gene classes. Subsequent diversification of these enzymes has presumably facilitated the adaptation of the two Diptera to their different ecological niches.
Schematic diagram indicating the organisation of GST genes in the A. gambiae genome Figure 5 Schematic diagram indicating the organisation of GST genes in the A. gambiae genome. The numbers represent polytene chromosome divisions. Those genes marked with * are alternatively spliced to produce multiple transcripts. Alternative Splicing An A. gambiae delta class GST, GSTd1, that is alternatively spliced to give four different mature transcripts each encoding subunits with differing biochemical properties, has already been described [12]. Here we report that a second member of this supergene family belonging to the sigma class, is also alternatively spliced in a similar manner producing two distinct transcripts both of which have been detected in adults by RT-PCR ( Figure 8). As with GSTd1 the two transcripts of GSTs1 share a common 5' exon but differing 3' exons (64.3% identity at the amino acid level). As the carboxyl region of the protein contains the majority of the residues involved in substrate binding (the H-site), the two subunits encoded by the sigma GST gene in A. gambiae are likely to encode proteins with differing substrate specificities.
The sigma class in D. melanogaster is also represented by a single gene, DmGSTs1. The Drosophila GST has an N-terminal extension of 46 amino acid residues relative to the Anopheles ortholog. This N-terminal extension is not essential for catalytic activity and may play a role in attaching the D. melanogaster protein to indirect flight muscles [21]. The intron positions of DmGSTS1 and A gambiae GSTs1-1 are conserved indicating a common A: Neighbour-joining tree illustrating the relationship between A. gambiae and D. melanogaster epsilon class GSTs Figure 6 A: Neighbour-joining tree illustrating the relationship between A. gambiae and D. melanogaster epsilon class GSTs. ClustalW was used to align the putative amino acid sequences of the 8 epsilon class GST subunits in A. gambiae with the 14 putative epsilon class GSTs from D. melanogaster and with GSTS1 from D. melanogaster (as an outgroup). The tree was constructed by the neighbour-joining method from a similarity matrix of pairwise comparisons made using the Jukes-Cantor algorithm. Nodes with distance bootstrap values (500 replicates) of >70% are marked by *. B, Epsilon class GST arrangement in A. gambiae and D. melanogaster. Arrows mark directions of transcription. The solid bars denote the genes and the vertical lines within these mark the approximate introns position. Neighbour-joining tree illustrating the relationship between A. gambiae and D. melanogaster delta class GSTs Figure 7 Neighbour-joining tree illustrating the relationship between A. gambiae and D. melanogaster delta class GSTs. ClustalW was used to align the putative amino acid sequences of the 18 delta GST subunits in A. gambiae with the 11 putative delta class GSTs from D. melanogaster and with GSTS1 from D. melanogaster (as an outgroup). The tree was constructed by the neighbour-joining method from a similarity matrix of pairwise comparisons made using the Jukes-Cantor algorithm. Nodes with distance bootstrap values (500 replicates) of >70% are marked by *. B, Delta class GST arrangement in A. gambiae and D. melanogaster. Arrows mark directions of transcription. The solid rectangles denote the exons of the genes. Agcp2459 and agcp2052 are two putative genes, identified by the automatic analysis of the A. gambiae genome, that interrupt the delta GST cluster on chromosome 2R, division 18B.
Anopheles gambiae Drosophila melanogaster B ancestor for these genes. Local duplication of the carboxyl exons of the A. gambiae GSTs1 gene presumably occurred subsequent to the speciation event separating the two Diptera.
Alternative splicing as a means of increasing the level of heterogeneity of GST subunits appears to be a rare phenomenon outside of the genus Anopheles, having, as far as we are aware, only been reported in a GST from the nematode Onchocerca volvulus [13]. Two human cDNA clones encoding alternative transcripts of a mu class GST have been detected but both are incomplete and thus unlikely to encode functional GSTs [32].

Expression profiling of epsilon GSTs
Our principal interest in insect GSTs stems from the role that these enzymes play in insecticide resistance. GSTs have been implicated in the conjugation, metabolism or remediation of harmful effects of all major classes of insecticides, but our studies have focused on the role of these enzymes in the detoxification of DDT. From genetic mapping studies [25] we know that a major DDT resistance locus co-localises with the epsilon GST cluster on chromosome 3R, division 33B. We have previously demonstrated that expression of one member of this GST class, GSTe2, is up-regulated in a DDT resistant strain [10] and that a recombinant homodimer of GSTe2 is very efficient at metabolising DDT [28]. In the present study we used quantitative PCR to compare the expression levels of all members of the epsilon class in DDT resistant and suscep-

B. D. melanogaster, GSTs1
DmGSTS1 tible strains. The results are shown in Figure 9A. Five of the eight GSTs are expressed at a significantly higher levels in the resistant strain. The difference in expression levels is greatest for GSTe2 and GSTe7 (Table 2). Of the five GSTs that are expressed at higher levels in the DDT resistant strain, only GSTe2 has confirmed DDT dehydrochlorinase activity. The ability of recombinant homodimers of GSTe3 and GSTe7 to metabolise DDT has not yet been determined. However, we have previously shown that neither GSTe1 or GSTe4 homodimers possess DDT dehydrochlorinase activity [28] and yet the expression of both of these genes is upregulated in the DDT resistant strain ( Table 2). This result is most readily explained if it is assumed that the expression of multiple genes in the epsilon cluster is under the control of a common regulatory element and that an alteration in this element accounts for the increase in expression of the six epsilon GST genes. Nonetheless other explanations, such as modifications in the stability of the mRNA transcripts in the resistant strain can not be eliminated at this stage.
There was a large variation in the number of transcripts detected for each of the individual GSTs within both the susceptible and resistant strains. For example, the normalised copy number of GSTe2 transcripts was over 26-fold greater than that of GSTe3 in the resistant strain (Table 2). This indicates that the basal expression of individual GSTs within a cluster is independently regulated. To confirm this result, we repeated the qPCR for four of the GSTs using gDNA as a template. If the differences seen with cDNA reflect genuine differences in GST transcript copy number as we hypothesised, then the genomic copy number for each of the four GSTs would be approximately the same. Figure 9 and Table 2 support our hypothesis. Furthermore, as the genomic qPCR was carried out on DNA extracted from both susceptible and resistant mosquitoes, the results shown in Figure 9B support our unpublished data from Southern blots demonstrating that the increases in GST transcript levels seen in the resistant strain is not due to gene amplification (F. Ortelli, unpublished data).
As a final test of the reliability of our qPCR results we calculated the expected copy number of a single copy gene in 10.4 ng of A. gambiae genomic DNA (the starting amount of gDNA template in the qPCR reactions with Kisumu DNA). The theoretical copy number (33,120) was within an order of magnitude of the range of values obtained (11,205).

Conclusion
The insect GST supergene family encodes a diverse set of proteins. The availability of the full genome sequence for two insect species has enabled the full extent of this protein family in insects to be realised. Multiple members of the epsilon class are upregulated in a DDT resistant strain of A. gambiae and it is proposed that this class plays a major role in the detoxification of xenobiotics. However, little is known about the endogenous substrates of insect GSTs. Functional genomics approaches will no doubt contribute to our understanding of the role of individual GSTs in insects and perhaps then the reason for the extensive diversity of this enzyme family will become clear.

Mosquito Strains
The

Annotation of A. gambiae GST genes
Members of the GST supergene family were identified in the A. gambiae genome by BLAST searches [34] using multiple representative sequences from each GST class as query sequences. The sequences retrieved from the genome were manually annotated to predict transcription initiation and termination sites and intron/exon boundaries using BlastX comparisons of putative amino acid translations [35]. Primers pairs were designed to amplify the full length of the coding sequence of each gene from A. gambiae cDNA.

cDNA synthesis
Total RNA was extracted from individual mosquitoes using the TRI reagent (SIGMA), according to the manufacturer's instructions. The RNA was treated with DNase to remove any contaminating genomic DNA and the mRNA was reverse transcribed into cDNA using superscript II (GIBCO BRL) and an oligo (dT) adapter primer (5'-GACTCGAGTCGACATCGA(dT) 17 -3'). The PCR conditions for amplifying GST cDNAs were determined empirically for each GST. Products of the expected size were subcloned into pGEM T-easy vectors (Promega) and used as templates for sequencing. At least three independent clones were sequenced for each GST. Sequencing reactions were performed using Beckman chemistry and the resultant products analysed on a Beckman CEQ800 capillary sequencer.

Quantitative PCR
Total RNA was extracted from pools of ten (five male and five female) one-day old adult mosquitoes and the mRNA was reverse transcribed into cDNA as described above. Plasmids containing the gene of interest were diluted to produce seven standard templates at concentrations Expression levels of the epsilon class GSTs in two strains of A. gambiae Figure 9 Expression levels of the epsilon class GSTs in two strains of A. gambiae. A Transcript copy number determined by qPCR using cDNA as a template. Results represent the average of three independent cDNA samples. B gene copy number, determined by using three independent aliquots of mass homogenates of gDNA as a template. In both cases, copy numbers were normalised against the copy number of the S7 genes and error bars represent standard deviations. Statistically significant differences (*p < 0.05, **p < 0.01) between ZAN/U and Kisumu are indicated. Kisumu is an insecticide susceptible strain, ZAN/U is resistant to DDT.
A B ranging from 1 ng/µl to 1 fg/µl. The incorporation of the fluorescent dye SYBR Green during PCR amplification of these templates was detected using Roche LightCycler technology and Quantitect SYBR Green chemistry (Qiagen). For each experiment two replicates of each of the seven plasmid templates and two replicates of three cDNA samples from both strains, all synthesised from independent RNA extractions, were used. The binding sites of the primers used for template amplification were chosen to avoid regions of allelic variation within the gene sequences [28]. A control plasmid containing a partial fragment of the S7 ribosomal protein gene [36] was used to standardise the initial cDNA concentration in each sample. Details of the primer sequences and PCR conditions are given in Table 3. The samples were analysed using LightCycler Software V3 (Roche) and the number of GST copies in the cDNA sample was determined as described previously [37].
The experiment was repeated on genomic DNA samples extracted from approx 1 g of Kisumu or ZAN/U strains as described previously [38]. For the GST genes for which the initial primers spanned introns, new plasmids were constructed containing the genomic fragment of the gene.

Phylogenetic analysis
Putative amino acid sequences of the GSTs were aligned using ClustalW [27]. The alignment was manually truncated as described in the Results and Discussion section. Evolutionary distances were calculated using the Jukes-Cantor algorithm [39] and phylogenetic trees were determined by the neighbor-joining method [40] with Table 2: Quantitative PCR results of A. gambiae epsilon GST genes. The transcript copy number or gene copy number was determined using cDNA or gDNA respectively. The copy numbers were normalised for variations in initial template concentration by dividing each sample by the copy number of the ribosomal protein gene, S7. The final column shows the ratio of the transcript or gene copy number between the resistant ZAN/U and insecticide susceptible Kisumu strain. Statistically significant differences (*p < 0.05, **p < 0.01) between ZAN/U and Kisumu are indicated.

Authors Contributions
YD and LR carried out the expression analysis. FO participated in the sequencing and performed the majority of the data analysis. JH conceived the study and participated in its design. HR contributed to the sequencing and analysis of data, drafted the manuscript and coordinated the project. All authors read and approved the final manuscript.