Analyses of murine GBP homology clusters based on in silico, in vitro and in vivo studies

The interactions between pathogens and hosts lead to a massive upregulation of antimicrobial host effector molecules. Among these, the 65 kDa guanylate binding proteins (GBPs) are interesting candidates as intricate components of the host effector molecule repertoire. Members of the GBP family are highly conserved in vertebrates. Previous reports indicate an antiviral activity of human GBP1 (hGBP1) and murine GBP2 (mGBP2). We recently demonstrated that distinct murine GBP (mGBP) family members are highly upregulated upon Toxoplasma gondii infection and localize around the intracellular protozoan T. gondii. Moreover, we characterised five new mGBP family members within the murine 65 kDa GBP family. Here, we identified a new mGBP locus named mGbp11. Based on bacterial artificial chromosome (BAC), expressed sequence tag (EST), and RT-PCR analyses, this study provides a detailed insight into the genomic localization and organization of the mGBPs. These analyses revealed a 166-kb spanning region on chromosome 3 harboring five transcribed mGBPs (mGbp1, mGbp2, mGbp3, mGbp5, and mGbp7) and one pseudogene (pseudomGbp1), as well as a 332-kb spanning region on chromosome 5 consisting of six transcribed mGBPs (mGbp4, mGbp6, mGbp8, mGbp9, mGbp10, and mGbp11) and one pseudogene (pseudomGbp2). Besides the strikingly high homology of 65% to 98% within the coding sequences, the mGBPs of the chromosome 5 cluster also exhibit a highly homologous exon-intron structure, whereas the mGBPs on chromosome 3 reveal a more divergent exon-intron structure. This study details the comprehensive genomic organization of mGBPs and suggests that a continuously changing microbial environment has exerted evolutionary pressure on this gene family leading to multiple gene amplifications. A list of links for this article can be found in the Availability and requirements section.

Human and murine GBPs possess the unique ability to bind to agarose-immobilized GMP (guanosine monophosphate), GDP (guanosine diphosphate), and GTP (guanosine triphosphate) with the same affinity, thereby differing from heterotrimeric or Ras-like GTP-binding proteins [2]. In addition, they hydrolyse GTP not only to GDP but also to GMP [9]. Further biochemical properties of the GBPs are the low binding affinity to nucleotides, their stability in the absence of guanine nucleotides and their high turnover GTPase activity [10]. Remarkably, the sequence of the common G4-motif N/TKxD is modified in the GBPs to the unique T(L/V)RD motif [10]. In the case of hGBP1 a nucleotide-dependent oligomerization and concentration-dependent GTPase activity has been observed [11]. These biochemical properties classified the GBPs as distantly related family members of the dynamin superfamily despite the lack of any sequence homology of the primary sequences [11,12]. The similarity to the dynamin family is further corroborated by the analysis of the crystal structure of hGBP1. It has an amino-terminal globular domain containing the GTP binding region and an elongated carboxy-terminal series of α-helices. The GBPs possess the common 'dynamin domain structure' with a GTPase domain (~300 residues), a 'middle' or 'assembly' domain (150-200 residues) and a GTPase effector domain (~100 residues) [11].
Although the GBPs were discovered almost 30 years ago, little is known about their biological function. It has been suggested that the GBPs are important for cell growth regulation, as demonstrated for mGBP2 and hGBP1 [7,13]. Both hGBP1 and mGBP2 also alter matrix metalloproteinase (MMP) gene expression and thereby change cellular interactions with the extracellular environment [14]. In addition, hGBP1 was reported to be involved in paclitaxel resistance in ovarian cancer cell lines [15]. Further studies revealed that hGBP1 and mGBP2 exhibit a moderate antiviral activity against vesicular stomatitis virus (VSV) and encephalomyocarditis virus (EMCV) [16,17]. Recently, we have demonstrated that mGBPs are highly upregulated in mice after infection with Listeria monocytogenes or Toxoplasma gondii, and localize around the parasitophorous vacuole of T. gondii, thus suggesting that the mGBPs play a role in the defense against intracellular bacteria and apicomplexa [5].
Besides humans and mice the GBPs have been found in rats [18], chicken [19], fish [20], and several other vertebrate species. In humans seven orthologs and at least one pseudogene have been identified [21,22]. In mice five GBPs have been described [2,23,24]. Recently, one additional mGBP was discovered by an in silico study [21]. In search of new IFNγ regulated genes using Affymetrix analyses we independently identified mGbp6, mGbp7, and mGbp8 [5]. Further comprehensive genome and sequence analyses yielded two more homologous genes, named mGbp9 and mGbp10 [5]. The subsequent investigation of mGBP gene loci revealed an incorrect assembly concerning the mGbp8 locus in the genome databases (Ensembl, NCBI). Thus, to clarify the genomic organization of mGBP coding genes, we used BAC and EST sequences obtained from NCBI. Based on these sequence analyses we were able to identify another mGBP locus named mGbp11. In this study, we address the genomic organization and localization of each mGBP family member and present a revised assembly of the murine GBP homology clusters on chromosomes 3 and 5.

The mGBP genes are arranged in two clusters located on chromosomes 3 and 5
Although the first guanylate binding protein had been described almost 30 years ago further family members have been discovered lately. A recent in silico study has proposed a number of six mGBPs and three pseudogenes [21]. We recently raised the number up to ten mGBP members [5]. In this study, we performed extensive homology searches on the whole genome to identify further mGBP loci. Single exons of known mGBPs were used for homology searches against murine genome databases using the BlastN algorithm [25]. Thereby, another mGBP family member was discovered, now designated mGBP11 (Acc. No. EU304258). The mGbp11 locus was confirmed by BAC analyses. Hence, we extend the family to eleven mGBPs and two pseudogenes which all cluster on two chromosomes. One gene cluster is located on chromosome 3 within the H3 region and the second gene cluster is found within the E5 region on chromosome 5 (Fig. 1A). For further analysis of these two gene clusters we used BAC sequences obtained from NCBI. On chromosome 3 the BACs RP23-100J23 and RP24-314I8 cover the respective chromosomal regions between 142.44 MB and 142.60 MB harboring mGbp1, mGbp2, mGbp3, mGbp5, mGbp7, and a pseudogene named pseudomGbp1 (Fig. 1B). This gene cluster spans approximately 166 kb. The length of each gene locus ranges from 3.7 kb (pseudomGbp1) to 34.9 kb (mGbp1). The mGBPs located on chromosome 3 are all transcribed from the positive strand.
Further, we used the BACs RP24-63G23, RP23-329M7 and RP24-210D14 for analyses of the region between 105.25 MB and 105.58 MB on chromosome 5 (Fig. 1C). By means of these BAC sequences we were able to determine the precise loci for mGbp4, mGbp6 (formerly mpa2l), mGbp8, mGbp9, mGbp10, mGbp11, and the pseudogene pseudomGbp2. This cluster has approximately twice the size of the cluster on chromosome 3 with an extension of 332 kb. The length of each gene locus on chromosome 5 ranges from 23 kb (mGbp6) to 44 kb (pseudomGbp2). In contrast to the mGBPs located on chromosome 3 the mGBPs on chromosome 5 are all transcribed from the negative strand.

Conserved exon-intron structure in the mGBPs
Besides their clustering on two chromosomes, the mGBPs share a highly similar genomic structure (Fig. 2). All members of the 65 kDa mGBP gene family consist of eleven exons, except mGbp8, which lacks exon 6. Since exon 6 is composed of 246 basepairs, a multiple of three, its absence does not generate a frameshift. The cloning and sequencing of mGBP8 cDNA confirmed these genomic findings. Interestingly, the translation of all mGBPs starts within exon 2.
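The frame argument rests on simple arithmetic: an exon whose length is a multiple of three removes only whole codons when it is absent. The following is a minimal sketch in Python, using the exon 6 lengths reported in this section (246 bp, and 243 bp for mGBP1, mGBP2, and mGBP5); the function name is hypothetical and serves only as an illustration.

```python
def skipping_preserves_frame(exon_length_bp: int) -> bool:
    """Return True if removing an exon of this length keeps the reading frame.

    Hypothetical helper for illustration: a length divisible by three
    corresponds to a whole number of codons.
    """
    return exon_length_bp % 3 == 0


# Exon 6 lengths reported for the mGBPs (in bp).
for length in (243, 246):
    whole_codons = length // 3
    print(length, skipping_preserves_frame(length), whole_codons)
```

Both lengths are divisible by three, so the absence of exon 6 in mGbp8 corresponds to the loss of 82 codons without a shift of the downstream reading frame.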
The size of the first non-coding exon of the mGBPs on chromosome 3 ranges from 65 bp (mGBP1) to 205 bp (mGBP2) (Fig. 2A). For mGBP3 several alternative 5' UTR exons were identified. In the database 26 EST and cDNA sequences which cover the 5' UTR from mGBP3 were found: seven ESTs contain exon 1-Ia upstream of exon 2, three sequences harbor exon 1-Ia and 1-Ib in combination, 16 sequences comprise exon 1-II and only one EST was found having exon 1-III as a 5' non-coding exon. For mGBP7 two different splice sites in the 5' UTR of exon 2 were observed as reported previously [21]. Moreover, for mGBP5 an alternative splice form, mGBP5a, which lacks exon 3, exon 4, and a part of exon 5, and possesses an exon 10a, has been described [24]. Exons 6 of mGBP1, mGBP2, and mGBP5 consist of 243 bp whereas exons 6 of the other mGBPs span 246 bp.
Interestingly, the mGBPs on chromosome 5 show even higher similarities concerning the genomic organisation (Fig. 2B). The first exons of these mGBPs have nearly identical sizes (between 106 and 108 bp). Only for mGBP4 alternative 5' non-coding exons were observed (1-I, 1-II, and 1-III). EST sequences harboring either exon 1-I or exon 1-II as well as all three alternative 5' UTR exons were found in the database. Furthermore, we were able to identify an alternative splice form of mGBP4. Due to a mutation at the splice donor site in intron 2 two different transcripts, named mGBP4 and mGBP4.1, are generated [26]. Except exon 3 of mGBP4, all mGBPs on chromosome 5 share identical sizes of the individual coding exons 3 to 10. Even the intron sequences are highly conserved in these genes. In particular, intron 4, intron 7, intron 8, intron 9, and intron 10 exhibit high similarities. Comparable to all other mGBPs the pseudogene pseudomGbp2 shows a highly conserved exon-intron structure. Yet, an inversion of 'exon 1' and 'exon 2' results in a non-functional locus. Taken together, due to their high structural homologies these loci seem to be duplicated quite recently.

In vitro and in vivo analyses of the new mGBP members
After determination of the precise exon-intron structure of each mGBP we confirmed the predicted sequences by cloning and sequencing the corresponding cDNAs out of IFNγ stimulated macrophages. These studies revealed that mGBP11 has a premature stop codon within exon 8 leading to an ORF sequence with only 1329 bp. The amino acid sequence and the GTP-binding motif are depicted in additional file 1.
In order to elucidate the basal expression and inducibility of the new mGBP family members RT-PCR analyses were performed. Therefore, we compared the expression levels in unstimulated and IFNγ stimulated ANA-1 macrophages (Fig. 3A). The mGBPs were only detectable upon stimulation, except mGBP9, which was already expressed in unstimulated cells. Besides IFNγ stimulation, the infection of C57BL/6 mice with L. monocytogenes led to a rapid upregulation of all tested mGBPs in the liver (Fig. 3B). Interestingly, slight mGBP11 RNA expression was detectable in the liver of uninfected mice.

Sequence alignments and homologies of the mGBPs
Based on ORF sequences of the 11 mGBPs we generated a cDNA alignment using the ClustalW algorithm (Fig. 4 and additional file 2). As shown in the alignment, the mGBPs share a high degree of homology throughout their coding sequences, with the 5' part being the most conserved region. In particular, mGBP6 and mGBP10 are the most homologous mGBPs, differing in only 30 basepairs within an ORF sequence of 1836 bp. The conserved 5' part codes for the four typical GTP binding motifs. Although there are some differences in the nucleotide sequence of the GTP binding motifs, the amino acid sequences are almost identical (Table 1). In detail, the G1 motifs of mGBP2, mGBP3, mGBP5/5a, mGBP6, mGBP7, mGBP8, mGBP9, and mGBP11 are identical, the G1 motifs of mGBP1 and mGBP10 differ in two amino acids, and the G1 motif of mGBP4/4.1 differs in three amino acids. The G2 and G3 motifs are the same among all mGBPs. Regarding the G4 domain the mGBPs can be divided into the two subgroups 'TVRD' (mGBP4.1, mGBP6, mGBP7, mGBP8, mGBP9, and mGBP11) and 'TLRD' (mGBP1, mGBP2, and mGBP5), except mGBP3 with an 'AVRD' and mGBP10 with an 'IVRD' motif (Table 1). Most likely the functional consequences are minor since the amino acid variations in different G4 motifs lead to conservative exchange of aliphatic (L/V) and hydrophobic (T/A/I) amino acids. Interestingly, the mGBPs from the 'TLRD' group possess a C-terminal CaaX motif, which can be modified by isoprenylation [27]. Overall, the 3' parts of the mGBPs are more divergent than the 5' parts.
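The relationship between a count of differing positions and a percent identity can be made explicit with a short calculation. The sketch below, in Python, uses the figures stated above for mGBP6 and mGBP10 (30 differing basepairs in an ORF of 1836 bp); the function names are hypothetical, and the mismatch counter assumes two sequences that are already aligned to equal length, which is a simplification of what an alignment program reports.

```python
def count_mismatches(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def percent_identity(differences: int, aligned_length: int) -> float:
    """Percent of aligned positions that are identical."""
    return 100.0 * (aligned_length - differences) / aligned_length


# Figures given in the text for mGBP6 versus mGBP10.
print(round(percent_identity(30, 1836), 1))  # 98.4
```

With these inputs the identity is about 98.4%, which is the upper end of the homology range reported for the chromosome 5 cluster.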
Using the maximum likelihood method we generated a phylogenetic tree of the mGBP family based on the ORF sequences (Fig. 5). This phylogenetic analysis revealed three predominant homology clusters. Two homology clusters are located on chromosome 3. The first homology cluster consists of mGbp1, mGbp2, and mGbp5/5a. The second homology cluster contains mGbp3 and mGbp7. The homologies between the members of these two clusters are lower than 60% (Table 2). With approximately 70% amino acid identity, the second cluster is more closely related to the mGBPs in the third homology cluster on chromosome 5, which encompasses mGbp4, mGbp6, mGbp8, mGbp9, mGbp10, and mGbp11. Within the cluster on chromosome 5 sequence identities range from 65% to 98% (Table 2). The short branches indicate a recent duplication of these genes on chromosome 5.

Discussion
In order to find new IFNγ regulated host effector molecules we were able to identify and characterize five novel members of the mGBP family [5]. Further, we showed that all hitherto identified members of the mGBP family are IFNγ induced and moreover are highly upregulated in mice after infection with L. monocytogenes or T. gondii. Furthermore, we demonstrated that in infected cells most mGBPs surround the parasitophorous vacuole of T. gondii [5]. Consecutively, within this study, comprehensive homology and motif searches against public databases (NCBI, Ensembl) using EST and BAC sequences resolved the genomic organization and localization of the mGBPs in more detail. During these analyses we were able to identify the additional mGBP member mGBP11. The scope of this study was to determine the precise loci of the 11 mGBPs and of the two pseudogenes, to compare the structure and organization of the mGBPs, to compile all cDNA sequences, and to verify the expression of mGBP mRNAs.
Figure 3. Transcriptional analyses of mGBP6, mGBP7, mGBP8, mGBP9, mGBP10, and mGBP11. Amplification of mGBP6, mGBP7, mGBP8, mGBP9, mGBP10, and mGBP11 using specific primers listed in additional file 3 and cDNAs derived from RNA of unstimulated and IFNγ stimulated ANA-1 macrophages (A), and RNA from liver of uninfected and L. monocytogenes infected C57BL/6 mice (B). GAPDH primers were used as an internal control.

Genomic organization of the mGBPs
The combined analyses revealed two mGBP homology clusters on chromosomes 3 and 5. One mGBP cluster is located within the H3 region on chromosome 3, which is in contrast to previously published data where the cluster was mapped to the H1 region [21]. The second cluster is located in the E5 region on chromosome 5, which is in accordance with Olszewski et al. [21]. Furthermore, Olszewski et al. noted that within the mGBP cluster on chromosome 5 the only functional mGBP gene is mGbp4 and that in addition three pseudogenes (pseudomGbp2, pseudomGbp3, and pseudomGbp4) are located on this chromosome. Our in silico and mRNA sequence analyses now clearly demonstrate that besides mGbp4 five expressed mGBPs (mGbp6, mGbp8, mGbp9, mGbp10, and mGbp11) are located on chromosome 5 (this study and [5]). In a previous report, we have shown that the mGbp4 locus does not encode a complete mGBP4 protein [26]. For pseudomGbp2 we could not find any corresponding EST sequence and were not able to clone a cDNA corresponding to pseudomGBP2, thus excluding the possibility of an alternative start downstream of exon 2. Based on BAC analyses we could demonstrate that the two pseudogenes pseudomGbp3 and pseudomGbp4 described by Olszewski et al. are parts of the mGbp8 locus, which was virtually disrupted by the Abcg3 gene locus in a former incorrect assembly within the public databases (see also [21]). Further transcript analyses revealed a functional mGbp8 locus and showed an IFNγ dependent upregulation of the mGBP8 mRNA comparable to mGBP6, mGBP9, mGBP10, and mGBP11. Similarly, all these mGBPs were highly induced upon L. monocytogenes infection in C57BL/6 mice (this study and [5]). The BAC sequences as well as our sequenced cDNA clones of the newly identified mGBP11 contain a premature stop codon within exon 8, leading to an ORF sequence of only 1329 bp. However, in the database (NCBI) one cDNA of mGBP11 without a premature stop was found (Acc. No. BC111039).
It might be possible that the presence of different mGBP11 cDNAs is due to allelic variation. Further studies have to clarify, whether from this locus a functional protein can be translated and whether other mouse backgrounds differ in exon 8.
In a recently published report, some subtle differences to our extensive genomic analyses have been described [28]. Firstly, in this report no mGBP7 gene is presented. Secondly, mGBP6 in [28] is termed mGBP7 based on our NCBI database submission in 2006 (BK005760, [5]). Thirdly, mGBP12 has been deposited in 2007 as mGBP11 by us (EU304258, this study). To keep consistency between database and nomenclature we propose to refer to the mGBP assignment in Figure 1. This is also in accordance with the extensive protein sequence and functional analyses which were provided previously [5]. mGBP13 has been described as a pseudogene by Olszewski et al. and here (see Fig. 1, pseudomGbp1). Unfortunately, no description of the methods used for the identification of the mGBP13 locus is given in [28]. Therefore, further studies are required to confirm whether this is a functional mGBP locus.
Besides the chromosomal localization of the mGBPs we also elucidated the exon-intron structure of these genes. Interestingly, all mGBPs consist of eleven exons with an impressively similar gene organization, with only one exception in mGbp8 which lacks exon 6. In addition, the translation of all mGBPs starts in exon 2. For mGbp3 and mGbp4 alternative non-coding exons were found in the 5' regions. The usage of different 5' exons may influence the stability of the mRNAs [29]. Indeed, the frequencies of ESTs of mGBP3 and mGBP4 with the different alternative 5' exons are quite variable, so their functional significance has to be validated. It has been reported that mRNAs of genes with alternative 5' coding or non-coding exons are often expressed in a tissue-specific manner [30]. Further studies will be necessary to verify a potential tissue-specific expression/regulation of the different mRNA isoforms of mGBP3 and mGBP4.
Figure 5. Phylogenetic tree of the mGBP cDNAs. The tree was created based on ORF sequences using the neighbor-joining method of the treepuzzle software. Branch lengths are measured relative to the estimated numbers of substitutions. Therefore, mGBP5 and mGBP5a appear less similar than they actually are at the amino acid level.

Evolution of the mGBPs
Recent data indicate that the GBPs are host effector molecules involved in pathogen defense [5,16,17]. Defense against pathogens requires permanent adaptation to the changes in pathogen virulence strategies [31,32]. We suggest that evolutionary pressure led to gene duplication events which have resulted in the current mGBP clusters on chromosomes 3 and 5. It is most likely that these gene duplications started with one primordial mGBP. We suppose that this ancestor mGBP is located on chromosome 3 because these mGBPs are more divergent among each other as compared to the mGBPs on chromosome 5. Interestingly, on chromosome 3 two homology clusters have evolved. One homology cluster with mGbp1, mGbp2, and mGbp5 is characterized by a C-terminal CaaX motif for isoprenylation and a 'TLRD' G4 motif. In contrast, the second homology cluster with mGbp3 and mGbp7 lacks the CaaX motif and possesses a 'TVRD' G4 motif. This finding leads to the hypothesis that mGbp3 or mGbp7 is the ancestor for all mGBPs on chromosome 5, which also lack a CaaX motif and have a 'TVRD' G4 motif. This is further corroborated by the high cDNA sequence identities (around 70%) of mGBP3 and mGBP7 with mGBP4, mGBP6, mGBP8, mGBP9, mGBP10, and mGBP11 on chromosome 5. The most recent duplication event also occurred on this chromosome, where mGbp6 emanated from mGbp10 or vice versa. This is supported by the high homology of 98.4% between these two GTPases. Moreover, we suggest that the mGBP cluster on chromosome 5 is a "genomic hot spot" permanently exposed to genetic recombination events. Consistent with this suggestion, we detected a transposon-like element (>1000 bp) which is integrated several times in this mGBP cluster (data not shown). Further studies have to clarify whether these genes evolved due to evolutionary pressure and whether these genes have redundant or non-redundant functions during pathogen defense.
We have shown that mGBPs are highly induced upon IFNγ stimulation and infection with intracellular bacteria or protozoa, indicating an important role as effector molecules in host defense. Now, we describe and characterize the genomic loci of the mGBPs on chromosomes 3 and 5. These data will be very important for the analyses of evolutionary gene amplifications required in host defense as well as for functional studies by the generation of gene targeted mice which are under way.
[...] [25] from the NCBI website and mapped onto the respective genomic area.

Identification of the exon-intron structure
The transcript sequences from mGBP4 (mpa-2) and mGBP6 (mpa-2l) were downloaded from the Ensembl website and imported into EditSeq. We then determined the exon-intron structure by using BLAST "align two sequences" with standard parameters on the NCBI website. The sequences of single exons from mGBP4 and mGBP6 were aligned with the corresponding BAC sequences. If the exons mapped to multiple regions on the BAC with high homology, the regions were retained and imported into EditSeq. The equivalent sequences from the resulting BLAST hits on the BACs were subsequently aligned with MegAlign (DNAStar, Madison, WI) using standard parameters. The 3' and 5' splice sites were identified manually by inspecting the alignment and confirmed by comparing the genomic sequences to the corresponding exon sequences. Once all exons were mapped on the genome, the exons indicating a new gene locus were assembled and potential cDNAs were created. Finally, we determined the open reading frames by using the ORF search tool from EditSeq.
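The last steps of this procedure (assembling mapped exons into a potential cDNA, checking the splice sites, and searching for the open reading frame) can be illustrated with a small sketch. The Python code below is not the EditSeq or MegAlign workflow used in this study; it is a simplified stand-in under stated assumptions: exon coordinates are 0-based and end-exclusive, the gene lies on the forward strand, introns are checked against the canonical GT...AG boundaries, and all function names and the toy sequence are hypothetical.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_cdna(genomic: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate exon slices (0-based, end-exclusive) in genomic order."""
    return "".join(genomic[start:end] for start, end in exons)


def introns_follow_gt_ag(genomic: str, exons: List[Tuple[int, int]]) -> bool:
    """Check that every intron between consecutive exons starts with GT and ends with AG."""
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = genomic[prev_end:next_start]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            return False
    return True


def longest_orf(cdna: str) -> Optional[str]:
    """Return the longest ATG-to-stop ORF on the given strand, or None if there is none."""
    best = None
    for i in range(len(cdna) - 2):
        if cdna[i:i + 3] != "ATG":
            continue
        # Walk codon by codon from this start until the first in-frame stop.
        for j in range(i, len(cdna) - 2, 3):
            if cdna[j:j + 3] in STOP_CODONS:
                orf = cdna[i:j + 3]
                if best is None or len(orf) > len(best):
                    best = orf
                break
    return best


# Toy locus (hypothetical): exon 1, a GT...AG intron, exon 2.
genomic = "CCATGGC" + "GTCCAG" + "AAAATAGGG"
exons = [(0, 7), (13, 22)]

cdna = assemble_cdna(genomic, exons)
print(introns_follow_gt_ag(genomic, exons))
print(cdna)
print(longest_orf(cdna))
```

For a gene on the negative strand, as reported here for the chromosome 5 cluster, the genomic sequence would first have to be reverse-complemented before applying the same steps.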

Alignment and phylogenetic tree
The alignment of mGBP cDNAs was created with the software ClustalW [33] and the subsequent layout was done with JalView (http://www.jalview.org). The phylogenetic analysis was accomplished using the maximum likelihood method and treepuzzle (http://www.tree-puzzle.de) for construction of the phylogenetic tree. The treepuzzle software was run with the option for exact parameter estimates using the neighbor joining method. Finally, we used the software drawtree from the phylip package (http://www.phylip.com) to plot the tree data. All software was run on a Linux PC workstation.
For stimulation of ANA-1 cells we used 100 U/ml recombinant mouse IFNγ (R&D Systems, Mainz, Germany). After 16 h of IFNγ stimulation the cells were harvested for RNA preparation.

Infection with Listeria monocytogenes
C57BL/6N mice were purchased from Charles River (Sulzfeld, Germany) and maintained in the animal facility of the Medical Faculty of the Heinrich-Heine-University under SPF conditions. All procedures performed on animals in this study were approved by the Animal Care and Use Committee of the local government of Duesseldorf and were in accordance with the German animal laws. C57BL/6N mice were intraperitoneally infected with 0.1 × LD50 L. monocytogenes (American Type Culture Collection strain 43251), and organs were removed 48 h after infection.

Amplification and cloning of mGBPs
Total RNA from cells and tissues was isolated using Trizol Reagent (Invitrogen) according to the manufacturer's instructions. First-strand cDNA synthesis was performed using 1 μg of total RNA with M-MLV reverse transcriptase and oligo dT primer (Invitrogen). The subsequent PCR reactions were performed using specific forward and reverse primers (additional file 3), and sequencing of both DNA strands was done by GATC Biotech AG (Konstanz, Germany).