An Sp185/333 gene cluster from the purple sea urchin and putative microsatellite-mediated gene diversification
- Chase A Miller†1, 3,
- Katherine M Buckley†2, 4,
- Rebecca L Easley2, 5 and
- L Courtney Smith2
© Miller et al; licensee BioMed Central Ltd. 2010
Received: 7 June 2010
Accepted: 18 October 2010
Published: 18 October 2010
The immune system of the purple sea urchin, Strongylocentrotus purpuratus, is complex and sophisticated. An important component of sea urchin immunity is the Sp185/333 gene family, which is significantly upregulated in immunologically challenged animals. The Sp185/333 genes are less than 2 kb in length, have two exons, and belong to a large, diverse family of more than 40 genes. The S. purpuratus genome assembly, however, contains only six Sp185/333 genes. This underrepresentation could be due to the difficulties that large gene families present in shotgun assembly, where multiple similar genes can be collapsed into a single consensus gene.
To understand the genomic organization of the Sp185/333 gene family, a BAC insert containing Sp185/333 genes was assembled, with careful attention to avoiding artifacts resulting from collapse or artificial duplication/expansion of very similar genes. Twelve candidate BAC assemblies were generated with varying parameters, and the optimal assembly was identified by PCR, restriction digests, and subclone sequencing. The validated assembly contained six Sp185/333 genes that were clustered in a 34 kb region at one end of the BAC, with five of the six genes tightly clustered within 20 kb. The Sp185/333 genes in this cluster were no more similar to each other than to previously sequenced Sp185/333 genes isolated from three different animals. This was unexpected given their proximity and the putative effects of gene homogenization among closely linked, similar genes. All six genes displayed significant similarity, including both their 5' and 3' flanking regions, which were bounded by microsatellites. Three of the Sp185/333 genes and their flanking regions were tandemly duplicated such that each repeated segment consisted of a gene plus 0.7 kb 5' and 2.4 kb 3' of the gene (4.5 kb total). Both edges of the segmental duplications were bounded by different microsatellites.
The high sequence similarity of the Sp185/333 genes and their flanking regions suggests that the microsatellites may promote genomic instability, may be involved in gene duplication and/or gene conversion, and may thereby contribute to the extraordinary sequence diversity of this family.
Invertebrate immune systems are marked by an array of complex and sophisticated mechanisms for recognizing and responding to microbes [1–4]. A few systems that highlight this complexity are overturning the long-held view that invertebrate immune systems are simple. The genes that encode fibrinogen-related proteins (FRePs) in the freshwater snail Biomphalaria glabrata diversify through somatic diversification and point mutation of a small gene set. Arthropod DSCAM genes employ extensive alternative splicing to generate thousands of unique mRNAs [6–8] that encode proteins involved in phagocytosis by hemocytes and that may bind specifically to the infecting pathogen. In higher plants, several classes of R genes confer disease resistance and create and maintain diversity through sequence exchange and recombination (reviewed in ). Furthermore, the mechanisms of diversification have not been investigated for a number of gene families that function in immunity, such as the variable region-containing chitin-binding proteins (VCBPs) in protochordates [12–14].
Shotgun sequence assembly is the standard method for quick and efficient assembly of BACs and whole genomes, but regions with repetitive elements are difficult to assemble correctly. The most common type of gap in 'finished' genomes is unresolved heterochromatin, which is composed mainly of repetitive elements [24, 25]. Much effort has gone into improving the assembly of these types of regions, and some progress has been made in assembling transposons using transposon-specific approaches. However, these methods fail when applied to other repetitive elements. A detailed study of mis-assembled segmental duplications in the 'finished' human genome shows that shotgun strategies consistently mis-assemble segmental duplications that are at least 15 kb long and share at least 97% identity. Although shotgun assembly is extremely flexible and powerful, it can be modified to improve results, especially when a specifically defined goal is included in the approach [25, 28, 29]. The significant underrepresentation of Sp185/333 genes in the sea urchin genome compared to our estimates of the gene family size may stem from two possible sources. First, the number of trace sequences with Sp185/333 sequence that were used to assemble the genome is lower than expected, which may result from gene deletions from BAC inserts during growth of the cultures. This possibility will be tested in the future. Second, the genes may be incorrectly assembled in the genome because repetitive sequences are commonly mis-assembled and are often collapsed onto a single genomic location [24, 25, 27]. This second possibility is addressed below.
We report here the first follow-up to the problem of assembling the Sp185/333 genes and show how the shortcomings of shotgun assembly for these genes can be overcome by focusing on a single BAC insert, a more tractable task for a repeat-riddled region. We generated multiple candidate BAC assemblies with varying parameters to account for potential gene collapse or artificial duplication/expansion, and experimentally validated the assemblies to identify the optimal sequence. We present a unique perspective on sequence assembly and validation, particularly the need to adjust the assembly parameters locally rather than using global parameters for the entire genome. This is the first report of a small cluster of six Sp185/333 genes, located in a 34 kb region at one end of a 117 kb BAC insert. The gene structure is consistent with that of previously characterized Sp185/333 genes: the coding region is contained within two exons, the second of which includes the mosaic pattern of elements. All six genes are flanked on both sides by GA microsatellites, and four of the genes have a GAT microsatellite in the 5' flank. There is no correlation between linkage and sequence similarity, as the six genes on the BAC are no more similar to each other than to 121 unique genes that have been cloned and sequenced from three different animals. The flanking regions of the genes, out to the microsatellites, exhibit significant sequence similarity. Three of the Sp185/333 genes are tandemly duplicated along with their flanking regions, and each repeated segment is delineated by microsatellites. The assembly of this region had to be validated by cloning and sequencing.
The very high sequence similarity of the Sp185/333 genes, the flanking regions, and the positions of the flanking microsatellites may promote genomic instability and increase the rate of gene duplication of this family and/or perhaps block homogenization resulting from gene conversion, thereby contributing to its extraordinary diversity.
BAC library screening
Two arrayed BAC libraries (Sp BAC genomic and Sp small BAC; http://www.spbase.org/SpBase/resources/index.php) were screened for clones with Sp185/333 sequences . The libraries differed in average insert sizes (Sp BAC genomic library inserts were ~140 kb, 25× genome coverage; Sp small BAC library inserts were ~50 - 80 kb, 6.25× genome coverage) . The libraries were screened with riboprobes synthesized from combinations of templates chosen from three Sp185/333 gene clones that included all known elements (10-010 [GenBank:EF607629; element pattern G2 γ], 10-022 [GenBank:EF607640; element pattern D1 α], and 2-095 [GenBank:EF607756; element pattern E2 δ]) . The Sp small BAC library was screened as previously described for the Sp BAC genomic library . Riboprobe synthesis and filter hybridization were performed as described in . BAC clones with Sp185/333 sequence were obtained from Eric Davidson and Andrew Cameron at the California Institute of Technology.
BAC insert isolation and PFGE analysis
Bacterial cultures were grown at 37°C with chloramphenicol, and the BAC plasmids were isolated using the alkaline lysis protocol as described in . The insert was released from the pBACe3.6 vector by Not I (New England Biolabs) digestion and analyzed by pulsed-field gel electrophoresis (PFGE) on a 1% Pulsed Field Certified Agarose (Bio-Rad Laboratories) gel in 0.5× TBE at 6 V/cm, with a switch time ramped from 1 to 15 sec over 16 hrs. Gels were stained in 0.5 μg/mL ethidium bromide, destained, and imaged under UV light. The MidRange pulsed-field gel (PFG) Marker I (New England Biolabs) was used to generate a standard curve from which the BAC insert size was estimated.
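Estimating insert size from a marker standard curve can be sketched as follows. PFGE sizing conventionally fits log10(fragment size) against migration distance for the marker bands and interpolates the unknown band. This sketch assumes that log-linear model holds over the bracketing bands; the marker sizes and distances are hypothetical placeholders, not the MidRange PFG Marker I band sizes or any measurements from this study.

```python
import math


def fit_standard_curve(marker_kb, distance_mm):
    """Least-squares fit of log10(size in kb) = slope * distance + intercept."""
    if len(marker_kb) != len(distance_mm) or len(marker_kb) < 2:
        raise ValueError("need at least two paired marker bands")
    ys = [math.log10(size) for size in marker_kb]
    n = len(distance_mm)
    mean_x = sum(distance_mm) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distance_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distance_mm, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size_kb(distance, slope, intercept):
    """Interpolate an unknown band's size (kb) from its migration distance."""
    return 10 ** (slope * distance + intercept)


# Hypothetical marker bands (kb) and migration distances (mm); replace with real gel data.
markers_kb = [194.0, 145.5, 97.0, 48.5, 24.0]
distances_mm = [20.0, 26.0, 35.0, 50.0, 64.0]

slope, intercept = fit_standard_curve(markers_kb, distances_mm)
insert_kb = estimate_size_kb(38.0, slope, intercept)
print(f"estimated insert size: {insert_kb:.1f} kb")
```

The unknown band should be bracketed by marker bands, because mobility in a ramped run is only approximately log-linear across the resolving range; a wider range may call for a piecewise or polynomial fit instead.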
A working draft sequence of BAC clone R3-3033E12 was generated as part of the S. purpuratus genome project [GenBank: AC178508.1]. A randomly sheared subclone library was generated from BAC 178508, and the subclones were end-sequenced at the Baylor College of Medicine (BCM), generating 1,886 Sanger traces. Traces were deposited in the NCBI Trace Archive as BCM center project SRHQ; TI number AC204781.3. The methods used here (see below) differ from those used by the Baylor team to assemble the traces into a BAC insert sequence, and the resulting sequence [GenBank: BK007096] is hereafter called "7096".
The 7096 sequence was assembled from the traces using the Whole-Genome Shotgun Celera Assembler. Traces were converted into the format required by the Celera Assembler with the tarchive2ca tool, which is part of the AMOS (A Modular Open Source) tool suite http://amos.sourceforge.net/. Assemblies were generated using default parameters, except that the unitigger error rate was varied from the default of 1.5% down to 0.2% in 0.1% decrements. Hawkeye was used to view the assemblies graphically and to assess sequencing coverage. GenePalette was used to annotate the 7096 assembly.
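The unitigger error-rate sweep can be generated programmatically. This sketch enumerates the stated range (1.5% down to 0.2% in 0.1% steps) with Decimal arithmetic to avoid floating-point drift, and writes one spec line per assembly. The option name `utgErrorRate` and its fractional units are assumptions based on runCA spec-file conventions and should be checked against the documentation of the installed Celera Assembler version.

```python
from decimal import Decimal


def unitigger_rates(start_pct="1.5", stop_pct="0.2", step_pct="0.1"):
    """Error rates in percent, from start_pct down to stop_pct inclusive, in fixed steps."""
    start, stop, step = Decimal(start_pct), Decimal(stop_pct), Decimal(step_pct)
    rates = []
    current = start
    while current >= stop:
        rates.append(current)
        current -= step
    return rates


def spec_lines(rates_pct):
    """One hypothetical runCA-style spec line per rate, converting percent to a fraction."""
    # ASSUMPTION: the option is named utgErrorRate and takes a fraction (0.015 == 1.5%).
    return [f"utgErrorRate={rate / 100}" for rate in rates_pct]


rates = unitigger_rates()
for rate, line in zip(rates, spec_lines(rates)):
    print(f"assembly at {rate}% -> {line}")
```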
Real-time quantitative PCR (qPCR) analysis of Sp185/333 genes on BACs
Primer locations in the exons or flanking untranslated regions¹
- 5' untranslated region of all genes
- In the second exon ~500 bp from the 5' end of all genes
- In the second exon ~1.4 kb from the 5' end of most genes
- In the second exon ~900 bp from the start codon in all genes
- In the first exon ~50 bp from the start codon in all genes
- In the second exon ~800 bp from the start codon in most genes
- In the second exon ~900 bp from the start codon in all genes
- In the second exon ~1.5 kb from the 5' end of most genes
- 3' end of all genes
- 3' end of some genes
- ~300 bp 5' of each D1 gene
- RC of 12R, located ~2.2 kb 3' of all D1 genes
- ~900 bp 3' of the D1-b gene
- ~800 bp 3' of the A2 gene
- ~1 kb 3' of the B8 gene
- RC of 10R, located ~2.4 kb 3' of the D1-y gene
- RC of 5R, located ~2.7 kb 3' of the E2 gene
- qPCR primer with 18R¹
- ~1 kb 3' of the D1 genes
- RC of 13F, located ~1.1 kb 5' of the D1-b gene
- ~1.2 kb 5' of the A2 gene
- RC of 11F, located ~1.1 kb 5' of the B8 gene
- RC of 2F, located ~1 kb 5' of all D1 genes
- ~700 bp 5' of the E2 gene
- qPCR primer with 17F¹
PCR and cloning
Primers (Table 2) were designed with Primer Premier (Premier Biosoft International, Palo Alto, CA) based on an assembly of 7096 that was generated using a 0.9% unitigger error rate. Amplicons shorter than 5 kb were produced in reactions with 4 - 20 ng of BAC DNA, 200 nM each primer, 200 μM each dNTP, 1 unit (U) Paq5000 DNA polymerase (Stratagene, La Jolla, CA), and 1× company-supplied buffer. Samples were amplified under the following conditions: 3 min at 95°C, followed by 25 cycles of 20 sec at 95°C, 20 sec at 51°C to 59°C, and 10 sec to 2.5 min at 72°C, followed by 3 min at 72°C and a 4°C hold. For amplicons longer than 5 kb, each reaction consisted of 0.4 - 2 ng BAC DNA, 200 nM each primer, 400 μM each dNTP, 1 U Takara LA Taq (Takara Biosciences, Madison, WI), and 1× company-supplied buffer. Samples were amplified under the following conditions: 3 min at 94°C, followed by 30 cycles of 30 sec at 94°C and 5 to 10 min at 51°C to 65°C, followed by 10 min at 72°C and a 4°C hold.
Regions surrounding the D1 genes were amplified in PCR reactions with 10 ng of 7096 DNA, 500 nM each primer (1R and 2F; Table 2), 400 μM each dNTP, 1 U Phusion DNA polymerase (New England Biolabs, Ipswich, MA), and 1× company-supplied buffer. Samples were amplified as follows: 30 sec at 98°C, 25 cycles of 10 sec at 98°C, 20 sec at 55°C, and 2 min at 72°C, followed by 5 min at 72°C and a 4°C hold. Amplicons were adenylated by adding 1 U of Fisher Taq (Fisher Scientific, Pittsburgh, PA) to the reaction for 10 min at 72°C to facilitate cloning into pCR4-XL-TOPO (Invitrogen, Carlsbad, CA). Plasmid DNA (pCR4-XL-TOPO with 7096 fragment inserts) was isolated using the Wizard Plus Miniprep DNA Purification System (Promega, Madison, WI).
Cycle sequencing reactions consisted of 165 ng of plasmid DNA, 1 μM of each primer, sequencing buffer (267 mM Tris base pH 9.0, 6.7 mM MgCl2), 1× dye terminator cycle sequencing (DTCS) Quickstart (Beckman Coulter, Fullerton, CA). Samples were amplified in an iCycler (Bio-Rad Laboratories) with the following conditions: 30 cycles of 20 sec at 96°C, 20 sec at 50°C and 20 sec at 60°C, followed by a hold at 4°C. DNA was precipitated and resuspended in CEQ Sample Loading Solution (Beckman Coulter). Samples were analyzed on a Beckman Coulter CEQ8000 using protocol LFR-a (Beckman Coulter) modified with a 10 second injection duration. Sequences were edited and assembled using Sequencher software (GeneCodes, Ann Arbor, MI).
Sequences were manually aligned using Bioedit . Pairwise diversity was measured by pairwise distance analysis using MEGA v.4  with pairwise deletion of gaps. Dot plots were generated using plotRep . Microsatellites, interspersed repeats, and low complexity DNA sequences were identified by Repeatmasker (http://www.repeatmasker.org). Entropy was calculated as in .
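Pairwise deletion of gaps can be illustrated with an uncorrected p-distance: for each pair of sequences, alignment columns with a gap in either sequence are skipped, and the distance is the fraction of differing sites among the remaining columns. The source does not state which distance model was selected in MEGA, so the use of p-distance here is an assumption, and the aligned sequences are toy examples.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Uncorrected p-distance with pairwise deletion of gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # pairwise deletion: skip columns gapped in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def mean_pairwise_diversity(alignment):
    """Average p-distance over all sequence pairs in a {name: aligned sequence} dict."""
    pairs = list(combinations(alignment, 2))
    return sum(p_distance(alignment[x], alignment[y]) for x, y in pairs) / len(pairs)


toy = {
    "geneA": "ATGGA--CTT",
    "geneB": "ATGCAGGCTT",
    "geneC": "ATGCAG-CTA",
}
print(p_distance(toy["geneA"], toy["geneB"]))
print(round(mean_pairwise_diversity(toy), 3))
```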
The disagreement between the number of Sp185/333 gene models in the S. purpuratus genome and our estimates of gene copy number may have resulted from a shortcoming of genome assembly methods, in which regions with similar sequences are artificially collapsed. Consequently, the gene models assembled in the genome may not be sequences of real genes but rather consensus sequences of multiple genes. Therefore, we analyzed the genomic organization of the Sp185/333 genes at the level of a finished BAC sequence. BAC sequences present a simpler computational problem for assembly because there is less sequence to assemble than for an entire genome from a diploid, outbred animal, and because a BAC insert is derived from a single haplotype. This was of particular relevance for the sea urchin, in which genomes have been shown to vary by 4-5% among individuals and the S. purpuratus genome assembly is a mosaic of both haplotypes.
BACs with Sp185/333 sequence
Screens of the large-insert BAC library  identified 75 clones that were positive for Sp185/333 sequence. Screens of the small-insert BAC library identified 46 positive clones (see , reviewed in ). Preliminary analysis of the BACs by PCR showed that the Sp185/333 genes were positioned in all possible orientations relative to each other and that many BACs had identical patterns of amplicons . PCR, restriction digests and Southern blots of 11 BACs indicated four categories of genes based on the number of shared bands among the groups (data not shown). Two BACs were chosen for sequencing based on different patterns of Sp185/333 amplicons and the results for one BAC, 7096, are reported here.
Assembling the 7096 BAC
An initial sequence for 7096 [GenBank:AC204781] was assembled by the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) using the Phrap assembler  as part of the Atlas assembly system , with the traces from the randomly sheared subclone library . To validate the sequence assembled by Phrap, the 7096 traces were reassembled with the Celera WGS assembler . The Celera assembler was chosen based on its ability to optimize parameters for contig creation and its relative strengths for correctly assembling repeated regions .
Varying assembly parameters affects the length, number of scaffolds, and Sp185/333 genes present in different assemblies (tabulated by unitigger rate, %)
All 12 assemblies had three Sp185/333 genes that were identical among the assemblies: one with an A2 element pattern, one with a B8 element pattern and one with an E2 element pattern (Figures 1, 2). In addition, each assembly had between one and three fully assembled D1 genes, plus most assemblies showed a fragmented or poorly assembled D1 gene (Table 3). The sequences of the D1 genes varied among the assemblies (shown as yellow and green in Figure 2; see below). In each assembly, the gaps between the small scaffolds were flanked by the D1 genes, which indicated that these genes were the source of the conflicts. Varying the unitigger rates altered the number and placement of the D1 genes, indicating that further analysis was necessary to obtain the accurate sequence of the Sp185/333 gene cluster. For clarity, the D1 genes and fragments were given extended names according to their 5' to 3' order within assembly 9: D1 yellow (D1-y), D1 green (D1-g), and D1 blue (D1-b) (Figure 2).
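Why the unitigger error rate changes the number of assembled D1 genes can be seen in a deliberately simplified model: if two repeat copies differ by no more than the error rate tolerated for overlaps, reads from the two copies can be overlapped as if they came from one locus, and the copies risk being collapsed. This toy sketch uses only that rule; it ignores sequencing error, read length, and the other overlap and graph heuristics of the Celera Assembler, and the divergence values are hypothetical.

```python
def mismatch_rate(copy_a, copy_b):
    """Fraction of mismatched positions between two equal-length, ungapped repeat copies."""
    if len(copy_a) != len(copy_b) or not copy_a:
        raise ValueError("copies must be non-empty and equal in length")
    return sum(a != b for a, b in zip(copy_a, copy_b)) / len(copy_a)


def copies_at_risk_of_collapse(divergence, error_rate):
    """Toy rule: overlaps between copies are accepted when divergence <= error rate."""
    return divergence <= error_rate


# Hypothetical divergences (fractions) between pairs of D1-like copies.
pair_divergence = {"copy1-copy2": 0.004, "copy1-copy3": 0.009, "copy2-copy3": 0.012}

for rate_pct in (1.5, 0.9, 0.5, 0.2):
    rate = rate_pct / 100
    at_risk = [pair for pair, d in pair_divergence.items() if copies_at_risk_of_collapse(d, rate)]
    print(f"{rate_pct}% -> pairs at risk of collapse: {at_risk}")
```

In real data, lowering the tolerated rate also rejects some true overlaps between reads from the same copy that carry sequencing errors, which is one plausible reason stricter settings can fragment an assembly.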
Experimental validation of the assembled 7096 sequence
A two-part approach was undertaken to validate the assemblies experimentally. First, PFGE and PCR were used to determine the size of the BAC insert and to confirm the existence and size of the three Sp185/333 genes present in all assemblies (Figures 1, 2). Second, the region harboring multiple D1 genes was analyzed more closely using PCR, cloning, sequencing, and restriction enzyme analysis. These results were used to reject incorrect assemblies, including the assembly generated by BCM-HGSC, and ultimately to define the correct 7096 sequence, thereby enabling analysis of the Sp185/333 gene cluster. The 7096 insert size was estimated to be 117.6 kb by PFGE (data not shown), which eliminated assemblies 2, 4, and 6 from further consideration because they were too short (Table 3). qPCR estimation of the Sp185/333 gene copy number indicated that 5.8 to 6.1 Sp185/333 genes were present (data not shown), which agreed with all of the assemblies when whole genes plus fragments were counted. The remaining nine assemblies were evaluated in more detail.
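A qPCR copy-number estimate of this kind can be sketched as target copies (read from a standard curve) divided by the number of BAC molecules in the reaction. The normalization actually used is not described in this section, so this sketch assumes an absolute standard curve of Ct against log10(copies) and a BAC molecule count computed from DNA mass and length; the slope, intercept, Ct, mass, and the 129 kb total BAC length are all hypothetical.

```python
import math

AVOGADRO = 6.022e23   # molecules per mole
MEAN_BP_MASS = 650.0  # g/mol per base pair of dsDNA (approximation; 660 is also used)


def molecules_from_mass(mass_ng, length_bp):
    """Number of double-stranded DNA molecules in mass_ng of a template of length_bp."""
    return (mass_ng * 1e-9) * AVOGADRO / (length_bp * MEAN_BP_MASS)


def copies_from_ct(ct, slope, intercept):
    """Invert a standard curve of the form Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def gene_copies_per_bac(ct, slope, intercept, bac_mass_ng, bac_length_bp):
    """Target amplicon copies divided by BAC molecules loaded in the same reaction."""
    return copies_from_ct(ct, slope, intercept) / molecules_from_mass(bac_mass_ng, bac_length_bp)


# Hypothetical inputs: standard-curve slope/intercept, a measured Ct, and 0.1 ng of a BAC
# whose total length (insert plus vector) is assumed to be 129 kb.
SLOPE, INTERCEPT = -3.32, 38.0
estimate = gene_copies_per_bac(16.0, SLOPE, INTERCEPT, bac_mass_ng=0.1, bac_length_bp=129_000)
print(f"estimated Sp185/333 copies per BAC: {estimate:.1f}")
```

An alternative design normalizes against a single-copy amplicon on the same BAC instead of a mass-based molecule count, which removes the dependence on accurate DNA quantification.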
Analysis of the assembled 7096 sequence
Sp185/333 genes on 7096
The 7096 assembly contained six Sp185/333 genes with the following element patterns: one A2 γ, one B8 β, one E2 δ, and three D1 α genes (Figure 1; Greek letters represent the intron class based on sequence variations; see ). The genes varied in size from 1286 to 1881 nt and had the same structure as reported previously: two exons and one intron [15, 19]. The genes were located within a 34 kb region at the 3' end of the assembled insert (Figure 4), with the A2 gene separated from the rest by 14 kb. The remaining five genes were clustered within 20 kb, with intergenic regions of 3.2 ± 0.2 kb. The three D1 genes and the B8 gene were adjacent to one another in the middle of the cluster and were all oriented in the same direction, whereas the genes at the edges of the cluster, A2 and E2, were oriented in the opposite direction (Figure 4).
The assembled BAC sequence surrounding the Sp185/333 genes was examined for basic signatures of transcriptional control, including the TATA box and the polyadenylation signal. In five of the six Sp185/333 genes, a TATAAA sequence was located 106 nt 5' of the start codon; the D1-g gene instead had a TATACA sequence in the same position. A polyadenylation signal (AATAAA) was identified 175 to 267 nt 3' of the stop codon in four of the six genes. The D1-b and D1-g genes each had a SNP that altered the polyadenylation signal to ATTAAA and AATATA, respectively. Both the TATA box and the polyadenylation signal of the D1-g gene were therefore non-canonical; however, the effect of these sequence variations on expression is unknown.
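The promoter and polyadenylation checks amount to searching fixed windows relative to the start and stop codons for short motifs. This sketch assumes start- and stop-codon coordinates are already annotated (0-based, coding strand); the window sizes, the offset conventions noted in the comments, and the toy sequence are hypothetical. Only exact canonical motifs are reported, so variants such as TATACA or ATTAAA come back as absent.

```python
import re


def find_motif(seq, motif, start, end):
    """0-based positions of (possibly overlapping) motif matches lying within seq[start:end]."""
    start, end = max(0, start), min(len(seq), end)
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() + start for m in pattern.finditer(seq[start:end])]


def tata_offsets(seq, atg_pos, window=150, motif="TATAAA"):
    """Distance from each upstream motif's first base to the A of ATG (assumed convention)."""
    return [atg_pos - p for p in find_motif(seq, motif, atg_pos - window, atg_pos)]


def polya_offsets(seq, stop_end, window=400, motif="AATAAA"):
    """Bases between the end of the stop codon and each downstream motif (assumed convention)."""
    return [p - stop_end for p in find_motif(seq, motif, stop_end, stop_end + window)]


# Toy sequence: a TATA box 106 nt upstream of ATG, and an AATAAA 200 nt past the stop codon.
upstream = "TATAAA" + "C" * 100
coding = "ATG" + "GCA" * 20 + "TAA"
downstream = "G" * 200 + "AATAAA" + "G" * 50
seq = upstream + coding + downstream
atg = len(upstream)
stop_end = atg + len(coding)
print(tata_offsets(seq, atg), polya_offsets(seq, stop_end))
```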
Sp185/333 sequence diversity
Pairwise diversity of the Sp185/333 genes¹
Conserved flanking regions
The pairwise diversity scores for the distal regions outside of the flanking GA microsatellites (regions 1 and 4, Figure 9A) defined three categories of gene diversity: high, hybrid, and low. The high diversity category for the distal regions included the pairwise diversity scores between either A2 or E2 and each of the other genes. Results showed a sharp increase in the sequence diversity between the proximal and distal flanking regions with respect to the GA microsatellites. This indicated that the proximal flanking sequences were generally more similar to each other than the distal flanking regions were to each other. The hybrid diversity category included pairwise comparisons between B8 and each of the D1 genes with respect to the two distal regions (Figure 9A). There was low diversity in region 1 (average diversity = 0.051) and high diversity in region 4 (average diversity = 0.548). Regions 1 and 2 for the B8 gene were conserved with respect to all of the D1 genes because that side of B8 was adjacent to the GAT microsatellite and part of the intergenic region oriented towards the D1-y gene (see Figure 4). On the other hand, regions 3 and 4 of the B8 gene had divergent sequence with respect to the corresponding D1 gene regions, and were part of the intergenic region oriented towards the A2 gene. The B8 gene therefore represented an interesting hybrid of conserved and divergent flanking regions. The low diversity category included pairwise comparisons among the three D1 genes, which had low scores in all regions (Figure 9A).
The patterns of sequence diversity among the genes and the flanking regions were analyzed more closely by calculating the average diversity (using the entropy equation) over a sliding 30 bp window (Figure 9B). The diversity of all six sequences indicated that the genes, as well as the proximal flanking regions (2 and 3) were relatively conserved, and that the sequences diverged sharply distal to the GA microsatellites (regions 1 and 4). When only the D1 genes were analyzed, they showed much greater identity in all regions compared to the result that included all of the genes (Figure 9B). The D1 genes were almost identical, with slightly less identity in the proximal flanking regions (2 and 3) and somewhat less identity in the distal flanking regions (1 and 4). In all cases, the microsatellites marked the boundaries between the more conserved and less conserved flanking sequence.
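A sliding-window diversity profile of this kind can be sketched as per-column Shannon entropy averaged over a window. The entropy formula is cited rather than reproduced in the source, so this sketch assumes the Shannon form H = -sum(p * log2(p)) over the bases in each alignment column, with gaps excluded from the counts. The 30 bp window follows the text; the step size and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    counts = Counter(base for base in column.upper() if base != gap)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sliding_entropy(alignment, window=30, step=1):
    """Mean column entropy per window, returned as (window start, mean entropy) pairs."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    per_column = [column_entropy("".join(s[i] for s in alignment)) for i in range(length)]
    return [
        (start, sum(per_column[start:start + window]) / window)
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment: 30 identical columns followed by 30 maximally variable columns.
conserved = "ACGT" * 7 + "AC"
aln = [conserved + "A" * 30, conserved + "C" * 30, conserved + "G" * 30, conserved + "T" * 30]
profile = sliding_entropy(aln, window=30, step=30)
print(profile)
```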
Microsatellites and sequence diversification
Microsatellites are common in the genomes of most organisms, although their functional and evolutionary importance has been debated for years [45–47]. Microsatellites have been associated with regions of increased recombination in a number of organisms, including yeast and, to a lesser extent, mammals [49–51]. Microsatellites have also been associated with increased genomic diversity by promoting sequence duplications, gene conversion, and crossovers, and by generating local recombination hotspots [45, 48–50, 52, 53]. A novel segmental duplication mechanism has been reported wherein duplications are generated by template switching between microsatellites, and microsatellites also appear to stimulate recombination in plasmids [55–58]. The sequence diversity observed for the Sp185/333 genes may result, in part, from recent and frequent recombination. The combination of gene and segmental duplications together with gene recombination may be a powerful system for generating and/or maintaining sequence diversity in this gene family.
Heterogeneous gene clusters
Many large gene families in organisms from plants to mammals have immune-related functions. In humans, the major histocompatibility complex (MHC) has over 160 genes that diversify through sequence exchange and duplication, and clusters of R genes in higher plants also maintain diversity through sequence exchange and recombination (reviewed in ). The Sp185/333 gene family is another example of a large, diverse, immune-related gene family (reviewed in ). The Sp185/333 cluster on the 7096 BAC is positioned 6.1 kb from the end of the insert, so it is unclear whether this cluster is one of several small, isolated clusters in the genome or the end of a large cluster with additional linked genes that might be identified from overlapping BACs. Examples of both large and small clusters of linked genes involved in immune responsiveness have been found in other organisms. The nucleotide binding, leucine-rich repeat (NB-LRR) subclass of R genes in Arabidopsis has 149 members, of which 109 are clustered into small groups of two to eight genes [60, 61]. Similarly, the sea urchin Toll-like receptor (TLR) genes are clustered in small groups that are spread throughout the genome [4, 62]. Multiple large clusters of over 1,000 variant surface glycoprotein (VSG) genes in Trypanosoma brucei are distributed among 15 sizeable (40-60 kb) telomeric sites.
The six Sp185/333 genes on the 7096 BAC form a heterogeneous cluster with four different element patterns. Except for the D1 genes, the linked genes on 7096 are no more similar to one another than to genes that have been randomly isolated with unknown linkage (Figure 5). Although we suggest that the genes may be the result of duplications mediated by the GA microsatellites, the different element patterns of the clustered genes on 7096 do not appear to be the result of tandem duplications of a single gene followed by sequence diversification. Consequently, the Sp185/333 genes appear to form a heterogeneous cluster of genes with different element patterns. Heterogeneous clusters of tandemly linked R genes have been investigated in Arabidopsis, in which more than ten clusters have intermingled genes from two different subfamilies: the Toll/interleukin-1 receptor LRR (TNL) subfamily and the coiled-coil region LRR (CNL) subfamily [60, 61]. A proposed advantage of heterogeneous clusters is that they block gene homogenization and maintain diversity among the members of the cluster. Two models have been proposed to explain the origins of heterogeneous clusters. The 'rapid rearrangement' model suggests that small areas consisting of one to a few genes are ectopically duplicated such that genes are copied to unlinked regions of the genome [60, 61]. The 'conserved synteny' model suggests that large-scale segmental duplications are moved to new genomic locations, including different chromosomes [11, 65]. Evidence for these models is based on the level of synteny, or lack thereof, in regions surrounding heterogeneous clusters. It is not clear whether either of these mechanisms functions within the Sp185/333 family; however, the notion of copying sequences from within the GA repeats to other locations in the genome with similar GA repeats is consistent with the heterogeneous mixture of Sp185/333 genes in the cluster. It is also consistent with a rapid rate of gene diversification, as deduced from molecular clock analysis and as proposed for rapid gene recombination.
In addition to ectopic duplication of genes and segments to produce heterogeneous clusters, gene conversion may also be involved in sequence diversification, and it may be promoted not only by the GA microsatellites but also by the repeats and shared element sequences within the coding region. Six types of coding region repeats were first reported for ESTs and full-length transcripts [17, 19] and are present in the second exon in both tandem and mixed interspersed organization (Figure 1). Within the repeats, shorter, simple repeats are also present. In addition, many of the genes share element sequences and simple repeats, and, on a larger scale, the genes themselves can be viewed as imperfect repeats. If the similarity among the Sp185/333 sequences promotes crossovers and gene conversion, these activities would lead to sequence homogenization of the genes, the flanking regions, and possibly an entire region harboring Sp185/333 genes. This would be counter-productive for maintaining a diverse gene family with putative immunological functions. However, the decrease in sequence similarity among the genes outside of the GA microsatellites suggests that regions that undergo sequence exchange are limited to the span between the GA microsatellites. The flanking microsatellites may act to block the progression of DNA strand exchange during crossovers and gene conversion, protecting the entire region, including nearby Sp185/333 genes, from sequence homogenization. This type of result has been observed experimentally in yeast. Overall, we postulate two activities that may function simultaneously to generate and regulate sequence diversity among the cluster of Sp185/333 genes. Both the GA and GAT microsatellites may promote duplication of genes and larger segments, leading to diversification, perhaps by recombination. On the other hand, the shared sequences within the coding regions may promote an unknown level of gene conversion among both closely linked and unlinked genes that could preserve the heterogeneous nature of the cluster. Furthermore, strand exchange during gene conversion may be restricted to the genes and proximal flanking regions by the GA microsatellites, which might block the spread of sequence homogenization to other genes within a tight cluster.
Gene fragments and pseudogenes are common in clusters of genes belonging to the same family [66, 67] and often result from common mechanisms of duplication and diversification such as unequal crossing over and tandem duplication. Surprisingly, no gene fragments have been found in the Sp185/333 family even after extensive searches of the genome, and only one pseudogene has been identified (of 171 genes sequenced) that appears to be the result of retrotransposition . The remaining 170 sequenced genes have perfect open reading frames and splice signals. We speculate that the mechanisms that promote a rapid rate of gene diversification, as predicted by Buckley et al.  and as proposed above, may be under controls to avoid generating fragmented and non-functional genes. The flanking microsatellites and their putative block to DNA strand exchange may be involved in maintaining the reading frame fidelity while promoting diversification, given their location at the edges of the conserved flanking regions of the genes and at the edges of the tandem segmental duplications.
A2 Gene Diversity
The A2 gene can be categorized as the outlier of the cluster for more than just reasons of distance. It has the highest sequence diversity compared to the other genes within the cluster (Table 4, Figure 11B) and it has variant GA microsatellites. Previous reports show that large genes such as A2 (large genes always have elements 2 through 5, see Figure 1) are strikingly different from small genes (B, D and E patterns, see Figure 1) that make up the rest of this cluster (see ). The sequences of the shared elements are entirely different  even though the large and small genes have a somewhat comparable complement of elements within the patterns (Figure 1). This prompted previous speculation that the A2 genes may be spatially separated from the rest of the Sp185/333 genes, perhaps located in a separate cluster that would prevent recombination among large and small genes . Consequently, it was unexpected to find an A2 gene clustered near five Sp185/333 genes of the small category. Differences between the element diversity in the A2 gene compared to the other genes in the cluster may be due to its separation from the other genes by 14 kb, however, variations in the 3' flanking GA microsatellite may also be involved, preventing recombination between the A2 gene and the other Sp185/333 genes within the cluster. If altered GA microsatellites are present in the other A2 genes throughout the genome, this may restrict recombination or gene conversion to within the A type element pattern category and maintain the sequence diversity for all of the A type genes so that they share a similar element pattern and individual element sequences. A possible origin for the variation of the GA microsatellite associated with the A2 gene is the LTR element fragments that are interspersed within this particular microsatellite. 
It is unknown whether this unique 3' GA microsatellite is common to all A2 genes and to all genes in the large category. It is also unknown whether it helps keep element sequences separate between large and small genes. Answering both questions will require additional sequence data.
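Identifying the most divergent member of a cluster, as with A2 here, amounts to comparing pairwise distances among aligned sequences. The metric behind Table 4 is not restated in this section. The sketch below therefore assumes an uncorrected p-distance in which gapped columns are skipped pair by pair, and it flags the sequence with the largest mean distance to the others. The function names and the toy alignment are hypothetical. Element-level comparisons like those in Figure 1 would require aligning element by element, which this sketch does not attempt.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences, ignoring any
    column in which either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return diffs / compared


def mean_distances(aln: dict) -> dict:
    """Mean p-distance from each sequence to every other sequence."""
    totals = {name: 0.0 for name in aln}
    for n1, n2 in combinations(aln, 2):
        d = p_distance(aln[n1], aln[n2])
        totals[n1] += d
        totals[n2] += d
    return {name: t / (len(aln) - 1) for name, t in totals.items()}


def most_divergent(aln: dict) -> str:
    """Name of the sequence with the largest mean distance to the others."""
    means = mean_distances(aln)
    return max(means, key=means.get)


# Toy alignment (made-up sequences, not Sp185/333 data)
aln = {
    "g1": "ACGTACGTAC",
    "g2": "ACGTACGTAA",
    "g3": "ACGAACGTAC",
    "out": "TTGTCCGAAC",
}
print(most_divergent(aln))  # -> out
```

The mean distance is used instead of the maximum so that a single unusually different pair does not make both of its members look like outliers.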
Duplications imply deletions
We hypothesized above that the GAT microsatellites may mediate the recent segmental duplications that include the three D1 genes within the cluster. However, the presence of duplications implies that deletions also occur, and deletions are difficult or impossible to detect. Preliminary PCR amplification of Sp185/333 sequences from two BACs, 7096 and 181662, indicated that both carried Sp185/333 genes, in different arrangements (data not shown). Initial sequencing of BAC 181662 (completed before 2006) yielded 15 unordered contigs [116 kb, GenBank:AC181662.1]. One of these contigs carried a complete second exon from a Sp185/333 gene, with an open reading frame and a 3' flanking GA microsatellite. In 2008, a finishing-level sequence for 181662 (136.6 kb) yielded a single contig with no Sp185/333 genes, although GA microsatellites were present. On 7096, the intergenic distances between the GA microsatellites that flank the Sp185/333 genes range from 1.9 to 2.5 kb, although the spacing between B8 and A2 is much larger. On 181662, the distances between large GA microsatellites are 1.3, 1.4 and 2.6 kb. These microsatellites are similar in repeat number to those surrounding the Sp185/333 genes reported here. This spacing is typical for the majority of the Sp185/333 genes, as assayed by intergenic PCR amplification of genomic DNA . We speculate that the GA microsatellites may mediate gene deletion, and that such a deletion may have occurred while the BAC was propagated in culture. If both are true, the positions of the microsatellites on 181662 suggest that its Sp185/333 genes were originally spaced much like those on 7096. In comparison, another BAC, 076N15 (139 kb; see http://www.spbase.org/SpBase/resources/bac_sequences.php for the BAC sequence), harbors homologues of two complement genes but no Sp185/333 genes. It carries six large GA microsatellites spaced 4 to 33.8 kb apart, which is much wider spacing than reported here for either 7096 or 181662.
It is not known whether the deletion of Sp185/333 genes on 181662 resulted from instability caused by the GA microsatellites. Even so, it is intriguing that these microsatellites may mediate both gene duplication and gene deletion.
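The spacing comparison between BACs 7096, 181662 and 076N15 depends on locating large GA microsatellites and measuring the gaps between them. The minimal sketch below assumes perfect (GA)n repeats, or their reverse complement (TC)n, of at least 10 units. It measures spacing from the end of one run to the start of the next. The threshold, the spacing convention and all names are assumptions, not parameters from the study. Real GA microsatellites are often imperfect (the one 3' of A2 is interrupted by LTR fragments), so an actual analysis would need a repeat finder that tolerates interruptions.

```python
import re


def find_ga_repeats(seq: str, min_units: int = 10) -> list:
    """(start, end) half-open, 0-based coordinates of perfect (GA)n runs, or
    their reverse complement (TC)n, with at least min_units repeat units."""
    pattern = re.compile(r"(?:GA){%d,}|(?:TC){%d,}" % (min_units, min_units))
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def spacings(runs: list) -> list:
    """Gap (bp) from the end of each run to the start of the next run."""
    return [nxt[0] - cur[1] for cur, nxt in zip(runs, runs[1:])]


def within(distances: list, lo: int, hi: int) -> list:
    """Flag which spacings fall inside an expected [lo, hi] window (bp)."""
    return [lo <= d <= hi for d in distances]


# Toy sequence (made up): three repeat runs separated by 100 bp and 50 bp
seq = "A" * 5 + "GA" * 10 + "C" * 100 + "TC" * 12 + "T" * 50 + "GA" * 10
runs = find_ga_repeats(seq)
print(runs)            # -> [(5, 25), (125, 149), (199, 219)]
print(spacings(runs))  # -> [100, 50]
```

The 181662 spacings reported above (1.3, 1.4 and 2.6 kb) can be checked against a window with `within([1300, 1400, 2600], 1300, 2600)`, which flags all three as inside it. The 4 kb minimum spacing on 076N15 would fall outside. The window itself is only an illustrative choice.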
Gene copy number does not correlate with the level of gene expression
Of the four different element patterns present in the genes within the cluster, two are of particular interest because they differ in both gene copy number and expression level. The cluster contains three D1 genes but only single copies of genes with the other element patterns. This is consistent with the previous observation that D1 is the most commonly observed element pattern among genes . Yet despite this higher frequency, expression of D1 genes is relatively low compared to expression of E2 genes [18, 19]. Based on the cluster of genes reported here, the reduced expression may result from a non-consensus TATA box associated with the D1-g gene and from non-consensus polyadenylation sites associated with the D1-g and D1-b genes. This raises the possibility that these genes are either expressed less efficiently or are pseudogenes. However, it is not known whether other D1 genes in the genome also have variant TATA box and polyadenylation sites. On the other hand, the E2 gene, which is the most commonly expressed [18, 19], is observed less often among randomly sequenced genes  and is present as a single copy in the sequenced cluster. This suggests that the high expression of E2 genes may result from very active promoters that overcome an estimated lower gene copy number relative to D1 genes (12-18 E2 genes vs. 30-45 D1 genes , KM Buckley, unpublished). It is important to note, however, that only a limited number of pathogen-associated molecular patterns (PAMPs) have been tested for the induction of Sp185/333 expression [17, 18, 20]. This caveat applies even though E2 is the most commonly isolated element pattern among transcripts in response to immune challenge. Testing additional PAMPs may reveal a range of response levels among Sp185/333 genes with different element patterns that are present in the genome at different frequencies.
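The argument about promoter strength can be made explicit with simple arithmetic. This is a back-of-the-envelope sketch, not an analysis from the study. It assumes, hypothetically, that transcript abundance is spread evenly over all gene copies with a given element pattern. That is an oversimplification if most messages come from a single gene per animal. Let $T_E$ and $T_D$ be the total E2 and D1 transcript abundances, $n_E$ and $n_D$ the gene copy numbers, and $e_E = T_E/n_E$ and $e_D = T_D/n_D$ the mean expression per copy. Then:

$$\frac{e_E}{e_D} = \frac{T_E/n_E}{T_D/n_D} = \frac{T_E}{T_D}\cdot\frac{n_D}{n_E}$$

Using the estimates $12 \le n_E \le 18$ and $30 \le n_D \le 45$:

$$\frac{30}{18} \approx 1.67 \;\le\; \frac{n_D}{n_E} \;\le\; \frac{45}{12} = 3.75$$

Even if E2 and D1 transcripts were equally abundant ($T_E = T_D$), each E2 copy would need to be expressed roughly 1.7- to 3.75-fold more than each D1 copy. E2 is the more commonly isolated pattern ($T_E > T_D$ in the sampled transcripts), so the actual per-copy ratio would be correspondingly larger.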
Furthermore, the disparity in expression levels among genes with different element patterns may indicate that each gene is independently controlled by cis-regulatory elements rather than by a shared, group-level control mechanism. This hypothesis is supported by comparisons between gene and message sequences from three sea urchins. These comparisons show that most of the messages (59% to 93% in different individuals) are likely transcribed from a single gene per animal .
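The single-gene-dominance observation rests on matching each message to the gene in the same animal that it most likely came from. The sketch below is a hypothetical simplification, not the published method. It assumes messages and genes are already aligned to equal length and scores ungapped percent identity. Each message is assigned to its best-matching gene, and the sketch reports the share of messages assigned to the most-used gene. Messages can differ from their source genes, for example through the post-transcriptional modifications proposed for this family. A real assignment would therefore need a tolerance for mismatches and a way to flag ambiguous matches.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length
    sequences (no gap handling; a simplification)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_messages(messages: list, genes: dict) -> list:
    """Assign each message to the gene (from the same animal) it matches best;
    ties go to the gene listed first."""
    return [max(genes, key=lambda g: identity(m, genes[g])) for m in messages]


def top_gene_fraction(messages: list, genes: dict) -> tuple:
    """(gene, fraction of messages) for the gene that accounts for the most messages."""
    counts = Counter(assign_messages(messages, genes))
    gene, n = counts.most_common(1)[0]
    return gene, n / len(messages)


# Toy data (made up, not sequences from the study)
genes = {"gene_a": "ACGTACGT", "gene_b": "TTGTACGA"}
messages = ["ACGTACGT", "ACGTACGA", "ACGTTCGT", "TTGTACGA"]
print(top_gene_fraction(messages, genes))  # -> ('gene_a', 0.75)
```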
Conclusions: Diversification of the Sp185/333 gene family
Previous studies of the Sp185/333 gene family and its encoded proteins have provided evidence of several different mechanisms that ultimately diversify the pool of Sp185/333 proteins: gene recombination , RNA editing , and post-translational modifications [21, 69]. To this body of data, we add a computational basis for postulating three additional diversification mechanisms: i) gene and segmental duplications driven by sequence similarities among the genes and their flanking microsatellites, ii) ectopic duplication, and iii) gene conversion promoted by sequence similarity in the coding regions, with strand exchange blocked by the flanking microsatellites. Additional mechanisms for generating sequence diversity in the Sp185/333 gene family are undoubtedly possible. The Sp185/333 gene family of the purple sea urchin remains an intriguing example of a complex invertebrate immune system that effectively protects the host against the myriad of potential pathogens in the marine environment.
CAM is a doctoral student in the Department of Biology, Boston College, Boston, MA. KMB is a postdoctoral researcher in the Department of Immunology, Sunnybrook Research Institute, University of Toronto, Toronto, ON, Canada. RLE is a research scientist at TECHLAB Inc. in Blacksburg, VA. LCS is a Professor of Biology at George Washington University, Washington DC.
bacterial artificial chromosome
Baylor College of Medicine - Human Genome Sequencing Center
coiled-coil region LRR
dye terminator cycle sequencing
fibrinogen related proteins
long terminal repeat
major histocompatibility complex
nucleotide binding, leucine-rich repeats
pathogen-associated molecular patterns
pulsed-field gel electrophoresis
quantitative polymerase chain reaction
sample loading solution
Tris borate EDTA
variable region-containing chitin-binding proteins
variant surface glycoprotein
whole genome shotgun
The authors are indebted to Drs. Steven Salzberg and Michael Schatz for providing shotgun assembly expertise and to Drs. Erica Sodergren and George Weinstock for agreeing to re-sequence BAC 7096. Drs. Sham Nair and Liliana Florea provided helpful improvements to the manuscript. Trudy Gillevant assisted with BAC clone analysis. Katie Zaleski, Khin Sone and Caroline Rosa assisted with generating BAC subclones of Sp185/333 genes and intergenic regions. The research was funded by the National Science Foundation (MCB 07-44999) to LCS and a Weintraub research fellowship to KMB.
- Raftos DA, Raison RL: Early vertebrates reveal diverse immune recognition strategies. Immunology and Cell Biology. 2008, 86: 479-481. 10.1038/icb.2008.38.
- Flajnik MF, Du Pasquier L: Evolution of innate and adaptive immunity: can we draw a line?. Trends in Immunology. 2004, 25 (12): 640-644. 10.1016/j.it.2004.10.001.
- Loker ES, Adema CM, Zhang SM, Kepler TB: Invertebrate immune systems--not homogeneous, not simple, not well understood. Immunological Reviews. 2004, 198: 10-24. 10.1111/j.0105-2896.2004.0117.x.
- Messier-Solek C, Buckley KM, Rast JP: Highly diversified innate receptor systems and new forms of animal immunity. Seminars in Immunology. 2010, 22 (1): 39-47. 10.1016/j.smim.2009.11.007.
- Zhang SM, Adema CM, Kepler TB, Loker ES: Diversification of Ig superfamily genes in an invertebrate. Science. 2004, 305 (5681): 251-254. 10.1126/science.1088069.
- Schmucker D, Clemens JC, Shu H, Worby CA, Xiao J, Muda M, Dixon JE, Zipursky SL: Drosophila Dscam is an axon guidance receptor exhibiting extraordinary molecular diversity. Cell. 2000, 101 (6): 671-684. 10.1016/S0092-8674(00)80878-8.
- Schmucker D, Chen B: Dscam and DSCAM: complex genes in simple animals, complex animals yet simple genes. Genes and Development. 2009, 23: 147-156. 10.1101/gad.1752909.
- Brites D, McTaggart S, Morris K, Anderson J, Thomas K, Colson I, Fabbro T, Little TJ, Ebert D, Du Pasquier L: The Dscam homologue of the crustacean Daphnia is diversified by alternative splicing like in insects. Molecular Biology and Evolution. 2008, 25 (7): 1429-1439. 10.1093/molbev/msn087.
- Watson FL, Puttmann-Holgado R, Thomas F, Lamar DL, Hughes M, Kondo M, Rebel VI, Schmucker D: Extensive diversity of Ig-superfamily proteins in the immune system of insects. Science. 2005, 309 (5742): 1874-1878. 10.1126/science.1116887.
- Dong Y, Taylor HE, Dimopoulos G: AgDscam, a hypervariable immunoglobulin domain-containing receptor of the Anopheles gambiae innate immune system. PLoS Biology. 2006, 4 (7): e229. 10.1371/journal.pbio.0040229.
- McDowell JM, Simon SA: Molecular diversity at the plant-pathogen interface. Developmental and Comparative Immunology. 2008, 32 (7): 736-744. 10.1016/j.dci.2007.11.005.
- Cannon JP, Haire RN, Litman GW: Identification of diversified genes that contain immunoglobulin-like variable regions in a protochordate. Nature Immunology. 2002, 3 (12): 1200-1207. 10.1038/ni849.
- Cannon JP, Haire RN, Rast JP, Litman GW: The phylogenetic origins of the antigen-binding receptors and somatic diversification mechanisms. Immunological Reviews. 2004, 200: 12-22. 10.1111/j.0105-2896.2004.00166.x.
- Dishaw LJ, Mueller MG, Gwatney N, Cannon JP, Haire RN, Litman RT, Amemiya CT, Ota T, Rowen L, Glusman G: Genomic complexity of the variable region-containing chitin-binding proteins in amphioxus. BMC Genetics. 2008, 9: 78.
- Buckley KM, Smith LC: Extraordinary diversity among members of the large gene family, 185/333, from the purple sea urchin, Strongylocentrotus purpuratus. BMC Molecular Biology. 2007, 8: 68. 10.1186/1471-2199-8-68.
- Ghosh JG, Buckley KM, Nair SV, Raftos DA, Miller C, Majeske AJ, Hibino T, Rast JP, Roth M, Smith LC: Sp185/333: A novel family of genes and proteins involved in the purple sea urchin immune response. Developmental and Comparative Immunology. 2010, 34: 235-245. 10.1016/j.dci.2009.10.008.
- Nair SV, Del Valle H, Gross PS, Terwilliger DP, Smith LC: Macroarray analysis of coelomocyte gene expression in response to LPS in the sea urchin. Identification of unexpected immune diversity in an invertebrate. Physiological Genomics. 2005, 22 (1): 33-47. 10.1152/physiolgenomics.00052.2005.
- Terwilliger DP, Buckley KM, Brockton V, Ritter NJ, Smith LC: Distinctive expression patterns of 185/333 genes in the purple sea urchin, Strongylocentrotus purpuratus: an unexpectedly diverse family of transcripts in response to LPS, beta-1,3-glucan, and dsRNA. BMC Molecular Biology. 2007, 8: 16. 10.1186/1471-2199-8-16.
- Terwilliger DP, Buckley KM, Mehta D, Moorjani PG, Smith LC: Unexpected diversity displayed in cDNAs expressed by the immune cells of the purple sea urchin, Strongylocentrotus purpuratus. Physiological Genomics. 2006, 26 (2): 134-144. 10.1152/physiolgenomics.00011.2006.
- Rast JP, Pancer Z, Davidson EH: New approaches towards an understanding of deuterostome immunity. Current Topics in Microbiology and Immunology. 2000, 248: 3-16.
- Dheilly NM, Nair SV, Smith LC, Raftos DA: Highly variable immune-response proteins (185/333) from the sea urchin Strongylocentrotus purpuratus: Proteomic analysis identifies diversity within and between individuals. Journal of Immunology. 2009, 182: 2203-2212. 10.4049/jimmunol.07012766.
- Buckley KM, Munshaw S, Kepler TB, Smith LC: The 185/333 gene family is a rapidly diversifying host-defense gene cluster in the purple sea urchin, Strongylocentrotus purpuratus. Journal of Molecular Biology. 2008, 379: 912-928. 10.1016/j.jmb.2008.04.037.
- Cameron RA, Samanta M, Yuan A, He D, Davidson E: SpBase: the sea urchin genome database and web site. Nucleic Acids Research. 2009, 37 (Database): D750-D754. 10.1093/nar/gkn887.
- Hoskins RA, Carlson JW, Kennedy C, Acevedo D, Evans-Holm M, Frise E, Wan KH, Park S, Mendez-Lago M, Rossi F: Sequence finishing and mapping of Drosophila melanogaster heterochromatin. Science. 2007, 316 (5831): 1625-1628. 10.1126/science.1139816.
- Mendez-Lago M, Wild J, Whitehead SL, Tracey A, de Pablos B, Rogers J, Szybalski W, Villasante A: Novel sequencing strategy for repetitive DNA in a Drosophila BAC clone reveals that the centromeric region of the Y chromosome evolved from a telomere. Nucleic Acids Research. 2009, 37 (7): 2264-2273. 10.1093/nar/gkp085.
- Strathmann M, Hamilton BA, Mayeda CA, Simon MI, Meyerowitz EM, Palazzolo MJ: Transposon-facilitated DNA sequencing. Proc Natl Acad Sci USA. 1990, 88: 1247-1250. 10.1073/pnas.88.4.1247.
- She X, Jiang Z, Clark RA, Liu G, Cheng Z, Tuzun E, Church DM, Sutton G, Halpern AL, Eichler EE: Shotgun sequence assembly and recent segmental duplications within the human genome. Nature. 2004, 431 (7011): 927-930. 10.1038/nature03062.
- Medvedev P, Brudno M: Maximum likelihood genome assembly. Journal of Computational Biology. 2009, 16 (8): 1101-1116. 10.1089/cmb.2009.0047.
- Drost DR, Novaes E, Boaventura-Novaes C, Benedict CI, Brown RS, Yin T, Tuskan GA, Kirst M: A microarray-based genotyping and genetic mapping approach for highly heterozygous outcrossing species enables localization of a large fraction of the unassembled Populus trichocarpa genome sequence. Plant Journal. 2009, 58 (6): 1054-1067. 10.1111/j.1365-313X.2009.03828.x.
- Cameron RA, Mahairas G, Rast JP, Martinez P, Biondi TR, Swartzell S, Wallace JC, Poustka AJ, Livingston BT, Wray GA: A sea urchin genome project: sequence scan, virtual map, and additional resources. Proc Natl Acad Sci USA. 2000, 97 (17): 9514-9518. 10.1073/pnas.160261897.
- Multerer KA, Smith LC: Two cDNAs from the purple sea urchin, Strongylocentrotus purpuratus, encoding mosaic proteins with domains found in factor H, factor I, and complement components C6 and C7. Immunogenetics. 2004, 56 (2): 89-106. 10.1007/s00251-004-0665-2.
- Sodergren E, Weinstock GM, Davidson EH, Cameron RA, Gibbs RA, Angerer RC, Angerer LM, Arnone MI, Burgess DR, Burke RD: The genome of the sea urchin, Strongylocentrotus purpuratus. Science. 2006, 314 (5801): 941-952. 10.1126/science.1133609.
- Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF: The genome sequence of Drosophila melanogaster. Science. 2000, 287 (5461): 2185-2195. 10.1126/science.287.5461.2185.
- Schatz MC, Phillippy AM, Shneiderman B, Salzberg SL: Hawkeye: an interactive visual analytics tool for genome assemblies. Genome Biology. 2007, 8 (3): R34. 10.1186/gb-2007-8-3-r34.
- Rebeiz M, Posakony JW: GenePalette: a universal software tool for genome sequence visualization and analysis. Developmental Biology. 2004, 271 (2): 431-438. 10.1016/j.ydbio.2004.04.011.
- Hall TA: BioEdit: a user friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series. 1999, 95-98.
- Kumar S, Dudley J, Nei M, Tamura K: MEGA: A biologist-centric software for evolutionary analysis of DNA and protein sequences. Briefings in Bioinformatics. 2008, 9: 299-306. 10.1093/bib/bbn017.
- Toth G, Deak G, Barta E, Kiss GB: PLOTREP: a web tool for defragmentation and visual analysis of dispersed genomic repeats. Nucleic Acids Research. 2006, 34 (Web Server): W708-W713. 10.1093/nar/gkl263.
- Britten RJ, Cetta A, Davidson EH: The single-copy DNA sequence polymorphism of the sea urchin Strongylocentrotus purpuratus. Cell. 1978, 15 (4): 1175-1186. 10.1016/0092-8674(78)90044-2.
- Green P: Phrap documentation. 1996, [http://www.phrap.org/phredphrap/phrap.html]
- Havlak P, Chen R, Durbin KJ, Egan A, Ren Y, Song XZ, Weinstock GM, Gibbs RA: The Atlas genome assembly system. Genome Research. 2004, 14 (4): 721-732. 10.1101/gr.2264004.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W: Initial sequencing and analysis of the human genome. Nature. 2001, 409 (6822): 860-921. 10.1038/35057062.
- Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA: A whole-genome assembly of Drosophila. Science. 2000, 287 (5461): 2196-2204. 10.1126/science.287.5461.2196.
- Kapitonov VV, Jurka J: Harbinger transposons and an ancient HARBI1 gene derived from a transposase. DNA Cell Biology. 2004, 23 (5): 311-324. 10.1089/104454904323090949.
- Li B, Xia Q, Lu C, Zhou Z, Xiang Z: Analysis on frequency and density of microsatellites in coding sequences of several eukaryotic genomes. Genomics Proteomics Bioinformatics. 2004, 2 (1): 24-31.
- Schlötterer C, Wiehe T: Microsatellites, a neutral marker to infer selective sweeps. Microsatellites: Evolution and Applications. 1999, Oxford: Oxford University Press, 238-247.
- Schlötterer C: Evolutionary dynamics of microsatellite DNA. Chromosoma. 2000, 109: 365-371. 10.1007/s004120000089.
- Bagshaw AT, Pitt JP, Gemmell NJ: High frequency of microsatellites in S. cerevisiae meiotic recombination hotspots. BMC Genomics. 2008, 9: 49. 10.1186/1471-2164-9-49.
- Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G: A high-resolution recombination map of the human genome. Nature Genetics. 2002, 31 (3): 241-247.
- Myers S, Bottolo L, Freeman C, McVean G, Donnelly P: A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005, 310 (5746): 321-324. 10.1126/science.1117196.
- Jensen-Seaman MI, Furey TS, Payseur BA, Lu Y, Roskin KM, Chen CF, Thomas MA, Haussler D, Jacob HJ: Comparative recombination rates in the rat, mouse, and human genomes. Genome Research. 2004, 14 (4): 528-538. 10.1101/gr.1970304.
- Trifonov EN: Tuning function of tandemly repeating sequences: a molecular device for fast adaptation. Evolutionary theory and processes: modern horizons, papers in honor of Eviatar Nevo. Edited by: Wasser SP. 2003, Amsterdam, The Netherlands: Kluwer Academic Publishers, 1-24.
- Gendrel C-G, Boulet A, Dutreix M: (CA/GT)n microsatellites affect homologous recombination during yeast meiosis. Genes and Development. 2000, 14: 1261-1268.
- Payen C, Koszul R, Dujon B, Fischer G: Segmental duplications arise from Pol32-dependent repair of broken forks through two alternative replication-based mechanisms. PLoS Genetics. 2008, 4 (9): e1000175. 10.1371/journal.pgen.1000175.
- Wahls WP, Wallace LJ, Moore PD: The Z-DNA motif d(TG)30 promotes reception of information during gene conversion events while stimulating homologous recombination in human cells in culture. Molecular and Cellular Biology. 1990, 10 (2): 785-793.
- Napierala M, Dere R, Vetcher A, Wells RD: Structure-dependent recombination hot spot activity of GAA.TTC sequences from intron 1 of the Friedreich's ataxia gene. Journal of Biological Chemistry. 2004, 279 (8): 6444-6454. 10.1074/jbc.M309596200.
- Murphy KE, Stringer JR: RecA independent recombination of poly[d(GT)-d(CA)] in pBR322. Nucleic Acids Research. 1986, 14: 7325-7340. 10.1093/nar/14.18.7325.
- Bullock P, Miller J, Botchan M: Effects of poly[d(pGpT).d(pApC)] and poly[d(pCpG).d(pCpG)] repeats on homologous recombination in somatic cells. Molecular and Cellular Biology. 1986, 6 (11): 3948-3953.
- Traherne JA: Human MHC architecture and evolution: implications for disease association studies. International Journal of Immunogenetics. 2008, 35 (3): 179-192. 10.1111/j.1744-313X.2008.00765.x.
- Meyers BC, Kozik A, Griego A, Kuang H, Michelmore RW: Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis. Plant Cell. 2003, 15 (4): 809-834. 10.1105/tpc.009308.
- Richly E, Kurth J, Leister D: Mode of amplification and reorganization of resistance genes during recent Arabidopsis thaliana evolution. Molecular Biology and Evolution. 2002, 19 (1): 76-94.
- Hibino T, Loza-Coll M, Messier C, Majeske AJ, Cohen A, Terwilliger DP, Buckley KM, Brockton V, Nair S, Berney K: The immune gene repertoire encoded in the purple sea urchin genome. Developmental Biology. 2006, 300: 349-365. 10.1016/j.ydbio.2006.08.065.
- Boothroyd CE, Dreesen O, Leonova T, Ly KI, Figueiredo LM, Cross GA, Papavasiliou FN: A yeast-endonuclease-generated DNA break induces antigenic switching in Trypanosoma brucei. Nature. 2009, 459 (7244): 278-281. 10.1038/nature07982.
- Leister D: Tandem and segmental gene duplication and recombination in the evolution of plant disease resistance gene. Trends in Genetics. 2004, 20 (3): 116-122. 10.1016/j.tig.2004.01.007.
- Li E: Chromatin modification and epigenetic reprogramming in mammalian development. Nature Reviews Genetics. 2002, 3: 662-673. 10.1038/nrg887.
- Rast JP, Messier-Solek C: Marine invertebrate genome sequences and our evolving understanding of animal immunity. Biological Bulletin. 2008, 214: 274-283. 10.2307/25470669.
- Gilad Y, Man O, Paabo S, Lancet D: Human specific loss of olfactory receptor genes. Proc Natl Acad Sci USA. 2003, 100 (6): 3324-3327. 10.1073/pnas.0535697100.
- Buckley KM, Terwilliger DP, Smith LC: Sequence variations between the 185/333 genes and messages from the purple sea urchin suggest post-transcriptional modifications. Journal of Immunology. 2008, 181 (12): 2203-2212.
- Brockton V, Henson JH, Raftos DA, Majeske AJ, Kim YO, Smith LC: Localization and diversity of 185/333 proteins from the purple sea urchin - unexpected protein-size range and protein expression in a new coelomocyte type. Journal of Cell Science. 2008, 121 (3): 339-348. 10.1242/jcs.012096.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.