Partial duplication of the APBA2 gene in chromosome 15q13 corresponds to duplicon structures

Background Chromosomal abnormalities affecting human chromosome 15q11-q13 underlie multiple genomic disorders caused by deletion, duplication and triplication of intervals in this region. These events are mediated by highly homologous segments of DNA, or duplicons, that facilitate mispairing and unequal cross-over in meiosis. The gene encoding an amyloid precursor protein-binding protein (APBA2) was previously mapped to the distal portion of the interval commonly deleted in Prader-Willi and Angelman syndromes and duplicated in cases of autism. Results We show that this gene actually maps to a more telomeric location and is partially duplicated within the broader region. Two highly homologous copies of an interval containing a large 5' exon and downstream sequence are located ~5 Mb distal to the intact locus. The duplicated copies, containing the first coding exon of APBA2, can be distinguished by single nucleotide sequence differences and are transcriptionally inactive. Adjacent to APBA2 maps a gene termed KIAA0574. The protein encoded by this gene is weakly homologous to a protein termed X123 that in turn maps adjacent to APBA1 on 9q21.12; APBA1 is highly homologous to APBA2 in the C-terminal region and is distinguished from APBA2 by the N-terminal region encoded by this duplicated exon. Conclusion The duplication of APBA2 sequences in this region adds to a complex picture of different low copy repeats present across this region and elsewhere on the chromosome.


Background
Human chromosome 15q11-q13 is associated with genomic disorders involving deletions, duplications and triplications of different intervals in this region. The formation of these cytogenetic abnormalities is mediated by low-copy repeated sequences, or duplicons [1]. High sequence homology between copies presumably facilitates mispairing and unequal recombination events in meiosis. Interstitial deletion of 15q11-q13 results in Prader-Willi syndrome (PWS; MIM # 176270) or Angelman syndrome (AS; MIM # 105830), when deletions occur on the paternally-derived or maternally-derived homolog, respectively (see Figure 1; reviewed in refs. [2][3][4]). Interstitial duplication of the same interval has been found in cases of autism, and data suggest a higher risk for autism with maternal compared to paternal duplications [5][6][7]. Chromosome 15 is the most frequently involved chromosome in cases of autism with supernumerary marker chromosomes. 15q-derived markers are inverted, duplicated and pseudodicentric structures and are known as inv dup(15) or idic(15). Small markers which do not contain the PWS/AS deletion interval are associated with a normal phenotype [8]. Larger idic(15) duplications are typically associated with a more severe autistic phenotype than that seen in cases of interstitial duplication [5][6][7][9][10][11][12]. The increased severity may be due to the presence of two additional copies of this region; however, these duplications also extend significantly farther toward the telomere and therefore contain more duplicated genes. Idic(15) markers also vary in the location of the distal breakpoint, depending on which duplicon sequence mediated the rearrangement. Interstitial triplications have also been described, and these tend to utilize variable distal breakpoints common to idic(15) chromosomes [13]. Triplications are associated with a phenotype analogous to that for idic(15) when maternally-derived [7,12,13].
Several functional duplicons have been described, and these may be divided into two classes. There are two proximal breakpoints (BP1 and BP2) that define two classes of PWS/AS deletion [14]. The common distal breakpoint is termed BP3 and has a bipartite structure [15,16]. The BP2 and BP3 duplicons span several hundred kb and contain at least seven expressed sequences. One prominent example is the HERC2 (Hect domain and Rcc1 domain protein 2) gene [17]. Approximately 11 HERC2 copies have been identified on chromosomes 15 and 16, although most map to 15q11-q13. Another class of 15q duplicons has been termed LCR15s, for low copy repeats [18,19]. These share some sequences in common with the large HERC2 duplicons. LCR15s extend only ~15 kb individually, but are often present in clusters extending to 30-60 kb. This class of duplicon has been proposed to help mediate rearrangements in idic(15) markers and larger interstitial duplications and triplications [19].
The gene encoding the amyloid precursor protein-binding protein A2 (APBA2) was previously mapped to chromosome 15 [20] and localized to the common PWS/AS deletion region centromeric to BP3 and distal to a cluster of GABA A receptor subunit genes [21,22] (see Figure 1). This location made APBA2 an attractive positional candidate gene for involvement in duplication-associated and inherited autism. On this basis we sought to characterize APBA2 in terms of structure and expression. APBA2 has also been termed Mint2, for Munc-interacting protein 2, or X-11β. Mint2 functions as a neuronal adaptor protein that binds to the Munc18-1 protein, which is essential for synaptic vesicle exocytosis [23]. Biederer and Südhof have shown that a multiprotein complex containing Mint1 (APBA1) binds to the cytoplasmic tail of neurexins, which play a critical role in synaptic vesicle exocytosis. Mint1 and Mint2 can bind directly to neurexins via a PDZ domain-mediated interaction. Thus the Mint proteins serve to anchor Munc18-1 to the neurexins and thereby facilitate exocytosis. We show here that APBA2 actually maps to a more distal location, outside the narrow common interstitial deletion/duplication interval but within the larger idic(15) duplications. We found that a large 5' exon, the first coding exon of APBA2, is duplicated together with downstream sequence, and that these transcriptionally inactive copies are located at more telomeric sites within the region. The nature of the duplication may have implications for the evolution of APBA2/Mint2 from its other family members.

Figure 1
Schematic representation of 15q11-q13. The chromosome 15q11-q13 is depicted, with breakpoints utilized in PWS/AS deletions and autism-associated duplications and triplications represented by larger jagged lines. Smaller jagged lines identify rare chromosomal breakpoints which delimit the PWS and AS critical regions. Centromeric (cen) and telomeric (tel) orientation is shown and breakpoints are numbered from BP1-BP5. Arrows over the map show the intervals deleted in PWS/AS and duplicated in duplications or triplications affecting 15q11-q13. Gene and framework marker positions are indicated over the map. The most narrow autism candidate region is based on the smallest autism-associated maternally-biased duplications.

Results
We sought to characterize the structure and expression of the APBA2 gene, as a prelude to analysis of this gene as a possible autism candidate. We obtained a bacterial artificial chromosome (BAC) clone (686I6) containing genomic sequence of the APBA2 transcriptional unit. We also obtained two yeast artificial chromosome (YAC) clones: one covering the interval between the 3' end of GABRG3 and OCA2 (962D11; contig WC-153), and one to which an APBA2 EST (WI-14097) and nearby markers had been mapped (764C6; contig WC-699). STS content mapping demonstrated that APBA2 sequences and other markers are present within the BAC 686I6 and YAC 764C6 but absent from the YAC clone covering the GABRG3-OCA2 interval (Figure 2b). Thus, the previous map position appeared to be incorrect. Based on STS marker content in the 686I6 clone, APBA2 maps to a more distal location in 15q13. The revised map position places APBA2 outside of the most narrow interstitial duplication interval and the area most supported by linkage data but within the larger idic(15) duplications. Current genomic sequence assemblies now agree with these experimentally determined data.
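The logic of this STS content test can be written out as a small lookup. The sketch below is illustrative only: the presence/absence values restate the result described in the text, and the function names are hypothetical, not part of the original analysis.

```python
# APBA2 STS amplification per clone, as described in the text
# (True = PCR product observed). Clone labels follow the text.
sts_results = {
    "BAC 686I6": True,
    "YAC 764C6": True,
    "YAC 962D11": False,  # clone covering the GABRG3-OCA2 interval
}


def clones_with_marker(results):
    """Return the clones in which the STS amplified, sorted by name."""
    return sorted(clone for clone, present in results.items() if present)


def placement_supported(results, clone):
    """A map position is supported only if the clone spanning it carries the STS."""
    return results.get(clone, False)


print(clones_with_marker(sts_results))
print(placement_supported(sts_results, "YAC 962D11"))
```

Because the clone spanning the previously reported position lacks the marker, that placement is not supported by this test.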
To facilitate direct screening of APBA2, we determined the gene structure and developed exon-specific PCR assays. We characterized genomic structure using a combination of existing genomic sequence and direct sequencing from BAC clone template. APBA2 is encoded by 14 exons; 12 exons are coding and 2 exons contain 5' untranslated sequence. All exons are now accurately predicted in the NCBI (GenBank: NT_010363) and Celera assemblies, while the Ensembl assembly lacks the putative first exon. A sequence containing this exon has been deposited in GenBank (accession: AY228760). Junctions for most exons were confirmed by direct sequencing from RP-11 BAC 330K3 template, which also contains the APBA2 transcriptional unit. Table 1 itemizes intron-exon boundary information. Primer sequences and conditions for PCR amplification of coding exons (3-14) are presented in Table 2. APBA2 is transcribed towards the telomere.
In the course of analyzing the structure of APBA2, we noted that a large 5' exon containing the initiating methionine codon was not only present in the 686I6 BAC but also highly homologous to sequences from other BAC clones with non-overlapping STS content. In silico STS content mapping revealed two distinct pairs of BAC clones containing the partial duplications (Figure 2a). One of these (RP-11 clones 602M11 and 122P18) also contains a partial duplication of a pseudogene containing the 3' end of the neuronal nicotinic receptor α7 subunit (CHRNA7) gene [24]. This pseudogene locus maps to a site telomeric to APBA2 in 15q13. The second group of BACs (RP-11 438P7 and 1 H8) contained the partial APBA2 duplication and the K+/Cl- cotransporter (SLC12A6 or KCC3) gene. The duplications include the entire 1-kb exon and ~5 kb of downstream sequence; however, this sequence is interrupted in the intact locus by an ~9.7-kb non-duplicated interval. The first block of homology includes exon 3, ~100 bp of sequence 5' and 1,045 bp 3' to the exon. Following the homology gap, there is an additional 3,945 bp of duplicated sequence. Furthermore, an additional 15-kb region in both sets of BACs containing partial APBA2 duplications is highly homologous to a region approximately 25 kb upstream of the presumptive first exon of APBA2. Current sequence assemblies show the correct map location for the intact locus and place these dupAPBA2 sequences ~5 Mb telomeric.
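The block sizes quoted above can be tallied as a quick consistency check. This is a back-of-the-envelope sketch: the approximate figures (~100 bp, ~1 kb, ~9.7 kb) are treated as exact, and the variable names are illustrative rather than taken from the original analysis.

```python
# Sizes in bp as quoted in the text; "~" values are treated as exact here.
upstream_flank = 100       # ~100 bp 5' of exon 3
exon3 = 1000               # ~1-kb exon
block1_downstream = 1045   # 3' sequence within the first homology block
gap_in_intact = 9700       # ~9.7-kb non-duplicated interval (intact locus only)
block2 = 3945              # second homology block

# Sequence shared between the intact locus and each duplicated copy
duplicated_total = upstream_flank + exon3 + block1_downstream + block2
# Duplicated sequence lying 3' of the exon
downstream_total = block1_downstream + block2
# Footprint of the two homology blocks plus the gap at the intact locus
span_in_intact = duplicated_total + gap_in_intact

print(f"duplicated sequence: {duplicated_total} bp")
print(f"downstream of exon:  {downstream_total} bp")
print(f"span at intact locus: {span_in_intact} bp")
```

Under these assumptions the downstream portion comes to about 5 kb and the whole duplicated segment to about 6 kb, in line with the sizes stated elsewhere in the paper.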
BLAST analysis of BACs containing the partial APBA2 duplications revealed duplicon-like sequences, based on significant homology to a large number of BACs from known duplicons. The largest number of similarities detected (>30) corresponds to an interval of ~15 kb located ~15 kb centromeric to the APBA2 duplications. However, there is highly significant homology over ~100 kb of sequence from assembly contig NT_035325 mapping to 15q26, where LCR15 duplicon sequences have been reported [18]. These sequences appear to correspond to the LCR15 class of duplicons, based on sequence content. BLAST analysis of sequence at the intact locus revealed a similar low copy repeat or duplicon-type sequence of ~1.3 kb.
More than 80 copies of this sequence are present at sites across the genome on every chromosome, and with an apparent clustering at telomere locations. This sequence, while short, does not correspond to any of the known chromosome 15q11-q13 duplicon sequences, and therefore could represent a novel class of such low copy repeats. This sequence has been deposited in GenBank (AY237156). It is worth noting that the presence of multiple duplicons and repeated sequences has significantly complicated genomic sequence assembly for this region.
Since the partial duplications of APBA2 contained the first coding exon and the transcription start site(s) for APBA2 have not been characterized, we sought to determine whether these copies were transcriptionally active. We developed restriction fragment length polymorphism (RFLP) assays to detect sequence differences discriminating the intact APBA2 locus from the partial duplications. These assays were used following PCR from genomic or BAC DNA and cDNA from adult and fetal human brain (Figure 2c). Distinct "fingerprint" patterns are apparent for the intact locus and the duplications. The sequence differences used in Figure 2c that distinguish the duplications from the intact locus do not distinguish the duplications from each other. This distinction was made based on in silico analysis of numerous sequence differences, clone-sequence assembly and clone-marker/gene content relationships. The pattern in cDNA samples was identical to that in clone 686I6 but not either of the two duplications. These data argue that the only transcript present in brain corresponds to the intact locus, and that the two partial duplications are therefore not transcribed.
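The principle behind such an RFLP assay can be illustrated with an in silico digest. The sketch below is a toy example, not the assay used here: the two amplicon sequences are invented, and only the recognition sites of HinfI (G^ANTC) and AvaII (G^GWCC), the enzymes named in the Figure 2 legend, are real. A single nucleotide difference that destroys a site changes the fragment pattern.

```python
import re

# Recognition sites written as regular expressions (N = any base, W = A or T),
# paired with the cut offset on the top strand. Lookaheads allow overlapping sites.
ENZYMES = {
    "HinfI": (re.compile(r"(?=GA[ACGT]TC)"), 1),  # G^ANTC
    "AvaII": (re.compile(r"(?=GG[AT]CC)"), 1),    # G^GWCC
}


def fragment_sizes(seq, enzyme):
    """Return fragment lengths (largest first) for a complete digest of a linear sequence."""
    pattern, offset = ENZYMES[enzyme]
    cuts = [m.start() + offset for m in pattern.finditer(seq.upper())]
    edges = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(edges, edges[1:])), reverse=True)


# Hypothetical 45-bp amplicons differing at one position (GATTC vs GGTTC).
toy_intact = "ATGC" * 5 + "GATTC" + "ATGC" * 5
toy_duplicate = "ATGC" * 5 + "GGTTC" + "ATGC" * 5

print("intact   ", fragment_sizes(toy_intact, "HinfI"))
print("duplicate", fragment_sizes(toy_duplicate, "HinfI"))
```

In this toy case the intact-type amplicon is cut once and the duplicate-type amplicon is left uncut, which is the kind of banding difference the assay relies on.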
Immediately telomeric to APBA2 lies another gene (KIAA0574; see Figure 2a), which was initially identified from a sequencing project of large, brain-derived cDNAs [25]. KIAA0574 encodes a protein of unknown function, and is transcribed in the opposite orientation relative to APBA2. BLASTP using the KIAA0574 predicted amino acid sequence detects weak homology (35% identity, 64% similarity over 69 residues) to a protein termed X123, the gene for which is located in the 9q21.12 Friedreich's ataxia region. The X123 gene in turn maps immediately adjacent to APBA1 (alias Mint1 or X11α). APBA1 and APBA2 are highly homologous for the C-terminal half of their respective sequences (90% similar, 84% identical). The N-terminal half, corresponding to the duplicated exon, is only weakly similar (36% similar, 30% identical). Thus the distinction between the APBA1 and APBA2 proteins corresponds to exon 3-encoded residues. Despite the comparatively weak homology between the KIAA0574 and X123 predicted peptides, this scenario suggests a clear evolutionary relationship between these gene pairs. The propensity of the 15q11-q13 region to undergo rearrangement may have played a role in the evolution of this gene family.

Figure 2
A. An expanded view of the 15q12-q14 region containing distal duplication breakpoints is shown. Breakpoints BP3 and BP5 are depicted as jagged, hatched structures; BP4 is not thus shown but its position is indicated by an arrow over the map. STS marker positions are indicated on the map by closed circles, and arrows over the map indicate transcriptional orientation for genes in this region. YAC and BAC clone locations are shown below the map; YAC 764C6 is located in contig WC-153 and 962D11 is in WC-699. Closed circles within clones reflect the inclusion and position of specific STS markers shown above. B. PCR amplification of an APBA2-specific STS using genomic, YAC and BAC templates is shown. Position of clone pairs relative to cytogenetic banding and public sequence assemblies is indicated. Product consistent with genomic template is present only in clones from the region just telomeric to BP3. C. PCR from genomic or BAC clone template, followed by restriction with either HinfI or AvaII is shown. Unique banding patterns, corresponding to single nucleotide differences between copies, may be observed for the intact locus and each of the two duplication copies. RT-PCR using adult or fetal cDNA reveals the pattern corresponding to the intact locus only.

The distribution of APBA2 brain expression was determined using in situ hybridization and northern blotting. Commercial brain northern blots were hybridized with a cDNA probe corresponding to exon 3. Such a probe allows us to discriminate between APBA2 expression and potential signal from APBA1. Northern blotting revealed a predominant moderate-abundance transcript of ~4.2 kb widely distributed throughout the brain and spinal cord (Figure 3a). Several smaller transcripts were also seen. A cDNA clone for the mouse ortholog of APBA2 was obtained and a similar probe was used to test for expression using in situ hybridization to mouse brain sections. Figure 3b shows representative coronal and sagittal mouse brain sections. Apba2 demonstrates moderate expression in mouse cortical and limbic regions including frontal, parietal, and temporal cortex, hippocampus, amygdala, thalamus, and cerebellum and lower level expression in many other regions summarized in Table 3. A developmental expression series illustrating expression from embryonic day 8 through day 15 (E8-E15) is shown in Figure 4. Early expression in the primitive neural tube emerges at day E10. Apba2 expression extends throughout the neural tube during days E11 and E12, but is apparently restricted to the primitive brain vesicles during days E13 and E14. Within the brain, distribution is ubiquitous and uniform. In addition to diffuse expression throughout the brain, by day E15, a punctate pattern appears around the spinal cord, corresponding to expression within the dorsal root ganglia.

Figure 4
Developmental expression profile of Apba2. Apba2 expression in the primitive neural tube begins by embryonic day 10 (E10). Expression throughout the neural tube is seen in day E11 and E12, but appears to be restricted to the primitive brain vesicles by day E13 and continues during day E14 in all regions of primitive brain, including medulla, midbrain, thalamus, striatum, olfactory lobe, and neopallial cortex as seen above. By day E15, however, Apba2 transcript also appears in the dorsal root ganglia surrounding the spinal cord.

The availability of emerging mouse genomic sequence allowed us to examine sequence conservation across the transcriptional unit. We submitted human and mouse genomic sequences to the VISTA web site http://www-gsd.lbl.gov/vista/; the resulting homology plot is shown in Figure 5. Murine genomic sequence for exons 1 and 2 was not available, and the comparison was made for the remainder of the transcriptional unit. We would expect coding sequences to be conserved, but regions of non-coding conservation, which could harbour potential functional (conserved) sequences, are of particular interest. Several such regions are present within the APBA2 transcriptional unit.
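The idea behind such a conservation plot can be sketched as percent identity computed in sliding windows over a pairwise alignment. This is a simplified illustration and not the VISTA implementation: the function names, window size and toy sequences are assumptions, and the 75% cut-off follows the threshold quoted in the Figure 5 legend.

```python
def window_identity(aln_a, aln_b, window=100, step=1):
    """Percent identity in sliding windows over two gapped, equal-length aligned strings.

    Returns a list of (window_start, percent_identity) tuples. Gap columns ('-')
    never count as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(0, len(aln_a) - window + 1, step):
        a = aln_a[start:start + window]
        b = aln_b[start:start + window]
        matches = sum(x == y and x != "-" for x, y in zip(a, b))
        scores.append((start, 100.0 * matches / window))
    return scores


def conserved_windows(scores, threshold=75.0):
    """Return start positions of windows whose identity exceeds the threshold."""
    return [start for start, pct in scores if pct > threshold]


# Toy aligned sequences (hypothetical): first half identical, second half divergent.
human = "ACGTACGTAC" + "GGGGGGGGGG"
mouse = "ACGTACGTAC" + "TTTTTTTTTT"

scores = window_identity(human, mouse, window=10, step=10)
print(scores)
print(conserved_windows(scores))
```

Windows passing the threshold outside annotated coding sequence would be the candidates for conserved non-coding elements.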

Discussion
We have described the mapping of APBA2 to 15q13.1 within a region duplicated in idic(15) marker chromosomes, thus clarifying the map position of this gene. While public and private sequence assemblies now depict the correct location for this gene, previous published reports indicated that this gene mapped proximal to BP3 within the PWS/AS deletion and autism-associated interstitial duplication interval [21,22]. STS content mapping with regional clones confirms the predicted genomic assemblies and places this gene distal to the BP3 HERC2-containing breakpoint. Two partial APBA2 copies containing the first coding exon and downstream sequence are located ~5 Mb telomeric in 15q13.3. This location contains duplicon-like sequences, based on BLAST-detected homology relationships to other non-overlapping BAC clones, which contain LCR15-inclusive sequences. This region may represent yet another functional duplicon. This suggests the possibility of such sequences upstream from APBA2, most likely corresponding to BP3 or in a sequence gap between HERC2 and APBA2. The existence of a duplicon-like region near APBA2 would most easily explain the duplication. The duplicated region corresponds to two blocks in the intact locus which are joined together in the duplication copies, perhaps reflecting the original organization of the intact ancestral locus. Within the 9.7-kb gap between homology blocks, another low-copy repeat sequence of ~1.3 kb was identified. The pattern of partial duplication seen with APBA2 is similar to that of CHRNA7. The repetitive nature of this region has presented significant challenges to genomic sequence assembly. It is likely that repeated elements cause some segments to be poorly represented in E. coli-based libraries. Our interest in this gene was driven by its plausibility as a candidate for autism, based upon function. The location of APBA2 outside the narrow deletion/duplication interval makes this gene less attractive as a candidate for inherited autism. However, overexpression of this gene in idic(15) cases could be responsible in part for the increased phenotypic severity in those cases compared to interstitial duplication.

Table 2
Forward (F) and reverse (R) oligonucleotide sequences are shown along with annealing temperatures (Ta) and PCR product size.

We characterized gene structure for the human gene and expression for both human and mouse genes. There are 14 known exons, 13 of which are shown in the current public human genomic sequence assemblies. A putative first exon, corresponding to the 5' end of the Celera transcript, maps at least 74-kb upstream of exon 2. Based on a combination of the public and Celera assemblies, the APBA2 transcriptional unit spans >197 kb. We provided primer sequences and amplification conditions to facilitate SNP and disease variant discovery at this locus.

Figure 5
Comparative sequence analysis of human and mouse APBA2 using VISTA. Output from VISTA analysis of the APBA2 transcriptional unit is shown, with regions of non-coding sequence conservation (>75% identity) indicated by pink shading and coding homology by blue shading. Coding sequence (CDS) is indicated over the transcriptional unit with arrows indicating transcriptional orientation. Various repetitive sequences are indicated by shaded boxes over the homology plots. Genomic sequence corresponding to the mouse ortholog is present in the public assembly, although sequence upstream of exon 3 is not available. VISTA comparison of human and mouse sequences reveals several sites of conserved non-coding sequences; these may contain functional elements necessary for proper expression.

Conclusions
The APBA2 gene maps to 15q13.1 telomeric to HERC2 and BP3. Partial duplications of a 6-kb segment containing the first coding exon are present as two copies at ~5-Mb telomeric in 15q13.3 and are not transcribed. The pattern of duplication for APBA2 adds to a complex mosaic including many duplicon-like structures in 15q13. The tendency of this region to undergo rearrangement may bear relation to the evolution of this gene family.

Southern and Northern Blot Analyses
Non-conserved probes corresponding to exon 3 were hybridized to commercially-available Multiple Tissue Northern blots containing adult human brain polyA+ RNA (Invitrogen Life Technologies, Carlsbad, CA). Blots were hybridized overnight at 65°C according to manufacturer's recommendations and then washed for 2-3 hours in 2X SSC/0.1% SDS at 65°C. Filters were exposed to Kodak XAR x-ray film (Eastman Kodak; New Haven, CT).

In Situ Hybridization Analysis
Following perfusion, whole brains were obtained from C57BL/6 mice. Using a Leica CM300 cryostat (Leica, Wetzlar, Germany), frozen brains were sectioned into five series of 20 μm slices, with individual sections within a series separated by 100 μm. Wax-embedded embryonic sections were purchased from Novagen (Madison, WI). A probe-containing plasmid was developed for Apba2. Antisense and sense control cRNA probes were synthesized with the MAXIscript kit using T3, T7, or SP6 RNA polymerase (Ambion, Austin, TX) in reactions incorporating α-[33P]-UTP (New England Nuclear Life Science Products, Boston, MA). Purified probes were hybridized to slides for 20 hours at 58°C and washed in a series of solutions to a final stringency of 0.1X SSC at 65°C. Hybridized and washed slides were exposed to BioMax MR film (Eastman Kodak, New Haven, CT) for varying times, depending on message abundance. X-ray images were prepared by scanning film at high resolution.

Sequence Analysis
Direct sequencing from BAC clone template was performed using ABI dye terminator chemistry according to manufacturer's recommendations (Applied Biosystems, Foster City, CA). Sequences were purified using spin columns and submitted to the Vanderbilt University Shared Sequencing Resource. Oligonucleotide primers were designed using Oligo v6.7 (Molecular Biology Insights, Inc., Cascade, CO) and MacVector v7.1 (Accelrys, Inc., Burlington, MA). BLAST comparisons were performed from the NCBI http://www.ncbi.nlm.nih.gov or Ensembl http://www.ensembl.org/ websites. Comparative analysis of human and mouse genomic sequences was performed through the VISTA web site http://www-gsd.lbl.gov/vista/.