A novel custom high density-comparative genomic hybridization array detects common rearrangements as well as deep intronic mutations in dystrophinopathies

Background: The commonest pathogenic DMD changes are intragenic deletions/duplications, which account for up to 78% of all cases, and point mutations (roughly 20%), which are detectable by direct sequencing. The remaining mutations (about 2%) are thought to be pure intronic rearrangements/mutations or 5'/3' UTR changes. To screen the huge DMD gene for all types of copy number variation (CNV) mutations, we designed a novel custom high density comparative genomic hybridisation array that contains the full genomic region of the DMD gene, spanning from 100 kb upstream to 100 kb downstream of the 2.2 Mb gene. Results: We studied 12 DMD/BMD patients who either had no detectable mutations or carried previously identified quantitative pathogenic changes in the DMD gene. We validated the array on patients with previously known mutations as well as on unaffected controls; we identified three novel pure intronic rearrangements and defined all the mutation breakpoints, both in the introns and in the 3' UTR region. We also detected a novel polymorphic intron 2 deletion/duplication variation. Despite the high resolution of this approach, RNA studies were required to confirm the functional significance of the intronic mutations identified by CGH. In addition, RNA analysis identified three intronic pathogenic variations affecting splicing that had not been detected by the CGH analysis. Conclusion: This novel technology represents an effective high throughput tool to identify both common and rarer DMD rearrangements. RNA studies are required to validate the significance of the CGH array findings. The combination of these tools will fully cover the identification of causative DMD rearrangements in both coding and non-coding regions, particularly in patients in whom standard, albeit extensive, techniques are unable to detect a mutation.


Background
The DMD gene was the first gene identified by reverse genetics. Mutations in the gene cause Duchenne (DMD) and Becker (BMD) muscular dystrophies. Both the frequency and devastating nature of these conditions make DMD one of the most extensively studied genes among the rare genetic disorders [1][2][3].
This intense research has provided molecular tools that identify the causative mutation in about 98% of patients, combining MLPA to detect exonic deletions/duplications (75-80% of mutations) with direct sequencing to identify small mutations (up to 20% of mutations). Nevertheless, some mutations remain unidentified. Furthermore, it is well known that the large size of the gene (2.2 Mb) makes it prone to complex rearrangements that cannot be precisely defined using routine molecular diagnostic techniques.
As a consequence, there are a considerable number of DMD/BMD patients in whom no causative mutation has been identified. This impacts on genetic diagnosis, genetic prognosis, clinical confirmation, carrier detection, prenatal diagnosis and genetic counselling for the families involved.
Furthermore, recent innovative therapeutic approaches [4,5] highlight how important it is for patients and families to obtain a correct molecular diagnosis, which is required for inclusion in trials of these therapies. Indeed, because the experimental therapies becoming available are highly mutation specific, as summarised in the concept of "personalised medicine" [6,7], a patient's private mutation in the DMD gene must be identified before he can be eligible for these trials.
In the last few years genome scanning technologies have enabled the detection of previously unrecognised large (>1 kb) copy-number variations (CNVs) in human DNA. While many of these variants do exist as polymorphisms, some of them can change the copy number of critical genes or genomic regions, or alter gene regulation and underlie monogenic disorders, developmental abnormalities and a variety of complex genetic disorders [8][9][10][11].
Therefore, there is wide consensus on the potential of array-CGH to determine CNVs for research and clinical purposes, given its robust and precise measurement of CNVs, its scalability and its very high resolution [12].
Although CGH was initially considered a strategy for improving cytogenetic resolution by detecting fine chromosome imbalances [13,14], other applications have recently been envisaged, such as cancer studies [15], complex syndromes, mental retardation, Mendelian disorders and polygenic traits [16].
CGH arrays are also flexible because both commercial and custom arrays are available; custom arrays are designed on demand, so any region of interest can be investigated at the appropriate resolution.
Dhami et al. [2] designed a single-strand PCR-based CGH array to detect exon deletions/duplications in a few genes, including DMD.
This strategy was able to identify CNVs; however, like MLPA and other techniques, it only investigated coding regions.
We have applied the CGH technique in a novel full-gene approach that investigates the presence of CNVs across the entire genomic region of the DMD gene. Our custom-designed high density comparative genomic hybridisation array (DMD-CGH), based on in situ synthesis of 60-mer probes at intervals of 260 bp, allowed us to obtain a full map of CNVs in the gene, including the non-coding regions, which had not been investigated previously.
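As a rough illustration of what this probe density implies, the arithmetic below estimates the probe count and how many probes fall inside a small event. It is a sketch under stated assumptions: we read "intervals of 260 bp" as start-to-start spacing, and we ignore repeat-masked sequence, which real array designs usually avoid, so the true probe count is likely lower. The function names are hypothetical.

```python
import math

def probe_count(region_bp: int, step_bp: int) -> int:
    """Probes needed if one probe starts every step_bp bases across region_bp."""
    return math.ceil(region_bp / step_bp)

def probes_in_event(event_bp: int, step_bp: int) -> int:
    """Approximate number of probes that fall inside an event of event_bp bases."""
    return event_bp // step_bp

# Assumption: 260 bp is the start-to-start probe spacing; repeat-masked
# sequence is ignored, so this is an upper-bound style estimate.
region = 2_200_000 + 2 * 100_000   # 2.2 Mb gene plus 100 kb flanks on each side
print(probe_count(region, 260))     # 9231
print(probes_in_event(1_400, 260))  # 5 probes inside a 1.4 kb CNV
```

Under these assumptions, even the smallest CNVs reported here (about 1.3-1.4 kb) would be covered by several adjacent probes, which is what makes purely intronic events detectable.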
Our studies allowed us to validate the array for accurately detecting previously identified rearrangements, to define intronic breakpoints precisely and to identify three pathogenic purely intronic CNVs. We corroborated the CGH studies by RNA analysis, thereby validating the significance of the gene imbalances identified. Analysis of the full DMD transcript also disclosed three rare splicing mutations caused by small intronic changes, which the CGH analysis had missed.

DMD-CGH array analysis
We first validated the DMD-CGH array on ten normal control males and on four patients (1, 2, 3 and 4) with mutations previously characterised by MLPA (Figure 1a-d). In patient 1, we detected two non-contiguous duplications: one of 116 kb from intron 1P to intron 2, including exon 2, and the other of 37 kb in intron 2 (Figure 1a). Patient 2 showed a deletion of 3569 bp, from intron 13 to exon 14 (Figure 1b).
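Calls such as these come from runs of adjacent probes whose test/reference log2 ratios deviate in the same direction. The sketch below illustrates that idea with a plain run-length caller. The thresholds, the minimum run of three probes and the function name are hypothetical choices, not the analysis software or settings used in this study. For a hemizygous male tested against a normal male reference, a duplication is expected near log2(2/1) = 1, whereas a deletion gives a strongly negative ratio because the test signal is essentially absent.

```python
from typing import List, Tuple

def call_cnvs(positions: List[int], log2_ratios: List[float],
              gain: float = 0.5, loss: float = -0.8,
              min_probes: int = 3) -> List[Tuple[int, int, str]]:
    """Report runs of consecutive probes beyond a gain or loss threshold.

    Hypothetical thresholds; a simple run-length caller, not a
    statistical segmentation algorithm.
    """
    def state(r: float) -> str:
        if r >= gain:
            return "dup"
        if r <= loss:
            return "del"
        return "normal"

    calls = []
    run_start = 0
    for i in range(1, len(log2_ratios) + 1):
        # Close the current run when the probe state changes or the data end.
        if i == len(log2_ratios) or state(log2_ratios[i]) != state(log2_ratios[run_start]):
            s = state(log2_ratios[run_start])
            if s != "normal" and i - run_start >= min_probes:
                calls.append((positions[run_start], positions[i - 1], s))
            run_start = i
    return calls

# Toy data: probes every 260 bp, a 3-probe gain and a 3-probe loss.
pos = [i * 260 for i in range(10)]
ratios = [0.0, 0.1, 1.0, 0.9, 1.1, 0.0, -0.1, -2.0, -2.0, -2.0]
print(call_cnvs(pos, ratios))  # [(520, 1040, 'dup'), (1820, 2340, 'del')]
```

Requiring several consecutive probes trades a little sensitivity for robustness against single noisy probes; dedicated segmentation methods handle this more rigorously.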
We also precisely defined the breakpoint in BMD patient 3, who carries an out-of-frame exon 3-6 duplication and represents an exception to the reading-frame rule [17]. The DMD-CGH array identified a duplication of 111 kb from intron 2 to nt 45 of exon 6, so that the duplicated copy of exon 6 lacks its 5' donor splice site consensus sequence (Figure 1c).
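The reading-frame rule states that a whole-exon deletion or duplication preserves the open reading frame when the combined length of the affected exons is divisible by three. The check below is a minimal sketch of that arithmetic. The exon lengths are hypothetical placeholders chosen only to reproduce the pattern described for patient 3 (exons 3-6 out of frame, exons 3-5 in frame); they are not DMD annotation values.

```python
from typing import Dict, List

def is_in_frame(exon_lengths_bp: List[int]) -> bool:
    """A whole-exon deletion/duplication keeps the frame if its length is a multiple of 3."""
    return sum(exon_lengths_bp) % 3 == 0

# Hypothetical exon lengths, for illustration only (not DMD annotation values).
hypothetical: Dict[int, int] = {3: 90, 4: 78, 5: 96, 6: 170}
dup_3_6 = [hypothetical[e] for e in range(3, 7)]
dup_3_5 = [hypothetical[e] for e in range(3, 6)]
print(is_in_frame(dup_3_6))  # False (434 bp)
print(is_in_frame(dup_3_5))  # True (264 bp)
```

Applied to the real exon lengths, the same check explains why losing the duplicated partial exon 6 at the RNA level can convert a genomically out-of-frame duplication into an in-frame transcript.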
In patient 4, who has a duplication of exons 65-79 and a DMD phenotype with associated severe mental retardation, the array allowed us to define the 3' breakpoint within the 3' UTR. The mutation is an 89 kb duplication extending from intron 64 to 241 bp downstream of the DMD stop codon, within the 3' UTR (Figure 1d).
The DMD-CGH array allowed us to identify the causative rearrangements in three of the eight DMD patients previously negative for DMD mutations.
Interestingly, in all 12 patients the DMD-CGH array identified a 1.4 kb CNV in intron 2, which was deleted in patients 10 and 11 and duplicated in all the others (Figure 1a and 1c). Examples of deleted and duplicated alleles of the intron 2 CNV are reported in Figure 3.
CGH analysis of ten normal control males revealed the presence of both deleted and duplicated regions, therefore suggesting this to be a polymorphic CNV (data not shown).
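The reasoning used here, that a CNV also seen in unaffected controls is likely polymorphic, can be expressed as a simple interval comparison. The sketch below flags a patient CNV when it reciprocally overlaps a control CNV by at least 50%. The 50% threshold, the function names and the coordinates are hypothetical; such a flag only prioritises candidates, and in this study the intron 2 CNV was additionally checked by transcription analysis.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp, end exclusive

def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two overlap fractions (overlap / length of each interval)."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))

def likely_polymorphic(cnv: Interval, control_cnvs: List[Interval],
                       min_overlap: float = 0.5) -> bool:
    """Flag a patient CNV that matches a CNV already observed in controls."""
    return any(reciprocal_overlap(cnv, c) >= min_overlap for c in control_cnvs)

controls = [(10_000, 11_400)]                          # hypothetical control CNV
print(likely_polymorphic((10_050, 11_450), controls))  # True
print(likely_polymorphic((50_000, 54_000), controls))  # False
```

Reciprocal overlap avoids matching a small event to a much larger one (or vice versa) merely because one lies inside the other.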
Real-time PCR was performed in patients 5 and 7, while PCR and sequencing were performed in patients 6, 10 and 11, confirming the duplications and deletions identified with the array (data not shown).
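How the real-time PCR data were quantified is not stated. One common approach is the comparative Ct (2^-ddCt) method, sketched below under the assumption of roughly 100% amplification efficiency, with a reference locus at normal copy number and a normal male as calibrator. The Ct values and function name are hypothetical. For an X-linked locus in a male, a duplication should give a relative quantity near 2.

```python
def relative_copy_number(ct_target_sample: float, ct_ref_sample: float,
                         ct_target_calib: float, ct_ref_calib: float) -> float:
    """2^-ddCt quantity of the target locus relative to a calibrator sample.

    Assumes ~100% PCR efficiency and a reference locus at normal copy number.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calib = ct_target_calib - ct_ref_calib
    return 2 ** -(d_ct_sample - d_ct_calib)

# Hypothetical Ct values: the target amplifies one cycle earlier in the patient,
# as expected for a duplicated X-linked locus in a male (2 copies vs 1).
print(relative_copy_number(24.0, 22.0, 25.0, 22.0))  # 2.0
```

A deletion in a male would instead show no target amplification at all, so it is usually confirmed by the absence of a product rather than by a ratio.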

RNA analysis and sequence analysis
Patients with pathogenic CNVs identified by DMD-CGH array analysis
Patient 5 showed a 4 kb duplication in intron 55, confirmed by real-time PCR, but no RNA was available to determine whether this variant was pathogenic. DMD-CGH analysis of patient 6 revealed two non-contiguous deletions located in intron 44 and intron 45. RNA analysis showed skipping of exon 45 (data not shown). Since FM-CSCE analysis of genomic DNA detected no splicing defect, we suspected that an inversion could be responsible for the phenotype. Guided by the DMD-CGH results, PCR analysis confirmed an inversion of the entire region, with deletions of 98 kb (intron 44) and 4 kb (intron 45) at the respective inversion breakpoints (Figure 4a and 4b).

Figure 1 DMD-CGH array profiles of deletions and duplications in patients with known mutations, identified by MLPA.
Patient 7 showed a 1.3 kb duplication within intron 4, confirmed by real-time PCR. RNA analysis showed failure to amplify the exon 4-5 junction, whereas exons 1 to 4 and exons 5 to 8 were correctly spliced. PCR with primers located within the duplicated region, paired with primers in exons 4 and 5, failed to detect any product. This behaviour suggests the insertion of a very large intronic region into the transcript between exons 4 and 5.
Patient 3, with BMD and the out-of-frame exon 3-6 genomic duplication detected by MLPA, showed an in-frame transcript carrying a duplication of exons 3-5, as expected from the CGH results [17]. The intron 2 CNV did not cause any transcript abnormality, as expected since the same CNV was also found in the ten unaffected males (data not shown).

Patients negative for pathogenic CNVs on DMD-CGH array analysis
All the results of the CGH study are reported in Table 1.

Discussion
The array CGH technique is an extremely effective tool for identifying CNVs in the genome, with important technical advantages (especially its large-scale, high resolution capacity) and relevant diagnostic implications. CGH arrays have been well validated in a variety of settings, for defining both cytogenetic abnormalities and some Mendelian disorders (CGH exon arrays). However, in the latter, non-coding regions were never explored [2].
Here we describe the results obtained using a novel custom DMD-CGH array covering the full genomic region of the DMD gene, including 100 kb upstream and downstream of the 5' and 3' UTRs. We designed this microarray to identify all possible quantitative pathogenic changes in the DMD gene, as well as elusive deep intronic pathogenic CNVs. The DMD-CGH array accurately identified and refined previously known deletions/duplications in the gene, suggesting that it could be used as a high throughput technique for large-scale DMD molecular diagnosis. Remarkably, it allowed us to define both intronic and untranslated region (UTR) breakpoints in all patients studied. It also revealed rare pure intronic mutations that routine genomic analysis had not detected. Notably, we describe a rare DMD gene inversion affecting exon 45 [18], the first to be identified by CGH.
Figure 3 DMD-CGH array profiles of intron 2 CNV.
As expected, the array failed to identify very small intronic mutations affecting splicing, for which RNA profiling was necessary.
Among the patients studied, we identified three novel pathogenic CNVs that are purely intronic. RNA analysis allowed us to demonstrate, at least for two of them, that they disrupt correct splicing of the DMD gene. Identifying these rearrangements by CGH avoided extensive RNA analysis, which is often hampered by the low quantity or poor quality of RNA obtainable from patients' tissues, particularly when only MyoD-transformed myogenic cells are available. Using the DMD-CGH array we also identified a non-pathogenic CNV within intron 2 that is not reported in the CNV database [19]. This was confirmed as a normal variant by transcription analysis and by analysing normal controls.
Furthermore, three complex rearrangements were defined in terms of both orientation and breakpoints, again improving the molecular diagnosis. For example, this allowed a better understanding of the genotype-phenotype correlation in a BMD patient carrying an out-of-frame exon 3-6 duplication. The DMD-CGH array showed that the 3' breakpoint falls within exon 6, providing a genomic basis for the observed splicing behaviour.
The DMD-CGH array may also help to investigate DMD/BMD cases with additional features such as severe mental retardation. Cognitive impairment in some DMD patients has been associated with mutations affecting the distal Dp140 and Dp71 dystrophin isoforms [20][21][22]. In our patient with the rare duplication of exons 65-79 and mental retardation, we confirmed the role of this distal region in impairing dystrophin-related brain function [20,23].
Furthermore, the DMD-CGH array revealed that the breakpoint of the large duplication within the 3' UTR involves a region containing seven AUF1 and two Hu protein binding motifs. These proteins are well known to be involved in mRNA stability [24]. It is conceivable that large genomic changes within the 3' UTR of the DMD gene may influence the resulting phenotype, suggesting that the 3' UTR should be routinely investigated to help unravel still unknown DMD regulatory mechanisms [21,25].
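The motif definitions behind the AUF1 and Hu counts are not given here. As an assumption, the sketch below counts the canonical AU-rich element pentamer AUUUA (written ATTTA on the DNA strand), a motif commonly associated with both protein families, in a sequence window around a breakpoint. The function name and the toy sequence are hypothetical, and a plain pentamer count is only a first-pass screen, not a binding prediction.

```python
import re

def count_overlapping(seq: str, motif: str) -> int:
    """Count possibly overlapping occurrences of motif in seq (case-insensitive)."""
    # A zero-width lookahead lets matches overlap, e.g. ATTTATTTA contains two ATTTA.
    return len(re.findall(f"(?={re.escape(motif.upper())})", seq.upper()))

# Toy window around a hypothetical breakpoint.
print(count_overlapping("ggATTTATTTAcc", "ATTTA"))  # 2
```

Counting overlapping hits matters here because AU-rich elements often occur as tandem, overlapping AUUUA repeats.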
Considering these results, our DMD array promises to be a useful tool both for identifying pathogenic DMD CNVs and for refining the genomic configuration, not only in patients with unusual mutations but in all patients. In fact, while routine mutation analysis identifies apparently identical deletions in different patients, the intronic breakpoints will almost invariably differ, and they might involve motifs that affect gene splicing in different ways [26].
Although the advantages of using the DMD-CGH array to identify mutations are clear, RNA studies provided important additional information in these patients. In particular, RNA studies allowed us to determine the significance of the CNVs identified and to see the effects on splicing of the mutations identified by the array. In addition, RNA analysis allowed us to identify small mutations affecting splicing that the array had not detected.
Among these, we found three very unusual small deep intronic mutations that would have required extensive intronic sequencing to locate using standard methods. All three were shown to alter the DMD splicing profile. In particular, while the effect on splicing of the two point mutations creating novel cryptic splice sites is relatively easy to interpret, the small 18 bp deletion within intron 37 is quite peculiar: although its effect on the transcript is evident, it is unclear how this novel genomic configuration modifies the splicing machinery.
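Why a point mutation that creates a cryptic donor site is comparatively easy to interpret can be illustrated by comparing wild-type and mutant sequence against a donor consensus. The sketch below uses the simplified intronic donor consensus GTRAGT as a regular expression; this is a crude heuristic chosen for illustration, not the method used in this study and not a validated splice-site predictor. The sequences are hypothetical.

```python
import re
from typing import List, Set

# Simplified intronic donor consensus: GT, then A/G, then AGT (GTRAGT).
# Crude heuristic for illustration only.
DONOR = re.compile(r"(?=GT[AG]AGT)")

def donor_sites(seq: str) -> Set[int]:
    """0-based positions of the G in every GTRAGT match (overlaps allowed)."""
    return {m.start() for m in DONOR.finditer(seq.upper())}

def new_donor_sites(wild_type: str, mutant: str) -> List[int]:
    """Consensus matches present in the mutant but absent from the wild type."""
    return sorted(donor_sites(mutant) - donor_sites(wild_type))

# Hypothetical C>T substitution at position 6 creating a GTAAGT motif.
print(new_donor_sites("TTCAGGCAAGTTT", "TTCAGGTAAGTTT"))  # [5]
```

A small deletion, by contrast, may remove or reposition regulatory elements without creating any recognisable consensus, which is consistent with the intron 37 deletion being harder to explain.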
Figure 5 Genomic configuration in patient 6. a) Scheme of the inverted genomic region including exon 45 in patient 6, with primer positions. PCR amplification for the detection of inversion breakpoints was carried out using two forward primers (black arrows) and two reverse primers (grey arrows) surrounding the breakpoint regions; b) PCR results in patient 6: Lane 1, molecular weight marker VI; Lane 2, proximal breakpoint (int44/int45(inv)); Lane 3, distal breakpoint (int44(inv)/int45).

Conclusion
Our results suggest that this DMD-CGH array is a valuable, cost-effective tool for high throughput DMD molecular diagnosis as well as for the definition of elusive DMD gene mutations. We suggest that CGH genomic analysis should precede RNA analysis, so that the genomic profile is defined first.
In addition to the diagnostic implications, investigating non-coding regions as possible contributors to the etiopathogenesis of DMD, and of other genetic disorders, may disclose findings of interest for both basic and applied research. Finally, defining the breakpoints of large rearrangements, which has always been an extremely complex task, will considerably improve our understanding of genotype-phenotype correlations.

Patients
We studied 12 patients with DMD/BMD using the DMD-CGH microarray, after obtaining informed consent (Table 1). Four patients were already known to have rearrangements in the DMD gene identified by MLPA: one DMD patient presented with mental retardation associated with a large duplication of exons 65-79; one BMD patient had an out-of-frame duplication of exons 3-6, an exception to the reading-frame rule; one DMD patient showed an isolated duplication of exon 2; and one BMD patient had a deletion of exon 14.
Eight DMD patients, fully analysed by MLPA and either sequencing or FM-CSCE [27], had tested negative for DMD mutations, despite protein studies using immunohistochemical and/or Western blot analysis indicating a dystrophinopathy.
Ten normal control males were also tested on the DMD-CGH array.
The study was approved by the local ethics committee (approval number 9/2005).

RNA analysis
Transcription analysis of the DMD messenger RNA was performed in six DMD patients and one BMD patient.
Total RNA was isolated from muscle biopsies using the RNeasy Kit (Qiagen) following the manufacturer's instructions. In patient 9, total RNA was isolated from MyoD-transformed fibroblasts as previously described [30]. Before cDNA synthesis, RNA was treated with DNase I (Roche) and checked for residual DNA contamination with a 55-cycle PCR.
Reverse transcription (RT) and PCR amplification were performed using random hexanucleotide primers and Superscript III enzyme (Invitrogen) according to the protocol supplied. All the PCR fragments were purified using the QIAquick purification kit (QIAGEN) and sequenced on an ABI Prism 3130.
In patients 8, 9 and 10, the entire DMD transcript was amplified in overlapping fragments of 750-800 bp (primer sequences are available upon request). For patients 5, 11 and 12 no muscle biopsies were available.
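Amplifying a long transcript in overlapping fragments amounts to tiling it with windows whose ends overlap, so that every exon junction falls inside at least one amplicon. The sketch below computes such a tiling. The 800 bp fragment size comes from the text; the 100 bp overlap, the 11,000 bp transcript length and the function name are hypothetical, and real primer positions would also be constrained by sequence properties.

```python
from typing import List, Tuple

def tile_amplicons(length: int, frag: int = 800,
                   overlap: int = 100) -> List[Tuple[int, int]]:
    """Overlapping (start, end) windows, end exclusive, covering [0, length)."""
    if overlap >= frag:
        raise ValueError("overlap must be smaller than the fragment length")
    step = frag - overlap
    tiles, start = [], 0
    while True:
        end = min(start + frag, length)
        tiles.append((start, end))
        if end >= length:
            return tiles
        start += step

tiles = tile_amplicons(11_000)  # hypothetical transcript length
print(len(tiles))   # 16
print(tiles[-1])    # (10500, 11000)
```

In this simple version the final window can be shorter than the others; a practical design would shift it upstream so that all amplicons stay within the 750-800 bp range.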

PCR genomic analysis
In patients 8, 9 and 10 we amplified the intronic regions surrounding the sequences included in the mature transcript (introns 55, 65 and 37). DNA was extracted from leukocytes by the Qiagen Biorobot.
In patient 6, two forward primers were paired to amplify the centromeric inversion breakpoint, and two reverse primers were paired to amplify the telomeric inversion breakpoint.
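Pairing two forward (or two reverse) primers yields a product only if the segment carrying one of them has been inverted, because inversion flips that primer's orientation relative to the other. The toy in-silico PCR below illustrates this logic on short hypothetical sequences (A/C only, so no accidental reverse-complement matches). It ignores the small deletions found at the real breakpoints, and the sequences and function names are not the primers used in this study, whose sequences were not reported.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def amplifies(template: str, primer_a: str, primer_b: str, max_len: int = 5000) -> bool:
    """True if primer_a anneals as a forward primer on the top strand and
    primer_b anneals as a reverse primer downstream of it within max_len bp."""
    i = template.find(primer_a)
    j = template.find(revcomp(primer_b))
    return i != -1 and j >= i and j + len(primer_b) - i <= max_len

# Hypothetical toy sequences; the middle segment is the one that gets inverted.
left, middle, right = "CACCAACACCCAAACC", "ACCACAACCCACAAAC", "CCAACCACAAACCCAC"
wild_type = left + middle + right
inverted = left + revcomp(middle) + right

fwd_flank, fwd_inside = left[2:10], middle[4:12]                     # two forward primers
rev_inside, rev_flank = revcomp(middle[6:14]), revcomp(right[3:11])  # two reverse primers

print(amplifies(wild_type, fwd_flank, fwd_inside))  # False
print(amplifies(inverted, fwd_flank, fwd_inside))   # True: left junction
print(amplifies(wild_type, rev_inside, rev_flank))  # False
print(amplifies(inverted, rev_inside, rev_flank))   # True: right junction
```

Because both same-orientation pairs fail on the normal allele, a product from each pair is direct evidence of the inversion and localises both junctions.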
PCR assays were performed with Ex Taq polymerase (Takara), according to standard procedures. All the PCR fragments were purified using the QIAquick purification kit (QIAGEN) and sequenced on an ABI Prism 3130. All the oligonucleotide sequences are available upon request.