GAP-Seq: a method for identification of DNA palindromes
© Yang et al.; licensee BioMed Central Ltd. 2014
Received: 28 August 2013
Accepted: 26 April 2014
Published: 22 May 2014
Closely spaced long inverted repeats, also known as DNA palindromes, can undergo intrastrand annealing to form DNA hairpins. The ability to form these hairpins results in genome instability, difficulties in maintaining clones in Escherichia coli and major problems for most DNA sequencing approaches. Because of their role in genomic instability and gene amplification in some human cancers, it is important to develop systematic approaches to detect and characterize DNA palindromes.
We developed a new protocol to identify palindromes that couples S1 nuclease-treated Cot0 DNA (GAPF) with high-throughput sequencing (GAP-Seq). Unlike earlier protocols, it does not involve restriction enzyme digestion prior to DNA snap-back, thereby preserving longer DNA sequences. It also indicates the location of the novel junction, which can then be recovered. Using the MCF-7 breast cancer cell line for a proof-of-principle analysis, we identified 35 palindrome candidates and physically characterized the top 5 candidates and their junctions. Because this protocol eliminates many of the false positives that plague earlier techniques, it improves palindrome identification.
The GAP-Seq approach underscores the importance of developing new tools for identifying and characterizing palindromes, and provides a new strategy to systematically assess palindromes in genomes. It will be useful for studying human cancers and other diseases associated with palindromes.
Keywords: Palindrome; Gene amplification; Inversion-PCR; GAP-Seq; GAPF; Breakpoint; MCF7; Genome instability; Cancer; Human diseases
Long DNA palindromes are difficult to directly analyze using standard molecular genetics methods. This is because perfect and near perfect palindromes, where a sequence is immediately followed by its exact inverse complement with very little or no spacer, are able to intrastrand anneal to form hairpin structures. Palindromes longer than 200 bp cannot be amplified by traditional PCR using DNA polymerases with low strand displacement activity, nor can they be stably maintained in Escherichia coli. Palindromes are also underrepresented in high-throughput sequencing results generated from libraries constructed by PCR amplification or sequencing steps that involve emulsion PCR amplification (Yang H. et al., unpublished observations).
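To make the definition above concrete, the following minimal Python sketch tests whether a sequence consists of an arm followed, after an optional spacer, by the exact reverse complement of that arm. The function names and the `arm_len`/`spacer_len` parameters are illustrative assumptions and are not part of the GAP-Seq protocol.

```python
# Illustrative sketch only: names and parameters are assumptions, not from the paper.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq, arm_len, spacer_len=0):
    """Check for an inverted repeat at the start of seq.

    The first arm_len bases form the left arm; after spacer_len bases of
    spacer, the next arm_len bases must equal the reverse complement of
    the left arm.
    """
    if arm_len <= 0:
        return False
    left = seq[:arm_len]
    right_start = arm_len + spacer_len
    right = seq[right_start:right_start + arm_len]
    if len(left) != arm_len or len(right) != arm_len:
        return False
    return right.upper() == reverse_complement(left).upper()
```

For example, `GAATTC` is a perfect palindrome with 3 bp arms and no spacer, while `GAACCTTC` has the same arms separated by a 2 bp spacer.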
The propensity of palindromes to adopt secondary structure interferes with DNA replication, transcription and repair, and leads to genome instability [1–5]. Natural AT-rich palindromes (PATRRs) exist at sites of some recurrent chromosomal rearrangements in humans and cause genetic disorders [6–8]. Long inverted repeats that may reflect de novo palindromes have been found in tumor cells and cancer cell lines, and are likely drivers of gene amplification [5, 9–12]. Previous studies demonstrated that the novel junctions of palindromes contained sequences important for understanding the mechanisms that can lead to de novo palindrome formation [13, 14]. Because systematic approaches to identify and characterize palindromes in genomes are lacking, little is known about the distribution of DNA palindromes or their association with human diseases.
Genome-wide Analysis of Palindrome Formation (GAPF) is a microarray-based technique that has been used for detection of palindromic genome rearrangements in human cancers [9, 12]. However, it has limited ability to eliminate false positive signals and cannot predict the orientation of palindromes, which makes the novel junctions difficult to find. We have explored alternative methods for systematically analyzing palindromes in the genome, and here we report our analysis of de novo DNA palindromes from the MCF-7 breast cancer cell line.
We used high molecular weight genomic DNA rather than enzyme-digested DNA prior to DNA snap-back, because enzyme digestion can eliminate palindromes containing the restriction site in the spacer or close to the center and can also limit the length of the signal recovered. In the analysis of our GAP-Seq data, we were able to identify true palindrome candidates by a signature pattern of read density distribution. This signature also predicted the location of the novel palindrome junction, allowing junction recovery. In this study we identified 35 palindrome candidates from MCF-7 and selected the top 5 candidates for further mapping. Using inversion-PCR, we recovered 7 novel junctions that had not been identified in any previous studies despite extensive analysis. The combination of novel GAP-Seq, bioinformatics analysis and inversion-PCR strategies provides a systematic approach for palindrome detection and novel junction recovery, allowing a more accurate assessment of the palindrome content in the genome.
Bioinformatics for identification of DNA palindrome candidates
Summary of Roche 454 data (table rows: reference genome (hg18) (%); mapped to unique locus; low copy repeats; human but not mapped; total # reads)
The location of palindromes in the unique portion of the genome can be observed as regions with a higher than expected number of sequence reads. Our estimated coverage of the non-repetitive sequences (~8 × 10^7 bp) mapped to total unique genome sequences (~1.26 × 10^9 bp) is ~6%. To determine palindrome locations, we looked for unique sequence regions that were over-represented as determined by the base read ratio “B”. For a single read mapped to the unique region of the genome, B = 1 (Figure 1B-1&2). For overlapping reads forming a contiguous genomic region (contig) “c”, the base read ratio is the sum of the read lengths divided by the length of the contig. Thus for contig “c”, based on the total length of uniquely mapped reads,

\[ B(c) = \frac{\sum_{r=1}^{n(c)} \mathrm{ReadLength}(r,c)}{\mathrm{ContigLength}(c)} \]

where n(c) is the number of reads in contig c, ReadLength(r,c) is the read length of read r in contig c, and ContigLength(c) is the length of contig c. Contigs are limited to the mapped unique sequences and exclude repetitive sequences masked by Repeat Masker. To combine adjacent contigs that are likely to represent a single locus, we joined contigs where B ≥ 1.5. For pragmatic reasons we focused on enriched unique sequence intervals that were within 7.5 kb from each other (Figure 1B-3). Enrichment of joined contigs was compared by using a Rank Score “R”, calculated as the sum of the read lengths assigned to each joined contig divided by the length of the joined contig minus the length of the masked regions (M). Thus for JoinedContig “a” (Figure 1B-4),

\[ R(a) = \frac{\sum_{r \in a} \mathrm{ReadLength}(r,a)}{\mathrm{JoinedContigLength}(a) - M(a)} \]

where JoinedContigLength(a) is the length of joined contig a and M(a) is the total length of the masked regions within it.
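The base read ratio B and the Rank Score R described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Contig` class, the half-open coordinate convention, the treatment of the joined contig length as the span from the first contig start to the last contig end, and the interpretation of "within 7.5 kb" as the gap between neighbouring enriched contigs are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Contig:
    """Hypothetical container for one contig of uniquely mapped reads."""
    start: int                      # 0-based start (inclusive)
    end: int                        # end (exclusive)
    read_lengths: list = field(default_factory=list)  # read lengths in bp

    @property
    def length(self):
        return self.end - self.start


def base_read_ratio(contig):
    """B(c): sum of read lengths divided by contig length."""
    return sum(contig.read_lengths) / contig.length


def join_contigs(contigs, min_b=1.5, max_gap=7500):
    """Group enriched contigs (B >= min_b) whose gaps are <= max_gap bp."""
    enriched = sorted(
        (c for c in contigs if base_read_ratio(c) >= min_b),
        key=lambda c: c.start,
    )
    groups = []
    for c in enriched:
        if groups and c.start - groups[-1][-1].end <= max_gap:
            groups[-1].append(c)
        else:
            groups.append([c])
    return groups


def rank_score(group, masked_bp=0):
    """R(a): total read bp over (joined contig span minus masked bp)."""
    total_read_bp = sum(sum(c.read_lengths) for c in group)
    span = group[-1].end - group[0].start
    return total_read_bp / (span - masked_bp)
```

A single read forming its own contig gives B = 1, as stated in the text. In this sketch the masked length `masked_bp` must be supplied by the caller, for example from repeat-masking annotations.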
Summary of palindrome candidate data
We also sequenced GAPF-prepared DNA from MCF-7 and IMR-90 by Illumina sequencing. In MCF-7, 25 million reads were generated with an average length of 36 bp, and 94% of the reads were mapped. The mapped bases were equivalent to ~28% of the total human genome (hg18). The Illumina sequencing data yielded a higher coverage of the genomic DNA and were used for evaluating the palindrome candidates identified by Roche 454 sequencing.
Sorting of the palindrome candidates for physical analysis
For the 35 palindrome candidates obtained by the bioinformatic and statistical analyses, we further analyzed their read density by plotting the number of sequence reads in 1 kb bins extending over the enriched areas including 10 kb upstream and downstream (Additional file 3: Figure S1). Although the size of palindromes could be several Mbp in the genome, the genomic DNA isolation step shears the DNA into smaller fragments generally less than about 50 kb. In addition, denatured palindromes reanneal more efficiently in regions closer to the palindrome center. Therefore, the palindromic DNA closest to the center is more highly enriched than sequences further away. The result is a signature pattern represented by a higher read density toward the palindrome center. However, this pattern is obfuscated by problems associated with mapping repeated sequences. For example, a 1 kb bin that corresponds to a repeated sequence could be overrepresented because of the faster renaturation kinetics of repeated sequences, or it could be underrepresented if reads from repeated elements were removed by the algorithm used to map the reads (in our case Repeat Masker). Using the read density information, we examined our 35 MCF-7 palindrome candidates (Additional file 3: Figure S1) for the signature pattern. One candidate that exhibited this pattern in both the 454 and Illumina data corresponded to a previously identified palindrome (Chr8:128,202,704-128,210,979). We chose five additional candidates with this pattern and further characterized them by identifying the novel junctions associated with their formation and determining the spacer between the inverted repeats. The methods used for this analysis are illustrated for one of the candidates below.
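The read density profile described above (reads counted in 1 kb bins over the enriched area plus 10 kb on either side) can be sketched as follows. This is an illustrative Python example, assuming each read is represented by its mapped start coordinate on a single chromosome; the function name and arguments are hypothetical.

```python
def read_density(read_starts, region_start, region_end,
                 bin_size=1000, flank=10000):
    """Count reads per bin over a region extended by `flank` bp each side.

    Returns (window_start, counts), where counts[i] is the number of read
    starts falling in [window_start + i*bin_size, window_start + (i+1)*bin_size).
    """
    window_start = max(0, region_start - flank)
    window_end = region_end + flank
    n_bins = -(-(window_end - window_start) // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if window_start <= pos < window_end:
            counts[(pos - window_start) // bin_size] += 1
    return window_start, counts
```

Plotting `counts` against bin position would then show whether the density rises toward one edge of the enriched region, which is the signature pattern used to predict the side on which the palindrome center lies.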
Mapping of the palindrome spacer and novel junction
The novel DNA junction created by palindrome formation may provide clues to the mechanism(s) by which they were formed. Since the S1 nuclease treatment in our protocol removes the hairpins and/or spacers of the palindromes, we have established new approaches to isolate the novel palindrome junctions.
Predicting the location of palindrome center
The enrichment of the central region of the palindrome can also be detected by quantitative PCR. We used TaqMan qPCR to compare the MCF-7 palindrome candidates with the same sequences from IMR-90 DNA, or with non-enriched unique sequences from MCF-7 DNA. The Ct values (threshold cycles, the number of cycles at which the fluorescence detected exceeds the threshold, a relative measure of the amount of target DNA) before and after snap-back plus S1 digestion were compared to calculate the amount of DNA protected in each sample. For example, we used a non-palindromic single copy gene, RAD52, as a control and found that the Ct value increased ~10–12 cycles for the RAD52 primer set in all DNA samples tested (Figure 2B and C). This corresponds to more than 1000-fold depletion of the DNA. In contrast, the primer pair P2 from the chromosome 15q21.1 palindrome candidate showed only a 3-cycle increase in Ct in MCF-7. This 7-cycle difference suggests a relative enrichment of P2 over RAD52 of more than 100-fold for this region in MCF-7. This enrichment of the P2 target was not seen in the control cell line IMR-90, suggesting that a de novo palindrome arose in MCF-7. The P1 target, only ~1 kb centromere-proximal to P2, was not enriched, indicating that it is located outside of the palindrome region. We also performed qPCR using primer pairs P3 and P4, which are located on the palindrome but further away from the center. Those two primer pairs showed a similar, but somewhat lesser, relative enrichment compared with the P2 target, indicating that our TaqMan qPCR approach can detect enrichment as distant as 10 kb from the palindrome center.
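The arithmetic behind these fold-change estimates can be written out explicitly. The sketch below assumes ideal amplification, that is, a doubling of product per cycle (efficiency of 2); the function names are illustrative and the efficiency parameter is an assumption, not a value reported in the study.

```python
def fold_depletion(ct_before, ct_after, efficiency=2.0):
    """Fold loss of template implied by a Ct increase after treatment."""
    return efficiency ** (ct_after - ct_before)


def relative_enrichment(delta_ct_target, delta_ct_control, efficiency=2.0):
    """Enrichment of a target relative to a control, from their Ct increases."""
    return efficiency ** (delta_ct_control - delta_ct_target)
```

Under this assumption, a 10-cycle increase for RAD52 corresponds to 2^10 = 1024-fold depletion, and a 3-cycle increase for P2 against a 10-cycle increase for RAD52 corresponds to 2^7 = 128-fold relative enrichment, consistent with the "more than 1000-fold" and "more than 100-fold" figures in the text.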
Analysis of palindrome structure
To further analyze the 15q21.1 palindrome structure in the genome, we used Southern blot analyses to monitor rearrangement associated with the palindrome. We chose the restriction enzymes BamHI, BglII and NcoI to digest the genomic DNA because we expected to see novel bands with these enzymes in MCF-7 DNA (sites noted on Figure 2C). As predicted from the map in Figure 2C, we found rearranged bands (*) corresponding to the palindrome in MCF-7 that were not found in IMR-90, with sizes increasing as expected (Figure 2D). Next we further analyzed BamHI-digested DNA by comparing untreated genomic DNA to melted and self-annealed (snap-back) treated DNA in MCF-7 and IMR-90. We found a half-sized fragment after snap-back treatment of MCF-7 DNA (Figure 2E, arrowhead, SB) indicating intra-strand reannealing, thus confirming the palindromic structure of the Chr15q21.1 candidate.
Inversion PCR to recover palindrome junctions
Summary of characterized MCF7 palindrome spacers (table columns: assembled 454 palindrome contigs; 454 palindrome rank score; spacer size (bp); insertion or deletion in spacer). Entries recovered for the last column: Insertion: Chr16(-): 185 bp; Deletion: Chr8(+): 58,813 bp.
Comparison between GAP-Seq and microarray based GAPF method
The microarray-based GAPF approach has been used for detecting palindromes in cancer cells, and >80 GAPF positive cytogenetic bands were identified in MCF-7 [9, 28]. Subsequently, Diede et al. modified the GAPF approach by introducing 50% formamide in the DNA denaturation step to remove false positive signals from non-palindromic regions that were found to correlate with regions of high DNA methylation. Guenthoer et al. then re-examined GAPF profiles in the MCF-7 breast cancer cell line as well as the control cell line IMR-90. They found a total of 52 GAPF positive regions in MCF-7 and physically mapped one region on Chr8 (128,201,619-128,208,246). Of their GAPF positive regions, 39 were less than 1 kb and 7 were less than 100 bp. The authors recognized that identifying true palindromes remains elusive and pointed out two possibilities for false GAPF positives: 1) repeat sequences in the genome, such as Alus, LINEs, or short tandem repeats, can obfuscate the identification of palindromes; 2) the limited sensitivity of their approach means that it cannot detect palindromes present in a subpopulation of cells in heterogeneous tumor samples. Their use of restriction enzyme digested DNA might also limit the ability to detect palindromes.
The GAP-Seq approach significantly improves on the detection of true palindromes in several aspects: 1) The use of high-molecular weight DNA rather than enzyme digested DNA results in the recovery of longer sequences making identification more likely; 2) Read density distribution adds another feature characteristic of palindrome candidates (Additional file 3: Figure S1); 3) The read density distribution also provided us with important information about the orientation of the center of the palindrome, which was important in the isolation and sequencing of the palindrome junction and spacer. Using GAP-Seq we were able to identify and verify novel junctions that have never been reported in the plethora of previous studies of MCF-7 and provide an important extension to previous attempts to characterize this cell line.
Examination of biological consequences associated with identified palindromes
Some of the identified palindrome candidates were associated with amplified genomic regions that contain cancer genes. Cancer genes are defined as genes that when mutated are causally implicated in oncogenesis. The confirmed palindrome in 8q24.21 (Chr8:128,202,704-128,210,979) was co-amplified with the MYC oncogene. Two palindrome candidates, 17q23.2 (Chr17: 56,691,822-56,700,625) and 20q13.2 (Chr20: 52,771,235-52,783,881), contained the BCAS3 and ZNF217 genes, respectively, which are amplified and overexpressed in breast cancers and are often associated with chromosomal alterations affecting the locus [35–37].
In this study, we developed a new strategy to detect DNA palindromes by coupling fast-annealing genomic DNA treated with S1 nuclease (GAPF) with high-throughput sequencing (GAP-Seq) and recovery of novel palindrome junctions. We chose to use the MCF-7 breast cancer cell line for this initial proof-of-principle study because it has been extensively analyzed at the genomic level, allowing us to determine if our approach could generate novel data. In fact, none of our palindrome junctions had been identified by either sequence analysis or novel breakpoint analyses of MCF-7 [22, 25, 26]. This difference may be a result of either or both of two constraints presented by the characteristics of palindromes: 1) the breakpoint analysis was done from BAC clones, where palindromes are not stable during E. coli propagation, and 2) most of the novel breakpoints identified here are located in or near repeat-masked regions and would not be recovered by mapping of high-throughput sequencing data without knowing more about the sequences surrounding the palindrome center. Therefore, palindromes are likely an underestimated class of somatic rearrangements in cancer and other associated human diseases.
Although the palindrome junctions sequenced here have fairly large spacers and should theoretically be stable in BAC clones, it is not clear whether very long inverted repeats are well tolerated in E. coli. Some of the inverted repeats associated with the palindromes are likely to be megabases long, possibly reflecting chromosomal breakage-fusion-bridge (BFB) cycles. Although the palindrome would include the entire length of the CNV fragment, only the central ~20-40 kb could be recovered in our study due to DNA shearing. Furthermore, a study of complex genomic rearrangements consisting of intermixed duplications and triplications of genomic segments at both the MECP2 and PLP1 loci demonstrates that long inverted repeats with larger spacers can lead to genome rearrangements and contribute to local instability in the human genome.
We mapped 7 novel breakpoints that have a 0–7 bp microhomology at the junctions (Figure 3), suggesting that they were made by NHEJ. However, it is possible that the junctions were made independently of NHEJ. Palindromes can be created by template switching of replication forks through microhomology (FoSTeS; Fork Stalling and Template Switching) by Microhomology-Mediated Break-Induced Replication (MMBIR) [40–43], or by foldback replication [13, 44]. The microhomologies identified at three of the sequenced junctions (Chr15q21.3 breakpoint 3, Chr8q21.2 breakpoint 4, and Chr1q31.1 breakpoint 6) could reflect a foldback priming mechanism as seen in our previous analysis of palindromes from yeast. However, it is difficult to determine if a similar mechanism is functioning with such a small data set. Two of the palindromes contained complex junctions including an insertion from another chromosome (Chr15q21.1) and the deletion of local sequences (Chr8q21.2). Such events could reflect more complicated pathways for their initial formation or secondary events indicative of the instability of the initial palindrome structure. The presence of two contiguous palindrome breakpoints on Chr15 leads us to speculate that there was an initial double strand break at the more telomere-proximal site. This could have led to subsequent BFB cycles that may have generated further amplification and the second, more centromere-proximal palindrome.
The palindrome candidates located in clusters on Chr17 and Chr20 were all contained within a large highly amplified region, indicating that secondary events might have resulted in more complicated genomic rearrangements at these loci. The complex genome amplification patterns seen in some breast cancers are characterized by multiple closely spaced amplicons and frequent high-level amplifications, and are highly correlated with aggressive disease. These patterns are suggestive of a palindromic structure and associated genomic instability, although this correlation has not been examined. Based on our observations, we hypothesize that an initial event might be the formation of a palindrome, which can then lead to genome instability and further amplification. This could provide a mechanism for amplifying cancer-associated genes, which are then selected for during cancer development.
We have developed a new strategy to detect palindromes and recover their junctions in the genome. Our GAP-Seq approach improves upon the previous microarray-based GAPF technique by combining GAPF with high-throughput sequencing. Our bioinformatics analysis also provides palindrome orientation information that is critical for junction recovery. Taken together, we show here that we can overcome the previous barriers due to the large number of false positives that obfuscate analysis of true palindromes. Using the MCF-7 breast cancer cell line for a proof-of-principle analysis, we identified 35 palindrome candidates and physically characterized the top 5 candidates and their junctions, demonstrating that our strategy can correctly predict palindrome orientation and enable recovery of the novel DNA junctions associated with palindromes. Despite extensive analysis of MCF-7 at the molecular level, these data are novel and are missing from previous analyses of this cell line. Our approach underscores the importance of developing new tools for identifying and characterizing palindromes, and provides a new strategy to systematically identify palindromes in genomes.
The human breast cancer cell line MCF-7 and the IMR-90 (CCL-186) primary fibroblast line were obtained from the American Type Culture Collection (ATCC). Cells were maintained under standard culture conditions (ATCC) and harvested at log phase.
Roche 454 sequencing
Genomic DNA was extracted from cells using the Blood & Cell Culture DNA Kit (Qiagen) according to the manufacturer’s instructions. To prepare a 454 sequencing library, the genomic DNA was denatured in the presence of 50% formamide and reannealed briefly in 100 mM NaCl on ice, then treated with S1 nuclease as previously described, with the following modifications. We started with ~100 μg of genomic DNA. After snap-back and S1 nuclease treatment, we extracted the DNA twice with UltraPure Phenol:Chloroform:Isoamyl Alcohol (25:24:1, v/v) (Invitrogen). The DNA was then precipitated in 100% ethanol, washed with 70% ethanol and dissolved in 1× TE buffer. We prepared Roche 454 libraries from DNA sheared to fragments of approximately 500 bp and sequenced them with the Roche 454 GS FLX+ system by the standard method.
Deep sequencing of 36-mers was performed using an Illumina Genome Analyzer IIx at the Ohio State University James Cancer Hospital. High molecular weight genomic DNA was obtained from MCF-7 and IMR-90. Cells were harvested and incubated in lysis buffer (100 mM NaCl/10 mM Tris·HCl, pH 8.0/25 mM EDTA/0.5% SDS/Proteinase K) for 24 hours at 37°C, followed by phenol/chloroform extraction and ethanol precipitation as described previously. Briefly, one mg of genomic DNA was first digested with either restriction enzyme KpnI or SbfI. After heat-inactivation of restriction enzymes, both digests were pooled and denatured with formamide in boiling water for 7 minutes followed by quick renaturation on ice in 100 mM NaCl. Single-stranded DNA was digested by S1 nuclease at 37°C for 1 hour. Processed DNA samples were purified using the PCR clean-up kit (Promega). DNA was fragmented by sonication using a Covaris S2, and 200 bp DNA fragments were used for the construction of a sequencing library using the Illumina ChIP-Seq kit.
Affymetrix SNP6 copy number analysis
Genome-wide copy number analyses for MCF-7 and IMR-90 were performed using SNP6.0 (Affymetrix) at the Case Comprehensive Cancer Center (P30 CA43703). Two mg of genomic DNA was processed for hybridization using the SNP6 core reagent kit. The data were analyzed using Partek Genomics Suite (Partek). Raw data were normalized using the Robust Multi-Array Average (RMA) method. RMA consists of three steps: background adjustment, quantile normalization, and final summarization. Normalized data were used to calculate the copy number of MCF-7 relative to IMR-90.
Roche 454 data analysis
Mapping and content analysis
The 454 reads were masked with RepeatMasker. The remaining sequences were mapped with BLAT to the human genome sequence (version hg18, repeats masked), requiring at least 75% identity over at least 40 bp. Sequences that were not mapped to the genome under these conditions were subjected to the metagenomic analysis pipeline (Smythers and Volfovsky; unpublished observations). This analysis identified additional matches to human DNA from GenBank that were missing in previous hg18 analyses.
Random palindrome simulation analysis
The locations of mapped reads from MCF-7 were randomly assigned in the genome based on the actual number of reads and projected read lengths observed. When a repeat-masked region was encountered during simulation, the procedure was repeated in a new random location. The resulting null distribution data were clustered using the same parameters as we used to identify palindromes to generate a null simulation palindrome data set.
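The random placement step described above can be sketched as simple rejection sampling: each read is dropped at a uniformly random position and redrawn whenever it touches a repeat-masked interval. The Python example below is an illustration under stated assumptions, not the code used in the study. It treats the genome as a single linear coordinate space, represents masked regions as sorted, non-overlapping half-open intervals, and uses hypothetical names throughout.

```python
import bisect
import random


def simulate_read_placements(read_lengths, genome_length, masked, seed=None):
    """Randomly place reads, redrawing any that overlap a masked interval.

    masked: sorted, non-overlapping list of (start, end) half-open intervals.
    Returns a list of (start, end) placements, one per read length.
    Note: loops indefinitely if no unmasked gap can hold a given read.
    """
    rng = random.Random(seed)
    mask_starts = [start for start, _ in masked]

    def overlaps_masked(start, end):
        # Last masked interval beginning at or before the read start.
        i = bisect.bisect_right(mask_starts, start) - 1
        if i >= 0 and masked[i][1] > start:
            return True
        # First masked interval beginning after the read start.
        j = i + 1
        return j < len(masked) and masked[j][0] < end

    placements = []
    for length in read_lengths:
        while True:
            start = rng.randrange(0, genome_length - length + 1)
            if not overlaps_masked(start, start + length):
                placements.append((start, start + length))
                break
    return placements
```

The resulting placements would then be clustered with the same contig-joining parameters used for the real data to obtain a null set of apparent palindrome candidates.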
Illumina sequencing data analysis
Illumina data were mapped with Bowtie to the human reference genome (hg18) in the default mode (-k 1). The 35 GAP-Seq positive regions in MCF-7 and the 9 GAP-Seq positive regions in IMR-90 identified from the Roche 454 data analysis (R > 0.75) were binned into 1,000 bp intervals (Additional file 3: Figure S1 and Additional file 5: Figure S2).
Real-time qPCR analysis of palindromes
Real-time qPCR was used to assess the enrichment of a palindrome over a single copy non-palindromic region (RAD52). We used TaqMan probes (Applied Biosystems) targeting genomic regions within the 454 positive signal. Real-time qPCR reactions used the Master mix from Applied Biosystems and were run according to the manufacturer’s instructions on a Bio-Rad multicolor Real-Time qPCR detection machine (IcyclerIQ) and analyzed with Icycler3.1 IQ software. Primer sequences are listed in Additional file 6: Table S4.
PCR analysis of palindrome junctions
All PCR reactions were performed under standard conditions as recommended by the manufacturer (Clontech) using Titanium Taq polymerase. MCF-7 subline genomic DNAs (MCF-7-neo, MCF-7-BK, MCF-7-B, MCF-7-C) were obtained from Dr. Adrian Lee’s lab. To recover the palindrome junctions, we designed PCR primers based on the unrearranged chromosome using the Primer-BLAST program on the NCBI website. The PCR products were cloned using the TOPO TA cloning kit (PCR 2.1-TOPO, Invitrogen), and the cloned products were isolated and sequenced using Sanger sequencing (LMT, SAIC Frederick). Primer sequences are listed in Additional file 6: Table S4.
Southern blotting and snap-back Southern blots were performed as previously described.
Availability of supporting data
The data sets supporting the results of this article are available in The Gene Expression Omnibus (GEO) with accession number GSE43679 and The NCBI Sequence Read Archive (SRA) with accession ID SRA064847 and SRA065361.
The data sets supporting the results of this article are included within the article and its additional files.
BAC: Bacterial artificial chromosome
Copy number variation breakpoints
DSB: Double strand DNA break
GAPF: Genome-wide Analysis of Palindrome Formation
NHEJ: Non-homologous end joining
PATRR: Palindromic AT-rich repeats
qPCR: Real-Time quantitative PCR
GAP-Seq: GAPF technique with high-throughput sequencing.
The authors thank Dr. Adrian Lee for MCF-7 subline genomic DNA (MCF-7-neo, MCF-7-BK, MCF-7-B, MCF-7-C), Dr. John Weinstein for MCF-7-NCI60 genomic DNA, Dr. Robert Stephens for helpful discussions, Dr. Duncan Donohue for providing the R-script for the prototype of the data visualization of Figure 4, Claudia Stewart for Roche 454 sequencing, Sandra Burkett for cell culture, Dr. Pieter Faber for Illumina sequencing, Dr. Thomas Schneider for discussion of mathematical equations and Dr. Sharon Moore for critique of the manuscript. This work was supported by the National Institutes of Health, National Cancer Institute, Center for Cancer Research, Frederick National Laboratory for Cancer Research (the Intramural Research Program, HHSN261200800001E to N.V. and X.C., R01CA149385 to H.T.).
- Lewis SM, Cote AG: Palindromes and genomic stress fractures: bracing and repairing the damage. DNA Repair (Amst). 2006, 5 (9–10): 1146-1160.
- Lobachev KS, Rattray A, Narayanan V: Hairpin- and cruciform-mediated chromosome breakage: causes and consequences in eukaryotic cells. Front Biosci. 2007, 12: 4208-4220. 10.2741/2381.
- Inagaki H, Ohye T, Kogo H, Kato T, Bolor H, Taniguchi M, Shaikh TH, Emanuel BS, Kurahashi H: Chromosomal instability mediated by non-B DNA: cruciform conformation and not DNA sequence is responsible for recurrent translocation in humans. Genome Res. 2009, 19 (2): 191-198.
- Leach DR: Long DNA palindromes, cruciform structures, genetic instability and secondary structure repair. Bioessays. 1994, 16 (12): 893-900. 10.1002/bies.950161207.
- Tanaka H, Yao MC: Palindromic gene amplification–an evolutionarily conserved role for DNA inverted repeats in the genome. Nat Rev Cancer. 2009, 9 (3): 216-224. 10.1038/nrc2591.
- Kurahashi H, Inagaki H, Ohye T, Kogo H, Kato T, Emanuel BS: Palindrome-mediated chromosomal translocations in humans. DNA Repair (Amst). 2006, 5 (9–10): 1136-1145.
- Inagaki H, Ohye T, Kogo H, Yamada K, Kowa H, Shaikh TH, Emanuel BS, Kurahashi H: Palindromic AT-rich repeat in the NF1 gene is hypervariable in humans and evolutionarily conserved in primates. Hum Mutat. 2005, 26 (4): 332-342. 10.1002/humu.20228.
- Lewis SM, Chen S, Strathern JN, Rattray AJ: New approaches to the analysis of palindromic sequences from the human genome: evolution and polymorphism of an intronic site at the NF1 locus. Nucleic Acids Res. 2005, 33 (22): e186. 10.1093/nar/gni189.
- Tanaka H, Bergstrom DA, Yao MC, Tapscott SJ: Widespread and nonrandom distribution of DNA palindromes in cancer cells provides a structural platform for subsequent gene amplification. Nat Genet. 2005, 37 (3): 320-327. 10.1038/ng1515.
- Zhao Y, Marotta M, Eichler EE, Eng C, Tanaka H: Linkage disequilibrium between two high-frequency deletion polymorphisms: implications for association studies involving the glutathione-S transferase (GST) genes. PLoS Genet. 2009, 5 (5): e1000472. 10.1371/journal.pgen.1000472.
- Tanaka H, Bergstrom DA, Yao MC, Tapscott SJ: Large DNA palindromes as a common form of structural chromosome aberrations in human cancers. Hum Cell. 2006, 19 (1): 17-23. 10.1111/j.1749-0774.2005.00003.x.
- Guenthoer J, Diede SJ, Tanaka H, Chai X, Hsu L, Tapscott SJ, Porter PL: Assessment of palindromes as platforms for DNA amplification in breast cancer. Genome Res. 2012, 22: 232-245. 10.1101/gr.117226.110.
- Rattray AJ, Shafer BK, Neelam B, Strathern JN: A mechanism of palindromic gene amplification in Saccharomyces cerevisiae. Genes Dev. 2005, 19 (11): 1390-1399. 10.1101/gad.1315805.
- Tanaka H, Cao Y, Bergstrom DA, Kooperberg C, Tapscott SJ, Yao MC: Intrastrand annealing leads to the formation of a large DNA palindrome and determines the boundaries of genomic amplification in human cancer. Mol Cell Biol. 2007, 27 (6): 1993-2002. 10.1128/MCB.01313-06.
- Soule HD, Vazguez J, Long A, Albert S, Brennan M: A human cell line from a pleural effusion derived from a breast carcinoma. J Natl Cancer Inst. 1973, 51 (5): 1409-1416.
- Kytola S, Rummukainen J, Nordgren A, Karhu R, Farnebo F, Isola J, Larsson C: Chromosomal alterations in 15 breast cancer cell lines by comparative genomic hybridization and spectral karyotyping. Genes Chromosomes Cancer. 2000, 28 (3): 308-317. 10.1002/1098-2264(200007)28:3<308::AID-GCC9>3.0.CO;2-B.
- Rummukainen J, Kytola S, Karhu R, Farnebo F, Larsson C, Isola JJ: Aberrations of chromosome 8 in 16 breast cancer cell lines by comparative genomic hybridization, fluorescence in situ hybridization, and spectral karyotyping. Cancer Genet Cytogenet. 2001, 126 (1): 1-7. 10.1016/S0165-4608(00)00387-3.
- Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, Clark L, Bayani N, Coppe JP, Tong F, Speed T, Spellman PT, DeVries S, Lapuk A, Wang NJ, Kuo WL, Stilwell JL, Pinkel D, Albertson DG, Waldman FM, McCormick F, Dickson RB, Johnson MD, Lippman M, Ethier S, Gazdar A, Gray JW: A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell. 2006, 10 (6): 515-527. 10.1016/j.ccr.2006.10.008.
- Shadeo A, Lam WL: Comprehensive copy number profiles of breast cancer cell model genomes. Breast Cancer Res. 2006, 8 (1): R9. 10.1186/bcr1370.
- Jonsson G, Staaf J, Olsson E, Heidenblad M, Vallon-Christersson J, Osoegawa K, de Jong P, Oredsson S, Ringner M, Hoglund M, Borg A: High-resolution genomic profiles of breast cancer cell lines assessed by tiling BAC array comparative genomic hybridization. Genes Chromosomes Cancer. 2007, 46 (6): 543-558. 10.1002/gcc.20438.
- Huang J, Wei W, Zhang J, Liu G, Bignell GR, Stratton MR, Futreal PA, Wooster R, Jones KW, Shapero MH: Whole genome DNA copy number changes identified by high density oligonucleotide arrays. Hum Genomics. 2004, 1 (4): 287-299. 10.1186/1479-7364-1-4-287.
- Volik S, Raphael BJ, Huang G, Stratton MR, Bignel G, Murnane J, Brebner JH, Bajsarowicz K, Paris PL, Tao Q, Kowbel D, Lapuk A, Shagin DA, Shagina IA, Gray JW, Cheng JF, de Jong PJ, Pevzner P, Collins C: Decoding the fine-scale structure of a breast cancer genome and transcriptome. Genome Res. 2006, 16 (3): 394-404. 10.1101/gr.4247306.
- Volik S, Zhao S, Chin K, Brebner JH, Herndon DR, Tao Q, Kowbel D, Huang G, Lapuk A, Kuo WL, Magrane G, De Jong P, Gray JW, Collins C: End-sequence profiling: sequence-based analysis of aberrant genomes. Proc Natl Acad Sci USA. 2003, 100 (13): 7696-7701. 10.1073/pnas.1232418100.
- Raphael BJ, Volik S, Yu P, Wu C, Huang G, Linardopoulou EV, Trask BJ, Waldman F, Costello J, Pienta KJ, Mills GB, Bajsarowicz K, Kobayashi Y, Sridharan S, Paris PL, Tao Q, Aerni SJ, Brown RP, Bashir A, Gray JW, Cheng JF, de Jong P, Nefedov M, Ried T, Padilla-Nash HM, Collins CC: A sequence-based survey of the complex structural organization of tumor genomes. Genome Biol. 2008, 9 (3): R59. 10.1186/gb-2008-9-3-r59.
- Hampton OA, Den Hollander P, Miller CA, Delgado DA, Li J, Coarfa C, Harris RA, Richards S, Scherer SE, Muzny DM, Gibbs RA, Lee AV, Milosavljevic A: A sequence-level map of chromosomal breakpoints in the MCF-7 breast cancer cell line yields insights into the evolution of a cancer genome. Genome Res. 2009, 19 (2): 167-177.PubMed CentralPubMedView ArticleGoogle Scholar
- Hillmer AM, Yao F, Inaki K, Lee WH, Ariyaratne PN, Teo AS, Woo XY, Zhang Z, Zhao H, Ukil L, Chen JP, Zhu F, So JB, Salto-Tellez M, Poh WT, Zawack KF, Nagarajan N, Gao S, Li G, Kumar V, Lim HP, Sia YY, Chan CS, Leong ST, Neo SC, Choi PS, Thoreau H, Tan PB, Shahab A, Ruan X, et al: Comprehensive long-span paired-end-tag mapping reveals characteristic patterns of structural variations in epithelial cancer genomes. Genome Res. 2011, 21 (5): 665-675. 10.1101/gr.113555.110.PubMed CentralPubMedView ArticleGoogle Scholar
- Yao F, Ariyaratne PN, Hillmer AM, Lee WH, Li G, Teo AS, Woo XY, Zhang Z, Chen JP, Poh WT, Zawack KF, Chan CS, Leong ST, Neo SC, Choi PS, Gao S, Nagarajan N, Thoreau H, Shahab A, Ruan X, Cacheux-Rataboul V, Wei CL, Bourque G, Sung WK, Liu ET, Ruan Y: Long span DNA paired-end-tag (DNA-PET) sequencing strategy for the interrogation of genomic structural mutations and fusion-point-guided reconstruction of amplicons. PLoS One. 2012, 7 (9): e46152-10.1371/journal.pone.0046152.PubMed CentralPubMedView ArticleGoogle Scholar
- Diede SJ, Tanaka H, Bergstrom DA, Yao MC, Tapscott SJ: Genome-wide analysis of palindrome formation. Nat Genet. 2010, 42 (4): 279-10.1038/ng0410-279.PubMed CentralPubMedView ArticleGoogle Scholar
- Britten RJ, Graham DE, Neufeld BR: Analysis of repeating DNA sequences by reassociation. Methods Enzymol. 1974, 29: 363-418.PubMedView ArticleGoogle Scholar
- Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0. 1996, http://www.repeatmasker.org,Google Scholar
- Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE: Recent segmental duplications in the human genome. Science. 2002, 297 (5583): 1003-1007. 10.1126/science.1072047.PubMedView ArticleGoogle Scholar
- Miller FJ, Rosenfeldt FL, Zhang C, Linnane AW, Nagley P: Precise determination of mitochondrial DNA copy number in human skeletal and cardiac muscle by a PCR-based assay: lack of change of copy number with age. Nucleic Acids Res. 2003, 31 (11): e61-10.1093/nar/gng060.PubMed CentralPubMedView ArticleGoogle Scholar
- Bailey JA, Eichler EE: Genome-wide detection and analysis of recent segmental duplications within mammalian organisms. Cold Spring Harb Symp Quant Biol. 2003, 68: 115-124. 10.1101/sqb.2003.68.115.PubMedView ArticleGoogle Scholar
- Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR: A census of human cancer genes. Nat Rev Cancer. 2004, 4 (3): 177-183. 10.1038/nrc1299.PubMed CentralPubMedView ArticleGoogle Scholar
- Osborne C, Wilson P, Tripathy D: Oncogenes and tumor suppressor genes in breast cancer: potential diagnostic and therapeutic applications. Oncologist. 2004, 9 (4): 361-377. 10.1634/theoncologist.9-4-361.PubMedView ArticleGoogle Scholar
- Santarius T, Shipley J, Brewer D, Stratton MR, Cooper CS: A census of amplified and overexpressed human cancer genes. Nat Rev Cancer. 2010, 10 (1): 59-64. 10.1038/nrc2771.PubMedView ArticleGoogle Scholar
- Barlund M, Monni O, Weaver JD, Kauraniemi P, Sauter G, Heiskanen M, Kallioniemi OP, Kallioniemi A: Cloning of BCAS3 (17q23) and BCAS4 (20q13) genes that undergo amplification, overexpression, and fusion in breast cancer. Genes Chromosomes Cancer. 2002, 35 (4): 311-317. 10.1002/gcc.10121.PubMedView ArticleGoogle Scholar
- McClintock B: The Stability of Broken Ends of Chromosomes in Zea Mays. Genetics. 1941, 26 (2): 234-282.PubMed CentralPubMedGoogle Scholar
- Carvalho CM, Ramocki MB, Pehlivan D, Franco LM, Gonzaga-Jauregui C, Fang P, McCall A, Pivnick EK, Hines-Dowell S, Seaver LH, Friehling L, Lee S, Smith R, Del Gaudio D, Withers M, Liu P, Cheung SW, Belmont JW, Zoghbi HY, Hastings PJ, Lupski JR: Inverted genomic segments and complex triplication rearrangements are mediated by inverted repeats in the human genome. Nat Genet. 2011, 43 (11): 1074-1081. 10.1038/ng.944.PubMed CentralPubMedView ArticleGoogle Scholar
- Lee W, Jiang Z, Liu J, Haverty PM, Guan Y, Stinson J, Yue P, Zhang Y, Pant KP, Bhatt D, Ha C, Johnson S, Kennemer MI, Mohan S, Nazarenko I, Watanabe C, Sparks AB, Shames DS, Gentleman R, de Sauvage FJ, Stern H, Pandita A, Ballinger DG, Drmanac R, Modrusan Z, Seshagiri S, Zhang Z: The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature. 2010, 465 (7297): 473-477. 10.1038/nature09004.PubMedView ArticleGoogle Scholar
- Hastings PJ, Ira G, Lupski JR: A microhomology-mediated break-induced replication model for the origin of human copy number variation. PLoS Genet. 2009, 5 (1): e1000327-10.1371/journal.pgen.1000327.PubMed CentralPubMedView ArticleGoogle Scholar
- Hastings PJ, Lupski JR, Rosenberg SM, Ira G: Mechanisms of change in gene copy number. Nat Rev Genet. 2009, 10 (8): 551-564. 10.1038/nrg2593.PubMed CentralPubMedView ArticleGoogle Scholar
- Liu P, Carvalho CM, Hastings PJ, Lupski JR: Mechanisms for recurrent and complex human genomic rearrangements. Curr Opin Genet Dev. 2012, 22 (3): 211-220. 10.1016/j.gde.2012.02.012.PubMed CentralPubMedView ArticleGoogle Scholar
- Tanaka H, Tapscott SJ, Trask BJ, Yao MC: Short inverted repeats initiate gene amplification through the formation of a large DNA palindrome in mammalian cells. Proc Natl Acad Sci USA. 2002, 99 (13): 8772-8777. 10.1073/pnas.132275999.PubMed CentralPubMedView ArticleGoogle Scholar
- Hicks J, Krasnitz A, Lakshmi B, Navin NE, Riggs M, Leibu E, Esposito D, Alexander J, Troge J, Grubor V, Yoon S, Wigler M, Ye K, Borresen-Dale AL, Naume B, Schlicting E, Norton L, Hagerstrom T, Skoog L, Auer G, Maner S, Lundin P, Zetterberg A: Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res. 2006, 16 (12): 1465-1479. 10.1101/gr.5460106.PubMed CentralPubMedView ArticleGoogle Scholar
- Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19 (2): 185-193. 10.1093/bioinformatics/19.2.185.PubMedView ArticleGoogle Scholar
- Kent WJ: BLAT–the BLAST-like alignment tool. Genome Res. 2002, 12 (4): 656-664. 10.1101/gr.229202. Article published online before March 2002.PubMed CentralPubMedView ArticleGoogle Scholar
- Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, Madden TL: Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics. 2012, 13: 134-10.1186/1471-2105-13-134.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.