Involvement of a citrus meiotic recombination TTC-repeat motif in the formation of gross deletions generated by ionizing radiation and MULE activation
© Terol et al.; licensee BioMed Central. 2015
Received: 22 April 2014
Accepted: 26 January 2015
Published: 13 February 2015
Transposable-element mediated chromosomal rearrangements require the involvement of two transposons and two double-strand breaks (DSB) located in close proximity. In radiobiology, DSB proximity is also a major factor contributing to rearrangements. However, the whole issue of DSB proximity remains virtually unexplored.
Based on DNA sequencing analysis we show that the genomes of two mutants derived from Clemenules, Arrufatina (sport) and Nero (irradiation), share a similar 2 Mb deletion of chromosome 3. A 7 kb Mutator-like element found in Clemenules was present in Arrufatina in inverted orientation flanking the 5′ end of the deletion. The Arrufatina Mule displayed “dissimilar” 9-bp target site duplications separated by 2 Mb. Fine-scale single nucleotide variant analyses of the deleted fragments identified a TTC-repeat sequence motif located in the center of the deletion and responsible for a meiotic crossover detected in the citrus reference genome.
Taken together, this information is compatible with the proposal that in both mutants, the TTC-repeat motif formed a triplex DNA structure generating a loop that brought the originally distinct reactive ends into close proximity. In Arrufatina, the loop brought the Mule ends near the 2 distinct insertion target sites, and the inverted insertion of the transposable element between these target sites provoked the release of the in-between fragment. This proposal requires the involvement of a single transposon and sheds light on the unresolved question of how two distinct sites become located in close proximity. These observations confer a crucial role on the TTC-repeats in fundamental plant processes such as meiotic recombination and chromosomal rearrangements.
Keywords: Double-strand breaks; Crossover hot spot; Structural variations; Transposable-element
One of the major lines of evidence supporting that structural variations in genomes have a strong impact on phenotypic diversity comes from the study of human genomes (www.1000genomes.org/) and their prevalence in diseases. It is well known that structural genome variations may occur through numerous processes, e.g. segmental duplications, illegitimate recombination or transposable element (TE) activity [1,2]. TE insertions in particular provide an extraordinary source of natural genetic variation and diversity. Transposable elements [4,5], like ionizing radiation [6-8], have frequently been associated with major chromosomal rearrangements such as deletions, duplications, inversions, translocations and recombination of host genomes [4,5].
In previous work, we generated through irradiation of Citrus clementina, cv. “Clemenules” (CLE), a collection of induced mutants in order to increase phenotypic diversity. The screening of this collection for fruit precocity identified a mutant, Nero (NER), strongly resembling the spontaneous Arrufatina (ARR) somatic mutation. Since ionizing radiation is generally expected to produce mostly deletions, it was hypothesized that a similar deletion rearrangement might be the cause of the precocious behavior of the ARR natural mutation. In the work presented here we took advantage of the availability of the citrus clementine genome (GenBank: AMZM00000000.1) to show that both mutants indeed share a similar 2 Mb deletion of chromosome 3 and that the ARR deletion is associated with the activation of a Mutator-like element (MULE). Mutator and MULEs are widespread in plants, fungi and animals. MULEs contain transposase domains, terminal inverted repeats (TIRs) and generally have a 9–11 bp target site duplication (TSD) flanking the transposon, formed during “cut and paste” transposition into a new genomic location [10-13]. In general, there is solid evidence showing that Mutator frequently induces deletions, and although major advances have been made in the biology of TEs, there are still many open questions to be elucidated on this association. According to Gray, there are 2 possible mechanisms by which TE-associated chromosomal rearrangements may occur: homologous recombination and alternative transposition. During homologous recombination, sequences are exchanged between homologous DNA fragments. For instance, intra-strand homologous recombination between two different TEs may result in deletion of the in-between region. In alternative transposition a hybrid element is formed after the synapsis of complementary TE ends from separate TEs.
Depending upon the orientation of the termini and on the chromosomal location of the elements, alternative transpositions can lead to many kinds of chromosomal rearrangements including inversions, duplications, and deletions. It is well known that pairs of closely-linked transposable elements can induce various chromosomal rearrangements in several systems, through both homologous recombination and alternative transposition [14,16-18]. Nevertheless, there are many examples where bimolecular synapsis cannot explain all TE-mediated rearrangements not resolved by homologous recombination. In fact, the observed characteristics of the ARR deletion do not match the accepted premises of homologous recombination, alternative transposition or “cut and paste” transposition.
On the other hand, ionizing radiation of cells has provided a large body of evidence that chromosomal rearrangements are clearly influenced by “proximity” effects. Illegitimate repair, for example, decreases as the distance between DSBs at the time of formation increases. Furthermore, the production of interstitial deletions is larger than randomness would indicate. Therefore, the occurrence of a gross deletion implies the presence of double-stranded breaks (DSBs) in two distinct genomic locations physically located in close proximity and joined by incorrect repair. However, it remains very unclear how two genomically distinct sites become located in close proximity. For instance, Van Zelm et al. studied gross deletions in human genes and found that the breakpoints analyzed involved at least two DSBs in distant genomic locations that were first placed in physical proximity and then incorrectly repaired. The authors performed a careful evaluation of the current hypotheses to explain this circumstance and concluded that unknown additional factors were required to mediate co-localization of two distant genomic regions and double-stranded DNA break induction. While these unknown factors have proved rather elusive to date, in the current work we provide evidence compatible with the suggestion that a TTC-repeat motif, very similar to the recently reported CTT-repeat DNA motif from an Arabidopsis meiotic crossover hot spot [20,21], may enable the proximity of the two distant sequences, facilitating transposase reactions.
Results and discussion
It is widely accepted that transposable-element mediated chromosomal rearrangements require coordinated transposition or involvement of at least two TEs. Thus, several mechanisms generating chromosomal rearrangements have been devised mostly based on variations of the basic homologous recombination and alternative transposition processes [4,15]. While these mechanisms have received wide experimental support, there are many examples where TE-mediated rearrangements are mostly incompatible with homologous recombination and with the classical or alternative “cut and paste” transposition . In addition, the occurrence of gross or large rearrangements also implies the presence of DSBs in two distinct and separate genomic locations but physically located in close proximity to allow TE insertion. This question, how two distinct chromosomal sites become close together, is a current unresolved enigma attributed to unknown factors . In the current work we provide evidence in citrus suggesting that a TTC-repeat motif of a meiotic crossover hot spot might enable the proximity of two distant sequences facilitating transposition of a TE that as a consequence generated a gross deletion.
In this work, we took advantage of the availability of the citrus clementine genome (GenBank: AMZM00000000.1) to identify and characterize a structural deletion involved in citrus fruit precociousness through the analysis and comparison of the genomes of three clementines (Citrus clementina), Clemenules (CLE), Arrufatina (ARR) and Nero (NER). Both ARR and NER are mutants derived from somatic CLE mutations; ARR is a spontaneous bud sport whereas NER is a fast neutron induced mutant. The mutants are phenotypically very alike, except for the sterility developed in the induced mutant, and both show fruit precocity when compared with CLE (Additional file 1: Table S1).
Illumina paired-end, short read technology was used to sequence the genomes of CLE, ARR and NER. For mapping, the high quality genome sequence of a haploid clementine variety generated by the International Citrus Genome Consortium (GenBank: AMZM00000000.1) was used as reference. The sequencing, mapping and variant calling statistics presented in Additional file 2: Table S2 indicate that the genome sequences obtained were of high quality. More than 400 million reads were mapped in CLE and approximately half as many in each of the mutants. Coverage of the CLE (69x), ARR (44x) and NER (39x) genomes was, therefore, relatively high, with more than 81% of the genome sequences covered by at least 15 reads. CLE Illumina reads were submitted to the NCBI Sequence Read Archive with the experiment accession number SRX371962. ARR and NER sequences were submitted to the European Nucleotide Archive with the study accession number PRJEB5808.
Variant calling: SNVs and indels
Variant calling was performed with GATK in order to identify single nucleotide variants (SNVs) and small indels (in general, no more than 15 bp). As the reference sequence used for the analysis was a haploid genotype of CLE, the number of SNVs identified in the diploid CLE genotype (1.4 million) was slightly smaller than those found in ARR and NER. Between 12 and 14 thousand SNVs in each genome were detected as homozygous SNVs relative to the reference genome (Additional file 2: Table S2). A deeper insight into these data showed that most of these sites actually were either multiallelic or showed low frequencies of the reference allele, and that only 30% were pure homozygous SNVs, suggesting that these could be Sanger sequencing errors in the reference genome. Multiallelic positions and low allelic frequencies are probably linked to the chimeric nature of these genotypes as explained below. The variant calling also revealed about 150 thousand heterozygous indels and another 10 thousand homozygous indels.
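The distinction drawn above between pure homozygous SNVs, sites with a low-frequency reference allele and multiallelic sites can be illustrated with a minimal sketch. It assumes per-site allele read counts are already available; the function name `classify_site` and both thresholds are hypothetical choices for illustration and are not taken from the GATK protocol used in this work.

```python
def classify_site(allele_counts, ref, min_fraction=0.05, low_ref=0.2):
    """Classify one site from its allele read counts.

    allele_counts: dict mapping base -> number of supporting reads.
    ref: reference base at this site.
    min_fraction, low_ref: hypothetical thresholds (not from the paper).
    """
    total = sum(allele_counts.values())
    if total == 0:
        return "no_coverage"
    ref_frac = allele_counts.get(ref, 0) / total
    # Alternative alleles supported by at least min_fraction of the reads.
    alts = [a for a, n in allele_counts.items()
            if a != ref and n / total >= min_fraction]
    if not alts:
        return "reference"
    if len(alts) > 1:
        return "multiallelic"
    if ref_frac == 0:
        return "pure_homozygous"
    if ref_frac < low_ref:
        return "low_frequency_reference"
    return "heterozygous"


# Invented read counts, for illustration only.
print(classify_site({"A": 0, "G": 40}, ref="A"))
print(classify_site({"A": 3, "G": 37}, ref="A"))
print(classify_site({"A": 0, "G": 20, "T": 20}, ref="A"))
```

In a chimeric tissue, a site carrying a few reference-allele reads would fall in the low-frequency class rather than the pure homozygous one, which is the pattern the paragraph describes.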
Coverage, copy number variation, pair-end reads and PCR analyses: chromosomal rearrangements
All boundaries of the above mentioned rearrangements were confirmed by PCR analyses except the 3 deletions of chromosome 3 from NER. The first deletion, however, was ascertained by gene dosage because the 5′ boundary of the deletion was resistant to amplification due to the occurrence of a 32 bp palindrome at the beginning of the deletion. On the other hand, precise boundaries for the other 2 deletions could not be well defined.
MULE identification, characterization and phylogenetic analyses
Table: Mutator-like elements of the CitMule family. Column headings: SNV 22 (a), SNV 292 (a), SNV 313 (a), SNV 6757 (a), SNV 6895 (a), 5′ TSD (b), 3′ TSD (b).
ORF prediction performed with Genscan on the 4 complete elements showed that the putative transposable elements contained a 5-exon transcript and that the largest exon, exon 1, was conserved in the 4 elements (Figure 3). The ORFs of CitMule_1 and _2 coded for proteins 795 aa long, while CitMule_3 and _4 produced proteins of 805 aa. Motif analysis with InterProScan revealed that the predicted proteins contained a MULE transposase domain as usually found in the Mutator superfamily of transposable elements [11,12,26], in addition to a FAR1 DNA binding domain and a SWIM-type Zinc finger motif (Additional file 5: Table S4).
A MEGABLASTN search performed against the citrus ESTs in GenBank yielded a total of 7 ESTs unequivocally derived from CitMule_1 and _2 and another 17 identical to CitMule_3 and _4 fragments (Additional file 6: Table S5). Furthermore, the presence of transcripts from the different CitMules was unequivocally confirmed through RNA-seq analyses in C. clementina and an additional 11 species from major citrus groups including mandarins, oranges, lemons, pummelos and citrons (to be published elsewhere). Therefore, the data indicated that these elements were transcriptionally active.
The predicted protein sequence of the CitMule elements was aligned against those from other Mutator-like elements previously described in other species: MoSB-1 (AAD27572) from Sorghum bicolor, MoOS-521 (BAA92521) and Os3378 (AP008211) from Oryza sativa, Jittery (AAF66982), TRAP (CAB51950), TED (AGR45850) and Mudr (AAA21566) from Zea mays, and Far1 (AAD51282), AtMu1 (AAG52094) and AtMu6 (AAD19776) from Arabidopsis thaliana (Additional file 7: Figure S2). The phylogenetic analysis performed with the Neighbor-Joining method indicated that CitMule elements were clearly related to Jittery, a Mutator-like element from maize. In order to identify additional citrus Mutator-like elements in the CLE genome a BLASTN search was performed using the exon 1 sequence as a query. A total of 143 sequences longer than 500 bp and with significant similarity were obtained. These sequences, analyzed by the Neighbor-Joining method, produced the phylogenetic tree shown in Additional file 8: Figure S3. Based on these results the citrus MULEs can be grouped into 6 subfamilies, named Mutator-Like I to VI. CitMule elements clustered in subfamily I. It is worth mentioning that the phylogenetic relationships between CitMule_1, 2, 3 and 4 shown in Additional file 7: Figure S2 (protein sequences) and Additional file 8: Figure S3 (genomic sequences) were slightly different, probably due to the inclusion of intron sequences in the analyses of Additional file 8: Figure S3. Mapping of the MULE sequences on CLE chromosomes showed that the 6 subfamilies were interspersed randomly in the genome (Additional file 9: Figure S4).
Insights into the CitMule 5′end
The most conspicuous element of this list is the pyrimidine/purine rich track (TC/GA)n. In humans, the presence of TC stretches in sites of sister chromatid exchange  and at the breakpoints of hybrid genes  has been known for a long time. The pyrimidine rich tracks are structures that, like purine pyrimidine mirror repeat sequences and palindromes of polypurine/polypyrimidine DNA stretches (see below), can readily form triplexes adopting the triple helical H-form of DNA [30,34]. Since H-DNA is partially single-stranded it may be susceptible to nuclease attack that could then facilitate recombination. Moreover, the TC rich track of CitMule also contains 6 GGG triplets that eventually might have the possibility of forming quadruplex DNA, a highly stable structure derived from double stranded GC-rich sequences . Although unequivocal evidence for a specific role on chromosomal translocations in humans is still lacking, triplex DNA and G-quadruplex have been clearly implicated in genomic instability .
Translin is a multimeric protein that recognizes single-stranded ends for DNA repair, and potential translin binding sites, including those detected in CitMule (GC[T/C]CTG[C/T]T), have been found at translocation breakpoints in many cancer diseases. Chi-like elements, such as the one found in CitMule (GCTGGT), are mediators of prokaryotic recombination and have been extensively reported in association with oncogenic translocation and gross deletion breakpoints in humans. It has been suggested that the Chi-like sequence elements may represent a class of recognition element for recombinases. In humans, a plethora of papers have reported topoisomerase II consensus cleavage sites in the vicinity of translocation breakpoints at a diversity of genes and in several inherited disease-associated deletion breakpoints [29,33]. The sequence detected in CitMule (CTCATCTCGCTGCTCTCT) exhibits an 89% identity with the human topoisomerase II recognition site (topo II, v). In addition to all these elements found in humans, this CitMule terminus also contained another short motif (CCAATCA) that has been significantly associated with DSB hotspots in the genome of Schizosaccharomyces pombe. Although these motifs are in general either relatively short or highly redundant and, therefore, their occurrence at breakpoint junctions may be simply accidental, it is rather striking that all these elements accumulate exclusively in the 5′ terminus of the CitMule. This singularity is exemplified, for instance, in Additional file 10: Figure S5, which shows that the number of TC dinucleotides per 100 bp in the TC rich track of this terminus was 6- or 7-fold higher than the average of TC repeats detected in the 800 bp fragments flanking the TC track. Thus, the accumulation of several universal recombination motifs in this end appears to indicate that they play a role in the transposition mechanism of the CitMule element.
Another striking observation regarding the structure of CitMule is that this element does not contain long (100–200 bp) terminal inverted repeat (LTIR) structures with high similarity (around 95%), as is usual in most TIR-MULEs and Mu elements in many plants including Arabidopsis. As shown in Figure 4B, CitMule 1 exhibits shorter degenerate TIR motifs showing much lower sequence similarity (17 bp 100%; 34 bp 88%; 65 bp 78%) and therefore appears to belong to the non-TIR-MULE group according to the definition in Yu et al. The biological significance of these degenerate sequence motifs within the subterminal regions remains unknown, although it has been suggested that they, like the longer TIRs, may also correspond to transposase recognition sequences or to cis-factors for transposase binding.
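The short motifs quoted above can be located in any sequence with a simple pattern search, and the TC enrichment can be expressed as a density per 100 bp. The following is a minimal sketch only: the bracketed alternatives from the text are written as regular-expression character classes, only the given strand is scanned, and the motif labels, function names and toy sequence are illustrative assumptions rather than the procedure used in this work.

```python
import re

# Motifs quoted in the text for the CitMule 5' terminus.
# Alternatives such as [T/C] are written as regex character classes.
MOTIFS = {
    "translin_site": "GC[TC]CTG[CT]T",
    "chi_like": "GCTGGT",
    "pombe_dsb_hotspot": "CCAATCA",
}


def find_motifs(seq):
    """Return {motif label: [0-based start positions]} on the given strand."""
    seq = seq.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead reports overlapping occurrences as well.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits


def tc_per_100bp(seq):
    """Number of 'TC' dinucleotides in seq, scaled to a 100 bp basis."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    count = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "TC")
    return 100.0 * count / len(seq)


# Toy sequence invented for illustration; it is not CitMule sequence.
toy = "AAGCTCTGCTAAGCTGGTAACCAATCATT"
print(find_motifs(toy))
print(tc_per_100bp("TCTCTCTCTC"), tc_per_100bp("ACGTACGTAC"))
```

Comparing `tc_per_100bp` for a candidate track with the same value for its flanking fragments gives the kind of fold-enrichment figure reported in Additional file 10: Figure S5.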
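Similarity figures of the kind quoted for the terminal motifs (for example 17 bp at 100%) can be obtained by comparing the first n bases of an element with the reverse complement of its last n bases. The sketch below uses an ungapped, position-by-position comparison; this is an assumption made for illustration, since the alignment method behind the reported percentages is not stated here, and the function names and toy elements are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tir_identity(element, n):
    """Percent identity between the first n bases of the element and the
    reverse complement of its last n bases (ungapped comparison)."""
    if n <= 0 or 2 * n > len(element):
        raise ValueError("n must be positive and at most half the element length")
    left = element[:n].upper()
    right = reverse_complement(element[-n:])
    matches = sum(a == b for a, b in zip(left, right))
    return 100.0 * matches / n


# Toy elements invented for illustration: a perfect 10 bp inverted repeat,
# and the same element with one mismatch in the 3' terminus.
perfect = "GAGATAATTG" + "CCCCCC" + reverse_complement("GAGATAATTG")
degenerate = "GAGATAATTG" + "CCCCCC" + "CAATTATCTA"
print(tir_identity(perfect, 10), tir_identity(degenerate, 10))
```

Evaluating `tir_identity` at increasing n shows how identity decays away from the element ends, which is the pattern that separates long, highly similar TIRs from short degenerate ones.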
Allele frequency analysis: identification of chimerism and a meiotic recombination motif
Moreover, Figure 6, which reports the frequency of the alternative allele in the deletions identified in ARR and NER, shows a sudden shift in the allelic frequency detected in the 6.7-8.6 Mb deletion in both mutants. This shift indicates the occurrence of a meiotic recombination event in the reference citrus genome. According to the allelic frequency shift the recombination hotspot was located in a 260 bp track delimited by position 7797493, corresponding to the last SNV in the mutants with prevalence of the alternative allele, and position 7797753, corresponding to the first SNV with prevalence of the reference allele (Additional file 13: Figure S7). It is worth mentioning that these SNVs were evident because of the chimeric nature of the mutants. Although the proportion of bases was exactly the same in the 260 bp sequence included between these 2 markers and in the surrounding sequences, the number of TTC triplets per 260 bp progressively increased from 1 to 11 from position 7796194 to the track containing the hotspot. In this track, there was a TTC-trinucleotide repeat composed of 7 triplets starting at position 7797537. This sequence is very similar, if not identical, to the CTT-repeat motif recently identified in Arabidopsis as being enriched at meiotic crossover hot spots [20,21]. This observation is relevant because long GAA/TTC tracks, which cause Friedreich’s ataxia, for example, elicit profound mutagenic, genetic instability and major recombination behaviors. In yeast, these trinucleotide repeats strongly stimulated mitotic crossovers and were preferred sites for chromosome breakage, double-stranded DNA breaks and terminal deletions. Stimulation of plasmid-plasmid recombination has also been observed for GAA/TTC repeats in E. coli.
In these recombination studies there was a positive correlation between the length of the shorter tracts and the frequency of recombination because the process was significantly hampered by the ability of longer tracks to form “sticky DNA”. Namely, short GAA/TTC repeats (no longer than 30 units), which can exist in the cell as a B-DNA duplex or a triplex but cannot form the sticky DNA structure due to their length, are excellent substrates for intramolecular recombination. The authors of this work also concluded that triplex structures formed by short tracts were responsible for the recombination hot spot activity of GAA/TTC repeats. Therefore we propose that the GAA/TTC track identified at positions 7797537–7797558 is the sequence responsible for the hot spot activity that resulted in the crossover identified in the citrus reference genome.
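The two measurements used above, the number of TTC triplets per 260 bp window and the longest uninterrupted run of TTC units, can be sketched as follows. This is an illustrative reconstruction under stated assumptions: triplets are counted at every position of the given strand, windows are non-overlapping, and the function names and toy sequence are hypothetical; the counting rules actually used for Additional file 13: Figure S7 may differ.

```python
import re

# Coordinates quoted in the text (chromosome 3 of the reference genome).
LAST_ALT_SNV = 7797493     # last SNV with prevalence of the alternative allele
FIRST_REF_SNV = 7797753    # first SNV with prevalence of the reference allele
HOTSPOT_TRACK = FIRST_REF_SNV - LAST_ALT_SNV  # 260 bp


def count_triplet(seq, unit="TTC"):
    """Occurrences of unit in seq, counted at every position."""
    k = len(unit)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == unit)


def windowed_counts(seq, window=HOTSPOT_TRACK, unit="TTC"):
    """(window start, triplet count) for consecutive non-overlapping windows."""
    seq = seq.upper()
    return [(s, count_triplet(seq[s:s + window], unit))
            for s in range(0, len(seq) - window + 1, window)]


def longest_tandem_run(seq, unit="TTC"):
    """(0-based start, number of units) of the longest perfect tandem run."""
    best = (-1, 0)
    for m in re.finditer(f"(?:{unit})+", seq.upper()):
        n_units = (m.end() - m.start()) // len(unit)
        if n_units > best[1]:
            best = (m.start(), n_units)
    return best


# Toy sequence invented for illustration: 20 bp of filler, then 7 TTC units.
toy = "ACGT" * 5 + "TTC" * 7 + "GATTACA"
print(HOTSPOT_TRACK, longest_tandem_run(toy), count_triplet(toy))
```

A run of 7 triplets spans 21 bp, so with 1-based inclusive coordinates a run starting at 7797537 would end at 7797557; the interval 7797537–7797558 quoted in the text is consistent with an end-exclusive convention.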
The TTC-repeat motif produces genome instability at different levels
The idea that recombination motifs are drivers of genome instability is not new. Thus, in addition to the specific role in meiotic crossovers, a number of recent reports also suggest that meiotic recombination motifs are similarly associated with chromosomal rearrangements. In humans, a common sequence motif found in hypervariable minisatellites and clustered in the breakpoint regions of both diseases and mitochondrial deletion hot spots was clearly implicated in genome instability. Furthermore, other human hotspot sequence motifs and repeat elements also showed an interesting connection between meiotic recombination and genes with disease-associated chromosomal rearrangements. It is now widely accepted that some tracks of genomic DNA that adopt non-canonical (non-B) DNA structures such as DNA hairpins, cruciforms, Z-DNA, triplexes and tetraplexes represent hotspots of chromosomal breaks, homologous recombination and gross chromosomal rearrangements. Intra-molecular triplexes, for instance, are overrepresented in the human genome and generally found near promoter regions and recombination hotspots. The TTC-repeat in particular has been extensively studied because expansions of these tracks are associated with the human disease Friedreich’s ataxia. These triplets are prone to form DNA triplexes, the major components of H-DNAs, unusual DNA structures formed in homopurine-homopyrimidine regions of supercoiled DNA [45,46]. The H form consists of an intramolecular triple helix formed by the pyrimidine strand and half of the purine strand, leaving the other half of the purine strand single stranded. The existence of single stranded purine stretches and the hyperreactivity of this conformation to S1 nuclease appear to be the reason for the strong association of the TTC- and other repeat motifs with recombination and also with the generation of chromosomal rearrangements.
Furthermore, the occurrence of replication-associated intramolecular junctions between TTC-repeats and other homopurine-homopyrimidine tracts has been demonstrated. These unusual molecular junctions are accompanied by breakage that appears to be physically linked to non-GAA DNA sequences and could result from exposure of the single stranded DNA. In this configuration, chromosome fragility takes place in proximity to GAA repeats, while in the vast majority of reported rearrangements the motifs are rather coincident with the breakpoints of the genomic alterations [40,42-44]. However, there is a clear-cut difference between the involvement of TTC-repeats in the gross genomic chromosomal rearrangements reported to date and the deletions observed in ARR and NER, since in these mutations the motif is centered just in the middle of the deletions (Figure 6). This difference implies that the chromosome break in the mutants was not due to the TTC-motif itself, but that the motif instead brought the two initially separate target sites in each mutation into close proximity. Since the formation of triplex DNA implies a sharp bend in the DNA molecule, these observations predict a functional model that is consistent with a duplex-strand separation mediated by the TTC-repeat motif that does not end in recombination or chromosome break but folds back the DNA, initiating a loop configuration that brings together both targets. Therefore, the TTC-motif may act as a driver of genomic instability at different levels since in Clemenules there was no indication of TTC activity, while in the citrus reference genome this motif provoked a crossover, in NER it formed a loop and in ARR it formed a loop that facilitated a TE-mediated deletion.
Proposed model for the ARR TE-mediated deletion
Chromosomal rearrangements in irradiated NER
Several genomic lesions, i.e. three deletions in chromosome 3 and two more in chromosome 8 plus a translocation from chromosome 8 to 6, were identified in NER, the variety generated through fast neutron irradiation (Figures 1B and 2B). It is believed that most genomic lesions induced in cells exposed to ionizing radiation are caused either directly by DNA ionization or indirectly by free radicals. Ionizing radiation frequently causes “clustered DNA damage” when the ionization track induces several lesions in the DNA within a couple of turns [7,8]. Furthermore, it is generally accepted that DSBs in the DNA are the main cause of these lesions, since misrepair or lack of repair of DSBs apparently triggers most mutations and chromosomal rearrangements found in irradiated tissues. Sankaranarayanan and Wassom have proposed that the mechanisms involved in the repair of radiation induced DSBs in mammalian cells are likely the same as those naturally occurring, and have hypothesized that incorrect rejoining of broken ends in the same chromosome may generate an interstitial deletion. While no attempts have been made in the current work to test this hypothesis, it is intriguing that the breakpoints of the 2 Mb deletion detected in chromosome 3 of NER (positions 6782589–8724143) are roughly equidistant from the repeat motif location (position 7797537), as in ARR. This observation is also compatible with the formation in NER of a triplex DNA configuration promoted by the TTC motif and, therefore, it is coherent to suggest that the same ionization track provoked both breakpoints, generating “clustered DNA damage” [7,8] in the two initially distant sites.
Since the boundaries of the other 2 deletions in chromosome 3 could not be exactly defined, the presence of these repeats around the center of the deletions could not be precisely ascertained, but the fragment involved in the rearrangement detected in chromosome 8 of NER (positions 12602311–13635996) also contained at least three different TTC-repeat motifs roughly located around the midpoint of the rearrangement.
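The position of the repeat motif relative to the NER chromosome 3 breakpoints can be checked directly from the coordinates quoted above. The short sketch below is arithmetic only; the function name `center_offset` is an illustrative choice.

```python
def center_offset(start, end, motif):
    """Arm lengths and midpoint offset of a motif inside a deleted interval."""
    length = end - start
    midpoint = (start + end) / 2
    return {
        "length": length,
        "left_arm": motif - start,        # 5' breakpoint to motif
        "right_arm": end - motif,         # motif to 3' breakpoint
        "offset_from_midpoint": motif - midpoint,
        "offset_fraction": abs(motif - midpoint) / length,
    }


# Coordinates quoted in the text for the NER deletion in chromosome 3.
ner_chr3 = center_offset(start=6782589, end=8724143, motif=7797537)
for key, value in ner_chr3.items():
    print(key, value)
```

With these coordinates the deleted interval spans 1,941,554 bp, the two arms measure 1,014,948 bp and 926,606 bp, and the motif lies 44,171 bp from the exact midpoint, that is, about 2.3% of the deletion length.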
Gene content in the ARR/NER deletion
Genes present in the ARR and NER deletion were obtained from the annotation of the clementine genome available at http://www.citrusgenomedb.org/. A total of 244 primary transcripts were predicted in this region. Functional annotation of the genes was carried out with BLAST2GO  (Additional file 14: Table S7). In the listing of annotated genes located at the deletion there were several transcription factors and genes both related to hormone responses and associated with early ripening, the main characteristic of the ARR and NER phenotype (Additional file 1: Table S1). Among them, there were genes related to chlorophyll and carotenoid biosynthesis, carbohydrate and sugar metabolic processes and also to acid metabolism (Additional file 14: Table S7). This region also contained 4 different protein-COBRA like genes that presumably encode a plant-specific glycosylphosphatidylinositol-anchored protein  and three additional genes implicated in phosphatidylinositol biosynthetic processes. This observation deserves further investigation because it has been shown that COBRA overexpression in transgenic tomato promoted early fruit development and delayed senescence . Similarly, a few genes apparently related to other minor ARR/NER traits such as water transport and defense response to fungus, were also detected. ARR and NER certainly differ in one pivotal characteristic because ARR is a self-incompatible variety while NER is basically a sterile variety. However, the sterility in NER should be attributed to the irradiation treatment  and specifically to the generation of additional deletions (Figure 1) that presumably compromised correct chromosome pairing  during meiosis.
With no additional information, however, the diversity of the ARR and NER phenotypic traits suggests that this phenotype is a consequence of the combined effects of many deleted genes rather than a particular effect of a single gene or a few genes. The concept that deletions are more likely to manifest themselves as a combination of multiple developmental alterations has previously been established in other fields, such as the irradiation induced deletions in human germ cells.
Aside from the identification of Mules in citrus and the first report on the conspicuous accumulation of several motifs involved in human chromosome breaks in a TE terminus, this work highlights two major findings: the identification of a TTC-repeat as a motif physically responsible for meiotic recombination in plants and its involvement in the generation of gross deletions. The existence of gross deletions implies the presence of double-stranded breaks in two initially distinct locations that became physically located in close proximity and are rejoined by illegitimate repair. However, there is no reasonable explanation of how two genomically separate sites become located in close proximity. We show compelling evidence in two different genotypes (a spontaneous and an irradiated mutant) consistent with the proposal that the recombination TTC-repeat motif forms a triplex DNA structure generating a loop that enables the proximity of distant sequences, facilitating double strand breaks by Mule reaction or clustered DNA damage provoked by an ionizing track. This proposal offers a new insight into “cut and paste” transposition, since it requires the involvement of a single transposon for the formation of gross deletions, and offers a simple explanation for the unresolved question of how two genomically distinct sites become located in close proximity during chromosomal rearrangements. These observations confer an active and specific double role on the identified TTC-repeat motif in fundamental processes such as meiotic recombination and chromosomal rearrangements in plants.
Plant material and DNA extraction
Clementine (Citrus clementina) cultivars Clemenules (CLE), Arrufatina (ARR) and Nero (NER) were used in this study. Clemenules is a self-incompatible genotype that is vegetatively propagated, while ARR and NER are both derived from somatic CLE mutations. ARR is a bud sport and NER an induced mutant obtained by fast neutron irradiation. Genomic DNA for genome sequencing was extracted exclusively from fresh young leaves after nuclear isolation. Leaves were ground, nuclear buffer was added and samples were homogenized and filtered on Miracloth layers. Extracts were centrifuged twice and the pellet re-suspended in floating buffer and centrifuged again. Nuclei, recovered by pipetting, were homogenized, re-suspended in nuclear buffer and centrifuged. The supernatant was discarded, RNase and protein Kinase A were added to the pellet and the extract was incubated at 50°C with gentle shaking and centrifuged. Nuclei in the supernatant were transferred and DNA extraction was performed by mixing the solution with an equal volume of phenol/chloroform/isoamyl alcohol (25:24:1). A second extraction with isopropanol was carried out and after centrifugation the DNA was recovered in the pellet. Three ethanol washes were performed; the alcohol was discarded and, after drying, the DNA was re-dissolved in TE. For PCR analyses DNA was extracted from leaves and flavedo tissue with the DNeasy Plant mini kit (Qiagen).
Libraries were constructed using the Illumina TruSeq DNA Sample Prep standard protocol with some modifications. Briefly, 1 μg of high molecular weight genomic DNA was fragmented with a Covaris sonication device. Thereafter, DNA fragments were end-repaired and A-tailed. Adapters were then ligated via a 3′ thymine overhang. Finally, ligated fragments were amplified by PCR (10 cycles). Library insert sizes ranged from 400 to 500 bp. The library was applied to an Illumina flowcell for cluster generation. Sequencing was performed on a HiSeq2000 instrument using 100 bp paired-end reads. Primary analysis of the data included quality control on the Illumina RTA sequence analysis pipeline.
Sequence processing, mapping and variant calling
Low quality bases were trimmed from sequence tails with a custom script. Afterwards, reads that remained extremely short and those with low mean quality were filtered out, and PCR-duplicated sequences were removed. Selected reads were aligned against the clementine reference genome (v 1.0) using the Burrows-Wheeler Aligner (BWA). Raw mapped reads were filtered by mapping quality, sorted and indexed with Samtools. Finally, selected reads were realigned following the GATK variant detection protocol. High quality mapped reads were used to detect SNVs and indels. These were called with the GATK software, and variants were labelled according to the quality control scores provided by the tool. This labelling was then used to define high quality sets of variants with low false positive rates.
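The custom trimming script is not described further. As an illustration only, the read-level steps above (tail trimming, length and mean-quality filtering, duplicate removal) could be sketched as follows. The thresholds, the Phred+33 encoding, the function names and the use of exact sequence identity to flag duplicates are all assumptions made for this sketch, not values or methods taken from this study.

```python
from statistics import mean

PHRED_OFFSET = 33  # assumed quality encoding; not stated in the text


def decode_quals(qual_string):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual_string]


def trim_tail(seq, quals, min_q=20):
    """Remove bases from the 3' end until a base with quality >= min_q is found."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def passes_filters(quals, min_len=50, min_mean_q=25):
    """Reject reads that are too short or whose mean quality is too low."""
    return len(quals) >= max(min_len, 1) and mean(quals) >= min_mean_q


def clean_reads(reads, min_q=20, min_len=50, min_mean_q=25):
    """reads: iterable of (name, sequence, quality_string) tuples.

    Returns the (name, trimmed_sequence) pairs that survive trimming,
    filtering and removal of exact duplicate sequences (a simplification
    of PCR-duplicate removal, assumed here for illustration).
    """
    seen = set()
    kept = []
    for name, seq, qual_string in reads:
        seq, quals = trim_tail(seq, decode_quals(qual_string), min_q)
        if not passes_filters(quals, min_len, min_mean_q):
            continue
        if seq in seen:
            continue
        seen.add(seq)
        kept.append((name, seq))
    return kept
```

In this sketch only the trailing low-quality run is removed; low-quality bases inside a read are left in place and affect the read only through its mean quality.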
Gene dosage was measured by real-time quantitative PCR on a LightCycler 2.0 instrument (Roche) using the LightCycler FastStart DNA MasterPLUS SYBR Green I kit (Roche), essentially as described in Rios et al. Each individual PCR reaction contained 2 ng of genomic DNA. The cycling protocol consisted of 10 min at 95°C for pre-incubation followed by 45 cycles of 10 s at 95°C for denaturation, 10 s at 60°C for annealing and 20 s at 72°C for extension. Specificity of the PCR reaction was assessed by the presence of a single peak in the dissociation curve after amplification and through size estimation of the amplified product. Gene dosage was calculated by comparing the ratio of target sequences inside and outside of the deletion in the three genotypes. PCR and normalized calculations were repeated in at least three independent samples. Non-quantitative PCR reactions contained 100 ng of genomic DNA, 0.6 μM of each primer and 0.5X of 2× Phusion master mix (Phusion High-Fidelity PCR Master, Cat. no F-532S, Thermo Scientific). The cycling protocol consisted of 1 min at 98°C for pre-incubation followed by 35 cycles of 10 s at 98°C for denaturation, 20 s at 60°C for annealing and 60 s at 72°C for extension, and one cycle of 10 min at 72°C for final elongation. Specificity of the PCR reaction was confirmed by agarose gel electrophoresis and direct sequencing of the PCR product. For TA cloning, a 539 bp fragment corresponding to the Ciclev10024595m.g locus was amplified using the Advantage HD Polymerase Mix (Clontech, Mountain View, CA, USA) with the GD1-specific oligos. Primers used are listed in Additional file 15: Table S8.
TA cloning was performed with the pGEM-T Easy Vector System (Promega, Madison, USA) according to the manufacturer’s instructions.
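The gene dosage comparison described above (ratio of a target inside the deletion to a target outside it, compared across genotypes) can be illustrated with a short calculation. The sketch below assumes perfect doubling per cycle (amplification efficiency of 2) and uses quantification cycle (Cq) values as input; the function names and the example Cq values are hypothetical and are not data from this study.

```python
from statistics import mean


def inside_outside_ratio(cq_inside, cq_outside, efficiency=2.0):
    """Relative amount of the inside-deletion target versus the outside target.

    Assumes both amplicons amplify with the same per-cycle efficiency
    (2.0 means perfect doubling; this is an assumption of the sketch).
    """
    return efficiency ** (cq_outside - cq_inside)


def relative_dosage(sample, reference, efficiency=2.0):
    """Dosage of a sample normalized to a reference genotype.

    sample, reference: lists of (cq_inside, cq_outside) pairs,
    one pair per independent replicate.
    """
    sample_ratio = mean([inside_outside_ratio(i, o, efficiency) for i, o in sample])
    reference_ratio = mean([inside_outside_ratio(i, o, efficiency) for i, o in reference])
    return sample_ratio / reference_ratio


# Hypothetical Cq values for three replicates per genotype.
parental = [(20.0, 20.0), (20.0, 20.0), (20.0, 20.0)]
mutant = [(21.0, 20.0), (21.0, 20.0), (21.0, 20.0)]
print(relative_dosage(mutant, parental))
```

With these hypothetical values the inside target in the mutant appears one cycle later than in the parental genotype, which corresponds to a relative dosage of about one half, the value expected for a hemizygous deletion.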
Availability of supporting data
CLE Illumina reads were submitted to the NCBI Sequence Read Archive with the experiment accession number SRX371962 (http://www.ncbi.nlm.nih.gov/sra/?term=SRX371962). ARR and NER sequences were submitted to the European Nucleotide Archive with the study accession number PRJEB5808 (http://www.ebi.ac.uk/ena/data/view/PRJEB5808).
All the other supporting data are included as additional files.
This research was funded by the Citruseq-Citrusgenn Consortium and the Ministerio de Ciencia e Innovación through FEDER grants PSE-060000-2009-8, IPT-010000-2010-43 and AGL2011-30240. We thank Elena Blázquez, Angel Boix, Cristina Martínez, Mariano Montoro, Juana Ramirez, Antonio Prieto, Isabel Sanchís and Matilde Sancho for different collaborations and laboratory tasks.
- Weischenfeldt J, Symmons O, Spitz F, Korbel JO. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat Rev Genet. 2013;14(2):125–38.
- Abeysinghe SS, Chuzhanova N, Krawczak M, Ball EV, Cooper DN. Translocation and gross deletion breakpoints in human inherited disease and cancer I: Nucleotide composition and recombination-associated motifs. Hum Mutat. 2003;22(3):229–44.
- Huang CR, Burns KH, Boeke JD. Active transposition in genomes. Annu Rev Genet. 2012;46:651–75.
- Feschotte C, Pritham EJ. DNA transposons and the evolution of eukaryotic genomes. Annu Rev Genet. 2007;41:331–68.
- Pritham EJ. Transposable elements and factors influencing their success in eukaryotes. J Hered. 2009;100(5):648–55.
- Sankaranarayanan K, Wassom JS. Ionizing radiation and genetic risks XIV, potential research directions in the post-genome era based on knowledge of repair of radiation-induced DNA double-strand breaks in mammalian somatic cells and the origin of deletions associated with human genomic disorders. Mutat Res. 2005;578(1–2):333–70.
- Goodhead DT. Initial events in the cellular effects of ionizing radiations: clustered damage in DNA. Int J Radiat Biol. 1994;65(1):7–17.
- Sage E, Harrison L. Clustered DNA lesion repair in eukaryotes: relevance to mutagenesis and cell survival. Mutat Res. 2011;711(1–2):123–33.
- Changela A, Perry K, Taneja B, Mondragon A. DNA manipulators: caught in the act. Curr Opin Struct Biol. 2003;13(1):15–22.
- Yu Z, Wright SI, Bureau TE. Mutator-like elements in Arabidopsis thaliana. Structure, diversity and evolution. Genetics. 2000;156(4):2019–31.
- Lisch D. Mutator transposons. Trends Plant Sci. 2002;7(11):498–504.
- Diao XM, Lisch D. Mutator transposon in maize and MULEs in the plant genome. Yi Chuan Xue Bao. 2006;33(6):477–87.
- Gao D. Identification of an active Mutator-like element (MULE) in rice (Oryza sativa). Mol Genet Genomics. 2012;287(3):261–71.
- Robertson DS, Stinard PS. Genetic evidence of mutator-induced deletions in the short arm of chromosome 9 of maize. Genetics. 1987;115(2):353–61.
- Gray YH. It takes two transposons to tango: transposable-element-mediated chromosomal rearrangements. Trends Genet. 2000;16(10):461–8.
- Jennes I, de Jong D, Mees K, Hogendoorn PC, Szuhai K, Wuyts W. Breakpoint characterization of large deletions in EXT1 or EXT2 in 10 multiple osteochondromas families. BMC Med Genet. 2011;12:85.
- Xuan YH, Piao HL, Je BI, Park SJ, Park SH, Huang J, et al. Transposon Ac/Ds-induced chromosomal rearrangements at the rice OsRLG5 locus. Nucleic Acids Res. 2011;39(22):e149.
- van Zelm MC, Geertsema C, Nieuwenhuis N, de Ridder D, Conley ME, Schiff C, et al. Gross deletions involving IGHM, BTK, or Artemis: a model for genomic lesions mediated by transposable elements. Am J Hum Genet. 2008;82(2):320–32.
- Sachs RK, Chen AM, Brenner DJ. Review: proximity effects in the production of chromosome aberrations by ionizing radiation. Int J Radiat Biol. 1997;71(1):1–19.
- Choi K, Zhao X, Kelly KA, Venn O, Higgins JD, Yelina NE, et al. Arabidopsis meiotic crossover hot spots overlap with H2A.Z nucleosomes at gene promoters. Nat Genet. 2013;45(11):1327–36.
- Wijnker E, Velikkakam James G, Ding J, Becker F, Klasen JR, Rawat V, et al. The genomic landscape of meiotic crossovers and gene conversions in Arabidopsis thaliana. Elife. 2013;2:e01426.
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303.
- Xie C, Tammi MT. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics. 2009;10:80.
- Burge CB, Karlin S. Finding the genes in genomic DNA. Curr Opin Struct Biol. 1998;8(3):346–54.
- Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30(9):1236–40.
- Benito MI, Walbot V. Characterization of the maize Mutator transposable element MURA transposase as a DNA-binding protein. Mol Cell Biol. 1997;17(9):5165–75.
- Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4:406–25.
- Xu Z, Yan X, Maurais S, Fu H, O’Brien DG, Mottinger J, et al. Jittery, a Mutator distant relative with a paradoxical mobile behavior: excision without reinsertion. Plant Cell. 2004;16(5):1105–14.
- Xiang H, Wang J, Hisaoka M, Zhu X. Characteristic sequence motifs located at the genomic breakpoints of the translocation t(12;16) and t(12;22) in myxoid liposarcoma. Pathology. 2008;40(6):547–52.
- Nambiar M, Raghavan SC. How does DNA break during chromosomal translocations? Nucleic Acids Res. 2011;39(14):5813–25.
- Steiner WW, Davidow PA, Bagshaw AT. Important characteristics of sequence-specific recombination hotspots in Schizosaccharomyces pombe. Genetics. 2011;187(2):385–96.
- Weinreb A, Katzenberg DR, Gilmore GL, Birshtein BK. Site of unequal sister chromatid exchange contains a potential Z-DNA-forming tract. Proc Natl Acad Sci U S A. 1988;85(2):529–33.
- Panagopoulos I, Lassen C, Isaksson M, Mitelman F, Mandahl N, Aman P. Characteristic sequence motifs at the breakpoints of the hybrid genes FUS/CHOP, EWS/CHOP and FUS/ERG in myxoid liposarcoma and acute myeloid leukemia. Oncogene. 1997;15(11):1357–62.
- Wells RD, Dere R, Hebert ML, Napierala M, Son LS. Advances in mechanisms of genetic instability related to hereditary neurological diseases. Nucleic Acids Res. 2005;33(12):3785–98.
- Burge S, Parkinson GN, Hazel P, Todd AK, Neidle S. Quadruplex DNA: sequence, topology and structure. Nucleic Acids Res. 2006;34(19):5402–15.
- Kanoe H, Nakayama T, Hosaka T, Murakami H, Yamamoto H, Nakashima Y, et al. Characteristics of genomic breakpoints in TLS-CHOP translocations in liposarcomas suggest the involvement of Translin and topoisomerase II in the process of translocation. Oncogene. 1999;18(3):721–9.
- Wu GA, Prochnik S, Jenkins J, Salse J, Hellsten U, Murat F, et al. Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nat Biotech. 2014;32(7):597–698.
- Szymkowiak EJ, Sussex IM. What chimeras can tell us about plant development. Annu Rev Plant Physiol Plant Mol Biol. 1996;47:351–76.
- Wells RD. DNA triplexes and Friedreich ataxia. FASEB J. 2008;22(6):1625–34.
- Tang W, Dominska M, Greenwell PW, Harvanek Z, Lobachev KS, Kim HM, et al. Friedreich’s ataxia (GAA)n*(TTC)n repeats strongly stimulate mitotic crossovers in Saccharomyces cerevisae. PLoS Genet. 2011;7(1):e1001270.
- Napierala M, Dere R, Vetcher A, Wells RD. Structure-dependent recombination hot spot activity of GAA.TTC sequences from intron 1 of the Friedreich’s ataxia gene. J Biol Chem. 2004;279(8):6444–54.
- Myers S, Freeman C, Auton A, Donnelly P, McVean G. A common sequence motif associated with recombination hot spots and genome instability in humans. Nat Genet. 2008;40(9):1124–9.
- Zhou T, Hu Z, Zhou Z, Guo X, Sha J. Genome-wide analysis of human hotspot intersected genes highlights the roles of meiotic recombination in evolution and disease. BMC Genomics. 2013;14:67.
- Bacolla A, Wojciechowska M, Kosmider B, Larson JE, Wells RD. The involvement of non-B DNA structures in gross chromosomal rearrangements. DNA Repair (Amst). 2006;5(9–10):1161–70.
- Mirkin SM, Lyamichev VI, Drushlyak KN, Dobrynin VN, Filippov SA, Frank-Kamenetskii MD. DNA H form requires a homopurine-homopyrimidine mirror repeat. Nature. 1987;330(6147):495–7.
- Frank-Kamenetskii MD, Mirkin SM. Triplex DNA structures. Annu Rev Biochem. 1995;64:65–95.
- Follonier C, Oehler J, Herrador R, Lopes M. Friedreich’s ataxia-associated GAA repeats induce replication-fork reversal and unusual molecular junctions. Nat Struct Mol Biol. 2013;20(4):486–94.
- Levin HL, Moran JV. Dynamic interactions between transposable elements and their hosts. Nat Rev Genet. 2011;12(9):615–27.
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–6.
- Roudier F, Schindelman G, DeSalle R, Benfey PN. The COBRA family of putative GPI-anchored proteins in Arabidopsis. A new fellowship in expansion. Plant Physiol. 2002;130(2):538–48.
- Cao Y, Tang X, Giovannoni J, Xiao F, Liu Y. Functional characterization of a tomato COBRA-like gene functioning in fruit development and ripening. BMC Plant Biol. 2012;12:211.
- Ehrenberg L. Factors influencing radiation induced lethality, sterility, and mutations in barley. Hereditas. 1955;41(1–2):123–46.
- Li R, Li Y, Zheng H, Luo R, Zhu H, Li Q, et al. Building the sequence map of the human pan-genome. Nat Biotechnol. 2010;28(1):57–63.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
- Rios G, Naranjo MA, Iglesias DJ, Ruiz-Rivero O, Geraud M, Usach A, et al. Characterization of hemizygous deletions in citrus using array-comparative genomic hybridization and microsynteny comparisons with the poplar genome. BMC Genomics. 2008;9:381.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.