Volume 17 Supplement 13
Between-species differences in gene copy number are enriched among functions critical for adaptive evolution in Arabidopsis halleri
© The Author(s) 2016
Published: 22 December 2016
Gene copy number divergence between species is a form of genetic polymorphism that contributes significantly to both genome size and phenotypic variation. In plants, copy number expansions of single genes were implicated in cultivar- or species-specific tolerance of high levels of soil boron, aluminium or calamine-type heavy metals, respectively. Arabidopsis halleri is a zinc- and cadmium-hyperaccumulating extremophile species capable of growing on heavy-metal contaminated, toxic soils. In contrast, its non-accumulating sister species A. lyrata and the closely related reference model species A. thaliana exhibit merely basal metal tolerance.
For a genome-wide assessment of the role of copy number divergence (CND) in lineage-specific environmental adaptation, we conducted cross-species array comparative genome hybridizations of three plant species and developed a global signal scaling procedure to adjust for sequence divergence. In A. halleri, transition metal homeostasis functions are enriched twofold among the genes detected as copy number expanded. Moreover, biotic stress functions including mostly disease Resistance (R) gene-related genes are enriched twofold among genes detected as copy number reduced, when compared to the abundance of these functions among all genes.
Our results provide genome-wide support for a link between evolutionary adaptation and CND in A. halleri as shown previously for Heavy metal ATPase4. Moreover our results support the hypothesis that elemental defences, which result from the hyperaccumulation of toxic metals, allow the reduction of classical defences against biotic stress as a trade-off.
KeywordsCross-species Array-CGH Metal hyperaccumulation CNV Arabidopsis halleri Toll-Interleukin Receptor-Nucleotide Binding Site-Leucine Rich Repeat (TIR-NBS-LRR) protein family Resistance genes (R genes)
Genetic and epigenetic variation form the basis for local adaptation and speciation processes, and are becoming increasingly accessible through advances in genomic and bioinformatic tools. The advent of microarray and ultra-high throughput sequencing (UHTS) technologies have thus brought about a renewed interest in evolutionary questions, with a prospect for gaining novel insights at the whole-genome level. These opportunities have spurred genome-wide surveys of single nucleotide polymorphisms (SNPs)  and methylation polymorphisms in many organisms including plants, for example in multiple accessions of the genetic model organism Arabidopsis thaliana and in closely related species [2–6]. In attempts to identify causative genetic changes in plant adaptations, classical linkage analysis and genome-wide association studies (GWAS) have successfully mapped traits governing the performance under local environmental conditions to SNPs at specific loci [7, 8]. Structural variation in the form of gene copy number variation (CNV) polymorphism is an influential component of natural genetic diversity that markedly contributes to phenotypic variation . However, CNV has been addressed in noticeably fewer studies because of technical difficulties in its comprehensive and reliable assessment [10, 11].
Short CNVs consisting of insertions or deletions below 1 kb in size can be readily detected based on UHTS technologies. However, the identification of CNVs comprising from 1 kb up to one or multiple genes has generally remained challenging. Genome-wide analyses in human and other mammalian model organisms revealed CNVs to be much more abundant than previously known, e.g. affecting 10% of the mouse genome and 12% of the human genome (reviewed in ). CNVs have been implicated in human disease etiology, and evidence for adaptive CNVs is also emerging . In comparison to mammalian genomes, gene duplications and deletions especially from whole genome duplications appear to be even more abundant in plant genomes . Single-gene and segmental duplications as well as whole-genome duplications have been hypothesized to propel adaptive evolution and speciation. In plants, this view is supported by recent reports on cultivar-specific boron tolerance in barley , aluminium tolerance in maize  and species-wide heavy metal tolerance in the wild plant Arabidopsis halleri [17, 18], all supporting the role of gene copy number expansion in plant adaptation to abiotic stress. Population genomic data, for example from Arabidopsis thaliana and Zea mays, have identified an unexpectedly high abundance of CNVs [11, 19], generating interest in the contribution of structural mutations to genome plasticity. Ten percent of maize genes were found to exhibit copy number polymorphisms, and an experimental evolution study in A. thaliana reported de novo structural mutations resulting in 400 copy number variant genes after only 5 generations . Although between-species genome comparisons have remained difficult to date, the few existing studies have supported the hypothesis that gene copy number expansions, and especially those involving tandem duplications , might underlie plant adaptations to environmental stress . Given that novel functions are much more likely to be generated by adaptive specialization of one of several pre-existing copies of a duplicated gene than by an entirely novel gene [23, 24], such comparative studies are key to understanding the patterns of genomic polymorphisms associated with adaptation and speciation.
Previous cross-species transcriptomics studies identified a number of differentially expressed candidate genes for the metal hyperaccumulation/hypertolerance trait of A. halleri [30–32]. Among these, Heavy Metal ATPase 4 (HMA4), which encodes a metal pump that acts as an exporter of Zn2+ and Cd2+ from specific cell types, was shown to be necessary both for the hyperaccumulation of Zn and for the full extent of Zn and Cd hypertolerance . Strongly increased HMA4 transcript levels in A. halleri were attributed to a lineage-specific tandem triplication combined with cis-regulatory mutations . An analysis of sequence polymorphism in the genomic region of HMA4 gene copy number expansion demonstrated strong positive selection, as well as selection for enhanced HMA4 gene product dosage . Another candidate gene, Nicotianamine Synthase 2 (NAS2), encodes an enzyme that catalyses the biosynthesis of the low-molecular-weight metal chelator nicotianamine from S-adenosyl methionine, and was shown to contribute to Zn hyperaccumulation . In addition to HMA4, several other transition metal homeostasis candidate genes of A. halleri were demonstrated to be copy number expanded through the DNA gel (Southern) blot technique [31, 34].
The objective of the work presented here was to identify genes exhibiting copy number expansion in A. halleri at a genome-wide scale, in relation to the known species-specific extreme traits. We conducted a survey of gene copy number divergence (CND) in A. halleri relative to A. thaliana by employing array-comparative genomic hybridization (array-CGH) in a cross-species manner using the ATH1 microarray designed for A. thaliana. In order to test whether the identified CNDs are species- and thus possibly trait-specific, our analysis included A. lyrata as a third species, which is a non-tolerant non-hyperaccumulator like A. thaliana, but shares with A. halleri an equal phylogenetic distance from A. thaliana. We devised a novel routine for evaluating cross-species array-CGH data, which is based on the quantification and subsequent global correction of the effects of sequence divergence on hybridization signal intensities. Our procedure operates without loss of probe information, which is crucial for retaining statistical power for CNV estimation further downstream. Our predictions of genic CNDs were validated against a small set of genes with known copy number in A. halleri  and against a set of genes predicted to be copy number expanded or reduced according to the A. lyrata reference genome sequence . Gene copy number expansions in A. halleri, but not in A. lyrata, were found to be significantly enriched for metal homeostasis functions. Conversely, biotic stress functions were significantly enriched among genes exhibiting copy number reduction in A. halleri, but not in A. lyrata. These results suggest that between-species divergence in gene copy numbers reflects adaptive evolution of metal hyperaccumulation, a species-specific trait of A. halleri that has been proposed to provide an elemental defence against biotic stress [36, 37].
Metal hyperaccumulation and hypertolerance in A. halleri have previously been attributed to the constitutively enhanced expression of a number of metal homeostasis genes, several of which were additionally shown to be expanded in genomic copy number through DNA gel blots , BAC sequencing [18, 38] or other methods . Here, the technique of cross-species array-CGH was employed for a genome-wide assessment of between-species divergence in gene copy number. Fragmented and labelled nuclear genomic DNA samples from A. thaliana and from the two closely related heterologous species A. halleri and A. lyrata, were hybridized in duplicate to Affymetrix ATH1 GeneChip®; microarrays (see Fig. 1) .
To implement our strategy, we began by establishing a representative subset of probes for which curated sequence data was available from A. halleri, termed reference dataset (see Fig. 2; Additional file 1). A similar reference dataset was also generated at random from the available reference genome sequence of A. lyrata. For each heterologous target species, the signal correction factor was calculated from the statistical distribution of the occurrence of mismatches and the ensuing effect on hybridization signal intensity as measured in the respective reference dataset. Subsequently, the normalized and corrected cross-species hybridization data were analysed for differential signals between species in order to identify putative copy number divergent genes. Finally, a comparison between copy number alterations in A. lyrata and A. halleri enabled us to identify species-specific copy number alterations.
Consequences of inter-species sequence divergence for mismatch occurrence between probe sequences and heterologous target sequences
We computed the expected distribution of the total number of mismatches of a given nucleotide sequence with respect to the probe sequence on the microarray (See Methods). The expected distribution (binomial) of mismatches in cross-species hybridization of A. halleri gDNA closely matched our observations for the reference dataset (Pearson’s χ 2=0.985, df = 4, P = 0.37; Fig. 3). Both the observed and expected probabilities of the occurrence of more than 4 mismatches by comparison to the probe sequence were negligible (3 and 6% for A. halleri and A. lyrata reference dataset, respectively, and 1.5% expected). Finally, the observed mismatch distributions for A. halleri and A. lyrata were highly similar to each other (Pearson’s χ 2= 0.979, df = 4, P = 0.44), thus confirming similar levels of sequence divergence from A. thaliana (see Fig. 1) and allowing us to use the expected distribution of mismatches calculated for A. halleri also in A. lyrata hybridizations.
Both the A. halleri and A. lyrata lineages are thought to have diverged from the common ancestor with A. thaliana at the same point in the past (see Fig. 1). Our observations support the theory that a correlation exists between the levels of sequence divergence and actual phylogenetic distances between species, as was estimated, for example, based on cross-species array-CGH data .
Quantification of effects of sequence mismatches on signal intensities in cross-species microarray hybridization
Surprisingly, we observed a noisy profile of signal intensity over different positions of a single mismatch along the probe sequence instead (Fig. 4 b). The expected sharp drop in signal intensity when a single mismatch is positioned in the centre (13th nucleotide) of a probe sequence, as proposed by Affymetrix for so-called mismatch (MM) probes , was not detected here. This finding is in agreement with a number of previous studies [48, 49], which have pointed out that experimental data do not conform to this postulate and that, in fact, for some probes signal intensity was even found to be higher for MM probes than for perfectly matching (PM) probes . Consequently, position-based effects on hybridization signal intensity are hard to construct, and accordingly, the most popular normalization methods no longer take the information from MM probes into account. Therefore, for our between-species normalization strategy, we did not consider the influence of sequence mismatch position. We estimated the incremental signal correction factor S k for a probe with k mismatches as the average of the ratio of normalized hybridization signal intensity of A. thaliana to the respective signal intensity of the heterologous species. For each probe containing 0 to 4 mismatches, incremental signal correction factors were weighted by their probability of occurrence (see Fig. 3), followed by the calculation of the arithmetic mean to yield species-specific global scaling factors (see Fig. 2). These global signal correction factors of 1.22 for hybridizations of A. halleri gDNA and 1.13 for hybridizations of A. lyrata gDNA were employed to scale the hybridization signal intensities of the respective cross-species microarray hybridizations.
Cross-species normalization and validation of copy number divergent genes
Validation of array-CGH results against highly conserveda genes predicted to be copy number expanded (CNEs) in A. lyrata
Array-CGH prediction method
Total number of CNEs detected
No. of CNEs detected (117 ; 98)b
Sensitivity (% positives detected out of predicted positives)
Specificity (% negatives detected out of predicted negatives)
Precision (% true positives out of total no. detected)c
Darby et al. 2011d
Machado et al. 2010e
We compared the performance of our array-CGH based approach with that of the two previous studies that also aimed at estimating CND using the array-CGH technique [42, 43]. We reproduced the normalization and scaling strategies of Machado and Renn (2010) and Darby et al. (2011) as described [42, 43], with few small modifications necessary to apply these methods to our array-CGH platform (see Methods). The method of Darby et al. (2011) resulted in the prediction of a 2.47-fold elevated number of gene copy number expansions. Out of the two previously published methods, maximum sensitivity, specificity and precision of the detection of copy number expansion among highly conserved genes were 8.5, 99.5 and 2.1%, respectively, all inferior to our method (10.3%, 99.5%, 5.3%, Table 1). Even for the genes that are not highly conserved but predicted to be copy number expanded concordantly by both Ensembl Plants and the A. lyrata genome project, our method reports higher sensitivity, specificity and precision – 5, 99.1 and 8.8% respectively than previous studies [42, 43] – 3.9, 98.7 and 3.7% (Additional file 7A). Specificity and precision of our method were also superior concerning copy number reductions or gene deletions (Additional file 7B).
Functional analysis of copy number divergent genes of A. halleri
Genes identified to be altered in copy number in A. halleri through cross-species hybridization of gDNA onto A. thaliana microarrays
Affymetrix probeset ID
AGI locus ID
Short gene namea
A. halleri vs. A. thaliana
A. lyrata vs. A. thaliana
L o g 2 FCb
L o g 2 FCb
Copy number expanded in A. halleri vs. A. thaliana(L o g 2 F C> 1)
Mitoferrin-related, mitochondrial solute carrier (MSC) family
Plant cadmium resistance 1/2
Member of Na+/H+ antiporter family
Vacuolar iron transporter 1 (VIT1)-related
Ricinus iron transport protein 2-related
Cation/H+ exchanger 2
ZRT-, IRT-like protein 6
Metal transport/tolerance protein 1
Vacuolar iron transporter 1 (VIT1)-related
S-adenosylmethionine synthetase 2
Cadmium-induced protein AS8
Heavy metal ATPase 3
Homologue of yeast copper chaperone Sco1xx
Nicotianamine synthase 2
Putative copy number expanded in A. halleri vs. A. thaliana(L o g 2 ≤ 1 and > 0.6)
Yellow stripe like transporter 4
ZRT-, IRT-like$ protein 3
Plastid transcriptionally active 17, putative Zn metallochaperone
Trypsin inhibitor 1, defensin-like protein family
Putative metallochaperone-like protein
Putative metallochaperone-like protein
Protein with NAS domain and KH domain
Farnesylated protein 3, metal-binding
Copper transporter 4, embryo defective 1513
Vacuolar iron transporter 1 (VIT1)-related
O-acetylserine thiol lyase isoform A2
ZRT-, IRT-like$ protein 9
Plant defensin 2.6
S-adenosylmethionine synthetase 3
Metal transport protein 7
S-adenosylmethionine synthetase 3
Cation/H+ exchanger, CPA2 family
Plant defensin 1.2b
Cation/H+ exchanger, CPA2 family
Methionine adenosyltransferase 3
Putative metallochaperone-like protein
Ca2+/H+ exchanger 11
Vacuolar iron transporter 1 (VIT1)-related
Plant defensin 1.4
ZRT-, IRT-like$ protein 11
Homologue of yeast copper chaperone Sco1 ∞
Affymetrix probeset ID
AGI locus ID
Short gene name
A. halleri vs. A. thaliana
A. lyrata vs. A. thaliana
L o g 2 FCb
L o g 2 FCb
Complex I & 23 kDa subunit; α-helical ferredoxin
Complex I & B12 subunit
Dicarboxylate carrier 2
ATP synthase β-subunit
Complex I & 14 kDa subunit; Fe-S subunit 5
Complex I & 17.2 kDa subunit
Complex I & 12 kDa subunit NDUFS6; PDSW subunit
ATP synthase D chain
Cytochrome c biogenesis protein family
Complex I & subunit of the 400 kDa subcomplex; Embryo defective 1467
Orthologue of E. coli CcmE heme chaperone in cytochrome c maturation
Homologue of yeast copper chaperone Sco1 ∞
Complex I & Prohibitin 6; Prohibitin 1
Complex I & LYR family of Fe/S cluster biogenesis protein
Affymetrix probeset ID
AGI locus ID
Short gene namea
A. halleri vs. A. thaliana
A. lyrata vs. A. thaliana
L o g 2 FCb
L o g 2 FCb
Disease resistance protein (CC-NBS-LRR# class) family
Disease resistance protein (TIR-NBS-LRR# class) family
TIR domain-containing protein
LRR and NB-ARC+ domains-containing disease resistance protein
Receptor like protein 40/42
Defensin-like (DEFL) family
NB-ARC+ domain-containing disease resistance protein
Tolerant to tobacco ringspot nepovirus
Receptor like protein 34/53
Chitinase family protein
TIR domain transmembrane protein
Among the genes predicted to be copy number reduced according to our array-CGH results, we observed an overrepresentation of biotic defence functions (see Fig. 6, Table 2C). This is an interesting observation with respect to the proposed ecological role of metal hyperaccumulation in plants (see Discussion). Almost all biotic stress-related genes predicted to be copy number reduced in A. halleri encode members of large protein families typically involved in plant innate immunity and designated as disease Resistance (R) genes, such as the predominating TIR-NBS-LRR receptor kinases (see Table 2C). Infecting pathogens generate characteristic molecular patterns that can be specifically recognized by cognate R gene products, which subsequently trigger a localized cell death response that is essential for plant disease resistance. The enrichment of R gene-related biotic stress functions among genes reduced in copy number in the hyperaccumulator A. halleri supports the elemental defence hypothesis as well as the trade-off hypothesis for the evolutionary role of elemental defence. Accordingly, elemental defences through metal hyperaccumulation allow for the loss of R genes, thus alleviating the fitness costs associated with R gene expression.
At lower levels of the ontological hierarchy, post-translational modification functions, and genes encoding glycine-rich and crinkly-like proteins were over-represented among the copy number expanded genes of A. halleri, whereas DC1 domain-containing, as well as protein degradation/E3-SCF-F-box and cysteine protease functions were significantly enriched among the genes predicted to be copy number reduced. Furthermore, genes predicted to be copy number reduced in both A. halleri and A. lyrata according to our array-CGH, showed a significant enrichment of the MapMan functional category or BIN “DNA.synthesis/chromatin structure.retrotransposon/transposase” (20.4% in A. halleri, 38.7% in A. lyrata; 1.5% of all genes on array; data not shown). Contrary to this observation, it is known that transposable element gene families have undergone an expansion in the A. lyrata genome relative to A. thaliana .
Contributions of segmental duplications or deletions to copy number divergence
CNV detection pipelines employing arrays are still generally considered to be more accurate than sequencing-based algorithms [59–61], particularly for the detection of large duplicated segments or duplications of high sequence conservation [60, 61]. De novo assembly of non-model genomes often collapses paralogous gene copies into a single locus. While the traditional shotgun assembly resulted in a pronounced under-detection of highly identical gene CNV regions of sizes > 1 kb , de novo assembly of short reads missed 99%  of all known sequence duplications . By contrast, the higher the similarity between paralogs, the higher is the probability of their detection by hybridization-based methods. This makes array-CGH methods particularly useful for detecting recent gene duplications or gene copy number divergence in emergent species. Recent and thus almost identical gene duplicates are expected to underlie the distinctive traits of species after short divergence times of 4 to 6 Mya, such as human and apes, or A. halleri and A. thaliana, as in our study . Cross-species array-CGH methods thus fill a gap in the identification of gene CNVs and should be considered an important complementary approach to other CNV detection methods and for genome annotation.
Previously, array-CGH techniques were demonstrated to be quick, reliable and cost-effective in the detection of CNVs within the same species or lineage. Although whole genome tiling arrays are most widely used for this purpose, cross-species comparative genomic studies have also used gene expression arrays to compare the coding portions of genomes, for example the identification of genomic islands of speciation in three diverging populations of Anopheles gambiae  and the identification of human-specific gene duplications and contractions at a genome-wide level [65, 66]. The use of expression arrays for determining inter-species genic copy number divergence via array-CGH was generally demonstrated to give reliable and reproducible results [41, 45]. To date, Affymetrix ATH1 microarrays have been used in cross-species genomic DNA hybridizations for selecting a subset of probes for use in subsequent cross-species transcriptomics studies, but not with the aim to determine CNVs . Conversely, combined with the global scaling approach developed here, array-CGH data could be used for adjusting hybridization signals in cross-species transcriptomics in order to improve accuracy without the loss of probe information. By comparison to previously published studies, besides being the only one in plants, our study decidedly benefits from the choice of species that are closely related to the reference species.
Our method yields erroneous results when the probes on ATH1 gene chip have been sourced from genomic regions prone to high sequence divergence such as transposable elements. We found an enrichment of retrotransposon/transposase functional category in the CNRs, contrary to the known expansion of TE families in A. lyrata relative to A. thaliana. We calculated an average of 39.1% sequence divergence between the ATH1 probe sequences corresponding to A. thaliana transposable element genes and the corresponding sequences of the A. lyrata reference genome, identified as the best blast hits. We thus propose that our result can be explained by the high degree of sequence divergence of transposable element gene families from A. thaliana in both heterologous species, as they diverged from A. thaliana at approximately the same time. Indeed, other sequence based orthology methods like Ensembl-Compara also report the TE genes as missing orthologs in A. lyrata . Prior to the publication of the A. lyrata genome sequence, several population genomic studies predicted a reduction in TE copy number in comparison to A. thaliana [69, 70]. Similarly, a well-known CNE gene in A. halleri - HMA4 was not detected by our methodology. On the ATH1 GeneChip, probe sequences for HMA4 are located within the 3-́region of the HMA4 coding sequence, which encodes the C-terminal cytoplasmic regulatory domain of the protein and is highly divergent in A. halleri (22% nucleotide sequence divergence compared to A. thaliana). This unusually high nucleotide sequence divergence explains our false negative result for HMA4 copy number expansion. Additionally, a significant enrichment of genes “Not assigned” were found among copy number reduced genes of both A. halleri and A. lyrata. Among our predictions of CNRs common to both heterologous target species, we would generally expect false positives corresponding to genes that are highly divergent from their A. thaliana orthologues and that belong to fast-evolving groups of genes.
The sensitivity of our method for copy number reductions (CNRs) also appears to be lower than other methods but this is because employing the procedures of both previous studies resulted in the prediction of a substantial, more than 26-fold excess of gene copy number reductions or deletions (Additional file 6) when compared to the high-confidence set of predictions shared by both Ensembl Plants  and the A. lyrata reference genome project . Thus, specifically accounting for inter-species sequence divergence in the analysis of cross-species array-CGH data enhances accuracy and prevents the under-reporting of true gene orthologs. We also note that our superior results in comparison to previous methods might reflect the fact that each method is most suited to the array platform and scientific goal for which it was developed. For example, single base substitutions are likely to be less deleterious in hybridizations to 500-bp probes on spotted cDNA arrays  than to the much shorter 25-nt probes of the ATH1 array. Different from the ATH1 array, tiling arrays do not require any summarization of signals from single probes within a probeset .
Extrapolating from our results that several genes known to be highly-expressed in A. halleri relative to A. thaliana are also copy number expanded, there might be a general association of gene copy number expansion with enhanced transcript levels. Indeed, one of the ways in which gene copy number divergence can confer an adaptive advantage is by effecting changes in gene product dosage, with a high probability of generating a physiological effect. In support of this, drastic phenotypes have been demonstrated to result from modification of transcript levels resulting from alterations in gene copy number [17, 71–73]. Regarding the enrichment of mitochondrial electron transport functions in CNE genes of A. halleri, a function of mitochondrial proteins in metal hypertolerance or hyperaccumulation has not been reported to date, but complex III of the mitochondrial electron transport chain was proposed to be a major molecular target for Cd toxicity in plants . Alternatively, our observation may relate to metabolic alterations allowing enhanced dicarboxylate accumulation. Malate and citrate have been identified as quantitatively important metal chelators in hyperaccumulator plants [75, 76], thought to operate mostly inside vacuoles and in the apoplastic xylem. An important role of these anions at storage sites might be supplying a charge balance rather than chelation. Finally, these copy number expansions could reflect a need for enhanced mitochondrial respiration to energize processes involved in metal hyperaccumulation or hypertolerance. Future experiments will have to address these alternative hypotheses on the possible roles of copy number expanded genes associated with mitochondrial electron transport and ATP synthesis in metal hypertolerance or hyperaccumulation or other traits of A. halleri.
The biotic defense functions were found enriched in CNR genes of A. halleri. The elemental defence hypothesis postulates that the extraordinarily high concentrations of metals accumulated in leaves of metal hyperaccumulator plants act as a defence against herbivory or pathogen attack [36, 37]. Furthermore, the trade-off hypothesis postulates that metal hyperaccumulation-based defences allow plants to reduce the production and thus fitness costs of secondary metabolite or protein-based organic defences, incurred through their direct metabolic expense or indirect metabolic expense in scarce nutrients such as sulfur and nitrogen . A quantitative comparison of metabolic expenses of organic and elemental defences, however, has not been possible to date. Indeed, secondary metabolism-related functions were not found to be enriched among copy number reduced genes of A. halleri in this study. Thus, our results do not provide support for the commonly proposed metabolic cost trade-off hypothesis regarding elemental defences in hyperaccumulators.
A fitness cost of up to 9% was reported for the expression of a single R gene, RPM1, in A. thaliana . Recent work has identified plant R genes as an important group of genes underlying instances of hybrid necrosis [78, 79]. A. halleri is an obligate outcrosser, for which post-zygotic hybrid incompatibility is expected to have more severe consequences than for its largely selfing relative A. thaliana. In metal hyperaccumulator plants, fitness costs of R gene expression could additionally involve the recently proposed inadvertent activation of R protein-mediated defence signalling by internal heavy metals . The loss of R genes in A. halleri can be interpreted to suggest that for a plant capable of metal hyperaccumulation, threats from biotic stress are reduced on soils permitting the hyperaccumulation of heavy metals [80–82]. Our results thus suggest that CNV resulting from structural mutations are important for plant evolution occurring upon interaction with both biotic stress and abiotic environmental factors. Genetic alterations similar to those reported here for A. halleri might thus underlie other reports of niche adaptation and consequent speciation driven by herbivore pressure alone, or in combination with soil edaphic factors [80, 81].
In this work, we generated conservative estimates of genome-wide between-species gene copy number divergence in two close relatives of the model plant A. thaliana, based on cross-species microarray hybridization data. We devised a novel normalization strategy, which requires less existing sequence data, yields more comprehensive results and performs better than previously developed methods. Functions in transition metal homeostasis were found to be most highly enriched among copy number expanded genes in the metal hyperaccumulator A. halleri, but not in the non-accumulating sister-species A. lyrata, despite a shared overall genome structure and an equivalent evolutionary divergence from A. thaliana. In combination with the enrichment of biotic stress functions among copy number reduced genes of A. halleri also observed here, our finding lends support to the elemental defence hypothesis for the evolution of the metal hyperaccumulation trait . Our results highlight the genome-wide importance of gene copy number alterations in adaptive evolution and suggest that genome scans for copy number divergence can identify functional networks that have been targets of natural selection. We propose that our findings are applicable in ecotoxicology for identifying the types and targets of environmental change-mediated stress in suitable indicator organisms. Finally, our study has identified novel candidate genes for the future improvement of the molecular physiological understanding of metal hyperaccumulation and associated hypertolerance in plants.
Two samples of leaf material of Arabidopsis halleri ssp. halleri (accession Langelsheim) were obtained, respectively, from one cloned individual (W 504) , and from 10 pooled F1 progeny grown from seeds of a controlled, reciprocal crosses between two Langelsheim individuals (Lan3.1 and Lan5) . Arabidopsis lyrata ssp. petrea (accession Kubova Hut, kindly provided by Marc Macnair, University of Exeter) and Arabidopsis thaliana (accession Col-0) were grown from seed, and leaf material was pooled from 10 individuals, respectively. A. thaliana was cultivated on standard soil, and the other two species were cultivated hydroponically .
Genomic DNA isolation
Genomic DNA was isolated, fragmented and end-labelled with Bio-N6-ddATP according to Borevitz et al. , with some modifications of the procedure. Total genomic DNA was isolated from 4 to 6 g fresh biomass of plant leaf tissue with cetyl trimethylammonium bromide (CTAB) buffer (0.8% (w/v) CTAB, 800 mM NaCl, 1% (w/v) N-laurylsarcosine, 140 mM sorbitol, 22 mM EDTA, 220 mM Tris pH 8). Frozen plant tissue was ground to a fine powder in liquid N2, transferred to a 50 ml tube containing 30 ml CTAB buffer and incubated at 65 °C with occasional vigorous shaking for 20 min. After addition of 12 ml chloroform/isoamylalcohol (24:1) and vigorous mixing, tubes were placed at room temperature (RT) on an inverter for 20 min. After centrifugation in a table-top centrifuge at 3,700 g for 5 min, the aqueous phase was transferred to a fresh tube, 1 vol. isopropanol was added and nucleic acids were precipitated on ice for 30 min. After centrifugation at 9,500 rpm (rotor: JA25-50), 4 °C for 8 min, the supernatant was drained and the pellets were resuspended in 6 ml ultrapure H2O. 1 vol. 4 M LiAc was added and the samples were incubated on ice for 20 min to precipitate RNA. After centrifugation at 9,500 rpm (rotor: JA25-50), 4 °C for 11 min, the supernatant was transferred to a fresh tube, 2 vol. ethanol were added and the samples were placed at RT for a few seconds. To collect the precipitate, samples were centrifuged at 12,000 rpm (rotor: JA25-50), 4 °C for 20 min. Pellets were resuspended in 1.35 ml ultrapure H2O, followed by the addition of 150 μl 3 M sodium acetate. Each sample was then split between two 2-ml eppendorf vials. One vol. of phenol/chloroform/isoamylalcohol (25:24:1) was added, vial contents were mixed by shaking, and subsequently centrifuged in a tabletop microcentrifuge at 14,000 rpm (> 16,800 g) for 5 min to resolve phases. The aqueous phase was collected, 2 vol. ethanol were added and vials were placed on ice for 5 min. The precipitate was collected by centrifugation in a tabletop microcentrfuge at 14,000 rpm (> 16,800 g) for 5 min. Each pellet was washed with 80% (v/v) ethanol and dried at 37 °C for 10 min. Pellets were resuspended by gentle pippetting and combined for each sample in 300 μl ultrapure H2O. Nucleic acid concentration and purity was determined spectrophotometrically by measuring absorption at 260 and 280 nm and by subjecting 4 μg DNA of each preparation to agarose gel electrophoresis.
Genomic DNA fragmentation, 3’-end labelling with biotin and microarray hybridization
Nucleic acid (600 μg) were digested with 10 U RNase (RNase ONE Ribonuclease, Promega) in 500 μl total volume at 37 °C for 15 min. After RNase digest, 50 μl 3 M sodium acetate were added to the samples, followed by extraction with 1 vol. phenol/chloroform/isoamylalcohol. The aqueous phase was collected, 2 vol. ethanol were added and samples were placed on ice for 5 min. The precipitate was collected by centrifugation in a tabletop microcentrifuge at 20,800 g, 4 °C for 15 min. The pellet was washed with 80% (v/v) ethanol and dried. Pellets were resuspended in 200 μl ultrapure H2O by gentle pipetting. Genomic DNA concentration and purity were determined spectrophotometrically by measuring absorption at 260 and 280 nm and by subjecting 1 μg of each preparation to agarose gel electrophoresis. 20 μg of genomic DNA were fragmented with 0.33 U DNase (RQ1 RNase-free DNase, Promega) for 4 min at 37 °C, in a reaction containing 1x One-Phor-All Buffer Plus (Amersham Biosciences) and 1.5 mM CoCl2 in a total volume of 35 μl. To ensure an equal start and length of all reactions, the DNase was placed in a small drop at the inner wall of each reaction tube and spun down into the reaction mix. All reactions were briefly vortexed and spun down before they were incubated at 37 °C in a water bath. Immediately after the digest, the DNase was heat-inactivated at ≥ 95 °C for 15 min. After heat-inactivation, the reaction tubes were placed on ice. DNA digestion was confirmed by subjecting 3 μl of a reaction to DNA-gel electrophoresis on a 2% (w/v) agarose gel. Oligonucleotides of 22 and 50 bp length were used as size markers. A broad band indicated good digestion with fragments between 20 and 50 bp. The genomic DNA fragments were 3′-end labelled with biotin by adding 40 U terminal deoxynucleotidyl transferase (Promega) and 2 μl of 1 mM Bio-N6-ddATP (Enzo Life Sciences, USA) to the remaining 32 μl of the fragmentation reaction and incubating the samples at 37 °C in the dark for 1 h. Hybridization was conducted according to the standard protocol for hybridizing fragmented cRNA to the Affymetrix ATH1 GeneChip , using 20 μl of the fragmented and labelled genomic DNA instead of fragmented cRNA. Hybridized arrays were washed and stained in an Affymetrix Fluidics Station FS450, and the fluorescent signals were measured with an Affymetrix GeneChip Scanner 3000 using standard protocols provided by the manufacturer.
Reference datasets for A. halleri and A. lyrata
ATH1 GeneChip®; microarray contains between 11 and 20 probe pairs per gene in a probeset, each probe comprising 25 nucleotides, for each of 23,725 A. thaliana genes. Because of the design of the ATH1 microarray for genome-wide quantification of transcript levels in A. thaliana, all probes correspond to transcribed regions of the A. thaliana genome. The available sequences of 39 nuclear-encoded cDNAs of A. halleri ssp. halleri, accession Langelsheim, were used to assess the effect of nucleotide sequence divergence of heterologous species on signal intensities when hybridizing to probes designed for A. thaliana. After removing known copy number expanded genes of A. halleri (Additional file 5), 33 genes/probesets with a total of 273 probe sequences on the ATH1 GeneChip served as a representative A. halleri reference dataset (Additional file 1). We generated a corresponding reference dataset for A. lyrata, exploiting the sequence information from the A. lyrata reference genome . We generated a custom BLAST database (ALyDB) from the fasta sequences of the FilteredModels6 gene models track from the A. lyrata genome assembly version1. Individual probe sequences from the ATH1 GeneChip were blasted (nucleotide-short) against the ALyDB database with default parameters to report a maximum of 3 best alignments. Out of a total of 22,810 probe sets on the ATH1 GeneChip, we retained 16,626 probe sets that reported the top hit in the same A. lyrata gene for at least 8 of the probe sequences of a given probe set. We then removed probesets mapping to non-nuclear genomic sequences, and we randomly selected 44 probesets corresponding to single-copy genes based on the available information , to serve as representative A. lyrata reference dataset (Additional file 1). The numbers and positions of mismatches between probe sequences and the corresponding sequences of the heterologous target species, A. halleri and A. lyrata, were recorded for both reference datasets (Additional file 2, Additional file 1) in order to determine their effects on hybridization signal intensity in array-CGH.
Assessing and adjusting for sequence divergence in probe sequence hybridization
Evaluation of normalization methods
To choose the most appropriate normalization method, we used a dataset of 14 A. halleri genes for which the gene copy number is reliably known (Additional file 5) through genomic DNA blots (11 genes, ) or taken from literature (2 genes, [18, 38]). The fact that this set of genes is composed primarily of metal homeostasis genes does not bias the validation, because both single-copy genes and copy number expanded genes are represented equally. Out of these 14 genes, 6 are copy number expanded and 8 are present as single copies. We evaluated normalization procedures involving combinations of Quantile  and VSN  normalization algorithms, MAS5  and GCRMA  background corrections and two significance tests — ANOVA (Limma package in R ) and Wilcoxon rank sum test, for each gene. The global scaling factor described above was calculated after each normalization procedure and applied to the hybridization data of A. halleri. Subsequently, we compared the sensitivity (percentage of true positives correctly identified), specificity (percentage of true negatives correctly identified) and precision (percentage of true positives out of all predicted positives) of each procedure, based on the number of genes in Additional file 5 for which copy number state was correctly estimated. We found the combination of MAS5 background correction and VSN normalization, followed by ANOVA significance test at the probe level to perform the best (66.7% sensitivity, 87.5% specificity). It was twice as sensitive as the second best normalization method GCRMA (33.3% sensitivity) while maintaining the same specificity.
Normalization, scaling and estimating genic CNVs
The raw intensity (.CEL format) files from affymetrix GeneChipⓇ ATH1 genome array were loaded into R (version 2.14.1). One of the A. halleri gDNA hybridizations (Mw_101105_03X.CEL) was detected to have some compact defects affecting nearly 2% of the total probes. This defect was corrected using Harshlight package in R , wherein the corresponding probe signal of the replicate GeneChip replaced the values of erroneous probes. We used the affy  and VSN2  package to apply MAS5.0  background correction followed by masking of probes from mitochondrial, chloroplast, control and known multi-copy genes in A. halleri. VSN normalization was applied to the replicate array hybridizations of each species separately. We scaled the signals of the entire A. halleri and A. lyrata gDNA hybridized GeneChips by their respective global scaling factor S, as calculated above, to make the hybridizations comparable. Normalized and scaled data were then subjected to an ANOVA test at individual probe level (lmFit and eBayes functions of Limma package ) to reliably identify genes with differential hybridization signal between species. Genes with L o g 2 ratio ≥ 1 of scaled signal intensities in A. halleri or A. lyrata, respectively, vs. A. thaliana and corrected P-value ≤ 0.1 (Benjamini-Hochberg multiple testing correction ) were considered candidates for copy number expansion in the heterologous species. Similarly, genes with L o g 2 ratio ≤ -1 and corrected P-value ≤ 0.1 (Benjamini-Hochberg multiple testing correction) were considered candidate genes exhibiting copy number reduction in the heterologous species (complete dataset provided in Additional file 9). The log-ratio thresholding employed here is a commonly used robust method of determining CNVs from array-CGH data  to which we have added the power of statistical analysis.
Construction of A. lyrata reference datasets
To construct an A. lyrata reference dataset for comparison with the CNDs predicted by our array-CGH approach, we first retrieved copy number information for each A. lyrata gene ortholog relative to the corresponding A. thaliana gene from two sources. The first source were Ensembl Plants orthology predictions (http://plants.ensembl.org; ref. ), and the second source was the A. lyrata genome sequencing project . In the A. lyrata genome sequencing project, orthology classifications between A. thaliana and A. lyrata genes were established through reciprocally best BLAST hits (personal communication, Dr. Tina T. Hu, Lewis-Sigler Institute for Integrative Genomics, Princeton University USA), predicting 1,522 copy number expanded genes and 826 copy number reduced genes (Additional file 6). Only about one third of the predicted CNEs (384 genes) were common to both reference-genome based predictions (Additional file 6). We chose the Ensembl Plants predictions as our primary basis for validation of our array-CGH results because they were based on a benchmarked and robust orthology prediction pipeline .
Reproduction of two alternative array-CGH methods for comparison
Two previously published methods for cross-species array-CGH based prediction of gene copy number divergence were applied to our gDNA hybridization data. Machado and Renn  used spotted cDNA arrays with 500 bp long probes. In contrast, we employed the Affymetrix ATH1 GeneChip with ∼22,500 features, each comprising a set of usually 11 probes (25 bp long). This necessitated some minor modifications to the original workflow, as outlined below. Starting with raw hybridization signals from A. lyrata and A. thaliana, we performed background correction by MAS instead of the “minimum” algorithm, because minimum is specific for use with two-colour hybridization arrays and MAS background correction appears most similar in approach to minimum. Thereafter, within-array normalizations were carried out using a subset of 1,000 conserved genes. For this, we generated a list of 8,268 A. lyrata single-copy orthologs that show ≥ 95% sequence identity with A. thaliana. Out of the probesets representing these genes, we then randomly picked a subset of 1,000 probesets. We applied loess normalization  based on perfect match (PM) probe hybridization signal intensities of these 1,000 conserved genes. Summarization of probeset intensities to a single value was done using the Avdiff algorithm . A linear model was fitted to the data using lmFit and eBayes functions in the Limma package from R . P-values were calculated (Multiple testing correction by FDR). Genes represented by probesets displaying L o g 2 ratio > 0 of normalized signal intensities for A. lyrata vs. A. thaliana and P-value ≤ 0.1 were classified as duplicated or copy number expanded. Similarly, genes represented by probesets displaying a L o g 2 ratio < 0 of normalized signal intensities for A. lyrata vs. A. thaliana and P-value ≤ 0.1 were classified as deleted or copy number reduced.
We also implemented an approach analogous to Darby et al.  used Affymetrix Caenorhabditis elegans tiling arrays with probes of 25 nucleotides in length. In brief, this involves normalization at the probe level by adjusting for thermodynamic binding affinity, and then a species-specific scaling based on the array control probes that are expected to be equally dissimilar in all species. Thus, we first calculated the thermodynamic binding affinity — Δ G37 value (Gibb’s free energy estimate by Santa-Lucia model ) for each probe on the ATH1 GeneChip using the oligoprop package in Matlab. We plotted the background corrected (MAS) signal intensities of the custom control probes against their thermodynamic binding affinities to obtain the model parameters (intercept and error terms) that are required to normalize probe signal intensities for all the probes on the chip in a species-specific manner. Subsequently, the median signal intensity of the control probes of the hybridized target species was used to scale the hybridization signal intensities of all probes on the ATH1 GeneChip. Darby et al. performed a visual confirmation of 131 known genomic deletions for their data. This was not possible for us because we expected thousands of gene duplications/deletions in A. lyrata (Additional file 6). Consequently, we summarized the probe L o g 2 ratios of signal intensities for each A. lyrata gene relative to A. thaliana by computing probeset means from the constituent probes. Then, in a conservative implementation of the procedure of Darby et al., a gene with L o g 2 ratio of ≥ 0.25 was classified as a copy number expanded gene and a gene with L o g 2 ratio of ≤−0.25 was classified as a gene copy number reduction in A. lyrata relative to A. thaliana.
MapMan functional enrichment analysis
We carried out an over-representation analysis (ORA) based on the MapMan functional ontology  to detect functional enrichment in the genes with CNVs by comparison to all nuclear genes represented on the array. We chose this ontology because it is plant-specific, has been curated by experts in the constituent subject areas and uniquely includes functional categories of specific relevance for plant biology. The MapMan ontology is hierarchical, with 36 major functional categories called BINs at the top level; we used only these top BINs for ORA. The top level of MapMan bears similarity to the generic Gene Ontology (GO) slim , widely used for summarizing GO annotations in microarrays. The Affymetrix ATH1 probeset identifiers were used as primary gene identifiers to avoid redundancy in assigning genes to their functional categories (BIN). Given a list of copy number divergent genes, we calculated the percentage of genes in the list assigned to a particular function (BIN) and compared it to the overall percentage of genes with the same function present on the ATH1 GeneChip, applying Fisher’s exact test  to test for statistically significant differences (Benjamini-Hochberg corrected P-value ≤ 0.05). The annotation of genes known or purported to be involved in transition metal homeostasis functions was sourced from an expert-curated list. This list represents an updated version of the MapMan functional category “metal handling”, which had originally been generated by Ute Krämer, but was found to exclude some genes that were functionally characterized only recently and to contain several genes based on previous misannotations. The updated list is provided in Additional file 8 and will be submitted to the curators of MapMan for updating their current metal homeostasis gene annotations.
Regions enriched in genic copy number divergence
To assess the contributions of large-scale structural rearrangements to copy number divergence, we implemented a sliding window approach, in which a 20 kb long window was moved along each chromosome of A. thaliana with 10 kb step sizes and scored positive upon the presence of a CND region covering at least 5 kb of contiguous genes with concordant CND or CNR calls, respectively, in A. halleri or A. lyrata. A large CND region was defined as three consecutive windows with positive and concordant calls for either CNE or CNR, which would ensure a minimum of 10 kb to be copy number expanded or reduced in a stretch of 40 kb.
We would like to acknowledge Dr. Otto Törjek for providing helpful advice and discussion on gDNA fragmentation and labelling procedure. We thank Dr. Ales Pecinka for helpful discussions on the divergence and abundance of plant transposable elements.
This article has been published as part of BMC Genomics Volume 17 Supplement 13, 2016: 15th International Conference On Bioinformatics (INCOB 2016). The full contents of the supplement are available online at https://bmcgenet.biomedcentral.com/articles/supplements/volume-17-supplement-13.
This work was supported by a FRONTIERS grant in the Excellence Initiative of Heidelberg University, and completed with funding from the Deutsche Forschungsgemeinschaft Research Priority Programme SPP1529 “ADAPTOMICS” and the Department of Plant Physiology, Ruhr University Bochum. Publication charges for this article have been funded by the Research Priority Program 1529 ADAPTOMICS (Kr1967/10-2) of the Deutsche Forschungsgemeinschaft DFG.
Availability of data and material
The datasets generated and analysed during the current study are available in the GEO repository (http://www.ncbi.nlm.nih.gov/geo), accession number GSE52003 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52003).
Conceived and designed the experiments: BB, RE, SC, UK; Sample preparation, DNA extraction and array hybridization: IT, MW, SC; Contributed reagents/materials/analysis tools: BB, IT, MW, RE, SC, UK, VS. Data handling and analysis: VS; Wrote the paper: VS with contributions from IT, UK; All authors read, commented upon and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, Warthmann N, Hu TT, Fu G, Hinds DA, Chen H, Frazer KA, Huson DH, Schölkopf B, Nordborg M, Rätsch G, Ecker JR, Weigel D. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science (New York). 2007; 317(5836):338–42.View ArticleGoogle Scholar
- Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell. 2008; 133(3):523–36.View ArticlePubMedPubMed CentralGoogle Scholar
- Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, Pradhan S, Nelson SF, Pellegrini M, Jacobsen SE. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature. 2008; 452(7184):215–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Becker C, Hagmann J, Müller J, Koenig D, Stegle O, Borgwardt K, Weigel D. Spontaneous epigenetic variation in the Arabidopsis thaliana methylome. Nature. 2011; 480(7376):245–9.View ArticlePubMedGoogle Scholar
- Schmitz RJ, Schultz MD, Lewsey MG, O’Malley RC, Urich MA, Libiger O, Schork NJ, Ecker JR. Transgenerational Epigenetic Instability Is a Source of Novel Methylation Variants. Science. 2011; 334(6054):369–73.View ArticlePubMedPubMed CentralGoogle Scholar
- Greaves IK, Groszmann M, Ying H, Taylor JM, Peacock WJ, Dennis ES. Trans chromosomal methylation in Arabidopsis hybrids. Proc Natl Acad Sci U S A. 2012; 109(9):3570–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Hancock AM, Brachi B, Faure N, Horton MW, Jarymowycz LB, Sperone FG, Toomajian C, Roux F, Bergelson J. Adaptation to climate across the Arabidopsis thaliana genome. Science (New York). 2011; 334(6052):83–6.View ArticleGoogle Scholar
- Fournier-Level A, Korte A, Cooper MD, Nordborg M, Schmitt J, Wilczek AM. A map of local adaptation in Arabidopsis thaliana. Science (New York). 2011; 334(6052):86–9.View ArticleGoogle Scholar
- Innan H, Kondrashov F. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 2010; 11(2):97–108.PubMedGoogle Scholar
- Muñoz-Amatriaín M, Eichten SR, Wicker T, Richmond TA, Mascher M, Steuernagel B, Scholz U, Ariyadasa R, Spannagl M, Nussbaumer T, Mayer KF, Taudien S, Platzer M, Jeddeloh JA, Springer NM, Muehlbauer GJ, Stein N. Distribution, functional impact, and origin mechanisms of copy number variation in the barley genome. Genome Biol. 2013; 14(6):58.View ArticleGoogle Scholar
- Swanson-Wagner RA, Eichten SR, Kumari S, Tiffin P, Stein JC, Ware D, Springer NM. Pervasive gene content variation and copy number variation in maize and its undomesticated progenitor. Genome Res. 2010; 20(12):1689–99.View ArticlePubMedPubMed CentralGoogle Scholar
- Henrichsen CN, Chaignat E, Reymond A. Copy number variants, diseases and gene expression. Hum Mol Genet. 2009; 18(R1):1–8.View ArticleGoogle Scholar
- Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J, Villanea FA, Mountain JL, Misra R, Carter NP, Lee C, Stone AC. Diet and the evolution of human amylase gene copy number variation. Nat Genet. 2007; 39(10):1256–60.View ArticlePubMedPubMed CentralGoogle Scholar
- Van de Peer Y, Maere S, Meyer A. The evolutionary significance of ancient genome duplications. Nat Rev Genet. 2009; 10(10):725–32.View ArticlePubMedGoogle Scholar
- Sutton T, Baumann U, Hayes J, Collins NC, Shi BJ, Schnurbusch T, Hay A, Mayo G, Pallotta M, Tester M, Langridge P. Boron-toxicity tolerance in barley arising from efflux transporter amplification. Science (New York). 2007; 318(5855):1446–9.View ArticleGoogle Scholar
- Maron LG, Guimarães CT, Kirst M, Albert PS, Birchler JA, Bradbury PJ, Buckler ES, Coluccio AE, Danilova TV, Kudrna D, Magalhaes JV, Piñeros MA, Schatz MC, Wing RA, Kochian LV. Aluminum tolerance in maize is associated with higher MATE1 gene copy number. Proc Natl Acad Sci U S A. 2013; 110(13):5241–6.View ArticlePubMedPubMed CentralGoogle Scholar
- Hanikenne M, Kroymann J, Trampczynska A, Bernal M, Motte P, Clemens S, Krämer U. Hard selective sweep and ectopic gene conversion in a gene cluster affording environmental adaptation. PLoS Genet. 2013; 9(8):e1003707.View ArticlePubMedPubMed CentralGoogle Scholar
- Hanikenne M, Talke IN, Haydon MJ, Lanz C, Nolte A, Motte P, Kroymann J, Weigel D, Krämer U. Evolution of metal hyperaccumulation required cis-regulatory changes and triplication of HMA4. Nature. 2008; 453(7193):391–5.View ArticlePubMedGoogle Scholar
- Santuari L, Pradervand S, Amiguet-Vercher AM, Thomas J, Dorcey E, Harshman K, Xenarios I, Juenger TE, Hardtke CS. Substantial deletion overlap among divergent Arabidopsis genomes revealed by intersection of short reads and tiling arrays. Genome Biol. 2010; 11(1):4.View ArticleGoogle Scholar
- DeBolt S. Copy number variation shapes genome diversity in Arabidopsis over immediate family generational scales. Genome Biol Evol. 2010; 2:441–53.View ArticlePubMedPubMed CentralGoogle Scholar
- Hanada K, Zou C, Lehti-Shiu MD, Shinozaki K, Shiu SH. Importance of lineage-specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli. Plant Physiol. 2008; 148(2):993–1003.View ArticlePubMedPubMed CentralGoogle Scholar
- Dassanayake M, Oh DH, Haas JS, Hernandez A, Hong H, Ali S, Yun DJ, Bressan RA, Zhu JK, Bohnert HJ, Cheeseman JM. The genome of the extremophile crucifer Thellungiella parvula. Nat Genet. 2011; 43(9):913–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Flagel LE, Wendel JF. Gene duplication and evolutionary novelty in plants. New Phytol. 2009; 183(3):557–64.View ArticlePubMedGoogle Scholar
- Cannon SB, Mitra A, Baumgarten A, Young ND, May G. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 2004; 4(1):10.View ArticlePubMedPubMed CentralGoogle Scholar
- Hunter B, Bomblies K. Progress and Promise in using Arabidopsis to study adaptation, divergence, and speciation. Arabidopsis Book. 2010; 8:e0138.View ArticlePubMedPubMed CentralGoogle Scholar
- Bomblies K, Weigel D. Arabidopsis: a model genus for speciation. Curr Opin Genet Dev. 2007; 17(6):500–4.View ArticlePubMedGoogle Scholar
- Krämer U. Metal hyperaccumulation in plants. Annu Rev Plant Biol. 2010; 61:517–34.View ArticlePubMedGoogle Scholar
- Koch M, Haubold B, Mitchell-Olds T. Molecular systematics of the Brassicaceae: evidence from coding plastidic matK and nuclear Chs sequences. Am J Bot. 2001; 88(3):534–44.View ArticlePubMedGoogle Scholar
- Beilstein MA, Nagalingum NS, Clements MD, Manchester SR, Mathews S. Dated molecular phylogenies indicate a Miocene origin for Arabidopsis thaliana. Proc Natl Acad Sci U S A. 2010; 107:18724–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Weber M, Harada E, Vess C, Roepenack-Lahaye EV, Clemens S, v Roepenack-Lahaye E. Comparative microarray analysis of Arabidopsis thaliana and Arabidopsis halleri roots identifies nicotianamine synthase, a ZIP transporter and other genes as potential metal hyperaccumulation factors. Plant J. 2004; 37(2):269–81.View ArticlePubMedGoogle Scholar
- Talke IN, Hanikenne M, Krämer U. Zinc-dependent global transcriptional control, transcriptional deregulation, and higher gene copy number for genes in metal homeostasis of the hyperaccumulator Arabidopsis halleri. Plant Physiol. 2006; 142(1):148–67.View ArticlePubMedPubMed CentralGoogle Scholar
- Becher M, Talke IN, Krall L, Krämer U. Cross-species microarray transcript profiling reveals high constitutive expression of metal homeostasis genes in shoots of the zinc hyperaccumulator Arabidopsis halleri. Plant J. 2004; 37(2):251–68.View ArticlePubMedGoogle Scholar
- Deinlein U, Weber M, Schmidt H, Rensch S, Trampczynska A, Hansen TH, Husted SR, Schjoerring JK, Talke IN, Krämer U, Clemens S. Elevated nicotianamine levels in Arabidopsis halleri roots play a key role in zinc hyperaccumulation. Plant Cell. 2012; 24(2):708–23.View ArticlePubMedPubMed CentralGoogle Scholar
- Dräger DB, Desbrosses-Fonrouge AG, Krach C, Chardonnens AN, Meyer RC, Saumitou-Laprade P, Krämer U. Two genes encoding Arabidopsis halleri MTP1 metal transport proteins co-segregate with zinc tolerance and account for high MTP1 transcript levels. Plant J Cell Mol Biol. 2004; 39(3):425–39.View ArticleGoogle Scholar
- Hu TT, Pattyn P, Bakker EG, Cao J, Cheng JF, Clark RM, Fahlgren N, Fawcett JA, Grimwood J, Gundlach H, Haberer G, Hollister JD, Ossowski S, Ottilar RP, Salamov AA, Schneeberger K, Spannagl M, Wang X, Yang L, Nasrallah ME, Bergelson J, Carrington JC, Gaut BS, Schmutz J, Mayer KFX, Van de Peer Y, Grigoriev IV, Nordborg M, Weigel D, Guo YL. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet. 2011; 43(5):476–81.View ArticlePubMedPubMed CentralGoogle Scholar
- Poschenrieder C, Tolrà R, Barceló J. Can metals defend plants against biotic stress?Trends Plant Sci. 2006; 11(6):288–95.View ArticlePubMedGoogle Scholar
- Boyd RS. The defense hypothesis of elemental hyperaccumulation: status, challenges and new directions. Plant Soil. 2007; 293(1-2):153–76.View ArticleGoogle Scholar
- Shahzad Z, Gosti F, Frérot H, Lacombe E, Roosens N, Saumitou-Laprade P, Berthomieu P. The five AhMTP1 zinc transporters undergo different evolutionary fates towards adaptive evolution to zinc tolerance in Arabidopsis halleri. PLoS Genet. 2010; 6(4):1000911.View ArticleGoogle Scholar
- Redman JC, Haas BJ, Tanimoto G, Town CD. Development and evaluation of an Arabidopsis whole genome Affymetrix probe array. Plant J Cell Mol Biol. 2004; 38(3):545–61.View ArticleGoogle Scholar
- Bar-Or C, Bar-Eyal M, Gal TZ, Kapulnik Y, Czosnek H, Koltai H. Derivation of species-specific hybridization-like knowledge out of cross-species hybridization results. BMC Genomics. 2006; 7:110.View ArticlePubMedPubMed CentralGoogle Scholar
- Bar-Or C, Czosnek H, Koltai H. Cross-species microarray hybridizations: a developing tool for studying species diversity. Trends Genet. 2007; 23(4):200–7.View ArticlePubMedGoogle Scholar
- Darby BJ, Jones KL, Wheeler D, Herman MA. Normalization and centering of array-based heterologous genome hybridization based on divergent control probes. BMC Bioinforma. 2011; 12(1):183.View ArticleGoogle Scholar
- Machado HE, Renn SCP. A critical assessment of cross-species detection of gene duplicates using comparative genomic hybridization. BMC Genomics. 2010; 11:304.View ArticlePubMedPubMed CentralGoogle Scholar
- Gilbert LB, Chae L, Kasuga T, Taylor JW. Array Comparative Genomic Hybridizations: assessing the ability to recapture evolutionary relationships using an in silico approach. BMC Genomics. 2011; 12(1):456.View ArticlePubMedPubMed CentralGoogle Scholar
- Renn SCP, Machado HE, Jones A, Soneji K, Kulathinal RJ, Hofmann Ha. Using comparative genomic hybridization to survey genomic sequence divergence across species: a proof-of-concept from Drosophila. BMC Genomics. 2010; 11:271.View ArticlePubMedPubMed CentralGoogle Scholar
- Gilad Y, Rifkin SA, Bertone P, Gerstein M, White KP. Multi-species microarrays reveal the effect of sequence divergence on gene expression profiles. Genome research. 2005; 15(5):674–80.View ArticlePubMedPubMed CentralGoogle Scholar
- Wright SI, Lauga B, Charlesworth D. Subdivision and haplotype structure in natural populations of Arabidopsis lyrata. Mol Ecol. 2003; 12(5):1247–63.View ArticlePubMedGoogle Scholar
- Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics (Oxford). 2003; 19(2):185–93.View ArticleGoogle Scholar
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics (Oxford). 2003; 4(2):249–64.View ArticleGoogle Scholar
- Lynch M. Evolution of the mutation rate. Trends Genet. 2010; 26(8):345–52.View ArticlePubMedPubMed CentralGoogle Scholar
- Thimm O, Bläsing O, Gibon Y, Nagel A, Meyer S, Krüger P, Selbig J, Müller LA, Rhee SY, Stitt M. MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant J. 2004; 37(6):914–39.View ArticlePubMedGoogle Scholar
- Ó Lochlainn S, Bowen HC, Fray RG, Hammond JP, King GJ, White PJ, Graham NS, Broadley MR. Tandem quadruplication of HMA4 in the zinc (Zn) and cadmium (Cd) hyperaccumulator Noccaea caerulescens. PLoS ONE. 2011; 6:9.View ArticleGoogle Scholar
- Willems G, Dräger DB, Courbot M, Godé C, Verbruggen N, Saumitou-Laprade P. The genetic basis of zinc tolerance in the metallophyte Arabidopsis halleri ssp. halleri (Brassicaceae): an analysis of quantitative trait loci. Genetics. 2007; 176:659–74.View ArticlePubMedPubMed CentralGoogle Scholar
- Song WY, Choi KS, Alexis DA, Martinoia E, Lee Y. Brassica juncea Plant Cadmium Resistance 1 protein (BjPCR1) facilitates the radial transport of calcium in the root. Proc Natl Acad Sci U S A. 2011; 108:19808–13.View ArticlePubMedPubMed CentralGoogle Scholar
- Song WY, Choi KS, Kim DY, Geisler M, Park J, Vincenzetti V, Schellenberg M, Kim SH, Lim YP, Noh EW, Lee Y, Martinoia E. Arabidopsis PCR2 is a zinc exporter involved in both zinc extrusion and long-distance zinc transport. Plant Cell. 2010; 22:2237–252.View ArticlePubMedPubMed CentralGoogle Scholar
- Wong LH, Choo KHA. Evolutionary dynamics of transposable elements at the centromere. Trends Genet. 2004; 20(12):611–6.View ArticlePubMedGoogle Scholar
- Henikoff S, Ahmad K, Malik HS. The centromere paradox: stable inheritance with rapidly evolving DNA. Science (New York). 2001; 293(5532):1098–102.View ArticleGoogle Scholar
- Eichler EE. Segmental duplications: what’s missing, misassigned, and misassembled–and should we care?Genome Res. 2001; 11(5):653–6.View ArticlePubMedGoogle Scholar
- Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet. 2011; 12(5):363–76.View ArticlePubMedPubMed CentralGoogle Scholar
- Alkan C, Sajjadian S, Eichler EE. Limitations of next-generation genome sequence assembly. Nat Methods. 2011; 8(1):61–5.View ArticlePubMedGoogle Scholar
- Teo SM, Pawitan Y, Ku CS, Chia KS, Salim A. Statistical challenges associated with detecting copy number variations with next-generation sequencing. Bioinformatics (Oxford). 2012; 28(21):2711–8.View ArticleGoogle Scholar
- Mills RE, Walter K, Stewart C, et al. Mapping copy number variation by population-scale genome sequencing. Nature. 2011; 470(7332):59–65.View ArticlePubMedPubMed CentralGoogle Scholar
- Dennis MY, Nuttle X, Sudmant PH, Antonacci F, Graves TA, Nefedov M, Rosenfeld JA, Sajjadian S, Malig M, Kotkiewicz H, Curry CJ, Shafer S, Shaffer LG, de Jong PJ, Wilson RK, Eichler EE. Evolution of human-specific neural SRGAP2 genes by incomplete segmental duplication. Cell. 2012; 149(4):912–22.View ArticlePubMedPubMed CentralGoogle Scholar
- Turner TL, Hahn MW, Nuzhdin SV. Genomic islands of speciation in Anopheles gambiae. PLoS Biol. 2005; 3(9):285.View ArticleGoogle Scholar
- Fortna A, Kim Y, MacLaren E, Marshall K, Hahn G, Meltesen L, Brenton M, Hink R, Burgers S, Hernandez-Boussard T, Karimpour-Fard A, Glueck D, McGavran L, Berry R, Pollack J, Sikela JM. Lineage-specific gene duplication and loss in human and great ape evolution. PLoS Biol. 2004; 2:207.View ArticleGoogle Scholar
- Dumas L, Kim YH, Karimpour-Fard A, Cox M, Hopkins J, Pollack JR, Sikela JM. Gene copy number variation spanning 60 million years of human and primate evolution. Genome Res. 2007; 17:1266–77.View ArticlePubMedPubMed CentralGoogle Scholar
- Hammond JP, Bowen HC, White PJ, Mills V, Pyke KA, Baker AJM, Whiting SN, May ST, Broadley MR. A comparison of the Thlaspi caerulescens and Thlaspi arvense shoot transcriptomes. New Phytologist. 2006; 170(2):239–60.View ArticlePubMedGoogle Scholar
- Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009; 19(2):327–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Wright SI, Le QH, Schoen DJ, Bureau TE. Population dynamics of an ac-like transposable element in self- and cross-pollinating Arabidopsis. Genetics. 2001; 158(3):1279–88.PubMedPubMed CentralGoogle Scholar
- Lockton S, Gaut BS. The evolution of transposable elements in natural populations of self-fertilizing Arabidopsis thaliana and its outcrossing relative Arabidopsis lyrata. BMC Evol Biol. 2010; 10(1):10.View ArticlePubMedPubMed CentralGoogle Scholar
- Filatov V, Dowdle J, Smirnoff N, Ford-Lloyd B, Newbury HJ, Macnair MR. Comparison of gene expression in segregating families identifies genes and genomic regions involved in a novel adaptation, zinc hyperaccumulation. Mol Ecol. 2006; 15(10):3045–59.View ArticlePubMedGoogle Scholar
- Gingeras TR. Origin of phenotypes: genes and transcripts. Genome Res. 2007; 17(6):682–90.View ArticlePubMedGoogle Scholar
- López-Maury L, Marguerat S, Bähler J. Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nat Rev Genet. 2008; 9(8):583–93.View ArticlePubMedGoogle Scholar
- Heyno E, Klose C, Krieger-Liszkay A. Origin of cadmium-induced reactive oxygen species production: mitochondrial electron transfer versus plasma membrane NADPH oxidase. New Phytologist. 2008; 179:687–99.View ArticlePubMedGoogle Scholar
- Sarret G, Saumitou-Laprade P, Bert V, Proux O, Hazemann JL, Traverse A, Marcus MA, Manceau A. Forms of zinc accumulated in the hyperaccumulator Arabidopsis halleri. Plant Physiol. 2002; 130:1815–1826.View ArticlePubMedPubMed CentralGoogle Scholar
- Persans MW, Yan X, Patnoe J-MML, Krämer U, Salt DE. Molecular dissection of the role of histidine in nickel hyperaccumulation in Thlaspi goesingense (Hálácsy). Plant Physiol. 1999; 121:1117–26.View ArticlePubMedPubMed CentralGoogle Scholar
- Tian D, Traw MB, Chen JQ, Kreitman M, Bergelson J. Fitness costs of R-gene-mediated resistance in Arabidopsis thaliana. Nature. 2003; 423:74–7.View ArticlePubMedGoogle Scholar
- Bomblies K. Doomed lovers: mechanisms of isolation and incompatibility in plants. Annu Rev Plant Biol. 2010; 61:109–24.View ArticlePubMedGoogle Scholar
- Bomblies K, Weigel D. Hybrid necrosis: autoimmunity as a potential gene-flow barrier in plant species. Nat Rev Genet. 2007; 8:382–93.View ArticlePubMedGoogle Scholar
- Lau JA, McCall AC, Davies KF, McKay JK, Wright JW. Herbivores and edaphic factors constrain the realized niche of a native plant. Ecology. 2008; 89(3):754–62.View ArticlePubMedGoogle Scholar
- Fine PVA, Mesones I, Coley PD. Herbivores promote habitat specialization by trees in Amazonian forests. Science (New York). 2004; 305(5684):663–5.View ArticleGoogle Scholar
- Rajakaruna N. The Edaphic Factor in the Origin of Plant Species. Int Geol Rev. 2004; 46(5):471–8.View ArticleGoogle Scholar
- Borevitz JO, Liang D, Plouffe D, Chang HS, Zhu T, Weigel D, Berry CC, Winzeler E, Chory J. Large-scale identification of single-feature polymorphisms in complex genomes. Genome Res. 2003; 13(3):513–23.View ArticlePubMedPubMed CentralGoogle Scholar
- Wolfgang H, Poustka A, Vingron M. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics. 2002; 18(Suppl 1):S96–S104.View ArticleGoogle Scholar
- Hubbell E, Liu WM, Mei R. Robust estimators for expression analysis. Bioinformatics (Oxford). 2002; 18:1585–92.View ArticleGoogle Scholar
- Wu Z, Spencer F, Irizarry RA, Gentleman R, Murillo FM. A model based background adjustment for oligonucleotide expression arrays a model based background adjustment for oligonucleotide expression arrays. J Am Stat Assoc. 2004; 99:909–17.View ArticleGoogle Scholar
- Smyth GK. Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer: 2005. p. 397–420. Chap. 23. doi:10.1007/0-387-29362-0_23.Google Scholar
- Suárez-Fariñas M, Pellegrino M, Wittkowski KM, Magnasco MO. Harshlight: a “corrective make-up” program for microarray chips. BMC Bioinforma. 2005; 6:294.View ArticleGoogle Scholar
- Gautier L, Cope L, Bolstad BM, Irizarry RA. affy–analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004; 20(3):307–15.View ArticlePubMedGoogle Scholar
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995; 57(1):289–300.Google Scholar
- Smyth GK, Speed TP. Normalization of cDNA microarray data. Methods. 2003; 31:265–73.View ArticlePubMedGoogle Scholar
- Gautier L, Cope L, Bolstad BM, Irizarry RA. affy–analysis of Affymetrix GeneChip data at the probe level. Bioinformatics (Oxford). 2004; 20:307–15.View ArticleGoogle Scholar
- SantaLucia J. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci. 1998; 95(4):1460–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000; 25:25–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Fisher RA. On the interpretation of χ2 from contingency tables, and the calculation of P. J R Stat Soc. 1922; 85:87–94.View ArticleGoogle Scholar
- Koch MA, Haubold B, Mitchell-Olds T. Comparative evolutionary analysis of Chalcone Synthase and Alcohol Dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol Biol Evol. 2000; 17(10):1483–98.View ArticlePubMedGoogle Scholar