Horizontal transfer of expressed genes in a parasitic flowering plant
© Xi et al.; licensee BioMed Central Ltd. 2012
Received: 15 February 2012
Accepted: 8 June 2012
Published: 8 June 2012
Skip to main content
© Xi et al.; licensee BioMed Central Ltd. 2012
Received: 15 February 2012
Accepted: 8 June 2012
Published: 8 June 2012
Recent studies have shown that plant genomes have potentially undergone rampant horizontal gene transfer (HGT). In plant parasitic systems HGT appears to be facilitated by the intimate physical association between the parasite and its host. HGT in these systems has been invoked when a DNA sequence obtained from a parasite is placed phylogenetically very near to its host rather than with its closest relatives. Studies of HGT in parasitic plants have relied largely on the fortuitous discovery of gene phylogenies that indicate HGT, and no broad systematic search for HGT has been undertaken in parasitic systems where it is most expected to occur.
We analyzed the transcriptomes of the holoparasite Rafflesia cantleyi Solms-Laubach and its obligate host Tetrastigma rafflesiae Miq. using phylogenomic approaches. Our analyses show that several dozen actively transcribed genes, most of which appear to be encoded in the nuclear genome, are likely of host origin. We also find that hundreds of vertically inherited genes (VGT) in this parasitic plant exhibit codon usage properties that are more similar to its host than to its closest relatives.
Our results establish for the first time a substantive number of HGTs in a plant host-parasite system. The elevated rate of unidirectional host-to- parasite gene transfer raises the possibility that HGTs may provide a fitness benefit to Rafflesia for maintaining these genes. Finally, a similar convergence in codon usage of VGTs has been shown in microbes with high HGT rates, which may help to explain the increase of HGTs in these parasitic plants.
Recent studies have shown that plant genomes have potentially undergone rampant horizontal gene transfer (HGT) [1–6]. In plant parasitic systems HGT appears to be facilitated by the intimate physical association between the parasite and its host [7–12]. HGT in these systems has been invoked when a DNA sequence obtained from a parasite is placed phylogenetically very near to its host rather than with its closest relatives . Studies of HGT in parasitic plants have focused largely on single or few genes, and relied mostly on the fortuitous discovery of gene phylogenies that indicate HGT. No broad, systematic, genome-wide search for HGT has been undertaken in parasitic systems where it is most expected to occur. One parasitic plant clade that appears to be prone to HGT is Rafflesiaceae sensu stricto, which belong to the order Malpighiales [7, 13, 14] and whose members produce the largest flowers in the world. Rafflesiaceae are endophytic holoparasites, which lack leaves and stems. They parasitize a small number of species of Tetrastigma (i.e., members of the grapevine family, Vitaceae) hosts, on which they rely exclusively for their nutrition. The many reported host-to-parasite gene transfers from Tetrastigma to Rafflesiaceae make this system especially intriguing for more in-depth investigation [7, 9, 10]. Moreover, the association of Tetrastigma and Rafflesiaceae provides one of the best opportunities to study HGT in plant parasitic systems because i.) the parasites have a very narrow host specialization range, ii.) complete genomes are available for close relatives of the parasite (Manihot esculenta Crantz , Populus trichocarpa Torr. & Gray , and Ricinus communis L. ; Malpighiales) and its host (Vitis vinifera L. ; Vitaceae), and iii.) the host and parasite are separated by at least 115 million years of evolution . These three factors make it easier to distinguish horizontally from vertically inherited gene regions.
To better understand HGT in this host-parasite plant system, we generated transcriptomic data for both the parasite (Rafflesia) and its obligate host (Tetrastigma). These data were analyzed using phylogenomic approaches that included whole-genome sequences from nine other plant model organisms. Our results show that several dozen actively transcribed, largely nuclear, genes are of host origin. These results are above false positive rates and establish for the first time a substantive number of HGTs in a plant host-parasite system. Moreover, we find that hundreds of vertically inherited genes in these parasitic plants exhibit codon usage properties that are more similar to their hosts than to their closest relatives, which may help to explain the increase of HGTs in these parasitic plants.
Although striking discordance between gene trees and species trees can be indicative of HGT, gene trees can be incongruent with species trees due to a variety of factors other than HGT, including incomplete lineage sorting, incorrect orthology assessment (either due to inadequate paralog sampling or differential gene loss), small taxon sampling, and elevated substitution rates . In order to assess the contribution of these factors, we first estimated the percentage of transcripts for Manihot esculenta and Ricinus communis that have placements consistent with HGT as defined above for Rafflesia (i.e., with Vitis). These two species are the closest relatives of Rafflesia and are not expected a priori to be prone to rampant HGT. Thus, they provide an estimate of the non-HGT related factors that may contribute to phylogenetic discordance. Our results indicate that only 0.3% and 0.4% of the transcripts in Manihot and Ricinus, respectively, have placements defined as HGT (Figure 2). Thus, the very small number of Tetrastigma transcripts whose phylogenetic placements are consistent with HGT is comparable to background rates in these non-host, non-parasitic taxa. We also compared nucleotide substitution rates between Rafflesia HGT and VGT transcripts, and with their homologous sister lineages (i.e., Rafflesia HGT with Vitis; Rafflesia VGT with Manihot and Ricinus). Although Rafflesia transcripts show elevated substitution rates overall (Additional file 6: Figure S5), Rafflesia HGT transcripts are not evolving significantly faster than Rafflesia VGT transcripts (p-value = 0.08 in Welch’s t test), nor to homologous Vitis transcripts (p-value = 0.42). Both the background HGT rate assessment and nucleotide substitution rate assessment indicate that our estimates of HGT in Rafflesia are robust.
In addition, most putative Rafflesia HGT transcripts showed their closest affinity to Vitis in our BLAST searches (Additional file 7: Table S2), which is consistent with these phylogenetic conclusions. Thus, we conclude that the observed Rafflesia transcripts show evidence that they originated via HGT from their obligate hosts. This is indicative of elevated rates of unidirectional host-to- parasite gene transfer in this system. Furthermore, our results suggest that rates of HGT in this eukaryotic parasitic system are on par with some prokaryotic organisms . Moreover, our estimates of HGT are likely to be conservative because our transcriptome was built from a single organ from one developmental stage.
Since host-to-parasite exchange of RNA is known to occur via translocation in the phloem , we used a multi-pronged approach to confirm that our results cannot be attributed to host contamination (see also Methods). Importantly, if our sample preparations were contaminated, we would not expect the high degree of sequence divergence we observed between putative HGT transcripts of Rafflesia and homologous transcripts of Tetrastigma. Of the 49 HGT transcripts from Rafflesia that could be directly compared with Tetrastigma, sequence divergence ranged from 0.05%–39.7% (mean divergence, 22.5%). This indicates that some period of evolution has elapsed since the time of gene transfer. Furthermore, assuming synteny between Tetrastigma and Vitis, the Rafflesia HGT transcripts most likely originated from multiple chromosomes in the host (Additional file 7: Table S2), which suggests a series of episodic HGT events.
To further verify genomic integration of these Rafflesia HGT transcripts, we also sequenced the genomic DNA (gDNA) from the same floral organ used for Rafflesia cDNA library construction. Our next-generation sequencing of Rafflesia gDNA verified a large percentage of the transgenes (63.3%, 31 of 49 HGT transcripts) identified from our cDNA analyses, independent of the bootstrap support threshold (Additional file 8: Figure S6) or transcript length (Additional file 9: Figure S7). The percentage of HGT transcripts verified to be integrated into the genome of Rafflesia was nearly identical to the percentage of verified VGTs (61.5%, 1218 of 1979 transcripts). Thus, our identification of putative HGTs does not appear to be due to contamination of the mRNA pool. Additionally, of the 31 Rafflesia HGT transcripts verified from our next-generation sequencing of genomic DNA, 16 show the presence of an intron (Additional file 7: Table S2 and Additional file 10: Figure S8). Although only two introns were covered in their entirety by our gDNA sequencing, all introns possess the characteristic GT or AG splice site . This suggests that the source of many of these integrated HGTs at the time of transfer was likely genomic DNA, rather than processed mRNA transcripts. Together, these results indicate that the HGTs identified here are indeed integrated into the Rafflesia genome.
Most previous reports of parasitic plant HGTs appear to involve only the movement of gene regions that reside in the mitochondrial genome . Our results confirm recent findings in the parasitic plant Striga that some horizontally transferred genes likely reside in the nucleus  and indicate that perhaps many of these transgenes originated from the nuclear genomes of their hosts. Homologues of nearly all HGT regions inferred in Rafflesia are localized in the nuclear genomes of the other reference genomes examined here (47 are nuclear and two are mitochondrial; Additional file 7: Table S2).
These HGTs represent a wide range of cellular functions as determined by gene annotation data, including roles related to respiration, metabolism, mitochondrial translation, and protein turnover, to name a few (Additional file 7: Table S2). A natural extension of these results is to examine the extent to which these transgenes are functional in their recipient species. Many previously reported transgenes in plants appear to be non-functional and often coexist with a native, functional homologue . Although some studies have shown that a small number of transgenes are likely transcribed , none have been convincingly demonstrated to be functional in their recipient lineages. In contrast, we find that Rafflesia HGTs are expressed at levels comparable to VGTs (Additional file 11: Figure S9). This suggests that these HGTs likely have functional promoters and therefore may play a role in cellular function. In addition, only 11 of the 49 HGTs had homologous VGTs expressed in the same transcript pool (Additional file 7: Table S2), which suggests that the HGT may have replaced the VGT function in some cases. The observed expression levels of the HGTs further rules out the possibility that the HGTs are due to contaminating RNA, which would not be present at native levels, and confirms that HGTs have been integrated into the Rafflesia genome.
The elevated rate of unidirectional gene transfer from Tetrastigma to Rafflesia and the apparent pattern of convergent host codon usage in vertically inherited Rafflesia genes raises the possibility that there may be a fitness benefit to Rafflesia for maintaining genes that are more host-like. A critical component of a parasite’s ability to maximize resource extraction is to minimize the ability of the host to detect and mount a defense response to the parasite. The bacterium that causes Legionnaires’ disease in humans appears to have acquired dozens of eukaryotic proteins via HGT that alter host cell functions to its advantage . Similarly, in plants the obligate bacterial pathogen of citrus trees encodes a gene that was horizontally transferred; its product mimics a host protein that has been shown to limit the host’s defense response . Our demonstration of HGT and VGT host codon usage convergence in Rafflesia raises the possibility that it may similarly express host-like genes to manipulate its host to its advantage. Our hypothesis that Rafflesia may be engaged in such genomic deception requires further experimental work and examination in Rafflesiaceae and in other plant parasitic systems.
Total RNA and gDNA were obtained from the holoparasite, Rafflesia cantleyi from peninsular Malaysia. Total RNA was extracted using the RNAqueous and Plant RNA Isolation Aid kits (Ambion, Inc.), and treated with the TURBO DNA- free kit (Ambion, Inc.) at 37°C for 4 hours to remove residual DNA. The cDNA library was synthesized from total RNA following the protocols of Novaes et al. . gDNA was extracted using the DNeasy Plant Mini kit (Qiagen, Inc.), and treated with RNAase A at 60°C for 1.5 hours to remove any residual RNA contamination. Illumina paired-end libraries were prepared for both cDNA and gDNA following the protocols of Bentley et al. . Each library was sequenced in one lane of the Genome Analyzer II (Illumina, Inc.) with paired-end 150 base pairs (bp) read lengths at the FAS Center for Systems Biology at Harvard University (http://sysbio.harvard.edu/csb/resources/instrumentation/sequencing_illumina.html). Leaf, tendril, and stem tissue was obtained from an unparasitized specimen of Tetrastigma rafflesiae at the Missouri Botanical Garden (St. Louis, Missouri, USA). The cDNA library was prepared in a similar way as above and sequenced separately on a GS-FLX (Roche, Inc.) at the Environmental Genomics Core facility at the University of South Carolina (http://engencore.sc.edu/) following the protocols of Margulies et al. . Tremendous care was taken to avoid and/or detect host or lab contamination in our sample preparation. First, a single perigone lobe from an unopened floral bud of Rafflesia was used for extractions of total RNA and gDNA. These perigone lobes are well protected by the outer bracts and are far from the zone of direct physical contact with tissue of Tetrastigma. Second, our extractions of total RNA and gDNA from Rafflesia were performed separate from any extractions of Tetrastigma; thus, laboratory contamination of our Rafflesia RNA or DNA with Tetrastigma can be eliminated. Third, we performed a PCR assay to assess the integrity of our Rafflesia and Tetrastigma gDNAs. We PCR-screened for the presence of the plastid genes matK and rbcL, which are universally present in autotrophic plants like Tetrastigma, but are absent in Rafflesia. Both genes were easily amplified from Tetrastigma gDNA, but not from our Rafflesia gDNA, indicating that our Rafflesia gDNA extraction was likely free of host contamination. In addition, we PCR-screened gDNAs for the mitochondrial gene matR from Rafflesia and Tetrastigma, which is present in both species [7, 13, 14]. This gene amplified easily from our gDNAs and direct Sanger sequencing of PCR-products produced unique (i.e., singular) sequences from the host and parasite, respectively. There was no ambiguity in the matR sequence chromatograms for these samples, which would be expected if they were cross- contaminated. The cDNA reads from next-generation sequencing were first assembled using Oases v0.1.21 (http://www.ebi.ac.uk/~zerbino/oases/) with default settings, and then translated into amino acid sequences with prot4EST v2.2 . Transcripts shorter than 30 amino acids were deemed too short for analysis and were discarded . All sequence data have been deposited at the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) with accession #SRA052224.
Nine species available from whole genome sequencing projects (Additional file 3: Table S1) were included with Rafflesia and Tetrastigma to establish orthology and build our datasets for phylogenetic analysis following Dunn et al. . Only those genes clusters that included at least Oryza (for outgroup rooting), Rafflesia/Tetrastigma (transcripts under investigation), Manihot/Populus/Ricinus (close Malpighiales relatives of the parasite), and Vitis (close Vitaceae relative of the host) were retained. The best maximum likelihood tree for each cluster group was inferred using RAxML v7.2.8  under the GTRGAMMA nucleotide substitution model. BS values were estimated similarly using the rapid bootstrap algorithm with 100 replicates as implemented in RAxML.
Rafflesia gDNA Illumina reads were mapped onto assembled Rafflesia cDNA transcripts using Bowtie v0.12.7 . To avoid complications with intron regions we first divided each 150 bp Illumina read into multiple 25 bp fragments following Kim and Salzberg , and then mapped each read onto the assembled cDNA transcripts with zero mismatches.
Rafflesia HGT transcripts were BLAST searched against the NCBI nucleotide sequence database (March 1, 2011) using BLASTN v2.2.25 . Searches were first performed using megablast (high similarity search). For sequences that failed to match existing sequences with high confidence, we then used more dissimilar (discontiguous megablast) searches. Most sequences had high- confidence BLAST hits in both of these searches. Where higher similarity searches retrieved no hits, we used blastn. Putative gene location and function were inferred based on the highest confidence hit with a functional annotation.
where N is the number of mapped Illumina reads for a given transcript, L is the transcript length, k is the minimum number of base pairs which must overlap between each Illumina read and the transcript (k = 25 for this analysis), and l is the read length (l = 25). R was further normalized for each transcript to the standard reads per kilobase per million reads (i.e., RPKM; ).
The VGT transcripts were first converted into four nucleotide frequencies (i.e., A, C, G, and T), 16 dinucleotide frequencies, and 61 codon frequencies (excluding stop codons), and normalized by the length of each transcript. Next, χ2 distances were calculated between each transcript within a gene cluster . Finally, we chose the smallest χ2 distance between the gene of interest and homologues from other species within the cluster (i.e., for a Rafflesia gene, whether it is closest to Manihot or Ricinus [Malpighiales], or Vitis). If the transcript was closer to Malpighiales, then it was assigned to be Mapighiales-like; otherwise, it was assigned to be Vitis-like.
Horizontal gene transfer
Reads per kilobase per million reads
Vertical gene transfer.
We thank Stijn van Dongen, Casey Dunn, Michael Ethier, Amir Karger, Devin Locke, Sergey Lukyanov, Evandro Novaes, Alexis Stamatakis, and Jiangwen Zhang for technical assistance. We also thank William Anderson, Mark Beilstein, Andrew Knoll, Christopher Marx, Sarah Mathews, Brian O’Meara, and members of the Davis laboratory for helpful discussions. CCD was supported by the National Science Foundation (NSF) Assembling the Tree of Life grant DEB-0622764 and NSF DEB-1120243.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.