Whole-genome resequencing using next-generation and Nanopore sequencing for molecular characterization of T-DNA integration in transgenic poplar 741
BMC Genomics volume 22, Article number: 329 (2021)
The molecular characterization information of T-DNA integration is not only required by public risk assessors and regulators, but is also closely related to the expression of exogenous and endogenous genes. At present, with the development of sequencing technology, whole-genome resequencing has become an attractive approach to identify unknown genetically modified events and characterise T-DNA integration events.
In this study, we performed genome resequencing of Pb29, a transgenic high-resistance poplar 741 line that has been commercialized, using next-generation and Nanopore sequencing. The results revealed that there are two T-DNA insertion sites, located at 9,283,905–9,283,937 bp on chromosome 3 (Chr03) and 10,868,777–10,868,803 bp on Chr10. The accuracy of the T-DNA insertion locations and directions was verified using polymerase chain reaction amplification. Through sequence alignment, different degrees of base deletions were detected on the T-DNA left and right border sequences, and in the flanking sequences of the insertion sites. An unknown fragment was inserted between the Chr03 insertion site and the right flanking sequence, but the Pb29 genome did not undergo chromosomal rearrangement. It is worth noting that we did not detect the API gene in the Pb29 genome, indicating that Pb29 is a transgenic line containing only the BtCry1AC gene. On Chr03, the insertion of T-DNA disrupted a gene encoding TAF12 protein, but the transcriptional abundance of this gene did not change significantly in the leaves of Pb29. Additionally, except for the gene located closest to the T-DNA integration site, the expression levels of four other neighboring genes did not change significantly in the leaves of Pb29.
This study provides molecular characterization information on T-DNA integration in the transgenic poplar 741 line Pb29, which contributes to safety supervision and further extensive commercial planting of transgenic poplar 741.
Poplar is one of the most widely distributed tree species owing to its rapid growth and strong adaptability to environmental changes [1,2,3]. It is an important industrial timber species that is widely used in the paper-making industry and panel processing. However, as the poplar planting area has continued to expand, insect attacks have become increasingly serious and have caused huge losses to forestry production. To reduce the economic losses caused by insect pests, decrease the need for chemical pesticides, and protect the ecological environment, the cultivation of insect-resistant transgenic varieties is particularly important. Transgenic technology is used commercially for growing trees in China, which was the first country to commercialize transgenic poplar.
At the same time, the possible impact of transgenic technology on humans and ecology is still unclear. Therefore, China, like most other countries and regions in the world, remains very cautious about the application and supervision of transgenic technology: the research and experimentation, environmental release, and commercial production of genetically modified organisms (GMOs) all require safety certificates issued by the relevant departments. Inheritance and expression stability of exogenous genes is a prerequisite for commercial application of transgenic plants, and it depends on the molecular characteristics of T-DNA integration into the host genome. Because T-DNA integration is random and non-replicable, the molecular information of T-DNA integration serves as a specific marker of a transgenic plant, which is conducive to the identification and supervision of different transgenic lines. The genome sequence (genetic material) of a transgenic plant has been altered by the insertion of T-DNA through genetic engineering. Several studies have shown that the molecular characteristics of T-DNA integration, including T-DNA sequence, insertion position, copy number and flanking sequences of the insertion site, affect the expression of transgenes. In hybrid poplar, the transgene inactivation is always the result of transgene repetition. Fladung et al. analyzed three unstable 35S-rolC transgenic aspen lines, and the results showed that transgene expression may be highly variable and unpredictable when the transgenes are present in the form of repeats. In GFP-transgenic barley, when the insert is proximate to the highly repetitive nucleolus organizer region (NOR) on chromosome 7, the expression of the transgene is completely silenced, while fluorescent expression appears when the insert lies in other regions. Kumar et al. indicated that the host genome can control the expression of a foreign gene, and AT-rich regions may play a role in defense against foreign DNA. Furthermore, T-DNA insertion often leads to expected and unexpected changes at transcriptional, protein and metabolic levels in transgenic plants, which potentially affects food/feed quality and safety [12, 13]. Therefore, clarifying the molecular characterization data of T-DNA integration such as T-DNA copy number and insertion site locations is particularly important for risk assessors and regulators of transgenic plants.
There are many methods for locating the insertion sites of foreign genes in transgenic plants, most of which are based on polymerase chain reaction (PCR) amplification; these include thermal asymmetric interlaced PCR, inverse PCR, and adapter-ligated PCR. Although these methods have been successfully applied to transgenic plants of species such as Arabidopsis thaliana and rape, they are prone to false positives, and are also time-consuming, laborious, and poorly reproducible. In recent years, with the continuous development of sequencing technology, next-generation sequencing (NGS) has been widely used for genome sequencing because of its high throughput capability, low cost, and accurate results. NGS has been successfully used to locate T-DNA insertion sites in transgenic soybean, rice, and birch. However, NGS reads are too short to accurately locate all of the T-DNA insertion sites in transgenic plants with complex T-DNA integration patterns or genomes. By contrast, third-generation sequencing technology, developed by Oxford Nanopore Technologies and PacBio, can produce longer reads, which can overcome the limitations of NGS such as short reads and bias due to GC content, although the accuracy is relatively low. Therefore, by combining NGS with third-generation sequencing technology, we can accurately and efficiently analyze overall genomic changes caused by T-DNA insertion.
Poplar 741 is an excellent cultivar of the section Leuce Duby that was developed after two hybridizations in 1974. The hybridized combination is [P. alba L. × (P. davidiana Dode. + P. simonii Carr.)] × P. tomentosa Carr. Transgenic poplar 741, which was developed by Hebei Agricultural University and the Institute of Microbiology of the Chinese Academy of Sciences, was obtained by Agrobacterium-mediated transformation of poplar 741 with an expression vector containing the BtCry1AC gene and the arrowhead proteinase inhibitor (API) gene. According to national standards for transgenic animals and plants, transgenic poplar 741 was certified safe after environmental impact and production tests and was planted commercially from 2002 to 2007. Pb29 is a high-resistance line of transgenic poplar 741. In theory, it carries two insect-resistance genes (BtCry1AC and API), and it shows high levels of resistance to lepidopteran pests, such as Hyphantria cunea and Clostera anachoreta [4, 23]. However, no molecular analysis of T-DNA integration in transgenic poplar 741 has been performed. In this study, we performed whole-genome resequencing of transgenic poplar 741 using NGS and Nanopore sequencing, and analyzed the copy number and insertion sites of the T-DNA as well as the flanking sequences at the T-DNA integration sites. Our results provide molecular characterization data on T-DNA integration in the transgenic poplar 741 line Pb29, which can supply precise information for safety supervision and contribute to further extensive commercial planting of transgenic poplar 741.
Results of NGS analysis
After performing quality-control checks, a total of 52.3 million clean reads for transgenic poplar 741 line Pb29 were obtained from the raw reads, corresponding to more than 30× coverage of the Populus trichocarpa reference genome (https://www.ncbi.nlm.nih.gov/genome/98). More than 92% of the sequencing data had Phred-like quality scores ≥30, indicating that the data were high quality (Table S1). After sequence alignment, nine junction reads on chromosome 03 (Chr03), and four on Chr10, were identified in the Pb29 genome sequence, indicating that there are two T-DNA insertion sites in the Pb29 genome (Table S2). Based on the physical positions of the junction reads, one insertion site is located at 9,283,937 bp on Chr03, and the other at 10,868,777 bp on Chr10. T-DNA is inserted in the reverse direction on Chr03, and in the forward direction on Chr10. However, further analysis revealed that only unilateral junction reads could be detected at both T-DNA insertion sites; ideally, junction reads should be detected on both sides of each insertion site (Fig. 1).
Confirmation of insertion sites and directions using PCR amplification
To verify the accuracy of the T-DNA insertion sites and directions, we designed 6 primers based on the flanking sequences of the T-DNA insertion sites and the T-DNA sequence (Fig. 2a), and amplified the genomic DNA of poplar 741 and Pb29 using different primer combinations (Fig. 2b). The results of PCR amplification revealed that the PCR runs using primer combinations 3, 4, 6, and 7 generated products with a single band for Pb29 in Fig. 2c, whereas no products were amplified for poplar 741 in Fig. 2d. When primer combinations 1, 2, 8, and 9 were used in the PCR, amplified bands were not produced for Pb29 or poplar 741, indicating that T-DNA was indeed inserted into Chr03 in the reverse direction and into Chr10 in the forward direction, thus verifying the NGS results. Meanwhile, the target band was observed after PCR runs using primer combinations 5 and 10 for both Pb29 and poplar 741, indicating that Pb29 is a heterozygous mutant created via T-DNA insertion (Fig. 2c; Fig. 2d).
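The band-pattern reasoning above can be written down as a small decision rule. The following is a minimal sketch, not the authors' procedure: it assumes that a "junction" product comes from one flanking primer paired with one T-DNA primer, and that a "wild-type" product comes from two flanking primers spanning the intact site (which is how primer combinations 5 and 10 appear to behave). The function name and labels are hypothetical.

```python
def interpret_pcr(junction_band: bool, wild_type_band: bool) -> str:
    """Interpret band presence at one insertion locus.

    junction_band:  a product was amplified with a flanking primer plus a
                    T-DNA primer (assumed evidence that T-DNA is inserted).
    wild_type_band: a product was amplified with the two flanking primers
                    (assumed evidence that an intact allele is present).
    """
    if junction_band and wild_type_band:
        return "heterozygous insertion"
    if junction_band:
        return "insertion, no intact allele detected"
    if wild_type_band:
        return "no insertion"
    return "uninformative"


# Pattern reported for Pb29: junction bands and the target band both present.
print(interpret_pcr(junction_band=True, wild_type_band=True))
# Pattern reported for poplar 741: only the target band.
print(interpret_pcr(junction_band=False, wild_type_band=True))
```

The rule only formalizes presence or absence of bands; it says nothing about product size, which would still need to be checked on the gel.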
Results of Nanopore sequencing analysis
To further verify the NGS results and determine whether chromosomal rearrangement occurred in the Pb29 genome due to T-DNA insertion, we used the third-generation sequencing technology developed by Oxford Nanopore Technologies to resequence the whole genomes of poplar 741 and Pb29. More than 96% of the clean reads of both poplar 741 and Pb29 mapped to the P. trichocarpa reference genome, corresponding to 40× and 39× coverage of the reference genome, respectively. The depth of coverage was evenly distributed across both poplar 741 and Pb29 chromosomes, indicating that the genomic DNA of poplar 741 and Pb29 was sequenced in a random manner (Fig. S1).
The BAM file generated by comparing all junction reads with the P. trichocarpa reference genome was imported into Integrative Genomics Viewer (IGV) software for visual analysis. All junction reads only mapped to Chr03 or Chr10, and there was a gap between reads on both chromosomes. The two gaps, each formed by a T-DNA insertion that disrupted part of the genome sequence, matched the two T-DNA insertion sites in the Pb29 genome exactly. The two T-DNA insertion sites in the Pb29 genome are located at 9,283,905–9,283,937 bp on Chr03 and 10,868,777–10,868,803 bp on Chr10, consistent with the detection results obtained using NGS (Fig. 3).
Compared with the P. trichocarpa reference genome, evidence of many structural variation (SV) events was seen in the genomes of both poplar 741 and Pb29, most of which were deletions or insertions of chromosome segments (Fig. S2). After removing SV events of the same type at the same positions in the poplar 741 and Pb29 genomes, the remaining SV events > 1 kb were regarded as chromosomal rearrangements in the Pb29 genome caused by T-DNA insertion. However, we did not detect this type of event, indicating that the insertion of T-DNA did not cause large chromosomal rearrangements in the Pb29 genome.
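The filtering logic described here (discard SV events shared with the non-transgenic control, then keep only events longer than 1 kb) can be sketched as follows. This is an illustration under stated assumptions: SV calls are represented by a hypothetical `SV` record, "same position" is taken to mean an identical chromosome, start coordinate, and type, and the example calls are invented. A real comparison would normally tolerate some breakpoint imprecision.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical, simplified structural-variant record."""
    chrom: str
    pos: int       # start coordinate of the event
    length: int    # event length in bp
    sv_type: str   # e.g. "DEL", "INS", "INV"


def candidate_rearrangements(transgenic: List[SV], control: List[SV],
                             min_len: int = 1000) -> List[SV]:
    """Keep SVs unique to the transgenic line that are longer than min_len."""
    # Events of the same type at the same position in the control are
    # treated as background differences from the reference genome.
    control_keys = {(sv.chrom, sv.pos, sv.sv_type) for sv in control}
    return [
        sv for sv in transgenic
        if (sv.chrom, sv.pos, sv.sv_type) not in control_keys
        and sv.length > min_len
    ]


# Invented example calls, for illustration only.
control_calls = [SV("Chr01", 5_000, 2_500, "DEL")]
transgenic_calls = [
    SV("Chr01", 5_000, 2_500, "DEL"),   # shared with control, removed
    SV("Chr02", 80_000, 300, "INS"),    # unique but shorter than 1 kb
]
print(candidate_rearrangements(transgenic_calls, control_calls))
```

With these invented calls the function returns an empty list, which mirrors the situation reported for Pb29, where no event passed both filters.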
T-DNA and flanking sequence analysis
Because Nanopore sequencing can be used to obtain longer reads, some junction reads contained complete T-DNA sequences. The complete T-DNA sequences at the two insertion sites were extracted and compared with the vector sequence. The results showed that the left and right border sequences of the T-DNA inserted on Chr03 were missing 26 and 3 bp, respectively, whereas the left and right border sequences of the T-DNA inserted on Chr10 were missing 35 and 34 bp, respectively (Fig. 4a). It is worth noting that the 35S-API-Nos expression component was not detected in the T-DNA sequences at either insertion site; furthermore, both T-DNA sequences are exactly the same, indicating that the expression component of the API gene was not lost during the transformation process. Rather, it was not present in the expression vector in Agrobacterium before transformation (Fig. 5).
We compared the isolated flanking sequences with the P. trichocarpa reference genome and found that fragments had been deleted from the flanking sequences at both insertion sites, as T-DNA insertion damaged the genome sequence at those sites (box with black outline in Fig. 4b and Fig. 4c). The genome sequence at the T-DNA insertion sites on Chr03 and Chr10 was missing 33 and 27 bp, respectively, consistent with the results of the alignment analysis (Fig. 3). A short fragment (24 bp in length) was found between the T-DNA insertion site and the right flanking sequence on Chr03 in the Pb29 genome; this fragment could not be mapped to the P. trichocarpa reference genome (box with black outline in Fig. 4b). We analyzed the clean reads from poplar 741 and found that reads mapped to the same positions had essentially the same sequences as the corresponding sections of the P. trichocarpa genome (Fig. S3), indicating that the 24-bp fragment did not arise from a difference between the genomes but was instead an unknown fragment inserted during the T-DNA integration process.
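The deletion sizes quoted above follow directly from the insertion-site coordinates if those coordinates are read as 1-based and inclusive, which is an assumption made here. The short check below uses only the coordinates reported in the text; the helper name is hypothetical.

```python
def inclusive_length(start: int, end: int) -> int:
    """Number of bases in a 1-based, inclusive interval [start, end]."""
    return end - start + 1


# Insertion-site intervals reported for Pb29.
sites = {
    "Chr03": (9_283_905, 9_283_937),
    "Chr10": (10_868_777, 10_868_803),
}

deleted_bp = {chrom: inclusive_length(s, e) for chrom, (s, e) in sites.items()}
print(deleted_bp)  # {'Chr03': 33, 'Chr10': 27}
```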
Analysis of the expression levels of genes located near the insertion sites
The genes within 20 kb upstream and downstream of the two T-DNA insertion sites were detected based on the genome annotation file of P. trichocarpa. The results showed that T-DNA was inserted 9466 bp downstream of the LOC112326972 gene and 8137 bp upstream of the LOC7475699 gene on Chr03, and 15,621 bp downstream of the LOC7498060 gene and 1543 and 11,914 bp upstream of the LOC7498061 and LOC7498062 genes, respectively, on Chr10 (Table 1). Fragments Per Kilobase Million (FPKM) values associated with the transcriptome data were used to compare the expression levels of the five neighboring genes. The results showed that except for the LOC7498061 gene, the expression levels of the other four genes in Pb29 leaves did not change significantly, indicating that the insertion of T-DNA did not significantly affect the expression levels of these four genes. The LOC7498061 gene is located closest to the T-DNA insertion site; its expression level was significantly upregulated in Pb29 leaves, indicating that the insertion of T-DNA in Pb29 affects gene expression within a certain range (Fig. 6a).
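A window search of this kind can be sketched in a few lines. The example below is only an illustration: the gene names and coordinates are invented, the function name is hypothetical, and the sketch reports unsigned distances without considering gene strand, so it does not by itself decide whether an insertion is "upstream" or "downstream" of a gene.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int]  # (name, start, end), 1-based inclusive


def genes_near(site_start: int, site_end: int, genes: List[Gene],
               window: int = 20_000) -> List[Tuple[str, int]]:
    """Return (gene, distance) pairs for genes within `window` bp of a site."""
    hits = []
    for name, g_start, g_end in genes:
        if g_end < site_start:        # gene lies entirely before the site
            dist = site_start - g_end
        elif g_start > site_end:      # gene lies entirely after the site
            dist = g_start - site_end
        else:                         # gene overlaps the site
            dist = 0
        if dist <= window:
            hits.append((name, dist))
    return sorted(hits, key=lambda h: h[1])


# Invented coordinates on a single chromosome, for illustration only.
example_genes = [
    ("geneA", 80_000, 90_000),
    ("geneB", 101_500, 105_000),
    ("geneC", 130_000, 140_000),
]
print(genes_near(100_000, 100_030, example_genes))
# [('geneB', 1470), ('geneA', 10000)]
```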
Analysis of the TAFs gene family
According to the results of whole-genome resequencing analysis, the T-DNA insertion site on Chr03 (9,283,905–9,283,937 bp) is located within the first exon of the LOC7478355 gene (9,283,876–9,291,377 bp). Therefore, the insertion of T-DNA disrupted the structure of the LOC7478355 gene. According to the National Center for Biotechnology Information (NCBI) annotation, the LOC7478355 gene, which belongs to the TAFs gene family, encodes a TAF12 protein, one of the core subunits of the basic transcription factor TFIID. To understand the impact of this disruption on the function of the gene, we first analyzed the TAFs gene family to clarify the number of genes encoding TAF12 protein in the genome.
We identified 33 TAFs genes in the genome of P. trichocarpa through bioinformatics analysis. The 33 PtTAFs genes were renamed according to their chromosomal positions and the phylogenetic tree constructed with PtTAFs and AtTAFs proteins (Table S3; Fig. S4A). Within the TAFs gene family, there are three genes encoding TAF12 protein—PtTAF12, PtTAF12b, and PtTAF12c. Through synteny analysis of the PtTAFs gene family, we identified five segmental duplication events involving 10 PtTAF genes that encode TAF7, TAF8, and TAF15 proteins. No duplicated segments containing genes encoding TAF12 protein were identified, indicating that PtTAF12, PtTAF12b, and PtTAF12c were not formed from segmental duplication occurring among the three genes (Fig. S4B). The RNA-seq results showed that the expression levels of the three genes in Pb29 leaves were slightly higher than those in poplar 741, but none of the differences were significant, indicating that the transcriptional abundance of the genes encoding TAF12 protein did not change significantly (Fig. 6b).
Whole-genome resequencing using NGS and Nanopore sequencing improved the accuracy of T-DNA insertion site analysis
Molecular characterization information of T-DNA integration, such as the locations of T-DNA insertion sites and copy numbers, is of great significance for the safety supervision of genetically modified organisms (GMOs) . PCR-based methods are often used to elucidate T-DNA insertion sites and copy numbers. However, these methods are time-consuming, labor-intensive, and produce inaccurate results. When T-DNA integration patterns or the genomes of T-DNA mutants are relatively complex, PCR-based methods cannot be used to accurately determine all T-DNA insertion sites and copy numbers. For example, Gang et al. performed 120 rounds of PCR using 12 border primers and 10 arbitrarily degenerated primers, and located only two T-DNA insertion sites in a birch T-DNA mutant; in contrast, six T-DNA insertion sites were located via genome resequencing using NGS . Whole-genome resequencing is a more effective method for analyzing T-DNA insertion sites and copy numbers. With the emergence and development of high-throughput NGS technology, NGS is now widely used to elucidate T-DNA insertion sites and copy numbers because of its high throughput capability and low cost. However, NGS reads are too short to obtain complete information on the T-DNA insertion sites . In this study, although both NGS and Nanopore sequencing located two T-DNA insertion sites in the Pb29 genome, NGS only detected junction reads on one side of each insertion site. In contrast, complete T-DNA sequences and flanking sequences of T-DNA insertion sites were elucidated using Nanopore sequencing, because it can produce longer reads. Nanopore sequencing can also be used to analyze the entire genome of a T-DNA mutant and identify any chromosomal rearrangements due to T-DNA integration . Therefore, NGS and Nanopore sequencing should be used together to analyze T-DNA mutants, to improve the accuracy of T-DNA insertion site analysis.
T-DNA insertion sites and copy numbers constitute important molecular information for safety supervision of transgenic plants
There have been several controversial incidents regarding the safety of genetically modified products, such as those involving Bertholletia excelsa  and the monarch butterfly . Accordingly, the potential threats of genetically modified organisms to the environment and human health are of widespread concern. As a result, many countries have formulated legislation and established agencies to conduct safety assessments and management of genetically modified organisms. The T-DNA insertion site provides important molecular information for the screening and identification processes that are conducted during safety assessments of genetically modified materials, before the materials are released into the environment . Additionally, due to the random location of T-DNA insertion sites and the existence of position effects , the euchromatin or heterochromatin region into which the T-DNA is inserted, and the flanking sequences of the T-DNA insertion sites, affect the expression activity of the foreign gene [31, 32]. This activity may also be correlated with the copy number of the foreign gene. For example, the fatty acid content in transgenic rape is positively correlated with the copy number of the thioesterase gene, which encodes an acyl-ACP carrier protein . Furthermore, Cervera et al. found a significant negative correlation between the expression level of the GUS gene and its copy number in transgenic citrus . Therefore, T-DNA insertion sites and copy numbers are closely related to the transcription level of the foreign gene, which is important to consider during safety assessments of transgenic plants . In this study, we located two T-DNA insertion sites in the genome of Pb29, a transgenic poplar 741 line, at 9,283,905–9,283,937 bp on Chr03 and 10,868,777–10,868,803 bp on Chr10. According to the sequence information associated with the junction reads, T-DNA was inserted in opposite directions at those two insertion sites. 
The T-DNA sequences at both insertion sites did not result from tandem duplication; instead, a single-copy integration pattern was observed at both sites (Fig. 7). The T-DNA insertion sites and directions elucidated via resequencing were further confirmed using PCR amplification. At present, the commercialization certificate of transgenic poplar 741 has expired. Together with China’s very cautious attitude towards the commercialization of genetically modified (GM) plants, this meant that the planted area of transgenic insect-resistant poplar in China was only 450 ha as of 2011, even though losses to forestry production due to insect herbivory continue to increase. Therefore, the molecular characterization information of T-DNA integration in Pb29 could aid safety supervision and management, and contribute to the reapplication for a commercialization certificate and the further extensive commercial planting of Pb29.
T-DNA and flanking sequence analysis
T-DNA integration into a genome often results in base deletions on the T-DNA left and right border sequences, or in duplications, deletions, and inversions of DNA sequences in the receptor genome; it can even induce chromosomal rearrangement. Kim et al. analyzed a large number of transgenic rice plants and found a difference in the number of base deletions on the T-DNA left and right border sequences, with more deletions occurring on the left side. In a birch T-DNA mutant, the integration of T-DNA led to the deletion or translocation of several chromosomal fragments. Although we did not observe chromosomal rearrangement in the Pb29 genome, base deletions were observed in the T-DNA left and right border sequences, and in the genome sequence at the T-DNA insertion sites; this phenomenon is common in many genetically modified materials. However, we also found a 24-bp fragment that was inserted between the T-DNA insertion site and its right flanking sequence on Chr03. Kersten et al. analyzed two early-flowering poplar lines, T193–2 and T195–1, and found a partial T-DNA fragment with AtFT-specific primers at the T-DNA insertion site of each line. In contrast, the 24-bp fragment in Pb29 is not part of the T-DNA, so further analysis is needed to elucidate its source. No API gene was detected within the T-DNA sequences at either insertion site, indicating that the API gene was not integrated into the Pb29 genome. The two T-DNA sequences are exactly the same, indicating that the API gene was not present in the expression vector in Agrobacterium before transformation. In addition, the Agrobacterium tumefaciens strain LBA4404 used in genetic transformation is deficient in RecA activity (RecA-), which suggests that the loss of the API gene was not caused by homologous recombination in Agrobacterium; we therefore speculate that the API gene may have been lost during the transformation of the recombinant plasmid into Agrobacterium. Therefore, Pb29 is a transgenic line that contains only one insect resistance gene.
Analysis of the expression levels of genes near the T-DNA insertion sites
T-DNA is randomly inserted into the genome [39, 40]. The introduction of exogenous genes may affect the regulation and expression of endogenous genes in plants . When T-DNA is inserted into a coding gene, the function of the gene is affected. Furthermore, insertion into an intergenic region may affect the expression activity of upstream and downstream genes, resulting in unexpected effects . In the transgenic rice 04Z11EM13 line, T-DNA was inserted into the fourth exon of the OsBC1L4 gene, resulting in a mutant phenotype that exhibited fewer tillers and dwarfism . Liu et al. analyzed the flanking sequences of the T-DNA insertion site in a rice flag leaf mutant and found that T-DNA insertion led to a significant reduction in the expression of the neighboring AK100376 gene, thus causing phenotypic change . In the genome of the transgenic poplar 741 line Pb29, one T-DNA copy was inserted into the first exon of a gene encoding TAF12 protein, which belongs to the TAFs gene family, on Chr03. Through gene family analysis, we identified three genes encoding TAF12 protein in the genome of P. trichocarpa. However, the expression of these three genes in Pb29 leaves did not change significantly, which may be due to T-DNA being integrated into only one homologous chromosome, whereas poplar 741 is triploid and has three alleles for each gene. Of the neighboring genes at the two T-DNA insertion sites, except for the LOC7498061 gene, which is located closest to an insertion site, the expression levels of the other four genes did not change significantly in Pb29 leaves, implying that the insertion of T-DNA had little effect on the expression of endogenous genes in the Pb29 genome. Any changes in growth or physiology that may result from the significantly upregulated expression of the LOC7498061 gene in Pb29 need to be studied further.
In this study, we resequenced the whole genomes of poplar 741 and the transgenic poplar 741 line Pb29 using NGS and Nanopore sequencing. In the Pb29 genome, we found that the T-DNA sequence was inserted in the reverse direction into the 9,283,905–9,283,937-bp region on Chr03, and in the forward direction into the 10,868,777–10,868,803-bp region on Chr10. Both insertion sites exhibited a single-copy integration pattern, and the locations and directions of T-DNA insertion were confirmed using PCR amplification. After the T-DNA copies had been inserted into the genome, different degrees of base deletions were detected on the T-DNA left and right border sequences, and in the flanking sequences of the insertion sites. A fragment was found to be inserted between the insertion site and right flanking sequence on Chr03, and no chromosomal rearrangement was detected in the Pb29 genome. Only the BtCry1Ac gene was detected in the T-DNA sequence at both insertion sites; no API gene was detected, indicating that Pb29 is a transgenic line containing only one insect resistance gene. The insertion of T-DNA disrupted the structure of a gene encoding TAF12 protein on Chr03, but the transcriptional abundance of this gene did not change significantly in Pb29 leaves. Except for the LOC7498061 gene, which is located closest to a T-DNA insertion site, the expression of four other neighboring genes did not change significantly in Pb29 leaves. This study provides molecular characterization information on T-DNA integration in the transgenic poplar 741 line Pb29, which contributes to safety supervision and further extensive commercial planting of transgenic poplar 741.
The experimental materials used in this study were tissue culture seedlings of poplar 741 and transgenic poplar 741 line Pb29, which were stored in the Hebei Key Laboratory of Forest Germplasm Resources and Forest Protection, College of Forestry, Hebei Agricultural University. The leaves of poplar 741 and Pb29 were collected, immediately placed into liquid nitrogen, and preserved at −80 °C for subsequent DNA extraction.
Genome resequencing using NGS and data analysis
Genome resequencing of Pb29 via NGS was performed by Biomics Co., Ltd. (Beijing, China). Genomic DNA was extracted using a plant DNA extraction kit (SENO Biological Technology Co., Ltd., Zhangjiakou, China) in accordance with the manufacturer’s protocol and quantified using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). The DNA was broken into fragments with an average length of 300 bp to construct the library. The library was then sequenced using the HiSeq sequencing platform (Illumina, San Diego, CA, USA), and 150-bp paired-end reads were generated. After sequencing, the raw data were initially screened to remove adapter sequences and low-quality reads (Q < 20) and thus obtain clean, high-quality data. Using AIM-HII software , the clean reads were compared against the Populus trichocarpa genome sequence and the vector sequence to identify junction reads. First, the junction reads that aligned with both the reference genome sequence and the vector sequence were identified, and the T-DNA insertion sites and directions were then determined based on the alignment information associated with the junction reads.
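The core idea of junction-read detection, namely keeping reads that align to both the reference genome and the vector, can be sketched as below. This is not the AIM-HII algorithm; it is a simplified illustration that assumes alignments have already been reduced to (read ID, target sequence name) pairs and that the vector is a single sequence named "vector". All names and example records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_junction_reads(alignments: Iterable[Tuple[str, str]],
                        vector_name: str = "vector") -> Dict[str, List[str]]:
    """Map each junction read to the genomic sequences it also aligns to.

    A read counts as a junction read only if it has at least one alignment
    to the vector and at least one alignment to a non-vector sequence.
    """
    targets = defaultdict(set)
    for read_id, target in alignments:
        targets[read_id].add(target)

    junctions = {}
    for read_id, hit_targets in targets.items():
        genome_hits = hit_targets - {vector_name}
        if vector_name in hit_targets and genome_hits:
            junctions[read_id] = sorted(genome_hits)
    return junctions


# Invented alignment records, for illustration only.
example_alignments = [
    ("read1", "Chr03"), ("read1", "vector"),   # spans a genome/T-DNA junction
    ("read2", "Chr10"),                        # genome only
    ("read3", "vector"),                       # vector only
]
print(find_junction_reads(example_alignments))  # {'read1': ['Chr03']}
```

The insertion position and direction would then be inferred from the alignment coordinates and strands of each junction read, which this sketch leaves out.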
PCR verification of T-DNA insertion sites and directions
To verify the T-DNA insertion sites and directions, we designed primers based on the flanking sequences of the T-DNA insertion sites and the T-DNA sequence (Table S4), and amplified the genomic DNA of poplar 741 and Pb29. The PCR products corresponding to the sides of the junction reads undetected by NGS were purified and ligated into the pUCm-T vector. After 12 h at 16 °C, the plasmid was transformed into E. coli DH5α competent cells. The cells were shaken for 1 h in a constant-temperature shaker at 37 °C, and then plated onto a Luria-Bertani agar plate containing ampicillin and cultured for 8 h at 37 °C. Single colonies were selected and sent to Beijing Zhongke Xilin Biotechnology Co., Ltd. (Beijing, China) for sequencing, to determine the integrity of the flanking sequences.
Genome resequencing using Nanopore sequencing and data analysis
The genomic DNA of poplar 741 and Pb29 was resequenced using the Nanopore sequencing platform (Biomarker Technologies, Beijing, China). After extracting the genomic DNA of poplar 741 and Pb29, the purity, concentration, and integrity of the extracted DNA were inspected using a NanoDrop spectrophotometer, Qubit fluorometer (Invitrogen, Carlsbad, CA, USA), and 0.35% agarose gel electrophoresis, respectively. After passing the quality checks, the DNA samples were used to construct libraries and sequenced with Ligation Sequencing Kit 1D (SQK-LSK109; Oxford Nanopore Technologies, Oxford, UK). Low-quality reads, reads with adapters, and short sequencing reads (length < 500 bp) were filtered from the raw reads. Then, Minimap2 software (https://github.com/lh3/minimap2) was used to align the clean reads to the reference genome and vector sequences simultaneously. The junction reads thus obtained were saved in BAM file format, and the data were visualized with IGV software (http://www.broadinstitute.org/software/igv/) to locate the T-DNA insertion sites. By comparing the clean reads with the reference genome sequence, information such as alignment rate, sequencing depth, and coverage was calculated. Sniffles software was used to detect large SVs in the genome, such as insertions, deletions, duplications, inversions, and translocations, and the SV distribution was visualized using Circos software (http://circos.ca).
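The read-length filter mentioned above (discarding reads shorter than 500 bp) can be illustrated with a few lines of plain Python. This sketch assumes well-formed four-line FASTQ records and ignores the quality and adapter filters, which the text does not specify in detail; the function names are hypothetical and do not correspond to the tool actually used.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def parse_fastq(text: str) -> Iterator[Record]:
    """Yield records from FASTQ text made of strict four-line records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading "@"


def keep_long_reads(records: Iterator[Record], min_len: int = 500) -> List[Record]:
    """Discard reads shorter than min_len bp (length < 500 bp by default)."""
    return [rec for rec in records if len(rec[1]) >= min_len]


# Two synthetic reads: one exactly 500 bp, one 499 bp.
fastq_text = (
    "@long_read\n" + "A" * 500 + "\n+\n" + "I" * 500 + "\n"
    "@short_read\n" + "A" * 499 + "\n+\n" + "I" * 499 + "\n"
)
kept = keep_long_reads(parse_fastq(fastq_text))
print([name for name, _seq, _qual in kept])  # ['long_read']
```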
T-DNA and flanking sequence analysis
Some of the flanking sequences were obtained from the junction reads detected by NGS and from the Sanger sequencing reads of the PCR products; the remaining flanking sequences and the complete T-DNA sequences were extracted from the junction reads obtained by Nanopore genome resequencing. The flanking sequences and the T-DNA sequences were aligned against the genome and vector sequences, respectively, to determine their integrity at the insertion sites.
RNA-sequencing (RNA-seq) analysis of the expression levels of genes located near the insertion sites
To determine whether the T-DNA insertion affected the expression of genes upstream and downstream of the insertion sites, we analyzed leaves of poplar 741 and Pb29 using RNA-seq. Healthy, mature leaves were collected from a long branch in the upper part of mature poplar 741 and Pb29 trees that had been grown for 6 years in the test forest. Total RNA was extracted from the leaves using a plant RNA extraction kit (SENO Biological Technology Co., Ltd.) in accordance with the manufacturer’s instructions. The concentration and quality of the RNA samples were determined using a NanoDrop 2000 spectrophotometer and an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). After the quality of the RNA samples had been verified, a cDNA library was constructed for each sample and Illumina sequencing was performed by LC Bio Technology Co., Ltd. (Hangzhou, China). FPKM values were used to examine changes in the expression of genes upstream and downstream of the T-DNA insertion sites. Three biological replicates were sampled for each poplar line.
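As a reminder of what an FPKM value represents, the short sketch below computes it from a fragment count, a gene length, and the total number of mapped fragments, using the standard definition. The comparison step (a log2 ratio of replicate means with a pseudocount of 1) is an assumption added for illustration; the text does not state how expression changes were tested. All numbers are invented.

```python
import math
from statistics import mean


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def log2_fold_change(test_values, control_values, pseudocount=1.0):
    """log2 ratio of replicate means. The pseudocount is an illustrative
    choice that avoids division by zero for unexpressed genes."""
    return math.log2(
        (mean(test_values) + pseudocount) / (mean(control_values) + pseudocount)
    )


if __name__ == "__main__":
    # Invented example: 500 fragments on a 2,000 bp gene, 20 million mapped fragments.
    print(fpkm(500, 2000, 20_000_000))

    # Invented FPKM values for three biological replicates per line.
    pb29 = [12.1, 12.9, 12.5]
    wild_type = [12.4, 12.6, 12.5]
    print(log2_fold_change(pb29, wild_type))
```

Normalizing by both gene length and library size is what makes values comparable across genes and across samples. With three replicates per line, a formal differential-expression test would normally accompany a fold change such as this one.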
Identification of TAFs gene family members and expression of genes encoding TAF12 protein
The whole-genome and protein sequences of P. trichocarpa were downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/genome/98). Known TAF protein sequences from A. thaliana (downloaded from the Arabidopsis Information Resource; https://www.arabidopsis.org/) were used as queries in BLASTP searches against the P. trichocarpa protein sequences with an e-value cutoff of 1e-10. Redundant sequences were manually removed, and all candidate proteins were analyzed and verified using InterProScan (http://www.ebi.ac.uk/interpro/search/sequence-search) and the Conserved Domains Database (https://www.ncbi.nlm.nih.gov/cdd). A multiple sequence alignment of the TAF proteins was generated using ClustalW in MEGA 7 (https://www.megasoftware.net) with default parameters. A neighbor-joining phylogenetic tree was constructed from the alignment with the following settings: Poisson model, pairwise deletion, and 1000 bootstrap replications. PtTAFs gene duplication events were analyzed using the Multiple Collinearity Scan toolkit (MCScanX; http://chibba.pgml.uga.edu/mcscan2). The expression levels of the genes encoding TAF12 protein in leaves were analyzed using the RNA-seq data described above.
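The candidate-selection step can be sketched as follows, assuming the BLASTP results were written in the standard 12-column tabular format (`-outfmt 6`), in which column 2 is the subject ID and column 11 is the e-value. The function name and the sequence identifiers are hypothetical. Collapsing multiple hits to one entry per subject is only a simple stand-in for the manual redundancy removal described above.

```python
def filter_blast_hits(lines, max_evalue=1e-10):
    """Return {subject ID: best e-value} for tabular BLAST hits whose
    e-value does not exceed max_evalue."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):  # skip blanks and comments
            continue
        fields = line.rstrip("\n").split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        # Keep one entry per subject, with its smallest e-value.
        if subject not in best or evalue < best[subject]:
            best[subject] = evalue
    return best


# Invented hits, for illustration only.
EXAMPLE_HITS = [
    "AtTAF12\tPt_protA\t62.5\t200\t75\t0\t1\t200\t5\t204\t3e-80\t250",
    "AtTAF12b\tPt_protA\t55.0\t180\t81\t0\t1\t180\t10\t189\t1e-60\t190",
    "AtTAF6\tPt_protB\t30.0\t90\t63\t0\t1\t90\t1\t90\t2e-05\t45",
]

if __name__ == "__main__":
    print(filter_blast_hits(EXAMPLE_HITS))
```

Here Pt_protA is retained once, with its best e-value, and Pt_protB is discarded because its e-value exceeds the cutoff. Candidates passing this filter would still need domain verification, as done in the study with InterProScan and the Conserved Domains Database.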
Availability of data and materials
The datasets analysed during the current study are available in the Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra) under accession numbers PRJNA720519 and PRJNA720721.
PCR: Polymerase chain reaction
IGV: Integrative Genomics Viewer
FPKM: Fragments Per Kilobase per Million
NCBI: National Center for Biotechnology Information
GMOs: Genetically modified organisms
MCScanX: Multiple Collinearity Scan toolkit
Feng H, Guo J, Wang W, Song X, Yu S. Soil depth determines the composition and diversity of bacterial and archaeal communities in a poplar plantation. Forests. 2019;10(7):550. https://doi.org/10.3390/f10070550.
Zhu J, Tian J, Wang J, Nie S. Variation of traits on seeds and germination derived from the hybridization between the sections Tacamahaca and Aigeiros of the genus Populus. Forests. 2018;9(9):516. https://doi.org/10.3390/f9090516.
Wang P, Wei H, Sun W, Li L, Zhou P, Li D, et al. Effects of Bt-Cry1Ah1 transgenic poplar on target and non-target pests and their parasitic natural enemy in field and laboratory trials. Forests. 2020;11(12):1255. https://doi.org/10.3390/f11121255.
Wang G, Dong Y, Liu X, Yao G, Yu X, Yang M. The current status and development of insect-resistant genetically engineered poplar in China. Front Plant Sci. 2018;9:1048.
Zhao C, Wang J, Zhao J, Pang D, Zhang D, Yang M. Expression characteristics of Bt gene in transgenic poplar transformed by different multi-gene vectors. Scientia Silvae Sinicae. 2019;55(09):61–70.
State Council of China (State Council). Regulations on safety of agricultural genetically modified organisms. 2001. http://www.gov.cn/flfg/2005-08/06/content_21003.htm. Accessed 16 Mar 2020.
Ren YC, Zhang J, Liang HY, Wang JM, Yang MS. Inheritance and expression stability of exogenous genes in insect-resistant transgenic poplar. Plant Cell Tiss Org. 2017;130(3):567–76. https://doi.org/10.1007/s11240-017-1247-y.
Kersten B, Leite Montalvão AP, Hoenicka H, Vettori C, Paffetti D, Fladung M. Sequencing of two transgenic early-flowering poplar lines confirmed vector-free single-locus T-DNA integration. Transgenic Res. 2020;29(3):321–37. https://doi.org/10.1007/s11248-020-00203-0.
Kumar S, Fladung M. Gene stability in transgenic aspen (Populus). II. Molecular characterization of variable expression of transgene in wild and hybrid aspen. Planta. 2001;213(5):731–40. https://doi.org/10.1007/s004250100535.
Fladung M, Kumar S. Gene stability in transgenic Aspen-Populus III. T-DNA repeats influence transgene expression differentially among different transgenic lines. Plant Biol. 2002;4(3):329–38.
Chen JM, Carlson AR, Wan JM, Kasha KJ. Chromosomal location and expression of green fluorescent protein (gfp) gene in microspore derived transgenic barley (Hordeum vulgare L.). Acta Genet Sin. 2003;30(8):697–705.
Yang L, Wang C, Holst-Jensen A, Morisset D, Lin Y, Zhang D. Characterization of GM events by insert knowledge adapted re-sequencing approaches. Sci Rep. 2013;3(1):2839. https://doi.org/10.1038/srep02839.
EFSA. Guidance for risk assessment of food and feed from genetically modified plants. EFSA J. 2011;9:2150.
Liu YG, Whittier RF. Thermal asymmetric interlaced PCR: automatable amplification and sequencing of insert end fragments from P1 and YAC clones for chromosome walking. Genomics. 1995;25(3):674–81. https://doi.org/10.1016/0888-7543(95)80010-J.
Ochman H, Gerber AS, Hartl DL. Genetic applications of an inverse polymerase chain reaction. Genetics. 1988;120(3):621–3. https://doi.org/10.1093/genetics/120.3.621.
O’Malley RC, Alonso JM, Kim CJ, Leisse TJ, Ecker JR. An adapter ligation-mediated PCR method for high-throughput mapping of T-DNA inserts in the Arabidopsis genome. Nat Protoc. 2007;2(11):2910–7. https://doi.org/10.1038/nprot.2007.425.
Papazova N, Ghedira R, Glabeke SV, Bartegi A, Windels P, Taverniers I, et al. Stability of the T-DNA flanking regions in transgenic Arabidopsis thaliana plants under influence of abiotic stress and cultivation practices. Plant Cell Rep. 2008;27(4):749–57. https://doi.org/10.1007/s00299-007-0495-4.
Yang K, Wu XL, Lang CX, Chen JQ. Isolation of the flanking sequences adjacent to transgenic T-DNA in Brassica napus genome by an improved inverse PCR method. Agric Sci Technol. 2010;11(2):65–8 139.
Guo B, Guo Y, Hong H, Qiu L. Identification of genomic insertion and flanking sequence of G2-EPSPS and GAT transgenes in soybean using whole genome sequencing method. Front Plant Sci. 2016;7:1009.
Park D, Kim D, Jang G, Lim J, Shin Y, Kim J. Efficiency to discovery transgenic loci in GM rice using next generation sequencing whole genome re-sequencing. Genomics Inform. 2015;13(3):81–5. https://doi.org/10.5808/GI.2015.13.3.81.
Gang H, Liu G, Zhang M, Zhao Y, Jiang J, Chen S. Comprehensive characterization of T-DNA integration induced chromosomal rearrangement in a birch T-DNA mutant. BMC Genomics. 2019;20(1):311. https://doi.org/10.1186/s12864-019-5636-y.
Zhang Y, Zhang J, Lan J, Wang J, Liu J, Yang M. Temporal and spatial changes in Bt toxin expression in Bt-transgenic poplar and insect resistance in field tests. J For Res. 2016;27(6):1249–56. https://doi.org/10.1007/s11676-016-0254-x.
Tian YC, Zheng JB, Yu HM, Liang HY, Li CQ, Wang JM. Studies of transgenic hybrid poplar 741 carrying two insect-resistant genes. Acta Bot Sin. 2000;42(3):263–8.
Gang H, Li R, Zhao Y, Liu G, Chen S, Jiang J. The birch GLK1 transcription factor mutant reveals new insights in chlorophyll biosynthesis and chloroplast development. J Exp Bot. 2019;70(12):3125–38. https://doi.org/10.1093/jxb/erz128.
Williams-Carrier R, Stiffler N, Belcher S, Kroeger T, Stern DB, Monde RA, et al. Use of Illumina sequencing to identify transposon insertions underlying mutant phenotypes in high-copy Mutator lines of maize. Plant J. 2010;63(1):167–77. https://doi.org/10.1111/j.1365-313X.2010.04231.x.
Jupe F, Rivkin AC, Michael TP, Zander M, Motley ST, Sandoval JP, et al. The complex architecture and epigenomic impact of plant T-DNA insertions. PLoS Genet. 2019;15(1):e1007819. https://doi.org/10.1371/journal.pgen.1007819.
Nordlee JA, Taylor SL, Townsend JA, Thomas LA, Bush RK. Identification of a Brazil-nut allergen in transgenic soybeans. N Engl J Med. 1996;334(11):688–92. https://doi.org/10.1056/NEJM199603143341103.
Losey JE, Rayor LS, Carter ME. Transgenic pollen harms monarch larvae. Nature. 1999;399(6733):214. https://doi.org/10.1038/20338.
Xu J, Hu H, Mao W, Mao C. Identifying T-DNA insertion site(s) of transgenic plants by whole-genome resequencing. Hereditas (Beijing). 2018;40(8):676–82.
Chandler VL, Vaucheret H. Gene activation and gene silencing. Plant Physiol. 2001;125(1):145–8. https://doi.org/10.1104/pp.125.1.145.
Hilder VA, Barker RF, Samour RA, Gatehouse AMR, Gatehouse JA, Boulter D. Protein and cDNA sequences of Bowman-Birk protease inhibitors from the cowpea (Vigna unguiculata Walp.). Plant Mol Biol. 1989;13(6):701–10. https://doi.org/10.1007/BF00016025.
Tinland B, Schoumacher F, Gloeckler V, Bravo-Angel AM, Hohn B. The Agrobacterium tumefaciens virulence D2 protein is responsible for precise integration of T-DNA into the plant genome. EMBO J. 1995;14(14):3585–95. https://doi.org/10.1002/j.1460-2075.1995.tb07364.x.
Tang J, Scarth R, Fristensky B. Effects of genomic position and copy number of acyl-ACP thioesterase transgenes on the level of the target fatty acids in Brassica napus L. Mol Breed. 2003;12(1):71–81. https://doi.org/10.1023/A:1025495000264.
Cervera M, Pina JA, Juárez J, Navarro L, Peña L. A broad exploration of a transgenic population of citrus: stability of gene expression and phenotype. Theor Appl Genet. 2000;100(5):670–7. https://doi.org/10.1007/s001220051338.
Lu MZ, Hu JJ. A brief overview of field testing and commercial application of transgenic trees in China. BMC Proc. 2011;5(Suppl 7):O63. https://doi.org/10.1186/1753-6561-5-S7-O63.
Forsbach A, Schubert D, Lechtenberg B, Gils M, Schmidt R. A comprehensive characterization of single-copy T-DNA insertions in the Arabidopsis thaliana genome. Plant Mol Biol. 2003;52(1):161–76. https://doi.org/10.1023/A:1023929630687.
Ruprecht C, Carroll A, Persson S. T-DNA-induced chromosomal translocations in feronia and anxur2 mutants reveal implications for the mechanism of collapsed pollen due to chromosomal rearrangements. Mol Plant. 2014;7(10):1591–4. https://doi.org/10.1093/mp/ssu062.
Kim SR, Lee J, Jun SH, Park S, Kang HG, Kwon S. Transgene structures in T-DNA-inserted rice plants. Plant Mol Biol. 2003;52(4):761–73. https://doi.org/10.1023/A:1025093101021.
Kim SI, Gelvin SB. Genome-wide analysis of Agrobacterium T-DNA integration sites in the Arabidopsis genome generated under non-selective conditions. Plant J. 2007;51(5):779–91. https://doi.org/10.1111/j.1365-313X.2007.03183.x.
Magori S, Citovsky V. Epigenetic control of Agrobacterium T-DNA integration. Biochim Biophys Acta. 2011;1809(8):388–94. https://doi.org/10.1016/j.bbagrm.2011.01.007.
Deng L, Deng X, Wei S, Cao Z, Tang L, Xiao G. Development and identification of herbicide and insect resistant transgenic plant B1C893 in rice. Hybrid Rice. 2014;29(1):67–71.
Jiang X, Xiao G. Detection of unintended effects in genetically modified herbicide-tolerant (GMHT) rice in comparison with nontarget phenotypic characteristics. Afr J Agric Res. 2010;5(10):1082–8.
Dai XX. Isolation of flanking sequences from a rice T-DNA insertional mutant library and function study of OsBC1L family genes. Ph.D. Thesis. Wuhan: Huazhong Agricultural university; 2009.
Liu H, Lu H, Luo L, Zhu ML. Phenotypic analysis of a rice flag leaf mutant and T-DNA flanking genes. Plant Sci J. 2017;35(5):708–15.
Esher SK, Granek JA, Alspaugh JA. Rapid mapping of insertional mutations to probe cell wall regulation in Cryptococcus neoformans. Fungal Genet Biol. 2015;82:9–21. https://doi.org/10.1016/j.fgb.2015.06.003.
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–100. https://doi.org/10.1093/bioinformatics/bty191.
Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, Von Haeseler A, et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 2018;15(6):461–8. https://doi.org/10.1038/s41592-018-0001-7.
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19(9):1639–45. https://doi.org/10.1101/gr.092759.109.
Wang Y, Tang H, DeBarry JD, Tan X, Li J, Wang X, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:e49.
This study was supported by the National Key Research and Development Program of China (2016YFD0600401).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no conflict of interest.
The summary of sequence data from NGS.
The junction reads obtained by NGS in the Pb29 genome.
The physical characteristics of TAFs gene family in Populus trichocarpa.
Primer sequences for verifying T-DNA insertion sites.
Genome-wide distribution of read coverage of poplar 741 and Pb29.
The distribution of SV variants on chromosomes in genomes of poplar 741 and Pb29. From outside to inside: chromosome coordinates (Mb), insertion, deletion, inversion, duplication and translocation.
Partial alignment result of sequence data of poplar 741 with P. trichocarpa genome.
Phylogenetic analysis (a) and synteny analysis (b) of PtTAFs family genes.
Chen, X., Dong, Y., Huang, Y. et al. Whole-genome resequencing using next-generation and Nanopore sequencing for molecular characterization of T-DNA integration in transgenic poplar 741. BMC Genomics 22, 329 (2021). https://doi.org/10.1186/s12864-021-07625-y
- Transgenic poplar 741
- Integration site
- Copy number