Comparison of RAD and 2b-RAD tags
The REs recognition sites, including EcoRI, SbfI and HindIII for RAD, BsaXI and AlfI for 2b-RAD, were detected on the reference genomes containing 10 plants and 10 animals. Each sequence of “GAATTC” (EcoRI), “CCTGCAGG” (SbfI) and “AAGCTT” (HindIII) tested on the genomes was regarded as one recognition site, producing two tags for RAD [5]; each sequence of “ACNNNNNCTCC”, “GGAGNNNNNGT” (both for BsaXI) and “GCANNNNNNTGC” (AlfI) was regarded as one recognition site, producing one tag for 2b-RAD, respectively.
The EcoRI was one of the most commonly used REs for RAD. For 2b-RAD, the recognition sites of BsaXI were much more than AlfI by simulation. The recognition bases of EcoRI and BsaXI were both six. So the application potential of EcoRI and BsaXI were compared. The genomes size was divided by EcoRI sites numbers. The average fragment length was approximately 4000 bp of the 20 species, indicating that EcoRI recognition sites located once per 4000 bp. Each 4000 bp sequence was regarded as a window and represented a simulated tiny scaffold. The recognition sites equaled 0, 1 and more than 1 in these scaffold denoted different potentiality of assembly. Zero indicated that the scaffold was impossible to be connected to any chromosome; one indicated the scaffold could be connected with no orientation; more than one indicated potential perfect assembly with an orientation. The sites of EcoRI and BsaXI were detected in these tiny scaffolds to estimate their potentiality for chromosome assembly.
F2 populations
The F2 population materials including two F0 parents and 277 F2 progeny were gained from the rice genetic breeding laboratory of China Agricultural University. Of the two parents, the female was Nipponbare while the male was a stable recombinant inbred line for five generations of Oryza sativa spp. Japonica line whose ancestry derived from a cross between two Japonica varieties. One was Nipponbare, and the other was a Chinese landrace with the name of Mayi danru. The parents and their progeny, altogether 279 DNA were extracted from their fresh leaves according to Doyle’s protocol [20]. The DNA only with a lowest concentration of 50 ng/μL and no degradation were able to be applied for the following 2b-RAD library.
Adapter design
To create a simple and quick 2b-RAD library approach with pooling procedure, we designed two kinds of adapters. Adapter1 with 5–9 bp barcodes was complementary to the Illumina multiplexing PCR primer 1.0; adapter2 was complementary to index primer. The digested fragment of BsaXI was 33 bp with 3 bp random overhangs on the 3′ ends. The sequences of the adapter1 were:
5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCTxxxxxNNN-3′ and 5′-yyyyyAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-3′, the “xxxxx” and “yyyyy” denoted the barcode and its complement sequences. The NNN was complementary to the 3 bp sticky end generated by BsaXI (“N” was a random base of A, G, C, T and there were 64 kinds of combinations altogether). The adapter2 was:
5′- AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -3′ and 5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNN-3′. The annealed adapter1 and adapter2 were adjusted to 5 μM as working concentration.
The BsaXI digested fragment was only 33 bp and it would be completely sequenced by single-end 50 bp (SE 50), so a little longer (9 bp) barcodes would not reduce the tag sequence. The barcodes were designed according to Poland JA’s criteria [12] and made slight modification. (1) The lengths of barcodes were different form 5 to 9 bp to maximize the balance of bases at each position, especially in the BsaXI recognition sites. (2) The barcodes must be two or more bp different from all other barcodes. (3) The barcodes can’t contain or recreate (after ligation step) BsaXI restriction site. A set of 12 barcode sequence were designed (Additional file 1).
2b-RAD library preparation
The concentrations of all DNA samples were adjusted to 50 ng/μL. The DNA (200 ng) was digested in 10 μL reaction volume of NEB Buffer 4 with 1 U BsaXI (New England BioLabs Inc, Catalog # R0609L) at 37°C for 2 h. An additional DNA was digested simultaneously to detect the digestion efficiency by 1% agarose gel electrophoresis. The primary DNA band disappeared and became disperse, indicating a successful digestion. Then the ligation reaction was completed in the same tube as the digestion, combining the remaining digested DNA with 2 μL of the adapter1, 2 μL of the adapter2, 2 μL ligase buffer and 400 U of the T4 ligase (New England BioLabs Inc, Catalog # M0202L). BsaxI could not be inactivated by heating, so the digested productions was recommended performing the ligations in 4°C for 1 hour, then hold on ice [14].
For exploring the relationship between sequencing data size and enzyme sites coverage, each parents repeated 4 times in library experiment. All 285 DNA samples (277 progeny +4 female +4 male) were divided into 24 groups. Each group contained 12 samples and the last group contained 9 samples. The samples in each group were ligated with different 12 adapters.
Twelve samples in a group were pooled together when they were purified using purification kit (QIAquick PCR Purification Kit, Catalog # 28106). The ligated productions were completely combined to keep the amount of pooled DNA as 1 ~ 1.5 μg, which was sufficient for the following PCR procedure. The samples in each group with different barcode adapters were gathered to a tube which had been added the wash buffer of purification kit beforehand. Then the pooled DNA was purified according to the manufacturer’s instructions and eluted in 25 μL EB (elution buffer).
The Illumina multiplexing PCR primers were used for amplification. The sequences were:
5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′ (multiplexing PCR primer 1.0) and
5′-CAAGCAGAAGACGGCATACGAGATXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′ (index primer), where the six “X” represented a 6 bp index. The index primer sequences were derived from a BGI patent (http://www.google.ca/patents/WO2012037880A1?hl=zh-CN&cl=en) [21]. For each PCR, we combined 50 ng of pooled DNA, 1 μL of each primer (10 μM), buffers, nuclease-free water and Phusion polymerase (New England BioLabs Inc, M0531) at a final 50 μL total volume. Temperature cycling consisted of 98°C for 30 seconds followed by 12 cycles of 98°C for 30 seconds, 65°C for 30 seconds, 72°C for 30 seconds with a final Taq extension step at 72°C for 5 minutes. Four kinds of index primers were used in this study for distinguishing the libraries (Additional file 1). The expected tag fragment after PCR was from 160 bp to 164 bp (5–9 bp barcode). PCR production was detecting by 2% agarose gel electrophoresis, and then the 150–200 bp bands were cut and purified by the purification gel kit (QIAquick Gel Extraction Kit, 28704) and eluted in 30 μL EB.
The quantify molarity and library fragment size distribution of cleaned DNA were detected by an Agilent Bioanalyzer. Quantification was conducted by qPCR. SE50 sequencing of four 12-plex libraries per flowcell channel lane was performed on Hiseq 2000 platform (Illumina, Inc.). Totally six lanes were used.
The detailed protocol of whole 2b-RAD library procedure was available on Additional file 2. The complementary relationships of all the sequence used in 2b-RAD library were showed on Figure 1.
Filtering raw sequence data
The reads filtering steps were performed by our own Perl scrip as follows. (1) Matched one of the 12 barcodes allowing one mismatch. After the reads were assigned into each sample, the 5–9 bases barcode were removed. (2) The reads following on the heels of the barcode should perfectly matched BsaXI fragment “NNNNNNNNNNNNACNNNNNCTCCNNNNNNNNNN” or “NNNNNNNNNNGGAGNNNNNGTNNNNNNNNNNNN” (these “N” means random sequences of BsaXI digestion fragments) with no “Ns” (“Ns” means the bases which were failed to be sequenced by Hiseq 2000). Then the reminder bases of each read were deleted. Two kinds of 33 bp tags containing the recognition sites “ACNNNNNCTCC” or “GGAGNNNNNGT” were obtained. They were regarded as high quality reads.
The two cohesive ends of BsaXI digestion fragment were identical, so the adapter1 was probably ligated to any ends of the fragment, as well as the adapter2. So a fragment with many copies after DNA extracting and digestion, was ligated to adapters of couple of possibilities. However, only the fragment which was ligated adapter1 and adapter2 simultaneously could be PCR amplified. The fragment with only 33 bp length, was probably sequenced both form plus strand and minus strand (Figure 2). In order to solve this problem, all the tags contained “GGAGNNNNNGT” were translated to their reverse compliment sequences, namely, the form of “ACNNNNNCTCC”. And then the tags contained only one strand of the digestion fragments.
The tag mapping
For estimating BsaXI sites coverage, firstly, simulant detections of the BsaXI sites in Nipponbare reference genome were carried out. The 33 bp fragments in reference containing recognition sites of “GGAGNNNNNGT” or “ACNNNNNCTCC” were picked out. Then these sequences were mapped to reference using Bowtie (version 0.12.7) with the parameter of -m 1 -v 2 allowing 2 mismatches. The sequences which aligned with only one location were regarded as unique tags and the expected potential markers.
Secondly, the high quality reads of 285 samples were mapped to Nipponbare reference (Bowtie -m 1 -v 1). The mapped reads were detected whether they covered simulant tags.
To look for the relationship between sequencing data size and BsaXI sites coverage, a testing was performed based on several combinations of 4 repeated female samples. The female was Nipponbare, as well as the reference genome. Therefore the testing could accurately reflect the relationship.
After digestion, the fragments between two recognition sites also possessed sticky end “NNN” as follows.
The structure made the fragments possible to be ligated with adapters and performed PCR amplification. If their length was nearly 33 bp, they would not be removed in the agarose gel size selection and be sequenced. So the reads matched barcodes while not matched BsaXI recognition sites were analyzed. After removing redundant sequences (barcode and part of atapter2), they were mapped to Nipponbare reference genome (Bowtie -m 1 -v 1).
SNP calling with and without reference
Any two of the female samples combination could cover nearly 100% of the simulant REs sites, so we combined female-2 and female-3 together (with the lowest data size) to perform SNP-calling. Same treatment to male samples of male-2 and male-4 were performed.
The SNP calling without references were performed using Stacks (version 1.01) [22] with four programs. The parameters were -m 2 -M 2 -N 1 for ustacks, -n 2 for cstacks, default for sstacks and -m 2 -t F2 -o joinmap -c --min_hom_seqs 2 --min_het_seqs 0.010 --max_het_seqs 0.011 for genotypes. The “-m” in genotypes means minimum reads depth for genotype. We used –m 2 as the depth and the automated corrections system (-c) in consideration of the lower sequencing depth of the progeny. The markers of aa x bb style were selected for genetic map. Meanwhile, the female tags of these markers were mapped to Nipponbare (Bowtie -m 1 -v 1).
To construct the genetic linkage map, the markers derived from Stacks were pretreated. The markers with less than 20% missing rate (<54 samples) were preserved. The linkage analysis was performed using JoinMap (version 4.1, http://www.Joinmap.nl). The SNP genotypes in F2 population were expected to segregate at a 1:2:1 ratio. Distorted markers (P <0.01) were filtered by χ2 test. The parameters using JoinMap were “independence LOD” for grouping method, “regression mapping” for mapping algorithm, “Kosambi’s” for mapping function method. The groups with less than 5 markers were discarded. Each linkage map marker’s location in alignment result was retrieved to measure accuracy of the genetic map. Of the sorted markers via genetic distances in each group, the one which was significantly opposite the mainstream of the order (more than 2Mbp to its adjacent two markers) was regarded as “order error” marker.
To roughly evaluate SNP numbers between two parents, the other way for genotyping relied on reference genome. Firstly, reads of two parents were mapped to the Nipponbare reference using SOAP2 (version 2.21) with default. The aligned reads were performed SNP calling using SOAPsnp (version 1.03) with default. The SNPs in cns file (SOAPsnp result file) were preserved only when the quality score of consensus genotype were more than 10. After integrating their SNP results, the genotype of aa x bb style between two parents were obtained. The two results between SOAPsnp and Stacks were compared.
Comments
View archived comments (1)