RNA-seq based SNPs in some agronomically important oleiferous lines of Brassica rapaand their use for genome-wide linkage mapping and specific-region fine mapping
© Paritosh et al.; licensee BioMed Central Ltd. 2013
Received: 30 November 2012
Accepted: 1 July 2013
Published: 9 July 2013
Brassica rapa (AA) contains very diverse forms which include oleiferous types and many vegetable types. Genome sequence of B. rapa line Chiifu (ssp. pekinensis), a leafy vegetable type, was published in 2011. Using this knowledge, it is important to develop genomic resources for the oleiferous types of B. rapa. This will allow more involved molecular mapping, in-depth study of molecular mechanisms underlying important agronomic traits and introgression of traits from B. rapa to major oilseed crops - B. juncea (AABB) and B. napus (AACC). The study explores the availability of SNPs in RNA-seq generated contigs of three oleiferous lines of B. rapa - Candle (ssp. oleifera, turnip rape), YSPB-24 and Tetra (ssp. trilocularis, Yellow sarson) and their use in genome-wide linkage mapping and specific-region fine mapping using a RIL population between Chiifu and Tetra.
RNA-seq was carried out on the RNA isolated from young inflorescences containing unopened floral buds, floral axis and small leaves, using Illumina paired-end sequencing technology. Sequence assembly was carried out using the Velvet de-novo programme and the assembled contigs were organised against Chiifu gene models, available in the BRAD-CDS database. RNA-seq confirmed the presence of more than 17,000 single-copy gene models described in the BRAD database. The assembled contigs and the BRAD gene models were analyzed for the presence of SSRs and SNPs. While the number of SSRs was limited, more than 0.2 million SNPs were observed between Chiifu and the three oleiferous lines. Assays for SNPs were designed using KASPar technology and tested on a F7-RIL population derived from a Chiifu x Tetra cross. The design of the SNP assays were based on three considerations - the 50 bp flanking region of the SNPs should be strictly similar, the SNP should have a read-depth of ≥7 and no exon/intron junction should be present within the 101 bp target region. Using these criteria, a total of 640 markers (580 for genome-wide mapping and 60 for specific-region mapping) marking as many genes were tested for mapping. Out of 640 markers that were tested, 594 markers could be mapped unambiguously which included 542 markers for genome-wide mapping and 42 markers for fine mapping of the tet-o locus that is involved with the trait tetralocular ovary in the line Tetra.
A large number of SNPs and PSVs are present in the transcriptome of B. rapa lines for genome-wide linkage mapping and specific-region fine mapping. Criteria used for SNP identification delivered markers, more than 93% of which could be successfully mapped to the F7–RIL population of Chiifu x Tetra cross.
The Next-Generation Sequencing (NGS) technologies are being extensively used for genome-wide genetic marker development through RNA-seq, reduced-representation sequencing, restriction-site-associated DNA sequencing (RAD-seq) and low-coverage genotyping . Availability of abundant markers will facilitate association mapping, marker aided selection (MAS), and fine mapping of regions of interest for circumventing the problem of linkage drag during introgressions and for map based cloning.
NGS technologies have also contributed to completion of reference genome sequences of many important crops [2, 3]. Availability of reference genomes will facilitate characterization of variability within a crop and its wild relatives by high throughput re-sequencing. In the family Brassicaceae, model species Arabidopsis thaliana was sequenced by the Sanger method using aligned overlapping BACs . Using NGS technologies, a large number of ecotypes have been sequenced in a much shorter span of time . Eventually 1001 ecotypes will be sequenced.
The first crop species sequenced from Brassicaceae is Brassica rapa (2n = 20, AA genome) . The assembled sequence of 283.8 Mb covers more than 98% of the gene space. Sequencing was carried out using Illumina GA II technology. Sequence data was integrated with BAC-end sequences obtained through the Sanger sequencing method. Sequencing work was carried out on line Chiifu, a leafy vegetable type of B. rapa belonging to ssp. pekinensis. The genome sequence of Chiifu is available on BRAD, a genomic database created for B. rapa and other Brassica species [7, 8].
Genus Brassica contains some of the most important vegetable and oleiferous crops of the world. The relationship of the six crop species namely, B. rapa (AA, 2n = 20), B. nigra (BB, 2n = 16), B. oleracea (CC, 2n = 18), B. juncea (AABB, 2n = 36), B. napus (AACC, 2n = 38), B. carinata (BBCC, 2n = 34) was first described by U  and later confirmed by others using molecular markers . Crop Brassicas display a range of morphotypes, which include vegetable types where root, leaves, stems and inflorescence have been modified for human consumption, oilseed types and condiment types, all selected under domestication . Three of the species namely B. juncea, B. napus and B. carinata are recent allopolyploids with the full chromosome compliment of the two parental genomes. The three diploids are paleohexaploids with extensive chromosomal rearrangements, gene subfunctionalization and loss [6, 12, 13]. Comparative genomic studies have shown that the gene blocks identified in A. lyrata and A. thaliana are represented at least three times in B. rapa, although every gene in a block is not necessarily represented by three paralogs [12–15].
B. rapa as a species shows enormous morphological variability, containing both vegetable types and oilseed types, and has extensive geographical distribution [11, 16]. Different morphotypes have been classified under subspecies (ssp.). There is however, no consensus on the relationship of different types . A recent classification has recognised 10 ssp. in B. rapa. The most extensive study to date on variability within B. rapa, conducted on 161 accessions with AFLP markers, has shown that oleiferous types of a region are closer to the vegetable types of that region rather than to the oleiferous types of the other regions, thereby implying independent domestication of the oleiferous types in many regions .
In the present study we have carried out RNA-seq of three different oleiferous lines of B. rapa namely, YSPB-24 and Tetra (both belonging to the Yellow sarson group, ssp. trilocularis), Candle (turnip rape, ssp. oleifera) and a vegetable type line Chiifu using Illumina GA II technology to find out if sufficient numbers of SNPs are available for genome-wide mapping and for fine mapping in specific regions of the genome. YSPB-24 has a typical bilocular ovary and Tetra is an interesting variant in the Yellow sarson group as it has a tetralocular ovary. The most probable region of origin of the Yellow sarson lines is Eastern India. These lines are extensively grown in this region. Candle is an oilseed line of European origin. Therefore, the study includes two closely related oleiferous lines (YSPB-24 and Tetra), which are distant from the oleiferous line Candle. All the three lines are divergent from the leafy vegetable type line Chiifu that has been sequenced recently.
RNA-seq can provide the most informative SSRs and SNPs for gene synteny based comparative genomics [19–22] and association mapping. A number of programs have been developed for SNP identification from the NGS data [23, 24]. Marking SNPs in the single-copy genes has been worked out reasonably well. However, in paleoploid species like B. rapa we require SNPs not only for marking allelic variation but also for marking the paralogs as has been the case for more recent allopolyploid species like wheat or B. napus[25–28].
We report that a sufficient number of SNPs are available in the B. rapa single-copy genes for genome-wide mapping and in the paralogs to mark both the allelic differences and paralog specific differences to saturate a specific region with unique marker probes. This strategy for genome-wide mapping and specific-region fine mapping has been tested using a F7-RIL population of a Chiifu x Tetra cross using KASPar oligo technology . A total of 542 SNPs have been mapped on the 10 linkage groups of B. rapa and 52 SNPs have been mapped in a region of LG A4, which we have recently shown to contain the tet-o locus that encodes the trait - tetralocular ovary [unpublished].
Plant material and sequencing
Four Brassica rapa lines – YSPB-24 and Tetralocular (Yellow sarson types, ssp. trilocularis), Candle (ssp. oleifera) and Chiifu (ssp. pekinensis) were used for RNA sequencing. Inflorescence, with all the unopened flower buds, along with a few small leaves was used for RNA extraction. This amalgam of organs represents most of the aerial tissues of a Brassica plant. Paired end cDNA libraries, used for sequencing, were prepared from the poly-A containing RNA and sequenced as 2x101 nt reads on the Illumina GAIIX sequencer.
Sequencing samples of Tetra, YSPB-24 and Candle were run in two lanes of the flow cell for sequencing whereas Chiifu sample was run in only one lane. As information on more than 98% of the gene space is available for Chiifu and has been organized as CDS in the BRAD database , the limited Chiifu transcriptome sequencing was carried out to check the overall quality of the assembly of the raw sequencing data obtained in this study.
Data filtering and de-novoassembly of the transcriptome
Sequencing and assembly statistics of four different lines of B. rapa
Total number of reads
Paired ends (after filtering low quality reads)
Number of contigs
Percentage of reads assembled
Maximum length of contigs (bp)
N50 contig length (bp)
Mean depth of the contigs
Sequence comparison of YSPB-24, Tetra and Candle with Chiifu
We compared the Chiifu transcriptome sequence obtained in this study with the Chiifu gene models described in the BRAD database . A total of 43,110 out of 44,972 contigs obtained in this study could be aligned with the CDS sequences reported in the BRAD database using the parameter of minimum 96% identity, indicating quality assembly of the raw reads in this study. However, the assembled contigs of Chiifu and the three oleiferous type lines were predominantly partial sequences (ESTs). Therefore, in the rest of the study the contigs obtained from the three oleiferous lines – YSBP-24, Tetra and Candle were compared with the more detailed full-length CDS (predicted gene models) available for the line Chiifu in the BRAD database.
Homology based grouping of contigs of three oleiferous B. rapa lines with Chiifu
Number of repeat motifs identified in the gene models of Chiifu and transcriptome sequences of the three lines of B. rapa
Identification of SNPs between different lines of B. rapa
Number of SNPs identified between Chiifu and three lines of B. rapa using Maq and MUMmer tools
Total SNPs (filtered)
Two copy genes
Three copy genes
(a) Maq software output
Chiifu vs. Tetra
Chiifu vs. YSPB-24
Chiifu vs. Candle
Tetra vs. YSPB-24
Tetra vs. Candle
YSPB-24 vs. candle
(b) MuMmer tool output
Chiifu vs. Tetra
Chiifu vs. YSPB-24
Chiifu vs. Candle
Tetra vs. YSPB-24
Tetra vs. Candle
YSPB-24 vs. candle
When homologs of different B. rapa lines (Additional file 1) were analysed using the MUMmer tool, 249,671, 231,259 and 266,349 SNPs were identified between ‘Chiifu and Tetra’, ‘Chiifu and YSPB-24’ and ‘Chiifu and Candle’, respectively (Table 4b). These were further categorized on the basis of their being present in single-copy genes, or in genes with two or three paralogs.
For the identification of SNPs between lines other than Chiifu, homologs were compared in pair wise combinations of ‘Tetra vs. YSPB-24’, ‘Tetra vs. Candle’ and ‘YSPB-24 vs. Candle’ using both Maq (Table 4a) and MUMmer programs (Table 4b). Very high SNP frequencies were found for Candle vs. the two Yellow sarson lines. The least amount of polymorphism (20,310 SNPs with the MUMmer program) was recorded between Tetra and YSPB-24, the two closely related lines.
In general, the number of SNPs recoded when the three oleiferous lines were compared with each other was lesser than the number recorded when the assembled contigs were compared with the full-length Chiifu CDS sequences available in the BRAD database (Table 4). As the contigs generated through Velvet assembly in the study were partial sequences, increasing the coverage could provide more SNPs for fine mapping.
Marker development from the identified SNPs
A survey of the BRAD database showed that 42% of the gene models have been reported as single-copy genes and these are well distributed throughout the B. rapa genome (Additional file 1). Developing markers from the single-copy genes for genome-wide linkage analysis was therefore considered to be the ideal strategy.
We used Chiifu vs. Tetra polymorphism data for SNP-marker development as these could be tested on a F7-RIL population of Chiifu x Tetra developed in our laboratory. SNPs were identified using the following criteria
SNPs should be surrounded with a conserved flanking region of ≥50 bp on both sides. This length of sequences would allow flexibility in positioning the oligos for amplification.
The region of 101 bp, containing the variable base at the middle, should not have any exon-intron junctions.
Read-depth of each of the identified SNPs should be ≥7.
A total of 6,451 single-copy genes that showed ≥ 90% sequence identity between Chiifu and Tetra were compared for SNPs. Applying the first criteria of 50 bp conserved sequences around the SNP, the number of useful contigs was reduced to 4,990. The number got reduced to 2,836 when sequences containing the intron/exon junctions were removed. Further application of the read-depth criteria reduced the number of contigs to 2,113, which could be used for marker development. To test our selection criteria for SNP marking, a set of 580 sequences, all representing single-copy genes that are well distributed throughout the genome were selected for developing SNP assays.
For specific-region mapping, the region around the tet-o locus was targeted. This locus has been earlier mapped (unpublished) on LG4 (A4) in a region that contains the S and I blocks of this linkage group. Sixty genes from this area were selected for developing SNP markers of which 42 were single-copy genes and 18 had two or three paralogs present in the genome. For such multi-copy genes, the paralogs were aligned and both the allele-specific SNP and paralog-specific variations (PSVs) were marked. Allele specific SNPs were selected wherein the 50 bp on either side of the SNP had some paralog specific variations (PSVs), so that locus specific primers could be designed for marking the specific paralog of interest.
Oligos for SNP genotyping were synthesized by KASPar technology [http://www.kbioscience.co.uk] using FRET quencher oligos competitor allele specific arrays. A total of 640 SNP based markers were developed between Chiifu and Tetra lines. The sequence of the marker oligos are provided in Additional file 2.
Linkage mapping in Chiifu x Tetra F7RIL population
Characteristics of 10 linkage groups of B. rapa map constructed with 594 SNP and 138 IP and SSR markers
Total number of markers
No. of intervals
Average interval size (cM)
We have earlier developed a linkage map of three different mapping populations in B. rapa – Chiifu x Tetra RIL-F6, Chiifu x Tetra F2 and Chiifu x YSPB-24 F2 using IP (genic) and SSR makers (mostly non-genic). These maps have now been integrated with the SNP map. A comparison of the features of the four parental maps, i.e. CTF2, CTF6, CYF2 and CTF7 is given in Additional file 3. IP and SSR markers genotyped on CTF7 in this study have been mapped previously in all these three mapping populations (unpublished) and these were used as anchor markers for developing the integrated map.
The integrated map has a total of 1,036 markers (211 IP, 230 SSR and 594 SNP) and a morphological marker ‘tet-o’. The features of the integrated map have been described in Additional file 4. The total genetic length spanned by the 10 linkage groups of the integrated map was 831.0 cM. The new integrated map is shown in Additional file 5.
Although NGS technologies can be used in a variety of ways for mapping, we have opted in this study to use them for testing whether RNA-seq could provide adequate genic SSRs and SNPs for both genome-wide linkage analysis and for fine mapping of a specific region. Our results show that the number of polymorphic genic SSRs is rather limited, but abundant SNPs are available between the distantly related (Chiifu and Tetra, Chiifu and YSPB-24 and Chiifu and Candle) lines. The extent of polymorphism is low between the two closely related lines, YSPB-24 and Tetra. In general, the numbers of SNPs available are adequate both for genome-wide mapping and specific-region fine mapping.
A number of techniques are available for marking SNPs . These have been broadly classified as allele specific hybridization, allele-specific single-base extension and allele specific enzymatic cleavage. Different methods have been developed for detection of allele specific products and a number of technology platforms have been developed for allele based sequence determination. All the technologies have been used in one or the other study of SNP based mapping in plants. We chose KASPar technology as it seems to be the most appropriate technology for the most frequently encountered mapping situations in crop genetics – (a) relatively small populations and a reasonable number of markers for genome-wide linkage mapping and (b) large populations and a small number of markers for specific-region fine mapping. Markers for 640 SNPs were developed and 594 could be successfully mapped. The technology also allows development of SNP markers that can differentiate paralogs and allelic differences in one reaction.
Considerable difficulty has been encountered in marking allelic SNPs in allotetraploid species like Triticum aestivum (wheat) [28, 35, 36], Gossypium hirsutum (cotton)  and B. napus (rapeseed)  due to the presence of homeologous chromosomes . B. rapa, in contrast to the three species mentioned above, is an ancient paleoploid with three genomes that have gone through extensive gene fractionation and chromosomal rearrangements. At the genomic level there has been extensive gene loss, leading to many genes being present as single-copy genes besides some having two or three paralogs .
The data available for Chiifu in the BRAD CDS database and NGS based RNA-seq carried out in this study show that a very large proportion of genes present in B. rapa (42% in Chiifu) exist as single-copy genes and the nucleotide polymorphism that exists between the single-copy genes of various lines is sufficient for genome-wide as well as specific-region fine mapping. Sufficient nucleotide level polymorphism also exists between the paralogs to mark these through PSVs. In our study, > 92% of identified SNPs could be converted to successful assays using the KASPar technology. Our success with KASPar markers is higher than what has been reported in wheat using this technology . The possible reasons for this could be – (a) a reference genome is available in B. rapa, (b) the frequency of single-copy genes in the genome is very high and (c) more stringent criteria were used in this study for developing SNP detection assays.
Use of molecular markers for genetic mapping in B. rapa began with an extensive use of RFLP markers . Use of AFLP and SRAP markers provided more extensive marker densities [39, 40]. These markers, though abundant, are anonymous and do not provide any information on genomic synteny and therefore, are difficult to use for fine-mapping. Markers obtained from the gene space of a species, are most informative. EST-cDNA probes [14, 41], genic SSRs [42, 43], intron polymorphism (IP) markers [15, 43] and InDels (Sequence Tagged Sites, STS) , have all been successfully used for comparative mapping and studying genomic synteny amongst the Brassica species belonging to the U’s triangle .
The large number of SNPs available in B. rapa will allow more involved genome-wide linkage mapping and also association studies. A core set of 168 B. rapa lines has been identified and used in a genome wide association study using predominately AFLP markers . It should be possible to use SNPs in future genome-wide association studies. However, we expect that major emphasis in Brassica species will be on mapping specific traits like disease [46–49] and pest resistance and QTL for yield [50–52] through the use of populations derived from biparental crosses. This is evident from extensive work that has been carried out on mapping of such traits using biparental crosses. Precise introgressions while avoiding linkage drag, particularly when unadapted germplasm is used, is going to be the key to improvement of a large number of vegetable and oleiferous crops available within the Brassica species belonging to the U’s triangle. The ready availability of SNPs for both background selection (general markers) and precise introgression (specific-area markers) will help in introduction from unadapted to adapted germplasm and from the diploid species to allotetraploid species. Specific-area marking and mapping of genes, as has been shown in this study for the region containing the tet-o locus, will also be useful for map based cloning.
Plant material, RNA extraction and library preparation
The four Brassica rapa lines - YSPB-24, Tetralocular (Yellow sarson types, ssp. trilocularis, seeds procured from Indian Agriculture Research Institute, India) Candle (ssp. oleifera, seeds procured from Gerhard Rakow, Agri-Food, Canada), Chiifu (ssp. pekinensis, seeds procured from Lim Yong Pyo, Chungnam National University, South Korea) used for transcriptome analysis were grown in the field during the mustard growing season (October – March). Tissues for RNA isolation were taken from the field-grown plants at the time of flowering.
Inflorescence with unopened flower buds along with a few small leaves, was used for RNA extraction. Harvested tissues were immediately frozen in liquid nitrogen. Total RNA was isolated using Total RNA Spectrum Kit (Sigma), following the manufacturer’s instructions. Contaminating DNA was removed by DNase treatment (DNaseA Kit, Ambion). RNA was further purified by treatment (thrice) with acidic phenol: chloroform (1:1). RNA was quantified using Nanodrop ND1000 spectrophotometer (Nanodrop Technologies). Integrity of the obtained RNA samples was checked on Agilent 2100 Bio analyzer. RNA samples with RIN value ≥ 7 were used for further experiments.
Paired end cDNA libraries, used for sequencing, were prepared from 20 μg of total RNA using predominantly the reagents available in the mRNA-seq Sample Preparation Kit (Illumina). mRNA was isolated from the total RNA with magnetic oligo (dT) beads. Purified mRNA was fragmented by treatment with divalent cations for 5 min (solution is provided in the kit). The obtained mRNA fragments were transcribed into first strand cDNA using Superscript II reverse transcriptase (Invitrogen), followed by second-strand cDNA synthesis using DNA polymerase and RNaseH. Double stranded cDNA molecules were purified by a QIAquick PCR purification kit (Qiagen). End repair of the double-stranded cDNA was carried out using T4 DNA polymerase, the Klenow DNA polymerase and T4 polynucleotide kinase. DNA was purified again using a QIAquick PCR purification kit. This was followed by a single ‘A’ base addition at the 3’ end of the double stranded cDNA molecules using Klenow 3’ to 5’ exo-polymerase followed by purification of modified cDNA molecules using a MinElute PCR purification kit (Qiagen). Sequencing adaptors were ligated to the ‘A’ tailed fragments using T4 ligase. Adaptor ligated cDNA fragments were separated on 2% agarose gel and fragments ranging in size from 200–250 bp were excised from the gel and purified using a QIAquick Gel Extraction Kit. PCR was performed for 15 cycles from the purified DNA molecules using the adaptor specific primers (available in the Illumina kit) and the amplified fragments were purified using a QIAquick PCR Purification Kit. The size and quantity of the obtained enriched cDNA libraries was checked on an Agilent 2100 Bio analyzer. Libraries with single discrete band of 200–250 bp were selected for sequencing reactions. The libraries were sequenced as 2 x 101 nt paired end reads on the Genome Analyzer IIx instrument (GAIIx, Illumina).
Data filtering and de-novoassembly of the transcriptome
Data obtained from the sequencer were processed for image deconvolution and quality value calculation of each base using the CASAVA package [Version 1.6, Illumina]. Quality check was done using Fastx-toolkit . Reads with more than 30% bases having a Phred quality score < 20 were removed from the analysis using fastq_quality_filter (−q 20, -p 70). Thirty-one bases of the tail region, which were found to have low Phred score, were removed from the obtained sequences using Fastx-quality_trimmer (−t 31).
The assembly of the filtered reads was done using Velvet de-novo assembly program with the velvetg main parameters: -ins_length_sd 20 -ins_length (variable, depending on the insert length) -read_trkg yes -min_contig_lgth 100 -scaffolding yes -alignments yes. Obtained contig sequences are available on request.
SNP marker development
For marker development SNPs were mined from the EST contigs of Tetra generated in this study and the reference gene model sequence data of Chiifu available in the BRAD database. For identifying SNPs in the single-copy genes, the reference sequence dataset of 17,572 single-copy gene sequences was compared against corresponding Tetra contigs identified as single-copy genes. Sequences showing ≥ 90% homology were compared for SNPs using the MUMmer software. After removing sequences with insertions/deletions, sequences with SNPs and 50 bp conserved flanking sequence on both sides of the SNP were excised from the contigs using custom perl scripts. For identifying allele specific SNPs in genes with two/three paralogs, the Tetra contigs were compared against the reference sequence of Chiifu by conducting a BLAST search . The paralogs were separated based on the homology and paralog specific variations (PSVs). The allelic variations were marked using custom PERL scripts and SNPs with 50 bp on each side of SNP were excised as before. PSV containing sequences were selected to design paralog specific primers. As the SNPs were identified using EST sequences, BLAST search was conducted with all the excised 101 bp sequences containing SNPs against the B. rapa genome sequences of Chiifu available in the BRAD database (Brapla_genome_data_v2.1) to identify intron/exon junctions. Selected SNPs showing the presence of such boundaries in the flanking regions were discarded, as the oligo designing at such positions would result in assay failure.
The cut-off read depth for the identified SNP was set at ≥ 7. For this, a Velvet assembly generated .afg file was converted to .ace file and further pileup file was generated using Samtools . The depth of each of the base (A, T, G and C) was calculated using custom perl scripts.
SNP genotyping and construction of linkage map
For validation and mapping of SNPs, KBioscience KASPar assay  was used. The primer design and assay development was undertaken by KBiosciences . The ID and features of identified putative SNPs along with 100 bp sequences are provided in the Additional file 2.
For genotyping the SNPs markers, a Recombinant Inbred Line (RIL) population of 93 individuals in the F7 generation derived from a cross between the leafy vegetable type line Chiifu and the oleiferous line Tetra was used. From the already developed map of B. rapa using F6 -RIL population of Chiifu and Tetra in our lab (unpublished), randomly selected 99 IP and 39 SSR markers, well dispersed on all the 10 linkage groups were also genotyped on F7-RILS to develop a skeleton map. PCR reactions and product separation for IP and SSR markers were performed as described earlier . Linkage groups were established at a LOD > 6.0 with Join Map 4.0  following the mapping criteria of Pradhan et al. . The recombination fractions were transformed to map distances with the Kosambi function . The graphical representation of the linkage groups was generated by Map Chart 2.2 .
For the construction of an integrated map, marker information from three mapping populations CTF2 (Chiifu x Tetra F2), CTF6 (Chiifu x Tetra F6) and CYF2 (Chiifu x YSPB-24 F2) [unpublished] along with the genotyped SNP markers was used. Recombination fraction data from the CTF2, CTF6 and CYF2 were merged with the recombination data of CTF7 map and heterogeneity tests were performed for pairs of markers common to the four mapping populations. A set of markers from each linkage group was identified on the basis of the order among the component maps (CTF2, CTF6, CTF7 and CYF2) and was used to define a fixed order for the construction of an integrated map. Common marker pairs that differed significantly (p < 0.01) in the recombination frequencies were excluded from mapping. An integrated map was generated by Join Map 4.0 with the combine groups for a map integration function using the regression mapping algorithm.
RNA-seq of three agronomically interesting oleiferous lines of B. rapa using paired end sequencing provided a large number of SNPs for both genome-wide mapping and for fine mapping in specific areas of the genome. High confidence SNPs in homologs were selected for KASPar based genetic marker development by three-step selection criteria. High frequency (~96%) of markers could be mapped successfully in the B. rapa genome. KASPar technology can be effectively used for marking allelic SNPs and PSVs for marking paralogs in B. rapa lines, and can be used for marking genes with multiple paralogs in any genome. Also, availability of large number of SNPs will allow fine mapping of regions of interest.
KP carried out all the sequencing reactions and the bioinformatics work. VG helped with the development of the SNP markers. SKY, PP-M and AKP carried out the mapping work. YSS developed and maintained the mapping population. DP initiated the study and wrote the manuscript with inputs from all the authors. All authors read and approved the final manuscript.
The work was supported by a special grant from the Department of Biotechnology (DBT), Government of India. Research on mustard in our laboratory is supported by funding from the National Dairy Development Board (NDDB) and DBT in the form of a Centre of Excellence in mustard breeding. DP acknowledges support of DST through a J. C. Bose fellowship. We thank Dr Sarita Sharma and Dr K Lakshmi Padmaja for discussions on synteny and Dr Manish Kumar for the choice of the hardware used in the study. Meena Bhandari, Simi Pahwa and Jagdish Kumar provided technical support. We thank Dr Manu Agarwal and Dr Surekha Katiyar-Agarwal for use of the Solexa sequencer and Ankur Bhardwaj and Gopal Joshi for help with the sequencing runs.
- Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML: Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet. 2011, 12 (7): 499-510. 10.1038/nrg3012.View ArticlePubMedGoogle Scholar
- Morrell PL, Buckler ES, Ross-Ibarra J: Crop genomics: advances and applications. Nat Rev Genet. 2012, 13 (2): 85-96.Google Scholar
- Salse J: In silico archeogenomics unveils modern plant genome organisation, regulation and evolution. Curr Opin Plant Biol. 2012, 15 (2): 122-130. 10.1016/j.pbi.2012.01.001.View ArticlePubMedGoogle Scholar
- Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408 (6814): 796-815. 10.1038/35048692.View ArticleGoogle Scholar
- Cao J, Schneeberger K, Ossowski S, Gunther T, Bender S, Fitz J, Koenig D, Lanz C, Stegle O, Lippert C, et al: Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet. 2011, 43 (10): 956-963. 10.1038/ng.911.View ArticlePubMedGoogle Scholar
- Brassica rapa Genome Sequencing Project Consortium: The genome of the mesopolyploid crop Brassica rapa. Nat Genetics. 2011, 43: 1035-1039. 10.1038/ng.919.View ArticleGoogle Scholar
- Cheng F, Liu S, Wu J, Fang L, Sun S, Liu B, Li P, Hua W, Wang X: BRAD, the genetics and genomics database for Brassica plants. BMC Plant Biol. 2011, 11: 136-10.1186/1471-2229-11-136.PubMed CentralView ArticlePubMedGoogle Scholar
- Brassica database. http://brassicadb.org/brad,
- U N: Genome analysis of Brassica with special reference to the experimental formation of B. napus and peculiar mode of fertilization. Japan J Bot. 1935, 7: 389-452.Google Scholar
- Song KM, Osborn TC, Williams PH: Brassica taxonomy based on nuclear restriction fragment length polymorphisms (RFLPs). Theor Appl Genet. 1988, 75: 784-794.View ArticleGoogle Scholar
- Prakash S, Wu X-M, Bhat SR: History, evolution and domestication of Brassica crops. Plant Breeding Rev. Vol 35. Edited by: Janick J. 2012, Hoboken, NJ, USA: John Wiley & Sons, Inc, 19-84.Google Scholar
- Schranz ME, Lysak MA, Mitchell-Olds T: The ABC’s of comparative genomics in the Brassicaceae: building blocks of crucifer genomes. Trends Plant Sci. 2006, 11: 535-542. 10.1016/j.tplants.2006.09.002.View ArticlePubMedGoogle Scholar
- Parkin IAP: Chasing ghosts: Comparative mapping in the Brassicaceae. Genetics and Genomics of Brassicaceae, Plant Genetics and Genomics: Crop and Models 9. Edited by: Schmidt R, Bancroft I. 2011, New-Yor: Springer Science + Business Media, 153-170.View ArticleGoogle Scholar
- Parkin IA, Gulden SM, Sharpe AG, Lukens L, Trick M, Osborn TC, Lydiate DJ: Segmental structure of the Brassica napus genome based on comparative analysis with Arabidopsis thaliana. Genetics. 2005, 171 (2): 765-781. 10.1534/genetics.105.042093.PubMed CentralView ArticlePubMedGoogle Scholar
- Panjabi P, Jagannath A, Bisht NC, Padmaja KL, Sharma S, Gupta V, Pradhan AK, Pental D: Comparative mapping of Brassica juncea and Arabidopsis thaliana using Intron Polymorphism (IP) markers: homoeologous relationships, diversification and evolution of the A, B and C Brassica genomes. BMC Genomics. 2008, 9: 113-10.1186/1471-2164-9-113.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhao J, Wang X, Deng B, Lou P, Wu J, Sun R, Xu Z, Vromans J, Koornneef M, Bonnema G: Genetic relationships between Brassica rapa as inferred from AFLP fingerprints. Theor Appl Genet. 2005, 110: 1301-1314. 10.1007/s00122-005-1967-y.View ArticlePubMedGoogle Scholar
- Gomez-Campo C: Taxonomy. Biology of Brassica Coenospecies. Edited by: Gomez-Campo C. 1999, Amsterdam, The Netherlands: Elsevier Science B. V, 3-23.View ArticleGoogle Scholar
- Diederichsen A: Brassica rapa group. Mansfeld’s Encyclopedia of Agricultural and Horticultural Crops. Edited by: Hanelt P. 2001, Berlin Heidelberg New York: Springer, 1446-1453.Google Scholar
- Haseneyer G, Schmutzer T, Seidel M, Zhou R, Mascher M, Schon CC, Taudien S, Scholz U, Stein N, Mayer KF, et al: From RNA-seq to large-scale genotyping - genomics resources for rye (Secale cereale L.). BMC Plant Biol. 2011, 11: 131-10.1186/1471-2229-11-131.PubMed CentralView ArticlePubMedGoogle Scholar
- Oliver RE, Lazo GR, Lutz JD, Rubenfield MJ, Tinker NA, Anderson JM, Wisniewski Morehead NH, Adhikary D, Jellen EN, Maughan PJ, et al: Model SNP development for complex genomes based on hexaploid oat using high-throughput 454 sequencing technology. BMC Genomics. 2011, 12: 77-10.1186/1471-2164-12-77.PubMed CentralView ArticlePubMedGoogle Scholar
- Wei W, Qi X, Wang L, Zhang Y, Hua W, Li D, Lv H, Zhang X: Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers. BMC Genomics. 2011, 12: 451-10.1186/1471-2164-12-451.PubMed CentralView ArticlePubMedGoogle Scholar
- Wu X, Ren C, Joshi T, Vuong T, Xu D, Nguyen HT: SNP discovery by high throughput sequencing in soybean. BMC Genomics. 2011, 11: 469-View ArticleGoogle Scholar
- Tang J, Vosman B, Voorrips RE, van der Linden CG, Leunissen JA: QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species. BMC Bioinformatics. 2006, 7: 438-10.1186/1471-2105-7-438.PubMed CentralView ArticlePubMedGoogle Scholar
- Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008, 18 (11): 851-1858.View ArticleGoogle Scholar
- Allen AM, Barker GL, Berry ST, Coghill JA, Gwilliam R, Kirby S, Robinson P, Brenchley RC, D’Amore R, McKenzie N, et al: Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.). Plant Biotechnol J. 2011, 9 (9): 1086-1099. 10.1111/j.1467-7652.2011.00628.x.View ArticlePubMedGoogle Scholar
- Kaur S, Francki MG, Forster JW: Identification, characterization and interpretation of single-nucleotide sequence variation in allopolyploid crop species. Plant Biotechnol J. 2012, 10 (2): 125-138. 10.1111/j.1467-7652.2011.00644.x.View ArticlePubMedGoogle Scholar
- Trick M, Long Y, Meng J, Bancroft I: Single nucleotide polymorphism (SNP) discovery in the polyploid Brassica napus using Solexa transcriptome sequencing. Plant Biotechnol J. 2009, 7 (4): 334-346. 10.1111/j.1467-7652.2008.00396.x.View ArticlePubMedGoogle Scholar
- Trick M, Adamski NM, Mugford SG, Jiang CC, Febrer M, Uauy C: Combining SNP discovery from next-generation sequencing data with bulked segregant analysis (BSA) to fine-map genes in polyploid wheat. BMC Plant Biol. 2012, 12: 14-10.1186/1471-2229-12-14.PubMed CentralView ArticlePubMedGoogle Scholar
- KBiosciences Ltd. http://www.kbioscience.co.uk/,
- Zerbino DR, Birney E: Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18: 821-829. 10.1101/gr.074492.107.PubMed CentralView ArticlePubMedGoogle Scholar
- MISA- Microsatellite Searching Tool. http://www.pgrc.ipk-gatersleben.de/misa,
- Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008, 18 (11): 1851-1858. 10.1101/gr.078212.108.PubMed CentralView ArticlePubMedGoogle Scholar
- Delcher AL, Salzberg SL, Phillippy AM: Using MUMmer to identify similar regions in large sequence sets. Curr Protoc Bioinformatics. 2003, 10: 10-13.Google Scholar
- Twyman RM: SNP genotyping techniques: An overview. Encyclopedia of medical genomics and proteomics. Edited by: Fuchs J, Podda M. 2004, New York: Marcel Dekker Inc, 1202-1207.View ArticleGoogle Scholar
- Akhunov E, Nicolet C, Dvorak J: Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay. Theor Appl Genet. 2009, 119 (3): 507-517. 10.1007/s00122-009-1059-5.PubMed CentralView ArticlePubMedGoogle Scholar
- Lai K, Duran C, Berkman PJ, Lorenc MT, Stiller J, Manoli S, Hayden MJ, Forrest KL, Fleury D, Baumann U, et al: Single nucleotide polymorphism discovery from wheat next-generation sequence data. Plant Biotechnol J. 2012, 10 (6): 743-749. 10.1111/j.1467-7652.2012.00718.x.View ArticlePubMedGoogle Scholar
- Byers RL, Harker DB, Yourstone SM, Maughan PJ, Udall JA: Development and mapping of SNP assays in allotetraploid cotton. Theor Appl Genet. 2012, 124 (7): 1201-1214. 10.1007/s00122-011-1780-8.PubMed CentralView ArticlePubMedGoogle Scholar
- Ramchiary N, Lim YP: Genetics of Brassica rapa L. Genetics and genomics of the Brassicaceae. Volume 9. Edited by: Schmidt R, Bancroft I. 2011, New York: Dordrecht Heidelberg, 215-260.View ArticleGoogle Scholar
- Choi SR, Teakle GR, Plaha P, Kim JH, Allender CJ, Beynon E, Piao ZY, Soengas P, Han TH, King GJ, et al: The reference genetic linkage map for the multinational Brassica rapa genome sequencing project. Theor Appl Genet. 2007, 115 (6): 777-792. 10.1007/s00122-007-0608-z.View ArticlePubMedGoogle Scholar
- Li W, Zhang J, Mou Y, Geng J, McVetty PB, Hu S, Li G: Integration of Solexa sequences on an ultradense genetic map in Brassica rapa L. BMC Genomics. 2011, 12: 249-10.1186/1471-2164-12-249.PubMed CentralView ArticlePubMedGoogle Scholar
- Kim JS, Chung TY, King GJ, Jin M, Yang TJ, Jin YM, Kim HI, Park BS: A sequence-tagged linkage map of Brassica rapa. Genetics. 2006, 174 (1): 29-39. 10.1534/genetics.106.060152.PubMed CentralView ArticlePubMedGoogle Scholar
- Gao C, Tang Z, Yin J, An Z, Fu D, Li J: Characterization and comparison of gene-based simple sequence repeats across Brassica species. Mol Genet Genomics. 2011, 286 (2): 161-170. 10.1007/s00438-011-0636-x.View ArticlePubMedGoogle Scholar
- Ramchiary N, Nguyen VD, Li X, Hong CP, Dhandapani V, Choi SR, Yu G, Piao ZY, Lim YP: Genic microsatellite markers in Brassica rapa: development, characterization, mapping, and their utility in other cultivated and wild Brassica relatives. DNA Res. 2011, 18 (5): 305-320. 10.1093/dnares/dsr017.PubMed CentralView ArticlePubMedGoogle Scholar
- Wang Y, Sun S, Liu B, Wang H, Deng J, Liao Y, Wang Q, Cheng F, Wang X, Wu J: A sequence-based genetic linkage map as a reference for Brassica rapa pseudochromosome assembly. BMC Genomics. 2011, 12: 239-10.1186/1471-2164-12-239.PubMed CentralView ArticlePubMedGoogle Scholar
- Carpio PDD, Basnet RK, De Vos RCH, Maliepaard C, Paulo MJ, Bonnema G: Comparative methods for association studies: A case study on metabolite variation in a Brassica rapa core collection. PLoS One. 2011, 6 (5): e19624-10.1371/journal.pone.0019624.View ArticleGoogle Scholar
- Kole C, Williams PH, Rimmer SR, Osborn TC: Linkage mapping of genes controlling resistance to white rust (Albugo candida) in Brassica rapa (syn. campestris) and comparative mapping to Brassica napus and Arabidopsis thaliana. Genome. 2002, 45 (1): 22-27. 10.1139/g01-123.View ArticlePubMedGoogle Scholar
- Yu S, Zhang F, Yu R, Zou Y, Qi J, nZhao X, Yu Y, Zhang D, Li L: Genetic mapping and localization of a major QTL for seedling resistance to downy mildew in Chinese cabbage (Brassica rapa ssp. pekinensis). Mol Breed. 2009, 23 (4): 573-590. 10.1007/s11032-009-9257-z.View ArticleGoogle Scholar
- Tanhuanpaa P: Identification and mapping of resistance gene analogs and a white rust resistance locus in Brassica rapa ssp. oleifera. Theor Appl Genet. 2004, 108 (6): 1039-1046. 10.1007/s00122-003-1525-4.View ArticlePubMedGoogle Scholar
- Yu F, Lydiate DJ, Gugel RK, Sharpe AG, SR R: Introgression of Brassica rapa subsp. sylvestris blackleg resistance into B. napus. Mol Breed. 2012, 12: 1-12.Google Scholar
- Cao Z, Tian F, Wang N, Jiang C, Lin B, Xia W, Shi J, Long Y, Zhang C, Meng J: Analysis of QTLs for erucic acid and oil content in seeds on A8 chromosome and the linkage drag between the alleles for the two traits in Brassica napus. J Genet Genomics. 2010, 37 (4): 231-240. 10.1016/S1673-8527(09)60041-2.View ArticlePubMedGoogle Scholar
- Lou P, Zhao J, Kim JS, Shen S, Del Carpio DP, Song X, Jin M, Vreugdenhil D, Wang X, Koornneef M, et al: Quantitative trait loci for flowering time and morphological traits in multiple populations of Brassica rapa. J Exp Bot. 2007, 58 (14): 4005-4016. 10.1093/jxb/erm255.View ArticlePubMedGoogle Scholar
- Ramchiary N, Padmaja KL, Sharma S, Gupta V, Sodhi YS, Mukhopadhyay A, Arumugam N, Pental D, Pradhan AK: Mapping of yield influencing QTL in Brassica juncea: implications for breeding of a major oilseed crop of dry land areas. Theor Appl Genet. 2007, 115 (6): 807-817. 10.1007/s00122-007-0610-5.View ArticlePubMedGoogle Scholar
- Fastx Toolkit. http://hannonlab.cshl.edu/fastx_toolkit,
- BLAST- Basic Local Alignment Search Tool. http://www.ncbi.nlm.nih.gov/blast,
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The sequence alignment/Map format and SAM tools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.PubMed CentralView ArticlePubMedGoogle Scholar
- Van Ooijen JW: Join Map® 4 - Software for the calculation of genetic linkage maps in experimental populations. 2006, Wageningen, The Netherlands: Kyazma BVGoogle Scholar
- Pradhan AK, Gupta V, Mukhopadhyay A, Arumugam N, Sodhi YS, Pental D: A high-density linkage map in Brassica juncea (Indian mustard) using AFLP and RFLP markers. Theor Appl Genet. 2003, 106: 607-614.PubMedGoogle Scholar
- Kosambi DD: The estimation of the map distance from recombination values. Ann Eugen. 1944, 12: 172-175.View ArticleGoogle Scholar
- Voorrips RE: Map Chart: software for the graphical presentation of linkage maps and QTLs. J Hered. 2002, 93 (1): 77-78. 10.1093/jhered/93.1.77.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.