Sequence based polymorphic (SBP) marker technology for targeted genomic regions: its application in generating a molecular map of the Arabidopsis thaliana genome
© Sahu et al; licensee BioMed Central Ltd. 2012
Received: 19 September 2011
Accepted: 13 January 2012
Published: 13 January 2012
Molecular markers facilitate both genotype identification, essential for modern animal and plant breeding, and the isolation of genes based on their map positions. Advancements in sequencing technology have made possible the identification of single nucleotide polymorphisms (SNPs) for any genomic region. Here, a sequence based polymorphic (SBP) marker technology for generating molecular markers for targeted genomic regions in Arabidopsis is described.
A ~3X genome coverage sequence of the Arabidopsis thaliana ecotype Niederzenz (Nd-0) was obtained by applying Illumina's sequencing by synthesis (Solexa) technology. Comparison of the Nd-0 genome sequence with the assembled Columbia-0 (Col-0) genome sequence identified putative single nucleotide polymorphisms (SNPs) throughout the entire genome. Multiple 75 base pair Nd-0 sequence reads containing SNPs and originating from individual genomic DNA molecules were the basis for developing co-dominant SBP markers. SNP-containing Col-0 sequences, supported by transcript sequences or sequences from multiple BAC clones, were compared to the respective Nd-0 sequences to identify possible restriction endonuclease site variations. Small amplicons, PCR amplified from both ecotypes, were digested with suitable restriction enzymes and resolved on a gel to reveal the sequence based polymorphisms. By applying this technology, 21 SBP markers representing polymorphisms between the Col-0 and Nd-0 ecotypes were generated for the marker-poor regions of the Arabidopsis map.
The SBP marker technology described here allowed the development of molecular markers for targeted genomic regions of Arabidopsis. It should facilitate isolation of co-dominant molecular markers for targeted genomic regions of any animal or plant species, whose genomic sequences have been assembled. This technology will particularly facilitate the development of high density molecular marker maps, essential for cloning genes based on their genetic map positions and identifying tightly linked molecular markers for selecting desirable genotypes in animal and plant breeding experiments.
Keywords: Niederzenz; Solexa sequencing; sequence based polymorphic marker; nonhost resistance; Phytophthora sojae; SHORE analysis
Discovery of molecular markers has facilitated mapping of both qualitative and quantitative traits. Tightly linked molecular markers facilitate (i) isolation of the genes encoding these traits and (ii) selection of genotypes carrying the desirable alleles. Several molecular marker technologies, such as RFLP, RAPD, DAF, SSR, SSLP, AFLP, CAPS and SNP, have been developed for molecular mapping experiments [1–6]. Fingerprinting of genotypes for restriction fragment length polymorphisms (RFLPs) has been regarded as the most sensitive method of genotyping. This procedure, however, requires a large quantity of genomic DNA and the use of radioactive probes. In the random amplified polymorphic DNA (RAPD) marker technology, multiple random loci of the genome are PCR amplified with a single, 10-nucleotide-long primer of arbitrary sequence. In DNA amplification fingerprinting (DAF), many loci are PCR amplified with the aid of a single, short arbitrary primer, as short as five nucleotides. Simple sequence repeat (SSR) markers, also known as microsatellite markers, utilize the variation in tandem repeats, such as (CA)n repeats, observed between genotypes. Simple sequence length polymorphism (SSLP) markers, similar to SSR markers, are designed based on a unique segment of genomic DNA sequence that contains a simple tandem repeat that distinguishes the genotypes. In Arabidopsis, SSLPs are largely based on (GA)n repeats. Cleaved amplified polymorphic sequences (CAPS) markers are designed based on restriction fragment length polymorphisms of PCR amplified fragments, when sequence information of one of the haplotypes is unknown.
The high-throughput amplified fragment length polymorphism (AFLP) marker technology combines principles of RFLP and random PCR amplification for rapid identification of molecular loci across the entire genome. AFLP technology is particularly suitable for developing high density molecular marker maps, essential for both map-based cloning of genes and the isolation of molecular markers for selecting desirable genotypes in breeding programs. AFLP technology identifies molecular markers based on a fraction of the restriction fragment length polymorphisms between two genotypes. Restriction site associated DNA (RAD) marker technology, on the other hand, generates markers for all polymorphic sites of a restriction endonuclease between two genotypes; thus, it is a very sensitive marker technology for developing a high density molecular map.
Polymorphisms detected by various marker technologies have been used to generate molecular marker maps of species that lack genome sequences and physical maps. Because assembled genome sequences of many species are now available, and the cost of sequencing has declined significantly with the advent of next generation sequencing technologies, the single nucleotide polymorphism (SNP) is becoming the most popular molecular marker [10–12]. However, SNP assays are not always simple or flexible. Here, a strategy of using SNPs for rapid generation of molecular markers, termed sequence based polymorphic (SBP) marker technology, is described.
The assembled Arabidopsis thaliana genome sequence was selected for this study. Many ecotypes of this species are available and have been used in mapping experiments to conduct genetic and biological studies. SNPs among some of the accessions or ecotypes of this model plant species are available [14, 15; http://www.arabidopsis.org]. Niederzenz-0 (Nd-0), used for mapping the Phytophthora sojae susceptible (pss) mutants that are infected by the soybean pathogen, P. sojae (R. Sumit, B.B. Sahu and M.K. Bhattacharyya, unpublished), was selected for this study. The pss mutants were created in the pen1-1 mutant of the ecotype Columbia-0 (Col-0). To facilitate mapping of the putative PSS gene loci conferring nonhost resistance of Arabidopsis against P. sojae, SBP markers were developed as follows. Seventy-five-nucleotide-long sequencing reads obtained by Solexa sequencing of the Nd-0 genome were compared to Col-0 sequences to identify SNPs, which were subsequently converted to SBP markers if the sequence of either ecotype was cut by at least one restriction endonuclease at the SNP site. By applying this technology, 21 co-dominant SBP markers were generated for the marker-poor regions of the Arabidopsis genome. This novel SBP marker technology should be applicable to any higher eukaryotic species with an assembled genome sequence for rapid development of high density molecular marker maps for map-based cloning of genes or identification of suitable molecular markers for selection of desirable genotypes in breeding programs.
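The conversion criterion described above, that a SNP becomes an SBP marker only if a restriction endonuclease cuts one ecotype but not the other at the SNP site, can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: the function names, the made-up 12 bp sequences and the three-enzyme table are assumptions chosen for the example (the recognition sequences used are GTAC for Rsa I, TCGA for Taq I and AATT for Tsp 509I).

```python
# Hypothetical helper: which enzymes distinguish two alleles at a SNP?
# Recognition sequences (all palindromic, so scanning one strand suffices).
ENZYMES = {"RsaI": "GTAC", "TaqI": "TCGA", "Tsp509I": "AATT"}


def sites_overlapping(seq, site, pos):
    """Return start indices of `site` in `seq` whose span includes index `pos`."""
    hits = []
    start = seq.find(site)
    while start != -1:
        if start <= pos < start + len(site):
            hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def differential_enzymes(col_seq, nd_seq, snp_pos):
    """Map each enzyme whose site spans the SNP in exactly one allele
    to the name of the allele that is cut."""
    result = {}
    for name, site in ENZYMES.items():
        in_col = bool(sites_overlapping(col_seq, site, snp_pos))
        in_nd = bool(sites_overlapping(nd_seq, site, snp_pos))
        if in_col != in_nd:
            result[name] = "Col-0" if in_col else "Nd-0"
    return result


# Made-up 12 bp windows differing at index 6 (C in "Col-0", A in "Nd-0").
col = "GGATGTCCAGGA"
nd = "GGATGTACAGGA"
print(differential_enzymes(col, nd, 6))
```

In this toy case only the Nd-0 allele carries a GTAC site spanning the SNP, so Rsa I would be the candidate enzyme. A real screen would use a much larger enzyme table and would also have to handle degenerate recognition sequences.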
Generation of a global molecular map for the polymorphic loci of the Arabidopsis thaliana ecotypes, Col-0 and Nd-0
List of CAPS markers polymorphic between Arabidopsis ecotypes Col-0 and Nd-0 (restriction enzymes, one entry per table row): Rsa I, Tsp 509I; Taq I, Tsp 509I; Rsa I, Tsp 509I; Taq I, Rsa I, Tsp 509I; Taq I, Rsa I, Tsp 509I
Generation of SBP markers for saturating a global genome map in Arabidopsis thaliana
The global genome map of SSLP and CAPS markers was marker poor in some genomic regions (Additional file 1). To fill in some of these marker-poor regions, single nucleotide polymorphism (SNP)-based molecular markers were generated as follows. First, the Nd-0 genome was sequenced on an Illumina/Solexa Genome Analyzer II (GAII) at the DNA facility, Iowa State University. Three genome equivalents of Nd-0 sequence in 75 bp reads (Accession No. SRA048909.1) were then analyzed to discover SNPs between Col-0 and Nd-0 by conducting reference guided sequence analysis for all five chromosomes with the aid of the SHORE program.
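As a rough check on what three genome equivalents of 75 bp reads implies, the read count and the chance that a given base is seen more than once can be estimated as below. This is only a sketch under stated assumptions: the 120 Mb genome size is an assumed round figure that does not come from the text, and the Poisson model assumes reads land uniformly at random, which real libraries only approximate.

```python
import math


def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Number of reads giving the requested average coverage."""
    return math.ceil(genome_size_bp * coverage / read_length_bp)


def prob_at_least(k, coverage):
    """Poisson probability that a base is covered by at least k reads."""
    below = sum(
        math.exp(-coverage) * coverage**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


GENOME_SIZE = 120_000_000  # assumed round figure, not taken from the text
n_reads = reads_needed(GENOME_SIZE, 3, 75)
p_two_or_more = prob_at_least(2, 3.0)
print(n_reads, round(p_two_or_more, 3))
```

Under these assumptions, roughly four out of five bases would be covered by two or more reads at 3X, which is why requiring more than one supporting read per SNP is still practical at this shallow depth.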
Primers and restriction enzymes used in generating 21 SBP markers
The use of molecular markers has gained importance in genetic studies, particularly for map-based cloning of genes. The relatively low cost of sequencing a genome, with the emergence of high throughput sequencing technology, has facilitated genome-wide polymorphism studies [18, 19]. The SBP marker technology can convert most single nucleotide polymorphisms to molecular markers for any genomic region. SBP markers developed from sequence information are ideal for species whose genomes are sequenced and assembled. A reference genome sequence can be utilized to develop SBP markers for a specific genomic region with a known physical location. Thus, marker-poor regions can be enriched with SBP markers. In this study, the applicability of the SBP marker technology is shown by improving a genetic map that represents polymorphisms between two Arabidopsis ecotypes, Col-0 and Nd-0 (Figure 4). SBP markers were generated from just three genome equivalents of Nd-0 genome sequence in 75 bp Solexa reads. The method has also been successfully applied in developing a high density molecular map of the PSS1 gene that confers nonhost resistance against the soybean pathogens, Phytophthora sojae and Fusarium virguliforme (R. Sumit, B.B. Sahu and M.K. Bhattacharyya, unpublished).
The SHORE program used in this study is highly powerful and has been employed successfully to identify a mutation through analysis of deep sequence data from a bulk of 500 mutant F2 progeny. If the genome is not sequenced to a high depth (e.g. ≥ 20 fold), SNPs identified through SHORE analyses can be verified by conducting BLAST analyses. In such a scenario, staggered Solexa sequence reads (Figure 2) containing SNPs are considered for generating SBP markers. Similarly, candidate SNP-containing regions of the reference genome should be supported by multiple sequences, such as transcript sequences and/or sequences from more than one BAC clone, to avoid possible sequencing errors (Additional file 4).
If none of the haplotypes of interest has been sequenced, then the reference genome sequence should be used to define the SNP maps of the individual haplotypes by running the SHORE program. The SNP maps can then be compared to determine the SNPs between the haplotypes of interest. Once the candidate SNPs are identified, small PCR amplicons of ~200 bp can be amplified and digested with suitable restriction endonucleases to reveal the restriction fragment length polymorphisms. A significant proportion of the SNPs could be unusable in SBP marker development because they may not be digested with restriction endonucleases in a haplotype- or genotype-specific manner. In such a case, one can apply derived CAPS (dCAPS) technology to improve the efficiency of SBP marker development.
A new molecular marker technology, based on genome sequence and physical map locations, is reported for species whose assembled genome sequences are available. The technology was applied in identifying 21 SBP markers for some of the marker-poor genomic regions of the Arabidopsis molecular marker map that represent polymorphisms between the ecotypes Col-0 and Nd-0 (Figure 4). The SBP marker technology should be applicable to any genomic region and will facilitate (i) map-based cloning of genes as well as (ii) the development of tightly linked molecular markers for selecting desirable genotypes in animal and plant breeding experiments.
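The idea of relying on staggered reads, that is, reads with different start positions and therefore likely derived from independent DNA molecules, can be illustrated with a toy filter. This is a hedged sketch rather than what SHORE or BLAST actually do: the function names, the gap-free alignment representation and the threshold of two distinct start positions are all assumptions made for illustration.

```python
def staggered_support(reads, snp_pos, alt_base):
    """Count distinct read start positions supporting `alt_base` at `snp_pos`.

    `reads` is a list of (start, sequence) pairs in reference coordinates,
    assumed to be gap-free alignments.
    """
    starts = set()
    for start, seq in reads:
        offset = snp_pos - start
        if 0 <= offset < len(seq) and seq[offset] == alt_base:
            starts.add(start)
    return len(starts)


def is_candidate(reads, snp_pos, alt_base, min_starts=2):
    """Accept a SNP only if reads from >= min_starts distinct starts agree."""
    return staggered_support(reads, snp_pos, alt_base) >= min_starts


# Toy reads (much shorter than 75 bp) around a SNP at reference position 105.
reads = [
    (100, "ACGTATGCA"),  # offset 5 -> 'T'
    (100, "ACGTATGCA"),  # same start as above, so counted only once
    (103, "TATGCAAGG"),  # offset 2 -> 'T'
    (104, "ACGCAAGGT"),  # offset 1 -> 'C', does not support 'T'
]
print(is_candidate(reads, 105, "T"), is_candidate(reads, 105, "C"))
```

Collapsing reads that share a start position guards against counting PCR duplicates of one molecule as independent evidence, which matters most when coverage is as low as three fold.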
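The expected gel pattern for a ~200 bp amplicon can be predicted before any bench work by digesting the two allele sequences in silico. The sketch below is illustrative only: the `digest` function, the made-up repeat-based amplicon sequences and the choice of Rsa I (GT^AC, cutting after the second base) are assumptions, not sequences from this study.

```python
def digest(amplicon, site, cut_offset):
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the cut position within the recognition sequence on the
    top strand (e.g. 2 for Rsa I, GT^AC).
    """
    cuts = []
    start = amplicon.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    # Drop zero-length pieces that arise when a cut falls at either end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Made-up 200 bp amplicons, identical except for one base at index 80.
flank_left = "CA" * 39    # 78 bp, contains no GTAC
flank_right = "CT" * 59   # 118 bp, contains no GTAC
col_amplicon = flank_left + "GTCC" + flank_right  # no Rsa I site
nd_amplicon = flank_left + "GTAC" + flank_right   # one Rsa I site

print(digest(col_amplicon, "GTAC", 2))
print(digest(nd_amplicon, "GTAC", 2))
```

Here the uncut allele stays at 200 bp while the cut allele yields 80 bp and 120 bp fragments, a difference that is easy to resolve on a high-percentage agarose gel. SNPs for which both alleles give the same fragment list are the ones that would need the dCAPS approach mentioned above.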
Ease in SBP marker development and application to any genomic regions, and genome-wide abundance of SNPs make this technology suitable for mapping experiments, especially to develop high density molecular maps for positional gene cloning experiments, if the assembled genome sequence and physical maps of the studied species are available. Innumerable SBP markers can be developed rapidly for a genomic region containing a target gene in a map-based cloning experiment. Co-dominant gel-based SBP markers are ideal to identify genetic recombination events between two loci. Such PCR-based markers can be used to screen a large number of segregants to identify informative recombinants of the target gene region. These recombinants will then facilitate the development of high resolution maps of a large number of SBP markers, essential for cloning genes based on their map position. Thus, high-throughput deep sequencing, together with SBP markers, should expedite map-based cloning in higher eukaryotes.
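Because SBP markers are co-dominant, each segregant lane can be scored as one parental type or as a heterozygote from its band sizes. The following is a minimal sketch of such scoring, assuming hypothetical expected patterns (200 bp for the uncut allele, 80 bp plus 120 bp for the cut allele) and an arbitrary 5 bp sizing tolerance; none of these values come from the study.

```python
def call_genotype(bands, col_pattern, nd_pattern, tolerance=5):
    """Classify a lane as 'Col-0', 'Nd-0', 'het' or 'unknown'.

    `bands` are observed fragment sizes in bp; a band matches an expected
    size if it lies within `tolerance` bp of it.
    """
    def present(size):
        return any(abs(size - b) <= tolerance for b in bands)

    has_col = all(present(s) for s in col_pattern)
    has_nd = all(present(s) for s in nd_pattern)
    if has_col and has_nd:
        return "het"
    if has_col:
        return "Col-0"
    if has_nd:
        return "Nd-0"
    return "unknown"


COL = [200]      # hypothetical uncut pattern
ND = [80, 120]   # hypothetical cut pattern
lanes = {"F2-1": [198], "F2-2": [82, 119], "F2-3": [200, 121, 79], "F2-4": []}
calls = {name: call_genotype(b, COL, ND) for name, b in lanes.items()}
print(calls)
```

This simple rule works only when neither expected pattern is a subset of the other. Recombinants in a mapping population would then be the segregants whose calls differ between two flanking markers.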
Plant materials and growth conditions
Seeds of the Arabidopsis thaliana ecotypes Col-0 and Nd-0 were sown on LC1 soil-less mixture (Sun Gro Horticulture, Bellevue, WA) and grown under a 16 h light/8 h dark regime at 21°C with approximately 60% relative humidity. The light intensity was maintained at 120-150 μE/m2/s. Ten days after sowing, the seedlings were transplanted into LC1 mixture. The newly transplanted seedlings were covered with humidity domes for two days and thereafter watered every fourth day. A 15:15:15 N:P:K fertilizer mixture (1% concentration v/v) was applied to the seedlings seven days after transplantation.
DNA preparation and the whole genome sequencing
Genomic DNA was extracted from Arabidopsis by the CTAB method. Either a young inflorescence or a rosette leaf was selected for DNA extraction. The Nd-0 genome was sequenced on an Illumina (Solexa) sequencing platform at the DNA facility, Iowa State University. The 75 bp Solexa Nd-0 reads were saved as the gsNd database (Accession No. SRA048909.1) for further studies.
Analysis of the raw reads from Solexa Sequencing
The raw 75 bp Solexa reads of the gsNd database were analyzed with the mapping algorithm Efficient Large-Scale Alignment of Nucleotide Databases (ELAND), which is built into the Solexa sequence analysis pipeline of the Illumina sequencer. This program can match a large number of reads against a reference genome sequence; in this study, the Arabidopsis Col-0 genome sequence was used as the reference genome. In order to identify the SNPs from the entire Arabidopsis genome (NCBI_SS#478443777 through 428555842), the 75 bp Solexa sequence reads of Nd-0 were compared to the assembled Col-0 genome sequence (version TAIR10) (ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/) by running the SHORE program. The gsNd database was also used for conducting BLASTN (bl2seq) searches for polymorphic sequences of the marker-poor genomic regions.
SSLP and CAPS markers polymorphic between Col-0 and Nd-0
Candidate SSLP and CAPS markers available from the TAIR database were selected to cover the entire genome. Sequence information for the primers of SSLP markers was obtained from Bell and Ecker and the Arabidopsis Information Resource (TAIR) database (http://www.arabidopsis.org). The chromosome map tool available at the TAIR database (http://www.arabidopsis.org/jsp/ChromosomeMap/tool.jsp) was used to map the physical locations of the markers that showed polymorphisms between the two accessions.
PCR conditions and digestion with restriction endonucleases
The final DNA concentration in PCR was 20 ng/μl. The PCR mixtures contained 2 mM MgCl2 (Bioline, Taunton, MA), 0.25 μM each of forward and reverse primer, 2 μM dNTPs and 0.5 U Choice Taq polymerase (Denville Scientific, Inc., Metuchen, NJ). For SBP or SSLP markers, PCR was conducted at 94°C for 2 min, followed by 40 cycles of 94°C for 30 s, 50°C or 55°C for 30 s and 72°C for 30 s. Finally, the mixture was incubated at 72°C for 10 min. For CAPS markers, PCR was conducted at 94°C for 2 min, followed by five cycles of 94°C for 30 s, annealing at temperatures decreasing from 55°C to 50°C (-1°C/cycle), and 72°C for 1 min. Then 35 cycles of 94°C for 30 s, 50°C for 30 s, and 72°C for 1 min were conducted. Finally, the reaction mixtures were incubated at 72°C for 10 min. PCR was carried out in PTC-100 Programmable Thermal Controllers (MJ Research Inc., Waltham, MA). Amplified CAPS and SBP products were digested with the respective restriction enzymes following the manufacturers' protocols. The products were resolved on a 4% (w/v) agarose gel at 8 V/cm, and the ethidium bromide stained PCR products were visualized by illuminating with UV light.
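The touchdown portion of the CAPS program can be written out cycle by cycle. The sketch below reflects one reading of the description, namely that the five touchdown cycles anneal at 55, 54, 53, 52 and 51°C before the 35 cycles at 50°C; that interpretation, the function name and its parameters are assumptions made for illustration.

```python
def caps_annealing_schedule(start_temp=55, final_temp=50,
                            touchdown_cycles=5, main_cycles=35):
    """Annealing temperature in degrees C for each cycle of the CAPS program.

    Assumes the temperature drops 1 degree per touchdown cycle and that the
    main cycles all anneal at `final_temp`.
    """
    touchdown = [start_temp - i for i in range(touchdown_cycles)]
    return touchdown + [final_temp] * main_cycles


schedule = caps_annealing_schedule()
print(schedule[:6], len(schedule))
```

Under this reading the CAPS program runs 40 amplification cycles in total, the same number as the SBP and SSLP program.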
We thank Steve Rodermel and Yanhai Yin for providing us primers for some of the SSLP and CAPS markers, respectively. We thank Jordan Baumbach, Catherine Brooke, David Grant and Reid Palmer for critically reviewing the manuscript. This work was supported by a grant from the Consortium for Plant Biotechnology Research (CPBR) and the Iowa Soybean Association.
- Zabeau M, Vos P: Selective restriction fragment amplification: a general method for DNA fingerprinting. European Patent Office. 1993, Publication 0534858 A1, bulletin 93/13.
- Botstein D, White RL, Skolnick M, Davis RW: Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet. 1980, 32: 314-331.
- Williams JGK, Kubelik AR, Livak KJ, Rafalski JA, Tingey SV: DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res. 1990, 18: 6531-6535. 10.1093/nar/18.22.6531.
- Kolchinsky AM, Funke RP, Gresshoff PM: DAF-amplified fragments can be used as markers for DNA from pulse field gels. Biotechniques. 1993, 14: 400-403.
- Pindo M, Vezzulli S, Coppola G, Cartwright D, Zharkikh A, Velasco R, Troggio M: SNP high-throughput screening in grapevine using the SNPlex™ genotyping system. BMC Plant Biol. 2008, 8: 12. 10.1186/1471-2229-8-12.
- Konieczny A, Ausubel FM: A procedure for mapping Arabidopsis mutations using co-dominant ecotype-specific PCR-based markers. Plant J. 1993, 4: 403-410. 10.1046/j.1365-313X.1993.04020403.x.
- Weber JL, May PE: Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am J Hum Genet. 1989, 44: 388-396.
- Bell CJ, Ecker JR: Assignment of 30 microsatellite loci to the linkage map of Arabidopsis. Genomics. 1994, 19: 137-144. 10.1006/geno.1994.1023.
- Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA: Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res. 2007, 17: 240-248. 10.1101/gr.5681207.
- Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML: Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet. 2011, 12: 499-510. 10.1038/nrg3012.
- Nielsen R, Paul JS, Albrechtsen A, Song YS: Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet. 2011, 12: 443-451. 10.1038/nrg2986.
- Barbazuk WB, Schnable PS: SNP discovery by transcriptome pyrosequencing. Meth Mol Biol. 2011, 729: 225-246. 10.1007/978-1-61779-065-2_15.
- The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.
- Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, Warthmann N, Hu TT, Fu G, Hinds DA: Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science. 2007, 317: 338-342. 10.1126/science.1138632.
- Zeller G, Clark RM, Schneeberger K, Bohlen A, Weigel D, Ratsch G: Detecting polymorphic regions in Arabidopsis thaliana with resequencing microarrays. Genome Res. 2008, 18: 918-929. 10.1101/gr.070169.107.
- Ossowski S, Schneeberger K, Clark RM, Lanz C, Warthmann N, Weigel D: Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res. 2008, 18: 2024-2033. 10.1101/gr.080200.108.
- Jander G, Norris SR, Rounsley SD, Bush DF, Levin IM, Last RL: Arabidopsis map-based cloning in the post-genome era. Plant Physiol. 2002, 129: 440-450. 10.1104/pp.003533.
- Huang W, Marth G: EagleView: A genome assembly viewer for next-generation sequencing technologies. Genome Res. 2008, 18: 1538-1543. 10.1101/gr.076067.108.
- Lister R, Gregory BD, Ecker JR: Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond. Curr Opin Plant Biol. 2009, 12: 107-118. 10.1016/j.pbi.2008.11.004.
- Schneeberger K, Ossowski S, Lanz C, Juul T, Petersen AH, Nielsen KL, Jorgensen J-E, Weigel D, Andersen SU: SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nat Methods. 2009, 6: 550-551. 10.1038/nmeth0809-550.
- Neff MM, Neff JD, Chory J, Pepper AE: dCAPS, a simple technique for the genetic analysis of single nucleotide polymorphisms: experimental applications in Arabidopsis thaliana genetics. Plant J. 1998, 14: 387-392. 10.1046/j.1365-313X.1998.00124.x.
- Weigel D, Glazebrook J: Arabidopsis: A Laboratory Manual. 2002, Cold Spring Harbor Lab Press.
- Lukowitz W, Gillmor CS, Scheible W-R: Positional cloning in Arabidopsis. Why it feels good to have a genome initiative working for you. Plant Physiol. 2000, 123: 795-806. 10.1104/pp.123.3.795.
- Smith A, Xuan Z, Zhang M: Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics. 2008, 9: 128. 10.1186/1471-2105-9-128.
- Ossowski S, Schneeberger K, Lucas-Lledó JI, Warthmann N, Clark RM, Shaw RG, Weigel D, Lynch M: The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science. 2010, 327: 92-94. 10.1126/science.1180677.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.