Mining of haplotype-based expressed sequence tag single nucleotide polymorphisms in citrus
© Chen and Gmitter; licensee BioMed Central Ltd. 2013
Received: 20 February 2013
Accepted: 22 October 2013
Published: 1 November 2013
Single nucleotide polymorphisms (SNPs), the most abundant variations in a genome, have been widely used in various studies. Detection and characterization of citrus haplotype-based expressed sequence tag (EST) SNPs will greatly facilitate further utilization of these gene-based resources.
In this paper, haplotype-based SNPs were mined out of publicly available citrus expressed sequence tags (ESTs) from different citrus cultivars (genotypes) individually and collectively for comparison. There were a total of 567,297 ESTs belonging to 27 cultivars in varying numbers and consequently yielding different numbers of haplotype-based quality SNPs. Sweet orange (SO) had the most (213,830) ESTs, generating 11,182 quality SNPs in 3,327 out of 4,228 usable contigs. Summed from all the individual mining results, a total of 25,417 quality SNPs were discovered: 15,010 (59.1%) were transitions (AG and CT), 9,114 (35.9%) were transversions (AC, GT, CG, and AT), and 1,293 (5.0%) were insertion/deletions (indels). A vast majority of SNP-containing contigs consisted of only 2 haplotypes, as expected, but the percentages of 2-haplotype contigs varied widely in these citrus cultivars. BLAST of the 25,417 25-mer SNP oligos to the Clementine reference genome scaffolds revealed 2,947 SNPs had “no hits found”, 19,943 had 1 unique hit/alignment, 1,571 had one hit and 2+ alignments per hit, and 956 had 2+ hits and 1+ alignment per hit. Of the total 24,293 scaffold hits, 23,955 (98.6%) were on the main scaffolds 1 to 9, and only 338 were on 87 minor scaffolds. Most alignments had 100% (25/25) or 96% (24/25) nucleotide identities, accounting for 93% of all the alignments. Considering almost all the nucleotide discrepancies in the 24/25 alignments were at the SNP sites, it served well as in silico validation of these SNPs, in addition to and consistent with the rate (84%; 81 of 96 SNPs) validated by sequencing and SNaPshot assay.
High-quality EST-SNPs from different citrus genotypes were detected and compared to estimate the heterozygosity of each genome. All the SNP oligo sequences were aligned with the Clementine citrus genome to determine their distribution and uniqueness and for in silico validation, in addition to SNaPshot and sequencing validation of selected SNPs.
Single nucleotide polymorphism (SNP) refers to an allelic single-base variation between two haplotype sequences in an individual or between any paired homologous chromosomes across homogenous members. SNPs are most abundant among genomic DNA variations and ubiquitous in both functional genes and non-coding regions. Because they are conserved during evolution, associated with genetic traits, and suited for high throughput genotyping, SNPs are a popular and powerful tool for various genetics and genomics studies, such as mapping of whole genomes, tagging of important traits, comparison of genome evolution, classification of diverse clades, and many rapidly developing areas such as pharmacogenomics and functional proteomics [2–4]. These SNPs from expressed sequence tags (ESTs) represent hundreds of thousands of functional genes and likely control many genetic traits [5–8]. Due to degeneracy of most three-nucleotide genetic codons, a SNP in the coding regions may be synonymous (sSNP) if it does not result in change of the protein sequence or non-synonymous (nsSNP) if it does. The nsSNPs are usually more biologically relevant because the resulting amino acid changes in proteins may change their secondary structures and functions and cause phenotypic mutations [1, 8, 9].
SNP discovery usually is accomplished through computational alignment of redundant DNA sequences with each other or with a high-quality reference genome where discrepant nucleotides can be detected and evaluated. For the redundancy-based computational approach, in addition to sequencing errors as a source of false SNPs [5, 7, 10], it may be even more challenging to distinguish real SNPs among allelic sequences from single nucleotide discrepancies among highly identical paralogous sequences [8, 11]. Several bioinformatics programs (pipelines) have been developed for automatic SNP mining, using different input data, computational algorithms, quality evaluation strategies, and/or output formats. For example, the PolyPhred and PolyBayes pipelines typically require sequence trace files or extracted sequences with base calling quality values to minimize false SNPs resulting from sequencing errors [12–14]. PolyBayes also includes an extra implementation to identify paralogs and their derived false SNPs. Others like autoSNP and QualitySNP can accept sequences without quality files for initial redundancy-based detection, and then grade SNPs by confidence levels, which are more commonly used with public ESTs that usually do not have trace or quality files [8, 15]. The QualitySNP pipeline implements a haplotype reconstruction algorithm and confidence scoring approach to detect reliable synonymous and non-synonymous SNPs from public ESTs without quality files and a reference genome. In other words, it re-clusters ESTs in a contig to determine the potential haplotypes in the contig. Only single discrepant nucleotides between any two reconstructed haplotypes would be scored a potential SNP. Sequence differences can also result from sequencing errors or alignment of paralogs. Only those potential SNPs passing additional confidence interrogation are identified as quality SNPs.
Reliable quality SNPs represent the different alleles (haplotypes) of a gene. As opposed to low-confidence and false SNPs, the use of quality SNPs can benefit allele-trait association studies.
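The idea of scoring single discrepant nucleotides between reconstructed haplotypes can be illustrated with a minimal Python sketch. It is not the QualitySNP implementation: the function names `potential_snps` and `snp_type`, the toy sequences, and the assumption that the haplotypes are already aligned to equal length (with `-` marking a gap) are all illustrative, and no confidence scoring is attempted.

```python
from itertools import combinations

def potential_snps(haplotypes):
    """Return sorted 0-based alignment columns where any two haplotypes differ.

    `haplotypes` is a list of equal-length aligned strings ('-' marks a gap).
    Hypothetical helper for illustration only.
    """
    if len({len(h) for h in haplotypes}) > 1:
        raise ValueError("haplotypes must be aligned to the same length")
    sites = set()
    for a, b in combinations(haplotypes, 2):
        for pos, (x, y) in enumerate(zip(a, b)):
            if x != y:
                sites.add(pos)
    return sorted(sites)

def snp_type(x, y):
    """Classify two differing alleles as transition, transversion, or indel."""
    if "-" in (x, y):
        return "indel"
    purines, pyrimidines = {"A", "G"}, {"C", "T"}
    pair = {x, y}
    if pair <= purines or pair <= pyrimidines:
        return "transition"      # AG or CT
    return "transversion"        # AC, GT, CG, or AT

# Toy haplotypes (made up): they differ at columns 2 (G/A) and 5 (A/C).
hap1 = "ACGTTAGC"
hap2 = "ACATTCGC"
sites = potential_snps([hap1, hap2])
types = [snp_type(hap1[i], hap2[i]) for i in sites]
```

In this toy case the two columns are classified as one transition and one transversion, matching the AG/CT versus AC/GT/CG/AT grouping used throughout the paper.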
Most citrus species are diploid (2n = 2× = 18), with highly heterozygous and relatively small genomes and over 30,000 predicted genes. In general, citrus refers to true biological species and ancestrally domesticated introgressions in Citrus and those in the sexually compatible Fortunella (kumquat) and Poncirus (trifoliate orange) genera. Citrus fruit types are diverse, and include sweet orange (Citrus sinensis), mandarin (C. reticulata), grapefruit (C. paradisi), lemon (C. limon), lime (C. aurantifolia), pummelo (C. maxima), and citron (C. medica). Each type consists of many cultivars primarily selected from spontaneous bud sports, chance seedlings, induced mutants, or conventional hybrids. It is widely believed that only C. maxima, C. reticulata, and C. medica are true species, although the binomial names for the other ancestral hybrid and introgression cultivars are widely accepted and used [17, 18]. These citrus types likely vary in levels of heterozygosity and share alleles resulting from early introgressions across these genomes, according to SSR markers [19–21]. A haploid Clementine genome sequence was produced using Sanger technology, and one diploid sweet orange genome using Roche 454 technology, along with many other citrus genomes using other re-sequencing platforms (Gmitter et al. unpublished data). Together with other available citrus genomic resources, these make it possible to detect and compare SNPs in large-volume citrus Sanger EST datasets within and among different citrus cultivars. These gene-based SNPs, once available to the citrus community, will be very valuable in many genetic and genomic studies, and helpful for trait-targeted breeding as well [20, 21, 23].
In this paper, SNPs in public ESTs from 27 different citrus genotypes were detected by the QualitySNP pipeline and compared to estimate the heterozygosity of each genome. All of the short SNP oligo sequences were also aligned with the Clementine citrus genome to determine their distribution and uniqueness in the genome and for in silico validation. Selected SNPs were also validated by SNaPshot and sequencing.
Citrus ESTs and cultivars
Public ESTs in citrus cultivars/biotypes (table body not recovered; surviving parentage and common-name entries: C. reticulata x C. temple; C. clementina x C. reticulata; C. nobilis x C. kinokuni; Kankitsu Chukanbohon Nou 6 Gou tangor; C. sinensis x C. reticulata; Rangpur lime, Mandarin lime; Palestine Sweet lime; Sour orange, Bitter orange; C. paradisi x P. trifoliata; C. sinensis x P. trifoliata)
SNP discovery and primer design
The QualitySNP pipeline was installed and used for SNP discovery, following the program manual and recommended parameters. QualitySNP first identified haplotypes in a contig by re-clustering its ESTs and extracted all nucleotide discrepancies (called potential SNPs, pSNPs) between identified haplotypes in a contig, from which a subset of so-called quality SNPs (qSNPs) was identified based on allele and SNP confidence scores defined in the haplotype-based mining algorithm. These qSNP-containing contigs and 25-mer oligo sequences, along with much other mining information, were saved in separate files for database construction and result summary. The ratios of qSNP/pSNP were calculated to indicate the percentage of nucleotide discrepancies (pSNPs) identified as high-quality SNPs (qSNPs) by the QualitySNP algorithm. Bioinformatics programs included in the pipeline were cross_match in the phred-phrap-consed package [24, 25] to remove vectors, CAP3 to assemble ESTs, and FASTY to align ESTs to the proteins in the Uniprot database for identification of non-synonymous and synonymous SNPs. BatchPrimer3 was used to design a forward (F), a reverse (R), and a single base extension (SBE) primer flanking each SNP site. The F, R and SBE primers of 96 SNPs from SO were selected for both sequencing and SBE genotyping validation (Additional file 2). After sorting by the lengths of SBE primers, except the first, the other 7 primers of every 8 SBE primers were tailed in the 5’ end with three groups of non-homologous polynucleotides of different lengths to facilitate future multiplex genotyping application. All the F, R and tailed SBE primers, 96 each, were synthesized by Eurofins MWG Operon (Huntsville, AL) in three 96-well plates, one per primer type, where the three primers of each SNP were placed in the same well position of the three plates and stored in ddH2O at 10 μM. The format facilitated easy primer positioning and channel pipetting during the genotyping and sequencing preparation.
SNP 25-nucleotide sequence blast
All 25-nucleotide oligo sequences (SNP in the middle nucleotide) generated from every citrus genotype by QualitySNP were combined and aligned to the haploid Clementine reference genome (version 1.0; phytozome.org and citrusgenomedb.org) using BLASTN and a cut-off e-value of 6e-004 (0.0006). Each query sequence (25-mer oligo) against the subject scaffolds would yield one of the following BLASTN outputs: “no hits found”, 1 hit on 1 scaffold with 1 alignment, or any other case (i.e., 1 hit on 1 scaffold with 2+ alignments at different positions or 2+ hits on different scaffolds with 1+ alignment each hit). At the preset e-value, only alignments with 84% identities and higher (in other words, only 6 types of alignment hits: 25/25, 24/25, 24/24, 23/23, 22/22, and 21/21) were saved in the BLASTN output file. The information in the output file, including the scaffold, position, strand, e-value, score, alignment identities of each hit, and hit status, was parsed into an EXCEL file to summarize SNP alignment status and to calculate distribution on the Clementine reference genome scaffolds. The information was also used as additional criteria for categorization of SNPs and selection of desired core sets.
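The oligo construction and hit categories described above can be sketched in Python as follows. This is an illustrative sketch, not the parsing code used in the study: the helper names, the `(scaffold, position)` tuple representation, and the toy queries are assumptions, and the identity check reflects one reading of the text (at least 21 matching bases out of a 25-mer query, i.e. 84%).

```python
from collections import Counter

QUERY_LEN = 25      # oligo length stated in the text
MIN_MATCHES = 21    # 21/25 = 84%, the lowest identity reported as saved

def oligo_around_snp(contig, snp_index, flank=12):
    """Return the 25-mer centred on a 0-based SNP index, or None near contig ends."""
    if snp_index < flank or snp_index + flank >= len(contig):
        return None
    return contig[snp_index - flank: snp_index + flank + 1]

def passes_identity(matches, min_matches=MIN_MATCHES):
    """True when an alignment has at least `min_matches` matching bases."""
    return matches >= min_matches

def hit_status(alignments):
    """Categorise one query from its list of (scaffold, position) alignments."""
    if not alignments:
        return "no hits found"
    per_scaffold = Counter(scaffold for scaffold, _pos in alignments)
    if len(per_scaffold) == 1:
        n_aln = next(iter(per_scaffold.values()))
        return "1 hit (1 aln)" if n_aln == 1 else "1 hit (2+ aln)"
    return "2+ hits (1+ aln each hit)"

# Toy queries: SNP ids, scaffold names, and positions are made up.
queries = {
    "snp_a": [],
    "snp_b": [("scaffold_1", 1200)],
    "snp_c": [("scaffold_1", 1200), ("scaffold_1", 88000)],
    "snp_d": [("scaffold_2", 40), ("scaffold_7", 9100)],
}
tally = Counter(hit_status(alns) for alns in queries.values())

# A 25-mer with the SNP base ('G') in the middle position.
oligo = oligo_around_snp("A" * 12 + "G" + "C" * 12, 12)
```

Keeping the status strings identical to the category labels used in the paper makes a tally like `tally` directly comparable with the reported BLASTN summary.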
SNP validation by sequencing and SNaPshot genotyping assay
BigDye Terminator V3.1 Cycle Sequencing Kit and SNaPshot Multiplex Kit (Applied Biosystems, Foster City, CA) were used to validate SNPs, following the manufacturer’s protocols with some modifications in reaction volumes and/or quantity of proprietary reagents. 96-well plates were used for PCR, enzymatic incubation, and denaturation on iCycler (Bio-Rad, Hercules, CA) and/or GeneAmp PCR System 9700 (Applied Biosystems, Foster City, CA), and for genotyping and sequencing on 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA). Unless otherwise stated, brief centrifugation up to 1000 rpm in a Jouan MR 23i was applied after addition of a solution or before implementation of new steps, and all the PCR and enzymatic incubation programs were set to hold at 4°C indefinitely at the end until the next procedure.
For both dye terminator sequencing and SNaPshot assays to validate SNPs, template preparation was carried out in 10 μl in each well consisting of 3.3 μl ddH2O, 1.0 μl 10x dNTPs (2 mM), 2.0 μl 5x colorless GoTaq Flexi buffer, 0.8 μl 25 mM MgCl2, 0.4 μl F and R primers each, 0.1 μl GoTaq Flexi (5 units per μl; Promega, Madison, WI), and 2 μl genomic DNA (10 ng/μl). The touch-down PCR program started from an initial denaturation at 94°C for 3 min, followed by 10 cycles of 93°C for 30 sec, 56°C for 45 sec (decreasing 0.5°C each annealing step), 72°C for 45 sec, and 30 continuing cycles with 51°C at the annealing step, plus a final elongation at 72°C for 15 min. Removal of primers and unused dNTPs was performed by addition of 1 μl of ExoSAP-IT (Affymetrix, Santa Clara, CA) into each well of the plate, and incubation at 37°C for 60 min and 75°C for 15 min.
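The reaction composition and the touch-down annealing schedule can be checked with a short Python sketch. It assumes that the first touch-down cycle anneals at 56°C and each later touch-down cycle is 0.5°C lower, which is one reading of the protocol; the names `PCR_MIX_UL` and `touchdown_annealing` are illustrative.

```python
# Per-well template PCR mix in microlitres, as listed in the text.
PCR_MIX_UL = {
    "ddH2O": 3.3,
    "10x dNTPs (2 mM)": 1.0,
    "5x colorless GoTaq Flexi buffer": 2.0,
    "25 mM MgCl2": 0.8,
    "F primer": 0.4,
    "R primer": 0.4,
    "GoTaq Flexi": 0.1,
    "genomic DNA (10 ng/ul)": 2.0,
}

def touchdown_annealing(start=56.0, step=0.5, touchdown_cycles=10,
                        plateau=51.0, plateau_cycles=30):
    """Return the annealing temperature (deg C) of every PCR cycle in order."""
    temps = [start - step * i for i in range(touchdown_cycles)]
    temps += [plateau] * plateau_cycles
    return temps

total_volume = sum(PCR_MIX_UL.values())   # should come to 10 ul (up to float rounding)
annealing = touchdown_annealing()
```

Under this reading the tenth touch-down cycle anneals at 51.5°C, so the 51°C plateau continues the same 0.5°C step, and the program runs 40 cycles in total.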
Sequencing reactions for SNP validation were prepared in 10 μl in each well of a new plate including 2 μl 5x sequencing buffer, 2 μl ready reaction premix in the sequencing kit, 1 μl 10 μM SNP F primer, and 5 μl ExoSAP-IT treated PCR product, started at 95°C for 1 min, followed by 25 thermal cycles of 95°C for 10 sec, 50°C for 5 sec, and 60°C for 4 min. Following the manufacturer’s instructions, ethanol/EDTA/sodium acetate precipitation was used to purify the sequencing product in the plate, which was subsequently air dried, then mixed with 2 μl ddH2O and 6 μl Hi-Di formamide in each well, denatured, and loaded to the genetic analyzer to sequence. The sequence files generated were analyzed by Sequencing Analysis software (Applied Biosystems, Foster City, CA) to generate sequences and electropherograms, in which a validated SNP was confirmed by correct alignment of SBE primer sequence into the corresponding sequences and visualization of two different overlapped nucleotide peaks at the nucleotide site in the electropherograms.
The SBE reaction for SNaPshot assays was prepared in 5 μl in each well in a new plate including 0.5 μl ready reaction premix in the SNaPshot kit, 1 μl SBE 10 μM primer, and 3.5 μl ExoSAP-IT treated PCR product, and repeated in 25 thermal cycles of 95°C for 10 sec, 50°C for 5 sec, and 60°C for 30 sec. Removal of unincorporated dye-labeled ddNTPs was completed by addition of 5 μl SAP mix (3.5 μl ddH2O, 1.0 μl 10x SAP buffer, and 0.5 μl 1 U/μl SAP) into the SBE reaction mix, and incubation at 37°C for 60 min and 75°C for 15 min. Genotyping was performed using 8 μl mix in each well of a new plate consisting of 1 μl SAP treated SBE product, 0.25 μl GeneScan 120 LIZ size standard, and 6.75 μl Hi-Di formamide, which was denatured at 95°C for 3 min then immediately moved onto ice for at least 2 min. The SNaPshot files were used to score SNPs by GeneMarker (SoftGenetics, State College, PA) in which a validated SNP consisted of two different nucleotides.
Haplotype-based EST-SNPs in citrus cultivars
For individual cultivars, their numbers of ESTs were different, and so consequently were their quality SNPs and other related numbers. For example, in SO, 213,830 ESTs yielded 7,404 contigs of >=4 ESTs. Of these, 4,228 contigs contained 43,655 potential SNPs and 3,327 contained qSNPs. The total number of qSNPs was 11,182. In other words, there was only one haplotype detected in 3,176 contigs (7,404 minus 4,228) and no quality SNP identified in the additional 901 contigs (4,228 minus 3,327) with potential SNPs. There were 3.4 quality SNPs per contig and one quality SNP per 723 bp in the contigs on average. Of these 11,182 qSNPs, 6,822 (61.0%) were transitions (AG and CT type), 3,879 (34.7%) transversions (AC, GT, CG, and AT type), and 481 (4.3%) insertion/deletions (indels); and 2,619 (23.4%) were nsSNPs and 4,038 (36.1%) were sSNPs. The absolute numbers of quality SNPs were not comparable due to varying numbers of ESTs among citrus cultivars, but the numbers of potential and quality SNPs from each cultivar were strongly correlated with its number of ESTs; more ESTs yielded more usable contigs (>=4 ESTs) available for SNP mining, as well as more quality SNPs (Additional file 1). Given the large differences in the numbers of ESTs available among the various cultivars, it is more interesting to compare SNP frequencies, rates, and ratios among cultivars with substantial EST numbers and distinct genetic backgrounds, and differences between the mining results of the three grouped ESTs (M12, L7, and C27) and the three sums/averages (SM12, SL7, and SC27) of separately mined counterpart individuals. These comparisons will be elaborated hereafter.
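The sweet orange summary figures above can be re-derived from the raw counts with a few lines of Python. This is only a worked arithmetic check using the counts quoted in the text; the variable names are illustrative.

```python
# Sweet orange (SO) counts quoted in the text.
contigs_ge4_ests = 7404
contigs_with_psnps = 4228
contigs_with_qsnps = 3327
psnps = 43655
qsnps = 11182

qsnp_types = {"transition": 6822, "transversion": 3879, "indel": 481}
coding_effect = {"nsSNP": 2619, "sSNP": 4038}

# Contigs in which only one haplotype was detected.
single_haplotype_contigs = contigs_ge4_ests - contigs_with_psnps

# Average qSNPs per qSNP-containing contig, and the qSNP/pSNP ratio in percent.
qsnps_per_contig = qsnps / contigs_with_qsnps
qsnp_psnp_percent = 100 * qsnps / psnps

# Percentage shares, rounded to one decimal as in the text.
type_shares = {k: round(100 * v / qsnps, 1) for k, v in qsnp_types.items()}
effect_shares = {k: round(100 * v / qsnps, 1) for k, v in coding_effect.items()}
```

The three SNP type counts add up to the 11,182 qSNPs, and the rounded shares reproduce the percentages reported for SO; the qSNP/pSNP ratio for SO comes to roughly a quarter of the potential SNPs.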
Haplotypes detected in contigs with SNPs
Alignment and distribution on the Clementine reference genome
BLASTN results of 25,417 25-mer oligo sequences
No hits found: 2,947
1 hit (1 aln): 19,943
1 hit (2+ aln): 1,571
2+ hits (1+ aln each hit): 956
SNP validation by sequencing and SNaPshot genotyping assay
Of the 96 randomly selected sweet orange SNPs, 68 were validated by sequencing and 74 by SNaPshot in sweet orange (Additional file 4). There were 61 validated by both assays and the remainder validated by only one assay. In other words, 7 were validated by only sequencing but failed in SNaPshot, and 13 by only SNaPshot but failed in sequencing. Therefore, a total of 81 SNPs (84%) were validated by at least one of the two assays. The high rate (84%) of validated SNPs was consistent with 93% alignments onto the reference genome with 100% (25/25) or 96% (24/25) identities (Table 2), indicating that QualitySNP, a haplotype-based SNP mining algorithm and pipeline, is a very reliable tool to identify true EST SNPs, and it can effectively minimize the false discovery rate even without quality files.
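The validation counts follow from simple inclusion-exclusion over the two assays, as the following Python sketch shows. Only the four counts quoted in the text are used; the variable names are illustrative.

```python
n_tested = 96
by_sequencing = 68
by_snapshot = 74
by_both = 61

# SNPs confirmed by exactly one assay.
only_sequencing = by_sequencing - by_both
only_snapshot = by_snapshot - by_both

# Inclusion-exclusion: confirmed by at least one assay.
by_either = by_sequencing + by_snapshot - by_both
validation_rate_percent = 100 * by_either / n_tested
```

This reproduces the 7 sequencing-only and 13 SNaPshot-only SNPs and the total of 81 validated SNPs, which is about 84% of the 96 tested.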
Estimation of heterozygosity of different citrus genomes by haplotype-based SNPs
Many naturally evolved genomes are heterozygous, and the heterozygosity level may be evaluated by the rate of allelic nucleotide variations between the two haplotypes. SNPs, the most abundant polymorphisms in genomes, likely are the most appropriate index for the heterozygosity levels of genetically/taxonomically related genomes [19, 21, 22]. Given the different numbers and rates of haplotype-based SNPs discovered from these citrus individuals with substantial numbers of ESTs (for example more than 5,000, Additional file 1), the ratios of qSNPs/ESTs in most of them appeared reflective of their heterozygous status and genetic background. The hybrid derivatives had much higher qSNPs/ESTs ratios, while the presumed “pure” species had lower ratios. For example, some proven natural hybrid cultivars, such as SO and CM, and recent hybrids such as SC, were among those with the higher qSNPs/ESTs ratios (SO - 5.23%, CM - 8.31%, and SC - 7.76%). Other presumed true species, including PM, fell among the lower qSNPs/ESTs ratios (PM - 0.60%). From these ratios, the number of ESTs needed to generate a desired number of SNPs in a given citrus genotype, and vice versa, can be estimated. Such a tendency, along with the ratios and genome heterozygosity, could be strengthened and would be more conclusive if the numbers of ESTs in all the cultivars were close to each other, or at least in a much smaller range.
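A back-of-the-envelope version of this estimate can be written in Python. The sketch assumes that qSNP yield scales linearly with EST number within a genotype, which is a simplification introduced here rather than a result of the paper; the function name `ests_needed` and the target of 1,536 SNPs (one of the core-set sizes mentioned in the paper) are illustrative.

```python
def qsnp_est_ratio_percent(qsnps, ests):
    """qSNPs/ESTs ratio expressed as a percentage."""
    return 100 * qsnps / ests

def ests_needed(target_qsnps, observed_qsnps, observed_ests):
    """Estimate ESTs needed for `target_qsnps`, assuming linear scaling.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    return -(-target_qsnps * observed_ests // observed_qsnps)

# Sweet orange counts quoted in the text.
so_qsnps, so_ests = 11182, 213830
so_ratio = qsnp_est_ratio_percent(so_qsnps, so_ests)

# Hypothetical target: ESTs needed for a 1,536-SNP set at the SO ratio.
so_ests_for_1536 = ests_needed(1536, so_qsnps, so_ests)
```

The SO ratio computed this way rounds to the 5.23% reported above. Because real yields depend on contig depth and redundancy, an estimate of this kind is best read as an order-of-magnitude guide.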
SNP discovery and validation rates
SNP mining is no longer a bottleneck because computational capacity and sequence data are exponentially increasing, and more SNP mining pipelines have become available in recent years [7, 8, 12–15, 31]. Hundreds of thousands of SNPs can be easily mined out of EST or genomic sequences. Inclusion of false SNPs in genotyping certainly is wasteful; therefore, maximizing the true SNP rate (minimizing the false rate) is the most important consideration or requirement for a SNP mining algorithm because any validation approach can only validate true SNPs, not false ones [8, 13]. We found that 93% of the alignments of SNPs identified by the QualitySNP pipeline onto the reference genome had 25/25 or 24/25 identities, and 84% (81 of 96) of randomly selected sweet orange SNPs were validated by sequencing and/or SNaPshot genotyping. It was undetermined whether the others, not aligned at the two identity rates and not validated by sequencing and/or genotyping, were true or false SNPs. For example, those failing in sequencing validation might be due to SBE primer sequences not being found (likely an intron in the region), or sequencing failures caused by primers of low quality or in a variable region, or no nucleotide discrepancies at the sites. It was unclear why some SNPs failed in SNaPshot validation; it is speculated that some of these SBE primers might be incorrectly positioned, i.e., the singly extended nucleotides may not have been exactly at the SNP sites. There were a few such cases identified (Chen et al. unpublished data), very likely due to the differences between these consensus contigs and the original haplotype sequences. On the other hand, only 2 haplotypes may exist in a diploid genome. If SNPs were from the contigs with more than 2 haplotypes, such cases could result from either ESTs mixed from diverse genotypes in the same species or highly identical paralogs assembled into the contigs.
Paralogous genes, resulting from genomic duplication and evolving into different functions, are very common in many genomes and remain almost identical in their conserved regions. ESTs from different paralogous genes, if assembled into the same unigene, could yield false SNPs that are non-allelic and useless.
Criteria for selection of citrus core SNP sets
In most cases the discovered SNPs could easily reach a number so large that only a small portion of them, designated a core SNP set, are selected and used in genotyping to meet the constraints of available budget, desired platform, applications, and other factors [3, 11, 32–34]. These core sets of different numbers (e.g. 384, 1536, or other numbers) are either required by certain SNP genotyping platforms or optimized for particular applications [35–38]. It may be a daunting job, but it is necessary to establish workable criteria to select any core set of different numbers of SNPs. Based on this complete mining and validation process, several attributes of SNPs can be very useful and distinguishing to refine these core sets of different numbers. SNP oligo alignment uniqueness, identity percentage, and distribution in the reference genome, co-existence across different genomes, along with SNP types (nsSNP vs. sSNP, and transition vs. transversion vs. indel) and numbers per gene, should be the main criteria for selection of citrus core SNP sets. As pointed out, some extra haplotypes might result from paralogs across different genome regions. In that case, the resulting SNPs would not be allelic or useful. Whether they mostly were those SNPs that had multiple scaffold hits and alignments remains unclear pending further investigation. Those SNPs from either circumstance should be excluded or at least deprioritized for use in genotyping. Selection of SNPs for genotyping could be difficult when different attributes of SNPs and genotyping platforms are considered. A tool based on these attributes is being developed to achieve the automatic selection of core SNP sets for targeted applications/platforms [35, 36] and to allow geneticists and molecular breeders to select and use certain core SNPs of interest from among the thousands discovered [37, 38]. All the SNPs (Additional file 2) identified in this work are being added to a citrus genome database (citrusgenomedb.org).
Very recently after this study, another draft genome of sweet orange was reported, yielding 1.06 million genome-wide SNPs, about 3.6 SNPs/kb, which could be an additional valuable resource in SNP applications.
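To make the core-set selection criteria discussed in this section concrete, the following Python sketch filters and ranks SNP records. It is not the selection tool mentioned above: the `SnpRecord` fields, the thresholds (a single unique alignment, at least 24 of 25 matching bases, one SNP per contig), and the choice to rank nsSNPs first are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class SnpRecord:
    snp_id: str
    contig: str           # contig (gene) the SNP was mined from
    hit_status: str       # BLASTN category, e.g. "1 hit (1 aln)"
    matches: int          # matching bases out of the 25-mer
    nonsynonymous: bool   # True for nsSNP, False otherwise

def select_core(snps, size, max_per_contig=1, min_matches=24):
    """Pick up to `size` SNPs: unique hits only, nsSNPs first, capped per contig."""
    eligible = [s for s in snps
                if s.hit_status == "1 hit (1 aln)" and s.matches >= min_matches]
    # Sort: nsSNPs first, then higher identity, then id for a stable order.
    eligible.sort(key=lambda s: (not s.nonsynonymous, -s.matches, s.snp_id))
    chosen, per_contig = [], {}
    for s in eligible:
        if len(chosen) >= size:
            break
        if per_contig.get(s.contig, 0) >= max_per_contig:
            continue
        chosen.append(s)
        per_contig[s.contig] = per_contig.get(s.contig, 0) + 1
    return chosen

# Toy records (all values made up).
records = [
    SnpRecord("s1", "c1", "1 hit (1 aln)", 25, False),
    SnpRecord("s2", "c1", "1 hit (1 aln)", 24, True),
    SnpRecord("s3", "c2", "2+ hits (1+ aln each hit)", 25, True),
    SnpRecord("s4", "c3", "1 hit (1 aln)", 23, True),
    SnpRecord("s5", "c4", "1 hit (1 aln)", 25, False),
]
core_ids = [s.snp_id for s in select_core(records, size=2)]
```

In the toy data, the multi-hit SNP and the low-identity SNP are excluded, and the per-contig cap keeps the selection spread across contigs. Other attributes named in the text, such as genome distribution and co-existence across genotypes, could be added as further sort keys.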
High-quality SNPs in public ESTs from different citrus genotypes were detected by the QualitySNP pipeline and compared to estimate the heterozygosity of each genome. All the short SNP oligo sequences were also aligned with the Clementine citrus genome to determine their distribution and uniqueness in the genome and for in silico validation. Selected SNPs were also validated by SNaPshot and sequencing.
We thank Dr. Harm Nijveen for contributing his valuable time to fix a bug in one C program in the QualitySNP pipeline and Dr. Frank You for modifying some code in the BatchPrimer3 script to meet some primer output requests. The work was partially supported by grants from the Citrus Research and Development Foundation (#67, #71), on behalf of the Florida citrus growers.
- Brookes AJ: The essence of SNPs. Gene. 1999, 234 (2): 177-186. 10.1016/S0378-1119(99)00219-X.
- Dawson E: New collaborations make pharmacogenomics a SNP. Mol Med Today. 1999, 5 (7): 280. 10.1016/S1357-4310(99)01503-8.
- Rickert AM, Kim JH, Meyer S, Nagel A, Ballvora A, Oefner PJ, Gebhardt C: First-generation SNP/InDel markers tagging loci for pathogen resistance in the potato genome. Plant Biotechnol J. 2003, 1 (6): 399-410. 10.1046/j.1467-7652.2003.00036.x.
- Han Y, Chagne D, Gasic K, Rikkerink EH, Beever JE, Gardiner SE, Korban SS: BAC-end sequence-based SNPs and Bin mapping for rapid integration of physical and genetic maps in apple. Genomics. 2009, 93 (3): 282-288. 10.1016/j.ygeno.2008.11.005.
- Garg K, Green P, Nickerson DA: Identification of candidate coding region single nucleotide polymorphisms in 165 human genes using assembled expressed sequence tags. Genome Res. 1999, 9 (11): 1087-1092. 10.1101/gr.9.11.1087.
- Picoult-Newberg L, Ideker TE, Pohl MG, Taylor SL, Donaldson MA, Nickerson DA, Boyce-Jacino M: Mining SNPs from EST databases. Genome Res. 1999, 9 (2): 167-174.
- Batley J, Barker G, O'Sullivan H, Edwards KJ, Edwards D: Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data. Plant Physiol. 2003, 132 (1): 84-91. 10.1104/pp.102.019422.
- Tang J, Vosman B, Voorrips RE, van der Linden CG, Leunissen JA: QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species. BMC Bioinformatics. 2006, 7: 438. 10.1186/1471-2105-7-438.
- Kim H, Schmidt CJ, Decker KS, Emara MG: A double-screening method to identify reliable candidate non-synonymous SNPs from chicken EST data. Anim Genet. 2003, 34 (4): 249-254. 10.1046/j.1365-2052.2003.01003.x.
- Matukumalli LK, Grefenstette JJ, Hyten DL, Choi IY, Cregan PB, Van Tassell CP: Application of machine learning in SNP discovery. BMC Bioinformatics. 2006, 7: 4. 10.1186/1471-2105-7-4.
- Mooney S: Bioinformatics approaches and resources for single nucleotide polymorphism functional analysis. Brief Bioinform. 2005, 6 (1): 44-56. 10.1093/bib/6.1.44.
- Nickerson DA, Tobe VO, Taylor SL: PolyPhred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Res. 1997, 25 (14): 2745-2751. 10.1093/nar/25.14.2745.
- Marth GT, Korf I, Yandell MD, Yeh RT, Gu Z, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish WR: A general approach to single-nucleotide polymorphism discovery. Nat Genet. 1999, 23 (4): 452-456. 10.1038/70570.
- Montgomery KT, Iartchouck O, Li L, Loomis S, Obourn V, Kucherlapati R: PolyPhred analysis software for mutation detection from fluorescence-based sequence data. Curr Protoc Hum Genet. 2008, Chapter 7: Unit 7.16.1-21.
- Barker G, Batley J, O'Sullivan H, Edwards KJ, Edwards D: Redundancy based detection of sequence polymorphisms in expressed sequence tag data using autoSNP. Bioinformatics. 2003, 19 (3): 421-422. 10.1093/bioinformatics/btf881.
- Gmitter F, Chen C, Machado M, Souza A, Ollitrault P, Froehlicher Y, Shimizu T: Citrus genomics. Tree Genet Genomes. 2012, 8 (3): 611-626. 10.1007/s11295-012-0499-2.
- Federici C, Fang D, Scora R, Roose M: Phylogenetic relationships within the genus Citrus (Rutaceae) and related genera as revealed by RFLP and RAPD analysis. Theor Appl Genet. 1998, 96: 812-822. 10.1007/s001220050807.
- Nicolosi E, Deng Z, Gentile A, La Malfa S, Continella G, Tribulato E: Citrus phylogeny and genetic origin of important species as investigated by molecular markers. Theor Appl Genet. 2000, 100: 1155-1166. 10.1007/s001220051419.
- Chen C, Zhou P, Choi YA, Huang S, Gmitter FG: Mining and characterizing microsatellites from citrus ESTs. Theor Appl Genet. 2006, 112 (7): 1248-1257. 10.1007/s00122-006-0226-1.
- Chen C, Bowman K, Choi Y, Dang P, Rao M, Huang S, Soneji J, McCollum TG, Gmitter F: EST-SSR genetic maps for Citrus sinensis and Poncirus trifoliata. Tree Genet Genomes. 2008, 4 (1): 1-10.
- Ollitrault P, Terol J, Chen C, Federici CT, Lotfy S, Hippolyte I, Ollitrault F, Berard A, Chauveau A, Cuenca J, Costantino G, Kacar Y, Mu L, Garcia-Lor A, Froelicher Y, Aleza P, Boland A, Billot C, Navarro L, Luro F, Roose ML, Gmitter FG, Talon M, Brunel D: A reference genetic map of C. clementina hort. ex Tan.; citrus evolution inferences from comparative mapping. BMC Genomics. 2012, 13: 593. 10.1186/1471-2164-13-593.
- Gmitter FG: The haploid mandarin and diploid sweet orange genome sequences. Plant & Anim Genomes XIX Conference. 2011, W146 [abstract].
- Chen C, Cancalon P, Haun C, Gmitter F: Characterization of furanocoumarin profile and inheritance toward selection of low furanocoumarin seedless grapefruit cultivars. J Am Soc Hort Sci. 2011, 136: 358-363.
- Gordon D: Viewing and editing assembled sequences using Consed. Curr Protoc Bioinformatics. 2003, Chapter 11: Unit 11.2.1-11.
- de la Bastide M, McCombie WR: Assembling genomic DNA sequences with PHRAP. Curr Protoc Bioinformatics. 2007, Chapter 11: Unit 11.4.1-7.
- Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9 (9): 868-877. 10.1101/gr.9.9.868.
- Pearson WR, Wood T, Zhang Z, Miller W: Comparison of DNA sequences with protein sequences. Genomics. 1997, 46 (1): 24-36. 10.1006/geno.1997.4995.
- You FM, Huo N, Gu YQ, Luo MC, Ma Y, Hane D, Lazo GR, Dvorak J, Anderson OD: BatchPrimer3: a high throughput web application for PCR and sequencing primer design. BMC Bioinformatics. 2008, 9: 253. 10.1186/1471-2105-9-253.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Moroldo M, Paillard S, Marconi R, Fabrice L, Canaguier A, Cruaud C, De Berardinis V, Guichard C, Brunaud V, Le Clainche I, Scalabrin S, Testolin R, Di Gaspero G, Morgante M, Adam-Blondon AF: A physical map of the heterozygous grapevine 'Cabernet Sauvignon' allows mapping candidate genes for disease resistance. BMC Plant Biol. 2008, 8: 66. 10.1186/1471-2229-8-66.
- Clifford RJ, Edmonson MN, Nguyen C, Scherpbier T, Hu Y, Buetow KH: Bioinformatics tools for single nucleotide polymorphism discovery and analysis. Ann N Y Acad Sci. 2004, 1020: 101-109. 10.1196/annals.1310.011.
- Rahmann S: Fast large scale oligonucleotide selection using the longest common factor approach. J Bioinform Comput Biol. 2003, 1 (2): 343-361. 10.1142/S0219720003000125.
- Jena KK, Jeung JU, Lee JH, Choi HC, Brar DS: High-resolution mapping of a new brown planthopper (BPH) resistance gene, Bph18(t), and marker-assisted selection for BPH resistance in rice (Oryza sativa L.). Theor Appl Genet. 2006, 112 (2): 288-297. 10.1007/s00122-005-0127-8.
- Shulaev V, Korban SS, Sosinski B, Abbott AG, Aldwinckle HS, Folta KM, Iezzoni A, Main D, Arus P, Dandekar AM, Lewers K, Brown SK, Davis TM, Gardiner SE, Potter D, Veilleux RE: Multiple models for Rosaceae genomics. Plant Physiol. 2008, 147 (3): 985-1003. 10.1104/pp.107.115618.
- Ahmadi KR, Weale ME, Xue ZY, Soranzo N, Yarnall DP, Briley JD, Maruyama Y, Kobayashi M, Wood NW, Spurr NK, Burns DK, Roses AD, Saunders AM, Goldstein DB: A single-nucleotide polymorphism tagging set for human drug metabolism and transport. Nat Genet. 2005, 37 (1): 84-89.
- Chagne D, Gasic K, Crowhurst RN, Han Y, Bassett HC, Bowatte DR, Lawrence TJ, Rikkerink EH, Gardiner SE, Korban SS: Development of a set of SNP markers present in expressed genes of the apple. Genomics. 2008, 92 (5): 353-358. 10.1016/j.ygeno.2008.07.008.
- Harlizius B, Lopes MS, Duijvesteijn N, van de Goor LH, van Haeringen WA, Panneman H, Guimaraes SE, Merks JW, Knol EF: A single nucleotide polymorphism set for paternal identification to reduce the costs of trait recording in commercial pig breeding. J Anim Sci. 2011, 89 (6): 1661-1668. 10.2527/jas.2010-3347.
- Greenawalt DM, Sieberts SK, Cornelis MC, Girman CJ, Zhong H, Yang X, Guinney J, Qi L, Hu FB: Integrating genetic association, genetics of gene expression, and single nucleotide polymorphism set analysis to identify susceptibility loci for type 2 diabetes mellitus. Am J Epidemiol. 2012, 176 (5): 423-430. 10.1093/aje/kws123.
- Xu Q, Chen LL, Ruan X, Chen D, Zhu A, Chen C, Bertrand D, Jiao WB, Hao BH, Lyon MP, Chen J, Gao S, Xing F, Lan H, Chang JW, Ge X, Lei Y, Hu Q, Miao Y, Wang L, Xiao S, Biswas MK, Zeng W, Guo F, Cao H, Yang X, Xu XW, Cheng YJ, Xu J, Liu JH, Luo OJ, Tang Z, Guo WW, Kuang H, Zhang HY, Roose ML, Nagarajan N, Deng XX, Ruan Y: The draft genome of sweet orange (Citrus sinensis). Nat Genet. 2013, 45 (1): 59-66.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.