Nine L. culinaris and two L. ervoides genotypes, sourced from a wide geographical range, were used for SNP discovery (Table 1). A single plant of each genotype was selfed to produce enough seed to grow the plants from which tissues were collected for library construction.
CDC Redberry was used as the reference genome because it was the first small red lentil cultivar to combine resistance to both ascochyta blight and anthracnose with excellent lodging tolerance and high yield. It has a diverse genetic origin and has been a key genotype in the Canadian red lentil breeding program since 2005. It was the recurrent parent for backcrossing imidazolinone tolerance to create the cultivar CDC Maxim, which now accounts for more than 75% of red lentil production in North America.
CDC Robin and 964a-46 are the parents of the recombinant inbred line (RIL) population LR-18 that was used for genetic mapping. CDC Robin is a small red cultivar released in 1999 and 964a-46 is a large green breeding line from the lentil breeding program at the Crop Development Centre (CDC), University of Saskatchewan. The RILs were bulked at F8 and a set of 147 lines has been grown and phenotyped extensively for the past five years. Lines were at F12 or higher. DNA was extracted from freeze-dried leaf tissue collected from at least five plants of each genotype using a modified CTAB method.
3′-anchored cDNA library construction and sequencing
Five kinds of tissue samples were collected individually: (1) 2-week-old leaf, (2) stem before flowering, (3) 1-week-old etiolated seedling, (4) mixed flower stages, and (5) developing seed at mixed stages. Total RNA from leaves was extracted using the RNeasy Plant Mini Kit (Qiagen), including on-column DNase digestion. Total RNA from the other tissues was extracted using the CTAB method described by Meisel et al. and then cleaned up using RNeasy Mini kits (Qiagen), including on-column DNase digestion. Two kinds of RNA samples were used for cDNA synthesis for library construction. For the first CDC Redberry library, equal amounts of the total RNA from each tissue were mixed and used directly for cDNA synthesis. For all other libraries, including a second CDC Redberry library, equal amounts of the total RNA from each tissue were mixed first, then further purified using the RiboMinus Plant Kit for RNA-Seq (Invitrogen), and then used for cDNA synthesis.
3′-anchored cDNA libraries for 454 sequencing were prepared based on a protocol described in Eveland et al. and modified to incorporate AciI as the restriction enzyme used to generate 3′ cDNA fragments of the optimal size range for amplification during 454 Titanium chemistry sequencing. The use of AciI as the appropriate enzyme was determined by in silico digestion of ESTs collected from chickpea (Sharpe and Cram, NRC Saskatoon, unpublished). First-strand cDNA was generated from ~5 μg of total RNA or RiboMinus RNA (derived from ~8 μg total RNA) using the ArrayScript kit (Ambion) according to the manufacturer’s instructions. The synthesis was carried out using a biotinylated oligo(dT) fused to the 454 B-adapter primer, 5′-biotin-CCTATCCCCTGTGTGCCTTGGCAGTCTCAG(T)12VN-3′. First-strand cDNA was then treated with 2 μl of RNase Cocktail Enzyme Mix (Ambion) for 30 min at 37°C. Double-stranded cDNA was synthesized immediately by addition of 30 U of E. coli DNA Polymerase I and 1 U of E. coli RNase H (Fermentas Canada Inc.) to the first-strand synthesis reaction, following the manufacturer’s suggested protocol for second-strand cDNA synthesis.
The double-stranded cDNA product was purified using MinElute PCR Purification Kits (Qiagen Inc.), and then digested with the restriction enzyme AciI (New England Biolabs) to create 2-base 5′-CG overhangs for 454 A-adaptor ligation. The A-adapter was prepared by annealing 1 μl each of top and bottom strand oligos (100 pmol/μl; top strand, 5′-CCATCTCATCCCTGCGTGTCTCCGACTCAGCAT-3′; bottom strand, 5′-CGATGCTGAGTCGGAGACACGCAGGGATGA-3′), 1 μl 10X Annealing Buffer (10 mM Tris-HCl, pH 8, 150 mM MgCl2, 150 mM NaCl, 1 mM spermidine), and 7 μl dH2O for a total volume of 10 μl. The annealing mixture was heated at 55°C in a heating block for 5 min and then left in the block while it cooled slowly to room temperature. 30 μl dH2O was then added to achieve a final concentration of 2.5 pmol/μl for the A-adaptor stock.
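The dilution arithmetic for the adaptor stock can be checked with a short calculation. This is only a sketch: the function name is illustrative, and it assumes the top and bottom strands anneal 1:1 so that the amount of duplex equals the amount of either strand.

```python
def adaptor_stock_conc(oligo_conc_pmol_per_ul, oligo_vol_ul,
                       anneal_vol_ul, water_added_ul):
    """Return the duplex concentration (pmol/ul) after annealing and dilution.

    Assumption (not stated in the text): strands anneal 1:1, so the duplex
    amount equals the amount of one strand.
    """
    duplex_pmol = oligo_conc_pmol_per_ul * oligo_vol_ul
    final_vol_ul = anneal_vol_ul + water_added_ul
    return duplex_pmol / final_vol_ul


# Values from the text: 1 ul of each 100 pmol/ul oligo in a 10 ul annealing
# reaction, followed by addition of 30 ul water.
print(adaptor_stock_conc(100, 1, 10, 30))  # 2.5
```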
The AciI-digested cDNA was treated with Agencourt AMPure Beads (Beckman Coulter Inc.) to remove smaller fragments (<250 bp). The 3′-fragments of cDNA were recovered using Dynabeads M-270 Streptavidin (Invitrogen) and then ligated with A-adaptor. Unligated adaptors were removed by washing the beads twice with 1× B&W Buffer (5 mM Tris-HCl, pH 7.5, 0.5 mM EDTA, 1 M NaCl, 0.01% Tween-20), and nicks were then repaired by fill-in reactions using Bst DNA Polymerase, Large Fragment (New England Biolabs). To isolate the single-stranded AB-adapted library, the immobilized beads were washed twice with 1× B&W Buffer and twice with Molecular Biology Grade water (Sigma). The desired 5′-A-cDNA-B-3′ template strand was eluted with 125 mM NaOH, neutralized, and concentrated on a column from the MinElute PCR Purification Kit (Qiagen Inc.).
The quantity and quality of the resultant single-stranded DNA library were assessed using a qPCR strategy with Platinum SYBR Green qPCR SuperMix UDG (Invitrogen). Molecules per microliter of the library were calculated relative to a 578-bp DNA standard, with amplification using the emPCR primers (forward primer, 5′-CCATCTCATCCCTGCGTGTC-3′; reverse primer, 5′-CCTATCCCCTGTGTGCCTTG-3′). The qPCR products were also checked on a 1.2% agarose gel to assess the quality of the libraries.
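One common way to convert qPCR readings into molecules per microliter is to fit a standard curve of Ct against log10 of the known standard concentration and interpolate the library Ct. The sketch below illustrates that general approach only; the dilution series, Ct values, and function names are hypothetical, and the text does not specify the exact calculation that was used.

```python
import math


def fit_standard_curve(molecules_per_ul, ct_values):
    """Least-squares fit of Ct = slope * log10(molecules/ul) + intercept."""
    xs = [math.log10(m) for m in molecules_per_ul]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def molecules_from_ct(ct, slope, intercept):
    """Interpolate molecules/ul for a sample Ct on the fitted curve."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series of the DNA standard (illustrative values only).
standard_molecules = [1e5, 1e6, 1e7, 1e8]
standard_ct = [25.0, 21.7, 18.4, 15.1]

slope, intercept = fit_standard_curve(standard_molecules, standard_ct)
library_ct = 20.0  # hypothetical library reading
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"library: {molecules_from_ct(library_ct, slope, intercept):.3g} molecules/ul")
```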
Roche 454 Titanium sequencing of the titrated single-stranded DNA libraries was carried out following the procedure described by Margulies et al. with modifications for the Titanium chemistry as described in protocols supplied by the manufacturer (Roche, Laval, Quebec).
Sequence assembly and analysis and SSR analysis
NGen (DNAStar) software was used for de novo assembly of the CDC Redberry 454 reads. Parameters used for the de novo assembly included: Min/Match Percent = 90; Max454 Sequence Length = 600; Repeat Handling On; and Expected Coverage = 20. Processing was carried out on a Dell R910 server (2 × 2.40 GHz, 48 GB RAM, 64-bit Windows). Following assembly, repeat class contigs were removed, as were any remaining contigs of <200 bp. Sequence data from the other genotypes were then assembled against the CDC Redberry de novo reference assembly using NGen (DNAStar). To ensure removal of the 454 key sequence, 10 bp were trimmed from the 5′ end of each read, and the reads were then screened for contamination using the vector/adapter file. Adapter screening was used to remove the wobble primer, the adapter, and any poly-A tail. Contaminant filtering was implemented using a set of seven ribosomal sequences that were found to remove the majority of contaminant reads.
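The trimming and contig-filtering rules described above can be sketched in a few lines. This is not the NGen implementation; the function names and the in-memory data layout are assumptions made for illustration, and only the 10-bp trim and the 200-bp minimum come from the text.

```python
def trim_5prime(read_seq, n_bases=10):
    """Drop the first n_bases of a read (covers the 454 key sequence)."""
    return read_seq[n_bases:]


def filter_contigs(contigs, min_len=200):
    """Keep contigs that are not repeat class and are at least min_len bp.

    contigs: dict mapping contig name -> (sequence, is_repeat_class).
    The is_repeat_class flag is assumed to come from the assembler output.
    """
    return {
        name: seq
        for name, (seq, is_repeat) in contigs.items()
        if not is_repeat and len(seq) >= min_len
    }


# Toy data for illustration only.
example_contigs = {
    "contig_1": ("A" * 350, False),  # kept
    "contig_2": ("C" * 150, False),  # dropped: shorter than 200 bp
    "contig_3": ("G" * 900, True),   # dropped: repeat class
}
print(sorted(filter_contigs(example_contigs)))
print(trim_5prime("TCAGACGTACGTAAGGCT"))
```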
Candidate SSRs were identified in the contigs generated from CDC Redberry using the software QDD.
The L. culinaris contigs, their L. ervoides orthologues and the corresponding M. truncatula genes were aligned using ClustalW and those without indels (628 contigs in total) were used to estimate evolutionary divergence time. Ks values were obtained with the codeml program in the PAML program suite. The estimate of divergence was derived from the mode Ks of binned values, based on a mutation rate of 5.17 × 10⁻³ substitutions/synonymous site/Myr for the legume lineage.
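As a worked illustration of how a modal Ks value translates into a divergence time, the sketch below bins Ks values, takes the centre of the most populated bin, and applies the widely used relation T = Ks / (2r). The relation, the bin width, and the example Ks values are assumptions for illustration; the text states only the rate and that the mode of binned Ks values was used.

```python
import math
from collections import Counter

# Rate given in the text (substitutions per synonymous site per Myr).
RATE_PER_MYR = 5.17e-3


def binned_mode(ks_values, bin_width=0.05):
    """Return the centre of the most populated Ks bin.

    bin_width is a hypothetical choice; ties go to the first bin encountered.
    """
    bins = Counter(math.floor(ks / bin_width) for ks in ks_values)
    top_bin = max(bins, key=bins.get)
    return (top_bin + 0.5) * bin_width


def divergence_time_myr(ks, rate=RATE_PER_MYR):
    """T = Ks / (2 * rate): substitutions accumulate along both lineages."""
    return ks / (2 * rate)


# Toy Ks values for illustration only.
example_ks = [0.02, 0.11, 0.12, 0.13, 0.27]
mode_ks = binned_mode(example_ks)
print(f"modal Ks = {mode_ks:.3f}, T = {divergence_time_myr(mode_ks):.1f} Myr")
```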
Seqman Pro (DNAStar) was used to identify SNPs relative to CDC Redberry. Individual reports were parsed into spreadsheet format for comparison using a custom Perl pipeline. Only transitions and transversions were reported; indels were ignored because the nature of the pyrosequencing technique makes them less robust. The final spreadsheet report indicates, for each genotype, whether the allele is the same as the reference, whether it is the alternate allele, or whether there is no sequence data at that position (Additional file 2). All low-confidence SNPs (represented in <80% or <3 reads) were identified and reported as being below thresholds if found in the same position as confident SNPs.
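The reporting categories can be illustrated with a small classifier. This is a sketch, not the Perl pipeline: it assumes the thresholds mean that the alternate allele must be supported by at least 3 reads and by at least 80% of the reads at that position, which is one reading of the text, and it ignores the rule that below-threshold calls are reported only at positions with a confident SNP in another genotype.

```python
def classify_call(alt_reads, total_reads, min_reads=3, min_percent=80):
    """Classify one genotype at one position.

    alt_reads / total_reads are hypothetical per-position read counts.
    Returns 'no_data', 'reference', 'snp', or 'below_threshold'.
    """
    if total_reads == 0:
        return "no_data"
    if alt_reads == 0:
        return "reference"
    # Integer arithmetic avoids floating-point edge cases at exactly 80%.
    enough_reads = alt_reads >= min_reads
    enough_support = alt_reads * 100 >= min_percent * total_reads
    if enough_reads and enough_support:
        return "snp"
    return "below_threshold"


for alt, total in [(0, 0), (0, 10), (8, 10), (2, 2), (5, 10)]:
    print(alt, total, classify_call(alt, total))
```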
The 454 contigs were mapped against the M. truncatula and soybean genomes using GMAP with the cross-species parameter. Flanking sequence length and gene annotation information were extracted from the sequence mapping output through custom Perl scripting.
GO categorization was assigned based on the high-quality annotation information available for Arabidopsis thaliana. Contigs were compared against a database of TAIR10 CDS sequences, and the highest-scoring hit was used to assign Arabidopsis GOslim gene ontology terms (since any one contig can be assigned to multiple GOslim terms, the total percentage across categories can exceed 100%).
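A toy calculation shows why the category percentages can sum to more than 100% when a contig carries several GOslim terms. The contig names and term assignments below are invented for illustration.

```python
from collections import Counter


def goslim_percentages(contig_terms):
    """Percentage of annotated contigs assigned to each GOslim term.

    contig_terms: dict mapping contig name -> iterable of GOslim terms.
    Each contig is counted once per distinct term it carries.
    """
    n_contigs = len(contig_terms)
    counts = Counter(
        term for terms in contig_terms.values() for term in set(terms)
    )
    return {term: 100.0 * c / n_contigs for term, c in counts.items()}


# Hypothetical assignments: contig_1 carries two terms.
example = {
    "contig_1": {"transport", "response to stress"},
    "contig_2": {"transport"},
}
percentages = goslim_percentages(example)
print(percentages)
print("total:", sum(percentages.values()))
```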
A set of 28 SNPs was chosen for initial validation (Additional file 2); two were chosen to test the effect of interfering SNPs and splice sites. Two allele-specific forward primers and a common reverse primer were designed for use in fluorescence-based competitive allele-specific PCR assays (KASP; KBioscience, Hoddesdon, UK). DNA from the 11 individuals from the SNP discovery panel was assayed using these primers and KASP reaction mix (version 3 chemistry; KBioscience) following the manufacturer’s instructions. PCR amplification was carried out in a StepOnePlus™ Real-Time PCR System (Applied Biosystems) and end-product fluorescence readings were analysed using StepOne Software v2.1 (Applied Biosystems).
Sequence data for contigs containing a further 150 SNPs were sent to KBioscience (Hoddesdon, UK) for design of additional KASP assays. Of these, it was possible to design assays for 124 SNPs and genotyping was carried out at KBioscience on 144 LR-18 RILs and on 10 individuals from the SNP discovery panel. The resulting data were visually inspected for allele calling errors using SNPviewer software (KBioscience) and corrected when deemed necessary.
Illumina GoldenGate OPA design
All SNPs in the collection (44,942 amongst 11,061 contigs) were surveyed, and only those that were polymorphic amongst L. culinaris genotypes, with at least two L. culinaris genotypes differing from the CDC Redberry reference, were kept for further interrogation. Sequence data for the contigs surrounding the SNPs were checked for the number of base pairs to the end of the contig, to the next SNP, or to the closest splice site. SNPs with fewer than 60 bp in this flanking region were eliminated since they cannot be used for Illumina GoldenGate arrays (Illumina Inc., San Diego, CA). All remaining candidate SNPs were submitted to Illumina for assay design and a total of 7,229 SNPs were returned with design scores greater than 0.4; preferential selection was given to those scoring above 0.6, as recommended by Illumina.
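The flanking-sequence check can be sketched as a distance calculation. The function below is illustrative only: it assumes 1-based coordinates, treats splice sites as base positions, and takes the clear flank to be the shortest uninterrupted stretch on either side of the SNP, none of which is spelled out in the text.

```python
def clear_flank(pos, contig_len, other_snps=(), splice_sites=()):
    """Shortest run of uninterrupted bases adjacent to a SNP.

    pos, other_snps, splice_sites: 1-based positions on the contig
    (the splice-site representation is a simplifying assumption).
    """
    distances = [pos - 1, contig_len - pos]  # bases to each contig end
    distances += [abs(pos - p) - 1 for p in other_snps if p != pos]
    distances += [abs(pos - s) - 1 for s in splice_sites]
    return min(distances)


def usable_for_goldengate(pos, contig_len, other_snps=(), splice_sites=(),
                          min_flank=60):
    """Apply the 60-bp minimum flank described in the text."""
    return clear_flank(pos, contig_len, other_snps, splice_sites) >= min_flank


print(usable_for_goldengate(100, 500))                   # isolated SNP
print(usable_for_goldengate(100, 500, other_snps=[150]))  # neighbour 50 bp away
```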
To reduce the number of SNPs to 1,536 for final array design, only one SNP per contig or one per predicted SNP haplotype was selected. For duplicated contigs, either one SNP assay overall or a single assay for each duplicate was selected. The final selection was based on even spread across the genome, with distribution inferred from contig homology to M. truncatula gene models. This final set of 1,536 was submitted to Illumina for design and synthesis of the Lc1536 OPA.
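A simplified version of the reduction step is shown below: keep the best-scoring SNP per contig among those with design scores above 0.4, then rank by score and cap the list at 1,536. It is a sketch under stated assumptions and does not model predicted haplotypes, duplicated contigs, or the even-spread criterion based on M. truncatula homology; the function name and tuple layout are hypothetical.

```python
def pick_one_per_contig(snps, min_score=0.4, max_assays=1536):
    """Select at most one SNP per contig, ranked by design score.

    snps: iterable of (snp_id, contig_id, design_score) tuples.
    """
    best = {}
    for snp_id, contig_id, score in snps:
        if score <= min_score:
            continue  # text keeps only design scores greater than 0.4
        current = best.get(contig_id)
        if current is None or score > current[2]:
            best[contig_id] = (snp_id, contig_id, score)
    ranked = sorted(best.values(), key=lambda s: s[2], reverse=True)
    return ranked[:max_assays]


# Toy candidates for illustration only.
candidates = [
    ("snp_1", "contig_1", 0.90),
    ("snp_2", "contig_1", 0.70),
    ("snp_3", "contig_2", 0.30),
    ("snp_4", "contig_3", 0.65),
]
print([s[0] for s in pick_one_per_contig(candidates)])
```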
A total of 144 RILs from the LR-18 population were genotyped using the Lc1536 OPA and a standard Illumina GoldenGate assay protocol (http://www.illumina.com/technology/goldengate_genotyping_assay.ilmn). Products generated by this assay were read with an Illumina HiScan (Illumina Inc., San Diego, CA) and the resulting data were clustered for allele calling using GenomeStudio software version 2010.3 (Illumina Inc., San Diego, CA). Allele calls were visually inspected for errors in automatic allele calling and corrected where deemed necessary. Any calls that were not clearly one allele or the other were reported as missing data to avoid errors. Five individuals were removed before mapping due to too many missing data points, leaving 139 individuals for map development.
Lentil SSR markers from Hamwieh et al. were screened on the parents of LR-18 and those that were polymorphic were used to genotype the whole population. To enable marker analysis using an ABI 3130xl Genetic Analyzer (Applied Biosystems, Carlsbad, CA), forward primers were synthesized with an additional M13 universal primer sequence on the 5′ end. These forward primers were used in three-primer PCR amplifications along with the respective reverse primers and a 5′ fluorescently labeled (HEX, NED or FAM) M13 universal primer sequence. PCR reactions were conducted as described by Schuelke with slight modification to the reaction conditions as described in Ubayasena et al. Product sizes were determined using GeneScan™ 500 ROX Size Standard (Applied Biosystems).
The LR-18 allele calls from all genotyping were exported for analysis and mapping using JoinMap 4.0. Maximum likelihood mapping was used and LOD values of 6 or higher were used to group loci into linkage groups to form the map. Regression mapping was used to finalize the map order of each linkage group and genetic distances were determined using the Kosambi mapping function. Map order was verified visually by examining the raw genotypes and the linkage map was generated using MapChart.
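For reference, the Kosambi mapping function converts a recombination fraction r into map distance as d = 25 ln((1 + 2r)/(1 − 2r)) cM. The short sketch below applies that formula to illustrative recombination fractions; it is independent of JoinMap and the function name is ours.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in centimorgans for recombination fraction r."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# Illustrative recombination fractions.
for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

At small r the distance is close to 100r cM, and it grows faster than r as the recombination fraction approaches 0.5.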
Contigs containing SNPs present in the linkage map were processed through the NUCmer pipeline and the results were filtered for global alignment using the length × identity weighted longest increasing subset. Dotplots were generated using mummerplot and further visualized by parsing NUCmer output with custom Perl scripts.
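The idea behind a length × identity weighted longest increasing subset can be illustrated with a small dynamic-programming sketch. This is not MUMmer's implementation: it considers forward-strand alignments only, ignores overlaps, uses an O(n²) search, and the tuple layout and example coordinates are hypothetical.

```python
def weighted_lis(alignments):
    """Return the collinear chain maximizing the sum of length * identity.

    alignments: list of (ref_start, qry_start, length, identity) tuples.
    A chain must be strictly increasing in both ref_start and qry_start.
    """
    alns = sorted(alignments, key=lambda a: (a[0], a[1]))
    if not alns:
        return []
    weight = [a[2] * a[3] for a in alns]
    best = weight[:]          # best chain score ending at each alignment
    prev = [-1] * len(alns)   # back-pointers for chain reconstruction
    for i in range(len(alns)):
        for j in range(i):
            collinear = alns[j][0] < alns[i][0] and alns[j][1] < alns[i][1]
            if collinear and best[j] + weight[i] > best[i]:
                best[i] = best[j] + weight[i]
                prev[i] = j
    i = max(range(len(alns)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(alns[i])
        i = prev[i]
    return chain[::-1]


# Toy alignments: the second one is out of order relative to the others.
example = [
    (1, 1, 100, 0.95),
    (200, 500, 300, 0.99),
    (300, 150, 120, 0.90),
    (600, 400, 200, 0.98),
]
for aln in weighted_lis(example):
    print(aln)
```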
KnowPulse: Pulse Crop Breeding and Genetics (http://knowpulse2.usask.ca/portal) is a repository of legume genetic and genomic data. The assembled 454 sequencing data for the lentil reference line (CDC Redberry) are available with all polymorphic loci and associated markers (KASP and Illumina GoldenGate arrays) indicated. Researchers can use a BLAST interface to identify potential homologues of their gene of interest based on sequence similarity. This allows them both to find existing markers associated with their gene of interest and to export information on specific loci in a variety of formats to aid in marker design. A comparative GBrowse (Generic Model Organism Database (GMOD), http://gmod.org) with an M. truncatula genomic backbone graphically displays sequence similarity-based homology between legume species, providing an alternative approach to finding candidate markers. For an overview of all data made available through this project, visit the project page (http://knowpulse2.usask.ca/portal/node/3214572).
Sequence data has been deposited in the NCBI Sequence Read Archive (SRA) under the study accession SRP019372.