- Research article
- Open Access
High throughput sequencing in mice: a platform comparison identifies a preponderance of cryptic SNPs
© Walter et al; licensee BioMed Central Ltd. 2009
Received: 11 February 2009
Accepted: 17 August 2009
Published: 17 August 2009
Allelic variation is the cornerstone of genetically determined differences in gene expression, gene product structure, physiology, and behavior. However, allelic variation, particularly cryptic (unknown or not annotated) variation, is problematic for follow-up analyses. Polymorphisms result in a high incidence of false-positive and false-negative results in hybridization-based analyses and hinder the identification of the true variation underlying genetically determined differences in physiology and behavior. Given the proliferation of mouse genetic models (e.g., knockout models, selectively bred lines, heterogeneous stocks derived from standard inbred strains and wild mice) and the wealth of gene expression microarray and phenotypic studies using genetic models, the impact of naturally-occurring polymorphisms on these data is critical. With the advent of next-generation, high-throughput sequencing, we are now in a position to determine to what extent polymorphisms are currently cryptic in such models and to assess their impact on downstream analyses.
We sequenced the two most commonly used inbred mouse strains, DBA/2J and C57BL/6J, across a region of chromosome 1 (171.6 – 174.6 megabases) using two next-generation high-throughput sequencing platforms: Applied Biosystems (SOLiD) and Illumina (Genome Analyzer). Using the same templates on both platforms, we compared realignments and single nucleotide polymorphism (SNP) detection with an 80-fold average read depth across platforms and samples. While public datasets currently annotate 4,527 SNPs between the two strains in this interval, thorough high-throughput sequencing identified a total of 11,824 SNPs in the interval, including 7,663 new SNPs. Furthermore, we confirmed 40 missense SNPs and discovered 36 new missense SNPs.
Comparisons utilizing even two of the best characterized mouse genetic models, DBA/2J and C57BL/6J, indicate that more than half of naturally-occurring SNPs remain cryptic. The magnitude of this problem is compounded when using more divergent or poorly annotated genetic models. This warrants full genomic sequencing of the mouse strains used as genetic models.
With the recent completion of the Perlegen/NIEHS mouse resequencing, over ten million mouse single nucleotide polymorphisms (SNPs) are now annotated in the public databases, resulting in a dramatic increase in the genome-wide knowledge of variation among 16 of the most widely used mouse strains. Importantly, this is a lower bound estimate because the C57BL/6J (B6) strain, used for the mouse genome reference sequence (NCBI m37, Apr 2007), is the only mouse strain sequenced in its entirety. Given the proliferation of mouse genetic models (e.g., knockout models, selectively bred lines, heterogeneous stocks derived from standard inbred strains and wild mice) and the recent surge of gene expression microarray and phenotypic studies using these mouse models, the impact of naturally-occurring polymorphisms on these data is critical. With the advent of next-generation, high-throughput sequencing (HTS), we are now in a position to determine to what extent polymorphisms remain cryptic (either undiscovered or not previously annotated as a SNP in a specific strain comparison) in various mouse models and assess their impact on downstream analyses. As an example, we demonstrate that cryptic SNPs are prevalent, even between two of the most commonly used and well-annotated inbred mouse strains, B6 and DBA/2J (D2).
The onset of next-generation HTS enabled us to obtain full sequence coverage of a region of chromosome 1 in the D2 and B6 mouse strains. There are several platforms for massively parallel DNA sequencing currently on the market, and we took this opportunity to directly compare the same dataset on two of the three most widely used platforms: Illumina (Genome Analyzer) and Applied Biosystems (SOLiD). The Genome Analyzer implements a version of cyclic reversible termination chemistry, while the SOLiD platform uses a self-checking ligation chemistry that maps into color space. Both methods generated short reads that were then realigned to a reference sequence.
The present analyses were limited to a region of chromosome 1 from 171.6 – 174.6 megabases (Mb). This interval was selected for four reasons. First, it is representative of the genome in that it spans discrete regions of high and low SNP densities. Second, it is a gene-dense region containing 79 protein coding genes, 2 retrotransposed genes, and 6 noncoding RNA genes. Third, it harbors numerous quantitative trait loci (QTLs) affecting a wide variety of physiological and behavioral phenotypes [for summary, see ]. Finally, the low incidence of annotated polymorphisms in a SNP-sparse block has hindered high-resolution mapping in this region.
We report that comparisons utilizing two well-annotated mouse genetic models, D2 and B6, predict that more than half of naturally-occurring SNPs remain unknown or not annotated. These cryptic SNPs lead to a high incidence of gene expression microarray false-positive and false-negative results and lead to failures in identifying allelically-variant genes that can underlie QTL phenotypic effects.
Results and discussion
D2 BAC contig and sequence
B6 BAC contig and sequence
To evaluate our realignment strategy, we sequenced the corresponding region from the B6 strain, for which full reference sequence is available. We prepared a B6 contig based on public end-sequence data for BAC clones from the RPCI-23 library. The resulting B6 BAC contig spanned 170,806,384 – 174,768,169 bp on chromosome 1 (Figure 1). Using both the SOLiD and Genome Analyzer datasets we attained realignment coverage with the exception of two gaps. The first gap (11 kb) was expected since the RPCI-23 BAC library map lacks annotated coverage in this region. The second gap (100 kb) could be due to an error in the mapping of one or two of the RPCI-23 BACs, since we relied upon reported locations of the B6 BAC library clones, or, alternatively, could be due to one or two clones missing from our pools. These gaps were present in both assemblies, indicating a template problem rather than a sequencing discrepancy.
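Gaps of this kind can be located by walking the aligned intervals in order and reporting any stretch of the target region that no interval covers. The following is a minimal sketch, not the procedure used by either vendor; the function name and the toy coordinates (chosen so that the gaps are 11 kb and 100 kb long) are illustrative assumptions.

```python
def find_gaps(covered, region_start, region_end):
    """Return uncovered half-open intervals within [region_start, region_end).

    covered -- iterable of (start, end) half-open intervals with aligned reads.
    """
    gaps = []
    cursor = region_start  # leftmost position not yet known to be covered
    for start, end in sorted(covered):
        if start > cursor:
            gaps.append((cursor, min(start, region_end)))
        cursor = max(cursor, end)
        if cursor >= region_end:
            break
    if cursor < region_end:
        gaps.append((cursor, region_end))
    return gaps


# Hypothetical coverage blocks (not real coordinates from the study).
toy_coverage = [(0, 50_000), (61_000, 200_000), (300_000, 400_000)]
toy_gaps = find_gaps(toy_coverage, 0, 400_000)
print(toy_gaps)
```

Running the same function on the coverage from each platform and comparing the results is one way to check that a gap is shared, which is the pattern that points to a template problem.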
Comparison of Applied Biosystems and Illumina sequence realignments
Applied Biosystems and Illumina each performed realignments of the D2 and B6 datasets to the public B6 reference sequence. The SOLiD platform produces short reads (35 bp) encoded in a color-space format, and Applied Biosystems used their realignment pipeline, which is designed to take advantage of this format for color-space read mapping and downstream analysis including SNP detection. The Genome Analyzer platform produces short-read (33 bp) datasets with bases encoded in the standard letter representations, so Illumina carried out a different realignment approach and ran their dataset through the Maq pipeline, which includes both realignment and SNP calling procedures. Because both datasets included only short reads, we did not assess potential insertions/deletions.
Comparison of our SOLiD and Genome Analyzer sequence data to the Ensembl reference sequence confirmed that the sequence from both platforms is complete. SOLiD generated 31.8 million reads for the pooled D2 BACs, with 12.4 million (39%) of those mapping to the chromosome 1 target interval. SOLiD generated 32.7 million reads of pooled B6 BACs with 10.5 million (32%) mapping to the chromosome 1 interval. Genome Analyzer generated 14.5 million D2 reads with 9.6 million (66%) mapping to the chromosome 1 interval, and 14.0 million B6 reads with 6.2 million (44%) mapping to the chromosome 1 region. Reads that did not align to the chromosome 1 interval were mostly BAC vector sequence (which was not removed when preparing the DNA), adapters from sequencing chemistry, and bacterial contamination. These are template specific problems and do not reflect upon differences between the sequencing platforms. Additional purification steps would have resulted in fewer unaligned sequences; however, the aligned sequences provided sufficient high read depth coverage, so the exclusion of the unaligned sequences did not adversely affect the present analyses.
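The on-target percentages quoted above follow directly from the read counts. As a quick check, this sketch recomputes them from the figures reported in the text; the dictionary layout and function names are illustrative and not part of the original analysis.

```python
# Read counts in millions, as reported in the text:
# (total reads generated, reads mapped to the chromosome 1 interval).
READ_COUNTS = {
    ("SOLiD", "D2"): (31.8, 12.4),
    ("SOLiD", "B6"): (32.7, 10.5),
    ("Genome Analyzer", "D2"): (14.5, 9.6),
    ("Genome Analyzer", "B6"): (14.0, 6.2),
}


def mapped_percent(total_millions, mapped_millions):
    """Percentage of reads that mapped to the target interval."""
    return 100.0 * mapped_millions / total_millions


def summarize(read_counts):
    """Map each (platform, strain) pair to its rounded on-target percentage."""
    return {key: round(mapped_percent(*counts)) for key, counts in read_counts.items()}


if __name__ == "__main__":
    for (platform, strain), pct in summarize(READ_COUNTS).items():
        print(f"{platform} {strain}: {pct}% of reads on target")
```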
B6 reference sequence quality
Table 1. Comparison of Illumina and Applied Biosystems realignment data. Rows: total basepairs covered; Ns in realignment; realignment discrepancies to B6 reference or SNPs detected in D2.
SNP identification and confirmation
In order to compare the SOLiD and Genome Analyzer platforms under optimal conditions for each platform and to compare in a manner the majority of end-users are likely to employ, the manufacturers used parameters determined to be optimal for SNP calling on their particular platform. The Illumina SNP calling method (Maq) relies upon quality scores and applies a filter based on these qualities, read depth, and neighboring SNPs in order to discriminate between a SNP and a base-calling error. Applied Biosystems calls SNPs based on two-base encoding in color-space allowing for more sensitive discrimination between SNPs and base-calling errors. Illumina's method produced fewer no calls (Ns) than Applied Biosystems' method based upon realignments performed by each vendor (Table 1). No nucleotide bias was apparent in the frequency of the calls using the Genome Analyzer and SOLiD platforms. We conclude that because of Maq's probabilistic approach using qualities, more SNPs are called by Illumina/Maq than by Applied Biosystems; however, it remains to be confirmed if Illumina has more false positive calls. It is important to keep in mind that because the vendors used independent mapping and SNP calling approaches, differential results due to the platform cannot be distinguished from those due to the analysis pipeline.
Currently, the Mouse Phenome Database (MPD), which includes dbSNP and Perlegen data among other resources, offers the most inclusive SNP queries for mouse strains, including comparisons of the D2 and B6 strains (see Methods for further explanation of SNP Databases). While MPD currently annotates 4,527 D2 vs. B6 SNPs in the chromosome 1 interval (171.6–174.6 Mb), custom HTS identified 11,824 SNPs (Figure 3) for the same interval (referred to as PARC SNPs because they were sequenced by one or both platforms in work supported in part by the Portland Alcohol Research Center, PARC). Of the 11,824 PARC SNPs identified by custom HTS, 9,152 (77%) were identified using both the Applied Biosystems and Illumina realignments (i.e., were identified by two independent experiments) and are therefore of very high quality (Figure 3). A further 2,033 (17.2%) PARC SNPs were identified only by Illumina's realignment, and 639 (5.4%) PARC SNPs were identified only by Applied Biosystems' realignment. Only 271 (13%) of the Illumina-specific PARC SNPs and 56 (10%) of the Applied Biosystems-specific PARC SNPs confirmed known SNPs in MPD.
The SOLiD and Genome Analyzer datasets were merged for subsequent comparisons. Our results confirmed 4,161 (92%) of the D2 vs. B6 SNPs currently annotated in the MPD public dataset within the chromosome 1 interval, while 236 (5%) of the SNPs reported in MPD for this interval were determined to be false-positive SNPs. The remaining 130 SNPs either lie in gaps in our realignments or had ambiguous (undetermined or low quality) calls in the sequence data. Our results also identified numerous SNPs not previously annotated in MPD. In fact, our results identified 7,663 new SNPs, more than doubling the number of D2 vs. B6 SNPs found in this chromosome 1 interval.
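The three-way split between shared and platform-specific calls is a set partition of SNP positions. The sketch below shows the partition on toy positions and then recomputes the reported shares from the counts in the text. The function names are illustrative assumptions, and positions are treated as plain integers for simplicity.

```python
def partition_calls(illumina_calls, solid_calls):
    """Split SNP positions into shared and platform-specific sets."""
    illumina_calls, solid_calls = set(illumina_calls), set(solid_calls)
    return {
        "both": illumina_calls & solid_calls,
        "illumina_only": illumina_calls - solid_calls,
        "solid_only": solid_calls - illumina_calls,
    }


def shares(counts):
    """Express each count as a percentage of the total, to one decimal place."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Counts reported in the text for the chromosome 1 interval.
REPORTED = {"both": 9152, "illumina_only": 2033, "solid_only": 639}

if __name__ == "__main__":
    print(sum(REPORTED.values()))   # total PARC SNPs
    print(shares(REPORTED))
    print(partition_calls({101, 205, 309}, {205, 309, 412}))  # toy positions
```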
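The comparison against MPD can be read as a classification of each annotated position: confirmed by sequencing, unresolved (in a gap or ambiguous), or a likely false positive, with everything called but not annotated counted as new. The sketch below illustrates that logic and checks the reported totals; the function and variable names are assumptions for illustration only.

```python
def reconcile(annotated, called, unresolved):
    """Compare publicly annotated SNP positions with sequencing calls.

    annotated  -- positions annotated as SNPs in the public dataset
    called     -- positions called as SNPs by sequencing
    unresolved -- positions in realignment gaps or with ambiguous calls
    """
    annotated, called, unresolved = set(annotated), set(called), set(unresolved)
    return {
        "confirmed": annotated & called,
        "unresolved": (annotated - called) & unresolved,
        "false_positive": annotated - called - unresolved,
        "new": called - annotated,
    }


# Counts reported in the text for the chromosome 1 interval.
ANNOTATED, CONFIRMED, FALSE_POSITIVE, TOTAL_CALLED = 4527, 4161, 236, 11824

remaining = ANNOTATED - CONFIRMED - FALSE_POSITIVE  # annotated SNPs left unresolved
new_snps = TOTAL_CALLED - CONFIRMED                 # called SNPs absent from MPD
fold_increase = TOTAL_CALLED / ANNOTATED            # growth in known SNPs

if __name__ == "__main__":
    print(remaining, new_snps, round(fold_increase, 1))
```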
SNP impact on protein function or expression
Table 2. Missense B6 vs. D2 PARC SNPs discovered by custom HTS. Predicted effects listed, in order: probably damaging (A); probably damaging (A); possibly damaging (A); probably damaging (A+S); possibly damaging (A); possibly damaging (A); possibly damaging (A).
SNP impact on gene expression microarrays
Naturally occurring genetic polymorphisms dramatically impact hybridization-based techniques, including gene expression microarray analyses. With the ability to assess alternative transcript expression, exon microarrays have ten times as many probes as previous gene expression microarrays, so eliminating hybridization bias using SNP masks is increasingly critical. We have developed and applied a complete SNP mask using all of the PARC SNPs found by our custom HTS, which allowed us to rigorously assess differential expression between the D2 and B6 strains. We assessed the impact of 124 SNPs that lie within core probesets on the detection of differential (genotype-dependent) exon expression for genes in the chromosome 1 region of interest using Affymetrix Mouse Exon 1.0 ST array data (for details, please see Mooney et al., companion publication). A total of 629 core probesets interrogate this interval. When compared to our unmasked data, masked results were consistent for 141 differentially expressed probesets and 437 non-differentially expressed probesets, but indicated 47 false-positive and 4 false-negative results due to SNPs.
Furthermore, we overlaid D2 vs. B6 SNPs discovered by our custom HTS with the probe locations of all four types of probesets within the chromosome 1 interval (i.e., core, extended, full, and free) for the Affymetrix Exon 1.0 ST gene expression microarray platform. For chromosome 1 (171.6 – 174.6 Mb), there are 8,201 probes and 2,126 probesets on the Affymetrix Mouse Exon 1.0 ST array. We identified 861 probes that span at least one SNP; these probes belong to 480 probesets, or 23% of the probesets that interrogate this interval (unpublished data). Thus, compared to publicly available D2 vs. B6 SNPs, custom HTS identified 60% more probesets that span SNPs.
We report that comparisons utilizing even two of the most commonly used mouse genetic models, D2 and B6, predict that more than half of naturally-occurring SNPs remain unknown or not annotated. This is particularly striking given that the present comparison is between the B6 strain, upon which the mouse reference is based, and the D2 strain, which is one of the best annotated mouse strains with sequence from Celera and extensive SNP detection data primarily from Perlegen. There are approximately 1.8 million SNPs currently annotated between the D2 and B6 strains in MPD. Thus, cryptic SNPs would have been even more prevalent had we used more divergent or poorly annotated genetic models.
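Overlaying SNPs on probe locations amounts to asking, for each probe interval, whether any SNP position falls inside it. One way to sketch this is a binary search over sorted SNP positions. The tuple layout, identifiers, and toy coordinates below are hypothetical and are not taken from the array annotation files.

```python
from bisect import bisect_left


def probes_spanning_snps(probes, snp_positions):
    """Find probes, and their probesets, that overlap at least one SNP.

    probes        -- iterable of (probe_id, probeset_id, start, end),
                     with inclusive coordinates on one chromosome
    snp_positions -- iterable of SNP coordinates on the same chromosome
    """
    snps = sorted(snp_positions)
    hit_probes, hit_probesets = set(), set()
    for probe_id, probeset_id, start, end in probes:
        i = bisect_left(snps, start)  # first SNP at or after the probe start
        if i < len(snps) and snps[i] <= end:
            hit_probes.add(probe_id)
            hit_probesets.add(probeset_id)
    return hit_probes, hit_probesets


# Toy data: three 25-base probes in two probesets, and two SNP positions.
toy_probes = [
    ("p1", "ps1", 100, 124),
    ("p2", "ps1", 130, 154),
    ("p3", "ps2", 200, 224),
]
masked_probes, masked_probesets = probes_spanning_snps(toy_probes, [110, 300])

# Share of probesets affected, from the counts reported in the text.
affected_percent = round(100 * 480 / 2126)
```

The set of affected probes is what a SNP mask removes before expression is summarized at the probeset level.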
We compare two methods for next-generation HTS. By sequencing the same templates on both Applied Biosystems' SOLiD and Illumina's Genome Analyzer sequencing platforms, we determined that the platforms offer comparable results at a high read depth. More SNPs were called by Illumina/Maq than by Applied Biosystems, but because the vendors used independent mapping and SNP calling approaches, differential results are likely due to differences in the analysis pipelines, as no nucleotide bias in the frequency of the calls made by the Genome Analyzer and SOLiD was apparent.
Mouse models are an invaluable tool for identifying allelic variation that contributes to genetically determined differences in physiology and behavior. However, allelic variation is problematic for follow-up analyses, and removing technical bias resulting from naturally-occurring sequence variation is critical. Previously, we illustrated the impact of SNPs on gene expression microarray analyses and argued that complete SNP masks for gene expression microarray and other hybridization techniques are essential to appropriately interpret these data. Here, we have taken the next step and sequenced a region of two of the most widely used mouse strains in order to determine the comprehensiveness of SNP data. We have found that the mouse SNP data currently available are incomplete. In fact, for the D2 vs. B6 strains, we predict that fewer than half of the true SNPs are currently annotated. As more divergent mouse strains, harboring even more cryptic SNPs, are used in studies, the impact of SNPs on interpreting results will become increasingly problematic. This glimpse at complete sequence data for two strains demonstrates that full genomic sequencing of the mouse strains used as research models is warranted.
D2 BAC library screen
Using 32 chemiluminescent-labeled PCR probes designed across a 3 Mb region of chromosome 1 (see Additional File 1 for PCR probe primer pairs), we systematically probed a D2 BAC library consisting of 215,040 clones spotted on 12 nylon filters. The MM_DBa BAC library was generated at Clemson University Genome Institute in 2002 using a single male D2 strain mouse from the Jackson Laboratory, where the Genetic Stability Program uses a cryopreservation approach to effectively limit genetic drift and ensure strain stability. Ninety BACs were identified in the library screen using the standard Roche DIG chemiluminescent protocol for probing DNA library filters.
D2 BAC end sequencing
BAC ends were sequenced in order to determine if we had overlap. This was done in several rounds, in which we designed new probes as needed to fill in gaps in our 3 Mb contig. In total, we sequenced the ends of the 99 BAC clones and aligned these BAC end-sequences to the reference mouse sequence. Given that the stringency of the filter hybridizations was variable, that interpretation of positive filter spots was not always straightforward, and that the quality of BAC end-sequencing was not always consistent, we proceeded to confirm each clone we identified. We confirmed a BAC as positive for our region if the end-sequence mapped uniquely to the region of chromosome 1, and the PCR probe mapped in between the end sequences. We confirmed 57 positive BACs for the chromosome 1 region and aligned these to create a minimal contig of 27 overlapping MM_DBa BACs: clones 75C22, 329K2, 163C23, 109M5, 438J2, 138A14, 372P6, 264L1, 69J19, 220P1, 196G24, 250H1, 17F16, 374A8, 271K8, 467L22, 405D20, 431P3, 359G4, 259B4, 246D12, 248E11, 270C10, 440I3, 98C12, 41K6, and 306F23.
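The confirmation rule and the selection of a minimal contig can both be written compactly. The sketch below assumes each BAC end-sequence comes with a list of genome hits and that clone spans are known. It uses a standard greedy interval-cover step to choose overlapping clones, which may differ from how the contig was actually assembled. All names and coordinates are hypothetical.

```python
def is_confirmed(left_hits, right_hits, probe_pos, region):
    """Apply the confirmation rule to one BAC.

    left_hits, right_hits -- lists of (chrom, pos) alignments for each end-sequence
    probe_pos             -- mapped position of the PCR probe that found the clone
    region                -- (chrom, start, end) of the target region
    """
    chrom, lo, hi = region
    if len(left_hits) != 1 or len(right_hits) != 1:
        return False  # an end-sequence with several hits is not uniquely mapped
    (l_chrom, l_pos), (r_chrom, r_pos) = left_hits[0], right_hits[0]
    if l_chrom != chrom or r_chrom != chrom:
        return False
    if not (lo <= l_pos <= hi and lo <= r_pos <= hi):
        return False
    return min(l_pos, r_pos) <= probe_pos <= max(l_pos, r_pos)


def minimal_tiling_path(clones):
    """Greedy choice of the fewest overlapping clones covering the contig.

    clones -- list of (name, start, end); returns clone names in order.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    if not ordered:
        return []
    path, reach, i = [], ordered[0][1], 0
    while i < len(ordered):
        best = None
        # Among clones starting at or before the current reach,
        # keep the one that extends furthest to the right.
        while i < len(ordered) and ordered[i][1] <= reach:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None:
            reach = ordered[i][1]  # gap in coverage: restart at the next clone
            continue
        if best[2] > reach:
            path.append(best[0])
            reach = best[2]
    return path


toy_clones = [("A", 0, 100), ("B", 50, 180), ("C", 90, 200), ("D", 150, 300), ("E", 190, 310)]
toy_path = minimal_tiling_path(toy_clones)
```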
B6 BAC identification
For the B6 BAC contig, we used the RPCI-23 B6 Mouse BAC Library available from Children's Hospital Oakland Research Institute, which was generated from a pool of five female B6 strain mice from the Jackson Laboratory, where strain stability is carefully controlled. This is the same library used by the Mouse Genome Sequencing Consortium to generate the B6 reference strain sequence. We used publicly available BAC end-sequence for the BAC clones and assembled a minimal contig of 25 overlapping BACs including clones 27N12, 162D2, 90F16, 135E8, 161F18, 244J9, 247E12, 311M1, 354B19, 31N9, 395H6, 231O3, 362G20, 21D8, 295F11, 277L8, 125N14, 447P5, 58C22, 477M10, 22H23, 454E7, 411M14, 131N14, and 6N10.
Once the clones were identified, the same DNA preparation protocol was used for the D2 and B6 BAC clones. Vectors were not removed. Each clone was prepared from a glycerol stock by Clemson University using a standard protocol. This resulted in high-quality end-sequence data. We quantified the BAC DNA using a Nanodrop 1000 (Thermo Scientific) and assessed the quality of each BAC clone from its 260/280 ratio. For each of two final samples (B6 and D2), BACs were pooled equimolarly and sent to Applied Biosystems (Foster City, CA) and Illumina (San Diego, CA) for next-generation short read DNA sequencing.
Illumina (Genome Analyzer) sequence and assembly
As per Illumina's requirements, equimolarly pooled BAC DNA was submitted for each of the B6 and D2 samples: 8.5 μg of B6 BACs and 22.8 μg of D2 BACs. Illumina prepared single read libraries and performed sequencing on the Genome Analyzer I (previously, Solexa) as per standard protocol. Briefly, the sequencing process relies on the amplification of fragmented DNA to form clusters. The sequence content of these clusters is then queried using cycles of fluorescently labeled dNTP addition, detection, and fluorescent group removal. The data were optimally assembled by Illumina using Maq (Mapping and Assembly with Quality) with the default parameters in the easyrun script. For these analyses, we used the cns.final.snp output file and required a minimum read depth of 3 and a minimum consensus quality of 30 to call a SNP.
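The depth and quality thresholds can be applied to a Maq SNP file with a short filter. The sketch below assumes the whitespace-delimited layout described in the Maq manual, in which the first six columns are chromosome, position, reference base, consensus base, consensus quality, and read depth. That column order should be verified against the Maq version in use, and the example lines are invented.

```python
def filter_snps(lines, min_depth=3, min_quality=30):
    """Keep SNP records meeting the read depth and consensus quality thresholds.

    Assumed columns (0-based): 0 chromosome, 1 position, 2 reference base,
    3 consensus base, 4 consensus quality, 5 read depth.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        chrom, pos, ref, cons = fields[0], int(fields[1]), fields[2], fields[3]
        quality, depth = int(fields[4]), int(fields[5])
        if depth >= min_depth and quality >= min_quality:
            kept.append((chrom, pos, ref, cons, quality, depth))
    return kept


# Invented records for illustration only.
toy_lines = [
    "chr1 171600100 A G 45 80",
    "chr1 171600250 C T 28 60",   # consensus quality below 30
    "chr1 171600900 G A 50 2",    # read depth below 3
    "",
]
passing = filter_snps(toy_lines)
```

In practice the lines would come from iterating over the open cns.final.snp file rather than from a list.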
Applied Biosystems (SOLiD) sequence and assembly
Equimolarly pooled BAC DNA samples were sent to Applied Biosystems as per their requirements: 49.7 μg of B6 BACs and 69.3 μg of D2 BACs. Applied Biosystems prepared two fragment libraries for sequencing from each pool of genomic DNA via standard methods on their SOLiD platform. In short, the SOLiD process attaches clonally-amplified template DNA to a bead and adapter substrate and queries cyclically by adding fluorescently labeled probes specific to two bases, detecting these di-base calls, removing the fluorescent group, and repeating. This is followed by a primer reset, which shifts the starting primer, allowing different bases to be queried using the above base-calling cycle. SOLiD data were analyzed by Applied Biosystems using their freely available software package. The SOLiD Analysis Tools (SAT) process the array image, perform data filtering, calculate quality values, align to a reference genome, and generate base calls using default parameters. SNPs were called with a required minimum read depth of 3.
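Two-base encoding can be illustrated in a few lines. In the commonly described scheme, each color encodes the transition between two adjacent bases, so a single-base substitution changes two adjacent colors, whereas an isolated single color change is more likely a measurement error. The sketch below uses the usual 2-bit XOR formulation of the color code. It is a simplified illustration, not the SOLiD Analysis Tools implementation, and the sequences are invented.

```python
# 2-bit base codes; the color for a base pair is the XOR of the two codes.
# This yields color 0 for AA/CC/GG/TT, 1 for AC/CA/GT/TG,
# 2 for AG/GA/CT/TC, and 3 for AT/TA/CG/GC.
_BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a base sequence as the colors between adjacent bases."""
    return [_BASE_CODE[a] ^ _BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def color_differences(reference, read):
    """Return the color-space indices at which two equal-length sequences differ."""
    ref_colors, read_colors = to_colors(reference), to_colors(read)
    return [i for i, (x, y) in enumerate(zip(ref_colors, read_colors)) if x != y]


reference = "ACGTAC"
with_snp = "ACTTAC"   # G to T substitution at the third base

ref_colors = to_colors(reference)
snp_diffs = color_differences(reference, with_snp)
```

Here the substitution alters the two colors that flank the changed base, which is the signature a color-space SNP caller looks for.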
Determining non-synonymous SNPs
Non-synonymous SNPs were computed using the Ensembl CDS sequence from UCSC for each transcript. Any SNP falling within a coding region and its exon boundaries was used to convert the B6 allele into the D2 allele predicted by the realignments. The coding sequences were then reassembled and translated in silico using CLC Sequence Viewer 5. The amino acid sequences were compared and any differences were noted.
Striatal tissue from D2 (n = 8) and B6 (n = 8) males and females was dissected and total RNA was isolated using the standard Trizol (Invitrogen) protocol. For each strain, 3 μg of total RNA was pooled (n = 8) and sent to Illumina for library construction and standard transcriptome sequencing. The realignment was performed using ELAND with 2 errors allowed in the first 32 bases of a read. Because we had incomplete read coverage for lowly expressed genes, strain-specific sequence results were analyzed for highly expressed genes only. Thus, this dataset confirms some of the SNPs detected in our DNA sequencing but did not achieve complete transcriptome coverage.
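The same substitute, translate, and compare logic can be sketched without a sequence viewer. The example below uses the standard genetic code and assumes the coding sequence is already spliced, given on the coding strand, and that SNP offsets are relative to it. Those simplifications, the function names, and the toy sequence are all assumptions for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(cds):
    """Translate a coding sequence, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_snps(cds, snps):
    """Return the CDS with alternate alleles substituted.

    snps -- dict mapping 0-based CDS offset to the alternate (D2) allele.
    """
    bases = list(cds)
    for offset, allele in snps.items():
        bases[offset] = allele
    return "".join(bases)


def amino_acid_changes(b6_cds, snps):
    """List (residue number, B6 amino acid, D2 amino acid) for each difference."""
    b6_protein = translate(b6_cds)
    d2_protein = translate(apply_snps(b6_cds, snps))
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(b6_protein, d2_protein))
        if ref != alt
    ]


toy_cds = "ATGGCTGAATAA"                            # translates to M A E stop
missense = amino_acid_changes(toy_cds, {4: "T"})    # GCT to GTT
synonymous = amino_acid_changes(toy_cds, {5: "C"})  # GCT to GCC
```

A SNP that yields an empty list is synonymous; any non-empty result marks a non-synonymous change, including gained or lost stop codons.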
Mouse Phenome Database (MPD). The MPD SNP collection contains data for approximately 10 million mouse SNPs for an expansive list of strains from a variety of sources, including datasets such as Broad Institute and Wellcome Trust that are not yet available in other databases. A significant portion of these SNPs are from Perlegen (NIEHS). Genotype allele tables are generally provided by investigators; however, Perlegen did not include B6 in their analyses. In this case, MPD annotated the reference B6 alleles with the Perlegen data, making SNP queries including the B6 strain more inclusive. Specifically, MPD has 4,527 D2 vs. B6 SNPs in the chromosome 1 interval (171.6–174.6 Mb) of interest. All SNPs are mapped to NCBI mouse genome build 37.1 reference assembly (B6). This MPD SNP data build includes annotation from dbSNP 128, Ensembl 48, and NCBI extracted during Dec 2007. The 18 strains with high density SNP data are B6, D2, 129S1/SvImJ, A/J, AKR/J, BALB/cByJ, BTBR_T+_tf/J, C3H/HeJ, CAST/EiJ, FVB/NJ, KK/HlJ, MOLF/EiJ, NOD/ShiLtJ, NZW/LacJ, PWD/PhJ, WSB/EiJ, 129X1/SvJ, and CZECHII/EiJ.
dbSNP Build 128. dbSNP is maintained at NCBI and includes more than 14 million mouse SNPs. When querying for strain-specific polymorphisms, dbSNP identifies 2,903 D2 vs. B6 SNPs in the chromosome 1 interval (171.6–174.6 Mb) of interest. This number is significantly lower than that found in MPD because the Perlegen data do not include B6 alleles, so dbSNP does not retrieve these in the strain-specific queries.
Ensembl variation 53. The Ensembl SNP dataset is queried using Biomart and primarily incorporates dbSNP 128. There are some additional SNPs specific to Ensembl without reference SNP (rs) accession numbers. When querying for strain-specific polymorphisms, Ensembl retrieves 2,832 D2 vs. B6 SNPs in the chromosome 1 interval (171.6–174.6 Mb) of interest. Additionally, individual transcript queries in Ensembl include strain variation data that contain realignments of the original strain-specific raw reads from Celera (including D2) to the reference B6 sequence. These SNPs provide independent confirmation of some of the SNPs identified by custom HTS and are annotated as confirmation by "Realignment of Celera raw reads" in Table 2.
All of the PARC SNPs discovered from both Applied Biosystems' SOLiD and Illumina's Genome Analyzer sequencing pipelines have been deposited in dbSNP at NCBI under the Handle PARC_SEQ as Computational SNPs (ss#119994841-120015816). D2 BAC end sequence has been submitted to GenBank as GSS (genome survey sequences). Raw sequencing data for SOLiD sequencing have been submitted to GenBank in the SRA (short read archive) without intensity data. Raw sequencing data for the Illumina sequencing are available upon request; however, these data were generated before SRA standards were established, and the specific raw files needed for SRF format are no longer available.
This work is supported by 5R01AA011114, 5R01DA005228, 5P60AA10760, 5R01AA11034, 5R01AA13484, 5T15LM007088, 5P30CA069533, UL1RR024140, and the VA. We thank Illumina and Applied Biosystems for their assistance in the generation and realignment of these sequencing data.
- Frazer KA, Eskin E, Kang HM, Bogue MA, Hinds DA, Beilharz EJ, Gupta RV, Montgomery J, Morenzoni MM, Nilsen GB, et al: A sequence-based variation map of 8.27 million SNPs in inbred mouse strains. Nature. 2007, 448: 1050-3. 10.1038/nature06067.
- Mardis ER: Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet. 2008, 9: 387-402. 10.1146/annurev.genom.9.081307.164359.
- Metzker ML: Emerging technologies in DNA sequencing. Genome Res. 2005, 15: 1767-76. 10.1101/gr.3770505.
- Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM: Accurate multiplex polony sequencing of an evolved bacterial genome. Science. 2005, 309: 1728-32. 10.1126/science.1117389.
- Mozhui K, Ciobanu DC, Schikorski T, Wang X, Lu L, Williams RW: Dissection of a QTL hotspot on mouse distal chromosome 1 that modulates neurobehavioral phenotypes and gene expression. PLoS Genet. 2008, 4: e1000260. 10.1371/journal.pgen.1000260.
- Denmark DL, Buck KJ: Molecular analyses and identification of promising candidate genes for loci on mouse chromosome 1 affecting alcohol physical dependence and associated withdrawal. Genes Brain Behav. 2008, 7: 599-608. 10.1111/j.1601-183X.2008.00396.x.
- Ensembl. Sequence build used in this manuscript is NCBI m37, Apr 2007, [http://www.ensembl.org/Mus_musculus]
- Osoegawa K, Tateno M, Woon PY, Frengen E, Mammoser AG, Catanese JJ, Hayashizaki Y, de Jong PJ: Bacterial artificial chromosome libraries for mouse sequencing and functional analysis. Genome Res. 2000, 10: 116-28.
- Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008, 18: 1851-8. 10.1101/gr.078212.108.
- Szatkiewicz JP, Beane GL, Ding Y, Hutchins L, Pardo-Manuel de Villena F, Churchill GA: An imputed genotype resource for the laboratory mouse. Mamm Genome. 2008, 19: 199-208. 10.1007/s00335-008-9098-9.
- Ng PC, Henikoff S: SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003, 31: 3812-4. 10.1093/nar/gkg509.
- Ramensky V, Bork P, Sunyaev S: Human non-synonymous SNPs: server and survey. Nucleic Acids Res. 2002, 30: 3894-900. 10.1093/nar/gkf493.
- Hitzemann R, Malmanger B, Reed C, Lawler M, Hitzemann B, Coulombe S, Buck K, Rademacher B, Walter N, Polyakov Y, et al: A strategy for the integration of QTL, gene expression, and sequence analyses. Mamm Genome. 2003, 14: 733-47. 10.1007/s00335-003-2277-9.
- Walter NA, McWeeney SK, Peters ST, Belknap JK, Hitzemann R, Buck KJ: SNPs matter: impact on detection of differential expression. Nat Methods. 2007, 4: 679-80. 10.1038/nmeth0907-679.
- Clemson University Genomics Institute. [http://www.genome.clemson.edu]
- CHORI. [http://bacpac.chori.org/femmouse23.htm]
- Illumina DNA sequencing. [http://www.illumina.com/pages.ilmn?ID=251]
- Mapping and Assembly with Quality. [http://maq.sourceforge.net]
- Applied Biosystems DNA sequencing. [https://products.appliedbiosystems.com/ab/en/US/adirect/ab?cmd=catNavigate2&catID=604409]
- SOLiD Analysis Tools. [http://solidsoftwaretools.com/gf/project/corona/]
- CLC Sequence Viewer 5. [http://www.clcbio.com]
- The Mouse Phenome Database. [http://www.jax.org/phenome/SNP]
- dbSNP Mouse SNP query. [http://www.ncbi.nlm.nih.gov/SNP/MouseSNP.cgi]
- Ensembl Biomart SNP query. [http://www.ensembl.org/biomart/martview]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.