A model of binding on DNA microarrays: understanding the combined effect of probe synthesis failure, cross-hybridization, DNA fragmentation and other experimental details of affymetrix arrays
© Jakubek and Cutler; licensee BioMed Central Ltd. 2012
Received: 24 April 2012
Accepted: 16 December 2012
Published: 27 December 2012
DNA microarrays are used both for research and for diagnostics. In research, Affymetrix arrays are commonly used for genome wide association studies, resequencing, and for gene expression analysis. These arrays provide large amounts of data. This data is analyzed using statistical methods that quite often discard a large portion of the information. Most of the information that is lost comes from probes that systematically fail across chips and from batch effects. The aim of this study was to develop a comprehensive model for hybridization that predicts probe intensities for Affymetrix arrays and that could provide a basis for improved microarray analysis and probe development. The first part of the model calculates probe binding affinities to all the possible targets in the hybridization solution using the Langmuir isotherm. In the second part of the model we integrate details that are specific to each experiment and contribute to the differences between hybridization in solution and on the microarray. These details include fragmentation, wash stringency, temperature, salt concentration, and scanner settings. Furthermore, the model fits probe synthesis efficiency and target concentration parameters directly to the data. All the parameters used in the model have a well-established physical origin.
For the 302 chips that were analyzed the mean correlation between expected and observed probe intensities was 0.701 with a range of 0.88 to 0.55. All available chips were included in the analysis regardless of the data quality. Our results show that batch effects arise from differences in probe synthesis, scanner settings, wash strength, and target fragmentation. We also show that probe synthesis efficiencies for different nucleotides are not uniform.
To date this is the most complete model for binding on microarrays. This is the first model that includes both probe synthesis efficiency and hybridization kinetics/cross-hybridization. These two factors are sequence dependent and have a large impact on probe intensity. The results presented here provide novel insight into the effect of probe synthesis errors on Affymetrix microarrays; furthermore, the algorithms developed in this work provide useful tools for the analysis of cross-hybridization, probe synthesis efficiency, fragmentation, wash stringency, temperature, and salt concentration on microarray intensities.
DNA microarray chips consist of large numbers of probes, single stranded DNA molecules attached to a solid surface, that hybridize to nucleic acids. Microarrays have several uses in DNA analysis including CNV detection[2–5], re-sequencing, SNP typing[7, 8] , detection of species specific DNA in complex samples, and identification of protein-DNA binding sites. They are also used to assess transcript levels in samples of coding and non-coding RNA[1, 9, 10]. In the field of human genetics, DNA microarrays are used to investigate disease[4, 11, 12], to study variation[5, 8], and to detect variants in clinical samples.
Collections of identical probes are called probe spots or features. Each probe spot consists of many copies of identical single stranded DNA molecules. Many DNA array designs have multiple features querying the same target DNA. Often one set of features queries targets on the forward strand of the DNA while the other set queries targets on the reverse strand. The first step in a DNA microarray experiments is to isolate and amplify the target DNA or RNA. Next, the amplified target is fragmented and fluorescently labeled. The labeled target solution is then hybridized to the chip where target binds to probe DNA. Following hybridization the chip is washed in order to eliminate non-specific binding. Finally, the chip is scanned and the fluorescent intensity measured for each feature.
The fundamental assumption behind a DNA microarray experiment is that the intensity measure for a probe spot correlates to the concentration of target bound to that spot, which in turn correlates to the amount of target in the original solution[6, 13]. However, the relationship between observed intensity and target DNA composition is not straightforward. Known variables with microarrays include high variance in intensity between probe spots, high variance in a single probe spot’s intensity between chips, as well as background (non-specific binding) intensity differences between chips. Several studies have focused on the binding kinetics of DNA molecules attached to a solid surface and on cross-hybridization[14–18]. These studies have helped illuminate some aspects of the microarray experiment; however, several fundamental observations of microarray behavior remain poorly understood. First, when an array contains probe spots for both the forward and reverse target, simple liquid phase kinetics predict that both probe spots ought to have the same binding affinities and therefore should have equal amounts of bound target DNA. However, in practice forward and reverse strand probe spots usually have significantly different intensity measurements. These differences in intensity are observed in both GC and AT rich probes as well as probes with and without nucleotide runs (20). Second, liquid phase kinetics predicts that mismatches anywhere in the oligo (other than in the last 3 bases) ought to have equal effects on binding[19, 21–25]. However, mismatches near the center of the probe have a stronger effect on probe intensity compared to mismatches towards the edges. Third, chips manufactured on different days often have subtly different binding properties (so-called chip-effects); as do chips processed by different facilities or on different days (batch effects)[27, 28].
The goal of this study is to attempt to understand all of these aspects. To do so we develop a detailed model of the DNA microarray experiment and then use this model to predict probe intensities for seven different microarray designs.
The basic assumption behind this modeling approach is that the hybridization kinetics of DNA binding on a chip are fundamentally the same as liquid phase kinetics. Apparent differences between liquid phase predictions and microarray observations arise from the combined effect of different aspects of the microarray experiment. In the model we include the effect of probe sequence, cross-hybridization, nucleotide position, and hybridization conditions. We model the combined effect of these factors in one step rather than normalizing the data for each factor in a stepwise manner[29, 30]. Unlike previous studies we do not adjust binding strength for nucleotides based on their position on the probe. Instead the “positional” effect of nucleotides arises naturally in our model as a side effect of target DNA fragmentation and microarray synthesis. In particular we assume that microarray synthesis is not perfect[31–33]; more specifically, we assume that during probe synthesis individual A, C, G, and T nucleotides fail to incorporate at different rates. We also model abasic sites on the probe sequence where the probabilities that A, C, G, and T nuclotides become abasic are not necessarily equal to each other. Consequently differences in synthesis efficiency/abasic sites between nucleotides explain why forward and reverse DNA probe spots often have significantly different intensities and explain chip-effects. Additionally we explicitly model target DNA concentration, hybridization temperature, mean fragmentation size, wash stringency[15, 34, 35], and microarray scanner settings which together with errors along the probe sequence give rise to batch effects. Previous studies have reported that different probe sequences have different saturation intensities, under our model this is expected given that each probe spot consists of a “forest” of probes and that the number of probes capable of binding target DNA is sequence dependent. Furthermore, the strength of the wash impacts the final number of probe/target duplexes.
Our model consists of two parts. First, we calculate binding affinities for the probes and target DNA in the hybridization solution. To do so we use the Langmuir isotherm. This part of the model is independent from the chip intensity data and simply yields equilibrium constants for all possible target-probe complexes. In the second part of the model the binding data is used to predict probe intensity for Affymetrix arrays. In this step, we fit several parameters (probe synthesis efficiency, wash stringency, fragmentation, scanner’s dynamic range, and target DNA concentration) to each individual chip and predict the probe intensities for that chip. For the analyzed data the average correlations ranged from 0.88 and 0.55. Our results show that the different bases (A, C, G, and T) do not incorporate into the probe with the same efficiency.
Our model begins with the assumption that DNA hybridization on a microarray is the same as DNA hybridization in solution, but that many of the experimental details previously ignored as well as other details of the microarray experiments must be explicitly modeled to account for the observed differences between solution and microarray. In particular, we model target fragmentation, cross-hybridization, microarray synthesis imperfections, the effect of the wash, and the scanner’s dynamic range. In the final model we must fit at least ten parameters specific to each microarray (eight parameters related to synthesis efficiencies, one parameter for the mean fragmentation size of the target, and one concentration parameter per target molecule).The model also includes four parameters that vary between batches of microarrays processed at the same time (one parameter for the wash common between chips, and three parameters related to the shape of the scanner’s dynamic range).
Our approach to target DNA/probe DNA hybridization is exhaustive. We begin by assuming that hybridization temperature, hybridization solution salt concentration, probe sequences, and target DNA sequences are known. In our model the target DNA consists of one or more DNA “segments” which have unknown concentrations. These DNA “segments” are user defined and can be PCR fragments, reduced representations of the genome, chromosomes, transcripts, whole genomes, or any other set of DNA sequences that accurately represents the segments produced by the experimental protocol. First we fragment the target at every possible position; thus, modeling all potential cut sites on the target DNA. We then allow each of the resulting target fragments to bind to every position on every probe. Thus in our model, a probe spot consists of a “forest” of bound target/probe complexes. The target molecules that are hybridized to the probes in a given probe spot are of differing lengths and are bound at differing start and end-positions of the probe sequence. Even though each spot consists of a complex assortment of probe/target complexes the underlying thermodynamics of each individual binding reaction is fundamentally the same as in solution and follows a Langmuir isotherm with nearest-neighbor kinetics[19, 37, 38].
where R is the gas constant.
Fraction bound probes calculations
In order to use this formula for α, two conditions have to be met. First, equilibrium for target/probe formation must be reached, and second, [C] >> [probe], such that Δ[C] due to target/probe binding is negligible.
where Cn is equal to the concentration of the nth DNA segment.
where KeqW(i,j,k) are all Keq values greater than KeqW.
Type of synthesis error(s)
5’ C12T11A10C9C8G7T6A5C4C3G2T1 3’
Full length probe (no error)
5’ C8G7T6A5C4C3G2T1 3’
Incorporation error base 9
5’ C8G7T6_5C4C3G2T1 3’
Incorporation error base 9 and abasic site
5’ C12T11A10_9C8G7T6_5C4C3G2T1 3’
Full length probe with two abasic sites
5’ T6_5C4_3G2T1 3’
Incorporation error base 7 and two abasic sites
We calculate all other one error probabilities in a similar manner.
where the sums are taken over all possible errors x. To simplify calculations, we assume that probes with two or more errors have little to no binding and an αx value of zero.
Estimating probe spot intensity
We fit four parameters for the truncation rates, four parameters for the rate of abasic sites, and a variable number of parameters for the target DNA segment concentrations. To do so we minimize the square difference between the observed and expected probe intensities across all probe spots on a chip, using Powell’s method for numerical minimization. In order to ensure that the algorithm did not find a sub-optimal solution, we ran several searches on the same chip. Each time the algorithm would start at a different part of parameter space and it always found the same solution.
One dimensional nearest neighbor and initiation termination parameter search
We performed a one-dimensional search for the NN and the initiation/termination parameters used in the Keq calculations. In this search we maximized the observed mean correlation for a set of five chips each ran under four different combinations of wash/fragmentation parameter values.
The effect of mismatches on probe intensities
To understand the effect of mismatches under our model we compared the intensity of “perfect match” probes to all possible “mismatched probes.” More specifically we calculated the expected intensity of 20,102 probes that align perfectly to the human reference sequence. Then for each of these 25 bp long probes we calculated the expected intensity for all sequences that have all possible one base pair differences. Therefore, for each “perfect match” probe there are 75 “mismatch” probes. The “perfect match” probe sequences come from the FMR1 chip design and the parameters used to calculate the expected intensities are the same as the parameters that were fit for chip number 19 FMR1 design (Additional file1: Table S2).
Comparison of forward and reverse strand probe intensities
We compared the observed and expected intensities for probes that perfectly matched the reference sequence. For every forward and reverse probe pair we calculated the log ratio of the forward/reverse observed intensities. We then calculated the average of this ratio across all 62 chips from the FMR1 design. We did the same for the expected intensity. Then for each probe pair we estimated the difference in nucleotide composition between forward and reverse as the sum of (A-T) and (G-C). Where A, C, G, and T are the number of A, C, G, and T bases in the forward strand probe. We used this value to create bins for the means of the ratios. We then plotted the mean for the observed and expected ratios for each bin (Additional file2: Figure S1).
Number of chips
Number of probe spots per chip
Number of parameters that were fit to each chip
Number of PCR products
Mean correlation between expected and observed probe intensities
0.765 ± 0.038
0.605 ± 0.020
0.730 ± 0.021
0.714 ± 0.016
0.666 ± 0.028
0.629 ± 0.029
0.762 ± 0.081
0.985 ± 0.012
0.952 ± 0.017
0.916 ± 0.035
0.945 ± 0.018
1.000 ± 0.000
0.950 ± 0.009
0.990 ± 0.017
0.944 ± 0.005
0.999 ± 0.004
0.945 ± 0.015
0.995 ± 0.013
0.967 ± 0.008
0.996 ± 0.007
0.943 ± 0.013
0.990 ± 0.015
0.973 ± 0.006
0.983 ± 0.018
0.944 ± 0.010
0.943 ± 0.022
0.952 ± 0.009
0.987 ± 0.012
0.931 ± 0.009
0.923 ± 0.016
0.951 ± 0.007
0.967 ± 0.010
0.944 ± 0.015
0.908 ± 0.020
0.950 ± 0.006
0.963 ± 0.028
0.981 ± 0.013
0.865 ± 0.036
0.998 ± 0.007
0.882 ± 0.005
0.930 ± 0.011
0.791 ± 0.017
1.000 ± 0.001
0.904 ± 0.010
0.961 ± 0.015
0.791 ± 0.014
1.000 ± 0.000
0.901 ± 0.008
0.972 ± 0.015
0.809 ± 0.015
1.000 ± 0.000
0.944 ± 0.025
0.959 ± 0.013
0.866 ± 0.029
0.998 ± 0.003
0.938 ± 0.013
0.956 ± 0.011
0.884 ± 0.021
0.998 ± 0.006
0.959 ± 0.014
0.941 ± 0.027
0.953 ± 0.029
0.962 ± 0.010
Intuitively, this is the primary explanation of the observed difference between forward and reverse probe intensities. If, for example, the C incorporation rates are much higher than G incorporation rates and they both have similar rates of base retention, and one probe (say the forward probe) is C rich, while the other G rich, then we will observe that the forward C rich probe will be much brighter than the complimentary reverse G rich probe. Across chip designs adenosine seems to be synthesized more efficiently than other bases. More specifically, for the FMR1 chip design adenosine synthesis is more efficient than thymidine synthesis and cytidine synthesis is more efficient than guanosine synthesis. For this chip design we looked at the ratio of forward and reverse strand intensities as a function of probe base pair composition (Additional file2: Figure S1). For bins with more than 50 observations the observed and the expected intensity ratios followed the same trend. The deviations between the observed and the expected at the edges of the curve could be the results of small sample size and nucleotide runs that have binding affinities that deviated from those calculated using the nearest neighbor model.
Our model for DNA hybridization on microarrays is comprehensive. It includes parameters that are specific to the chip design and to the processing protocol. These parameters are probe length, temperature, salt concentration, wash stringency, target DNA size, and the parameters that are related to the scanners dynamic range. Furthermore, the NN values can be adjusted to accommodate DNA-RNA and RNA-RNA binding. The algorithm that calculates Keq values for probe target complexes is applicable to many array designs as well as to other binding reactions in thermodynamic equilibrium. Thus, this algorithm can provide valuable information on binding affinities and cross-hybridization for a wide array of applications including probe design.
Our approach to probe intensity calculations takes into account many of the “problems” inherent to microarrays such as batch effects and probe synthesis failure. Preparation of target DNA for microarray experiments often poses many technical challenges; consequently, the concentration of DNA products often varies between experiments as does the wash strength and average fragmentation size. Furthermore, probe synthesis efficiency varies between individual chips, and different batches of arrays. Our model directly tackles this problem by fitting the concentration as well as the nucleotide synthesis parameters for each chip. It is important to note that all the chips analyzed are Affymetrix chips with 25 bp long probes; therefore, the chip synthesis aspects of our model may be somewhat specific to the company’s manufacturing process. Our model fits one parameter for the concentration of each target molecule, which limits its use for arrays that quantify transcript levels. However, the Keq information from our model is applicable to any type of array.
For all the chips that were analyzed, the mean correlation between expected and observed probe intensities is 0.701, with average correlations for each chip ranging from 0.881 to 0.550. Furthermore, extreme intensity values do not dominate the correlation terms (Figure4). For the FMR1 design chips the correlation for replicated probe spots (probe spots with the exact same probe sequence) is 0.906. Similar measurements for correlation between probe spots have been previously reported[41, 42]. Thus, if we view ~90% as the maximum possible correlation between expected and observed intensity, because observed intensity varies this much between replicated spots within a single chip, our model can be seen to be doing a very good, but not quite a perfect job of predicting probe intensities.
Our model has helped us understand some puzzling observations regarding microarrays: the difference in intensity between complimentary forward and reverse probes; the larger decrease in intensity for mismatches towards the center of a probe; and batch effects.
First, assuming simple liquid phase kinetics, probe spots that target the forward DNA strand should have equal amount of bound target DNA as the complementary spot that targets the reverse DNA strand. However, the intensity of the forward and reverse strands usually have systematically different intensities, with probes targeting the forward strand of a given genomic region being brighter/darker than the set of probes that target the reverse strand of the same genomic region. Under our model this observation is simply the result of probe synthesis failure or sites becoming abasic after synthesis. This claim is supported by the fact that the A,C,G, and T synthesis parameters are statistically different from each other for each of the different chip designs (Additional file1: Table S2). The estimates for A,C,G, and T incorporation rates (Table3) are similar to previously published estimates of synthesis failure for Affymetrix arrays. Furthermore, this explanation for the difference between forward and reverse strand probes is far more parsimonious than relating this difference to G-stacks, given that this difference is observed in A/T rich probes that have no stretches of the same amino acid. There are two factors that we did not model, but could partially contribute to the observed difference between forward and reverse probes. For one there are base analogues that are often used to manufacture probes, this difference can easily be incorporated into our model by revising the NN numbers used for the Keq calculations. Second there could be an entropy penalty to base pairs that are closer to the array surface.
In our model, probe synthesis failure along with wash and fragmentation differences explain batch effects for microarrays. Batch effects can be subdivided into two types, those that happen during manufacturing of the chip and those that happen during the processing of the chip. In our model, the former is explained by differences in the efficiency of probe synthesis, while the latter is explained by differences in the fragmentation, washing steps, and as the rate of abasic site formation.
Another previously puzzling observation is the correlation between the distance of a mismatch from the edge of the probe and its effect on probe intensity. Mismatches towards the center have a larger effect than mismatches towards the edges. Under our model this observation is expected (Figure6), and is the result of fragmentation of the target molecule. Intuitively, there are more fragments that can bind the center of a probe, than fragments that can only bind a single edge (Figure1). Hence, if a target molecule contains a mismatch, its effect will be proportional to its distance from the middle of the probe. In our model, simple hybridization kinetics can explain these puzzling observations without the need to assign different weights along the probe sequence nor a penalty to probes with a mismatch.
Using our model we get an average correlation of ~70% between observed and expected probe intensity. This includes data from all probe spots regardless of their quality. Even so, our model comes close (0.881) on some chips, but never achieves our theoretical maximum correlation (0.906). From the manufacturing process up until the reading of probe spots intensities, the microarray experiment has several complex steps. Our model makes several “simplistic” assumptions that allowed us to develop efficient algorithms. In doing so we made several compromises. One of these assumptions is that probes with two or more errors do not bind target DNA. This assumption should have a relatively small effect on the correlation for 25 bp long probes; however, it is expected to have a larger effect on the correlation for longer probes. In our model we use the Langmuir isotherm to calculate the fraction of bound probes and do not take into account probe surface density[16, 17], non-equilibrium, and low target/probe ratio. Theoretically, commercial arrays of 25 bp long probes should have reached equilibrium at the end of the hybridization step; however, equilibrium might not be achieved by the end of the washing period; therefore, our approach to modeling the wash step of the protocol is rather simplistic. Furthermore, in these arrays the target/probe ratio should be very large. When arrays deviate from this ideal scenario our model loses predictive power. It is important to note that even though the probe surface density is not directly modeled by our approach, the parameters we use to describe the scanners dynamic range can indirectly be used to adjust for microarrays with varying probe surface densities. This “adjustment” however has some limitations. The most obvious one being if increasing probe density affects mismatched sequences and matched sequences disproportionately.
Overall our data suggests that the Langmuir isotherm appropriately and efficiently models binding between probe and target DNA on a microarray; however, other more computationally intensive measurements for binding on arrays have been proposed[39, 44]. Furthermore, when we calculate target-probe binding we do not account for the known in solution effects of dangling ends and the stabilizing effect of mismatches in the last three base pairs of a sequence[21–25]. We also do not model secondary structures that can form on arrays with long probes or arrays that hybridize to targets with extensive secondary structures, for example rRNA arrays.
Other details of the microarray experiment that are left out of our model are bleed-through between features and regional artifacts such as air bubbles, scratches, and miscellaneous particles. There are two sources of bleed-through between features: one, the probes at the edge of a feature may have a hybrid sequence due to incorporation of nucleotides during the synthesis of the neighboring probe; two, the scanner may be detecting light from neighboring features and falsely determining its origin. If this were going on, its effect would be most noticeable in probes that would otherwise be very dark, and appears to be present (Figure4) in our data, where a substantial fraction of probes that are predicted to have very low intensities appear to have much higher than expected intensity.
Our approach to modeling probe synthesis failure also has some limitations. First, there is the possibility that the concentration parameters are confounded with the synthesis efficiency parameters. This can be a problem when dealing with G/C or A/T rich PCR products. Second, the synthesis efficiency of a nucleotide can potentially be dependent on its position on the probe.
For our calculations we assume that the target DNA has the reference sequence. This assumption is never completely valid because each individual almost surely has a unique sequence that may differ slightly or even significantly from the genomic reference sequence. The impact of this assumption on the correlations for the analyzed data depends on the type of genetic variation of the samples. When the target DNA only has SNPs and/or other one base pair changes, then the genetic variation is unlikely to have a large impact on the average correlation over the entire chip; however, if the target DNA has large CNVS and/or several CNVs then these genetic variants would be expected to have a significant effect on the average correlation for the chip.
Our model is most applicable to 25 bp long arrays that are designed to detect genetic variation. With this in mind, the obvious next step is to apply our model to SNP arrays with the goal being to better determined which genetic variants are present, by incorporating our model into a variant calling algorithm.
The computer programs described in this paper are available athttps://genome.emory.edu/faculty/dcutler/.
We thank the Warren and the Zwick labs for providing the microarray data. We would also like to thank the ELLIPSE Emory High Performance Computer cluster, which we used for this project.
This work was supported by the National Institutes of Health [R01 HG003461].
- Stoughton RB: Applications of DNA microarrays in biology. Annu Rev Biochem. 2005, 74: 53-82. 10.1146/annurev.biochem.74.082803.133212.View ArticlePubMed
- Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, Jeffrey SS, Botstein D, Brown PO: Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet. 1999, 23 (1): 41-46. 10.1038/12640.View ArticlePubMed
- Emanuel BS, Saitta SC: From microscopes to microarrays: dissecting recurrent chromosomal rearrangements. Nat Rev Genet. 2007, 8 (11): 869-883. 10.1038/nrg2136.PubMed CentralView ArticlePubMed
- Craddock N, Hurles ME, Cardin N, Pearson RD, Plagnol V, Robson S, Vukcevic D, Barnes C, Conrad DF, Giannoulatou E, et al: Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature. 2010, 464 (7289): 713-720. 10.1038/nature08979.View ArticlePubMed
- Itsara A, Wu H, Smith JD, Nickerson DA, Romieu I, London SJ, Eichler EE: De novo rates and selection of large copy number variation. Genome Res. 2010, 11: 1469-1481.View Article
- Chee M, Yang R, Hubbell E, Berno A, Huang XC, Stern D, Winkler J, Lockhart DJ, Morris MS, Fodor SP: Accessing genetic information with high-density DNA arrays. Science. 1996, 274 (5287): 610-614. 10.1126/science.274.5287.610.View ArticlePubMed
- Wang DG, Fan JB, Siao CJ, Berno A, Young P, Sapolsky R, Ghandour G, Perkins N, Winchester E, Spencer J, et al: Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science. 1998, 280 (5366): 1077-1082. 10.1126/science.280.5366.1077.View ArticlePubMed
- The International HapMap Consortium: A haplotype map of the human genome. Nature. 2005, 437 (7063): 1299-1320. 10.1038/nature04226.PubMed CentralView Article
- Ohira M, Oba S, Nakamura Y, Isogai E, Kaneko S, Nakagawa A, Hirata T, Kubo H, Goto T, Yamada S, et al: Expression profiling using a tumor-specific cDNA microarray predicts the prognosis of intermediate risk neuroblastomas. Cancer Cell. 2005, 7 (4): 337-350. 10.1016/j.ccr.2005.03.019.View ArticlePubMed
- Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM: Expression profiling using cDNA microarrays. Nat Genet. 1999, 21 (1 Suppl): 10-14.View ArticlePubMed
- Davila S, Wright VJ, Khor CC, Sim KS, Binder A, Breunis WB, Inwald D, Nadel S, Betts H, Carrol ED, et al: Genome-wide association study identifies variants in the CFH region associated with host susceptibility to meningococcal disease. Nat Genet. 2010, 42 (9): 772-776. 10.1038/ng.640.View ArticlePubMed
- Collins SC, Coffee B, Benke PJ, Berry-Kravis E, Gilbert F, Oostra B, Halley D, Zwick ME, Cutler DJ, Warren ST: Array-based FMR1 sequencing and deletion analysis in patients with a fragile X syndrome-like phenotype. PLoS One. 2010, 5 (3): e9476-10.1371/journal.pone.0009476.PubMed CentralView ArticlePubMed
- Southern EM, Maskos U, Elder JK: Analyzing and comparing nucleic acid sequences by hybridization to arrays of oligonucleotides: evaluation using experimental models. Genomics. 1992, 13 (4): 1008-1017. 10.1016/0888-7543(92)90014-J.View ArticlePubMed
- Zhang L, Miles MF, Aldape KD: A model of molecular interactions on short oligonucleotide microarrays. Nat Biotechnol. 2003, 21 (7): 818-821. 10.1038/nbt836.View ArticlePubMed
- Zhang Y, Hammer DA, Graves DJ: Competitive hybridization kinetics reveals unexpected behavior patterns. Biophys J. 2005, 89 (5): 2950-2959. 10.1529/biophysj.104.058552.PubMed CentralView ArticlePubMed
- Peterson AW, Heaton RJ, Georgiadis RM: The effect of surface probe density on DNA hybridization. Nucleic Acids Res. 2001, 29 (24): 5163-5168. 10.1093/nar/29.24.5163.PubMed CentralView ArticlePubMed
- Watterson JH, Piunno PAE, Wust CC, Krull UJ: Effects of oligonucleotide immobilization density on selectivity of quantitative transduction of hybridization of immobilized DNA. Langmuir. 2000, 16 (11): 4984-4992. 10.1021/la991508m.View Article
- Wu C, Carta R, Zhang L: Sequence dependence of cross-hybridization on short oligo microarrays. Nucleic Acids Res. 2005, 33 (9): e84-10.1093/nar/gni082.PubMed CentralView ArticlePubMed
- SantaLucia J: A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA. 1998, 95 (4): 1460-1465. 10.1073/pnas.95.4.1460.PubMed CentralView ArticlePubMed
- Cutler DJ, Zwick ME, Carrasquillo MM, Yohn CT, Tobin KP, Kashuk C, Mathews DJ, Shah NA, Eichler EE, Warrington JA, et al: High-throughput variation detection and genotyping using microarrays. Genome Res. 2001, 11 (11): 1913-1925.PubMed CentralPubMed
- Peyret N, Seneviratne PA, Allawi HT, SantaLucia J: Nearest-neighbor thermodynamics and NMR of DNA sequences with internal a.A, C.C, G.G, and T.T mismatches. Biochemistry. 1999, 38 (12): 3468-3477. 10.1021/bi9825091.View ArticlePubMed
- Allawi HT, SantaLucia J: Nearest-neighbor thermodynamics of internal a.C mismatches in DNA: sequence dependence and pH effects. Biochemistry. 1998, 37 (26): 9435-9444. 10.1021/bi9803729.View ArticlePubMed
- Allawi HT, SantaLucia J: Thermodynamics of internal C.T mismatches in DNA. Nucleic Acids Res. 1998, 26 (11): 2694-2701. 10.1093/nar/26.11.2694.PubMed CentralView ArticlePubMed
- Allawi HT, SantaLucia J: Nearest neighbor thermodynamic parameters for internal G.A mismatches in DNA. Biochemistry. 1998, 37 (8): 2170-2179. 10.1021/bi9724873.View ArticlePubMed
- Allawi HT, SantaLucia J: Thermodynamics and NMR of internal G.T mismatches in DNA. Biochemistry. 1997, 36 (34): 10581-10594. 10.1021/bi962590c.View ArticlePubMed
- Duan F, Pauley MA, Spindel ER, Zhang L, Norgren RB: Large scale analysis of positional effects of single-base mismatches on microarray gene expression data. BioData Min. 2010, 3 (1): 2-10.1186/1756-0381-3-2.PubMed CentralView ArticlePubMed
- Hong H, Su Z, Ge W, Shi L, Perkins R, Fang H, Xu J, Chen JJ, Han T, Kaput J, et al: Assessing batch effects of genotype calling algorithm BRLMM for the affymetrix GeneChip human mapping 500 K array set using 270 HapMap samples. BMC Bioinforma. 2008, 9 (Suppl 9): S17-10.1186/1471-2105-9-S9-S17.View Article
- Scharpf RB, Ruczinski I, Carvalho B, Doan B, Chakravarti A, Irizarry RA: A multilevel model to address batch effects in copy number estimation using SNP arrays. Biostatistics. 2010, 1: 33-50.
- Binder H, Preibisch S: "Hook"-calibration of GeneChip-microarrays: theory and algorithm. Algorithms Mol Biol. 2008, 3: 12-10.1186/1748-7188-3-12.PubMed CentralView ArticlePubMed
- Binder H, Krohn K, Preibisch S: "Hook"-calibration of GeneChip-microarrays: chip characteristics and expression measures. Algorithms Mol Biol. 2008, 3: 11-10.1186/1748-7188-3-11.PubMed CentralView ArticlePubMed
- McGall GH, Barone AD, Diggelmann M, Fodor SPA, Gentalen E, Ngo N: The efficiency of light-directed synthesis of DNA arrays on glass substrates. J Am Chem Soc. 1997, 119 (22): 5081-5090. 10.1021/ja964427a.View Article
- Pirrung MC, Fallon L: Proofing of photolithographic DNA synthesis methods. Fabrication of DNA microchips. Abstr Pap Am Chem Soc. 1997, 213: 362-ORGN
- Forman Jonathan E, Walton Ian D, Stern D, Rava Richard P, Trulson Mark O: Thermodynamics of Duplex Formation and Mismatch Discrimination on Photolithographically Synthesized Oligonucleotide Arrays. Molecular Modeling of Nucleic Acids. 1997, Washington, DC: American ChemicalSociety, 206-228.View Article
- Skvortsov D, Abdueva D, Curtis C, Schaub B, Tavare S: Explaining differences in saturation levels for affymetrix GeneChip arrays. Nucleic Acids Res. 2007, 35 (12): 4154-4163. 10.1093/nar/gkm348.PubMed CentralView ArticlePubMed
- Held GA, Grinstein G, Tu Y: Relationship between gene expression and observed intensities in DNA microarrays–a modeling study. Nucleic Acids Res. 2006, 34 (9): e70-10.1093/nar/gkl122.PubMed CentralView ArticlePubMed
- Shi L, Tong W, Su Z, Han T, Han J, Puri RK, Fang H, Frueh FW, Goodsaid FM, Guo L, et al: Microarray scanner calibration curves: characteristics and implications. BMC Bioinforma. 2005, 6 (Suppl 2): S11-10.1186/1471-2105-6-S2-S11.View Article
- Devoe H, Tinoco I: The stability of helical polynucleotides: base contributions. J Mol Biol. 1962, 4: 500-517. 10.1016/S0022-2836(62)80105-3.View ArticlePubMed
- Crothers DM, Zimm BH: Theory of the melting transition of synthetic polynucleotides: evaluation of the stacking free energy. J Mol Biol. 1964, 9 (1): 1-9. 10.1016/S0022-2836(64)80086-3.View ArticlePubMed
- Halperin A, Buhot A, Zhulina EB: On the hybridization isotherms of DNA microarrays: the langmuir model and its extensions. J Phys Condens Matter. 2006, 18 (18): S463-S490. 10.1088/0953-8984/18/18/S01.View Article
- Press W, Teukolsky S, Vetterling W, Flannery B: Numerical Recipes 3rd Edition: The Art of Scientific Computing. 2007, Cambridge University Press, 509-515.
- Wang X, Ghosh S, Guo SW: Quantitative quality control in microarray image processing and data acquisition. Nucleic Acids Res. 2001, 29 (15): E75-75. 10.1093/nar/29.15.e75.PubMed CentralView ArticlePubMed
- Yang IV, Chen E, Hasseman JP, Liang W, Frank BC, Wang S, Sharov V, Saeed AI, White J, Li J, et al: Within the fold: assessing differential expression measures and reproducibility in microarray assays. Genome Biol. 2002, 3 (11): research0062.1-research0062.12. 10.1186/gb-2002-3-11-research0062.
- Wu C, Zhao H, Baggerly K, Carta R, Zhang L: Short oligonucleotide probes containing G-stacks display abnormal binding affinity on affymetrix microarrays. Bioinformatics. 2007, 23 (19): 2566-2572. 10.1093/bioinformatics/btm271.View ArticlePubMed
- Vainrub A, Pettitt BM: Theoretical aspects of genomic variation screening using DNA microarrays. Biopolymers. 2004, 73 (5): 614-620. 10.1002/bip.20008.View ArticlePubMed
- Bommarito S, Peyret N, SantaLucia J: Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Res. 2000, 28 (9): 1929-1934. 10.1093/nar/28.9.1929.PubMed CentralView ArticlePubMed
- Mueckstein U, Leparc GG, Posekany A, Hofacker I, Kreil DP: Hybridization thermodynamics of NimbleGen microarrays. BMC Bioinforma. 2010, 11: 35-10.1186/1471-2105-11-35.View Article
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.