### DNA samples

DNA samples were collected after obtaining a written, informed consent from each of the subjects. The study was conducted in accordance with the provisions of the Helsinki Declaration and according to a protocol approved by the Ethics Committee of the University of Florence.

### SNP selection

We selected 25 SNPs in 5 genes (Table 3) from the NCBI dbSNP [25], according to this order of criteria: 1) SNPs determining a not synonymous amino-acidic change; 2) SNPs located in promoter regions, putative splice sites or untranslated regions; 3) validated SNPs.

### Multiplex PCR and SBE reaction

For each sample, the sequences containing the SNPs were amplified by QIAGEN^{®} Multiplex PCR Kit (QIAGEN) from genomic DNA (40 ng in each reaction) in two 25-μl multiplex PCR mixes. Amplification conditions were as follows: hot-start at 95°C for 15 min; 30 cycles of 30 s at 94°C, 1 min and 30 s at 63°C, 1 min at 72°C; final elongation at 72°C for 10 min. Products of the two PCRs were pooled and incubated for 1 h at 37°C with 16 U of exonuclease I (USB-Corporation) and 1.6 U of shrimp alkaline phosphatase (USB-Corporation), then inactivated for 5 min at 65°C. Total yield of each amplified product was assessed after purification by MinElute PCR Purification Kit (QIAGEN). 1.5 μl of purified products were used as templates in a 30-μl SBE reaction, performed with all the SBE primers, 3.5 U of Thermo Sequenase™ DNA Polymerase (GE-Healthcare), 1X TS reaction buffer, the proper combination of cold (GE-Healthcare) and/or labelled ddNTPs (Perkin-Elmer) (0.125 μM each) (see Results). Cycling conditions were: hot-start at 92°C for 3 min and 30 s; 40 cycles of 45 s at 92°C and 30 s at 63°C.

PCR (Table 1) and SBE primers (Table 2) were designed by Genamics Expression 1.1 [31] and purchased from MWG-Biotech AG. We selected PCR primers aimed at minimizing formations of cross- and self-dimers; their specificity was tested by BLAST [32]. Primer amounts were scaled up and down to find out the concentrations which allowed for the amplification of all the expected fragments with equal efficiency, taking into account the number of SNPs in any sequence. Concerning the SBE primers, we used the one, between forward and reverse, which minimized hairpin and self-dimer formations. For those SNPs a few bases away from each other, we chose primers on opposite strands in order to avoid reciprocal interferences in the extension step. We tested matches between tags and SBE primers and we opted for the combinations avoiding self- and cross- secondary structures, as well as the annealing to unspecific templates.

### Microarray hybridization and data acquisition

Microarray slides, purchased from Leiden Genome Technology Center (LGTC, University of Leiden), were printed in an "array-of-arrays" configuration, including 48 identical arrays (Fig. 4) which allow the contemporary genotyping of 16 samples. Twenty-nine anti-tag probes (20-mer long), selected from the universal GenFlex^{®} Tag Array set (Affymetrix P/N 610026), were spotted in duplicate on each array and were used to detect 25 SNPs, a couple of identical positive hybridization controls and three negative hybridization controls (Fig. 4).

Before the hybridization, the slides were treated with 120-μl of pre-hybridization solution (warmed at 65°C), containing 48 μg herring sperm DNA (Gibco-BRL), 48 μg yeast tRNA (Gibco-BRL), 48 μg polyA-RNA (Sigma), 5X Denhardt's solution, 0.4% sodium dodecyl sulphate (SDS), 3.5X SSC (20X SSC is 3.0 M Sodium Chloride, 0.3 M Sodium Citrate, pH 7.0). The slides were covered with a cover-slip (sealing on a hot-plate at 80°C for 2 min), then kept at 65°C for 30 min in a dark and wet chamber. Cover-slip was removed in 2X SSC (2 min) and slides were next washed in 2X SSC, 70% ethanol twice, 90% ethanol, 100% ethanol (5 min each step) and dried by spinning (3 min at 250 × *g*).

For the hybridization, the slides were located in a custom-made aluminium rack and a silicon rubber grid was placed over each slide to create 48 reaction chambers in correspondence of the arrays. A 25-μl mix, containing 10 μl of SBE reaction, 6X SSC and 0.27 nM of two positive hybridization control probes (labelled in 5' with Cy3 or Cy5) (MWG-BiotechAG), was injected into each chamber. The hybridization was carried out at 42°C for 5–16 h. After disassembling the rack in 4X SSC, the slides were washed (5 min each step) twice in 2X SSC, 0.1% SDS (warmed at 40°C) and twice in 0.2X SSC, briefly rinsed in deionized water and spin-dried in the dark.

Slide scanning was performed by GenePix 4000B dual-laser scanner (Axon-Instruments). Signal intensity values were extracted by GenePix Pro 4.0 (Axon-Instruments).

### Data analysis

Data processing and statistical analyses were carried out by software, written in four independent modules, which we developed accordingly with the proposed protocol. We split the samples into three groups (#1, #2 and reference group) in order to perform correlation analysis, as in a case-control association study.

#### First module: colour balancing and signal counting

As a first step, the software regards a spot (*i*) as analyzable, i.e. correctly hybridized, if at least 70% of its pixels (foreground) have an intensity (*I*) higher than one standard deviation above the background intensity for at least one channel (red, R; green, G). Then, the intensity value of each spot is corrected by the red/green intensity ratio, a colour scaling factor which accounts for the scanner sensitivity and the labelling efficiency. For a whole slide the red/green ratio is calculated as \sqrt{{\overline{I}}_{R}/{\overline{I}}_{G}}, where \overline{I}_{
R
}is (1/n\cdot {\displaystyle \sum _{{i}_{G}=0}^{n}{I}_{{R}_{{i}_{G}}}}), i.e. the average intensity of the red channel calculated on all the spots which result analyzable in the green channel (*i*_{
G
}), and \overline{I}_{
G
}is (1/n\cdot {\displaystyle \sum _{{i}_{R}=0}^{n}{I}_{{G}_{{i}_{R}}}}), i.e. the average intensity of the green channel calculated on all the spots which result analyzable in the red channel (*i*_{
R
}). We considered the intensity of the other channel respect to the one for which a spot was analyzable, since this intensity could be either a real signal (for heterozygous SNPs) or background noise; therefore, the signal behaviour in both these situations (specific and unspecific signals) is taken into account.

For each array, the software counts in each channel how many analyzable spots have an average signal intensity value of at least 100. A given number of spots (half number of the expected hybridized spots) have to be counted to consider an array as successfully hybridized.

#### Second module: signal normalization

The second module of the software estimates the scaling factors for the signal intensities of the arrays *P*_{
1
}and *P*_{
2
}; normalizing the arrays by these factors yields values comparable between them. The factors are calculated by bilinear weighted-least-squares regression model (see additional file 1: TAMGeS theoretical basis, Applying bilinear regression), which considers the signal intensities of the *U*-array as the sum of the signal intensities of the arrays *P*_{
1
}and *P*_{
2
}. After correction, for each SNP the software verifies that the sum of the signal intensities from arrays *P*_{
1
}and *P*_{
2
}corresponds to the signal intensities from *U*-array; otherwise the SNP is considered lost.

#### Third module: genotype and allele assignment

For each SNP, the genotype can be assigned with a certain probability by comparing the intensity values (*I*_{
A
}, *I*_{
a
}) corresponding to the two possible alleles (*A*, *a*). We established the probability of each possible genotype following a scalar approach. The absolute value of the log-ratio between the logarithmic intensities, defined as Z = |ln (ln *I*_{
A
}/ln *I*_{
a
})|, gives an index for the strength of the signal difference between the two bases and allows for the discrimination between a true heterozygote (lower values of Z) and an artefact lower signal on the other base (higher values of Z). The distribution of Z was plotted on the whole data set (Fig. 5) and appeared as a bimodal density obtained as a mixture of two distributions: one, normal with positive mean representing the distribution of those SNPs with distant values of *I*_{
A
}and *I*_{
a
}(homozygous genotypes, *AA* and *aa*), and the other, the positive half of a normal with mean zero representing the distribution of those SNPs with similar values of *I*_{
A
}and *I*_{
a
}(heterozygous genotype, *Aa*).

In order to compute the *a posteriori* probabilities with which a given SNP belongs to each of the two distributions (*AA* + *aa* and *Aa*), a Bayesian test is applied explicitly decomposing Z as a mixture of them. The parameters of *AA* + *aa* and *Aa* (*a priori* probabilities, mean and standard deviation) are calculated through a standard EM (Expectation-Maximization) algorithm, which turns out to converge in ~120 iteractions; SNP *a posteriori* probability is then computed.

Dropping the absolute value from |ln (ln *I*_{
A
}/ln *I*_{
a
})|, a signed value of Z is obtained, which can be compared with the mixture of three normal distributions (*AA*, *Aa* and *aa*) by symmetrizing the original Z distribution. For each SNP, a probability for each of the three possible genotypes (*P(AA)*, *P(Aa)*, *P(aa)*, which sum up to 1) is therefore given. Two thresholds are set for univocally assigning the genotype (heterozygous if *P(Aa)*>0.5; homozygous if *P(AA)*>0.66 or *P(aa)*>0.66). There are two intervals of Z-values for which no genotype is assigned: they are symmetric with respect to zero and they correspond to the uncertain cases in which *P(Aa)*<0.5, *P(AA)*<0.66 and *P(aa)*<0.66. The software assigns at least one out of the two alleles if the sum of the probabilities of the genotypes containing that allele is at least 0.9 (allele *A*, if P(*AA*) + P(*Aa*)>0.9; allele *a*, if P(*aa*) + P(*Aa*)>0.9).

#### Fourth module: statistical analysis

To assess the correlation between the genotypes of every SNP and each group of subjects analysed (i.e., #1, #2, reference group), the software executes three different tests, named *standard*, *new* and *average*.

In the *standard* method, classical Fisher-Irwin tests are done on the allelic counts (group #1 vs. reference; group #2 vs. reference). As a drawback the information on how sure a genotype is gets lost: after applying the threshold, a 70% *AA* is the same as a 99% *AA*.

In the *new* and *average* methods, the probability of each genotype is taken into account. With the *average* method, a classical Fisher-Irwin test is applied on a *modified* allelic count, yielded by summing the *expected* number of alleles for each SNP in each group. Allele *A* average count is computed as ∑_{
i
}[2*P*_{
i
}(*AA*) + *P*_{
i
}(*Aa*)], where *i* is the sample and the sum is over all the samples in the group. A sample *i* with probabilities of *AA*, *Aa* and *aa* respectively 0.70, 0.25 and 0.05, counts for 2·0.70 + 0.25 = 1.65 *A* alleles and 0.35 *a* alleles. In the *new* method, a string with the genotypes of all the samples is randomly generated, according to given probabilities and assuming independence between different samples. This is done 20,000 times: each time the software computes the p-value of the Fisher-Irwin test computed on the simulated genotypes; the median of these p-values is returned.

The three p-values (*standard*, *new* and *average*), corrected for multiple comparisons, by applying either the exact correction for independent tests 1 - (1 - *p*)^{n}, or the Bonferroni correction *np* (here *n* is the number of the tests), turned out to be concordant.