Multilocus analysis of SNP and metabolic data within a given pathway

Kristensen, Vessela N; Tsalenko, Anya; Geisler, Jurgen; Faldaas, Anne; Grenaker, Grethe Irene; Lingjærde, Ole Christian; Fjeldstad, Ståle; Yakhini, Zohar; Lønning, Per Eystein; Børresen-Dale, Anne-Lise

doi:10.1186/1471-2164-7-5

Methodology article
Open access
Published: 13 January 2006

Multilocus analysis of SNP and metabolic data within a given pathway

Vessela N Kristensen¹,
Anya Tsalenko²,
Jurgen Geisler³,
Anne Faldaas¹,
Grethe Irene Grenaker¹,
Ole Christian Lingjærde⁴,
Ståle Fjeldstad⁵,
Zohar Yakhini²,
Per Eystein Lønning³ &
…
Anne-Lise Børresen-Dale^1,6

BMC Genomics volume 7, Article number: 5 (2006) Cite this article

6469 Accesses
6 Citations
Metrics details

Abstract

Background

Complex traits, which are under the influence of multiple and possibly interacting genes, have become a subject of new statistical methodological research. One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common multifactorial diseases and their association to different quantitative phenotypic traits.

Results

Two types of data from the same metabolic pathway were used in the analysis: categorical measurements of 18 SNPs; and quantitative measurements of plasma levels of several steroids and their precursors. Using the combinatorial partitioning method we tested various thresholds for each metabolic trait and each individual SNP locus. One SNP in CYP19, 3UTR, two SNPs in CYP1B1 (R48G and A119S) and one in CYP1A1 (T461N) were significantly differently distributed between the high and low level metabolic groups. The leave one out cross validation method showed that 6 SNPs in concert make 65% correct prediction of phenotype. Further we used pattern recognition, computing the p-value by Monte Carlo simulation to identify sets of SNPs and physiological characteristics such as age and weight that contribute to a given metabolic level. Since the SNPs detected by both methods reside either in the same gene (CYP1B1) or in 3 different genes in immediate vicinity on chromosome 15 (CYP19, CYP11 and CYP1A1) we investigated the possibility that they form intragenic and intergenic haplotypes, which may jointly account for a higher activity in the pathway. We identified such haplotypes associated with metabolic levels.

Conclusion

The methods reported here may enable to study multiple low-penetrance genetic factors that together determine various quantitative phenotypic traits. Our preliminary data suggest that several genes coding for proteins involved in a common pathway, that happen to be located on common chromosomal areas and may form intragenic haplotypes, together account for a higher activity of the whole pathway.

Background

The challenge of identification and characterization of susceptibility genes for complex multifactorial diseases is partly due to the limitations of parametric statistical methods for detection of gene effects that are dependent solely or partially on interactions with other genes and with environmental exposures [1, 2]. These limitations are reduced by non-parametric methods such as the combinatorial partitioning method (CPM) [3], which has been used to study the effect of many marker loci on quantitative phenotypes. The focus of the method is to form subsets of loci or genotypic partitions within which the trait variability is much lower than between the partitions [3]. The loci in such a set of genotypic partitions are then selected as candidates to influence the given trait and are then cross-validated.

A modification of this method is the multifactor dimensionality reduction (MDR) method, which has been used to study the impact of multiple loci on categorical endpoints such as presence or absence of disease or response to treatment. This is accomplished by reducing the dimensionality of the multilocus data where genotypes from multiple loci are pooled into high-risk and low-risk groups, depending on whether they are more common in affected or in unaffected individuals [4, 5]. This approach is so far limited to categorical parameters and cannot be applied to quantitative traits. The only possible approach to association mapping would then be to search for patterns of genotypes at different loci. Pattern recognition by machine learning techniques may then be applied to define pattern frequencies or relationships in a data set [6].

In the present study we have used a variation of the combinatorial partitioning method and compared that to a pattern recognition method by the machine learning approach to identify subsets of SNPs that may predict the levels of metabolites in the estradiol metabolic pathway in healthy post-menopausal women. We have chosen this pathway since a positive correlation between estradiol exposure and risk of breast cancer among postmenopausal women has been rather well documented [7, 8], and a significant correlation between plasma estrogen levels and subsequent risk of breast cancer development has been repeatedly described [9–12]. Estrone is synthesized from cholesterol in a cascade of subsequent hydroxylations [13] (Figure 1). After ovary seizure at menopause, the peripheral aromatization of androgens, mainly androstenedione into estrone, becomes the main source of circulating estrogen contributing to tumor stimulation [14]. A complex system of enzymes is responsible for estradiol synthesis and its further metabolism: CYP17, CYP11a, CYP19, 17 β-hydroxysteroid hydrogenase, steroid sulfatase (STS), sulfotransferase (EST), CYP1A1, CYP1B1, Catechol-O-methyltransferase ([15–17] (Figure 1). Polymorphisms in these enzymes have previously been associated with both breast cancer risk and estradiol levels [2]. In the present report we have studied genetic polymorphism in all these enzymes and addressed the methodological challenge of the analysis of multiple loci 1) by free combinatorial approaches 2) in relation to intergenic haplotype structures within a common biochemical pathway.

Results

The levels of 9 metabolites of the estradiol pathway were studied (Figure 1). High correlation was observed between the levels of the different metabolites in the plasma of healthy individuals, metabolites upstream (DHEA, DHEA-S, androstenedione and estrone) as well as downstream (estrone, estradiol, estrone- sulphate) in the pathway (Table 2). Weight and body mass index significantly correlated with the levels of estrone and estradiol, while levels of DHEA and DHEA-S inversely correlated with age. Testosterone levels correlated with height (Table 3).

Table 1 Summary of the selected SNPs and the respective method of genotyping. Gene/SNP, rs number given when available, aminoacid change or UTR. Primer set: for the assyas developed for this study or otherwise as referred to original publication. Method of analysis: platform and assay selection.

Full size table

Table 2 Correlation between metabolic levels of estradiol and its precursors in the plasma of healthy post-menopausal women. DHEA, dihydroepiandrostenedione, DHEA-SO4, dihydroepiandrostenedione sulfate, androstenedione, testosterone, E1, estrone, E2, estradiol, E1S, estrone- sulphate.

Full size table

Table 3 Association of estrogen and its precursors with age, height, weight and body mass index.

Full size table

Chi square analysis

A total of 18 SNPs in 10 genes were genotyped in 109 individuals resulting in a total of 1962 genotypes. The genotype distribution of the studied polymorphisms was significantly different between the groups of individuals with metabolic activities below and above median when Chi square test was applied. The levels of E₁ and E₂ were significantly associated with two polymorphisms in the 3'UTR of CYP19 as well as two non-synonymous substitutions in the CYP1B1 -R48G and A119S (Table 4). The El level was also associated to the T461N SNP in CYP1A1. Several other non-significant trends were observed.

Table 4 Statistical significance of distributions of genotypes in the listed estradiol metabolizing enzymes in below and above median levels of metabolites DHEA, dihydroepiandrostenedione, DHEA-SO4, dihydroepiandrostenedione sulfate, androstenedione, testosterone, E1, estrone, E2, estradiol, E1S, estrone- sulphate, SHBG, sex hormone binding globulin. NS – not significant.

Full size table

Combinatorial partitioning analysis (Mutual Information Score (MIS))

The metabolic groups were further re-defined by using other thresholds than the median, using either one optimal threshold, (partitioning A) or two optimal thresholds (Partitioning B). Several moderately significant SNPs using the optimal thresholds approach were found (all SNPs with p-value < 0.05, Table 5). Leave one out cross validation analysis was performed to find sets of genotypes that jointly predict the value of the trait (high or low levels). Estrone levels partitioned into samples with values < 68.2 pmol/l and >68.2 pmol/l revealing a maximal difference in genotype distribution. A graphical representation (infogram) of the genotypes for each locus and this partition is shown in Figure 2A, where each row corresponds to a SNP and each column – to a sample. Figure 2B shows the stack diagram. The leave one out cross validation method showed that while one SNP can make only ~50% correct predictions of the estrone levels at this partition, combining 6 SNPs, including CYP1A1m4, CYP1B1 A119S, CYP1A1m2, CYP19utr3' SNPS, GSTP1, COMT allows 65% correct prediction (Figure 2C). Two of the selected polymorphisms were known to be functional at the metabolic level from previous studies in vitro. In the case of random labels, the probability of finding a set of SNPs that can make better prediction was found to be 0.16 based on 100 simulations.

Table 5 Searching for optimal thresholds approach to identify genotypes associated with quantitative trait (all SNPs with p-value < 0.05). Partitioning A shows significant results for one optimal threshold (two partitions), and Partitioning B shows the results for two optimal thresholds (three partitions). The calculated p-values take into account the multiple search over all possible thresholds as described in [3].

Full size table

Locus CYP11A1 was a microsatellite repeat with 10 variant repeat length alleles. All variant alleles were categorized together: A1/A1 (wt/wt), A1/mut, mut/mut. The QT scores and p-values for this locus and each metabolite were calculated and significant differences were found for several of the metabolites. The variant allele was more frequent in women with DHEA-S level>92.6 μg/dl than in women with DHEA-S levels<92.6 μg/dl (p < 0.042). In the two threshold analysis the significance was even higher (p = 0.004) when comparing individuals with DHEA-S levels<69.7 μg/dl to those with DHEA-S levels>92.6 μg/dl (Figure 3A); stack diagram is shown in Figure 3B. For estrone the p-value was 0.008 when comparing groups of women with estrone >133 pmol/l to those with estrone< 133 pmol/l. Four women with estrone level > 133 pmol/l have genotypes A2/A2 and A4/A4. Similar to estrone, levels of estradiol were associated with the CYP11A variants when comparing individuals with estradiol > 33 pmol/l to those with estradiol < 33 pmol/l (p < 0.06).

Pattern recognition of SNPs in relation to hormone metabolizing enzymes

The optimal threshold of the metabolic levels was found by multiple testing close to the median. Interactions between set of SNPs and physical characteristics like age and weight was identified (Fig 4A,B). Carriers of the wt CYP1A1m1 and wt GSTT1 with age above 64 years and with a body weight above 75 kg were more often in the lower level group of the metabolite DHEA-SO4 (CorrMAX 0.54(49/56), P-value < 0.0001) (Figure 4A). An interaction between the levels of this metabolite and age and weight (r = 0.44, p < 0.002), also seen by the conventional Chi square analysis (Table 2), was detected by this method. Individuals with weight higher than 75 kg carrying the wt GSTM1 had significantly higher plasma levels of E₁S, (CorrMAX 0.43(43/62), p < 0.003) (Figure 4B).

Another pattern of SNPs was found correlated to the estrone level; the variant allele in the 5' flanking area of CYP11 in combination with the wt GSTT1 was present among 12 individuals with a plasma level of estrone above 68 pmol/l, while none of the individuals with E₁ plasma level below 68 pmol/l carried this combination (CorrMAX 0.36(53/52) P-value 0.05) (Figure 4C). Individuals homozygous for the variant alleles in the HSD17β (A3T), CYP1B1 A119S, and COMT1 had significantly higher levels of sex hormone binding globulin, (CorrMAX 0.38(59/49) P-value 0.05) (Figure 4D). A colored infogram illustrating the significant differences in SNP patterns above and below the different thresholds is given below each frequency diagram.

Haplotype analysis

Since some of the SNPs detected by the above methods reside in 3 different genes in vicinity on chromosome 15 we hypothesized that they could form common intragenic haplotypes, which in concert might account for a higher activity of the whole pathway. Our findings suggest that the SNPs in CYP19, CY11 and CYP1A1 are not inherited at random but form common haplotypes (Figure 5A). Individuals with variant number of repeats in the microsatellite repeat of CYP11 were also carriers of the variant alleles in both loci CYP1A1mi and m2 (D' 0.350 and 0.194, p < 0.001 (Bonferroni corrected) and p < 0.012, respectively) as well as in CYP19utr3' SNP2 (D' 0.293, p < 0.001 (Bonferroni corrected) (Figure 5B). A schematic presentation of the D' values is given in Figure 5C. High D' values and significant LD was observed in addition between the 3 SNPs in CYP19 and the 2 of the 4 SNPs in CYP1A1 Carriers of the haplotype CTTATATC and CGTA(T)C(T)ATC(A) had more often E₂ levels below median, while carriers of the TGTTT(C)ATC more often had E₂ levels above the median (p < 0.025) (Figure 5D). The SNPs in CYP1B1 were also in strong linkage disequilibrium forming steady haplotype blocks (Figure 6A,B). While the haplotype CGG, containing the C allele in CYP1B1 R48G and the G alleles in CYP1B1 A119S and CYP1B1 V432L was associated with high levels of E₁ and E₂, the haplotype GTC containing the alternative alleles in locus was associated with lower than median levels (p < 0.05).

Discussion

Finding effects of groups of SNPs on metabolite levels is complex since the effects of individual SNPs are small and the number of possible SNP combinations is large. We applied two different methods to help identify sets of SNPs correlated to metabolite levels: one using direct two-way classification based on combination of genotypes at selected loci, and another based on leave-one-out-cross-validation analysis. Direct classification method requires sufficiently big sample set for meaningful evaluation of genotype combination frequencies in groups with different metabolite levels. In studies like this with a small number of samples, the LOOCV method allows the evaluation of larger sets of SNPs, since the classifiers are constructed for each locus individually. In the first "pre-screening" phase of the genotype-phenotype analysis the metabolic levels were divided by median followed by sets of percentiles of the trait values. Finally, instead of pre-defining cut offs, we let the distribution of the genotypes lead us to those cut offs with a maximal difference in allele distribution. Interestingly, often these best thresholds converged to the median, i.e. for estrone in both the Mutual Information Score method as well as the pattern recognition. Whether or not these resulting cut offs have some physiological significance, remains to be investigated.

Long term exposure to estradiol increases the risk of breast cancer. The mechanisms responsible for this effect have not been firmly established. The prevailing theory proposes that estrogens increase the rate of cell proliferation by stimulating estrogen receptor-mediated transcription and thereby the number of errors occurring during DNA replication [19, 20]. An alternative hypothesis proposes that estradiol can be metabolized to quinone derivatives, which can react with DNA and then remove bases from DNA through a process called depurination. Error prone DNA repair then results in point mutations [21]. These two processes, increased cell proliferation and genotoxic metabolite formation, may act in an additive or synergistic fashion to induce cancer. It has been suggested that measuring total E2 concentration and SHBG concentration may be sufficient in large epidemiological studies [12]. Our study shows that even in a small size it is sufficient to monitor only few metabolites as we observed tight correlations between them. Several genetic polymorphisms that may influence estradiol metabolism have been associated with different hormone levels. A polymorphism in CYP19, a 3-bp deletion in intron 4 (TTTA)_{n = 7-3}, and a base substitution in exon 3 (G – >A) have been reported to be associated with levels of estradiol [22, 23]. Genetic polymorphism in the enzymes further hydroxylating estradiol and conjugating its metabolites has also been studied. Women carrying the COMT Met/Met genotype had 28% higher 2-hydroxyestrone (P = 0.08) and 31% higher 16α-hydroxyestrone concentrations (P= 0.02), compared to women with the Val/Val genotype [22].

The previous studies discussed above analyzed single loci. In a recent breast cancer case -control study, including categorical values only (genotypes), a four-locus susceptibility model including the polymorphisms of COMT, CYP1A1m1, CYP1B1 codon 48, and CYP1B1 codon 432 was found associated with breast cancer [2]. The four-locus model was significant at the p = 0.001 level by permutation testing bringing evidence of epistasis, or gene-gene interaction in the case-control setting. Each genotype at a particular locus had an influence on breast cancer disease risk dependent on the genotypes at each of the other three loci. With that in mind, we searched the hormone metabolic pathway for interactions between these loci both at random, combinatorial, without regards to the chromosomal localisation and with regards to the LD in genes that may form common haplotype structures. Two such domains were identified – the polymorphisms in CYP1B1 form stable haplotype structures in Norwegian population (Zimarina et al, submitted), and a block on the long arm of chromosome 15 consisting of CYP11A1 gene close to the CYP1A1 (less than 1 cM) gene and approximately 27.4 cM telomeric to the CYP19 gene. Indeed, we found 2 haplotypes in the CYP1B1, which were significantly overrepresented in individuals with El and E2 levels above median. Two of the three genotypes (R48G and V432L) were among the best predictors according to the combinatorial partitioning method as well. Furthermore, the same CYP1B1 haplotypes (CGG and GTC) were associated with breast cancer risk (Zimarina et al, submitted). The CGG haplotype includes both the V432 form of CYP1B1 and the R48 with higher 4-OH/2-OH E2 metabolic ratio and affinity (Km) towards 17b-estradiol respectively [24, 25]. More unexpectedly, we found high D' values among SNPs residing in different genes but coding for proteins in the same metabolic pathway. The haplotype comprising of the T allele of CYP19utr3' SNP1, the variant number of repeats of CYP11 and the variant alleles of CYP1A1m1 and m2 we associated with high E2 levels (above median). Upregulation of CYP1A1 by dioxin derivatives through the Ah receptor leads to down-regulation of CYP19 and ER. The close location of these genes gives an attractive opportunity to study whether they are regulated by a common regulatory unit. The haplotype structures may vary from population to population – hence explain the variability of the published data on various susceptibility alleles in a number of genes.

The fact that we manage to predict correctly 65% of the individuals according to their metabolic levels based on this limited selection of SNPs in healthy individuals, make us believe that we have identified markers of estradiol levels in the present study. Furthermore, we found similar combinations of SNPs as those involved in the susceptibility combinations from the case control study [2]. We are presently conducting a larger study of both cases and controls with a higher SNP density to improve our 65% prediction value. In the last stage of the preparation of this manuscript another large study of 1975 individuals was published [26] confirming our previous initial report of an association of the polymorphism in CYP193UTR and aromatase activity [27] and in concordance with our present observation of association with plasma levels. Here we demonstrate that the association can be discovered in much smaller number of individuals (109) using the multilocus data analysis.

Conclusion

These studies provide further evidence that genetic variation may appreciably alter plasma level of sex hormone and thus have an effect on disease risk. We describe an approach for multilocus approach to study multiple low-penetrance genetic factors that together determine quantitative phenotypic traits.

Methods

Blood samples were collected from 109 healthy female volunteers between 55 and 75 years of age on a regular mammographic screening. The women enrolled had 2 consecutive negative mammograms in a period of 2 years and were not on hormone replacement therapy (HRT). The plasma levels of the metabolites E₁, E₂, E₁S, DHEA, androstenedione and testosterone were analysed as described previously in [28]. DNA was isolated from EDTA blood using chloroform/phenol extraction followed by ethanol precipitation according to standard procedures using the Applied Biosystems 340A Nucleic Acid Extractor.

Genotyping

Primer sets and methods for analysis are summarized in Table 1 and as described in [29, 30].

Statistical Analysis

Parametric method

Metabolic levels for each metabolite were divided into below and above median and allele and genotype frequency distributions were compared using the Chi square test.

Non-parametric methods

Combinatorial partitioning method

To relax the choice of a metabolite level percentile at which to partition the set of samples, we seek an optimal threshold for each metabolite and each SNP. This optimality is defined by an information theoretic measure of concordance between the partition of the sample set according to genotypes and the partition of the sample set defined by the two sides of a metabolite level threshold. The methods we applied in this dataset are an adaptation of the combinatorial approach of [3] and are briefly described below.

Let (x_i, q_i), i = l,2,...,n denote the measurements for a particular pair consisting of a SNP and a quantitative variable (metabolite level) across all patients. For any such pair we seek a threshold t, such that the genotype frequencies in the samples with q<t will be the most different from the genotype frequencies in the samples with q≥t. This difference is measured by the mutual information score. The search process itself is exhaustive, considering all possible thresholds. More precisely, for a SNP locus L and a quantitative trait q, let G be a partition of the sample-set induced by the genotypes at locus L. For a threshold t, let C_tbe a binary partition of the sample-set defined by q<t and q≥t. The Mutual Information Score (MIS) is defined as the difference between the entropy of the partition C_tand the entropy of C_tconditioned on G:

MIS(C_t, G) = H(C_t) - H(C_t/G),

where H is the entropy or conditional entropy function, as appropriate. The best threshold $\hat{t}$ for a pair (L, q) is such that

M I S (C_{\hat{t}}, G)

(1)

= max_{min(q) ≤t≤max(q)}MIS(C_t, G).

The corresponding p-values were also calculated, effectively counting all possible ways of partitioning the samples that would give a score better than $M I S (C_{\hat{t}}, G)$ [3].

In a parallel approach to assessing SNP-metabolite association we tried all possible pairs of thresholds from the set of 5th, 10th, 15th, 20th, 25th etc percentiles of the trait values. For a pair of thresholds a and b, we considered the partition C_a,bof the samples into samples with trait q ≤ a, samples with a<q<b and samples with q≥b. For each such partition, we computed the corresponding mutual information scores MIS(C_a,b, G) and picked the pair of thresholds ( $\hat{a}, \hat{b}$ ) that gave maximum score. The calculated p-values take into account the multiple search over all possible thresholds as described in [3].

Further, for each SNP-trait pair, and a partition of the sample-set C defined by a single best threshold or by a pair of best thresholds, we ran leave one out cross validation analysis in order to find a set of SNPs such that genotypes at these SNPs can jointly predict the classification of samples with respect to C, i.e. whether the value of the quantitative trait level is above or below threshold [3]. Leave one out cross validation analysis, working with a given subset of SNPs, S, consists of the following steps:

1. Hide a sample

2. For each SNP in the subset S, construct a classifier based on the likelihood of each class of the partition C given the genotypes of remaining (non hidden) samples at this locus

3. Classify the hidden sample using the sum of single-SNP classifiers

4. Repeat steps 1–3 for each sample, and thus determine the number of correct predictions for the subset S

Eventually, we also seek a subset S with the best performance. Several search techniques were used for finding best subset of SNPs in order to avoid evaluating each possible subset for each locus/trait pair. These techniques included ordering SNPs by mutual information score and evaluating sets of top scoring SNPs, as well as performing forward and backward sequential searches [3]. Further, we estimated the probability of finding such predictive subsets of SNPs for random labels by simulations.

Pattern recognition by the two way classification method

The problem of finding an association between groups of individuals with metabolic levels above or below a certain threshold, referred to as the positive and negative groups, and combinations of genotypes at specific polymorphic loci may be formulated as a two-way classification problem. When evaluating a specific combination of genotypes against a particular patient record, the outcome can be a true-positive (i.e., the individual has all the genotypes in the given combination, and the patient is in the positive group), a true-negative (i.e., the patient does not have the specific combination of genotypes, and the patient is in the negative group), a false-positive (i.e., the individual has all the genotypes the given combination, and the patient is in the negative group) and a false-negative (i.e. the patient does not have the specific combination of genotypes and the patient is in the positive group). For a two-way classification problem, the correlation of a specific combination of genotypes for each combination of SNPs may be computed as

$C = \frac{N_{t p} N_{t n} - N_{f n} + N_{f p}}{\sqrt{(N_{t n} + N_{f n}) (N_{t n} + N_{f p}) (N_{t p} + N_{f n}) (N_{t p} + N_{f p})}}$

Where Ntp, Ntn, Nfn, and Nfp, denotes the number of patient records which are respectively true-positive, true-negative, false-negative, and false-positive. The significance of a specific combination of polymorphisms with correlation CorrMAX is defined as the probability of finding a better, or just as good, combination of polymorphisms by random. This probability is referred to as the p-value, and is computed by Monte Carlo simulation. The steps for the Monte Carlo procedure are as described in [31]. Briefly, for each individual record, permute the "above" or "below" labels randomly from the same distribution as in the original data. Calculate CorrMAX for the permuted data. If CorrMAX for the permuted data is larger than, or equal to CorrMAX from the original data, count 1; otherwise count 0. Repeat steps 1, 2 and 3 k times. Estimate the p-value, the total count for CorrMAX divided by the total number of shuffles k. When computing CorrMAX for the permuted data, we include all polymorphisms, not only the polymorphisms in the combination to test the significance of. The standard value for k(number of repeated steps) is 2000.

Haplotype analysis

Haplotypes were estimated using PHASE2.0 software. Linkage disequilibrium and D and D' values were calculated using DNAsp software. Significance of the linkage disequilibrium (LD) was estimated using Fisher exact test with Bonferoni correction for the final p-value.

Aurhor's contribution

VNK, final design of the study, supervised AF, carried out the haplotype analysis and drafted the manuscript

AT, carried out the Mutual Information Score and LOOCV analyses

JG, carried out the metabolic profiling

AF, performed the genotyping as a part of her master's project

GIGA, provided technical assistance in the lab and isolated the DNA

OCL, consulted the statistical analysis and comparison of the methods

SF, performed the pattern recognition analysis

ZY, supervised the statistical analysis and edited the manuscript

PEL collected the blood samples and participated in designing the study

ALBD designed the study and DNA preparation

All authors have read and edited the manuscript and approve the final manuscript.

References

Hoh J, Ott J: Mathematical Multilocus Approaches to Localize Complex Human Trait Genes. Nature Reviews. 2003, 4: 701-709. 10.1038/nrg1155.
Article PubMed Google Scholar
Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, Moore JH: Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet. 2001, 69: 138-47. 10.1086/321276.
Article PubMed PubMed Central Google Scholar
Tsalenko A, Ben-Dor A, Cox N, Jakhini Z: Methods for Analysis and Visualization of SNP genotype data for Complex Diseases. Pac Symp Biocomput. 2003, 6: 548-561.
Google Scholar
Hahn LW, Ritchie MD, Moore JH: Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions. Bioinformatics. 2003, 19: 376-82. 10.1093/bioinformatics/btf869.
Article PubMed Google Scholar
Byng MC, Whittaker JC, Cuthbert AP, Mathew CG, Lewis CM: SNP subset selection for genetic association studies. Ann Hum Genet. 2003, 67: 543-556. 10.1046/j.1529-8817.2003.00055.x.
Article PubMed Google Scholar
Koza JR, Andre D: A case study where biology inspired a solution to a computer science problem. Pac Symp Biocomput. 1996, 34: 500-511.
Google Scholar
Nandi S, Guzman RC, Yang J: Hormones and mammary carcinogenesis in mice, rats, and humans: a unifying hypothesis. Proc Natl Acad Sci USA. 2001, 92: 3650-3657.
Article Google Scholar
Clemons M, Goss P: Estrogen and the risk of breast cancer. N Engl J Med. 2001, 344: 276-285. 10.1056/NEJM200101253440407.
Article PubMed Google Scholar
Endogenous Hormones and Breast Cancer Collaborative Group: Endogenous sex hormones and breast cancer in postmenopausal women: Reanalysis of nine prospective studies. Journal of the National Cancer Institute. 2002, 94: 606-616.
Article Google Scholar
Endogenous Hormones and Breast Cancer Collaborative Group: Free Estradiol and Breast Cancer Risk in Postmenopausal Women: Comparison of Measured and Calculated Values. Cancer Epidemiol Biomarkers Prev. 2003, 12: 1457-1461.
Google Scholar
Trichopoulos D, MacMahan B, Cole P: Menopause and breast cancer risk. Journal of the National Cancer Institute. 1972, 48: 605-613.
PubMed Google Scholar
Liehr JG: Genotoxicity of the steroidal estrogens estrone and estradiol: possible mechanism of uterine and mammary cancer development. Apmis. 2001, 109: S519-S527.
Article Google Scholar
Riza E, Dos SS, De Stavola B, Bradlow HL, Sepkovic DW, Linos D, Linos A: Urinary estrogen metabolites and mammographic parenchymal patterns in postmenopausal women. Cancer Epidemiol Biomarkers Prev. 2001, 10: 627-634.
PubMed Google Scholar
Jacobs S, Lonning PE, Haynes B, Griggs L, Dowsett M: Measurement of aromatisation by a urine technique suitable for the evaluation of aromatase inhibitors in vivo. J Enzyme Inhib. 1991, 4: 315-325.
Article PubMed Google Scholar
Huang CS, Chern HD, Chang KJ, Cheng CW, Hsu SM, Shen CY: Breast cancer risk associated with genotype polymorphism of the estrogen-metabolizing genes CYP17, CYP1A1, and COMT: a multigenic study on cancer susceptibility. Cancer Research. 1999, 59: 4870-4875.
PubMed Google Scholar
Kristensen VN, Kure EH, Erikstein B, Harada N, Borresen-Dale A: Genetic susceptibility and environmental estrogen-like compounds. Mutation Research. 2001, 482: 77-82.
Article PubMed Google Scholar
Kristensen VN, Borresen-Dale AL: Molecular epidemiology of breast cancer: genetic variation in steroid hormone metabolism. Mutation Research. 2001, 2: 30-33.
Google Scholar
Lonning PE, Helle SI, Johannessen DC, Adlercreutz H, Lien EA, Tally M, Ekse D, Fotsis T, Anker GB, Hall K: Relations between sex hormones, sex hormone binding globulin, insulin-like growth factor-I and insulin-like growth factor binding protein-1 in post-menopausal breast cancer patients. Clin Endocrinol. 1995, 42: 23-30.
Article Google Scholar
Henderson BE, Feigelson HS: Hormonal carcinogenesis. Carcinogenesis. 2000, 21: 427-433. 10.1093/carcin/21.3.427.
Article PubMed Google Scholar
Yue W, Santen RJ, Wang JP, Li Y, Verderame MF, Bocchinfuso WP, Korach KS, Devanesan P, Todorovic R, Rogan EG, Cavalieri EL: Genotoxic metabolites of estradiol in breast: potential mechanism of estradiol induced carcinogenesis. J Steroid Biochem Mol Biol. 2003, 86: 477-486. 10.1016/S0960-0760(03)00377-7.
Article PubMed Google Scholar
Shelley S, Tworoger J, Chubak EJ, Aiello CM, Ulrich C, Atkinson Potter JD, Yasui Y, Stapleton PL, Lampe JW, Farin FM, Stanczyk FZ, McTiernan A: Association of CYP17, CYP19, CYP1B1, and COMT Polymorphisms with Serum and Urinary Sex Hormone Concentrations in Postmenopausal Women. Cancer Epidemiol Biomarkers Prev. 2004, 13: 94-101.
Article Google Scholar
Somner J, McLellan S, Cheung J, Mak YT, Frost ML, Knapp KM, Wierzbicki AS, Wheeler M, Fogelman I, Ralston SH, Hampson GN: Polymorphisms in the P450 c17 (17-hydroxylase/17,20-Lyase) and P450 c19 (aromatase) genes: association with serum sex steroid concentrations and bone mineral density in postmenopausal women. J Clin Endocrinol Metab. 2004, 89: 344-345. 10.1210/jc.2003-030164.
Article PubMed Google Scholar
Shimada T, Watanabe J, Kawajiri K, Sutter TR, Guengerich FP, Gillam EMJ, Inoue K: Catalytic properties of polymorphic human cytochrome P450 1B1 variants. Carcinogenesis. 1999, 20: 1607-1613. 10.1093/carcin/20.8.1607.
Article PubMed Google Scholar
Shimada T, Watanabe J, Inoue K, Guengerich FP, Gillam EM: Specificity of 17 beta – estradiol and benzo[a]pyrene oxidation by polymorphic human cytochrome P450 1B1 variants substituted at residues 48, 119 and 432. Xenobiotica. 2001, 31: 163-176. 10.1080/00498250110043490.
Article PubMed Google Scholar
Dunning AM, Dowsett M, Healey CS, Tee L, Luben RN, Folkerd E, Novik KL, Kelemen L, Ogata S, Pharoah PD, Easton DF, Day NE, Ponder BA: Polymorphisms associated with circulating sex hormone levels in postmenopausal women. J Natl Cancer Inst. 2004, 96: 936-45.
Article PubMed Google Scholar
Kristensen VN, Harada N, Yoshimura N, Haraldsen E, Lonning PE, Erikstein B, Karesen R, Kristensen T, Borresen-Dale AL: Genetic variants of CYP19 (aromatase) and breast cancer risk. Oncogene. 2000, 19: 1329-33. 10.1038/sj.onc.1203425.
Article PubMed Google Scholar
Nedelcheva Kristensen V, Andersen TI, Erikstein B, Geitvik G, Skovlund E, Nesland JM, Borresen-Dale AL: Single tube multiplex polymerase chain reaction genotype analysis of GSTM1, GSTT1 and GSTP1: relation of genotypes to TP53 tumor status and clinicopathological variables in breast cancer patients. Pharmacogenetics. 1998, 8: 441-7.
Article PubMed Google Scholar
Geisler J, Berntsen H, Lonning PE: A novel HPLC-RIA method for the simultaneous detection of estrone, estradiol and estrone sulphate levels in breast cancer tissue. J Steroid Biochem Mol Biol. 2000, 5: 259-264. 10.1016/S0960-0760(00)00036-4.
Article Google Scholar
Cascorbi I, Brockmoller J, Roots I: A C4887A polymorphism in exon 7 of human CYP1A1: population frequency, mutation linkages, and impact on lung cancer susceptibility. Cancer Research. 1996, 56: 4965-4969.
PubMed Google Scholar
Matsui A, Ikeda T, Enomoto K, Nakashima H, Omae K, Watanabe M, Hibi T, Kitajima M: Progression of human breast cancers to the metastatic state in linked to genotypes of catechol-o-methyltransferase. Cancer Letters. 2000, 150: 23-31. 10.1016/S0304-3835(99)00368-7.
Article PubMed Google Scholar
McIntyre LM, Martin ER, Simonsen KL, Kaplan NL: Circumventing Multiple Testing: A Multilocus Monte Carlo Approach to Testing for Association. Genetic Epidemiology. 2003, 19: 18-29. 10.1002/1098-2272(200007)19:1<18::AID-GEPI2>3.0.CO;2-Y.
Article Google Scholar

Download references

Acknowledgements

This work was supported by grant D-03067 from The Norwegian Cancer Society and 152004/150 Functional Genomics, FUGE, NFR.

Author information

Authors and Affiliations

Department of Genetics, Institute of Cancer Research, the Norwegian Radium Hospital, 0310, Oslo, Norway
Vessela N Kristensen, Anne Faldaas, Grethe Irene Grenaker & Anne-Lise Børresen-Dale
Agilent Technologies, Palo Alto, CA, USA
Anya Tsalenko & Zohar Yakhini
Department of Oncology, Haukeland Hospital, Bergen, Norway
Jurgen Geisler & Per Eystein Lønning
Department of Informatics, University in Oslo, Norway
Ole Christian Lingjærde
Interagon AS, Trondheim, Norway
Ståle Fjeldstad
Faculty Division Radiumhospitalet, University in Oslo, Oslo, Norway
Anne-Lise Børresen-Dale

Authors

Vessela N Kristensen
View author publications
You can also search for this author in PubMed Google Scholar
Anya Tsalenko
View author publications
You can also search for this author in PubMed Google Scholar
Jurgen Geisler
View author publications
You can also search for this author in PubMed Google Scholar
Anne Faldaas
View author publications
You can also search for this author in PubMed Google Scholar
Grethe Irene Grenaker
View author publications
You can also search for this author in PubMed Google Scholar
Ole Christian Lingjærde
View author publications
You can also search for this author in PubMed Google Scholar
Ståle Fjeldstad
View author publications
You can also search for this author in PubMed Google Scholar
Zohar Yakhini
View author publications
You can also search for this author in PubMed Google Scholar
Per Eystein Lønning
View author publications
You can also search for this author in PubMed Google Scholar
Anne-Lise Børresen-Dale
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Vessela N Kristensen.

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kristensen, V.N., Tsalenko, A., Geisler, J. et al. Multilocus analysis of SNP and metabolic data within a given pathway. BMC Genomics 7, 5 (2006). https://doi.org/10.1186/1471-2164-7-5

Download citation

Received: 08 December 2004
Accepted: 13 January 2006
Published: 13 January 2006
DOI: https://doi.org/10.1186/1471-2164-7-5

Multilocus analysis of SNP and metabolic data within a given pathway