Nucleotide sequence analyses of the MRP1 gene in four populations suggest negative selection on its coding region

Background The MRP1 gene encodes the 190 kDa multidrug resistance-associated protein 1 (MRP1/ABCC1) and effluxes diverse drugs and xenobiotics. Sequence variations within this gene might account for differences in drug response in different individuals. To facilitate association studies of this gene with diseases and/or drug response, exons and flanking introns of MRP1 were screened for polymorphisms in 142 DNA samples from four different populations. Results Seventy-one polymorphisms, including 60 biallelic single nucleotide polymorphisms (SNPs), ten insertions/deletions (indel) and one short tandem repeat (STR) were identified. Thirty-four of these polymorphisms have not been previously reported. Interestingly, the STR polymorphism at the 5' untranslated region (5'UTR) occurs at high but different frequencies in the different populations. Frequencies of common polymorphisms in our populations were comparable to those of similar populations in HAPMAP or Perlegen. Nucleotide diversity indices indicated that the coding region of MRP1 may have undergone negative selection or recent population expansion. SNPs E10/1299 G>T (R433S) and E16/2012 G>T (G671V) which occur at low frequency in only one or two of four populations examined were predicted to be functionally deleterious and hence are likely to be under negative selection. Conclusion Through in silico approaches, we identified two rare SNPs that are potentially negatively selected. These SNPs may be useful for studies associating this gene with rare events including adverse drug reactions.


Background
The development of drug resistance poses a serious limitation to the effective treatment of cancer. Although sev-eral different drug resistance mechanisms have been described, members of the ABC transporter superfamily have generated great interest because of their contribution      to multidrug resistance of tumor [1,2]. The 170 kDa Pglycoprotein, encoded by the MDR1 gene, was the first member of this family to be described [3]. Subsequently, the 190-kDa multidrug resistance-associated protein-1 (MRP1/ABCC1) was isolated from a multidrug resistance lung cancer cell line that does not express MDR1 [4]. Both these transporters have been implicated in the resistance of various cancers to chemotherapy. Although MRP1 is only 18% identical to MDR1 at the amino acid level, it transports several similar drugs as MDR1 including doxorubicin, vincristine and colchicine. However, while drugs transported by MDR1 are usually neutral or cationic, drugs effluxed by MRP1 are anionic, frequently conjugated with glutathione and other anions, or are co-trans-ported with glutathione [2]. MRP1 has also been implicated to play important roles in cellular anti-oxidative defense and inflammation [5,6].
Genetic polymorphisms in MDR1 have been associated with differences in MDR1 expression and function as well as drug response and disease susceptibilities [8][9][10]. SNPs within MDR1 that have been associated with functional differences were found to demonstrate evidence of recent positive selection [11]. However, less is known about the polymorphisms within MRP1. Although numerous SNPs have been identified within this gene.( [12][13][14][15][16][17], most of these studies were performed on a single population which was primarily either Chinese, Japanese or Caucasians in origin. Thus far, no association have been observed between the few SNPs at MRP1 and functional differences [12,[17][18][19] possibly because neither the functionally important SNP nor SNPs in LD with the functional SNP were examined. These studies which examined only a few of the many SNPs within MRP1, without knowledge of the functional SNP nor the LD or haplotype profile in that population may not have been powerful enough to identify any association. Recently, we found evidence of genomic signatures of recent positive selection in a SNP at the 5' flanking region (5'FR) of MRP1 in a Caucasian population and demonstrated that this SNP altered MRP1 promoter activity [20].
In the present study, we sequenced all the exons as well as the 5' and 3' flanking regions of MRP1 to comprehensively scan for polymorphisms in 142 DNA samples from four different populations, namely, the Chinese, Malays, Indians and Caucasians. Nucleotide diversity of the exonic polymorphisms was determined and the functional effects of the non-synonymous SNPs were predicted using three programs, SIFT, PolyPhen and PANTHER. We found that SNPs E10/1299G>T, which resulted in arginine-serine substitution at amino acid position 433 (R433S) and E16/2012G>T, which resulted in glycine-valine substitution at amino acid position 671 (G671V), may potentially adversely affect the function of MRP1. While these two SNPs, which have low minor allele frequencies (<3%), may not be useful for studies associating this gene with common diseases/drug response, it may, nonetheless, be useful for studies associ-ating this gene with rare events including adverse drug reactions (ADRs).

Profile of polymorphisms within MRP1 in the different populations
De novo sequencing of approximately 18 kb of genomic DNA at MRP1, including all the 31 exons as well as flanking regions, was performed in 142 healthy individuals from four different populations to identify polymorphisms at MRP1 in the different populations. A total of 71 polymorphisms were identified including 60 bi-allelic SNPs, ten indels and one short tandem repeat (STR) ( Figure 1, Tables 2, 3, 4). An examination of currently reported SNPs in the dbSNP Build 125 database [21] and published reports [12][13][14][15][16][17]22] revealed that 26 SNPs and 8 indels were not previously reported and hence represent novel polymorphisms.
Nineteen of the 60 SNPs identified were found in all four populations while 28 of these SNPs were population-specific with 22 of these population-specific SNPs occurring only once out of 284 chromosomes examined (singletons). While the STR and two indels were polymorphic in all the populations examined, the other seven indels were population-specific of which six were singletons.
None of the indels or STR identified occurred within exons ( Fig. 1, Tables 3 and 4). Eighteen of the 60 bi-allelic SNPs were found in exonic regions, six of which resulted in non-synonymous change ( Fig. 1, Table 2). These results suggest that polymorphisms at MRP1 are largely conservative since less than 10% of these polymorphisms (6/71) presented as non-synonymous changes which are potentially capable of disrupting the MRP1 protein structure/ function. Nonetheless, it is possible for synonymous or intronic SNPs to affect MRP1 expression or function through the alteration of the mRNA transcript stability or folding [23] thereby affecting downstream splicing [24,25], processing [26], translational control [27] or regulation [28]. Additionally, polymorphisms at the 5'UTR/promoter and 3'UTR may influence promoter activity and hence gene expression or mRNA transcript stability.
Interestingly, although no polymorphisms were identified at the 3'UTR region (exon 31), four polymorphisms, including the STR (Table 4) and one indel (Table 3) were found to reside at the 5'UTR/reported core promoter region [29] of MRP1. Three of these promoter polymorphisms were novel but population-specific with SNPs 5'UTR/-46C>T and 5'UTR/-51C>T occurring only in the Malay population and the polymorphism 5'UTR/-74 14 bp indel occurring only in the Indian population. Inser-tion/deletion polymorphisms in promoter regions have been correlated with the modulation of the expression of genes (e.g. matrix metalloproteinase I gene [30]). It is thus possible that the 14 bp indel polymorphism in the Indian population may influence the promoter activity and hence the expression of MRP1.
The STR polymorphism found at the 5'UTR/promoter region of MRP1 is a GCC trinucleotide repeat and 7-16 of such repeats were observed in the four populations (  Caucasians. Interestingly, while seven GCC repeats occurred at relatively high frequencies in the Indian and Caucasian populations (≥ 12%), this number of repeats was not observed in either the Chinese or Malay population. These observations highlight the differences in the distribution of the number of the MRP1 promoter GCC repeats in the different populations with the Indians and Caucasians being more similar to each other than to the Chinese and Malays. The number of STR repeats residing within or close to promoters has been found to modulate the promoter activity of genes [31][32][33][34]. Interestingly, differences in the CGG and GCC trinucleotide repeats at the 5'UTR/promoter region of the Fragile X mental retardation genes (FMR1 and FMR2, respectively) have been associated with differences in the methylation status of the promoter and expression of the genes [35]. Hence this common polymorphism at the 5'UTR/promoter region of MRP1 with distinctly different distribution of repeat numbers in the different population may have potential functional significance.   Table 5A, only 19 and 14 polymorphisms that we identified were also genotyped in the HapMap and Perlegen projects. Curiously, 25/23 and 1/1 polymorphisms reported in HapMap and Perlegen, respectively, were found to be monomorphic in similar populations that we examined. Nonetheless, all the SNPs examined in the two databases that did not occur in our populations were found to be either monomorphic or of low frequency (<5%) in similar populations examined in the two databases (Table 5A). On the other hand, 41/ 41 and 46/46 polymorphisms that we identified were not examined in either the HapMap or the Perlegen project, respectively. While many of these polymorphisms were of low frequencies or were monomorphic in the two populations that were similar to the HapMap/Perlegen populations, 8 of these polymorphisms were found to be of relatively high frequencies (>5%) in at least one of the two populations. Some of the low frequency polymorphisms represent novel SNPs identified in this study.

Comparison of polymorphisms identified in this study with those reported in the HapMap and Perlegen databases
Polymorphisms in our study that was also genotyped in the HapMap and Perlegen projects were found to have similar frequencies in similar populations (Table 5B).
Paired T-test revealed no significant difference (P > 0.05) between allele frequencies in the respective populations from our study and those from the HapMap or Perlegen database (Table 5C). Interestingly, significant difference (P < 0.05) was observed between data obtained from the HapMap database and those from the Perlegen database especially for the Chinese population probably due to fewer samples being examined in the Perlegen database.

Nucleotide diversity at MRP1
The extent of variation at MRP1 was evaluated using two conventional measures of nucleotide diversity: π, the average heterozygosity per site and θ, the population mutation parameter [40]. Tajima's D statistic was also calculated to assess deviation from the neutral mutation model [41]. A positive Tajima's D value for a single gene is indicative of positive heterozygote advantage while a negative Tajima's D value for an individual gene suggests selection of a specific allele over the alternative allele(s) [42]. However, when a negative Tajima's D value is observed in most of the genes that were examined in a particular population, it is suggestive of a recent expansion in that population [42].
With all the exonic regions sequenced, the above nucleotide diversity statistics were determined for non-synonymous versus synonymous SNPs at MRP1 ( Table 6). The θ value for synonymous SNPs at MRP1 was found to be 16.15 × 10 4 (Table 6) which was comparable to mean θ values of other reported genes including 24 transporter genes (20.14 ± 4.10) × 10 4 [43], 75 candidate genes associated with blood pressure homeostasis (15.1 ± 3.6) × 10 4 [44] but slightly higher than the mean θ values of 106 random genes (10.03 ± 2.52) × 10 4 [45]. However, the θ value for non-synonymous SNPs (11.73 × 10 4 ) at MRP1 was much higher than mean θ values of the other reports (3.59 ± 0.90 to 5.7 ± 1.4) × 10 4 [43][44][45] probably due to the small size of the MRP1 exons in which the non-synonymous SNPs reside. Interestingly, while the π of synonymous SNPs (π s ) at MRP1 (12.62) was comparable to the reported mean π s values in other genes (9.73 ± 4.86 to 10.67 ± 5.07) × 10 4 [43,45], the π ns at MRP1 (0.94) was much lower than the mean reported π ns values for the other genes (2.20 ± 1.12 to 2.75 ± 1.31) × 10 4 [43,45]. This low π ns at MRP1 was also reported previously [43] with the reported π ns value (0.15) being much lower than the present observation (0.94). Notably, the π ns /π s at MRP1 was less than 1 (0.0743 in this study and 0.0110 in the previous study [43]), suggesting that this gene is likely to be under selective pressure. Importantly, the θ values for both synonymous and non-synonymous SNPs were greater than the corresponding π values, resulting in negative Tajima's D statistic which suggests that the coding region of MRP1 may have undergone negative selection or population expansion. It is more likely that MRP1 gene have undergone negative selection since the average total nucleotide diversity in the MRP1 gene (π total ) (9.25) was found to be greater than the amino acid diversity (π ns ) (0.94) [43].

SNPs E10/1299G>T and E16/2012 G>T are potentially deleterious
As nucleotide diversity statistics suggest that the coding region of MRP1 may be under negative selection, we thus further analyzed the exonic SNPs at MRP1 to evaluate if any of these SNPs may have deleterious effects on MRP1 structure/function.
Exonic SNPs, particularly non-synonymous SNPs, have the potential to alter the secondary/tertiary structure of proteins and/or affect the protein function. A total of 18 exonic SNPs were identified at this gene locus of which five have not been previously reported ( Fig. 2A, C). Most of the exonic SNPs occurred at low frequencies (<5%) in only one or two populations. While at least 30% of the synonymous SNPs at MRP1 occurred at greater than 5% frequency in all the four populations examined, all of the non-synonymous SNPs occurred at less than 3% in only one or at most two populations (Fig. 2C). This observation highlights the conservation of exonic polymorphisms at MRP1 and suggests that altering the nonsynonymous SNPs may have a deleterious effect and are  likely to be selected against, resulting in their low frequencies.
To assess if any of the non-synonymous SNPs at MRP1 have potentially damaging effect on the protein structure/ function, the location of these six SNPs were displayed on the MRP1 protein topological image using the SOSUI and TOPO2 programs. As evident in figure 2B, none of the non-synonymous SNPs reside in the transmembrane regions although four of these SNPs reside near or within the nucleotide binding domain (NBD) of the MRP1 protein. Nonetheless, SNPe1 (SNP e2/218 C>T) and SNPe6 (SNP e10/1299G>T) reside near the transmembrane region, while SNPe12 (SNP E16/2012 G>T) reside on a conserved glycine residue near the conserved Walker A consensus motif of the NBD [12], suggesting that these SNPs may have functional significance. SNP e2/218 C>T was only found at less than 3% in the Chinese and Malay populations while SNP E10/1299 G>T occurred at less than 2% in the Caucasian population only and SNP E16/ 2012 G>T occurred at less than 3% in the Indian and Caucasian populations (Fig. 2C). The SNP frequencies of SNP E10/1299 G>T and SNP E16/2012 G>T in the Caucasian population were comparable to a previous report [12].
Three different algorithms, SIFT [46], Polymorphism Phenotyping (PolyPhen) [47]and PANTHER [48]were then utilized to predict the functional significance of the six non-synonymous SNPs. SIFT predicts the effect of amino acid substitutions based on the assumption that the important amino acid will be conserved in the protein family [46]. PolyPhen predicts the effect of the amino acid variant on the function or structure of the protein based on current knowledge of protein structure, interactions and evolution [47] while the PANTHER program predicts the effect of an amino acid substitution on the protein's function using amino acid substitution scores derived from an alignment of related protein sequences and statistics from hidden Markov models [48].
Interestingly, SNP E10/1299 G>T, which is located near the transmembrane domain, was predicted to be potentially deleterious by the PANTHER but not the SIFT or PolyPhen algorithms. This SNP was reported to affect the ability of MRP1 to confer drug resistance as well as to transport organic anions [49] suggesting that the PAN-THER program may be more accurate in predicting the functional impact of polymorphisms than SIFT or PolyPhen. This observation is similar to a previous report that utilizes both bioinformatics and biochemical approaches to compare the accuracy of the PolyPhen and PANTHER programs in predicting functionally deleterious polymorphisms in the ABCA1 gene [50]. They found that the PANTHER software is significantly (P < 0.05) more accurate in its prediction of the functional consequence of nonsynonymous SNPs. They also reported that the PANTHER program is capable of correctly predicting the functional impact of greater than 94% of the polymorphisms examined while PolyPhen is only ~88% accurate in predicting the functional impact of polymorphisms [50].
Significantly, all of the three different algorithms predicted that SNP E16/2012 G>T, which resides close to Walker A and results in G671V substitution, was likely to have a potentially deleterious effect on protein function (Fig. 2C). The significance of this polymorphism has also been demonstrated previously by Conrad et al. [12] who reported that the mRNA expression of peripheral lymphocytes from individuals carrying the SNP E16/2012 G>T polymorphism was lower than the average expression level. The lower expression of the MRP1 G671V transcript is suggestive of greater accumulation of MRP1 drugs in the cells which may lead to adverse drug reactions. Curiously, that report also found that the G671V polymorphism did not affect the transport of MRP1 substrates including leukotriene C 4 , 17β-estradiol 17β-(D)-glucuronide and estrone sulfate by membrane vesicles prepared from transiently transfected HEKSV293T cells [12]. Recently, the same group also reported similar MRP1 protein expression levels and transport properties in human embryonic kidney cells were transfected with MRP1 constructs carrying either glycine or valine at amino acid position 671 [51]. The observation that the G671V polymorphism did not affect MRP1 protein expression or transport ability of some MRP1 substrates in vitro [12,51] does not rule out the possibility of functional significance of this polymorphism in vivo especially since the same group reported decreased transcript expression in individuals carrying this polymorphism. It is still possible that this polymorphism affect the transport of other MRP1 substrates that has not been examined. It is also possible that although the SNP E16/2012 G>T polymorphism does not affect MRP1 transport ability, it may affect other yet-to-be-examined functional properties of the protein (e.g. drug resistance capability or cellular anti-oxidative defense or inflammation). It has been reported that an artificial mutation E1089Q created in MRP1 markedly affected the ability of MRP1 protein to confer resistance without affecting its ability to transport organic anions [52]. Hence, the SNP E16/2012 G>T polymorphism warrants further investigation.
Hence, the bioinformatics approach may be useful in facilitating the prediction of potentially functionally significant polymorphism so that future research may be directed to characterizing these polymorphisms.

Functional implications of polymorphisms at MRP1
The current detailed characterization of polymorphisms at MRP1 in four different ethnic populations highlights several characteristics about this gene that may facilitate more rational approaches to studies associating this gene with functional changes. We have previously reported that the diverse haplotypes and weak LD across MRP1 [20]could perhaps provide an explanation for the failure of previous studies to detect association between polymorphisms in this gene and functional differences [12,[17][18][19] and highlight the importance of fully characterizing the LD and haplotype profiles of the gene before embarking on association studies. Its LD and haplotype architecture suggest that it may be necessary to identify alternative approaches for association studies of this gene as it may not be feasible to utilize tag SNPs. A possible approach is to identify polymorphisms with potential functional significance before performing association studies, possibly by identifying those polymorphisms that may have been subjected to selection pressures.
We recently identified a high frequency SNP at the 5' flanking promoter region of MRP1 that demonstrated evidence of recent positive selection and affected the promoter activity of MRP1 [20]. In this report, through the sequencing of the MRP1 exonic and flanking regions, we identified a GCC-trinucleotide multi-allelic STR polymorphism residing within the 5'UTR/promoter region of MRP1 that was found at relatively high frequencies in all populations examined. Notably, the frequency distribution of the different number of STR alleles in the different population was found to be different (Table 4). Although it was previously reported that the 5'UTR/promoter region contains the GCC-triplet repeats that is absent in the rodent sequence and 7, 13 and/or 14 of these repeats were observed in different cell lines and PBMC from a single individual [7,53], no reports have yet examined the variation of this polymorphism in the different ethnic populations. This STR is approximately 296 bp from the SNP that we previously reported to show evidence of recent positive selection [20]. Given that the selection is recent it may be expected that it would be in strong LD with the positively selected SNP. Since STRs within/near promoters have been implicated to affect promoter activity and expression levels of the gene, it would be worthwhile to further examine the effect that this polymorphism together with the positively selected SNP have in influencing promoter activity and hence expression of MRP1.
Interestingly, while the promoter region of MRP1 may be under recent positive selective pressure [20], in this study we also found that the coding region of this gene may have undergone negative selection pressure as suggested by nucleotide diversity indices. Two coding SNPs E16/ 2012 G>T and E10/1299 G>T have been predicted by either the PANTHER program or all three programs (SIFT, PolyPhen and PANTHER) to have a deleterious effect on the structure/function of the protein. The significance of these SNPs for general association studies may be limited since these SNP occurs at very low frequencies (<3%) in only one or two of the four populations examined. Nonetheless, this SNP may be associated with rare events including ADR.

Conclusion
In summary, based on the "common disease-common variant" hypothesis, the previously reported common polymorphism within the promoter of MRP1 that showed evidence of recent positive selection [20]would be useful for association studies of common diseases/drug response. Nonetheless, the rare exonic SNP(s) in this gene that we demonstrate here to be likely to be under negative selection pressure may be useful for studies associating this gene with rare phenotypes including ADR, which has been listed as the top five leading causes of death in Western countries [54].

Study population
The populations examined include individuals residing in Singapore from the following ethnic groups: 36 Caucasians and Chinese as well as 35 Malays and Indians. Race and ethnic group were declared by the volunteers to be true to three generations. Informed consent from the volunteers and ethical approval from the National University Hospital and the Changi General Hospital Institutional Review Boards were obtained.

PCR and DNA sequencing
The MRP1 genomic DNA (NT_0101393.13) sequence was obtained from GenBank [55] and used as the reference sequence. For the sequencing of all 31 exons of MRP1, 30 pairs of primers (see Table 1) were designed using Vector NTI 7.0 software and utilized to amplify these exons. The amplicons spanned the entire exon as well as some flanking sequences, to ensure that the splice donor and acceptor sites were also included. The PCR reaction was performed in a 10 µl volume reaction containing 40 ng genomic DNA template from the above mentioned samples, 5 µl 2 × PCR master mix buffer (Qiagen, Valencia, CA, USA), with or without 1 µl Q-solution (depending on the GC content of the amplicon) as well as 0.20 µM/L of sense and anti-sense primers. PCR was carried out in a GeneAmp ® PCR System 9700 (Applied Biosystems, Foster City, CA) with the thermal cycling conditions as follows: an initial denaturation at 94°C for 15 min followed by 35 cycles at 94°C for 30 sec, temperature for the optimal annealing of each amplicon as specified (Table 1) for 90 sec, and extension at 72°C for 60 sec. This was then followed by a final elongation step at 72°C for 10 min. The PCR products obtained were then treated with exonuclease I and shrimp alkaline phosphatase (SAP, United States Biochemical). Sequencing reactions were performed using ABI PRISM Big Dye Terminator (V3.0) kit and the conditions for the sequencing reactions were (for all the exons except exon 1): 94°C for 15 min followed by 30 cycles at 96°C for 10 sec, 50°C for 5 sec and 60°C for 4 min. Due to the high GC content of exon 1, the sequencing conditions were modified as follows: 94°C for 15 min followed by 35 cycles at 98°C for 30 sec, 48°C for 10 sec and 60°C for 5 min. The final product was resolved by automated capillary electrophoresis on an ABI PRISM 3700 ® DNA analyzer (Applied Biosystems). The DNA sequence of each exon obtained experimentally was then aligned against the reference sequence (NT_0101393.13) using the Vector NTI 7.0 software to identify the polymorphic sites. Polymorphisms identified were verified through bidirectional re-sequencing of all samples whose chromatograms do not clearly display the polymorphism as well as randomly selected samples whose chromatograms clearly show the polymorphism.

Population genetic parameters
Two common parameters of nucleotide diversity were calculated: the neutral parameter (θ) which is the estimate of population mutation parameter based on the number of polymorphic sites in the samples [40] and nucleotide diversity (π) which is the direct estimate of heterozygosity per site, or the average proportion of nucleotides that differ between any randomly sampled pair of sequences [40]. Each of these two parameters was calculated for synonymous and nonsynonymous SNP sites. Tajima's D statistic was also calculated to assess deviations from the neutral mutation model [41].

In Silico characterization of polymorphisms in exons
The programs Sorting Intolerant From Tolerant (SIFT) [46], Polymorphism Phenotyping (PolyPhen) [47] and PANTHER [48] were utilized to evaluate the potential effect of amino acid substitutions resulting from the polymorphisms. Since the position of the non-synonymous polymorphic amino acid residue on the MRP1 protein may provide important clues with regards to its potential functionality, the SOSUI program [56] was utilized to predict the topology of the MRP1 protein and the TOPO2 program [57] was used to display the location of the SNPs on the MRP1 protein topological image.

Authors' contributions
ZW contributed to the design of the experiments, data analysis and the write-up of the manuscript. PHS carried out the sequencing experiments. HA and SR contributed to the critical review of the draft manuscript. SSC and EJDL contributed to the conception and critical review of the draft manuscript. CGL (corresponding author) conceived the study, and contributed to the design of experiments, coordination, critical evaluation of the data and analyses as well as the final writing of the manuscript. All the authors have given the final approval of the version to be published