Characterization of familial breast cancer in Saudi Arabia

Background The contribution of genetic factors to the development of breast cancer in the admixed and consanguineous population of the western region of Saudi Arabia is thought to be significant because the disease has an early onset. The current protocols of continuous clinical follow-up of relatives of such patients are costly and burden already over-stretched medical resources. The discovery of the significant contribution of BRCA1/2 mutations to breast cancer susceptibility allowed for the design of genetic tests that let the medical practitioner focus care on those who need it most. However, BRCA1/2 mutations do not account for all breast cancer susceptibility, and other genetic factors, known and unknown, may play a role in the development of the disease. The advent of whole-exome sequencing offers a unique opportunity to identify the breast cancer susceptibility genes in each affected family. The polymorphisms/mutations identified will then allow genetic screening tests to be personalized accordingly. To this end, we performed whole-exome sequencing of seven breast cancer patients with a positive family history of the disease using the Agilent SureSelect™ Whole-Exome Enrichment kit and sequencing on the SOLiD™ platform.
Results We identified several coding single nucleotide variations, either novel or rare, affecting genes that control DNA repair in the BRCA1/2 pathway.
Conclusion The disruption of DNA repair pathways is very likely to contribute to breast cancer susceptibility in the Saudi population.


Background
The discovery of the BRCA1 and BRCA2 genes as major breast cancer susceptibility genes led to great advances in genetic screening for the disease and in the understanding of its inheritance [1,2]. Several other genes were found to increase susceptibility to breast cancer, but at a markedly lower frequency and penetrance. These genes include ATM, TP53, CHEK2, PTEN, STK11, PALB2, BRIP1 and the RAD51 genes [3-11]. Genome-wide association studies (GWAS) led to the identification of 21 susceptibility loci that are considered only low-risk alleles [9,12-17]. All these factors combined account for only 35% of heritable breast cancer, with the majority of cases remaining of unknown genetic etiology [18]. This problem is compounded for the admixed and consanguineous population of the western region of Saudi Arabia, where virtually no research has been done so far to elucidate the genetic background of heritable breast cancer. A remarkable characteristic of breast cancer in this population is the relatively young age of onset: the majority of cases (sporadic or familial) are diagnosed with invasive ductal carcinoma before the age of 50 [19]. This early onset could be attributed, at least partly, to undetermined genetic susceptibility factors accumulating in the population due to consanguineous marriages, and to increased exposure to environmental insults due to life-style shifts in the past two decades.
Sanger sequencing of all known breast cancer susceptibility genes would be a daunting task. Developments in massively parallel sequencing technology and whole-exome sequencing alleviate many of the problems associated with such an approach and allow for the simultaneous determination of known factors as well as the discovery of novel ones. In the age of personalized medicine, whole-exome sequencing of each breast cancer patient is fast becoming a standard approach towards genetic diagnosis [20]. In the present study we employed whole-exome sequencing of seven cases diagnosed with familial breast cancer and with unknown BRCA1 or BRCA2 status. We determined the BRCA1 and BRCA2 status in these cases and report the identification of several rare variants that can potentially explain breast cancer susceptibility in each case analyzed.

Patients' samples
Patients were selected for this study if they had one or more first-degree relatives diagnosed with breast cancer. Peripheral blood was obtained from the patients after obtaining their informed consent and their family history of breast cancer. Patient recruitment and blood sampling were performed according to the institutional ethical procedures (Additional file 1: Figure S1). Genomic DNA was prepared using the Qiagen QIAamp DNA Blood Mini kit according to the manufacturer's recommendations.

Whole-exome sequencing and SNP genotyping
Three micrograms of genomic DNA were sheared using the Covaris S2 system. Exome capture was performed on seven cases and six non-cancer controls using the SureSelect Whole-Exome Enrichment version 2 kit from Agilent. Fragment libraries were prepared from the captured exomes for sequencing on the SOLiD 4 platform (AB). Each library was sequenced on one part of the quad slide, and fragments were sequenced in single reads of 50 bp. Sequence capture and primary analysis were performed by the instrument's ICS and SETS software. SNP genotyping was performed using the TaqMan assay ID C___7530120_20 from Life Technologies, targeting the rs1799950 SNP. Genotyping was performed on DNA from peripheral blood of breast cancer patients or non-cancer controls.

Analysis pipeline
Color-space sequences in .csfasta and .qual files were exported to the LifeScope software, where mapping to the human genome version 19 (hg19) was performed using standard settings. Single nucleotide polymorphisms were identified by the diBayes software incorporated in the LifeScope pipeline. Variant call format (VCF) files were analyzed using the SNP & Variation Suite 7 (SVS7) from Golden Helix, where candidate SNVs were short-listed by retaining only those with more than 10x coverage and an MQV of >=20. Rare variants were identified by filtering out SNVs present in the 1000 Genomes or NHLBI Exome Sequencing Project data. Disease-associated SNVs were determined after filtering out rare SNVs found in the 6 non-cancer control cases from the same ethnic background. Damaging nonsynonymous variations were determined by the SIFT, PolyPhen or MutationTaster tools within the SVS7 suite.
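The short-listing logic described above can be illustrated with a minimal Python sketch. It is not the SVS7 workflow used in this study; the `Variant` record, its field names, the `shortlist` function and the example values are all hypothetical, and the 0.01 frequency cut-off is taken from the MAF threshold quoted in the Results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    # Hypothetical record; the field names are illustrative assumptions.
    chrom: str
    pos: int
    ref: str
    alt: str
    coverage: int                # read depth at the site
    mqv: float                   # mapping quality value
    public_maf: Optional[float]  # MAF in public data, None if absent

    @property
    def key(self):
        # Identity used to match the same variant across samples.
        return (self.chrom, self.pos, self.ref, self.alt)


def shortlist(case_variants, control_variants,
              min_cov=10, min_mqv=20, max_maf=0.01):
    """Keep well-supported, rare variants that no control carries."""
    control_keys = {v.key for v in control_variants}
    kept = []
    for v in case_variants:
        if v.coverage <= min_cov:   # require more than 10x coverage
            continue
        if v.mqv < min_mqv:         # require MQV >= 20
            continue
        if v.public_maf is not None and v.public_maf >= max_maf:
            continue                # common in public databases
        if v.key in control_keys:   # also seen in a non-cancer control
            continue
        kept.append(v)
    return kept


# Hypothetical example data, not taken from the study.
cases = [
    Variant("chr2", 100, "G", "A", 35, 40.0, None),
    Variant("chr13", 200, "C", "T", 10, 40.0, None),
    Variant("chr17", 300, "A", "G", 50, 15.0, None),
    Variant("chr10", 400, "G", "C", 50, 30.0, 0.20),
    Variant("chr8", 500, "G", "C", 22, 30.0, 0.005),
    Variant("chr5", 600, "T", "C", 22, 30.0, 0.005),
]
controls = [Variant("chr8", 500, "G", "C", 30, 30.0, 0.005)]
candidates = shortlist(cases, controls)
```

Applying the control filter last mirrors the order given in the text: quality first, then rarity in public data, then absence from ethnically matched controls.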

Results
Exome sequencing revealed several single nucleotide variants affecting key genes that could be involved in increased susceptibility to breast cancer. The single nucleotide variants or short indels obtained for every sample were filtered against the NHLBI Exome Sequencing Project and the 1000 Genomes Project databases. Novel or rare variants (MAF of <0.01) were filtered against our in-house database of exome sequences from non-cancer patients or healthy individuals. The statistics of each breast cancer exome sequenced are shown in Table 1. The mutational status of BRCA1 and BRCA2 in the sequenced samples was unknown. Therefore, variants affecting those genes were analyzed first. We identified one novel frameshift mutation affecting BRCA2, caused by an -/AC insertion, in one patient only (Table 2). Other BRCA1 or BRCA2 variants identified were previously reported in dbSNP137. However, the nonsense variant represented by SNP rs80358972 is very rare and no information about its MAF could be found. We found this variant in one breast cancer patient only. Other missense single nucleotide variants affecting BRCA1 and BRCA2 were identified. However, when selection was based on rarity and degree of predicted damage to the protein, only SNP rs1799950, found in one patient, remained. In order to determine the frequency of the rs1799950 SNP in our cohort, we performed a TaqMan® SNP genotyping assay on DNA obtained from the peripheral blood of 204 breast cancer patients as well as 120 non-cancer controls. The rs1799950 SNP showed a highly significant deviation from Hardy-Weinberg equilibrium in the patient group (χ² = 133.124) but not in the control group (χ² = 0.108). The GG state of the rs1799950 SNP was significantly associated with breast cancer compared to the AA and AG states combined (p = 0.0003, OR = 22.79, CI = 1.366-380.1). Predisposition to breast cancer is often caused by genetic defects in DNA repair mechanisms. Therefore, SNVs affecting known genes with DNA repair function were examined (Table 3).
In addition, SNVs were identified affecting the APC, EGF and EGFR genes. An interesting mutation, c.148G>A / p.Ala62Thr, was found affecting the PARP1-interacting region of the Cockayne Syndrome group B (ERCC6) gene. Analysis of DNA from the family of the affected female revealed that this mutation segregated in the heterozygous state in one sibling affected with breast cancer as well as in the mother, who also suffered from breast cancer. The father did not harbor this mutation (Figure 1). This SNV was recently reported by the 1000 Genomes Project (rs186839348), where it was found only once in 1094 individuals. We could not detect this SNV in 228 non-cancer control samples from Saudi Arabia.
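The Hardy-Weinberg test applied to rs1799950 above can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. The sketch below is a minimal illustration, not the software used in this study. The function names are our own, and because the per-genotype counts are not listed in the text, the counts in the usage lines are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ag, n_gg):
    """Chi-square statistic comparing observed genotype counts with
    the counts expected under Hardy-Weinberg equilibrium."""
    n = n_aa + n_ag + n_gg
    if n == 0:
        raise ValueError("at least one genotyped sample is required")
    p = (2 * n_aa + n_ag) / (2 * n)   # frequency of the A allele
    q = 1.0 - p                       # frequency of the G allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ag, n_gg)
    # Skip classes with zero expectation (monomorphic sites).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def chi2_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical genotype counts (AA, AG, GG), for illustration only.
stat_balanced = hwe_chi_square(25, 50, 25)   # counts match expectation
stat_skewed = hwe_chi_square(50, 0, 50)      # no heterozygotes at all
p_skewed = chi2_p_value_1df(stat_skewed)
```

An excess of homozygotes relative to the expected counts, as in the second hypothetical example, is what drives the statistic upwards.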

Discussion
Breast cancer incidence is on the rise in the Kingdom of Saudi Arabia, with a remarkable number of those affected being diagnosed before the age of 50 [19]. The early onset of breast cancer in this population could be partly explained by the accumulation of breast cancer predisposition genetic factor(s) due to the high incidence of consanguineous marriages. The effects of these genetic factor(s) are probably becoming more evident now due to the social and life-style changes brought about by the relatively recent positive economic changes in the country. In order to identify such genetic factors, we performed a pilot whole-exome sequencing study on DNA obtained from the peripheral blood of seven cases suffering from hereditary breast cancer. First, the status of the known breast cancer predisposition factors, mainly BRCA1 and BRCA2, was determined. We could not identify recurrent BRCA1/2 mutations in our cohort. However, we identified a novel insertion that led to a frameshift mutation (p.Thr363fs) in BRCA2, causing the synthesis of a truncated and presumably dysfunctional protein. We identified another rare mutation in BRCA2 in one of our patients. Represented by the rs80358972 SNP, the p.Arg2494Stop mutation affecting BRCA2 has been reported to the Breast Cancer Information Core by Myriad Genetics as a direct result of their diagnostic services. Additionally, we identified the relatively rare rs1799950 SNP in BRCA1, a p.Gln356Arg mutation reported by the 1000 Genomes Project to have an MAF of 0.026. We found this SNP in our cohort with an MAF of 0.058 (7 heterozygous individuals among 120 non-cancer controls). The minor allele frequency of the rs1799950 SNP in patients did not differ significantly from that in controls. However, we observed an increase in the number of breast cancer cases displaying the homozygous GG minor allele state, which was not seen in the control cases.
When the GG state is analyzed in comparison to the combined frequency of the AA and AG states, a highly significant association with breast cancer becomes evident. The rs1799950 SNP is one of 25 SNPs in cancer predisposition genes that were identified to confer minor but cumulatively significant risk of breast cancer [21]. However, a later study dismissed the association of the rs1799950 SNP with breast cancer [22].
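The comparison of the GG state against the combined AA and AG states corresponds to a 2x2 table, from which an odds ratio and confidence interval can be derived. The sketch below is one common way of doing this and is an assumption on our part: it uses a Woolf (log-based) interval and adds 0.5 to every cell when any cell is zero (the Haldane-Anscombe correction), which is needed when no control carries GG. The method actually used to obtain the reported values is not stated, and the counts shown are hypothetical.

```python
import math


def odds_ratio_ci(case_gg, case_other, ctrl_gg, ctrl_other, z=1.96):
    """Odds ratio of GG versus AA+AG with a Woolf confidence interval.

    z=1.96 corresponds to an approximate 95% interval (an assumption).
    """
    cells = [case_gg, case_other, ctrl_gg, ctrl_other]
    if any(c < 0 for c in cells):
        raise ValueError("counts must be non-negative")
    if any(c == 0 for c in cells):
        # Haldane-Anscombe correction keeps the ratio finite.
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not the study data:
# 10 GG and 90 AA+AG among cases, 0 GG and 100 AA+AG among controls.
or_zero, lo_zero, hi_zero = odds_ratio_ci(10, 90, 0, 100)
# A table without empty cells needs no correction.
or_plain, lo_plain, hi_plain = odds_ratio_ci(10, 90, 5, 95)
```

A zero cell in the control group yields a very wide interval, which is consistent in kind with the broad interval reported for the GG state.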
Unfortunately, it is difficult to perform direct comparisons between our findings and reported studies due to the differences in sample size and the ethnic makeup of the cohorts analyzed. Whole-exome sequencing revealed several candidate risk factors for breast cancer. We made the assumption that the most likely risk factor is a gene or genes involved in DNA repair, cell cycle or apoptosis [18]. Applying this filter to the SNVs obtained reveals rare polymorphisms that could affect important genes such as WRN, APC, EGF, EGFR and ERCC6. The contribution of these SNVs towards increasing predisposition to breast cancer remains unknown. Therefore, we analyzed the segregation with breast cancer of the SNVs affecting ERCC6 (p.Ala62Thr) and WRN (p.Ala616Pro) in a family with reported breast cancer affecting three generations (case_574). The WRN p.Ala616Pro SNV was detectable in the two siblings diagnosed with breast cancer. However, this SNV could not be found in the mother, who died of breast cancer. In contrast, the ERCC6 p.Ala62Thr SNV segregated with breast cancer in the same family and was not detectable in the father or in control samples. This mutation affects the PARP1-interaction region of ERCC6, also known as Cockayne Syndrome group B (CSB) [23]. ERCC6-dependent activation of the poly(ADP-ribose) polymerases, or PARPs, is an early event in the cellular response to genotoxic stress [24]. Carrying a variant ERCC6 could therefore cause a less efficient DNA repair response and lead to an increased predisposition to breast cancer.

Conclusions
This is the first report on breast cancer predisposition factors in the population of the Kingdom of Saudi Arabia. The high consanguinity and life-style shifts in this population are coupled with early-onset breast cancer, and the SNVs identified in this study could partly explain this phenomenon. We identified a novel BRCA2 mutation and found a case with a very rare nonsense mutation truncating the BRCA2 protein. We demonstrate the potential importance of the homozygous rs1799950 risk allele to breast cancer predisposition in the Saudi population. We suggest that mutations in the ERCC6 gene could be considered potential risk factors for breast cancer. Although no recurrent mutations were identified, this study supports the use of whole-exome sequencing for the determination of the "breast cancer predisposition genome".