Impact of novel SNPs identified in Cynara cardunculus genes on functionality of proteins regulating phenylpropanoid pathway and their association with biological activities

Background Cynara cardunculus L. offers a natural source of phenolic compounds, with chlorogenic acid as the predominant molecule. Chlorogenic acid is gaining interest due to its involvement in various biological properties, such as antibacterial, antifungal, antioxidant, hepatoprotective and anticarcinogenic activities. Results In this work we screened a Cynara cardunculus collection for new allelic variants in key genes involved in the chlorogenic acid biosynthesis pathway. The target genes encode p-coumaroyl ester 3′-hydroxylase (C3′H) and hydroxycinnamoyl-CoA: quinate hydroxycinnamoyl transferase (HQT), both participating in the synthesis of chlorogenic acid. Using high-resolution melting, the C3′H gene proved to be highly conserved, with only 4 haplotypes, whereas 17 haplotypes were identified de novo for HQT. The putative influence of the identified polymorphisms on the C3′H and HQT proteins was further evaluated using bioinformatics tools. We identified some polymorphisms that may lead to protein conformational changes. Chlorogenic acid content and antioxidant and antithrombin activities were also evaluated in Cc leaf extracts, and an association analysis was performed to assess a putative correlation between these traits and the identified polymorphisms. Conclusion In this work we identified allelic variants with putative impact on the C3′H and HQT proteins which are significantly associated with chlorogenic acid content and antioxidant activity. These alleles should be studied further to assess their putative relevance as genetic markers correlating with Cynara cardunculus biological properties, with confirmation by functional analysis. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3534-8) contains supplementary material, which is available to authorized users.


Background
Plants have always been considered an important source of biologically active compounds, mostly associated with health improvement. In the last decade nutritional therapy and phytotherapy appeared as new concepts related to the prevention and treatment of different human diseases [92]. In this context, the extraction and identification of plant-derived bioactive compounds has become an important target in agriculture and food science research. Although Cynara cardunculus L. (Cc) has been known for its therapeutic activities and used in folk medicine since ancient times, renewed interest has arisen through recent studies on its chemical and biological characterization [23,72,73].
Cynara cardunculus is native to the Mediterranean region and well adapted to hot and dry climates [33]. It comprises three botanical varieties: the globe artichoke (var. scolymus L.); the cultivated cardoon (var. altilis DC.); and the wild cardoon (var. sylvestris (Lamk) Fiori) [70], which is considered to be the wild progenitor of the first two varieties [2,45,74]. Globe artichoke is economically important mainly in Italy, Spain, France and Turkey, having been selected for production of edible immature capitula ("heads") used in traditional cuisine [37]. Cultivated Cc has different potential applications, namely the production of solid biofuel [25,31,69], seed oil [18,52], biodiesel [21,22], paper pulp [1,33] and green forage [24]. In Portugal, cultivated and wild cardoon are the most used varieties. Cardoon has been selected for its fleshy stems and leaf petioles, which are consumed as vegetables, while flowers are used as a source of aspartic proteases for milk clotting during cheese production [25]. However, Cc also has potential for industrial use as a source of bioactive phenolic compounds [72]. In artichoke, leaf extracts have been reported as having relevant biological properties, such as antibacterial [93], antifungal [94], antioxidant [42,72,88], hepatoprotective [3] and anticarcinogenic [57,58,65] activities and inhibition of cholesterol biosynthesis [5,30,43]. In previous experiments we found that cardoon phenolic extracts prepared from different organs showed a high antioxidant and reduction capacity as well as a strong antitumorigenic effect in a human breast cancer cell line [86]. Many of the bioactivities beneficial for human health have been attributed to the presence of phenols, particularly caffeic acid derivatives such as 5-caffeoylquinic acid (chlorogenic acid, CGA) and di-caffeoylquinic acids, and flavonoids [3,48,58,72,86,88,93]. These findings have encouraged the use of breeding programs aimed at increasing their levels in crops.
Artichoke and cardoon have been domesticated since Roman times, and artichoke breeding programs have been established since the 20th century to improve quality and crop productivity [19,26,50,59,79,84]. However, despite the economic and pharmacological value of C. cardunculus, its improvement through breeding is still limited. Increased knowledge of the genes involved in the biosynthesis of selected compounds could support precision breeding focused on enriching their levels.
Therefore, the objective of this work was to screen a natural population of C. cardunculus plants for single nucleotide mutations in selected genes of the phenylpropanoid pathway to characterize their genetic variability. Sequences of C3′H and HQT genes were selected and the high-resolution melting technique (HRM) was applied to identify single nucleotide polymorphisms (SNPs). Moreover, the CGA content, the antioxidant and the antithrombin activities were also evaluated in an attempt to correlate chemical and biological variations with the different Cc haplotypes identified.

Plant materials
A collection of 29 accessions comprising 127 individuals of Cynara cardunculus from different sources was established at CEBAL. This material included 25 accessions of wild cardoon, 3 of cultivated cardoon and 1 of artichoke. The material was obtained from Portugal (2 accessions), Spain (4 accessions), Italy (18 accessions from 4 regions) and England, Moldavia, Hungary, Algeria and Norway, with 1 accession each (see Additional file 1 for details). Seeds were washed in distilled water and 7-8 seeds were placed on wetted paper in Petri dishes. The plates were capped and stored at 4°C for 4 days to break seed dormancy. The plates with seeds were then transferred to a growth chamber with a 16/8 h photoperiod, a temperature of 22°C and 70% humidity to germinate. Five days after germination the seedlings were transplanted to pots with soil (90%) and sand (10%) and maintained at room temperature with regular watering. After 90 days, young leaves were collected and immediately used for DNA isolation.
Four months after germination, plants were moved to larger pots with the same proportions of soil and sand and then transplanted to the field. For the phenotypic trait analysis, 20 individuals were selected as representative of the Cc haplotype diversity: 1 individual each from Portugal, Napoli, Hungary, Norway, Moldavia and England, 3 individuals each from South Italy and Rome, 2 individuals from North Italy and 6 from Spain (1, 3 and 16 individuals, respectively, of the artichoke, cultivated and wild varieties). Nine-month-old leaves were collected and preserved at −80°C until phenotypic analysis.

High-resolution melting (HRM) analysis
DNA was extracted from 100 mg of young leaf tissue using the DNeasy Plant Mini kit (Qiagen, Germany) and quantified on a 1% agarose gel. High-resolution melting-specific primers for the C3′H and HQT coding sequences (CDS) [GenBank: FJ225121.1 and DQ915590.1, respectively] were designed with Primer 3 software (http://bioinfo.ut.ee/primer3-0.4.0/) to amplify overlapping segments of the coding sequences (Additional files 2 and 3). The presence of C3′H and HQT gene polymorphisms associated with natural variation was assessed by HRM, performed as described by Han et al. [35]. PCR amplifications were performed in 96-well plates, in a total volume of 10 μL containing 0.5 ng of genomic DNA, 1 μM of each forward and reverse primer, 1× HotShot Diamond™ PCR Mastermix (Clent Life Science, UK) and 1 μL of LCGreen® Plus + Melting Dye (Idaho Technologies, Salt Lake, UT, USA). The amplification reaction was performed in a Mastercycler® pro thermocycler (Eppendorf, Hamburg, Germany) as follows: 94°C for 10 min; 45 cycles of 94°C for 30 s and 62°C for 30 s; and a final cycle of 94°C for 30 s and 26°C for 30 s. Melting profiles were acquired in a LightScanner® (Idaho Technology, Salt Lake, UT, USA) by increasing the temperature from 65 to 95°C at 0.5°C s⁻¹. Melting data were analysed using the LightScanner Software 2.0 (Idaho Technologies). After data calibration and normalization, samples were grouped based on the shape of the normalized melting curves (profile), and the sensitivity level was adjusted to distinguish all genotype groups.

Haplotype analysis and sequence validation
From each HRM profile, three DNA samples were amplified twice and sequenced to confirm the results obtained. PCR amplification products were purified and sequenced by Beckman Coulter Genomics (Takeley, UK) on an ABI 3730xl sequencing platform. The sequences obtained were assembled and single nucleotide polymorphisms (SNPs) were identified in the different individuals. SNPs that resulted in an amino acid (a.a.) change allowed the identification of specific haplotypes. Based on previous background information on cultivated cardoon from Beja, Portugal, one individual (AC2) of this accession was chosen as reference.

Prediction of protein structural changes
The ExPASy Translate tool (http://web.expasy.org/translate/) was used to translate the nucleic acid sequences into the corresponding peptide sequences and to identify a.a. changes, while BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) was used to find similarities between sequences from other species and C. cardunculus. Sequence alignments were performed using ClustalW2 (http://www.ebi.ac.uk/Tools/msa/clustalw2/).
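The translation step described above can be illustrated with a minimal Python sketch. It is not the ExPASy tool itself: the function names `translate` and `classify_snp` and the nine-base toy sequence are hypothetical, and the only biological input assumed is the standard genetic code. It shows how a coding-sequence position maps to a residue number (for example, nucleotide 653 falls in codon 218) and how a SNP is labelled synonymous or nonsynonymous.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence (DNA, reading frame 1) into one-letter residues."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def classify_snp(ref_cds, pos, alt_base):
    """Classify a SNP at 1-based CDS position `pos`.

    Returns (class, label), where label follows the usual
    <reference residue><residue number><variant residue> notation.
    """
    ref_cds = ref_cds.upper()
    idx = pos - 1
    alt_cds = ref_cds[:idx] + alt_base.upper() + ref_cds[idx + 1:]
    start = (idx // 3) * 3            # first base of the affected codon
    residue = idx // 3 + 1            # 1-based residue number
    ref_aa = CODON_TABLE[ref_cds[start:start + 3]]
    alt_aa = CODON_TABLE[alt_cds[start:start + 3]]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, f"{ref_aa}{residue}{alt_aa}"


# Hypothetical toy CDS (Met-Ser-Glu), not a Cynara sequence.
toy_cds = "ATGTCTGAA"
print(translate(toy_cds))
print(classify_snp(toy_cds, 5, "T"))  # C->T at the second codon base: Ser to Phe
print(classify_snp(toy_cds, 6, "C"))  # third-base change that keeps Ser
```

A C-to-T change at the second base of a serine codon such as TCT gives TTT (phenylalanine), the same kind of change as the S218F variant discussed for HQT.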

Cc leaf phenolic-derived extractions
A Soxhlet extraction was performed on leaves of the 20 individuals representative of haplotype diversity (indicated in grey in Additional file 4a and b) according to Ramos et al. [72]. Before extraction, the samples were freeze-dried and ground to a granulometry of 40-60 mesh. Each sample (1.5-4 g of dry weight) was Soxhlet extracted with dichloromethane (EMPLURA) for 7 h to remove the lipophilic fraction. Two grams of the dry leftover solid residue were further extracted with 200 mL of methanol/water/acetic acid (49.5:49.5:1) under constant stirring, protected from light, for 24 h at room temperature (RT). The liquid extract was then filtered; methanol and acetic acid were removed by low-pressure evaporation (37-40°C) and water by freeze-drying. The extraction yield was determined as the percentage of dry biomass material obtained. The extracts were kept at RT protected from light and processed for the different analyses described below. One extract was prepared from each individual plant.

Qualitative composition analysis of Cc leaf phenolic-derived extracts by HT-UHPLC-MSⁿ
An HT-UHPLC-MSⁿ analysis was performed following Ramos et al. [72]. The HPLC system was coupled to an LCQ Fleet ion trap mass spectrometer (ThermoFinnigan, San Jose, CA, USA) equipped with an electrospray ionization (ESI) source. The ESI-MS was operated in the negative ionization mode with a spray voltage of 5 kV and a capillary temperature of 360°C. The flow rates of nitrogen sheath and auxiliary gas were 50 and 10 (arbitrary units), respectively. The capillary and tube lens voltages were set at −28 and −115 V, respectively. CID-MSⁿ experiments were performed on mass-selected precursor ions in the range of m/z 100-2000. The isolation width of precursor ions was 1.0 mass units. The scan time was 100 ms and the collision energy was 35%, using helium as collision gas. Phenolic-derived extracts were diluted in methanol/water (50:50) (12 mg/mL) and filtered through a cellulose acetate filter with 0.22 μm pore size (Millipore, USA) prior to injection. Data acquisition was carried out using the Xcalibur® data system (ThermoFinnigan, San Jose, CA, USA).

Antioxidant activity
Antioxidant activity was determined by the 2,2-diphenyl-1-picrylhydrazyl (DPPH) assay according to Sánchez-Moreno et al. [75]. To determine the antioxidant activity, Cc leaf phenolic-derived extracts were diluted in Milli-Q water (25 mg/mL) and filtered through a cellulose acetate filter with 0.22 μm pore size. An aliquot of 150 μL of DPPH (100 μM) (Sigma, USA) was added to 17 μL of Cc leaf phenolic-derived extract at different concentrations (0.10-12 mg/mL diluted in ethanol, in duplicate) in a 96-well plate. The absorbance at 520 nm was measured for 60 min against 167 μL of DPPH and 167 μL of ethanol in a Multiskan FC microplate reader (Thermo Scientific, USA). The percentage of DPPH inhibition was calculated for each extract concentration, and IC50 was calculated by plotting the percentage of DPPH inhibited against the logarithm of the corresponding extract concentration. IC50 was also calculated for standard antioxidant compounds: butylated hydroxyanisole (BHA), ascorbic acid and CGA. The antioxidant activity index (AAI) was calculated according to Scherer & Godoy [76]: AAI = final concentration of DPPH (μg/mL)/IC50 (μg/mL), using a final DPPH concentration of 35.45 μg/mL.
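The calculations in this assay can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the inhibition expression is the commonly used (control − sample)/control form, which is assumed here because the formula is not reproduced in the text; IC50 is obtained by linear interpolation on log10(concentration) between the two points that bracket 50%, a simplification of the plot-based estimate; and all function names and absorbance values are hypothetical. Only the AAI definition and the 35.45 μg/mL DPPH concentration come from the text.

```python
import math


def percent_inhibition(a_control, a_sample):
    """Assumed standard form: percentage drop in DPPH absorbance versus the control."""
    return 100.0 * (a_control - a_sample) / a_control


def ic50_log_interpolation(concs, inhibitions):
    """Estimate IC50 by interpolating on log10(concentration).

    Concentrations must be > 0 (a zero-dose point cannot be log-transformed).
    """
    points = sorted(zip(concs, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi:
            if i_hi == i_lo:
                return c_lo
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("50% inhibition is not bracketed by the data")


def antioxidant_activity_index(ic50_ug_ml, dpph_final_ug_ml=35.45):
    """AAI = final DPPH concentration / IC50, both in ug/mL (Scherer & Godoy)."""
    return dpph_final_ug_ml / ic50_ug_ml


# Hypothetical dose-response points (ug/mL, % inhibition), for illustration only.
concs = [10.0, 100.0, 1000.0]
inhib = [25.0, 75.0, 95.0]
print(ic50_log_interpolation(concs, inhib))
print(round(antioxidant_activity_index(44.48), 2))
```

With the lowest IC50 reported in the Results (44.48 μg/mL), the AAI expression gives a value that rounds to 0.80, matching the figure quoted there for the same extract.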

Antithrombin activity
Antithrombin activity was measured according to Chistokhodo et al. [15]. Extracts were diluted in Milli-Q water (25 mg/mL) and filtered through a sterile cellulose acetate filter with 0.22 μm pore size. In a 96-well plate, 50 μL of extract (or Milli-Q water as control) and 25 μL of the thrombin solution (500 units) (Sigma, USA) were placed in each well. The plate was incubated at 37°C and, after 5 min, 50 μL of 2 mM thrombin generation chromogenic substrate (β-Ala-Gly-Arg p-nitroanilide diacetate) (Sigma, USA) was added to each well. The absorbance at 405 nm was measured for 5 min in a Multiskan FC microplate reader (Thermo Scientific, USA), and the percent activity was calculated from these readings. The antithrombin activity was measured in duplicate with Cc leaf phenolic-derived extracts at 1 mg extract/mL. The pure CGA standard was also tested between 4 and 300 μg/mL.
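As an illustration of how a percent inhibition can be derived from kinetic absorbance readings of this kind, the sketch below compares the rate of p-nitroanilide release (slope of A405 over time) in an extract well with that in the water control. This is an assumption, not the formula used in the study, which is not reproduced in the text; the function names and absorbance values are hypothetical.

```python
def slope(times, absorbances):
    """Least-squares slope of absorbance versus time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_a = sum(absorbances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times, absorbances))
    return sxy / sxx


def thrombin_inhibition_percent(times, a_control, a_sample):
    """Assumed definition: 100 * (1 - sample rate / control rate)."""
    return 100.0 * (1.0 - slope(times, a_sample) / slope(times, a_control))


# Hypothetical readings at 405 nm, one per minute over 5 min.
minutes = [0, 1, 2, 3, 4, 5]
control = [0.10 * t for t in minutes]          # water control
extract = [0.05 + 0.06 * t for t in minutes]   # well with extract
print(thrombin_inhibition_percent(minutes, control, extract))
```

Using slopes rather than end-point absorbances makes the estimate insensitive to a constant background absorbance contributed by a coloured extract.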

Statistical analysis
All measured parameters were analysed using the PROC GLM (general linear model procedure) option of SAS (Statistical Analysis System, SAS Institute Inc., Cary, NC). Where differences existed, their source was identified at a significance level of P < 0.05 by an all-pairwise multiple comparison procedure, using Tukey's test.

SNPs-phenotype and phenotype-phenotype association evaluation
For the genotype-phenotype association analysis, the C3′H and HQT coding sequences, with 3 and 8 SNPs respectively, of the 20 Cc plants representative of the haplotype diversity were used (Additional file 4 a and b). Association analysis between the CGA content, biological activities and the SNPs present in the target genes was performed using TASSEL 5.0 software (http://tassel.bitbucket.org) [9] with the standard GLM (general linear model). Polymorphic sites carrying rare alleles (frequencies <2%) were discarded to avoid biased associations. Associations were considered significant at a p-value ≤ 0.05.
To assess phenotype-phenotype (trait) relationships, a Pearson correlation between CGA content and biological activities was computed using SAS (SAS Institute, Inc., Cary, NC, USA).
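The rare-allele filter can be illustrated with a short Python sketch. It is not TASSEL code: the function names are hypothetical, genotypes are assumed to be diploid, biallelic calls written as two-letter strings, and the toy genotype lists are invented for illustration. Only the 2% threshold and the sample size of 20 plants come from the text.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid calls such as 'CC', 'CT' or 'TT'."""
    counts = Counter(allele for call in genotypes for allele in call)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def passes_maf(genotypes, threshold=0.02):
    """Keep a site only if it is polymorphic and its rarest allele reaches the threshold.

    Assumes biallelic sites, so the rarest allele is the minor allele.
    """
    freqs = allele_frequencies(genotypes)
    if len(freqs) < 2:
        return False  # monomorphic site carries no association signal
    return min(freqs.values()) >= threshold


# Hypothetical sites scored in 20 diploid plants (40 alleles each).
site_one_het = ["CC"] * 19 + ["CT"]   # minor allele frequency 1/40 = 0.025
site_monomorphic = ["CC"] * 20
print(passes_maf(site_one_het), passes_maf(site_monomorphic))
```

With 20 diploid individuals a single copy of an allele already corresponds to a frequency of 2.5%, so under these assumptions the 2% cut-off removes only monomorphic sites; the filter becomes restrictive in larger panels.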
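For readers who want to reproduce this kind of trait-trait correlation outside SAS, a minimal pure-Python sketch of the Pearson coefficient and its t statistic follows. The function names and the toy values are hypothetical; the test statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom is the standard one for a Pearson correlation.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical trait values (e.g. CGA content vs. IC50), for illustration only.
cga = [0.2, 0.4, 0.6, 0.9, 1.1]
ic50 = [900.0, 600.0, 450.0, 120.0, 60.0]
print(pearson_r(cga, ic50))
print(t_statistic(-0.6, 20))
```

For 20 individuals, a coefficient of −0.6 corresponds to a t statistic of about −3.18 on 18 degrees of freedom.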

Results
Haplotypes identified for C3′H and HQT genes in C. cardunculus from different origins
The C3′H and HQT genes were screened for point mutations in 127 Cc individuals from different origins (Additional file 1). HRM analysis allowed identification of different melting curves that, after sequencing, revealed 12 and 34 SNPs in the C3′H and HQT coding sequences, respectively (Additional file 2). Single, double and multiple SNPs were identified in the amplicons covering the coding sequences of both genes. Of these SNPs, we focused only on the nonsynonymous ones, namely 3 in C3′H and 8 in HQT (Table 1), which defined, respectively, 4 and 17 groups of individuals with identical sequences, named haplotypes (phased genotypes obtained with the SNiPlay pipeline) (Table 2). All the haplotypes and SNPs identified are described, respectively, in Additional file 4(a, b) and Additional file 5. Figure 1 shows examples of SNPs identified in C3′H and HQT amplicons by HRM profiles and confirmed by sequencing.
A network of the different haplogroups was constructed based on all the SNPs found in the coding regions of both genes to infer relationships between haplotypes (Fig. 2a and b). Haplotype A of the C3′H gene is the dominant haplotype, while haplotype B is represented by only a single individual. Haplotypes A and D of this gene present higher diversity within all the origins represented. Haplotype C of the HQT gene is the dominant one among the analysed populations. Of the 8 studied regions, haplotypes C, O and G of this gene were always present in 5 regions: Algeria, Italy, Spain, England and Portugal or Moldavia. All individuals from Norway were homozygous and fall within haplotype I of the HQT gene. The four individuals from Hungary (heterozygous) comprise the two haplotypes J and L, while the five individuals of the studied Portuguese cultivated variety were heterozygous and comprise haplotypes A and B of the HQT gene.

Changes at the protein level
We focused on the SNPs present in coding regions and therefore we decided to further examine changes in amino acid residues assessing their putative influence on protein structure as compared to the reference individual/allele AC2. AC2 was heterozygous for both C3′H and HQT genes.

Putative C3′H protein changes
Bioinformatic tools such as InterPro and PredictProtein were used to analyse the position of these SNPs in the C3′H protein, allowing the identification of putative structural/functional protein domains. This analysis revealed that C3′H is a member of the cytochrome P450 family, an oxidoreductase acting as a monooxygenase (InterPro). The M198L alteration (Table 1), which occurs in haplotypes B and D relative to reference haplotype A, lies in the transition zone between a random coil and an α-helix conformation. Furthermore, NetSurfP 1.1 predicted that leucine or methionine at position 198 is buried between two adjacent solvent-exposed residues. Since both amino acids are classified as nonpolar, our hypothesis is that the straight side chain of methionine, with an S-methyl thioether at the γ-carbon, could alter the hydrophobicity, causing protein destabilization as indicated by Lipscomb et al. [47]. The V352I (valine to isoleucine) alteration occurs only in haplotype B and could have a lower impact on the protein secondary structure, as both residues are very similar and the SNP occurs in a region predicted to be mostly a buried α-helical zone. Moreover, neither residue was susceptible to post-translational modifications. Haplotype C is unique in carrying the S194L mutation (serine to leucine). According to the InterPro and PredictProtein tools, this alteration occurs in a predicted solvent-exposed random coil zone of the protein, as well as in a protein-binding region (S194 and E195). The putative impact of SNP changes on the predicted protein secondary structure was also evaluated at the level of post-translational modifications such as phosphorylation, ubiquitination, SUMOylation and methylation. The serine-to-leucine substitution at residue 194 could imply loss of a putatively phosphorylated residue.
This change from a polar to a hydrophobic residue could lead to protein conformational changes and functional modification. Compared to reference haplotype A, haplotype B of C3′H (Fig. 3) presents two SNPs, identified as M198L and V352I, while haplotypes C and D carry only the S194L and M198L mutations, respectively.

Putative HQT protein changes
The HQT protein from Cc is predicted to accumulate in the cytoplasm (Predotar tool). The alignment of HQT sequences belonging to the plant hydroxycinnamoyl transferase family from Ipomoea batatas (BAA87043), Nicotiana tabacum (CAE46932), Solanum lycopersicum (CAE46933) and Theobroma cacao (XP_007023475) revealed identities between 70 and 80% and indicated the existence of two structural motifs, HXXXD and DFGWG, highly conserved in the BAHD acyltransferase family. The HXXXD motif lies in the central portion of the protein, and the histidine (H153) residue is necessary for full enzymatic activity [81]. The DFGWG motif, located near the protein C-terminus, is important for catalytic activity or binding to coenzyme A [6,82]. A schematic representation of the HQT amino acid sequence is shown in Fig. 3. The reference individual AC2 is heterozygous at nucleotide position 653 and, according to the Phase and Gevalt algorithms, could present the two haplotypes A and B (Additional file 4b), which encode, respectively, a serine (S) or a phenylalanine (F). Both haplotypes occur at the same frequency (2%), but in the studied population the most prevalent allele at nucleotide position 653 is cytosine (frequency of 56.3%), as in haplotype A, which was chosen as the reference haplotype. This S218F alteration, relative to haplotype A, occurs in haplotypes B, C, M and Q in a predicted solvent-exposed random coil zone of the protein. Serine is a polar residue while phenylalanine is hydrophobic, and this difference could have an effect on protein activity. Additionally, the high phosphorylation probability of S218 is no longer expected in F218, with a putative impact on protein function. The T167A modification (threonine to alanine) occurs in haplotypes E, F, G, H, I, L, O, P and Q in a predicted buried α-helix region of the protein.
The alteration from a polar threonine to a hydrophobic alanine, along with the loss of a predicted phosphorylation site (T167), could affect HQT protein conformation and stability in these haplotypes. The I183V alteration (isoleucine to valine) is present in the same haplotypes as the T167A mutation and occurs in a buried random coil zone of the protein. Given the chemical and structural similarity of both residues, this change may have a low impact on the overall protein structure/function. The A196T alteration (alanine to threonine), which occurs only in haplotype F, could have a high impact on HQT function. This alteration occurs near a sulphate-binding region (E199), and the change from a hydrophobic to a polar residue in the solvent-exposed zone could therefore affect protein structure and increase the chances of threonine phosphorylation. Haplotype F is composed of 50% accessions from Italy and 50% from Spain. The A228T change present in haplotypes D, E, F, G, H, J, K, L and P leads to an alteration from a hydrophobic alanine to a polar threonine in a predicted buried β-strand zone of the protein that could affect its conformation and stability. The alteration from lysine to glutamate (K268E), observed in 11 haplotypes (Additional file 4b), is predicted to be solvent exposed and occurs between two buried adjacent residues in an α-helix zone of the protein. Although glutamate (E) residues can be methylated [80], bioinformatics prediction tools are only available for lysine (K) and arginine (R) residues. Lysine (K268) was not predicted to be a target for PTM. Taking this information into account, plus the fact that both K and E residues are predicted to be solvent exposed, a change from a positive (K) to a negative (E) local charge is likely to affect protein function. The D322N modification (aspartate to asparagine) observed in haplotypes D and P also occurs in a solvent-exposed site between two buried adjacent residues in an α-helix zone of the protein.
Although these residues are not susceptible to phosphorylation or ubiquitination, the N322 modification in haplotypes D and P reduced the chances for PTM at residues S319, T323 and K333. If these are indeed sites for HQT regulation by phosphorylation and ubiquitination, the D to N alteration reduces the PTM probability, with a putative impact on protein function. The serine to isoleucine (S329I) modification in haplotypes E, L and M of HQT occurs in a predicted exposed α-helix zone, which could account for a modified polarity. The change of a serine residue could also account for a loss of PTM, with a putative implication for protein function.

Extraction yield of Cc leaf phenolic extracts
The extraction yields determined for the 20 methanol/water/acetic acid (49.5:49.5:1) extracts representative of Cc haplotype diversity are presented in Table 3. The extraction yields of the Cc leaves varied between 6.0% (AC2) and 47.0% (K3 and M1) (7.8-fold) (Table 2). With the exception of the AC2 (6.0%), D4 (6.5%) and A4 (19.3%) individuals, the extraction yields were slightly higher than those previously reported in the literature for methanol extracts of cardoon leaves (34.72%) [23] and for methanol/water/acetic acid extracts of cultivated cardoon leaves (28.0%) [72], the same ecotype as the AC2 reference individual. Since the Cc plants used in this study were maintained, stored and handled under identical conditions, genetic differences could explain the differences in extraction yield, especially the lower yields obtained for the AC2 and D4 individuals. Yield differences between this study and others in the literature could be associated not only with the extraction procedures, but also with differences in geographic location, life cycle stage, age and collection year.

Major phenolic compounds identified in Cc leaf phenolic extracts
Phenolic compounds were identified in the methanol/water/acetic acid extracts of Cc leaves by HT-UHPLC-UV-MSⁿ analysis. Table 4 shows the retention time, the UV absorption maxima and the [M-H]⁻ ion of the main phenolic compounds identified in the Cc leaf phenolic-derived extracts. These compounds were identified by comparing these data with those obtained by Ramos et al. [72]. Two hydroxycinnamic acids, CGA and 1,5-di-O-caffeoylquinic acid, were identified in the Cc extracts based on the UV spectra and the detection of [M-H]⁻ ions at m/z 353 and 515, respectively (Table 4) [72]. Two flavones, luteolin 7-O-glucoside and a luteolin acetyl-hexoside isomer, were identified according to the UV spectra and the detection of [M-H]⁻ ions at m/z 447 and 489, respectively [72]. These phenolic compounds were found in all the methanol/water/acetic acid extracts of Cc leaves, from individual A1 to individual AC2 (Table 4).

Chlorogenic acid content, antioxidant and antithrombin activities
The CGA content, as well as antioxidant and antithrombin activities were determined in the 20 phenolic-derived extracts of Cc leaves and the results are presented in Table 3.
Regarding the CGA content (Table 3), Cc leaf extracts prepared from the J5 individual showed the highest CGA abundance, accounting for 1.19% (w/w extract) (P < 0.05). The CGA content of this extract was 8.5-fold higher than the lowest one, found in the Y1 individual, with 0.11% (w/w). CGA levels of the analysed Cc leaf phenolic extracts were slightly higher than those described earlier for cultivated cardoon leaf extracts (0.01% w/w) [72], including that for the same ecotype as the AC2 reference individual (0.35%). This difference could be related not only to a different life cycle stage and collection year but also to the age of the plant material analysed since, in this work, the CGA level was quantified in 9-month-old plants, while Ramos et al. [72] determined it in adult plants (>5 years).
Phenolic compounds present in medicinal plants are well known for their antioxidant properties [91]. Scavenging activities of antioxidant agents are thought to be highly important in preventing damaging actions of free radicals involved in cardiovascular diseases and cancer [49]. In the present work, the antioxidant activity of Cc leaf phenolic-derived extracts was assessed through their scavenging capacity against DPPH free radicals. The inhibitory concentrations of Cc extracts and standard antioxidants needed to decrease the initial concentration of DPPH by 50% (IC50) were thus determined (Table 3). The lower the IC50, the higher the antioxidant power. IC50 values for the DPPH scavenging capacity of Cc extracts ranged from 44.48 μg/mL to 1417.64 μg/mL in the Q5 and Y1 individuals, respectively, a 31.9-fold difference (P < 0.05). Nevertheless, Cc leaf phenolic-derived extracts were less active than the tested standard antioxidants, namely BHA (IC50 = 7.27 ± 0.13 μg/mL) and ascorbic acid (IC50 = 2.38 ± 0.09 μg/mL). The IC50 values of Cc extracts were also higher than that of CGA (IC50 = 5.24 ± 0.35 μg/mL). Comparing values in terms of molarity, the IC50 of CGA (0.014 ± 0.000 mM) is similar to that of the standard antioxidant ascorbic acid (0.013 ± 0.000 mM) and lower than that of BHA (0.040 ± 0.000 mM), showing an advantageous antioxidant potential for the CGA present in Cc extracts. In order to compare the antioxidant efficiency of these extracts with data reported in the literature, the antioxidant activity index (AAI) was calculated taking the final concentration of DPPH into account (35.45 μg/mL). The AAI values of Cc leaf phenolic-derived extracts from the Q5 and J5 individuals (0.80 and 0.75, respectively) are in the same range as those obtained in previous studies using cardoon leaf extracts [23] and higher than those obtained for cultivated cardoon leaf extracts (0.14) [72].
Even for the reference individual AC2, the same ecotype previously studied by our research team [72], a higher AAI was observed in the present work (0.23), which could be related to the higher CGA levels observed herein and explained above. Given the ability of CGA to scavenge DPPH free radicals, fractionation methodologies should be developed for hydroxycinnamic acids, or other phenolic compounds from Cc (especially for the Q5 and J5 individuals), for potential use as alternative antioxidants in the food industry. Previous experiments from our research team showed that nanofiltration concentrated the total phenolic compounds, as well as chlorogenic acid, improving the antioxidant activity of C. cardunculus leaf extracts [10].
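The mass-to-molar comparison of the standards can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, the IC50 values are the mean values quoted in the text, and the molar masses (chlorogenic acid C16H18O9, 354.31 g/mol; ascorbic acid C6H8O6, 176.12 g/mol; BHA C11H16O2, 180.24 g/mol) are standard values that are not stated in the article.

```python
# Molar masses in g/mol (standard values, not taken from the article).
MOLAR_MASS = {"CGA": 354.31, "ascorbic acid": 176.12, "BHA": 180.24}

# Mean IC50 values in ug/mL, as quoted in the text.
IC50_UG_ML = {"CGA": 5.24, "ascorbic acid": 2.38, "BHA": 7.27}


def ug_per_ml_to_micromolar(conc_ug_ml, molar_mass):
    """Convert ug/mL to umol/L.

    ug/mL equals mg/L; dividing by g/mol gives mmol/L, and x1000 gives umol/L.
    """
    return conc_ug_ml / molar_mass * 1000.0


IC50_UM = {
    name: ug_per_ml_to_micromolar(IC50_UG_ML[name], MOLAR_MASS[name])
    for name in IC50_UG_ML
}
for name, value in IC50_UM.items():
    print(f"{name}: {value:.1f} umol/L")
```

Under these assumptions the three IC50 values come out at roughly 15, 14 and 40 μmol/L for CGA, ascorbic acid and BHA, which preserves the ranking discussed in the text: CGA and ascorbic acid are similar on a molar basis, and both are more potent than BHA.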
Thrombin is an important enzyme in the blood coagulation process, being responsible for the conversion of soluble fibrinogen into stabilized insoluble fibrin. Its inhibition is important against many blood coagulation and platelet disorders [15]. Considering that 50% of the patients with tumours or cardiac irregularities such as atrial fibrillation undergo thrombosis and 95% show clotting activation [54], the effective inhibition of thrombus development can be reached by using selective thrombin inhibitors with no effect on any other coagulant enzyme [15]. Within this context, there is currently a quest for the identification of novel thrombin inhibitors. Thus the Cc leaf phenolic extracts from all 20 individuals (at 1 mg/mL) were tested for their thrombin inhibitory potential (Table 3). The inhibition levels detected varied from 14.95% in the Y1 individual to 42.85% in the B3 individual (2.9-fold). The antithrombin effect of B3 was not statistically different from those of the C1 and A4 Cc extracts (41.18 and 41.16%, respectively) (P > 0.05). Results from other studies using methanol extracts (at the same extract concentration) of Melia azedarach, Cyperus globulosus, Cinnamomum camphora, Ambrosia artemisiifolia, Paspalum dilatatum and Woodwardia radicans revealed similar percentages of thrombin inhibition (≈40%). However, higher percentages of antithrombin activity (≈80-90%) were also described, namely with Callistemon lanceolatus, Lagerstroemia indica, Myrica cerifera, Polytrichum commune and Colocasia esculenta [15,54]. Interestingly, the CGA standard (at 300 μg/mL) also showed thrombin inhibition activity (59.2%), in contrast to what was previously described by Bijak et al. [7], who reported that CGA had no significant inhibitory effect on thrombin activity at ≈354 μg/mL.

Association and correlation analysis
For the SNP-phenotype association analysis we used CGA content, antioxidant and antithrombin activities (Table 3) of the 20 individuals representative of all the allelic variants found in the C3′H and HQT coding sequences (Additional file 4a, b). The association analysis in the TASSEL software identified 3 significant associations between SNPs and the studied traits, using the standard general linear model (GLM), a minor allele frequency (MAF) threshold of > 2% and a p-value ≤ 0.05 (Table 5). The SNPs present in the C3′H gene did not reveal any association with traits. Two SNPs from the HQT gene (S218F and D322N) showed significant associations with CGA content and antioxidant activity.
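The filtering and testing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TASSEL implementation: genotypes are assumed to be coded as the number of copies of the alternative allele (0, 1 or 2) per diploid individual, the test is a plain single-marker linear regression, and the SNP names and trait values are hypothetical. Converting the F statistic into a p-value is left to a statistics package.

```python
from statistics import mean


def minor_allele_frequency(dosages):
    """Frequency of the less common allele.

    dosages: copies of the alternative allele (0, 1 or 2) per diploid individual.
    """
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1 - alt_freq)


def single_marker_f(dosages, trait):
    """F statistic (1 and n-2 degrees of freedom) for trait regressed on dosage."""
    n = len(dosages)
    mx, my = mean(dosages), mean(trait)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, trait))
    syy = sum((y - my) ** 2 for y in trait)
    ss_regression = sxy ** 2 / sxx
    ss_residual = syy - ss_regression
    return ss_regression / (ss_residual / (n - 2))


def scan(snps, trait, maf_min=0.02):
    """Test every SNP whose MAF exceeds maf_min (2% in the text)."""
    return {
        name: single_marker_f(dosages, trait)
        for name, dosages in snps.items()
        if minor_allele_frequency(dosages) > maf_min
    }


# Hypothetical data: six individuals, one polymorphic and one monomorphic SNP.
snps = {"snp_a": [0, 0, 0, 1, 1, 2], "snp_mono": [0, 0, 0, 0, 0, 0]}
trait = [1.0, 1.2, 0.9, 1.8, 2.1, 3.2]
results = scan(snps, trait)  # the monomorphic SNP is dropped by the MAF filter
```

Filtering on MAF before testing avoids fitting a model to a marker that barely varies in the sample, which matters particularly with only 20 individuals.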
To assess the relationship between traits, a Pearson correlation between the results for CGA content, antioxidant and antithrombin activities was further performed. This analysis revealed a significant inverse correlation between CGA content and IC50 (P = 0.005 and r = −0.6). Antithrombin activity, however, was not related to any SNP or trait.
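As a rough check on how a correlation coefficient relates to its significance, the sketch below computes Pearson's r and the corresponding t statistic. It assumes that the correlation was computed over the 20 individuals (so 18 degrees of freedom), which the text does not state explicitly; the function names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic with n-2 degrees of freedom for testing r against zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Reported r = -0.6 with an assumed n = 20 gives t of about -3.18 (18 df).
t = t_statistic(-0.6, 20)
```

With 18 degrees of freedom, a t value of this size lies close to the two-sided 0.005 critical value, which is in line with the reported P = 0.005 under the assumed sample size.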
A collection of 29 accessions comprising 127 individuals of Cynara cardunculus was used for HRM analysis. Considering the genetic variability of Cc, a diploid cross-pollinated species [55], 5 individuals from the same accession were selected whenever possible. Most accessions analysed were from wild cardoon (25 accessions, 109 individuals). In addition, accessions of cultivated cardoon and artichoke were also analysed. In terms of germplasm origin, Italy was the best-represented country, with accessions available from different regions.
Considering the advantages of the HRM technique, such as the possibility of evaluating a large number of samples (avoiding sequencing all of them), its simplicity, robustness and speed [38,41], this method proved to be, in our work, quite efficient in discriminating DNA with only one base difference and in distinguishing between heterozygous and homozygous SNPs in the amplicons of the C3′H and HQT genes. Variability was observed through the heterozygosity level obtained. HRM allowed the identification of 46 SNPs (confirmed by sequencing) in the coding sequences of the C3′H and HQT genes. Of these SNPs, 35 were classified as synonymous and, since they are located within exonic regions expected to be conserved in mature mRNA and consequently do not interfere with mRNA splicing, they are not discussed further in the present study.
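The distinction between synonymous and nonsynonymous SNPs follows directly from the standard genetic code, and can be illustrated with a short Python sketch. The codons used in the example are hypothetical choices that happen to encode the relevant amino acids; the actual codons of the C3′H and HQT sequences are not given in this section.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon, alt_codon):
    """Return (class, reference amino acid, alternative amino acid)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


# Hypothetical codons: a silent third-position change and a Ser-to-Phe change.
silent = classify_snp("GCT", "GCC")     # alanine in both cases
missense = classify_snp("TCT", "TTT")   # serine replaced by phenylalanine
```

A change such as S218F would fall into the second case, whereas the 35 synonymous SNPs leave the encoded amino acid unchanged.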
For diploid species, such as Cynara, multiple statistical algorithms are used to infer the phase of genotypes. To reconstruct haplotypes of nonsynonymous SNPs, we used the SNiPlay pipeline, which employs the Phase or Gevalt algorithms. This analysis is based on the identification of sets of alleles that are found together in multiple individuals and allowed, in our study, the definition of 21 haplotypes. The C3′H gene proved to be very conserved, with only 4 haplotypes, while for HQT, 17 haplotypes were identified. These results confirm HRM as a good strategy for discovering possible new allelic variants and for genotyping plants for assisted breeding.
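The need for statistical phasing can be illustrated by enumerating the haplotype pairs compatible with a single unphased diploid genotype. The sketch below is not the Phase or Gevalt algorithm; it only shows the ambiguity those algorithms have to resolve, using a hypothetical three-site genotype.

```python
from itertools import product


def possible_haplotype_pairs(genotype):
    """List the haplotype pairs compatible with an unphased diploid genotype.

    genotype: one two-letter string per SNP site, e.g. "CT" for a heterozygote.
    """
    het_sites = [i for i, site in enumerate(genotype) if site[0] != site[1]]
    pairs = set()
    for flips in product((False, True), repeat=len(het_sites)):
        hap1 = [site[0] for site in genotype]
        hap2 = [site[1] for site in genotype]
        for idx, flip in zip(het_sites, flips):
            if flip:  # swap the two alleles at this heterozygous site
                hap1[idx], hap2[idx] = hap2[idx], hap1[idx]
        # Sort within the pair so that (A, B) and (B, A) count once.
        pairs.add(tuple(sorted(("".join(hap1), "".join(hap2)))))
    return sorted(pairs)


# Hypothetical genotype: heterozygous at sites 1 and 3, homozygous at site 2.
pairs = possible_haplotype_pairs(["CT", "GG", "AG"])
```

An individual with k heterozygous sites is compatible with 2^(k-1) haplotype pairs, so phasing methods rely on alleles that recur together across many individuals to choose among them.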
The isolation and mapping of the C3′H genomic sequence from Cynara cardunculus L. var. scolymus was reported by Moglia et al. [61], with a nucleotide difference at position 447 (nucleotide not identified). The HQT cDNA sequences of artichoke and cultivated cardoon were first described by Comino et al. [17], and a G834C alteration was reported in both sequences. Sonnante et al. [77] also identified T486C and G271A alterations in the C3′H and HQT coding sequences of artichoke, respectively. Although we could not identify these changes in our study, this work reports a number
The 11 nonsynonymous SNPs identified in the C3′H and HQT genes were characterized using bioinformatics tools to assess their possible influence at the protein level. The C3′H protein belongs to the CYP98 family of plant cytochromes P450 (plant metabolism related), which has been described as the family of enzymes performing the meta-hydroxylation step in the phenylpropanoid pathway. This hydroxylation is essential not only for the synthesis of chlorogenic acid but also for lignin biosynthesis [61]. The chlorogenic acid synthesis pathway is not yet completely defined, but it seems that C3′H could use p-coumaroylquinic (first route) and/or p-coumaroylshikimic acids (second route) as substrates. C3′H from artichoke appears to have a lower affinity for quinate esters than for shikimate esters [61], and the second route has been reported as an important way for plants to synthesize CGA [17,51,66,85]. In our study, C3′H gene sequence analysis by bioinformatics tools revealed a putative endoplasmic reticulum (ER) location for the C3′H protein. The Phobius and SignalP tools revealed a signal peptide at the amino terminus of the protein.
According to the UniProt database, signal peptides are found in proteins that are targeted to the ER and either destined to be secreted to the extracellular or periplasmic space, or to be retained in the ER lumen. Predotar also predicted an ER localization for the C3′H protein, although this has not been confirmed in the literature. The S194L and M198L alterations may account for changes in protein structure/function. The M198L alteration can lead to a change in hydrophobicity and to protein destabilization, since there is evidence of an estimated loss of stability of about 1.4 kcal/mol for each leucine-to-methionine substitution at a buried site within a folded protein [47].
HQT is a hydroxycinnamoyl transferase implicated in CGA production that can use either p-coumaroyl-CoA or caffeoyl-CoA esters as an acyl donor, using quinic acid as an acceptor [17]. Strong support for the second route (using caffeoyl-CoA) has been provided in tomato, where HQT gene silencing resulted in a 98% reduction in the level of chlorogenic acid [66]. The Predotar tool predicted HQT localization within the cytoplasm, but there is no cellular information to confirm this. The T167A, A196T, S218F, A228T, K268E, D322N and S329I alterations suggest destabilization of protein conformation, thus affecting HQT protein function. The discovery of these SNPs in the C3′H and HQT coding sequences offers new research approaches to the study of the phenylpropanoid pathway in Cynara cardunculus.
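One simple way to get a first impression of how such substitutions change residue hydrophobicity is to compare side-chain hydropathy values. The sketch below uses the Kyte-Doolittle scale, which is an assumption made here for illustration and not necessarily the scale used by the prediction tools applied in this work; it also ignores structural context such as burial or solvent exposure.

```python
# Kyte-Doolittle hydropathy index per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_shift(substitution):
    """Hydropathy change for a substitution written like 'S218F'.

    Positive values mean the new residue is more hydrophobic on this scale.
    """
    ref, alt = substitution[0], substitution[-1]
    return round(KYTE_DOOLITTLE[alt] - KYTE_DOOLITTLE[ref], 1)


# Nonsynonymous alterations named in the text for C3'H and HQT.
substitutions = ["S194L", "M198L", "T167A", "A196T", "S218F",
                 "A228T", "K268E", "D322N", "S329I"]
shifts = {sub: hydropathy_shift(sub) for sub in substitutions}
```

On this scale S218F is a marked shift towards hydrophobicity, while D322N leaves the hydropathy value unchanged and differs only in charge, which matches the qualitative description of both alterations.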
Based on the different haplotypes obtained for the C3′H and HQT genes (Additional file 4a, b), 20 individuals representative of the diversity observed were selected to produce leaf phenolic-derived extracts in a search for a putative association between nucleotide variations and phenotype. Leaves were chosen to produce the phenolic-derived extracts, since they have previously been reported as the Cc organ with the highest extraction yields [23,72] as well as a higher polyphenol content [23].
Association analyses are becoming increasingly important to exploit the natural diversity of genes related to a particular phenotype. The SNPs present in the C3′H gene did not reveal any association with traits, and the antithrombin activity was also not significantly associated with any SNP. Two out of eight SNPs from the HQT gene (S218F and D322N) showed significant associations with CGA content and antioxidant activity. As previously explained, the S218F mutation leads to an alteration of hydrophobicity (S to F residues) in a solvent-exposed protein zone and also leads to the loss of a serine phosphorylation site, resulting in a putative alteration at the regulatory level that is likely to affect protein function. The S218F alteration present in haplotypes B, C, M and Q is associated with both CGA content and antioxidant activity (Table 5). As we observed (Table 3), the individuals J4, J5 and Q5 carrying this mutation (haplotypes C, M and Q) presented the highest chlorogenic acid content (not counting the J3 and E4 individuals) and, in turn, the highest antioxidant activity (lowest IC50 values). Despite the protein conformation predictions explained above, it seems that the existence of the S218F alteration in these HQT haplotypes could increase CGA content. Since the biological properties of plants are due to their content in different metabolites, alterations in the HQT gene can positively influence the final CGA content and thus the level of antioxidant activity. This result was confirmed by the Pearson correlation, which revealed a significant inverse correlation between CGA content and IC50, meaning that higher CGA content improved the plant antioxidant activity. The D322N alteration present in haplotypes D and P could cause protein destabilization and changes at the regulatory level. This mutation leads to a change from a negatively charged residue (D) to a polar uncharged residue (N) and influences the 319 and 323 phosphorylation sites as well as the 333 ubiquitination site, causing loss of these regulatory elements. This alteration was significantly associated with antioxidant activity (IC50). The heterozygous individual Y1 (haplotypes D and P) yielded the highest IC50 value. However, this association was not correlated with the CGA content.
Nevertheless, it should be highlighted that these putative associations could represent false positives and, since the power of association analysis is highly dependent on the number of genotypes employed [4,28], a larger population would be desirable. In addition, further studies should validate the present results by replicating and verifying the associations, and should include functional analysis.

Conclusion
In order to develop a targeted breeding strategy, directed at improving the content of the relevant plant metabolites, it is important to have a better knowledge of the C. cardunculus genome. Our work provides, for the first time, detailed information about the natural allelic variants of C3′H and HQT genes involved in C. cardunculus phenylpropanoid pathway.
The qualitative analysis of all 20 Cc leaf phenolic-derived extracts identified two hydroxycinnamic acids, CGA and 1,5-di-O-caffeoylquinic acid, as well as two flavones, luteolin 7-O-glucoside and luteolin acetylhexoside. The differences obtained in the biological activities characterized here could be related to different synergisms of the total phenolic content, including flavonoids, of the Cc extracts. The phenolic content of a plant depends on a number of intrinsic (genetic) and extrinsic (environmental) factors. The genotype proved to be a major determinant of variation in Cc polyphenol content and profile [23,27,67]. In this study, different polyphenolic compositions could also be associated with genotypic differences of Cc individuals from diverse geographic locations. Usually, antioxidant activity has been correlated with total phenolic content [86]. Although in the present work CGA content is less than 1% in Cc leaf extracts, we found an evident correlation between CGA content and antioxidant activity, meaning that this compound could have a significant impact on overall antioxidant power. Thus, further studies on the quantification, isolation and characterisation of the Cc compounds responsible for specific biological activities are highly recommended.
In the context of the molecular diversity of target genes, the association results shown in this study may provide markers that are useful for Cc genetics, trait selection and breeding applications. Association analysis allowed identification of interesting haplotypes, such as C, M and Q haplotypes, which present the S218F alteration on the HQT sequence, associated with higher chlorogenic acid content and improved antioxidant activity. Further, the SNPs detected herein will be interesting for future association studies with other traits. This study opens new lines for research, specifically for validation of the identified SNPs by functional analysis and correlation to particular phenotypes.

Orto Botanico di Sapienza Università di Roma (Italy). We also thank Teresa Brás for phenolic-derived extractions support, Conceição Fernandes for HPLC support, Flávia Fernandes for antioxidant activity support, Ângela Guerra for antithrombin support and Anabel Usié Chimenos and Marcos Ramos for bioinformatic support. We are also grateful for the access to the Horticultural Centre of School of Agriculture (ESA) of Beja, to place Cc plants. Finally, we are grateful for the careful English revision made by Prof. Margarida O. Krause (Univ. New Brunswick, Canada).

Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and in Additional files 6, 7 and 8.

Authors' contributions
AMF performed the experimental work and carried out the bioinformatic data analysis. PR contributed to the HT-UHPLC-MSn planning and execution. OG and EJ contributed to the statistical analysis. IP contributed to the TASSEL software analysis. CC and JC contributed to the HRM planning and execution. RL, MFD, MMO and SG contributed to the research planning, funding, discussion and paper writing. All authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Consent for publication
Not applicable.

Ethics approval and consent to participate
Not applicable.