Expansion and subfunctionalisation of flavonoid 3',5'-hydroxylases in the grapevine lineage

Background Flavonoid 3',5'-hydroxylases (F3'5'Hs) and flavonoid 3'-hydroxylases (F3'Hs) competitively control the synthesis of delphinidin and cyanidin, the precursors of blue and red anthocyanins. In most plants, F3'5'H genes are present in low-copy number, but in grapevine they are highly redundant.
Results The first increase in F3'5'H copy number occurred in the progenitor of the eudicot clade at the time of the γ triplication. Further proliferation of F3'5'Hs has occurred in one of the paleologous loci after the separation of Vitaceae from other eurosids, giving rise to 15 paralogues within 650 kb. Twelve reside in 9 tandem blocks of ~35-55 kb that share 91-99% identity. The second paleologous F3'5'H has been maintained as an orphan gene in grapevines, and lacks orthologues in other plants. Duplicate F3'5'Hs have spatially and temporally partitioned expression profiles in grapevine. The orphan F3'5'H copy is highly expressed in vegetative organs. More recent duplicate F3'5'Hs are predominately expressed in berry skins. They differ only slightly in the coding region, but are distinguished in the structure of the promoter. Differences in cis-regulatory sequences of promoter regions are paralleled by temporal specialisation of gene transcription during fruit ripening. Variation in anthocyanin profiles consistently reflects changes in the F3'5'H mRNA pool across different cultivars. More F3'5'H copies are expressed at high levels in grapevine varieties with 93-94% of 3'5'-OH anthocyanins. In grapevines depleted in 3'5'-OH anthocyanins (15-45%), fewer F3'5'H copies are transcribed, and at lower levels. Conversely, only two copies of the gene encoding the competing F3'H enzyme are present in the grape genome; one copy is expressed in both vegetative and reproductive organs at comparable levels among cultivars, while the other is transcriptionally silent.
Conclusions These results suggest that expansion and subfunctionalisation of F3'5'Hs have increased the complexity and diversification of the fruit colour phenotype among red grape varieties.

Anthocyanin biosynthesis takes place over 8-10 weeks, from shortly after berry softening (~60 days after blooming) until harvest [5]. F3'Hs are expressed at comparable levels in both anthocyanin-pigmented and green-skinned varieties, before and after the onset of ripening [6,4]. However, regulation of F3'5'Hs is largely genotype-specific and responsive to environmental cues [3,7]. The breadth of diversity in fruit colour among different grapevine accessions suggests a fine regulation of F3'5'H expression. Dark blue cultivars transcribe F3'5'Hs at higher levels than light red cultivars, which nevertheless maintain traces of 3'5'-OH anthocyanins and barely detectable F3'5'H transcripts. In green-skinned cultivars, F3'5'H transcripts are completely absent [8,9]. The invariant presence of some 3'5'-OH anthocyanins in red pigmented grapes contrasts with many other flowering plants such as roses, carnations, chrysanthemums, lilies, gerbera, and Arabidopsis, which accumulate anthocyanins but do not synthesise 3'5'-OH derivatives.
The lack of grapevines with F3'5'H loss-of-function genotypes could be explained either by selection, which acted against knockout mutations, or by gene redundancy, which obscured the effect of single-gene loss/silencing. The observation that an absence of 3'5'-OH anthocyanins is generally tolerated in plants disfavours the first hypothesis. Furthermore, gene redundancy of F3'5'Hs is commonplace in grape genomes [10,11], contrasting with most other species that have single or two-copy F3'5'Hs, or none at all. We have previously shown that F3'5'Hs are highly duplicated, with multiple copies arrayed in clustered contigs of the 'Cabernet Sauvignon' physical map [10]. The genome assembly of the nearly homozygous line PN40024 [12] allows a deeper investigation into the structure of the F3'5'H locus and into the evolutionary events that caused their proliferation in grapevine.
Expansion of gene families is common in plant genomes [13], and results from various mechanisms of duplication: whole-genome duplication (WGD), segmental duplication, tandem duplication, and transpositional duplication [14,15]. WGDs have repeatedly occurred over evolutionary time in the common ancestor of eudicots and in specific lineages [12,16]. Segmental duplications occur over chromosomal regions, which may undergo subsequent rearrangement. Tandem duplications generate nearby gene copies [13]. Small-scale duplications may also cause transposition of one of the duplicate genes to an ectopic site. In this paper, local duplications of small fragments (<10 kb) containing a single gene are referred to as tandem duplications. Duplication of DNA blocks >10 kb are referred to as segmental duplications.
Retention of duplicate genes results from a stochastic process, in which the effect of the earliest mutation occurring after duplication governs the fate of extra copies. Deleterious mutations occur much more frequently than mutations resulting in novel and favourable functions [17]. Following this assumption, gene disruption would largely prevail, with genomes populated by vestiges of ancient duplicates. This raises the question as to why intact duplicates are maintained and expressed much more frequently than expected by chance. According to the duplication-degeneration-complementation (DDC) model [18], degenerative mutations promote preservation of duplicate genes. Deleterious mutations in regulatory regions could eliminate different cis-elements in either duplicate, making both copies necessary to provide the full-complement of the expression profile of the ancestral single copy [19]. This kind of partitioned expression among duplicate genes is referred to as subfunctionalisation, and includes differential expression among organs and developmental stages, or in response to environmental cues [20][21][22][23][24][25].
Duplicate genes involved in secondary metabolism or that are responsive to environmental stimuli appear to be more frequently maintained [26][27][28], and have more highly diverged transcriptional patterns and intraspecific variation in expression [29] than duplicate genes in other categories. The pioneering study of [30] provided a paradigmatic case of duplication and transcriptional diversification in members of the stilbene synthase gene family in grapevine. It is generally assumed that maintenance of duplicate genes provides a foundation for consolidation and refinement of established functions, particularly in secondary metabolism, by preserving extra copies that guarantee a gene reservoir for adaptive evolution, free from the constraints of purifying selection [31][32][33].
In this paper, we present (i) the evolutionary path that led to the structural architecture of the F3'5'H gene family in grapevine, (ii) the transcriptional subfunctionalisation of duplicate copies among organs and developmental stages, and (iii) the extent of variation of expression patterns in four cultivars with divergent anthocyanin profiles.

F3'5'Hs and F3'Hs in grapevine: genomic location and phylogeny
Sixteen copies of F3'5'Hs are present in the PN40024 genome. Each F3'5'H copy is referred to as F3'5'Ha through F3'5'Hp, with the alphabetical order reflecting their genomic coordinates [see Additional file 1]. Fifteen of them (F3'5'Ha-o) reside in a tandem array within a 650-kb region on chromosome (chr) 6. This chromosomal region is syntenic with the homoeologous chr1 and 9 in poplar, and with supercontig157 in papaya (Figure 1a). An isolated F3'5'H copy (F3'5'Hp) resides on grapevine chr8, a chromosome that was homoeologous to chr6 in the paleohexaploid ancestor [12]. However, other genes in a 100-kb interval around F3'5'Hp are single-copy, and not collinear with genes in the region on chr6 surrounding the other F3'5'Hs [see Additional file 2]. F3'5'Hp is an orphan gene that lacks orthologues in other sequenced dicots and in EST databases. In poplar, one or both homoeologous loci syntenic with the grapevine F3'5'Hp region, which are present in the homoeologous chr6 and chr16 generated by the Salicoid WGD [34], have maintained the collinear genes present in grapevine, except for F3'5'Hp (Figure 1b). Seven F3'5'Hs on grapevine chr6 (F3'5'Hd, -f, -j, -l, -m, -n, -o) and F3'5'Hp on chr8 encode full-length proteins. In the haplotype of PN40024, the remaining gene models are either gene fragments without homology outside of conserved regions, or coding regions interrupted by transposable elements (TEs) or frameshift indels [see Additional file 3].
Grapevine contains two copies of F3'H (F3'Ha and F3'Hb) located in a 25-kb interval on chr17 [see Additional file 4a]. F3'Hs reside in two blocks of ~5 kb, which share 93.5% identity over 4.3 kb of conserved sequence, separated by ~16 kb largely consisting of repetitive elements. Both F3'Hs encode full-length proteins. F3'Ha and F3'Hb share 97% amino acid identity, but their genomic sequences differ extensively due to a large indel in the terminal intron [see Additional file 4]. Other genes surrounding the two F3'H copies on chr17 are not collinear with genes surrounding F3'Hs on chr6 or on chr8.

Evolution of the F3'5'H locus on chromosome 6
The pattern and mode of gene duplication were characterised through several approaches: (i) dot plot self-comparison of the entire locus, (ii) conservation of non-coding sequences, TE patterns, and sequence divergence between long terminal repeats (LTRs) of retrotransposons in duplicate blocks, (iii) level of identity between 10-kb windows around each F3'5'H, (iv) intron divergence between the most recent duplicated
Figure 2 Relatedness between F3'5'Hs and F3'Hs in completely sequenced genomes of six plant species (grapevine, poplar, papaya, Arabidopsis, rice, and sorghum) and in another 28 plants (indicated in the tree by the genus). The dashed grey branch connects the halves of the tree including either F3'5'Hs (top) or F3'Hs (bottom). A magnified view of the relatedness between grapevine F3'5'Hs is given in the box. Bootstrapping was performed with 1,000 replicates. Percentage of replicates supporting each branch is given for major branches discussed in the text.
A dot plot self-comparison of the locus identified 9 blocks of DNA ranging in size from 35 to 55 kb, each containing one or two copies of F3'5'H at the forefront of the block (Figure 4). The remaining F3'5'H copies in this locus (F3'5'Hm, F3'5'Hn, and F3'5'Ho) are located downstream of the segmental duplications. Duplicated blocks do not contain genes other than F3'5'Hs and are largely composed of repetitive DNA (Figures 4 and 5).
Blocks 1, 2, 3, 5, 6, 7, and 8 share 90-99% nucleotide identity, and each contain a CACTA and a Gypsy TE (Figure 5 and [see Additional file 6]). The ubiquitous presence of this Gypsy element across these blocks and the nucleotide substitution rate of 0.092 ± 0.023 between its LTRs date the Gypsy insertion to the ancestral single-copy sequence, recently in the evolutionary history of Vitaceae. The present-day block 6 is more reminiscent of the ancestral state of the sequence that initiated segmental duplications than blocks 1, 2, 3, 5, 7, and 8, as evidenced by the wide conservation of block 6 sequences among all of the other blocks, and by the fact that all of the other blocks resemble block 6 with various structural modifications. Blocks 1 and 2 resemble block 6 except for vestiges of a Gypsy element in the middle of the block. Block 3 is nearly identical to block 6, except for a recent Gypsy insertion into the shared Gypsy element. Sequence divergence between LTRs of this nested Gypsy is 0.003. Block 5 has undergone the most rearrangements, including hAT and Gypsy insertions at the extremities of the block, and two Gypsy invasions upstream and downstream of the proximal CACTA with low divergence between their LTRs (0.068). Block 7 has ~17 kb of extra DNA with respect to block 6 due to a Copia insertion and a nested Gypsy insertion into the shared CACTA. With respect to block 6, block 8 has an additional CACTA. Blocks 4 and 9 differ extensively from all other blocks and share 94.5% identity with each other (Figure 5 and [see Additional file 6]). A Mutator insertion predated the duplication of their common ancestor. In block 4, a Gypsy element has moved into the Mutator shared with block 9, and a Copia with 0.068 divergence between its LTRs has invaded the distal side. Block 9 was invaded by a Gypsy element with identical LTRs and by a Copia with 0.018 genetic distance between its LTRs.
Sequence conservation in a 10-kb window surrounding each F3'5'H copy supports the hypothesis that most of the copies were generated by duplications of the entire segment in which they reside [see Additional file 7], with the following exceptions. Downstream of the segmental duplications, sequence similarity between the nearly identical copies F3'5'Hm and F3'5'Hn does not extend more than ~2 kb beyond each side of their coding regions. F3'5'Hk and -l are both located upstream of block 9. F3'5'Hl and its 5' non-coding region are dissimilar from the paralogous F3'5'H in duplicate blocks 4 and 9, as though F3'5'Hl originated from a small-scale duplication of F3'5'Hg, -m, or -n. F3'5'Ho, the copy at the far extremity of the locus, shares low similarity only upstream of the coding region with F3'5'Ha, -b, -c, -d, -e, and -h. F3'5'Hp, the copy on chr8, has no similarity outside of the coding region with other F3'5'Hs.
Intronic sequences of highly similar paralogous F3'5'Hs reflect the relatedness of the entirety of the duplicated block in which each F3'5'H resides [see Additional file 5b]. The few F3'5'Hs that lie in pairs at the forefront of a duplicate block (F3'5'Ha and -b; F3'5'Hc and -d) are less similar within the pair than with a member of a different pair. Thus, paired F3'5'Hs at the forefront of blocks 1 and 2 originated from an ectopic duplication before the duplication of the corresponding segment. The absence of intronless F3'5'Hs excluded a role for retroposition in the process of gene duplication.
Conservation of duplicate F3'5'Hs in the family Vitaceae was assayed by PCR with copy-specific primers. The orphan F3'5'Hp gene on chr8 was detected in the genera Parthenocissus and Vitis, while it was faintly amplified in Ampelopsis, likely due to more divergent priming sites [see Additional file 8]. In contrast, only a few primer pairs that amplified the most recent duplicate genes in Vitis genomes yielded amplicons in Parthenocissus or Ampelopsis. A wide sample of cultivars and species within the genus Vitis bears the marks of that expansion [see Additional file 8], including wine and table cultivars of Vitis vinifera, Asian and American Vitis species, and the muscadine grape.

Prediction of functional domains among duplicate F3'5'Hs
According to [35] and [36], six functional domains in the F3'5'H enzyme are important for the determination of substrate specificity and 3' vs. 3'5'-OH activity (substrate recognition sites, SRS; candidate region, CR1). F3'5'Ha, -c, -e, and -h are truncated in the PN40024 genome, and lack one or more functional domains [see Additional file 9]. All other grapevine F3'5'Hs except F3'5'Ho have invariant amino acids specific for 3'5'-hydroxylation activity. In plants, F3'5'Hs are conserved at three critical positions in the CR1 (positions 1, 3, and 10, which correspond to amino acids 178, 180, and 187 in the Osteospermum F3'5'H reference sequence used in [36]) and at two positions in the SRS6 (positions 5 and 8, which correspond to amino acids 484 and 487 in Osteospermum F3'5'H) [see Additional file 9]. All grapevine F3'5'Hs that diverged less than 4DTV ~0.046 show complete amino acid conservation at the CR1 and SRS6 domains. F3'5'Hp on chr8 and F3'5'Ho, -m, and -n, the most divergent copies in the F3'5'H array on chr6, have a Met-to-Ile substitution at CR1 position 3 with respect to other paralogues. This substitution is shared with F3'5'Hs in grasses. F3'5'Ho also has an Ala-to-Thr substitution at SRS6 position 8, which is shared with corn and sorghum F3'5'Hs, as well as with most of the F3'Hs. F3'5'Hp has an Ala-to-Val substitution at the same position, which is uniquely shared with F3'5'Hs from orchids. F3'5'Ho has extensively diverged from all other F3'5'Hs at SRS1 and SRS2, while F3'5'Hp has peculiar amino acid substitutions at SRS2, SRS4, and SRS5.

Variation in promoter regions of duplicate F3'5'Hs
Duplicate F3'5'Hs have originated from segmental duplications of large DNA blocks, which included the coding sequences and several kilobases of the surrounding DNA. In some cases, reorganisation of promoter regions within 2-kb upstream of the start codon occurred via TE insertion, for example Copia and hAT elements in the common ancestor of the present-day F3'5'Hc and -e duplicates. In other cases (Figure 6), structural variation in the promoter was caused by insertions/deletions of DNA segments of variable length up to a few hundred nucleotides, which do not belong to any annotated class of repetitive elements. These inserted/deleted portions are neither detected by algorithms of repetitive DNA search such as ReAS, nor are they duplicated elsewhere in the genome based on blastN searches. Structural variation in the promoters of F3'5'Hs often occurred in a complementary fashion among gene copies, with a segment of one promoter having been lost in one duplicate but maintained in another, and vice versa. Comparison among triplets of promoters indicated that those segments were more often conserved in two F3'5'Hs and absent from the third one than vice versa. All of this evidence excludes a mechanism of copy-and-paste insertion in the promoter of either duplicate gene, and favours the alternative hypothesis that structural deletions in the promoters of daughter copies have progressively degenerated the original sequence of the ancestral single-copy gene, partitioning the full complement of the regulatory information among copies.
Deletions may have asymmetrically erased cis-elements from regulatory regions of duplicate F3'5'Hs. Thus, the 2-kb promoter regions of duplicate F3'5'Hs were searched for DNA-binding motifs (Figure 6). Segments that were alternatively maintained in either promoter contained binding sites for Myb-type transcription factors, light-responsive and drought-inducible cis-elements, motifs sensitive to ABA and methyl-jasmonate, and heat stress responsive motifs. Relatedness between the alignable regions of duplicate promoters was also evident from a phylogenetic tree [see Additional file 5c].

Spatial expression patterns of duplicate F3'5'Hs and F3'Hs
Expression analyses were conducted on nine out of the sixteen F3'5'H copies for which primer pairs could individually distinguish each paralogue and that passed the thresholds of PCR efficiency as set in the Methods section.
Duplicate F3'5'Hs are asymmetrically expressed across organs (Figure 7) [see Additional file 10]. The orphan copy F3'5'Hp is highly expressed in all vegetative organs (leaf, petiole, tendril, flower, and shoot) and very weakly in fruit. The highly duplicated F3'5'Hs that reside in segmental duplications on chr6 are preferentially expressed in berry skin. Expression of F3'5'Hm, -n, and -o, three copies located outside of the segmentally duplicated region on chr6, was detectable in some vegetative organs, but not in berry skin during ripening in all cultivars tested [see Additional file 11]. In fruit, none of the F3'5'Hs that are expressed in cultivars accumulating anthocyanins ('Aglianico', 'Marzemino', 'Grignolino', and 'Nebbiolo') are expressed during ripening in the green-skinned cultivar 'Tocai' (data not shown). Transcripts of F3'Hb were never detected in the organs analysed in this study [see Additional file 4c] and weak expression of this copy was detected exclusively in adventitious roots of 'Cabernet Sauvignon' [37].

Expression of the F3'5'H gene family and variation of anthocyanin profiles across different cultivars
Berries of four cultivars were sampled at eight developmental stages in order to quantify cumulative expression of the F3'5'H gene family and relative contribution of individual F3'5'H copies, and to determine anthocyanin profiles. The accessions 'Aglianico', 'Grignolino', 'Marzemino', and 'Nebbiolo' were chosen for their contrasting phenotypes of fruit colour, based on literature reports [4,9].
As a whole, expression of the F3'5'H gene family levelled off before veraison [see Additional file 12], in step with other genes of the flavonoid pathway [5]. F3'5'Hs became increasingly more expressed at 10% veraison, peaking at full-veraison and ten days after full-veraison. Expression then declined two weeks before harvest and at harvest, but remained at higher levels than those detected before the onset of ripening.
The level of expression of every F3'5'H copy was highly variable in berry skin of different cultivars (Figure 9). As a result, the contribution of individual gene copies to the F3'5'H transcript pool was unique to each cultivar. PCR efficiency differences across cultivars are inherent when dealing with four heterozygous grapevine accessions of unrelated pedigree, due to possible nucleotide divergence across the eight haplotypes. For each F3'5'H primer pair we assessed that the standard deviation of PCR efficiency among cultivars is less than 10%, and it is therefore unlikely to explain these results. A two-way ANOVA identified significant differences in relative transcript levels among duplicate F3'5'Hs within each cultivar. F3'5'Hf was the predominately expressed copy in 'Aglianico'. PCR efficiency for this copy in 'Aglianico' was 96.2%, which is within the bounds of the standard deviation of the average PCR efficiency of this gene family in the same cultivar (92.9% ± 4.6%). F3'5'Hi was the predominately expressed copy in 'Nebbiolo', and also in 'Grignolino' together with F3'5'Hf. In contrast, F3'5'Hj expression predominated in 'Marzemino'. F3'5'Hg, -h, -l, and -p were consistently expressed at lower levels across all cultivars, despite the observation that PCR efficiencies of their primer pairs were not lower than other F3'5'H copies in the accessions under study. Traces of transcripts of the copies F3'5'Hm, -n, and -o were never detected in the preliminary semiquantitative PCR screening at any stage of berry ripening in any of the accessions tested, even when PCR products were stained with silver nitrate for high sensitivity. Thus, they were excluded from further investigation by qPCR.
A three-way ANOVA was used to decouple and test the significance of three factors that contributed to the observed variation of expression patterns: gene-copy, cultivar, and developmental stage [see Additional file 12]. All three factors were significant, as well as the interactions: gene-copy × developmental stage, gene-copy × cultivar, cultivar × developmental stage, and gene-copy × cultivar × developmental stage (P < 0.00001).

Distinct temporal expression patterns of duplicate F3'5'Hs during ripening
Individual gene copies were differentially regulated during ripening. Differences in the expression pattern of individual F3'5'Hs with regard to developmental time were statistically significant in each of the four varieties, separately analysed by one-way ANOVA and when averaged across cultivars (Figure 10). F3'5'Hi and -j were expressed early, and attained a peak of expression between full-veraison and ten days post-veraison, consistently among cultivars. Late in ripening, F3'5'H expression was predominated by transcripts of F3'5'Hf, -g, -h, and -l.

Expansion of the F3'5'H family in grapevine
Gene-copy number of F3'5'Hs has increased in the grapevine lineage through recurrent cycles of duplication. The most ancient duplication resulted in two F3'5'H loci. One of these, F3'5'Hp, has been maintained as a single-copy gene on chr8 in grapevine and other Vitaceae but lost from other dicot genomes. The other was the founder of the present-day F3'5'H gene array on chr6, orthologous to the F3'5'Hs expressed in other dicot species and syntenic with the F3'5'H loci found in poplar and papaya (Figure 1). The 4DTV distance between F3'5'Hp and other F3'5'H copies is close to the peak of 4DTV distances between grape paleologues observed by Tang and coworkers [16] (Figure 3). Timing of the earliest F3'5'H duplication is therefore coincident with the event of eudicot γ hexaploidy [16], and the chromosomes in which the duplicate genes reside are indeed paleologous chromosomes [12].
The orphan copy F3'5'Hp is predominantly expressed in grape vegetative organs, in contrast with the F3'5'H copies on chr6, which are predominantly expressed in fruit (Figure 7). Several amino acid substitutions in F3'5'Hp are shared with F3'Hs and monocot F3'5'Hs. For instance, F3'5'Hs are present in many monocot species, but in all cases studied, their transcription is uncoupled from the expression of other genes in the anthocyanin pathway. As a result monocots seldom accumulate 3'5'-OH anthocyanins [38]. For example, seed coats of rice varieties with dark red pigmentation contain exclusively 3'-OH anthocyanins, and the same holds true for sorghum and purple corn. 3'5'-OH anthocyanins are also absent in blue flowers of Dendrobium and Phalaenopsis orchids, albeit the detection of 3'5'-OH flavonols provides evidence for F3'5'H activity [39].
Expansion of F3'5'Hs on chr6 occurred in the Vitaceae lineage after the separation from other dicots. Indeed, F3'5'H genes are present in low copy number in other fully sequenced plant genomes, if not lost. F3'5'H is absent from Arabidopsis, single-copy in rice and papaya, and dual-copy in poplar and sorghum. In poplar, the two copies of F3'5'H were generated by the Salicoid WGD [34]. The presence of a single-copy gene in the syntenic locus of poplar and papaya (Figure 1), and molecular dating of grapevine paralogues favour the hypothesis of lineage-specific gene duplications. The estimated age of F3'5'H duplications based on transversion rate at fourfold synonymous third-codon positions predicts most duplicate copies having diverged by less than 4DTV ~0.046 (Figure 3). If the molecular clock in grape is approximately calibrated by comparing the evolutionary rates in perennial dicots, the 4DTV distance of ~0.046 in grape is roughly half of the median 4DTV distance (~0.091) observed in poplar between duplicate genes that arose from the 60-65 myr-old Salicoid duplication [34]. However, grape has evolved more slowly than poplar, and the distances between paleologous genes that arose from the γ triplication are lower in grape (median Ks, 1.22) than in poplar (median Ks, 1.54), as estimated by [16]. Thus, recalibrating the mutation rate in grape, the 4DTV distances between F3'5'H in the chr6 array suggest that most duplications occurred within the past ~40 myr.
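The recalibration described above can be reproduced with simple arithmetic. The sketch below is one plausible reading of that reasoning, not necessarily the exact procedure used: it assumes a linear molecular clock, uses the poplar Salicoid duplication as the reference point, and uses the ratio of median Ks values for γ paleologues (poplar/grape) as the rate correction. The function name `rescaled_age` is hypothetical; all numerical inputs are those quoted in the text.

```python
# Back-of-the-envelope dating of the chr6 F3'5'H duplications.
# Input values come from the text; the linear scaling is an assumption.

def rescaled_age(dist_grape, dist_poplar_ref, ref_age_myr, ks_grape, ks_poplar):
    """Convert a grape 4DTV distance to an age in myr.

    The poplar Salicoid duplication (known distance and age) sets the clock,
    and the result is stretched to account for the slower rate in grape.
    """
    # Age that the grape distance would imply at the poplar rate
    naive_age = ref_age_myr * dist_grape / dist_poplar_ref
    # Grape accumulates less divergence per unit time than poplar,
    # so the same distance corresponds to a longer time span
    rate_correction = ks_poplar / ks_grape
    return naive_age * rate_correction


if __name__ == "__main__":
    for ref_age in (60.0, 65.0):
        age = rescaled_age(0.046, 0.091, ref_age, ks_grape=1.22, ks_poplar=1.54)
        print(f"Salicoid reference {ref_age:.0f} myr -> about {age:.0f} myr")
```

With the 60-65 myr range for the Salicoid duplication, this scaling gives roughly 38-41 myr, in line with the ~40 myr stated above.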
Molecular dating based on rate of nucleotide divergence is consistent with the conservation of duplicate gene copies across lineages in the family Vitaceae. While most of the 4DTV ~0.046 copies are conserved among Vitis species, they failed to be amplified from the DNA of related genera Ampelopsis and Parthenocissus. Conversely, the paleologous F3'5'Hp was conserved among these genera. Fossil records from the Late Cretaceous date the radiation of Vitis, Ampelopsis, and Parthenocissus genera back to ~65 mya [40], confirming that most of the F3'5'H expansion occurred in an ancestor of the Vitis lineage, after the separation from the related lineages Ampelopsis and Parthenocissus.
The founder of the array of F3'5'Hs on chr6 was initially duplicated through tandem gene duplication. Subsequently, different F3'5'H copies were involved in reiterated segmental duplications of large DNA blocks in which they resided, generating 9 blocks that range in size from ~35 to 55 kb (Figures 4 and 5). This modular structure suggests that unequal crossing-over between mispaired blocks was the most likely force that shaped the locus. Subsequent reorganisation via TE insertion, deletion, etc., resulted in structural variation among blocks, which might have reduced illegitimate recombination between adjacent blocks, thus resulting in the maintenance of the number of duplicates within the current bounds. Although our data suggest that most of the F3'5'H copies are maintained across grape varieties, at least in a heterozygous state, the extent of structural variation among haplotypes remains to be determined.

Regulatory diversification within the F3'5'H family and anthocyanin profiles
Transcriptional subfunctionalisation has widely occurred within the F3'5'H family and is detectable even between some of the most recent duplicates that diverged less than 4DTV~0.046. This is evident, for instance, among F3'5'Hf, -j, and -l, which have retained >94% amino acid identity, and among F3'5'Hf, -g, and -l, which show conservation at the CR1 and SRS6 domains for 3'5'-OH activity. Transcriptional subfunctionalisation is therefore one of the forces, if not the predominant one, that is responsible for the retention of the most recent duplicate F3'5'Hs in grapevine. The extensive structural variation found in their 5' regulatory region, and the observed partitioned expression among organs and developmental stages might have promoted the diversification of duplicates shortly after their origination, and thus the preservation of both duplicates. These pieces of evidence fit well into the DDC model. Deletion of regulatory modules is expected to occur by chance in promoters of duplicate genes, eliminating different cis-elements in either duplicate and diversifying their expression profiles [18].
Alternatively, a gene dosage model may also explain retention of duplicate F3'5'Hs [41], under the assumption that a fitness advantage is provided by extra F3'5'H copies. F3'5'H gene products compete with F3'H gene products for the enzymatic transformation of flavonoid substrates into delphinidin or cyanidin precursors. Copy number variation is a common cause of altered stoichiometry of concerted enzyme activities within metabolic pathways, which results in phenotypic variation [42]. Unbalanced phenotypes with increased levels of 3'5'-OH anthocyanins might have increased fitness, due to dissipation of high-energy blue wavelengths, attenuation of UV-B radiation, or conspicuousness of fruits to seed dispersers [43][44][45][46].
Transcriptional regulation of duplicate F3'5'Hs in berry skin is largely dependent on genotype, consistent with the observation in other plants that tandem duplicates have highly variable expression patterns [29]. In the present work, differential expression within the F3'5'H gene family between different cultivars was associated with the differential accumulation of 3'5'-OH anthocyanins. In the field, F3'5'H gene expression has a functional impact on anthocyanin biosynthesis that persists during fruit ripening. Different copies of duplicate F3'5'Hs have also become temporally specialised for different developmental stages of berry ripening (Figure 10). The question remains as to why these nuanced expression patterns have been maintained evolutionarily. One hypothesis is that copy-specific cis-elements confer unique, adaptive patterns of expression and environmental responsiveness by increasing the ratio of F3'5'H/F3'H enzyme concentration (and thus 3'5'-OH anthocyanins) under circumstances when accumulation of this class of metabolites is advantageous.

Conclusions
Expansion in copy-number and transcriptional specialisation of F3'5'Hs have increased the regulatory complexity of anthocyanin biosynthesis and fruit colour among red grape varieties. Most duplications occurred rather recently within this gene family, long after the Vitaceae lineage had separated from other dicot lineages. Among duplicate copies, accumulation of structural variation in promoter regions was more significant than divergence in coding regions. Transcriptional subfunctionalisation across organs and along developmental stages in ripening fruit was commonplace among gene copies, in addition to the extensive variation in gene expression among different cultivars. Transcriptional differences within the F3'5'H gene family in different accessions were paralleled by significant changes in the major metabolites synthesised by the F3'5'H gene products. In berry skin, the abundance of different anthocyanins that modulate the pigmentation of red grapes and wines was greatly affected by these transcriptional variations.

Sequence analyses
F3'5'Hs and F3'Hs were identified in grapevine (on chr6, chr8, and chr17 sequence assemblies deposited under the NCBI accession nos. FN597024, FN597027, FN597042 as of 25 November 2009), poplar (version 1.0, [34]), Arabidopsis, rice, papaya, and sorghum (version of the genome assemblies available at Phytozome [56] as of November 2009) by tBlastN homology, using cytochrome P450 monooxygenases of the CYP75A subfamily (accession nos. AAP31058, AB078781, AJ011862, Z22544, BAA03439, BAA03440) and the CYP75B subfamily (AY117551, BAD00189, AF155332) as queries. Matches were retained at thresholds of E < e-20 and amino acid identity >50%. Each sequence was extended on each side until the next gene and annotated using GenScan, FgenesH, GeneMark, and Geneid. Sequence alignments were carried out using ClustalX. Exon-intron structure was predicted by comparison with ESTs and amino acid sequences from other plants. Trees were constructed using MEGA. Nucleotide substitution rate was calculated using DNAsp 4.0. 4DTV values were calculated and corrected for possible multiple transversions according to [16]. Gene models other than F3'(5')H were given the predicted function of their best match in the NCBI protein database. Syntenic regions were identified using the Genome Evolution tool [57]. Transposable elements were annotated according to the grape genome browser information [58]. LTRs in Copia and Gypsy retrotransposons were identified by dot plot analysis. Global DNA alignments of chromosomal segments were performed using LAGAN [59] in a window of 100 bp with a minimum identity of 70%. Dot plots of segmental duplications were made using Dotter. Alignments of 2-kb promoter regions were performed with DiAlign2, using a minimum HSP length of 10 bp, and visualised with GEvo. DNA binding motifs were predicted by PlantCARE [60].
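The hit-retention step can be illustrated with a minimal Python sketch. It assumes that the tBlastN results are available as tab-separated text in the 12-column layout of BLAST+ tabular output, which the text does not state, and it reads the E-value threshold as 1e-20. The function name `filter_hits` and the example rows are hypothetical and do not reproduce any result of this study.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6); an assumption, not stated in the text.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

MAX_EVALUE = 1e-20    # E-value threshold, read here as 1e-20 (assumption)
MIN_IDENTITY = 50.0   # minimum percent amino acid identity


def filter_hits(handle, max_evalue=MAX_EVALUE, min_identity=MIN_IDENTITY):
    """Return the hits that pass both thresholds, as a list of dicts."""
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) < max_evalue and float(hit["pident"]) > min_identity:
            kept.append(hit)
    return kept


# Made-up rows for illustration only.
EXAMPLE = (
    "queryA\tchr6\t78.4\t500\t108\t0\t1\t500\t1000\t2500\t1e-150\t520\n"
    "queryA\tchr8\t45.0\t480\t264\t2\t5\t484\t300\t1740\t3e-40\t160\n"
    "queryA\tchr17\t62.1\t120\t45\t1\t10\t129\t50\t410\t2e-12\t70\n"
)
hits = filter_hits(io.StringIO(EXAMPLE))
```

In this example only the first row is kept: the second fails the identity threshold and the third fails the E-value threshold.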

Selective amplification of F3'5'Hs and F3'Hs paralogues
Selective primers were designed across dissimilar exonic DNA stretches or using a 3'-terminal SNP between the perfect match of the target gene-copy and the mismatched annealing site of paralogous sequences [see Additional file 13]. Absence of illegitimate cross-amplification of other paralogues was validated by amplification of genomic DNA, Sanger sequencing of the PCR products, and detection of variable sites inside the primer sequences that distinguished the target gene-copy from other paralogues. qPCR efficiencies in amplifying the DNA of PN40024 (from whose genome sequence gene-copy-specific primers were designed) and of the mixed haplotypes of every heterozygous cultivar used in the present study were calculated using the equation E = 10^(-1/slope), where the slope is that of the standard curve. The standard curve was constructed with five 10-fold serial dilutions, using cDNA from organs and developmental stages in which the specific gene-copy was expressed or, if not possible, genomic DNA. Paralogue-specific primers with a PCR efficiency between 90 and 110% in PN40024 were considered acceptable, and were used for qPCR if the standard deviation of their PCR efficiencies among the accessions under study was less than 10%. PCR primers that distinguished individual paleologous copies, as well as highly similar paralogues, and passed the thresholds set for the qPCR experiment, could be developed for nine out of the sixteen F3'5'H copies. The remaining copies were either highly identical in sequence or contained only a few polymorphic sites within DNA segments unsuitable for primer design. The average PCR efficiency of primer pairs among the accessions tested ranged from 87% in 'Marzemino' to 102% in 'Nebbiolo', with a similar average efficiency of 93% in 'Aglianico' and 'Grignolino'. This rules out a substantial cultivar effect of primer annealing efficiency, caused by possible SNPs in the annealing sites across haplotypes, on the estimated transcript levels of the whole gene family among cultivars.
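The efficiency calculation and the primer acceptance rule can be sketched in Python as follows. The sketch assumes that the slope is obtained by least-squares regression of Ct on log10 of the template quantity, that efficiency is reported as (E - 1) x 100, and that the 10% limit on the standard deviation is expressed in percentage points. None of these conventions is stated explicitly in the text, and all numerical values are invented for illustration.

```python
import statistics


def standard_curve_slope(log10_quantity, ct):
    """Least-squares slope of Ct regressed on log10(template quantity)."""
    mean_x = statistics.mean(log10_quantity)
    mean_y = statistics.mean(ct)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_quantity, ct))
    sxx = sum((x - mean_x) ** 2 for x in log10_quantity)
    return sxy / sxx


def efficiency_percent(slope):
    """E = 10^(-1/slope), reported as (E - 1) * 100 (assumed convention)."""
    return (10.0 ** (-1.0 / slope) - 1.0) * 100.0


def primer_pair_accepted(reference_efficiency, efficiencies_across_accessions):
    """Apply the two criteria: 90-110% in the reference genotype, SD below 10."""
    in_range = 90.0 <= reference_efficiency <= 110.0
    spread_ok = statistics.stdev(efficiencies_across_accessions) < 10.0
    return in_range and spread_ok


# Five 10-fold serial dilutions with hypothetical Ct values.
log10_quantity = [0.0, -1.0, -2.0, -3.0, -4.0]
ct_values = [18.0, 21.4, 24.8, 28.2, 31.6]

slope = standard_curve_slope(log10_quantity, ct_values)
reference_efficiency = efficiency_percent(slope)

# Hypothetical efficiencies of the same primer pair in several accessions.
accepted = primer_pair_accepted(reference_efficiency, [96.8, 91.0, 101.0, 94.0])
```

With these invented Ct values the slope is about -3.4, which corresponds to an efficiency a little below 97%, so the primer pair passes both criteria. A slope of about -3.32 would correspond to a perfect doubling per cycle (100%).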

Experimental design and statistics in expression and metabolite analyses
Variation in anthocyanin profile and in transcriptional level of duplicate genes among developmental stages and cultivars was studied using a completely randomized design, and tested for significance using ANOVA run with the COSTAT statistical package (CoHort Software, Monterey, CA, USA). Each plot consisted of 10 clonally replicated plants in a row, in north-south oriented rows.
Vines were grown at the germplasm repository of Vivai Cooperativi Rauscedo, northeastern Italy (46°04' N; 12°50' E; 110 masl). Vines were trained using the Sylvoz system. Three biological replicates of 20 berries per cultivar were collected at each developmental stage [see Additional file 14]. Berries of each replicate were collected in the vineyard on both sides of the canopy by random sampling on every plant within each plot. Samples were frozen immediately in liquid nitrogen and stored at -80°C until processed. Skin of each biological replicate was peeled from frozen berries, powdered in liquid nitrogen, and split to obtain a 100 mg aliquot for RNA extraction and a 200 mg aliquot for anthocyanin extraction. A three-way ANOVA was used to partition the factors that contributed to expression divergence in ripening fruit: gene-copy, cultivar and developmental stage, and their interactions. A two-way ANOVA was used to assess the effect of gene-copy and developmental stage on expression level, regardless of the cultivar. A one-way ANOVA was used to assess the same effect in each cultivar, as well as the differences in metabolite content and composition among cultivars. Statistically significant differences were determined using the Student-Newman-Keuls test (P < 0.05).
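As an illustration of the simplest of these analyses, the following Python sketch computes the F statistic of a one-way ANOVA for groups of biological replicates. It is not the COSTAT procedure used in this study, it does not implement the Student-Newman-Keuls comparison of means, and the three groups of three replicate values are invented.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values (arbitrary units) for three cultivars,
# three biological replicates each.
replicates = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat, df_between, df_within = one_way_anova(replicates)
```

For these invented data the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, giving F = 13. The P value would then be read from the F distribution with (2, 6) degrees of freedom.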

Transcript profiling
Total RNA was extracted as described in [61], treated with RNase-Free DNase I Set (Qiagen S.p.A., Milan, Italy), and purified with RNeasy MinElute Cleanup (Qiagen S.p.A., Milan, Italy) according to the manufacturer's instructions. Complete removal of gDNA was assessed by using treated RNA directly as a template for PCR reactions targeting the gene VvUbiquitin1. Absence of PCR products was verified by visual inspection of a 1% agarose gel stained with ethidium bromide. Absence of gDNA in reverse-transcribed samples was further confirmed by the melting curve performed during qPCR cycling using the intron-flanking primers for the normalisation gene VvUbiquitin1. The integrity of treated RNA was verified by electrophoresis in a 1% agarose gel stained with ethidium bromide. RNA purity (A260/A280) and concentration were estimated using a Nanodrop 1000 spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE, USA). cDNA was synthesised using 2 μg of treated RNA, 0.5 μM (dT)18 solution (5 PRIME GmbH, Hamburg, Germany, Cat# 2200800), and 200 nM of each forward and reverse primer. Thermal cycling parameters were: initial denaturation at 95°C for 3 min, followed by 40 cycles of 94°C for 15 s, 61°C for 20 s, and 68°C for 30 s, plate read at 78-82°C depending on each primer pair for 1 s, melting curve from 65°C to 95°C, read every 1°C, hold 1 s, and a final extension at 68°C for 5 min. Threshold cycle (Ct) was determined using the Opticon Monitor analysis software (version 2.02, MJ Research, Waltham, MA, USA) with a threshold level of fluorescence signal detection of log -1.7. Aliquots from the same cDNA were run in duplicate in the qPCR assay. Technical replicates differed by less than 1 Ct. All assays included no-template controls. Relative gene expression of the target gene was calculated with the 2^-ΔΔCt method, using the constitutively expressed housekeeping Ubiquitin gene (VvUbiquitin1) as the reference [6]. VvUbiquitin1 has been widely used by several research groups in qPCR experiments conducted in grapevine across various organs, in particular for berry samples. Semiquantitative PCR was performed after cDNA normalisation based on VvUbiquitin1 expression and visualised in a 1% agarose gel stained with ethidium bromide, or on an SSCP gel stained with silver nitrate.
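The relative quantification can be sketched in Python as follows. The sketch assumes the common form of the method, in which the Ct of the target is first normalised to the reference gene (VvUbiquitin1) and then compared with a calibrator sample, and in which amplification efficiency is close to 100%. The choice of calibrator is not stated in the text, so the calibrator and all Ct values below are hypothetical.

```python
import statistics


def mean_ct(technical_replicates):
    """Average the Ct values of technical replicates of the same cDNA."""
    return statistics.mean(technical_replicates)


def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Relative expression as 2^-ddCt, assuming ~100% amplification efficiency."""
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical duplicate Ct readings for one F3'5'H copy and the reference gene.
sample_target = mean_ct([24.1, 23.9])        # target gene, sample of interest
sample_reference = mean_ct([20.0, 20.0])     # reference gene, same sample
calib_target = mean_ct([27.0, 27.0])         # target gene, calibrator sample
calib_reference = mean_ct([20.1, 19.9])      # reference gene, calibrator sample

fold_change = relative_expression(sample_target, sample_reference,
                                  calib_target, calib_reference)
```

In this invented example the target is detected three cycles earlier in the sample than in the calibrator after normalisation, which corresponds to an approximately 8-fold higher transcript level.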

Additional material
Additional file 1: Chromosomal positions of F3'Hs and F3'5'Hs in the grapevine genome.
Additional file 2: Lack of gene collinearity around the isolated F3'5'Hp on chr8 and the F3'5'H multi-copy array on chr6. The positions of F3'5'Hs are shown as cyan ticks, gene models are shown in blue, and partial peptides are shown in grey, above and below the corresponding GEvo diagrams. Regions of sequence similarity were identified by comparing both DNA strands using GEvo, and are shown as red ticks or boxes. Red lines connect regions of similarity within gene models, all other regions of similarity are either microsatellite DNA or transposable elements. Gaps in the sequence assembly are indicated by orange boxes.
Additional file 3: Genome landscape in a 10-kb window around F3'5'Hs in the chr6 array. Exons are indicated as thick blue bars, introns are thin blue connectors. Coloured models indicate annotated TEs. Sequence gaps (Ns) in the PN40024 genome assembly are indicated by dotted red lines.
Additional file 4: Genomic organisation and transcription of two copies of F3'Hs present in the grapevine genome. In section a, exon/intron structure of F3'Hs is shown as blue boxes (exons) connected by blue lines (introns); TEs are shown as coloured boxes. In sections b and c, selective amplification of exon junctions astride the terminal intron and expression of each F3'H copy are shown. Two primer pairs (orange and green triangles) were designed in the internal and terminal exons. The terminal intron varied in size between 249 bp and 96 bp in F3'Ha and -b, respectively. Each primer pair anneals perfectly to the target F3'H, but has a mismatch at the 3'-terminal nucleotide with the paralogous F3'H. Selectivity of primer pairs for either F3'Ha or F3'Hb was validated by amplifying PN40024 genomic DNA and by Sanger sequencing of the PCR amplicons. Selectivity for either F3'Ha or F3'Hb was also confirmed by assessing the size of the amplified genomic DNA (vs. the size prediction of 523 bp and 370 bp astride the second intron in F3'Ha and F3'Hb, respectively) and, for the expressed F3'Ha, by inferring intron size from the comparison between amplicons from genomic DNA and cDNA. Expression of F3'Ha was assessed by semi-quantitative PCR using cDNA from leaf, petiole, tendril, flower, shoot, and berry skin and flesh, in two grapevine cultivars ('Merlot' and 'Aglianico'). Expression of F3'Ha was also assessed in berry skin of four cultivars ('Aglianico', 'Marzemino', 'Grignolino', and 'Nebbiolo') at four stages of fruit development. cDNA was normalised using the constitutive gene VvUbiquitin. Transcripts of F3'Hb were never detected under the same experimental conditions. In section d, expression of F3'Ha was assessed by quantitative PCR in berry skin at 8 developmental stages in the cultivars 'Aglianico', 'Marzemino', 'Grignolino', and 'Nebbiolo'. Transcript levels of F3'Ha increased at full-veraison (stage of 100% coloured berries) by approximately 2-fold in all cultivars, with substantial differences among cultivars only at harvest. Transcript levels are expressed as arbitrary units, normalised using the constitutive gene coding for VvUbiquitin. Bars represent the standard deviation of three biological replicates. Letters above the histograms indicate significant differences between means, based on a Student-Newman-Keuls test (P < 0.05).
Additional file 5: Evolutionary relationships among grapevine F3'5'Hs. Phylogenetic trees are inferred using the Maximum Parsimony method and are based on (a) mRNA sequence alignments of all grapevine F3'5'Hs and (b) intron sequences of F3'5'Hs that reside in duplicate blocks on chr6. The most parsimonious tree was obtained using the Close-Neighbor-Interchange algorithm with search level 3, in which the initial trees were obtained with the random addition of sequences. The rectangular and radiation trees are drawn to scale, with branch lengths calculated using the average pathway method, and are expressed in units of the number of changes over the whole sequence. There were a total of 419 positions in the mRNA dataset, out of which 39 were parsimony informative, and 1546 positions in the intron dataset, out of which 180 were parsimony informative. For each gene, tree topology is compared to genomic location. Bootstrap values >70 are reported above the corresponding branch. DNA sequences were aligned with ClustalX and trees were obtained using MEGA4. (c) Tree based on LAGAN alignments of 5' regulatory sequences 2-kb upstream of the translation start codon.
Additional file 6: Multiple alignments of non-coding DNA within each of 9 tandemly duplicated blocks in the F3'5'H locus on chr6. On top of each page, coloured bars indicate annotated TEs in the PN40024 genome; sequence gaps (Ns) in the genome assembly are indicated by dotted red lines. Plots of sequence identity range from 50 to 100% on the y-axis in the LAGAN multi-panels. The number of base pairs shared by each duplicated block with the reference block (on top) is given on the right-hand side, with the average nucleotide identity.
Additional file 7: Multiple alignments of non-coding DNA in 10-kb surrounding duplicate F3'5'H genes. In the panel on top of each page, F3'5'H exons are indicated as thick blue bars, introns are thin blue connectors. Coloured boxes indicate annotated TEs. Plots of sequence identity range from 50 to 100% on the y-axis in the LAGAN multi-panels.
Additional file 8: Conservation and SSCP polymorphisms of duplicate F3'5'Hs in the family Vitaceae. PCR amplicons were obtained from genomic DNA using copy-specific primers. DNA samples included the ornamental grapevines Virginia creeper Parthenocissus quinquefolia, native to northeastern America, and the porcelain berry Ampelopsis brevipedunculata, native to temperate areas of Asia (segment A); wild grapevines (segment B) including the 2n = 40 Muscadinia rotundifolia, two North American species V. riparia and V. candicans, two Asian species V. armata and V. romanetii, and a spontaneous ecotype of V. vinifera ssp sylvestris collected in woods of northeastern Italy; red-skinned cultivars of the domesticated V. vinifera ssp sativa (segment C); white-skinned cultivars (Pinot bud sports with mutations for skin colour are shown beside Pinot blanc) and the nearly-homozygous line PN40024 (segment D). PCR amplicons were run in agarose gel (section a) and in denaturing gel for detecting single-strand conformational polymorphisms (section b). Among F3'5'Hs, the isolated gene copies F3'5'Hp, -o, -m, and -n showed the lowest levels of conformational polymorphisms, while segmentally duplicated F3'5'Hs were more variable across taxa.
Additional file 9: Amino acid alignment of substrate recognition sites (SRS) and functional domains for hydroxylation activity (CR1) in plant F3'5'Hs. Amino acid positions crucial for 3' vs. 3'5'-hydroxylation in CR1 and SRS6 are indicated by black arrows; significant amino acid substitutions in grapevine F3'5'Hs are in green background. Relevant amino acid substitutions within domains putatively involved in substrate recognition are highlighted in grapevine F3'5'Hs by blue background when they are unique with respect to all other plant F3'5'Hs or when they are shared exclusively with either monocot F3'5'Hs or other plant F3'Hs, as possible remnants of ancestral transition stages in the evolution of dicot F3'5'Hs.
Additional file 10: Transcripts of duplicate F3'5'Hs detected in various organs of two grape cultivars by semiquantitative PCR. Bold + indicates high expression of PCR amplicons visualised on agarose gel stained with ethidium bromide (see Figure 7), regular + indicates weak expression detected only by the more sensitive silver staining, - indicates lack of detectable transcripts.
Additional file 11: Expression of duplicate F3'5'Hs in berry skin of four cultivars accumulating 3'5'-OH anthocyanins detected by semiquantitative PCR. Berry skin was sampled at four developmental stages. cDNA was normalised using the housekeeping Ubiquitin gene. UFGT was used as a marker for anthocyanin gene expression. Even though the pre-veraison berries were sampled over green bunches immediately before visible colour transition, expression of UFGT had already been triggered in 'Aglianico' and was barely detectable in 'Nebbiolo'. Either primer of the oligonucleotide pairs targeting the F3'5'Hi and -l copies anneals to either exon of the corresponding gene model. The corresponding PCR bands obtained from gDNA are approximately 400 bp longer than the cDNA amplicons shown in the stripes of the electrophoresis gel of this figure.
Additional file 12: Analysis of variance of duplicate F3'5'H expression in berry skin of four cultivars along eight developmental stages.
Additional file 14: Berry sampling in four red-skinned cultivars and a green-skinned cultivar (Tocai) across eight developmental stages.