Consensus pan-genome assembly of the specialised wine bacterium Oenococcus oeni

Background Oenococcus oeni is a lactic acid bacterium that is specialised for growth in the ecological niche of wine, where it is noted for its ability to perform the secondary, malolactic fermentation that is often required for many types of wine. Expanding the understanding of strain-dependent genetic variations in its small and streamlined genome is important for realising its full potential in industrial fermentation processes. Results Whole genome comparison was performed on 191 strains of O. oeni; from this rich source of genomic information consensus pan-genome assemblies of the invariant (core) and variable (flexible) regions of this organism were established. Genetic variation in amino acid biosynthesis and sugar transport and utilisation was found to be common between strains. Furthermore, we characterised previously-unreported intra-specific genetic variations in the natural competence of this microbe. Conclusion By assembling a consensus pan-genome from a large number of strains, this study provides a tool for researchers to readily compare protein-coding genes across strains and infer functional relationships between genes in conserved syntenic regions. This establishes a foundation for further genetic, and thus phenotypic, research of this industrially-important species. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2604-7) contains supplementary material, which is available to authorized users.


Background
Oenococcus oeni, formerly Leuconostoc oenos, is a member of the lactic acid bacteria (LAB) and is noted for its ability to perform malolactic fermentation (MLF) in wine, a deacidification reaction in which malic acid is decarboxylated to lactic acid [1]. MLF is particularly common in the production of red wines (although MLF is used in some white wines) where the decarboxylation reaction and associated by-products impart favourable sensory characteristics and decrease the likelihood of spoilage [2][3][4]. O. oeni is found on grapes or in the natural environment at very low levels, but can be commonly found in the hostile environment of wine, where it readily grows in low pH, presence of alcohol and scarcity of nutrients that inhibit the growth of other microbes [5][6][7].
In addition to spontaneously occurring MLF, purified strains of O. oeni are commonly added to wine as starter cultures to enable more reliable secondary fermentation [1][2][3][4]. Specific strains are selected for this purpose based on production of desirable flavour compounds and/or resilience to stresses such as acidity, ethanol, sulfites and phenolic compounds. Understanding the genotypic attributes of this species is important for identifying these industrially-relevant phenotypes.
In contrast to the historical use of a single strain genome as the de facto "reference" for any one species, the ongoing reduction in the cost of whole-genome sequencing now allows for large numbers of representatives from within the same species to be sequenced. Intra-specific comparison of the variation in coding potential of these strains has led to the conceptualisation of the pan-genome, the full complement of genes for a species [20,21]. Very recently, the first bacterial consensus "pan-chromosome", that of Acinetobacter baumannii, was assembled independently of any pre-assigned genome reference, identifying both invariant (core) and variable (flexible) regions within the chromosome [22]. This allowed for the order and orientation of core genes and flexible genomic regions to be established and led to the characterisation and comparison of clusters of functionally-related genes.
In this study, we have utilised this approach to assemble the first O. oeni pan-genome. In conjunction with 49 previously described genome sequences, we sequenced the genomes of a further 142 strains from commercial and environmental sources. By utilising this expanded set of strains, we have broadened the scope and scale of genomic comparisons and provided a genetic basis for phenotypic characterisations of this industrially-important microbe. Specifically, we report the pan-genome assembly and phylogenomic association of regions predicted to encode intra-specific variations in amino acid biosynthesis, sugar transport and utilisation and natural competence.

Results and Discussion
Genome sequencing of O. oeni
Genome sequences of 187 wine isolates and four cider isolates, predominantly originating from Australia and France, but also from Lebanon, USA, Switzerland and England, were used in this study, 49 of which have been previously characterised [10,11]. On average, the additional 142 genome sequences were each assembled from 450,000 Illumina sequencing reads (300 bp, paired-end library) into 390 contigs, forming a consensus sequence of 1,970,000 bp in size and with 2200 predicted protein-coding sequences.

Genetic diversity of O. oeni
Independent of coding-region predictions, the genetic relatedness of the various strains was deduced from the patterns of single-nucleotide polymorphisms (SNPs) from reference-based read mapping (Fig. 1). The resulting neighbour-joining dendrogram could be broadly split into two major genetic groups (A and B). Group A was in turn comprised of two subgroups: one genetically diverse and a second that contained a very large number of highly-related strains. Relative to Group A, Group B represents a highly-divergent clade comprised of genetically-distant strains. Consistent with previous reports [19], three out of four of the cider isolates cluster closely together in this group. However, while it has been suggested that this reflects domestication of O. oeni in a cider environment, the presence of numerous neighbouring wine-derived strains suggests that information from additional strains isolated from cider is required before any conclusions regarding the possibility of a cider-specific subset of O. oeni can be reached.
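As an illustration of how SNP patterns can be turned into the pairwise distances that a neighbour-joining method takes as input, the following minimal sketch computes the proportion of differing SNP sites between strains. The strain names and genotype strings are toy values, and the functions `snp_distance` and `distance_matrix` are hypothetical helpers; the read-mapping and tree-building software actually used in the study is not specified in this section.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> float:
    """Proportion of compared SNP sites at which two strains differ.

    Both strings must list the same SNP sites in the same order.
    Sites with a missing call ('N') in either strain are skipped.
    """
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "N" or y == "N":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def distance_matrix(genotypes: dict) -> dict:
    """Symmetric pairwise distance lookup keyed by (strain, strain)."""
    names = sorted(genotypes)
    dist = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        d = snp_distance(genotypes[p], genotypes[q])
        dist[(p, q)] = dist[(q, p)] = d
    return dist


# Toy genotypes (assumed, not taken from the study).
toy = {"strain_1": "ACGTAC", "strain_2": "ACGTAT", "strain_3": "TCGAAN"}
matrix = distance_matrix(toy)
```

A matrix of this kind is only the starting point; the dendrogram itself would be produced by passing the distances to a neighbour-joining implementation.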
The variety, region and year of isolation did not appear to influence the clustering of strains into specific groups, with the exception of the upper, closely-related group which is mostly comprised of Australian isolates (Fig. 1, Group A). Approximately 60 % of the known Australian isolates, but only 15 % of the known non-Australian isolates, clustered into this genetic group. Strains isolated from France, Switzerland and the USA were also found in this closely-related group, making assertions about the ancestral geographic origin of these strains difficult. Unfortunately, very little is known regarding the stage of fermentation these strains were isolated from. Strains in this group may represent a robust variety that is capable of out-competing other strains during fermentation. Sampling late in the fermentation would therefore result in over-representation of this phylotype. Another possibility is that strains in this group are well suited to Australian winemaking conditions and the enrichment of Australian isolates in this genetic group is actually an accurate representation of the broader Australian population. As the concept of regional identity is very important in the valuation of wine and can be influenced by the bacterial strains performing MLF, further investigation of whether Australian wines are typically dominated by this very closely-related subset of the O. oeni population, compared to other geographic regions, represents a research direction worth consideration.
The O. oeni pan-genome
Previous comparative genomic studies of much smaller cohorts of O. oeni strains revealed substantial genomic diversity between some isolates [8][9][10][11][12][23][24][25]. Despite these efforts, the full extent of the pan-genome remains unclear. The core- and pan-genome sizes of O. oeni were therefore determined for this large collection of strains using the pan-genome ortholog clustering tool, PanOCT [22,26]. Unlike pure sequence-based clustering tools, PanOCT differentiates paralogous and non-paralogous ORFs using the conserved gene neighbourhood to separate duplicated gene families. Thus, singleton clusters can be formed from insertion sequence (IS) elements that are in novel contexts, even though the IS elements are identical [22]. In this context, there were 1661 core clusters (partial or complete ORF sequences in ≥75 % of the strains) and 1950 variable clusters assembled from the 191 strains (Fig. 2a). In order to determine if the genetic diversity of O. oeni had been sufficiently sampled, medians and exponential law regressions were calculated from 500 randomly sampled combinations of 191 strains (Fig. 2b). The sequencing of approximately 50 strains was sufficient to estimate the final core-genome size of 1661 genes. The size of the pan-genome was predicted to continue to expand, albeit at a slowing rate, beyond the size calculated using 191 genomes (Fig. 2b), indicating that the O. oeni pan-genome is still "open". In addition, to ensure that potential bias derived from oversampling of closely-related strains, such as those in Group A, did not confound the core- and pan-genome estimates, 60 of the very closely-related genomes in Group A were excluded from the analysis. The estimations of core- and pan-genome sizes were not substantially different when compared to analysis of the complete set of genomes, indicating a negligible bias in the original calculations (Additional file 1: Figure S1). Using 500 iterations of 100 randomly sampled genomes, the median core-genome sizes were 1659 and 1631, and median pan-genome sizes were 3150 and 3162 for the full set and partial set respectively.
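The resampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the input is assumed to be a mapping from strain name to the set of cluster identifiers present in that strain, the function names are hypothetical, the core definition reuses the 75 % threshold given in the text, and the exponential regression step is omitted.

```python
import random
from statistics import median


def core_and_pan_sizes(strain_clusters, core_fraction=0.75):
    """Return (core, pan) cluster counts for a list of per-strain cluster sets.

    A cluster counts as core when it occurs in at least `core_fraction`
    of the supplied strains; the pan size is the number of distinct clusters.
    """
    n = len(strain_clusters)
    counts = {}
    for clusters in strain_clusters:
        for cluster in clusters:
            counts[cluster] = counts.get(cluster, 0) + 1
    core = sum(1 for k in counts.values() if k >= core_fraction * n)
    return core, len(counts)


def sampled_medians(presence, sample_size, iterations=500, seed=0):
    """Median core and pan sizes over random strain subsets of a fixed size."""
    rng = random.Random(seed)
    strains = sorted(presence)
    cores, pans = [], []
    for _ in range(iterations):
        chosen = rng.sample(strains, sample_size)
        core, pan = core_and_pan_sizes([presence[s] for s in chosen])
        cores.append(core)
        pans.append(pan)
    return median(cores), median(pans)


# Toy presence/absence data (assumed, not taken from the study).
presence = {
    "s1": {"c1", "c2", "c3"},
    "s2": {"c1", "c2"},
    "s3": {"c1", "c4"},
    "s4": {"c1", "c2", "c5"},
}
full_core, full_pan = core_and_pan_sizes(list(presence.values()))
```

Calling `sampled_medians` for a range of `sample_size` values gives the points of an accumulation curve of the kind summarised in Fig. 2b; a plateau in the core median and continued growth in the pan median correspond to the "open" pan-genome behaviour described in the text.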
Further genome sequencing is therefore expected to be required to characterise the entire spectrum of genetic diversity in O. oeni; however, additional variation is likely to be rare. Whilst prioritising future sequencing towards strains from winemaking locales not currently represented in this study may accelerate the discovery of additional genetic diversity, it is interesting to note that phylogenomic analysis of a small number of O. oeni strains from Italy [13][14][15] and South America [16,17], which were released after this analysis was initiated, indicates that these strains actually fall within the existing genomic clades and so may not provide a substantial amount of additional variation (Additional file 2: Figure S2).
The O. oeni genome has previously been described to contain regions likely to have been horizontally-acquired from members of the Lactobacillales [10]. To infer the evolutionary relationship of O. oeni on a larger complement of strains, BLAST best hits were attributed to each cluster. A total of 329 clusters (9 %) did not display O. oeni as a best match in the NCBI non-redundant dataset. As could be expected, all of these clusters were found within the variable (non-core) genome and indicate new ORFs that have not previously been identified in other annotated strains of O. oeni. These non-O. oeni clusters appear to originate from members of the order Lactobacillales, particularly the genera Lactobacillus, Oenococcus, Leuconostoc and Enterococcus (Fig. 2c). A comparatively small number of clusters appear to originate from outside Lactobacillales, including members of the orders Bacillales and Bacteroidales, and the phyla Actinobacteria, Bacteroidetes and Proteobacteria (Fig. 2c).

Pan-genome assembly
Substantial genomic variation in exopolysaccharide and amino acid biosynthesis, and sugar transport and utilisation has been reported previously for O. oeni [10]. Amongst bacteria, these variations are often due to the insertion of mobile elements or variable regions described as flexible genomic islands (fGIs), which usually contain highly conserved ORFs from bacteriophage [27][28][29][30][31][32][33][34]. Establishing a syntenic order of sequences was therefore critical for determining the orthology of genomic regions and for reflecting important functional relationships between genes [35][36][37].
To capture this information, a consensus core-genome and fGI assemblies were computed for the O. oeni pan-genome as described by Chan et al. [22] (Additional file 3). This methodology links clusters together based on the consensus of the layout of ORFs in individual de novo genome assemblies. In this study, the PSU-1 strain was used as a basal reference sequence to initially guide the arrangement of the clusters and this ultimately resulted in a core-genome assembly that closely resembles the arrangement of the PSU-1 genome (Fig. 3a). To help elucidate whether individual ORFs in a cluster are likely to be truncated, the lengths of each peptide sequence were calculated as a percentage of the longest peptide sequence in the cluster (Additional file 3).
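The relative-length calculation used to flag possible truncations can be expressed in a few lines. The sketch below assumes a simple mapping from strain name to peptide length for one cluster; the function name and the example lengths are hypothetical, and no cut-off for calling an ORF truncated is implied.

```python
def percent_of_longest(cluster_lengths):
    """Express each peptide length as a percentage of the longest in the cluster."""
    longest = max(cluster_lengths.values())
    return {
        strain: 100.0 * length / longest
        for strain, length in cluster_lengths.items()
    }


# Toy peptide lengths in amino acids for a single cluster (assumed values).
lengths = {"strain_1": 200, "strain_2": 200, "strain_3": 90}
relative = percent_of_longest(lengths)
# relative["strain_3"] is 45.0, marking a candidate truncation for inspection
```

Values well below 100 % point to ORFs worth inspecting for frameshifts, premature stops or contig breaks, which is how the heat map in Additional file 3 is described as being used.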
Clusters at the beginning and end of the core-genome assembly (defined as a composition of clusters which contain sequences from at least 75 % of the strains) were linked, indicating assembly into a circular topology, as expected. The core-genome assembly revealed several large regions, up to 25 ORFs in size, which were completely absent in some clades of the genetic relatedness dendrogram (Fig. 3a, Additional file 3).
In addition to generating an assembled consensus core-genome, fGIs were also assembled. The fGIs were exclusively linear in topology and were located in specific clades of the relatedness dendrogram (Fig. 3b, Additional file 3). A total of 1950 clusters were assembled into 390 fGIs, the largest of which represents a bacteriophage insertion containing 52 ORFs. Interestingly, several fGIs were found to be unique to the closely related genetic group that consists mostly of Australian isolates.

Amino acid biosynthesis
O. oeni has previously been reported to exhibit a variety of amino acid auxotrophies, with many strains showing intra-specific genomic differences [10][38][39][40][41][42][43][44]. Early genomic analysis of the PSU-1 strain using the COG database [45] suggested the capacity for biosynthesis of eight amino acids: alanine, aspartate, asparagine, cysteine, glutamine, lysine, methionine and threonine [9]. In this expanded collection of strains and utilising KEGG, RAST and BLAST annotations, pathways leading to the biosynthesis of nine amino acids were observed in at least one strain (Table 1).
Fig. 3 Visualisation of the core-genome and fGI assemblies. Full versions of the annotated assemblies are available in Additional file 3. a. Core-genome assembly of 1661 clusters. b. Concatenated fGI assemblies of 1950 clusters into 390 fGIs
The presence or absence of the complete sets of enzymes for each of these pathways in each strain was compiled and correlated with the genetic relatedness dendrogram (Fig. 4a) and highlighted in a pathway overview (Fig. 5). The complete pathways to synthesise glutamine, glycine, serine, cysteine, proline, aspartate and threonine were found to be conserved across the majority of strains. The ability to synthesise aspartate from lactic and malic acids was predicted to be disrupted in certain phylogenomic clades due to the presence of a frameshift mutation in pyruvate orthophosphate dikinase (EC 2.7.9.1), which is responsible for the conversion of pyruvate into phosphoenolpyruvate. Threonine biosynthesis deficiencies were also observed in specific clades. The cause of the loss of threonine biosynthesis capability differed between strains: the deficient enzyme varied, and in homoserine kinase (EC 2.7.1.39) in particular, two different truncated versions of the peptide sequence were observed. The ability to synthesise leucine and arginine was predicted to exist in a small proportion of strains and was typically restricted to several small clades. Loss of a functional leucine biosynthesis pathway was attributed to mutations within 3-isopropylmalate dehydrogenase (EC 1.1.1.85) and isopropylmalate isomerase (EC 4.2.1.33). Similar to threonine biosynthesis, loss of the complete arginine biosynthesis pathway is attributed to several different mutations within genes from throughout the entire pathway. The ability to synthesise a tenth amino acid, alanine from aspartate, remains a possibility in three strains (AWRIB879, AWRIB708, AWRIB202) since the ORF encoding an aspartate 4-decarboxylase (EC 4.1.1.12) was found in fGIs excluded from the assembly because they contained fewer than three ORFs. In addition to these biosynthesis pathways, several incomplete pathways were also observed (Table 2, Fig. 5).
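The pathway-level presence/absence call described in this section amounts to asking whether every enzyme required by a pathway is encoded, in functional form, by a given strain. The following minimal sketch shows that logic using set containment. The function name, the strain names and their enzyme sets are hypothetical, and the requirement sets are deliberately partial: they contain only the EC numbers named in the text above, not the full KEGG modules used in the study.

```python
def complete_pathways(strain_enzymes, pathway_requirements):
    """Return the names of pathways whose required enzymes are all present."""
    return {
        name
        for name, required in pathway_requirements.items()
        if required <= strain_enzymes  # subset test: nothing required is missing
    }


# Partial, illustrative requirement sets built from EC numbers named in the
# text; the real pathways involve additional enzymes.
requirements = {
    "leucine (partial)": {"1.1.1.85", "4.2.1.33"},
    "alanine from aspartate": {"4.1.1.12"},
}

# Hypothetical strains and their functional enzyme complements.
strains = {
    "strain_x": {"1.1.1.85", "4.1.1.12"},
    "strain_y": {"1.1.1.85", "4.2.1.33"},
}

calls = {
    name: complete_pathways(enzymes, requirements)
    for name, enzymes in strains.items()
}
```

Because a single missing or truncated enzyme is enough to break a pathway, different strains can lose the same capability through different genes, as reported here for threonine and arginine biosynthesis.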
Since amino acid concentrations are low in wine, amino acid biosynthesis capabilities are considered to be an important growth requirement. Early phenotypic studies predicted between five and thirteen amino acids to be essential for the growth of different strains of O. oeni [39][40][41]. Furthermore, at least two organic acids, malic and citric acid, were involved in the biosynthesis of aspartate-derived amino acids [42]. A recent study which utilised a more sensitive methodology reported that two different O. oeni strains were auxotrophic for 13 and 16 amino acids, respectively [43]. Of the 16 essential amino acids found in one of these strains, only 8 were found to be essential in alternate strains from previous phenotypic studies, possibly reflecting substantial intra-specific variation. Across the 191 genomes analysed in this report, between 11 and 15 amino acids were predicted to be impossible to synthesise from other amino acids or organic acids (Fig. 4a, Fig. 5).
O. oeni and other lactic acid bacteria are often described as having exacting nutritional requirements. Commercially used starter cultures are often selected based on their resilience to wine stress conditions such as ethanol concentration, pH and temperature. Despite this, the rate of MLF can be substantially affected by nutrient availability [44], often resulting in sluggish or stuck fermentations. Comprehensive characterisation of amino acid auxotrophies can be useful for identifying essential nutritional requirements to help assess the suitability of wines or added nutrients for microbial growth and fermentability. Furthermore, it may also be used to assess the microbial stability of finished wines [43].

Phosphotransferase enzyme systems in fGIs
The range of sugars that O. oeni is capable of utilising is strain dependent [46]. Previous studies have revealed intra-specific variation in the phosphotransferase system (PTS) enzyme II sugar transporters [25]. Similar to the characterisations of amino acid biosynthesis, variation in PTS enzyme II components (typically consisting of IIA, IIB, IIC and occasionally IID subunits) was analysed in this expanded set of strains (Fig. 4b). Four phosphotransferases, containing all of the required subunits, were conserved in the majority of strains: mannose-specific II, galactitol-specific II, cellobiose-specific II and beta-glucoside-specific II. Two phosphotransferases were observed to correspond to specific clades: the fructose-specific II and ascorbate-specific II. The full complement of subunits of the fructose-specific II transporter was conferred by the presence of an fGI encoding fructose-specific IIB and IIC components. This fGI was comparatively large with 29 ORFs encoding various cell wall related proteins (Additional file 4: Figure S3A) and generally corresponded to the Group A clade. For the ascorbate-specific II transporter, the majority of strains encoded the ascorbate-specific IIA and IIC subunits; however, only certain clade-specific strains encoded the ascorbate-specific IIB subunit.
In addition to these differences, two highly strain-specific phosphotransferases were observed: sucrose-specific II and lactose-specific II. The sucrose-specific IIA and IIBC subunits occurred in an fGI specific to the strain BAA-1163. Upon further investigation, the fGI encoding these subunits was predicted to also encode additional sucrose-related proteins including sucrose operon repressors and both a partial and complete sucrose-6-phosphate hydrolase. This group of ORFs would therefore be expected to allow for the perception, transport and metabolism of sucrose (Additional file 4: Figure S3D). The ORF encoding the lactose-specific IIA component was predicted to be present only in the S13 strain whereas the IIB and IIC components were commonly found in other strains. It is interesting to note that despite O. oeni existing in a relatively specific ecological niche, this bacterium retains diversity in the specific collection of PTS systems encoded in each strain.
Fig. 4 Intra-specific differences in amino acid biosynthesis, sugar transport and utilisation and natural competence. ORFs which contained a contig break are shaded in a lighter colour. a. Intra-specific differences in amino acid biosynthesis. Each pathway requires multiple enzymes, as described by their KEGG module numbers. b. Intra-specific differences in PTS components. Each sugar-specific system requires multiple subunits (typically IIA, IIB, IIC and occasionally IID). c. Intra-specific differences in the genes involved in five-carbon sugar utilisation, as described in Fig. 6. d. Intra-specific differences in the genes encoding natural competence proteins
Fig. 5 Overview of amino acid biosynthesis pathways in O. oeni. KEGG, RAST and BLAST annotations were used to determine the presence of ORFs associated with amino acid biosynthesis across 191 strains. Pathways containing the full set of required genes, mostly between two amino acids (highlighted in yellow), are highlighted in blue and represented in Fig. 4a. ORFs forming incomplete pathways are highlighted in green. Pathways to make nine different amino acids were observed

Five-carbon sugar utilisation pathways in fGIs
It has been previously reported that O. oeni exhibits strain-dependent sugar utilisation phenotypes, particularly with the five-carbon sugars arabinose, xylulose and xylose, and the metabolic pathways for arabinose and xylulose utilisation have previously been shown to be strain-specific [10,46]. By comparing this larger set of strains, it was possible to define the extent of the arabinose and xylulose utilisation pathways (Fig. 4c and Fig. 6). L-arabinose utilisation is encoded by a set of three enzymes (L-arabinose isomerase EC 5.3.1.4, L-ribulokinase EC 2.7.1.16 and ribulose-phosphate 4-epimerase EC 5.1.3.1) (Fig. 6) which were present in the core-genome assembly, indicating that they were present in at least 75 % of the strains. However, the enzyme required for the hydrolysis of the arabinose polymer arabinan (alpha-N-arabinofuranosidase EC 3.2.1.55) was only found in a subset of strains predominantly found in Group B of the genetic relatedness dendrogram (Fig. 1 and Fig. 4c). Three enzymes responsible for L-xylulose utilisation (L-ribulose-5-phosphate 4-epimerase EC 5.1.3.4, L-xylulose 5-phosphate 3-epimerase EC 5.-.-.- and L-xylulokinase EC 2.7.1.53) (Fig. 6) were found to be encoded in adjacent positions within the same fGI (Additional file 4: Figure S3C) and generally appeared in a closely-related clade in Group A of Fig. 1.
In addition to these pathways, it was also possible to define an fGI that is predicted to encode the ability to utilise D-xylose via the pentose phosphate pathway, the first time that this pathway has been described in O. oeni. This fGI is predicted to encode the two enzymes required to convert xylose to xylulose-5P (xylose isomerase EC 5.3.1.5 and xylulose kinase EC 2.7.1.17) (Fig. 6), in addition to a xylose transcriptional regulator and a D-xylose proton-symporter (XylT), and was generally confined to a single specific phylogenomic clade (Additional file 4: Figure S3B). It is interesting to note that these two fGIs (Additional file 4: Figure S3B and C) correspond to different clades. Given that these genomic regions are not found in other clades, it is tempting to hypothesise that specialisation of O. oeni in an environment composed of residual five-carbon sugars like xylose and arabinose (i.e., in wine) has directed the acquisition of these regions in different instances throughout the course of evolution.

Natural competence of O. oeni
Many bacteria are naturally competent and able to actively transport environmental DNA fragments across their cell envelope and into their cytoplasm [47][48][49][50][51][52]. Competence represents an important mechanism to allow for horizontal gene transfer as well as providing access to nutrients. Uptake of extra-cellular DNA in Gram-positive bacteria, such as O. oeni, requires a suite of proteins which include DNA receptors (ComEA), transmembrane pores (ComEC), transformation pili (ComGC), ATP-dependent translocases (ComFA) and additional proteins encoded by the ComG operon. Substantial intra-specific diversity within O. oeni was observed for these natural competence proteins (Fig. 4d). Interestingly, the highly diverse clade (Group B in Fig. 1) retains full-length peptide sequences for proteins that appear truncated elsewhere on the tree. For example, strains in Group B mostly retained full-length versions of ComEA whereas other strains contained one of three different frameshift mutations in the gene encoding ComEA, all resulting in premature stop codons (Fig. 7). ComEA is a bitopic membrane protein often described as being obligatory for natural genetic transformations. ComEA consists of a transmembrane N-terminal domain and a C-terminal domain outside the cytoplasmic membrane [53,54]. The C-terminal domain contains a helix-hairpin-helix DNA-binding motif which is the structural basis for non-sequence-specific recognition of DNA [55]. Two of the three frameshift mutations preclude the entire DNA-binding motif from being encoded and this is anticipated to have an adverse effect on the ability of O. oeni to bind DNA from the extracellular environment. It is unknown whether the predicted ORF downstream of a premature stop is transcribed in vivo (Variant E, Fig. 7); however, the loss of a large N-terminal end would presumably affect the functionality of this protein.
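To make concrete how a single-base frameshift can truncate a protein, the following sketch translates a short toy coding sequence before and after deletion of one nucleotide. The sequences are invented for illustration and are not ComEA; the codon table is the standard genetic code, built from the conventional TCAG ordering, and the function `translate` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG order
# (third base varies fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy sequence (assumed): ATG GCT TTA GAT CCG AAA TAA
wild_type = "ATGGCTTTAGATCCGAAATAA"
# Deleting the fourth base shifts the frame to ATG CTT TAG ...
frameshift = wild_type[:3] + wild_type[4:]

full_length = translate(wild_type)
truncated = translate(frameshift)
```

Here the intact sequence encodes six residues, whereas the shifted frame meets a stop codon after two. The same mechanism, at a larger scale, is what removes the C-terminal DNA-binding motif in the truncated ComEA variants described above.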
Conceivably, retention of the functional versions of ComEA and other competence proteins has allowed for a protracted evolutionary divergence of Group B, as evidenced by the higher inter-strain branch lengths in the phylogeny (Fig. 1). With the exception of ComGC, all the genes encoding these proteins were found in the core-genome assembly. Presence of the truncated versions of these proteins in the core-genome may indicate that most modern-day O. oeni strains share a naturally competent ancestor but have lost this competence by processes such as genome decay. Since O. oeni has become specialised to a relatively stable, abundant, simplified and less competitive ecological niche, the ability for it to adapt to environmental conditions by taking up extracellular DNA is presumably no longer essential for survival and may actually serve to disrupt its already specialised genome. To this day, the ability to reproducibly transform O. oeni for research purposes remains a considerable challenge. Given the intra-specific variations in their DNA uptake machinery, careful selection of strains which may be more amenable to transformation provides a sensible avenue for researchers to explore.

Conclusions
Like other industrial species, phenotypic variation in O. oeni will have direct economic consequences through impacts on product quality and production efficiencies. This study has conducted the largest pan-genome analysis of O. oeni to date and expanded upon previous comparative genomic approaches by providing a consensus pan-genome assembly. The pan-genome assembly provides a powerful tool for researchers to compare protein-coding genes across a large number of strains with the added benefit of being able to infer likely functional relationships between genes in conserved syntenic regions. The applicability of the pan-genome assembly was demonstrated in this study by substantially expanding upon previous observations of intra-specific variation in amino acid biosynthesis and sugar transport and utilisation as well as characterising previously unreported variability in natural competence. Compilation of this vast amount of genomic information can be used to inform research on the industrial implications by allowing for identification of strains with combinations of desirable genetic, and therefore phenotypic, characteristics.

Strains and growth conditions
Strains used in this study are listed in Additional file 5. Strains were selected to represent a cross-section of commonly used commercial strains, in addition to Australian environmental isolates present in the AWRI culture collection. The strains were prepared by growing each strain in MRS (Amyl Media, Australia) supplemented with 20 % apple juice [56] for between six and ten days at 27 °C. DNA was prepared by phenol-chloroform extraction as previously described [27].

Genome sequencing and assembly
Genome sequencing was performed at the Ramaciotti Centre for Gene Function Analysis (University of New South Wales, NSW, Australia) using the Illumina MiSeq platform and 2 × 300 bp paired-end sequencing reads with a target depth of 60x coverage. For each strain, reads were assembled using MIRA (v 4.0rc5) [57] and potential coding regions were predicted using GLIMMER v3.0.2 [58]. These Whole Genome Shotgun projects have been deposited at DDBJ/EMBL/GenBank under the BioProject accession PRJNA304199.

Pan-genome annotation and assembly
Clusters of orthologous proteins were generated by PanOCT v3.23 [22,26] using default parameters. The resulting centroid sequences were annotated using BLAST [61], KAAS via the KEGG website [62] and RAST [63]. Consensus core and fGI assemblies of the pan-genome were calculated using the script, gene_order.pl, and the 75_core_adajacency_vector.txt output from PanOCT. Spurious fGIs potentially due to random IS elements or bad gene calls, defined as fGIs containing fewer than three ORFs, were removed and compiled in Additional file 3. To eliminate redundancy, the core-genome centroids present at both ends of the fGI assemblies were trimmed. The core and fGI assemblies consisting of annotated centroids were then compiled into a spreadsheet (Additional file 3). The percentage length of each peptide relative to the longest peptide in the corresponding centroid was calculated and used to generate a heat map for the compiled assemblies. For intra-specific comparisons, such as summarised in Fig. 4, a functional version of an ORF was defined as an ORF length being