High quality de novo sequencing and assembly of the Saccharomyces arboricolus genome
© Liti et al.; licensee BioMed Central Ltd. 2013
Received: 22 May 2012
Accepted: 10 October 2012
Published: 31 January 2013
Comparative genomics is a formidable tool to identify functional elements throughout a genome. In the past ten years, studies in the budding yeast Saccharomyces cerevisiae and a set of closely related species have been instrumental in showing the benefit of analyzing patterns of sequence conservation. Increasing the number of closely related genome sequences makes the comparative genomics approach more powerful and accurate.
Here, we report the genome sequence and analysis of Saccharomyces arboricolus, a yeast species recently isolated in China, that is closely related to S. cerevisiae. We obtained high quality de novo sequence and assemblies using a combination of next generation sequencing technologies, established the phylogenetic position of this species and considered its phenotypic profile under multiple environmental conditions in the light of its gene content and phylogeny.
We suggest that the genome of S. arboricolus will be useful in future comparative genomics analysis of the Saccharomyces sensu stricto yeasts.
The budding yeast, Saccharomyces cerevisiae, is a leading system in genomics due to the small genome size (12 Mb) and the availability of powerful genetic techniques. Genome sequencing of multiple hemiascomycete yeasts and multiple individuals from several species have allowed the application of a range of powerful comparative approaches. Comparative genomics have revealed evolutionary mechanisms that shape genomes and provided a formidable tool for assigning function to DNA sequence[1, 2].
The closely related sensu stricto Saccharomyces species (S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. arboricolus and S. bayanus) provide a clade with multiple genetically tractable species. The genome sequence of several sensu stricto species[4, 5] revealed a level of nucleotide divergence comparable to that between humans and birds yet a level of structural variation comparable to that between humans and chimps. Comparisons of genome structures have provided insight into mechanisms of genome evolution and speciation. For example, the presence of a limited number of genomic rearrangements that are not consistent with the phylogeny, provide strong evidence against the chromosomal speciation model.
Sequence comparisons between the sensu stricto species have allowed improved genome annotation[4, 5]. Sequence conservation allowed the identification of additional small open reading frames and the refinement of translation start and stop positions. Lack of sequence conservation resulted in the elimination of spurious open reading frames. Combining experimental data for protein binding sites with sequence conservation allowed the identification of functional DNA sequences[8, 9]. The power of these and other comparative genomic approaches rely upon the number of species sequenced, the evolutionary divergence of the selected species and the quality of the assembled genome sequence.
Recently the yeast Saccharomyces arboricolus was isolated from the bark of the Fagaceae tree in China. The S. arboricolus karyotype is consistent with the other sensu stricto species in terms of chromosome number and size. Sequence information (limited to a portion of the rDNA) unambiguously grouped this species within the sensu stricto complex. S. arboricolus can form viable hybrids with the other sensu stricto species but resulting gametes are not viable. Together these data demonstrate that S. arboricolus is a novel sensu stricto species.
Here, we report high-quality sequence and assembly of the S. arboricolus genome (type strain H-6T; CBS 10644TT) by combining two deep sequencing platforms. We report chromosome size scaffolds, genome annotation and synteny analysis. Genome wide phylogenetic analysis places S. arboricolus between S. bayanus and S. kudriavzevii in the sensu stricto phylogenetic tree. Finally, we considered the phenotypic profile of S. arboricolus under multiple environmental conditions in the light of its gene content and phylogeny.
Genome sequence and assembly
Deep sequencing metrics
Mean mapped read length (bp)
Roche 454 Fragment
Roche 454 8 kb Paired
Life Technologies SOLiD
De novo assembly of the Roche 454 (fragment and pair-end) reads was performed using the Newbler algorithm (see Methods). This resulted in 290 contigs (≥500 bp; N50 117,280 bp) that were joined using the Roche 454 paired-end reads to give 35 scaffolds. There are 17 scaffolds that are comprised of a single contig (2024 - 5644 bp) and one scaffold comprising of two contigs (9948 bp; Additional file1: Table S1). The remaining 17 scaffolds account for >99% of the assembly and are between 72 and 1246 kb long. The total base count of the assembly, 11.6 Mb, is comparable to the physical genome size predicted from the karyotype and is similar to the completed S. cerevisiae genome sequence (12.1 Mb) and genome sequence of other sensu stricto yeasts (11.6 - 11.9 Mb)[3, 13].
The coverage of mapped SOLiD reads gives a measure of sequence copy number and can be used to reveal repeat regions that have collapsed during assembly. Overall we find a scarcity of high-coverage regions (Additional file2: Figure S1), implying that there has been very little collapse of repeat regions during assembly. Short (<100 bp) regions of elevated copy number frequently correspond to highly repetitive tandem repeats and/or homopolymer tracks. Longer regions of elevated copy number are predominantly subtelomeric with the noteworthy exception of the rDNA repeats on chromosome XII.
The resulting S. arboricolus genome assembly comprises of whole chromosome scaffolds with only 186 gaps. These gaps have an average size of 1206 bp, the smallest two are just 1 bp, and the largest is 5846 bp. These regions consist of complex or repetitive sequences resulting in poor mapping of the SOLiD data. By comparison to recently improved assemblies of other sensu stricto yeasts, our S. arboricolus genome sequence has a higher proportion of the sequence (>99% compared with 96-98%) in a smaller number of scaffolds (35 compared with 147-226). Therefore, after the ‘gold standard’ S. cerevisiae, our S. arboricolus genome sequence represents the next most complete assembly.
Genome structure and annotation
The gene content of sensu stricto budding yeast species is thought to be similar[18, 19], therefore we used comparative gene-annotation methods based on the well-annotated S. cerevisiae proteome to identify and annotate the ORFs in the S. arboricolus genome. Using exonerate, we aligned each S. cerevisiae protein to the S. arboricolus genome (see Methods). We assigned the top matching S. arboricolus ORF (based on the exonerate score) as a putative ortholog to each S. cerevisiae protein. We then compared the neighbouring genes of each S. cerevisiae gene with the neighbours of the putative orthologous S. arboricolus ORF to define a first set of 4798 orthologous gene pairs where the gene order has been conserved, which we refer to as “syntenic orthologs”. Because this method uses the best sequence match, missing assignments of syntenic orthologous ORFs may occur when the match with greatest sequence similarity is not the syntenic ortholog. To overcome this problem, we again used exonerate but emphasized the position of the predicted ORF, allowing the score to be slightly below the best scoring match (see Methods). An additional 519 S. cerevisiae genes had a high-scoring, but not top match with the expected syntenic gene pair. We considered these to be syntenic orthologs as sequence similarity together with gene order conservation is thought to be a more reliable indicator of orthology than sequence similarity alone. To identify genes that may be found in S. arboricolus, but not in S. cerevisiae, we used Genemark, which is a de novo gene prediction method, and does not rely on sequence similarity (see Methods) and identified 106 genes that were not predicted using exonerate. These Genemark predictions contain novel genes and ORFs that were missed by exonerate as only the best hit from exonerate was considered in our gene prediction.
We also searched for tRNA coding sequences within the 16 chromosomes using tRNAscan-SE and annotated whether they were syntenic using a similar strategy to that described above (see Methods). In total, 257 tRNAs were found, 252 of which are syntenic with S. cerevisiae tRNAs. Next we used BLAST to search for the presence of repetitive elements in the genome such as subtelomeric genes and Ty elements. We detected the most distal subtelomeric element, Y’, in the genome sequence. This element is therefore present in all the sensu stricto species except S. bayanus[26, 27]. We also detected Ty2 element sequences using as a query the region that does not share similarity with the Ty1 element (1.7-kb ClaI Ty2- specific sequences).
We next set out to estimate the evolutionary distances between the species in the sensu stricto clade. To do so, we used the phylogeny as determined above, and computed codon-based maximum likelihood estimates (see Methods) of evolutionary distance based on alignments of 3899 genes for which we had 1 to 1 orthologs in all of the sensu stricto species, and that were syntenic between S. arbicolus and S. cerevisiae. We computed the median branch lengths (in substitutions per codon) for these, and they are shown in Figure5.
S. arboricolus web and strain resources
To make the S. arboricolus genome sequence available and to facilitate analysis, we have made available a number of web resources (http://www.moseslab.csb.utoronto.ca/sarb). These include genome sequence, annotation and datasets for genes and proteins. The gene and protein sets are annotated based on the S. cerevisiae ortholog systematic name. Novel putative ORFs identified by Genemark are also reported. A BLAST server and a genome browser (Gbrowse and GBrowse_syn) are available. The S. arboricolus genome browser offers the opportunity to view and compare the genome structure of S. cerevisiae, S. arboricolus and S. bayanus. Using S. arboricolus as the central reference species, gene order conservation or chromosomal rearrangements between the three species can easily be observed. Genes in S. arboricolus are coloured differently based on their annotation (e.g. syntenic orthologs, Genemark predictions, etc.). Finally, to facilitate experimental analysis of S. arboricolus the HO gene was disrupted (in the type strain H-6) and stable haploids were generated.
S. arboricolus phenotypic landscape
Among the rare examples of superior S. arboricolus performance were better utilization of the sugar alcohol mannitol, one of the most abundant energy storage molecules in nature and tolerance to biotin depletion, rarely observed in S. cerevisiae due to ancestral loss of the biotin synthesis genes BIO1 and BIO6. Both BIO1 and BIO6 are present as conserved cistrons in S. arboricolus strain CBS 10644TT (Figure6B), explaining the biotin auxotrophy. S. arboricolus also featured consistently good utilization of the monosaccharide galactose (Figure6C), a highly variable trait in both S. cerevisiae and S. kudriavzevii due to frequent loss-of-function mutations emerging in different lineages of these species that impair growth on galactose (e.g. the reference strain S288C). The coding sequences of the GAL pathway genes are also fully conserved in S. arboricolus strain CBS 10644TT. Interestingly, S. arboricolus strain CBS 10644TT has also retained an intact melibiase encoding MEL1, which is lost in most S. cerevisiae lineages; all three S. arboricolus isolates also utilized the disaccharide melibiose, a less common plant energy storage compound, with a vastly superior rate and efficiency.
Overall, the three S. arboricolus isolates showed virtually identical trait profiles (Pearson correlation, r=0.73-0.91). Moreover, using data from Warringer et al., we found S. arboricolus traits to closely mimic those of its relatives, S. bayanus and S. kudriavzevii (Figure6D). These remarkable trait similarities, encompassing mitotic performance in a wide variety of environments, imply that these three species, despite billions of generations of separation, are adapted to rather similar ecological conditions. The few cases of S. arboricolus deviations from S. bayanus and S. kudriavzevii primarily included nitrogen utilization traits, such as superior S. arboricolus utilization of ammonium, glycine and ornithine, but inferior utilization of methionine, serine and citrulline (Figure6E). Presumably, this reflects differences in nitrogen storage compounds among plant species that dominate the main habitats of these species and hint at ecological factors that may have driven speciation of the ancestral lineages. Further indications as to the nature of these factors may also be found in the reduced tolerance of S. arboricolus to malic acid, concentrated in e.g. apples, and to the toxin geneticin, produced by bacteria of the Micromonospora genus (Figure6E).
Our approach of using multiple high-throughput sequencing strategies resulted in high quality genome sequence that continuously covers the large majority of the S. arboricolus genome. The de novo assembly revealed that S. arboricolus is largely syntenic to S. cerevisiae, similar to the other sensu stricto species. We mapped and validated a single reciprocal translocation that occurred in the S. arboricolus lineage and identified a few additional small-scale rearrangements. Our assembly extends into the subtelomeric regions of most chromosome ends. However, these repetitive sequences pose a major challenge to genome assemblies and the subtelomeric structure presented here will benefit from further experimental validation.
Sequence analysis unambiguously revealed the position of S. arboricolus in the phylogenetic tree (Figure5; correcting the previously reported position based on a limited amount of ribosomal sequence). Phylogeny of individual genes revealed a limited number of conflicting tree topology as has been previously reported for other sensu stricto species. We did not observe large segments of the genome (for example, equivalent in size to the average gene) with high similarity with other species as signature of introgression as previously reported in other species and strains[35–37].
We have looked for the presence and absence of middle repetitive elements such as Ty and subtelomeric genes. For some of these elements the presence and absence is consistent with the phylogeny. We detected the subtelomeric element Y’ in S. arboricolus, indicating that this element entered in the sensu stricto ancestor after the divergence of S. bayanus. Much more puzzling is the phylogeny of Ty2, present in S. cerevisiae, S. mikatae and S. arboricolus but absent in S. paradoxus, S. kudriavzevii and S. bayanus. Both multiple loss and acquisition can explain the scattered phylogeny but are unlikely events. One possibility is a recent exchange of Ty2 among these species (horizontal transfer) as supported by high sequence similarity. A possible mechanism is the ability of these species to fuse their cytoplasms, without progressing to karyogamy, and allowing the exchange of Ty particles that can self-propagate in the genome.
Phenotype analysis demonstrated a remarkable similarity in trait profiles between S. arboricolus and both S. bayanus and S. kudriavzevii (Figure6D). These similarities suggest that these three species have adapted to similar environmental niches. The limited number of phenotypic differences between these species may reflect the specific nutrients available within each species habitat. Phenotypic comparisons between S. cerevisiae and S. arboricolus frequently reflect differences in gene content, including the sensitivity of S. arboricolus to elevated Li+ and Cu2+ and the ability of S. arboricolus to utilize melibiose.
So far, the Chinese isolates of S. arboricolus are the only ones available. Future surveys will reveal whether this species is limited to this region or whether other geographic populations exist. It is interesting to note that two of the Saccharomyces species have only been isolated in Asia (S. mikatae in Japan and S. arboricolus in China) despite extensive surveys in similar environments in other continents[38–40].
The Saccharomyces sensu stricto complex offers a powerful range of sequence divergences that have allowed the mapping of functional elements[4, 5], improved genome annotation and comparisons of genome organization[7, 19, 41]. Genome sequencing has revealed levels of divergence ranging from 0.1 - 0.6% among S. cerevisiae strains, 1.5 - 4.5% between geographic subpopulations of S. paradoxus, 6% between S. bayanus var. uvarum and S. eubayanus, and 15% - 30% between S. cerevisiae and the other sensu stricto species[4, 5]. The S. arboricolus genome sequence should enhance the power of comparative genomics by increasing the total sequence divergence and improve the quality of alignments by adding a new branch between S. bayanus and S. kudriavzevii, the more divergent species.
Genomic DNA and library preparation
We extracted DNA from the type strain of S. arboricolus H-6T (CBS 10644TT) isolated in China from the bark of Quercus fabri. For the Roche 454 library construction and sequencing, 5 μg of high molecular weight genomic DNA was used to make standard shotgun DNA library as described in the Roche GS FLX Titanium General Library Preparation Method Manual with the exception of DNA fragmentation, which was done with Covaris S2 sonicator (fragmentation parameters: Duty cycle - 5%, Intensity - 1, cycles/burst - 200, time - 85 s, bath temperature - 5°C). 15 μg of high molecular weight genomic DNA was used to make the 8 Kb paired end library as stated in the Roche GS FLX Titanium 8 Kb Span Paired end library preparation method manual. Exceptions include: DNA extraction was done using QIAquick Gel extraction kit (Qiagen, Cat no. 28760) instead of Electroelution as stated in the manual and fragmentation of circularised DNA was done using Covaris S2 sonicator (Duty cycle −5%, Intensity - 3, cycles/burst - 200, time - 120 s, bath temperature - 5°C). Sequencing of standard shotgun fragment library was carried out on ¾ of a PTP and the 8 Kb paired end library was sequenced on a full PTP using Roche 454 Titanium sequencing chemistry. For SOLiD library construction and sequencing, 500 ng of high molecular weight genomic DNA was used to make a barcoded DNA fragment library as stated in the SOLiD 4 library preparation guide. Enzymes and reagents were used from NEBNext DNA sample prep Master mix set 3 (NEB, Cat no. E6060L). The barcoded DNA fragment library was quantified using Kapa Library Quantification kit (Kapa Biosystems, Cat no. KK4823). 200-300 bp library size selection was carried out using 2% SizeSelect E-Gel (Life Technologies, Cat no. G6610-02). SOLiD EZ Bead System was used according to manufacturer’s guide to prepare ePCR and templated bead enrichment. Sequencing was performed on a SOLiD 4 analyser according to the manufacturer’s instructions to generate 50 bp reads in colour space.
We assembled the genome of S. arboricolus using the Newbler algorithm (v2.3, Roche) for de novo assembly of reads generated by the 454 pryosequencing platform. Combinations of read datasets, reads added in assembly iterations, and assembler parameters were explored before selecting the optimal combination according to assembly metrics (number of scaffold sequences and contigs, the average and longest contig length and N50 value). All reads were trimmed against a dataset of adapter and vector sequences in the initial step of the assembly process.
The selected assembly parameters used an expected coverage value of 40X with all other settings remaining at default values. Two assembler iterations were employed; the first iteration included all 734,353 single fragment reads and one set of 583,674 paired-end reads. The second iteration incorporated an additional set of 518,434 paired-end reads. A third set of paired-end reads was excluded from the assembly due to decreased performance with their inclusion.
The resulting genome assembly comprised of 32 scaffold sequences with a total length of 11,465,281 bp. The scaffolds were comprised of 266 contigs (≥500 bp) with an average length of 43,102 bp (538,482 bp max.) and an N50 value of 136,945 bp. The mapped read coverage of the assembly was 49X.
Pyrosequencing error correction
In order to resolve small errors in the assembly arising from pyrosequencing artifacts, such as homopolymer sequence regions[43, 44], we acquired deep sequence coverage (~100X) from short reads. We generated a total of 31,316,59 short (50 bp) reads from a SOLiD 4 single fragment library. Subsequent gapped read alignment and variant calling was achieved using Bioscope 1.3.1 (Life Technologies).
An iterative correction process was devised in which errors in the assembled sequence were identified from the SOLiD read alignment data as either a single nucleotide polymorphisms (SNP) for single base errors, or as small InDels (insertion/deletion) for homopolymer pyrosequencing errors. Each iteration of the assembly correction process involved the initial mapping of SOLiD reads against the 454 assembly, followed by SNP calling. Selected putative SNPs were then integrated into the assembly sequence and SOLiD reads were remapped to allow InDels to be called and integrated. This process was repeated until no additional variants were detected. In subsequent iterations additional reads were mapped allowing the identification and correction of a small number of further errors. Both SNPs and InDels were calculated from alignment data using Bioscope 'high stringency' variant parameter settings. Additionally, integrated variants were required to represent a minimum of 60% of the alignment data.
Gene annotation and orthology assignments
S. cerevisiae was used as the reference proteome for the program exonerate, which uses comparative approaches for gene finding based on protein sequence similarity. An initial pass with the protein2dna model and a refine boundary of 2000 was used to find the best orthologous candidate of each S. cerevisiae gene. For intronic genes, the max intron size was limited to 1500 bp and the model used was protein2genome.
To annotate genes within the S. arboricolus genome, we first identified gene orthologs with conserved synteny. To do so, we analysed the top hit by exonerate for each S. cerevisiae gene. When three neighbouring genes within S. cerevisiae all identified three neighbouring genes within S. arboricolus, we assigned the S. arboricolus gene in the middle (flanked by its two neighbours) as a syntenic ortholog. This initial step discovered most of the syntenic genes within S. arboricolus. Other genes within S. cerevisiae that had not been assigned an ortholog were further analysed with the hypothesis that these may have exonerate hits within the expected positions but were not the most similar sequence within S. arboricolus. We looked at the top 10 exonerate hits of the remaining S. cerevisiae genes for matches in S. arboricolus between the initially assigned syntenic ortholog. When only one hit was found between these syntenic orthologs, we used this hit as a newly discovered syntenic ortholog. This process was repeated until no more syntenic orthologs could be found. Finally, we assigned the top exonerate hit of few remaining S. cerevisiae genes that were still not assigned a syntenic ortholog as the non-syntenic ortholog provided that they did not overlap with another gene prediction.
De novo gene prediction on the S. arboricolus genome was performed using GeneMark-ES, version 2. The total number of the predicted genes was 5005 within the 16 assembled chromosomes (5038 in total). Of these, 95 genes had non-overlapping coordinates with the genes predicted by Exonerate within the 16 assembled chromosomes (106 in total when including the 19 scaffolds that did not assemble into the chromosomes).
A significant issue when using a comparative-based method, such as exonerate, for gene prediction is that gene boundaries are often incorrectly predicted if there is a lack of homology at these ends. Initially, a large number of predicted genes did not contain a start or stop codon (637 genes and 1121 genes respectively). We have attempted to rectify these starts and ends by extending or truncating the predicted CDS. First, CDSs were extended if a stop codon could be found within 9 codons from the end of our gene prediction. This corrected 857 cases of missing stop codons and further extension only slightly improved the annotation. Second, for start codons, the methionine can be on either side of the predicted gene start. We therefore extended the predicted gene until a methionine was found, but only when a methionine could be found within 9 codons and without any intervening stop codons. In the cases where a stop codon occurred before a suitable methionine was identified, we truncated the CDS to a downstream methionine if it occurred within 9 codons. This corrected 348 cases of missing start codons. Finally, intron-containing genes were left untouched as missing starts and ends for these genes could be due to a missing exon. We note that for intron-containing genes, we specifically use the protein2genome model in exonerate that explicitly attempts to predict all exons found in S. cerevisiae genes. This assumes that the presence of introns is conserved between S. cerevisiae and S. arboricolus.
We aligned the protein sequence orthologs for the sensu stricto using MAFFT with default settings, either with or without S. castellii orthologs as an outgroup. For the coding sequence analysis we inserted the gaps back into the DNA sequences. Phylogenetic analysis was performed using PAML, either with the codon model for the DNA sequence analysis or with empirical model for the amino acid analysis. Because we are only concerned with the placement of S. arboricolus within the established sensu stricto yeast phylogeny, we compared the likelihood of several putative tree topologies that differ only in the position of S. arboricolus (Figure S2).
To annotate tRNA coding sequences, we predicted tRNAs using tRNAscan-SE with default settings on the 16 assembled chromosomes. To determine whether or not these predicted tRNAs are syntenic with respect to S. cerevisiae we used an analogous strategy to that described above for gene annotations. tRNA coding sequences were annotated as syntenic orthologs if they were flanked by genes within S. arboricolus that were assigned as syntenic orthologs and if a tRNA was also found in S. cerevisiae between those genes. In all but one cases, the syntenic tRNAs code for the same amino acids.
Chromosomal structure plots
Chromosome structure plots for the Saccharomyces sensu stricto species were constructed using Mauve. Assembled chromosomes for S. paradoxus (strain CBS432) were obtained from and for S. mikatae (IFO 1815 T), S. kudriavzevii (IFO 1802 T) and S. bayanus var. uvaru m (strain CBS 7001) from. As these chromosome assemblies have been constructed partly by using the S. cerevisiae genome to orient and order scaffolds, alignments were also made to the unordered scaffolds using MUMmer to confirm the relative orientation of chromosomal segments inverted between species.
Mapping of the phenotype landscape of S. arboricolus
The bulk of the phenotypic data was taken from our recent publication on sensu stricto phenotypes where it was included for completeness but where S. arboricolus phenotypes were not specifically analyzed or considered. The data displayed as growth curves in this study correspond to novel confirmatory runs performed to ensure the reliability of specific statements. Three diploid isolates of Saccharomyces arboricolus were collected as described previously and long time stored in 20% glycerol at -80C. Isolates were subjected to high throughput phenotyping by micro-cultivation (n=2) in an array of environments (Additional file5: Table S2) essentially as previously described. For pre-cultivations, strains were inoculated in 350 μL of SD medium (0.14% yeast nitrogen base, 0.5% ammonium sulfate and 1% succinic acid; 2% (w/v) glucose; 0.077% Complete Supplement Mixture (CSM, ForMedium), pH set to 5.8 with NaOH or KOH) and incubated for 48 h at 30°C. For experiments where the removal of a specific media component was studied, the pre-culture was performed in absence of this component in order to completely deplete the component in question. For experiments where alternative nitrogen sources were used, two consecutive pre-cultures were performed, the first containing low amounts of ammonium sulphate (0.05%), the second replacing ammonium with the indicated nitrogen source in amounts corresponding to equivalent moles of N. For all experimental runs, strains were inoculated to an OD of 0.03 - 0.1 in 350 μL of SD medium and cultivated for 72 h in a Bioscreen analyser C (Growth curves Oy, Finland). Optical density was measured using a wide band (450-580 nm) filter. Incubation was at 30.0°C (±0.1°C) with ten minutes preheating time. Plates were subjected to shaking at highest shaking intensity with 60s of shaking every other minute. OD measurements were taken every 20 min. Strains were run in duplicates on separate plates with ten replicates of the universal S. cerevisiae reference strain BY4741 or its prototrophic mother S288C, in randomised (once) positions on each plate as a reference. The reproductive rate (population doubling time), lag (population adaptation time) and efficiency (population total change in density) were extracted from high density growth curves and put in relation to the corresponding fitness variables of the reference strain BY4741, or in conditions directly involving alterations of nitrogen content, its prototrophic mother S288C, as described previously. The derived Log2 ratios (Log2 (BY4741/isolate) or, in case of efficiency, Log2 (isolate/BY4741) were used for subsequent analysis.
Raw sequencing reads are available from the European Nucleotide Archive (EBI ENA) for the SOLiD reads [EMBL: ERP001702], Roche 454 single fragment reads [EMBL: ERP001703] and Roche 454 paired-end reads [EMBL: ERP001704]. The assembled genome is available from NCBI as Saccharomyces arboricola [GenBank: ALIE00000000].
We thank Dr. Feng-Yan Bay for kindly providing S. arboricolus strains. GL is supported by CNRS, ATIP-AVENIR and ARC. JW is supported by the Royal Swedish Academy of Sciences and the Carl Trygger foundation. CAM, CCS and CAN are supported by the Biotechnology and Biological Sciences Research Council (grant numbers BB/E023754/1, BB/G001596/1); CAN is a David Phillips Fellow. ANNB is supported by a postgraduate scholarship from the Natural Sciences and Engineering Research Council of Canada.
- Dujon B: Yeast evolutionary genomics. Nat Rev Genet. 2010, 11 (7): 512-524. 10.1038/nrg2811.View ArticlePubMed
- Nieduszynski CA, Liti G: From sequence to function: insights from natural variation in budding yeasts. Biochim Biophys Acta. 2011, 1810 (10): 959-966. 10.1016/j.bbagen.2011.02.004.PubMed CentralView ArticlePubMed
- Scannell DR, Zill OA, Rokas A, Payen C, Dunham MJ, Eisen MB, Rine J, Johnston M, Hittinger CT: The awesome power of yeast evolutionary genetics: new genome sequences and strain resources for the saccharomyces sensu stricto genus. G3. 2011, 1: 11-25. doi:10.1534/g3.111.000273.PubMed CentralView ArticlePubMed
- Cliften P, Sudarsanam P, Desikan A, Fulton L, Fulton B, Majors J, Waterston R, Cohen BA, Johnston M: Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science. 2003, 301 (5629): 71-76. 10.1126/science.1084337.View ArticlePubMed
- Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003, 423 (6937): 241-254. 10.1038/nature01644.View ArticlePubMed
- Dujon B: Yeasts illustrate the molecular mechanisms of eukaryotic genome evolution. Trends Genet. 2006, 22 (7): 375-387. 10.1016/j.tig.2006.05.007.View ArticlePubMed
- Fischer G, James SA, Roberts IN, Oliver SG, Louis EJ: Chromosomal evolution in Saccharomyces. Nature. 2000, 405 (6785): 451-454. 10.1038/35013058.View ArticlePubMed
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431 (7004): 99-104. 10.1038/nature02800.PubMed CentralView ArticlePubMed
- Nieduszynski CA, Knox Y, Donaldson AD: Genome-wide identification of replication origins in yeast by comparative genomics. Genes Dev. 2006, 20 (14): 1874-1879. 10.1101/gad.385306.PubMed CentralView ArticlePubMed
- Müller CA, Nieduszynski CA: Conservation of replication timing reveals global and local regulation of replication origin activity. Genome Res. 2012, 22 (10): 1953-1962. 10.1101/gr.139477.112.PubMed CentralView ArticlePubMed
- Wang SA, Bai FY: Saccharomyces arboricolus sp. nov., a yeast species from tree bark. Int J Syst Evol Microbiol. 2008, 58 (Pt 2): 510-514.View ArticlePubMed
- Naumov GI, Naumova ES, Masneuf-Pomarede I: Genetic identification of new biological species Saccharomyces arboricolus Wang et Bai. Antonie Van Leeuwenhoek. 2010, 98 (1): 1-7. 10.1007/s10482-010-9441-5.View ArticlePubMed
- Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M: Life with 6000 genes. Science. 1996, 274 (5287): 546-563-547. 10.1126/science.274.5287.546.View ArticlePubMed
- Ronaghi M: Pyrosequencing sheds light on DNA sequencing. Genome Res. 2001, 11 (1): 3-11. 10.1101/gr.11.1.3.View ArticlePubMed
- Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V: Population genomics of domestic and wild yeasts. Nature. 2009, 458 (7236): 337-341. 10.1038/nature07743.PubMed CentralView ArticlePubMed
- Siow CC, Nieduszynska SR, Muller CA, Nieduszynski CA: OriDB, the DNA replication origin database updated and extended. Nucleic Acids Res. 2012, 40 (1): D682-D686. 10.1093/nar/gkr1091.PubMed CentralView ArticlePubMed
- Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, Chan ET, Christie KR, Costanzo MC, Dwight SS, Engel SR: Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res. 2012, 40 (1): D700-D705. 10.1093/nar/gkr1029.PubMed CentralView ArticlePubMed
- Bon E, Neuveglise C, Casaregola S, Artiguenave F, Wincker P, Aigle M, Durrens P: Genomic exploration of the hemiascomycetous yeasts: 5. Saccharomyces bayanus var. uvarum. FEBS Lett. 2000, 487 (1): 37-41. 10.1016/S0014-5793(00)02276-6.View ArticlePubMed
- Fischer G, Neuveglise C, Durrens P, Gaillardin C, Dujon B: Evolution of gene order in the genomes of two related yeast species. Genome Res. 2001, 11 (12): 2009-2019. 10.1101/gr.212701.View ArticlePubMed
- Slater GS, Birney E: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005, 6: 31-10.1186/1471-2105-6-31.PubMed CentralView ArticlePubMed
- Byrne KP, Wolfe KH: The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 2005, 15 (10): 1456-1461. 10.1101/gr.3672305.PubMed CentralView ArticlePubMed
- Besemer J, Lomsadze A, Borodovsky M: GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 2001, 29 (12): 2607-2618. 10.1093/nar/29.12.2607.PubMed CentralView ArticlePubMed
- Brown CA, Murray AW, Verstrepen KJ: Rapid expansion and functional divergence of subtelomeric gene families in yeasts. Curr Biol. 2010, 20 (10): 895-903. 10.1016/j.cub.2010.04.027.PubMed CentralView ArticlePubMed
- Zill OA, Scannell D, Teytelman L, Rine J: Co-evolution of transcriptional silencing proteins and the DNA elements specifying their assembly. PLoS Biol. 2010, 8 (11): e1000550-10.1371/journal.pbio.1000550.PubMed CentralView ArticlePubMed
- Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997, 25 (5): 955-964.PubMed CentralView ArticlePubMed
- Liti G, Peruffo A, James SA, Roberts IN, Louis EJ: Inferences of evolutionary relationships from a population survey of LTR-retrotransposons and telomeric-associated sequences in the Saccharomyces sensu stricto complex. Yeast. 2005, 22 (3): 177-192. 10.1002/yea.1200.View ArticlePubMed
- Naumov GI, James SA, Naumova ES, Louis EJ, Roberts IN: Three new species in the Saccharomyces sensu stricto complex: Saccharomyces cariocanus, Saccharomyces kudriavzevii and Saccharomyces mikatae. Int J Syst Evol Microbiol. 2000, 50 (Pt 5): 1931-1942.View ArticlePubMed
- Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A: The generic genome browser: a building block for a model organism system database. Genome Res. 2002, 12 (10): 1599-1610. 10.1101/gr.403602.PubMed CentralView ArticlePubMed
- McKay SJ, Vergara IA, Stajich JE: Using the Generic Synteny Browser (GBrowse_syn). Current protocols in bioinformatics. 2010, 31: 9.12.1-9.12.25.
- Warringer J, Zorgo E, Cubillos FA, Zia A, Gjuvsland A, Simpson JT, Forsmark A, Durbin R, Omholt SW, Louis EJ: Trait variation in yeast is defined by population history. PLoS Genet. 2011, 7 (6): e1002111-10.1371/journal.pgen.1002111.PubMed CentralView ArticlePubMed
- Song SH, Vieille C: Recent advances in the biological production of mannitol. Appl Microbiol Biotechnol. 2009, 84 (1): 55-62. 10.1007/s00253-009-2086-5.View ArticlePubMed
- Hall C, Dietrich FS: The reacquisition of biotin prototrophy in Saccharomyces cerevisiae involved horizontal gene transfer, gene duplication and gene clustering. Genetics. 2007, 177 (4): 2293-2307. 10.1534/genetics.107.074963.PubMed CentralView ArticlePubMed
- Hittinger CT, Goncalves P, Sampaio JP, Dover J, Johnston M, Rokas A: Remarkably ancient balanced polymorphisms in a multi-locus gene network. Nature. 2010, 464 (7285): 54-58. 10.1038/nature08791.PubMed CentralView ArticlePubMed
- Rokas A, Williams BL, King N, Carroll SB: Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 2003, 425 (6960): 798-804. 10.1038/nature02053.View ArticlePubMed
- Argueso JL, Carazzolle MF, Mieczkowski PA, Duarte FM, Netto OV, Missawa SK, Galzerani F, Costa GG, Vidal RO, Noronha MF: Genome structure of a Saccharomyces cerevisiae strain widely used in bioethanol production. Genome Res. 2009, 19 (12): 2258-2270. 10.1101/gr.091777.109.PubMed CentralView ArticlePubMed
- Liti G, Barton DB, Louis EJ: Sequence diversity, reproductive isolation and species concepts in Saccharomyces. Genetics. 2006, 174 (2): 839-850. 10.1534/genetics.106.062166.PubMed CentralView ArticlePubMed
- Novo M, Bigey F, Beyne E, Galeote V, Gavory F, Mallet S, Cambon B, Legras JL, Wincker P, Casaregola S: Eukaryote-to-eukaryote gene transfer events revealed by the genome sequence of the wine yeast Saccharomyces cerevisiae EC1118. Proc Natl Acad Sci USA. 2009, 106 (38): 16333-16338. 10.1073/pnas.0904673106.PubMed CentralView ArticlePubMed
- Johnson LJ, Koufopanou V, Goddard MR, Hetherington R, Schafer SM, Burt A: Population genetics of the wild yeast Saccharomyces paradoxus. Genetics. 2004, 166 (1): 43-52. 10.1534/genetics.166.1.43.PubMed CentralView ArticlePubMed
- Kuehne HA, Murphy HA, Francis CA, Sniegowski PD: Allopatric divergence, secondary contact, and genetic isolation in wild yeast populations. Curr biol. 2007, 17 (5): 407-411. 10.1016/j.cub.2006.12.047.View ArticlePubMed
- Sampaio JP, Goncalves P: Natural populations of Saccharomyces kudriavzevii in Portugal are associated with oak bark and are sympatric with S. cerevisiae and S. paradoxus. Appl Environ Microbiol. 2008, 74 (7): 2144-2152. 10.1128/AEM.02396-07.PubMed CentralView ArticlePubMed
- Langkjaer RB, Nielsen ML, Daugaard PR, Liu W, Piskur J: Yeast chromosomes have been significantly reshaped during their evolutionary history. J Mol Biol. 2000, 304 (3): 271-288. 10.1006/jmbi.2000.4209.View ArticlePubMed
- Libkind D, Hittinger CT, Valerio E, Goncalves C, Dover J, Johnston M, Goncalves P, Sampaio JP: Microbe domestication and the identification of the wild genetic stock of lager-brewing yeast. Proc Natl Acad Sci USA. 2011, 108 (35): 14539-14544. 10.1073/pnas.1105430108.PubMed CentralView ArticlePubMed
- Quince C, Lanzen A, Davenport RJ, Turnbaugh PJ: Removing noise from pyrosequenced amplicons. BMC Bioinformatics. 2011, 12: 38-10.1186/1471-2105-12-38.PubMed CentralView ArticlePubMed
- Quince C, Lanzen A, Curtis TP, Davenport RJ, Hall N, Head IM, Read LF, Sloan WT: Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods. 2009, 6 (9): 639-641. 10.1038/nmeth.1361.View ArticlePubMed
- Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M: Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 2008, 18 (12): 1979-1990. 10.1101/gr.081612.108.PubMed CentralView ArticlePubMed
- Katoh K, Misawa K, Kuma K, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002, 30 (14): 3059-3066. 10.1093/nar/gkf436.PubMed CentralView ArticlePubMed
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13 (5): 555-556.PubMed
- Darling AC, Mau B, Blattner FR, Perna NT: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004, 14 (7): 1394-1403. 10.1101/gr.2289704.PubMed CentralView ArticlePubMed
- Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL: Versatile and open software for comparing large genomes. Genome Biol. 2004, 5 (2): R12-10.1186/gb-2004-5-2-r12.PubMed CentralView ArticlePubMed
- Warringer J, Anevski D, Liu B, Blomberg A: Chemogenetic fingerprinting by analysis of cellular growth dynamics. BMC Chem Biol. 2008, 8: 3-10.1186/1472-6769-8-3.PubMed CentralView ArticlePubMed
- Homer N, Merriman B, Nelson SF: BFAST: an alignment tool for large scale genome resequencing. PLoS One. 2009, 4 (11): e7767-10.1371/journal.pone.0007767.PubMed CentralView ArticlePubMed
- Benson G: Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999, 27 (2): 573-580. 10.1093/nar/27.2.573.PubMed CentralView ArticlePubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.