- Research article
- Open Access
Long-read based assembly and synteny analysis of a reference Drosophila subobscura genome reveals signatures of structural evolution driven by inversions recombination-suppression effects
© The Author(s). 2019
- Received: 18 December 2018
- Accepted: 6 March 2019
- Published: 18 March 2019
Drosophila subobscura has long been a central model in evolutionary genetics. Presently, its use is hindered by the lack of a reference genome. To bridge this gap, here we used PacBio long-read technology, together with the available wealth of genetic marker information, to assemble and annotate a high-quality nuclear and complete mitochondrial genome for the species. With the obtained assembly, we performed the first synteny analysis of genome structure evolution in the subobscura subgroup.
We generated a highly-contiguous ~ 129 Mb-long nuclear genome, consisting of six pseudochromosomes corresponding to the six chromosomes of a female haploid set, and a complete 15,764 bp-long mitogenome, and provide an account of their numbers and distributions of codifying and repetitive content. All 12 identified paracentric inversion differences in the subobscura subgroup would have originated by chromosomal breakage and repair, with some associated duplications, but no evidence of direct gene disruptions by the breakpoints. Between lineages, inversion fixation rates were 10 times higher in continental D. subobscura than in the two small oceanic-island endemics D. guanche and D. madeirensis. Within D. subobscura, we found contrasting ratios of chromosomal divergence to polymorphism between the A sex chromosome and the autosomes.
We present the first high-quality, long-read sequencing of a D. subobscura genome. Our findings generally support genome structure evolution in this species being driven indirectly, through the inversions’ recombination-suppression effects in maintaining sets of adaptive alleles together in the face of gene flow. The resources developed will serve to further establish the subobscura subgroup as model for comparative genomics and evolutionary indicator of global change.
- Genome structure evolution
- Inversion originating mechanisms
- Inversion fixation and polymorphism
- Spatiotemporally fluctuating selection
- Global change
Drosophila subobscura Collin  is a fruitfly species of the obscura group of the subgenus Sophophora endemic to, and common in Europe and the western Palearctic, where it spans over thirty latitudinal degrees commonly associated to forest fringes, from sea level to the timber line . The species was found to be unusual among Drosophila because it is entirely monandrous [3–5], does not mate in the absence of light [6, 7], and does not produce courtship-song by wing vibration .
The rise of D. subobscura to its current status as model organism for biological research owes to a long-held effort to understand the genetics and evolutionary biology of the species . Early investigations on its salivary gland nucleus revealed that it has the ancestral Drosophila karyotype of a small dot and five large acrocentric rods, does not show a chromocenter  and, especially, shows extraordinary levels of chromosomal polymorphism caused by large, cytologically visible paracentric inversions segregating on all five rods. Elaboration of detailed polytene drawings [11, 12] and photomaps [13–15] greatly facilitated the study of the inversions, and paved the way for subsequent development of the over 600 linkage  and cytologically mapped genetic markers presently available, which cover most of the euchromatic genome [17, 18].
Besides nuclear genetics studies, obtention of the first restriction, and at present the only map available for the D. subobscura mitogenome  allowed to identify an intriguing geographical pattern of variation with two major mitotypes, named I and II, that segregate at nearly equal frequencies in most populations, and which have associated measurable differences in fitness-related traits [20, 21].
The discovery of the two Macaronesian island-endemic species D. guanche Monclús , from the Canarian archipelago, and D. madeirensis Monclús , from Madeira allowed new possibilities for comparison. Together with D. subobscura, they form the subobscura subgroup . The three species are isolated reproductively from each other, except for D. madeirensis and D. subobscura [25, 26], which are capable of limited gene exchange in collinear genomic regions not affected by inversions . Hybrid males show extra sex combs, among other anomalies whose genetic basis and role in species formation has only begun to be elucidated [28–30]. Interestingly, the two island endemics show differences in gene ordering between them, and with respect to D. subobscura, but are thought to be monomorphic for inversions.
Of the various features of the D. subobscura model, its rich inversion polymorphism has received special attention . In total, more than 65 inversions have been identified, which range in length from ~ 1 Mb (e.g. inversion E20) to as long as ~ 11 Mb (O7). They include both simple and multiple overlapping inversions on the same chromosome, which appear strongly associated into about 90 different chromosomal rearrangements [9, 32]. All combined, structurally segregating regions represent approximately 83% of the species genome. The inversions are nonrandom as to their lengths and distribution of breakpoints along the chromosomes, with cytological evidence of multiply reused breakpoints in 26 cases (~ 20% ). Recently, breakpoint nucleotide sequences were determined for nine polymorphic inversions using in situ hybridization and chromosome walking methods, which found one case of breakpoint reuse, and overall supported a mechanism of inversion formation through chromosomal breakage and repair by non-homologous end joining, rather than through ectopic recombination [33, 34].
Inspired by the work of Dobzhansky et alia on natural populations of its Nearctic sister basal within the obscura group D. pseudoobscura, research on D. subobscura found the inversion frequencies in all major chromosomes to be highly structured according to both spatial and temporal environmental gradients. Specifically, chromosomal polymorphisms vary geographically between cold and warm climates , with genomewide warm climate inversion frequencies peaking in summer and dropping in winter repeatedly every year (and the reciprocal for the cold climate arrangements) . The introduction, rapid spread, and successful establishment of D. subobscura throughout the southern Neotropical  and western Nearctic  regions, from few colonizers , in contemporary time  attested for the high dispersal ability and potential for local adaptation of the species . The establishment of latitudinal patterns of the same sign across three separate territories  which, additionally, stood in contrast with the uniformity found for neutral nucleotide markers , further corroborated the adaptive significance of the chromosomal polymorphisms. On top of these patterns, southernmost populations of the species were found segregating for a sex-ratio distorting drive arrangement, whose carrier males have offspring consisting of only or mainly females [43, 44]. The realization that the frequencies of cold climate karyotypes are declining with the globally rising temperatures [45–47] expanded the interest on the species as indicator of evolutionary effects of contemporary global-warming [48–50]. In fact, the standing inversion variation, maintained by the spatiotemporally fluctuating thermal environment allowed a rapid genomewide evolutionary response in a time scale as short as “few days” during a sudden heatwave .
Although the recombination-suppression effects of inversions may not suffice to suppress gene flow in the inverted regions entirely [52, 53], it is strong enough to cause nucleotide variation in D. subobscura to be extensively structured in regions affected by the rearrangements , and to allow evolution of genomic islands of concerted evolution of ecologically-relevant gene families like Hsp70 . In the wild, inversions covariate with life-history and fitness-related traits . Until now, however, attempts to reproduce observed spatiotemporal patterns of inversions and their phenotypic associations under laboratory conditions have been largely unsuccessful [56, 57].
Many of the above and other findings would not have occurred without the previous development of the cherry-curled (ch-cu) recessive marker-  and the Varicose/Bare (Va/Ba) balancer-strains . Motivation to use D. subobscura as a model to continue research on central issues of evolutionary biology is, however, presently hindered by the lack of a reference genome for the species. Recently, one step to narrow this gap was taken with the publication of a short-read second-generation Illumina-based genome of D. guanche . In this paper, we took an additional step using flow-cytometry and long-read third-generation single-molecule real-time (SMRT) PacBio technology, together with the available wealth of genetic marker and synteny data, to assemble and annotate a high-quality nuclear and complete mitochondrial genome for D. subobscura, from our laboratory stock of the ch-cu strain. Long-read based assemblies are advantageous over short-read based ones because they are better at traversing across common repetitive structures, which results in more contiguous and complete assemblies. Our goals were two-fold. First, to provide a preliminary account of main features of the newly assembled genome and, second, to perform a comparative synteny analysis with D. guanche to trace the evolutionary history of fixed chromosomal rearrangement in the subobscura subgroup. Until now, this latter issue has been approached using wholly cytological methods [14, 25, 60] which are coarse-grained compared to the single-nucleotide resolution furnished by comparative genomics.
Knowing the sequence identity of synteny breakpoints can help determine both the evolutionary polarity of chromosomal rearrangement states by comparison with an outgroup, and the mechanism of rearrangement formation through assessment of remains of its molecular footprints. Drosophila inversions are commonly thought to originate by one of two major mechanisms, namely ectopic recombination, and chromosomal breakage and subsequent repair (reviewed in ). The first mechanism predicts occurrence of duplications on the flanks of the inverted segment in both the ancestral and the derived arrangement states, whereas the second predicts absence of duplications or their presence only in the derived state. Knowing how an inversion originated can shed light on why it evolved . Inversions can have direct, indirect, or both types of fitness effects. New inversions can themselves be direct targets of selection because of functional disruption by the breakpoints. The main importance of inversions, however, might stem indirectly from the fact that they suppress recombination in heterokaryotypes. Through their linkage generation effects, inversions can contribute to keep sets of adapted alleles together in the face of gene flow [63–65].
Size estimation and de novo long-read assembly of the D. subobscura genome
The genome size of the inbred ch-cu line was estimated using k-mer counting and flow cytometry methods. By the first method, GenomeScope (http://qb.cshl.edu/genomescope/ ) analysis of 21-mer frequencies obtained by Jellyfish (Ver. 2.2.4. ) using 20 million Illumina short (300 bp) reads  resulted in a genome size of 136.943 Mb. By the second, flow cytometry of PI-stained female brain cell nuclei using a 328.0 Mb genome from D. virilis  as internal standard resulted in a genome size of 148.069 Mb (0.151 pg ± 0.001; for the mean plus/minus one standard deviation across five replicates; see Methods). This latest measure conforms to previous flow cytometry-based estimates of the D. subobscura genome size (146.7 Mb [69, 70]); rounded to 150 Mb, it was the value set as genome size for the Canu assembler.
The PacBio 7 SMRT cells sequencing of the ch-cu genome generated a raw output of 1,252,701 subreads, hereon referred to as reads, with mean and longest read lengths of 8003 bp and 52,567 bp, respectively (Additional file 1: Table S1). These sequences totaled 10,025,366,103 bp, or a ~ 67-fold estimated genome coverage. The average yield per SMRT cell (~ 1,4 Gb) was on the upper bound of the manufacturer range for the typical SMRT cell (0.75–1.25 Gb ), which highlights the suitability of the used high-quality genomic DNA isolation protocol. Canu correction and trimming of the PacBio data retained 1,060,943 reads of 6103 bp average read length, or the equivalent to a 43-fold genome coverage for the assembly, well within Canu’s default sensitivity range (30-fold to 60-fold) (Additional file 1: Table S1). Of the 327 Canu-generated contigs, 115 (totaling 6624 Mb) showed evidence of foreign sequences. All the contigs in this subset were solely of bacterial origin, each being exclusively either from Acetobacter or from Providencia, which are genera known to be part of the Drosophila microbiome . After removing these contigs, the primary Canu assembly consisted of 212 contigs spanning 129.183 Mb, with an N50 of 3.129 Mb and a maximum contig length of 15.083 Mb (Additional file 1: Table S1).
A first round of quality control and scaffolding of the Canu contigs carried out combining recursively i) automated BLAT and BLASTN searches against the D. melanogaster and D. pseudoobscura genomes, and ii) evaluation of consistency with published data on the chromosomal position of 621 cytological (604) and genetic linkage (17) markers (Additional file 2: Table S2; see Methods) did not detect any misassembling. Scaffolding of the Canu contigs using SSPACE-LongRead (Ver. 1–1. ) resulted in 157 scaffolds. Submission of these scaffolds to a second round of quality control and scaffolding as in step one resulted in 186 validated scaffolds with a total length of 129,237 Mb (Additional file 1: Table S1). Half of the assembly was in 7 scaffolds longer than 5.954 Mb, while an additional 45% was in 44 scaffolds longer than 313 Kb. The GC content of the assembly was 45.0%, similar to that found for the close relative D. pseudoobscura (45.3%; r3.04 assembly ). Based on the available cytogenetic and genetic linkage marker data, it was possible to assign confidently genomic coordinates to 96.6% of the assembled sequence (63 scaffolds spanning 124,862 Mb, with half of it in 6 scaffolds longer than 8.237 Mb). On average, there were 10 markers per scaffold. A total of 38 scaffolds, representing 91.4% of the positioned sequence, were placed using ≥2 markers. The remaining 25, relatively shorter scaffolds with only 1 marker (10; average length 0.656 Mb) or 0 markers (15; 0.363 Mb) were placed confidently aided by synteny-based inferences of orthology with the close relative D. guanche and/or with D. melanogaster. Detailed information about the markers used for the anchoring, ordering and orientation of the scaffolds is provided in Additional file 2: Table S2.
Overview of D. subobscura nuclear pseudochromosome and mitochondrial reference genome assembly (N50: length of the contig for which 50% of the total assembly length is contained in scaffolds of that size or larger; L50: ranking order of the scaffold that defines the N50 length; lengths are in bp)
No. of scaffolds
Ab initio gene prediction and functional annotation
The complete ch-cu assembly was predicted to contain 13,939 protein-coding genes, nearly the same number as in the current release of the D. melanogaster genome (13,931; r6.18 assembly ). Of them, 13,317 (95.5%) were successfully annotated by the MAKER annotation pipeline, which corresponds to a gene density of one gene every 9.70 kb of the genome assembly. The average gene length was 3.502 kb. All genes combined span 46.635 Mb of coding sequence, with a GC content of 55.6%. The average number of exons and introns per gene was 4.6 and 3.6, with average (median) exon and intron lengths of 379 (213) bp and 529 (66) bp, respectively. A total of 87.2% of the genes were multi-exonic.
Of the 13,317 annotated protein-coding genes, 13,181 (99.0%) are placed in the six pseudochromosomes that were assigned genomic coordinates (Table 1). The numbers of annotated genes per pseudochromosome (dot: 91; A: 2322; J: 2452; U: 2496; E: 2591; and O: 3229) depart from the expected from pseudochromose length (G = 113.61; d.f. = 5, P < 10− 6), with E and U showing, respectively, the greatest excess (393 genes) and deficiency (228), in line with previous findings in D. melanogaster . With respect to the small subset of genes that could no be assigned precise genomic coordinates (139), most of them (89) were anchored to chromosomes. In addition, 3090 non-coding genes were annotated, including 1191 and 1899 short and long non-coding RNA genes, respectively. Of note, the 5S rDNA gene family was found to consist of > 160 copies of the 5S rDNA repeat unit, tandemly arranged in one cluster located on the distal end of segment II of autosome O, in agreement with early in situ hybridization results . Also, we identified > 80 copies of the 18S–28S rDNA repeat unit distributed over the 19 rDNA annotated scaffolds. With respect to the relatively more rapidly evolving lncRNA genes, BLASTing with FlyBase lncRNA (Dmel_Release_6) detected 1898 out of the 2965 lncRNA annotated genes, with a strong bias towards the longer ones (10.2 kb vs. 1.2 kb, for the average lengths of detected vs. undetected lncRNAs, respectively).
The high-quality of the genome assembly and annotation is further buttressed on three validation metrics. Firstly, the overall size of the assembly (129.237 Mb) closely matches the estimated size of the genome using the k-mer counting (94.4% of 136.943 Mb) and flow-cytometry (87.3% of 148.069 Mb) methods. Secondly, both the low values of the average and median of the MAKER-defined AED scores (0.127 and 0.070, respectively), and the fact that nearly all genes attained AED scores lower than 0.5 (AED50 = 97.9%) are indicative of a good agreement between the annotations and their evidence. And thirdly, BUSCO analysis using the 2799 25-dipterans orthologous gene set resulted in 96.5% (2671) single complete genes, 0.5% (14) duplicated complete genes, and 3.0% (84) fragmented. Only 1.1% (30) of the BUSCO genes were missing, indicating that the assembly is nearly complete.
Phylogenetic placement of the D. subobscura genome and age of the subobscura subgroup
To further assess the quality of the obtained genome, we subjected it to a phylogenetic analysis together with closely related species with known relationships. We took advantage of the carefully curated 12 Drosophila multiple sequence alignment (MSA) data set used by Obbard et al.  (see also ). The MSA consists of 67,008 characters from 50 concatenated nuclear protein-coding loci selected for (i) having only 1:1 orthologs, (ii) including an exon longer than 700 bp, and (iii) not showing unusually high codon usage bias. To this MSA, we added the corresponding reciprocal-BLAST-identified orthologs from D. subobscura and D. guanche using MAFFT (Ver7; http://mafft.cbrc.jp/alignment/software/), and then identified the best-fit model of sequence evolution (GTR + G + I; with α = 0.53, and I = 0.27) for maximum-likelihood (ML) tree estimation using MEGA7 . The resulting tree topology (Additional file 3: Figure S1) is consistent with the known phylogeny of the species. Using this topology, and the RelTime-ML method  with the mutation rate-based estimates found by Obbard et al.  to perform best as calibration dates, the age of the subobscura species subgroup was found to be 1.72 ± 0.51 Mya (Additional file 3: Figure S1). This estimate is at the lower bound of published dates for this divergence, which were all based on one or few available markers (ranging from 1.8 to 8.8Mya, median 2.75Mya [27, 81–84]).
Mitochondrial genome identification and annotation
As a quality check, the obtained mitogenome was subjected to a phylogenetic analysis with available mitogenomes from the same 13 Drosophila as above. As the mitogenomic sequence from D. guanche  was found containing multiple unusual features as to size (20.7 kb) and number of duplications and rearrangements, this analysis was focused only on the PCG regions. We performed multiple sequence alignment (MSA) separately for each of the 13 PCGs using MAFFT (version 7; http://mafft.cbrc.jp/alignment/software/ ) followed by evaluation of the 13 MSAs using Gblocks  and visual inspection. Then, we merged the MSAs into a single, 11,244 characters long MSA, identified the best-fit maximum-likelihood evolutionary model of the concatenated MSA (GTR + G + I; with α = 0.29, and I = 0.34), and used this model to find the maximum-likelihood tree using MEGA7 . The resulting mitogenomic tree topology was concordant with the known phylogeny of the species (Additional file 3: Figure S1).
Genomic distribution of repetitive DNAs
A 14.3% of the ch-cu genome was annotated as repetitive (Additional file 5: Table S4). This density of repetitive sequence is low compared to estimates from other obscura group species, including the close relative D. guanche (30% ), and the more distantly related D. pseudoobscura (23.9%; release R3.04) and D. persimilis (39.0%; release R1.3). That D. subobscura has a relatively compact, less repetitive genome is further supported by available measures of genome size obtained using flow cytometry from brain cell nuclei: its genome is nearly 30 Mb smaller than that of D. guanche (167.230 Mb), and the smallest of the ten obscura group species values stored in the animal genome size database [70, 89].
Repetitive DNAs were analyzed by classifying them into five categories: long terminal repeat (LTR) and non-LTR retrotransposons, DNA transposons, satellites (including sat290 and SGM-sat), and simple repeats or microsatellites. The extrinsic null of no deviation in repeat number content from the expected from relative chromosomal length was tested using G-tests among all six chromosomes, between A and the four large autosomes, and among the four large autosomes. Overall, there were significant differences in proportion of repetitive sequence among the six chromosomes, whether repeats were considered together or separately by category (all G-tests: P < 10− 6). The dot showed the largest aggregated excess (2.5-fold; 2.7% of total repeat number content), because it showed 2.6 (3.0%), 4.1 (4.8%) and 4.9-fold (5.7%) more non-LTRs, DNA transposons and satellites than expected from its length, whereas A was the only chromosome that showed a consistent excess of repetitive sequence across all five repeat categories, particularly microsatellites (1.5-fold; 26.9%). Comparatively, the four large autosomes showed a dearth of repetitive DNA. When the dot and the A were excluded from the analysis the magnitude of the deviations in amount of repetitive sequence dropped markedly [with the single exception that the E chromosome shows a 1.3-fold (27.7%) excess of non-LTRs], and no definite pattern emerged.
Satellites Sat290 and SGM-sat have gathered special interest. Sat290 is a 290 bp repeat satellite . Early in situ hybridization studies in the three members of the subobscura subgroup concluded that sat290 was absent in D. madeirensis and D. subobscura, and that in D. guanche the repeat comprised a major satDNA class distributed in centromeric regions . SGM-sat would be derived from the MITE-like SGM-IS transposable element that was already present in the last common ancestor of the obscura group . The repeat underwent a species-specific expansion in D. guanche, which gave rise to another major satDNA class in this species. Some of these findings were reassessed by a recent study of the D. guanche genome combining Illumina short-read whole genome sequencing and dual-color fluorescence in situ hybridization . The study found sat290 and SGM-sat to comprise the first and second most abundant satDNAs of D. guanche, respectively, adding up to ~ 30% of the species’ genome. In addition, SGM-sats were found to be concentrated in the centromeres, but in more peripheral positions relative to the chromosome ends than sat290s. In contrast with this picture, our initial characterization of these repeats in the ch-cu assembly showed that sat290 is present in D. subobscura, and in non negligible numbers (637), of which nearly one half are dispersed throughout the euchromatin. SGM showed a similar pattern, but conversely to the situation in D. guanche, in D. subobscura SGM sequences are 8-fold more abundant than sat290s (Additional file 5: Table S4).
Overall, our preliminary screen of the genomic distribution of repetitive DNAs did not find evidence of an association between repeat density and numbers of segregating chromosomal rearrangements. For example, the J and E chromosomes, which are about the same size show comparable percentages and distributional patterns of repetitive sequence (~ 11.0%; Fig. 3), in spite that the former exhibit 4-fold lower number of polymorphic inversions than the latter (5 vs. 22, respectively ).
Orthologous group assignment and variation in gene family size
OrthoMCL clustered the 152,068 PCGs in the Drosophila pan-genome dataset into 23,394 orthologous groups, of which 8390 (35.9%) formed the core set shared by all 14 species (Additional file 6: Figure S2). Of this core set, 6293 were single-copy gene families. D. subobscura contained 10,483 orthologous groups, including 904 (8,6%; 965 genes) lineage-specific, of which 867 were single-copy orphans. These numbers and categories of orthologous groups are similar to those obtained for its close relative D. guanche in a previous comparison of the same 13 Drosophila, excluding D. subobscura (10,417 orthologous groups, including 838 species-specific, of which 828 were orphans ). Also, the number of orphan genes in D. subobscura is within the range of those estimated for the other 13 Drosophila, which varies from 294 in D. erecta to 2341 in D. persimilis (Additional file 6: Figure S2).
CAFE analysis (see the Methods section for details) carried out without taking into account variation in genome quality across genomes indicates that the best description of the data is provided by the five λ model, which distinguishes average fast- (λF), medium- (λM) and slow-evolving (λS) branches, in addition to allowing the terminal branches leading to D. subobscura and D. guanche to have their own rates (λDs and λDg; Additional file 7: Table S5). According to this model, the rate of gene family size evolution in these two lineages would be of the same order of magnitude as the average fast rate (λDs = 0.0257 and λDg = 0.0191, vs. λF = 0.0216). Adding a global error term (ε) improves the model fit significantly (−2ΔL = 191,98; p < 1 × e− 6; 1 d.f.), which indicates an effect of variation in quality across genomes. The effect, measured as the ratio (λ-λε)/λ , is lowest for slow-evolving lineages (24%) and D. subobscura (24%), and largest for fast-evolving lineages (43%) and D. guanche (41%). The best score of D. subobscura compared to D. guanche according to this criterion may be a reflection of a greater contiguity of the assembly provided by the PacBio long-read sequencing used in the first case, compared to the Illumina short-read sequencing used in the second.
The CAFE five λ model with global error term indicates that, of the 9155 gene families inferred to have been present in the Drosophila most recent common ancestor, 567 have increased and 636 decreased in size in the terminal branch leading to D. subobscura (Additional file 8: Figure S3). Of them, 62 show significant expansions (43; 272 genes) or contractions (19; 121 genes) relative to the genome-wide average (P < 0.01; Additional file 9: Figure S4). Functional enrichment analysis of the rapidly evolving families showed the expanding and contracting families to be significantly enriched for 52 and 77 GO terms, respectively (Additional file 10: Table S6 and Additional file 11: Table S7). Most-encompassing GO terms associated with the families that have expanded include, among others, ‘thermosensory behavior’ (GO:0040040) (Additional files 12, 13 and 14: Figures S5-S7), and those associated with the families that have contracted include ‘sensory perception of sound’ (GO:0007605) and ‘response to red light’ (GO:0010114) (Additional files 15, 16 and 17: Figures S8-S10). These terms appear particularly noteworthy considering the continuing role of D. subobscura as a model for research on insect thermal biology, and that, as previous research has shown, the species may be unique within the obscura group in not producing courtship auditory cues by wing vibration , and being unable to mate in the dark [6, 7, 93]. We hope that these results will stimulate future research on the role that those gene families play in the functional biology of D. subobscura.
Evolutionary history of chromosomal rearrangement in the subobscura subgroup
Comparative synteny mapping of the genome of D. subobscura with those of three increasingly distant relatives, namely D. guanche, D. pseudoobscura and D. melanogaster using SyMAP showed the amount of genome rearrangement to scale up with evolutionary distance. Aggregated across the five Muller elements, D. subobscura synteny with each of the aforementioned species is fragmented into an increasingly larger number of increasingly smaller blocks: 31 blocks of 3.952 Mb average size (13 inverted), 333 of 0.345 Mb (164), and 540 of 0.220 Mb (264), respectively (Additional file 18: Table S8 and Additional file 19: Table S9). Chromosome A shows the greatest degree of synteny fragmentation in all three pairwise species comparisons (12, 90, and 125 vs. 5, 61, and 104 blocks, for A vs. the average autosome), in agreement with reported higher rates of rearrangement evolution for this Muller element compared to the autosomes [65, 94].
Of those 13 rearrangement differences, nine are thought to be fixed between the two species, including Ah1-Ah3, A5 and A6, JST, Eg1 and EST, and Oms; three are thought to be fixed in D. guanche and polymorphic in D. subobscura, including Ah4 (assumed to be the same as D. subobscura’s A1), U1 and U2; and one, namely O4, it is found only as polymorphic in D. subobscura  (here, it may be helpful to recall that the ch-cu homokaryotypic strain used to represent D. subobscura is standard for all chromosomes except chromosome O, for which it is O3 + 4; see below). For none of these 13 inversions, except O4 , the nucleotide sequences of their breakpoints have been molecularly characterized. Yet this knowledge could allow testing current cytology-based ideas about the identities and evolutionary polarities of the rearrangement states, as well as ascertaining their originating mechanisms through assessment of remains of their molecular footprints.
To further validate the high quality of the newly obtained D. subobscura genome, we applied it to determining the unknown breakpoint sequences of the aforelisted 12 inversions as follows (Additional file 20: Figure S11). We defined synteny breakpoint as the nucleotide interval between contiguous SyMAP synteny blocks. Suppose two orthologous gene arrangements A|BC|D and A|CB|D in taxa 1 and 2, respectively, where the second arrangement is identical to the first one, but for the inverted sequence CB, with the vertical lines denoting the inversion breakpoints. If e.g. region A|B, spanning the proximal breakpoint in taxon 1 plus 5 kb towards the inside of each of its two flanking synteny blocks is BLASTed against the genome of taxon 2, there should produce two hits, one in locus A, and the second one in locus B. In addition, each hit should carry associated an alignment overhang due to lack of homology between B and C, and between A and D, respectively; and the coordinates of the hits should match the SyMAP coordinates for the breakpoints spanning the inversion. Furthermore, the results from breakpoints of the same inversion must be reciprocally consistent, regardless the taxon used as query.
By the above described approach, we were able to isolate and characterize the putative breakpoint sequences of all the 12 targeted inversions. The results challenge previous cytology-based assumptions about the identity and evolutionary polarity for some of the rearrangement states. Additional file 21: Table S10 and Fig. 4 summarize the main results. The proximal half of chromosome A provides an all-embracing example. In this region, the structural transition between the two genomes requires minimally four inversions (Ah1-Ah4; Fig. 4). Cytological evidence for shared breakpoints supporting that Ah4 is the same inversion as A1, led to postulate that Ah1-Ah3 were fixed in the lineage of D. guanche [14, 25]. Recently, the proximal and distal breakpoints of inversion A1 segregating in D. subobscura were assessed by a mixed approach combining cytological and molecular methods . Although the attempt was unsuccessful, it managed to narrow them down to within a few kilobases distal to the markers cm (CG3035) and dod (CG17051), respectively. The coordinates of those markers in the newly obtained ch-cu, i.e., AST, genome (chrA:1,206,180 bp and chrA:8,875,126 bp, respectively) lie more than half a megabase proximal to their corresponding nearest breakpoint of the Ah4 inversion separating the ch-cu strain from D. guanche (chrA:692,605 bp and chrA:8,198,726 bp; Additional file 21: Table S10). This finding indicates that the previously supposed-to-be same inversion shared by the two species, i.e., Ah4 equal to A1, in fact represents two different inversions that originated separately.
Comparative analysis of gene arrangement of the regions around the breakpoints of Ah4 in D. subobscura and D. guanche with those in the outgroup D. melanogaster indicates that D. guanche shows the ancestral arrangement state (CG2076, CG2081, CG18085, |, CG15203, CG1537, CG1545, and CG32677, CG43347, CG1628, |, CG15302, CG32683, CG2096; for the proximal and distal breakpoints, respectively, in both D. guanche and D. melanogaster; with the vertical lines denoting the inversion breakpoints; Additional file 21: Table S10), whereas D. subobscura shows the derived state (i.e., CG2076, CG2081, CG18085, |, CG1628, CG43347, CG32677, and CG1545, CG1537, CG15203, |, CG15302, CG32683, CG2096; Additional file 21: Table S10). In addition, no evidence was found for duplicated and/or repetitive sequences in the breakpoint regions from reciprocal BLAST searches, which supports that the inversion originated through a chromosomal breakage mechanism, either straight-breaks, or nearly straight-breaks, i.e., staggered-breaks whose resulting duplications are too short to leave long-lasting traces . Be that as it may, no gene was found to have been directly disrupted by the inversion, suggesting that Ah4 may have been favored indirectly because of its recombination suppression effects.
Apart from the example of Ah4, it is worth pinpointing the cases of A6 and the pair U1 and U2. The first inversion seems a reversal of the telomeric end of chromosome A. Alternatively, it could be subtelomeric , and that the tip of the chromosome not affected by the inversion was not included in the assembly. In any case, the rearrangement produced the peritelomeric peak of transposable element repetitive content shown in Fig. 3. With respect to the pair U1 and U2, available cytological evidence could not distinguish between the distal breakpoint of U1 and the proximal breakpoint of U2, pointing to an instance of breakpoint reuse . However, from the assembly the two breakpoints are clearly distinct, although they are only 31 kb distant from each other (Additional file 21: Table S10).
Between lineages, considering only rearrangement replacements, D. subobscura has evolved at a rate of 5.6 inversions/Myr (assuming 1.72 Myr to the common ancestor of the species subgroup; Additional file 3: Figure S1), which is over 10 times higher than that for the average island endemic (0.4 inversion/Myr; assuming 0.92 Myr to the split of D. madeirensis ). The difference is highly significant (P = 3.3 × 10−5, Poisson distribution). The lower rate of rearrangement accumulation in D. guanche and D. madeirensis compared to that in D. subobscura could be a reflection of a lower rate of rearrangement formation in small-sized island species. Another, not mutually exclusive possibility is that the difference could be related to that D. guanche and D. madeirensis remained localized to the small oceanic islands in which they arose, which have maintained relatively homogenous conditions , whereas, in comparison, D. subobscura is vastly distributed across multiple contrasting environments with high dispersion. This latter situation was shown to result in increased rates of structural evolution, when the evolutionary fate of inversions is driven by their effect in keeping sets of positively selected alleles together against maladaptive gene flow .
The role of the inversions recombination-suppression effect as one driver of genome structural evolution in the subobscura subgroup is further supported by the observed ratios of chromosomal divergence to polymorphism between the A sex chromosome and the autosomes in D. subobscura. In this lineage, compared to the average autosome the A chromosome shows 8 times larger inversion fixation rate (0.44 vs. 3.5 inversions/Myr, respectively; P = 0.006, Poisson distribution; so-called “faster-X” pattern ), while 1.8 fewer presently segregating inversions (14 vs. 8, respectively ). These conclusions remain qualitatively the same after accounting for chromosome length. The observation of contrasting ratios of polymorphism to divergence between the A sex chromosome and the autosomes agrees with expectations from positive selection models of inversion evolution as byproduct of their recombination-suppression effects in the face of gene flow, which explain this pattern as resulting from: (i) the higher efficiency of negative selection against locally recessive maladaptive alleles at A-linked genes, whereby A-linked inversions would be expected to capture higher-fitness genotypes with greater probability of fixation; and (ii) the higher likelihood that recessive deleterious alleles generate associative overdominance on autosomes, which would hinder autosomal inversions from fixation .
While we may have identified a signature of indirect inversion effects in driving the observed non-random patterns of genome structure evolution in the subobscura subgroup, that would not preclude the contribution of other mechanisms. Two would seem be particularly plausible and better suited to be assessed with the data on hand, including mutational biases in the formation of new inversions and direct inversion effects. Overall, however, no positive evidence for any of these two mechanisms could be obtained in the present study. With respect to the first, the observed accelerated rate of structural evolution of the A chromosome compared to the autosomes in D. subobscura could result from a bias in the formation of inversions arising from the comparatively higher repetitive content of the A chromosome (for example, if inversions tended to originate by ectopic recombination ). However, reciprocal BLASTN searches using the inversions breakpoints did not detect evidence for an enhanced repeat-based formation of A-linked relative to autosomal inversions. With respect to direct inversion effects, we provide a discussion of our findings in the context of related results below.
Figure 5 shows that, in all five inversion-rich chromosomes of D. subobscura, presently segregating standard structural variants arose in the mainland after the split of D. guanche. In addition to these finding, all the 12 inversions were inferred to have originated by chromosomal breakage. In 4 of the cases, the presence of duplicated sequences in opposite orientation on the flanks of the derived rearrangement provided clear-cut evidence of an origin by staggered breaks, including U1 (689 bp-long duplication), U2 (1007 bp), EST (513 bp) and Oms (538 bp) (Additional file 21: Table S10). The remaining 8 cases could have originated through straight- or nearly straight-breaks. In no case evidence for gene disruptions at the breakpoints could be found, which does not support direct positive selection on the inversions as a major driver of genome structure evolution in the subobscura subgroup (see above). Overall, our results suggest that chromosomal breakage is the dominant originating mechanism for inversions in the subgenus Sophophora. This contrasts with the situation in the subgenus Drosophila, in which inversions appear to originate mainly via ectopic recombination. Although its causes remain to be understood, this difference supports that inversions can arise by alternative major mechanisms in different lineages .
We presented the first high-quality, long read-based nuclear and complete mitochondrial genome for D. subobscura, and applied it to a synteny analysis of the evolution of genome structure in the subobscura species subgroup. We found the sequenced genome to exhibit a relatively compact size, compared to known values from the obscura group. SGM-sat and sat290 represent the first and second most abundant satDNAs classes, conversely to the situation in the close relative D. guanche. D. subobscura exhibits the highest rate of accumulation of paracentric inversions of its subgroup. All identified inversions originated by chromosomal breakage, which adds to the evidence favoring this as the prevailing mechanism of inversion formation in the Sophophora subgenus of Drosophila. No evidence for direct gene disruption at the inversions breakpoints was found. This observation, together with the finding of contrasting ratios of inversion fixation to polymorphism between the A sex chromosome and the autosomes, overall suggests that the evolution of genome structure in the lineage leading to D. subobscura has been driven indirectly, through the inversions recombination-suppression effects in keeping sets of adaptive alleles together in the face of the high dispersion ability of the species. We have built a genome browser and a BLAST server (http://dsubobscura.serveftp.com/) to facilitate the further use of this resource.
D. subobscura karyotype and chromosome arrangement designation
D. subobscura has six pairs of chromosomes: five acrocentric and one dot. The five acrocentric chromosomes are symbolized by the alphabet vowels capitalized: A (the sex chromosome; Muller’s element A, homologous to X in D. melanogaster); I, commonly replaced by J (D, 3 L); U (B, 2 L); E (C, 2R); and O (E, 3R) [9, 11]. The species karyotype is divided into 100 numbered sections as follows: A (1–16), J (17–35), U (36–53), E (54–74), O (75–99) and dot (100). Each section is subdivided into 3–5 lettered subsections (from A to E ).
Gene arrangements are denoted by subscripts next to chromosome symbols (ST: standard; otherwise: alternative arrangements to ST). Overlapping inversions are denoted by underlines below number subscripts . The O chromosome has been particularly amenable for study of structural variation, for it is the only chromosome for which a balanced lethal strain (namely, the Varicose/Bare, or abbreviated Va/Ba balancer stock ) is available. By convention, the O chromosome is divided into two segments, designated I (sections 91 to 99) and II (sections 75 to 90), which are located distal and proximal to the centromere, respectively. The structural variant of the O chromosome used in this study is designated O3 + 4, a rearrangement of segment I thought to have originated by superimposition of inversion 4 on the ancestral, and now extinct in D. subobscura gene order O3. It may be worth noting here, that the findings herein show that the immediate ancestral state to inversion 4 is not O3, but arrangement Oms, previously called Og because it was thought to have originated in D. guanche (see the Results and Discussion section, and Fig. 5 legend).
D. subobscura lines
We used one inbred line for de novo complete genome assembly using PacBio long-read data. The inbred line was obtained by 10 generations of full-sib mating of progeny of a single gravid female from our highly homozygous laboratory stock of the ch-cu marker strain. The ch-cu strain was established by Loukas et al.  from flies descended from the “β-ch-cu-stock” [101, 102]. Structurally, it is homokaryotypic for the ST arrangements in all chromosomes except in chromosome O, for which it is homokaryotypic for the O3 + 4 configuration. Crossing schemes and the methods for polytene chromosome staining and identification are described elsewhere . The assayed inbred line was stored frozen at − 80 °C immediately upon obtention.
Genome size estimation by flow cytometry
Genome size of adult D. subobscura ch-cu was quantified from five replicates of brain cell nuclei using propidium-iodide (PI) based flow cytometry . By this method, the size of a target genome is estimated by comparing stain uptake of the target genome (PI–fluor target), with that of a standard genome of known size (PI–fluorstandard). A D. virilis strain with known 328 Mb genome size  was used as the standard.
Nuclei were extracted from samples of 10–80 °C—frozen heads from four-days-old ice-immobilized females, each including 5 heads from ch-cu and 5 heads from the standard. Each sample was transferred to a glass/glass homogenizer (Kontes Dounce Tissue Grinder 7 ml), ground on ice-cold LB Galbraith buffer using the large clearance pestle (pestle A), and the homogenates filtered through nylon mesh (20 μm). The filtrates were stained for 2 h in 50 μg ml− 1 PI, and subsequently analyzed on a BD Biosciences (BDB) Dual Laser FACSalibur (Becton Dickinson, Franklin Lakes, NJ, USA) flow cytometer, using the forward (FS) and side (SS) scattering, together with the red PI fluorescence (> 670 nm) detected by the FL3 detector. Data were generated at low flow rate (~ 1000 nuclei/min). Data analysis was performed using the BD FACSDiva 4.0 software (BD Biosciences, San José, CA, USA). Individual nuclei were gated from aggregates and debris by their area (FL3-A) vs. width (FL3-W) fluorescence signal. Measures were obtained from a minimum of 10,000 nuclei per sample. Genome sizes were estimated using the formula: GSD. subobscura (ch-cu) = GSD. virilis × (PI–fluor D. subobscura (ch-cu)) / (PI–Fluord. virilis).
High molecular weight genomic DNA isolation and PacBio whole-genome sequencing
High-quality high molecular weight gDNA was obtained from 60 mg mixes of − 80 °C frozen adult males and females, using a modified version of the phenol/chloroform method of Chen et al.,  that yields ~ 25 μg of high quality DNA per assay, as assessed by NanoDrop ND1000 (NanoDrop Technologies Inc., Wilmington, DE, USA) spectrophotometer and standard agarose gel electrophoresis. The genome of the inbred ch-cu line was sequenced to nominal 40-fold genome coverage using PacBio (Pacific Biosciences, Menlo Park, CA, USA) RSII single-molecule real-time (SMRT) technology from a 20-kb SMRTbell template library, using P6-C4 chemistry and seven SMRT cells. Libraries construction and PacBio sequencing were outsourced to Macrogen (Macrogen Inc., Seoul, South Korea).
De novo genome assembly
Raw PacBio reads were assembled using the Canu assembler (Ver. 1.5 ) on recommended settings for read error correction, trimming and assembly, and genome size set at 150 Mb (see below). In addition, we also tried HINGE , FALCON  and MECAT . Compared to Canu, these alternative bioinformatics pipelines produced smaller and less contiguous assemblies on our data. These analyses were performed on a 2.80-GHz 8-CPU Intel Xeon 64-bit 32 GB-RAM computer running Ubuntu 16.04 LTS.
Chromosomal assignment, ordering and orientation of Canu contigs was accomplished in four steps. In step I, the contigs were checked for the presence of inter- and intra-chromosomal chimeras using a semi-automatic recursive approach combining: i) cross-species synteny information inferred using BLAT  and BLASTN , setting the genome of D. melanogaster (release r6.22) and the more closely related, yet not so-well characterized genome of D. pseudoobscura (r3.04) as the reference. Here, BLASTN was used in relatively few cases where BLAT either did not return a hit, returned multiple equal score hits, or returned a hit to scaffold unknown from D. pseudoobscura. The first of these three cases involved short and fast-evolving contigs and bacterial contigs; the second one involved contigs containing Repbase (Ver. 20,150,897 ) identified repetitive sequences, which were re-examined after masking of the repeats; in the third case, BLASTN was used to confirm that the target contig mapped exclusively to one scaffold. Cross-species synteny information obtained in this way was combined with ii) the wealth of available D. subobscura’s physical mapping [18, 84, 111] and genetic linkage [13, 112, 113] data. Markers’ sequences were retrieved from FlyBase 2.0 (release FB2017_02) using gene names and/or annotation symbols provided by the authors. In step II, Canu contigs that passed step I were scaffolded using SSPACE-LongRead (Ver. 1–1 ). In step III, the resulting SSPACE scaffolds were submitted to a second round of quality check as in step I. In step IV, the assembled sequence that passed step III was assigned genomic coordinates based on the physical location of the markers.
Prediction and annotation of the genome assembly was conducted using MAKER (Ver. 3.01.02. -beta [114, 115]) annotation pipeline with default parameters. Repetitive elements were identified using RepeatMasker (Ver. 4.0.6 ) combined with the Drosophila genus specific repeat library included in Repbase database. Two previously described satellites, namely sat290  and SGM-sat  were absent from the Repbase database, thereby they were ascertained separately by BLAST search using already available sequences from the D. guanche sat290  and the D. subobscura SGM-sat (GenBank accession AF043638.1 ) as queries. SNAP , AUGUSTUS , GeneMark-ES , and geneid  were selected for ab initio gene model prediction on the repeat masked genome sequence. Proteomes from 12 Drosophila species from Flybase database (FB2017_05, released October 25, 2017 ), and additional 491 D. subobscura protein sequences from UniProt database (release 2017_12 ) were used in the analysis.
The quality of the annotation was controlled using the Annotation Edit Distance (AED) metric . AED values are bounded between 0 and 1; an AED value of 0 indicates perfect agreement of the annotation to aligned evidence. Conversely, a value of 1 indicated no evidence support.
Functional annotation of MAKER-predicted proteins was made by BLASTP (Ver. 2.6.0+) searches against the Drosophila UniProt-SwissProt manually curated datasets . Prediction of protein functional domains was accomplished using InterProScan (Ver. 5.29–68.0 ) on the Pfam , InterPro , and Gene Ontology (GO) [128, 129] domain databases. UniProt-SwissProt BLAST and InterProScan functional assignments were extracted using the ANNotation Information Extractor (ANNIE ), which assigns gene names and products by database cross-referencing. InterProScan functional assignments were mapped to Gene Ontology (GO) terms using Blast2GO (Ver. 5.0.13 ). The combined graph function of Blast2GO was used to generate gene ontology graphs and pie charts from the GO terms.
Genome assembly and annotation completeness was gauged using the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool (BUSCO, Ver. 3 ) analysis against the diptera_odb9 dataset, which contains 2799 highly-conserved, single-copy genes likely to be present in any dipteran genome. The dipteran set was selected, because being the most narrowly defined Drosophila-including set, it is also the largest, therefore the one expected to provide the best resolution.
Mitogenome assembly and annotation
Annotation of the D. subobscura mitogenome was conducted using the MITOS online tool (http://mitos.bioinf.uni-lipzig.de/index.py ), with default settings, metazoan reference, and invertebrate genetic code, and was further adjusted manually according to its alignment with available mitogenomes from other Drosophila species.
Orthologous group assignment and gene family expansion/contraction analyses
The complete set of D. subobscura annotated proteins were clustered into orthologous groups by comparison with the 12 Drosophila genomes (FlyBase releases dana_R1.06, dere_R1.05, dgri_R1.05, dmel_R6.22, dmoj_R1.04, dper_R1.3, dpse_R3.04, dsec_R1.3, dsim_R2.02, dvir_R1.07, dwil_R1.05, and dyak_R1.05), plus that of its close relative D. guanche (dgua_R1.0 ). Orthologous group assignment was conducted using OrthoMCL (Ver. 5 ) on default settings. OrthoMCL generates orthologous groups via all-to-all BLASTP comparison followed by Markov clustering of the reciprocal best similarity pairs.
Analysis of gene family expansion and contraction was conducted using the Computational Analysis of gene Family Evolution (CAFE Ver. 3.1 ) tool. For a specified ultrametric phylogenetic tree, and given the gene family sizes in the extant species, CAFE uses a maximum likelihood stochastic birth-and-death process to model the rate and direction of change in gene family size (in number of gene births and deaths per gene per million years; symbolized λ) over the tree. CAFE was run on default parameters using a 14 species ultrametric tree built by grafting D. subobscura and D. guanche onto the 12 Drosophila tree used by Hahn et al.  at positions obtained from the TimeTree database (http://www.timetree.org/ ):
(((((((Dsim:2.1,Dsec:2.1):3.2,Dmel:5.3):5.9,(Dere:8.5,Dyak:8.5):2.7):42.1,Dana:53.3):2.3,((Dpse:1.4,Dper:1.4):13.1,(Dsub:3.1,Dgua:3.1):11.4):41.1):6.8,Dwil:62.4):0.8,((Dvir:32.7,Dmoj:32.7):4.3,Dgri:37):26.2); with branch lengths given in million years.
Model-fitting considered three nested likelihood models of gene family size evolution. The first model assumes a single global λG for all lineages. The second model allows for three λ to accommodate for fast- (λF ≥ 0.010), medium- (0.010 > λM > 0.002), and slow-evolving (λS ≤ 0.002) branches. Assignment of each branch to its corresponding λ category (i.e., λF, λM or λS) in this model was made a priori, based on the best results for a fully 26 λ-parameters model (i.e., one for each branch of the phylogeny), as in Hahn et al. . The third model is a five λ generalization of the second model to allow for the terminal branches leading to D. subobscura and D. guanche having their own rates (i.e., λDs and λDg, respectively). Estimates of λ obtained using this approach are sensitive to suboptimal genome assembly and/or annotation. Therefore, the obtained best-fitting model was refined by adding to it a term of error (ε) in genome quality. The effect of the error term on λ provides an indirect measure of genome assembly and/or annotation completeness . For each model, at least five CAFE runs were performed and those runs with the highest likelihood score per model were included. To meet the CAFE assumption that gene families must have been present at the root of the tree, only families found in at least one species of both the Sophophora and Drosophila subgenera, were considered. Both OrthoMCL and CAFE analyses were conducted considering only the longest splice forms.
Functional enrichment analyses of gene families uncovered to have been rapidly evolving along the terminal branch of this species by CAFE were carried out using the Blast2GO  implementation of the one-sided Fisher’s exact test, with false discovery rate (FDR) < 0.001. Enriched GO terms were summarized and visualized using the online version of REVIGO (http://revigo.irb.hr/ ). This tool identifies representative GO terms by semantic similarity.
Whole-genome synteny analysis
The genome of D. subobscura was analyzed for conservation of synteny against those of three increasingly distant species, namely D. guanche (dgua_R1.01 ), D. pseudoobscura (dpse_R3.04), and D. melanogaster (dmel_R6.22), using the Synteny Mapping and Analysis Program (SyMAP, Ver. 4.2. [138, 139]) tool on default options. SyMAP is a long-range whole-genome synteny mapping tool devised to accommodate for intervening micro-rearrangements which could result from misassembling, but also from real structural changes. Therefore, SyMAP seemed especially suited for investigating large, cytologically visible recent chromosomal rearrangement events that are the focus of the present study.
We thank Manuela Costa from the Cytometry Unit at Universitat Autonòma de Barcelona for her technical support and advice in flow cytometry measures.
This study was supported by the Spanish Ministerio de Ciencia e Innovación grant CGL2017-89160P; and Generalitat de Catalunya grant 2017SGR 1379 to the Grup de Genòmica, Bioinformàtica i Biologia Evolutiva, Universitat Autònoma de Barcelona (Spain). CK was supported by a PIF PhD fellowship from the Universitat Autònoma de Barcelona (Spain). Note that the funding agencies were not involved in the design of the study or in any aspect of the collection, analysis and interpretation of the data or paper writing.
Availability of data and materials
All the D. subobscura sequencing data obtained in this work are available in the European Nucleotide Archive (ENA) under the project ID PRJEB31081. Furthermore, all the scaffolds and the pseudomolecules, along with their annotations, are available in our local server in the form of a genome browser (http://dsubobscura.serveftp.com/).
CK, RT, FR-T conceived and designed the research. CK performed most of the analyses with support from RT and FR-T. VG-V carried out the flow cytometry measures with supervision from RT and FR-T. CK, RT, and FR-T wrote the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Collin JE. Note: Drosophila subobscura sp. n. male, female. J Genet. 1936;33:60.Google Scholar
- Buzzati-Traverso AA, Scossiroli RE. The “obscura group” of the genus Drosophila. Adv Genet. 1955;7:47–92.PubMedView ArticleGoogle Scholar
- Maynard-Smith J. Fertility, mating behaviour and sexual selection in Drosophila subobscura. J Genet. 1956;54:261–79.View ArticleGoogle Scholar
- Holman L, Freckleton RP, Snook RR. What use is an infertile sperm? A comparative study of sperm-heteromorphic Drosophila. Evolution. 2007;62:374–85.PubMedView ArticleGoogle Scholar
- Fisher DN, Doff RJ, Price TA. True polyandry and pseudopolyandry: why does a monandrous fly remate? BMC Evol Biol. 2013;13:157.PubMedPubMed CentralView ArticleGoogle Scholar
- Philip D, Rendel JM, Spurway H, Haldane JBS. Genetics and karyology of Drosophila subobscura. Nature. 1944;154:260–2.View ArticleGoogle Scholar
- Rendel JM. Genetics and cytology of D. subobscura. II. Normal and selective matings in Drosophila subobscura. J Genet. 1945;46:287–302.View ArticleGoogle Scholar
- Ewing AW, Bennet-Clark HC. The courtship songs of Drosophila. Behaviour. 1968;31:288–301.View ArticleGoogle Scholar
- Krimbas CB. The inversion polymorphism of Drosophila subobscura. In: Krimbas CB, Powell JR, editors. Drosophila inversion polymorphism: Boca Raton: CRC Press; 1992. p. 127–220.Google Scholar
- Emmens CW. The morphology of the nucleus in the salivary glands of four species of Drosophila (D. melanogaster, D. immigrans, D. funebris and D. subobscura). Z Zellforsch u mikroskop Anal. 1937;26:1–20.View ArticleGoogle Scholar
- Mainx F, Koske T, Smital E. Untersuchungen über dier chromosomale Struktur europaischer Vertreter der Drosophila obscura Gruppe. Z Indukt Abstamm Vererbungsl. 1953;85:354–72.PubMedGoogle Scholar
- Kunze-Mühl E, Müller E. Weitere Untersuchungen über die chromosomale Struktur und die natürlichen Strukturtypen von Drosophila subobscura Coll. Chromosoma. 1958;9:559–70.PubMedView ArticleGoogle Scholar
- Loukas M, Krimbas CB, Mavragani-Tsipidou P, Kastritsis CD. Genetics of Drosophila subobscura populationsVIII. Allozyme loci and their chromosome maps. J Hered. 1979;70:17–26.PubMedView ArticleGoogle Scholar
- Moltó MD, De Frutos R, Martínez-Sebastián MJ. The banding pattern of polytene chromosomes of Drosophila guanche compared with that of D. subobscura. Genetica. 1987;75:55–70.PubMedView ArticleGoogle Scholar
- Papaceit M, Prevosti A. A photographic map of Drosophila madeirensis polytene chromosomes. J Hered. 1991;82:471–8.PubMedView ArticleGoogle Scholar
- Loukas M, Krimbas CB, Vergini Y. The genetics of Drosophila subobscura populations. IX. Studies on linkage disequilibrium in four natural populations. Genetics. 1979;93:497–523.PubMedPubMed CentralGoogle Scholar
- Santos J, Serra L, Solé E, Pascual M. FISH mapping of microsatellite loci from Drosophila subobscura and its comparison to related species. Chromosom Res. 2010;18:213–26.View ArticleGoogle Scholar
- Orengo DJ, Puerma E, Papaceit M, Segarra C, Aguadé M. Dense gene physical maps of the non-model species Drosophila subobscura. Chromosom Res. 2017;25:145–54.View ArticleGoogle Scholar
- Latorre A, Moya A, Ayala FJ. Evolution of mitochondrial DNA in Drosophila subobscura. Proc Natl Acad Sci U S A. 1986;83:8649–53.PubMedPubMed CentralView ArticleGoogle Scholar
- Christie JS, Picornell A, Moya A, Ramon MM, Castro JA. Mitochondrial DNA effects on fitness in Drosophila subobscura. Heredity. 2011;107:239–45.PubMedPubMed CentralView ArticleGoogle Scholar
- Kurbalija Novičić Z, Immonen E, Jelić M, AnÐelković M, Stamenković-Radak M, Arnqvist G. Within-population genetic effects of mtDNA on metabolic rate in Drosophila subobscura. J Evol Biol. 2015;28:338–46.PubMedView ArticleGoogle Scholar
- Monclús M. Distribución y ecología de drosofílidos en España. II. Especies de Drosophila de las Islas Canarias, con la descripcion de una nueva especie. Bol R Soc Esp His Nat Secc Biol. 1976;74:197–213.Google Scholar
- Monclús M. Drosophilidae of Madeira, with the description of Drosophila madeirensis-n. sp. Z Zool Syst Evolutionsforsch. 1984;22:94–103.View ArticleGoogle Scholar
- Taxodros: a taxonomic database of the Drosophilidae. Bächli G. University of Zürich. https://www.taxodros.uzh.ch/. Accessed 17 October 2018.
- Krimbas CB, Loukas M. Evolution of the obscura group Drosophila species. I. Salivary chromosomes and quantitative characters in D. subobscura and two closely related species. Heredity. 1984;53:469–82.View ArticleGoogle Scholar
- Rego C, Santos M, Matos M. Quantitative genetics of speciation: additive and non-additive genetic differentiation between Drosophila madeirensis and Drosophila subobscura. Genetica. 2007;131:167–74.PubMedView ArticleGoogle Scholar
- Herrig DK, Modrick AJ, Brud E, Llopart A. Introgression in the Drosophila subobscura–D. madeirensis sister species: evidence of gene flow in nuclear genes despite mitochondrial differentiation. Evolution. 2014;68:705–19.PubMedView ArticleGoogle Scholar
- Papaceit M, San Antonio J, Prevosti A. Genetic analysis of extra sex combs in the hybrids between Drosophila subobscura and D. madeirensis. Genetica. 1991;84:107–14.PubMedView ArticleGoogle Scholar
- Khadem M, Krimbas CB. Studies of the species barrier between Drosophila subobscura and D. madeirensis. III. How universal are the rules of speciation? Heredity. 1993;70:353–61.PubMedView ArticleGoogle Scholar
- Mittleman BE, Manzano-Winkler B, Hall JB, Korunes KL, Noor MAF. The large X-effect on secondary sexual characters and the genetics of variation in sex comb tooth number in Drosophila subobscura. Ecol Evol. 2017;7:533–40.PubMedView ArticleGoogle Scholar
- Powell JR. Progress and prospects in evolutionary biology: the Drosophila model. New York: Oxford University Press; 1997.Google Scholar
- Zivanovic G, Arenas C, Mestres F. Individual inversions or their combinations: which is the main selective target in a natural population of Drosophila subobscura? J Evol Biol. 2016;29:657–64.PubMedView ArticleGoogle Scholar
- Orengo DJ, Puerma E, Papaceit M, Segarra C, Aguadé M. A molecular perspective on a complex polymorphic inversion system with cytological evidence of multiply reused breakpoints. Heredity. 2015;114:610–08.PubMedPubMed CentralView ArticleGoogle Scholar
- Puerma E, Orengo DJ, Aguadé M. Multiple and diverse structural changes affect the breakpoint regions of polymorphic inversions across the Drosophila genus. Sci Rep. 2016;6:36248.PubMedPubMed CentralView ArticleGoogle Scholar
- Menozzi P, Krimbas CB. The inversion polymorphism of D. subobscura revisited: synthetic maps of gene arrangement frequencies and their interpretation. J Evol Biol. 1992;5:625–41.View ArticleGoogle Scholar
- Rodríguez-Trelles F, Alvarez G, Zapata C. Time-series analysis of seasonal changes of the O inversion polymorphism of Drosophila subobscura. Genetics. 1996;142:179–87.PubMedPubMed CentralGoogle Scholar
- Brncic D, Budnik M. Colonization of D. subobscura Collin in Chile. Dros Inform Serv. 1980;55:20.Google Scholar
- Beckenbach AT, Prevosti A. Colonization of North America by the European species, D. subobscura and D. ambigua. Am Midl Nat. 1986;115:10–8.View ArticleGoogle Scholar
- Pascual M, Aquadro CF, Soto V, Serra L. Microsatellite variation in colonizing and palearctic populations of Drosophila subobscura. Mol Biol Evol. 2001;18:731–40.PubMedView ArticleGoogle Scholar
- Prevosti A, Ribo G, Serra L, Aguade M, Balaña J, Monclus M, Mestres F. Colonization of America by Drosophila subobscura: experiment in natural populations that supports the adaptive role of chromosomal-inversion polymorphism. Proc Natl Acad Sci U S A. 1988;85:5597–600.PubMedPubMed CentralView ArticleGoogle Scholar
- Huey RB, Gilchrist GW, Carlson ML, Berrigan D, Serra L. Rapid evolution of a geographic cline in size in an introduced fly. Science. 2000;287:308–9.PubMedView ArticleGoogle Scholar
- Ayala FJ, Serra L, Prevosti A. A grand experiment in evolution: the Drosophila subobscura colonization of the Americas. Genome. 1989;31:246–55.View ArticleGoogle Scholar
- Jungen H. Abnormal sex ratio, linked with inverted gene sequence, in populations of D. subobscura from Tunisia. Dros Inform Serv. 1967;42:109.Google Scholar
- Verspoor RL, Smith JML, Mannion NML, Hurst GDD, Price TAR. Strong hybrid male incompatibilities impede the spread of a selfish chromosome between populations of a fly. Evol Lett. 2018;2:169–79.PubMedPubMed CentralView ArticleGoogle Scholar
- Rodríguez-Trelles F, Rodríguez MA. Rapid micro-evolution and loss of chromosomal diversity in Drosophila in response to climate warming. Evol Ecol. 1998;12:829–38.View ArticleGoogle Scholar
- Rodríguez-Trelles F, Rodríguez MA. Comment on ‘Global genetic change tracks global climate warming in Drosophila subobscura’. Science. 2007;315:1497.PubMedView ArticleGoogle Scholar
- Balanyá J, Oller JM, Huey RB, Gilchrist GW, Serra L. Global genetic change tracks global climate warming in Drosophila subobscura. Science. 2006;313:1773–5.PubMedView ArticleGoogle Scholar
- Balanyà J, Huey RB, Gilchrist GW, Serra L. The chromosomal polymorphism of Drosophila subobscura: a microevolutionary weapon to monitor global change. Heredity. 2009;103:364–7.PubMedView ArticleGoogle Scholar
- Rodríguez-Trelles F, Rodríguez MA. Measuring evolutionary responses to global warming: cautionary lessons from Drosophila. Insect Conserv Div. 2010;3:44–50.View ArticleGoogle Scholar
- Rezende EL, Balanyà J, Rodríguez-Trelles F, Rego C, Fragata I, Matos M, Serra L, Santos M. Climate change and chromosomal inversions in Drosophila subobscura. Clim Res. 2010;43:103–14.View ArticleGoogle Scholar
- Rodríguez-Trelles F, Tarrío R, Santos M. Genome-wide evolutionary response to a heat wave in Drosophila. Biol Lett. 2013;9:e20130228.View ArticleGoogle Scholar
- Rozas J, Aguadé M. Gene conversion is involved in the transfer of genetic information between naturally occurring inversions of Drosophila. Proc Natl Acad Sci U S A. 1994;91:11517–21.PubMedPubMed CentralView ArticleGoogle Scholar
- Betrán E, Rozas J, Navarro A, Barbadilla A. The estimation of the number and the length distribution of gene conversion tracts from population DNA sequence data. Genetics. 1997;146:89–99.PubMedPubMed CentralGoogle Scholar
- Munté A, Rozas J, Aguadé M, Segarra C. Chromosomal inversion polymorphism leads to extensive genetic structure: a multilocus survey in Drosophila subobscura. Genetics. 2005;169:1573–81.PubMedPubMed CentralView ArticleGoogle Scholar
- Puig Giribets M, García Guerreiro MP, Santos M, Ayala FJ, Tarrío R, Rodríguez-Trelles F. Chromosomal inversions promote genomic islands of concerted evolution of Hsp70 genes in the Drosophila subobscura species subgroup. Mol Ecol. 2019. https://doi.org/10.1111/mec.14511.
- Santos M, Céspedes W, Balanyà J, Trotta V, Calboli FC, Fontdevila A, Serra L. Temperature-related genetic changes in laboratory populations of Drosophila subobscura: evidence against simple climatic-based explanations for latitudinal clines. Am Nat. 2005;165:258–73.PubMedGoogle Scholar
- Fragata I, Lopes-Cunha M, Bárbaro M, Kellen B, Lima M, Santos MA, Faria GS, Santos M, Matos M, Simões P. How much can history constrain adaptive evolution? A real-time evolutionary approach of inversion polymorphisms in Drosophila subobscura. J Evol Biol. 2014;27:2727–38.PubMedView ArticleGoogle Scholar
- Sperlich D, Feuerbach-Mravlag H, Lange P, Michaelidis A, Pentzos-Daponte A. Genetic load and viability distribution in central and marginal populations of Drosophila subobscura. Genetics. 1977;86:835–48.PubMedPubMed CentralGoogle Scholar
- Puerma E, Orengo DJ, Cruz F, Gómez-Garrido J, Librado P, Salguero D, Papaceit M, Gut M, Segarra C, Alioto TS, et al. The high-quality genome sequence of the oceanic island endemic species Drosophila guanche reveals signals of adaptive evolution in genes related to flight and genome stability. Genome Biol Evol. 2018;10:1956–69.PubMedPubMed CentralView ArticleGoogle Scholar
- Brehm A, Krimbas CB. Evolution of the obscura group Drosophila species. Ill. Phylogenetic relationships in the subobscura cluster based on homologies of chromosome A. Heredity. 1990;65:269–75.PubMedView ArticleGoogle Scholar
- Ranz JM, Maurin D, Chan YS, von Grotthuss M, Hillier LW, Roote J, Ashburner M, Bergman CM. Principles of genome evolution in the Drosophila melanogaster species group. PLoS Biol. 2007;5:e152.PubMedPubMed CentralView ArticleGoogle Scholar
- Kirkpatrick M. How and why chromosome inversions evolve. PLoS Biol. 2010;8:e1000501.PubMedPubMed CentralView ArticleGoogle Scholar
- Dobzhansky T. Genetics of natural populations. XIV. A response of certain gene arrangements in the third chromosome of Drosophila pseudoobscura to natural selection. Genetics. 1947;32:142–60.PubMedPubMed CentralGoogle Scholar
- Kirkpatrick M, Barton N. Chromosome inversions, local adaptation and speciation. Genetics. 2006;176:419–34.View ArticleGoogle Scholar
- Connallon T, Olito C, Dutoit L, Papoli H, Ruzicka F, Yong L. Local adaptation and the evolution of inversions on sex chromosomes and autosomes. Phil Trans R Soc B. 2018;373:20170423.PubMedView ArticleGoogle Scholar
- Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33:2202–4.PubMedPubMed CentralView ArticleGoogle Scholar
- Marcais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–70.PubMedPubMed CentralView ArticleGoogle Scholar
- Hare EE, Johnston JS. Genome size determination using flow cytometry of propidium iodide-stained nuclei. Methods Mol Biol. 2011;772:3–12.PubMedView ArticleGoogle Scholar
- Nardon C, Deceliere G, Loevenbruck C, Weiss M, Vieira C, Biémont C. Is genome size influenced by colonization of new environments in dipteran species? Mol Ecol. 2005;14:869–78.PubMedView ArticleGoogle Scholar
- Animal Genome Size Database. Gregory TR. http://www.genomesize.com. Accessed 19 June 2018.
- Rhoads A, Au KF. PacBio sequencing and its applications. Genomics Proteomics Bioinformatics. 2015;13:278–89.PubMedPubMed CentralView ArticleGoogle Scholar
- Chandler JA, Lang JM, Bhatnagar S, Eisen JA, Kopp A. Bacterial communities of diverse Drosophila species: ecological context of a host-microbe model system. PLoS Genet. 2011;7:e1002272.PubMedPubMed CentralView ArticleGoogle Scholar
- Boetzer M, Pirovano W. SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information. BMC Bioinf. 2014;15:211.View ArticleGoogle Scholar
- English AC, Richards S, Han Y, Wang M, Vee V, Qu J, Qin X, Muzny DM, Reid JG, Worley KC, et al. Mind the gap: upgrading genomes with Pacific biosciences RS long-read sequencing technology. PLoS One. 2012;7:e47768.PubMedPubMed CentralView ArticleGoogle Scholar
- Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, et al. Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007;450:203–18.PubMedView ArticleGoogle Scholar
- Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, et al. The genome sequence of Drosophila melanogaster. Science. 2000;287:2185–95.PubMedView ArticleGoogle Scholar
- Steinemann M, Pinsker W, Sperlich D. Chromosome homologies within the Drosophila obscura group probed by in situ hybridization. Chromosoma. 1984;91:46–53.View ArticleGoogle Scholar
- Obbard DJ, Maclennan J, Kim KW, Rambaut A, O'Grady PM, Jiggins FM. Estimating divergence dates and substitution rates in the Drosophila phylogeny. Mol Biol Evol. 2012;29:3459–73.PubMedPubMed CentralView ArticleGoogle Scholar
- Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33:1870–674.PubMedPubMed CentralView ArticleGoogle Scholar
- Mello B, Tao Q, Tamura K. Kumar. S. Fast and accurate estimates of divergence times from big data. Mol Biol Evol. 2017;34:45–50.PubMedView ArticleGoogle Scholar
- Ramos-Onsins S, Segarra C, Rozas J, Aguadé M. Molecular and chromosomal phylogeny in the obscura group of Drosophila inferred from sequences of the rp49 gene region. Mol Phylogenet Evol. 1998;9:33–41.PubMedView ArticleGoogle Scholar
- Gao JJ, Watabe HA, Aotsuka T, Pang JF, Zhang YP. Molecular phylogeny of the Drosophila obscura species group, with emphasis on the Old World species. BMC Evol Biol. 2007;7:87.PubMedPubMed CentralView ArticleGoogle Scholar
- Russo CAM, Mello B, Frazão A, Voloch CM. Phylogenetic analysis and a time tree for a large drosophilid data set (Diptera: Drosophilidae). Zool J Linnean Soc. 2013;169:765–75.View ArticleGoogle Scholar
- Pratdesaba R, Segarra C, Aguadé M. Inferring the demographic history of Drosophila subobscura from nucleotide variation at regions not affected by chromosomal inversions. Mol Ecol. 2015;24:1729–41.PubMedView ArticleGoogle Scholar
- Boore JL. Animal mitochondrial genomes. Nucleic Acids Res. 1999;27:1767–80.PubMedPubMed CentralView ArticleGoogle Scholar
- De Ré FC, Wallau GL, Robe LJ, Loreto EL. Characterization of the complete mitochondrial genome of flower-breeding Drosophila incompta (Diptera, Drosophilidae). Genetica. 2014;142:525–35.PubMedView ArticleGoogle Scholar
- Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.PubMedPubMed CentralView ArticleGoogle Scholar
- Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000;17:540–52.PubMedView ArticleGoogle Scholar
- Gregory TR, Johnston JS. Genome size diversity in the family Drosophilidae. Heredity. 2008;101:228–38.PubMedView ArticleGoogle Scholar
- Bachmann L, Raab M, Sperlich D. Satellite DNA and speciation: a species specific satellite DNA of Drosophila guanche. Z Zool Syst Evol-Forsch. 1989;27:84–93.View ArticleGoogle Scholar
- Miller WJ, Nagel A, Bachmann J, Bachmann L. Evolutionary dynamics of the SGM transposon family in the Drosophila obscura species group. Mol Biol Evol. 2000;17:1597–609.PubMedView ArticleGoogle Scholar
- Han MV, Thomas GW, Lugo-Martinez J, Hahn MW. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol Biol Evol. 2013;30:1987–97.PubMedView ArticleGoogle Scholar
- Tanaka R, Higuchi T, Kohatsu S, Sato K, Yamamoto D. Optogenetic activation of the fruitless-labeled circuitry in Drosophila subobscura males induces mating motor acts. J Neurosci. 2017;37:11662–74.PubMedView ArticleGoogle Scholar
- Bhutkar A, Schaeffer SW, Russo SM, Xu M, Smith TF, Gelbart WM. Chromosomal rearrangement inferred from comparisons of 12 Drosophila genomes. Genetics. 2008;179:1657–80.PubMedPubMed CentralView ArticleGoogle Scholar
- Tesler G. GRIMM: genome rearrangements web server. Bioinformatics. 2002;18:492–3.PubMedView ArticleGoogle Scholar
- Puerma E, Orengo DJ, Aguadé M. Inversion evolutionary rates might limit the experimental identification of inversion breakpoints in non-model species. Sci Rep. 2017;7:17281.PubMedPubMed CentralView ArticleGoogle Scholar
- de Nascimento L, Willis KJ, Fernández-Palacios JM, Criado C, Whittaker RJ. The long-term ecology of the lost forest of La Laguna, Tenerife (Canary Islands). J Biogeogr. 2009;36:499–514.View ArticleGoogle Scholar
- Charlesworth B, Coyne JA, Barton NH. The relative rates of evolution of sex-chromosomes and autosomes. Am Nat. 1987;130:113–46.View ArticleGoogle Scholar
- Delprat A, Guillén Y, Ruiz A. Computational sequence analysis of inversion breakpoint regions in the cactophilic Drosophila mojavensis lineage. J Hered. 2019;110:102–17.PubMedView ArticleGoogle Scholar
- Zouros E, Krimbas CB, Tsakas S, Loukas M. Genic versus chromosomal variation in natural populations of D. subobscura. Genetics. 1974;78:1223–44.PubMedPubMed CentralGoogle Scholar
- Koske T, Maynard-Smith J. Genetics and cytology of Drosophila subobscura. X. The fifth linkage group. J Genet. 1954;52:521–41.View ArticleGoogle Scholar
- Lankinen P, Pinsker W. Allozyme constitution of two standard strains of Drosophila subobscura. Experientia. 1977;33:1301–2.View ArticleGoogle Scholar
- Chen H, Rangasamy M, Tan SY, Wang H, Siegfried BD. Evaluation of five methods for total DNA extraction from Western corn rootworm beetles. PLoS One. 2010;5:e11963.PubMedPubMed CentralView ArticleGoogle Scholar
- Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27:722–36.PubMedPubMed CentralView ArticleGoogle Scholar
- Kamath GM, Shomorony I, Xia F, Courtade TA, Tse DN. HINGE: long-read assembly achieves optimal repeat resolution. Genome Res. 2017;27:747–56.PubMedPubMed CentralView ArticleGoogle Scholar
- Chin CS, Peluso P, Sedlazeck FJ, Nattestad M, Concepcion GT, Clum A, Dunn C, O'Malley R, Figueroa-Balderas R, Morales-Cruz A, et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods. 2016;13:1050–4.PubMedPubMed CentralView ArticleGoogle Scholar
- Xiao C-L, Chen Y, Xie S-Q, Chen K-N, Wang Y, Han Y, Luo F, Xie Z. MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nat Methods. 2017;14:1072–4.PubMedView ArticleGoogle Scholar
- Kent WJ. BLAT-the BLAST-like alignment tool. Genome Res. 2002;12:656–64.PubMedPubMed CentralView ArticleGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.View ArticleGoogle Scholar
- Bao W, Kojima KK, Kohany O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6:11.PubMedPubMed CentralView ArticleGoogle Scholar
- Papaceit M, Aguadé M, Segarra C, Hey J. Chromosomal evolution of elements B and C in the Sophophora subgenus of Drosophila: evolutionary rate and polymorphism. Evolution. 2006;60:768–81.PubMedView ArticleGoogle Scholar
- Pinsker W, Sperlich D. Cytogenetic mapping of enzyme loci on chromosomes J and U of Drosophila subobscura. Genetics. 1984;108:913–26.PubMedPubMed CentralGoogle Scholar
- Böhm I, Pinsker W, Sperlich D. Cytogenetic mapping of marker genes on the chromosome elements C and E of Drosophila pseudoobscura and D. subobscura. Genetica. 1987;75:89–101.PubMedView ArticleGoogle Scholar
- Holt C, Yandell M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinf. 2011;12:491.View ArticleGoogle Scholar
- Campbell MS, Holt C, Moore B, Yandell M. Genome annotation and curation using MAKER and MAKER-P. Curr Protoc Bioinformatics. 2014;48:4.11.1–39.Google Scholar
- Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0. 2013-2015; http://www.repeatmasker.org.
- Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O’Donnell CJ, de Bakker PIW. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008;24:2938–9.PubMedPubMed CentralView ArticleGoogle Scholar
- Stanke M, Morgenstern B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 2005;33:W465–7.PubMedPubMed CentralView ArticleGoogle Scholar
- Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 2005;33:6494–506.PubMedPubMed CentralView ArticleGoogle Scholar
- Blanco E, Parra G, Guigó R. Using geneid to identify genes. Curr Protoc Bioinformatics. 2007;18:4.3.1–4.3.28.Google Scholar
- Gramates LS, Marygold SJ, Santos GD, Urbano JM, Antonazzo G, Matthews BB, Rey AJ, Tabone CJ, Crosby MA, Emmert DB, et al. FlyBase at 25: looking to the future. Nucleic Acids Res. 2017;45:D663–71.PubMedPubMed CentralView ArticleGoogle Scholar
- UniProt CT. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2018;46:2699.View ArticleGoogle Scholar
- Eilbeck K, Lewis S, Mungall C, Yandell M, Stein L, Durbin R, Ashburner M. The sequence ontology: a tool for the unification of genome annotations. Genome Biol. 2005;6:R44.PubMedPubMed CentralView ArticleGoogle Scholar
- Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2004;32:D115–9.PubMedPubMed CentralView ArticleGoogle Scholar
- Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–40.PubMedPubMed CentralView ArticleGoogle Scholar
- Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016;44:D279–85.View ArticlePubMedPubMed CentralGoogle Scholar
- Finn RD, Attwood TK, Babbitt PC, Bateman A, Bork P, Bridge AJ, Chang HY, Dosztányi Z, El-Gebali S, Fraser M, et al. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res. 2017;45:D190–9.PubMedView ArticleGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium Nat Genet. 2000;25:25–9.PubMedGoogle Scholar
- The Gene Ontology Consortium. Expansion of the gene ontology knowledgebase and resources. Nucleic Acids Res. 2017;45:D331–8.View ArticleGoogle Scholar
- Tate R, Hall B, DeRego T, Geib S. Annie: the ANNotation Information Extractor (Version 1.0). 2014; http://genomeannotation.github.io/annie.
- Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21:3674–6.PubMedView ArticleGoogle Scholar
- Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–2.PubMedView ArticleGoogle Scholar
- Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, Pütz J, Middendorf M, Stadler PF. MITOS: improved de novo metazoan mitochondrial genome annotation. Mol Phylogenet Evol. 2013;69:313–9.PubMedView ArticleGoogle Scholar
- Li L, Stoeckert CJ, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–89.PubMedPubMed CentralView ArticleGoogle Scholar
- Hahn MW, Han MV, Han S-G. Gene family evolution across 12 Drosophila genomes. PLoS Genet. 2007;3:e197.PubMedPubMed CentralView ArticleGoogle Scholar
- Kumar S, Stecher G, Suleski M, Hedges SB. TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol. 2017;34:1812–9.PubMedView ArticleGoogle Scholar
- Supek F, Bosnjak M, Skunca N, Smuc T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One. 2011;6:e21800.PubMedPubMed CentralView ArticleGoogle Scholar
- Soderlund C, Nelson W, Shoemaker A, Paterson A. SyMAP: a system for discovering and viewing syntenic regions of FPC maps. Genome Res. 2006;16:1159–68.PubMedPubMed CentralView ArticleGoogle Scholar
- Soderlund C, Bomhoff M, Nelson WM. SyMAP v3.4: a turnkey synteny system with application to plant genomes. Nucleic Acids Res. 2011;39:e68.PubMedPubMed CentralView ArticleGoogle Scholar