Emergence of distinct syntenic density regimes is associated with early metazoan genomic transitions
BMC Genomics volume 23, Article number: 143 (2022)
Animal genomes are strikingly conserved in terms of local gene order (microsynteny). While some of these microsyntenies have been shown to be coregulated or to form gene regulatory blocks, the diversity of their genomic and regulatory properties across the metazoan tree of life remains largely unknown.
Our comparative analyses of 49 animal genomes reveal that the largest gains of synteny occurred in the last common ancestor of bilaterians and cnidarians and in that of bilaterians. Depending on their node of emergence, we further show that novel syntenic blocks are characterized by distinct functional compositions (Gene Ontology terms enrichment) and gene density properties, such as high, average and low gene density regimes. This is particularly pronounced among bilaterian novel microsyntenies, most of which fall into high gene density regime associated with higher gene coexpression levels. Conversely, a majority of vertebrate novel microsyntenies display a low gene density regime associated with lower gene coexpression levels.
Our study provides first evidence for evolutionary transitions between different modes of microsyntenic block regulation that coincide with key events of metazoan evolution. Moreover, the microsyntenic profiling strategy and interactive online application (Syntenic Density Browser, available at: http://synteny.csb.univie.ac.at/) we present here can be used to explore regulatory properties of microsyntenic blocks and predict their coexpression in a wide-range of animal genomes.
Local gene order has been conserved across animal phyla over vast evolutionary time spans, and is referred to as microsynteny [1,2,3,4,5]. However, little is known about the loss of ancestrally inherited microsyntenies and the emergence of novel microsyntenies, due to genome rearrangements, gene gains and/or gene losses [1, 6,7,8]. Determining the node of microsynteny loss or emergence can provide insights into the evolution of animal genome architecture and the extent of microsynteny conservation.
It still remains unclear whether microsynteny is conserved due to functional constraints (e.g., cis-regulatory constraints, topological organization) [9,10,11], or if it is simply a result of low recombination rates (i.e. without functional significance for gene regulation). Only few studies have addressed the possible functional roles of microsyntenic blocks across the whole genome. There is increasing evidence that some evolutionarily conserved pairs of adjacent genes are maintained as gene regulatory blocks (GRBs) because of cis-regulatory constraints, i.e., the regulatory regions of a target gene are located within a so-called bystander, an unrelated neighboring gene [6, 9, 10]. In human GRBs, the expression of bystander and target genes does not correlate . However, genes within microsyntenic blocks of at least three genes display a higher than expected coexpression in invertebrate genomes . The regulatory constraints on the expression of conserved gene pairs are well characterized in some species [6, 12], whereas those on microsyntenic blocks comprising three or more genes are largely lacking, due to both missing genomic information to identify such regions as well as missing functional genomic data.
Gene density has been suggested as a proxy for inferring the level of coexpression of genes, as a positive correlation between gene proximity and gene coexpression has also been reported [14, 15]. Previous studies have shown that microsyntenic blocks can possess a gene density diverging from the genome average in some species. For instance, a genome-wide study of vertebrate amniotes showed that the conserved syntenic regions exhibited a lower gene density than the rest of the genome . In contrast, the Lrk gene loci were shown to have a high gene density in wheat, barley, maize, and rice .
In this context, our objective was to conduct an exhaustive search among available animal genomes to profile the emergence of conserved microsyntenies across the animal tree of life and to investigate their genomic properties. We used comparative genomics approaches to determine the retention of gene density across orthologous microsyntenic blocks and, in conjuction with available developmental RNAseq data in a few phylogenetically dispersed animals, to study the impact of microsyntenic gene density on gene co-expression. Our results provide new insights into the evolution and functional significance of conserved microsyntenic genome architecture.
Results & discussion
Largest gain of microsynteny in the bilaterian and planulozoan ancestors
The increased taxonomic coverage of sequenced genomes allowed us to profile syntenic gains and losses across the metazoan tree. According to previous methods for synteny detection across distantly related animals [1, 2, 13], we define a microsyntenic block as a unit of three or more orthologous genes, each separated by up to five intervening genes (Supplementary Fig. 2A). As local order has been scrambled across the vast evolutionary distances [1, 18], we do not require the orthologs to exhibit conserved collinearity. Our focus on blocks of three or more orthologous genes allows for a dynamic capture of longer microsyntenic stretches and the threshold of five intervening genes minimizes the proportion of false positives .
Using the genomes of 49 metazoan species from 18 phyla (Supplementary Fig. 1) we reconstructed the ancestral complement of microsyntenic blocks throughout metazoan evolution (Fig. 1A). For every investigated node, we distinguished between inherited multi-species blocks (found in both the ingroup and the outgroup) and novel multi-species blocks (blocks found in the ingroup but not in the outgroup, see Supplementary Fig. 2B, and Methods). To assess false positive detection, we inferred microsyntenic blocks in three sets of shuffled genomes, i.e. genomes where the relative positions of genes have been randomly reshuffled (Methods).
We infer the largest gains of microsyntenic blocks to have occurred in the bilaterian last common ancestor (BLCA, 256 new blocks, only 45, 48 or 56 blocks found in shuffled genomes), and the planulozoan last common ancestor (PLCA, 162 new blocks, only 17, 19 or 25 in shuffled genomes) (Fig. 1B). In contrast, only 34 new blocks were found in the metazoan last common ancestor (MLCA, 0 in any shuffled genome). Thus, a substantial proportion (43%) of the microsyntenic blocks found in the BLCA emerged after the cnidarian-bilaterian split. Given the disputed position of the ctenophore and xenacoelomorpha phyla [21, 22, 24, 25], we calculated the amount of microsyntenic blocks novelties under various phylogenetic hypotheses. Regardless of the hypothesis, we obtained similar results regarding the counts of MLCA, PLCA and BLCA novel blocks (Supplementary Fig. 3). Accordingly, for every subsequent analysis, we used the set of blocks inferred using the multifurcating tree in Supplementary Fig. 1.
We next investigated the retention of microsyntenic blocks in extant species, defining a block to be lost if no species of a given clade possessed it in our analysis. The BLCA novelties have been largely retained among ambulacrarians and lophotrochozoans (Fig. 1A). Conversely, vertebrates lost 84% of the BLCA blocks, and ecdysozoans lost 50%. We observed medium to high percentages of lost BLCA novel blocks in acoel, tunicate, and cephalochordate (88, 95, and 51%, respectively). Since these loss rates are impacted by species sampling, the accuracy of these estimates will increase as more genomes become available for those clades. Nevertheless, the extensive loss of ancestral synteny in vertebrates can be explained by the whole genome duplications that were followed by paralog loss [26,27,28,29]. In ecdysozoans, this loss of synteny is explained by accelerated evolutionary rates involving a large number of chromosome fusions and rearrangements [30, 31] and extensive gene loss [32, 33].
It is expected that the number of genomic rearrangements between two taxa is proportional to the evolutionary time since common ancestry. In order to confirm that our estimations of the number of novel blocks were not biased by the timing of divergence, we correlated the inferred microsynteny block counts to phylogenetic node recency (a higher “recency” value means that the last common ancestor (LCA) is younger, Methods). In observed blocks, while the number of inherited blocks positively correlated with node recency (Pearson correlation coefficient 0.53, p < 0.05), the number of novel blocks did not (Fig. 1B), supporting the observation of extensive microsyntenic gains in the PLCA and BLCA. The positive correlation that we see between node recency and the associated number of inherited blocks in shuffled genomes can be explained by the increasing size of the outgroups, a larger outgroup leading to more blocks detected by chance. In most of the ancestors of the different metazoan lineages, the number of inherited blocks is several times lower in shuffled genomes. In contrast, the vertebrate LCA inherited only 70 blocks, less than the 110, 120 and 124 expected to occur by chance (Fig. 1B). The loss of microsyntenic blocks that followed the whole genome duplication events in vertebrates affected not only the BLCA novel microsyntenies, but even microsyntenic blocks inherited from older nodes, as exemplified by their loss of 91 and 85% of the MLCA and PLCA novelties, respectively (Fig. 1A).
Syntenic gains at various points in evolution were associated with different molecular functions. Among the 212 Gene Ontology (GO) terms enriched in the genes within the BLCA novel syntenies (Supplementary Table 1), some of the most significant ones were cell communication, signaling, and the establishment or maintenance of chromatin architecture (Methods, Supplementary Dataset 2). This contrasts with the 31 GO terms enriched in MLCA novel syntenies (Supplementary Table 3), mostly associated with the emergence of multicellularity, for example protein complex involved in cell adhesion, immune system process, and heterophilic cell-cell adhesion via plasma membrane cell adhesion molecules.
Taken together, our results demonstrate that the early evolutionary transitions towards bilaterally symmetrical animals  were associated with the emergence or higher retention of many microsyntenic blocks, as reflected in the substantial increase already in the planulozoan node (Fig. 1A). This is in strong contrast to the preceding phylogenetic transition and the emergence of multicellularity in the metazoan stem  which was associated with novel gene emergence [4, 36,37,38]. This suggests that different mechanisms of genomic innovation may underlie early animal evolutionary transitions.
When investigating the functional significance of gene order, it is important to also consider the three-dimensional structure of the chromatin. Topologically associating domains or TADs, for example, are regions of the genome that show a high degree of internal physical interaction, but little to no physical interaction with their neighboring regions . To date, much of the relationship between microsynteny and TADs remain unexplored, due to the lack of high-resolution HiC data for many invertebrate species, and only few syntenic regions have been described [11, 40]. Previous studies determined that bilaterians TAD formation is associated with CCCTC-binding factor (CTCF) proteins [39, 41]. The fact that CTCF protein could only be detected among bilaterians  suggests that the spatial organization of bilaterian chromatin differs from that of non-bilaterians. We hypothesize that distinct constraints on genome topology should be reflected in the syntenic signal. This is consistent with our observation that the evolutionary split between Cnidaria and Bilateria is associated with major changes of both the microsyntenic complement (Fig. 1A) and its genomic properties (Fig. 2). Accordingly, CTCF-dependent chromatin looping could be one of the forces underlying the maintenance of bilaterian microsyntenies , whereas the maintenance of PLCA novel blocks in extant cnidarians (devoid of CTCF) implies other constraints that predate the proposed emergence of CTCF.
Distinct gene density properties of microsyntenic blocks
Next, we sought to describe the genomic properties of MLCA, PLCA, and BLCA novel microsyntenies across metazoans. Not only has the proximity of two genes been suggested to facilitate their coexpression [14, 15], but the topological organization of genomes also correlates with differences in local gene density in some species, e.g., in Drosophila the density of TAD interior and boundaries were shown to exhibit low and high gene density, respectively . Therefore, gene density within microsyntenies may comprise a proxy measure for regulatory or topological properties. However, the comparison of block density across species is hampered by the differences in gene number and genome size, two parameters that are not tightly correlated in eukaryotes . In order to account for differences in density across multiple genomes, we normalized block density (defined as the total number of protein-coding genes within a microsyntenic block, including intervening genes, divided by the block length) by the whole-genome density (total number of coding genes divided by the assembly size) (see Methods). This normalization was used because the reciprocal of gene density increases linearly with the assembly size (Supplementary Fig. 4). We term this value the “normalized gene density” within microsyntenic blocks. In order to determine whether normalized gene density in microsyntenic blocks differed from the rest of the genome, we compared it to that of randomly sampled blocks, i.e. regions of the genomes sampled based on the parameters of the observed blocks (number of syntenic genes, number of intervening genes; see Methods). As errors in coding exon predictions may affect our gene density estimates, we complemented this measure of the “normalized gene density” with the median intergenic distance between consecutive genes of each block (Supplementary Fig. 7).
Our analysis showed that syntenic blocks can have higher or lower normalized gene density and median intergenic distances than those of random blocks. We respectively term the blocks as following a high or a low density regime.
A major force that affects gene density regimes is tandem duplication, which can create gene-dense regions [46, 47]. To account for this, we classified microsyntenies into paralogous (more than 40% of genes are paralogs) and non-paralogous (40% or less of genes are paralogs) blocks. This threshold was chosen because it classifies a given block as paralogous already when at least two or three paralogs are present. Indeed, the tendency of paralogous blocks to show higher gene density regime and a lower median intergenic distance than random blocks was observed across every block. This is particularly prominent in BLCA novel microsyntenies (Fig. 2A, Supplementary Fig. 7).
Conversely, the density dynamics in non-paralogous microsyntenic blocks is more diverse. The normalized gene density and median intergenic distance of the MLCA and PLCA novelties retained in extant animals do not differ from the density of randomly sampled blocks (Fig. 2A, Supplementary Fig. 7A). Most BLCA non-paralogous microsyntenic blocks retained in ecdysozoans and lophotrochozoans, on the other hand, follow a high density regime. Not only is their density higher than that of random blocks (p-value Wilcoxon rank-sum test < 0.05, Fig. 2A, Supplementary Fig. 5), the intergenic distance of consecutive syntenic genes is also generally lower (Supplementary Fig. 7A, p-value Wilcoxon rank-sum test < 0.05). In addition, most lophotrochozoan novel blocks also follow a high gene density regime (Fig. 2A, Supplementary Fig. 5). Interestingly, most non-paralogous blocks that emerged in the stem leading to the last common ancestor of jawed vertebrates displayed a low density (Fig. 2A, Supplementary Fig. 5), and high intergenic distance (Supplementary Fig. 7), suggesting most of these blocks follow a low gene density regime. This result is consistent with the observation that the blocks conserved within amniotes exhibit a lower gene density than the rest of the genome .
Our data can be used to provide thresholds for estimating whether any block follows a high or low density regime, without sampling random blocks. The lower and higher quartiles of normalized gene density of the random blocks of amphioxus (B. lanceolatum, 1.47 and 1.84, respectively) and scallop (M. yessoensis, 1.34 and 1.98, respectively), provide tentative thresholds for high and low density regimes in those two species. Accordingly, if any block has a normalized gene density below 1, it is likely to follow a low density regime, whereas any block with a normalized gene density higher than 2 is more likely to follow a high density regime. To further facilitate exploration of the density regimes of microsyntenic blocks, we developed a new browser http://synteny.csb.univie.ac.at/.
In summary, paralogous blocks are in most cases more closely packed together than randomly sampled blocks, consistent with the fact that regions that underwent tandem duplications are associated with a higher gene density [46, 47]. Interestingly, the density of non-paralogous blocks relative to the rest of the genome differs between taxa (Fig. 2A). While most MLCA and PLCA novelties display average gene density, most of the BLCA novelties retained in lophotrochozoans and ecdysozoans exhibit a high density regime. A majority of lophotrochozoan novel microsyntenic blocks follow a high density regime as well, while the majority of jawed vertebrates microsyntenic blocks follow a low density regime.
Maintenance of density regimes in orthologous blocks
We asked whether the observed difference in normalized gene densities is a result of contraction or expansion of microsyntenic blocks or from the preferential retention of blocks with high or low density.
In order to compare density regimes of orthologous blocks across taxa, we calculated the relative “change of normalized gene density” (CNGD, Methods). This value indicates whether blocks are likely to follow a low (CNGD << 0), high (CNGD > > 0) or average (CNGD ≈ 0) gene density regime. The comparison of all the possible taxon pairs revealed that the density regime of orthologous MLCA, PLCA and BLCA novel blocks were positively correlated in most cases (Fig. 2B, Supplementary Fig. 6), implying evolutionary conservation of gene density regimes.
As the density of most MLCA and PLCA novel blocks does not differ significantly from that of random blocks (Fig. 2A, Supplementary Fig. 5), yet correlates positively across taxa (Fig. 2B), we posit that these microsyntenic blocks could still experience selective pressures to maintain their density.
Since lophotrochozoans retained a particularly high number of the BLCA novel blocks (93%, Fig. 1B) they can be used as a reference to compare density regime conservation across all bilaterian clades. The majority of the lophotrochozoan BLCA novel blocks follow a high gene density regime (Fig. 2A, Supplementary Fig. 5). Moreover, lophotrochozoan BLCA blocks mostly follow the same density regimes as their orthologs in ambulacrarians (82.4% of BLCA blocks), vertebrates (14.8% of BLCA blocks) and ecdysozoans (44.1% of BLCA blocks) (Fig. 2B, C, Supplementary Fig. 8).
These results suggest that a substantial proportion of the microsyntenic blocks that emerged at the bilaterian stem followed a high density regime, and this ancestral regime was retained across extant taxa. In addition, most MLCA and PLCA blocks follow average genome density, a property conserved across taxa. The maintenance of density regimes across bilaterians suggests that most MLCA, PLCA and BLCA blocks did not undergo contraction or expansion in any lineages.
Gene density predicts coexpression properties of microsyntenic genes
We next aimed to determine the impact of different microsyntenic density regimes onto gene expression. Given the proposed link between gene density and coexpression [14, 15], we hypothesized that denser microsyntenic blocks would exhibit a higher coexpression of their genes. To test this hypothesis, we used transcriptomic data (developmental series or tissues) from two lophotrochozoans (Crassostrea gigas and Mizuhopecten yessoensis), two ambulacrarians (Strongylocentrotus purpuratus and Saccoglossus kowalevskii) and two jawed vertebrates (Mus musculus and Callorhinchus milii). In line with a previous study , we define the coexpression levels of genes within a syntenic block as the mean Spearman correlation coefficient among all pairs of expressed genes of the block (which we will refer to as “block coexpression”, see Methods, Supplementary Fig. 2D).
We then compared the coexpression of paralogous and non-paralogous blocks to that of randomly sampled blocks. We found that higher density blocks exhibit a higher coexpression than random blocks (Fig. 3, Supplementary Fig. 9). This suggests that the constraints to maintain the coexpression of genes drives the maintenance of high density blocks. Conversely, the block coexpression of vertebrate microsyntenic novelties, enriched in low density blocks, did not differ significantly from that of random blocks. As the expression of genes within vertebrate GRBs is not significantly correlated , this suggests that vertebrate microsyntenic novelties might be conserved more due to GRB [6, 9, 10, 12] rather than coexpression constraints.
We investigated gene density and coexpression properties of microsyntenies to test their predictability of coregulation in several well studied microsyntenies. We first examined the Wnt5–7 block, which is comprised of two conserved syntenic pairs (fbxl14-wnt5 and atxn10-wnt7, identified by ), as well as other known neighbors of wnt5 and/or wnt7 (e.g. erc1/2, cacna2d, [48, 49]). The Wnt5–7 block showed a normalized gene density significantly lower than that of randomly sampled blocks in most of the bilaterians we investigated (Fig. 4A, B, Supplementary Fig. 10A). Parsimony suggests that its gene density was low in the BLCA, and did not undergo contraction in any lineage. The fact that the reciprocal of gene density of the Wnt5–7 block scales at the same rate as the genome-wide average density, and is almost always of lower density than average (Fig. 4D) further supports this scenario. The presence of multiple functionally unrelated genes points to the absence of constraints on the coexpression of the block components. This is consistent with the hypothesis that wnt7 and atxn10 form a GRB : The bystander gene (atxn10) contains regulatory elements in its introns that target the transdev gene (wnt7), resulting in a cis-regulatory constraint driving the conservation of their microsyntenic association.
Even though the vast majority of blocks retained their ancestral density regime across the investigated metazoans, there is one notable exception: the Hox cluster. While the Hox cluster shows a lower density than random blocks in most bilaterians (particularly significant in lophotrochozoans and ecdysozoans, Fig. 4A, C, Supplementary Fig. 10B), its density is significantly higher than random in vertebrates. In addition, the reciprocal of gene density of most invertebrate Hox clusters scales linearly with assembly size, at a similar rate as the genome-wide average, even though the density of these clusters is generally lower than the genome-wide average (Fig. 4E). In vertebrates, however, the Hox cluster does not scale at the same rate as the genome-wide average (Fig. 4E), and most Hox clusters are at least twice as dense as this average. Consistent with previous observations [50, 51], our findings show that the Hox cluster was less dense than the rest of the genome in the BLCA, and contracted before the vertebrate LCA.
Apart from coexpression, gene density regimes might also reflect changes in genome topology. As noted above, anecdotal evidence suggest that high- and low-density regions can be respectively associated with the boundaries and the interior of TADs in Drosophila . Consistently, the gene-dense HoxD cluster of vertebrates is located at the interface of two TADs, while the low density amphioxus Hox cluster is located within a single TAD . However, in order to test whether high-density microsyntenies are associated with specific topological properties, future efforts should be directed towards the generation of high resolution Hi-C data in a broad spectrum of invertebrate species. To this end, our density regimes of microsyntenies may be useful as validators and predictors of the topological properties.
This study provides evidence that the evolutionary transition towards bilaterality in animals coincides with the emergence of a large set of microsyntenies, the majority of which are still retained in many extant cnidarian and bilaterian genomes. Our results suggest that microsyntenic blocks exist in different density regimes across animal genomes. In particular, syntenies that originated in the BLCA are generally associated with higher gene densities and display a higher level of gene coexpression than the remainder of the genome. While their topological organization remains to be investigated, this provides additional evidence for the proposed emergence of CTCF-driven topologically associating domain organization in bilaterians . It should however be noted that a substantial amount of PLCA novel microsyntenies have been maintained in cnidarians, which implies the existence of additional constraints underlying the maintenance of microsyntenic blocks. We also reveal that the constraints on gene density within blocks are heterogeneous depending on the node of emergence of the blocks and that gene density is correlated with microsyntenic gene coexpression. In contrast, coexpression does not seem to drive the maintenance of low density synteny, hinting towards other types of gene regulation [6, 28]. Apart from the known distinct regulatory regimes in Hox clusters, little data exist to speculate about potential fundamental functional differences in how vertebrate and invertebrate genomes are regulated, constrained, and organized. Microsyntenic profiling together with further regulatory genomic information on topological organization and gene regulation across metazoans will thus help to better understand the regulatory causes or consequences of the ancient syntenic transitions during animal evolution.
List of software and scripts
List of software used can be found in the Additional File 1. The repository containing all the details on commands, scripts, and datasets used in this study at the time of submission is provided in pdf format as Supplementary Dataset 1. An online version is also available at https://bitbucket.org/N_Robert/syntenic-density-and-transitions/.
Proteomes, annotations, and genomes of 38 species were downloaded from NCBI (Homo sapiens, Mus musculus, Danio rerio, Gallus gallus, Xenopus tropicalis, Callorhinchus milii, Acropora millepora, Ciona intestinalis, Latimeria chalumnae, Maylandia zebra, Lepisosteus oculatus, Chelonia mydas, Parasteatoda tepidariorum, Exaiptasia pallida, Hippocampus comes, Acanthaster planci), Ensembl Metazoa Release 45 (Adineta vaga, Amphimedon queenslandica, Anopheles gambiae, Branchiostoma lanceolatum, Caenorhabditis elegans, Capitella teleta, Crassostrea gigas, Daphnia pulex, Drosophila melanogaster, Helobdella robusta, Ixodes scapularis, Lingula anatina, Lottia gigantea, Nematostella vectensis, Strigamia maritima, Strongylocentrotus purpuratus, Tribolium castaneum, Trichoplax adhaerens, Salpingoeca rosetta, Capsaspora owczarzaki), or the National Human Genome Research Institute (Hydra vulgaris, Mnemiopsis leidyi). The Clytia hemisphaerica data was retrieved from http://marimba.obs-vlfr.fr. The Aurelia aurita genome, annotation and proteome were provided by the authors of . The genome of Hofstenia miamia and its annotation were retrieved from http://srivastavalab.rc.fas.harvard.edu. The genome, proteome, and annotation of Saccoglossus kowalevskii and Ptychodera flava were downloaded from OIST (https://groups.oist.jp/molgenu/hemichordate-genomes), the ones of Schmidtea mediterranea from planmine (http://planmine.mpi-cbg.de), and the ones of Hoilongia hongkongensis from https://bitbucket.org/molpalmuc/hoilungia-genome/src/master/tracks/. The CDS sequences and genomes of Pleurobrachia bachei and Sycon ciliatum were retrieved, respectively, from Neurobase (https://neurobase.rc.ufl.edu), and DataDryad (https://datadryad.org/resource/doi:10.5061). The mapping of the transcripts of these two species onto their genomes was done using gmap. The genome and annotation of Mizuhopecten yessoensis was provided by the authors of . Euprymna scolopes chromosome-level assembly and annotation was provided by the authors of Belcaid et al. . The proteins of the 49 aforementioned species were assigned to orthogroups using the Orthofinder v2.3 algorithm , based on the results of an all-against-all BLASTP with an e-value threshold of 0.001. In order to avoid erroneous ortholog inference due to ambiguous tree topologies, all the genes within the same orthogroup were considered to be orthologs for all subsequent analyses.
Inference of microsyntenic blocks and block emergence nodes
Syntenic blocks were inferred using in-house scripts, as described in [1, 2] (See Fig. 1A). These scripts can be found at https://github.com/nijibabulu/metazoan_synteny. All-against-all pairwise microsyntenic blocks were first built, requiring to comprise at least three orthologs, separated by no more than five intervening genes, but collinearity was not required. The minimum of three genes was chosen to increase sensitivity, as the use of scaffold-level assemblies do not allow the detection of large syntenic blocks. With a maximum of five intervening genes, the distribution of the number of intervening genes between consecutive orthologs in the block follows a power law . This also limits the number of false positives, as an increase of the allowed number of intervening genes results in an increase in the number of false positives . The pairwise syntenic blocks were then fused into multi-species blocks, if they shared at least three genes, or if there are more than three genes in the pairwise block, at least 50% of the orthologs. The multi-species microsyntenic blocks were then filtered according to the node where we inferred them to be present (Supplementary Fig. 2B). For each node, we define an ingroup as all the species descending from this node, and the outgroup as all the species outside of the ingroup. The children ingroups are one level above the nodes (e.g., acoel, deuterostomes and protostomes are the children clades of the LCA of Bilateria). We define a novel microsyntenic block as one found in two or more species of two or more children ingroup but not found in the outgroup (Supplementary Fig. 2B). An inherited microsyntenic block is a block found in at least two species of the ingroup and at least two species of the outgroup. In order to evaluate the number of blocks that we would detect by chance, we ran our microsynteny pipeline on three independent permutations of the genomes. We considered blocks to be lost in a taxonomic group if no species of the group was present in the multi species block. We used this method on the actual genomes and three iterations of shuffled genomes, where the relative position of genes on the genome was shuffled.
Gene density analysis
The gene density within a given block was calculated as the number of genes per base pairs of a block. The boundaries of a block are defined as the outermost coding bases of genes exhibiting conserved synteny. This measure was used for the sake of consistency across all included species, as the coordinates of untranslated regions of the mRNAs were absent of some of the genome annotations we used. The number of genes includes all the genes of which both start and stop codons are located within the block boundaries. The gene density per block was then normalized by the gene density within the whole genome (total number of genes divided by total number of base pairs of the assembly). We refer to this value as normalized gene density. Since the reciprocal of whole genome density increases linearly with the assembly size (Supplementary Fig. 4) this normalization allows us to group normalized gene density measures from animals with different genome sizes according to their clade (see Supplementary Fig. 1).
We also calculated the median intergenic distance of consecutive syntenic orthologs. For each block, every distance (in base pairs) between consecutive orthologs was calculated (distance between boundaries of coding regions of the two genes), and the median value of each block was retained.
Each block was defined as non paralogous or paralogous. We define a paralogous block as comprising more than 40% of genes belonging to the same orthogroup (the region underwent several events of segmental duplications). Non paralogous blocks are all the other blocks that do not satisfy this criteria.
For each observed block, 100 blocks were randomly sampled. Randomly sampled blocks were built using the methods described in . A detailed description of the methods can also be found in Supplementary Fig. 2C. Differences in the normalized gene density between the observed and randomly sampled blocks were assessed using a Wilcoxon rank-sum test and a threshold of significance of 0.05. For each multi species block found in at least two taxa, we also calculated the change of normalized gene density relative to random (CNGD), such as CNGD = (median(normalized_gene_densityobs) – median(normalized_gene_densityrand)) / median(normalized_gene_densityrand). Accordingly, if the median normalized gene density of a multi species block (median(normalized_gene_densityobs)) is twice as dense as that of their randomly sampled counterparts (median(normalized_gene_densityrand))), this will result in CGND = 1. If the observed blocks are of a density that is half of the random blocks, this will result in CNGD = − 1. If the median normalized gene density of observed and randomly sampled blocks is identical, then CNGD = 0.
Annotation of the genes of the Hox and Wnt5–7 clusters
If available, annotations of hox genes were recovered from the literature, and their orthologs in our taxonomic sample were identified by reciprocal BLAST searches and phylogenies. The hox queries were from Latimeria chalumnae , Homo sapiens, Mus musculus, Danio rerio, Callorhinchus milii, Chelonia mydas, Ciona intestinalis, Strongylocentrotus purpuratus [27, 56], Euprymna scolopes , Hofstenia miamia , Schmidtea mediterranea , Nematostella vectensis , Caehnorhabditis elegans , Clytia hemisphaerica , Parasteatoda tepidariorum , Anopheles gambiae, Drosophila melanogaster, Tribolium castaneum, Strigamia maritima, Ixodes scapularis , Capitella teleta, Lottia gigantea, Daphnia pulex , Saccoglossus kowalevskii, Ptychodera flava , Mizuhopecten yessoensis . Among the microsyntenic blocks identified by our pipeline, one block (multi-species block id 953, see Supplementary Dataset 3) comprised the following genes: wnt5, wnt7, fbxl14, atxn10, erc1/2, cacna1d, cacna2d ninj1/ninj2 and dcp1a/dcp1b. This block is referred to as the Wnt5–7 block throughout the manuscript. We identified the orthologs of the genes in all the species of our dataset by reciprocal BLAST searches and phylogenies. Only the Hox and Wnt5–7 clusters comprising at least three genes, separated by no more than five intervening genes were retained.
Functional annotation and GO term enrichment analysis
Annotation and GO annotation of the proteomes of all species was done using eggNOG-mapper version 2 . GO enrichment analysis was done using the GOAtools API . The GO enrichment was done without propagating GO counts to parent GOs, using a threshold of 0.05 for Benjamini-Hochberg corrected p-values. GO terms enriched in the MLCA novel blocks were required to be enriched in at least eight metazoan species, distributed in at least two metazoan ingroups (i.e., Porifera, Ctenophora, and Parahoxozoa). GO terms enriched in the PLCA novel blocks were required to be enriched in at least three cnidarians and eight bilaterian species. GO terms enriched in the BLCA novel blocks were required to be enriched in at least four protostomes and four deuterostomes species. We provide the tables listing the GO terms enriched in MLCA (Supplementary Table 1), PLCA (Supplementary Table 2) and BLCA (Supplementary Table 3) blocks.
Block coexpression analysis
The methods used for determining transcripts abundance in Strongylocentrotus purpuratus, Saccoglossus kowalevskii, Mizuhopecten yessoensis and Crassostrea gigas were described in Zieger et al. . In line with this study, the transcript abundances in Mus musculus  and Callorhinchus milii  tissues were quantified as TPMs (Transcript per Kilobase per Million of reads) using Kallisto  with default settings for paired end reads. For each block, we calculated the block co expression value as described in Zimmerman et al. . For each block comprising at least three expressed genes, we calculated the Spearman correlation of expression of every combination of expressed genes. To allow the averaging of correlations , the array of correlations of a given block was then transformed using the inverse hyperbolic tangent function (Fisher transformation), the mean of the Fisher-transformed correlation was then calculated, and this value was then transformed into a correlation using the hyperbolic tangent function (reverse Fisher transformation). In addition, if an untransformed correlation value was − 1 or 1, we transformed it into the next possible floating point value towards zero using the numpy.nextafter() function. This was done as the inverse hyperbolic tangent of 1 is infinite, and that of − 1 is negative infinite.
We introduce a novel browser containing our complete data set which allows users to investigate the syntenic density relationships of orthogroups with their genes of interest. In order to present a better visual comparison, we normalized the minimum base pair distances between each syntenic orthogroup pair of each multi-species block that we identified, by the total genome assembly size of each species. Our browser has four panels: while the first panel displays the boxplots of distance distributions for each taxonomic node, the second one displays the geometric densities of these; the third panel illustrates the orthogroups and the base pair distances between them in a network graph, and the fourth panel is a dictionary of all genes and their functions. Each panel is user-interactive and it is also possible to compare our observed values with the randomized cases. Users can first search for genes or annotations of their interest that exist in our dataset to find the multi-species block ID and the taxonomic group information, and then navigate to that block. Our browser is available at: http://synteny.csb.univie.ac.at/.
Availability of data and materials
The repository containing all the details on commands, scripts, and datasets used in this study at the time of submission is provided in pdf format as Supplementary Dataset 1. An online version is also available at https://bitbucket.org/N_Robert/syntenic-density-and-transitions/.
Gene Regulatory Block
Bilaterian Last Common Ancestor
Planulozoan Last Common Ancestor
Metazoan Last Common Ancestor
Topologically Associating Domain
Relative change of normalized gene density
Transcript per Kilobase per Million of reads
Simakov O, Marletaz F, Cho S-J, Edsinger-Gonzales E, Havlak P, Hellsten U, et al. Insights into bilaterian evolution from three spiralian genomes. Nature. 2013;493(7433):526–31.
Simakov O, Kawashima T, Marlétaz F, Jenkins J, Koyanagi R, Mitros T, et al. Hemichordate genomes and deuterostome origins. Nature. 2015;527(7579):459–65.
Srivastava M, Begovic E, Chapman J, Putnam NH, Hellsten U, Kawashima T, et al. The Trichoplax genome and the nature of placozoans. Nature. 2008;454(7207):955–60.
Putnam NH, Srivastava M, Hellsten U, Dirks B, Chapman J, Salamov A, et al. Sea Anemone genome reveals ancestral Eumetazoan gene repertoire and genomic organization. Science. 2007;317(5834):86–94.
Putnam NH, Butts T, Ferrier DEK, Furlong RF, Hellsten U, Kawashima T, et al. The amphioxus genome and the evolution of the chordate karyotype. Nature. 2008;453(7198):1064–71.
Irimia M, Tena JJ, Alexis MS, Fernandez-Minan A, Maeso I, Bogdanovic O, et al. Extensive conservation of ancient microsynteny across metazoans due to cis-regulatory constraints. Genome Res. 2012;22(12):2356–67.
Belcaid M, Casaburi G, McAnulty SJ, Schmidbaur H, Suria AM, Moriano-Gutierrez S, et al. Symbiotic organs shaped by distinct modes of genome evolution in cephalopods. Proc Natl Acad Sci U S A. 2019;116(8):3030–5.
Zhao T, Zwaenepoel A, Xue J-Y, Kao S-M, Li Z, Schranz ME, et al. Whole-genome microsynteny-based phylogeny of angiosperms. Nat Commun. 2021;12(1):3498.
Kikuta H, Laplante M, Navratilova P, Komisarczuk AZ, Engstrom PG, Fredman D, et al. Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. Genome Res. 2007;17(5):545–55.
Engstrom PG, Ho Sui SJ, Drivenes O, Becker TS, Lenhard B. Genomic regulatory blocks underlie extensive microsynteny conservation in insects. Genome Res. 2007;17(12):1898–908.
Acemel RD, Tena JJ, Irastorza-Azcarate I, Marlétaz F, Gómez-Marín C, de la Calle-Mustienes E, et al. A single three-dimensional chromatin compartment in amphioxus indicates a stepwise evolution of vertebrate Hox bimodal regulation. Nat Genet. 2016;48(3):336–41.
Akalin A, Fredman D, Arner E, Dong X, Bryne J, Suzuki H, et al. Transcriptional features of genomic regulatory blocks. Genome Biol. 2009;10(4):R38.
Zimmermann B, Robert NSM, Technau U, Simakov O. Ancient animal genome architecture reflects cell type identities. Nat Ecol Evol. 2019;3(9):1289–93.
Sun M, Zhang J. Chromosome-wide co-fluctuation of stochastic gene expression in mammalian cells. PLoS Genet. 2019;15(9):e1008389.
Ng YK, Wu W, Zhang L. Positive correlation between gene coexpression and positional clustering in the zebrafish genome. BMC Genomics. 2009;10(1):42.
Larkin DM, Pape G, Donthu R, Auvil L, Welge M, Lewin HA. Breakpoint regions and homologous synteny blocks in chromosomes have different evolutionary histories. Genome Res. 2009;19(5):770–7.
Feuillet C, Keller B. High gene density is conserved at syntenic loci of small and large grass genomes. Proc Natl Acad Sci U S A. 1999;96(14):8265–70.
Lv J, Havlak P, Putnam NH. Constraints on genes shape long-term conservation of macro-synteny in metazoan genomes. BMC Bioinformatics. 2011;12(Suppl 9):S11.
Dunn CW, Giribet G, Edgecombe GD, Hejnol A. Animal phylogeny and its evolutionary implications. Annu Rev Ecol Evol Syst. 2014;45(1):371–95.
Telford MJ, Moroz LL, Halanych KM. A sisterly dispute. Nature. 2016;529(7586):286–7.
Philippe H, Poustka AJ, Chiodin M, Hoff KJ, Dessimoz C, Tomiczek B, et al. Mitigating anticipated effects of systematic errors supports sister-group relationship between Xenacoelomorpha and Ambulacraria. Curr Biol. 2019;29(11):1818–1826.e6.
Cannon JT, Vellutini BC, Smith J, Ronquist F, Jondelius U, Hejnol A. Xenacoelomorpha is the sister group to Nephrozoa. Nature. 2016;530(7588):89–93.
Dos Reis M, Thawornwattana Y, Angelis K, Telford MJ, Donoghue PCJ, Yang Z. Uncertainty in the timing of origin of animals and the limits of precision in molecular timescales. Curr Biol. 2015;25(22):2939–50.
Whelan NV, Kocot KM, Moroz LL, Halanych KM. Ctenophore relationships and their placement as the sister group to all other animals. Nat Ecol Evol. 2017;1:1737–46.
Simion P, Philippe H, Baurain D, Jager M, Richter DJ, Franco AD, et al. A large and consistent Phylogenomic dataset supports sponges as the sister group to all other animals. Curr Biol. 2017;27(7):958–67.
Sacerdot C, Louis A, Bon C, Berthelot C, Roest CH. Chromosome evolution at the origin of the ancestral vertebrate genome. Genome Biol. 2018;19(1):166.
Kuraku S, Meyer A. Detection and phylogenetic assessment of conserved Synteny derived from whole genome duplications. In: Anisimova M, editor. Evolutionary genomics [internet]. Totowa, NJ: Humana Press; 2012. p. 385–95. Available from: http://link.springer.com/10.1007/978-1-61779-582-4_14.
Touceda-Suárez M, Kita EM, Acemel RD, Firbas PN, Magri MS, Naranjo S, et al. Ancient genomic regulatory blocks are a source for regulatory gene deserts in vertebrates after whole-genome duplications. Nowick K, editor. Mol Biol Evol. 2020;37(10):2857–64.
Cañestro C, Catchen JM, Rodríguez-Marí A, Yokoi H, Postlethwait JH. Consequences of lineage-specific gene loss on functional evolution of surviving paralogs: ALDH1A and retinoic acid signaling in vertebrate genomes. Gojobori T, editor. PLoS Genet. 2009;5(5):e1000496.
Blackmon H, Ross L, Bachtrog D. Sex determination, sex chromosomes, and karyotype evolution in insects. JHERED. 2017;108(1):78–93.
Raible F. Vertebrate-Type Intron-Rich Genes in the Marine Annelid Platynereis dumerilii. Science. 2005;310(5752):1325–6.
Fernández R, Gabaldón T. Gene gain and loss across the metazoan tree of life. Nat Ecol Evol. 2020;4(4):524–33.
Guijarro-Clarke C, Holland PWH, Paps J. Widespread patterns of gene loss in the evolution of the animal kingdom. Nat Ecol Evol. 2020;4(4):519–23.
Genikhovich G, Technau U. On the evolution of bilaterality. Development. 2017;144(19):3392–404.
Sebé-Pedrós A, Chomsky E, Pang K, Lara-Astiaso D, Gaiti F, Mukamel Z, et al. Early metazoan cell type diversity and the evolution of multicellular gene regulation. Nat Ecol Evol. 2018;2(7):1176–88.
Srivastava M, Simakov O, Chapman J, Fahey B, Gauthier MEA, Mitros T, et al. The Amphimedon queenslandica genome and the evolution of animal complexity. Nature. 2010;466(7307):720–6.
Grau-Bove X, Torruella G, Donachie S, Suga H, Leonard G, Richards TA. Dynamics of genomic innovation in the unicellular ancestry of animals. eLife. 2017;6:e26036.
Paps J, Holland PWH. Reconstruction of the ancestral metazoan genome reveals an increase in genomic novelty. Nat Commun. 2018;9(1):1730.
Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485(7398):376–80.
Gómez-Marín C, Tena JJ, Acemel RD, López-Mayorga M, Naranjo S, de la Calle-Mustienes E, et al. Evolutionary comparison reveals that diverging CTCF sites are signatures of ancestral topological associating domains borders. Proc Natl Acad Sci U S A. 2015;112(24):7542–7.
Sexton T, Cavalli G. The role of chromosome domains in shaping the functional genome. Cell. 2015;160(6):1049–59.
Heger P, Marin B, Bartkuhn M, Schierenberg E, Wiehe T. The chromatin insulator CTCF and the emergence of metazoan diversity. Proc Natl Acad Sci U S A. 2012;109(43):17507–12.
Harmston N, Ing-Simmons E, Tan G, Perry M, Merkenschlager M, Lenhard B. Topologically associating domains are ancient features that coincide with metazoan clusters of extreme noncoding conservation. Nat Commun. 2017;8(1):441.
Hou C, Li L, Qin ZS, Corces VG. Gene density, transcription, and insulators contribute to the partition of the Drosophila genome into physical domains. Mol Cell. 2012;48(3):471–84.
Lynch M, Conery JS. The origins of genome complexity. Science. 2003;302(5649):1401–4.
Zhang L, Lu HHS, Chung W, Yang J, Li W-H. Patterns of segmental duplication in the human genome. Mol Biol Evol. 2004;22(1):135–41.
Kono TJY, Brohammer AB, McGaugh SE, Hirsch CN. Tandem duplicate genes in maize are abundant and date to two distinct periods of time. G3. 2018;8(9):3049–58.
Kapasa M, Arhondakis S, Kossida S. Phylogenetic and regulatory region analysis of Wnt5 genes reveals conservation of a regulatory module with putative implication in pancreas development. Biol Direct. 2010;5(1):49.
Garriock RJ, Warkman AS, Meadows SM, D’Agostino S, Krieg PA. Census of vertebrate Wnt genes: isolation and developmental expression of Xenopus Wnt2, Wnt3, Wnt9a, Wnt9b, Wnt10a, and Wnt16. Dev Dyn. 2007;236(5):1249–58.
Duboule D. The rise and fall of Hox gene clusters. Development. 2007;134(14):2549–60.
Hoegg S, Boore JL, Kuehl JV, Meyer A. Comparative phylogenomic analyses of teleost fish Hox gene clusters: lessons from the cichlid fish Astatotilapia burtoni. BMC Genomics. 2007;8(1):317.
Gold DA, Katsuki T, Li Y, Yan X, Regulski M, Ibberson D, et al. The genome of the jellyfish Aurelia and the evolution of animal complexity. Nat Ecol Evol. 2019;3(1):96–104.
Wang S, Zhang J, Jiao W, Li J, Xun X, Sun Y, et al. Scallop genome provides insights into evolution of bilaterian karyotype and development. Nat Ecol Evol. 2017;1(5):0120.
Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20(1):238.
Amemiya CT, Alföldi J, Lee AP, Fan S, Philippe H, MacCallum I, et al. The African coelacanth genome provides insights into tetrapod evolution. Nature. 2013;496(7445):311–6.
Pascual-Anaya J, D’Aniello S, Kuratani S, Garcia-Fernàndez J. Evolution of Hox gene clusters in deuterostomes. BMC Dev Biol. 2013;13(1):26.
Brauchle M, Bilican A, Eyer C, Bailly X, Martínez P, Ladurner P, et al. Xenacoelomorpha survey reveals that all 11 animal Homeobox gene classes were present in the first Bilaterians. Genome Biol Evol. 2018;10(9):2205–17.
Currie KW, Brown DDR, Zhu S, Xu C, Voisin V, Bader GD, et al. HOX gene complement and expression in the planarian Schmidtea mediterranea. Evodevo. 2016;7(1):7.
DuBuc TQ, Ryan JF, Shinzato C, Satoh N, Martindale MQ. Coral comparative genomics reveal expanded Hox cluster in the cnidarian-Bilaterian ancestor. Integr Comp Biol. 2012;52(6):835–41.
Hench J, Henriksson J, Abou-Zied AM, Lüppert M, Dethlefsen J, Mukherjee K, et al. The Homeobox genes of Caenorhabditis elegans and insights into their Spatio-temporal expression dynamics during embryogenesis. PLOS ONE. 2015;10(5):e0126947.
Leclère L, Horin C, Chevalier S, Lapébie P, Dru P, Peron S, et al. The genome of the jellyfish Clytia hemisphaerica and the evolution of the cnidarian life-cycle. Nat Ecol Evol. 2019;3(5):801–10.
Leite DJ, Baudouin-Gonzalez L, Iwasaki-Yokozawa S, Lozano-Fernandez J, Turetzek N, Akiyama-Oda Y, et al. Homeobox gene duplication and divergence in arachnids. Mol Biol Evol. 2018;35(9):2240–53.
Pace RM, Grbić M, Nagy LM. Composition and genomic organization of arthropod Hox clusters. Evodevo. 2016;7(1):11.
Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C, et al. Fast genome-wide functional annotation through Orthology assignment by eggNOG-mapper. Mol Biol Evol. 2017;34(8):2115–22.
Klopfenstein DV, Zhang L, Pedersen BS, Ramírez F, Warwick Vesztrocy A, Naldi A, et al. GOATOOLS: a Python library for gene ontology analyses. Sci Rep. 2018;8(1):10872.
Zieger E, Robert NSM, Calcino A, Wanninger A. Ancestral role of Ecdysis-related neuropeptides in animal life cycle transitions. Curr Biol. 2020:S0960982220315086.
Li B, Qing T, Zhu J, Wen Z, Yu Y, Fukumura R, et al. A comprehensive mouse transcriptomic BodyMap across 17 tissues by RNA-seq. Sci Rep. 2017;7(1):4200.
Peat JR, Ortega-Recalde O, Kardailsky O, Hore TA. The elephant shark methylome reveals conservation of epigenetic regulation across jawed vertebrates. F1000Research. 2017;6:526.
Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525–7.
Alexander RA. A note on averaging correlations. Bull Psychon Soc. 1990;28(4):335–6.
The authors thank Elisabeth Zieger and Elizabeth Hambleton for critically reading an early version of this manuscript.
N.S.M.R., F.S., and O.S. were supported by the Austrian Science Fund grant P32190. The funding body played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no conflict of interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
List of software used and Supplementary Figures 1 to 10.
Code and scripts from the repository.
GO enrichment tables.
Syntenic density database.
Summary of GO terms enriched in MLCA blocks
Summary of GO terms enriched in PLCA blocks.
Summary of GO terms enriched in BLCA blocks.
About this article
Cite this article
Robert, N.S.M., Sarigol, F., Zimmermann, B. et al. Emergence of distinct syntenic density regimes is associated with early metazoan genomic transitions. BMC Genomics 23, 143 (2022). https://doi.org/10.1186/s12864-022-08304-2