Towards the bridging of molecular genetics data across Xenopus species

Riadi, Gonzalo; Ossandón, Francisco; Larraín, Juan; Melo, Francisco

doi:10.1186/s12864-016-2440-9

Research article
Open access
Published: 01 March 2016

BMC Genomics volume 17, Article number: 161 (2016)

Abstract

Background

The African clawed frog Xenopus laevis has been one of the main vertebrate models for studies in developmental biology. However, for genetic studies, Xenopus tropicalis has been the experimental model of choice because of its shorter life cycle and its more tractable genome, which does not result from genome duplication as in the case of X. laevis. Today, although still organized in a large number of scaffolds, nearly 85 % of the X. tropicalis genome and 89 % of the X. laevis genome have been sequenced. A comparative physical map that can be used as a Rosetta Stone between X. laevis genetic studies and X. tropicalis genomic research is therefore awaited.

Results

In this work, we have mapped, using coarse-grained alignment, the 18 chromosomes of X. laevis (release 9.1) onto the 10 reference scaffolds representing the haploid genome of X. tropicalis (release 9.0). After validating the mapping with theoretical data and estimating reference averages of genome sequence identity (37 to 44 % between the two species), we carried out a synteny analysis for 2,112 orthologous genes. We found that 99.6 % of these genes are in the same organization in both species.

Conclusions

Taken together, our results make it possible to establish the correspondence for 62 % and 65.5 % of the X. tropicalis and X. laevis genomes, respectively, together with the percentage of identity, synteny and automatic annotation of transcripts of both species. This provides a new and more comprehensive tool for comparative analysis of these two species by allowing molecular genetics data to be bridged between them.

Background

African clawed frogs comprise more than twenty species of frogs native to Sub-Saharan Africa [1]. The most studied species in this genus are Xenopus laevis and more recently Xenopus tropicalis. Xenopus species have been an important model in cell biology, development, genetics and genomics. These species are an attractive model in these areas based on the ability to study embryos at all developmental stages, the presence of large eggs in abundant quantities throughout the year and the remarkable regenerative capacity in the tadpole. Xenopus research has set key principles in gene regulation and signal transduction, embryonic induction, morphogenesis and patterning as well as cell cycle regulation [2].

Historically, X. laevis has been considered one of the main animal models for developmental, cell, electrophysiology and biomedical studies [3–5]. However, this species presents a challenge for genomics analyses and genetics due to the allotetraploid nature of its genome and its long life cycle. The haploid genome of X. laevis has been sequenced to 89.21 % and consists of 18 chromosomes and 3.1 Gbp (3.1 × 10⁹ bp). The current assembly of the X. laevis genome consists of 402,501 scaffolds in the Xenbase release 9.1 (XLA9.1) [6]. This release includes the identification of L (Long) and S (Short) chromosomes from the new nomenclature by Matsuda et al. [7].

The X. laevis transcriptome comprises 45,099 primary transcript sequences. The annotation of the transcripts in the current release includes the identification of the genes known to be duplicated, which belong to chromosomes L and S [8]. One limitation of X. laevis, however, has been the lack of systematic genetic studies to complement molecular and cell biology investigations. Work with the closely related diploid frog X. tropicalis has attempted to address this limitation [9].

X. tropicalis (also called Silurana tropicalis) is a diploid organism with 20 chromosomes and a 1.7 Gbp long haploid genome. Currently, 84.81 % of the genome has been sequenced, consisting of 6,823 scaffolds in Xenbase release 9.0 (XTR9.0). The first and longest 10 scaffolds correspond to 74.88 % of contiguous sequences of the 10 haploid chromosomes in the X. tropicalis genome. This organism has 26,550 transcript sequences (XTR9.0). The easy molecular tractability of genomic features of X. tropicalis [9] has allowed integration of some genetic, biochemical, phenotypic and evolutionary data [10–14] in these two species. However, correspondence is not always expected between genomic data in X. tropicalis and the duplicated and divergent genome of X. laevis [15]. Where there is correspondence, it needs to be established at the genome level, which cannot be done without a physical map between both genomes.

No comprehensive comparative analyses using genomic sequencing mapping have been conducted for X. laevis and X. tropicalis [16]. Aiming at facilitating such analysis, we have set out to build a comparative coarse-grained physical map between these two species. To this end, we aligned the 18 chromosomes from X. laevis assembly XLA9.1 to the 10 chromosomes from X. tropicalis assembly XTR9.0 and estimated percentage of sequence identity, repetitions, inversions and synteny of mapped genes between the two species. Finally, we validated the map theoretically through the synteny of Maximal Unique Matches (MUMs). As a whole, our results convey the suitability of this newly assembled map for comparative studies between these two species, bridging a long-standing gap for the integration of biochemical, genetic and genomics data in Xenopus.

Results

In this work we have performed a comparative analysis between the two frog genomes. We mapped the chromosome sequences of X. laevis onto the chromosome sequences of X. tropicalis by a coarse-grained alignment method and semi-automatically annotated their transcripts (Fig. 1) to complement the map information. The analyses include a validation of the map and estimations of percentage of sequence identity, repetitions, inversions and synteny between the two genomes.

The map

As the X. laevis genome is around 1.8 times the length of the X. tropicalis genome, 1.8 is also the expected ratio of the added lengths of the blocks aligned between the two species. This ratio depends on the chosen alignment drop-off score, X. A resulting ratio larger than 1.8 suggests a loose alignment, whereas a ratio smaller than 1.8 suggests a strict alignment. The drop-off score X = 35,000 rendered an average alignment length ratio of 1.77, which is close to the expected ratio (Table 1). However, the ratio between the lengths of the chromosomes from X. laevis with respect to X. tropicalis is 2.15, larger than expected.

Table 1 Summary of the coarse-grained map between 18 XLA9.1 chromosomes (L and S) on 10 XTR9.0 chromosomes. The length units are in blocks. Each block corresponds to a sequence of length 5 Kbp. Xtr (X. tropicalis); Xla (X. laevis); Chr (Chromosome)

A coarse-grained dotplot alignment between X. laevis scaffolds and each X. tropicalis chromosome scaffold shows graphically part of the information in Table 1 (Fig. 2). Although the alignments seem to be contiguous, overall 27.1 % of X. tropicalis chromosomes did not align to X. laevis chromosomes. Complementing this figure, the proportion of X. tropicalis chromosomes covered by X. laevis was 72.9 % (Table 1). This proportion, combined with the completion of 84.81 % of the X. tropicalis genome (Additional file 1), implies that 61.8 % of the whole X. tropicalis genome is actually aligned to X. laevis blocks. A similar coverage of 65.5 % was obtained for X. laevis chromosomes (Table 1).
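The arithmetic behind the expected length ratio and the whole-genome coverage can be reproduced with a few lines. The sketch below only restates figures quoted in the text; the function and constant names are illustrative and not part of the original analysis.

```python
# Figures quoted in the text; no new data are introduced here.
XLA_GENOME_GBP = 3.1             # haploid X. laevis genome length
XTR_GENOME_GBP = 1.7             # haploid X. tropicalis genome length
XTR_SEQUENCED_FRACTION = 0.8481  # fraction of the X. tropicalis genome sequenced
XTR_CHROMOSOME_COVERAGE = 0.729  # fraction of X. tropicalis chromosomes aligned


def expected_length_ratio(xla_gbp, xtr_gbp):
    """Expected ratio of added aligned block lengths (X. laevis / X. tropicalis)."""
    return xla_gbp / xtr_gbp


def whole_genome_coverage(chromosome_coverage, sequenced_fraction):
    """Fraction of the whole genome aligned, given coverage of the sequenced part."""
    return chromosome_coverage * sequenced_fraction


ratio = expected_length_ratio(XLA_GENOME_GBP, XTR_GENOME_GBP)
coverage = whole_genome_coverage(XTR_CHROMOSOME_COVERAGE, XTR_SEQUENCED_FRACTION)

print(f"expected length ratio: {ratio:.2f}")                   # 1.82
print(f"X. tropicalis whole-genome coverage: {coverage:.1%}")  # 61.8%
```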

Conservation between X. tropicalis and X. laevis

As the resulting alignment depends on the drop-off value used, we aligned all X. laevis scaffolds against all X. tropicalis chromosomes at 24 increasing drop-off score values (35,000 to 150,000 in steps of 5,000) (Fig. 3). The block positions that appear with no conservation are either not aligned or have a score lower than 35,000, in which case they cannot be distinguished from chance. The maximum drop-off score at which a pair of blocks can be aligned correlates directly with the percentage of sequence identity between the aligned sequences. However, as the variance of the percentage of sequence identity per drop-off score value is significant, the percentage of sequence identity cannot be reliably predicted from the drop-off score. In spite of this, the maximum drop-off score at which a pair of blocks is aligned can be used as a measure of conservation. For each chromosome, a histogram of maximum drop-off scores, or conservation scores, was generated and the coverage of alignment for each drop-off score was calculated. The average maximum Cgaln drop-off score between the aligning zones of the genomes is 67,703.32 (Fig. 3). The histogram of maximum drop-off scores shows a larger than expected proportion of conserved blocks with a score of 150,000, possibly because that bin accumulates all blocks with a drop-off score of 150,000 or higher. Chromosome 10 is the shortest chromosome, and the one that has the lowest average conservation (Fig. 3) and the lowest alignment coverage (Table 1). In order, from highest to lowest average conservation, we have X. tropicalis chromosomes: 4, 3, 1, 8, 6, 2, 9, 7 and 10 (averaging through all the chromosome sequence, including the non-aligned regions). This chromosome conservation order changes to 8, 4, 3, 9, 1, 7, 6, 5 and 10 if the averaging only takes into account the aligning blocks.
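The conservation score described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical input that records, for each block, the set of drop-off scores at which it aligned; it does not call Cgaln, and the toy data are invented for the example.

```python
from collections import Counter

# The 24 drop-off scores used in the sweep: 35,000 to 150,000 in steps of 5,000.
DROP_OFFS = list(range(35_000, 150_001, 5_000))


def max_drop_off(aligned_at):
    """Highest drop-off score at which a block aligned, or None if it never did."""
    return max(aligned_at) if aligned_at else None


def summarize(blocks):
    """Return (histogram of conservation scores, coverage, mean over aligned blocks)."""
    scores = [max_drop_off(aligned_at) for aligned_at in blocks.values()]
    aligned = [s for s in scores if s is not None]
    histogram = Counter(aligned)
    coverage = len(aligned) / len(scores) if scores else 0.0
    mean_aligned = sum(aligned) / len(aligned) if aligned else 0.0
    return histogram, coverage, mean_aligned


# Hypothetical toy input: block name -> drop-off scores at which the block aligned.
toy_blocks = {
    "block_0": {35_000, 40_000, 45_000},
    "block_1": {35_000},
    "block_2": set(),           # never aligned: no conservation score
    "block_3": set(DROP_OFFS),  # aligned at every tested value: lands in the top bin
}

histogram, coverage, mean_aligned = summarize(toy_blocks)
print(len(DROP_OFFS), coverage, round(mean_aligned, 2))
```

Because the sweep stops at 150,000, any block that would still align at a higher score is counted in the last bin, which is the accumulation effect mentioned for the histogram.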

Repetitions and inversions

As the X. laevis genome is the result of a whole-genome duplication event, it is expected that 1.8 X. laevis blocks will align to each X. tropicalis block. Therefore, a repeat cannot simply be defined as a block of nucleotides that occurs more than once in a genome. Three particular cases have to be taken into account: a block from X. tropicalis that aligns to X. laevis is considered a repeat when (i) it is an additional block to an already-aligned first block at one particular scaffold; (ii) it belongs to a third scaffold in addition to two previously aligned scaffolds; or (iii) it is a combination of the former two cases.

In this map, a total of 11.8 Mbp from X. tropicalis are repeated in 26.6 Mbp of the X. laevis aligned genome (Additional file 2). Inversions are identified only for colonies, i.e., runs of at least two consecutive aligning blocks [24]. For colonies, a previous check on the scaffold frame is made, as in Cgaln only the best out of the 6 reading frames of each X. laevis scaffold is aligned. An inversion is identified when Cgaln takes the plus frame of the X. laevis chromosome and a colony is aligned in reverse with respect to the X. tropicalis chromosome. Because only colonies in reverse can be identified, the inversions counted are an underestimation of the total number of existing inversions. Taking this limitation into account, we estimated at least 64.6 Mbp to be inverted between the two genomes. Inversions represent 7 % and 3 % of the aligned portion of the X. tropicalis and X. laevis genomes, respectively (Table 2).

Table 2 Summary of repetitions (repeated blocks) and inversions in the coarse-grained map between 18 XLA9.1 chromosomes on 10 XTR9.0 chromosomes. Columns 2 to 5 are underestimates of the number of repeated blocks from each genome that align on the other genome. Columns 6 to 8 are underestimates of inversions between the genomes

Validation of the map

In order to validate the map between X. laevis and X. tropicalis, we computed a set of common theoretical probes called Maximal Unique Matches (MUMs, see Methods) between the two genomes and compared their correlative order in the map. The MUMs generated were identical between species and 250 nt or longer.

The distribution of distances between the corresponding positions in the map for the MUMs gives a measure of how well the correspondence between the genomes was achieved. The generated list of MUMs has 1,140 sequences. Of those, 1,092 were mapped on the ten X. tropicalis chromosomes and 695 were mapped on the X. laevis scaffolds; 673 MUMs, representing 59.0 % of the total, are common and mapped to both species. This number is less than expected, as it is lower than the proportion of the X. laevis genome mapped. Additionally, 661, or 98.2 % of the MUMs mapped to both species, are at a distance of ≤5 Kbp from the corresponding MUM in X. tropicalis. One block, or 5 Kbp, is the resolution of the map. Therefore, we estimate that the correspondence between the two sets of scaffolds was achieved in 98.2 % of the map.
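The validation logic can be sketched as a comparison of MUM positions at block resolution. The example below assumes a hypothetical data layout in which each MUM identifier maps to a (chromosome, position) pair, with the X. laevis position already projected onto X. tropicalis coordinates through the map; the toy data are invented. The last lines only recompute the percentages reported in the text from the quoted counts.

```python
BLOCK_BP = 5_000  # one block, the resolution of the map


def concordant_mums(xtr_positions, xla_projected):
    """Count MUMs present in both sets and those lying within one block of each other.

    Both arguments map a MUM identifier to (chromosome, position in bp).
    """
    common = xtr_positions.keys() & xla_projected.keys()
    concordant = sum(
        1
        for mum in common
        if xtr_positions[mum][0] == xla_projected[mum][0]
        and abs(xtr_positions[mum][1] - xla_projected[mum][1]) <= BLOCK_BP
    )
    return len(common), concordant


# Hypothetical toy data, not taken from the study.
xtr_toy = {"m1": ("chr1", 1_000), "m2": ("chr1", 50_000), "m3": ("chr2", 700)}
xla_toy = {"m1": ("chr1", 4_000), "m2": ("chr1", 90_000), "m4": ("chr3", 10)}
n_common, n_concordant = concordant_mums(xtr_toy, xla_toy)
print(n_common, n_concordant)  # 2 1

# Percentages recomputed from the counts reported in the text.
common_fraction = 673 / 1_140    # MUMs mapped to both species
concordant_fraction = 661 / 673  # of those, within 5 Kbp of each other
print(f"{common_fraction:.1%} {concordant_fraction:.1%}")  # 59.0% 98.2%
```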

Application of the map: Conserved synteny and gene rearrangements

To calculate conserved synteny, a set of orthologous genes between two species is required. A total of 7,910 orthologous genes were found through bidirectional best hit using blastn. A subset of these, 7,218 genes, map onto the 10 X. tropicalis chromosomes.
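A bidirectional best hit criterion can be sketched as below. This is an illustration under stated assumptions: the hits are taken to be already parsed into (query, subject, score) tuples from the two search directions, the score used for ranking is a hypothetical bit score, and ties are resolved by keeping the first hit seen. The exact blastn settings used in the study are not given in this section and are not reproduced here.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject (first hit wins on ties)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical toy hits: (query, subject, score).
xla_vs_xtr = [
    ("xla1", "xtr1", 900),
    ("xla1", "xtr2", 300),
    ("xla2", "xtr2", 500),
    ("xla3", "xtr1", 400),  # xtr1 prefers xla1, so xla3 gets no ortholog
]
xtr_vs_xla = [
    ("xtr1", "xla1", 880),
    ("xtr2", "xla2", 510),
    ("xtr2", "xla1", 290),
]

orthologs = bidirectional_best_hits(xla_vs_xtr, xtr_vs_xla)
print(orthologs)  # [('xla1', 'xtr1'), ('xla2', 'xtr2')]
```

By construction, this one-to-one criterion keeps at most one X. laevis partner per X. tropicalis gene.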

Out of all X. laevis transcripts, only 9,269 map on X. tropicalis chromosomes (Table 3). Of these, 2,112 are orthologous genes that occur at least in pairs of consecutive orthologous genes mapped on the same X. laevis chromosome. This set was our sample of orthologous genes for synteny estimation. We found that 2,105 orthologous genes, or 99.6 % of the sample, are syntenic between the two species.

Table 3 Distribution of XLA9.1 transcripts according to their mapping on the XTR9.0 chromosome assembly. A transcript is considered partially aligned if only one of the blocks, either the one including the start or the stop position, is aligned. A transcript does not align on X. tropicalis if neither of the blocks that include the start or stop positions is aligned

Because the intergenic distance is one of the main determinants of order conservation [17], three distances were measured between pairs of orthologous genes (Fig. 4): 1) the distance between two consecutive genes in X. laevis; 2) the distance between two consecutive genes in X. tropicalis; and 3) the distance between the X. laevis start block position projected on X. tropicalis and the start block of its orthologous gene.

The relative error of the distance between two consecutive genes in X. laevis with respect to X. tropicalis was calculated from the first two distances. The mean relative error was 4.5 %. This means that, regardless of the absolute distance between two consecutive orthologous genes in X. tropicalis, the corresponding consecutive genes in X. laevis are, on average, within ± 4.5 % of that distance apart. In total, 71.1 % of the orthologous pairs of genes are in the corresponding block position according to the map. For the third measured distance, it was found that orthologous genes are mapped, on average, 9 Kbp apart, and that 95 % of the orthologous genes are at most 55 Kbp apart. For comparison, the 95 % confidence interval of Xenopus gene lengths is between 5 and 15 Kbp.
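The relative error computed from the first two distances can be sketched as follows. The sketch assumes the error is taken as an absolute value relative to the X. tropicalis distance, which is one reading of the "± 4.5 %" statement; the function names and toy distances are illustrative only.

```python
def relative_error(distance_xla, distance_xtr):
    """Absolute relative error of the X. laevis distance w.r.t. X. tropicalis."""
    return abs(distance_xla - distance_xtr) / distance_xtr


def mean_relative_error(distance_pairs):
    """Mean relative error over (X. laevis distance, X. tropicalis distance) pairs."""
    errors = [relative_error(xla, xtr) for xla, xtr in distance_pairs]
    return sum(errors) / len(errors)


# Hypothetical distances between consecutive orthologous genes, in blocks.
toy_pairs = [(21, 20), (95, 100), (40, 40)]
mean_error = mean_relative_error(toy_pairs)
print(f"{mean_error:.1%}")  # 3.3%
```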

Percentage sequence identity between the two species

Based on the calculated mapping between the two species, and to assess the sequence conservation more precisely, a random sample containing 100 Mbp of matching blocks was aligned using the global Needleman-Wunsch and local Smith-Waterman dynamic programming algorithms. The aim was to estimate, respectively, lower and upper references of the sequence identity between the two Xenopus species.
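To illustrate how a percentage of sequence identity can be derived from a global alignment, the sketch below implements a basic Needleman-Wunsch alignment with a linear gap penalty. The scoring values (match 1, mismatch -1, gap -1) and the definition of identity as matches divided by alignment columns are assumptions made for the example; the parameters and software used in the study are not stated in this section. A pure-Python version like this one is only practical for short sequences.

```python
def needleman_wunsch_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity of one optimal global alignment of sequences a and b."""
    n, m = len(a), len(b)
    # Fill the dynamic programming matrix of best prefix alignment scores.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical columns and total columns.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            if score[i][j] == diagonal:
                matches += 1 if a[i - 1] == b[j - 1] else 0
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return 100.0 * matches / columns if columns else 0.0


print(needleman_wunsch_identity("ACGT", "ACGT"))  # 100.0
print(needleman_wunsch_identity("ACGT", "AGT"))   # 75.0
```

A global alignment forces every position of both blocks into the alignment, including poorly conserved flanks, which is consistent with the global estimate being the lower of the two reference values.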

For the two types of alignments, median percentage sequence identities are similar, both per chromosome and in total (Table 4). The distributions for global and local alignment overlap (Fig. 5). The medians are 40.9 % and 43 %, respectively. On average, the percentage sequence identity shared by the two species ranges between 37.44 % for global and 44.08 % for local alignments.

Table 4 Statistics of sequence identity between XLA9.1 and XTR9.0 genome assemblies. The sample size of pairs of aligned blocks between X. tropicalis and X. laevis was 20,000 (or 100 Mbp) for all chromosomes

Discussion

In this work we have used the first 10 X. tropicalis scaffolds (XTR9.0) as the reference for the coarse-grained mapping of the 18 largest X. laevis scaffolds (XLA9.1). Using this strategy, we were able not only to map the genes and calculate the conserved synteny of orthologs between these two species but also to estimate the percentage of global identity, inversions and repetitions. Taken together, this newly assembled map represents a useful tool for the integration of biochemical, physiological, genetic and genomics data between X. laevis and X. tropicalis.