Systematic dissection of genomic features determining the vast diversity of conotoxins

Background Conus, a highly diverse species of venomous predators, has attracted significant attention in neuroscience and new drug development due to their rich collection of neuroactive peptides called conotoxins. Recent advancements in transcriptome, proteome, and genome analyses have facilitated the identification of conotoxins within Conus’ venom glands, providing insights into the genetic features and evolutionary patterns of conotoxin genes. However, the underlying mechanism behind the extraordinary hypervariability of conotoxins remains largely unknown. Results We analyzed the transcriptomes of 34 Conus species, examining various tissues such as the venom duct, venom bulb, and salivary gland, leading to the identification of conotoxin genes. Genetic variation analysis revealed that a subset of these genes (15.78% of the total) in Conus species underwent positive selection (Ka/Ks > 1, p < 0.01). Additionally, we reassembled and annotated the genome of C. betulinus, uncovering 221 conotoxin-encoding genes. These genes primarily consisted of three exons, with a significant portion showing high transcriptional activity in the venom ducts. Importantly, the flanking regions and adjacent introns of conotoxin genes exhibited a higher prevalence of transposon elements, suggesting their potential contribution to the extensive variability observed in conotoxins. Furthermore, we detected genome duplication in C. betulinus, which likely contributed to the expansion of conotoxin gene numbers. Interestingly, our study also provided evidence of introgression among Conus species, indicating that interspecies hybridization may have played a role in shaping the evolution of diverse conotoxin genes. Conclusions This study highlights the impact of adaptive evolution and introgressive hybridization on the genetic diversity of conotoxin genes and the evolution of Conus. We also propose a hypothesis suggesting that transposable elements might significantly contribute to the remarkable diversity observed in conotoxins. These findings not only enhance our understanding of peptide genetic diversity but also present a novel approach for peptide bioengineering. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-023-09689-4.


Background
Marine cone snails, belonging to the genus Conus, comprise approximately 700 species, and they are typically classified into three main groups: vermivorous, molluscivorous, and piscivorous [1].These carnivorous predators utilize conotoxins, found in their milked venom, for prey hunting.Conotoxins are intricate combinations of small-molecule active polypeptides known as conopeptides, which generally consist of 30 to 200 amino acid residues and can adopt diverse disulfide structures [2,3].
The precursor of conopeptides typically consists of three domains: an N-terminal signal peptide, a propeptide, and a mature peptide located near the C-terminal.Among these domains, the signal peptide exhibits high conservation within the gene superfamily [4].In contrast, the mature peptides display significant variability among conopeptides.Analysis of mature peptides reveals an accelerated rate of nucleotide substitution and a predominance of nonsynonymous substitutions, suggesting that the targeted mutators in the mature peptide region and diversifying selection may account for the hypervariability observed in conopeptides [5].Similarly, C. bullatus demonstrates high structural diversity and a high single-nucleotide polymorphism (SNP) rate in conopeptides, supporting the hypothesis of diversifying selection in conopeptides [6].
Furthermore, targeted sequencing of venom genes from 32 Conus species has revealed a wide range of conotoxin gene copies, varying from 120 to 859.Notably, exons encoding the mature toxin region exhibit higher divergence, indicating positive selection acting on conotoxin genes [7].However, the precise factors influencing the genetic hypervariability of conotoxin genes remain unclear.
The genomes of four Conus species have been released, including C. tribblei [8], C. consors [9], C. betulinus [10], and C. ventricosus [11].However, the genomes of C. tribblei and C. consors suffer from severe fragmentation, which limits further analysis.On the other hand, the high-quality genome of C. betulinus provides valuable insights into the fundamental genetic principles governing conopeptides.Notably, it reveals a primary genetic relationship known as the "central dogma" of conopeptides, where the ratio of genes to transcripts to proteins to conopeptides is approximately 1:1:1:10.This observation suggests that post-translational modifications, such as alternative cleavage sites, highly variable N-and C-terminal truncations, and post-translational modifications, may play a significant role in generating the extensive diversity of conopeptides derived from a limited set of conotoxin genes [10,12].These findings significantly advance our understanding of conopeptide diversity at the translational and post-translational modification levels.
However, the genetic evolution of conotoxin genes remains a subject of ongoing investigation.Moreover, the lack of available genome annotation for the published C. betulinus genome hinders its comprehensive utilization in further analysis.Additionally, noncoding regions of the genome have been shown to contribute to gene diversity.Introns, which are prevalent in Metazoan genomes, facilitate frequent alternative splicing and promote the diversification of gene families through exon recombination [13].Similarly, transposon elements (TEs) play crucial roles in genome evolution and function, including genome mutations, rearrangements, and the generation of new genes [14].The chromosome-level genome of C. ventricosus suggests that conotoxin genes are located within repetitive regions, and a whole-genome duplication event has been identified [11].These findings indicate a potential association between genome features and the diversity of conotoxin genes.In summary, while extensive research has been conducted on conotoxins, the molecular mechanisms underlying the genetic diversity of conotoxin genes remain largely unresolved.Further investigation is warranted to explore the intricate relationships between genome features and the hypervariability observed in conotoxin genes.
In this study, we conducted transcriptome assembly for 34 Conus species to analyze the genetic evolution of conotoxin genes.However, due to the unavailability of publicly accessible genome annotation for the published C. betulinus genome, which limits comprehensive analysis of its genomic structure, we performed additional reassembly of the complete genome of C. betulinus using publicly available genome sequencing datasets from Peng et al. [10].Through this process, we annotated repetitive elements and protein-coding genes, enabling us to investigate whole genome duplication events, structural characteristics, expression patterns, and alternative splicing processes of conotoxin genes in C. betulinus.Moreover, we explored the presence of transposable elements in the flanking regions of conotoxin genes and in the introns adjacent to the highly variable mature-peptide coding sequences.Additionally, we assessed the occurrence of introgressive hybridization of conotoxin genes among various Conus species, utilizing publicly available conotoxin gene targeted sequencing datasets from Phuong et al. [7].

Transcriptome assembly and evaluation of 34 Conus species
RNA sequencing datasets of 34 Conus species were retrieved from NCBI (Table S1), and transcripts of each species were assembled.The number of unigenes among different species varied greatly, as shown in Table 1, ranging from 20,062 to 235,341.Similarly, N50 and completeness of transcripts in each species were also diverse in the range from 290 bp to 1,213 bp in N50 and 21.6-95.7% in completeness (percentage of complete and fragmented BUSCOs), respectively.In addition, there were 21 species of Conus for which the "complete and fragmented BUSCOs" were higher than 50%.

Identification and genetic variation of conotoxins in Conus
Conotoxins were predicted in each species of Conus.A total of 4,111 conotoxins were identified in 34 Conus species transcriptomes from published data (Table S1), and the number of identified conotoxins in each species varied widely, ranging from 34 to 282 (Table 2).Subsequently, homology clustering was performed on the total conotoxin genes, and 91 orthogroups were archived.Among these orthogroups, there were 22 orthogroups that each contained more than 80% of the species represented, including 4 orthogroups that contained all species represented.Genetic variation analysis of conotoxins within orthogroups showed that most conotoxins were under purifying selection (Ka/Ks < 1) (Fig. 1).However, positive selection (Ka/Ks > 1, total of 15.78%) and significant genetic variation were also observed in conotoxins.Positive: positive selection (Ka/Ks > 1).Purifying: purifying selection (Ka/Ks < 1).

Genome reassembly and annotation of C. betulinus
A total of 239.7 Gb of raw reads generated by PacBio Sequel were used to assemble the genome of C. betulinus and polished with cleaned Illumina sequencing reads, resulting in an assembly of 51,913 scaffolds with a total length of 2.67 Gb.The assembly genome results showed that the maximum and average scaffold lengths were 2.66 Mb and 51.5 kb, respectively, with an N50 length of 127 kb (Table 3).Meanwhile, in addition to simple repeats, LINEs (long interspersed nuclear elements, 9.71% of the genome) had the highest proportion and were the main types of transposon elements (Table S2).LTRs (long terminal repeats, 6.88% of the genome) were also abundant in the genome of C. betulinus.It has been reported that both LINEs and LTRs are retrotransposons [14].
Combined with de novo, homology, and RNA-seq methods, we finally predicted 24,308 protein-coding genes in C. betulinus.Overall, the transcription of 97.7% of the protein-coding genes was supported by the transcriptomes of multiple specimens.Moreover, BUSCO with the metazoa_odb10 database was employed to evaluate the completeness of predicted genes.It showed that the predicted genes contained 688 (72.1%) single-copy and 112 (11.7%) duplicated complete genes, as well as 49 (5.1%) fragmented genes.Compared with the published genome of C. betulinus [10], which contained 763 (78.0%) single-copy and 115 (11.8%) duplicated complete genes and 31 (3.17%) fragmented BUSCOs, genome completeness was similarly high in the present study.
Subsequently, as shown in Fig. 2A, the distribution of synonymous substitution rate (Ks) between paralog pairs in C. betulinus was calculated.And a second Ks peak was observed, suggesting similar divergence between paralogs after whole genome duplication (WGD).Furthermore, significantly conserved homologous gene blocks were identified among the scaffolds of the C. betulinus genome (Fig. 2B).Similarly, conserved homologous gene blocks among the pseudo-chromosomes of the C. ventricosus genome were also observed (Fig. 2C), for instance, between pseudo-chromosomes 1 and 2, and between pseudo-chromosomes 7 and 8.
(A) Distribution of synonymous substitution rate (Ks) between paralog pairs in C. betulinus.The presence of second Ks peak suggests the similar divergence between paralogs after WGD.(B) Conserved homologous gene blocks between scaffolds in C. betulinus derived from ortholog proteins.C. Conserved homologous gene blocks between chromosome-level scaffolds in C. ventricosus derived from ortholog proteins.

Expression characteristics of conotoxin genes in C. betulinus
In the present study, 221 conotoxin genes were identified in the reassembled C. betulinus genome.Whereas 133 conotoxin genes are identified in the published genome [10].Homology clustering results on these two gene sets showed that 117 and 43 conotoxin genes were uniquely identified in the reassembled and published genomes, respectively (Table S3).The identified conotoxin genes in the present study were classified into 12 known superfamilies, and the M-and O-superfamilies were the most  abundant.Meanwhile, 17 cysteine frameworks were also classified.However, 142 conotoxin genes were unclassified into the known superfamilies.The expression level (TPM) of those 221 conotoxin genes in the venom bulbs and venom ducts from different body lengths of C. betulinus, namely small, middle, and big, was calculated and used to profile the expression patterns of conotoxin genes.It showed that the expression level of conotoxin genes in venom ducts was dramatically higher than that in the venom bulb (Fig. 3A), and in venom ducts, 169 out of 221 (76.47%) conotoxin genes had an average TMP higher than 10.Meanwhile, although conotoxin genes were highly expressed in all venom duct tissues, the expression characteristics of conotoxin genes in the venom ducts of individuals with different body lengths were significantly diverse.As shown in Fig. 3B, the expression level (TPM) of most genes in the venom ducts of small and big individuals was relatively consistent, especially the non-conotoxin genes.However, the expression level of conotoxin genes in the venom ducts between individuals with different body lengths, including small, middle, and big, was significantly diverse (Fig. 3C and D).Additionally, alternative splicing has been observed in the transcription process of conotoxin genes among individuals with different body lengths, which may increase the diversity of conotoxins in C. betulinus.However, the frequency of alternative splicing of conotoxin genes between different individuals was limited, with only 24 conotoxin genes detected.

Structure features of conotoxin genes in C. betulinus
The distribution of the length of protein-coding genes in C. betulinus showed that the length of conotoxin and non-conotoxin genes was relatively similar; the average length of conotoxin and non-conotoxin genes was 12,014 bp and 14,063 bp, respectively (Fig. 4A).In addition, the statistical results of the exon number of protein-coding genes showed that the conotoxin genes were mainly composed of three exons, while the exon number of non-conotoxin genes showed greater fluctuation (Fig. 4B).Furthermore, the statistical results of the exon length of protein-coding genes in C. betulinus showed that the exon length of non-conotoxin genes was approximately double that of conotoxin genes, with the average exon length of conotoxin and non-conotoxin genes being 92 and 177 bp, respectively (Fig. 4C).It's worth noting that the intron length of conotoxin genes in C. betulinus was significantly longer than that of non-conotoxin genes (Fig. 4D); however, the effects of introns, such as providing mutational hotspots [15] or affecting gene size or structure expansion [16], on the diversity of conotoxin genes remained largely unknown.
In the present study, gene family expansion and contraction analysis of C. betulinus showed that two reverse transcriptases and a transposable element-derived protein had expanded rapidly.TE abundance in the genome of C. betulinus was counted with a sliding window of 20 kb.Combined with the distribution of conotoxin genes in the C. betulinus genome, we proposed that there may be a high density of TEs around conotoxin genes (Fig. 5A).Considering that the genome of C. betulinus assembled in the present study was still fragmented and might affect the statistical results, the chromosome-level genome of C. ventricosus published recently [11] was also included in the analysis.It showed that conotoxin genes in C. ventricosus tended to be distributed in regions with high TE density (Fig. 5B), suggesting that TEs may be related to the genetic diversity of conotoxin genes.
Subsequently, the content of TEs in the upstream and downstream flanking regions (100 kb) of protein-coding genes in the C. betulinus genome was analyzed.As shown in Fig. 6A, there was no significant difference in the content of TEs in the upstream flanking regions between conotoxin and non-conotoxin genes in C. betulinus.In contrast, the content of TEs in the downstream flanking regions of conotoxin genes was significantly higher than that in non-conotoxin genes (Fig. 6B, Table S4).Considering that the genome of C. betulinus assembled in the present study was still fragmented, the analysis was also carried out on the chromosome-level genome of C. ventricosus.It showed that the content of TEs in both the upstream and downstream flanking regions of conotoxin genes was significantly higher than that in non-conotoxin genes in C. ventricosus (Fig. 6C and D, Table S5).Moreover, type I retrotransposons (LINE and LTR) and type II DNA transposons are the main TEs in the flanking regions (both upstream and downstream) of conotoxin genes (Fig. 6E F, Table S5).Furthermore, the downstream flanking region of the conotoxin gene had more TEs than the upstream region.In addition, the Gypsy superfamily of LTR retrotransposons has a higher proportion in both  S4 and Table S5).
As shown in Fig. 4D, conotoxin genes have longer intron structures compared to those of non-conotoxin genes in C. betulinus.The content of TEs in introns adjacent to the highly variable mature peptide of conotoxins was significantly higher than that of introns adjacent to the conserved pro-peptide (Fig. 7A and B, Table S6).Like the main types of TEs in the flanking regions (upstream and downstream) of conotoxin genes, the content of the Gypsy and unclassified families that belong to LTRs in introns adjacent to the mature peptide was also markedly higher than that of introns adjacent to the pro-peptide (Fig. 7C and D, Table S6).However, there was no significant difference between mature peptide and pro-peptide for the retrotransposon of LINEs (data not shown).

Discussion
In this study, we conducted transcriptome assembly for 34 Conus species.However, some species exhibited lower completeness of BUSCOs, which could be attributed to variations in sequencing depth coverage or the tissue types used for sequencing (Table S1).Additionally, the detection of conotoxins varied significantly among the different Conus species (Table 2), consistent with previous findings [17][18][19].In C. quercinus, significant differences in the classes of conotoxin gene superfamilies between the venom duct, venom bulb, and salivary gland were observed, and the transcript activity of conotoxins was lower in both the venom bulb and salivary gland, suggesting that the venom duct is the primary site of conotoxin production [20].Moreover, significant variations in conotoxins were identified among different individuals of C. magus, highlighting the highly diverse nature of conotoxins [21].In the present study, it also showed that the expression characteristics of conotoxin genes in the venom ducts of C. betulinus with different body lengths were significantly diverse, suggesting that the different transcriptional expression or regulation patterns of conotoxin genes during the developmental phases may be one of the factors causing the diversification of conotoxins.Furthermore, we assessed Ka (nonsynonymous nucleotide substitutions) and Ks (synonymous nucleotide substitutions) values for conotoxin genes across the 34 Conus species.These parameters play a crucial role in molecular evolutionary analysis, with Ka/Ks > 1, Ka/Ks = 1, and Ka/Ks < 1 generally indicating positive selection, neutral mutation, and purifying selection, respectively [22].In our study, we found that certain conotoxin genes from the 34 Conus species exhibited Ka/ Ks values > 1 (p < 0.01, Fig. 1), indicating positive selection acting on these genes.This finding aligns with the hypervariability observed in conopeptides [5].Positive selection facilitates the spread of advantageous mutations, while purifying selection prevents the propagation of detrimental mutations [23].Therefore, positive selection may contribute to the genetic diversity of conotoxin genes.Likewise, the exons coding the mature peptide of conotoxins exhibited approximately three times higher divergence than their flanking non-coding regions [7].
Conotoxin genes in Conus typically consist of 1-6 exons [7].Our findings revealed that conotoxin genes in C. betulinus exhibited a range of exon numbers from 1 to 5, with a predominant composition of 3 exons, similar to C. ventricosus [11].However, it is important to note that these three exons may not align precisely with the three structural domains of conopeptides, namely the signal peptide, pro-peptide, and mature peptide [11].Interestingly, we observed a longer intron structure in conotoxin genes of C. betulinus compared to non-conotoxin genes.It has been observed that Metazoa genomes are enriched with introns, which can provide additional binding sites for transcriptional regulatory elements and facilitate gene diversification through exon recombination [13].In fungal mitochondria, self-splicing introns have been implicated in increasing the genetic diversity of exons flanking them, suggesting that intron mobility directly influences host gene diversity [15].Moreover, genes expressed abundantly in the nervous system often exhibit intron and gene size expansion, implying that the unique attributes of neurons may facilitate the evolution of neuronal genes [16].Consequently, it is worth investigating whether introns in conotoxin genes influence the diversity of conotoxins.
Our results indicated a rapid expansion of two reverse transcriptases and one transposable element-derived protein in C. betulinus.Reverse transcriptase is a key enzymatic domain found in all autonomous retrotransposons, as it catalyzes the process of reverse transcription effectively [24].Class I retrotransposons, one of the major classes of transposable elements (TEs), rely on the activity of reverse transcriptases and integrases [25].Interestingly, in concurrence with the expansion of TE-related gene families, TEs were found to be highly prevalent in introns adjacent to the hypervariable mature peptide of conotoxins.This suggests that TE hotspots in these specific introns may contribute to the high variability observed in the mature peptide of conotoxins.Similarly, conotoxin genes in C. ventricosus are typically found in regions that harbor Class I retrotransposons (Gypsy, Penelope, etc.) and Class II DNA transposons (Tc1-Mariner, etc.) [11].Consistent with previous studies, our findings indicated that the content of TEs in the flanking regions (upstream and downstream) of conotoxin genes in both C. betulinus and C. ventricosus was significantly higher compared to non-conotoxin genes (Table S4 and Table S5).Notably, the proportion of the Gypsy superfamily in the flanking regions (combined with upstream and downstream) of conotoxin genes was particularly prominent (9.45% of major TEs in C. betulinus and of that, 15.18% in C. ventricosus, Table S4 and Table S5), and it belonged to the LTRs of Class I retrotransposons.Given the critical roles of TEs in genome evolution and function, including genome mutations, rearrangements, and the promotion of new gene formation [14], we hypothesize that TE hotspots in crucial regions may be associated with the high diversity and hypervariability observed in conopeptides.
Furthermore, extensive research has revealed that gene flow between genetically distinct populations is a common occurrence in nature.In fact, it has been observed that introgressive hybridization between species can confer selective advantages to the receiving population, serving as a driving force behind the evolution of adaptive phenotypes [26].The genus Conus, estimated to comprise approximately 700 species [1], has undergone rapid speciation through adaptive radiation, which likely promotes the occurrence of introgressive hybridization among different Conus lineages [27].This notion is supported by studies documenting hybridization and introgression in various genetic regions, including mitochondrial genomes, nuclear gene regions, and conotoxin loci, within several Virroconus species.These findings strongly suggest that introgressive hybridization plays a significant role in the adaptive radiation of Conidae [28].Similarly, our study provides clear evidence of introgressive hybridization involving conotoxin genes among multiple Conus species, indicating that introgressive hybridization is a frequent phenomenon within the genus Conus.This phenomenon is likely a contributing factor to the observed genetic diversity in conotoxin genes.

Conclusions
Our study provides valuable insights into the genetic diversity and evolution of conotoxin genes in Conus species.We observed species-specific variations, evidence of positive selection, and higher divergence in conotoxin coding regions.Notably, our investigation uncovered transposable element hotspots in the flanking regions (both upstream and downstream) of conotoxin genes, as well as in the introns adjacent to the highly diverse mature peptide of conotoxins.It implies that these transposable element-rich regions play a crucial role in driving the extensive diversity observed in conopeptides.Additionally, our study detected robust signals of introgressive hybridization involving conotoxin genes across numerous species of Conus, highlighting the significant impact of introgressive hybridization on the genetic diversity of conotoxin genes and the overall evolution of the Conus genus.These findings contribute to our understanding of the molecular mechanisms underlying the hypervariability of conotoxin genes.

Prediction and identification of conotoxins in Conus transcriptomes
Mapping of CDS sequences of 34 Conus species against the reference conotoxin peptides from ConoServer database [34] was performed by using Diamond-v0.9.22 [35] with as parameters: "-p 30 -k 10 -e 1e-5".Also, the Hidden Markov Model (HMM) method was used to identify the conotoxin candidates in each species of Conus.A profile HMM from previous research [36] was used as reference, and prediction of conotoxins was performed by HMMER-v3.1b2[37] with as parameters: "-E 1e-5 --domE 1e-5".All candidate conotoxins predicted from homology and HMM search in each species of Conus were merged and subjected to redundancy removal.Furthermore, non-redundant candidate conotoxins were searched against the non-redundant protein sequences database (NR) using Diamond-v0.9.22, and non-conotoxins were filtered by manual curation.Finally, domains in each conotoxins were identified by ConoPrec [34].

Calculation of Ka/Ks in conotoxins
OrthoFinder-v2.2.1 [38] with the parameters: "-S diamond -og -t 30 -a 30" was used to infer orthogroup of the predicted conotoxins from 34 Conus species.And combined with the corresponding CDS of each conotoxins, ParaAT-v2.0[39] was used to calculate the Ka/ Ks within each of the conotoxins orthogroup pairs with as parameters: "-p proc -m mafft -f axt -g -t -k", followed by filtering with the threshold of p-value (Fisher) < 0.01.The visualization of Ka/Ks results was performed using ggplot2 [40].

Genome reassembly of C. betulinus
Whole genome sequencing datasets of C. betulinus from recently published research [10], including Illumina and PacBio sequencing platform, were obtained from NCBI (PRJNA578609).For Illumina sequencing datasets, adapters and low quality bases of reads were filtered using Trimmomatic-v0.36and checked with FastQC, followed by removing duplication by using Nubeam-dedup [41] with default parameters.In addition, 1 million reads in each sequencing dataset were randomly selected using seqtk (https://github.com/lh3/seqtk),and against the non-redundant nucleotide database (NT) using blastn [42] to make taxonomic assignments.A homemade Perl script was used to summarize the taxa of reads and analyze the potential contamination organisms, followed by removing contaminated reads using bbmap-v38.08[43] with default parameters.For PacBio sequencing datasets, combined with the pretreatment of Illumina sequencing reads, FMLRC2-v0.1.3[44] was used to perform the error correction of reads with as parameters: "-t 40 -C 10".50,000 corrected reads in each dataset were randomly selected by a homemade Perl script, and blastn was used against the NT database.Contaminated organisms were summarized, followed by removal using minimap2-v2.11[45] with default parameters.
Conotoxin genes in C. betulinus were identified as described above.Additionally, gene structures, especially the coordinates of exons of conotoxin genes, were manually checked and revised using Exonerate [59].Finally, completeness evaluation of all protein-coding genes was performed using BUSCO-v5.4.7 [32] using metazoan_odb10.

Structure and expression of conotoxin genes in C. betulinus
Gene full length, exon and intron length, and exon number in conotoxin and non-conotoxin genes were summarized using a homemade Perl script.Visualization of statistical results was performed using ggplot2 [40].
Transcriptomes of C. betulinus with multiple specimens and tissues were obtained from NCBI (PRJNA290540) [36].Three of the samples are from venom duct tissue of specimens with different body lengths, namely small, middle, and big; another sample is from venom duct that mixed with these tissues; and the last one is from venom bulb tissue of the middle specimen.Quality control of sequencing reads was performed as described above.Clean reads of each sample were mapped into genome of C. betulinus using HISAT2-v2.1.0[67] with default parameters, respectively.Subsequently, paired read counts were quantified using featureCounts-v1.6.2 [68], and TPM (Transcripts Per Kilobase Million) method was used to normalize and calculate the expression of genes using a homemade Perl script.The TPM values of conotoxin genes were used to perform hierarchical clustering using pheatmap with z-score normalization and to compare the expression of conotoxin genes between different specimens using ggtern [69].Meanwhile, alternative splicing between different specimens and tissues was performed using LeafCutter-v0.2 [70] and filtered with p < 0.05 for significantly alternative splicing sites, followed by visualization using ggsashimi [71].

Fig. 2
Fig. 2 Whole genome duplication (WGD) and conserved homologous gene blocks analysis of C. betulinus and C. ventricosus

Fig. 3
Fig. 3 Expression characteristics of conotoxin genes in different tissues and specimens of C. betulinus.A. Expression (TPM, transcripts per kilobase million) of conotoxin genes in different tissues (venom bulb and venom ducts) of C. betulinus.TPM was normalized by z-score.B. Expression of conotoxin and nonconotoxin genes in venom ducts between different body length individuals (small and big) of C. betulinus.Red color: conotoxin genes, blue color: nonconotoxin genes.C -D. Expression of conotoxin genes in venom ducts between different body length individuals (small, middle and big) of C. betulinus

Fig. 4
Fig. 4 Structure features of protein-coding genes in C. betulinus.A. Distribution of protein-coding genes length.B. Statistics of exon numbers in proteincoding genes.C. Distribution of exon length.D. Distribution of intron length.Red color: conotoxin genes.Blue color: non-conotoxin genes

Fig. 6 Fig. 5
Fig. 6 Content of TEs in flanking regions (100 kb) of protein-coding genes in C. betulinus and C. ventricosus.A -B. Content of TEs in the upstream and downstream flanking regions of protein-coding genes in C. betulinus.C -D. Content of TEs in the upstream and downstream flanking regions of protein-coding genes in C. ventricosus.E -F.Abundance of different type of TEs in the upstream and downstream flanking regions of conotoxin genes in C. ventricosus.Significant differences were performed by Wilcoxon method, and indicated at p < 0.01 (**) or p < 1e-5 (****)

Fig. 7
Fig. 7 Content of TEs in introns of conotoxin genes in C. betulinus.A -B. Content of TEs in introns that adjacent to the conotoxin pro-and mature peptide.C -D. Content of LTRs in introns that adjacent to the conotoxin pro-and mature peptide.Pro: pro-peptide of conopeptides.Mature: mature peptide of conopeptide.Significant differences were performed using Wilcoxon method, and indicated at p < 0.05 (*), p < 0.001 (***) or p < 1e-5 (****)

Fig. 8
Fig. 8 Paterson's D (ABBA-BABA) statistic test of introgression of conotoxin genes in Conus.Legend heatmap: D-statistic value in abscissa and p value that transformed by logarithm in ordinate.Redder colors in grid indicate higher introgression level

Table 1
Summary of transcriptome assembly of 34

Table 3
Summary of the reassembly genome of C. betulinus