Widespread protein-coding gene transcription in the mammalian testis
To assess the extent of gene transcription in different organs, a RNA-seq data set was used here. This dataset involved 12 organs (testis, brain, cerebellum, hypothalamus, pituitary, heart, liver, kidney, fat, renal cortex, skeletal muscle and skin) of five mammals: pig, cattle, sheep, human and mouse (Table S1). Among them, the transcriptome data of testis, brain, heart, liver and skeletal muscle were available for all the five mammals. The RNA-seq data were mapped onto the reference genome of the corresponding species and resulted in more than 80% average mapping ratio in these species and > 10 million mapped reads of 76 bp per sample (Table S2-S6 and Fig. S1). Analyses of these mammalian data confirmed that protein-coding genes were more frequently transcribed in testis than in other tissues in all the species analyzed (P < 8.88× 10− 8, chi-square test) (Fig. 1), yielding a pattern consistent with previous estimates for humans, rhesus macaque, mouse, opossum and chicken [18, 19]. Together, testis had high transcriptome complexity.
Gene expression patterns revealed pig male reproduction-related genes
The pig was used as the model system in this study in order to explore the high transcription complexity seen in testis. Results showed that protein-coding gene expression levels vary across tissues and testis had a distinct distribution (Fig. S2). Among them, as expression level increased, the proportion of genes with high expression levels (log2 FPKM ≥4) in testis gradually increased compared to other tissues (Fig. 2A).
The results of previous study demonstrated that many genes related to male reproduction were specifically expressed in the testis (Table S7) [11,12,13]. Therefore, in order to further elucidate genes that were associated with male reproduction in pigs, TSGs were investigated using the distribution of the tissue specificity index τ. Interestingly, data showed that testis contributed considerably to tissue specificity, and the number of tissue-specific genes in the testis was far higher than in others (such as brain, liver, heart and so on) (Fig. 2B, C).
A total of 1210 TSGs were obtained from pig when the τ score was greater than the top 20% value of τ (τ value ≥ 0.91) (Fig. 2B-D and Table S8). TSG expression levels in testis were significantly higher than those in other tissues (P < 2.00 × 10− 16) (Fig. 2D). GO functional analysis revealed that these TSGs were significantly enriched for functions associated with male reproduction, including sperm motility, spermatogenesis, sperm development, reproduction and so on (Fig. 2E). GSEA also showed that these TSGs were involved in gene sets and signal pathways related to male reproduction (Fig. 2F).
Characterizing unique or conserved during evolution TSGs in the pig
Several studies have highlighted that there were differences in gene expression levels between species, yet some tissues (such as testis, brain, heart, etc.) usually have conserved gene expression patterns [20,21,22]. We therefore proposed a hypothesis that TSGs of the pig might also be testis-specific in other phylogenetically closely related species (genetic relationship was revealed using TimeTree website [23]), such as cattle, sheep, human and mouse. To verify this assumption, 13,253 orthologous gene families and 10,740 1: 1 orthologous genes were first identified in these five mammals (Fig. S3).
Then, based on the FPKM values of the 10,740 orthologous genes, pearson correlation coefficients for common tissues from five mammals were calculated, and cluster analysis and principal component analysis (PCA) were performed. The results showed that the gene expression pattern between homologous tissues of different species was more similar than that between different tissues of the same species and that the replicates within each sample exhibited high reproducibility (Fig. 3A, B).
A similar analysis was then also performed to calculate TSGs using organ RNA-seq data from the four additional mammals, and found that the number of TSGs in cattle, sheep, human and mouse were 1459, 1541, 1403 and 1452, respectively (Fig. 3C and Fig. S4). Next, on the basis of a gene family size, genes were classified as single-copy genes (SC) and multi-copy genes (MC, gene family size ≥2). The TSGs of each species were mostly single-copy genes (Fig. 3C). Meanwhile, based on the correspondence of 10,740 1:1 orthologous genes between the five mammals, Fig. 3D showed 195 TSGs with high expression conservation (HCTSGs, shared by all five species), 113 TSGs with moderate expression conservation (MCTSGs, shared by pig, cattle and sheep) and 87 TSGs with low expression conservation (LCTSGs, unique to pig) in pig (Fig. 3D and Table S8).
Also, the expression levels and tissue specificity index scores between LCTSGs, MCTSGs and HCTSGs in the pig were compared, respectively. These comparisons showed that HCTSGs exhibited significantly greater expression levels and tissue-specific index scores than either MCTSGs or LCTSGs and that there were differences in the functions of these three gene sets (Fig. 3E-G). Indeed, the more conservative the gene expression level, the easier it was for a gene to become enriched for male reproduction-related functions (Fig. 3G).
Evolutionary rates of porcine TSGs were relatively higher
Due to differences in selective pressures, the evolutionary rates of gene expression vary between organs and lineages, and these variations were thought to be a basis for the development of phenotypic differences of many organs in mammals [24]. Thus, we assessed how the TSG evolutionary rate in the pig had changed.
Compared with NTSGs, porcine TSGs were found to have significantly higher dN, dS and gene evolutionary rate (dN/dS) (Fig. 4A). At the same time, however, although there were no significant differences in the rate of evolution between LCTSGs, MCTSGs and HCTSGs sets, highly expressed conserved TSGs nevertheless had a relatively low evolutionary rate and were more conserved (Fig. 4B).
Porcine TSGs alternative splicing patterns
The achievement of different functions for genes in different tissues and cells required the process: alternative splicing (AS), which would lead to changes in gene expression and thus change phenotype [25]. To clearly illustrate the complex AS patterns of porcine TSGs, 23,059 AS events (including SE, IR, A5, A3, MX, AF and AL) were identified, which correspond to 8027 protein-coding genes. The data presented in Fig. 5A revealed that the major splicing pattern in porcine protein-coding genes was exon skipping (Fig. 5A). Remarkably, more protein-coding genes (3772) had splice variants in testis than in certain organs (cerebellum, kidney, liver, pituitary and skeletal muscle) (Fig. 5B).
This study then determined the distribution of genes affected by seven AS events in each porcine tissue, and found that trends in the distribution of these AS events were basically consistent in all analyzed tissues and the SE remained the major splicing event (Fig. 5C). This study further identified AS changes between porcine TSGs and NTSGs, the most frequent changes were the number of TSGs in which A5, AF, and SE events occurred (Fig. 5D). Moreover, the study explored changes in the splicing pattern of TSGs with diverse degrees of conservation (LCTSGs, MCTSGs and HCTSGs). The distribution of splicing events in these three gene sets was completely different, and these TSGs were affected by different splicing types (Fig. 5E).
It was clear that a range of different gene isoforms was produced by AS in testis, and we speculated whether the highly expression genes were the result of the high expression of certain transcript isoforms. Hence, the isoform contribution rates with highest expression in testis to the expression of TSGs were calculated. Among TSGs with multiple transcripts, the median number of contribution ratio per gene was 0.937, supporting our conjecture (Fig. 5F). At the same time, Fig. 5F showed that this phenomenon was significantly reduced in other organs (P < 2.00 × 10− 16) (Fig. 5F).
Transcriptional control in porcine TSGs
Transcription factors (TFs) are proteins that bind to specific DNA sequences, influence the expression of neighboring or distal genes, and are a central determinant of gene expression [26]. One of the aims of this study was to evaluate which TFs regulate porcine TSGs. The results presented here showed that 206 TFs were significantly associated with TSGs and not to NTSGs, and these TSGs were preferentially regulated by TFs such as AR, THRB, NR5A1, SOX9 (Fig. 6A and Table S9). Furthermore, TSGs-related TFs were expressed at lower abundance than that of its unrelated TFs in testis (P = 0.014) (Fig. 6B). Data also showed that the abundance of TSGs-related TFs in testis remained significantly lower than its average abundance in the other nine tissues (P = 1.7 × 10− 9) (Fig. 6C).
This study tested whether there were essential TFs that regulate TSGs expression, as determined by the differences in TFs enrichment between LCTSGs, MCTSGs and HCTSGs. Interestingly, although the number of TFs associated with these gene sets was disparate, they overlapped significantly with those identified in whole TSGs at ratios of 68%, 85.4%, and 88.6%, respectively (Fig. 6A and Table S9). Beyond that, the analysis predicted that TCF7L1 (transcription factor 7 like 1) and THRB (thyroid hormone receptor beta) might play a crucial regulator role for TSGs, whereas many other TFs could also potentially regulate the expression abundance of TSGs in the pig (Fig. 6D).
Establishing gene regulation network of porcine TSGs
Simple linear connections between organismal genotypes and phenotypes do not exist. It was clear that the relationships between most genotypes and phenotypes were the result of much deeper underlying complexity [27, 28]. The regulation network of TSGs in the pig was therefore explored in this analysis. Data showed that the degree centrality, betweenness centrality and closeness centrality were significantly lower in TSGs when compared to NTSGs (Fig. 7A). It was also noteworthy that these three centralities were not significantly different between LCTSGs, MCTSGs and HCTSGs (Fig. 7B).
This study also evaluated TSGs that play a central regulatory role in the regulation of male reproduction of pig, employing the regulation network between TSGs (including 767 nodes and 8071 edges). Thus, on the basis of the distribution of degree centrality, 41 candidate hub TSGs were screened within the TSGs regulatory network and a top 5% value of degree centrality was used as a threshold (Fig. 7C, D). Some examples of genes that passed this screen included DAZL, CAPZA3 (capping actin protein of muscle Z-line subunit alpha 3), STRA8 (stimulated by retinoic acid 8), DDX4 (DEAD-box helicase 4), and so on (Fig. 7D).
Verifying TSG abundances in the pig by qRT-PCR
To verify the results of the RNA-seq data analysis, qRT-PCR was performed on eight randomly selected TSGs of LY (Landrace×Yorkshire) pig. The results presented in Fig. 8 showed that qRT-PCR analysis was performed using total mRNA from testis, brain, heart, liver, kidney and muscle. Seven genes with testis-specific expression were observed (including SYCP3 (synaptonemal complex protein 3), SLC26A8 (solute carrier family 26 member 8), SYCE3 (synaptonemal complex central element protein 3), CRISP2 (cysteine rich secretory protein 2), SOX30 (SRY-box transcription factor 30), DDX4 and ODF1 (outer dense fiber of sperm tails 1)) in this analysis, except for FAM71D (family with sequence similarity 71 member D), consistent with the findings of the RNA-seq data (Fig. 8).