miRNA gene predictions
The presented pipeline identified 21 high-quality, previously unknown miRNA genes in tammar using a strict gene annotation and confirmed 75 of the 421 known miRNA genes in tammar. The remaining miRNA genes predicted in Ensembl that do not match a mature miRNA from one of our datasets could be bona fide miRNA genes for which a mature miRNA is not expressed or was not sequenced in any of the target tissues analyzed herein. Alternatively, these could represent miRNA loci that, while carrying sequence orthology to miRNAs in miRBase, have undergone lineage-specific locus death by genetic drift due to a lack of selection for function in this lineage. However, in light of our validation experiments and since each of the steps in our pipeline utilizes published tools, we have high confidence in our predictions.
Within our miRNA gene dataset are three pseudogenes that represent novel miRNA genes in the tammar. Previous work has shown that two miRNAs in primates were derived from processed pseudogenes, although the incidence of this type of miRNA gene evolution is considered rare [19, 30]. Thus, there has been lineage-specific selection on the hairpins found in these pseudogene transcripts, which we can infer is involved in tammar-specific gene regulation given the mature miRNAs observed from these loci.
Closer examination of a cluster of miRNA genes on the human X chromosome indicates there is high conservation of this specific miRNA gene cluster in metatherian mammals. This cluster is likely conserved on the X chromosome in tammar as it is found on human Xq26.2, in a region on the ancient portion of the mammalian X chromosome and conserved on the X in marsupials [31, 32]. While the conservation of the six miRNA genes in this region was confirmed by the presence of mature miRNAs in our miRNA pools, a miRNA peak was identified just downstream of MIR20B that was highly represented in the testis. The placement of this miRNA just adjacent to the 3’ end of this miRNA gene indicates that this gene is likely under post-transcriptional regulation by a miRNA derived from another location, specifically in the testis. This would lead to a loss of gene regulation for targets of MIR20B in a testis-specific fashion, although the specific cell type affected and the functional consequences remain to be determined.
Mature miRNA analyses
For each of the microRNA pools, many of the miRNA reads did not overlap with known mature miRNAs annotated in miRBase, indicating that the tissues analyzed in the tammar may carry numerous novel microRNAs or that there has been high sequence divergence from previously annotated animal miRNAs. However, this may be an overestimation of lineage-specificity based on the criteria used in the mapping pipeline. Each RNA from miRBase, along with the sequenced miRNA pools, was mapped to the genome allowing for at most one mismatch to the genome sequence. Because a miRBase annotation and a tammar read that map to the same locus can each differ from the genome at one position, this procedure indirectly performs an un-gapped alignment with no more than two mismatches between each miRBase annotation and sequenced tammar miRNA. While allowing more mismatches would increase the likelihood of identifying false miRNA targets, relying on such high stringency to identify conserved miRNAs may not account for deep evolutionary distances. These data will ultimately be used to develop new annotation methods that use not only direct information, such as sequence similarity to previously annotated miRNAs, but also indirect information, such as a predicted set of target genes.
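The two-mismatch bound follows from the triangle inequality for un-gapped (Hamming) comparisons. The following is a minimal sketch of that reasoning only, not the mapping software used in this study; the function names and the toy sequences are hypothetical and were invented for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatches between two equal-length sequences (un-gapped)."""
    if len(a) != len(b):
        raise ValueError("un-gapped comparison requires equal lengths")
    return sum(x != y for x, y in zip(a, b))


def maps_within(read: str, genome: str, pos: int, max_mismatches: int = 1) -> bool:
    """Return True if `read` matches `genome` at `pos` with at most
    `max_mismatches` substitutions and no gaps."""
    window = genome[pos:pos + len(read)]
    return len(window) == len(read) and hamming(read, window) <= max_mismatches


# Toy sequences, invented for illustration (not tammar or miRBase data).
genome_locus = "TGAGGTAGTAGGTTGTATAGTT"
mirbase_seq = "TGAGGTAGTAGGTTGTATAGTA"  # differs from the locus at the last base
tammar_read = "TGAGGAAGTAGGTTGTATAGTT"  # differs from the locus at index 5

# Both sequences map to the same locus with at most one mismatch each...
both_map = (maps_within(mirbase_seq, genome_locus, 0)
            and maps_within(tammar_read, genome_locus, 0))

# ...so they can differ from each other at no more than 1 + 1 = 2 positions.
pairwise = hamming(mirbase_seq, tammar_read)
print(both_map, pairwise)
```

In this toy case the two mismatches fall at different positions, so the annotation and the read differ at exactly two sites; had they fallen at the same position, the pair would differ at one site or none.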
Our annotation strategy for mature miRNAs allowed for assessment of target genes. While limited in the number of target genes to those with a full annotation in Meug_1.0, we were able to identify several tammar-specific miRNA targets, confirm conserved miRNA targets and potentially identify previously unknown miRNA targets in other species, such as human. For example, a conserved miRNA target was identified in the 3’UTR of the gene Lrtm1 (Figure 3A), although the usage of this particular miRNA target varies across species (Figure 2). Thus, while miRNA utility may be species- or tissue-specific, the target location remains conserved. Within the annotated 3’UTR of C17ORF49, we identified two miRNA targets that appeared at first glance to be tammar-specific. However, closer examination of the conservation of this gene between tammar and human indicates these two locations are specific sites of high conservation, spanning ~160 million years of evolution. Note that the predicted human miRNA target sites are not correspondingly conserved (Figure 3B). The two tammar-identified target locations may indicate a conserved miRNA site in human that was previously unknown (Figure 3B). Moreover, C17ORF49 is a gene of unknown function in both tammar and human, indicating that the regulatory network of miRNA target genes may aid in understanding novel gene function.
Our analyses also identified several target genes that may represent tammar-specific miRNA regulation. One example of this was the gene Srfs5 (Figure 3C), which carries two different miRNA target sites. One target location resides within the 3’-most UTR and is in a region of low conservation between human and tammar. The second location lies within a cryptic 3’UTR that is utilized in an alternatively spliced isoform of this gene. Similar to C17ORF49, this miRNA site is in a region of high conservation between tammar and human and accordingly may represent a conserved miRNA target site. This 3’UTR, unlike most 3’UTRs in tammar, is highly conserved with human across its entire length, confounding inferences regarding the conservation of specific miRNA target sites, as the conservation of this portion of the transcript may be independent of any miRNA regulatory pathway. The miRNA identified for the cryptic 3’UTR target site was found only in the pouch young brain miRNA pool, indicating this gene is under miRNA regulation specifically in that tissue. Interestingly, this gene codes for a splicing factor that is involved in alternative splicing of transcripts (reviewed in ). While it is interesting to speculate that a miRNA-regulated splicing pathway may have evolved in the tammar brain, leading to species-specific adaptation, a more exhaustive search within brain subregions in human and other mammalian species would be needed to confirm species-specificity.
Genome defense and piRNAs
The annotation of the piRNAs in tammar was restricted to the testis due to technical difficulties with the ovary-specific library. However, we were able to confirm that piRNAs in this species are predominantly derived from mobile elements, and we found that this pool was enriched for retrotransposons such as LINEs, SINEs, and LTR-elements. As in other species, there were several piRNA subgroups that were specific to de novo repeats identified in this species that are not conserved with opossum, platypus, mouse or human (Figure 4). Within this de novo pool was enrichment for tammar-specific LINEs and LTR-elements. Given the restriction of piRNAs to the germ line, and their role in genome defense and reproductive isolation [2, 35], our discovery that a subset of piRNAs within the tammar is derived from novel repeats may provide an explanation for the long-standing mystery of Haldane’s Rule within macropodid marsupials [36, 37]. While macropodid marsupials can produce viable offspring, male F1 hybrids are sterile, following the tenets of Haldane’s Rule in which the heterogametic sex is adversely affected in interspecific crosses. In addition, the genomes of macropodid marsupial F1 hybrids experience instability specifically associated with mobile elements [38–40]. Thus, we postulate that the rapid evolution of mobile DNA across macropodid marsupial species may result in an incompatibility within species hybrids that is manifest in the male germline as a result of expressed piRNA incompatibilities [2, 14, 41].
crasiRNAs and centromeres
The final small RNA class that was annotated as part of the tammar genome project is the crasiRNAs. First discovered in the tammar, crasiRNAs were hypothesized to be derived from mobile elements resident within centromeres. Our analyses represent the first full annotation of small RNAs in this class range and have identified several salient characteristics that demarcate this class from other small RNAs (reviewed in ). Across both tissues examined (testis and fibroblast cells), we find enrichment for mobile DNA progenitor sequences (Figure 5). Unlike the piRNAs, the predominant class of element within crasiRNAs is the SINE retroelement, including a recently discovered SINE class, SINE28, although the distribution of SINEs within each pool differs between testis and fibroblast cells. Our analyses of specific members within the crasiRNAs cytologically confirm that progenitor sequences are enriched at centromeres (Figure 6, Additional file 4: Figure S1). Moreover, these progenitor sequences are enriched in CENP-A containing nucleosomes, further supporting the classification of these small RNAs as centromere-repeat associated. While it cannot be ruled out that the discontinuous palindromic signature identified in the crasiRNAs is a feature of the progenitor sequence from which the crasiRNAs are derived, it may also be a pattern involved in the biogenesis and/or targeting of crasiRNAs within centromeric sequences.
While this study has provided sequence annotation and genomic location for these small RNAs, their function within the genome has yet to be determined and remains largely inferential. The fact that crasiRNAs are found specifically in CENP-A rich regions of the centromere points to a role in centromere function; how these small RNAs participate in the demarcation of CENP-A nucleosomes or in centromere function is unknown. Histone tail modifications are dynamic processes that are modulated by other protein complexes and noncoding RNAs, such as small RNAs. For example, it has been proposed that RNAs mediate the pairing of centromere-specific DNAs to chromodomain-like adaptor proteins which in turn recruit histone methyltransferases (HMTases) that target the H3K9 residue for methylation. This interaction may be stabilized by the centromere-specific heterochromatin protein 1 (HP1) [43, 44]. The methylation of H3K9 also triggers DNA methylation of CpG residues in centromeres [45, 46].
The role of RNA in the process of histone modification is not clear; however, regions of the genome once thought of as “junk”, such as repeated DNAs and centromeres, are transcriptionally active and can modulate epigenetic states. Centromeres have long been thought to comprise noncoding and transcriptionally inactive DNA. Surprising new evidence suggests that eukaryotic centromeres produce a variety of transcripts. The transcription of satellites has been observed in numerous eukaryotic species across a broad range of phyla, from yeast to human. The widespread conservation of satellite transcription is consistent with a conserved regulatory role for these transcripts in gene regulation or chromatin modification.
These transcripts may function in one of four ways: 1) They may facilitate post-transcriptional gene regulation, potentially through the RNA-induced silencing complex (RISC). In this pathway, double-stranded (ds) RNAs are cleaved into short interfering RNAs (siRNAs, 21-nucleotide double-stranded RNAs) that, upon association with RISC, mediate native mRNA inactivation. 2) They may participate in the RNA-induced transcriptional silencing complex (RITS), a pathway in which siRNAs are involved in heterochromatin recruitment [50, 51]. 3) Alternatively, in a manner analogous to the Xist transcript in mammalian X-inactivation, they may recruit heterochromatin assembly factors such as HP1, histone deacetylases, SET domain proteins and Polycomb group proteins. 4) Lastly, they may regulate the movement of chromosomes through nuclear territories via association with specific chromocenters and “transcriptional factories” [54, 55]. Although the mechanisms are unknown, evidence that satellite transcripts participate in heterochromatin assembly and/or nucleosome recruitment is accumulating.