- Research article
- Open Access
Small RNAs from plants, bacteria and fungi within the order Hypocreales are ubiquitous in human plasma
BMC Genomics volume 15, Article number: 933 (2014)
The human microbiome plays a significant role in maintaining normal physiology. Changes in its composition have been associated with bowel disease, metabolic disorders and atherosclerosis. Sequences of microbial origin have been observed within small RNA sequencing data obtained from blood samples. The aim of this study was to characterise the microbiome from which these sequences are derived.
Abundant non-human small RNA sequences were identified in plasma and plasma exosomal samples. Assembly of these short sequences into longer contigs was the pivotal novel step in ascertaining their origin by BLAST searches. Most reads mapped to rRNA sequences. The taxonomic profiles of the microbes detected were very consistent between individuals but distinct from microbiomes reported at other sites. The majority of bacterial reads were from the phylum Proteobacteria, whilst for 5 of 6 individuals over 90% of the more abundant fungal reads were from the phylum Ascomycota; of these over 90% were from the order Hypocreales. Many contigs were from plants, presumably of dietary origin. In addition, extremely abundant small RNAs derived from human Y RNAs were detected.
A characteristic profile of a subset of the human microbiome can be obtained by sequencing small RNAs present in the blood. The source and functions of these molecules remain to be determined, but the specific profiles are likely to reflect health status. The potential to provide biomarkers of diet and for the diagnosis and prognosis of human disease is immense.
It has been estimated that there are at least ten times more microbial cells associated with our bodies than there are human cells [1, 2]. Recent advances in high-throughput metagenomic sequencing approaches have facilitated identification of this diverse population of microbes at the genomic level. Characterisation of this microbiome, led by the Human Microbiome Project, has revealed that its composition varies widely between body sites and between individuals [2, 4–7].
The microbiome has a significant influence upon health. The majority of microbes are found in the gut and have essential roles in normal human physiology and immune responses [1, 8]. The composition of the gut microbiome is correlated with diet and may be linked with the pathophysiology of bowel disorders [10, 11], obesity [12–14], atherosclerosis [15–17], diabetes, rheumatoid arthritis [19, 20] and neurodevelopmental disorders. Inflammatory bowel conditions have been linked with the intestinal fungal community [4, 22].
Most metagenomic studies to date have involved isolation of DNA from external body sites or from the respiratory or digestive tracts, with fecal samples being the most commonly used source for investigation of the gut microbiome. Certain small RNAs are stable in the blood and in particular microRNAs have been widely studied as potential predictors of disease [23, 24]. However, we and others [25–27] have observed the existence of additional, exogenous small RNAs of potential microbial origin. Indeed, Wang et al. have documented the existence of RNA from bacteria and fungi in plasma and suggested that they may serve as signaling molecules or indicators of human health . The origin of these small RNAs is unclear, but they are almost certainly derived from microbes inhabiting the gut or respiratory tract, rather than from viable microbes within the circulation. Nonetheless, it seems likely that the subset of the total human microbiome which contributes to these blood-borne small RNAs is linked with health status. The ability to reliably determine the composition of this microbiome from the sequences of the small RNAs present in a blood sample could form the basis of an extremely valuable diagnostic test.
The aim of this study was to construct a profile of the microbiome from which the exogenous small RNAs present in human plasma are derived. The merging of overlapping sequences to generate contigs facilitated identification of the origin of the short RNA sequences. The microbiome profiles generated were consistent across 6 individuals (3 from this study and 3 from publicly available data ). In addition to bacterial sequences, a large proportion of reads matched fungal sequences. To our surprise, the majority of these were assigned to the order Hypocreales. This work has further demonstrated the feasibility of generating a microbiome profile from small RNAs in plasma . The ease of obtaining blood samples will facilitate analysis of this microbiome in a wide range of physiological and disease conditions. These findings also raise the intriguing questions of whether these exogenous RNAs have any functional implications and why sequences from one fungal order are so abundant.
RNA was extracted from three plasma samples and small RNA libraries were prepared using an Illumina kit. Each library was sequenced on a MiSeq (Illumina). The unique reads and raw sequencing data have been deposited in Gene Expression Omnibus (GEO), accession number GSE52981. Sequencing data for three plasma exosomal small RNA libraries prepared with a kit from Bioo Scientific were downloaded from GEO. For one of these samples data from libraries prepared with an NEB kit and an Illumina kit (as used in this study) were also available. The strategy for analysis of the sequencing data was to filter out reads derived from human genes, assemble the remaining reads into contigs, annotate these by alignment to known sequences and perform a phylogenetic classification (Figure 1).
The proportions of reads annotated to human genes are illustrated in Figure 2A (absolute numbers in Additional file 1). As expected, a large proportion of reads represented microRNAs, but remarkably, in the whole plasma samples prepared in this study, a similar proportion mapped to Y RNAs. Y RNAs are small cytoplasmic non-coding RNAs that can be cleaved to form smaller RNAs independently of the microRNA pathway. The vast majority of reads (>99%) mapped to hY4, with small numbers to hY5, hY3 and hY1. A smaller but significant number of Y RNA sequences were present in the plasma exosome samples. In small RNA sequencing datasets from whole blood, which included cellular RNAs (GEO accession GSE46579), hY4-derived RNAs were present at levels comparable to an abundant microRNA. The differences in Y RNA abundance observed between studies can be attributed to differences in sample collection (e.g. whole plasma or plasma exosomes) and library preparation, which result in differing distributions of small RNA read lengths (Additional file 2: Figure S1). The small RNAs detected corresponded to the 5p and 3p arms of the predicted secondary structure of hY4 (Figure 3A). Taqman small RNA RT-qPCR assays employ a stem-loop reverse transcription primer and are therefore expected to be specific for the target small RNA and not detect the full length precursor RNA. Therefore the low Cp values observed with the assays targeting the most abundant hY4 sequences from each arm both confirmed the presence of these small RNAs in plasma and suggested that they are indeed much more abundant than any individual microRNA (Figure 3B). To further confirm the presence of hY4 fragments, RNA was polyadenylated, reverse transcribed with an oligo-dT adaptor and PCR performed with primers specific for the putative hY4 fragments. The size of the product amplified using the 5p primer was consistent with presence of the small RNA template detected in the sequencing rather than full length hY4 RNA (Figure 3C).
A significant number of unannotated reads remained in all samples. The randomly cloned DNA sequences obtained in conventional metagenomic studies are typically assembled into contigs to enhance identification of homology with known genes. Although this strategy would not be applicable to discretely processed small RNAs, such as microRNAs, we reasoned that it could aid detection of longer RNAs which are processed to generate multiple small overlapping RNAs. All the unannotated reads were therefore pooled and assembled into 41542 contigs. For annotation purposes, the 5142 contigs with significant hits (E < 1 × 10⁻³) in a megablast search of the NCBI non-redundant database were assigned the identity of the top hit (lineages listed in Additional file 3). The unannotated reads from each sample were realigned to these contigs and the proportions of reads mapping to different taxonomic categories calculated (Figure 2B-F). Most identifiable reads were assigned to Metazoa, Bacteria or Fungi. Although some metazoan reads could be derived from food, many are likely to be misassigned due to similarity with human sequences.
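The top-hit annotation and per-sample proportion calculation described above can be sketched in Python. This is a minimal illustration rather than the workflow used in the study (which relied on Genomics Workbench and Galaxy tools); the tuple layout of the BLAST rows, the function names `top_hits` and `category_proportions`, and the toy values are all assumptions made for the example.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-3  # significance threshold quoted in the text


def top_hits(blast_rows, cutoff=E_VALUE_CUTOFF):
    """Keep the best-scoring hit per contig among hits with E < cutoff.

    blast_rows: iterable of (contig_id, subject_id, evalue, bitscore)
    tuples (an assumed layout, not a fixed BLAST format).
    """
    best = {}
    for contig, subject, evalue, bitscore in blast_rows:
        if evalue >= cutoff:
            continue  # not significant, ignore
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (subject, bitscore)
    return {contig: subject for contig, (subject, _) in best.items()}


def category_proportions(read_counts, contig_category):
    """Fraction of one sample's mapped reads per taxonomic category.

    read_counts: {contig_id: reads mapped in this sample}
    contig_category: {contig_id: category}; contigs without an entry
    are counted as "Unassigned".
    """
    totals = defaultdict(int)
    for contig, n_reads in read_counts.items():
        totals[contig_category.get(contig, "Unassigned")] += n_reads
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: n / grand_total for cat, n in totals.items()}


# Toy data (hypothetical identifiers and scores)
rows = [
    ("contig_a", "subject_fungal", 1e-50, 900.0),
    ("contig_a", "subject_other", 1e-20, 400.0),
    ("contig_b", "subject_bacterial", 0.5, 30.0),  # fails the E-value cutoff
]
hits = top_hits(rows)
proportions = category_proportions(
    {"contig_a": 3, "contig_b": 1}, {"contig_a": "Fungi"}
)
```

Counting unannotated contigs as "Unassigned" keeps the proportions summing to one, which mirrors reporting each category as a share of all realigned reads.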
A small percentage of contigs matched plant sequences, but due to the conservation of rRNA across the kingdom Viridiplantae, the top blast hits did not reliably identify their source, but rather reflected the composition of the database (a preponderance of algal sequences was observed). However, in most instances the sequences were sufficiently divergent from human rRNA to support the notion that they are derived from dietary plant material (Figure 4).
The phylogenetic profile of the bacterial microbiome was remarkably similar between individuals (Figure 2C), with Proteobacteria being the most abundant phylum. This is consistent with an origin in the gut. The number of reads matching fungal sequences was higher than expected and of these, more than 90% in 5 of 6 individuals were from the phylum Ascomycota (Figure 2D). Remarkably, it was possible to further define the origin of almost all these reads to within the class Sordariomycetes and order Hypocreales (Figure 2E-F). The predominance of sequences from the Hypocreales is illustrated when the numbers of reads mapping to each fungal order are placed on a phylogenetic tree comprising all orders with at least one matching contig (Figure 5).
For the 20 exogenous contigs represented by the most reads, the top 5% of BLAST hits (min score 50) were analysed with the MEGAN taxonomic classification tool [32, 33]. They all mapped to rRNA, 16 of the 20 to fungal sequences, with the lowest common taxonomic rank for 5 of the top 6 being the fungal order Hypocreales or lower (Figure 6). The relative abundances of contigs across the samples were very consistent. Contig 44, which mapped to Hypocreales rRNA, was the most abundant in 5 of the 6 individuals. Notably, 9 of the top BLAST hits for the 20 contigs were to the genus Fusarium. The mycoprotein Quorn is derived from Fusarium venenatum. Although it is intriguing to speculate that the sequences we observe are derived from Quorn, it seems unlikely that all 6 subjects would have had this in their diet. In addition, although several contigs align very closely with published F. venenatum rRNA sequences, they match even more closely to other species (Additional file 4: Figure S2).
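The lowest-common-ancestor idea behind this classification can be illustrated as follows. This is a simplified sketch and not MEGAN's implementation: it assumes each retained hit's lineage is already available as a root-to-leaf list of taxon names, and the function name `lowest_common_ancestor` and the two example lineages are illustrative only.

```python
def lowest_common_ancestor(lineages):
    """Return the longest root-to-leaf prefix shared by all lineages."""
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so ragged inputs are handled
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Two hypothetical hits for one contig, both within Hypocreales
hit_lineages = [
    ["Fungi", "Ascomycota", "Sordariomycetes", "Hypocreales",
     "Nectriaceae", "Fusarium"],
    ["Fungi", "Ascomycota", "Sordariomycetes", "Hypocreales",
     "Cordycipitaceae", "Cordyceps"],
]
lca = lowest_common_ancestor(hit_lineages)
```

Here the hits disagree at family level, so the contig would be assigned to the order they share rather than to either genus.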
The contigs assigned to Hypocreales are extremely similar to the published sequences. For example, contig 44 shares 98.6% identity over 1162 nucleotides with Hypocreales Cordycipitaceae Cordyceps gunnii 28S ribosomal RNA (Figure 7A). This contig can also be aligned, with lower similarity, to rRNA from many other species. A region of contig 44 across which many orthologous sequences were available was selected and a multiple alignment made (Figure 7B). The phylogram derived from this alignment illustrates that contig 44 is considerably more similar to sequences from several species within Hypocreales than to those within Malasseziales and even more dissimilar to the human rRNA sequence (Figure 7C). Contigs generated from analysis of samples from the study by Wang et al. were also similar to fungal sequences and indeed some were identical to contig 44 for >700 bp (Additional file 5: Figure S3).
All the most abundant contigs fall within the mature rRNA regions but the distribution of detected reads is very uneven (Figure 8). Although the variation in coverage could be partially due to experimental bias (i.e. differential cloning efficiency of sequences), it is also likely to reflect in vivo abundances.
Highly expressed small RNAs derived from Y RNAs hY1 and hY3 have been reported in tumours, and high expression in serum has been suggested by RT-PCR. We also observed a small number of sequences matching hY1 and hY3, but the presence of extremely abundant hY4 fragments, confirmed by RT-qPCR, was unexpected. Our ability to detect Y RNA fragments as such a large proportion of total small RNAs in this study may relate to practical details of the library preparation protocol employed, particularly the size range selected. Y RNAs form part of the RoRNP, which also contains the proteins Ro60 and La, but their function is poorly understood. They are required for chromosomal replication and are overexpressed in tumours. It has been demonstrated that double-stranded RNA oligonucleotides comprising the stem of the Y RNA are sufficient to reconstitute DNA replication in vitro. Y RNAs are rapidly degraded during apoptosis to generate fragments similar in size to those observed in this study. Although it has been suggested that small RNAs derived from Y RNAs may act analogously to microRNAs, the formation of Y3 and Y5 RNA fragments has been shown to be Dicer independent. Given the abundance of the hY4 fragments in plasma, it is an intriguing possibility that they may have some as yet unknown function.
The detection of microbial sequences in plasma supports previous reports of circulating enterobacterial transcripts and the most detailed study of these sequences to date by Wang et al., who performed extensive control experiments to rule out potential sources of contamination. However, the possibility that observations of exogenous RNAs result from contamination remains a serious concern. Spurious detection of such sequences could arise due to contamination during sample handling, library preparation or sequencing or result from errors in data analysis. It is difficult to envisage how contamination with identical sequences could occur in studies undertaken in diverse locations by independent investigators, i.e. as detected in this study and by Huang et al. and Wang et al. (Additional file 5: Figure S3). In addition, analysis of data from the sequence runs prior to those reported in this study confirmed that they were not the source of contamination. In this study reads were assembled to try to improve mapping accuracy and reduce the computational requirements for database searching. The observation of similar mapping results without assembly of the sequence reads (Additional file 6: Figure S4) supports the proposed phylogenetic origins.
The taxonomic breakdown of the originating organisms achieved with our contig-based strategy is in broad agreement with that reported by Wang et al.; Proteobacteria were the most abundant bacterial phylum in both studies, with Bacteroidetes also commonly detected, whilst Ascomycota was the most abundant phylum of Fungi in both studies. However, our data suggest an even greater predominance of Ascomycota and we can assign many of these reads down to the level of Order (Hypocreales). Whilst members of this order have occasionally been reported as opportunistic pathogens in immunocompromised patients, they are more commonly plant or insect parasites, while Hypocrea jecorina is a widely used source of cellulases. It is remarkable that the vast majority of fungal reads should be derived from a small number of closely related species or potentially even a single species. From where do all these sequences originate?
The composition of both the fungal and bacterial plasma microbiome detected suggests that the sequences do not result from contamination from the skin microbiome during collection of blood samples. Whilst the human skin microbiome varies widely, it is dominated by the bacterial phylum Actinobacteria (and to a lesser degree Firmicutes and Proteobacteria) [1, 5] and the fungal genus Malassezia of the Basidiomycota phylum. Reads from Actinobacteria comprised an average of 1.5% of bacterial reads in 5 of 6 samples and only 17.6% in the remaining sample. Firmicutes averaged 1% across all samples, although Proteobacteria were the most abundant (50%). With regard to fungi, only 3 contigs (91 reads) were assigned to Malassezia. It seems unlikely that contamination during sample processing could result in such similar microbiome profiles in three independent plasma small RNA datasets and across multiple library preparation methods.
Small RNA sequences have been reported to enter the circulation from the gastrointestinal tract  and pharmacological preparations of small interfering RNAs (siRNAs) have been demonstrated to cross the gut wall following oral administration [46–48]. The gut therefore seems the most likely origin for microbial plasma small RNAs. The human gut, in contrast to skin, is predominantly colonised by the bacterial phyla Bacteroidetes and Firmicutes [1, 5], and by the fungal phylum Ascomycota . It is therefore conceivable that the gut is the source, but one would not expect the observed predominance of sequences from Hypocreales. Perhaps the niche occupied by these species within the gut predisposes them to uptake into the circulation. The respiratory tract is another potential source and indeed Fusarium is one of the four most common pathogenic fungi detected, along with Candida, Aspergillus and Cryptococcus . Although some microRNAs may be absorbed from the gut unshielded to survive exposed in the circulation for several hours [31, 50], many are protected from degradation by association with lipids and proteins [51, 52] and there is some evidence that the exogenous RNAs may be similarly protected . Indeed rRNA fragments have been shown to enter argonaute protein complexes . Differential stability could contribute to over-representation of certain sequences.
In addition to RNAs of microbial origin, some sequences potentially derived from foodstuffs were detected. Notably the greatest proportion of reads matching plant sequences were found in sample 3, which was obtained from the one individual who reported following a vegetarian diet. Although it has been reported that plant microRNAs (xenomiRs) are not reliably detected in plasma after ingestion [54, 55] the possibility of genetic material from food entering the circulation is supported by the detection of plant chloroplast DNA in the blood of cows . The unequivocal assignment of significant numbers of circulating small RNAs to plant rRNA in this study raises the exciting possibility that it may be possible to quantify diet from a simple blood test.
Great care must be taken when comparing between studies because differences in sample collection and library preparation can have profound effects upon the small RNA profiles observed and the proportion of reads mapping to Y RNAs or exogenous small RNAs. Nonetheless, the detection of these same small RNAs in diverse studies confirms that they are a common feature of the circulation.
Abundant fragments derived from the non-coding hY4 RNA, but of unknown function, have been detected in human plasma. RNAs from a diverse range of microbes are also present, but the majority of fungal sequences are from species in the Order Hypocreales. This raises questions about how these exogenous RNAs reach the circulation, whether they are functional and why specific fungi are so highly represented. This work has demonstrated the feasibility of determining the microbiome that contributes small RNAs to the blood. The profile of microbial sequences detected is almost certainly influenced by the composition of the wider microbiome, particularly in the gut. Given the integral role of the human microbiome in normal health and pathology, it seems likely that knowledge of the plasma microbiome will soon prove to be of clinical importance.
Sample collection and RNA extraction
Three healthy individuals aged 20–40 years were recruited from Belfast, N. Ireland, UK: male, Caucasian (sample 1); female, Caucasian (sample 2); and male, Indian (sample 3). All participants completed a food-frequency questionnaire which included questions on any special dietary requirements. A blood sample was taken in EDTA-treated tubes and plasma was separated immediately by centrifugation for 10 minutes at 1,000 g and subsequently at 10,000 g for 10 minutes prior to RNA extraction using a miRNeasy kit (Qiagen, Crawley, UK). RNA purity and quantity were determined using a Nanodrop spectrophotometer (Thermo Scientific) and Qubit fluorimeter (Life Technologies). RNA integrity was assessed using RNA 2000 and small RNA chips on a Bioanalyzer (Agilent).
Ethics and consent
This study was conducted according to the guidelines laid down in the Declaration of Helsinki and all procedures involving human participants/patients were approved by the Research Ethics Committee of the School of Medicine and Dentistry, Queen’s University Belfast (Ref:11/05v3). Written informed consent was obtained from all participants.
Small RNA libraries were prepared using a TruSeq small RNA sample prep kit (Illumina) following the manufacturer’s protocol. This included size selection using a 6% PAGE gel; the region between the custom Illumina markers was excised, corresponding to insert sizes of approximately 20–35 nucleotides. Cluster generation and sequencing with 40 nucleotide reads on a MiSeq were performed at the Trinity Genome Sequencing Laboratory, Dublin.
Sequencing data were analyzed using Genomics Workbench software v5.5.1 (CLCbio, Aarhus, Denmark). After removal of adapter sequences, reads >15 bp and with at least 2 copies were aligned, allowing 2 mismatches, to miRBase (Release 19), a database of human non-coding RNA downloaded from Ensembl using Biomart and the human genome (hg19). The remaining unannotated reads were pooled and assembled into contigs using the de novo assembly algorithm of Genomics Workbench. Reads from each individual sample were then mapped back to the contigs. For subsequent phylogenetic analyses the putative origins of contig sequences were assigned using the sequence identifier (gi) numbers of the top hits determined by megablast [59, 60] (available online) against the NCBI non-redundant database (E-value <0.001). Lists of gi numbers were uploaded to the metagenomic analysis tools available through the Galaxy platform [63, 64], specifically to ‘Fetch taxonomic representation’, ‘Summarize taxonomy’, ‘draw phylogeny’ and ‘Find lowest diagnostic rank’. Microsoft Access databases were used to integrate datasets. Taxonomic classification of the top 5% of BLAST hits was performed using the MEtaGenome ANalyzer (MEGAN) analysis tool [32, 33]. The lowest common ancestor was assigned following manual removal of individual hits with obviously incorrect taxonomic classifications (i.e. matching the query and top BLAST hits but not other sequences from their alleged species). Optimal RNA secondary structures were predicted using the Vienna RNAfold webserver [65, 66]. Additional multiple sequence alignments were performed using the Multiple Alignment using Fast Fourier Transform (MAFFT) program available online or Clustal Omega [69, 70], available through the EBI server. Multiple alignments were visualised with Jalview and phylograms with Archaeopteryx. Custom Perl scripts were used for manipulating sequence files.
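The read pre-filter described at the start of this analysis (length greater than 15 and at least 2 copies) can be sketched as below. The study performed this step in Genomics Workbench; the Python function `collapse_and_filter`, its parameter names and the toy reads are assumptions made purely for illustration, and the reads are assumed to be adapter-trimmed strings.

```python
from collections import Counter


def collapse_and_filter(reads, min_len_exclusive=15, min_copies=2):
    """Collapse identical reads and keep sequences passing both filters.

    A sequence is kept if it is longer than min_len_exclusive bases
    and was observed at least min_copies times.
    Returns {sequence: copy_number}.
    """
    counts = Counter(reads)
    return {
        seq: n
        for seq, n in counts.items()
        if len(seq) > min_len_exclusive and n >= min_copies
    }


# Toy reads: one kept, one singleton, one too short
toy_reads = ["A" * 16] * 2 + ["C" * 16] + ["G" * 15] * 3
kept = collapse_and_filter(toy_reads)
```

Collapsing to unique sequences with copy numbers before alignment also reduces the number of sequences that have to be aligned.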
Y-RNA custom small RNA Taqman assays (Life Technologies) were designed to target the following sequences: HY4_5p: GGCUGGUCCGAUGGUAGUGGGUUAUCAGAACU and HY4_3p: CCCCCCACUGCUAAAUUUGACUGGCUU. Taqman reverse transcription and PCR were performed according to the manufacturer’s instructions on a LightCycler480 platform (Roche).
For detection of Y-RNA fragments, RNA was polyadenylated using E. coli Poly(A) Polymerase I (Ambion) and reverse transcribed using SuperScript III reverse transcriptase (Life Technologies) and an oligo-dT adaptor: GCGAGCACAGAATTAATACGACTCACTATAGGTTTTTTTTTTTTVN. PCR was performed using the common reverse primer GCGAGCACAGAATTAATACGAC and either an HY4_5p primer: GGCTGGTCCGATGGTAGT or HY4_3p primer: CCCCCCACTGCTAAAATTTGA. 35 cycles of PCR were performed with the following conditions: 94°C 30 sec; 56°C 30 sec; 72°C 1 minute, using Hotstar Taq DNA polymerase (Qiagen).
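A quick way to compare the fragment-specific PCR primers with the Taqman target sequences listed above is to convert the RNA targets to DNA and count how many leading bases agree. The sketch below is illustrative only; the helper names `rna_to_dna` and `common_prefix_length` are hypothetical. With the sequences as printed in this section, the HY4_5p primer matches the first 18 nucleotides of its target, whereas the HY4_3p primer agrees over the first 15 nucleotides and then contains one more A than the target sequence. The code only reports this difference and cannot say which sequence is correct.

```python
def rna_to_dna(seq):
    """Convert an RNA sequence to its DNA equivalent (U -> T)."""
    return seq.upper().replace("U", "T")


def common_prefix_length(a, b):
    """Number of leading characters that two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Sequences copied from the text above
TARGETS_RNA = {
    "HY4_5p": "GGCUGGUCCGAUGGUAGUGGGUUAUCAGAACU",
    "HY4_3p": "CCCCCCACUGCUAAAUUUGACUGGCUU",
}
PRIMERS_DNA = {
    "HY4_5p": "GGCTGGTCCGATGGTAGT",
    "HY4_3p": "CCCCCCACTGCTAAAATTTGA",
}

# Leading bases of each primer that match the DNA form of its target
prefix_match = {
    name: common_prefix_length(PRIMERS_DNA[name], rna_to_dna(TARGETS_RNA[name]))
    for name in TARGETS_RNA
}
```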
Availability of supporting data
The data sets supporting the results of this article are available in the Gene Expression Omnibus (GEO) repository. The sequencing data generated in this study have accession number GSE52981 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52981) and the publicly available plasma small RNA sequencing data analysed have accession number GSE45722.
Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, et al: A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010, 464 (7285): 59-65. 10.1038/nature08821.
Wang ZK, Yang YS: Upper gastrointestinal microbiota and digestive diseases. World J Gastroenterol. 2013, 19 (10): 1541-1550. 10.3748/wjg.v19.i10.1541.
Human Microbiome Project (HMP). http://commonfund.nih.gov/Hmp/
Ott SJ, Kuhbacher T, Musfeldt M, Rosenstiel P, Hellmig S, Rehman A, Drews O, Weichert W, Timmis KN, Schreiber S: Fungi and inflammatory bowel diseases: Alterations of composition and diversity. Scand J Gastroenterol. 2008, 43 (7): 831-841. 10.1080/00365520801935434.
Cho I, Blaser MJ: The human microbiome: at the interface of health and disease. Nat Rev Genet. 2012, 13 (4): 260-270.
Findley K, Oh J, Yang J, Conlan S, Deming C, Meyer JA, Schoenfeld D, Nomicos E, Park M, NIH Intramural Sequencing Center Comparative Sequencing Program, Kong HH, Segre JA: Topographic diversity of fungal and bacterial communities in human skin. Nature. 2013, 498 (7454): 367-370. 10.1038/nature12171.
Faith JJ, Guruge JL, Charbonneau M, Subramanian S, Seedorf H, Goodman AL, Clemente JC, Knight R, Heath AC, Leibel RL, Rosenbaum M, Gordon JI: The long-term stability of the human gut microbiota. Science. 2013, 341 (6141): 1237439-10.1126/science.1237439.
Honda K, Littman DR: The microbiome in infectious disease and inflammation. Annu Rev Immunol. 2012, 30: 759-795. 10.1146/annurev-immunol-020711-074937.
Hoffmann C, Dollive S, Grunberg S, Chen J, Li H, Wu GD, Lewis JD, Bushman FD: Archaea and fungi of the human gut microbiome: correlations with diet and bacterial residents. PLoS One. 2013, 8 (6): e66019-10.1371/journal.pone.0066019.
Simren M, Barbara G, Flint HJ, Spiegel BM, Spiller RC, Vanner S, Verdu EF, Whorwell PJ, Zoetendal EG, Rome Foundation Committee: Intestinal microbiota in functional bowel disorders: a Rome foundation report. Gut. 2013, 62 (1): 159-176. 10.1136/gutjnl-2012-302167.
Rigsbee L, Agans R, Shankar V, Kenche H, Khamis HJ, Michail S, Paliy O: Quantitative profiling of gut microbiota of children with diarrhea-predominant irritable bowel syndrome. Am J Gastroenterol. 2012, 107 (11): 1740-1751. 10.1038/ajg.2012.287.
Ley RE, Turnbaugh PJ, Klein S, Gordon JI: Microbial ecology: human gut microbes associated with obesity. Nature. 2006, 444 (7122): 1022-1023. 10.1038/4441022a.
Greenblum S, Turnbaugh PJ, Borenstein E: Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease. Proc Natl Acad Sci U S A. 2012, 109 (2): 594-599. 10.1073/pnas.1116053109.
Zhao L: The gut microbiota and obesity: from correlation to causality. Nat Rev Microbiol. 2013, 11: 639-647. 10.1038/nrmicro3089.
Koeth RA, Wang Z, Levison BS, Buffa JA, Org E, Sheehy BT, Britt EB, Fu X, Wu Y, Li L, Smith JD, DiDonato JA, Chen J, Li H, Wu GD, Lewis JD, Warrier M, Brown JM, Krauss RM, Tang WH, Bushman FD, Lusis AJ, Hazen SL: Intestinal microbiota metabolism of L-carnitine, a nutrient in red meat, promotes atherosclerosis. Nat Med. 2013, 19 (5): 576-585. 10.1038/nm.3145.
Koren O, Spor A, Felin J, Fak F, Stombaugh J, Tremaroli V, Behre CJ, Knight R, Fagerberg B, Ley RE, Backhed F: Human oral, gut, and plaque microbiota in patients with atherosclerosis. Proc Natl Acad Sci U S A. 2011, 108 (Suppl 1): 4592-4598.
Backhed F: Meat-metabolizing bacteria in atherosclerosis. Nat Med. 2013, 19 (5): 533-534. 10.1038/nm.3178.
Karlsson FH, Tremaroli V, Nookaew I, Bergstrom G, Behre CJ, Fagerberg B, Nielsen J, Backhed F: Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature. 2013, 498 (7452): 99-103. 10.1038/nature12198.
Wu HJ, Ivanov II, Darce J, Hattori K, Shima T, Umesaki Y, Littman DR, Benoist C, Mathis D: Gut-residing segmented filamentous bacteria drive autoimmune arthritis via T helper 17 cells. Immunity. 2010, 32 (6): 815-827. 10.1016/j.immuni.2010.06.001.
Scher JU, Sczesnak A, Longman RS, Segata N, Ubeda C, Bielski C, Rostron T, Cerundolo V, Pamer EG, Abramson SB, Huttenhower C, Littman DR: Expansion of intestinal Prevotella copri correlates with enhanced susceptibility to arthritis. Elife. 2013, 2: e01202-10.7554/eLife.01202.
Hsiao EY, McBride SW, Hsien S, Sharon G, Hyde ER, McCue T, Codelli JA, Chow J, Reisman SE, Petrosino JF, Patterson PH, Mazmanian SK: Microbiota modulate behavioral and physiological abnormalities associated with neurodevelopmental disorders. Cell. 2013, 155 (7): 1451-1463. 10.1016/j.cell.2013.11.024.
Iliev ID, Funari VA, Taylor KD, Nguyen Q, Reyes CN, Strom SP, Brown J, Becker CA, Fleshner PR, Dubinsky M, Rotter JI, Wang HL, McGovern DP, Brown GD, Underhill DM: Interactions between commensal fungi and the C-type lectin receptor Dectin-1 influence colitis. Science. 2012, 336 (6086): 1314-1317. 10.1126/science.1221789.
Geekiyanage H, Jicha GA, Nelson PT, Chan C: Blood serum miRNA: non-invasive biomarkers for Alzheimer's disease. Exp Neurol. 2012, 235 (2): 491-496. 10.1016/j.expneurol.2011.11.026.
Guay C, Roggli E, Nesca V, Jacovetti C, Regazzi R: Diabetes mellitus, a microRNA-related disease?. Transl Res. 2011, 157 (4): 253-264. 10.1016/j.trsl.2011.01.009.
Wang K, Li H, Yuan Y, Etheridge A, Zhou Y, Huang D, Wilmes P, Galas D: The complex exogenous RNA spectra in human plasma: an interface with human gut biota?. PLoS One. 2012, 7 (12): e51009-10.1371/journal.pone.0051009.
Semenov DV, Baryakin DN, Kamynina TP, Kuligina EV, Richter VA: Fragments of noncoding RNA in plasma of human blood. Ann N Y Acad Sci. 2008, 1137: 130-134. 10.1196/annals.1448.030.
Semenov DV, Baryakin DN, Brenner EV, Kurilshikov AM, Vasiliev GV, Bryzgalov LA, Chikova ED, Filippova JA, Kuligina EV, Richter VA: Unbiased approach to profile the variety of small non-coding RNA of human blood plasma with massively parallel sequencing technology. Expert Opin Biol Ther. 2012, 12 (Suppl 1): S43-S51.
Huang X, Yuan T, Tschannen M, Sun Z, Jacob H, Du M, Liang M, Dittmar RL, Liu Y, Liang M, Kohli M, Thibodeau SN, Boardman L, Wang L: Characterization of human plasma-derived exosomal RNAs by deep sequencing. BMC Genomics. 2013, 14: 319-10.1186/1471-2164-14-319.
Nicolas FE, Hall AE, Csorba T, Turnbull C, Dalmay T: Biogenesis of Y RNA-derived small RNAs is independent of the microRNA pathway. FEBS Lett. 2012, 586 (8): 1226-1230. 10.1016/j.febslet.2012.03.026.
Leidinger P, Backes C, Deutscher S, Schmitt K, Muller SC, Frese K, Haas J, Ruprecht K, Paul F, Stahler C, Lang CJ, Meder B, Bartfai T, Meese E, Keller A: A blood based 12-miRNA signature of Alzheimer disease patients. Genome Biol. 2013, 14 (7): R78-10.1186/gb-2013-14-7-r78.
Zhang L, Hou D, Chen X, Li D, Zhu L, Zhang Y, Li J, Bian Z, Liang X, Cai X, Yin Y, Wang C, Zhang T, Zhu D, Zhang D, Xu J, Chen Q, Ba Y, Liu J, Wang Q, Chen J, Wang J, Wang M, Zhang Q, Zhang J, Zen K, Zhang CY: Exogenous plant MIR168a specifically targets mammalian LDLRAP1: evidence of cross-kingdom regulation by microRNA. Cell Res. 2012, 22 (1): 107-126. 10.1038/cr.2011.158.
Huson DH, Auch AF, Qi J, Schuster SC: MEGAN analysis of metagenomic data. Genome Res. 2007, 17 (3): 377-386. 10.1101/gr.5969107.
Huson DH, Mitra S, Ruscheweyh HJ, Weber N, Schuster SC: Integrative analysis of environmental sequences using MEGAN4. Genome Res. 2011, 21 (9): 1552-1560. 10.1101/gr.120618.111.
O'Donnell K, Cigelnik E, Casper HH: Molecular phylogenetic, morphological, and mycotoxin data support reidentification of the Quorn mycoprotein fungus as Fusarium venenatum. Fungal Genet Biol. 1998, 23 (1): 57-67. 10.1006/fgbi.1997.1018.
Sorefan K, Pais H, Hall AE, Kozomara A, Griffiths-Jones S, Moulton V, Dalmay T: Reducing ligation bias of small RNAs in libraries for next generation sequencing. Silence. 2012, 3 (1): 4-10.1186/1758-907X-3-4.
Meiri E, Levy A, Benjamin H, Ben-David M, Cohen L, Dov A, Dromi N, Elyakim E, Yerushalmi N, Zion O, Lithwick-Yanai G, Sitbon E: Discovery of microRNAs and other small RNAs in solid tumors. Nucleic Acids Res. 2010, 38 (18): 6234-6246. 10.1093/nar/gkq376.
Kohn M, Pazaitis N, Huttelmaier S: Why YRNAs? About versatile RNAs and their functions. Biomolecules. 2013, 3 (1): 143-156.
Christov CP, Gardiner TJ, Szuts D, Krude T: Functional requirement of noncoding Y RNAs for human chromosomal DNA replication. Mol Cell Biol. 2006, 26 (18): 6993-7004. 10.1128/MCB.01060-06.
Christov CP, Trivier E, Krude T: Noncoding human Y RNAs are overexpressed in tumours and required for cell proliferation. Br J Cancer. 2008, 98 (5): 981-988. 10.1038/sj.bjc.6604254.
Gardiner TJ, Christov CP, Langley AR, Krude T: A conserved motif of vertebrate Y RNAs essential for chromosomal DNA replication. RNA. 2009, 15 (7): 1375-1385. 10.1261/rna.1472009.
Rutjes SA, van der Heijden A, Utz PJ, van Venrooij WJ, Pruijn GJ: Rapid nucleolytic degradation of the small cytoplasmic Y RNAs during apoptosis. J Biol Chem. 1999, 274 (35): 24799-24807. 10.1074/jbc.274.35.24799.
Zhang Y, Wiggins BE, Lawrence C, Petrick J, Ivashuta S, Heck G: Analysis of plant-derived miRNAs in animal small RNA datasets. BMC Genomics. 2012, 13: 381-10.1186/1471-2164-13-381.
Howard DH: Pathogenic Fungi in Humans and Animals (Mycology). 2002, Boca Raton, FL, USA: CRC Press, 2nd edition.
Berbee ML: The phylogeny of plant and animal pathogens in the Ascomycota. Physiol Mol Plant Pathol. 2001, 59 (4): 165-187. 10.1006/pmpp.2001.0355.
Martinez D, Berka RM, Henrissat B, Saloheimo M, Arvas M, Baker SE, Chapman J, Chertkov O, Coutinho PM, Cullen D, Danchin EG, Grigoriev IV, Harris P, Jackson M, Kubicek CP, Han CS, Ho I, Larrondo LF, de Leon AL, Magnuson JK, Merino S, Misra M, Nelson B, Putnam N, Robbertse B, Salamov AA, Schmoll M, Terry A, Thayer N, Westerholm-Parvinen A, et al: Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina). Nat Biotechnol. 2008, 26 (5): 553-560. 10.1038/nbt1403.
Aouadi M, Tesz GJ, Nicoloro SM, Wang M, Chouinard M, Soto E, Ostroff GR, Czech MP: Orally delivered siRNA targeting macrophage Map4k4 suppresses systemic inflammation. Nature. 2009, 458 (7242): 1180-1184. 10.1038/nature07774.
Xu J, Ganesh S, Amiji M: Non-condensing polymeric nanoparticles for targeted gene and siRNA delivery. Int J Pharm. 2012, 427 (1): 21-34. 10.1016/j.ijpharm.2011.05.036.
Akhtar S: Oral delivery of siRNA and antisense oligonucleotides. J Drug Target. 2009, 17 (7): 491-495. 10.1080/10611860903057674.
Ghannoum MA, Jurevic RJ, Mukherjee PK, Cui F, Sikaroodi M, Naqvi A, Gillevet PM: Characterization of the oral fungal microbiome (mycobiome) in healthy individuals. PLoS Pathog. 2010, 6 (1): e1000713-10.1371/journal.ppat.1000713.
Witwer KW: XenomiRs and miRNA homeostasis in health and disease: evidence that diet and dietary miRNAs directly and indirectly influence circulating miRNA profiles. RNA Biol. 2012, 9 (9): 1147-1154. 10.4161/rna.21619.
Arroyo JD, Chevillet JR, Kroh EM, Ruf IK, Pritchard CC, Gibson DF, Mitchell PS, Bennett CF, Pogosova-Agadjanyan EL, Stirewalt DL, Tait JF, Tewari M: Argonaute2 complexes carry a population of circulating microRNAs independent of vesicles in human plasma. Proc Natl Acad Sci U S A. 2011, 108 (12): 5003-5008. 10.1073/pnas.1019055108.
Vickers KC, Palmisano BT, Shoucri BM, Shamburek RD, Remaley AT: MicroRNAs are transported in plasma and delivered to recipient cells by high-density lipoproteins. Nat Cell Biol. 2011, 13 (4): 423-433. 10.1038/ncb2210.
Wei H, Zhou B, Zhang F, Tu Y, Hu Y, Zhang B, Zhai Q: Profiling and identification of small rDNA-derived RNAs and their potential biological functions. PLoS One. 2013, 8 (2): e56842-10.1371/journal.pone.0056842.
Witwer KW, McAlexander MA, Queen SE, Adams RJ: Real-time quantitative PCR and droplet digital PCR for plant miRNAs in mammalian blood provide little evidence for general uptake of dietary miRNAs: limited evidence for general uptake of dietary plant xenomiRs. RNA Biol. 2013, 10 (7): 1080-1086. 10.4161/rna.25246.
Snow JW, Hale AE, Isaacs SK, Baggish AL, Chan SY: Ineffective delivery of diet-derived microRNAs to recipient animal organisms. RNA Biol. 2013, 10 (7): 1107-1116. 10.4161/rna.24909.
Bertheau Y, Helbling JC, Fortabat MN, Makhzami S, Sotinel I, Audeon C, Nignol AC, Kobilinsky A, Petit L, Fach P, Brunschwig P, Duhem K, Martin P: Persistence of plant DNA sequences in the blood of dairy cows fed with genetically modified (Bt176) and conventional corn silage. J Agric Food Chem. 2009, 57 (2): 509-516. 10.1021/jf802262c.
Trinity Genome Sequencing Laboratory. http://www.medicine.tcd.ie/sequencing
Biomart for the Ensembl database. http://www.ensembl.org/biomart/martview/
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410. 10.1016/S0022-2836(05)80360-2.
Morgulis A, Coulouris G, Raytselis Y, Madden TL, Agarwala R, Schaffer AA: Database indexing for production MegaBLAST searches. Bioinformatics. 2008, 24 (16): 1757-1764. 10.1093/bioinformatics/btn322.
Basic Local Alignment Search Tool (BLAST) home page at NCBI. http://blast.ncbi.nlm.nih.gov/
Kosakovsky Pond S, Wadhawan S, Chiaromonte F, Ananda G, Chung WY, Taylor J, Nekrutenko A, Galaxy Team: Windshield splatter analysis with the Galaxy metagenomic pipeline. Genome Res. 2009, 19 (11): 2144-2153. 10.1101/gr.094508.109.
The Galaxy project. http://usegalaxy.org
Goecks J, Nekrutenko A, Taylor J, Galaxy Team: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010, 11 (8): R86-10.1186/gb-2010-11-8-r86.
Gruber AR, Lorenz R, Bernhart SH, Neubock R, Hofacker IL: The Vienna RNA websuite. Nucleic Acids Res. 2008, 36 (Web Server issue): W70-W74.
Vienna RNAfold webserver. http://rna.tbi.univie.ac.at
Katoh K, Misawa K, Kuma K, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002, 30 (14): 3059-3066. 10.1093/nar/gkf436.
Multiple Alignment using Fast Fourier Transform (MAFFT). http://mafft.cbrc.jp/alignment/server/
Goujon M, McWilliam H, Li W, Valentin F, Squizzato S, Paern J, Lopez R: A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 2010, 38 (Web Server issue): W695-W699.
Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Soding J, Thompson JD, Higgins DG: Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011, 7: 539.
Clustal Omega on the EBI server. http://www.ebi.ac.uk/Tools/msa/clustalo/
Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ: Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009, 25 (9): 1189-1191. 10.1093/bioinformatics/btp033.
Zmasek CM, Eddy SR: ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics. 2001, 17 (4): 383-384. 10.1093/bioinformatics/17.4.383.
Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Holko M, Ayanbule O, Yefanov A, Soboleva A: NCBI GEO: archive for functional genomics data sets–10 years on. Nucleic Acids Res. 2010, 39 (Database issue): D1005-D1010.
Thanks to Estelle Lowry for blood collection and Jayne Woodside and Margaret Dellett for helpful comments and discussion.
This work was funded by the Biotechnology and Biological Sciences Research Council (BBSRC) (Grant number: BB/H005498/1) and the Department for Employment and Learning, Northern Ireland.
The authors declare that they have no competing interests.
MB carried out the bioinformatic analyses of sequence data, participated in the design of the study and helped to draft the manuscript. JG prepared the libraries and helped to draft the manuscript. EB performed RT-PCR analyses. SB wrote Perl scripts and provided comments. UC participated in the design of the study. RH helped to design the study and draft the manuscript. DS conceived and designed the study, led and performed data analysis and drafted the manuscript. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 2: Figure S1: Distribution of read lengths in a range of sequencing libraries prepared from blood. The percentages of reads of each length are shown for libraries prepared from plasma, exosomes isolated from plasma or whole blood, including cells. Both the source material and library preparation protocol (e.g. size selection) influence the insert sizes observed. References: Huang et al.; Wang et al.; Leidinger et al. (PDF 172 KB)
Additional file 5: Figure S3: Analysis of plasma sequence data from Wang et al. (a) Taxonomic composition of the contigs derived from small RNAs isolated from a normal plasma sample (ERR248695), determined from BLAST searches using MEGAN. (b) Alignment of one contig derived from sample ERR248695 from the study by Wang et al. with contig 44 from this study, demonstrating total identity. (PDF 145 KB)
Additional file 6: Figure S4: Phylogenetic profiles predicted from individual reads or contigs. A random subset of the reads from Sample 3A that could not be annotated using human databases was generated. These were used as queries in BLAST searches of the nt database, either directly or after assembly into contigs. Similarities with fungal sequences are a key feature detected by both approaches. Using this subset of sequences, no contigs of potential bacterial origin were detected, probably reflecting the relatively low abundance of putative bacterial reads in this sample in comparison to fungal reads (see Figure 2D). (a) Phylogenetic profile predicted using MEGAN to interpret BLAST searches using contigs assembled from the reads. The number of hits at each node is indicated. (b) Phylogenetic profile predicted from BLAST searches of individual reads. The similarity between the trees suggests that mapping of assembled reads is broadly consistent with the results obtained from individual reads. (PDF 150 KB)