Volume 14 Supplement 5
Systematically profiling and annotating long intergenic non-coding RNAs in human embryonic stem cell
© Tang et al.; licensee BioMed Central Ltd. 2013
Published: 16 October 2013
While more and more long intergenic non-coding RNAs (lincRNAs) were identified to take important roles in both maintaining pluripotency and regulating differentiation, how these lincRNAs may define and drive cell fate decisions on a global scale are still mostly elusive. Systematical profiling and comprehensive annotation of embryonic stem cells lincRNAs may not only bring a clearer big picture of these novel regulators but also shed light on their functionalities.
Based on multiple RNA-Seq datasets, we systematically identified 300 human embryonic stem cell lincRNAs (hES lincRNAs). Of which, one forth (78 out of 300) hES lincRNAs were further identified to be biasedly expressed in human ES cells. Functional analysis showed that they were preferentially involved in several early-development related biological processes. Comparative genomics analysis further suggested that around half of the identified hES lincRNAs were conserved in mouse. To facilitate further investigation of these hES lincRNAs, we constructed an online portal for biologists to access all their sequences and annotations interactively. In addition to navigation through a genome browse interface, users can also locate lincRNAs through an advanced query interface based on both keywords and expression profiles, and analyze results through multiple tools.
By integrating multiple RNA-Seq datasets, we systematically characterized and annotated 300 hES lincRNAs. A full functional web portal is available freely at http://scbrowse.cbi.pku.edu.cn. As the first global profiling and annotating of human embryonic stem cell lincRNAs, this work aims to provide a valuable resource for both experimental biologists and bioinformaticians.
The great potential of human embryonic stem cell (hES) in clinical usage inspired scientists to investigate underlying mechanisms for their unique pluripotency and self-renew characteristics [1–9]. Recently, several studies demonstrate that long intergenic non-coding RNAs (lincRNAs) play key roles in maintaining pluripotency [10, 11], modulating reprogramming  and differentiation . Knockdown of multiple lincRNAs has great effect on global gene expression pattern and could cause exit from the pluripotent state . Several human lincRNAs are further showed to be involved in core regulatory feedback circuits of hES cells and directly regulated by well-known key pluripotency transcription factors such as Oct4 and Nanog [10, 14, 15]. As more and more human lincRNAs were identified [9, 10, 13–16], systematically characterizing human embryonic stem cell lincRNAs will not only shed lights on the hES transcriptome dynamics but also help revealing biological functions of these novel regulators.
Combining a comprehensive collection of human embryonic stem cell RNA-Seq datasets with Human BodyMap 2, we validated that 295 previously annotated lincRNAs were expressed in multiple human embryonic stem cell samples and further identified five novel hES lincRNAs through de novo assembling. Global statistical analysis revealed that these lincRNAs' expression levels are lower than that of their protein-coding counterparts. Functional analysis further demonstrated that hES lincRNAs were preferentially involved in multiple development processes including embryo development, ribosome biogenesis, and aging. To help explore the abundant information effectively, we built an integrative web portal for scientists to browse, search and perform analysis of all lincRNAs through an intuitive Web interface. It could be accessed freely at http://scbrowse.cbi.pku.edu.cn.
300 lincRNAs are transcribed in human embryonic stem cells
In order to systematically profile hES lincRNAs, we firstly compiled a known human lincRNA catalog by integrating multiple public sources. Annotated lincRNA gene models were extracted from Ensembl, UCSC and RefSeq. Redundant gene models were identified and merged based on the genomic coordinates, resulting in 5,571 standalone annotated lincRNA genes (See Methods and Materials, as well as the Additional File 1 for more details). Moreover, we surveyed and manually screened published hES RNA-Seq datasets in several public repositories, resulting in a list of 31 wild-type human embryonic stem cell samples. Out of which, 19 high-quality datasets with at least 50nt read length were further selected for follow-up analysis to minimize technological biases caused by early Solexa platforms (see Additional File 2 for more details). In addition, transcriptome profiling for 16 adult normal tissues derived from Illumina Human BodyMap 2 Project were also incorporated as control.
To find novel hES lincRNAs, we performed de novo assembling against all wild-type hES samples. After excluding annotated lincRNAs and non-lincRNA transcripts (e.g. known protein-coding genes, miRNAs and tRNAs), five novel lincRNAs were eventually identified. Combining with previous annotated catalog, we got a full list with 5,576 human lincRNA genes (5,571 known lincRNAs and 5 novel ones, see Methods and Materials for more details)
300 (~16.43%) of the expressed lincRNAs were detected as being expressed in hES (Figure 1b). Open chromatin marks were detected to be significantly enriched at their promoters (Fisher's exact test, H3K4me3, p-value < 2.2e-16; H3K4me2, p-value < 2.2e-16) and gene bodies (Wilcoxon test, H3K36me3, p-value<2.2e-16; H3K79me2, p-value< 2.2e-16), confirming active transcription of these genomic regions in human ES cells [20, 21] (Additional File 3).
Nanog and Oct4 are both well-known essential transcription factors in hES [6, 22, 23]. The promoters of hES lincRNAs were found to be enriched with binding sites of Nanog (Fisher's exact test, adjusted p-value < 2.2e-16, odds ratio = 5.0) but not Oct4 (adjusted p-value = 1, odds ratio = 1.7), suggesting different regulation of lincRNAs between these two pluripotency factors in hES.
Comparing with other lincRNAs, hES lincRNAs generally had more complex transcript structure with longer transcript length (1,215 nt versus 906 nt, p-value < 2.2e-16), more exons per transcript (3.44 versus 3.00, p-value = 2.2e-06) and more alternative isoforms per gene (2.76 versus 1.42, p-value < 2.2e-16). On the other hand, being consistent with previous reports [18, 19], both their expression level and breath were overall lower than of their protein-coding counterparts.
Systematic annotation of hES lincRNAs
Partly due to their expression specificity, a bit lower proportion (93.3%, 280 out of 300) of hES lincRNAs were annotated. Global statistical analysis suggested that these hES lincRNAs were preferentially involved in embryo development (108 lincRNAs, p-value = 4.56e-54) and ribosome biogenesis (41 lincRNAs, p-value = 2.10e-32). On the other hand, hES biased lincRNAs were more likely to be involved in mitosis (67 lincRNAs, p-value = 1.90e-21), cell cycle (72 lincRNAs, p-value = 8.96e-18), reproduction (52 lincRNAs, p-value = 3.53e-7), as well as embryo development (76 lincRNAs, p-value = 2.13e-21) and ribosome biogenesis (40 lincRNAs, p-value = 6.86e-19) (Figure 2b and Figure 2c). Interestingly, the five de novo assembled hES lincRNAs were involved in embryonic epithelial tube formation and regulation of cell cycle, suggesting their putative important roles.
In addition to functional annotation, we further investigated the evolutionary pattern of hES lincRNAs. Previous studies have shown that the exon sequences of lincRNA are less conserved than that of protein coding genes, while more conserved than neutrally evolving ancestral repeat sequences [18, 19]. On the other hand, despite of their relatively rapid sequence turn-over rate, several lincRNAs have been reported to have homologs in remote species, suggesting distinct evolutionary patterns among different classes of lincRNAs [18, 19, 26, 27].
For hES lincRNAs, 263 out of 300 were successfully dated. Around half (129 out of 300) were dated at the root of the mammalian tree, i.e. being conserved across mouse and human lineage. A close look of the respective mouse homologous regions found that more than 80% (109 out of 129) of them had at least one active histone markers (H3K9ac, H3K27ac, H3K36me3, and H3K4me3) and only two of them had repressive histone modifications (H3K9me3, H3K27me3) in mouse ES cells, with one third (33) also posed directly upstream PolII binding measured by ChIP-Seq assay, suggesting their bona fide mouse ES expression.
To our surprise, hES lincRNAs were found to be generally younger than other lincRNAs (Wilcoxon test, p-value = 4.46e-07) (Figure 3b). Consistently, we also found that human specific lincRNAs were more likely to be hES lincRNAs (Single-tailed Fisher's exact test, odds ratio =6.3, p-value = 0.0054). Eleven hES lincRNAs were identified as human-specific. Similar to previous reports , we also found these human-specific hES lincRNAs more likely to hold HERVH transposable elements (TEs) than mouse-conserved ones (Single-tailed Fisher's exact test, odds ratio= 66.3, p-value = 0.0001). Of interest, two of them (ENSG00000228437, ENSG00000254339) were found to be under strong intra-population purifying selection indicated by low derived allele frequency (< 0.1) , suggesting their potential human-specific functions.
Integrative web portal for visualizing and analyzing data
To facilitate further investigation of these hES lincRNAs, we integrated gene models and annotations with multiple related biological data into an integrative web portal. Powered by ABrowse , the portal aims to providing users a fully interactive environment for browsing, searching and analyzing these lincRNAs as well as annotations through an intuitive interface.
The gene model tracks are designed to present both gene structures and annotations. In addition to the complete list of 5,576 human lincRNAs (track "lincRNA gene model"), protein-coding genes from Ensembl 68 (track "protein coding gene model"), two human lincRNA sets from LNCipedia , Cabili et al.  and one hES long non-coding RNA (lncRNA) catalog from Sigova et al.  are also available as references. Detailed annotations for each record will be shown in the "Entry detail" tab (of left-side Control panel, Figure 4c) after a click.
"Basic Information" sheet contains its genomic location, original accession number and the source link, as well as the functional and evolutionary annotations.
"Expression Profile" sheet shows global expression profile of the given lincRNA across multiple samples. To get a comprehensive view, we also incorporated 24 differentiated embryonic stem cell samples in addition to the 31 wild-type embryonic stem cell samples and 16 normal adult human tissues. After choosing samples in the "Sample Tree", the user could visually inspect the expression profile through a bar chart, and add mean expression for similar samples (i.e. samples in the same group) if needed.
The last sheet ("Find genes with similar expression pattern") allows user to identify co-expressed genes quickly. After specifying the correlation coefficient measurement (Pearson's r, Spearman's rho or Gini correlation coefficient), the cut-off and the samples, user can either view the matched genes interactively, or download in batch for further analysis.
To offer insights for the regulation and evolution of hES lincRNAs, we further integrated additional tracks derived from several public annotation sources. Multiple transcription factor binding and histone modification tracks, including more than 50 different transcription factors and 11 different types of histone modifications in H1 hES cell line  are grouped as "Transcriptional Regulation". Similarly, evolutionary conservation tracks covering both inter-species (measured by PhloyP score ) and intra-species (measured by derived allele frequency ) comparisons are grouped as "Evolution". All these tracks could be easily turned on or off through the "Tracks" box.
In addition to rich annotations, the portal also provides powerful searching tools for users to find lincRNAs quickly. In addition to common text-based search and sequence-based search (through the "Advanced Search" box), the portal allows users to search lincRNAs with specified expression pattern through the dedicated "Find lincRNAs" tab in the Control panel (Figure 4d). With an intuitive interface, the user can specify the expression pattern across multiple samples by either interactively "drawing" in the "Basic" sub-tab or inputting numbers precisely in the "Advance" sub-tab. Similar to the output of "Find genes with similar expression pattern" sheet, the result could also be exported as plain text for further processing (Figure 4d).
Furthermore, multiple utilities were implemented to make it easier for users to analyze data through the portal. After registration, the user can upload customized tracks and manage them through the "My Tracks" tab in the Control panel. Even more conventionally, the registered user can add Instant Note for any genomic region on-the-fly by clicking-and-dragging. All these user generated annotations could be seamlessly integrated with existing tracks, and freely set as "public" or "private" when necessary.
Compiling annotated human lincRNA catalog
To get a comprehensive annotated human lincRNA catalog, we firstly extracted all annotated lincRNA gene models from Ensembl (v68), then filtered 6,730 RefSeq NR_ records and 10,654 UCSC noncoding genes based on multiple criteria. Inspired by previous studies [18, 19], we took a conservative strategy. Briefly, only multi-exon transcripts satisfying 1) length >200bp, 2) not overlapped with known genes, and 3) classified as "non-coding" by CPC  were kept for further analysis. Redundant gene models were identified and merged based on genomic coordinate, resulting in 5,571 standalone annotated lincRNA genes (see also Additional File 1 for details).
RNA-Seq data collection
We manually screened all Illumine human RNA-Seq datasets in NCBI GEO [36, 37]. In case of potential genomic contamination, only samples with polyA plus libraries were kept. Eventually, we got a list of 31 wild-type samples (19 H1, 4 H9 and 8 other hES cell lines) and 24 differentiated embryonic stem cell samples. To minimize technological biases caused by early Solexa platforms, we further chose 19 high-quality wild-type datasets with at least 50nt read length for follow-up analysis (see Additional File 2 for more details). In addition, transcriptome profiling for 16 adult normal tissues generated by Human BodyMap 2 Project were also incorporated.
De novo assembling lincRNAs from RNA-Seq data
Following standard protocol , we mapped raw reads of the 19 high quality wild-type hES samples onto the human reference genome (hg19) by TopHat  and assembled mapped reads into transcripts by Cufflinks . All assembled transcripts were firstly filtered using similar criteria being described above, and transcripts with low expression level were further removed to control false positives due to transcription noises (see Additional File 4 for more details).
Finally, after combining with previous annotated catalog, we got a full list with 5,576 human lincRNA genes (5,571 known lincRNAs and 5 novel ones).
Identifying hES lincRNAs
Several efforts have been taken to ensure the quality of hES lincRNAs catalog. For effectively reducing false positives caused by random "bench effect", we systematically screened multiple heterogeneous datasets generated by different labs around the world (Figure 1).
To further improve the robustness, we applied a stringent criteria (median FPKM across multiple human ES samples > 1) when calling hES lincRNAs. On the other hand, a rather loose cut-off (0.07), was used by Sigova et al. , resulting in large numbers of marginally expressed candidates. In fact, more than 80% (2,910) of the Sigova et al. reported hES long noncoding RNAs (3,548) have low expression levels (< 1), and less than 18% (638 out of 3,548) Sigova's hES long noncoding RNAs could pass our filter, including only 37 (less than 10%) Sigova's distant hES long noncoding RNAs.
ChIP-Seq analysis for transcription factors and histone modifications
Respective ENCODE ChIP-seq datasets for both human and mouse ES cell lines were downloaded (from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC and http://hgdownload.cse.ucsc.edu/goldenPath/mm9/encodeDCC). Replicates were merged firstly and then feed into MACS  for peak calling with p-value cutoff = 1e-10.
The upstream 3kb and downstream 1kb of annotated transcript starting sites were defined as promoter regions . TFBS or promoter biased histone modifications as H3K4me3, H3K4me2, H3K27me3 and H3K9me3 located within these regions are considered as effective regulatory sites.
Gene ontology (GO) enrichment analysis
According to previous evaluation , we calculated the Gini correlation coefficient (GCC) as the co-expression measurement with R package rsgcc. For each lincRNA, protein coding genes with GCC >= 0.9 were taken as the "neighbors" with strong expression correlation. GO enrichment analysis for "neighbors" was implemented by R package GOstats . Raw p-values were adjusted for multiple testing using BH procedure . Significantly enriched terms (adjusted p-value <= 0.01) were assigned to the lincRNA.
To get a broad overview, fine-grained terms were further projected onto generic GO slim terms: Firstly, we downloaded GO slim file from http://www.geneontology.org/GO_slims/goslim_generic.obo, then mapped each GO term to its ancestor GO slim terms using map2slim in Perl package go-perl. After mapping lincRNA to GO slim terms, we further tested the enrichment of each lincRNA set for each GO slim term. For example, if we intended to see whether lincRNAs expressed in hES are biasedly enriched in GO slim term X, we classified all lincRNA in two ways: 1) expressed in hES or not expressed in hES, 2) enriched in term × or not enriched in term X. After Fisher's exact tests, GO slim terms with BY adjusted  p-value <= 0.05 and odds ratio > 1 were determined as significantly positively associated with hES lincRNA.
We also tried Pearson correlation coefficient instead of GCC, and changed the cutoff 0.9 to 0.85 and 0.95 for both measurements. Different parameters resulted in similar results, showing the robustness of our analysis pipeline.
Transposable element (TE) content analysis
We downloaded annotations of repeats (hg19 rmsk table) from the UCSC Table Browser. Satellite, low complexity, and simple repeats were then excluded . We compared genome coordinates of repeats with genome coordinates of lincRNA exons.
Evolutionary dating of lincRNA
Firstly, we mapped human lincRNAs onto genomes of other species according to genome alignments by using UCSC liftOver with default parameters. If there were genome alignments between specified species and human, covering > 80% base pairs of human lincRNA, the aligned region within the specified specie was taken as the homolog of this human lincRNA. Using this criteria, we have identified 5,526 (~99%) homologs for our lincRNA catalog (5,576 lincRNA) among the 6 genomes (chimp: 5,383; gorilla: 5,258; orangutan: 5,308; rhesus: 5,064; marmoset: 4,481; mouse: 2,601). We further checked the alignment identity for these homologs in each species, and found that 95% of them with identity > 96% in chimp, > 95% in gorilla, > 92% in orangutan, > 87% in rhesus, > 82% in marmoset, > 50% in mouse.
Secondly, we dated lincRNAs according to appearance of homologs using parsimony rules. Briefly, if there are homologs in each of the species in the specified evolutionary clade for a lincRNA and no homologs in all outgroups, the lincRNA is dated to the period from the time divergent from its closest outgroup to the born time of the most recent common ancestor of that clade.
List of abbreviations used
long intergenic non-coding RNA
long non-coding RNA
human embryonic stem cell
fragments per kilobase of transcript per million mapped reads
Gini correlation coefficient.
This work was supported by National Key Basic Research Program of China (No. 2011CBA01102) and the China National High-tech (863) Program (2006AA02Z334). The authors would like to thank Prof. Jingchu Luo for the invaluable suggestions and comments.
Funding to pay the open access publication charges for this article was provided by State Key Laboratory of Protein and Plant Gene Research Open Funding.
This article has been published as part of BMC Genomics Volume 14 Supplement 5, 2013: Twelfth International Conference on Bioinformatics (InCoB2013): Computational biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/14/S5.
- Amit M, Carpenter MK, Inokuma MS, Chiu CP, Harris CP, Waknitz MA, Itskovitz-Eldor J, Thomson JA: Clonally derived human embryonic stem cell lines maintain pluripotency and proliferative potential for prolonged periods of culture. Dev Biol. 2000, 227 (2): 271-278. 10.1006/dbio.2000.9912.PubMedView ArticleGoogle Scholar
- Thomson JA, Itskovitz-Eldor J, Shapiro SS, Waknitz MA, Swiergiel JJ, Marshall VS, Jones JM: Embryonic stem cell lines derived from human blastocysts. Science. 1998, 282 (5391): 1145-1147.PubMedView ArticleGoogle Scholar
- Vazin T, Freed WJ: Human embryonic stem cells: derivation, culture, and differentiation: a review. Restor Neurol Neurosci. 2010, 28 (4): 589-603.PubMedPubMed CentralGoogle Scholar
- Ben-David U, Kopper O, Benvenisty N: Expanding the boundaries of embryonic stem cells. Cell Stem Cell. 2012, 10 (6): 666-677. 10.1016/j.stem.2012.05.003.PubMedView ArticleGoogle Scholar
- Hay DC, Sutherland L, Clark J, Burdon T: Oct-4 knockdown induces similar patterns of endoderm and trophoblast differentiation markers in human and mouse embryonic stem cells. Stem Cells. 2004, 22 (2): 225-235. 10.1634/stemcells.22-2-225.PubMedView ArticleGoogle Scholar
- Zaehres H, Lensch MW, Daheron L, Stewart SA, Itskovitz-Eldor J, Daley GQ: High-efficiency RNA interference in human embryonic stem cells. Stem Cells. 2005, 23 (3): 299-305. 10.1634/stemcells.2004-0252.PubMedView ArticleGoogle Scholar
- Qiu C, Ma Y, Wang J, Peng S, Huang Y: Lin28-mediated post-transcriptional regulation of Oct4 expression in human embryonic stem cells. Nucleic Acids Res. 2010, 38 (4): 1240-1248. 10.1093/nar/gkp1071.PubMedPubMed CentralView ArticleGoogle Scholar
- Peng S, Chen LL, Lei XX, Yang L, Lin H, Carmichael GG, Huang Y: Genome-wide studies reveal that Lin28 enhances the translation of genes important for growth and survival of human embryonic stem cells. Stem Cells. 2011, 29 (3): 496-504. 10.1002/stem.591.PubMedView ArticleGoogle Scholar
- Chia NY, Chan YS, Feng B, Lu X, Orlov YL, Moreau D, Kumar P, Yang L, Jiang J, Lau MS, et al: A genome-wide RNAi screen reveals determinants of human embryonic stem cell identity. Nature. 2010, 468 (7321): 316-320. 10.1038/nature09531.PubMedView ArticleGoogle Scholar
- Guttman M, Donaghey J, Carey BW, Garber M, Grenier JK, Munson G, Young G, Lucas AB, Ach R, Bruhn L, et al: lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature. 2011, 477 (7364): 295-300. 10.1038/nature10398.PubMedPubMed CentralView ArticleGoogle Scholar
- Ng SY, Johnson R, Stanton LW: Human long non-coding RNAs promote pluripotency and neuronal differentiation by association with chromatin modifiers and transcription factors. EMBO J. 2012, 31 (3): 522-533.PubMedPubMed CentralView ArticleGoogle Scholar
- Loewer S, Cabili MN, Guttman M, Loh YH, Thomas K, Park IH, Garber M, Curran M, Onder T, Agarwal S, et al: Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells. Nat Genet. 2010, 42 (12): 1113-1117. 10.1038/ng.710.PubMedPubMed CentralView ArticleGoogle Scholar
- Dinger ME, Amaral PP, Mercer TR, Pang KC, Bruce SJ, Gardiner BB, Askarian-Amiri ME, Ru K, Solda G, Simons C, et al: Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation. Genome Res. 2008, 18 (9): 1433-1445. 10.1101/gr.078378.108.PubMedPubMed CentralView ArticleGoogle Scholar
- Sheik Mohamed J, Gaughwin PM, Lim B, Robson P, Lipovich L: Conserved long noncoding RNAs transcriptionally regulated by Oct4 and Nanog modulate pluripotency in mouse embryonic stem cells. RNA. 2010, 16 (2): 324-337. 10.1261/rna.1441510.PubMedPubMed CentralView ArticleGoogle Scholar
- Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, et al: Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009, 458 (7235): 223-227. 10.1038/nature07672.PubMedPubMed CentralView ArticleGoogle Scholar
- Pauli A, Rinn JL, Schier AF: Non-coding RNAs as regulators of embryogenesis. Nat Rev Genet. 2011, 12 (2): 136-149. 10.1038/nrg2904.PubMedPubMed CentralView ArticleGoogle Scholar
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010, 28 (5): 511-515. 10.1038/nbt.1621.PubMedPubMed CentralView ArticleGoogle Scholar
- Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL: Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011, 25 (18): 1915-1927. 10.1101/gad.17446611.PubMedPubMed CentralView ArticleGoogle Scholar
- Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, et al: The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression. Genome Res. 2012, 22 (9): 1775-1789. 10.1101/gr.132159.111.PubMedPubMed CentralView ArticleGoogle Scholar
- Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP, et al: Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007, 448 (7153): 553-560. 10.1038/nature06008.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhou VW, Goren A, Bernstein BE: Charting histone modifications and the functional organization of mammalian genomes. Nat Rev Genet. 2011, 12 (1): 7-18.PubMedView ArticleGoogle Scholar
- Chambers I, Colby D, Robertson M, Nichols J, Lee S, Tweedie S, Smith A: Functional expression cloning of Nanog, a pluripotency sustaining factor in embryonic stem cells. Cell. 2003, 113 (5): 643-655. 10.1016/S0092-8674(03)00392-1.PubMedView ArticleGoogle Scholar
- Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, et al: Core transcriptional regulatory circuitry in human embryonic stem cells. Cell. 2005, 122 (6): 947-956. 10.1016/j.cell.2005.08.020.PubMedPubMed CentralView ArticleGoogle Scholar
- Guo X, Gao L, Liao Q, Xiao H, Ma X, Yang X, Luo H, Zhao G, Bu D, Jiao F, et al: Long non-coding RNAs function annotation: a global prediction method based on bi-colored networks. Nucleic Acids Res. 2013, 41 (2): e35-10.1093/nar/gks967.PubMedPubMed CentralView ArticleGoogle Scholar
- Rinn JL, Chang HY: Genome regulation by long noncoding RNAs. Annu Rev Biochem. 2012, 81: 145-166. 10.1146/annurev-biochem-051410-092902.PubMedView ArticleGoogle Scholar
- Ulitsky I, Shkumatava A, Jan CH, Sive H, Bartel DP: Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell. 2011, 147 (7): 1537-1550. 10.1016/j.cell.2011.11.055.PubMedPubMed CentralView ArticleGoogle Scholar
- Chodroff RA, Goodstadt L, Sirey TM, Oliver PL, Davies KE, Green ED, Molnar Z, Ponting CP: Long noncoding RNA genes: conservation of sequence and brain expression among diverse amniotes. Genome Biol. 2010, 11 (7): R72-10.1186/gb-2010-11-7-r72.PubMedPubMed CentralView ArticleGoogle Scholar
- Kelley DR, Rinn JL: Transposable elements reveal a stem cell specific class of long noncoding RNAs. Genome Biol. 2012, 13 (11): R107-10.1186/gb-2012-13-11-r107.PubMedPubMed CentralView ArticleGoogle Scholar
- Ward LD, Kellis M: Evidence of abundant purifying selection in humans for recently acquired regulatory functions. Science. 2012, 337 (6102): 1675-1678. 10.1126/science.1225057.PubMedPubMed CentralView ArticleGoogle Scholar
- Kong L, Wang J, Zhao S, Gu X, Luo J, Gao G: ABrowse - a customizable next-generation genome browser framework. BMC Bioinformatics. 2012, 13: 2-10.1186/1471-2105-13-2.PubMedPubMed CentralView ArticleGoogle Scholar
- Volders PJ, Helsens K, Wang X, Menten B, Martens L, Gevaert K, Vandesompele J, Mestdagh P: LNCipedia: a database for annotated human lncRNA transcript sequences and structures. Nucleic Acids Res. 2013, 41 (Database): D246-251.PubMedPubMed CentralView ArticleGoogle Scholar
- Sigova AA, Mullen AC, Molinie B, Gupta S, Orlando DA, Guenther MG, Almada AE, Lin C, Sharp PA, Giallourakis CC, et al: Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells. Proc Natl Acad Sci USA. 2013, 110 (8): 2876-2881. 10.1073/pnas.1221904110.PubMedPubMed CentralView ArticleGoogle Scholar
- Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, et al: The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res. 2013, 41 (Database): D64-69.PubMedPubMed CentralView ArticleGoogle Scholar
- Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A: Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010, 20 (1): 110-121. 10.1101/gr.097857.109.PubMedPubMed CentralView ArticleGoogle Scholar
- Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, Gao G: CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007, 35 (Web Server): W345-349. 10.1093/nar/gkm391.PubMedPubMed CentralView ArticleGoogle Scholar
- Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002, 30 (1): 207-210. 10.1093/nar/30.1.207.PubMedPubMed CentralView ArticleGoogle Scholar
- Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, et al: NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013, 41 (Database): D991-995.PubMedPubMed CentralView ArticleGoogle Scholar
- Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012, 7 (3): 562-578. 10.1038/nprot.2012.016.PubMedPubMed CentralView ArticleGoogle Scholar
- Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25 (9): 1105-1111. 10.1093/bioinformatics/btp120.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, et al: Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008, 9 (9): R137-10.1186/gb-2008-9-9-r137.PubMedPubMed CentralView ArticleGoogle Scholar
- Veerla S, Hoglund M: Analysis of promoter regions of co-expressed genes identified by microarray analysis. BMC Bioinformatics. 2006, 7: 384-10.1186/1471-2105-7-384.PubMedPubMed CentralView ArticleGoogle Scholar
- Ma C, Wang X: Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis. Plant Physiol. 2012, 160 (1): 192-203. 10.1104/pp.112.201962.PubMedPubMed CentralView ArticleGoogle Scholar
- Falcon S, Gentleman R: Using GOstats to test gene lists for GO term association. Bioinformatics. 2007, 23 (2): 257-258. 10.1093/bioinformatics/btl567.PubMedView ArticleGoogle Scholar
- Benjamini Y HY: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. 1995, 289-300. Series B (Methodological)
- Benjamini Y, Yekutieli D: The control of the false discovery rate in multiple testing under dependency. Annals of Statistics. 2001, 29 (4): 1165-1188. 10.1214/aos/1013699998.View ArticleGoogle Scholar
- Yanai I, Benjamin H, Shmoish M, Chalifa-Caspi V, Shklar M, Ophir R, Bar-Even A, Horn-Saban S, Safran M, Domany E, et al: Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics. 2005, 21 (5): 650-659. 10.1093/bioinformatics/bti042.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.