- Research article
- Open Access
Generation of a de novo transcriptome from equine lamellar tissue
© Holl et al. 2015
- Received: 30 January 2015
- Accepted: 22 September 2015
- Published: 3 October 2015
Laminitis, the structural failure of interdigitated tissue that suspends the distal skeleton within the hoof capsule, is a devastating disease that is the second leading cause of both lameness and euthanasia in the horse. Current transcriptomic research focuses on the expression of known genes. However, as this tissue is quite unique and equine gene annotation is largely derived from computational predictions, there are likely yet uncharacterized transcripts that may be involved in the etiology of laminitis. In order to create a novel annotation resource, we performed whole transcriptome sequencing of sagittal lamellar sections from one control and two laminitis affected horses.
Whole transcriptome sequencing of the three samples resulted in 113 million reads. Overall, 88 % of the reads mapped to the equCab2 reference genome, allowing for the identification of 119,430 SNPs. The de novo assembly generated around 75,000 transcripts, of which 36,000 corresponded to known annotations. Annotated transcript models are hosted in a public data repository and thus can be easily accessed or loaded into genome browsers. RT-PCR of 12 selected assemblies confirmed structure and expression in lamellar tissue.
Transcriptome sequencing represents a powerful tool to expand on equine annotation and identify novel targets for further laminitis research.
Laminae are interdigitated dermal and epidermal tissues found in the hooves of livestock that form the attachment to the distal skeleton. Equids have an additional specialization in the form of secondary laminae that project from the primary laminae which further increase the surface area and thus strengthen this connection . The junction between dermal and epidermal laminae must be strong enough to withstand the forces of weight bearing and motion without separation, while providing sufficient flexibility to absorb concussive forces and allow growth. Inflammation of the laminae (laminitis) is a devastating disease that can lead to separation of these tissues and a rotation of the third phalanx (P3) away from the hoof wall.
The etiology of laminitis is poorly understood. Many risk factors have been identified in the horse, including inflammation in other parts of the body, sepsis, metabolic conditions, or mechanical stress . Currently, as there are very few treatments available, prevention through avoiding known risk factors is recommended. In the early stages of laminitis (either pre-clinical symptoms or at the onset of lameness), prolonged cooling of the hooves in ice water has been shown to reduce severity of the disease and prevent separation of the laminae . However, if adequate treatment is not provided promptly, euthanasia is often the result. A study from the United States Department of Agriculture in 1998 estimated the annual cost of lameness at $678 million, with laminitis accounting for 15 % of the reported cases . The American Association of Equine Practitioners has specifically identified laminitis as the disease most frequently reported as needing more research .
Several methods have been devised to experimentally induce laminitis, including carbohydrate overload, oligofructose overload, and black walnut extract administration. Although all of these models will result in the disease, key differences in physiological response (as compared to the natural etiology) have been demonstrated [6, 7]. However, as natural cases can be much more difficult to acquire, these models continue to serve an important role in research.
Gene expression has been applied in studies to better understand the disease process. However, much of this research has focused on the expression of few known genes, using qPCR to target specific pathways [8–11]. Only two studies have attempted a transcriptome-wide view of laminitis. The first commercially available whole-transcriptome equine-specific microarray was not published until 2009, therefore early studies attempted two different approaches. The first study chose to use cross-species hybridization with the bovine gene expression chip, identifying 155 out of the 15,000 genes assayed to be significantly up-regulated . They were unable to identify any down-regulated genes, which was likely due to the high false-negative rate associated with imperfect hybridization. A second study instead generated a custom equine-specific array with 3076 targets derived from leukocyte EST libraries . Less than 100 of these genes were found to have significant differential expression.
Both of these projects, and any current work utilizing microarrays, are hindered by insufficient genome annotation in the horse. The only major annotation attempt used an older sequencing technology, generating 35 bp reads from eight diverse tissue types . They identified that 48 % of genes displayed tissue-specific expression patterns, with 7 % of the genes only found in one tissue type. However, this data was not incorporated into automatic annotation pipelines for the popular genome browsers, and lamellar tissue was not included in sequencing. Using this data, the authors also demonstrated there were 428 genes completely lacking in equine annotation, even though many of these genes have data in other species .
Whole transcriptome sequencing (RNA-seq) is a promising solution for interrogation of gene structure and expression, especially in a divergent tissue like the hoof. RNA-seq is a hypothesis-free examination of all cDNA in a given sample, allowing for the identification of unique features such as unannotated transcription, splice sites, allele-specific expression, anti-sense expression, and alternative poly-adenylation [16–18]. Additionally, technical variation is reportedly low, with high reproducibility between lanes . Studies have continuously demonstrated high correlation between microarray differential expression studies and RNA-seq strategies, noting the main difference is improved sensitivity for low-abundance transcripts by RNA-seq [20, 21]. However, as RNA-seq is still considerably more expensive and computationally intense than microarrays, much mainstream research still relies on microarrays or qPCR.
The objective of this study was to produce a transcriptome resource for the study of laminitis. Given that recent studies rely heavily on qPCR, the generation of a set of equine, hoof-specific transcripts can greatly benefit in the selection of novel targets for expression studies. Current annotation is largely based on computational predictions and gene models from other species, among which there is not a good physiological model for the laminae. Additionally, while there have been a few equine RNA-seq studies, raw data is often only placed in public databases and not fully processed or curated [14, 22–24]. Thus these valuable datasets are difficult to access and may require intensive bioinformatic analysis before use in subsequent projects, and sadly are often underutilized.
Illumina sequencing and assembly
De novo assembly statistics
Total reads (100 bp)
Reads after filtering
Average read length after filtering
Isoform statistics by locus. Unigenes are clustered together into unique loci based on an overlap of at least 1 bp
Long (3+ Exon) Transcripts
Min Unigenes per Locus
Max Unigenes per Locus
Average Unigenes per Locus
Mapping statistics for RNA-seq onto the equCab2 reference genome assembly
Annotation with known gene and protein databases
Unigenes matching records in selected databases. The repeat-filtered assembly was utilized for EquCab2 alignment-based annotation
NCBI NR Protein
36,195 / 74,860a (48 %)
19,740 / 74,860a (26 %)
24,501 / 55,120b (44 %)
15,538 / 55,120b (28 %)
604 / 55,120b (1 %)
31,091 / 74,860a (42 %)
Amplification and sequencing of cDNA from putative novel transcripts
Putative novel loci validated by RT-PCR
AAA80518 T-cell receptor alpha chain (IgC TCRA) [Equus caballus]
XP_003209297 PREDICTED: intraflagellar transport protein 80 homolog [Meleagris gallopavo]
XP_001493637 PREDICTED: guanylate-binding protein 5 [Equus caballus]
XP_001917082 PREDICTED: lymphocyte antigen 6H-like [Equus caballus]
NP_663348 secreted and transmembrane protein 1A precursor [Mus musculus]
ACI67873163 Perlwapin [Salmo salar]
NP_001243909 placenta-specific protein 1 precursor [Equus caballus]
AES10462 antileukoproteinase-like protein [Mustela putorius furo]
XP_002696828 PREDICTED: uncharacterized protein LOC509029 [Bos taurus]
XP_003952339 PREDICTED: uncharacterized protein LOC101059192 [Pan troglodytes]
XP_003134963 PREDICTED: odorant-binding protein-like [Sus scrofa]
We utilized RNA-seq to successfully generate a transcriptome assembly of equine lamellar tissue. As the hoof is a specialized tissue, it likely has unique transcripts that previous annotation efforts would have missed. By pooling data from healthy and diseased tissues, we have captured loci that should be valuable to future differential expression studies. Though the varied physiological states could result in differences between each transcriptome, pooling the data prior to assembly ensures sufficient power to assemble lower expressed loci. This data set represents a valuable tool for laminitis research, providing information on both known genes expressed in the hoof, as well as a wealth of previously unannotated transcripts. The transcripts identified in this study can now be utilized with other technologies to search for novel targets with relevance to laminitis.
RNA-seq provides unprecedented power for transcript and isoform discovery. However, relatively little of this information trickles down to human-readable annotation and applied datasets useful to the average molecular biologist. While some resources now exist that attempt to bridge this gap by providing bioinformatics instruction for molecular biologists, this approach is not practical for all researchers . Our newly generated data is available in two ways. Raw reads and identified variants have been deposited in public databases, so that they may be accessed or incorporated into automated pipelines. NCBI has recently begun to advantageously incorporate RNA-seq data from the short-read archive into their RefSeq annotation pipeline, and the inclusion of additional unique tissue types is essential for robust annotation from this automated analysis. However, these updated annotations (especially computational predictions) are not always readily accessible in popular genome browsers. Therefore, we have also provided downloadable Browser Extensible Data (BED) tracks of our assembly. The first file, labeled the “full” assembly, includes models of any number of exons. We have also provided only those models with 3 or more exons (the “larger” assembly) in order to remove partial transcripts likely originating from poorly expressed loci and intronless non-coding RNAs. The BED format is small and much easier to use than the raw sequencing data itself, including only the positions of each feature (not the exact sequence). BED files also are quite easy for individual researchers to load gene model annotation into their browser of choice .
Our data also includes potential non-coding RNAs, which are an emerging field of research. As the RefSeq set is specifically designed for protein-coding genes, all other transcript types are not given accession numbers. There are existing databases of non-coding RNAs available for the human and mouse genomes, however for all other species, there are only the few (less than ten) entries manually curated from the literature . Unlike protein-coding genes, there is considerably less sequence conservation between species in non-coding RNAs, necessitating within species identification . Within non-coding RNAs, there are two main classes: small (<200 bp) and long (>200 bp) . While long non-coding RNAs are often picked up in normal RNA-seq experiments (and must be separated from protein-coding mRNAs for analysis), the smaller molecules are often excluded in normal RNA-seq library preparation, and require additional methodologies to sequence.
The function of non-coding RNAs has been the subject of recent controversy. It is debatable whether the observed RNA transcription is biologically relevant, or if transcription may simply be technical noise [37, 38]. Well documented functions for non-coding RNA include regulation of the genome (through chromatin modification, DNA binding, and protein binding) and of cellular differentiation during development [39–41]. One of the most well-known non-coding RNAs is XIST, which regulates X chromosome inactivation in females. More recently, several mutations that cause overexpression of a conserved long non-coding RNA proved to be responsible for the bovine polled phenotype . It is thus important to consider all possible RNAs in studies of differential expression, instead of only the protein-coding transcripts.
Utilization of this data in studies of laminitis could identify new targets and pathways to help further our understanding of the etiology. Whereas current veterinary methods generally can only detect laminitis at the onset of lameness, the development of biomarkers could allow for rapid identification (and thus the most effective treatment) of cases before permanent damage occurs. Future understanding of the precise pathways underlying laminitis could lead to vital novel prevention methods and treatments.
Sample collection and transcriptome sequencing
Samples were collected from four horses presented for necropsy for disposal to the Cornell University College of Veterinary Medicine (samples labeled CU). An additional two lamellar samples were provided by a collaborator (labeled LSU). Medical history was collected when available. Full-thickness, mid-sagittal hoof sections were placed on ice for transport to the lab, gross examination, and dissection of lamellar tissue. Samples were placed into RNA later (Life Technologies, Carlsbad, CA, USA) and stored at -80°C until processing.
Summary of samples used in this study. Laminitis phenotype was determined through medical history and histological examination. Demographic information was not available for most horses
Equine Metabolic Syndrome
Equine Metabolic Syndrome
RNA was extracted from approximately 60 mg of lamellar tissue using the Qiagen RNeasy kit (Qiagen Inc., Valencia, CA, USA) following manufacturer's protocols for fibrous tissue. 50 μL of RNA was DNase treated using either the Ambion Turbo DNA free kit (Life Technologies, Carlsbad, CA, USA) or Qiagen DNase I kit, followed by Qiagen RNA cleanup kit. Quantification was carried out using a NanoDrop spectrophotometer (NanoDrop Technologies LLC., Wilmington, DE, USA).
Library preparation and sequencing was performed by Cornell University's Life Sciences Core Laboratory Center. A total of 5-10 μg of RNA from each sample was submitted. Single-end libraries were constructed using manufacturer's protocols for poly-T selection and sequenced on an Illumina HiSeq 2000 (Illumina Inc., San Diego, CA, USA). Raw reads were submitted to the European Nucleotide Archive [ENA:PRJEB6100].
De novo assembly
Raw RNA-seq reads were processed in two steps. First, a custom R script (based on the ShortRead package) was used to remove adapter and barcode sequences, as well as to trim low quality (Q < 20) bases from both ends of the reads . Trimmed reads shorter than 25 bp were discarded. Second, reads were aligned to the GenBank virus (version 186) and ribosomal RNA sequence databases with BWA under default parameters . Only unmapped reads were retained for assembly.
The filtered reads from all samples were pooled and de novo assembled into contigs using Trinity with “min_kmer_cov” set to 2 . In order to remove some of the redundancy of Trinity-generated contigs, a further assembly step using iAssembler with a minimum of 99 % identity (-p) was performed . Contigs shorter than 200 bp were discarded.
All unique transcripts (unigenes) were compared to the GenBank non-redundant protein database using blastx with an E-value cutoff of 1e-5. Only the protein with the lowest E-value (and thus highest significance) was retained for further analysis.
Unigenes were also aligned to the equCab 2.0 reference genome using BLAT with parameters recommended for same-species mRNA alignments . The pslCDnaFilter tool was used to remove alignments with less than 200 bp, 98 % identity, or 50 % coverage. The resulting PSL file was converted to BED format and compared with Equine-specific repeat annotation using BEDtools intersectBed in order to filter out alignments that contained over 10 % repetitive DNA [47, 48]. Many retroviruses in the genome are expressed, but high homology among these elements often leads to chimeric and spurious assemblies, and thus creates problems for alignment-based analyses. The filtered unigenes were then compared to NCBI Non-Horse RefSeq, Horse RefSeq, and Horse Ensembl annotations using intersectBed at 10 % overlap.
Putative gene names were assigned to unigenes based on high quality matches to NCBI non-redundant databases. Two BED files were produced for use in genome browsers (one containing all transcripts and one with only large transcripts containing 3 or more exons) .
Raw sequencing reads were split by barcode and aligned to the EquCab 2.0 reference genome using BWA under default parameters. SAMtools was used to convert alignments to BAM format and to remove PCR duplicate reads . SNPs were identified with GATK using the recommended pipelines with a Q > 30 cutoff [28–30]. VCFtools was then used to filter out variants with fewer than 10 observations, followed by BEDtools to remove variants that fell outside of regions with corresponding assembly alignments . The final list of variants was pooled and submitted to NCBI dbSNP.
Analysis of putative novel loci
We screened the transcriptome assembly for novel loci with two steps. First, a second genome alignment was prepared by running RepeatMasker (using RepBase 2013-04-22 libraries) on the unigenes, then BLAT and subsequent filtering was performed as before [51, 52]. Next, the unmasked and masked alignments were compared, and unigenes that passed filtering criteria in both datasets were selected. The unmasked alignments of these unigenes were then compared to RefSeq annotation using BEDtools, and alignments with less than 5 % overlap to known annotation were labeled as putative novel loci. All matches to the unassembled chromosome (chrUn) were discarded. Although valuable novel genes are likely to be found there, the incomplete state of assembly in this region makes downstream alignment based analyses problematic.
Twelve novel genes were selected for RT-PCR validation and proof of concept based on additional criteria. ExPasy “translate” tool was used to identify open reading frames (ORFs) in these unigenes . These were then aligned back to the equCab 2.0 reference genome using BLAT, and only unigenes with ORFs spanning at least three exons on their corresponding transcript annotation were retained, thus identifying larger transcripts with significant exon/intron structure. The ORFs were then compared to the non-redundant protein database using blastp, and targets with little to no experimental data were selected for further validation.
Primers used to confirm expression of unannotated transcripts. All PCRs were performed with an annealing temperature of 62°C and an elongation time of 30s
Amplification was verified on 3 % agarose gel, and the resulting PCR products were submitted to the Cornell Core Life Sciences Laboratories Center for sequencing using standard ABI chemistry on a 3730 DNA Analyzer (Applied Biosystems Inc., Foster City, CA, USA). Amplicons were aligned to their corresponding unigenes to confirm identity using Consed .
The data sets supporting the results of this article are available in the NAGRP Host of Supplementary Data to Publications repository, http://www.animalgenome.org/repository/pub/CORNEL2015.0126/.
The authors would like to thank Dr. Laura Riggs and the Cornell University Equine Hospital for the samples used in this study. They would also like to thank Dr. Hannah Galantino-Homer for her discussions on disease pathology and Dr. David Adelson for his computational insight as well as for providing more comprehensive equine repeat annotation.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Pollitt CC. The anatomy and physiology of the suspensory apparatus of the distal phalanx. Vet Clin North Am Equine Pract. 2010;26(1):29–49.View ArticlePubMedGoogle Scholar
- Eades SC. Overview of current laminitis research. Vet Clin North Am Equine Pract. 2010;26(1):51–63.View ArticlePubMedGoogle Scholar
- van Eps AW, Pollitt CC, Underwood C, Medina-Torres CE, Goodwin WA, Belknap JK. Continuous digital hypothermia initiated after the onset of lameness prevents lamellar failure in the oligofructose laminitis model. Equine Vet J. 2014;46(5):625–30.View ArticlePubMedGoogle Scholar
- Service USAaPHI, System NAHM. National Economic Cost of Equine Lameness, Colic, and Equine Protozoal Myeloencephalitis (EPM) in the United States. Fort Collins: U.S. Department of Agriculture, APHIS; 2001.Google Scholar
- Moore RM, Bud, Jenne M. “Vision 20/20 – Conquer Laminitis by 2020”. J Equine Veterinary Sci. 2010;30(2):74–76.View ArticleGoogle Scholar
- Faleiros RR, Johnson PJ, Nuovo GJ, Messer NT, Black SJ, Belknap JK. Laminar leukocyte accumulation in horses with carbohydrate overload-induced laminitis. J Vet Intern Med. 2011;25(1):107–15.View ArticlePubMedGoogle Scholar
- Leise BS, Faleiros RR, Watts M, Johnson PJ, Black SJ, Belknap JK. Laminar inflammatory gene expression in the carbohydrate overload model of equine laminitis. Equine Vet J. 2011;43(1):54–61.View ArticlePubMedGoogle Scholar
- Kwon S, Moore JN, Robertson TP, Hurley DJ, Wagner B, Vandenplas ML. Disparate effects of LPS infusion and carbohydrate overload on inflammatory gene expression in equine laminae. Vet Immunol Immunopathol. 2013;155(1-2):1–8.View ArticlePubMedGoogle Scholar
- Steelman SM, Johnson D, Wagner B, Stokes A, Chowdhary BP. Cellular and humoral immunity in chronic equine laminitis. Vet Immunol Immunopathol. 2013;153(3-4):217–26.View ArticlePubMedGoogle Scholar
- Wang L, Pawlak EA, Johnson PJ, Belknap JK, Eades S, Stack S, et al. Impact of laminitis on the canonical Wnt signaling pathway in basal epithelial cells of the equine digital laminae. PLoS One. 2013;8(2):e56025.PubMed CentralView ArticlePubMedGoogle Scholar
- Wang L, Pawlak EA, Johnson PJ, Belknap JK, Alfandari D, Black SJ. Expression and activity of collagenases in the digital laminae of horses with carbohydrate overload-induced acute laminitis. J Vet Intern Med. 2014;28(1):215–22.View ArticlePubMedGoogle Scholar
- Budak MT, Orsini JA, Pollitt CC, Rubinstein NA. Gene expression in the lamellar dermis-epidermis during the developmental phase of carbohydrate overload-induced laminitis in the horse. Vet Immunol Immunopathol. 2009;131(1-2):86–96.View ArticlePubMedGoogle Scholar
- Noschka E, Vandenplas ML, Hurley DJ, Moore JN. Temporal aspects of laminar gene expression during the developmental stages of equine laminitis. Vet Immunol Immunopathol. 2009;129(3-4):242–53.View ArticlePubMedGoogle Scholar
- Coleman SJ, Zeng Z, Wang K, Luo S, Khrebtukova I, Mienaltowski MJ, et al. Structural annotation of equine protein-coding genes determined by mRNA sequencing. Anim Genet. 2010;41 Suppl 2:121–30.View ArticlePubMedGoogle Scholar
- Coleman SJ, Zeng Z, Hestand MS, Liu J, Macleod JN. Analysis of unannotated equine transcripts identified by mRNA sequencing. PLoS One. 2013;8(7):e70125.PubMed CentralView ArticlePubMedGoogle Scholar
- 't Hoen PA, Ariyurek Y, Thygesen HH, Vreugdenhil E, Vossen RH, de Menezes RX, et al. Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res. 2008;36(21):e141.PubMed CentralView ArticlePubMedGoogle Scholar
- Malone JH, Oliver B. Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol. 2011;9:34.PubMed CentralView ArticlePubMedGoogle Scholar
- Wilson LO, Spriggs A, Taylor JM, Fahrer AM. A novel splicing outcome reveals more than 2000 new mammalian protein isoforms. Bioinformatics. 2014;30(2):151–6.View ArticlePubMedGoogle Scholar
- Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008;18(9):1509–17.PubMed CentralView ArticlePubMedGoogle Scholar
- Mooney M, Bond J, Monks N, Eugster E, Cherba D, Berlinski P, et al. Comparative RNA-Seq and microarray analysis of gene expression changes in B-cell lymphomas of Canis familiaris. PLoS One. 2013;8(4):e61088.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhao S, Fung-Leung WP, Bittner A, Ngo K, Liu X. Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells. PLoS One. 2014;9(1), e78644.PubMed CentralView ArticlePubMedGoogle Scholar
- Park KD, Park J, Ko J, Kim BC, Kim HS, Ahn K, et al. Whole transcriptome analyses of six thoroughbred horses before and after exercise using RNA-Seq. BMC Genomics. 2012;13:473.PubMed CentralView ArticlePubMedGoogle Scholar
- Capomaccio S, Vitulo N, Verini-Supplizi A, Barcaccia G, Albiero A, D'Angelo M, et al. RNA sequencing of the exercise transcriptome in equine athletes. PLoS One. 2013;8(12):e83504.PubMed CentralView ArticlePubMedGoogle Scholar
- Iqbal K, Chitwood JL, Meyers-Brown GA, Roser JF, Ross PJ. RNA-seq transcriptome profiling of equine inner cell mass and trophectoderm. Biol Reprod. 2014;90(3):61.PubMed CentralView ArticlePubMedGoogle Scholar
- Zheng Y, Zhao L, Gao J, Fei Z. iAssembler: a package for de novo assembly of Roche-454/Sanger transcriptome sequences. BMC Bioinformatics. 2011;12:453.PubMed CentralView ArticlePubMedGoogle Scholar
- NAGRP Host of Supplementary Data to Publications. [http://www.animalgenome.org/repository/pub/CORNEL2015.0126/]
- Wade CM, Giulotto E, Sigurdsson S, Zoli M, Gnerre S, Imsland F, et al. Genome sequence, comparative analysis, and population genetics of the domestic horse. Science. 2009;326(5954):865–7.PubMed CentralView ArticlePubMedGoogle Scholar
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303.PubMed CentralView ArticlePubMedGoogle Scholar
- DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43(5):491–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;11(1110):11.10.11–33.Google Scholar
- Horse Genome Analysis Data Repository. [http://www.animalgenome.org/repository/horse/]
- Bradnam K, Korf I. UNIX and Perl to the Rescue!: A Field Guide for the Life Sciences (and Other Data-rich Pursuits). Cambridge: Cambridge University Press; 2012.View ArticleGoogle Scholar
- Dreszer TR, Karolchik D, Zweig AS, Hinrichs AS, Raney BJ, Kuhn RM, et al. The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res. 2012;40(Database issue):D918–923.PubMed CentralView ArticlePubMedGoogle Scholar
- Amaral PP, Clark MB, Gascoigne DK, Dinger ME, Mattick JS. lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res. 2011;39(Database issue):D146–151.PubMed CentralView ArticlePubMedGoogle Scholar
- Qu Z, Adelson DL. Identification and comparative analysis of ncRNAs in human, mouse and zebrafish indicate a conserved role in regulation of genes expressed in brain. PLoS One. 2012;7(12):e52275.PubMed CentralView ArticlePubMedGoogle Scholar
- Veneziano D, Nigita G, Ferro A. Computational Approaches for the Analysis of ncRNA through Deep Sequencing Techniques. Front Bioeng Biotechnol. 2015;3:77.PubMed CentralView ArticlePubMedGoogle Scholar
- van Bakel H, Nislow C, Blencowe BJ, Hughes TR. Most "dark matter" transcripts are associated with known genes. PLoS Biol. 2010;8(5):e1000371.PubMed CentralView ArticlePubMedGoogle Scholar
- Kapranov P, St Laurent G. Dark Matter RNA: Existence, Function, and Controversy. Front Genet. 2012;3:60.PubMed CentralPubMedGoogle Scholar
- Rinn JL, Chang HY. Genome regulation by long noncoding RNAs. Annu Rev Biochem. 2012;81:145–66.View ArticlePubMedGoogle Scholar
- Hu W, Alvarez-Dominguez JR, Lodish HF. Regulation of mammalian cell differentiation by long non-coding RNAs. EMBO Rep. 2012;13(11):971–83.PubMed CentralView ArticlePubMedGoogle Scholar
- Morán I, Akerman I, van de Bunt M, Xie R, Benazra M, Nammo T, et al. Human β cell transcriptome analysis uncovers lncRNAs that are tissue-specific, dynamically regulated, and abnormally expressed in type 2 diabetes. Cell Metab. 2012;16(4):435–48.PubMed CentralView ArticlePubMedGoogle Scholar
- Allais-Bonnet A, Grohs C, Medugorac I, Krebs S, Djari A, Graf A, et al. Novel insights into the bovine polled phenotype and horn ontogenesis in Bovidae. PLoS One. 2013;8(5):e63512.PubMed CentralView ArticlePubMedGoogle Scholar
- Morgan M, Anders S, Lawrence M, Aboyoun P, Pagès H, Gentleman R. ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data. Bioinformatics. 2009;25(19):2607–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589–95.PubMed CentralView ArticlePubMedGoogle Scholar
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52.PubMed CentralView ArticlePubMedGoogle Scholar
- Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002;12(4):656–64.PubMed CentralView ArticlePubMedGoogle Scholar
- Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014;42(Database issue):D756–763.PubMed CentralView ArticlePubMedGoogle Scholar
- Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.PubMed CentralView ArticlePubMedGoogle Scholar
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.PubMed CentralView ArticlePubMedGoogle Scholar
- Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Smit A, Hubley R, Green P. RepeatMasker Open-3.0. In: 1996-2010: http://www.repeatmasker.org/.
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110(1-4):462–7.View ArticlePubMedGoogle Scholar
- Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A. ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 2003;31(13):3784–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Rozen S, Skaletsky H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000;132:365–86.PubMedGoogle Scholar
- Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome Res. 1998;8(3):195–202.View ArticlePubMedGoogle Scholar