Identification and analysis of pig chimeric mRNAs using RNA sequencing data
© Ma et al.; licensee BioMed Central Ltd. 2012
Received: 11 December 2011
Accepted: 17 August 2012
Published: 28 August 2012
Gene fusion is ubiquitous over the course of evolution. It is expected to increase the diversity and complexity of transcriptomes and proteomes through chimeric sequence segments or altered regulation. However, chimeric mRNAs in pigs remain unclear. Here we identified some chimeric mRNAs in pigs and analyzed the expression of them across individuals and breeds using RNA-sequencing data.
The present study identified 669 putative chimeric mRNAs in pigs, of which 251 chimeric candidates were detected in a set of RNA-sequencing data. The 618 candidates had clear trans-splicing sites, 537 of which obeyed the canonical GU-AG splice rule. Only two putative pig chimera variants whose fusion junction was overlapped with that of a known human chimeric mRNA were found. A set of unique chimeric events were considered middle variances in the expression across individuals and breeds, and revealed non-significant variance between sexes. Furthermore, the genomic region of the 5′ partner gene shares a similar DNA sequence with that of the 3′ partner gene for 458 putative chimeric mRNAs. The 81 of those shared DNA sequences significantly matched the known DNA-binding motifs in the JASPAR CORE database. Four DNA motifs shared in parental genomic regions had significant similarity with known human CTCF binding sites.
The present study provided detailed information on some pig chimeric mRNAs. We proposed a model that trans-acting factors, such as CTCF, induced the spatial organisation of parental genes to the same transcriptional factory so that parental genes were coordinatively transcribed to give birth to chimeric mRNAs.
KeywordsChimeric mRNA Trans-splicing RNA-sequencing CTCF Pig
Chimeric mRNAs fused by two previously separate genes located on different genomic loci may allow a limited number of genes to encode a substantially large number of mRNAs and proteins. They are expected to increase proteomic diversity through chimeric proteins or altered regulation. As a consequence, gene fusion can change the properties of precursor proteins and can even perturb normal regulatory pathways and initiate or stimulate neoplastic cell growth. A well-known example is the BCR-ABL1 fusion gene, which is the result of the chromosomal translocation t(9; 22)(q34; q11) and is responsible for 90% of chronic myelogenous leukemia cases. In this sense, chimeric genes can be used as desirable therapeutic targets for cancers. For instance, matinib mesylate (Gleevec, Novartis) can target the oncogenic kinase activity of BCR-ABL1 in chronic myeloid leukemia[2–4]. Therefore, the identification and analysis of novel chimeric genes will pave the way for a greater understanding of the role of gene fusion.
Chromosomal translocation is generally responsible for the generation of some chimeric mRNAs in cancer cells. Therefore, chimeric mRNAs are often viewed as potential diagnostic biomarkers for tumours caused by chromosomal translocation. However, a low amount of a chimeric RNA (JAZF1-JJAZ1) was detected in normal endometrial tissues, joining the JAZF1 gene on chromosome band 7p15 to the JJAZ1/SUZ12 gene on chromosome band 17q21. Chimeric RNAs and proteins are identical to those produced from a chromosomal rearrangement found in human endometrial stromal tumours. The explanation generally offered for this finding is that specific chromosomal rearrangements occur within small numbers of cells in healthy tissues. However, no rearranged bands t(7;17)(p15;q21) were detected in normal cells. Given the absence of any detectable rearranged DNA in cells producing chimeric RNAs, the obvious explanation is the rearrangement at the RNA level. After incubation of mixed extracts from a human endometrial stromal cell line and from a rhesus monkey fibroblast cell line, rhesus JAZF1 exons were joined to human JJAZ1 exons, implying that the JAZF1-JJAZ1 RNA is a result of trans-splicing.
In eukaryotes, trans-splicing is a special event in RNA processing where exons from two different primary RNA transcripts are joined from end to end and then ligated. In simulating the RNA cis-splicing mechanism, a cDNA is thought to be generated by trans-splicing when it is aligned to multiple non-contiguous genomic loci and the fusion junction obeys canonical GU-AG splice site. However, how precursor genes find each other before splicing remains to be elucidated, and where the trans-splicing event takes place is still poorly understood.
Some chimeras are derived from a non-spliceosome mechanism. Short homologous sequences are proposed to be associated with the generation of chimeric mRNAs in eukaryotes, suggesting that the ‘misaligns’ of short homologous sequences could guide the chromosomal interaction for the proximity of distal genes. In addition, read-through/splicing is another way of generating chimeric mRNAs[8–11]. In this process, an mRNA starts from the upstream gene, reads through intergenic regions, and ends at a termination point of the adjacent downstream gene, with the region in between removed by splicing. However, read-through/splicing cannot explain the chimeras derived from different chromosomes or opposite strands. Some chimeric mRNAs may have originated from the strand-switching feature of the reverse transcripatase. In some cases, chimeric mRNAs are considered as artefacts from the reverse transcription polymerase chain reaction (RT-PCR).
The presence of chimeric mRNAs in normal cells is a critical issue because the important pathways in normal cells would be disrupted by the potential therapy targeting chimeric mRNAs and proteins. The identification of chimeric mRNAs in normal cells will provide a wealth of biological information for this issue. The pig (Sus scrofa) is an economically important species and a potential medical model for some human health issues. Therefore, research on chimeric mRNAs in normal cells can benefit from pigs. Results from the present study provide the first broad overview of chimeric mRNAs in pigs, and their analysis in normal tissue will aid in the further understanding of the molecular mechanisms of gene fusions.
Identification of putative chimeric RNAs
For the confirmation of a hybrid transcript candidate, we inspected whether the fusion point corresponded to a pair of known splice sites. We separately extracted the chromosomal DNA sequences of the 5′ and 3′ partners of an inferred chimera and then connected the two non-contiguous genomic sequences to an artificially fused genomic sequence. Each inferred chimera was aligned to the corresponding artificially fused genomic sequence using the SIM4 program to take into account consensus splice signals. The alignment around the fusion point was checked. Only the fusion points that were aligned precisely, without a gap or overlap, were retained. In addition, the reading frame must have structural integrity. Finally, 618 candidates had clear trans-splicing sites, 537 of which obeyed the GT/AG rule (Additional file1).
To confirm further the trans-splicing events, 48 chimeric candidates were randomly selected for the RT-PCR assay using RNA from a number of tissues (see Methods). An RT-PCR product was required to span the fusion point. Through this assay, 36 out of randomly selected candidates showed identity with the expected fusion sequences (Additional file2). Given that the transcription of mRNAs may vary in different tissues or stages of life, the selected samples for the RT-PCR assay may not be suitable for their expression. In addition, all mRNAs used in the present study have prior biological studies annotated in databases of the UCSC and the NCBI (National Centre for Biotechnology Information). Thus, the rate detected by the RT-PCR assay might underestimate the positive rate of chimeric mRNA identification. The use of expressed sequence tag (EST) and RNA-sequencing data from more tissues or stages would supply the gaps of the RT-PCR assay.
Putative chimeras were aligned to ESTs downloaded from the UCSC database to seek support from external experimental evidences and verify the putative fusion junctions. If at least 20 nt of the sequence on either side of a putative fusion point overlap with the ESTs, this candidate was retained for further analysis. The 431 candidates were supported by at least three ESTs (Additional file3).
Mapping putative pig chimeras to known human chimeric transcripts
Putative pig chimeric mRNAs were aligned to known human chimeric transcripts annotated in the chimera database (ChimeraDB 2.0) to estimate the relationship between two kinds of transcripts. The fusion junctions of 21 putative pig chimeric mRNAs were matched to known human chimeric mRNAs (Additional file4). However, only two putative pig chimera variants (AK239284 and AK349030) whose fusion junction was overlapped with that of a known human chimeric mRNA (AML1/AMP19 fusion gene) were found.
Validation by transcriptome sequencing
We collected 396.2 million sequence reads from the transcriptome sequencing of liver tissue samples from 11 adult Bama miniature pigs (five males and six females, Additional file5). This procedure was done to verify that the putative chimeric mRNAs were real expressed genes rather than involved in exonic coding sequences shared among multiple genes or homologous pseudogenes. The Illumina Genome Anlayzer II was employed to sequence these samples. Two length types of single-end reads, 76 and 101 nt, were generated (Additional file5). For the uniformity of the read length, 101 nt reads were trimmed to 76 nt from a low-quality (right) end, which would increase the quality of 101 nt reads.
Estimation was further performed on the validity of junction reads that overlapped fusion points with a minimum of 5 nt. In the present study, reads were trimmed to 76 nt. Therefore, the length of fusion junctions was 142 nt (71 nt on either side of the fusion junction) by requiring a 5 nt overhang for read mapping fusion points. If the start position of a read located in the region from the 1st to the 67th nt of the fusion junction, the read was termed as a junction read. In this estimation, reads from 11 liver samples were pooled together. The 496 fusion junctions were matched by at least one read. Among these junctions, 89.3% (443/496) were overlapped by at least three reads and 89.7% (440/496) were validated by reads starting from at least three different positions (Additional file7).
Variation of expression among individuals
We used a cut-off that required junction events to be present in all samples and unique without an overlap with other chimeras to access further the differential expression of unique chimeric mRNAs without the confounding issues of tissues. This cut-off resulted in 87 unique chimeric events. The dispersion of the expression of each unique chimeric event across the samples was measured using the coefficient of variation (CV), the percentage ratio of the sample standard deviation to the sample mean of the junction reads for each event. Figure2B represents the distribution of junction events along the CVs. The mean of the CVs was 57%, with a standard deviation of 14%, following a normal distribution (P>0.57, Kolmogorov-Smirinov test). This result implies that most of these unique chimeric events were considered middle variances in the expression.
Variation of expression among pig breeds
More attention was given to the variation in the expression of chimeric events among the pig breeds. A set of 49 nt single-end reads from three RNA-pooling samples of skeletal muscle was analysed in the same way as those from liver samples (Additional file5, Additional file6). These samples were obtained during embryo collection at slaughter. The first, second, and last samples were pooled using equivalent amounts of RNA from three adult female Wuzhishan, Tongcheng, and Landrace pigs, respectively. These samples may remove the difference among female individuals to some extent. The mean of the CVs was 35%, with a standard deviation of 19%, spanning 0% to 89% and following a normal distribution (P>0.61, Kolmogorov-Smirinov test).
DNA motif in the genomic region of chimeras
Distribution of potential shared DNA motifs in genomic regions
Subsequently, these shared sequences were submitted to the TOMTOM software in the MEME suite (4.6.1) for comparison against the database of known motifs. This database is the JASPAR CORE (version 12-Oct-2009) that contains a curated, non-redundant set of profiles derived from published collections of experimentally defined transcription factor binding sites for multi-cellular eukaryotes. The 81 shared sequences significantly matched known DNA motifs in the JASPAR CORE database (P<0.00065 and false motif discovery rate < 0.05, Additional file9). Among these matched sequences, 6 were shared in the upstream regions of both partners (P<0.00009 and false motif discovery rate < 0.042). This finding suggests that the same or similar transcription factors would bind these potential shared DNA motifs to coordinate the transcription of parental genes, which may be necessary in generating chimeric mRNAs.
The CCCTC-binding factor (CTCF) is a versatile trans-acting factor that binds distal regulatory elements such as enhancers, and CTCF binding sites are commonly distributed along the vertebrate genomes[22–26]. Thus, we placed efforts on computationally identifying potential CTCT binding sites shared in two non-continuous genomic regions of chimeric mRNAs. Four DNA motifs shared in parental genomic regions were significantly similar with known human CTCF binding sites (P<0.014 and false motif discovery rate < 0.029, Additional file10). This result suggests that some trans-acting factors, such as the CTCT-binding factor, might bind these shared motifs to facilitate the approximation of the distal genomic parts and make up the subcellular environment for the generation of chimeric mRNAs. Communication between distal chromosomal elements would be an origin for the nuclear processes of gene fusions.
Several factors including strand-switching, deep sequencing errors, or reference genome errors would result in false positive results. Therefore, we rigorously inspected each chimera using several criteria. First, all the mRNAs used in the present study have prior biological information annotated in the UCSC and NCBI databases to avoid reference genome errors as much as possible. To remove false results from homologous, paralogous, or random spurious hits, strict filtering was performed on the highly qualitative alignments of mRNAs to the S. scrofa chromosomes. Trans-splicing sites were then inspected for each candidate to exclude strand-switching or the random connection of two cDNAs. In addition, 14 independent samples were used to evaluate the expression of the fusion transcripts. We could not completely rule out the possibility of the creation of a false fusion in the process of cDNA library construction. However, random breakage and rejoining of two cDNAs are unlikely to happen at the exact exon boundaries of two genes and simultaneously in multiple samples. Thus, although the present identification of chimeric RNAs filters out some genuine fusion gene transcripts by stringent cut-offs, it is conservative and reliable.
During transcription in vivo, different genes frequently share the same transcription factory where nascent RNA production and RNA polymerase II seem to be localised[27, 28]. For example, the Igh on chromosome 12 is preferentially recruited to the same transcription factory where the Myc gene on chromosome 15 is highly transcribed. Many active genes can dynamically co-localise to shared sites of ongoing transcription, which may be induced by the classical effectors of gene expression including trans-acting factors, enhancers, chromatin modifications, and chromosomal interaction. For example, CTCF can create the dynamic nature of nuclear spatial organisation of different genes by binding to the elements on distal genomic regions or different chromosomes[25, 30, 31]. The recruitment of different genes into shared factories is expected to have a fundamental role in gene expression, which may efficiently share limited resources or perhaps coordinate the transcription of different genes.
As earlier common computational methods for identifying precursor genes, a gene with the best alignments to a chimeric mRNA was considered as the precursor gene. However, exons often overlap exons for some cases. For example, the 5′ partner of the chimeric mRNA AK343294 was precisely mapped on the exons of mRNAs AK233826, AK231250, and AK346646 in chromosome 5. Therefore, the precursor mRNA would be discretionary if multiple transcriptional start sites were present. Furthermore, the partners of chimeric mRNAs may be transcribed independently at their own transcriptional start sites that are not associated with other genes. Thus, the selection of which variant would serve as the precursor gene would need more molecular experimental identifications.
The present study provided detailed information on pig chimeric mRNAs and further analysed the expression of unique chimeras among samples. Interestingly, similar DNA sequences widely shared in the two non-continuously genomic regions of chimeric mRNAs. Similar DNA sequences that share in the upstream regions of both partners significantly matched the known transcription factor binding sites in the JASPAR CORE database, suggesting the potential coordinated transcription of the parental genes. In addition, possible CTCF binding sites were also observed in the parental genomic regions. We supposed that trans-acting factors, such as CTCF, would induce the spatial organisation of parental genes to the same transcriptional factory so that parental genes would be coordinatively transcribed to give birth to chimeric mRNAs. Although this hypothesis needs further experimental evidence, it will provide useful information for the investigation of the mechanism for the generation of chimeric mRNAs. Overall, our results will aid in the further understanding of chimeric mRNAs.
Chimeric mRNA identification
The BED format table of all pig mRNAs were analysed for further study using the Galaxy[33–35] in the UCSC Table Browser (February 2011). According to the annotation of that table, GenBank pig mRNAs were aligned against the pig genome (SGSC Sscrofa9.2/susScr2, Nov. 2009) using the Blat program. The alignment with the highest base identity was found when a single mRNA was aligned in multiple places. Only alignments with a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept (http://genome.ucsc.edu/). An entry in that BED table annotates a chromosomal locus of an mRNA. We extracted mRNAs aligned to two non-contiguous loci. We required alignments from non-contiguous loci without long similar sequences at the putative junction sites to remove homologous, paralogous, or random spurious hits. In accommodating small errors in alignment that occur at the edges of the alignment, we only allowed overlaps or gaps of up to 10 nt within the fusion junction. Using the Circos software, we represented the genome-wide distribution of putative chimeric mRNAs in Figure1.
To validate putative chimeras by external experimental evidence, we aligned predicted chimeras to the EST sequences downloaded from the UCSC (May 2012) using the BLAST program (Basic Local Alignment Search Tool, version 2.2.26+)[38–40] with default parameters except at least 96% base identity. The candidate was retained for analysis when at least 20 nt of the sequence on either side of a putative fusion point overlapped ESTs. To compare with known human chimeras, we aligned pig putative chimeras to human chimeric mRNAs downloaded from the ChimeraDB 2.0 using the BLAST with default parameters (May 2012).
Inspection of splice sites
As previously described, we prepared an artificially fused genomic DNA sequence for putative chimeras by joining the genomic sequences of the 5′ and 3′ partners. The fusion transcript candidate was then aligned to the corresponding artificially fused genomic sequence using the SIM4 program (version 2002-03-03) with default parameters. The alignment around the fusion point was inspected to take into account consensus splice signals.
Validation by RT-PCR
We obtained total RNAs from Tongcheng pig tissues (liver, kidney, spleen, heart, lung, testis, ovary, embryo, skeletal muscle, small and large intestine) using the RNA Extraction Kit (BioTeke). The cDNA was prepared by reverse transcription using the Strand cDNA Synthesis Kit (BioTeke) with random hexamer priming and oligo dT’s. PCR products covering the junction position were amplified using primers designed according to the hybrid transcripts (Additional file2). PCR amplification was performed using the following thermocycling protocol: initial denaturation at 95°C for 4 min, followed by 30 cycles of denaturation at 95°C for 30 s, annealing at 60°C for 30 s, and elongation at 72°C for 30 s. The PCR products were then analyzed, cloned, and sequenced.
Validation by RNA-seq data
Up to 400 million sequence reads from deep sequencing the transcriptome of pigs were recently acquired in our lab. In brief, the following steps were used for transcriptome sequencing using the Illumina Genome Analyser II at Shanghai Biotechnology Co., Ltd. We isolated mRNA from 10 μg of total RNA with an RNA integrity number (RIN) ≥ 8. The isolated mRNA was then fragmented and converted into double-stranded cDNA. The ends of cDNA were ligated to adapters. The fragments with 200 to 300 base pairs in length were amplified by PCR to make a library. Finally, the library was sequenced to yield single-end reads.
A set of reads was derived from the transcriptome of the liver tissue samples obtained from 11 adult Bama miniature pigs (five males and six females, Additional file5). Reads with a Phred quality score lower than 20 were filtered out. The length of the reads from eight pigs was 76 nt, whereas that from the other three pigs was 101 nt. To obtain uniform lengths of reads, the 101 nt were trimmed from the low-quality (right) end of each read to only 76 nt before mapping. The remaining reads were aligned to the pig genome (SGSC Sscrofa9.2/susScr2, Nov. 2009) using the Bowtie software (version 0.12.8) with default parameters except maximum two mismatches, unique mapping, and trimming from 101 to 76 nt for the three samples.
The present version of the Bowtie program (version 0.12.8) does not report gapped alignments. Thus, a read mapped on the genome was derived from a contiguous locus in the genome. However, some unmapped reads may arise from non-contiguous genomic loci, making them suitable for inspecting splice junctions. The unmapped reads were further aligned to the putative chimeric mRNAs by the Bowtie program with default parameters except maximum two mismatches and trimming from 101 to 76 nt for the three samples. The previously unmapped reads that were matched on the putative junctions with an overlap of at least 5 nt on either side of the RNA junction were remained for further analysis.
Another set of 49 nt single-end reads from three equivalently pooled RNA samples of skeletal muscle was analyzed as described above (Additional file5). These samples were extracted during embryo collection at slaughter. The first, second, and last samples were pooled using equivalent amounts of RNA from three adult female Wuzhishan, Tongcheng, and Landrace pigs, respectively.
CV was calculated to represent the variance in the expression. The reads uniquely mapped on the pig genome and the junction reads were pooled together to reveal the read coverage along the transcript. The RS test was used to evaluate the difference in the expression levels between the male and female samples.
DNA motif identification
The MEME software (version 4.6.1) with default parameters (except DNA alphabet, zero or one occurrence of each motif per sequence, motif width between 10 and 30 nt, and maximum one motif to find) were used to search similar DNA sequences within two non-continuous genomic sequences of chimeric mRNAs. Then, using the TOMTOM tool, similar DNA sequences were compared with the database of 476 known motifs, the JASPAR CORE (version 12-Oct-2009).
We thank Joshua Liao for his advice and two anonymous reviewers for their helpful suggestions on the manuscript. We also thank the China Postdoctoral Science Foundation (20110490045), the National Natural Science Foundation of China (30830080 and 31172189), and the Science Foundation of the Shihezi University (RCZX201137) for their support.
- Pane F, Intrieri M, Quintarelli C, Izzo B, Muccioli GC, Salvatore F: BCR/ABL genes and leukemic phenotype: from molecular mechanisms to clinical correlations. Oncogene. 2002, 21: 8652-8667. 10.1038/sj.onc.1206194.View ArticlePubMedGoogle Scholar
- Druker BJ, Guilhot F, O’Brien SG, Gathmann I, Kantarjian H, Gattermann N, Deininger MW, Silver RT, Goldman JM, Stone RM, et al: Five-year follow-up of patients receiving imatinib for chronic myeloid leukemia. N Engl J Med. 2006, 355: 2408-2417. 10.1056/NEJMoa062867.View ArticlePubMedGoogle Scholar
- Druker BJ, Tamura S, Buchdunger E, Ohno S, Segal GM, Fanning S, Zimmermann J, Lydon NB: Effects of a selective inhibitor of the Abl tyrosine kinase on the growth of Bcr-Abl positive cells. Nat Med. 1996, 2: 561-566. 10.1038/nm0596-561.View ArticlePubMedGoogle Scholar
- Kantarjian H, Sawyers C, Hochhaus A, Guilhot F, Schiffer C, Gambacorti-Passerini C, Niederwieser D, Resta D, Capdeville R, Zoellner U, et al: Hematologic and cytogenetic responses to imatinib mesylate in chronic myelogenous leukemia. N Engl J Med. 2002, 346: 645-652. 10.1056/NEJMoa011573.View ArticlePubMedGoogle Scholar
- Li H, Wang J, Mor G, Sklar J: A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells. Science. 2008, 321: 1357-1361. 10.1126/science.1156725.View ArticlePubMedGoogle Scholar
- Di Segni G, Gastaldi S, Tocchini-Valentini GP: Cis-and trans-splicing of mRNAs mediated by tRNA sequences in eukaryotic cells. Proc Natl Acad Sci. 2008, 105: 6864-10.1073/pnas.0800420105.PubMed CentralView ArticlePubMedGoogle Scholar
- Li X, Zhao L, Jiang H, Wang W: Short homologous sequences are strongly associated with the generation of chimeric RNAs in eukaryotes. J Mol Evol. 2009, 68: 56-65. 10.1007/s00239-008-9187-0.View ArticlePubMedGoogle Scholar
- Magrangeas F, Pitiot G, Dubois S, Bragado-Nilsson E, Cherel M, Jobert S, Lebeau B, Boisteau O, Lethe B, Mallet J, et al: Cotranscription and intergenic splicing of human galactose-1-phosphate uridylyltransferase and interleukin-11 receptor alpha-chain genes generate a fusion mRNA in normal cells. Implication for the production of multidomain proteins during evolution. J Biol Chem. 1998, 273: 16005-16010. 10.1074/jbc.273.26.16005.View ArticlePubMedGoogle Scholar
- Communi D, Suarez-Huerta N, Dussossoy D, Savi P, Boeynaems JM: Cotranscription and intergenic splicing of human P2Y11 and SSF1 genes. J Biol Chem. 2001, 276: 16561-16566. 10.1074/jbc.M009609200.View ArticlePubMedGoogle Scholar
- Moore RC, Lee IY, Silverman GL, Harrison PM, Strome R, Heinrich C, Karunaratne A, Pasternak SH, Chishti MA, Liang Y, et al: Ataxia in prion protein (PrP)-deficient mice is associated with upregulation of the novel PrP-like protein doppel. J Mol Biol. 1999, 292: 797-817. 10.1006/jmbi.1999.3108.View ArticlePubMedGoogle Scholar
- Finta C, Zaphiropoulos PG: The human cytochrome P450 3A locus. Gene evolution by capture of downstream exons. Gene. 2000, 260: 13-23. 10.1016/S0378-1119(00)00470-4.View ArticlePubMedGoogle Scholar
- McManus CJ, Duff MO, Eipper-Mains J, Graveley BR: Global analysis of trans-splicing in Drosophila. Proc Natl Acad Sci U S A. 2010, 107: 12975-12979. 10.1073/pnas.1007586107.PubMed CentralView ArticlePubMedGoogle Scholar
- Wernersson R, Schierup MH, Jorgensen FG, Gorodkin J, Panitz F, Staerfeldt HH, Christensen OF, Mailund T, Hornshoj H, Klein A, et al: Pigs in sequence space: a 0.66X coverage pig genome survey based on shotgun sequencing. BMC Genomics. 2005, 6: 70-10.1186/1471-2164-6-70.PubMed CentralView ArticlePubMedGoogle Scholar
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12: 996-1006.PubMed CentralView ArticlePubMedGoogle Scholar
- Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A: The UCSC genome browser database: update 2011. Nucleic Acids Res. 2011, 39: D876-D882. 10.1093/nar/gkq963.PubMed CentralView ArticlePubMedGoogle Scholar
- Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W: A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 1998, 8: 967-974.PubMed CentralPubMedGoogle Scholar
- Kim N, Kim P, Nam S, Shin S, Lee S: ChimerDB—a knowledgebase for fusion sequences. Nucleic Acids Res. 2006, 34: D21-D24. 10.1093/nar/gkj019.PubMed CentralView ArticlePubMedGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25-10.1186/gb-2009-10-3-r25.PubMed CentralView ArticlePubMedGoogle Scholar
- Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS: MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009, 37: W202-W208. 10.1093/nar/gkp335.PubMed CentralView ArticlePubMedGoogle Scholar
- Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS: Quantifying similarity between motifs. Genome Biol. 2007, 8: R24-10.1186/gb-2007-8-2-r24.PubMed CentralView ArticlePubMedGoogle Scholar
- Bryne JC, Valen E, Tang MHE, Marstrand T, Winther O, Da Piedade I, Krogh A, Lenhard B, Sandelin A: JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res. 2008, 36: D102-D106. 10.1093/nar/gkn449.PubMed CentralView ArticlePubMedGoogle Scholar
- Wallace JA, Felsenfeld G: We gather together: insulators and genome organization. Curr Opin Genet Dev. 2007, 17: 400-407. 10.1016/j.gde.2007.08.005.PubMed CentralView ArticlePubMedGoogle Scholar
- Ohlsson R, Renkawitz R, Lobanenkov V: CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet. 2001, 17: 520-527. 10.1016/S0168-9525(01)02366-6.View ArticlePubMedGoogle Scholar
- Klenova EM, Morse HC, Ohlsson R, Lobanenkov VV: The novel BORIS+ CTCF gene family is uniquely involved in the epigenetics of normal biology and cancer. Seminars in Cancer Biology. 2002, 12: 399-414. 10.1016/S1044-579X(02)00060-3.View ArticlePubMedGoogle Scholar
- Dunn KL, Davie JR: The many roles of the transcriptional regulator CTCF. Biochem Cell Biol. 2003, 81: 161-167. 10.1139/o03-052.View ArticlePubMedGoogle Scholar
- Kim TH, Abdullaev ZK, Smith AD, Ching KA, Loukinov DI, Green RD, Zhang MQ, Lobanenkov VV, Ren B: Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell. 2007, 128: 1231-1245. 10.1016/j.cell.2006.12.048.PubMed CentralView ArticlePubMedGoogle Scholar
- Osborne CS, Chakalova L, Brown KE, Carter D, Horton A, Debrand E, Goyenechea B, Mitchell JA, Lopes S, Reik W, Fraser P: Active genes dynamically colocalize to shared sites of ongoing transcription. Nat Genet. 2004, 36: 1065-1071. 10.1038/ng1423.View ArticlePubMedGoogle Scholar
- Gingeras TR: Implications of chimaeric non-co-linear transcripts. Nature. 2009, 461: 206-211. 10.1038/nature08452.PubMed CentralView ArticlePubMedGoogle Scholar
- Osborne CS, Chakalova L, Mitchell JA, Horton A, Wood AL, Bolland DJ, Corcoran AE, Fraser P: Myc dynamically and preferentially relocates to a transcription factory occupied by Igh. PLoS Biol. 2007, 5: e192-10.1371/journal.pbio.0050192.PubMed CentralView ArticlePubMedGoogle Scholar
- Ren L, Shi M, Wang Y, Yang Z, Wang X, Zhao Z: CTCF and cohesin cooperatively mediate the cell-type specific interchromatin interaction between Bcl11b and Arhgap6 loci. Mol Cell Biochem. 2012, 360: 243-251. 10.1007/s11010-011-1063-3.View ArticlePubMedGoogle Scholar
- Ren L, Wang Y, Shi M, Wang X, Yang Z, Zhao Z: CTCF Mediates the Cell-Type Specific Spatial Organization of the Kcnq5 Locus and the Local Gene Regulation. PLoS One. 2012, 7: e31416-10.1371/journal.pone.0031416.PubMed CentralView ArticlePubMedGoogle Scholar
- Filippova GN, Qi CF, Ulmer JE, Moore JM, Ward MD, Hu YJ, Loukinov DI, Pugacheva EM, Klenova EM, Grundy P: Tumor-associated zinc finger mutations in the CTCF transcription factor selectively alter its DNA-binding specificity. Cancer research. 2002, 62: 48-PubMedGoogle Scholar
- Goecks J, Nekrutenko A, Taylor J: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010, 11: R86-10.1186/gb-2010-11-8-r86.PubMed CentralView ArticlePubMedGoogle Scholar
- Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J: Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol. 2010, 10: 11-21. Chapter 19:Unit 19Google Scholar
- Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, et al: Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 2005, 15: 1451-1455. 10.1101/gr.4086505.PubMed CentralView ArticlePubMedGoogle Scholar
- Kent WJ: BLAT—the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664.PubMed CentralView ArticlePubMedGoogle Scholar
- Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA: Circos: an information aesthetic for comparative genomics. Genome Res. 2009, 19: 1639-1645. 10.1101/gr.092759.109.PubMed CentralView ArticlePubMedGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.View ArticlePubMedGoogle Scholar
- Zhang Z, Schwartz S, Wagner L, Miller W: A greedy algorithm for aligning DNA sequences. J Comput Biol. 2000, 7: 203-214. 10.1089/10665270050081478.View ArticlePubMedGoogle Scholar
- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL: BLAST+: architecture and applications. BMC Bioinforma. 2009, 10: 421-10.1186/1471-2105-10-421.View ArticleGoogle Scholar
- Hahn Y, Bera TK, Gehlhaus K, Kirsch IR, Pastan IH, Lee B: Finding fusion genes resulting from chromosome rearrangement by analyzing the expressed sequence databases. Proc Natl Acad Sci U S A. 2004, 101: 13257-13261. 10.1073/pnas.0405490101.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.