Framework for reanalysis of publicly available Affymetrix® GeneChip® data sets based on functional regions of interest
© The Author(s). 2017
Published: 6 December 2017
Since the introduction of microarrays in 1995, researchers world-wide have used both commercial and custom-designed microarrays for understanding differential expression of transcribed genes. Public databases such as ArrayExpress and the Gene Expression Omnibus (GEO) have made millions of samples readily available. One main drawback to microarray data analysis involves the selection of probes to represent a specific transcript of interest, particularly in light of the fact that transcript-specific knowledge (notably alternative splicing) is dynamic in nature.
We therefore developed a framework for reannotating and reassigning probe groups for Affymetrix® GeneChip® technology based on functional regions of interest. This framework addresses three issues of Affymetrix® GeneChip® data analyses: removing nonspecific probes, updating probe target mapping based on the latest genome knowledge and grouping probes into gene, transcript and region-based (UTR, individual exon, CDS) probe sets. Updated gene and transcript probe sets provide more specific analysis results based on current genomic and transcriptomic knowledge. The framework selects unique probes, aligns them to gene annotations and generates a custom Chip Description File (CDF). The analysis reveals only 87% of the Affymetrix® GeneChip® HG-U133 Plus 2 probes uniquely align to the current hg38 human assembly without mismatches. We also tested new mappings on the publicly available data series using rat and human data from GSE48611 and GSE72551 obtained from GEO, and illustrate that functional grouping allows for the subtle detection of regions of interest likely to have phenotypical consequences.
Through reanalysis of the publicly available data series GSE48611 and GSE72551, we profiled the contribution of UTR and CDS regions to the gene expression levels globally. The comparison between region-based and gene-based results indicated that the expressed genes detected by the gene-based and region-based CDFs show high consistency, and that region-based results allow the detection of changes in transcript formation.
A DNA microarray (DNA chip or biochip) is a technology used to identify and measure the expression level of specific mRNA molecules in order to ascertain transcriptional profiles in response to differing conditions. The most commonly used microarray is the Affymetrix® GeneChip® family of arrays. Each GeneChip® consists of a silicon chip with fixed locations called cells, spots or features. Each spot contains millions of identical 25 base oligonucleotides (probes) which are selected to be complementary to various transcript regions of a gene. In order to determine transcript expression, from which gene expression is inferred, groups of 11-20 probes matching the same gene/transcript are arranged in a probe set. For a particular Affymetrix® GeneChip® platform, the design of the probes is fixed, based on the genome assemblies and annotation available at the time of design. Since the design of the first Affymetrix® GeneChip®, rapid progress has been made in genome sequencing, resulting in more accurate databases of annotated coding and non-coding genes.
Table: Release dates of databases used by NetAffx v35 annotations and current database versions (section: Databases Common to All Four GEO Platforms).
In addition, updating links between probe sets and their corresponding genes/transcripts does not provide a solution for problems caused by individual probes such as single nucleotide polymorphisms (SNPs) [4, 5], probes that target genes other than the designated gene of a probe set, and probes that no longer align to a genomic location. For example, in the Affymetrix® GeneChip® HG-U133 Plus 2 array, a total of 40,680 probes out of 603,158 (excluding quality control probes) do not have a perfect match to the most recent human genome assembly (hg38).
Table: Top Affymetrix® in situ oligonucleotide arrays found in GEO (columns: Number of Probes (PM), Number of Probe Sets, Number of Samples; rows: Human Genome U133 Plus 2.0 Array, Mouse Genome 430 2.0 Array, Rat Genome 230 2.0 Array, Arabidopsis ATH1 Genome Array).
Several research groups have reassigned probes into new probe sets by creating their own custom Chip Description Files (CDF) [7–13], which are specially formatted files used to store the layout information for an Affymetrix® GeneChip® array. Given a CDF, the intensity values of probes located in the CEL file can be extracted and summarized as a defined probe set to detect the expression level of genes or transcripts.
These approaches share a similar probe-mapping workflow but differ in how probe sets are grouped, including the data source used, the selected target level (gene or transcript), whether probe sets are created from scratch or existing groups are redesigned, and whether probes are shared between probe sets.
In terms of annotations used, most approaches have mapped the probe sequences to the transcripts obtained from one or more databases such as GenBank, NCBI RefSeq and Ensembl. Unlike other approaches, Harbig et al. mapped to the target sequences of probes obtained from Affymetrix® rather than the actual mRNA sequences themselves, where the target sequence is an exemplar region of a specific transcript ≤600 bases in length. After mapping, they grouped probes to unique transcripts or genes based on the mapping results. Some approaches update the original probe set groups by removing select probes and changing the link between probe set and gene/transcript. The most comprehensive study for probe annotation remapping was achieved by Dai et al. (brainarray CDFs). Rather than focusing on one reference database or combining multiple sources to create one custom CDF, they mapped probes to different annotation databases and created a specific custom CDF for each database.
Table: Alternative CDFs for the top Affymetrix® in situ oligonucleotide arrays found in GEO (columns: Number of Alternative CDFs, Number and Percent of Samples Using Alternative CDFs).
While microarrays have been successfully utilized for understanding differential expression at the gene or probe set level, less attention has been given to the potential analysis at the individual exon, alternative transcript, and untranslated region (UTR) level. While the selection bias of probes on the 3′ ends of genes for earlier iterations of Affymetrix® GeneChip® designs presents limitations on the completeness of transcript information, more recent designs allow for a more complete coverage of exons and exon junctions. However, information concerning individual exons can still be extracted from earlier GeneChip® designs, particularly in the 3′ UTR regions that have been shown to play important roles in cancer [15–17], development [18–22], and localization in the nervous system [23–27]. In fact, over 40% of genes have been shown to generate multiple mRNAs with variable 3′ UTR lengths. These 3′ UTRs harbor binding sites for molecules including microRNAs (miRNAs) and RNA-binding proteins. Thus, mRNA isoforms with lengthened 3′ UTRs have increased numbers of sites for these cis-interacting factors. The diversity of 3′ UTRs is predominantly regulated by alternative polyadenylation (APA), which employs alternative mRNA cleavage sites that lie progressively distal to the stop codon. APA-driven mRNA diversity is required for normal physiology, and misregulation of this process is associated with diverse disease states. We therefore have developed a framework for analysis of Affymetrix® GeneChip® data by regrouping probes into probe sets based on Ensembl annotations at the gene, transcript, individual exon, and UTR levels in order to detect changes in gene expression that may occur within specific regions of the transcript.
Mapping of perfect match probes to a genome
PM probe sequences, which can be obtained from the Affymetrix® NetAffx™ web site, are aligned to the indexed genome using Bowtie version 1.0.1 with the parameters -v 0 and -m 1, requiring that probes align to a single genomic location with 100% identity, thereby reducing cross-hybridization effects. Note that Bowtie version 1 is best at aligning shorter sequences (25-50 bp) as found with microarray probes, while the most recent versions of Bowtie are optimized for long sequence reads (>50 bp). Mismatch (MM) probes are not considered in the mapping step, although they could theoretically map uniquely to genomic regions. Rather, the MM probes are set aside and are included with their corresponding PM probe during the final CDF construction step once the PM probes have been assigned to a probe set. During this analysis, only probes perfectly matching a region are considered. Therefore, probes crossing splice junctions are discarded.
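The alignment settings described above can be sketched as a small Python helper that assembles a Bowtie 1 command line. The flags -v 0 (no mismatches) and -m 1 (suppress probes with more than one reportable alignment) come from the text; the -f flag (FASTA input), the index prefix and the file names are assumptions made purely for illustration.

```python
import shlex


def build_bowtie_command(index_prefix, probe_fasta, output_path):
    """Return the argument list for a Bowtie 1 run that keeps only probes
    aligning to exactly one genomic location with zero mismatches."""
    return [
        "bowtie",
        "-v", "0",   # allow no mismatches (100% identity)
        "-m", "1",   # drop probes with more than one valid alignment
        "-f",        # assumption: PM probe sequences are supplied as FASTA
        index_prefix,
        probe_fasta,
        output_path,
    ]


# Hypothetical index and file names, for illustration only.
cmd = build_bowtie_command("hg38_index", "pm_probes.fa", "pm_probes.bowtie")
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems when paths contain spaces.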
Annotation of perfect match probes via nested containment list (NCList)
GTF files for the mouse, rat, and human genome were obtained from the Ensembl ftp server. Each GTF is a tab-delimited text file used to represent gene structure information, including the start and end positions of a gene together with chromosome location. Each structure is tagged with a feature which can be gene, transcript, exon, start_codon, stop_codon, CDS or UTR. Ensembl GTFs were used since the annotations are determined by an automated system based on experimentally verified data combined from multiple databases such as RefSeq, EMBL and UniProtKB. Ensembl also includes manual curation for selected species.
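As a minimal sketch of how such a record can be read, the following Python function splits one GTF line into its nine tab-separated fields and parses the attribute column. The identifiers in the example line (Gene_1, Transcript_1) are placeholders rather than real Ensembl records, and the attribute parsing assumes that values do not themselves contain semicolons.

```python
def parse_gtf_line(line):
    """Parse one GTF record into a dict; return None for comment or blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF fields")
    chrom, source, feature, start, end, score, strand, frame, attr_text = fields
    attributes = {}
    # Attributes look like: key "value"; key "value";
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip('"')
    return {
        "chrom": chrom,
        "feature": feature,
        "start": int(start),   # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# Placeholder record, not taken from a real annotation file.
example = '1\tensembl\texon\t100\t500\t.\t+\t.\tgene_id "Gene_1"; transcript_id "Transcript_1";'
record = parse_gtf_line(example)
print(record["feature"], record["start"], record["end"], record["attributes"]["gene_id"])
```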
A nested containment list (NCList)  was created for each chromosome from intervals (start and end points) of gene structures. The intervals of the NCList were selected based on the target of the probe sets. When the probe sets were constructed based on regions of a gene, we used UTR, individual exon and CDS intervals. For gene/transcript targeted probe sets, we used gene/transcript intervals.
Probe intervals were searched in the NCList and annotated according to the overlapping results. Probes were split based on the matched chromosome. Each probe group interval was searched in the same chromosome’s NCList. When an overlap was found, the probe was annotated with the list node. Only complete overlaps were accepted; both the low and high ends of the interval have to be included in the list node. The probes which did not overlap the nodes were discarded. As a result, probes partially overlapping UTRs, individual exons, and CDS regions will not be included at the region and gene level, but will be present at the transcript level.
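The complete-overlap rule can be illustrated with a short Python sketch. This is a simplified stand-in that uses a per-chromosome sorted list and a binary search, not an actual NCList implementation; the function names, feature labels and coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(features):
    """features: iterable of (chrom, start, end, label).
    Returns, per chromosome, a list of start coordinates and the
    corresponding features sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, label in features:
        by_chrom[chrom].append((start, end, label))
    index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        index[chrom] = ([start for start, _, _ in items], items)
    return index


def containing_features(index, chrom, probe_start, probe_end):
    """Return labels of features that fully contain the probe interval."""
    if chrom not in index:
        return []
    starts, items = index[chrom]
    # Only features starting at or before the probe start can contain it.
    upper = bisect_right(starts, probe_start)
    return [label for start, end, label in items[:upper] if end >= probe_end]


# Hypothetical gene structures and 25-base probe positions.
index = build_index([
    ("chr1", 100, 500, "Gene_1:exon"),
    ("chr1", 300, 500, "Gene_1:3UTR"),
    ("chr2", 100, 200, "Gene_2:exon"),
])
print(containing_features(index, "chr1", 310, 334))  # inside both exon and UTR
print(containing_features(index, "chr1", 290, 314))  # only partially overlaps the UTR
print(containing_features(index, "chr1", 490, 514))  # runs past the exon end
```

A true NCList avoids the linear scan over candidate features by nesting contained intervals, which matters for genome-scale annotation sets.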
A probe’s start and end points may overlap multiple gene structures. It may overlap with the UTR and exon region of the same gene or with multiple genes or transcripts. In order to remove cross hybridization and ensure probes uniquely map to a single region, gene or transcript, we chose one of the annotations for each probe and removed the remaining matches. For region-based annotation, probes were assigned with the following priority: (I) 5′ and 3′ UTRs; (II) exons; (III) CDS. Thus, although UTR regions technically occur within exons, the more specific UTR assignment is used. When the annotation was based on gene or transcript, the first obtained annotation was selected.
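The priority rule for region-based annotation can be sketched as follows. The feature-type names used as dictionary keys are assumptions chosen for illustration (the text refers only to UTR, exon and CDS), and breaking ties by taking the first candidate encountered is likewise an assumption.

```python
# Lower number means higher priority: UTRs first, then exons, then CDS.
# The key names are assumed labels, not taken from the source.
PRIORITY = {"five_prime_utr": 0, "three_prime_utr": 0, "exon": 1, "CDS": 2}


def choose_region(candidates):
    """candidates: list of (feature_type, label) pairs for one probe.
    Return the single pair with the highest priority, or None if no
    candidate has a recognized feature type."""
    ranked = [c for c in candidates if c[0] in PRIORITY]
    if not ranked:
        return None
    # min() returns the first of several equally ranked candidates.
    return min(ranked, key=lambda c: PRIORITY[c[0]])


# A probe lying inside an exon, its CDS, and a 3' UTR of a hypothetical gene.
chosen = choose_region([
    ("exon", "Gene_1:exon"),
    ("CDS", "Gene_1:CDS"),
    ("three_prime_utr", "Gene_1:3UTR"),
])
print(chosen)
```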
Probes with the same annotation were grouped together to form a probe set. Figure 2 shows the grouping of probes for three types of CDFs. These CDFs are:
Region-based CDF: Probe sets are designed to target a specific region of a gene and consist of probes which map to the same region (UTR, individual exon, CDS) of a gene. In Fig. 2, green probes were mapped to the UTR region of Gene_1; therefore, those probes cluster together to form the Gene_1 UTR region probe set. Based on the same logic, blue colored probes form the probe set for Gene_1 exon and pink colored probes form the probe set for Gene_1 CDS.
Gene-based CDF: Probe sets are designed to target genes and consist of probes which map to the same gene. In Fig. 2, green, blue and pink colored probes, which mapped to Gene_1, cluster together to form the Gene_1 probe set.
Transcript-based CDF: Probes that map to the same transcript of a gene compose a probe set. In Fig. 2, the orange and red arrows show the start and end positions of Transcript_1 and Transcript_2. The probes mapped to Transcript_1 (two green, two blue and two pink) cluster together to form the probe set for Transcript_1.
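The three groupings above can be sketched with a single Python function that keys probes by gene, by transcript, or by gene and region. The record layout (one dict per probe carrying gene, transcript and region fields) is an assumption made for brevity; in the framework described here each CDF type is annotated against its own set of intervals.

```python
from collections import defaultdict


def group_probes(annotated_probes, level):
    """Group probe ids into probe sets.
    level is 'gene', 'transcript' or 'region'; region-level probe sets
    are keyed by (gene, region) so that regions of different genes stay apart."""
    probe_sets = defaultdict(list)
    for probe in annotated_probes:
        if level == "region":
            key = (probe["gene"], probe["region"])
        elif level in ("gene", "transcript"):
            key = probe[level]
        else:
            raise ValueError("level must be 'gene', 'transcript' or 'region'")
        probe_sets[key].append(probe["probe_id"])
    return dict(probe_sets)


# Hypothetical probes for Gene_1, loosely following the Fig. 2 description.
probes = [
    {"probe_id": "p1", "gene": "Gene_1", "transcript": "Transcript_1", "region": "UTR"},
    {"probe_id": "p2", "gene": "Gene_1", "transcript": "Transcript_1", "region": "UTR"},
    {"probe_id": "p3", "gene": "Gene_1", "transcript": "Transcript_1", "region": "exon"},
    {"probe_id": "p4", "gene": "Gene_1", "transcript": "Transcript_2", "region": "exon"},
    {"probe_id": "p5", "gene": "Gene_1", "transcript": "Transcript_2", "region": "CDS"},
]
region_sets = group_probes(probes, "region")
gene_sets = group_probes(probes, "gene")
transcript_sets = group_probes(probes, "transcript")
print(region_sets)
```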
Probe sets were saved into binary and ASCII format CDF files. The CDF files were created via the affxparser Bioconductor package. In addition to the probes specific for a particular gene, Affymetrix® GeneChips® contain a number of different control probes, such as probes that are added during sample preparation to provide evidence that the assay was performed properly. We added those probe sets to our CDFs without any change. R CDF libraries were created via the makecdfenv R Bioconductor package. The custom CDFs for three species (rat, mouse, and human) can be obtained from bioinformatics.louisville.edu/RegionCDFDesc.html
Probe set naming
Table: Custom CDF naming examples (column: Probe Set Name).
Table: Summary of probes used for gene and transcript based custom CDFs (columns: Number of Probes Used, Number of Probe Sets Constructed, Average Number of Probes Per Probe Set).
Table: Summary of probes used for region based custom CDFs (columns: Number of Probes Aligned to Genome, Number of Probes Used, Number of Probe Sets Constructed, Average Number of Probes Per Probe Set).
Custom CDF generation
Probes mapping to the genome
Table: Number of mapped probes for custom CDF construction (columns: Number of PM Probes, Number of PM Probes Mapped Uniquely, Number of PM Probes Mapped to Multiple Locations, Number of PM Probes Not Aligned; rows: Human Genome U133 Plus 2.0 Array, Rat Genome 230 2.0 Array, Mouse Genome 430 2.0 Array).
Probe annotations and probe sets
To annotate probes, we mapped uniquely aligned probes to gene regions using the most recent Ensembl genome and GTF file for each respective organism. We used the specific regions based on the custom CDF type (gene, transcript or region-based). Consequently we produced three types of custom CDFs (Tables 5 and 6).
The human gene based CDF has 22,651 custom designed probe sets composed from 414,701 probes and 62 original control probe sets. A total of 442,025 annotations were identified between genes and probes, of which 27,324 were filtered out when shared probes were removed. In order to validate our probe set annotations, we compared the original CDF probe sets with the custom CDF. A total of 21,585 annotated genes were shared between the two CDFs, with 3068 unique to the original CDF, and 1066 unique to our custom CDF. In order to determine why some genes were not covered in our CDF, we examined those unique to the original CDF. First we obtained the probe sets which represent these genes in the original CDF, yielding 2781 probe sets. We retrieved both the PM and MM probe sequences for each of these. We observed that for 667 probe sets, every probe was removed during probe mapping to the genome due to either non-unique mappings or mapping rates less than 100%. Of the probes in the remaining 2114 probe sets, 30,150 were not used in our CDF since they either did not map to the genome or were MM probes, 14,028 were used in our newly constructed probe sets, which target different genes than the original assignment by Affymetrix®, and 2656 were not aligned to gene structures and were therefore not annotated. As a result, the differences between the original CDF and our method occur because of probes removed during genome alignment, probes that no longer map to gene structures, or probes that map to gene structures different from the original annotation.
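The counts reported above are mutually consistent, which can be checked with a few lines of Python arithmetic. All numbers are taken from the text; the variable names are illustrative, and equating the number of custom probe sets with the number of annotated genes assumes one probe set per gene in the gene-based CDF.

```python
# Annotations found between genes and probes, and those filtered as shared.
annotations_found = 442_025
annotations_filtered = 27_324
probes_used = annotations_found - annotations_filtered

# Genes shared with the original CDF plus genes unique to the custom CDF.
shared_genes = 21_585
custom_only_genes = 1_066
custom_probe_sets = shared_genes + custom_only_genes

# Original-only probe sets, minus those with every probe removed at mapping.
original_only_probe_sets = 2_781
fully_removed_probe_sets = 667
remaining_probe_sets = original_only_probe_sets - fully_removed_probe_sets

print(probes_used, custom_probe_sets, remaining_probe_sets)
```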
For the rat 230 2.0 GeneChip®, requiring at least three probes per probe set yields 12,534 uniquely identified Ensembl genes at the gene level. We determined that for this specific GeneChip®, reorganization of the Affymetrix® probes into mRNA region-specific probe sets provides 4024 unique Ensembl gene identifiers with probe sets in both the 3′ UTR and CDS. Using this subset of probe sets, differential expression of the CDS can then be compared to that of the 3′ UTR.
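The selection of genes that have usable probe sets in both the 3′ UTR and the CDS can be sketched as a filter followed by a set intersection. The data layout, region labels and gene names below are hypothetical; only the minimum of three probes per probe set comes from the text.

```python
def genes_with_utr_and_cds(region_probe_sets, min_probes=3):
    """region_probe_sets maps (gene_id, region) to a list of probe ids.
    Return the sorted gene ids that keep both a 3' UTR and a CDS probe set
    after discarding probe sets with fewer than min_probes probes."""
    kept = {
        key: probes
        for key, probes in region_probe_sets.items()
        if len(probes) >= min_probes
    }
    utr_genes = {gene for gene, region in kept if region == "3UTR"}
    cds_genes = {gene for gene, region in kept if region == "CDS"}
    return sorted(utr_genes & cds_genes)


# Hypothetical region-based probe sets.
toy_sets = {
    ("Gene_1", "3UTR"): ["p1", "p2", "p3"],
    ("Gene_1", "CDS"): ["p4", "p5", "p6", "p7"],
    ("Gene_2", "3UTR"): ["p8", "p9", "p10"],
    ("Gene_2", "CDS"): ["p11", "p12"],          # too few probes, dropped
    ("Gene_3", "CDS"): ["p13", "p14", "p15"],   # no 3' UTR probe set
}
comparable = genes_with_utr_and_cds(toy_sets)
print(comparable)
```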
Analysis with custom CDFs
We reanalyzed the publicly available data series GSE72551 and GSE48611. Both of these studies involve the nervous system, where differences in 3′ UTRs are likely to have phenotypic effects on transcript localization. The GSE72551 data series examines gene expression changes associated with collateral sprouting and includes 5 naïve controls, 7 replicates at day 7 post-surgery and 7 replicates at day 14 post-surgery. The GSE48611 data series examines gene expression in Down syndrome. This data set includes mRNA samples from isogenic trisomy of chromosome 21 (Ts21) and control induced pluripotent stem cells (iPSCs) (DS1, DS4, and DS2U) between passages 24 and 48 and from day 30 neurons. Three biological replicates were present for each condition. Prior to analysis, we removed probe sets with two or fewer probes from the custom CDFs in order to achieve more accurate results for target expression levels. Robust Multiarray Averaging (RMA) normalization was used for preprocessing. A p-value cutoff of 0.05 was used as the threshold for all experiments.
Further examination of the ENSEMBL genes found to be differentially expressed between day 7 and naïve samples in either the gene-based or region-based CDF shows high concordance, with 975 ENSEMBL genes determined to be differentially expressed using both CDFs (Fig. 3a). Examination of the p-values shows a significant correlation between the gene and the 3′ UTR region (r = 0.439; p = 1.480E-58) as well as between the gene and the exon region (r = 0.101; p = 0.001). The higher correlation with the 3′ UTR region is to be expected, due to a higher abundance of probes designed in these regions.
One hundred sixty genes are found to be differentially expressed using the gene-based approach only. Three genes are omitted completely from the region-based CDF. Further examination of the remaining 157 genes measured using both the gene-based and region-based methods shows that 122 of these (78%) have a gene-based p-value >0.03, and 80 (50%) have a gene-based p-value >0.04, indicating the detected differences are just below the cutoff level. Analysis of the region-based p-values shows that 120 of these (77%) have a region-based p-value <0.10, and 146 (94%) have a region-based p-value <0.20, putting these genes just above the significance threshold.
An additional 423 genes are found to be differentially expressed using the region-based approach only, with 203 from the 3′ UTR only, 10 from the 5′ UTR only, 206 from the exon only, and 4 from both the 3′ UTR and exon. Unlike the DEGs uniquely found in the gene-based approach, those genes found to be differentially expressed in the region-based approach typically have a much higher p-value in gene-based analysis, with only 31% having a p-value between 0.05 and 0.10. This supports our reasoning that separating probes into functional regions allows detection of subtle changes in transcript formation that may have a larger functional impact on those transcripts. This reasoning has been further validated by experimental work showing that differential expression of the 3′ UTR of the CAMKIV gene plays a role in localization.
In order to determine why some genes were only detected by the brainarray CDF, we examined the probe sequences of those genes that are brainarray specific. Of these, 39 were excluded from our CDFs since they aligned to multiple locations in the rn6 genome. An additional ten of these probes did not match to known Ensembl gene structures and were thus removed. Eighteen of these probes were excluded because the probe set contained fewer than three probes. An additional 40 of the brainarray probes were used in our CDFs, but with annotations differing from brainarray due to changes in annotation information.
Table: DEGs detected by our gene based CDF and GPL570 (column: Our Gene Based CDF; rows: DS1 versus Ts21, DS2 versus Ts21).
One of the limitations of microarray technologies is the design of probes based on available sequence and annotation data at the time of design. Based on our analysis, the percentage of uniquely mapping probes varies from 84% (rat) to 87% (human), indicating that changing knowledge about the genome itself plays a role in probe utilization. In terms of annotation, the rat genome is known to have more incomplete information when compared to mouse and human, which is reflected in the fact that only 47% of the rat probes lie in region-based locales (exons and UTRs) compared to 65% for mouse, and 69% for human. Since this can potentially lead to a small number of probes in each annotated region (and thus increased false positive rates), we have further required at least three probes be present in each probe set for our analysis. Both unrestricted (1 or more) and restricted (3 or more) probe groupings are available as CDFs.
As our analysis with the GSE48611 and GSE72551 datasets shows, reanalysis of publicly available datasets using updated annotations can yield additional information when compared to the use of the original CDFs. In our case, the region-based CDFs allow for a better understanding of 3′ UTR dynamics through the reanalysis of publicly available data. While current high-throughput sequencing technologies may allow for a more complete picture, this custom CDF approach will allow for deeper insight with only minimal computational cost, taking advantage of the high volume of publicly available GeneChip® data.
We proposed a framework for reannotating and reassigning probe groups for Affymetrix® GeneChip® technology based on functional regions of interest. Our work differs from others in that we annotated probes in UTR and exon levels in addition to gene and transcript (isoform) levels. We illustrated how this framework affects the detection of differentially expressed genes, particularly when focusing on functional regions of interest. Removing probes that no longer align to the genome without mismatches or align to multiple locations can help to reduce false-positive differential expression, as can removal of probes in regions overlapping multiple genes.
The main motivation of our work was profiling the contribution of UTR and exon regions to the gene expression levels globally. Our results indicate that features differentially expressed in either the gene-based or region-based CDF show high concordance and separating out into functional regions allows for the detection of subtle changes in transcript formation.
The authors wish to thank members of the Kentucky Biomedical Research Infrastructure Network Bioinformatics Core, the University of Louisville Bioinformatics Journal Club, and members of the University of Louisville Bioinformatics and Biomedical Computing Laboratory for helpful insight, project review, and suggestions.
Publication charges provided by National Institutes of Health grant P20GM103436. Research support provided by NIH grants P20GM103436 and R01NS094741. The contents of this manuscript are solely the responsibility of the authors and do not represent the official views of NIH.
Availability of data and materials
The datasets supporting the conclusions of this article are available in the figshare repository, https://doi.org/10.6084/m9.figshare.3840144. Individual custom CDFs can also be accessed at: http://bioinformatics.louisville.edu/RegionCDFDesc.html.
About this supplement
This article has been published as part of BMC Genomics Volume 18 Supplement 10, 2017: Selected articles from the 6th IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS): genomics. The full contents of the supplement are available online at https://bmcgenomics.biomedcentral.com/articles/supplements/volume-18-supplement-10.
ES was responsible for code preparation, development of the project, and manuscript preparation. ECR and JCP developed the overall project goals. ECR supervised the overall project, provided the necessary lab space and computational resources for project completion, and led development of the manuscript. JCP and BJH provided test data, analyzed results, and reviewed the manuscript. KW performed testing of UTR analysis of microarrays and reviewed the manuscript. All authors have read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. Causton HC, Quackenbush J, Brazma A. Microarray gene expression data analysis: a beginner's guide. Malden, MA: Wiley-Blackwell; 2009.
2. Knudsen S. Guide to analysis of DNA microarray data. 2nd ed. Hoboken, NJ: Wiley-Liss; 2004.
3. Liu G, Loraine AE, Shigeta R, Cline M, Cheng J, Valmeekam V, Sun S, Kulp D, Siani-Rose MA. NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res. 2003;31(1):82–6.
4. Flight RM, Eteleeb AM, Rouchka EC. Affymetrix® mismatch (MM) probes: useful after all. In: 2012 ASE/IEEE international conference on BioMedical Computing (BioMedCom). Washington: IEEE Computer Society; 2012. pp. 6-13.
5. Rouchka EC, Phatak AW, Singh AV. Effect of single nucleotide polymorphisms on Affymetrix match-mismatch probe pairs. Bioinformation. 2008;2(9):405–11.
6. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 2013;41(Database issue):D991–5.
7. Chalifa-Caspi V, Yanai I, Ophir R, Rosen N, Shmoish M, Benjamin-Rodrig H, Shklar M, Stein TI, Shmueli O, Safran M, et al. GeneAnnot: comprehensive two-way linking between oligonucleotide array probesets and GeneCards genes. Bioinformatics. 2004;20(9):1457–8.
8. Dai M, Wang P, Boyd AD, Kostov G, Athey B, Jones EG, Bunney WE, Myers RM, Speed TP, Akil H, et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 2005;33(20):e175.
9. Gautier L, Moller M, Friis-Hansen L, Knudsen S. Alternative mapping of probes to genes for Affymetrix chips. BMC Bioinformatics. 2004;5:111.
10. Liu H, Zeeberg BR, Qu G, Koru AG, Ferrucci A, Kahn A, Ryan MC, Nuhanovic A, Munson PJ, Reinhold WC, et al. AffyProbeMiner: a web resource for computing or retrieving accurately redefined Affymetrix probe sets. Bioinformatics. 2007;23(18):2385–90.
11. Lu J, Lee JC, Salit ML, Cam MC. Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: high-resolution annotation for microarrays. BMC Bioinformatics. 2007;8:108.
12. Risueno A, Fontanillo C, Dinger ME, De Las RJ. GATExplorer: genomic and transcriptomic explorer; mapping expression probes to gene loci, transcripts, exons and ncRNAs. BMC Bioinformatics. 2010;11:221.
13. Yin J, McLoughlin S, Jeffery IB, Glaviano A, Kennedy B, Higgins DG. Integrating multiple genome annotation databases improves the interpretation of microarray gene expression data. BMC Genomics. 2010;11:50.
14. Harbig J, Sprinkle R, Enkemann SA. A sequence-based identification of the genes detected by probesets on the Affymetrix U133 plus 2.0 array. Nucleic Acids Res. 2005;33(3):e31.
15. Akman HB, Oyken M, Tuncer T, Can T, Erson-Bensan AE. 3′ UTR shortening and EGF signaling: implications for breast cancer. Hum Mol Genet. 2015;24(24):6910–20.
16. Fu Y, Sun Y, Li Y, Li J, Rao X, Chen C, Xu A. Differential genome-wide profiling of tandem 3′ UTRs among human breast cancer and normal cells by high-throughput sequencing. Genome Res. 2011;21(5):741–7.
17. Wang L, Hu X, Wang P, Shao ZM. The 3′ UTR signature defines a highly metastatic subgroup of triple-negative breast cancer. Oncotarget. 2016.
18. Hilgers V, Perry MW, Hendrix D, Stark A, Levine M, Haley B. Neural-specific elongation of 3′ UTRs during Drosophila development. Proc Natl Acad Sci. 2011;108(38):15864–9.
19. Ji Z, Lee JY, Pan Z, Jiang B, Tian B. Progressive lengthening of 3′ untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc Natl Acad Sci. 2009;106(17):7028–33.
20. Kuersten S, Goodwin EB. The power of the 3′ UTR: translational control and development. Nat Rev Genet. 2003;4(8):626–37.
21. Revil T, Gaffney D, Dias C, Majewski J, Jerome-Majewska LA. Alternative splicing is frequent during early embryonic development in mouse. BMC Genomics. 2010;11:399.
22. Thomsen S, Azzam G, Kaschula R, Williams LS, Alonso CR. Developmental RNA processing of 3′ UTRs in Hox mRNAs as a context-dependent mechanism modulating visibility to microRNAs. Development. 2010;137(17):2951–60.
23. Harrison BJ, Flight RM, Gomes C, Venkat G, Ellis SR, Sankar U, Twiss JL, Rouchka EC, Petruska JC. IB4-binding sensory neurons in the adult rat express a novel 3′ UTR-extended isoform of CaMK4 that is associated with its localization to axons. J Comp Neurol. 2014;522(2):308–36.
24. Harrison BJ, Venkat G, Hutson T, Rau KK, Bunge MB, Mendell LM, Gage FH, Johnson RD, Hill C, Rouchka EC, et al. Transcriptional changes in sensory ganglia associated with primary afferent axon collateral sprouting in spared dermatome model. Genom Data. 2015;6:249–52.
25. Jansen RP. mRNA localization: message on the move. Nat Rev Mol Cell Biol. 2001;2(4):247–56.
26. Prakash N, Fehr S, Mohr E, Richter D. Dendritic localization of rat vasopressin mRNA: ultrastructural analysis and mapping of targeting elements. Eur J Neurosci. 1997;9(3):523–32.
27. Willis DE, Xu M, Donnelly CJ, Tep C, Kendall M, Erenstheyn M, English AW, Schanen NC, Kirn-Safran CB, Yoon SO, et al. Axonal localization of Transgene mRNA in mature PNS and CNS neurons. J Neurosci. 2011;31(41):14481–7.
28. Derti A, Garrett-Engele P, Macisaac KD, Stevens RC, Sriram S, Chen R, Rohl CA, Johnson JM, Babak T. A quantitative atlas of polyadenylation in five mammals. Genome Res. 2012;22(6):1173–83.
29. Curinha A, Oliveira Braz S, Pereira-Castro I, Cruz A, Moreira A. Implications of polyadenylation in health and disease. Nucleus. 2014;5(6):508–19.
30. The Brent Lab. GTF2.2: A Gene Annotation Format. http://mblab.wustl.edu/GTF22.html. Accessed 20 Sep 2016.
31. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.
32. Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, et al. Ensembl 2015. Nucleic Acids Res. 2015;43(D1):D662–9.
33. Alekseyenko AV, Lee CJ. Nested containment list (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases. Bioinformatics. 2007;23(11):1386–93.
34. Bengtsson H, Bullard J, Hanson K. Affxparser: Affymetrix file parsing SDK. R package version 1.40.0. 2015.
35. Irizarry RA, Gautier L, Huber W, Bolstad B. makecdfenv: CDF Environment Maker. R package version 1.44.0. 2006.
36. Weick JP, Held DL, Bonadurer GF 3rd, Doers ME, Liu Y, Maguire C, Clark A, Knackert JA, Molinarolo K, Musser M, et al. Deficits in human trisomy 21 iPSCs and neurons. Proc Natl Acad Sci U S A. 2013;110(24):9962–7.
37. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249–64.
38. Fernandes HB, Catches JS, Petralia RS, Copits BA, Xu J, Russell TA, Swanson GT, Contractor A. High-affinity kainate receptor subunits are necessary for ionotropic but not metabotropic signaling. Neuron. 2009;63(6):818–29.
39. Jacob CP, Koutsilieri E, Bartl J, Neuen-Jacob E, Arzberger T, Zander N, Ravid R, Roggendorf W, Riederer P, Grunblatt E. Alterations in expression of glutamatergic transporters and receptors in sporadic Alzheimer's disease. J Alzheimers Dis. 2007;11(1):97–116.
40. Pickard BS, Knight HM, Hamilton RS, Soares DC, Walker R, Boyd JK, Machell J, Maclean A, McGhee KA, Condie A, et al. A common variant in the 3′ UTR of the GRIK4 glutamate receptor gene affects transcript abundance and protects against bipolar disorder. Proc Natl Acad Sci U S A. 2008;105(39):14940–5.
41. Ray PS, Jia J, Yao P, Majumder M, Hatzoglou M, Fox PL. A stress-responsive RNA switch regulates VEGFA expression. Nature. 2009;457(7231):915–9.