The effect of heterogeneous Transcription Start Sites (TSS) on the translatome: implications for the mammalian cellular phenotype

Background
The genetic program, as manifested in the cellular phenotype, is in large part dictated by the cell’s protein composition. Since characterisation of the proteome remains technically laborious, it is attractive to define the genetic expression profile using the transcriptome. However, the transcriptional landscape is complex, and it is unclear to what extent it reflects the ribosome-associated mRNA population (the translatome). This is particularly pertinent for genes that use multiple transcription start sites (TSS) to generate mRNAs with heterogeneous 5′ transcript leaders (5′TL). Furthermore, the relative abundance of a gene's TSS variants is frequently cell-type specific. Indeed, promoter switches have been reported in pathologies such as cancer. The consequences of this 5′TL heterogeneity within the transcriptome for the translatome remain unresolved. This is not a moot point, because the 5′TL plays a key role in regulating mRNA recruitment onto polysomes.

Results
In this article, we have characterised both the transcriptome and the translatome of the MCF7 (tumoural) and MCF10A (non-tumoural) cell lines. We identified ~550 genes exhibiting differential translation efficiency (TE). In itself, this is perhaps not surprising. However, by focusing on genes exhibiting TSS heterogeneity, we observed distinct differential promoter usage patterns in both the transcriptome and the translatome. Only a minor fraction of these genes belonged to those exhibiting differential TE. Nonetheless, reporter assays demonstrated that the TSS variants affected the translational readout both quantitatively (the overall amount of protein expressed) and qualitatively (the nature of the proteins expressed).

Conclusions
The results point to considerable and distinct cell-specific 5′TL heterogeneity within both the transcriptome and the translatome of the two cell lines analysed. This observation is in line with the ribosome filter hypothesis, which posits that the ribosomal machinery can selectively filter information from within the transcriptome. As such, it cautions against the simple extrapolation transcriptome → proteome. Furthermore, polysomal occupancy of specific gene 5′TL variants may also serve as a novel disease biomarker.

Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-2179-8) contains supplementary material, which is available to authorized users.

WNT5B V1
5'-GATCTGAATTCGACCATTAGCAGGCACCCAGGCCTGTCTTTGGCTCGGAAACGGTGGCCCCCAATGTAGCCTAGTTTGAACCTAGGAACTGCAGGACCAGAGAGATTCCACTGGAGCCTGATGGACGGGT-3'
5'-GAGCTCCATGGTCGGCCTCAGCCCTCCCCAGTGCCCTGGGACTGACAGTTTCCAGAGTAGGGTTCCCTCTGTCACCCGTCCATCAGGCTCCAGTGGAATCTCTCTGGTCCTGCCAGTTCTAGGTTCAAACTAGGCTAC-3'

WNT5B V2
5'-GATCTGAATTCTATTCTTCCAAATGGAAACTGCTAATTTTTGAAGCAGAAGGTTGACAGCTTCAGTAAGATCTCAAGAGAGCGAGAAGACTGGAATCAGGG-3'
5'-GAGCTCCATGGTCGGCCTCAGCCCTCCCCAGTGCCCTGGGACTGACAGTTTCCAGAGTAGGGTTCCCTGATTCCAGTCTTCTCGCTCTCTCTTGAGA-3'

Figure S1: Reproducibility of the total and polysomal RNAseq experiments across all replicates. In each scatter plot the Spearman and Pearson correlation coefficients between samples are indicated.

Figure S2: (A) MA plot used to carry out the Z-score calculation for translation efficiency changes between MCF7 and MCF10A. The level of expression for each gene is obtained from an average of three replicates. Genes identified to have differential promoter usage are shown in blue. (B) Histogram of the Z-scores obtained for all genes. Those with an absolute Z-score greater than 2 were taken to be differentially translated.
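The Z-score thresholding described for Figure S2 can be illustrated with a minimal sketch. It assumes that translation efficiency is the ratio of polysomal to total RNA abundance and that a single, global Z-score is computed over the log2 TE changes; the legend refers to an MA plot, so the actual calculation may well be intensity-dependent. The function names, the pseudocount and the toy abundances are all hypothetical.

```python
import math
import statistics


def log2_te(polysomal, total, pseudocount=1.0):
    """log2 translation efficiency, taken here as polysomal / total abundance.

    The pseudocount is an assumption added to avoid division by zero.
    """
    return math.log2((polysomal + pseudocount) / (total + pseudocount))


def te_z_scores(genes):
    """Z-score of the log2 TE change between two cell lines for each gene.

    genes maps a gene name to replicate-averaged abundances:
    (polysomal_a, total_a, polysomal_b, total_b).
    """
    delta = {
        name: log2_te(pa, ta) - log2_te(pb, tb)
        for name, (pa, ta, pb, tb) in genes.items()
    }
    values = list(delta.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return {name: (d - mean) / sd for name, d in delta.items()}


def differentially_translated(genes, threshold=2.0):
    """Names of genes whose absolute Z-score exceeds the threshold."""
    scores = te_z_scores(genes)
    return sorted(name for name, z in scores.items() if abs(z) > threshold)


# Toy data (not from the study): ten unchanged genes and one outlier.
toy = {f"g{i}": (100, 100, 100, 100) for i in range(10)}
toy["geneX"] = (800, 100, 100, 100)
print(differentially_translated(toy))
```

With the toy input only `geneX` passes the |Z| > 2 cut-off, because its polysomal abundance rises while its total abundance does not.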
Figure S3: Identification of differentially expressed genes under various input parameters. (A) Heatmaps indicating the conditions under which genes are identified as having differential promoter usage in polysomal and total RNAseq, using altered parameters for Tophat and Cuffdiff (left) and when reads are aligned with different gene annotations (right). Details of the alignment parameters can be found in Additional File 4. The genes group together according to the RNAseq sample. (B) A heatmap indicating the conditions under which genes are identified as having differential TE with RefSeq and Ensembl gene annotations. The samples are clustered with the cityblock distance metric and the "average" linkage criterion.
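The clustering named in the Figure S3 legend (cityblock distance, "average" linkage) can be sketched in plain Python as follows. The legend does not say which software produced the heatmaps, so this is only an illustration of the two named choices; the sample labels and the binary call profiles (1 = gene called differential, 0 = not) are hypothetical.

```python
from itertools import combinations


def cityblock(u, v):
    """Cityblock (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage and cityblock distance.

    profiles maps a sample label to a list of numbers.
    Returns the merges as (cluster_a, cluster_b, distance) in merge order,
    where each cluster is a tuple of sample labels.
    """
    def dist(c1, c2):
        # Average linkage: mean of all pairwise distances between members.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(cityblock(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    clusters = [(label,) for label in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical call profiles for four analysis conditions.
calls = {
    "poly_default": [1, 1, 0, 1, 0],
    "poly_altered": [1, 1, 0, 1, 1],
    "total_default": [0, 0, 1, 0, 1],
    "total_altered": [0, 1, 1, 0, 1],
}
for left, right, d in average_linkage(calls):
    print(left, right, d)
```

In this toy input the two polysomal conditions merge first and the two total-RNA conditions merge next, which is the kind of grouping by RNAseq sample that the legend describes.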
Figure S4: (A-D) Sequence alignments of the different transcript variants for the genes CAMKK1, CCND3, WNT5B and CLDN7. For clarity, only the 5'TLs and ORFs have been included in the alignments. The AUG start codons are highlighted (red line and blue arrow). Gaps in the alignment within the principal ORF arise from alternative splicing. (E) Alignment of the two TLs for the gene 53BP1. The uORF in V3 is indicated. In the lower image the two TLs have been folded. The AUG start codon is circled, and the position of the uORF within the RNA structure and the minimum free energies are indicated.
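To make the notion of a uORF within a transcript leader concrete, the following minimal sketch scans a 5'TL for AUG codons that lie upstream of the main start codon and are followed in frame by a stop codon. This is a simplified working definition chosen for illustration; the sequence used is invented and is not the 53BP1 leader, and the function name is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(transcript, main_start):
    """Locate upstream ORFs in a transcript leader.

    transcript: RNA or DNA sequence (T is read as U).
    main_start: 0-based index of the A of the main ORF's AUG.
    Returns (start, end) index pairs, end exclusive, for every AUG that lies
    entirely upstream of main_start and has an in-frame stop codon downstream.
    """
    rna = transcript.upper().replace("T", "U")
    uorfs = []
    for start in range(0, main_start - 2):
        if rna[start:start + 3] != "AUG":
            continue
        # Walk codon by codon from the upstream AUG until a stop is reached.
        for pos in range(start + 3, len(rna) - 2, 3):
            if rna[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


# Invented example: leader "GGAUGCCCUAAGGC" followed by the main ORF "AUGGCCUGA".
example = "GGAUGCCCUAAGGCAUGGCCUGA"
for start, end in find_uorfs(example, main_start=14):
    print(start, end, example[start:end])
```

The example reports a single uORF, `AUGCCCUAA`, which terminates before the main AUG at index 14.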

TGFBR3L
Transforming Growth Factor-Beta Receptor Type III-Like Protein. Binds to various members of the TGF-beta superfamily of ligands via its core protein, and bFGF via its heparan sulfate chains.

MAP3K14
Participates in an NF-kappaB-inducing signalling cascade.

FGFRL1
Fibroblast Growth Factor Receptor-Like 1. A marked difference between this gene product and the other family members is its lack of a cytoplasmic tyrosine kinase domain.

SHB
Src Homology 2 Domain Containing Adaptor Protein B: May play a role in apoptosis.

IKBKE
Inducible I Kappa-B Kinase. IKBKE has also been identified as a breast cancer (MIM 114480) oncogene and is amplified and overexpressed in over 30% of breast carcinomas and breast cancer cell lines.

ARAF
V-raf murine sarcoma oncogene homolog: May also regulate the TOR signaling cascade.

PDLIM2
PDZ And LIM Domain 2: The encoded protein is also a putative tumor suppressor protein.

LIN9
Lin-9 Homolog: This gene encodes a tumor suppressor protein that inhibits DNA synthesis and oncogenic transformation through association with the retinoblastoma 1 protein.

CDC73
Cell Division Cycle 73: Tumor suppressor probably involved in transcriptional and post-transcriptional control pathways.

XRCC5
DNA repair protein; implicated in breast cancer.

XRCC4
DNA repair protein

CKS2
CDC28 Protein Kinase Regulatory Subunit 2: binds to the catalytic subunit of the cyclin dependent kinases and is essential for their biological function.

RAD21
RAD21 Homolog: may play a role in spindle pole assembly during mitosis. Also plays a role in apoptosis, via its cleavage by caspase-3/CASP3 or caspase-7/CASP7.

CCNC
Cyclin C: Component of the Mediator complex (see MED23). Binds to and activates CDK8, which phosphorylates the CTD (C-terminal domain) of the large subunit of RNA Pol II.

NPEPPS
Aminopeptidase Puromycin Sensitive: involved in proteolytic events regulating the cell cycle.

POLK
DNA Polymerase Kappa: DNA polymerase specifically involved in DNA repair.

CENPK
Centromere Protein K: involved in assembly of kinetochore proteins, mitotic progression and chromosome segregation.

CENPQ
Centromere Protein Q: involved in assembly of kinetochore proteins, mitotic progression and chromosome segregation.

NBN
Nijmegen Breakage Syndrome 1 (Nibrin): Component of the MRE11-RAD50-NBN (MRN complex) which plays a critical role in the cellular response to DNA damage and the maintenance of chromosome integrity.

CDK1
Cyclin-Dependent Kinase 1: Plays a key role in the control of the eukaryotic cell cycle.

MNAT1
CDK-Activating Kinase Assembly Factor MAT1: Involved in cell cycle control and in RNA transcription by RNA polymerase II.

SMC3
Structural Maintenance Of Chromosomes 3: Central component of cohesin, a complex required for chromosome cohesion during the cell cycle. Cohesion is coupled to DNA replication and is involved in DNA repair.

TTK
TTK Protein Kinase: found to be a critical mitotic checkpoint protein for accurate segregation of chromosomes during mitosis.

SUPT16H
Suppressor Of Ty 16 Homolog: involved in multiple processes that require DNA as a template such as mRNA elongation, DNA replication and DNA repair.

HAT1
Histone Acetyltransferase 1: May play a role in DNA repair in response to free radical damage.

PRKDC
Protein Kinase, DNA-Activated, Catalytic Polypeptide: functions with the Ku70/Ku80 heterodimer protein in DNA double strand break repair and recombination.

RAD51AP1
RAD51-Associated Protein 1: May participate in a common DNA damage response pathway associated with the activation of homologous recombination and double-strand break repair.

POLE2
DNA Polymerase II Subunit 2: Participates in DNA repair and in chromosomal DNA replication.

GTF3C3
General transcription factor. RAD51AP1 (RAD51-Associated Protein 1): may participate in a common DNA damage response associated with double-strand break repair.

LIG4
Ligase IV, DNA, ATP-Dependent: The LIG4-XRCC4 complex is responsible for the NHEJ ligation step.

PRIM1
DNA Primase 49 KDa Subunit: DNA primase is the polymerase that synthesizes small RNA primers for the Okazaki fragments during DNA replication.

UP REGULATED

XRCC3
Member of the RecA/Rad51-related protein family that participates in homologous recombination to maintain chromosome stability and repair DNA damage. Implicated in breast cancer.