A zebrafish high throughput screening system used for Staphylococcus epidermidis infection marker discovery

Background Staphylococcus epidermidis bacteria are a major cause of biomaterial-associated infections in modern medicine. Yet there is little known about the host responses against this normally innocent bacterium in the context of infection of biomaterials. In order to better understand the factors involved in this process, a whole animal model with high throughput screening possibilities and markers for studying the host response to S. epidermidis infection are required. Results We have used a zebrafish yolk injection system to study bacterial proliferation and the host response in a time course experiment of S. epidermidis infection. By combining an automated microinjection system with complex object parametric analysis and sorting (COPAS) technology we have quantified bacterial proliferation. This system was used together with transcriptome analysis at several time points during the infection period. We show that bacterial colony forming unit (CFU) counting can be replaced by high throughput flow-based fluorescence analysis of embryos enabling high throughput readout. Comparison of the host transcriptome response to S. epidermidis and Mycobacterium marinum infection in the same system showed that M. marinum has a far stronger effect on host gene regulation than S. epidermidis. However, multiple genes responded differently to S. epidermidis infection than to M. marinum, including a cell adhesion gene linked to specific infection by staphylococci in mammals. Conclusions Our zebrafish embryo infection model allowed (i) quantitative assessment of bacterial proliferation, (ii) identification of zebrafish genes serving as markers for infection with the opportunistic pathogen S. epidermidis, and (iii) comparison of the transcriptome response of infection with S. epidermidis and with the pathogen M. marinum. As a result we have identified markers that can be used to distinguish common and specific responses to S. epidermidis. These markers enable the future integration of our high throughput screening technology with functional analyses of immune response genes and immune modulating factors.


Introduction
Infections with Staphylococcus epidermidis bacteria pose a serious problem associated with the use of biomaterials in modern medicine [1][2][3][4][5]. These bacteria can form biofilms on the surface of inserted biomaterials and persist in the surrounding tissues, where immune functions are disturbed due to the combined presence of a biomaterial and the bacteria [6,7]. In order to better understand the cause of this phenomenon and to assess the propensity of different bacterial strains and biomaterials to alter and trigger the immune response in the host, a whole animal model with high-throughput screening possibilities is desired. This will help identifying which factors determine that innocent bacteria become less susceptible to host defence mechanisms or antibiotic treatments when associated with biomaterials. Mouse and rat models have been used to investigate S. epidermidis infection and biomaterial-associated infection processes. However, histological examination of biopsies is time consuming and does not allow following the infection process over time [8][9][10]. Even with the use of bioluminescence and fluorescence imaging, high challenge doses are required to visualize bacterial colonization and high-throughput screening in rodents is not feasible. However, the zebrafish at the embryonal and larval stages is an excellent model for this purpose: it is translucent, fluorescently labelled immune cells and bacteria can be microscopically imaged in real time, and embryos can be obtained in high numbers [11][12][13][14][15]. The responses of many different pathogens such as Escherichia coli, Mycobacterium marinum, Salmonella typhimurium, Edwardsiella tarda, Burkholderia cenocepacia, and Staphylococcus aureus have already been assessed in zebrafish [16][17][18][19][20][21]. In previous work we successfully performed extensive transcriptome analyses with M. marinum, S. typhimurium, and E. tarda intravenous infection models using custom made Agilent micro-arrays and deep sequencing [19,[22][23][24]. The conventional infection method for zebrafish embryos is injection of pathogens into the caudal vein. However, this method is labour intensive and low throughput. For that reason we have recently developed and validated a high-throughput yolk infection model using M. marinum with an automated micro-injection system [12]. However, in this high-throughput model no transcriptome analysis has been performed until now. In the present study we have developed a highthroughput system for quantitating infection with S. epidermidis using the automated micro-injection system together with Complex Object Parametric Analysis and Sorting (COPAS) technology. This quantitative high-throughput technology has been used to study the transcriptome responses during non-lethal infection progression of S. epidermidis over time using micro-arrays and RNA deep sequencing. In order to understand which responses can be linked to defence mechanisms of the zebrafish towards fish pathogens, we have compared the host responses to S. epidermidis and to M. marinum at a time point when the initial yolk infection has further spread into the embryo's tissues. The obtained results allowed us to identify a number of genes as markers common for both infection models but also genes that can be used as markers to discriminate between pathogen specific responses.

Pathogenesis of S. epidermidis and S. aureus in zebrafish embryos
We first set out to compare S. epidermidis infected zebrafish embryos with embryos infected with S. aureus. For this purpose we injected S. epidermidis O-47 and S. aureus RN4220 strains containing GFP or mCherry plasmids under the same conditions into the yolk of embryos at 2 hours post fertilization (hpf). Injections with 5 cfu of S. aureus already showed a high intensity of fluorescent bacteria inside the yolk at the first day after injection. At the second day after injection all embryos had died from infection with bacteria spread inside the entire body of the embryos (data not shown). Injection directly into the caudal vein at 28 hpf with approximately 2500 cfu resulted in 100% mortality within several hours (data not shown). This high early mortality due to S. aureus is in accordance with earlier reports [20,25,26]. We subsequently tested S. epidermidis O-47 in yolk injections at doses of 5, 20, 50 or 100 cfu. At 1 day post injection (dpi) several small spots of fluorescent bacteria were observed inside the yolk with all cfu doses ( Figure 1A), which were absent in mock-injected controls. From 2 dpi onwards, bacteria, indicated by their fluorescence signal, were visible inside the yolk in a dosedepended fashion. The fluorescence signal became detectable inside the body of the embryos starting at 3 dpi ( Figure 1A). From this day onwards the bacteria were persisting in the vascular system and within various tissues. Confocal laser scanning microscopy was used to obtain a detailed image of bacteria spreading into the different tissues at 3, 4 and 5 dpi, the time points at which spreading of bacteria was observed (Figure 2). At 3 dpi bacteria were found intracellularly and extracellularly in the hematopoietic region ( Figure 2, panel 1), and free existing bacteria were observed in the blood were taken up by mpeg1:KAEDE positive cells ( Figure 2, panel 4). Free staphylococci in the blood have also been observed after intravascular catheter-related infections [27,28]. At 4 dpi much more extracellular bacteria in the intersegmental vessels were seen ( Figure  2, panel 2). No differences were found between the patterns observed at 4 and 5 dpi (  showed that all injected bacteria were cleared within several hours after injection (data not shown). In view of this, we conclude that the yolk infection system is therefore uniquely suitable to follow the proliferation of S. epidermidis and its effects on the host for at least 5 dpi. At the moment we can only speculate why the bacterial that were injected in the yolk had such better survival rates than bacteria injected in the caudal vein at later stages. Three possible explanations (or a combination of these factors) are that (1) there were repeated cycles of invasion from the yolk, (2) the bacteria in the yolk are primed to an infectious growth strategy for instance by using alternate sigma factors [29], or (3) the host immune system has been altered due to the prolonged exposure to high numbers of bacteria and possible associated anti-inflammatory compounds inside the embryos.

High-throughput infection quantification
The COPAS XL (Union Biometrica, USA) is a large cell flow cytometer designed for fluorescence screening of zebrafish embryos, Drosophila larvae and beads ranging from 1500 to 2000 microns in diameter [12,30]. Samples are analysed for size, optical density and three fluorescence signals. Groups of up to 3000 embryos can simultaneously be analysed and sorted into multi well plates or Petri dishes within 15 minutes ( Figure 3). The Profiler software package II detects and analyses up to 8000 data points per object for the extinction and fluorescence channels, and can be used to visualize every sample or to set sorting parameters. The Profiler shows the outline of a sample together with all fluorescence intensity traces, for each of the embryos in a sample. This is exemplified by a typical experiment of S. epidermidis infection of zebrafish embryos measured at 4 dpi ( Figure 3, panel B, and detailed in Supplementary file 2). COPAS analysis was performed every day from 2 dpi until 5 dpi. The daily analysis did not cause noticeable damage to the embryos. We observed an increase in the fluorescence signal during the 5 days of infection with S. epidermidis ( Figure 1C). Cfu count results showed good correlation with the increase of fluorescence signal in pools of embryos infected with mCherrylabelled ( Figure 1D) or GFP-labelled ( Figure 1E) bacteria. However, in the green channel some background signal produced by the embryonic yolk was detected, leading to less accurate quantification ( Figure 1E). Since the red fluorescence channel did not show any background signal, the results with mCherry-labelled bacteria were quantitatively more reliable, showing a correlation with the cfu counts ( Figure 1D). We did not find any influence of the orientation of the embryos in the flow chamber since we did not detect differences in embryos passing the laser dorsally or ventrally, or with the anterior or posterior side first. Therefore, our results show that combining the COPAS analysis with the automated micro-injection system provides a screening system of which the infection levels are statistically reliable. to right; a zebrafish pair are put together to mate, eggs are collected, eggs are distribute into a 1024 well agarose grid, eggs are injected into the yolk at 2 hpf using the automated micro-injection system. (B) from left to right; after injection, eggs are collected into Petri dishes and incubated at 28°C for a period of 5 days, COPAS analysis is performed on the S. epidermidis and non-injected embryos at 2, 3, 4 and 5 dpi. (C) from left to right; from all groups 20 embryos are snap frozen at 6 hpi, 1, 2, 3, 4 and 5 dpi for RNA isolation, amplification and Cy3 labelling, micro-array analysis against Cy5 labelled common reference and data analysis using Rosetta Resolver. (D) from left to right; validation of micro-array data was performed by RNAseq analysis of 4 biological replicas of S. epidermidis infected embryos at 5 dpi.

Specific marker genes for S. epidermidis infection
To characterize the transcriptome response of zebrafish embryos following S. epidermidis yolk injection, we performed a time resolved infection experiment using the highthroughput set up ( Figure 3). Considering that all bacterial injections were carried out with polyvinylpyrrolidone 40 (PVP) as carrier, PVP-injected embryos were taken along to control for possible effects of the carrier. Furthermore, needle puncture treated and non-injected embryo groups were included as additional control groups. Injections were performed with groups of at least 150 embryos of the same parents of which sets of 20 embryos were sampled during 6 time points (Figure 3). In order to check for reproducibility of this experiment an independent experiment was performed with the same parents at 4 dpi. RNA from these samples was used for micro-array analysis using custom made Agilent 4x180k micro-arrays.  Figure 4. Comparing each time point with multiple control samples clearly shows that there is a false negative effect in the controls that can be corrected for by using the overlap of the ratios of the different controls. This led to a filtered dataset as used for Figure 5 as discussed below. However, we want to emphasize that the injection of PVP has a reproducible effect by itself ( Figure 4). This could be of relevance, especially considering the effect of biomaterials on infection capacity of S. epidermidis in patients that make it worthwhile to further analyse the effect of PVP on infection in future experiments. We have analysed the effect of S. epidermidis infection over time on gene expression using Unigene clusters and ENSEMBL codes as specified in the raw data table of Supplementary file 4. Annotation of these probes by Gene Ontology (GO) shows that the most noticeable result is that S. epidermidis does induce many immunerelated genes starting from 3 dpi, observing a maximum induction of their expression at 4 dpi. Filtering the results ( Figure 5) we found an effect on the expression levels of many genes in the earliest measured time point (6 hpi) after exposure to injected bacteria. This effect is diminished to only a few genes whose expression is affected at the time point of 2 dpi, most of which cannot be assigned to a GO category (Figure 4). At 6 hpi, GO analysis indicated very broad classes of gene functions whose expression are affected by infection but did not reveal an obvious link to the immune response since the broad GO category "immune response" was not represented. At this stage the known innate immune responses to bacterial infection are not yet apparent. For instance neutrophils and macrophages have not yet developed and nothing is known about the function of pattern recognition receptors before this stage. We are currently studying the function of the expressed Toll-like receptors during early stages of embryogenesis [31]. We have manually annotated various functional categories of genes, of which the transcription levels were strongly affected by infection during time as shown in Figure 5 in a schematic representation and in Supplementary file 5 in a quantitative manner. Many of the immune genes indicated in Figure 5 have been previously linked to expression in cells of the myeloid lineage in zebrafish [32]. The immune transcriptome response correlates with the infection progression as described above. The first 3 days, the bacteria accumulate inside the embryonic yolk. This apparently does not lead to significantly induction or repression of many immune-related genes. From 3 dpi onwards many immune-related genes were significantly induced with a peak at 4 dpi ( Figure 5). At 5 dpi there were slightly less immune-related genes significantly expressed than at 4 dpi. Expression levels of the 49 selected genes shown in Figure 4 at 5 dpi were also lower compared with 4 dpi (Supplementary file 5). Since the analysed larvae were from the same injected batch it seems that higher microbial burden is not strictly correlated with a stronger immune response. Validation of this micro-array experiment was performed by deep sequencing analysis of RNA samples derived of 4 batches of approximately 150 embryos at the 5 day time point of infection and 4 non-infected batches of embryos. The data confirms the micro-array data as exemplified for some of the most reliable probes (Table 1). Furthermore we have compared the normalized reads per kilobase per million mapped reads (RPKM) [33] for the genes shown in Figure 5. These comparisons show that only in a few cases there are discrepancies between the results of the two technologies. Since the RNA deep sequencing results are obtained with a pool of larger number of biological samples, this could indicate that in these cases the micro-arrays result are less trustworthy. However, it was noted that in cases of discrepancy there were extremely low levels of expression resulting in a very limited number of mapped reads, showing that even with a sequencing depth of at least 20 million reads per sample there is still a limitation of sensitivity of RNA sequencing. This is of note because in most publications currently a sequence depth of 20 million reads is standard for RNA deep sequencing [34,35]. In several cases, such as il8, there was no ENSEMBL annotation of the gene that could be used for RPKM analysis. We have manually quantified the number of reads mapping to il8 to show that there is also induction after infection as in the case of the micro-array analysis ( Figure 6). RNA deep sequencing provides a much more detailed insight in gene regulation for instance showing also expression levels for every exon of the genes as shown for the representative genes mmp9 and il8 ( Figure 6). With the expected progress in development of high-throughput bioinformatic pipelines for data visualization of RNA sequencing data sets, we and others will be able in the near future to further harvest information from our submitted expression datasets as to the effects of infection on differential splicing, transcription start sites or polyadenylation sites in the entire transcriptome. The Venn diagrams show the number of significantly up-regulated probes (top row) or down-regulated probes (bottom row) (P-value smaller than 10 -8 and fold changes larger than 2 or smaller than -2) between S. epidermidis injected versus non-injected, S. epidermidis injected versus needle puncture and S. epidermidis injected versus PVP at 6 hpi, 1, 2, 3, 4 and 5 dpi. Data at 4 dpi are based on a biological replica. Pie diagrams represent GO annotation using DAVID bioinformatics resources 6.7 [40,41] from the overlapping genes (in white) at the representative time points.

Comparison of transcriptome responses to S. epidermidis and M. marinum
In order to compare the transcriptional response observed in zebrafish embryos infected with S. epidermidis with the response triggered by a pathogenic bacterium, we also performed an injection experiment with M. marinum using the same experimental protocol and sampling at 5 dpi. This time point was chosen in order to make it comparable to previous studies in which the caudal vein was used as the injection site [23]. We performed micro-array analysis, confirming the biological relevance of the yolk injection system since many markers that were previously identified to be differentially expressed in the caudal vein injection system [23] appeared regulated in a similar manner in the high-throughput yolk infection system (Figure 7 and Supplementary file 6). We observed a stronger transcriptional response of immune related genes than what we observed with caudal vein administration, which can be explained by the fact that bacteria accumulated more strongly after five days compared to the caudal vein injection method and have been present one more day inside the embryos. A number of the immune markers identified to be differentially expressed after infection with S. epidermidis appeared regulated in the same way after M. marinum yolk infection. These include the matrix metalloproteinases, complement factors, cytokines and heat shock proteins that were previously also identified in M. marinum infection in the caudal vein [23]. There are also distinct differences in genes responding to infection by these two different bacteria. Most obvious is a stronger transcriptional response of a number of relevant genes to M. marinum than to S. epidermidis infection. Interestingly, there is also a category of genes that are highly regulated by S. epidermis but not significantly by M. marinum in the yolk infection model. These genes include various immune related genes such as il8, il12a, tnfb, lect2l, and transcription factor atf3, junba, junbb,     Figure 6). Here we clearly benefit from the power of RNA sequencing that can overcome the problems of micro-array probe annotation for complex gene families. Therefore using our unbiased approach we were able to confirm known immune genes as markers for staphylococci infection but also identify novel markers as good candidates for specific response markers of S. epidermidis in our infection model that we will study further in functional analysis in the near future.

Conclusions
Microscopic imaging showed that S. epidermidis when injected into the yolk or caudal vein proved to be far less virulent than S. aureus. Under the same conditions of yolk injection S. aureus immediately invades the body of embryo causing 100% mortality within 3 dpi. In contrast, during the five day time period analysed, S. epidermidis proliferates efficiently in the entire body of the infected embryos providing an excellent sample at the right side of each gene. Green boxes represent up-regulation and the purple boxes downregulation with a P-value smaller than 0.02 and fold changes larger than 2 or smaller than -2. Genes were manually annotated and assigned to functional groups based on GO annotations of the zebrafish genes and their human homologues and on searching on PubMed abstracts. The D. rerio Uni-Gene Build # 124 or ENSEMBL Zv9 codes were used as shown with the raw data table in Supplementary file 6).
system for analysis of factors that influence bacterial proliferation and virulence. Based on this advantage, we have developed a versatile high-throughput analysis system for bacterial proliferation that is much less time consuming than cfu count determinations, and which allows repeated measurements of the same embryos over time. COPAS analysis proved to be accurate to determine the bacterial burden inside embryos at high-throughput. With addition of sorting zebrafish embryos into multi well plates, for automated confocal laser scanning microscopy, a medium-throughput, high-resolution screenings system can be added. We therefore have extended the high-throughput infection methods developed by Carvalho et al. [12] to a quantitative level and showed the applicability for the analysis of proliferation of opportunistic pathogens such as S. epidermidis. Our over time transcriptome analysis results correlate very well with the infection pattern of S. epidermidis. The bacteria will grow for the first 2 to 3 days inside the yolk of the embryos, while from 3 and 4 dpi S. epidermidis invade the body of the embryo, at which stage a strong response of many immune related genes occurs. We have compared transcriptome response in the same system using M. marinum. These comparisons show that M. marinum has a far stronger effect on host gene regulation than S. epidermidis. However, some genes were identified that specifically responded to S. epidermidis and not to M. marinum infection including a cell adhesion gene (mfap4, ENSEMBL 4 of 13) that can be linked to specific infection by staphylococci in mammals. Vuong et al. [38] and Otto et al. [26] already reported that S. epidermidis itself does not seem to have particular specific virulence factors. All known putative virulence factors have origins in the commensal lifestyle of this species. However, the large difference between the outcome of injection of bacteria into the yolk or caudal vein could have been caused by an effect of prolonged growth of the bacteria in the host organism resulting in a higher virulence when the bacteria are release in other tissues. In our future research we aim to use our identified host marker genes to identify new bacterial traits involved in proliferation in host tissues and the factors that determine their expression during time with emphasis on the time points when bacteria get in contact with immune cells. We are particularly interested in the effect of biomaterials on possible virulence factors that make virulence deviate from the commensal life style. This can help to understand which host mechanisms and genes are involved during biomaterial-associated infections.

Bacterial strains and growth conditions
S. epidermidis strain O-47, containing the GFP expression vector pWVW189 or a derived mCherry expression vector (De Boer L. unpublished) and S. aureus strain RN4220 pWVW189 (De Boer L. unpublished) from LB (Luria Bertani) agar plates were cultured overnight at 37°C in 25 ml LB medium supplemented with 10 μg/ml chloramphenicol to mid-log stage. M. marinum strain E11 was grown as described in Carvalho et al. [12]. Two reaction vials with 1 ml of the culture were centrifuged at 14680 rpm for 1 min.
The pellets were combined and washed three times with 1 ml PBS. Suspensions were prepared based on the OD 600 and by plating and cfu determination. The inocula were suspended in 2% polyvinylpyrrolidone 40 (PVP 40 , CalBiochem) to 5.0×10 6 , 1.0×10 8 cfu/ml.

Zebrafish husbandry
Zebrafish were handled in compliance with animal welfare regulations and maintained according to standard protocols (http://ZFIN.org). Embryos were grown at 28°C in egg water (60 μg/ml Instant ocean sea salt, Sera Marin). The egg water was refreshed every day.

Experimental design of infection study
Infection experiments were performed using mixed egg clutches from wild type AB×TL or Tg(UAS:KAEDE/MPEG1:GAL4) strain zebrafish [15]. Embryos were staged at 2 hpf by morphological criteria, and 20 cfu of mCherry or GFP expressing S. epidermidis O-47 bacteria suspended in 2% PVP 40 were injected into the yolk. As a control an equal volume of 2% PVP 40 was likewise injected. Manual injections were controlled using a Leica M50 stereomicroscope together with a FemtoJet micro-injector (Eppendorf) and a micromanipulator with pulled micro capillary needles. Automated micro-injections were performed as described in Carvalho et al. [12].

Microscopy
A Leica fluorescence (MZ 16 FA) stereo microscope and Leica TCS SPE confocal microscope were used to take images of zebrafish embryos. Embryos were kept under anaesthesia (0.02% buffered 3-aminobenzoic acid ethyl ester (Tricaine, Sigma) in egg water) during imaging.

COPAS analysis
Zebrafish embryos were measured alive every 24 hours until 5 dpi with the COPAS XL using the setting as described below. Photo multiplier tubes (PMT) voltage: 650 V for green/red and 0 V for yellow. Optical density threshold signal was set to 975 mV (COPAS value: 50) and the time of flight (TOF) minimum to 320 μs (COPAS value: 800) in order to reduce the influence of debris.

Cfu count
Injected embryos were collected into a 2 ml reaction vial with a sterile 5 mm stainless steel bead and PBS. The reaction vials were vigorously shaken for 30 seconds at 30 revolutions per second in a shaker (Retsch MM301). All suspensions were diluted, plated in duplicate on LB agar supplemented with 10 μg/ml chloramphenicol, and incubated overnight at 37°C. The following day, colonies were counted using a fluorescence stereo microscope (Leica MZ125).

Micro-array
Seven parent zebrafish couples kept separately from one another for mating the following week, to perform an identical experiment for a biological replicate. Injections were performed from the 16  GE2_107_Sep09 protocol for two-color Agilent micro-arrays. Micro-array data was processed using Rosetta Resolver 7.2 (Rosetta Biosoftware). S. epidermidis infection groups were compared to the PVP, needle puncture and non-injected control groups using the Rosetta common reference re-ratio experiment pipeline. Significance cut off for the ratios of S. epidermidis versus PVP, S. epidermidis versus needle puncture and S. epidermidis versus non-injected were set at 2 fold change at P-value smaller than 10 -8 . Pathway analysis was performed using the Pathvisio software package (www. pathvisio.org) [39] with the same significance cut off. The raw micro-array data have been deposited in the NCBI GEO database under accession number GSE42847 and GSE44352. DAVID bioinformatics resources 6.7 [40,41] was used for gene ontology analysis.

RNA deep sequencing
Validation of micro-array data was performed by RNAseq analysis. Ten parent zebrafish couples were kept separately from one another for mating the following week, to perform an identical experiment for 4 biological replicates. Injections were performed at approximately 2 hpf using the automated micro-injection system. At 5 dpi, embryos were collected from the 2 hpf injected and non-injected group, snap frozen in liquid nitrogen, and stored at −80°C for RNA isolation. Twenty cfu of S. epidermidis O-47 pWVW189 were injected to obtain 150 embryos per sample. For M. marinum infected embryos approximately 1000 embryos were used with 30 cfu injected per embryo. Embryos were homogenized in 1 ml of TRIzol reagent (Invitrogen), and total RNA was extracted according to the manufacturer's instructions. RNA samples were treated with DNaseI, (Ambion) to remove residual genomic DNA. RNA integrity was analysed by Lab-on-a-chip analysis (Agilent). The average RIN value of the RNA samples was 9.7 with a minimum of 9.5. A total of 3 μg of RNA was used to make RNA-Seq libraries using the Illumina TruSeq RNA Sample Preparation Kit v2 (Illumina Inc., San Diego, USA). In the manufacturer's instructions two modifications were made. In the adapter ligation step 1 μl instead of 2.5 μl adaptor was used. In the library size selection step the library fragments were isolated with a double Ampure XP purification with a 0.7x beads to library ration. The resulting mRNA-Seq library was sequenced using an Illumina HiSeq2000 instrument according to the manufacturer's description with a read length of 2 × 50 nucleotides. Image analysis and base calling was done by the Illumina HCS version 1.15.1. Sequence reads were quality trimmed using the quality_trim module in the CLCbio Assembly Cell v4.0.6. Filtered reads were mapped to ENSEMBL transcripts (Zv9_63) using the ref_ assemle_short module in the CLCbio Assembly Cell v4.0.6. Accumulation of transcripts to ENSEMBL genes was done by first converting the mapping files to a table with the assembly_table module in the CLCbio Assembly Cell v4.0.6. Secondly, a custom script was used that sums all reads belonging to the same gene. Non-uniquely mapped reads were divided between genes according to their ratio of uniquely mapped reads. Finally, read counts of transcripts belonging to the same gene were summed to obtain count data at ENSEMBL gene level. Fold-change and differential expression significance values were calculated from gene level read counts using the DESeq package version 1.8.3) available in Bioconductor (version 2.10). DESeq utilizes a negative binomial distribution for modelling read counts [42]. Secondly reads were counted per exon with a python script (Lodder R. unpublished). Sorted sam files were obtained from the raw fastq files through Bowtie2 [43] and samtools [44]. The raw RNAseq data have been deposited in the NCBI GEO database under accession number GSE42847 and GSE44352.