### Empirical RNA-Seq data

RNA-Seq read data used in the present analysis was taken from a previous study
[8] and is available through the sequence read archive (SRA id GSE20895). 14 RNA-Seq datasets were used, each corresponding to the sequencing reads from one “lane,” which is a physical compartment in the flow cell of the Illumina GA-II instrument. The 14 lanes corresponded to 4 RNA-Seq libraries, whereby 3 lanes of sequence data were available for each of two of the libraries and 4 lanes for each of the other two. Each library was synthesized from a different source of RNA. RNA sources corresponded to two experimental conditions, “SNL” (spinal nerve ligation) or “control.” Two independent RNA-Seq libraries, “biological replicates,” were available for each condition. The published analysis
[8] found that biological replicates were similar and that the 2 experimental conditions were meaningfully different. Technical replicates were not compared in the previous report (instead reads originating from different lanes corresponding to the same RNA-Seq library were pooled) but are included in the comparisons made in the present study.

Additional file
1: Table S1 lists the condition and replicate identification corresponding to each of the 14 lanes. The table illustrates how the available data allowed for three types of comparisons between RNA-Seq datasets: pairs of technical replicates (same library sequenced on different lanes); pairs of biological replicates (same experimental condition); and the two experimental conditions “SNL” *versus* “control”.

### Mapping and annotation

RNA-Seq reads (50bp) were aligned to the rat reference genome (RGSC 3.4) by Bowtie
[19]. We allowed for a maximum of two mismatches and considered only the uniquely mapped reads for the downstream analysis. This filtration step resulted in 108,636,496 million UMRs to the genome. Genome annotation ENSEMBL 65 was used including 22,921 protein coding genes. Overlapping exons of genes having multiple isoforms were combined resulting in a total of 222,097 exons (Additional file
2). For the subsequent analysis exons served as bins, i.e., reads aligning to each exon were counted and the sum noted in the master read count file (Additional file
3). The file has 222,097 rows (bins corresponding to exons) and 14 columns (corresponding to each of the lanes described above).

### A universal pool of RNA-Seq reads for the simulation experiments

All uniquely mapped reads (from lane 1 to 3) from the first “control” RNA-Seq sample were combined resulting in 22.9 × 10^{6} reads in order to create a universal pool. The datasets for the in silico duplicates and replicates described below were generated from this pool. The in silico replicates created from the universal pool of RNA-Seq reads by random drawing are by definition only different due to stochastic (Poisson) variation of the sampling process (see Results). Similarly all 3 lanes from “control 2” were combined to create a second pool used as “contaminant” in the contamination experiments.

### In silico replicates: “Perfect” replicates

A set of RNA-Seq datasets faithfully representing Poisson variation only (perfect non-identical replicates) was generated by randomly choosing sets of 5 x10^{6} reads from the universal pool by using the “sample” function in R (Additional file
4) and a Java script (Additional file
5). This process was repeated 200 times and used as reference of perfect replicates in the subsequent benchmarking of the different test statistics. The same procedure was further repeated with different set sizes (0.5 to 10 million reads per sample) in order to test for the influence of total read counts on the statistics.

### In silico contamination

To test whether the statistical measures were sensitive to actual differences, we contaminated one in silico replicate out of a pair with 0;5;10;25;50;75;100% of a biological replicate (“control 2”) via computer simulation. In detail, the first sample was created by randomly drawing 5 million reads of the universal pool of “control 1” and the second by drawing x% of reads from “control 1” and y% of reads from file “control 2”, whereby x+y=100, corresponding to 5 million reads. The procedure was repeated 200x.

### Processing of the empirical data

For Pearson’s correlation coefficient and Kappa, the 3 lanes of each of the two “control” and the 4 lanes for each of the two “SNL” condition were compared in a pairwise fashion, resulting in a total of 18 technical replicate comparisons (see Figure
4). To compare the performance of the statistics on biological replicates we compared the first lane of “control 1” to the first lane of “control 2”, and lane 1 from “SNL 1” to lane 1 from “SNL 2”. To test whether Pearson’s *r* and Kappa were sensitive to different experimental conditions, we selected the first lane of each “control” and compared it to the first lane of each “SNL” sample. SERE is not restricted to pairwise comparisons, but allows to compare multiple samples or lanes simultaneously. Therefore, SERE yielded 4 values for the technical replicate comparisons, 2 for biological and 4 for different experimental conditions.

### Simple Error Ratio Estimate (SERE)

Given a set of N exons and M lanes, let

*y*
_{
ij
} denote the number of reads covering the

*i*
^{
th
} exon in the

*j*
^{
th
} lane. Let

*L*
_{
j
} be the total read count for lane

*j*,

*E*
_{
i
} the total for exon

*i*, and T the grand total count across all lanes and exons. Under the hypothesis that the lanes are simple technical replicates of each other, they will have a Poisson distribution with one parameter. This parameter can be thought of as the expected number of reads for the lane

*j* and the exon

*i*. Its estimate can be calculated using eq. 1.

The expected variation for each observation

*y*
_{
ij
} is

, and the expected variation under the Poisson assumption is

. This gives a per exon overdispersion estimate of:

The denominator is (*M* − 1) due to the constraint that
for each exon *i*.

Averaging over all N exons we have:

The SERE estimate is
.

Simple algebra shows that for a singleton count, i.e., an exon that appears only once in only one of the lanes, SERE equals exactly 1. That is, singletons shrink the overall SERE estimate towards 1, whether or not the samples are actually replicates. Therefore, we modify the average in equation 2 to sum over only the non-singleton counts. The R code to calculate SERE is provided in Additional file
6.

The unmodified measure of equation 2 is the measure of Poisson over-dispersion most often used in generalized linear models, see for instance the classic textbook of McCullagh and Nelder
[20]. An alternative measure of overdispersion is based on the deviance statistic. Brown and Zhao
[21] consider this measure along with two others in the context of random arrival data, e.g. calls to a support center, and show that it is inferior to *s*
_{
i
}
^{2} (equation 2) whenever there are small values for
. Their work corresponds to the case where all lanes have the same total count *L*
_{
j
}; extending their method shows that (*M* − 1)*s*
_{
i
}
^{2} will be distributed as a chi-square random variable with (*M* − 1) degrees of freedom and non-centrality parameter
. The parameter *p*
_{
ij
} is the true fraction of exon *i* within sample *j* and
the fraction of exons in the (hypothetical) pooled sample. Under the null hypothesis of sample equality
the non-centrality parameter is zero. *N*(*M* − 1)*s*
^{2} will follow a chi-squared distribution with *N*(*M* − 1) degrees of freedom. This can be used to set 99 % confidence intervals on the non-centrality parameter, and through that on the SERE estimate. We provide the corresponding R code in Additional file
7. Bins containing a read count of 0 in both samples of a pair contain no information and were therefore excluded from the comparative analyses of SERE and the two alternative parameters described below.

### Pearson’s correlation coefficient

For a pair of lanes, we calculated the RPKM
[1] for each lane. In the RPKM calculation we added a pseudo count of one read to each of the exons. The RPKM values were then log transformed and the Pearson’s correlation was calculated. The process was repeated using the raw counts to verify any influence of the RPKM transform itself, and using both the log and the square root of the raw counts. The latter is the variance stabilizing transform for Poisson data. None of these had any substantive impact on the results (data not shown).

### Cohen’s simple Kappa statistic

The read counts were normalized to RPKM and divided into 9 bins of size: 0, 1-10, 11-20, 21-40, 41-80, 80-160, 161-320, 321-1000 and greater than 1000 RPKM as it was suggested by McIntyre et al. In order to compare a pair of replicates, a 9 × 9 table of counts was constructed, whereby each exon pair added to a cell of the table (see Additional file
8). In this way, exons that were in agreement added to the diagonal of the table, whereas the total fraction of off-diagonal entries contributed to a measure of non-agreement.