Relative power and sample size analysis on gene expression profiling data
© van Iterson et al; licensee BioMed Central Ltd. 2009
Received: 19 February 2009
Accepted: 17 September 2009
Published: 17 September 2009
With the increasing number of expression profiling technologies, researchers today are confronted with choosing the technology that has sufficient power with minimal sample size, in order to reduce cost and time. These depend on data variability, partly determined by sample type, preparation and processing. Objective measures that help experimental design, given own pilot data, are thus fundamental.
Relative power and sample size analysis were performed on two distinct data sets. The first set consisted of Affymetrix array data derived from a nutrigenomics experiment in which weak, intermediate and strong PPARα agonists were administered to wild-type and PPARα-null mice. Our analysis confirms the hierarchy of PPARα-activating compounds previously reported and the general idea that larger effect sizes positively contribute to the average power of the experiment. A simulation experiment was performed that mimicked the effect sizes seen in the first data set. The relative power was predicted but the estimates were slightly conservative. The second, more challenging, data set describes a microarray platform comparison study using hippocampal δ C-doublecortin-like kinase transgenic mice that were compared to wild-type mice, which was combined with results from Solexa/Illumina deep sequencing runs. As expected, the choice of technology greatly influences the performance of the experiment. Solexa/Illumina deep sequencing has the highest overall power followed by the microarray platforms Agilent and Affymetrix. Interestingly, Solexa/Illumina deep sequencing displays comparable power across all intensity ranges, in contrast with microarray platforms that have decreased power in the low intensity range due to background noise. This means that deep sequencing technology is especially more powerful in detecting differences in the low intensity range, compared to microarray platforms.
Power and sample size analysis based on pilot data give valuable information on the performance of the experiment and can thereby guide further decisions on experimental design. Solexa/Illumina deep sequencing is the technology of choice if interest lies in genes expressed in the low-intensity range. Researchers can get guidance on experimental design using our approach on their own pilot data implemented as a BioConductor package, SSPA http://bioconductor.org/packages/release/bioc/html/SSPA.html.
Genome-wide technologies such as microarray and sequencing are intensively used to study differential expression in e.g. disease and/or treatment, often compared with controls. Power and sample size analysis give valuable information about the performance of the experiment: what is the optimal number of replicates? Is the power sufficient to detect a biological effect?
Absolute power and sample size estimation must be done using pilot experimental data in each problem separately, as they are influenced by variability that is both technical as well as biological . For this a technology must be chosen. So it is important to understand beforehand how relative power and sample size behave depending on the technology used. In particular, different technologies may display different power depending on the gene expression range.
Here we focus on estimating relative change in power and sample size, given either different effect sizes or different expression profiling technologies. In each case results are derived from pilot experiments, so conclusions relate directly to practice. The different expression profiling technologies include commercial and home-spotted gene-expression microarray platforms, as well as a deep sequencing technology. For power and sample size calculations, we adapted the method proposed by Ferreira et al. .
Power and sample size estimation
where λ represents the density of effect sizes θ and π0 the proportion of non-differentially expressed genes.
where Γ represents the power for a single gene as function of the p-value u and effect size θ and δ is the user-defined false discovery rate. In fact, Ferreira et al.  showed that the average power, given by equation 2, is controlled for multiple testing through the adaptive Benjamini-Hochberg method . This is essentially the same as the originally proposed false discovery rate method of Benjamini and Hochberg , corrected by the proportion of differentially expressed genes to avoid over-estimation.
The effect size density is estimated unconstrained, so after it is obtained it must be constrained to being non-negative, whilst integrating to 1. To avoid discontinuities where the constraint is applied, we re-adjust the π0 estimate (see Additional file 1).
Note that to estimate the average power in this way involves also the power to detect effect sizes around zero, which are technically very difficult to measure accurately. A small region around zero can be defined that will be excluded from the density of effect sizes and thereby increases the estimated average power.
A simulation study was performed based on the simulation perviously described by Langaas et al. . While keeping the proportion of non-differentially expressed genes fixed at 0.8, we varied the effect size distribution. Specifically, three different effect size distributions were constructed based on a symmetric bitriangular distribution . This means that each differentially expressed gene is either over- or under-expressed with equal probability, and that the mean effect size per over-expressed gene is chosen at random from values in a window between a = log2(1.2) and b > a, with the mode at m = log2(2), and for under-expressed genes a window between -a and -b is used, with the mode at -m. We used three values for b, namely log2(2), log2(4), log2(5), generating situations with weak, intermediate and strong effect sizes respectively. This means for example that the weak-effect situation has genes with effect sizes between log2(1.2) and log2(2).
A total of N = 250 simulations were performed with 20000 independently generated normalized expression values and J = 5 samples in each of two groups. Test-statistics were calculated as described by Langaas et al. . The observed power was calculated as the number of differentially expressed genes with Benjamini-Hochberg FDR ≤ 10% (true positives), divided by m1 = 4000. In addition, the estimated power was evaluated using the method proposed by Ferreira et al. .
Example 1 - different biological effect sizes
Overview of data sets in Example 1 and 2
strong, 6 hours
intermediate, 6 hours
weak, 6 hours
strong, 5 days
weak, 5 days
Example 2 - different expression profiling platforms
Gene expression profiles in the hippocampi of transgenic δ C-doublecortin-like kinase mice were compared to wild-type. The sample size and power analysis was applied to expression data relating to the same samples, measured either by one of several microarray platforms, or by Solexa/Illumina deep sequencing .
In this analysis we included expression data for the same samples obtained with four microarray platforms, previously analyzed in the platform comparison study of Pedotti et al. , namely Affymetrix, Agilent, Illumina and home-spotted oligonucleotide arrays, the last containing the 22K Sigma-compugen collection. Ten microarrays were used for each platform. For the one-color platforms (Affymetrix and Illumina), each individual RNA was hybridized to one microarray, yielding five hybridizations in each of the groups wild-type and transgenic mice. A direct design was used for hybridization of the two-color arrays (Agilent and home-spotted), i.e. each microarray was hybridized with two RNA samples from different groups. Dye-swapped hybridizations were done with non-identical pairs. This design yielded ten hybridizations in each of the groups, five of which being biologically independent. The data has been pre-processed as described by Pedotti et al. . mRNA expression levels from four samples of each group were also measured using the Solexa/Illumina deep sequencing; for details about sample preparation and pre-processing, see . Moderated t-test statistics were calculated, for each technology, using the empirical Bayes linear regression model as implemented in limma .
To enable comparison between technologies, for each technology subsets were generated containing only genes mapped to the same Ensembl tags, leaving 9504 genes per technology . The datasets were stratified into three subsets of genes of approximately equal size (3168) as follows. First the median-expression values across wild-type samples for the Solexa/Illumina data were calculated, per Ensembl tag. Then the 33rd and 67th percentiles of these values were obtained. These two values were used to determine three strata: tags with Solexa/Illumina median -expression below 33rd percentile were put into the low-intensity range, followed by tags with median-expression between 33th and 67th percentiles being put into the intermediate-intensity range, and finally remaining tags were put into the high-intensity range. In the power and sample size analysis a region near zero ([-0.5, 0.5]) was excluded from the density of effect sizes.
We used the R  environment for statistical computing for all calculations, the BioConductor  packages gcrma , limma  and multtest , qvalue  and for graphs we used the lattice  package. We adapted the original S-Plus scripts kindly made available to us by Ferreira  and developed an R package called SSPA, available from BioConductor  http://bioconductor.org/packages/release/bioc/html/SSPA.html.
Example 1 - different biological effect sizes
Estimated power and adjusted proportions of non-differentially expressed genes for the pilot studies.
strong, 6 hours
intermediate, 6 hours
weak, 6 hours
strong, 5 days
weak, 5 days
Example 2 - different expression profiling platforms
This study was designed to explore the capabilities of expression profiling technology to identify subtle differences in gene expression. To estimate power, it is essential to estimate the proportion of non-differentially expressed genes (π0). In table 2 (right-panel) we report π0 for the different expression profiling platforms, as estimated by the method of Langaas et al. . We used alternative methods for estimation of π0 with essentially similar results (see Additional file 2). The estimated proportion of non-differentially expressed genes differ significantly between the different technologies, with Illumina displaying the highest proportion (right-panel of table 2). However, perhaps the most marked point is that the power is generally low for all technologies used. This is a reflection of the fact that the effects on expression are mostly subtle.
Considering the two one-color platforms, Illumina yields less power than Affymetrix in this experiment (see right-panel of table 2). This is also reflected by the estimated densities of effect sizes (not shown).
We aim at helping researchers to understand: a) how much change in sample size different expression profiling technologies require to yield the same power; and b) by how much the sample size required for a fixed power would change, if smaller effects were to be detected. In order to do so, we use experiments carefully designed to answer these questions. Our experiments use tissue samples from animal models, commonly used in practice. Not only our conclusions are easily applicable to the design of microarray experiments, but also our approach is available for researchers to use on their own data via our BioConductor package SSPA.
For our study, we use the method proposed by Ferreira et al. . It makes use of a pilot study to estimate distribution of effect sizes and proportion of non-differentially expressed genes, based upon which power is estimated.
Various other methods to estimate sample size and power in microarray studies have been proposed. The first few assumed that multiple testing correction is done via control of the familywise error rate [20, 21], which is unlikely to be the case in current practice. More recently, methods were proposed to handle the more commonly used control of false discovery rate [22–25, 2]. The methods differ in how they treat the distribution of effect sizes; simpler methods assumed a fixed value for all differentially expressed genes, or took a subset of the largest effect sizes [22, 25]. These are unlikely to correctly describe effects from experimental studies. Recently proposed methods [2, 24, 26] estimated the distribution of effect sizes from a pilot data set. Ferreira  assumes that the test statistics follow a normal distribution, which is unlikely to be the case. However, the extension of Ferreira's method to consider statistics with a more suitable Student-t distribution is not trivial. Indeed, to solve equation 2 is a much harder analytical problem under the Student-t distribution. Jørstad et al.  proposed to solve this problem by discretizing the effect sizes and then estimating the components of the resulting mixture. Ruppert et al.  proposed to estimate the density of effect size by a linear combination of splines optimized via penalized least squares. The number of parameters that needs to be estimated by both methods is considerably larger than by the method of Ferreira et al. , making them computationally much more intensive. For this reason, we chose to use Ferreira's method.
The simulation experiment shows that the estimated power is in agreement with the observed power. Important for the power estimation is the estimation of the proportion of non-differentially expressed genes. The π0 estimates of the simulation experiment were all conservative and a little less than π0 = 0.8, which may have led to overestimation of the power. However, the opposite was observed: the estimated power was found to be conservative. Generally speaking it is good to be on the conservative side in power calculations. However, this may not be the case in other applications. Indeed, we have observed that the relationship between observed and estimated power depends on both π0 and the effect size distribution (data not shown). This issue deserves further investigation but is beyond the scope of this paper.
Both our simulation study as well as example 1 demonstrate that our method correctly ranks the power of different experimental scenarios and is thus suited to evaluate the relative capacity to identify differentially expressed genes. Some of the results obtained with this method are as expected: in example 1, potent compounds yield higher power than weaker compounds, as does longer exposure time compared to a shorter one (example 1). The hierarchy of PPARα-activating compounds found with this method confirms previously biologically-driven analysis  (Hooiveld et al. manuscript in preparation). However, some results are unexpected, such as that the power seems to increase little after a certain sample size (12, for 5-days exposure). Since the power for 12 samples per group is already acceptably high (80% for the stronger compound, over 70% for the weaker one), this result suggests that it is useless to analyse more than 12 samples to find differentially expressed genes.
The higher power for the longer-exposed compounds is associated with a markedly bimodal density of effect sizes (figure 2). This density represents additional and important information for the researcher. In this case, for example, it can be seen that there are more genes up- regulated than down-regulated after short-term exposure, and that after long-term exposure this is more balanced, suggesting that up-regulation is kicked off earlier on. So, we have shown how the density of effect sizes reflects varying distributions of differential expression. This is intuitive and seems trivial, but no other method has produced this result before .
Example 2 shows great difference in performance for different expression profiling technologies to detect a subtle biological effect. Commercial microarray platforms perform better than the home-spotted mainly due to their higher reproducibility. The poor performance of Illumina may be attributed to the fact that less probes are used than Affymetrix. By estimating power separately for genes expressed in intensity ranges varying from low to high, we can clearly see the added value of the Solexa/Illumina deep sequencing technology compared with microarrays. Due to background, microarray platforms often cannot reliably measure expression in the low-intensity range. Indeed, Solexa/Illumina displays with 4 replicates per group the same power as Agilent and Affymetrix with 7 replicates per group (figure 4). The reduced power of microarray technologies is shown to be mainly due to lack of power to detect differential expression for lowly expressed genes, possibly due to presence of background intensities, a problem that does not affect Solexa/Illumina.
The estimated proportion of non-differentially expressed genes p0 shows great difference between the different expression profiling technologies. This is likely due to differences in genome mapping  and hybridization efficiency. Since it is known that the same samples were hybridized to the different platforms, one might wonder if by using a common, fixed π0, value for all platforms more consistent results would have followed. That is not the case: power estimates are robust to variations of roughly 10% around the estimated π0 value (data not shown). Moreover, we believe this should not be done, as effects of technical differences between the platforms would then have been ignored, which is undesirable. So, we believe that for each experiment π0 should be estimated from the data, and that a sensitivity analysis may help reassure the researcher that power estimates are reliable.
Our example 2 is not trivial and no previous article, to our knowledge, has produced power and sample size calculations in such datasets. Indeed, the MAQC study [27, 28] only involves technical replicates. So we do believe our work, by involving not only 4 different microarray platforms, but more importantly also a deep sequencing technology, does yield new knowledge. In particular, no previous work has shown that deep sequencing technology displays more power than microarrays in the presence of biological, in addition to technical, variability.
The objective of the study plays an important role in choosing the FDR-control level. Indeed, if results are meant to be further explored via high-dimensional tools such as pathway analysis, then there is interest in having a longer list of possibly interesting features, albeit with a larger FDR. This effectively expands the space on which the subsequent analysis tool will look for associations, improving the chance of finding more subtle, and for that less obvious, ones. If, on the other hand, the list of selected features must be validated by time-consuming and labour intensive techniques, then a shorter list obtained with as low an FDR as possible is the best. For our objectives, we needed only to have at least a few features selected at each instance in order to be able to draw comparisons, but preferably not too many, and we find this is achieved by controlling the FDR at 10%. We made a fast and easy-to-use implementation of a power and minimal sample size calculation method adapted from Ferreira et al. . The only input needed is a set of test statistics obtained from the pilot data and the number of samples of the two groups. More details about this R package will appear elsewhere.
In conclusion relative power and sample size analysis can help researchers make important decisions about technology used for gene expression profiling. We showed that if interest lies in genes expressed in the low-intensity range Solexa/Illumina deep sequencing is the superieur technology compared to microarray technology. Furthermore, we have implemented our method via a BioConductor package so that other researchers can use it on their own pilot data.
This work was part of the BioRange programme of the Netherlands Bioinformatics Centre (NBIC), which is supported by a BSIK grant through the Netherlands Genomics Initiative (NGI). The work was conducted within the Centre for Medical Systems Biology (CMSB), established by the Netherlands Genomics initiative/Netherlands Organisation for Scientific Research (NGI/NWO).
- Kerr M, Churchill G: Statistical design and the analysis of gene expression microarray data. Genetics Research. 2001, 77 (2): 123-128.Google Scholar
- Ferreira J, Zwinderman A: Approximate Power and Sample Size Calculations with the Benjamini-Hochberg Method. International Journal of Biostatistics. 2006, 21: 8-Google Scholar
- Langaas M, Lindqvist B, Ferkingstad E: Estimating the proportion of true null hypotheses, with application to DNA microarray data. Journal of the Royal Statistical Society, Series B. 2005, 67 (4): 555-572. 10.1111/j.1467-9868.2005.00515.x.View ArticleGoogle Scholar
- Benjamini Y, Krieger A, Yekutieli D: Adaptive linear step-up procedures that control the false discovery rate. Biometrics. 2006, 93 (3): 491-507. 10.1093/biomet/93.3.491.View ArticleGoogle Scholar
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 1995, 57: 289-300.Google Scholar
- Ferreira J, Zwinderman A: Approximate Sample Size Calculations with Microarray Data: An Illustration. Statistical Applications in Genetics and Molecular Biology. 2006, 5: 25-10.2202/1544-6115.1227.View ArticleGoogle Scholar
- Desvergne B, Wahli W: Peroxisome proliferator-activated receptors: nuclear control of metabolism. Endocrine Reviews. 2006, 20 (5): 649-688. 10.1210/er.20.5.649.Google Scholar
- Sanderson L, de Groot P, Hooiveld G, Koppen A, Kalkhoven E, Müller M, Kersten S: Effect of synthetic dietary triglycerides: a novel research paradigm for nutrigenomics. PLoS ONE. 2008, 3 (2): e1681-10.1371/journal.pone.0001681.PubMed CentralView ArticlePubMedGoogle Scholar
- Bünger M, Bosch van den H, Meijde van der J, Kersten S, Hooiveld G, Müller M: Genome-wide analysis of PPARalpha activation in murine small intestine. Physiological Genomics. 2007, 30 (2): 192-204. 10.1152/physiolgenomics.00198.2006.View ArticlePubMedGoogle Scholar
- Dai M, Wang P, Boyd A, Kostov G, Athey B, Jones E, Bunney W, Myers R, Speed T, Akil H, Watson S, Meng F: Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Research. 2005, 33 (20): e175-10.1093/nar/gni179.PubMed CentralView ArticlePubMedGoogle Scholar
- Wu Z, Irizarry R, Gentleman R, Martinez-Murillo F, Spencer F: A model-based background adjustment for oligonucleotide expression arrays. Journal of the American Statistical Association. 2004, 99: 909-917. 10.1198/016214504000000683.View ArticleGoogle Scholar
- Smyth G: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology. 2004, 3: 3-10.2202/1544-6115.1027.View ArticleGoogle Scholar
- 't Hoen P, Ariyurek Y, Thygesen H, Vreugdenhil E, Vossen R, Boer J, van Ommen G, den Dunnen J: Deep Sequencing-based Expression analysis shows Major Advances in Robustness, Resolution and Inter-lab Portability over Five Microarray Platforms. Nucleic Acid Research. 2008, 36 (21): 10.1093/nar/gkn705.
- Pedotti P, 't Hoen P, Vreugdenhil E, Schenk G, Vossen R, Ariyurek Y, van Ommen G, den Dunnen J, Boer J, de Menezes R: Can subtle changes in gene expression be consistently detected with different microarray platforms?. BMC Genomics. 2008, 9 (124):
- R Development Core Team: R: A Language and Environment for Statistical Computing. 2007, R Foundation for Statistical Computing, Vienna, Austria, [http://www.R-project.org]Google Scholar
- Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang J, Zhang J: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology. 2004, 5: R80-10.1186/gb-2004-5-10-r80.PubMed CentralView ArticlePubMedGoogle Scholar
- Pollard K, Dudoit MS, Laan van der : Multiple Testing Procedures and Applications to Genomics. 2008, Springer Series in Statistics, Springer, New YorkGoogle Scholar
- Storey J: A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B. 2002, 64: 479-498. 10.1111/1467-9868.00346.View ArticleGoogle Scholar
- Deepayan S: Lattice: Multivariate Data Visualization with R. 2008, [http://lmdvr.r-forge.r-project.org/]Google Scholar
- Lee M, Whitmore G: Power and sample size for DNA microarray studies. Statistics in Medicine. 2002, 21: 3543-3570. 10.1002/sim.1335.View ArticlePubMedGoogle Scholar
- Warnes G, Liu P: Sample Size Estimation for Microarray Experiments. 2006, [http://www.bioconductor.org/packages/2.3/bioc/vignettes/ssize/inst/doc/ssize.pdf]Google Scholar
- Jung SH: Sample size for FDR-control in microarray data analysis. Bioinformatics. 2005, 21 (14): 3079-3104. 10.1093/bioinformatics/bti456.View ArticleGoogle Scholar
- Liu P, Hwang J: Quick calculation for sample size while controlling false discovery rate with application to microarrays. Bioinformatics. 2007, 26 (6): 739-746. 10.1093/bioinformatics/btl664.View ArticleGoogle Scholar
- Jørstad T, Midelfart H, Bones A: A mixture model approach to sample size estimation in two-sample comparative microarray experiments. BMC Bioinformatics. 2008, 9 (117):
- Tong T, Zhao H: Practical guidelines for assessing power and false discovery rate for fixed sample size in microarray experiments. Statistics in Medicine. 2008, 27: 1960-1972. 10.1002/sim.3237.PubMed CentralView ArticlePubMedGoogle Scholar
- Ruppert D, Nettleton D, Hwang J: Exploring the information in p-values for the analysis and planning of multiple-test experiments. Biometrics. 2007, 63 (2): 483-95. 10.1111/j.1541-0420.2006.00704.x.View ArticlePubMedGoogle Scholar
- Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Schrf U, Thierry-Mieg J, Wang C, Wilson M, Wolber PK, Zhang L, Amur S, Bao W, Barbacioru CC, Lucas AB, Bertholet V, Boysen C, Bromley B, Brown D, Brunner A, Canales R, Cao XM, Cebula TA, Chen JJ, Cheng J, Chu TM, Chudin E, Corson J, Corton JC, Croner LJ, Davies C, Davison TS, Delenstarr G, Deng X, Dorris D, Eklund AC, hui Fan X, Fang H, Fulmer-Smentek S, Fuscoe JC, Gallagher K, Ge W, Guo L, Guo X, Hager J, Haje PK, Han J, Han T, Harbottle HC, Harris SC, Hatchwell E, Hauser CA, Hester S, Hong H, Hurban P, Jackson SA, Ji H, Knight CR, Kuo WP, LeClerc JE, Levy S, Li QZ, Liu C, Liu Y, Lombardi MJ, Ma Y, Magnuson SR, Maqsodi B, McDaniel T, Mei N, Myklebost O, Ning B, Novoradovskaya N, Orr MS, Osborn TW, Papallo A, Patterson TA, Perkins RG, Peters EH, Peterson R, Philips KL, Pine PS, Pusztai L, Qian F, Ren H, Rosen M, Rosenzweig BA, Samaha RR, Schena M, Schroth GP, Shchegrova S, Smith DD, Staedtler F, Su Z, Sun H, Szallasi Z, Tezak Z, Thierry-Mieg D, Thompson KL, Tikhonova I, Turpaz Y, Vallanat B, Van C, Walker SJ, Wang SJ, Wang Y, Wolfinger R, Wong A, Wu J, Xiao C, Xie Q, Xu J, Yang W, Zhang L, Zhong S, Zong Y, Slikker WJ: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology. 2006, 24 (9): 1151-61. 10.1038/nbt1239.View ArticlePubMedGoogle Scholar
- Patterson TA, Lobenhofer EK, Fulmer-Smentek SB, Collins PJ, Chu TM, Bao W, Fang H, Kawasaki ES, Hager J, Tikhonova IR, Walker SJ, Zhang L, Hurban P, de Longueville F, Fuscoe JC, Tong W, Shi L, Wolfinger RD: Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nature Biotechnology. 2006, 24 (9): 1140-50. 10.1038/nbt1242.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.