Radon is the largest component of natural background radiation in the United States, and exposure is a risk factor for lung cancer. Comparison of epidemiological studies of uranium miners exposed to high levels of radon with studies of domestic exposures suggests that lower doses may be proportionately more dangerous than extrapolation from high doses would predict. This has resulted in the addition of a correction factor to domestic radon risk estimates, although the biological basis for this correction is not well understood. As few cells sustain the direct traversal of a radon alpha particle at domestic exposure levels, non-targeted effects such as the bystander response may increase the number of cells at risk through mechanisms such as tumor promotion or induction of genomic instability.
The radiation bystander effect is the response of cells in contact with or in the vicinity of irradiated cells. Many endpoints have been measured in bystander cells, including sister chromatid exchanges, micronuclei, apoptosis, terminal differentiation, mutation and gene expression changes [5–9]. Some of these outcomes might be considered protective, while others could increase tissue risk, so a better understanding of the regulation of bystander responses is needed. The mechanisms of the bystander response are known to involve both direct cell-to-cell communication and indirect release of factors into the extracellular space. A variety of signaling molecules, including cytokines, reactive oxygen species, nitric oxide, prostaglandins and mitogen-activated protein kinases (MAPK), have been implicated in the bystander response, but the signal transduction pathways that regulate bystander responses are still not clear.
Overall, radiation effects at the tissue and organism levels are difficult to understand because they occur at different levels of biological organization, from chromosomal damage to metabolic pathways. After irradiation, signaling pathways rapidly modulate gene expression, which leads to additional signaling in the cell population both as a response to the initial damage and to maintain tissue homeostasis while the damage is being repaired. Bystander effects can also result in long-term genomic instability, which suggests that bystanders may continue to respond to signals for many generations after the initial irradiation event [13, 14]. The radiation bystander effect, therefore, involves a complex cellular response across physical space and time. In the clinical context, the bystander effect has been linked with abscopal effects and could potentially be exploited to enhance tumor-killing effects and to protect normal tissue from radiation exposure [12, 16]. After irradiation, when the processes of tissue homeostasis are severely impaired, carcinogenesis has been demonstrated in unexposed bystander tissue, underlining the importance of understanding the mechanisms involved. Bystander responses are, therefore, especially relevant to cancer risk assessment in low-dose/low dose-rate radiation exposure situations such as domestic radon exposure or extended space travel, and also in partial body exposures such as those from medical radiation.
It is important to understand not only the physiological and DNA damage effects of radiation on cells but also the global inflammatory and stress responses of cells and tissues. For instance, irradiated fibroblasts are known to promote tumor formation in neighboring epithelial cells by altering the tumor microenvironment. With this in mind, we studied gene expression over time in normal human lung fibroblasts, at the mRNA level, to provide insight into the mechanisms and timing of signaling in irradiated and bystander cells. We have previously studied the gene expression response of bystander fibroblasts to 0.5 Gy α-particle irradiation, 4 hours after exposure. To better understand both early and sustained signaling associated with responding genes, we have now extended the study, measuring global gene expression at 0.5, 1, 2, 4, 6, and 24 hours after irradiation. We studied the direct radiation and bystander gene expression responses separately to compare trends because, although much is known about the effects of radiation on gene expression in cells, the full effect of radiation encompasses cells that are hit and those that are not. In addition, the tissue response over time arises from the convergence of signaling and responding genes from both types of cells. In the previous study, we identified 238 genes that were significantly changed 4 hours after exposure in irradiated and/or bystander cells. In the current study, we focused our analysis on the response of these genes over time, and applied a novel time course clustering technique to identify genes with potential regulatory similarities.
The choice of methodology is a crucial issue in the use of clustering methods to examine structure in a given data set. It is important to choose and/or devise a methodology appropriate for the given data. Time series data are often analyzed using standard clustering algorithms such as hierarchical clustering, k-means and self-organizing maps [20–22]. Although these algorithms have yielded biological insights, their fundamental problem is that they typically treat measurements taken at different time points as independent, ignoring the sequential nature of time series data. Furthermore, most methods that have been developed specifically for time course data [24–26] are designed for longer time series. In contrast, most microarray-based studies encompass relatively few time points. In this study, six time points and four biological replicates were measured, yielding sparsity in both the number of time points and the number of replicates. This characteristic rules out any modeling based on classical time-series methods, because there are too few observations to allow accurate estimation of the parameters associated with the models. While short time series datasets such as the one presented here are becoming more common, there are still few choices for clustering that are tailored towards this type of data.
Here, we examine the data using two non-parametric clustering algorithms. The first is the Short Time series Expression Miner (STEM) algorithm and software developed by Ernst et al., where all genes are clustered into one of a set of pre-defined patterns based on transformation of gene profiles into "units of change". Clusters are then assigned significance levels using a permutation-based test. Second, we apply a previously proposed clustering method that uses the Partitioning Around Medoids (PAM) algorithm, which we have called the Feature Based PAM Algorithm (FBPA). It employs an innovative set of features of gene expression over time, such that the unit of analysis changes from gene expression at given time points to profile curves over the entire time horizon. Unlike alternative approaches, it does not pre-specify patterns of expression and does not cluster point values using a distance measure or a model. Instead, the algorithm extracts biologically relevant features, or curve summarization measures, from each short time series and then feeds these features into the PAM algorithm. PAM is very similar to the k-means algorithm; it was chosen here because it uses medoids (actual data points that are most central to their clusters) as cluster centers instead of means, making it more robust to outliers. This approach is designed to be both statistically powerful and biologically valid.
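To make the contrast with k-means concrete, the following is a minimal sketch of a medoid-based partitioning step, not the implementation used in this study. It assumes a precomputed pairwise distance matrix and uses a naive initialization (the first k points) followed by a greedy swap search that accepts any swap lowering the total cost; the original PAM algorithm uses a more careful BUILD initialization. The names `total_cost` and `pam_sketch` and the example values are hypothetical.

```python
def total_cost(dist, medoids):
    """Sum, over all points, of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam_sketch(dist, k):
    """Simplified medoid clustering on a precomputed distance matrix.

    Assumptions (not from the source): the first k points are used as the
    initial medoids, and any swap that lowers the total cost is accepted.
    """
    n = len(dist)
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                # Try replacing the medoid at position `pos` with `cand`.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Assign each point to the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical one-dimensional example with two well-separated groups.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(p - q) for q in points] for p in points]
medoids, labels = pam_sketch(dist, k=2)
```

Because each medoid is an observed data point, a single extreme value cannot drag a cluster center away from the bulk of its cluster the way it can shift a mean.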
The idea of feature selection was first used in the context of clustering large time series data for dimension reduction, where the term dimension refers to the number of time points that describe the series. In these cases, a few well-chosen statistics describing the dynamics of the series, such as serial correlation, skewness, and kurtosis, were used to summarize the data. We also used feature selection, but in the sparse-data context, as a dimension augmentation technique to describe each curve as completely as the short time series allows. The clustering features we propose here are based on the structural characteristics of the time course data and reflect a clear link with subject-matter considerations and the questions under study. The features we used were: the vector of slopes between adjacent time points, maximum and minimum expression, time of maximum and minimum expression, and the steepest positive and negative slope. In a sense, they capture the "global picture" of an admittedly short time horizon of expression and provide sufficient summarization of the dynamic structure of the curves. An obvious advantage of this method is that it can handle time series of various lengths with measurements taken at different time points as well as data with missing values. Although the fundamental idea on which this method is based, effective summarization of time course data, is transferable to a variety of application domains, the best features describing the time series are context-dependent and may differ between application domains.
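The feature list above can be illustrated with a short sketch. This is an assumed reading of the features rather than the exact definitions used in the study: slopes are taken as the change in expression per hour between adjacent time points, and the steepest positive or negative slope is set to zero when the curve never rises or falls. The function name `extract_features` and the expression values are hypothetical; the time points are the six used in this study.

```python
def extract_features(times, expr):
    """Summarize one short expression time course as a set of curve features.

    times: measurement times in hours, strictly increasing
    expr:  expression values at those times (same length as times)
    """
    # Slopes between adjacent time points (expression change per hour).
    slopes = [
        (expr[i + 1] - expr[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]
    i_max = max(range(len(expr)), key=lambda i: expr[i])
    i_min = min(range(len(expr)), key=lambda i: expr[i])
    return {
        "slopes": slopes,
        "max_expr": expr[i_max],
        "min_expr": expr[i_min],
        "time_of_max": times[i_max],
        "time_of_min": times[i_min],
        # Assumption: report 0.0 if the curve has no rising (falling) segment.
        "steepest_pos_slope": max(max(slopes), 0.0),
        "steepest_neg_slope": min(min(slopes), 0.0),
    }


# Time points from this study; the expression values are made up.
times = [0.5, 1.0, 2.0, 4.0, 6.0, 24.0]
expr = [0.0, 1.0, 3.0, 2.0, 2.0, 0.2]
features = extract_features(times, expr)
```

Six measurements yield eleven feature values here (five slopes plus six summary measures), which is the sense in which the approach augments rather than reduces dimension.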
FBPA sufficiently describes the time course by performing dimension augmentation using biologically relevant features, thus avoiding interpolation/extrapolation; as such, the unit of the analysis is the time course itself, and not the expression measurements obtained at each time point. Because FBPA clusters all genes, it preserves information and renders unnecessary the notion of cluster significance. The use of biologically relevant features, together with the sufficient description of the time course, tends to produce clusters with focused biology.
This study addressed the question: can we extract information about regulation of genes in irradiated and bystander cells from closely coordinated temporal gene expression profiles? To do this, we evaluated STEM and FBPA in both treatment conditions and assessed the results of both methodologies using computational measures as well as biological enrichment. To measure cluster tightness, we used homogeneity, and to measure cluster separation and structure, we used the average silhouette; both are described in detail in the Methods section. To compare agreement between the various clustering methods, we used the Rand Index. We also curated a manual clustering of a subset of the data as a reference for comparing clustering methods. We then assessed the biological implications of temporal clustering in both treatments and by both clustering methods, using gene ontology and pathway tools. Gene ontology analyses using the PANTHER tool showed that FBPA tended to cluster genes with related functions together and separated different biological processes into distinct clusters. This suggested that the features selected to describe the gene expression curves for FBPA analysis were more relevant to the underlying biological signaling than the parameters used in STEM. Network analysis using the Ingenuity Pathway Analysis (IPA) tool was also applied to the clusters enriched in related biological processes to identify potential hubs regulating specific aspects of the radiation and bystander responses. The overall picture of biological networks in irradiated versus bystander cells analyzed by FBPA clustering showed that temporal curves of gene expression after irradiation can be clearly differentiated into focused biological clusters. In comparison, bystander gene expression suggested that there is a general stress and inflammatory response in bystanders that can overshadow specific signaling networks.
Nevertheless, the FBPA clustering approach suggested some important and novel regulatory processes. In particular, we predicted possible epigenetic regulation of the metallothionein gene family after irradiation and in neighboring bystanders, a novel finding of our study.
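Two of the computational measures mentioned above, the Rand Index and the average silhouette, have standard textbook definitions that can be sketched briefly. The code below is illustrative only and is not the implementation used in this study. It assumes clusterings are given as lists of labels, that distances are supplied as a precomputed matrix, and that at least two clusters are present. Homogeneity is omitted because its definition is given in the Methods section. The function names and example values are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree, i.e. pairs
    placed together in both clusterings or apart in both."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def average_silhouette(dist, labels):
    """Mean silhouette width over all items, from a distance matrix.

    Assumes at least two clusters; items in singleton clusters score 0.
    """
    n = len(labels)
    clusters = set(labels)
    scores = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Hypothetical example: two tight, well-separated groups on a line.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
silhouette = average_silhouette(dist, [0, 0, 1, 1])
agreement = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

The Rand Index depends only on which items are grouped together, so relabeling the clusters (as in the example) leaves it unchanged. Silhouette values close to 1 indicate compact, well-separated clusters.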