SOM abundance portraits and sample trajectories
Figure 1a shows the gallery of protein abundance landscapes as seen by the SOM-portraits. They visualize the protein abundances averaged over the individual volunteer data at each time point of sample collection. Hence, each landscape 'portrays' the proteomic phenotype of the roughly 2,000 protein species identified by mass spectrometry in the urine samples (IPI items). Proteins with the highest levels of over- and under-expression are localized in the red and blue spot-like regions, respectively. The spot patterns clearly change in the course of the experiment, reflecting alterations in the proteomic phenotypes potentially caused by isolation, modifications of salt (NaCl) consumption and presumably other factors.
Panel b of Figure 1 shows the so-called 2nd-level SOM, which visualizes the mutual similarities between the samples in a two-dimensional plot. The samples pass through essentially four time windows, the first and second of which are indicated by dotted ellipses. The first window includes the samples taken before starting the isolation experiment. The second time window lasts roughly until the end of the sixth week of isolation, in which salt consumption is reduced from 12 g/day to 9 g/day. The third period ends after week no. 11, i.e. two weeks after salt consumption is further reduced to 6 g/day. The last time window finally includes the samples taken in the last three weeks of the isolation experiment and the three sample points taken afterwards. Note that the transition between time windows two and three forms a sort of turning point of the trajectory, after which the proteomic landscapes in the phase space of the 2nd-level SOM 'move' back towards the starting point. In terms of salt consumption, the samples taken before and after this turning point refer to higher and lower salt consumption, respectively. In a coarser view we divide the data into an 'early', an 'intermediate' and a 'late' time regime as indicated in Figure 1: this view takes into account the similarity of the abundance landscapes in the first two time windows and aggregates them into one early phase.
In the supplementary text we analyze similarity relations using independent component analysis (ICA), which projects the samples on a linear scale. ICA essentially confirms the results obtained using the 2nd-level SOM.
Spot trajectories and module selection
The SOM-algorithm distributes the proteins over the map such that co-expressed proteins are located close to each other. In consequence, proteins specifically up-regulated in one of the time regimes aggregate into red spot-like textures at a certain position of the map. As the experiment proceeds the spot patterns change and, in particular, existing spots disappear and new ones appear at new positions (see Figure 1a). Figure 2 (upper part) illustrates these spot trajectories for red over-expression (left panel) and blue under-expression (right panel) spots. The so-called summary maps aggregate all red or all blue spots observed in the individual profiles into one master map each. The arrows illustrate the temporal order of appearance of the respective spots: due to the self-organizing properties of the map, red and blue spots 'rotate' in counterclockwise direction along the edges of the map in a centrally symmetric fashion. That is, as a rule of thumb, red and blue spots often appear as antagonistic twins, indicating that each state is characterized by both a set of up-regulated and a set of down-regulated proteins.
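The way a SOM places co-expressed profiles close together can be illustrated with a minimal sketch. It is not the implementation used in this study: the grid size, the learning-rate and neighbourhood schedules, the function names and the synthetic three-point profiles are all assumptions chosen for brevity.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to profile x."""
    return min(weights, key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))

def train_som(profiles, rows=4, cols=4, epochs=50, seed=0):
    """Train a small rectangular SOM on abundance profiles (lists of floats)."""
    rng = random.Random(seed)
    dim = len(profiles[0])
    # One weight vector (a meta-feature profile) per map unit, small random start.
    weights = {(r, c): [rng.uniform(-0.1, 0.1) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac) + 0.01     # learning rate decays linearly (assumed schedule)
        sigma = 2.0 * (1.0 - frac) + 0.5   # neighbourhood radius shrinks (assumed schedule)
        for x in profiles:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

def quantization_error(weights, profiles):
    """Mean Euclidean distance between each profile and its best-matching unit."""
    total = 0.0
    for x in profiles:
        w = weights[best_matching_unit(weights, x)]
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(w, x)))
    return total / len(profiles)

# Synthetic centralized profiles over three time regimes (early, intermediate, late).
early_up = [[1.0, 0.0, -1.0], [0.9, 0.1, -1.0], [1.1, -0.1, -0.9]]
late_up = [[-1.0, 0.0, 1.0], [-0.9, -0.1, 1.0], [-1.1, 0.1, 0.9]]
profiles = early_up + late_up

som = train_som(profiles)
for p in profiles:
    print(p, "->", best_matching_unit(som, p))
```

Profiles with similar time courses are assigned to the same or to neighbouring map units, whereas antagonistic profiles end up in different regions of the map, which is the property exploited by the spot analysis.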
This property of self-organization is reflected in the spot-spot correlation and anti-correlation maps, which were calculated using a weighted topology overlap network approach as described in the Methods section and in ref. [12]. The bottom left panel in Figure 2 shows that spots up-regulated in the early time range are mutually highly correlated, forming a sort of continuum of states located in the upper right part of the map. The two time windows in the early range are consequently associated with spots along the right and upper border of the map, respectively. The intermediate and late time ranges are accompanied by a marked shift of the spot position towards the lower left corner of the map, which allows us to associate the proteins within the respective spots with the discontinuous changes in the sample trajectory described above (see also Figure 1). The anti-correlation map (bottom right panel in Figure 2) supports the view that spots up-regulated in the early and intermediate/late time ranges are down-regulated at intermediate/late and early time ranges, respectively. Hence, the characteristic breakpoints observed along the spot trajectories can be associated with the discontinuous changes of protein abundance detected in the sample trajectories.
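The idea of correlated and anti-correlated spots can be sketched with plain Pearson correlation between spot profiles. Note that this is a simplified stand-in and not the weighted topology overlap approach of ref. [12]; the spot names and profile values below are synthetic assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical spot profiles over five time points (not taken from Figure 3).
spot_profiles = {
    "early_a": [1.0, 0.8, 0.1, -0.6, -1.0],
    "early_b": [0.7, 1.0, 0.3, -0.5, -0.9],
    "late": [-0.9, -0.7, 0.0, 0.8, 1.0],
}

# Pairwise correlation matrix: positive entries link co-regulated spots,
# negative entries link antagonistic ones.
names = list(spot_profiles)
correlation = {(a, b): pearson(spot_profiles[a], spot_profiles[b])
               for a in names for b in names}

for (a, b), r in correlation.items():
    if a < b:
        print(f"{a} vs {b}: r = {r:+.2f}")
```

In this toy example the two early spots are strongly positively correlated, whereas each of them is strongly anti-correlated with the late spot.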
In the next step we address the question of how to select the spots appropriately or, in other words, how to segment the map properly into regions of co-regulated proteins. Besides the over- and under-expression spot selection algorithm we also applied alternative methods based on correlation and K-means clustering. Details and results of this analysis are provided in the supplementary text.
We found that the choice of spot selection method is not crucial for extracting the basic dynamic properties of the system. Depending on the particular need, e.g. to extract strongly differentially expressed proteins, larger groups of mutually co-expressed features, or even largely invariant features, we recommend the overexpression, correlation or K-means clustering method, respectively. Here we focus on the overexpression spot selection method because it is a good choice for marker selection that includes both up- and down-regulated features. Selected results for the correlation and K-means clustering methods are presented in the supplementary text (Additional file 1).
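One simple way to segment a map into overexpression spots is to threshold the landscape and to group adjacent units above the threshold. The following sketch illustrates this idea only; the threshold rule (a fraction of the map maximum), the 4-neighbour connectivity and the toy landscape are assumptions and do not reproduce the selection algorithm described in the Methods section.

```python
def overexpression_spots(landscape, fraction=0.7):
    """Group adjacent map units with values >= fraction * max into spots."""
    rows, cols = len(landscape), len(landscape[0])
    threshold = fraction * max(max(row) for row in landscape)
    seen = set()
    spots = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or landscape[r][c] < threshold:
                continue
            # Flood fill over 4-connected neighbours above the threshold.
            stack, spot = [(r, c)], []
            seen.add((r, c))
            while stack:
                i, j = stack.pop()
                spot.append((i, j))
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    inside = 0 <= ni < rows and 0 <= nj < cols
                    if inside and (ni, nj) not in seen and landscape[ni][nj] >= threshold:
                        seen.add((ni, nj))
                        stack.append((ni, nj))
            spots.append(sorted(spot))
    return spots

# Toy 4 x 4 landscape of meta-feature abundance values (synthetic).
landscape = [
    [0.1, 0.2, 1.8, 2.0],
    [0.0, 0.1, 0.3, 1.7],
    [0.2, 0.0, 0.1, 0.2],
    [1.5, 0.3, 0.0, 0.1],
]

for spot in overexpression_spots(landscape):
    print(spot)
```

Under-expression spots could be obtained in the same way after reversing the sign of the landscape.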
Spot profiles and functional analysis
Figure 3 assigns the spot profiles to selected overexpression spots. These profiles are mean time-dependent protein abundance data averaged over all meta-features included in the respective spot. The meta-features, in turn, are mean protein abundance data averaged over all single protein data contained in each meta-feature. Hence, the spot profiles are mean profiles characterizing the average abundance of the single proteins included in the respective spot. Most of the profiles show a wave-like shape with a maximum and a minimum in different time windows, reflecting the dynamic up- and down-regulation of proteins during the experiment. Along the spot trajectories discussed above, the abundance maximum seen in the individual spot profiles shifts to later times. The spot trajectory thus reflects first of all the phase shift φ of the wave-like profiles, which roughly increases from φ ~ 0 to T*/2 for early activation (e.g. spot D) to φ ~ T*/2 to T* for activation at intermediate and late times (e.g. spot R). Here T* denotes the period of the changes, e.g. given as the total time of the experiment. The spot profiles differ, however, not only in the position of their abundance maximum but also in the time delay between maximum and minimum abundance and in their shape, which can resemble a harmonic cosine (e.g. spots G and R) or a single-peaked function (e.g. spots M and Q). The period can cover the whole duration of the experiment, i.e. T* ~ 105 days (e.g. spots D and J), or a considerably longer or shorter time, T ~ 2 T* (e.g. spots E and R) or T < T* (e.g. spots L and P), respectively. Note that periodic changes of protein abundance can be induced by different extrinsic factors such as the activity, nutrition and working regime (e.g. night shift work during the experiment) of the volunteers and salt consumption, but also by intrinsic ones such as hormone activities (e.g. of aldosterone, see discussion). Thus the period or, in other words, the degree of recovery of protein abundance after its perturbation, can deviate from the time span of the experiment.
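The two-step averaging that leads from single-protein profiles to a spot profile, and the position of the abundance maximum, can be sketched as follows. The function names, sampling days and abundance values are illustrative assumptions.

```python
def mean_profile(profiles):
    """Average a list of equally long profiles time point by time point."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

def spot_profile(spot):
    """spot: list of meta-features, each given as a list of single-protein profiles."""
    # Step 1: each meta-feature profile is the mean over its proteins.
    meta_profiles = [mean_profile(proteins) for proteins in spot]
    # Step 2: the spot profile is the mean over its meta-features, so each
    # meta-feature counts equally regardless of how many proteins it holds.
    return mean_profile(meta_profiles)

def time_of_maximum(profile, times):
    """Return the sampling time at which the profile reaches its maximum."""
    return times[max(range(len(profile)), key=profile.__getitem__)]

# Hypothetical sampling days and a toy spot with two meta-features.
days = [0, 35, 70, 105]
toy_spot = [
    [[2.0, 1.0, 0.0, -1.0], [4.0, 1.0, -2.0, -3.0]],  # meta-feature with two proteins
    [[1.0, 1.0, 0.0, -2.0]],                          # meta-feature with one protein
]

profile = spot_profile(toy_spot)
print(profile, "maximum at day", time_of_maximum(profile, days))
```

Comparing the time of maximum between spots gives a crude estimate of the shift of the profiles along the spot trajectory; fitting a cosine would be a natural refinement.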
Hence, the spots along the spot trajectory represent clusters of proteins concertedly activated and deactivated in sequential order during the experiment and differing also in the time of activation and the degree of recovery of the initial state at the end of the experiment. The overexpression spots contain from 6 to 27 proteins (as given in Figure 3), whereas the correlation and K-means spot clusters are markedly larger with 23-76 and 39-93 proteins per cluster, respectively (see supplementary text; Additional files 2 and 3). Despite their differing size, the respective spot profiles taken from comparable regions of the map look very similar (compare Figure 3 with the respective figures shown in the supplementary text for correlation and K-means clustering).
The different time profiles of the spots allow us to relate them to different properties of the sample trajectory depicted in Figure 1. Particularly, spots showing different levels of protein abundance at the start and the end of the experiment (i.e. with periods T ≠ T*) are responsible for the shift between the start and end points of the sample trajectories whereas spots with cosine-like profiles and T~T* and also spots with peak-shaped profiles are mainly responsible for the turning point of the trajectories because the respective proteins mostly recover their abundance state during the experiment (see Figure 1).
Enrichment analysis using more than 2000 predefined groups of proteins referring to different GO-terms from the categories 'biological process', 'cellular component' and 'molecular function' allowed us to assign the functional context to each of the spot clusters selected. In Figure 3 the leading gene set is given for each overexpression spot cluster. The results of a more detailed analysis are given in the heat map shown in Figure 4 (see refs. [13, 14] for the description of the method) and in the supplementary text where we map and profile selected protein sets in detail. According to these analyses the early time range is characterized by the activation of inflammatory processes and angiogenesis (gene sets inflammation, extracellular region, cell adhesion, complement activation, proteolysis, angiogenesis and Calcium ion binding) whereas intermediate and late responses are related to developmental and regenerative processes (development, mitosis, regulation of transcription, chromatin remodeling) and stress and drug response (small molecule regulative process, response to oxidative stress, hypoxia, apoptosis, response to Zinc, Magnesium ion binding, G-protein coupled activity), respectively. Note that part of the processes related to inflammation, drug response and also to genome and transcriptome activity (chromatin remodeling, DNA repair) can be attributed to the lack of recovery of the sample trajectories (these processes are marked by the asterisks in Figure 4).
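The principle of testing a spot cluster for enrichment of a predefined protein set can be illustrated with a one-sided hypergeometric test. This is one common choice and not necessarily the scoring used in refs. [13, 14]; the function name and the counts in the example are assumptions.

```python
from math import comb

def hypergeom_enrichment_p(n_total, n_set, n_spot, n_overlap):
    """P(X >= n_overlap) when n_spot proteins are drawn at random from n_total,
    of which n_set belong to the protein set of interest."""
    upper = min(n_set, n_spot)
    tail = sum(comb(n_set, k) * comb(n_total - n_set, n_spot - k)
               for k in range(n_overlap, upper + 1))
    return tail / comb(n_total, n_spot)

# Hypothetical counts: 2000 proteins on the map, a set of 40 proteins,
# a spot of 20 proteins, 5 of which belong to the set.
p_value = hypergeom_enrichment_p(n_total=2000, n_set=40, n_spot=20, n_overlap=5)
print(f"enrichment p-value: {p_value:.2e}")
```

With more than 2000 protein sets tested per spot, such p-values would have to be corrected for multiple testing before interpretation.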
Clusters of proteins associated with the response of the organism to NaCl deficiency were identified previously using a comprehensive interactome network analysis [10]. We mapped proteins from these clusters into SOM space and found that they mostly refer to the early and, to a lesser degree, to the intermediate-time response (see supplementary text).
Pathway signal flow analysis (PSF) represents an independent option to discover the functional context of the spot profiles. In contrast to gene set enrichment analysis it takes into account the network topology of selected pathways taken from the KEGG database to obtain PSF-profiles which are compared with the abundance profiles of the spots. It turned out that early and intermediate protein abundance changes are associated with inflammatory responses and metabolic processes (fatty acids, nucleic acids and amino acids) indicating alterations of nutrition and partly starvation followed by activation of regenerative processes (Wnt-pathway, N-glycan biosynthesis) in the intermediate time range and of stress response signaling (p53 and mTOR-signaling pathways) and digestion at late times of the experiment (see Figure 5 and supplementary text for details). Many pathways lead to the activation of protein kinase C and inositol-triphosphate signaling cascades in agreement with the enriched protein sets related to signal transduction such as Ca2+ binding and G-protein coupled receptor activity.
Individual volunteer analysis
So far we have presented results based on averaging the abundance of each protein at each time point over all six volunteers. This 'mean volunteer' analysis allowed us to extract mean effects induced by isolation and varying salt consumption, but it neglects individual differences between the volunteers. We therefore performed a second, independent SOM analysis of the individual data of each volunteer. Figure 6 shows the gallery of time-dependent 'personalized' portraits of all six probands (P1 - P6). As for the 'mean volunteer' analysis, the protein abundance landscapes can be divided into typical color textures assigned to the early-, intermediate- and late-response types, respectively. Simple visual inspection of the portraits shows that the abundance patterns of most of the volunteers change in parallel (see the colored frames in Figure 6). In part, however, one observes small variations in the time-dependent changes: for example, the portraits of P4-P5 switch into the time regime of the 'late' type about one to two weeks earlier than those of P1-P3. Late-type protein abundance patterns were observed for P5 in three samples taken before starting isolation.
Figure 7 shows the individual sample trajectories of each of the volunteers using 2nd-level SOM analysis. One sees that virtually each trajectory can be clearly divided into the early, intermediate and late time ranges. The borderlines separating the different time regimes, however, shift slightly between the individuals. One also sees that volunteer P5 is characterized by a distinctly more intricate trajectory reflecting his individual specifics.
Next, we performed functional analysis by applying gene set enrichment clustering to the single volunteer data (see supplementary text for details). In general, the functional context of the different time ranges agrees with that of the mean volunteer analysis. However, the larger set of individual sample data provides a more detailed view on the specifics of each volunteer. For example, features related to 'immune response' were either up-regulated in the early phase of the experiment only (P1, P4, P6) or, in addition, again in the late phase (P2, P3, P5).
Organ related protein abundance
Proteins not of renal origin pass into urine from blood, and into blood from the respective tissues and cells. We used the Tissue-specific Gene Expression and Regulation database (TiGER, [15]) to assign protein species to different tissues and to assess their abundance in the urine samples studied (see Figure 8 and supplementary text). First we mapped the tissue-related protein sets to SOM space. It turned out that the respective species of a series of tissue sets accumulate in different regions of the map, which were assigned to different time ranges. For example, pancreas and liver proteins show an increased local density in the area of early_up proteins, muscle proteins in the intermediate_up region and testis proteins in the late_up region. The respective time profiles confirm the expected activation patterns. We found that proteins from liver, pancreas and kidney show increased abundance before and at the beginning of the isolation experiment. Proteins from muscle are overexpressed at intermediate times of isolation and proteins related to testis and stomach at the end of and after isolation. Protein sets related to skin, lymph nodes, blood, prostate, brain and colon show virtually no or only a very weak time dependence in the single volunteer analysis.
The single volunteer tissue profiles again reveal individual differences between the probands: for example, liver proteins of P1 and P5 respond much more weakly than liver proteins of the other volunteers. The individual profiles of prostate proteins clearly show time dependencies which, however, are averaged out in the average volunteer profile due to their asynchronous character (see supplementary text provided in Additional file 1).
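Why asynchronous individual time courses vanish in the mean volunteer profile can be seen in a small sketch. The volunteer labels V1 and V2 and all values are synthetic assumptions and do not refer to the probands P1 - P6.

```python
def mean_volunteer_profile(individual_profiles):
    """Average the abundance at each time point over all volunteers."""
    n = len(individual_profiles)
    return [sum(values) / n for values in zip(*individual_profiles.values())]

# Synchronous case: both volunteers respond with the same timing.
synchronous = {
    "V1": [1.0, 0.5, -0.5, -1.0],
    "V2": [0.5, 0.5, -0.5, -1.0],
}

# Asynchronous case: the same amplitude but opposite timing.
asynchronous = {
    "V1": [1.0, -1.0, 1.0, -1.0],
    "V2": [-1.0, 1.0, -1.0, 1.0],
}

print("synchronous :", mean_volunteer_profile(synchronous))
print("asynchronous:", mean_volunteer_profile(asynchronous))
```

In the synchronous case the mean profile keeps the common time course, whereas in the asynchronous case it is flat although each individual profile varies strongly.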
Total protein abundance analysis
In addition to the single-feature, meta-feature and spot-related abundance levels based on centralized values (i.e. values normalized with respect to the mean averaged over all volunteers and time points), we analyzed the time profile of the total (i.e. integral) protein abundance level in terms of the variance of the respective meta-feature abundance landscapes (Figure 9). The abundance landscapes refer to a separate SOM training described below and in the supplementary text. It turned out that, on average, the total abundance level slightly increases before isolation in the early time range but then, after a plateau, steeply decreases in the intermediate and late time ranges until the end of isolation of the volunteers. Hence, isolation causes an overall decrease of protein abundance in the urine samples. In other words, processes down-regulated in the intermediate and late time regimes obviously involve a larger number of proteins and/or stronger abundance changes than processes up-regulated in the late time regime. Analysis of the population map supports this expectation (see supplementary text): about 27% of the proteins and 33% of the meta-features are up-regulated in the early time range, whereas only 20% of the proteins and 13% of the meta-features are up-regulated in the late regime. The remaining 53% of the proteins and 54% of the meta-features refer to rare and single-spiked features. Inspection of the individual volunteer data again reveals slight differences between the total abundance levels of the probands and between details of their respective time courses (Figure 9). For example, P5 shows a decreased total level of protein abundance.
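Using the variance of a landscape as a summary of its overall abundance level can be sketched as follows. The use of the population variance, the function name and the toy landscapes are assumptions; the exact definition applied to Figure 9 is given in the supplementary text.

```python
from statistics import pvariance

def landscape_variance(landscapes):
    """Population variance of the meta-feature values, one value per time point."""
    return [pvariance(values) for values in landscapes]

# Toy landscapes: one flat list of meta-feature values per time point (synthetic).
landscapes = [
    [2.0, -2.0, 2.0, -2.0],   # early: pronounced over- and under-expression
    [1.0, -1.0, 1.0, -1.0],   # intermediate: flatter landscape
    [0.5, -0.5, 0.5, -0.5],   # late: nearly flat landscape
]

total_profile = landscape_variance(landscapes)
print(total_profile)
```

A landscape with pronounced red and blue regions yields a large variance, and a flat landscape a small one, so the sequence of variances traces the decay of the overall signal in time.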
The detailed inspection of all total profiles indicates a certain fine structure in terms of three to four local peaks which appear immediately before or at starting isolation, after reducing salt consumption from 12 to 9 g/day and further to 6 g/day and at the end of the experiment (see the asterisks in Figure 9). Interestingly, adjacent local peaks of total protein abundance are separated by about five weeks possibly reflecting an intrinsic infradian rhythm in protein abundance. The total abundance level slightly increases after finishing isolation indicating slow regeneration of the volunteers. Part of this fine structure is found also in the abundance profiles of selected spot modules, e.g. of the overexpression spots G, E, M, P, J and Q (see Figure 3) expressing one or two sharp peaks in the time regions identified in the total abundance analysis.
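The spacing of local peaks in a total abundance profile can be read off with a few lines of code. The weekly values below are purely synthetic and were constructed to show peaks five weeks apart; they are not the data of Figure 9.

```python
def local_peaks(values):
    """Indices of interior points that are strictly larger than both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

# Synthetic total abundance level, one value per week.
weekly_total = [1.0, 1.4, 1.1, 0.9, 0.8, 1.0, 1.3, 0.9,
                0.7, 0.6, 0.8, 1.1, 0.7, 0.5, 0.6]

peaks = local_peaks(weekly_total)
spacing = [b - a for a, b in zip(peaks, peaks[1:])]
print("peak weeks:", peaks, "spacing in weeks:", spacing)
```

For noisy measurements the profile would typically be smoothed before peak detection, since single outliers otherwise count as peaks.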
To get deeper insight into this phenomenon we performed a full and detailed SOM analysis of the absolute abundance profiles using an approach similar to that developed for differential abundance data. Recall that the analysis of centralized profiles applied so far focuses on abundance changes independent of the abundance level. For example, virtually invariant profiles of high and of low abundance levels are clustered together in this case. Absolute abundance values, in contrast, distinguish between these two situations. Thus the analysis of absolute abundance profiles is expected to provide additional information about the abundance levels of the proteins in the course of the experiment. Detailed results are described in the supplementary text (Additional file 1). We found that a series of processes become activated in relatively narrow time windows of peaked abundance at the four fixed times identified in the total abundance analysis, namely at or immediately before isolation (angiogenesis, complement activation and others), at or immediately after reducing salt consumption to 9 g/day (focal adhesion and cytoskeleton) and to 6 g/day (cell differentiation and organ development), and near the end of the experiment after isolation. The latter trend suggests recovery of the initial state before starting isolation. Double-peaked profiles combine peaks at late and intermediate times (e.g. metabolic process and apoptosis). Importantly, immune response processes are permanently active during the experiment with a slight decay in the late time range. About 60% of the proteins are permanently expressed at low abundance levels during the experiment, whereas about 7-10% are permanently expressed at high abundance levels. This result agrees with our estimation using centralized data.