Table 1 Overview of the pretreatment methods used in this study. In the Unit column, the unit of the data after pretreatment is stated. O represents the original unit, and (-) represents dimensionless data. The mean is estimated as $\bar{x}_i = \frac{1}{J}\sum_{j=1}^{J} x_{ij}$ and the standard deviation is estimated as $s_i = \sqrt{\frac{\sum_{j=1}^{J}\left(x_{ij} - \bar{x}_i\right)^2}{J-1}}$. $\tilde{x}$ and $\hat{x}$ represent the data after different pretreatment steps.

From: Centering, scaling, and transformations: improving the biological information content of metabolomics data

| Class | Method | Formula | Unit | Goal | Advantages | Disadvantages |
|---|---|---|---|---|---|---|
| I | Centering | $\tilde{x}_{ij} = x_{ij} - \bar{x}_i$ | O | Focus on the differences and not the similarities in the data | Remove the offset from the data | When data is heteroscedastic, the effect of this pretreatment method is not always sufficient |
| II | Autoscaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i}$ | (-) | Compare metabolites based on correlations | All metabolites become equally important | Inflation of the measurement errors |
|  | Range scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\left(x_{i_{\max}} - x_{i_{\min}}\right)}$ | (-) | Compare metabolites relative to the biological response range | All metabolites become equally important. Scaling is related to biology | Inflation of the measurement errors and sensitive to outliers |
|  | Pareto scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\sqrt{s_i}}$ | O | Reduce the relative importance of large values, but keep data structure partially intact | Stays closer to the original measurement than autoscaling | Sensitive to large fold changes |
|  | Vast scaling | $\tilde{x}_{ij} = \frac{\left(x_{ij} - \bar{x}_i\right)}{s_i} \cdot \frac{\bar{x}_i}{s_i}$ | (-) | Focus on the metabolites that show small fluctuations | Aims for robustness, can use prior group knowledge | Not suited for large induced variation without group structure |
|  | Level scaling | $\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\bar{x}_i}$ | (-) | Focus on relative response | Suited for identification of e.g. biomarkers | Inflation of the measurement errors |
| III | Log transformation | $\tilde{x}_{ij} = {}^{10}\log\left(x_{ij}\right)$; $\hat{x}_{ij} = \tilde{x}_{ij} - \bar{\tilde{x}}_i$ | Log O | Correct for heteroscedasticity, pseudo scaling. Make multiplicative models additive | Reduce heteroscedasticity, multiplicative effects become additive | Difficulties with values with large relative standard deviation and zeros |
|  | Power transformation | $\tilde{x}_{ij} = \sqrt{x_{ij}}$; $\hat{x}_{ij} = \tilde{x}_{ij} - \bar{\tilde{x}}_i$ | √O | Correct for heteroscedasticity, pseudo scaling | Reduce heteroscedasticity, no problems with small values | Choice for square root is arbitrary. |
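
The formulas in Table 1 can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation. It assumes each metabolite i is stored as one list of J positive measurements (J ≥ 2), uses the sample standard deviation with J − 1 in the denominator as in the caption, and does not guard against rows with zero mean, zero variance, or zero range. All function names are hypothetical and chosen here for readability.

```python
import math
from statistics import mean, stdev  # stdev uses the J - 1 denominator


def center(row):
    """Class I: subtract the metabolite mean from every measurement."""
    m = mean(row)
    return [x - m for x in row]


def autoscale(row):
    """Class II: centered values divided by the standard deviation."""
    s = stdev(row)
    return [c / s for c in center(row)]


def range_scale(row):
    """Class II: centered values divided by (max - min) of the row."""
    span = max(row) - min(row)
    return [c / span for c in center(row)]


def pareto_scale(row):
    """Class II: centered values divided by the square root of s_i."""
    root_s = math.sqrt(stdev(row))
    return [c / root_s for c in center(row)]


def vast_scale(row):
    """Class II: autoscaled values multiplied by mean / standard deviation."""
    m, s = mean(row), stdev(row)
    return [(c / s) * (m / s) for c in center(row)]


def level_scale(row):
    """Class II: centered values divided by the metabolite mean."""
    m = mean(row)
    return [c / m for c in center(row)]


def log_transform(row):
    """Class III: base-10 log, then centering of the transformed values."""
    return center([math.log10(x) for x in row])


def power_transform(row):
    """Class III: square root, then centering of the transformed values."""
    return center([math.sqrt(x) for x in row])


def pretreat(data, method):
    """Apply one pretreatment method to every metabolite (row) in data."""
    return [method(row) for row in data]


if __name__ == "__main__":
    # Hypothetical data: two metabolites measured in three experiments.
    data = [[2.0, 4.0, 6.0], [1.0, 10.0, 100.0]]
    print(pretreat(data, autoscale))
    print(pretreat(data, log_transform))
```

Each function works on a single metabolite because every formula in the table uses only the statistics of row i. Applying a method to a whole data set is then a loop over rows, which `pretreat` does.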