Table 1 Overview of the pretreatment methods used in this study. The Unit column states the unit of the data after pretreatment: O represents the original unit, and (-) represents dimensionless data. The mean is estimated as $\bar{x}_i = \frac{1}{J}\sum_{j=1}^{J} x_{ij}$ and the standard deviation is estimated as $s_i = \sqrt{\frac{\sum_{j=1}^{J} (x_{ij} - \bar{x}_i)^2}{J - 1}}$. $\tilde{x}$ and $\hat{x}$ represent the data after different pretreatment steps.

From: Centering, scaling, and transformations: improving the biological information content of metabolomics data

Class

Method

Formula

Unit

Goal

Advantages

Disadvantages

I

Centering

$\tilde{x}_{ij} = x_{ij} - \bar{x}_i$

O

Focus on the differences and not the similarities in the data

Remove the offset from the data

When data is heteroscedastic, the effect of this pretreatment method is not always sufficient

II

Autoscaling

$\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i}$

(-)

Compare metabolites based on correlations

All metabolites become equally important

Inflation of the measurement errors

 

Range scaling

$\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{x_{i_{\max}} - x_{i_{\min}}}$

(-)

Compare metabolites relative to the biological response range

All metabolites become equally important. Scaling is related to biology

Inflation of the measurement errors and sensitive to outliers

 

Pareto scaling

$\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\sqrt{s_i}}$

O

Reduce the relative importance of large values, but keep data structure partially intact

Stays closer to the original measurement than autoscaling

Sensitive to large fold changes

 

Vast scaling

$\tilde{x}_{ij} = \frac{(x_{ij} - \bar{x}_i)}{s_i} \cdot \frac{\bar{x}_i}{s_i}$

(-)

Focus on the metabolites that show small fluctuations

Aims for robustness, can use prior group knowledge

Not suited for large induced variation without group structure

 

Level scaling

$\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_i}{\bar{x}_i}$

(-)

Focus on relative response

Suited for identification of e.g. biomarkers

Inflation of the measurement errors

III

Log transformation

$\tilde{x}_{ij} = \log_{10}(x_{ij})$; $\hat{x}_{ij} = \tilde{x}_{ij} - \bar{\tilde{x}}_i$

Log O

Correct for heteroscedasticity, pseudo scaling. Make multiplicative models additive

Reduce heteroscedasticity, multiplicative effects become additive

Difficulties with values with large relative standard deviation and zeros

 

Power transformation

$\tilde{x}_{ij} = \sqrt{x_{ij}}$; $\hat{x}_{ij} = \tilde{x}_{ij} - \bar{\tilde{x}}_i$

√O

Correct for heteroscedasticity, pseudo scaling

Reduce heteroscedasticity, no problems with small values

The choice of the square root is arbitrary.
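
To make the formulas in Table 1 concrete, the sketch below applies each pretreatment to the values of a single metabolite i measured over J samples. It is only an illustration under stated assumptions: the function names and the example numbers are hypothetical, the code uses only the Python standard library, and each function takes one row of values x_i1 ... x_iJ. The standard deviation uses the J - 1 divisor given in the table caption, and the log and power transformations are followed by mean centering, as in the table.

```python
import math
from statistics import mean, stdev  # stdev divides by J - 1, as in the caption


def center(row):
    """Centering: x_ij - mean_i."""
    m = mean(row)
    return [x - m for x in row]


def autoscale(row):
    """Autoscaling: centered values divided by the standard deviation s_i."""
    s = stdev(row)
    return [d / s for d in center(row)]


def range_scale(row):
    """Range scaling: centered values divided by (max_i - min_i)."""
    span = max(row) - min(row)
    return [d / span for d in center(row)]


def pareto_scale(row):
    """Pareto scaling: centered values divided by sqrt(s_i)."""
    root_s = math.sqrt(stdev(row))
    return [d / root_s for d in center(row)]


def vast_scale(row):
    """Vast scaling: autoscaled values multiplied by mean_i / s_i."""
    m, s = mean(row), stdev(row)
    return [(x - m) / s * (m / s) for x in row]


def level_scale(row):
    """Level scaling: centered values divided by the mean."""
    m = mean(row)
    return [d / m for d in center(row)]


def log_transform(row):
    """Log transformation (base 10), then centering.

    math.log10 raises ValueError for zeros, which reflects the
    difficulty with zeros noted in the table.
    """
    return center([math.log10(x) for x in row])


def power_transform(row):
    """Power (square root) transformation, then centering."""
    return center([math.sqrt(x) for x in row])


# Hypothetical concentrations of one metabolite in five samples.
example_row = [1.0, 2.0, 3.0, 4.0, 10.0]
for fn in (center, autoscale, range_scale, pareto_scale,
           vast_scale, level_scale, log_transform, power_transform):
    print(fn.__name__, [round(v, 3) for v in fn(example_row)])
```

For a full data matrix, the same functions would be applied row by row, one metabolite at a time. A constant row (zero standard deviation or zero range) or a zero mean would cause a division by zero in this sketch and needs separate handling.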