
Table 2. Performance of MSCleaner version 2.0 over a large test set.

From: Reducing the haystack to find the needle: improved protein identification after fast elimination of non-interpretable peptide MS/MS spectra and noise reduction

| A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | A11 | A12 | A13 | A14 | A15 | A16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| alphaAmyl_col1 | 10108 | 633 | 24 | 11.30 | 667 | 24 | 31.65 | 60.07 | 667 | 24 | 15.13 | 51.09 | 667 | 24 | 18.07 |
| alphaAmyl_col2 | 10184 | 698 | 35 | 9.82 | 780 | 35 | 34.20 | 50.22 | 780 | 35 | 19.05 | 20.25 | 780 | 35 | 22.76 |
| AmylGlu_col1 | 10030 | 736 | 28 | 13.26 | 761 | 28 | 28.40 | 79.24 | 761 | 28 | 8.66 | 73.58 | 761 | 28 | 10.63 |
| AmylGlu_col2 | 9870 | 801 | 36 | 13.31 | 860 | 37 | 29.50 | 72.62 | 860 | 37 | 11.70 | 63.95 | 860 | 37 | 14.29 |
| apo_col1 | 10032 | 2606 | 63 | 11.72 | 2814 | 63 | 30.76 | 63.10 | 2814 | 63 | 13.93 | 54.49 | 2814 | 63 | 16.78 |
| apo_col2 | 10090 | 2571 | 60 | 12.13 | 2761 | 60 | 32.95 | 53.12 | 2761 | 60 | 17.53 | 44.32 | 2761 | 60 | 21.03 |
| betaGal_col1 | 10324 | 1459 | 56 | 7.17 | 1567 | 57 | 34.98 | 48.06 | 1567 | 57 | 22.05 | 40.53 | 1567 | 57 | 24.60 |
| betaGal_col2 | 10368 | 1309 | 51 | 8.12 | 1508 | 56 | 36.71 | 42.90 | 1454 | 55 | 24.76 | 33.10 | 1454 | 55 | 28.61 |
| CarAnly_col1 | 9946 | 586 | 49 | 12.35 | 616 | 49 | 26.35 | 90.31 | 573 | 49 | 3.65 | 84.94 | 607 | 49 | 5.48 |
| CarAnly_col2 | 9534 | 582 | 52 | 13.40 | 616 | 52 | 26.27 | 86.07 | 616 | 52 | 5.08 | 78.44 | 616 | 52 | 7.66 |
| Cat_col1 | 10098 | 1798 | 61 | 11.13 | 1886 | 61 | 30.88 | 67.26 | 1879 | 61 | 13.13 | 57.89 | 1879 | 61 | 16.50 |
| Cat_col2 | 10034 | 1567 | 65 | 11.78 | 1693 | 65 | 31.90 | 59.50 | 1693 | 65 | 15.91 | 48.55 | 1693 | 65 | 19.56 |
| phosB_col1 | 10118 | 2780 | 59 | 10.30 | 3079 | 61 | 35.13 | 63.49 | 3014 | 60 | 14.26 | 54.46 | 3047 | 61 | 17.25 |
| phosB_col2 | 10096 | 2655 | 61 | 10.52 | 3116 | 65 | 32.58 | 53.96 | 3084 | 65 | 17.58 | 44.31 | 3116 | 65 | 21.16 |
| GluDey_col1 | 10006 | 892 | 36 | 11.29 | 986 | 36 | 27.30 | 79.55 | 986 | 36 | 7.75 | 73.42 | 986 | 36 | 9.71 |
| GluDey_col2 | 9886 | 850 | 34 | 11.81 | 962 | 34 | 28.73 | 72.51 | 962 | 34 | 10.13 | 62.25 | 962 | 34 | 13.51 |
| GluTra_col1 | 10022 | 351 | 25 | 10.36 | 389 | 25 | 28.61 | 71.64 | 348 | 25 | 10.25 | 62.78 | 389 | 25 | 14.30 |
| GluTra_col2 | 10156 | 341 | 33 | 9.18 | 384 | 33 | 31.31 | 61.15 | 384 | 33 | 14.25 | 49.59 | 384 | 33 | 28.11 |
| Immo_col1 | 10330 | 506 | 35 | 9.27 | 565 | 35 | 36.20 | 42.30 | 565 | 35 | 24.95 | 34.44 | 565 | 35 | 27.66 |
| Immo_col2 | 10334 | 356 | 66 | 8.61 | 500 | 66 | 38.05 | 37.06 | 500 | 66 | 27.31 | 28.47 | 500 | 66 | 30.31 |
| LacDe_col1 | 10286 | 1549 | 58 | 10.36 | 1694 | 58 | 35.36 | 53.20 | 1694 | 58 | 20.03 | 44.86 | 1694 | 58 | 23.15 |
| LacDe_col2 | 10250 | 1346 | 54 | 9.07 | 1483 | 54 | 36.48 | 40.16 | 1483 | 54 | 25.60 | 31.67 | 1483 | 54 | 28.31 |
| LactoPee_col1 | 10242 | 1613 | 45 | 13.16 | 1764 | 45 | 34.78 | 62.12 | 1756 | 45 | 15.91 | 52.37 | 1764 | 45 | 19.53 |
| LactoPee_col2 | 10402 | 1679 | 43 | 9.09 | 1890 | 44 | 35.18 | 51.70 | 1890 | 44 | 20.31 | 41.76 | 1890 | 44 | 23.85 |
| Myo_col1 | 9958 | 561 | 66 | 11.67 | 594 | 66 | 27.26 | 85.42 | 594 | 66 | 5.46 | 79.25 | 594 | 66 | 7.45 |
| Myo_col2 | 9744 | 530 | 66 | 12.15 | 584 | 66 | 28.01 | 80.83 | 584 | 66 | 6.95 | 70.92 | 584 | 66 | 10.35 |

  1. A1: name of the test set (.mgf file; see Methods).
  2. A2: total number of spectra (.dta files).
  3. A3: MASCOT score of the top protein hit with the original .mgf file (without application of MS Cleaner).
  4. A4: sequence coverage (in %) without application of MS Cleaner.
  5. A5: fraction of non-interpretable "bad" spectra found with sequence ladder length n = 4 among all peaks (intensity threshold s = 100%).
  6. A6: MASCOT score of the top protein hit for this search.
  7. A7: sequence coverage (in % of the whole protein length) for this search.
  8. A8: MS Cleaner processing time (in min) on a PC with a single Pentium IV. To obtain exact time consumption values, we did not use the cluster version and switched off the "soft frequency recognition option".
  9. A9: fraction of non-interpretable "bad" spectra found with sequence ladder length n = 4 among the s = 20% most intense peaks.
  10. A10: MASCOT score of the top protein hit for this search.
  11. A11: sequence coverage (in % of the whole protein length) for this search.
  12. A12: MS Cleaner processing time (in min).
  13. A13: fraction of non-interpretable "bad" spectra found with sequence ladder length n = 4 among the s = 25% most intense peaks (in % of A2; i.e., of all spectra).
  14. A14: MASCOT score of the top protein hit for this search.
  15. A15: sequence coverage (in % of the whole protein length) for this MASCOT search.
  16. A16: MS Cleaner processing time (in min) on the same machine as described in the legend of Table 1.

The sequence ladder criterion (minimal ladder length 4 with varying peak intensity thresholds) and the noise suppression algorithms of MS Cleaner 2.0 were applied to a large set of tandem MS results. For each of the test proteins (α-amylase, amyloglucosidase, apo-transferrin, β-galactosidase, carbonic anhydrase, catalase, phosphorylase B, glutamic dehydrogenase, glutathione transferase, immunoglobulin γ, lactic dehydrogenase, lactoperoxidase, myoglobin), two independent sample preparations and dataset recordings were carried out; they are marked with the suffixes _col1 and _col2 in the dataset name. For these datasets, the MASCOT interpretation was carried out on a cluster in parallel with other jobs; therefore, no MASCOT computation time is provided.
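Because A13 is given in % of A2, the absolute number of spectra flagged as non-interpretable can be estimated from a table row, along with the change in MASCOT score relative to the unprocessed file. The following is a minimal sketch of that arithmetic in Python, using the alphaAmyl_col1 row. The class name `Row`, its field names, and the function `summarize` are hypothetical and introduced only for illustration. Since the percentages are rounded to two decimals in the table, the resulting counts are approximate.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """A few columns of one table row (field names are illustrative)."""
    name: str                 # A1: test set name
    total_spectra: int        # A2: total number of spectra
    score_raw: int            # A3: MASCOT score without MS Cleaner
    time_all_peaks: float     # A8: processing time in min, s = 100%
    bad_pct_s25: float        # A13: "bad" spectra in % of A2, s = 25%
    score_s25: int            # A14: MASCOT score after filtering, s = 25%
    time_s25: float           # A16: processing time in min, s = 25%


def summarize(row: Row) -> dict:
    """Derive approximate counts and differences from the tabulated values."""
    removed = round(row.total_spectra * row.bad_pct_s25 / 100)
    return {
        "spectra_removed": removed,
        "spectra_kept": row.total_spectra - removed,
        "score_gain": row.score_s25 - row.score_raw,
        "time_ratio": row.time_s25 / row.time_all_peaks,
    }


# Values copied from the alphaAmyl_col1 row of the table.
row = Row("alphaAmyl_col1", 10108, 633, 31.65, 51.09, 667, 18.07)
print(summarize(row))
```

The same function can be applied to the s = 20% columns by substituting A9, A10 and A12 for A13, A14 and A16.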