
Table 2 Example entries of the processed definition file

From: RAId_DbS: mass-spectrometry based peptide identification web server with knowledge integration

>NP_775259,NM_173167,Q8IWX7
60	I00	SAP	gTC → A	|NM_173167|	dbSNP:16970659
60	I00	SAP	v → I	|Q8IWX7|	V → I (dbSNP:16970659): FTId = VAR_027506
199	V00	SAP	GcA → T	|NM_173167|	dbSNP:35749208
377	R00	SAP	AaG → G	|NM_173167|	dbSNP:41389545
496	H00	SAP	d → H	|Q8IWX7|	D → H (breast cancer somatic mutation). FTId = VAR_035870
778	Q00	SAP	CgG → A	|NM_173167|	dbSNP:34242925
852	N00	SAP	AtC → A	|NM_173167|	dbSNP:11654824
852	N00	SAP	i → N	|Q8IWX7|	I → N (dbSNP:11654824). FTId = VAR_027507
>NP_076410,NM_023921,Q9NYW0
92	N08,N09,N10,N11,N12	PTM	Nlinked(GlcNAc...)	|Q9NYW0|	N-linked (GlcNAc...) (Potential)
156	M00	SAP	AcG → T	|NM_023921|	dbSNP:597468
156	M00	SAP	m → T	|Q9NYW0|	M → T (dbSNP:597468) FTId = VAR_030009
156	M00	SAP	t → M	|NP_076410|	Alignment with Q9NYW0
158	N08,N09,N10,N11,N12	PTM	Nlinked(GlcNAc...)	|Q9NYW0|	N-linked (GlcNAc...) (Potential)

  1. Two sequence clusters are shown in this table to demonstrate the structure of our processed information file. The text line after the ">" symbol contains the accession numbers associated with the members of the cluster. Each of the other rows contains six entries separated by tabs. The first column indicates the residue position. The second column indicates the modified residue(s) that can occur at the position specified in the first column. The third column, labeled either SAP or PTM, indicates the modification type. The fourth column explains the origin of the modification: a lower-case letter indicates the residue content in the source sequence, and an upper-case letter indicates the modified residue in the variant sequence. The notation v → I indicates that amino acid V in the source sequence can change into I, i.e., a SAP. The notation gTC → A is shorthand for a codon change from gtc to atc, i.e., a SNP that likewise changes the coded amino acid from V to I. The fifth column contains the accession number of the source of the modification, which may be a protein sequence or an mRNA. The sixth column contains additional information for the fourth column; it may include disease information or a database entry index. For example, in the first row of the first cluster, dbSNP:16970659 indicates that this SNP comes from NCBI's dbSNP with entry index 16970659. In the fifth row of the first cluster, the sixth column gives the disease origin. The additional Feature Identifier (FTId), VAR_xxxxxx, is the variant sequence documented by SwissProt. In the fourth row of the second cluster, the sixth column reads "Alignment with Q9NYW0", indicating that this SAP comes from a mismatch in the alignment between protein sequences during the clustering procedure. In the first and last rows of the second cluster, the second column contains N08, N09,..., N12, all of which are possible post-translational modifications associated with glycosylation [22], as indicated in the sixth column.
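The layout described in the footnote can be read with a few lines of code. The following is a minimal sketch, not the RAId_DbS implementation: the names `Annotation`, `Cluster`, `parse_definition` and `expand_codon` are hypothetical, and the sample rows are copied from the table. It assumes one ">" header line per cluster followed by rows of exactly six tab-separated fields, and it assumes that the single lower-case letter in a codon shorthand such as gTC → A marks the base that is replaced, which is inferred from the gtc to atc example in the footnote.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Annotation:
    position: int        # column 1: residue position
    residues: List[str]  # column 2: e.g. ["I00"] or ["N08", ..., "N12"]
    kind: str            # column 3: "SAP" or "PTM"
    origin: str          # column 4: e.g. "v → I" or "gTC → A"
    source: str          # column 5: accession, surrounding "|" stripped
    info: str            # column 6: free text (dbSNP index, disease, FTId)


@dataclass
class Cluster:
    accessions: List[str]
    annotations: List[Annotation] = field(default_factory=list)


def parse_definition(text: str) -> List[Cluster]:
    """Parse '>' cluster headers and six-field, tab-separated rows."""
    clusters: List[Cluster] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">"):
            accessions = [a.strip() for a in line[1:].split(",") if a.strip()]
            clusters.append(Cluster(accessions))
            continue
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) != 6 or not clusters:
            raise ValueError(f"unexpected row: {line!r}")
        pos, residues, kind, origin, source, info = fields
        clusters[-1].annotations.append(
            Annotation(
                position=int(pos),
                residues=[r.strip() for r in residues.split(",")],
                kind=kind,
                origin=origin,
                source=source.strip("|"),
                info=info,
            )
        )
    return clusters


def expand_codon(origin: str) -> Tuple[str, str]:
    """Expand a codon shorthand such as 'gTC → A' into (old, new) codons.

    Assumption: the one lower-case letter is the base that changes.
    """
    codon, new_base = [part.strip() for part in origin.split("→")]
    idx = next(i for i, c in enumerate(codon) if c.islower())
    old = codon.lower()
    new = old[:idx] + new_base.lower() + old[idx + 1:]
    return old, new


# Two rows from the first cluster of the table, with explicit tab escapes.
SAMPLE = "\n".join([
    ">NP_775259,NM_173167,Q8IWX7",
    "60\tI00\tSAP\tgTC → A\t|NM_173167|\tdbSNP:16970659",
    "60\tI00\tSAP\tv → I\t|Q8IWX7|\tV → I (dbSNP:16970659): FTId = VAR_027506",
])

if __name__ == "__main__":
    for cluster in parse_definition(SAMPLE):
        print(cluster.accessions)
        for ann in cluster.annotations:
            print(ann.position, ann.residues, ann.kind, ann.source)
    print(expand_codon("gTC → A"))
```

Keeping the accession in column 5 as a plain string, with the "|" delimiters stripped, makes it easy to group annotations by their source record, for example to separate SNP-derived rows (mRNA accessions) from rows taken from protein entries.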