Table 2 Example entries of the processed definition file

From: RAId_DbS: mass-spectrometry based peptide identification web server with knowledge integration

>NP_775259,NM_173167,Q8IWX7
60 I00 SAP gTCA |NM_173167| dbSNP:16970659
60 I00 SAP vI |Q8IWX7| VI (dbSNP:16970659): FTId=VAR_027506
199 V00 SAP GcAT |NM_173167| dbSNP:35749208
377 R00 SAP AaGG |NM_173167| dbSNP:41389545
496 H00 SAP dH |Q8IWX7| DH (breast cancer somatic mutation). FTId=VAR_035870
778 Q00 SAP CgGA |NM_173167| dbSNP:34242925
852 N00 SAP AtCA |NM_173167| dbSNP:11654824
852 N00 SAP iN |Q8IWX7| IN (dbSNP:11654824). FTId=VAR_027507
>NP_076410,NM_023921,Q9NYW0
92 N08,N09,N10,N11,N12 PTM Nlinked(GlcNAc...) |Q9NYW0| Nlinked(GlcNAc...)(Potential)
156 M00 SAP AcGT |NM_023921| dbSNP:597468
156 M00 SAP mT |Q9NYW0| MT (dbSNP:597468) FTId=VAR_030009
156 M00 SAP tM |NP_076410| Alignment with Q9NYW0
158 N08,N09,N10,N11,N12 PTM Nlinked(GlcNAc...) |Q9NYW0| Nlinked(GlcNAc...)(Potential)
  1. Two sequence clusters are shown in this table to demonstrate the structure of our processed information file. The line beginning with the ">" symbol lists the accession numbers of the members of the cluster. Each of the other rows contains six entries separated by tabs. The first column gives the residue position. The second column gives the modified residue(s) that can occur at that position. The third column, labeled either SAP or PTM, gives the modification type. The fourth column explains the origin of the modification: a lower-case letter indicates the residue content in the source sequence, and the upper-case letter indicates the modified residue in the variant sequence. The notation vI indicates that amino acid V in the source sequence can change into I, i.e., a SAP. The notation gTCA is shorthand for a codon change from gtc to atc (the lower-case g marks the base that changes and the final A is its replacement), i.e., a SNP that likewise changes the encoded amino acid from V to I. The fifth column contains the accession number of the source of the modification, which may be a protein sequence or an mRNA. The sixth column contains additional information about the fourth column, such as disease information or a database entry index. For example, in the first row of the first cluster, dbSNP:16970659 indicates that this SNP comes from NCBI's dbSNP, entry index 16970659. In the fifth row of the same cluster, the sixth column gives the disease origin. The additional Feature Identifier (FTId), VAR_xxxxxx, identifies the variant sequence documented by SwissProt. In the fourth row of the second cluster, the sixth column reads "Alignment with Q9NYW0", indicating that this SAP comes from a mismatch in the alignment between protein sequences during the clustering procedure. In the first and last rows of the second cluster, the second column contains N08, N09, ..., N12, all of which are possible post-translational modifications associated with glycosylation [22], as indicated in the sixth column.
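The layout described in this footnote can be read with a few lines of code. The following is a minimal sketch in Python, not the RAId_DbS implementation: the names `Annotation`, `parse_definitions`, `decode_codon_change`, and `SAMPLE` are hypothetical, the fields are assumed to be tab-separated as the footnote states, and the sample rows are copied from the table with spacing normalized. The codon decoder assumes that a four-letter entry such as gTCA always consists of a codon with exactly one lower-case (changed) base followed by the replacement base.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Annotation:
    position: int        # column 1: residue position
    residues: List[str]  # column 2: modified residue code(s), e.g. ["I00"]
    kind: str            # column 3: "SAP" or "PTM"
    origin: str          # column 4: origin notation, e.g. "vI" or "gTCA"
    source: str          # column 5: source accession, surrounding "|" removed
    info: str            # column 6: additional information


def parse_definitions(text: str) -> Dict[str, List[Annotation]]:
    """Group annotation rows under the '>' header line that precedes them."""
    clusters: Dict[str, List[Annotation]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">"):
            current = line[1:].strip()  # comma-separated accession list
            clusters[current] = []
            continue
        if current is None:
            raise ValueError("annotation row found before any '>' header")
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) != 6:
            raise ValueError(f"expected 6 tab-separated fields: {line!r}")
        pos, residues, kind, origin, source, info = fields
        clusters[current].append(
            Annotation(
                position=int(pos),
                residues=[r.strip() for r in residues.split(",")],
                kind=kind,
                origin=origin,
                source=source.strip("|"),
                info=info,
            )
        )
    return clusters


def decode_codon_change(origin: str) -> Tuple[str, str]:
    """Expand a codon entry such as 'gTCA' into (source, variant) codons.

    Assumption: three codon letters with exactly one lower-case base,
    followed by the base that replaces it.
    """
    if len(origin) != 4:
        raise ValueError(f"not a codon-change entry: {origin!r}")
    codon, new_base = origin[:3], origin[3]
    changed = [i for i, base in enumerate(codon) if base.islower()]
    if len(changed) != 1:
        raise ValueError(f"expected one lower-case base in {codon!r}")
    i = changed[0]
    ref = codon.lower()
    alt = ref[:i] + new_base.lower() + ref[i + 1:]
    return ref, alt


# Rows taken from the table above, joined with tabs (an assumption about
# the on-disk layout based on the footnote).
SAMPLE = (
    ">NP_775259,NM_173167,Q8IWX7\n"
    "60\tI00\tSAP\tgTCA\t|NM_173167|\tdbSNP:16970659\n"
    "60\tI00\tSAP\tvI\t|Q8IWX7|\tVI (dbSNP:16970659): FTId=VAR_027506\n"
    ">NP_076410,NM_023921,Q9NYW0\n"
    "92\tN08,N09,N10,N11,N12\tPTM\tNlinked(GlcNAc...)\t|Q9NYW0|\t"
    "Nlinked(GlcNAc...)(Potential)\n"
)

if __name__ == "__main__":
    for header, rows in parse_definitions(SAMPLE).items():
        print(header)
        for row in rows:
            print("  ", row.position, row.residues, row.kind, row.source)
    print(decode_codon_change("gTCA"))
```

Keying the result on the full header line keeps all accession numbers of a cluster together; splitting that key on commas would give the individual accessions if they are needed separately.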