Table 1 Selected nucleotide pattern frequencies for the human data

From: A Support Vector Machine based method to distinguish long non-coding RNAs from protein coding transcripts

Row  GRCh37                   GRCh38
1    aa, aaa, ac, aca, acg    aa, aaa, ac, aca, act
2    act, ag, aga, at, ata    ag, aga, at, ata, atc
3    atc, atg, att, ca, caa   atg, att, ca, caa, cac
4    cac, cag, cat, cc, cca   cag, cat, cc, cca, ccc
5    ccc, cg, cgc, ct, cta    cg, cgc, ct, cta, ctc
6    ctc, ctg, ga, gac, gag   ctg, ga, gac, gag, gc
7    gc, gcg, gg, ggg, gt     gcg, gg, ggg, gt, gta
8    gtc, gtg, ta, tac, tag   gtc, gtg, ta, tac, tag
9    tat, tc, tca, tct, tg    tat, tc, tca, tct, tg
10   tga, tgt, tt, ttg, ttt   tga, tgt, tt, ttg, ttt
The GRCh37 and GRCh38 data sets were analyzed to identify the 50 pattern frequencies with the highest PCA loadings. The patterns “acg” (GRCh37 only) and “gta” (GRCh38 only) are the only differences between the two sets. The additional files list these nucleotide pattern frequencies, ordered by PCA loading.
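
The comparison described in the note can be checked directly from the table. The sketch below transcribes both pattern lists, computes their set differences, and shows one plausible way to turn a sequence into pattern frequencies. The helper name `pattern_frequencies`, the example sequence, and the normalization (occurrences divided by the number of overlapping windows of the same length) are assumptions made for illustration; the table does not state how the frequencies were normalized, and the PCA step itself is not reproduced here.

```python
from collections import Counter

# Pattern lists transcribed from Table 1 (50 patterns per assembly).
GRCH37 = """aa aaa ac aca acg act ag aga at ata atc atg att ca caa cac cag
cat cc cca ccc cg cgc ct cta ctc ctg ga gac gag gc gcg gg ggg gt gtc gtg
ta tac tag tat tc tca tct tg tga tgt tt ttg ttt""".split()
GRCH38 = """aa aaa ac aca act ag aga at ata atc atg att ca caa cac cag cat
cc cca ccc cg cgc ct cta ctc ctg ga gac gag gc gcg gg ggg gt gta gtc gtg
ta tac tag tat tc tca tct tg tga tgt tt ttg ttt""".split()


def pattern_frequencies(sequence, patterns):
    """Relative frequency of each pattern in `sequence` (hypothetical helper).

    Assumption: frequency = occurrences / number of overlapping windows
    of the same length as the pattern.
    """
    seq = sequence.lower()
    freqs = {}
    for k in sorted({len(p) for p in patterns}):
        windows = max(len(seq) - k + 1, 0)
        counts = Counter(seq[i:i + k] for i in range(windows))
        for p in patterns:
            if len(p) == k:
                freqs[p] = counts[p] / windows if windows else 0.0
    return freqs


# Patterns that appear in only one of the two selections.
only_37 = sorted(set(GRCH37) - set(GRCH38))
only_38 = sorted(set(GRCH38) - set(GRCH37))
print(only_37, only_38)  # ['acg'] ['gta']

# Made-up sequence, used only to show the shape of the feature vector.
example = pattern_frequencies("ACGTACGTAA", GRCH37)
print(example["ac"], example["acg"])
```

The resulting dictionary maps each selected pattern to one number, so a transcript becomes a 50-dimensional feature vector under these assumptions.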