From: Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing

Heuristic for finding a w-window on the forward strand from a scan of k-words in VJ recombinations. Detection on the reverse strand is done in a similar way, and detection in VDJ recombinations is also based on the V and J genes. The labels V and J indicate the beginning of matching k-words in the index. (Top) The window is correctly centered on the N region (which lies between the actual V and the actual J regions). There is one mutation (or sequencing error), denoted by ×, far from the 3’ end of the V region. (Upper middle) A mutation or an error in the k rightmost base pairs of the V region leads to a small error in the w-window prediction. However, the end of the V region is predicted with an error that is less than or equal to k. Because we use large values of w, parts of the V and J regions are still contained within the extracted w-window. (Lower middle) When there are too many errors compared with the size of the germline gene, the heuristic is unable to predict a w-window. This may happen particularly with the J gene, which is shorter than the V gene. For this to occur, mutations must be separated from each other by less than k bp. (Bottom) Spaced seeds improve the sensitivity of the heuristic. The spaced 10-word #####-##### leads to the recognition of k-words as soon as the mutations are separated by at least k/2 bp.
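The scan described in the caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`seed_key`, `build_index`, `find_window`), the toy germline sequences, and the rule of centering the window midway between the end of the last V match and the start of the first J match are all hypothetical choices made for this example. Only the seed pattern `#####-#####` is taken from the caption, and only the forward strand is handled.

```python
# Spaced seed from the caption: '#' positions are compared, '-' is ignored.
SEED = "#####-#####"

# Toy sequences for illustration only (NOT real germline genes).
TOY_V = "ACCACAACACCAACCACAACACACCAACCA"
TOY_J = "GTTGTGGTTGTGTGGTTGGT"


def seed_key(seq, i, seed=SEED):
    """Return the characters of seq under the '#' positions of the seed placed at i."""
    return "".join(seq[i + j] for j, c in enumerate(seed) if c == "#")


def build_index(germline, seed=SEED):
    """Map every seeded word of the germline sequences to its label ('V' or 'J').

    Words that occur under both labels are stored as None (ambiguous).
    """
    index = {}
    for label, seqs in germline.items():
        for seq in seqs:
            for i in range(len(seq) - len(seed) + 1):
                key = seed_key(seq, i, seed)
                index[key] = label if index.get(key, label) == label else None
    return index


def find_window(read, index, w, seed=SEED):
    """Return a w-long window around the predicted V/J junction, or None.

    Assumed rule: take the first J match, the last V match before it,
    and center the window between the end of that V word and the J start.
    """
    span = len(seed)
    labels = [index.get(seed_key(read, i, seed))
              for i in range(len(read) - span + 1)]
    j_hits = [i for i, lab in enumerate(labels) if lab == "J"]
    if not j_hits:
        return None
    j_start = j_hits[0]
    v_hits = [i for i, lab in enumerate(labels[:j_start]) if lab == "V"]
    if not v_hits:
        return None
    v_end = v_hits[-1] + span          # one past the last base of the last V word
    center = (v_end + j_start) // 2
    start = center - w // 2
    if start < 0 or start + w > len(read):
        return None                    # window would fall outside the read
    return read[start:start + w]
```

With the toy index, a read built as `TOY_V + "GAGA" + TOY_J` yields a window that spans the inserted bases. A substitution that falls under the `-` position of the last V word leaves the predicted window position unchanged, whereas a substitution under a `#` position near the 3’ end of V shifts the predicted V end by a few bases, in line with the upper middle panel.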

