Figure 5 | BMC Genomics

Figure 5

From: Addressing challenges in the production and analysis of illumina sequencing data

Image artifacts can generate false sequences. Cluster identification can mistake crystals, dust and lint particles, and other flow cell features for sequence clusters (A). Shown are 103 non-library sequences originating from a lint particle. They were observed in a library sequenced with a three-base-pair tag ('GAC') at the beginning of each read, so in this case the non-library sequences could be distinguished by their first three bases. The fraction of such artifact clusters is higher in runs with low loading density or low signal intensity. A sequence entropy filter removes the majority of these sequences (82.52% at a cutoff of 0.85), but it also removes non-artifact sequences (B); as indicated in the figure, this affects 0.01% of the human reference genome (GRCh37/hg19). For 3'/5'-tagged or indexed sequencing libraries, filtering on the tag or index is therefore superior to base-composition or sequence-entropy filters for removing such sequencing artifacts.
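
The trade-off described in the caption can be illustrated with a minimal sketch that compares the two kinds of filter on a few toy reads. The caption does not say how sequence entropy was computed. This sketch assumes it is the Shannon entropy of a read's mononucleotide composition, normalized to the range 0 to 1. The function names and the example reads are hypothetical; only the 'GAC' tag and the 0.85 cutoff come from the caption.

```python
import math
from collections import Counter

# Assumption: "sequence entropy" = Shannon entropy of the mononucleotide
# composition, divided by log2(4) = 2 bits so the score lies in [0, 1].
def normalized_entropy(seq: str) -> float:
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # sum of p * log2(1/p) over the observed bases
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return h / 2.0

def entropy_filter(reads, cutoff=0.85):
    """Keep reads whose normalized entropy is at least `cutoff` (hypothetical helper)."""
    return [r for r in reads if normalized_entropy(r) >= cutoff]

def tag_filter(reads, tag="GAC"):
    """Keep reads that start with the library tag and trim it off (hypothetical helper)."""
    return [r[len(tag):] for r in reads if r.startswith(tag)]

# Toy reads (invented for illustration, not data from the figure)
reads = [
    "GACTTGCAGGCTAACGT",  # tagged library read, diverse sequence
    "AAAAAAAAAAAAAAAAA",  # untagged, low complexity: artifact
    "GACATATATATATATAT",  # tagged library read, low complexity
    "CTGAGTCAGTCCATGCA",  # untagged but diverse: artifact
]

for r in reads:
    print(r, round(normalized_entropy(r), 3))
print("entropy filter keeps:", entropy_filter(reads))
print("tag filter keeps:", tag_filter(reads))
```

In this toy case the entropy filter drops the genuine low-complexity library read while keeping the diverse artifact. The tag filter does the opposite, which mirrors the caption's recommendation for tagged or indexed libraries.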
