Figure 2 | BMC Genomics

Figure 2

From: Mobile element scanning (ME-Scan) by targeted high-throughput sequencing

Sequence analysis pipeline. Paired-end 2x36-bp sequence reads generated on an Illumina GAII are received as fastq-formatted text files. The index of each read pair is identified and trimmed from the genomic flank reads. Read pairs that cannot be assigned a valid index are filtered out. Read quality filtering then removes read pairs in which either read is composed mainly of a single nucleotide, contains too many 'N' base calls, or contains sequences derived from the adapter oligonucleotides. The remaining read pairs are then mapped to the human reference genome in two ways: first with the expected 16 bp of Alu sequence in the junction read, to identify Alu insertions that are present in the reference; and then with those 16 bp trimmed from the junction read, to enable identification of new Alu insertions. Read pairs that do not map to a unique location with the proper orientation and the expected distance between them are then filtered out. For each read pair, the position (in the reference genome) of the final nucleotide of the Alu junction read is computed for use as the unique identifier of the corresponding insertion. Read pairs are then grouped according to those positional identifiers. Loci that lack Alu sequence in the first 16 bp of the junction read are annotated as such and rejected as unreliable. The final data set consists of a list of insertion loci observed in at least one sample and the number of read pairs supporting the presence of each insertion in each indexed sample.
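
Two of the steps described above, read quality filtering and grouping of read pairs by positional identifier, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the thresholds (`max_n`, `max_single_frac`), the adapter string passed by the caller, and the input tuple layout are all hypothetical. The sketch also assumes 1-based leftmost mapping coordinates and takes the "final nucleotide" of a reverse-strand junction read to be its leftmost reference position; the caption does not specify these details.

```python
from collections import Counter, defaultdict


def is_low_quality(seq, adapter, max_n=2, max_single_frac=0.8):
    """Return True if a read should be discarded.

    Thresholds are illustrative placeholders, not values from the paper.
    """
    if not seq:
        return True
    # Too many ambiguous base calls.
    if seq.count("N") > max_n:
        return True
    # Read composed mainly of a single nucleotide.
    top_count = Counter(seq).most_common(1)[0][1]
    if top_count / len(seq) >= max_single_frac:
        return True
    # Read contains adapter-derived sequence (exact substring match only).
    return adapter in seq


def junction_position(start, length, strand):
    """Reference position of the last base of the junction read.

    Assumes `start` is the 1-based leftmost mapped coordinate.
    """
    if strand == "+":
        return start + length - 1
    return start  # reverse strand: the read's last base is leftmost


def count_support(mapped_pairs):
    """Group read pairs by locus and count support per indexed sample.

    `mapped_pairs` yields (sample, chrom, start, length, strand) tuples
    describing the junction read of each uniquely mapped pair.
    """
    support = defaultdict(Counter)
    for sample, chrom, start, length, strand in mapped_pairs:
        locus = (chrom, junction_position(start, length, strand))
        support[locus][sample] += 1
    return support


pairs = [
    ("s1", "chr1", 100, 20, "+"),
    ("s1", "chr1", 100, 20, "+"),
    ("s2", "chr1", 119, 20, "-"),
    ("s2", "chr2", 50, 20, "+"),
]
support = count_support(pairs)
```

Here `support` maps each locus to per-sample read-pair counts, which mirrors the shape of the final data set described in the caption. A read pair would be dropped before mapping if `is_low_quality` returns True for either of its reads.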
