Fig. 4 | BMC Genomics

Fig. 4

From: NanoCAGE-XL and CapFilter: an approach to genome wide identification of high confidence transcription start sites

Illustration of the G’-filtering procedure. Panel a: A GBrowse [39] snapshot of a representative location showing filtered and unfiltered TSS-Seq data. The “Reads” track displays individual reads mapped to the genome: reads beginning with a “G” are shown in red, reads beginning with a “G” that does not match the reference genome (unencoded) in black, and all other reads in blue. The “Peak Calls” track shows the peaks called using all reads from the “Reads” track. The “Read Distribution” track shows a histogram of all reads in the “Reads” track, while the “Filtered Peak Read Distribution” track shows a histogram of the reads belonging to peaks that passed G’-filtering. The “Genes” track shows the genome annotation for the gene (AT1G01010). Panel b: Representative distribution of TSS peak calls along the length of genes as a function of the percent of reads starting with an unencoded “G”. For each peak, the percent of reads beginning with a “G” that did not match the reference genome was calculated, and peaks were filtered by requiring a minimum of 0 to 100 % of their reads to begin with an unencoded “G” (x-axis). All peaks passing the G’ filter were then categorized by the gene part to which they aligned: promoter = ≤3000 bp upstream from the TAIR10 annotated TSS; TSS = peak overlaps the TAIR10 TSS; 5′ UTR = peak begins in the 5′ untranslated region; CDS = peak begins in the coding portion of a gene; Intron = peak begins in an intron of the gene; 3′ UTR = peak begins in the 3′ untranslated region. For each minimum percent of reads beginning with an unencoded “G”, the percent of passing peaks annotated to each category was then totaled and plotted (y-axis).
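The filtering and tallying described for Panel b can be sketched in a few lines of Python. This is a minimal illustration only, not the CapFilter implementation: the `Read` and `Peak` classes, their field names, and the function names are hypothetical, and the sketch assumes that each read already records its first sequenced base and the reference base at that position (both on the transcript strand), and that each peak has already been assigned a gene-part category.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Read:
    first_base: str  # first base of the read as sequenced (assumed field)
    ref_base: str    # reference genome base at that position (assumed field)


@dataclass
class Peak:
    category: str    # e.g. "promoter", "TSS", "5' UTR", "CDS", "Intron", "3' UTR"
    reads: list      # list of Read objects assigned to this peak


def is_unencoded_g(read):
    """True if the read starts with a G that the reference does not encode."""
    return read.first_base.upper() == "G" and read.ref_base.upper() != "G"


def unencoded_g_percent(peak):
    """Percent of reads in the peak that begin with an unencoded G."""
    if not peak.reads:
        return 0.0
    n = sum(is_unencoded_g(r) for r in peak.reads)
    return 100.0 * n / len(peak.reads)


def g_filter(peaks, min_percent):
    """Keep peaks with at least min_percent of reads starting with an unencoded G."""
    return [p for p in peaks if unencoded_g_percent(p) >= min_percent]


def category_percentages(peaks, min_percent):
    """Percent of passing peaks in each gene-part category at one threshold."""
    passing = g_filter(peaks, min_percent)
    if not passing:
        return {}
    counts = Counter(p.category for p in passing)
    return {cat: 100.0 * n / len(passing) for cat, n in counts.items()}
```

Calling `category_percentages` once per threshold from 0 to 100 would give one set of y-values per x-axis position, which is the shape of data that Panel b plots.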
