Fig. 1 | BMC Genomics

Fig. 1

From: POTION: an end-to-end pipeline for positive Darwinian selection detection in genome-scale data through phylogenetic comparison of protein-coding genes

General schema of POTION. Black boxes represent user-provided files and final results, grey boxes indicate filtering steps, and white boxes indicate parallelized steps performed for each valid group of homologs. The filtering steps comprise four sequential conceptual stages (A-D), each composed of one or more sequential filters (numbered steps). Stage “A” comprises four filters that remove sequences with: (1) no valid start and/or stop codon; (2) non-standard nucleotides; (3) a length that is not a multiple of three; and (4) a length outside user-defined lower and upper bounds. Stage “B” comprises one filter that removes sequences and groups according to the homology relationships within groups, allowing users to restrict the analysis to the biologically meaningful gene sets of their choice (1-1 orthologs and/or paralogs, for instance). Stage “C” comprises four filters for sequences and groups: (1) mean sequence identity of groups or of individual sequences; (2) removal of groups containing any sequence removed in previous steps, allowing users to analyze only high-quality data from the start of the analysis; (3) removal of groups whose sequence or species counts fall outside user-defined ranges; and (4) removal of groups with no sequence from a user-defined anchor genome. Stage “D” comprises one filter in which POTION detects groups with evidence of recombination using three methods (Phi, NSS, Max Chi2), followed by multiple hypothesis correction. After the filtering steps, POTION executes the following sequential analyses in parallel for each valid group of homologs: multiple protein sequence alignment using one of three popular sequence aligners (MUSCLE, MAFFT or PRANK); protein-guided codon alignment; alignment trimming using TrimAl; phylogenetic tree reconstruction using proml and dnaml from PHYLIP; and a search for positive selection using the codeml site-model analysis with nested models M1a/M2, M7/M8 and M8a/M8, followed by multiple hypothesis correction. Finally, POTION parses the output files and writes the final result files (fasta and flat files) for groups with evidence of recombination and positive selection.
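
The four Stage “A” sequence filters described in the caption can be illustrated with a minimal Python sketch. This is not POTION's own code: the function names, the returned labels, the default length bounds, and the use of the standard genetic code (ATG start; TAA, TAG, TGA stops) are all assumptions made for illustration.

```python
# Illustrative sketch of the Stage "A" filters; not POTION's implementation.
# Assumption: standard genetic code for start and stop codons.
VALID_START = {"ATG"}
VALID_STOP = {"TAA", "TAG", "TGA"}
STANDARD_NUCLEOTIDES = set("ACGT")


def stage_a_failure(seq, min_len=150, max_len=30000):
    """Return the label of the first failed filter, or None if all pass.

    min_len and max_len are hypothetical defaults standing in for the
    user-defined length bounds mentioned in the caption.
    """
    seq = seq.upper()
    # (1) valid start and stop codons
    if seq[:3] not in VALID_START or seq[-3:] not in VALID_STOP:
        return "start_stop"
    # (2) only standard nucleotides
    if set(seq) - STANDARD_NUCLEOTIDES:
        return "non_standard"
    # (3) length must be a multiple of three
    if len(seq) % 3 != 0:
        return "not_multiple_of_three"
    # (4) length within lower and upper bounds
    if not (min_len <= len(seq) <= max_len):
        return "length_bounds"
    return None


def filter_sequences(seqs, **bounds):
    """Split a {name: sequence} dict into kept sequences and removal reasons."""
    kept, removed = {}, {}
    for name, seq in seqs.items():
        reason = stage_a_failure(seq, **bounds)
        if reason is None:
            kept[name] = seq
        else:
            removed[name] = reason
    return kept, removed
```

Recording the reason for each removal, rather than only dropping the sequence, makes it possible for a later step such as filter (2) of Stage “C” to discard whole groups that contained a removed sequence.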
