Fig. 3 | BMC Genomics

Fig. 3

From: A two-sequence motif-based method for the inventory of gene families in fragmented and poorly annotated genome sequences

Flow chart of the procedure used to predict the number of genes in a family for a species with an incompletely assembled genome, using the expanded two-motif method. An overview of the workflow for identifying all members of a specific family. Step 1: A query sequence is built by aligning sequences from the desired family. The query sequence should contain two motifs, at least one of which is unique to the family. Step 2: The genome of the organism of interest is translated in all six reading frames. Step 3: A BLAST database is built from the translated genome. Step 4: The BLAST database is searched with the query sequence. Step 5: The BLAST result is inspected; the orientation of the two motifs and, if the family has few or no gaps between the motifs, the distance between them are used to determine whether a hit can constitute a member of the gene family. In our hypothetical example, inspection revealed hits on three hypothetical contigs: 1050, 3000, and 5045. On Contig 1050, the two motifs are close together and correctly oriented. On Contig 3000, the two motifs are located on either side of an intron; thus, their positions in the genome are used to verify the orientation and to establish whether the hits are located relatively close to each other. Contig 5045 is a negative hit: it contains two hits, but inspection revealed that the motifs are incorrectly oriented, so the hit is deemed not to constitute a member of the family.
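
The inspection in Step 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `MotifHit` record, the `is_candidate` and `candidate_contigs` helpers, the `max_gap` threshold of 2000 nucleotides, and all coordinates are assumptions made up for illustration. Only the contig names and their accept/reject outcome come from the hypothetical example in the caption. The sketch treats "correctly oriented" as meaning that both motifs lie on the same strand with the first motif upstream of the second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifHit:
    """One motif hit (hypothetical record, not a BLAST output format)."""
    contig: str
    motif: int   # 1 = first motif in the query, 2 = second motif
    frame: int   # +1..+3 forward strand, -1..-3 reverse strand
    start: int   # genomic nucleotide coordinates, with start < end
    end: int


def is_candidate(first: MotifHit, second: MotifHit, max_gap: int) -> bool:
    """Return True if the two motif hits are on the same contig and strand,
    in query order along that strand, and at most max_gap nucleotides apart."""
    if first.contig != second.contig:
        return False
    # Frames may differ across an intron, so only the strand is compared.
    if (first.frame > 0) != (second.frame > 0):
        return False
    if first.frame > 0:
        gap = second.start - first.end   # forward strand: motif 1 lies to the left
    else:
        gap = first.start - second.end   # reverse strand: motif 1 lies to the right
    return 0 <= gap <= max_gap


def candidate_contigs(hits, max_gap=2000):
    """List contigs holding at least one acceptable motif 1 / motif 2 pair."""
    found = set()
    for a in hits:
        for b in hits:
            if a.motif == 1 and b.motif == 2 and is_candidate(a, b, max_gap):
                found.add(a.contig)
    return sorted(found)


# Invented coordinates mirroring the three hypothetical contigs in the caption.
hits = [
    MotifHit("1050", 1, +1, 100, 160),
    MotifHit("1050", 2, +1, 200, 260),     # close together, same strand
    MotifHit("3000", 1, +2, 500, 560),
    MotifHit("3000", 2, +3, 1400, 1460),   # separated by an intron, frame shifts
    MotifHit("5045", 1, +1, 300, 360),
    MotifHit("5045", 2, -2, 100, 160),     # opposite strand: rejected
]

print(candidate_contigs(hits))  # ['1050', '3000']
```

A larger `max_gap` tolerates longer introns at the cost of more false pairings, so in practice the threshold would have to be chosen per family and per genome.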
