Transmembrane domain-focused GPCR sequence mining strategy. (A) Family-specific profile HMMs are built using TM-only pseudosequences (TOPs) extracted from the GPCRDB sequence repository. The predicted proteomes of both S. mansoni and S. mediterranea are processed identically to the training sequences and are searched against these family-specific profile HMMs. Results are ranked statistically, and sequences meeting a conservatively selected cutoff are submitted to automated BLASTp searches against the NCBI "nr" database. The output is parsed, and transmembrane proteins exhibiting significant homology to non-GPCR proteins are removed. Redundant sequences are removed with the BLAT utility. The surviving sequence pool is then manually assessed and curated, followed by homology-based searches of these sequences against the whole-genome assemblies. Adhesion and Secretin GPCR sequences are distinguished from one another by inspection of their N-terminal ectodomains. Putative full-length Rhodopsin GPCRs, defined by the presence of an intact 7TM domain, are sub-classified via SVM. (B) Construction of TOPs is a two-step process: TM boundary coordinates are predicted by HMMTOP, and the TM domains, each flanked on both sides by 5 amino acids, are then concatenated in order.
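
The TOP construction in panel B can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' code: the function name `build_top`, the 1-based inclusive coordinate convention, and the clipping of flanks at the sequence ends are hypothetical choices. The TM boundaries would come from an HMMTOP prediction, which is not invoked here. How overlapping flanks between closely spaced TM domains are handled is not stated in the caption, so the sketch simply concatenates each padded segment as-is.

```python
from typing import List, Tuple


def build_top(sequence: str,
              tm_bounds: List[Tuple[int, int]],
              flank: int = 5) -> str:
    """Build a TM-only pseudosequence (TOP).

    sequence  : full-length protein sequence
    tm_bounds : (start, end) pairs for each predicted TM domain,
                assumed 1-based and inclusive (an assumption of this sketch)
    flank     : residues kept on each side of every TM domain
    """
    pieces = []
    for start, end in sorted(tm_bounds):          # keep N- to C-terminal order
        lo = max(start - 1 - flank, 0)            # 0-based slice start, clipped at the N-terminus
        hi = min(end + flank, len(sequence))      # exclusive slice end, clipped at the C-terminus
        pieces.append(sequence[lo:hi])
    return "".join(pieces)


# Toy sequence with two made-up "TM" stretches (L and V runs).
toy_seq = "A" * 10 + "L" * 20 + "G" * 30 + "V" * 20 + "K" * 3
toy_top = build_top(toy_seq, [(11, 30), (61, 80)])
print(toy_top)
```

In the toy call, the second segment sits 3 residues from the C-terminus, so its trailing flank is shortened rather than padded.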