Data analysis framework for investigating the G protein-coupled receptor (GPCR) signaling pathway in diatoms. Proteins from the human gpDB database were first used as TBLASTN queries against the diatom genomes to identify potential GPCR signaling pathway proteins (black). For GPCR signaling pathway proteins with no BLAST similarity in the diatoms, selected conserved protein domains were extracted from the Pfam v.26.0 database and searched with HMMER against the translated diatom genomes (purple). Diatom genomic sequences matching human gpDB proteins or the Pfam domains of GPCR signaling proteins were then searched against the diatom EST libraries. Sequences from the GPCRDB were used as TBLASTN queries against the diatom genomes to identify putative diatom GPCRs (red). GPCRs were also identified by downloading sequence alignments for the GPCR families in the GPCRDB (classes A, B and C) and using them as seed alignments for HMM searches against a custom microeukaryote database (red). The identified GPCRs were then further characterized by transmembrane domain (TMD) region and conserved domain analyses, and were also searched against the respective diatom EST libraries. A seed alignment was generated from the TMD regions of the putative diatom GPCRs and converted to an HMM profile to recruit related sequences from the custom microeukaryote database and GenBank. Phylogenetic analysis was then performed. BLAST, basic local alignment search tool; EST, expressed sequence tag; GPCRDB, G protein-coupled receptor database; HMM, hidden Markov model; HMMTOP, Hidden Markov Model for Topology Prediction; TBLASTN, protein query versus translated nucleotide BLAST; TMHMM, transmembrane hidden Markov model.
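
The hit-selection step that follows each TBLASTN search in this framework can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes the search results were saved in the standard BLAST+ 12-column tabular format (`-outfmt 6`), and the function name `candidate_loci`, the `max_evalue` cutoff of 1e-5, and the sample rows are all hypothetical placeholders rather than values taken from the study.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def candidate_loci(tabular_text, max_evalue=1e-5):
    """Map each subject (genomic) sequence ID to its lowest passing e-value.

    max_evalue is an illustrative cutoff, not one reported in the study.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST_COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue > max_evalue:
            continue
        subject = hit["sseqid"]
        # Keep only the strongest hit per genomic sequence.
        if subject not in best or evalue < best[subject]:
            best[subject] = evalue
    return best


if __name__ == "__main__":
    # Placeholder rows with made-up identifiers and scores.
    sample = (
        "query_1\tscaffold_A\t50.0\t100\t50\t0\t1\t100\t10\t310\t1e-20\t90.0\n"
        "query_2\tscaffold_B\t30.0\t80\t56\t0\t1\t80\t10\t250\t0.5\t20.0\n"
    )
    for subject, evalue in candidate_loci(sample).items():
        print(subject, evalue)
```

Genomic sequences retained this way would then be the ones carried forward to the EST library searches; keeping a single best hit per genomic sequence is a simplification chosen for brevity.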