Fig. 1 | BMC Genomics

Fig. 1

From: Topology based identification and comprehensive classification of four-transmembrane helix containing proteins (4TMs) in the human genome

(Parts A and B). Automatic and manual classification process. Part 1A: Schematic diagram of the automatic classification process. The first step was to download the Homo sapiens proteome from the Genome Reference Consortium human genome build 38 (GRCh38) release with the GENCODE annotation information. SignalP standalone was used to detect and excise any signal peptides. The four membrane-spanning regions were predicted using TOPCONS-single, which comprises five different prediction tools and returns a consensus prediction of the number of transmembrane regions. The longest sequence for each genomic location, or gene, was then selected, and those sequences were re-evaluated with TOPCONS-single. UniProt, a large comprehensive repository of protein sequences with both manually curated and automatically generated annotations, was used to download the associated annotations for each sequence in the predicted dataset. Part 1B: Schematic overview of the manual curation process for each individual sequence. The purpose of inspecting each sequence individually was to ensure that the 4TM dataset was composed of valid proteins with four membrane-spanning regions, so that their function could be inferred and described. The predicted dataset was evaluated for validity and reliability using the Consensus Coding Sequence (CCDS) dataset. The predicted dataset includes proteins that may be fragments, possible false-positive hits from TOPCONS-single, and protein isoforms that contain incomplete functional domains. The UniProt annotation data included information such as whether the sequence was a fragment, any associated Pfam functional domains or families, the Transporter Classification (TC) number, the Enzyme Commission (EC) number, and Gene Ontology annotation terms
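
The "longest sequence per gene" selection step in Part 1A can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `longest_per_gene`, the `(gene_id, protein_id, sequence)` record layout, and the tie-breaking rule (keep the first record seen) are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (gene_id, protein_id, amino-acid sequence).
Record = Tuple[str, str, str]


def longest_per_gene(records: Iterable[Record]) -> Dict[str, Tuple[str, str]]:
    """Return, for each gene, the (protein_id, sequence) pair with the longest sequence.

    Assumption: when two isoforms have equal length, the first one seen is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, protein_id, seq in records:
        current = best.get(gene_id)
        # Replace only when the new sequence is strictly longer.
        if current is None or len(seq) > len(current[1]):
            best[gene_id] = (protein_id, seq)
    return best


# Toy data with made-up identifiers and sequences.
toy_records = [
    ("geneA", "protA1", "MKT"),
    ("geneA", "protA2", "MKTLLV"),
    ("geneB", "protB1", "MAAG"),
    ("geneB", "protB2", "MCCG"),  # same length as protB1, so protB1 is kept
]
selected = longest_per_gene(toy_records)
```

In a pipeline like the one described, the selected sequences would then be passed back to the topology predictor for re-evaluation; that step is outside this sketch.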
