In this manuscript we have described the process of resolving two independently developed ontologies for the purpose of knowledge integration and sharing. The integration of ChEBI and the GO benefits both ontologies since it allows for the consistent and accurate representation of chemicals in the GO, and for the chemicals represented in ChEBI to be placed in the natural biological contexts of GO processes. This interoperability of ontologies is one of the primary goals of the coordinated development of the group of ontologies that make up the OBO Foundry.
As a result of this work and the close ties that were established between the GO and ChEBI during this process, all new GO terms that involve the transport or metabolism of, response to, or homeostasis of chemical entities can be added to the ontology via a new web-based tool called TermGenie (http://go.termgenie.org/, Dietze et al., manuscript in preparation). TermGenie is a template-based, reasoner-assisted term-generation tool. Annotators can generate these terms directly by selecting the broad GO category (e.g. transport, biosynthesis) and any term from ChEBI. Missing ChEBI entities must be requested from ChEBI before proceeding. Labels, synonyms and textual definitions are generated automatically, using the grammar described above. The new term is placed automatically into the GO subsumption hierarchy using the ELK reasoner, without the need for curator review.
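The template-based generation step can be illustrated with a minimal sketch. The template strings, the `generate_term` function, the `GeneratedTerm` class and the placeholder ChEBI identifier are all assumptions made for illustration; they are not TermGenie's actual templates or API, and the reasoner-based placement in the subsumption hierarchy is omitted.

```python
from dataclasses import dataclass

# Hypothetical templates keyed by broad GO category. The real grammar is
# defined elsewhere in the manuscript; these strings are illustrative only.
TEMPLATES = {
    "transport": {
        "label": "{chem} transport",
        "definition": "The directed movement of {chem} into, out of or within a cell.",
    },
    "biosynthesis": {
        "label": "{chem} biosynthetic process",
        "definition": "The chemical reactions and pathways resulting in the formation of {chem}.",
    },
}


@dataclass(frozen=True)
class GeneratedTerm:
    label: str
    definition: str
    category: str
    chebi_id: str


def generate_term(category: str, chebi_id: str, chebi_terms: dict[str, str]) -> GeneratedTerm:
    """Fill a category template with the name of a ChEBI entity."""
    if category not in TEMPLATES:
        raise ValueError(f"no template for category {category!r}")
    if chebi_id not in chebi_terms:
        # Mirrors the rule that a missing entity must first be requested from ChEBI.
        raise LookupError(f"{chebi_id} is not in ChEBI; request it before proceeding")
    name = chebi_terms[chebi_id]
    template = TEMPLATES[category]
    return GeneratedTerm(
        label=template["label"].format(chem=name),
        definition=template["definition"].format(chem=name),
        category=category,
        chebi_id=chebi_id,
    )


# Placeholder identifier, not a real ChEBI accession.
CHEBI_TERMS = {"CHEBI:0000000": "acetylcholine"}
term = generate_term("transport", "CHEBI:0000000", CHEBI_TERMS)
print(term.label)
```

Because the label and definition are derived from the same template and the same ChEBI name, they cannot drift apart, which is the consistency property that the template approach is meant to provide.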
We identified various underlying reasons for most of the discrepancies between the inherent GO chemical ontology and ChEBI. Many of the challenges in aligning separate ontologies such as the GO and ChEBI arise because different fields of study have different viewpoints about the importance of certain characteristics that are represented in their terms and about the axes of classification that they use.
Another issue we encountered in the alignment of the GO and ChEBI is classification by biological role. ChEBI contains a ‘biological role’ hierarchy that is separate from its structural, ‘chemical entity’ hierarchy. This hierarchy includes terms such as ‘hormone’ and ‘toxin’, and chemical entities are linked to these roles via the has_role relationship. Many of these biological roles are also referenced by the GO in terms such as ‘hormone secretion’. However, unlike the structural classification axis, using the ‘biological role’ hierarchy from ChEBI for the classification of terms within the GO subsumption hierarchy is not straightforward because roles are context-specific. For example, in ChEBI, ‘acetylcholine’ has_role ‘neurotransmitter’ and has_role ‘hormone’. This is because in the brain acetylcholine can act as a neurotransmitter, while in other tissues it can act as a hormone. If this role relationship were to be propagated in the GO, that is, asserting ‘acetylcholine secretion’ is_a ‘neurotransmitter secretion’, the GO would be in error for instances where acetylcholine was secreted but was acting as a hormone. The alignment of the classification of the GO with the ChEBI roles will be undertaken in a separate project.
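The acetylcholine example can be made concrete with a small sketch of what naive role propagation would assert. The data structure and function names below are hypothetical and use only the two roles mentioned in the text; the sketch shows the failure mode, not any mechanism used by the GO or ChEBI.

```python
# has_role assertions as described in the text for acetylcholine.
HAS_ROLE = {"acetylcholine": {"neurotransmitter", "hormone"}}


def naive_secretion_parents(chemical: str) -> set[str]:
    """Parents that '<chemical> secretion' would gain if every has_role
    link were propagated to an is_a link between secretion terms."""
    return {f"{role} secretion" for role in HAS_ROLE.get(chemical, set())}


def incorrect_inferences(chemical: str, role_in_context: str) -> set[str]:
    """Propagated parents that are false for a secretion event in which
    the chemical acts only in the given role."""
    true_parent = f"{role_in_context} secretion"
    return naive_secretion_parents(chemical) - {true_parent}


# An instance of acetylcholine secretion in which it acts as a hormone
# would still be classified as neurotransmitter secretion.
print(incorrect_inferences("acetylcholine", "hormone"))
```

Since is_a must hold for every instance of the child term, any non-empty result here marks a parent that cannot safely be asserted from the role alone.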
The GO and ChEBI also differ when chemicals in the GO are classified based on a process in which they are involved. For example, in the GO there are terms like ‘aspartate family amino acid biosynthetic process’ that represent the metabolism of amino acid families. These families are not based on the chemical similarities of the amino acids in them, but instead are grouped because they share similar biosynthetic pathways. Participation in related pathways is not essentially a structural feature of the molecules involved, and these processes cannot be represented by the chemical structural hierarchy of ChEBI. However, neither do such groupings easily correspond to ChEBI role terms, since the chemicals are not necessarily active in the pathways involved, as they might, for example, be created by the relevant pathway, and be otherwise quite inert themselves with respect to the operation of the pathway. These pathway-derived chemical classifications will remain in the GO, and for the time being will not be cross-referenced with ChEBI, although such cross-referencing could constitute a task for the future.
We have described here a generic approach to integrating two ontologies that will be used for future projects coordinating the GO with other external ontologies. Before relationship concordance is examined, the ontology terms should be compared to ensure two things: that the entities common to the two ontologies represent the same things, and that every entity implicitly represented in the ontology whose terms are being formally defined is explicitly represented in the external ontology whose terms are used in the definitions. Next, systematic differences in the construction rationale of the ontologies should be identified, and a rational strategy should be put into place for the cases where those differences will be retained. Finally, coordinated curation should be used to identify or question relationship differences between the two ontologies. This final step is continuous, and mechanisms should be put into place that allow inconsistencies to be resolved as they arise.
This work is the first part of the integration of ChEBI and the GO. The next stage will be to describe the enzymatic reactions in the GO in terms of the ChEBI entities that participate in them. For example, the molecular function ‘aspartate dehydrogenase activity’ is defined as ‘Catalysis of the reaction: L-aspartate + H2O + NAD(P)+ = oxaloacetate + NH3 + NAD(P)H + H+’. We intend to leverage the data in Rhea, a manually curated reaction database in which all reaction participants are ChEBI entities, to create the logical definitions of GO molecular functions. These definitions will allow us to classify enzymatic reactions automatically based on the chemicals that participate in them; to make better links between biological processes and the reactions that are their parts; and to import new, manually curated reactions directly from Rhea into the GO, and allow them to be automatically classified.
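One way to picture classification by participants is the simplified sketch below. It is an assumption-laden illustration rather than the planned implementation: reactions are reduced to unordered sets of participant names (ignoring sides and stoichiometry), the `IS_A` hierarchy is a toy with a single link that is not taken from ChEBI, and the broader reaction class is hypothetical.

```python
# Toy chemical hierarchy (child -> parent); illustrative only.
IS_A = {"L-aspartate": "aspartate"}


def ancestors(term: str) -> set[str]:
    """Return the term itself plus all of its is_a ancestors."""
    found = {term}
    while term in IS_A:
        term = IS_A[term]
        found.add(term)
    return found


def subsumes(general: set[str], specific: set[str]) -> bool:
    """True if every participant class required by `general` is matched
    by some participant of `specific` (directly or via is_a)."""
    return all(
        any(required in ancestors(participant) for participant in specific)
        for required in general
    )


# Participants of the reaction quoted in the text.
aspartate_dehydrogenase = {
    "L-aspartate", "H2O", "NAD(P)+", "oxaloacetate", "NH3", "NAD(P)H", "H+",
}
# Hypothetical broader class defined only by two participant classes.
aspartate_to_oxaloacetate = {"aspartate", "oxaloacetate"}

print(subsumes(aspartate_to_oxaloacetate, aspartate_dehydrogenase))
```

In this sketch the specific reaction falls under the broader class because each required participant class is satisfied, while the reverse check fails; a description-logic reasoner would perform the analogous inference over full logical definitions.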