UML metamodel
After reviewing the definitions of the different types of biological entity classes and relations of the OBO Relation Ontology, we identified the elements of the UML metamodel that are relevant for the definition of our profile, viz., Class, DirectedRelationship, Association, Generalization and Dependency. Figure 1 presents an excerpt of the UML metamodel containing these elements. A number of related elements were included for completeness. However, we did not attempt to represent all existing elements and associations. More information about the UML metamodel can be found in [44, 45].
The abstract metaclass Element represents a component of a model. Element is the common superclass of all metaclasses that are part of the UML metamodel. The abstract metaclass NamedElement specializes Element. NamedElement represents an element of a model that may have a name used to unambiguously identify this element. The abstract metaclass TypedElement specializes NamedElement. TypedElement represents an element that has a name and an associated type (metaclass Type). The abstract metaclass Type defines a set of values that constrains the range of values represented by a typed element. Thus, elements associated with a type are restricted to represent only values defined by the type.
The abstract metaclass Classifier specializes the metaclass Type. Classifier represents instances (metaclass InstanceSpecification) with features in common (metaclass Feature). InstanceSpecification is a named element that describes partially or completely an instance of an entity in a model. Such a description can include the entity classification, i.e., the classifier(s) of which the entity is an instance, and, based on its classifier, the kind of instance (e.g. object or link). InstanceSpecification can also be used to represent a snapshot of an existing entity at some point in time. The abstract metaclass Feature is a named element that represents behavioral or structural characteristics of classifiers.
The metaclass Relationship specializes the metaclass Element. Relationship represents a type of relation between two or more elements of a model. The abstract metaclass DirectedRelationship specializes Relationship in order to represent directed relations between source and target elements. The metaclass Dependency specializes DirectedRelationship. Dependency represents a relation defined between named elements of a model in which a set of (client) elements require other elements (supplier) for their (complete) specification. This relation establishes that the semantics of the client element(s) is dependent on the definition of the supplier element(s).
The metaclass Generalization also specializes DirectedRelationship. Generalization represents a binary relation between a general classifier and a more specific classifier. This relation is used to represent that instances of the specific classifier are also instances of the general classifier. Thus, any feature defined for the general classifier is inherited by the specific classifier. Similarly, any constraint applied to the general classifier is also applied to the specific classifier. Generalization has one boolean attribute, isSubstitutable (default value is true), which indicates whether or not the specific classifier can be used wherever the general classifier is usually used.
The abstract metaclass RedefinableElement specializes the abstract metaclass NamedElement. RedefinableElement represents an element that can be redefined in the context of a generalization. Since Classifier and Feature are specializations of RedefinableElement, they can be redefined in the context of a generalization relation. The redefinition of an element can include semantics addition or restriction in a manner consistent with the semantics initially defined.
Generalization relations can be aggregated into subsets. The metaclass GeneralizationSet is a named element that represents collections of subsets of generalization relationships. GeneralizationSet describes how a single general classifier (powertype) can be subdivided into several specific subtypes. GeneralizationSet has two boolean attributes, viz., isCovering (default value is false), which indicates whether or not every instance of a general classifier is also an instance of at least one of its specific classifiers, and isDisjoint (default value is false), which indicates whether or not the specific classifiers in the set are required to have no instance in common.
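The meaning of the two attributes can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: classifiers are modelled as plain sets of instance names, and the helper functions is_covering and is_disjoint, as well as the sample instances, are hypothetical and not part of UML.

```python
from itertools import combinations

def is_covering(general, specifics):
    """True if every instance of the general classifier is an instance
    of at least one of the specific classifiers."""
    covered = set().union(*specifics)
    return general <= covered

def is_disjoint(specifics):
    """True if no two specific classifiers share an instance."""
    return all(a.isdisjoint(b) for a, b in combinations(specifics, 2))

# Hypothetical instance sets for a general classifier and two subtypes.
continuant = {"c1", "c2", "c3"}
material = {"c1", "c2"}
immaterial = {"c3"}

print(is_covering(continuant, [material, immaterial]))
print(is_disjoint([material, immaterial]))
```

In this reading, a generalization set with isCovering and isDisjoint both true partitions the instances of the general classifier among its subtypes.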
The abstract metaclass MultiplicityElement specializes the abstract metaclass Element. MultiplicityElement defines an inclusive interval of non-negative integers beginning with a lower bound, attribute lower (default value is one), and ending with a possibly infinite upper bound, attribute upper (default value is also one). MultiplicityElement specifies the allowable cardinalities for an instantiation of this element. The abstract metaclass StructuralFeature specializes the metaclasses Feature, TypedElement and MultiplicityElement. StructuralFeature represents a typed feature of a classifier that specifies the structure of instances of the classifier. The metaclass Property specializes the metaclass StructuralFeature. In the context of this work, Property represents the types of association ends.
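The interval defined by a multiplicity can be sketched as follows. This is a minimal illustration, assuming that an unbounded upper limit is encoded as None; the class name Multiplicity and the method allows are hypothetical and do not belong to the UML metamodel.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Multiplicity:
    lower: int = 1                # default lower bound is one
    upper: Optional[int] = 1      # default upper bound is one; None means unbounded

    def __post_init__(self):
        if self.lower < 0:
            raise ValueError("lower bound must be non-negative")
        if self.upper is not None and self.upper < self.lower:
            raise ValueError("upper bound must not be below lower bound")

    def allows(self, cardinality: int) -> bool:
        """True if the cardinality lies in the inclusive interval [lower, upper]."""
        if cardinality < self.lower:
            return False
        return self.upper is None or cardinality <= self.upper

one = Multiplicity()                 # corresponds to 1..1
zero_or_more = Multiplicity(0, None) # corresponds to 0..*
print(one.allows(1), zero_or_more.allows(25))
```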
The metaclass Association specializes the metaclasses Classifier and Relationship. Association represents a semantic relationship that can occur between instances of typed elements. Association instances are named links. An association has at least two (ordered) ends (memberEnd), each one represented by a property and indirectly associated to a corresponding type (endType). A member end represents the participation of an instance of the classifier connected to an end of a link. Thus, an association declares that there can be links between instances of associated types. Additionally, an association may have one or more navigable ends (navigableOwnedEnd). A navigable end can be more easily accessed at runtime from instances participating in the other end(s) of the link. Navigable ends provide a navigation facility.
Aggregation represents a specific type of binary association in which elements representing "parts" are related to an element representing a "whole" (whole/part relationship). Two different types of aggregation can be defined, viz., composition and shared aggregation. Composition represents an aggregation relation in which instances of "part" can only be included in a single composition. Additionally, if the instance representing the "whole" is removed, the parts are removed as well. A shared aggregation poses no such restriction. In such relation, instances of "part" can be included in more than one shared aggregation. Further, if an instance representing the "whole" is removed, the parts may or may not be removed. Both relationships are represented in the UML metamodel by the attribute aggregation, whose type is AggregationKind. AggregationKind is an enumeration type that represents different types of association: association without aggregation (none), association with composition (composite) and association with shared aggregation (shared).
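The difference between composition and shared aggregation can be made concrete with a toy instance store. The sketch below is an assumption-laden illustration: the class Model and its methods are hypothetical, only the enumeration literals (none, shared, composite) come from UML, and for shared aggregation it takes the option of leaving the parts in place when the whole is removed.

```python
from enum import Enum

class AggregationKind(Enum):
    NONE = "none"
    SHARED = "shared"
    COMPOSITE = "composite"

class Model:
    """Toy store that tracks live parts and the composite whole owning each part."""
    def __init__(self):
        self.live_parts = set()
        self.composite_owner = {}   # part -> composite whole
        self.parts_of = {}          # whole -> (kind, set of parts)

    def add_whole(self, whole, kind):
        self.parts_of[whole] = (kind, set())

    def add_part(self, whole, part):
        kind, parts = self.parts_of[whole]
        if kind is AggregationKind.COMPOSITE:
            # A part may be included in a single composition only.
            if part in self.composite_owner:
                raise ValueError(f"{part} already belongs to a composition")
            self.composite_owner[part] = whole
        parts.add(part)
        self.live_parts.add(part)

    def remove_whole(self, whole):
        kind, parts = self.parts_of.pop(whole)
        if kind is AggregationKind.COMPOSITE:
            # Removing a composite whole removes its parts as well.
            for part in parts:
                self.live_parts.discard(part)
                del self.composite_owner[part]
        # For shared aggregation the parts are left untouched in this sketch.
```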
The metaclass Class specializes the metaclass Classifier. Class describes a set of instances (objects) that share features, constraints and semantics. The structural and behavioral features owned by a class (not depicted in Figure 1) are named attributes and operations, respectively. Objects of a class have their own values for attributes. These values are in accordance with the types and multiplicities defined by the class. Operations defined for a class can be invoked on objects of the class. As a result, the invocation of an operation on an object can return a value and/or cause changes in the values of attributes of this object. In addition, operation invocation can also cause changes in the values of attributes of other objects that can be reached through the links associated to the object on which the operation was invoked.
UML profile for the OBO Relation Ontology
In order to define our profile for the OBO Relation Ontology, we have proposed a number of extensions for the UML metamodel. These extensions were proposed based on the definitions of the types of biological entity classes and relations of the OBO Relation Ontology.
The different types of biological entity classes defined in the OBO Relation Ontology were represented as specializations of the metaclass Class. The metaclass Class was initially specialized into the metaclasses Continuant and Process. The metaclass Continuant was in turn specialized into the metaclasses Material and Immaterial. By default, these metaclasses (Continuant/Process and Material/Immaterial) are mutually exclusive. These definitions are consistent with the principles of the OBO Relation Ontology, which describes these categories as non-overlapping. Thus, classes extended by Process cannot be extended by Continuant or its subtypes and vice versa, and classes extended by Material cannot be extended by Immaterial and vice versa.
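The exclusivity rules above can be expressed as a simple check over the set of stereotypes applied to a class. The following sketch is illustrative only: the names EXCLUSIVE_GROUPS, IMPLIES and exclusivity_violations are hypothetical, and the actual profile states such restrictions as constraints on the stereotypes rather than as Python code.

```python
# Pairs of entity types that may not be combined on one class.
EXCLUSIVE_GROUPS = [
    {"continuant", "process"},
    {"material", "immaterial"},
]

# Material and Immaterial specialize Continuant, so applying them implies it.
IMPLIES = {"material": "continuant", "immaterial": "continuant"}

def exclusivity_violations(stereotypes):
    """Return the exclusive groups that are fully present on a class."""
    applied = set(stereotypes)
    applied |= {IMPLIES[s] for s in stereotypes if s in IMPLIES}
    return [group for group in EXCLUSIVE_GROUPS if group <= applied]

print(exclusivity_violations({"material"}))
print(exclusivity_violations({"process", "material"}))
```

The second call reports a conflict because a material entity class is also a continuant and therefore cannot be a process.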
Each of the proposed extension elements corresponds to a stereotype in our profile. Thus, four stereotypes were defined for representing the type of a biological entity class, viz., <<continuant>>, <<process>>, <<material>> and <<immaterial>>.
Figure 2 depicts the extensions proposed to the UML metamodel to capture the different foundational relations of the OBO Relation Ontology. The different types of relations are specializations of abstract metaclass OBORelation, which in turn specializes the abstract metaclass DirectedRelationship. OBORelation represents directed and binary (between two classes) relations that may occur between continuants, including material and immaterial continuants, and processes.
The abstract metaclass FoundationalRelation represents basic relations that can be defined between two continuant entity classes or between two process entity classes. The metaclass Is_a specializes the abstract metaclass FoundationalRelation. Is_a represents a subtype relation between a biological entity class (source) and another biological entity class (target) acting as a supertype. Since the metaclass Generalization defines a similar type of relationship, Is_a also specializes Generalization.
The metaclass Instance_of specializes the abstract metaclass FoundationalRelation. Instance_of represents a primitive relation between a general biological entity class (continuant or process) and a particular instance of this class (instance-class relation). Since the UML metaclass Dependency represents a relationship that can occur between named elements in general, such that a set of client elements is either semantically or structurally dependent on the definition of a set of supplier elements, we have used this metaclass as basis for the representation of Instance_of. Thus, Instance_of was also defined as a specialization of Dependency.
The metaclasses Part_of and Has_part also specialize the metaclasses FoundationalRelation and Association. Part_of represents an association between a source and a target biological entity class, in which each instance of the source class is part of an instance of the target class (whole). Inversely, Has_part represents an association between a source biological entity class and a target biological entity class, in which an instance of the source class (whole) has other instances of the target class as its parts.
Part_of is specialized into the metaclasses Proper_part_of and Integral_part_of. Proper_part_of represents a Part_of relation with the additional constraint that the source entity class is different from the target entity class (Part_of has no such constraint). Additionally, in a Part_of relation defined between a source entity class and a target entity class, we cannot infer that the target has the source as its part. Such semantics is captured by the Integral_part_of relation. Integral_part_of represents a Part_of relation in which the target entity class also has the source entity class as its part (represented by the association has_part).
Similarly, Has_part is also specialized into two metaclasses, viz., Has_proper_part and Has_integral_part. Has_proper_part represents a Has_part relation with the additional constraint that the source entity class is different from the target entity class. Has_integral_part represents a Has_part relation in which the target entity class is also part of the source entity class (represented by the association part_of).
Each of the proposed extension elements corresponds to a concrete stereotype in our profile, except for the abstract metaclasses OBORelation and FoundationalRelation. Thus, the following stereotypes were defined for representing a foundational relation: <<is_a>>, <<instance_of>>, <<part_of>>, <<integral_part_of>>, <<proper_part_of>>, <<has_part>>, <<has_integral_part>> and <<has_proper_part>>.
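The additional constraints of the proper and integral variants can be sketched as checks over the relations declared in a model. This is one possible reading, not the profile's own specification: relations are assumed to be stored as (kind, source, target) triples, the function relation_problems is hypothetical, and the class names Nucleus, Membrane and Cell are used for illustration only.

```python
def relation_problems(kind, source, target, declared):
    """Return a list of constraint violations for one class-level relation.

    declared is the set of (kind, source, target) triples present in the model.
    """
    problems = []
    if kind in ("proper_part_of", "has_proper_part") and source == target:
        problems.append("source and target must differ")
    if kind == "integral_part_of" and ("has_part", target, source) not in declared:
        problems.append("target must also have the source as its part")
    if kind == "has_integral_part" and ("part_of", target, source) not in declared:
        problems.append("target must also be part of the source")
    return problems

declared = {
    ("part_of", "Nucleus", "Cell"),
    ("has_part", "Cell", "Nucleus"),
}
print(relation_problems("integral_part_of", "Nucleus", "Cell", declared))
print(relation_problems("proper_part_of", "Cell", "Cell", declared))
```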
The Is_a relation (C Is_a C1) is formally defined as follows: if c instantiates C at a time t, then c instantiates C1 at t, where both C and C1 represent either continuant or process entity classes. The UML metaclass Generalization, which we have used as basis for the definition of Is_a, represents a relationship that can occur between one specific classifier and one general classifier, such that an instance of the specific classifier is also an instance of the general classifier. Provided the specific and the general classifiers represent either two continuant entity classes or two process entity classes, we are able to capture in UML a semantic definition equivalent to the one defined in the OBO Relation Ontology for this relation. These restrictions have been defined as part of <<is_a>> stereotype specification.
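The formal definition of Is_a can be checked mechanically over a finite set of instantiation facts. The sketch below assumes that such facts are available as (instance, class, time) triples; the function is_a_holds and the sample facts are hypothetical.

```python
def is_a_holds(instantiates, sub, sup):
    """C Is_a C1 holds if every c that instantiates C at time t
    also instantiates C1 at the same time t."""
    return all(
        (c, sup, t) in instantiates
        for (c, cls, t) in instantiates
        if cls == sub
    )

# Hypothetical facts: p1 is a TGF-Beta 1 protein, p2 is some other TGF-Beta protein.
facts = {
    ("p1", "TGF-Beta 1", 0),
    ("p1", "TGF-Beta", 0),
    ("p2", "TGF-Beta", 0),
}
print(is_a_holds(facts, "TGF-Beta 1", "TGF-Beta"))
print(is_a_holds(facts, "TGF-Beta", "TGF-Beta 1"))
```

The relation is not symmetric: the second call fails because p2 instantiates TGF-Beta without instantiating TGF-Beta 1.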
The Instance_of relation (c Instance_of C) represents a primitive relation between an instance c and an entity class C, either continuant or process, which it instantiates at a specific time t. The UML metaclass Dependency, which we have used as basis for the definition of Instance_of, represents a relationship in which one or more named elements (client) are dependent on the definition of one or more named elements (supplier). Provided the client represents a particular instance of an entity class (InstanceSpecification) and the supplier represents the entity class itself, either continuant or process, which it instantiates, we are able to capture in UML a semantic definition equivalent to the one defined in the OBO Relation Ontology for this relation. These restrictions have been defined as part of <<instance_of>> stereotype specification.
The Part_of relation (C Part_of C1) is formally defined as follows: for all c that instantiates C at a time t, there is some c1 such that c1 instantiates C1 at time t and c part_of c1 at t, where both C and C1 represent either continuant or process entity classes and part_of represents a primitive instance-level relation. The all/some rule used in the definition of the Part_of relation guarantees that this relation is valid for every instance of class C being related to some instance of class C1.
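The all/some rule maps directly onto a universal quantifier nested around an existential one. The following sketch evaluates it over finite sets of facts, using Python's all and any. It assumes that instantiation is given as (instance, class, time) triples and that the primitive instance-level part_of relation is given as (part, whole, time) triples; the function name and the sample data are hypothetical.

```python
def part_of_holds(instantiates, part_of, C, C1):
    """C Part_of C1: for all c instantiating C at t there is some c1
    instantiating C1 at t such that c part_of c1 at t."""
    return all(
        any(
            (c, c1, t) in part_of
            for (c1, cls1, t1) in instantiates
            if cls1 == C1 and t1 == t
        )
        for (c, cls, t) in instantiates
        if cls == C
    )

# Hypothetical instance-level facts.
instantiates = {
    ("n1", "Nucleus", 0),
    ("m1", "Membrane", 0),
    ("cell1", "Cell", 0),
}
part_of = {("n1", "cell1", 0)}

print(part_of_holds(instantiates, part_of, "Nucleus", "Cell"))
print(part_of_holds(instantiates, part_of, "Membrane", "Cell"))
```

The second call fails because the instance m1 is not related to any instance of Cell, so the rule does not hold for every instance of Membrane.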
The UML metaclass Association, which we have used as basis for the definition of Part_of, models the existence of a semantic relationship (link) between instances of typed elements. A link is an instance of an association. In order to relate all instances of class C to at least one instance of class C1 through links, we have constrained the <<part_of>> stereotype using the forAll and exists OCL operators. However, the pivotal difference between the OBO Relation Ontology and UML lies in the fact that instance-level relations are formally defined in the former, i.e., they form a set of primitive relations, whereas links are not formally defined in the latter. In this regard, our profile falls short in representing exactly the same semantics as defined by the OBO Relation Ontology for the Part_of relation.
The approach used to capture the semantics of the <<part_of>> stereotype has also been used in the specification of the remaining stereotypes of the profile because their corresponding OBO relations have also been formally defined using the all/some rule and, similarly to <<part_of>>, they also specialize the metaclass Association.
Figure 3 depicts the extensions proposed to the UML metamodel to capture temporal, spatial and participation relations of the OBO Relation Ontology.
The abstract metaclass SpatialRelation represents spatial relations defined between different continuant entity classes. SpatialRelation specializes the metaclasses OBORelation and Association. The metaclasses Adjacent_to, Located_in, Location_of, Contained_in and Contains are all specializations of SpatialRelation.
Adjacent_to represents that the spatial region occupied by a source continuant is adjacent to the spatial region occupied by a target continuant (no overlapping). Located_in represents that a source continuant is located in the spatial region occupied by a target continuant. Contained_in represents that a source material continuant is contained in the spatial region occupied by a target immaterial continuant. However, in this case, the material continuant is not part of the immaterial continuant. Location_of and Contains represent the inverse of relations Located_in and Contained_in, respectively.
Each of the proposed extension elements corresponds to a concrete stereotype in our profile, except for the abstract metaclass SpatialRelation, which is also used to aggregate common properties of its subtypes and help structure the profile. Thus, the following stereotypes were defined for representing a spatial relation: <<adjacent_to>>, <<located_in>>, <<location_of>>, <<contained_in>> and <<contains>>.
The abstract metaclass TemporalRelation represents temporal relations defined between different entity classes. TemporalRelation specializes the metaclasses OBORelation and Association. The metaclasses Derives_from, Derived_into, Transformation_of, Preceded_by and Precedes are all specializations of TemporalRelation.
Derives_from represents that a source material continuant immediately derives from a target material continuant. The target continuant ceases to exist and (part of) its matter is inherited by the source continuant. Transformation_of represents that a source material continuant results from the transformation of a target material continuant (target continuant instantiates the source continuant). Preceded_by represents that a target process occurs in an instant of time prior to the occurrence of a source process. Derived_into and Precedes represent the inverse of relations Derives_from and Preceded_by, respectively.
Each of the proposed extension elements corresponds to a concrete stereotype in our profile, except for the abstract metaclass TemporalRelation, which is also used to aggregate common properties of its subtypes and help structure the profile. Thus, the following stereotypes were defined for representing a temporal relation: <<derives_from>>, <<derived_into>>, <<transformation_of>>, <<preceded_by>> and <<precedes>>.
Finally, the abstract metaclass ParticipationRelation represents participation relations of continuants in the occurrence of processes. ParticipationRelation also specializes the metaclasses OBORelation and Association. The metaclass Has_participant specializes the metaclass ParticipationRelation. Has_participant represents that a target continuant participates somehow in a source process. Has_agent specializes the metaclass Has_participant. Has_agent represents that a source process has a material continuant as its participant and that this continuant is responsible for the occurrence of the process. Participates_in and Agent_in represent the inverse of relations Has_participant and Has_agent, respectively.
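The restrictions on the kinds of entity classes that the spatial, temporal and participation relations may connect can be summarized as a table of permitted source and target types. The sketch below is derived from the descriptions given in this section and is only an approximation of the profile's constraints; the names PARENT, ENDS, conforms and ends_are_valid are hypothetical.

```python
# Material and immaterial entity classes are also continuants.
PARENT = {"material": "continuant", "immaterial": "continuant"}

# relation -> (required source kind, required target kind)
ENDS = {
    "adjacent_to": ("continuant", "continuant"),
    "located_in": ("continuant", "continuant"),
    "location_of": ("continuant", "continuant"),
    "contained_in": ("material", "immaterial"),
    "contains": ("immaterial", "material"),
    "derives_from": ("material", "material"),
    "derived_into": ("material", "material"),
    "transformation_of": ("material", "material"),
    "preceded_by": ("process", "process"),
    "precedes": ("process", "process"),
    "has_participant": ("process", "continuant"),
    "participates_in": ("continuant", "process"),
    "has_agent": ("process", "material"),
    "agent_in": ("material", "process"),
}

def conforms(kind, required):
    """True if kind equals the required kind or specializes it."""
    while kind is not None:
        if kind == required:
            return True
        kind = PARENT.get(kind)
    return False

def ends_are_valid(relation, source_kind, target_kind):
    required_source, required_target = ENDS[relation]
    return conforms(source_kind, required_source) and conforms(target_kind, required_target)

print(ends_are_valid("has_agent", "process", "material"))
print(ends_are_valid("preceded_by", "material", "process"))
```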
Each of the proposed extension elements corresponds to a concrete stereotype in our profile, except for the abstract metaclass ParticipationRelation. Thus, the following stereotypes were defined for representing a participation relation: <<has_participant>>, <<participates_in>>, <<has_agent>> and <<agent_in>>.
The abstract metaclasses OBORelation, FoundationalRelation, SpatialRelation, TemporalRelation and ParticipationRelation were introduced to aggregate common properties of their subtypes and help structure the profile. Thus, we did not define a concrete syntax for these metaclasses in our profile. For each element defined in our profile, there is a brief description of its semantics, the base class(es) extended by the stereotype, associated notation and at least one example of its usage. Additionally, we also described any constraints that must be applied to elements extended by these stereotypes. These constraints were described using both text and an equivalent OCL expression. Figure 4 illustrates an example of a profile element definition (<<part_of>> stereotype).
Figure 5 shows a summary of selected stereotypes in terms of corresponding notation and example(s) of usage. Note that the stereotypes defined for the different types of entity classes have a notation similar to UML classes, while the <<is_a>> stereotype presents a notation similar to a UML generalization. Finally, the <<part_of>> stereotype and its subtypes present a notation similar to a UML shared aggregation. We have chosen the notation similar to shared aggregation instead of composition because the former is less restrictive than the latter. All other stereotypes have a notation similar to UML associations. The complete profile specification can be found in the supplementary material (see Additional File 1).
Profile application
This section describes the application of the proposed profile in the development of a number of fragments from different (standard) ontologies. The objective of this activity was to evaluate the use of the profile in the specification of a number of UML models. We have focused only on OBO Foundry ontologies. Particularly, we have considered the following ontologies: Gene Ontology (GO), PRotein Ontology (PRO) and Xenopus Anatomy and Development Ontology (XAO). Thus, no OBO Foundry candidate ontologies and/or other ontologies of interest were considered. Additionally, since the relationships defined in the OBO Relation Ontology represent the vast majority of the total relationships defined in these ontologies (over 90% in some cases), the fragments were chosen to focus only on these relationships. We have used Enterprise Architect, from Sparx Systems, as our UML modeling tool.
The first ontology considered in our study was the Gene Ontology (GO) [3]. GO provides a set of terms and relations used for standardization of genes and their products in eukaryotic organisms using three independent ontologies: Cellular Component, which describes subcellular structures and macromolecular complexes in which gene products can generally be located or of which they can be subcomponents; Molecular Function, which describes activities that occur at the molecular level; and Biological Process, which describes collections of processes (series of events or molecular functions) related to the functioning of integrated living units. In the context of our work, we have considered only the Cellular Component ontology. In the fragments considered in the development of our models, only continuants were identified. Examples of these continuants include Cell Part, Cell Body and Membrane. Is_a and Part_of, which account for over 92% of the total relationships defined by GO, represent the only relationships used in these fragments.
The second ontology considered in our study was the PRotein Ontology (PRO) [27]. PRO has been developed by the National Institute of General Medical Sciences (NIGMS) to describe proteins (protein forms) and protein evolutionary relationships (protein evolution). Thus, PRO has two overlapping components: Protein Evolution (ProEvo) and Protein Forms (ProForm). ProEvo organizes proteins according to their evolutionary relatedness, while ProForm describes multiple protein forms derived from a given gene, which arise through variations in splicing or post-translational modifications.
Each concept represented by the ontology has a unique identifier within the scope of its components. Additionally, multiple protein forms produced from a given gene are referred to as isoforms, and polymorphic sequences as variants. In the fragments considered in the development of our models, only (material) continuants were identified. Is_a and Derives_from, which account for over 90% of the total relationships defined in PRO, represent the only relationships used in these fragments. In particular, Derives_from is used to indicate proteins with post-translational modifications derived from non-modified proteins.
Figure 6a illustrates a modeled fragment of the PRO. All depicted elements represent proteins. Since proteins refer to entity classes that have molecular weight, they are considered material continuants (<<material>>) according to the OBO Relation Ontology.
The class TGF-Beta represents a protein involved in the regulation of cell growth and differentiation. The class TGF-Beta 1 represents a TGF-beta protein that is a translation product of the TGFB1 gene. Thus, it was modeled as a specialization of the class TGF-Beta through an Is_a relation (<<is_a>>). The class Proteolytic Cleavage Product represents an amino acid chain produced as the result of peptide bond cleavage of a longer amino acid chain. The class TGF-Beta 1 Proteolytic Cleavage Product represents a proteolytic cleavage product that is derived from TGF-beta 1 protein. Thus, it was modeled as a specialization of Proteolytic Cleavage Product through an Is_a relation. Additionally, a Derives_from relation (<<derives_from>>) was also established between this class and the class TGF-Beta 1.
The class TGF-Beta 1 Isoform 1 represents a translational product of a specific transcript of the TGFB1 gene. Thus, it was modeled as a specialization of the class TGF-Beta 1 through an Is_a relation. The class TGF-Beta 1 Isoform 1 Cleaved 1 represents a specific TGF-beta 1 proteolytic cleavage product, namely one that is derived from a TGF-beta 1 isoform 1 protein that underwent a specific proteolytic cleavage process. Thus, it was modeled as a specialization of TGF-Beta 1 Proteolytic Cleavage Product through an Is_a relation. Additionally, a Derives_from relation (<<derives_from>>) was also established between this class and the class TGF-Beta 1 Isoform 1.
The third ontology considered in our study was the Xenopus Anatomy and Development Ontology (XAO) [28]. XAO was created to standardize the annotation of gene expression and of normal and mutant phenotype data of Xenopus species. This ontology has two overlapping components, viz., Xenopus Anatomical Entity and Xenopus Developmental Stage. The former provides a description of anatomical structures and tissues of the species and the latter provides a description of the developmental stages of the species. Each concept represented by the ontology has a unique identifier within the scope of these two components. In the context of our work, we have considered only the Xenopus Developmental Stage ontology. In the fragments considered in the development of our models, only processes were identified. Is_a, Part_of and Preceded_by, which account for over 63% of the total relationships defined in XAO, represent the only relationships used in these fragments. In particular, Preceded_by is used between developmental stages with the purpose of indicating time intervals during which certain anatomical structures and tissues exist.
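The modeled PRO fragment can be written down as a list of (source, relation, target) triples, which makes the structure described above easy to inspect. The sketch below is an illustration only: the triple encoding and the helper is_a_ancestors are hypothetical, and it assumes that Is_a is read transitively, as is usual for subtype relations.

```python
# Relations described for the PRO fragment (Figure 6a), as (source, relation, target).
PRO_FRAGMENT = [
    ("TGF-Beta 1", "is_a", "TGF-Beta"),
    ("TGF-Beta 1 Proteolytic Cleavage Product", "is_a", "Proteolytic Cleavage Product"),
    ("TGF-Beta 1 Proteolytic Cleavage Product", "derives_from", "TGF-Beta 1"),
    ("TGF-Beta 1 Isoform 1", "is_a", "TGF-Beta 1"),
    ("TGF-Beta 1 Isoform 1 Cleaved 1", "is_a", "TGF-Beta 1 Proteolytic Cleavage Product"),
    ("TGF-Beta 1 Isoform 1 Cleaved 1", "derives_from", "TGF-Beta 1 Isoform 1"),
]

def is_a_ancestors(cls, triples):
    """Collect every class reachable from cls by following is_a links."""
    direct = {}
    for source, relation, target in triples:
        if relation == "is_a":
            direct.setdefault(source, set()).add(target)
    seen, stack = set(), [cls]
    while stack:
        for parent in direct.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(is_a_ancestors("TGF-Beta 1 Isoform 1 Cleaved 1", PRO_FRAGMENT)))
```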
Figure 6b illustrates a modeled fragment of the XAO. All depicted elements represent developmental stages of Xenopus species. Since stages refer to entity classes that have a beginning, middle and end, they are considered processes according to the OBO Relation Ontology.
The class Xenopus Developmental Stage represents any developmental stage of the Xenopus species. Classes Unfertilized Egg, Embryonic Stage, Adult and Death represent different developmental stages of this organism, each modeled as a specialization of Xenopus Developmental Stage through an Is_a relation (<<is_a>>).
The class Embryonic Stage represents a developmental stage that occurs in the time interval between fertilization and body feeding. The classes Blastula and Neurula represent specific embryonic developmental stages that occur within this time interval and thus they were modeled as specializations of Embryonic Stage through Is_a relations.
Blastula comprises a range of developmental stages that occur between the Nieuwkoop and Faber (NF) stage 7 and NF stage 9. Each of these stages was modeled as a separate class, viz., NF Stage 7, NF Stage 8 and NF Stage 9. NF Stage 7 represents a four-hour, 64-cell embryo. NF Stage 8 represents a five-hour, 128-cell embryo. NF Stage 9 represents a seven-hour embryo whose cells are smaller at the dorsal than at the ventral side. Since these classes are part of the range defined by Blastula, each was related to Blastula through a Part_of relation (<<part_of>>).
Classes NF Stage 8 (source) and NF Stage 7 (target) were related through a Preceded_by relation (<<preceded_by>>). This same type of relation was defined between classes NF Stage 9 and NF Stage 8. Since stages NF stage 7, NF stage 8 and NF stage 9 happen in Xenopus species respectively at 4, 5 and 7 hours (22-24 °C) after the embryo fertilization, the application of this relation was consistent with its definition because NF stage 7 occurs in an instant of time preceding NF stage 8 and likewise NF stage 8 occurs in an instant of time preceding NF stage 9.
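The consistency argument above amounts to comparing the times at which the stages occur. A minimal sketch, using the hours after fertilization quoted in the text; the dictionary and function names are hypothetical.

```python
# Hours after fertilization (22-24 °C), as stated in the text.
HOURS_AFTER_FERTILIZATION = {"NF Stage 7": 4, "NF Stage 8": 5, "NF Stage 9": 7}

# Preceded_by relations of the fragment, as (source, target) pairs.
PRECEDED_BY = [("NF Stage 8", "NF Stage 7"), ("NF Stage 9", "NF Stage 8")]

def preceded_by_is_consistent(source, target, onset):
    """A preceded_by link is consistent if the target occurs before the source."""
    return onset[target] < onset[source]

for source, target in PRECEDED_BY:
    print(source, "preceded_by", target,
          preceded_by_is_consistent(source, target, HOURS_AFTER_FERTILIZATION))
```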