A LinkML Schema for Representing Relationships Between Genes and Biological Processes
Suppose we want to create a LinkML schema that models interactions between genes and biological processes. A bio-curator may define such a schema to represent and standardize biological knowledge from heterogeneous sources. Additionally, this schema can act as a meta-model to be translated into targeted prompts that guide an LLM in extracting entities and relationships involving genes and biological processes from scientific texts.
We begin by specifying (1) a schema name and (2) a schema description. We can then add a new class named “Gene” to the main canvas by clicking button (3).
The inspector panel on the right-hand side of the canvas is shown to the user when clicking on the class graphical artifact. By means of this interface, the class Gene can be enhanced by (1) adding a description. We also (2) annotate the class with the HGNC terminology, and (3) retrieve examples of gene instances from HGNC.
The class Gene can be further refined by adding key attributes. For example, the attribute named “Symbol” is added (1) and improved by specifying its (2) description and (3) type. The required flag is added to Symbol (4): instances of class Gene require the Symbol attribute.
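As a rough guide to how these steps map onto LinkML, the fragment below sketches the Gene class as it appears in the exported schema reported at the end of this walkthrough. The comments linking each key to a step in the interface are our reading of that export, not a statement about how the tool works internally.

```yaml
classes:
  Gene:
    is_a: NamedEntity
    attributes:
      symbol:
        description: The HGNC symbol of the gene.
        range: string      # the attribute type
        required: true     # the "required" flag
    id_prefixes:
      - HGNC               # the terminology used to annotate the class
    annotations:
      annotators: sqlite:obo:HGNC
      prompt.examples: RELA, BRCA1, alpha-1-B glycoprotein
```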
Similarly, additional attributes can be defined for the Gene class.
We now introduce a class named “BiologicalProcess” for representing biological processes in our LinkML schema. Following the same steps as for the Gene class, we add the “BiologicalProcess” class with meaningful examples and attributes.
Biological processes are represented using Gene Ontology (GO) terms. Therefore, we define a parent class named “GOterm”, and specify that biological processes are a specialization of GO terms by using the INHERITANCE relationship type. Attributes and ontology annotations from the GOterm class are inherited by BiologicalProcess.
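In LinkML, this kind of specialization is expressed with the `is_a` key. The abridged fragment below, taken from the exported schema reported at the end of this walkthrough, shows the idea; most attributes are omitted for brevity.

```yaml
classes:
  GOterm:
    is_a: NamedEntity
    attributes:
      go_id:
        description: The GO term identifying the concept.
        identifier: true
        range: string
    id_prefixes:
      - GO
  BiologicalProcess:
    is_a: GOterm   # INHERITANCE: BiologicalProcess specializes GOterm
```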
Finally, a “participates in” ASSOCIATION relationship between genes and biological processes can be specified. By hovering over the border of the Gene class, we can connect it to the BiologicalProcess class.
Attributes for the Gene-participates in-BiologicalProcess association relationship are added in the same way as class attributes.
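In the exported schema reported at the end of this walkthrough, the association appears as a class derived from `Triple`, whose subject, object, and predicate are constrained through `slot_usage`. The abridged sketch below keeps only those three slots; association attributes such as `evidence`, cardinalities, and prompt examples are left out here.

```yaml
classes:
  GeneParticipatesInBiologicalProcessRelationship:
    is_a: Triple
    slot_usage:
      subject:
        range: Gene                 # source of the association
      object:
        range: BiologicalProcess    # target of the association
      predicate:
        range: GeneParticipatesInBiologicalProcessPredicate
```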
The schema can be further expanded by considering other subclasses, such as molecular functions and cellular components, and new relationships between classes. The schema can also be enhanced by considering attributes specific to subclasses (e.g. activity type for MolecularFunction and cellular location for CellularComponent).
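A minimal sketch of such an extension is shown below. It is hypothetical: the attribute names `activity_type` and `cellular_location`, their descriptions, and their `string` ranges are assumptions made for illustration, and they do not appear in the exported schema.

```yaml
classes:
  MolecularFunction:
    is_a: GOterm
    attributes:
      activity_type:        # hypothetical attribute name
        description: The type of molecular activity (assumed wording).
        range: string
  CellularComponent:
    is_a: GOterm
    attributes:
      cellular_location:    # hypothetical attribute name
        description: The location within the cell (assumed wording).
        range: string
```

Because both classes use `is_a: GOterm`, they would inherit the GOterm attributes in the same way as BiologicalProcess.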
To export the schema in LinkML, click the Download / Export button in the top right corner of the canvas. A download panel will appear.
The LinkML schema for representing relationships of type “participates in” involving genes and biological processes is reported below.
id: https://schemalink.anacleto.di.unimi.it/gene_bio_process
default_range: string
name: gene_bio_process
title: GeneBioProcess
description: LinkML schema for representing the interactions between genes and GO terms.
prefixes:
  linkml: https://w3id.org/linkml/
  ontogpt: http://w3id.org/ontogpt/
  rdf: https://www.w3.org/1999/02/22-rdf-syntax-ns
  HGNC: http://identifiers.org/hgnc/
  GO: http://purl.obolibrary.org/obo/go/extensions/go-plus.owl
imports:
  - ontogpt:core
  - linkml:types
classes:
  GeneParticipatesInBiologicalProcessRelationship:
    is_a: Triple
    description: >-
      A triple where the subject is a Gene and where the object is a Biological
      Process. A participates in relationship between a gene and a
      biological process.
    slot_usage:
      subject:
        range: Gene
        annotations:
          prompt.examples: RELA, BRCA1, alpha-1-B glycoprotein
        minimum_cardinality: 0
        maximum_cardinality: 1
      object:
        range: BiologicalProcess
        annotations:
          prompt.examples: viral genome replication, cellular homeostasis, DNA repair
        minimum_cardinality: 0
      predicate:
        range: GeneParticipatesInBiologicalProcessPredicate
        annotations:
          prompt.examples: RELA participates in cell growth, IL6 participates in homeostasis
      evidence:
        description: The experimental methods used for validating the relationship
        required: true
        identifier: false
        range: string
        multivalued: true
  GeneParticipatesInBiologicalProcessPredicate:
    is_a: RelationshipType
    attributes:
      label:
        description: >-
          The predicate for the GeneParticipatesInBiologicalProcess
          relationships.
      id:
        pattern: 'participates in'
    id_prefixes: []
    annotations: {}
  Gene:
    is_a: NamedEntity
    description: ''
    mixins: []
    attributes:
      symbol:
        description: The HGNC symbol of the gene.
        required: true
        identifier: false
        range: string
      hgnc_id:
        description: The HGNC identifier of the gene.
        required: true
        identifier: true
        range: integer
      synonym:
        description: Synonyms for the gene.
        required: false
        identifier: false
        range: string
        multivalued: true
    id_prefixes:
      - HGNC
    annotations:
      annotators: sqlite:obo:HGNC
      prompt.examples: RELA, BRCA1, alpha-1-B glycoprotein
  GOterm:
    is_a: NamedEntity
    description: >-
      A Gene Ontology (GO) term that represents a standardized concept
      describing a biological process, molecular function, or cellular
      component. GO terms provide a controlled vocabulary for annotating gene
      products and their roles in biology.
    mixins: []
    attributes:
      go_id:
        description: The GO term identifying the concept.
        required: true
        identifier: true
        range: string
      synonym:
        description: Synonyms for the GO term.
        required: false
        identifier: false
        range: string
        multivalued: true
      description:
        description: A description for the GO term.
        required: false
        identifier: false
        range: string
      label:
        description: A label for the GO term.
        required: true
        identifier: false
        range: string
    id_prefixes:
      - GO
    annotations:
      annotators: sqlite:obo:go
      prompt.examples: >-
        dolipore septum, citrulline metabolic process, peptide pheromone
        export
  BiologicalProcess:
    is_a: GOterm
    description: ''
    mixins: []
    attributes:
      biological_process_id:
        identifier: true
        description: A unique identifier for the BiologicalProcess class.
    id_prefixes: []
    annotations:
      prompt.examples: viral genome replication, cellular homeostasis, DNA repair