BiomedicalCURIERepresentation

class BiomedicalCURIERepresentation(identifiers: Sequence[str], cache: TextCache | None = None, **kwargs)

Bases: CachedTextRepresentation

Textual representations for datasets grounded with biomedical CURIEs.

The label and description for each entity are obtained via pyobo using PyOBOTextCache and encoded with TextRepresentation.

Example usage:

"""Example for using biomedical CURIEs with text representations."""

import bioontologies
import numpy as np

from pykeen.datasets.base import Dataset
from pykeen.models import ERModel
from pykeen.nn import BiomedicalCURIERepresentation
from pykeen.pipeline import pipeline
from pykeen.triples.triples_factory import TriplesFactory

# Generate graph dataset from the Monarch Disease Ontology (MONDO)
graph = bioontologies.get_obograph_by_prefix("mondo").squeeze(standardize=True)
edge_tuples = (edge.as_tuple() for edge in graph.edges)
triples = [t for t in edge_tuples if all(t)]
triples_factory = TriplesFactory.from_labeled_triples(np.array(triples))
dataset = Dataset.from_tf(triples_factory)

entity_representations = BiomedicalCURIERepresentation.from_dataset(
    dataset=dataset,
    encoder="transformer",
)
result = pipeline(
    dataset=dataset,
    model=ERModel,
    model_kwargs=dict(
        interaction="distmult",
        entity_representations=entity_representations,
        relation_representations_kwargs=dict(
            shape=entity_representations.shape,
        ),
    ),
)
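
The `if all(t)` filter in the example above keeps only edges whose subject, predicate, and object are all present. A minimal sketch of that step on plain tuples follows; the sample edges and the `EX:` prefix are made-up placeholders, not data from MONDO.

```python
# Hypothetical (subject, predicate, object) tuples; all values are illustrative.
edge_tuples = [
    ("EX:0000001", "rdfs:subClassOf", "EX:0000002"),
    ("EX:0000003", None, "EX:0000001"),  # missing predicate
    ("", "rdfs:subClassOf", "EX:0000002"),  # empty subject
]

# all(t) is True only when every component is truthy, so tuples containing
# None or an empty string are dropped before building the triples array.
triples = [t for t in edge_tuples if all(t)]
print(triples)
```

Filtering first matters because the labeled triples are converted to a string array, where a missing component would otherwise turn into a spurious entity or relation label.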


Initialize the representation.

Parameters:
  • identifiers (Sequence[str]) – the IDs to be resolved by the class, e.g., Wikidata IDs for WikidataTextRepresentation, or biomedical entities represented as compact URIs (CURIEs) for BiomedicalCURIERepresentation

  • cache (TextCache | None) – a pre-instantiated text cache. If None, cache_cls is used to instantiate one.

  • kwargs – additional keyword-based parameters passed to TextRepresentation.__init__()
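
To make the expected shape of `identifiers` concrete: a CURIE is a string of the form `prefix:identifier`. The sketch below splits such strings into their two parts. The helper `split_curie` is hypothetical and written only for illustration; it is not part of PyKEEN or PyOBO, and the sample CURIEs are examples of the format rather than values required by the class.

```python
def split_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE of the form 'prefix:identifier' into (prefix, identifier).

    Hypothetical helper for illustration; not part of PyKEEN or PyOBO.
    """
    # Split on the first colon only, since the local identifier may contain colons.
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not a valid CURIE: {curie!r}")
    return prefix, identifier


# Example identifiers in CURIE form, as they might be passed to the constructor.
identifiers = ["mondo:0005148", "doid:9352"]
parsed = [split_curie(c) for c in identifiers]
print(parsed)
```

A check like this can be useful before constructing the representation, since a string without a prefix cannot be resolved to a label and description.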