BiomedicalCURIERepresentation

class BiomedicalCURIERepresentation(identifiers, cache=None, **kwargs)

Bases: CachedTextRepresentation

Textual representations for datasets grounded with biomedical CURIEs.

The label and description for each entity are obtained via pyobo using pykeen.nn.utils.PyOBOCache and encoded with TextRepresentation.
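The lookup step described above can be pictured with a small standard-library sketch. It is not PyKEEN's implementation: the dictionary `_TOY_LOOKUP`, the function `describe`, and the `ex:` identifiers are hypothetical stand-ins for the pyobo-backed cache, and the way label and description are joined is an assumption made for illustration only.

```python
from functools import lru_cache
from typing import Optional, Tuple

# Hypothetical in-memory stand-in for an ontology lookup service.
# Maps a CURIE to a (label, description) pair; the description may be missing.
_TOY_LOOKUP = {
    "ex:0000001": ("example disease", "A placeholder disease used for illustration."),
    "ex:0000002": ("another example", None),
}


@lru_cache(maxsize=None)
def describe(curie: str) -> str:
    """Return the text that would be handed to a text encoder for one CURIE."""
    entry: Optional[Tuple[str, Optional[str]]] = _TOY_LOOKUP.get(curie)
    if entry is None:
        # Assumption of this sketch: fall back to the identifier itself.
        return curie
    label, description = entry
    if description is None:
        return label
    return f"{label}: {description}"


# One text per entity, in the same order as the identifiers.
texts = [describe(c) for c in ["ex:0000001", "ex:0000002", "ex:0000003"]]
```

Because `describe` is memoized, repeated identifiers are only looked up once, which is the general motivation for putting a cache in front of a slow label service.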

Example usage:

import bioontologies
import numpy as np

from pykeen.datasets import Dataset
from pykeen.models import ERModel
from pykeen.nn import BiomedicalCURIERepresentation
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

# Generate graph dataset from the Monarch Disease Ontology (MONDO)
graph = bioontologies.get_obograph_by_prefix("mondo").squeeze(standardize=True)
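# Convert each edge to a tuple and drop tuples with any missing element
triples = (edge.as_tuple() for edge in graph.edges)
triples = [t for t in triples if all(t)]
# Wrap the labeled triples in a TriplesFactory and build a dataset from it
triples_factory = TriplesFactory.from_labeled_triples(np.array(triples))
dataset = Dataset.from_tf(triples_factory)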

entity_representations = BiomedicalCURIERepresentation.from_dataset(
    dataset=dataset, encoder="transformer",
)
result = pipeline(
    dataset=dataset,
    model=ERModel,
    model_kwargs=dict(
        interaction="distmult",
        entity_representations=entity_representations,
        relation_representations_kwargs=dict(
            shape=entity_representations.shape,
        ),
    ),
)


Initialize the representation.

Parameters:
  • identifiers (Sequence[str]) – the IDs to be resolved by the class, e.g., Wikidata IDs for WikidataTextRepresentation, or biomedical entities represented as compact URIs (CURIEs) for BiomedicalCURIERepresentation

  • cache (TextCache | None) – a pre-instantiated text cache. If None, cache_cls is used to instantiate one.

  • kwargs – additional keyword-based parameters passed to TextRepresentation.__init__()
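Since identifiers are expected to be CURIEs, that is, strings of the form prefix:local_id, it can help to validate them before constructing the representation. The helper below is a minimal sketch using only the standard library; `split_curie` is a hypothetical name and is not part of PyKEEN or pyobo.

```python
from typing import Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split a compact URI into its prefix and local identifier."""
    # Split on the first colon only, so local identifiers may contain colons.
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a valid CURIE: {curie!r}")
    return prefix, local_id


prefix, local_id = split_curie("MONDO:0005148")
```

A check like this catches entries such as bare labels or empty strings early, before any label lookup is attempted.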