BiomedicalCURIERepresentation
- class BiomedicalCURIERepresentation(identifiers: Sequence[str], cache: TextCache | None = None, **kwargs)
Bases: CachedTextRepresentation
Textual representations for datasets grounded with biomedical CURIEs.
The label and description for each entity are obtained via pyobo using PyOBOTextCache and encoded with TextRepresentation.
Example usage:
```python
"""Example for using biomedical CURIEs with text representations."""

import bioontologies
import numpy as np

from pykeen.datasets.base import Dataset
from pykeen.models import ERModel
from pykeen.nn import BiomedicalCURIERepresentation
from pykeen.pipeline import pipeline
from pykeen.triples.triples_factory import TriplesFactory

# Generate graph dataset from the Monarch Disease Ontology (MONDO)
graph = bioontologies.get_obograph_by_prefix("mondo").squeeze(standardize=True)
edge_tuples = (edge.as_tuple() for edge in graph.edges)
triples = [t for t in edge_tuples if all(t)]
triples_factory = TriplesFactory.from_labeled_triples(np.array(triples))
dataset = Dataset.from_tf(triples_factory)

entity_representations = BiomedicalCURIERepresentation.from_dataset(
    dataset=dataset,
    encoder="transformer",
)
result = pipeline(
    dataset=dataset,
    model=ERModel,
    model_kwargs=dict(
        interaction="distmult",
        entity_representations=entity_representations,
        relation_representation_kwargs=dict(
            shape=entity_representations.shape,
        ),
    ),
)
```
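The identifiers this class works with are CURIEs, which pair a prefix with a local identifier in the form `prefix:local_id`. The following is a minimal sketch of that structure in plain Python. The `Curie` type and `parse_curie` helper are hypothetical names introduced here for illustration and are not part of PyKEEN or PyOBO; the identifiers are illustrative values only.

```python
from typing import NamedTuple


class Curie(NamedTuple):
    """A compact URI split into its prefix and local identifier."""

    prefix: str
    identifier: str


def parse_curie(curie: str) -> Curie:
    """Split ``prefix:identifier`` at the first colon.

    Hypothetical helper for illustration; not a PyKEEN or PyOBO function.
    """
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not a CURIE: {curie!r}")
    return Curie(prefix, identifier)


# Illustrative identifiers, as they might appear as entity labels in a dataset
identifiers = ["mondo:0005148", "doid:9352"]
parsed = [parse_curie(c) for c in identifiers]
```

Splitting at the first colon only means that a local identifier may itself contain colons, while a string without any colon is rejected.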
Initialize the representation.
- Parameters:
identifiers (Sequence[str]) – the IDs to be resolved by the class, e.g., Wikidata IDs for WikidataTextRepresentation, or biomedical entities represented as compact URIs (CURIEs) for BiomedicalCURIERepresentation
cache (TextCache | None) – a pre-instantiated text cache. If None, cache_cls is used to instantiate one.
kwargs – additional keyword-based parameters passed to TextRepresentation.__init__()
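The fallback described for the cache parameter (use the given cache, otherwise instantiate cache_cls) can be sketched in plain Python. This is an illustration of the pattern under stated assumptions, not PyKEEN's implementation: `ToyTextCache`, `ToyCachedRepresentation`, and the `get_texts` method are hypothetical stand-ins, and the label text is made up for the example.

```python
from __future__ import annotations

from collections.abc import Sequence


class ToyTextCache:
    """Hypothetical stand-in for a text cache: maps identifiers to texts."""

    def __init__(self, texts: dict[str, str] | None = None) -> None:
        self.texts = dict(texts or {})

    def get_texts(self, identifiers: Sequence[str]) -> list[str]:
        # Fall back to the identifier itself when no text is stored for it.
        return [self.texts.get(i, i) for i in identifiers]


class ToyCachedRepresentation:
    """Hypothetical class mirroring the ``cache`` / ``cache_cls`` pattern."""

    # Class-level default, used only when no cache instance is passed in.
    cache_cls = ToyTextCache

    def __init__(
        self, identifiers: Sequence[str], cache: ToyTextCache | None = None
    ) -> None:
        if cache is None:
            cache = self.cache_cls()
        self.cache = cache
        self.texts = cache.get_texts(identifiers)


# No cache given: the default class is instantiated.
default = ToyCachedRepresentation(["mondo:0005148"])

# Pre-instantiated cache given: it is used as-is.
custom = ToyCachedRepresentation(
    ["mondo:0005148"],
    cache=ToyTextCache({"mondo:0005148": "example label"}),
)
```

Keeping the default as a class attribute lets a subclass swap in a different cache type without overriding the constructor, while callers can still pass a shared, already populated cache instance.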