BiomedicalCURIERepresentation
- class BiomedicalCURIERepresentation(identifiers, cache=None, **kwargs)
Bases: CachedTextRepresentation
Textual representations for datasets grounded with biomedical CURIEs.
The label and description for each entity are obtained via pyobo using pykeen.nn.utils.PyOBOCache and encoded with TextRepresentation.
Example usage:
    import bioontologies
    import numpy as np

    from pykeen.datasets import Dataset
    from pykeen.models import ERModel
    from pykeen.nn import BiomedicalCURIERepresentation
    from pykeen.pipeline import pipeline
    from pykeen.triples import TriplesFactory

    # Generate graph dataset from the Monarch Disease Ontology (MONDO)
    graph = bioontologies.get_obograph_by_prefix("mondo").squeeze(standardize=True)
    triples = (edge.as_tuple() for edge in graph.edges)
    # Keep only triples whose head, relation, and tail are all non-empty
    triples = [t for t in triples if all(t)]
    triples = TriplesFactory.from_labeled_triples(np.array(triples))
    dataset = Dataset.from_tf(triples)

    # Encode the label and description of each entity with a transformer
    entity_representations = BiomedicalCURIERepresentation.from_dataset(
        dataset=dataset,
        encoder="transformer",
    )

    result = pipeline(
        dataset=dataset,
        model=ERModel,
        model_kwargs=dict(
            interaction="distmult",
            entity_representations=entity_representations,
            # Relation embeddings must match the shape of the entity representations
            relation_representation_kwargs=dict(
                shape=entity_representations.shape,
            ),
        ),
    )
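The two steps described above (look up a label and description for each CURIE, then encode the resulting text) can be pictured with a small, dependency-free sketch. This is not PyKEEN's implementation: the dictionary `TOY_TEXTS`, the functions `lookup_text`, `toy_encode`, and `represent`, and the `EX:` CURIEs are all hypothetical stand-ins. In particular, a hash replaces the real text encoder, and falling back to the raw identifier when no text is found is an assumption of this sketch.

```python
import hashlib
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a text cache: CURIE -> (label, description).
# The real class obtains these via pyobo; the entries here are made up.
TOY_TEXTS: Dict[str, Tuple[str, str]] = {
    "EX:0000001": ("example disease A", "A made-up condition used for illustration."),
    "EX:0000002": ("example disease B", "Another made-up condition."),
}


def lookup_text(curie: str) -> Optional[str]:
    """Return 'label: description' for a known CURIE, else None."""
    entry = TOY_TEXTS.get(curie)
    if entry is None:
        return None
    label, description = entry
    return f"{label}: {description}"


def toy_encode(text: str, dim: int = 4) -> List[float]:
    """Map text to a deterministic vector in [0, 1]^dim (a toy, not a transformer)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def represent(curies: Sequence[str], dim: int = 4) -> List[List[float]]:
    """Look up the text for each CURIE and encode it, one vector per identifier."""
    vectors = []
    for curie in curies:
        text = lookup_text(curie)
        if text is None:
            # Assumption of this sketch: encode the identifier itself if no text exists.
            text = curie
        vectors.append(toy_encode(text, dim))
    return vectors


vectors = represent(["EX:0000001", "EX:0000002", "EX:9999999"])
print(len(vectors), len(vectors[0]))
```

The point of the sketch is the data flow only: the order of the identifiers fixes the order of the output vectors, so each entity index maps to one encoded text.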
name: Biomedical CURIE Text Encoding
Initialize the representation.
- Parameters:
identifiers (Sequence[str]) – the IDs to be resolved by the class, e.g., Wikidata IDs for WikidataTextRepresentation, or biomedical entities represented as compact URIs (CURIEs) for BiomedicalCURIERepresentation
cache (TextCache | None) – a pre-instantiated text cache. If None, cache_cls is used to instantiate one.
kwargs – additional keyword-based parameters passed to TextRepresentation.__init__()
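The interplay of the `identifiers` and `cache` parameters can be illustrated with a minimal sketch of the "use the given cache, otherwise instantiate `cache_cls`" pattern. The classes `ToyTextCache` and `ToyCachedRepresentation` and the helper `split_curie` are hypothetical and only mimic the documented behaviour; they are not the library's classes, and the real cache resolves texts from external resources rather than generating placeholder strings.

```python
from typing import Any, List, Optional, Sequence, Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split a compact URI such as 'mondo:0005148' into (prefix, local identifier)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


class ToyTextCache:
    """Hypothetical cache that resolves identifiers to texts."""

    def get_texts(self, identifiers: Sequence[str]) -> List[str]:
        # Placeholder texts; a real cache would look up labels and descriptions.
        return [f"text for {identifier}" for identifier in identifiers]


class ToyCachedRepresentation:
    """Hypothetical representation showing the cache fallback."""

    # Class used to build a cache when the caller does not pass one.
    cache_cls = ToyTextCache

    def __init__(
        self,
        identifiers: Sequence[str],
        cache: Optional[ToyTextCache] = None,
        **kwargs: Any,
    ) -> None:
        if cache is None:
            cache = self.cache_cls()
        self.cache = cache
        self.texts = list(cache.get_texts(identifiers))
        # In this sketch the extra keyword arguments are only stored.
        self.extra_kwargs = kwargs


# Without a cache, one is created from cache_cls.
default_rep = ToyCachedRepresentation(["mondo:0005148"])

# With a pre-instantiated cache, the same object is reused.
shared_cache = ToyTextCache()
shared_rep = ToyCachedRepresentation(["mondo:0005148", "mondo:0004975"], cache=shared_cache)
print(split_curie("mondo:0005148"), shared_rep.cache is shared_cache)
```

Passing a pre-instantiated cache is useful when several representations should share one cache object instead of each creating its own.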