WikidataTextRepresentation

class WikidataTextRepresentation(identifiers: Sequence[str], cache: TextCache | None = None, **kwargs)

Bases: CachedTextRepresentation

Textual representations for datasets grounded in Wikidata.

The label and description for each entity are obtained from Wikidata using WikidataTextCache and encoded with TextRepresentation.
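The description above has two stages: look up a text for each identifier, then encode that text into a vector. The following is a rough, offline illustration of both stages. Everything in it is hypothetical. The OFFLINE_WIKIDATA table holds illustrative placeholder entries instead of live Wikidata content. The "label: description" joining format is an assumption. The hashing encoder only stands in for a real text encoder, such as the encoder="transformer" option in the example below.

```python
import zlib
from typing import Mapping, Optional, Sequence

# Hypothetical offline stand-in for Wikidata: identifier -> (label, description).
# Illustrative placeholder entries, not fetched from Wikidata.
OFFLINE_WIKIDATA: Mapping[str, tuple[str, str]] = {
    "Q42": ("Douglas Adams", "English writer"),
    "Q5": ("human", "member of the species Homo sapiens"),
}


def get_texts(identifiers: Sequence[str]) -> list[Optional[str]]:
    """Join label and description per identifier; None for unknown IDs."""
    texts: list[Optional[str]] = []
    for identifier in identifiers:
        entry = OFFLINE_WIKIDATA.get(identifier)
        # The "label: description" format is an assumption of this sketch.
        texts.append(None if entry is None else f"{entry[0]}: {entry[1]}")
    return texts


def encode(text: str, dim: int = 8) -> list[float]:
    """Toy hashing bag-of-words encoder standing in for a real text encoder."""
    vector = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's salted hash().
        vector[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vector


texts = get_texts(["Q42", "Q5", "Q0"])
vectors = [encode(text) for text in texts if text is not None]
```

In this sketch, unknown identifiers simply yield None. How the real class treats identifiers without a retrievable text is not covered here.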

Example usage:

"""Example for using WikidataTextRepresentation."""

from pykeen.datasets import get_dataset
from pykeen.models import ERModel
from pykeen.nn import WikidataTextRepresentation
from pykeen.pipeline import pipeline

dataset = get_dataset(dataset="codexsmall")
entity_representations = WikidataTextRepresentation.from_dataset(
    dataset=dataset,
    encoder="transformer",
)

result = pipeline(
    dataset=dataset,
    model=ERModel,
    model_kwargs={
        "interaction": "distmult",
        "entity_representations": entity_representations,
        "relation_representation_kwargs": {
            "shape": entity_representations.shape,
        },
    },
)
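DistMult scores a triple (h, r, t) as the sum over i of h[i] * r[i] * t[i]. This trilinear product is only defined when the relation vectors have the same shape as the text-derived entity vectors. That explains why the example sets the relation shape to entity_representations.shape. The following plain-Python sketch illustrates the constraint with toy 3-dimensional vectors. It is not PyKEEN's own implementation, which operates on batched tensors.

```python
from typing import Sequence


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult score: sum over i of h[i] * r[i] * t[i]."""
    # The trilinear product needs all three vectors to share one dimension,
    # hence relation shape == entity shape in the example above.
    if not (len(h) == len(r) == len(t)):
        raise ValueError(f"dimension mismatch: {len(h)}, {len(r)}, {len(t)}")
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


head = [1.0, 2.0, 3.0]
relation = [1.0, 1.0, 1.0]
tail = [2.0, 0.0, 1.0]
score = distmult_score(head, relation, tail)  # 1*1*2 + 2*1*0 + 3*1*1
```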

Initialize the representation.

Parameters:

identifiers (Sequence[str]) – the IDs to be resolved by the class, e.g., Wikidata IDs for WikidataTextRepresentation, or biomedical entities represented as compact URIs (CURIEs) for BiomedicalCURIERepresentation
cache (TextCache | None) – a pre-instantiated text cache. If None, cache_cls is used to instantiate one.
kwargs – additional keyword-based parameters passed to TextRepresentation.__init__()
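The cache parameter follows a default-factory pattern. If a cache instance is passed, it is used as-is. Otherwise the class-level cache_cls is instantiated. For WikidataTextRepresentation, cache_cls presumably resolves to WikidataTextCache. The following sketch mimics only that documented rule. All class names in it (SketchTextCache, EchoCache, DictCache, CachedRepresentationSketch) are hypothetical and do not correspond to PyKEEN classes.

```python
from typing import ClassVar, Mapping, Optional, Sequence


class SketchTextCache:
    """Hypothetical minimal text-cache interface: identifiers -> texts."""

    def get_texts(self, identifiers: Sequence[str]) -> list[Optional[str]]:
        raise NotImplementedError


class EchoCache(SketchTextCache):
    """Hypothetical default cache: returns each identifier as its own text."""

    def get_texts(self, identifiers: Sequence[str]) -> list[Optional[str]]:
        return list(identifiers)


class DictCache(SketchTextCache):
    """Hypothetical pre-instantiated cache backed by a plain mapping."""

    def __init__(self, texts: Mapping[str, str]) -> None:
        self.texts = dict(texts)

    def get_texts(self, identifiers: Sequence[str]) -> list[Optional[str]]:
        return [self.texts.get(identifier) for identifier in identifiers]


class CachedRepresentationSketch:
    """Mirrors the documented rule: cache=None means cache_cls() is used."""

    cache_cls: ClassVar[type[SketchTextCache]] = EchoCache

    def __init__(self, identifiers: Sequence[str], cache: Optional[SketchTextCache] = None, **kwargs) -> None:
        # Fall back to the class-level default only when no cache is supplied.
        self.cache = self.cache_cls() if cache is None else cache
        self.texts = self.cache.get_texts(identifiers)
        # The real class forwards kwargs to TextRepresentation.__init__();
        # this sketch only stores them.
        self.kwargs = kwargs


default = CachedRepresentationSketch(["Q42"])
custom = CachedRepresentationSketch(["Q42", "Q0"], cache=DictCache({"Q42": "Douglas Adams"}))
```

Passing a pre-instantiated cache may be useful, for example, to share one cache between several representations instead of creating a fresh one each time.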