WikidataTextCache

class WikidataTextCache

Bases: TextCache

A cache for requests against Wikidata’s SPARQL endpoint.

Initialize the cache.

Attributes Summary

HEADERS

WIKIDATA_ENDPOINT

Wikidata SPARQL endpoint.

Methods Summary

get_descriptions(wikidata_identifiers)

Get entity descriptions for the given IDs.

get_labels(wikidata_identifiers)

Get entity labels for the given IDs.

get_texts(identifiers)

Get a concatenation of the label and description for each Wikidata identifier.

query(sparql, wikidata_ids[, batch_size, ...])

Batched SPARQL query execution for the given IDs.

query_text(wikidata_ids[, language, batch_size])

Query the SPARQL endpoint for information about the given IDs.

verify_ids(ids)

Raise error if invalid IDs are encountered.

Attributes Documentation

HEADERS: dict[str, str] = {'Accept-Encoding': 'gzip', 'User-Agent': 'PyKEEN-Bot/1.11.1-dev (https://pykeen.github.io; pykeen2019@gmail.com) requests/2.32.3'}
WIKIDATA_ENDPOINT = 'https://query.wikidata.org/bigdata/namespace/wdq/sparql'

Wikidata SPARQL endpoint. See https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service#Interfacing
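
As a rough illustration of how these two attributes fit together, the following sketch builds (but does not send) a GET request against the endpoint using only the standard library. The helper name build_request, the User-Agent value, and the format=json parameter are assumptions made for this example, not part of the documented API.

```python
from urllib.parse import urlencode
from urllib.request import Request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"
# Illustrative headers; the User-Agent value is a placeholder, not PyKEEN's.
HEADERS = {
    "Accept-Encoding": "gzip",
    "User-Agent": "ExampleBot/0.1 (https://example.org)",
}


def build_request(sparql: str) -> Request:
    """Build, without sending, a GET request for a SPARQL query (hypothetical helper)."""
    # Assumption of this sketch: results are requested as JSON via a URL parameter.
    params = urlencode({"query": sparql, "format": "json"})
    return Request(f"{WIKIDATA_ENDPOINT}?{params}", headers=HEADERS, method="GET")


request = build_request("SELECT ?item WHERE { VALUES ?item { wd:Q42 } }")
```

Sending the request (for example with urllib.request.urlopen) requires network access and should respect the query service's usage policy, which is why a descriptive User-Agent is set.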

Methods Documentation

get_descriptions(wikidata_identifiers: Sequence[str]) → Sequence[str]

Get entity descriptions for the given IDs.

Parameters:

wikidata_identifiers (Sequence[str]) – the Wikidata identifiers, each starting with Q (e.g., ['Q42'])

Returns:

the description for each Wikidata entity

Return type:

Sequence[str]

get_labels(wikidata_identifiers: Sequence[str]) → Sequence[str]

Get entity labels for the given IDs.

Parameters:

wikidata_identifiers (Sequence[str]) – the Wikidata identifiers, each starting with Q (e.g., ['Q42'])

Returns:

the label for each Wikidata entity

Return type:

Sequence[str]

get_texts(identifiers: Sequence[str]) → Sequence[str]

Get a concatenation of the label and description for each Wikidata identifier.

Parameters:

identifiers (Sequence[str]) – the Wikidata identifiers, each starting with Q (e.g., ['Q42'])

Returns:

the concatenated label and description for each Wikidata entity

Return type:

Sequence[str]
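
The concatenation described above can be pictured with a small stand-alone sketch. The helper concatenate_texts and the ": " separator are assumptions for illustration; the documentation does not state which separator the library uses.

```python
from collections.abc import Sequence


def concatenate_texts(
    labels: Sequence[str],
    descriptions: Sequence[str],
    sep: str = ": ",  # assumption: the actual separator is not documented here
) -> list[str]:
    """Join each label with its description (hypothetical helper)."""
    if len(labels) != len(descriptions):
        raise ValueError("labels and descriptions must have the same length")
    return [f"{label}{sep}{description}" for label, description in zip(labels, descriptions)]


# Placeholder strings stand in for real Wikidata labels and descriptions.
texts = concatenate_texts(["example label"], ["example description"])
```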

classmethod query(sparql: str | Callable[[...], str], wikidata_ids: Sequence[str], batch_size: int = 256, timeout=None) → Iterable[Mapping[str, Any]]

Batched SPARQL query execution for the given IDs.

Parameters:
  • sparql (str | Callable[[...], str]) – the SPARQL query with a placeholder named ids

  • wikidata_ids (Sequence[str]) – the Wikidata IDs

  • batch_size (int) – the batch size, i.e., maximum number of IDs per query

  • timeout – the timeout for the GET request to the SPARQL endpoint

Returns:

an iterable over JSON results, where the keys correspond to query variables, and the values to the corresponding binding

Return type:

Iterable[Mapping[str, Any]]
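
The batching and the ids placeholder can be sketched without touching the network. In the example below, the query template, the helper names, and the rendering of each ID as wd:<ID> inside a VALUES clause are assumptions; only the placeholder name ids and the default batch size of 256 come from the documentation above. The sketch covers the string form of sparql only, not the callable form.

```python
from collections.abc import Iterator, Sequence

# Hypothetical template; doubled braces are literal braces for str.format.
QUERY_TEMPLATE = "SELECT ?item WHERE {{ VALUES ?item {{ {ids} }} }}"


def iter_batches(ids: Sequence[str], batch_size: int = 256) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield ids[start : start + batch_size]


def render_queries(template: str, wikidata_ids: Sequence[str], batch_size: int = 256) -> list[str]:
    """Fill the ids placeholder once per batch (assumed wd: prefix form)."""
    return [
        template.format(ids=" ".join(f"wd:{i}" for i in batch))
        for batch in iter_batches(wikidata_ids, batch_size)
    ]


queries = render_queries(QUERY_TEMPLATE, ["Q1", "Q2", "Q42"], batch_size=2)
```

With three IDs and a batch size of two, the sketch produces two queries, the second holding the single remaining ID.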

classmethod query_text(wikidata_ids: Sequence[str], language: str = 'en', batch_size: int = 256) → Mapping[str, Mapping[str, str]]

Query the SPARQL endpoint for information about the given IDs.

Parameters:
  • wikidata_ids (Sequence[str]) – the Wikidata IDs

  • language (str) – the label language

  • batch_size (int) – the batch size; if more IDs are provided, the request is split into multiple smaller ones

Returns:

a mapping from Wikidata IDs to dictionaries with the label and description of the entities

Return type:

Mapping[str, Mapping[str, str]]
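
To make the shape of the returned mapping concrete, the sketch below collapses SPARQL JSON bindings into a dictionary keyed by Wikidata ID. The variable names item, itemLabel, and itemDescription, the inner keys "label" and "description", and the helper collect_texts are all assumptions for illustration; the library's actual query variables and keys may differ.

```python
from collections.abc import Iterable, Mapping
from typing import Any

# Wikidata entity URIs share this prefix.
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def collect_texts(bindings: Iterable[Mapping[str, Any]]) -> dict[str, dict[str, str]]:
    """Map each Wikidata ID to its label and description (hypothetical helper)."""
    result: dict[str, dict[str, str]] = {}
    for row in bindings:
        # In SPARQL JSON results, each variable maps to a dict with a "value" key.
        wikidata_id = row["item"]["value"].removeprefix(ENTITY_PREFIX)
        result[wikidata_id] = {
            "label": row.get("itemLabel", {}).get("value", ""),
            "description": row.get("itemDescription", {}).get("value", ""),
        }
    return result


# Placeholder bindings; the text values are not real Wikidata content.
rows = [
    {
        "item": {"type": "uri", "value": ENTITY_PREFIX + "Q42"},
        "itemLabel": {"type": "literal", "value": "example label"},
        "itemDescription": {"type": "literal", "value": "example description"},
    },
    {"item": {"type": "uri", "value": ENTITY_PREFIX + "Q1"}},
]
mapping = collect_texts(rows)
```

Missing labels or descriptions fall back to an empty string in this sketch; that fallback is a design choice of the example rather than documented behavior.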

static verify_ids(ids: Sequence[str])

Raise error if invalid IDs are encountered.

Parameters:

ids (Sequence[str]) – the IDs to verify

Raises:

ValueError – if any invalid ID is encountered
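
A check of this kind might look like the following minimal sketch. The documentation only states that identifiers start with Q; the stricter pattern used here (Q followed by a positive integer without leading zeros) and the helper name check_ids are assumptions, so the library's own rule may be looser or stricter.

```python
import re
from collections.abc import Sequence

# Assumption: a valid item ID is "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def check_ids(ids: Sequence[str]) -> None:
    """Raise ValueError if any ID does not match the assumed pattern (hypothetical helper)."""
    invalid = [i for i in ids if not QID_PATTERN.fullmatch(i)]
    if invalid:
        raise ValueError(f"Invalid IDs encountered: {invalid}")


check_ids(["Q42", "Q1"])  # passes silently
```

Collecting all invalid IDs before raising, rather than failing on the first one, gives the caller a complete list to fix in one pass.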