Utilities

Utilities for neural network components.

exception ShapeError(shape, reference)

An error for a mismatch in shapes.

Initialize the error.

Parameters

shape (Sequence[int]) – the mismatching shape
reference (Sequence[int]) – the expected shape

Return type

None

classmethod verify(shape, reference)

Raise an exception if the shape does not match the reference.

This method normalizes the shapes first.

Parameters

shape (Union[int, Sequence[int]]) – the shape to check
reference (Union[int, Sequence[int], None]) – the reference shape. If None, the shape always matches.

Raises

ShapeError – if the two shapes do not match.

Return type

Sequence[int]

Returns

the normalized shape
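The check can be sketched as follows. This is a minimal illustration, not the library's code: it assumes that "normalizing" means turning a bare int n into the one-element shape (n,), and the local `ShapeError` stand-in (including its `ValueError` base class) as well as the `normalize_shape` and `verify_shape` names are hypothetical.

```python
from typing import Sequence, Tuple, Union


class ShapeError(ValueError):
    """Hypothetical stand-in for the documented exception (base class assumed)."""

    def __init__(self, shape: Sequence[int], reference: Sequence[int]) -> None:
        super().__init__(f"shape {tuple(shape)} does not match reference {tuple(reference)}")
        self.shape = shape
        self.reference = reference


def normalize_shape(shape: Union[int, Sequence[int]]) -> Tuple[int, ...]:
    # Assumption: a bare int n is shorthand for the one-dimensional shape (n,).
    if isinstance(shape, int):
        return (shape,)
    return tuple(shape)


def verify_shape(
    shape: Union[int, Sequence[int]],
    reference: Union[int, Sequence[int], None],
) -> Tuple[int, ...]:
    """Normalize both shapes, raise ShapeError on mismatch, else return the shape."""
    normalized = normalize_shape(shape)
    if reference is None:
        # No reference: every shape is accepted.
        return normalized
    normalized_reference = normalize_shape(reference)
    if normalized != normalized_reference:
        raise ShapeError(shape=normalized, reference=normalized_reference)
    return normalized
```

For example, `verify_shape(3, (3,))` returns `(3,)`, while `verify_shape((2, 3), (3, 2))` raises `ShapeError`.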

class WikidataCache

A cache for requests against Wikidata’s SPARQL endpoint.

Initialize the cache.

WIKIDATA_ENDPOINT = 'https://query.wikidata.org/bigdata/namespace/wdq/sparql': Wikidata SPARQL endpoint. See https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service#Interfacing

get_descriptions(ids)

Get entity descriptions for the given IDs.

Parameters: ids (Sequence[str]) – the Wikidata IDs
Return type: Sequence[str]
Returns: the description for each Wikidata entity

get_image_paths(ids, extensions=('jpeg', 'jpg', 'gif', 'png', 'svg', 'tif'), progress=False)

Get paths to images for the given IDs.

Parameters

ids (Sequence[str]) – the Wikidata IDs
extensions (Collection[str]) – the allowed file extensions
progress (bool) – whether to display a progress bar

Return type

Sequence[Optional[Path]]

Returns

the path to an image for each of the given IDs, or None if no suitable image is found

get_labels(ids)

Get entity labels for the given IDs.

Parameters: ids (Sequence[str]) – the Wikidata IDs
Return type: Sequence[str]
Returns: the label for each Wikidata entity

classmethod query(sparql, wikidata_ids, batch_size=256)

Batched SPARQL query execution for the given IDs.

Parameters

sparql (Union[str, Callable[…, str]]) – the SPARQL query, containing a placeholder ids that receives each batch of IDs
wikidata_ids (Sequence[str]) – the Wikidata IDs
batch_size (int) – the batch size, i.e., maximum number of IDs per query

Return type

Iterable[Mapping[str, Any]]

Returns

an iterable over the JSON results, where the keys correspond to the query variables and the values to their bindings
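The batching can be sketched as below. This is a sketch under several assumptions: the placeholder is taken to be a `str.format` field named `{ids}` that receives space-separated `wd:`-prefixed IDs inside a `VALUES` clause, a callable query is assumed to be called as `sparql(ids=...)`, and the helpers `batched` and `build_requests` as well as the `LABEL_QUERY` template are hypothetical. Only the endpoint URL comes from this page. The sketch builds HTTP GET requests following the SPARQL 1.1 Protocol but does not send them.

```python
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List, Sequence, Union

WIKIDATA_ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# Hypothetical query template; the {ids} field receives one batch of IDs.
LABEL_QUERY = """
SELECT ?item ?itemLabel WHERE {{
  VALUES ?item {{ {ids} }}
  ?item rdfs:label ?itemLabel .
  FILTER(LANG(?itemLabel) = "en")
}}
"""


def batched(ids: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    # Yield consecutive slices holding at most batch_size IDs each.
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def build_requests(
    sparql: Union[str, Callable[..., str]],
    wikidata_ids: Sequence[str],
    batch_size: int = 256,
) -> List[urllib.request.Request]:
    requests = []
    for batch in batched(wikidata_ids, batch_size):
        ids = " ".join(f"wd:{wikidata_id}" for wikidata_id in batch)
        # Assumption: a template string is filled via str.format, a callable is called with ids=...
        query = sparql.format(ids=ids) if isinstance(sparql, str) else sparql(ids=ids)
        url = f"{WIKIDATA_ENDPOINT}?{urllib.parse.urlencode({'query': query})}"
        # SPARQL 1.1 Protocol: ask for JSON results via the Accept header.
        requests.append(
            urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
        )
    return requests
```

With `batch_size=2`, three IDs yield two requests. Each could then be sent with `urllib.request.urlopen` and its body decoded with `json.loads`; the live endpoint is not contacted here.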

classmethod query_text(wikidata_ids, language='en', batch_size=256)

Query the SPARQL endpoint for information about the given IDs.

Parameters

wikidata_ids (Sequence[str]) – the Wikidata IDs
language (str) – the label language
batch_size (int) – the batch size; if more IDs are provided, the request is split into multiple smaller ones

Return type

Mapping[str, Mapping[str, str]]

Returns

a mapping from Wikidata IDs to dictionaries containing the label and description of each entity
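A mapping of this shape can be assembled from the standard SPARQL 1.1 JSON results format, in which each binding maps a variable name to an object with `type` and `value` keys. A minimal sketch, assuming hypothetical query variables `item`, `label` and `description` and entity URIs of the form `http://www.wikidata.org/entity/Q42`; the helper name `bindings_to_text` is also hypothetical.

```python
from typing import Any, Dict, Mapping

# Assumed prefix of entity URIs returned for ?item.
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def bindings_to_text(results: Mapping[str, Any]) -> Dict[str, Dict[str, str]]:
    """Turn a SPARQL JSON result into {wikidata_id: {"label": ..., "description": ...}}."""
    text: Dict[str, Dict[str, str]] = {}
    for binding in results["results"]["bindings"]:
        # Each variable maps to a dict such as {"type": "uri", "value": "..."}.
        uri = binding["item"]["value"]
        wikidata_id = uri[len(ENTITY_PREFIX):] if uri.startswith(ENTITY_PREFIX) else uri
        entry = text.setdefault(wikidata_id, {})
        for key in ("label", "description"):
            # Variables bound in an OPTIONAL clause may be missing from a binding.
            if key in binding:
                entry[key] = binding[key]["value"]
    return text
```

Entities without a description simply get a dictionary with only a label in this sketch.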

static verify_ids(ids)

Raise an error if any invalid ID is encountered.

Parameters: ids (Sequence[str]) – the IDs to verify
Raises: ValueError – if any invalid ID is encountered
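A minimal sketch of such a check. The validity rule is an assumption: it accepts only item IDs, i.e., an uppercase Q followed by a positive integer without leading zeros. Wikidata also has other identifier kinds (for example property IDs starting with P), which the documented method may or may not accept. The pattern and function below are hypothetical.

```python
import re
from typing import Sequence

# Assumed pattern for Wikidata item IDs such as "Q42".
WIKIDATA_ID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def verify_ids(ids: Sequence[str]) -> None:
    # Collect every ID that does not fully match the assumed pattern.
    invalid = [wikidata_id for wikidata_id in ids if not WIKIDATA_ID_PATTERN.fullmatch(wikidata_id)]
    if invalid:
        raise ValueError(f"Invalid Wikidata IDs: {invalid}")
```

Reporting all offending IDs at once, rather than failing on the first, makes it easier to clean an input list in one pass.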

adjacency_tensor_to_stacked_matrix(num_relations, num_entities, source, target, edge_type, edge_weights=None, horizontal=True)

Stack adjacency matrices as described in [thanapalasingam2021].

This method rearranges the (sparse) adjacency tensor of shape (num_entities, num_relations, num_entities) into a sparse adjacency matrix of shape (num_entities, num_relations * num_entities) (horizontal stacking) or (num_entities * num_relations, num_entities) (vertical stacking). This way, the relation-specific message passing of R-GCN can be performed with a single sparse matrix multiplication plus some additional processing: with horizontal stacking, the inputs are projected per relation beforehand; with vertical stacking, the aggregated messages are projected per relation afterwards.

Parameters

num_relations (int) – the number of relations
num_entities (int) – the number of entities
source (LongTensor) – shape: (num_triples,) the source entity indices
target (LongTensor) – shape: (num_triples,) the target entity indices
edge_type (LongTensor) – shape: (num_triples,) the edge type, i.e., relation ID
edge_weights (Optional[FloatTensor]) – shape: (num_triples,) scalar edge weights
horizontal (bool) – whether to use horizontal (True) or vertical (False) stacking

Return type

Tensor

Returns

shape: (num_entities, num_relations * num_entities) for horizontal stacking, or (num_relations * num_entities, num_entities) for vertical stacking; the stacked adjacency matrix
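The index arithmetic of both stackings, and why one matrix product suffices, can be checked on small dense NumPy arrays. This is a sketch, not the library's implementation: the documented function returns a sparse tensor, whereas the hypothetical `stacked_adjacency` helper here fills a dense array for readability. It also assumes that, within each relation block, rows index `source` and columns index `target`, so messages flow from target to source entities. The hypothetical `message_passing` helper computes the R-GCN style sum over relations of A_r @ x @ W_r.

```python
import numpy as np


def stacked_adjacency(num_relations, num_entities, source, target, edge_type, edge_weights=None, horizontal=True):
    """Dense stand-in for the stacked adjacency matrix (hypothetical helper)."""
    if edge_weights is None:
        edge_weights = np.ones(len(source))
    # Relation r occupies the block of columns (horizontal) or rows (vertical) starting at r * num_entities.
    offset = edge_type * num_entities
    if horizontal:
        matrix = np.zeros((num_entities, num_relations * num_entities))
        index = (source, offset + target)
    else:
        matrix = np.zeros((num_relations * num_entities, num_entities))
        index = (offset + source, target)
    # np.add.at sums repeated edges, like coalescing a sparse COO tensor.
    np.add.at(matrix, index, edge_weights)
    return matrix


def message_passing(x, weights, matrix, horizontal):
    """Compute sum_r A_r @ x @ W_r with a single product against the stacked matrix."""
    num_relations, _, output_dim = weights.shape
    num_entities = x.shape[0]
    if horizontal:
        # Pre-processing: project per relation first, (R, n, d_out) -> (R * n, d_out).
        projected = np.einsum("nd,rde->rne", x, weights)
        return matrix @ projected.reshape(num_relations * num_entities, output_dim)
    # Aggregate in input space, (R * n, d_in) -> (R, n, d_in) ...
    aggregated = (matrix @ x).reshape(num_relations, num_entities, -1)
    # ... then post-process: project each relation block and sum over relations.
    return np.einsum("rnd,rde->ne", aggregated, weights)


# Small check against an explicit loop over relation-specific adjacency matrices.
rng = np.random.default_rng(seed=0)
num_entities, num_relations, input_dim, output_dim = 5, 3, 4, 2
source = np.array([0, 1, 2, 4, 4])
target = np.array([1, 2, 3, 0, 3])
edge_type = np.array([0, 1, 2, 1, 0])
x = rng.normal(size=(num_entities, input_dim))
weights = rng.normal(size=(num_relations, input_dim, output_dim))
per_relation = np.zeros((num_relations, num_entities, num_entities))
np.add.at(per_relation, (edge_type, source, target), 1.0)
expected = sum(per_relation[r] @ x @ weights[r] for r in range(num_relations))
for horizontal in (True, False):
    matrix = stacked_adjacency(num_relations, num_entities, source, target, edge_type, horizontal=horizontal)
    assert np.allclose(message_passing(x, weights, matrix, horizontal), expected)
```

Both variants reproduce the per-relation loop; they differ only in whether the projection happens before (horizontal) or after (vertical) the single multiplication with the stacked matrix.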

safe_diagonal(matrix)

Extract diagonal from a potentially sparse matrix.

Note

This is a workaround for as long as torch.diagonal() does not support sparse tensors.

Parameters: matrix (Tensor) – shape: (n, n) the matrix
Return type: Tensor
Returns: shape: (n,) the diagonal values.

use_horizontal_stacking(input_dim, output_dim)

Determine a stacking direction based on the input and output dimension.

The vertical stacking approach is suitable for low-dimensional input and high-dimensional output, because the messages are aggregated in the low input dimension and the projection to the high output dimension is done last. The horizontal stacking approach, in contrast, suits high-dimensional input and low-dimensional output, because the projection to the low output dimension is done first.

Parameters

input_dim (int) – the layer’s input dimension
output_dim (int) – the layer’s output dimension

Return type

bool

Returns

whether to use horizontal (True) or vertical (False) stacking
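The trade-off can be made concrete with a rough cost model, which is an assumption for illustration and not necessarily the library's exact criterion. Both variants perform the same dense projection work, num_relations * num_entities * input_dim * output_dim multiply-adds, but the sparse product costs about num_edges * output_dim with horizontal stacking and num_edges * input_dim with vertical stacking, and the (num_relations * num_entities)-row intermediate has the corresponding width. Under this model, horizontal stacking is preferable exactly when input_dim > output_dim. The function names below are hypothetical.

```python
from typing import Dict


def stacking_costs(num_relations: int, num_entities: int, num_edges: int, input_dim: int, output_dim: int) -> Dict[str, int]:
    """Rough multiply-add counts for each variant (illustrative cost model)."""
    # The dense per-relation projection costs the same in both variants.
    projection = num_relations * num_entities * input_dim * output_dim
    return {
        # Horizontal: project first, so the sparse product works in output space.
        "horizontal": projection + num_edges * output_dim,
        # Vertical: aggregate first, so the sparse product works in input space.
        "vertical": projection + num_edges * input_dim,
    }


def prefer_horizontal_stacking(input_dim: int, output_dim: int) -> bool:
    # Under the model above, the cheaper variant is decided by the smaller dimension.
    return input_dim > output_dim
```

Ties (input_dim == output_dim) go to vertical stacking in this sketch; under the cost model both variants are then equally expensive.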