Utilities

Utilities for neural network components.

class PyOBOCache(*args, **kwargs)

A cache that looks up labels of biomedical entities based on their CURIEs.

Instantiate the PyOBO cache, ensuring PyOBO is installed.

get_texts(identifiers)

Get text for the given CURIEs.

Parameters: identifiers (Sequence[str]) – the compact URIs (CURIEs) for each entity (e.g., ['doid:1234', ...])
Return type: Sequence[Optional[str]]
Returns: the label for each entity, looked up via pyobo.get_name(); None if no label is available.

exception ShapeError(shape, reference)

An error for a mismatch in shapes.

Initialize the error.

Parameters:

shape (Sequence[int]) – the mismatching shape
reference (Sequence[int]) – the expected shape

Return type:

None

classmethod verify(shape, reference)

Raise an exception if the shape does not match the reference.

Both shapes are normalized to sequences first, so a single integer is treated as a one-dimensional shape.

Parameters:

shape (Union[int, Sequence[int]]) – the shape to check
reference (Union[int, Sequence[int], None]) – the reference shape. If None, the shape always matches.

Raises:

ShapeError – if the two shapes do not match.

Return type:

Sequence[int]

Returns:

the normalized shape
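The check performed by verify can be sketched as follows. This is a self-contained stand-in, not the library's code: it assumes that "normalizing" means turning a bare integer into a one-element tuple, and the helper _normalize and the ValueError base class are assumptions.

```python
from typing import Sequence, Tuple, Union


def _normalize(shape: Union[int, Sequence[int]]) -> Tuple[int, ...]:
    # hypothetical helper: a bare integer is treated as a one-dimensional shape
    return (shape,) if isinstance(shape, int) else tuple(shape)


class ShapeError(ValueError):
    """Stand-in for the documented error; the real base class may differ."""

    def __init__(self, shape: Sequence[int], reference: Sequence[int]) -> None:
        super().__init__(f"shape {tuple(shape)} does not match expected shape {tuple(reference)}")
        self.shape = shape
        self.reference = reference

    @classmethod
    def verify(
        cls,
        shape: Union[int, Sequence[int]],
        reference: Union[int, Sequence[int], None],
    ) -> Tuple[int, ...]:
        shape = _normalize(shape)
        if reference is None:
            # without a reference, every shape matches
            return shape
        reference = _normalize(reference)
        if shape != reference:
            raise cls(shape=shape, reference=reference)
        return shape


print(ShapeError.verify(3, (3,)))       # (3,)
print(ShapeError.verify([2, 3], None))  # (2, 3)
```

Returning the normalized shape lets callers continue with a canonical tuple instead of re-checking whether they received an integer or a sequence.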

class TextCache

An interface for looking up text for various flavors of entity identifiers.

abstract get_texts(identifiers)

Get the text for each of the given identifiers.

Parameters: identifiers (Sequence[str]) – the identifiers to look up
Return type: Sequence[Optional[str]]
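A minimal sketch of implementing this interface. It defines a local stand-in for TextCache (assuming the interface consists only of the abstract get_texts method) and a hypothetical DictTextCache that serves texts from an in-memory dictionary; neither is part of the library.

```python
from abc import ABC, abstractmethod
from typing import Mapping, Optional, Sequence


class TextCache(ABC):
    """Local stand-in mirroring the documented interface."""

    @abstractmethod
    def get_texts(self, identifiers: Sequence[str]) -> Sequence[Optional[str]]:
        """Get the text for each of the given identifiers."""
        raise NotImplementedError


class DictTextCache(TextCache):
    """Hypothetical cache backed by a plain dictionary."""

    def __init__(self, texts: Mapping[str, str]) -> None:
        self.texts = dict(texts)

    def get_texts(self, identifiers: Sequence[str]) -> Sequence[Optional[str]]:
        # one result per identifier, in input order; None marks a missing text
        return [self.texts.get(identifier) for identifier in identifiers]


cache = DictTextCache({"ex:1": "first entity"})
print(cache.get_texts(["ex:1", "ex:2"]))  # ['first entity', None]
```

Returning None instead of raising for unknown identifiers matches the Optional[str] element type, so callers can decide how to handle entities without text.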

class WikidataCache

A cache for requests against Wikidata’s SPARQL endpoint.

Initialize the cache.

WIKIDATA_ENDPOINT = 'https://query.wikidata.org/bigdata/namespace/wdq/sparql': Wikidata SPARQL endpoint. See https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service#Interfacing

get_descriptions(wikidata_identifiers)

Get entity descriptions for the given IDs.

Parameters: wikidata_identifiers (Sequence[str]) – the Wikidata identifiers, each starting with Q (e.g., ['Q42'])
Return type: Sequence[str]
Returns: the description for each Wikidata entity

get_image_paths(ids, extensions=('jpeg', 'jpg', 'gif', 'png', 'svg', 'tif'), progress=False)

Get paths to images for the given IDs.

Parameters:

ids (Sequence[str]) – the Wikidata IDs.
extensions (Collection[str]) – the allowed file extensions
progress (bool) – whether to display a progress bar

Return type:

Sequence[Optional[Path]]

Returns:

the path to an image for each given ID, or None if no image is available

get_labels(wikidata_identifiers)

Get entity labels for the given IDs.

Parameters: wikidata_identifiers (Sequence[str]) – the Wikidata identifiers, each starting with Q (e.g., ['Q42'])
Return type: Sequence[str]
Returns: the label for each Wikidata entity

get_texts(identifiers)

Get the concatenation of the label and description for each Wikidata identifier.

Parameters: identifiers (Sequence[str]) – the Wikidata identifiers, each starting with Q (e.g., ['Q42'])
Return type: Sequence[str]
Returns: the concatenated label and description for each Wikidata entity
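One way to build such texts from a mapping of the shape documented for query_text (ID to a dictionary with label and description). The separator ": ", the function name, and the empty-string fallback for unknown IDs are assumptions; the exact format used by get_texts is not documented here.

```python
from typing import List, Mapping, Sequence


def concatenate_label_and_description(
    identifiers: Sequence[str],
    texts: Mapping[str, Mapping[str, str]],
    separator: str = ": ",  # assumed separator
) -> List[str]:
    """Join label and description per ID, skipping whichever part is missing."""
    result = []
    for identifier in identifiers:
        entry = texts.get(identifier, {})
        # keep the order label, description; drop missing or empty parts
        parts = [entry[key] for key in ("label", "description") if entry.get(key)]
        result.append(separator.join(parts))
    return result


texts = {"Q1": {"label": "label one", "description": "description one"}, "Q2": {"label": "label two"}}
print(concatenate_label_and_description(["Q1", "Q2", "Q3"], texts))
# ['label one: description one', 'label two', '']
```

Falling back to an empty string keeps the return type Sequence[str] (never None), as documented for this method.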

classmethod query(sparql, wikidata_ids, batch_size=256)

Execute a batched SPARQL query for the given IDs.

Parameters:

sparql (Union[str, Callable[..., str]]) – the SPARQL query with a placeholder ids, or a callable returning such a query
wikidata_ids (Sequence[str]) – the Wikidata IDs
batch_size (int) – the batch size, i.e., maximum number of IDs per query

Return type:

Iterable[Mapping[str, Any]]

Returns:

an iterable over the JSON results, one mapping per result row, whose keys are the query variables and whose values are the corresponding bindings
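The batching step can be sketched as below. It only builds query strings and sends nothing to WIKIDATA_ENDPOINT. The str.format-style {ids} placeholder, the wd: prefixing of IDs, the template LABEL_QUERY, and the name iter_batched_queries are assumptions, not necessarily how WikidataCache.query fills its placeholder.

```python
from typing import Callable, Iterator, Sequence, Union

# Hypothetical template; literal braces are doubled because of str.format.
LABEL_QUERY = """
SELECT ?item ?itemLabel WHERE {{
  VALUES ?item {{ {ids} }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


def iter_batched_queries(
    sparql: Union[str, Callable[[Sequence[str]], str]],
    wikidata_ids: Sequence[str],
    batch_size: int = 256,
) -> Iterator[str]:
    """Yield one SPARQL query string per batch of at most batch_size IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(wikidata_ids), batch_size):
        batch = wikidata_ids[start : start + batch_size]
        if callable(sparql):
            # a callable builds the query from the batch itself
            yield sparql(batch)
        else:
            # fill the placeholder with space-separated, prefixed entity IDs
            yield sparql.format(ids=" ".join(f"wd:{i}" for i in batch))


for query in iter_batched_queries(LABEL_QUERY, ["Q42", "Q1", "Q2"], batch_size=2):
    print(query)
```

Each generated string would then be sent to the endpoint separately; capping the number of IDs per request keeps individual queries small.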

classmethod query_text(wikidata_ids, language='en', batch_size=256)

Query the SPARQL endpoint for information about the given IDs.

Parameters:

wikidata_ids (Sequence[str]) – the Wikidata IDs
language (str) – the label language
batch_size (int) – the batch size; if more IDs are provided, the request is split into multiple smaller ones

Return type:

Mapping[str, Mapping[str, str]]

Returns:

a mapping from Wikidata IDs to dictionaries with the label and description of each entity
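A sketch of turning result rows into the documented mapping. It assumes rows in the SPARQL 1.1 JSON results style (each variable maps to a dictionary with a "value" entry) and hypothetical query variables item, itemLabel and itemDescription; the example rows use placeholder values, not real Wikidata content.

```python
from typing import Any, Dict, Iterable, Mapping


def bindings_to_texts(bindings: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, str]]:
    """Collect label and description per Wikidata ID from SPARQL JSON rows."""
    result: Dict[str, Dict[str, str]] = {}
    for binding in bindings:
        # the entity IRI ends in the bare ID, e.g. ".../entity/Q42"
        wikidata_id = binding["item"]["value"].rsplit("/", 1)[-1]
        entry = result.setdefault(wikidata_id, {})
        for variable, key in (("itemLabel", "label"), ("itemDescription", "description")):
            # unbound (optional) variables are simply absent from a row
            if variable in binding:
                entry[key] = binding[variable]["value"]
    return result


example = [
    {
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"},
        "itemLabel": {"type": "literal", "xml:lang": "en", "value": "first label"},
        "itemDescription": {"type": "literal", "xml:lang": "en", "value": "first description"},
    },
    {
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2"},
        "itemLabel": {"type": "literal", "xml:lang": "en", "value": "second label"},
    },
]
print(bindings_to_texts(example))
# {'Q1': {'label': 'first label', 'description': 'first description'}, 'Q2': {'label': 'second label'}}
```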

static verify_ids(ids)

Raise an error if invalid IDs are encountered.

Parameters: ids (Sequence[str]) – the IDs to verify
Raises: ValueError – if any invalid ID is encountered
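A possible validation, sketched under the assumption that a valid item ID is "Q" followed by a positive integer without leading zeros. This may be stricter than the actual check, since the documentation above only states that IDs start with Q; the regex and function are illustrative.

```python
import re
from typing import Sequence

# Assumption: "Q" followed by a positive integer without leading zeros.
_WIKIDATA_ITEM_ID = re.compile(r"Q[1-9][0-9]*")


def verify_ids(ids: Sequence[str]) -> None:
    """Raise ValueError listing every ID that does not look like a Wikidata item ID."""
    invalid = [i for i in ids if not _WIKIDATA_ITEM_ID.fullmatch(i)]
    if invalid:
        raise ValueError(f"Invalid Wikidata IDs: {invalid}")


verify_ids(["Q42", "Q1"])  # passes silently
try:
    verify_ids(["Q42", "P31", "42"])
except ValueError as error:
    print(error)  # Invalid Wikidata IDs: ['P31', '42']
```

Collecting all invalid IDs before raising reports every problem at once instead of failing on the first one.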

adjacency_tensor_to_stacked_matrix(num_relations, num_entities, source, target, edge_type, edge_weights=None, horizontal=True)

Stack adjacency matrices as described in [thanapalasingam2021].

This method rearranges the (sparse) adjacency tensor of shape (num_entities, num_relations, num_entities) into a sparse adjacency matrix of shape (num_entities, num_relations * num_entities) (horizontal stacking) or (num_entities * num_relations, num_entities) (vertical stacking). This allows the relation-specific message passing of R-GCN to be performed as a single sparse matrix multiplication of the inputs, plus some additional pre- and/or post-processing.

Parameters:

num_relations (int) – the number of relations
num_entities (int) – the number of entities
source (LongTensor) – shape: (num_triples,) the source entity indices
target (LongTensor) – shape: (num_triples,) the target entity indices
edge_type (LongTensor) – shape: (num_triples,) the edge type, i.e., relation ID
edge_weights (Optional[FloatTensor]) – shape: (num_triples,) scalar edge weights
horizontal (bool) – whether to use horizontal (True) or vertical (False) stacking

Return type:

Tensor

Returns:

shape: (num_entities, num_relations * num_entities) for horizontal stacking, or (num_entities * num_relations, num_entities) for vertical stacking; the stacked adjacency matrix
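A minimal sketch of the construction with torch.sparse_coo_tensor. It assumes that relation r occupies the r-th block of num_entities columns (horizontal) or rows (vertical) and that all edge weights are 1. The function name stacked_adjacency and the relation-specific weights w are hypothetical, and this is not the library's implementation. The demo checks that both stackings reproduce the per-relation sum over r of A_r x W_r.

```python
import torch


def stacked_adjacency(num_relations, num_entities, source, target, edge_type, horizontal=True):
    """Build the stacked sparse adjacency matrix with unit edge weights."""
    # each relation occupies its own block of num_entities rows or columns
    offset = edge_type * num_entities
    if horizontal:
        size = (num_entities, num_relations * num_entities)
        indices = torch.stack([source, offset + target])
    else:
        size = (num_relations * num_entities, num_entities)
        indices = torch.stack([offset + source, target])
    values = torch.ones(source.shape[0])
    return torch.sparse_coo_tensor(indices, values, size)


torch.manual_seed(0)
num_entities, num_relations, d_in, d_out = 4, 2, 3, 5
source = torch.tensor([0, 1, 2, 3, 0])
target = torch.tensor([1, 2, 3, 0, 2])
edge_type = torch.tensor([0, 0, 1, 1, 1])
x = torch.randn(num_entities, d_in)
w = torch.randn(num_relations, d_in, d_out)  # hypothetical relation-specific weights

# reference: sum over r of A_r @ x @ W_r with dense per-relation adjacency matrices
dense = torch.zeros(num_relations, num_entities, num_entities)
dense[edge_type, source, target] = 1.0
reference = torch.einsum("rst,ti,rio->so", dense, x, w)

# horizontal: project first, then one sparse multiplication
a_h = stacked_adjacency(num_relations, num_entities, source, target, edge_type, horizontal=True)
projected = torch.einsum("ni,rio->rno", x, w).reshape(-1, d_out)
out_h = torch.sparse.mm(a_h, projected)

# vertical: one sparse multiplication first, then project
a_v = stacked_adjacency(num_relations, num_entities, source, target, edge_type, horizontal=False)
aggregated = torch.sparse.mm(a_v, x).view(num_relations, num_entities, d_in)
out_v = torch.einsum("rni,rio->no", aggregated, w)

print(torch.allclose(out_h, reference, atol=1e-5), torch.allclose(out_v, reference, atol=1e-5))  # True True
```

The two variants differ only in where the projection happens: horizontally stacked messages are multiplied in the output dimension, vertically stacked ones in the input dimension.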

safe_diagonal(matrix)

Extract diagonal from a potentially sparse matrix.

Note

This is a workaround for as long as torch.diagonal() does not support sparse tensors.

Parameters: matrix (Tensor) – shape: (n, n) the matrix
Return type: Tensor
Returns: shape: (n,) the diagonal values.
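A possible workaround, sketched for dense and sparse COO inputs only (other sparse layouts are not handled). The name sparse_safe_diagonal is hypothetical, and this is not necessarily how safe_diagonal is implemented.

```python
import torch


def sparse_safe_diagonal(matrix: torch.Tensor) -> torch.Tensor:
    """Return the diagonal of a square dense or sparse COO matrix."""
    if not matrix.is_sparse:
        return torch.diagonal(matrix)
    # coalesce so that duplicate entries are summed and indices are unique
    matrix = matrix.coalesce()
    indices, values = matrix.indices(), matrix.values()
    # keep only stored entries on the main diagonal
    on_diagonal = indices[0] == indices[1]
    result = torch.zeros(matrix.shape[0], dtype=values.dtype, device=values.device)
    # diagonal positions without a stored entry stay zero
    result[indices[0][on_diagonal]] = values[on_diagonal]
    return result


dense = torch.tensor([[1.0, 2.0, 0.0], [0.0, 0.0, 3.0], [4.0, 0.0, 5.0]])
sparse = dense.to_sparse()
print(sparse_safe_diagonal(sparse))  # tensor([1., 0., 5.])
```

Starting from a zero vector matters: a sparse matrix does not store zero diagonal entries, so they cannot be read from its values.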

use_horizontal_stacking(input_dim, output_dim)

Determine a stacking direction based on the input and output dimension.

Vertical stacking is suitable for low-dimensional input and high-dimensional output: the sparse multiplication aggregates the low-dimensional inputs, and the projection to the high output dimension is done last. Horizontal stacking is suitable for high-dimensional input and low-dimensional output, because the inputs are first projected to the low output dimension and only then aggregated.

Parameters:

input_dim (int) – the layer’s input dimension
output_dim (int) – the layer’s output dimension

Return type:

bool

Returns:

whether to use horizontal (True) or vertical (False) stacking
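A sketch of the decision, assuming the rule simply compares the two dimensions (ties go to vertical stacking). The cost model counts only the entries of the dense relation-specific intermediate result, which is the quantity the stacking direction affects; the function names and example sizes are illustrative, not taken from any benchmark.

```python
def choose_horizontal(input_dim: int, output_dim: int) -> bool:
    """Prefer horizontal stacking when projecting first shrinks the vectors."""
    return input_dim > output_dim


def intermediate_size(num_entities: int, num_relations: int, input_dim: int, output_dim: int, horizontal: bool) -> int:
    """Number of entries in the dense relation-specific intermediate result."""
    # horizontal: inputs are projected per relation first -> (num_relations * num_entities, output_dim)
    # vertical: sparse product first -> (num_relations * num_entities, input_dim)
    dim = output_dim if horizontal else input_dim
    return num_relations * num_entities * dim


n, r = 1000, 10
print(intermediate_size(n, r, 500, 16, horizontal=True))   # 160000
print(intermediate_size(n, r, 500, 16, horizontal=False))  # 5000000
print(choose_horizontal(500, 16))  # True
```

With a 500-dimensional input and a 16-dimensional output, projecting first keeps the intermediate result about 31 times smaller, which is why horizontal stacking is preferred in that case.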