Utilities
Utilities for neural network components.
- class PyOBOCache(*args, **kwargs)[source]
A cache that looks up labels of biomedical entities based on their CURIEs.
Instantiate the PyOBO cache, ensuring PyOBO is installed.
- exception ShapeError(shape, reference)[source]
An error for a mismatch in shapes.
Initialize the error.
- Parameters:
- Return type:
None
- class TextCache[source]
An interface for looking up text for various flavors of entity identifiers.
- class WikidataCache[source]
A cache for requests against Wikidata’s SPARQL endpoint.
Initialize the cache.
- WIKIDATA_ENDPOINT = 'https://query.wikidata.org/bigdata/namespace/wdq/sparql'
Wikidata SPARQL endpoint. See https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service#Interfacing
- get_image_paths(ids, extensions=('jpeg', 'jpg', 'gif', 'png', 'svg', 'tif'), progress=False)[source]
Get paths to images for the given IDs.
- get_texts(identifiers)[source]
Get a concatenation of the title and description for each Wikidata identifier.
- classmethod query(sparql, wikidata_ids, batch_size=256)[source]
Execute a SPARQL query in batches for the given IDs.
- Parameters:
- Return type:
- Returns:
an iterable over JSON results, where the keys correspond to query variables, and the values to the corresponding binding
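The batching step can be pictured with the minimal sketch below. It is not the library's implementation: the helper names `iter_batches` and `render_batches` and the `{ids}` placeholder in the query template are assumptions made for illustration, and nothing is sent to the endpoint.

```python
from typing import Iterator, List, Sequence


def iter_batches(ids: Sequence[str], batch_size: int = 256) -> Iterator[List[str]]:
    """Yield consecutive slices of at most ``batch_size`` IDs (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield list(ids[start:start + batch_size])


def render_batches(sparql: str, wikidata_ids: Sequence[str], batch_size: int = 256) -> Iterator[str]:
    """Fill an assumed ``{ids}`` placeholder with one batch of ``wd:``-prefixed IDs per query."""
    for batch in iter_batches(wikidata_ids, batch_size):
        yield sparql.format(ids=" ".join(f"wd:{i}" for i in batch))


# Assumed template; doubled braces are literal braces for str.format.
TEMPLATE = "SELECT ?item WHERE {{ VALUES ?item {{ {ids} }} }}"
queries = list(render_batches(TEMPLATE, ["Q1", "Q2", "Q3"], batch_size=2))
```

With three IDs and a batch size of two, the sketch renders two query strings, one per batch. Each string would then be sent to the endpoint separately.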
- classmethod query_text(wikidata_ids, language='en', batch_size=256)[source]
Query the SPARQL endpoint for information about the given IDs.
- Parameters:
- Return type:
- Returns:
a mapping from Wikidata IDs to dictionaries with the label and description of the entities
- static verify_ids(ids)[source]
Raise error if invalid IDs are encountered.
- Parameters:
- Raises:
ValueError – if any invalid ID is encountered
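A check of this kind could look like the sketch below. The validity rule is an assumption, since the documentation does not spell it out: here an ID counts as valid if it is the letter "Q" followed by one or more digits. The name `verify_ids_sketch` is likewise hypothetical.

```python
import re
from typing import Sequence

# Assumption: item identifiers consist of "Q" followed by one or more digits.
WIKIDATA_ID = re.compile(r"Q\d+")


def verify_ids_sketch(ids: Sequence[str]) -> None:
    """Raise ValueError listing every ID that does not match the assumed pattern."""
    invalid = [i for i in ids if not WIKIDATA_ID.fullmatch(i)]
    if invalid:
        raise ValueError(f"Invalid IDs encountered: {invalid}")
```

Collecting all offending IDs before raising, rather than failing on the first one, makes the error message more useful when a large batch is checked.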
- adjacency_tensor_to_stacked_matrix(num_relations, num_entities, source, target, edge_type, edge_weights=None, horizontal=True)[source]
Stack adjacency matrices as described in [thanapalasingam2021].
This method re-arranges the (sparse) adjacency tensor of shape (num_entities, num_relations, num_entities) to a sparse adjacency matrix of shape (num_entities, num_relations * num_entities) (horizontal stacking) or (num_entities * num_relations, num_entities) (vertical stacking). Thereby, we can perform the relation-specific message passing of R-GCN by a single sparse matrix multiplication (and some additional pre- and/or post-processing) of the inputs.
- Parameters:
  - num_relations (int) – the number of relations
  - num_entities (int) – the number of entities
  - source (LongTensor) – shape: (num_triples,) the source entity indices
  - target (LongTensor) – shape: (num_triples,) the target entity indices
  - edge_type (LongTensor) – shape: (num_triples,) the edge type, i.e., relation ID
  - edge_weights (Optional[FloatTensor]) – shape: (num_triples,) scalar edge weights
  - horizontal (bool) – whether to use horizontal or vertical stacking
- Return type:
- Returns:
shape: (num_entities * num_relations, num_entities) or (num_entities, num_entities * num_relations) the stacked adjacency matrix
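The index arithmetic behind the re-arrangement can be sketched in plain Python, without sparse tensors. This is an illustration under assumptions, not the library code: the function name `stacked_indices` is hypothetical, and the relation-major block layout (offset `edge_type * num_entities`) is assumed from the shapes given above.

```python
from typing import List, Sequence, Tuple


def stacked_indices(
    num_relations: int,
    num_entities: int,
    source: Sequence[int],
    target: Sequence[int],
    edge_type: Sequence[int],
    horizontal: bool = True,
) -> Tuple[Tuple[int, int], List[Tuple[int, int]]]:
    """Return the stacked matrix shape and the (row, column) index of every triple."""
    indices = []
    for s, t, r in zip(source, target, edge_type):
        offset = r * num_entities  # start of the block belonging to relation r
        if horizontal:
            indices.append((s, offset + t))  # shift the column into block r
        else:
            indices.append((offset + s, t))  # shift the row into block r
    if horizontal:
        shape = (num_entities, num_relations * num_entities)
    else:
        shape = (num_relations * num_entities, num_entities)
    return shape, indices


# Three entities, two relations, triples (0, r0, 1) and (2, r1, 0).
shape_h, idx_h = stacked_indices(2, 3, [0, 2], [1, 0], [0, 1], horizontal=True)
shape_v, idx_v = stacked_indices(2, 3, [0, 2], [1, 0], [0, 1], horizontal=False)
```

In the horizontal case the second triple lands at row 2, column 3, because its target index is shifted into the block of relation 1. In the vertical case the same shift is applied to the source index instead.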
- safe_diagonal(matrix)[source]
Extract diagonal from a potentially sparse matrix.
Note
This is a work-around for as long as torch.diagonal() does not work for sparse tensors.
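To illustrate what extracting a diagonal from a sparse matrix involves, the sketch below works on a coordinate (COO) representation in plain Python. It is not the library's implementation; the name `coo_diagonal` and the list-based inputs are assumptions chosen to keep the example dependency-free.

```python
from typing import List, Sequence


def coo_diagonal(
    rows: Sequence[int],
    cols: Sequence[int],
    values: Sequence[float],
    size: int,
) -> List[float]:
    """Collect the diagonal of a square matrix given in coordinate (COO) form."""
    diagonal = [0.0] * size
    for r, c, v in zip(rows, cols, values):
        if r == c:
            diagonal[r] += v  # duplicate coordinates are summed
    return diagonal


diag = coo_diagonal(rows=[0, 0, 2, 2], cols=[0, 1, 2, 2], values=[1.0, 5.0, 2.0, 3.0], size=3)
```

Only entries whose row and column indices coincide contribute, so the cost scales with the number of stored entries, not with the full matrix size.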
- use_horizontal_stacking(input_dim, output_dim)[source]
Determine a stacking direction based on the input and output dimension.
The vertical stacking approach is suitable for low-dimensional input and high-dimensional output, because the projection to low dimensions is done first. The horizontal stacking approach, in contrast, is suitable for high-dimensional input and low-dimensional output, because the projection to high dimensions is done last.
- Parameters:
- Return type:
- Returns:
whether to use horizontal (True) or vertical stacking
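One decision rule consistent with the description above is a simple comparison of the two dimensions. The sketch is an assumption, not a statement of the library's code: the name `prefer_horizontal` is hypothetical, and the tie-breaking choice (vertical stacking when both dimensions are equal) is arbitrary.

```python
def prefer_horizontal(input_dim: int, output_dim: int) -> bool:
    """Return True for horizontal stacking when the input is wider than the output."""
    # Assumption: ties (input_dim == output_dim) fall back to vertical stacking.
    return input_dim > output_dim
```

For example, `prefer_horizontal(512, 32)` selects horizontal stacking, while `prefer_horizontal(32, 512)` selects vertical stacking.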
See also