Utilities

Utilities for PyKEEN.

class Bias(dim)[source]

A module wrapper for adding a bias.

Initialize the module.

Parameters

dim (int) – The dimension of the input; must be > 0.

forward(x)[source]

Add the learned bias to the input.

Parameters

x (FloatTensor) – shape: (n, d) The input.

Return type

FloatTensor

Returns

x + b[None, :]

reset_parameters()[source]

Reset the layer’s parameters.

class NoRandomSeedNecessary[source]

Used in pipeline when random seed is set automatically.

class Result[source]

A superclass of results that can be saved to a directory.

abstract save_to_directory(directory, **kwargs)[source]

Save the results to the directory.

Return type

None

Parameters

directory (str) –

abstract save_to_ftp(directory, ftp)[source]

Save the results to the directory in an FTP server.

Return type

None

Parameters
  • directory (str) –

  • ftp (FTP) –

abstract save_to_s3(directory, bucket, s3=None)[source]

Save all artifacts to the given directory in an S3 Bucket.

Parameters
  • directory (str) – The directory in the S3 bucket

  • bucket (str) – The name of the S3 bucket

  • s3 – A client from boto3.client(), if already instantiated

Return type

None

all_in_bounds(x, low=None, high=None, a_tol=0.0)[source]

Check if tensor values respect lower and upper bound.

Parameters
Return type

bool

Returns

If all values are within the given bounds

at_least_eps(x)[source]

Make sure a tensor is greater than zero.

Return type

FloatTensor

Parameters

x (FloatTensor) –

broadcast_upgrade_to_sequences(*xs)[source]

Apply upgrade_to_sequence to each input, and afterwards repeat singletons to match the maximum length.

Parameters

xs (Union[~X, Sequence[~X]]) – length: m the inputs.

Return type

Sequence[Sequence[~X]]

Returns

a sequence of length m, where each element is a sequence and all elements have the same length.

Raises

ValueError – if there is a non-singleton sequence input with length different from the maximum sequence length.

>>> broadcast_upgrade_to_sequences(1)
((1,),)
>>> broadcast_upgrade_to_sequences(1, 2)
((1,), (2,))
>>> broadcast_upgrade_to_sequences(1, (2, 3))
((1, 1), (2, 3))
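
The behaviour shown in the doctests above can be approximated in a few lines of plain Python. This is only a sketch, not PyKEEN's implementation: the names `_upgrade` and `broadcast_sketch` are hypothetical, and strings are treated as literals, as described for upgrade_to_sequence.

```python
from typing import Any, Tuple


def _upgrade(x: Any) -> Tuple[Any, ...]:
    """Wrap a literal (including a string) in a one-element tuple."""
    if isinstance(x, (list, tuple)):
        return tuple(x)
    return (x,)


def broadcast_sketch(*xs: Any) -> Tuple[Tuple[Any, ...], ...]:
    """Upgrade each input, then repeat singletons to the maximum length."""
    upgraded = [_upgrade(x) for x in xs]
    max_len = max((len(u) for u in upgraded), default=1)
    result = []
    for u in upgraded:
        if len(u) == 1:
            u = u * max_len  # repeat the single element
        elif len(u) != max_len:
            raise ValueError(
                f"Length {len(u)} does not match the maximum length {max_len}."
            )
        result.append(u)
    return tuple(result)
```

Two non-singleton inputs of different lengths, such as (1, 2) and (1, 2, 3), raise a ValueError in this sketch.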
calculate_broadcasted_elementwise_result_shape(first, second)[source]

Determine the return shape of a broadcasted elementwise operation.

Return type

Tuple[int, …]

Parameters
check_shapes(*x, raise_on_errors=True)[source]

Verify that a sequence of tensors are of matching shapes.

Parameters
  • x (Tuple[Union[Tensor, Tuple[int, …]], str]) – A tuple (t, s), where t is a tensor, or an actual shape of a tensor (a tuple of integers), and s is a string, where each character corresponds to a (named) dimension. If the shapes of different tensors share a character, the corresponding dimensions are expected to be of equal size.

  • raise_on_errors (bool) – Whether to raise an exception in case of a mismatch.

Return type

bool

Returns

Whether the shapes matched.

Raises

ValueError – If the shapes mismatch and raise_on_errors is True.

Examples:

>>> check_shapes(((10, 20), "bd"), ((10, 20, 20), "bdd"))
True
>>> check_shapes(((10, 20), "bd"), ((10, 30, 20), "bdd"), raise_on_errors=False)
False
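
To illustrate the named-dimension check, here is a minimal sketch that works on plain shape tuples only (no tensors). The function name `check_shapes_sketch` is hypothetical and the logic is an assumption based on the description above, not PyKEEN's code.

```python
from typing import Dict, Set, Tuple


def check_shapes_sketch(
    *x: Tuple[Tuple[int, ...], str],
    raise_on_errors: bool = True,
) -> bool:
    """Check that dimensions sharing a character have the same size."""
    sizes: Dict[str, Set[int]] = {}
    for shape, names in x:
        if len(shape) != len(names):
            raise ValueError(f"Shape {shape} does not fit pattern {names!r}.")
        for size, name in zip(shape, names):
            # Collect every size observed for this named dimension.
            sizes.setdefault(name, set()).add(size)
    mismatched = sorted(name for name, s in sizes.items() if len(s) > 1)
    if mismatched and raise_on_errors:
        raise ValueError(f"Shape mismatch for dimensions: {mismatched}")
    return not mismatched
```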

clamp_norm(x, maxnorm, p='fro', dim=None)[source]

Ensure that a tensor’s norm does not exceed some threshold.

Parameters
Return type

Tensor

Returns

A vector with \(\|x\| \leq maxnorm\).

combine_complex(x_re, x_im)[source]

Combine real and imaginary parts into a complex tensor.

Return type

FloatTensor

Parameters
  • x_re (FloatTensor) –

  • x_im (FloatTensor) –

compact_mapping(mapping)[source]

Update a mapping (key -> id) such that the IDs range from 0 to len(mapping) - 1.

Parameters

mapping (Mapping[~X, int]) – The mapping to compact.

Return type

Tuple[Mapping[~X, int], Mapping[int, int]]

Returns

A pair (translated, translation) where translated is the updated mapping, and translation a dictionary from old to new ids.
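
A compaction of this kind might look as follows. The name `compact_mapping_sketch` is hypothetical, and assigning new IDs in ascending order of the old IDs is an assumption; the documentation above does not specify the order.

```python
from typing import Dict, Hashable, Mapping, Tuple


def compact_mapping_sketch(
    mapping: Mapping[Hashable, int],
) -> Tuple[Dict[Hashable, int], Dict[int, int]]:
    """Re-label the IDs so that they are consecutive and start at 0."""
    # Assumption: new IDs follow the sorted order of the old IDs.
    translation = {old: new for new, old in enumerate(sorted(set(mapping.values())))}
    translated = {key: translation[old] for key, old in mapping.items()}
    return translated, translation
```

For example, the mapping {"a": 3, "b": 7, "c": 5} would become {"a": 0, "b": 2, "c": 1} with the translation {3: 0, 5: 1, 7: 2}.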

complex_normalize(x)[source]

Normalize a vector of complex numbers such that each element is of unit-length.

Let \(x \in \mathbb{C}^d\) denote a complex vector. Then, the operation computes

\[x_i' = \frac{x_i}{|x_i|}\]

where \(|x_i| = \sqrt{Re(x_i)^2 + Im(x_i)^2}\) is the modulus of the complex number \(x_i\).

Parameters

x (Tensor) – A tensor representing complex numbers

Return type

Tensor

Returns

An elementwise normalized vector.
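
The formula can be tried out with Python's built-in complex type. This is a sketch on a plain list rather than a tensor; the name `complex_normalize_sketch` is hypothetical, and all entries are assumed to be non-zero.

```python
from typing import List, Sequence


def complex_normalize_sketch(x: Sequence[complex]) -> List[complex]:
    """Divide each complex entry by its modulus (entries must be non-zero)."""
    return [z / abs(z) for z in x]


normalized = complex_normalize_sketch([3 + 4j, 1j, -2 + 0j])
```

Each entry of `normalized` has modulus 1 up to floating point error, while its argument (angle) is unchanged.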

class compose(*operations, name)[source]

A class representing the composition of several functions.

Initialize the composition with a sequence of operations.

Parameters
  • operations (Callable[[~X], ~X]) – unary operations that will be applied in succession

  • name (str) – The name of the composed function.

compute_box(base, delta, size)[source]

Compute the lower and upper corners of a resulting box.

Parameters
  • base (FloatTensor) – shape: (*, d) the base position (box center) of the input relation embeddings

  • delta (FloatTensor) – shape: (*, d) the base shape of the input relation embeddings

  • size (FloatTensor) – shape: (*, d) the size scalar vectors of the input relation embeddings

Return type

Tuple[FloatTensor, FloatTensor]

Returns

shape: (*, d) each. The lower and upper bounds of the box whose embeddings are provided as input.

convert_to_canonical_shape(x, dim, num=None, batch_size=1, suffix_shape=-1)[source]

Convert a tensor to canonical shape.

Parameters
  • x (FloatTensor) – The tensor in compatible shape.

  • dim (Union[int, str]) – The “num” dimension.

  • batch_size (int) – The batch size.

  • num (Optional[int]) – The number.

  • suffix_shape (Union[int, Sequence[int]]) – The suffix shape.

Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails, *) A tensor in canonical shape.

create_relation_to_entity_set_mapping(triples)[source]

Create mappings from relation IDs to the set of their head / tail entities.

Parameters

triples (Iterable[Tuple[int, int, int]]) – The triples.

Return type

Tuple[Mapping[int, Set[int]], Mapping[int, Set[int]]]

Returns

A pair of dictionaries, each mapping relation IDs to entity ID sets.
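
A possible pure-Python reading of this function is sketched below. The name `relation_entity_sets_sketch` is hypothetical, and the column order (head, relation, tail) is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def relation_entity_sets_sketch(
    triples: Iterable[Tuple[int, int, int]],
) -> Tuple[Dict[int, Set[int]], Dict[int, Set[int]]]:
    """Collect, for each relation ID, the sets of head and tail entity IDs."""
    heads: Dict[int, Set[int]] = defaultdict(set)
    tails: Dict[int, Set[int]] = defaultdict(set)
    for h, r, t in triples:  # assumed order: (head, relation, tail)
        heads[r].add(h)
        tails[r].add(t)
    return dict(heads), dict(tails)
```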

ensure_complex(*xs)[source]

Ensure that all tensors are of complex dtype.

Reshape and convert if necessary.

Parameters

xs (Tensor) – the tensors

Yields

complex tensors.

Return type

Iterable[Tensor]

ensure_ftp_directory(*, ftp, directory)[source]

Ensure the directory exists on the FTP server.

Return type

None

Parameters
  • ftp (FTP) –

  • directory (str) –

ensure_torch_random_state(random_state)[source]

Prepare a random state for PyTorch.

Return type

Generator

Parameters

random_state (Union[None, int, Generator]) –

ensure_tuple(*x)[source]

Ensure that all elements in the sequence are upgraded to sequences.

Parameters

x (Union[~X, Sequence[~X]]) – A sequence of sequences or literals

Return type

Sequence[Sequence[~X]]

Returns

An upgraded sequence of sequences

>>> ensure_tuple(1, (1,), (1, 2))
((1,), (1,), (1, 2))
estimate_cost_of_sequence(shape, *other_shapes)[source]

Cost of a sequence of broadcasted element-wise operations of tensors, given their shapes.

Return type

int

Parameters
extend_batch(batch, max_id, dim)[source]

Extend batch for 1-to-all scoring by explicit enumeration.

Parameters
  • batch (LongTensor) – shape: (batch_size, 2) The batch.

  • max_id (int) – The maximum IDs to enumerate.

  • dim (int) – in {0,1,2} The column along which to insert the enumerated IDs.

Return type

LongTensor

Returns

shape: (batch_size * num_choices, 3) A large batch, where every pair from the original batch is combined with every ID.
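
The explicit enumeration can be pictured with plain tuples instead of tensors. This is only a sketch: the name `extend_batch_sketch` is hypothetical, and both the ID range 0 to max_id - 1 and the row order (all IDs for the first pair, then all IDs for the second pair, and so on) are assumptions.

```python
from typing import List, Sequence, Tuple


def extend_batch_sketch(
    batch: Sequence[Tuple[int, int]],
    max_id: int,
    dim: int,
) -> List[Tuple[int, ...]]:
    """Combine every pair with every ID, inserted at column `dim`."""
    extended = []
    for pair in batch:
        for i in range(max_id):  # assumed ID range: 0 .. max_id - 1
            row = list(pair)
            row.insert(dim, i)  # place the enumerated ID in column `dim`
            extended.append(tuple(row))
    return extended
```

With a batch of (head, relation) pairs and dim=2, each resulting row has the enumerated ID in the last column.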

extended_einsum(eq, *tensors)[source]

Drop dimensions of size 1 to allow broadcasting.

Return type

FloatTensor

Parameters

eq (str) –

fix_dataclass_init_docs(cls)[source]

Fix the __init__ documentation for a dataclasses.dataclass.

Parameters

cls (Type) – The class whose docstring needs fixing

Return type

Type

Returns

The class that was passed so this function can be used as a decorator

flatten_dictionary(dictionary, prefix=None, sep='.')[source]

Flatten a nested dictionary.

Return type

Dict[str, Any]

Parameters
format_relative_comparison(part, total)[source]

Format a relative comparison.

Return type

str

Parameters
  • part (int) –

  • total (int) –

get_batchnorm_modules(module)[source]

Return all submodules which are batch normalization layers.

Return type

List[Module]

Parameters

module (Module) –

get_benchmark(name)[source]

Get the benchmark directory for this version.

Return type

Path

Parameters

name (str) –

get_connected_components(pairs)[source]

Calculate the connected components for a graph given as edge list.

The implementation uses a union-find data structure with path compression.

Parameters

pairs (Iterable[Tuple[~X, ~X]]) – the edge list, i.e., pairs of node ids.

Return type

Collection[Collection[~X]]

Returns

a collection of connected components, i.e., a collection of disjoint collections of node ids.
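
For readers unfamiliar with union-find, the following sketch shows the general technique with path compression. It is an illustration under the assumption that node IDs are hashable, not PyKEEN's implementation; the name `connected_components_sketch` is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def connected_components_sketch(
    pairs: Iterable[Tuple[Hashable, Hashable]],
) -> List[Set[Hashable]]:
    """Compute connected components of an edge list with union-find."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        # First pass: locate the root of the tree containing x.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Second pass (path compression): point every visited node at the root.
        while parent[x] != root:
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # union the two trees

    components: Dict[Hashable, Set[Hashable]] = {}
    for node in list(parent):
        components.setdefault(find(node), set()).add(node)
    return list(components.values())
```

For the edge list [(1, 2), (2, 3), (4, 5)], this yields the two components {1, 2, 3} and {4, 5}, in no guaranteed order.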

get_devices(module)[source]

Return the device(s) from each component of the model.

Return type

Collection[device]

Parameters

module (Module) –

get_df_io(df)[source]

Get the dataframe as bytes.

Return type

BytesIO

Parameters

df (DataFrame) –

get_dropout_modules(module)[source]

Return all submodules which are dropout layers.

Return type

List[Module]

Parameters

module (Module) –

get_edge_index(*, triples_factory=None, mapped_triples=None, edge_index=None)[source]

Get the edge index from a number of different sources.

Parameters
  • triples_factory (Optional[Any]) – the triples factory

  • mapped_triples (Optional[LongTensor]) – shape: (m, 3) ID-based triples

  • edge_index (Optional[LongTensor]) – shape: (2, m) the edge index

Raises

ValueError – if none of the sources was different from None

Return type

LongTensor

Returns

shape: (2, m) the edge index

get_expected_norm(p, d)[source]

Compute the expected value of the L_p norm.

\[E[\|x\|_p] = d^{1/p} E[|x_1|^p]^{1/p}\]

under the assumption that \(x_i \sim N(0, 1)\), i.e.

\[E[|x_1|^p] = 2^{p/2} \cdot \Gamma\left(\frac{p+1}{2}\right) \cdot \pi^{-1/2}\]
Parameters
  • p (Union[int, float, str]) – The parameter p of the norm.

  • d (int) – The dimension of the vector.

Return type

float

Returns

The expected value.
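
The two formulas can be evaluated directly with the standard library. The sketch below assumes a finite, positive numeric p and reads the moment formula as \(2^{p/2} \cdot \Gamma((p+1)/2) \cdot \pi^{-1/2}\); the function names are hypothetical.

```python
import math


def expected_abs_moment(p: float) -> float:
    """E[|x_1|^p] for x_1 ~ N(0, 1), assuming finite p > 0."""
    return 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)


def expected_norm_sketch(p: float, d: int) -> float:
    """Evaluate d^(1/p) * E[|x_1|^p]^(1/p) as given in the formula above."""
    return (d * expected_abs_moment(p)) ** (1 / p)
```

For p = 2 the moment is exactly 1, so the value reduces to the square root of d.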

Raises
get_json_bytes_io(obj)[source]

Get the JSON as bytes.

Return type

BytesIO

get_model_io(model)[source]

Get the model as bytes.

Return type

BytesIO

get_optimal_sequence(*shapes)[source]

Find the optimal sequence in which to combine tensors elementwise based on the shapes.

Parameters

shapes (Tuple[int, …]) – The shapes of the tensors to combine.

Return type

Tuple[int, Tuple[int, …]]

Returns

The optimal execution order (as indices), and the cost.

get_preferred_device(module, allow_ambiguity=True)[source]

Return the preferred device.

Return type

device

Parameters
  • module (Module) –

  • allow_ambiguity (bool) –

get_until_first_blank(s)[source]

Recapitulate all lines in the string until the first blank line.

Return type

str

Parameters

s (str) –

invert_mapping(mapping)[source]

Invert a mapping.

Parameters

mapping (Mapping[~K, ~V]) – The mapping, key -> value.

Return type

Mapping[~V, ~K]

Returns

The inverse mapping, value -> key.

Raises

ValueError – if the mapping is not bijective
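
A minimal sketch of such an inversion with the bijectivity check is shown below. The name `invert_mapping_sketch` is hypothetical, and detecting duplicates by comparing lengths is one possible approach, not necessarily the one PyKEEN uses.

```python
from typing import Dict, Hashable, Mapping


def invert_mapping_sketch(mapping: Mapping[Hashable, Hashable]) -> Dict[Hashable, Hashable]:
    """Swap keys and values, rejecting mappings with duplicate values."""
    inverse = {value: key for key, value in mapping.items()}
    if len(inverse) != len(mapping):
        # At least two keys shared a value, so the inverse is not well defined.
        raise ValueError("The mapping is not bijective.")
    return inverse
```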

is_cuda_oom_error(runtime_error)[source]

Check whether the caught RuntimeError was due to CUDA being out of memory.

Return type

bool

Parameters

runtime_error (RuntimeError) –

is_cudnn_error(runtime_error)[source]

Check whether the caught RuntimeError was due to a CUDNN error.

Return type

bool

Parameters

runtime_error (RuntimeError) –

is_triple_tensor_subset(a, b)[source]

Check whether one tensor of triples is a subset of another one.

Return type

bool

Parameters
  • a (LongTensor) –

  • b (LongTensor) –

logcumsumexp(a)[source]

Compute log(cumsum(exp(a))).

Parameters

a (ndarray) – shape: s the array

Return type

ndarray

Returns

shape: s the log-cumsum-exp of the array

See also

scipy.special.logsumexp() and torch.logcumsumexp()
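
Computing log(cumsum(exp(a))) naively overflows for large inputs, which is why a running maximum is usually factored out. The sketch below shows this idea for a flat list of finite floats; it does not handle multi-dimensional arrays, and the name `logcumsumexp_sketch` is hypothetical.

```python
import math
from typing import List, Sequence


def logcumsumexp_sketch(a: Sequence[float]) -> List[float]:
    """Numerically stable log(cumsum(exp(a))) for a 1-D sequence of finite floats."""
    result = []
    running = -math.inf  # log of the empty sum
    for value in a:
        m = max(running, value)
        # Factor out exp(m) so that both exponents are <= 0.
        running = m + math.log(math.exp(running - m) + math.exp(value - m))
        result.append(running)
    return result
```

For example, the input [1000.0, 1000.0] is handled without overflow, although exp(1000.0) itself is not representable as a float.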

lp_norm(x, p, dim, normalize)[source]

Return the \(L_p\) norm.

Return type

FloatTensor

Parameters
negative_norm(x, p=2, power_norm=False)[source]

Evaluate negative norm of a vector.

Parameters
Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails) The scores.

negative_norm_of_sum(*x, p=2, power_norm=False)[source]

Evaluate negative norm of a sum of vectors on already broadcasted representations.

Parameters
Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails) The scores.

nested_get(d, *key, default=None)[source]

Get from a nested dictionary.

Parameters
  • d (Mapping[str, Any]) – the (nested) dictionary

  • key (str) – a sequence of keys

  • default – the default value

Return type

Any

Returns

the value or default
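
The lookup could be sketched as follows. The name `nested_get_sketch` is hypothetical, and returning the default as soon as a key is missing or an intermediate value is not a mapping is an assumption about the intended behaviour.

```python
from collections.abc import Mapping
from typing import Any


def nested_get_sketch(d: Mapping, *key: str, default: Any = None) -> Any:
    """Follow the keys one level at a time, falling back to the default."""
    current: Any = d
    for k in key:
        if not isinstance(current, Mapping) or k not in current:
            return default
        current = current[k]
    return current
```

With d = {"a": {"b": 1}}, the keys ("a", "b") lead to 1, while ("a", "c") lead to the default.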

normalize_path(path, *other, mkdir=False, is_file=False, default=None)[source]

Normalize a path.

Parameters
  • path (Union[str, Path, TextIO, None]) – the path in either of the valid forms.

  • other (Union[str, Path]) – additional parts to join to the path

  • mkdir (bool) – whether to ensure that the path refers to an existing directory by creating it if necessary

  • is_file (bool) – whether the path is intended to be a file - only relevant for creating directories

  • default (Union[str, Path, TextIO, None]) – the default to use if path is None

Raises
Return type

Path

Returns

the absolute and resolved path

normalize_string(s, *, suffix=None)[source]

Normalize a string for lookup.

Return type

str

Parameters
point_to_box_distance(points, box_lows, box_highs)[source]

Compute the point to box distance function proposed by [abboud2020] in an element-wise fashion.

Parameters
  • points (FloatTensor) – shape: (*, d) the positions of the points being scored against boxes

  • box_lows (FloatTensor) – shape: (*, d) the lower corners of the boxes

  • box_highs (FloatTensor) – shape: (*, d) the upper corners of the boxes

Return type

FloatTensor

Returns

Element-wise distance function scores as per the definition below

Given points \(p\), box_lows \(l\), and box_highs \(h\), the following quantities are defined:

  • Width \(w\) is the difference between the upper and lower box bound: \(w = h - l\)

  • Box centers \(c\) are the mean of the box bounds: \(c = (h + l) / 2\)

Finally, the point to box distance \(dist(p,l,h)\) is defined as the following piecewise function:

\[\begin{split}dist(p,l,h) = \begin{cases} |p-c|/(w+1) & l \leq p \leq h \\ |p-c| \cdot (w+1) - 0.5 \cdot w \cdot ((w+1) - 1/(w+1)) & \text{otherwise} \\ \end{cases}\end{split}\]
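
For a single coordinate, the piecewise definition can be written out in plain Python. The sketch reads the first case as applying when the point lies inside the box, that is, l ≤ p ≤ h; the name `point_to_box_distance_sketch` is hypothetical, and the real function operates element-wise on tensors.

```python
def point_to_box_distance_sketch(p: float, l: float, h: float) -> float:
    """Scalar version of the point-to-box distance defined above."""
    w = h - l  # box width
    c = (h + l) / 2  # box center
    if l <= p <= h:
        # Inside the box: distance to the center, shrunk by the width.
        return abs(p - c) / (w + 1)
    # Outside the box: distance to the center, scaled up by the width,
    # minus an offset that depends only on the width.
    return abs(p - c) * (w + 1) - 0.5 * w * ((w + 1) - 1 / (w + 1))
```

For the box [0, 1], the point 1 gives 0.25 from the inside branch, and evaluating the outside branch at the same point also gives 0.25, so the two cases agree at the boundary.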

powersum_norm(x, p, dim, normalize)[source]

Return the power sum norm.

Return type

FloatTensor

Parameters
prepare_filter_triples(mapped_triples, additional_filter_triples=None, warn=True)[source]

Prepare the filter triples from the evaluation triples, and additional filter triples.

Return type

LongTensor

Parameters
  • mapped_triples (LongTensor) –

  • additional_filter_triples (Union[None, LongTensor, List[LongTensor]]) –

  • warn (bool) –

product_normalize(x, dim=-1)[source]

Normalize a tensor along a given dimension so that the geometric mean is 1.0.

Parameters
  • x (FloatTensor) – shape: s An input tensor

  • dim (int) – the dimension along which to normalize the tensor

Return type

FloatTensor

Returns

shape: s An output tensor where the given dimension is normalized to have a geometric mean of 1.0.
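
One way to obtain a geometric mean of 1.0 is to divide by the exponential of the mean log-magnitude. The sketch below does this for a flat list of non-zero floats; it is an illustration of the idea rather than PyKEEN's code, and the name `product_normalize_sketch` is hypothetical.

```python
import math
from typing import List, Sequence


def product_normalize_sketch(x: Sequence[float]) -> List[float]:
    """Rescale non-zero values so the geometric mean of their magnitudes is 1.0."""
    log_mean = sum(math.log(abs(v)) for v in x) / len(x)
    scale = math.exp(log_mean)  # geometric mean of the magnitudes
    return [v / scale for v in x]
```

For the input [1.0, 4.0] the geometric mean is 2.0, so the result is [0.5, 2.0], whose product is 1.0.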

project_entity(e, e_p, r_p)[source]

Project an entity in a relation-specific manner.

\[e_{\bot} = M_{re} e = (r_p e_p^T + I^{d_r \times d_e}) e = r_p e_p^T e + I^{d_r \times d_e} e = r_p (e_p^T e) + e'\]

and additionally enforces

\[\|e_{\bot}\|_2 \leq 1\]
Parameters
  • e (FloatTensor) – shape: (…, d_e) The entity embedding.

  • e_p (FloatTensor) – shape: (…, d_e) The entity projection.

  • r_p (FloatTensor) – shape: (…, d_r) The relation projection.

Return type

FloatTensor

Returns

shape: (…, d_r)

random_non_negative_int()[source]

Generate a random non-negative integer.

Return type

int

rate_limited(xs, min_avg_time=1.0)[source]

Iterate over iterable with rate limit.

Parameters
  • xs (Iterable[~X]) – the iterable

  • min_avg_time (float) – the minimum average time per element

Yields

elements of the iterable

Return type

Iterable[~X]

resolve_device(device=None)[source]

Resolve a torch.device given a desired device (string).

Return type

device

Parameters

device (Optional[Union[str, device]]) –

set_random_seed(seed)[source]

Set the random seed on numpy, torch, and python.

Parameters

seed (int) – The seed that will be used in np.random.seed(), torch.manual_seed(), and random.seed().

Return type

Tuple[None, Generator, None]

Returns

A 3-tuple of None, the torch generator, and None.

split_complex(x)[source]

Split a complex tensor into its real and imaginary parts.

Return type

Tuple[FloatTensor, FloatTensor]

Parameters

x (FloatTensor) –

split_list_in_batches_iter(input_list, batch_size)[source]

Split a list of instances in batches of size batch_size.

Return type

Iterable[List[~X]]

Parameters
  • input_list (List[X]) –

  • batch_size (int) –
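
Batching of this kind is commonly written as a generator over slices. The sketch below assumes that the last batch may be shorter than batch_size; the name `split_list_in_batches_sketch` is hypothetical.

```python
from typing import Iterator, List, TypeVar

X = TypeVar("X")


def split_list_in_batches_sketch(input_list: List[X], batch_size: int) -> Iterator[List[X]]:
    """Yield consecutive slices of at most `batch_size` elements."""
    for start in range(0, len(input_list), batch_size):
        yield input_list[start:start + batch_size]
```

Splitting [1, 2, 3, 4, 5] with a batch size of 2 yields [1, 2], [3, 4], and [5].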

tensor_product(*tensors)[source]

Compute element-wise product of tensors in broadcastable shape.

Return type

FloatTensor

Parameters

tensors (FloatTensor) –

tensor_sum(*tensors)[source]

Compute element-wise sum of tensors in broadcastable shape.

Return type

FloatTensor

Parameters

tensors (FloatTensor) –

triple_tensor_to_set(tensor)[source]

Convert a tensor of triples to a set of int-tuples.

Return type

Set[Tuple[int, …]]

Parameters

tensor (LongTensor) –

unpack_singletons(*xs)[source]

Unpack sequences of length one.

Parameters

xs (Tuple[~X]) – A sequence of tuples of length 1 or more

Return type

Sequence[Union[~X, Tuple[~X]]]

Returns

An unpacked sequence of sequences

>>> unpack_singletons((1,), (1, 2), (1, 2, 3))
(1, (1, 2), (1, 2, 3))
upgrade_to_sequence(x)[source]

Ensure that the input is a sequence.

Note

While strings are technically also a sequence, i.e.,

isinstance("test", typing.Sequence) is True

this may lead to unexpected behavior when calling upgrade_to_sequence("test"). We thus handle strings as non-sequences. To recover the other behavior, the following may be used:

upgrade_to_sequence(tuple("test"))
Parameters

x (Union[~X, Sequence[~X]]) – A literal or sequence of literals

Return type

Sequence[~X]

Returns

If a literal was given, a one-element tuple containing it. Otherwise, the given value.

>>> upgrade_to_sequence(1)
(1,)
>>> upgrade_to_sequence((1, 2, 3))
(1, 2, 3)
>>> upgrade_to_sequence("test")
('test',)
>>> upgrade_to_sequence(tuple("test"))
('t', 'e', 's', 't')
view_complex(x)[source]

Convert a PyKEEN complex tensor representation into a torch one.

Return type

Tensor

Parameters

x (FloatTensor) –

env(file=None)[source]

Print the env or output as HTML if in Jupyter.

Parameters

file – The file to print to if not in a Jupyter setting. Defaults to sys.stdout

Returns

An IPython.display.HTML object if in a Jupyter notebook setting, otherwise None.

Version information for PyKEEN.

get_git_branch()[source]

Get the PyKEEN branch, if installed from git in editable mode.

Return type

Optional[str]

Returns

Returns the name of the current branch, or None if not installed in development mode.

get_git_hash(terse=True)[source]

Get the PyKEEN git hash.

Parameters

terse (bool) – Should the hash be clipped to 8 characters?

Return type

str

Returns

The git hash, or ‘UNHASHED’ if a CalledProcessError is encountered, signifying that the code is not installed in development mode.

get_version(with_git_hash=False)[source]

Get the PyKEEN version string, optionally including a git hash.

Parameters

with_git_hash (bool) – If set to True, the git hash will be appended to the version.

Return type

str

Returns

The PyKEEN version as well as the git hash, if the parameter with_git_hash was set to true.