Utilities

Utilities for PyKEEN.

class Bias(dim)[source]

A module wrapper for adding a bias.

Initialize the module.

Parameters

dim (int) – >0 The dimension of the input.

forward(x)[source]

Add the learned bias to the input.

Parameters

x (FloatTensor) – shape: (n, d) The input.

Return type

FloatTensor

Returns

x + b[None, :]

reset_parameters()[source]

Reset the layer’s parameters.

class NoRandomSeedNecessary[source]

Used in pipeline when random seed is set automatically.

class Result[source]

A superclass of results that can be saved to a directory.

abstract save_to_directory(directory, **kwargs)[source]

Save the results to the directory.

Return type

None

abstract save_to_ftp(directory, ftp)[source]

Save the results to the directory in an FTP server.

Return type

None

abstract save_to_s3(directory, bucket, s3=None)[source]

Save all artifacts to the given directory in an S3 Bucket.

Parameters
  • directory (str) – The directory in the S3 bucket

  • bucket (str) – The name of the S3 bucket

  • s3 – A client from boto3.client(), if already instantiated

Return type

None

all_in_bounds(x, low=None, high=None, a_tol=0.0)[source]

Check if tensor values respect lower and upper bound.

Parameters
Return type

bool

Returns

If all values are within the given bounds

at_least_eps(x)[source]

Make sure a tensor is greater than zero.

Return type

FloatTensor

broadcast_cat(tensors, dim)[source]

Concatenate tensors with broadcasting support.

Parameters
  • tensors (Sequence[FloatTensor]) – The tensors. Each of the tensors is required to have the same number of dimensions. For each dimension not equal to dim, the extent has to match the other tensors’, or be one. If it is one, the tensor is repeated to match the extent of the other tensors.

  • dim (int) – The concat dimension.

Return type

FloatTensor

Returns

A concatenated, broadcasted tensor.

Raises
  • ValueError – if the tensors do not all have the same number of dimensions

  • ValueError – if broadcasting is not possible

calculate_broadcasted_elementwise_result_shape(first, second)[source]

Determine the return shape of a broadcasted elementwise operation.

Return type

Tuple[int, …]
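The broadcasting rule behind this helper can be sketched in pure Python. This is an illustrative re-implementation, not PyKEEN's code: the name broadcast_result_shape_sketch and the explicit validation are assumptions, and both shapes are assumed to have the same number of dimensions.

```python
from typing import Sequence, Tuple


def broadcast_result_shape_sketch(first: Sequence[int], second: Sequence[int]) -> Tuple[int, ...]:
    """Return the shape of an elementwise operation on two equal-rank shapes (hypothetical helper)."""
    if len(first) != len(second):
        raise ValueError("shapes must have the same number of dimensions")
    result = []
    for a, b in zip(first, second):
        # Extents must agree unless one of them is 1, in which case it is stretched.
        if a != b and 1 not in (a, b):
            raise ValueError(f"cannot broadcast extents {a} and {b}")
        result.append(b if a == 1 else a)
    return tuple(result)


print(broadcast_result_shape_sketch((2, 1, 5), (1, 3, 5)))  # (2, 3, 5)
```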

check_shapes(*x, raise_on_errors=True)[source]

Verify that a sequence of tensors are of matching shapes.

Parameters
  • x (Tuple[Union[Tensor, Tuple[int, …]], str]) – A tuple (t, s), where t is a tensor, or an actual shape of a tensor (a tuple of integers), and s is a string, where each character corresponds to a (named) dimension. If the shapes of different tensors share a character, the corresponding dimensions are expected to be of equal size.

  • raise_on_errors (bool) – Whether to raise an exception in case of a mismatch.

Return type

bool

Returns

Whether the shapes matched.

Raises

ValueError – If the shapes mismatch and raise_on_errors is True.

Examples:

>>> check_shapes(((10, 20), "bd"), ((10, 20, 20), "bdd"))
True
>>> check_shapes(((10, 20), "bd"), ((10, 30, 20), "bdd"), raise_on_errors=False)
False
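The named-dimension check can be illustrated with a small pure-Python sketch. It handles shape tuples only (not tensors), and the name check_shapes_sketch is hypothetical; it mirrors the documented behaviour rather than reproducing PyKEEN's implementation.

```python
from typing import Dict, Set, Tuple


def check_shapes_sketch(*pairs: Tuple[Tuple[int, ...], str], raise_on_errors: bool = True) -> bool:
    """Check that dimensions sharing a name have the same size across all shapes."""
    sizes: Dict[str, Set[int]] = {}
    for shape, names in pairs:
        if len(shape) != len(names):
            raise ValueError(f"shape {shape} does not match dimension names {names!r}")
        # Collect every size observed for each named dimension.
        for size, name in zip(shape, names):
            sizes.setdefault(name, set()).add(size)
    mismatched = {name: found for name, found in sizes.items() if len(found) > 1}
    if mismatched and raise_on_errors:
        raise ValueError(f"shape mismatch: {mismatched}")
    return not mismatched


print(check_shapes_sketch(((10, 20), "bd"), ((10, 20, 20), "bdd")))  # True
```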

clamp_norm(x, maxnorm, p='fro', dim=None)[source]

Ensure that a tensor’s norm does not exceed some threshold.

Parameters
Return type

Tensor

Returns

A vector with \(\|x\| \leq maxnorm\).

combine_complex(x_re, x_im)[source]

Combine a complex tensor from real and imaginary parts.

Return type

FloatTensor

compact_mapping(mapping)[source]

Update a mapping (key -> id) such that the IDs range from 0 to len(mapping) - 1.

Parameters

mapping (Mapping[~X, int]) – The mapping to compact.

Return type

Tuple[Mapping[~X, int], Mapping[int, int]]

Returns

A pair (translated, translation) where translated is the updated mapping, and translation a dictionary from old to new ids.
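A minimal sketch of the compaction, assuming that the relative order of the old IDs is preserved in the new IDs. The name compact_mapping_sketch is hypothetical and the code is an illustration, not PyKEEN's implementation.

```python
from typing import Dict, Hashable, Mapping, Tuple


def compact_mapping_sketch(mapping: Mapping[Hashable, int]) -> Tuple[Dict[Hashable, int], Dict[int, int]]:
    """Renumber IDs to 0..n-1 and return (translated, translation)."""
    # Old IDs are sorted so that their relative order is kept (an assumption).
    translation = {old_id: new_id for new_id, old_id in enumerate(sorted(set(mapping.values())))}
    translated = {key: translation[old_id] for key, old_id in mapping.items()}
    return translated, translation


translated, translation = compact_mapping_sketch({"a": 3, "b": 7, "c": 10})
print(translated)   # {'a': 0, 'b': 1, 'c': 2}
print(translation)  # {3: 0, 7: 1, 10: 2}
```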

complex_normalize(x)[source]

Normalize a vector of complex numbers such that each element is of unit-length.

Parameters

x (Tensor) – A tensor formulating complex numbers

Return type

Tensor

Returns

A normalized version according to the following definition.

The modulus of a complex number is given as:

\[|a + ib| = \sqrt{a^2 + b^2}\]

The \(l_2\) norm of a complex vector \(x \in \mathbb{C}^d\) is given as:

\[\|x\|^2 = \sum_{i=1}^d |x_i|^2 = \sum_{i=1}^d \left(\operatorname{Re}(x_i)^2 + \operatorname{Im}(x_i)^2\right) = \left(\sum_{i=1}^d \operatorname{Re}(x_i)^2\right) + \left(\sum_{i=1}^d \operatorname{Im}(x_i)^2\right) = \|\operatorname{Re}(x)\|^2 + \|\operatorname{Im}(x)\|^2 = \| [\operatorname{Re}(x); \operatorname{Im}(x)] \|^2\]
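The identity above can be checked numerically with Python's built-in complex type. This is only a worked example of the formulas with made-up values; it does not use PyKEEN's tensor representation.

```python
import math

# Three arbitrary complex entries (illustrative values).
x = [complex(3, 4), complex(0, -2), complex(1, 1)]

# Element-wise normalization: divide each entry by its modulus sqrt(a^2 + b^2).
normalized = [z / abs(z) for z in x]
print([round(abs(z), 6) for z in normalized])

# Squared l_2 norm computed from the complex moduli ...
squared_norm_complex = sum(abs(z) ** 2 for z in x)
# ... and from the stacked real vector [Re(x); Im(x)].
stacked = [z.real for z in x] + [z.imag for z in x]
squared_norm_stacked = sum(v ** 2 for v in stacked)

print(math.isclose(squared_norm_complex, squared_norm_stacked))  # True
```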
class compose(*operations)[source]

A class representing the composition of several functions.

Initialize the composition with a sequence of operations.

Parameters

operations (Callable[[~X], ~X]) – unary operations that will be applied in succession
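A minimal sketch of such a composition, assuming the operations are applied left to right in the order they were passed. The class name ComposeSketch is hypothetical and is not PyKEEN's class.

```python
from typing import Any, Callable


class ComposeSketch:
    """Apply several unary operations one after another."""

    def __init__(self, *operations: Callable[[Any], Any]) -> None:
        self.operations = operations

    def __call__(self, x: Any) -> Any:
        # Feed the output of each operation into the next one.
        for operation in self.operations:
            x = operation(x)
        return x


strip_then_lower = ComposeSketch(str.strip, str.lower)
print(strip_then_lower("  PyKEEN "))  # pykeen
```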

compute_box(base, delta, size)[source]

Compute the lower and upper corners of a resulting box.

Parameters
  • base (FloatTensor) – shape: (*, d) the base position (box center) of the input relation embeddings

  • delta (FloatTensor) – shape: (*, d) the base shape of the input relation embeddings

  • size (FloatTensor) – shape: (*, d) the size scalar vectors of the input relation embeddings

Return type

Tuple[FloatTensor, FloatTensor]

Returns

shape: (*, d) each. The lower and upper corners of the box whose embeddings are provided as input.

convert_to_canonical_shape(x, dim, num=None, batch_size=1, suffix_shape=-1)[source]

Convert a tensor to canonical shape.

Parameters
  • x (FloatTensor) – The tensor in compatible shape.

  • dim (Union[int, str]) – The “num” dimension.

  • batch_size (int) – The batch size.

  • num (Optional[int]) – The number.

  • suffix_shape (Union[int, Sequence[int]]) – The suffix shape.

Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails, *) A tensor in canonical shape.

ensure_ftp_directory(*, ftp, directory)[source]

Ensure the directory exists on the FTP server.

Return type

None

ensure_torch_random_state(random_state)[source]

Prepare a random state for PyTorch.

Return type

Generator

ensure_tuple(*x)[source]

Ensure that all elements in the sequence are upgraded to sequences.

Parameters

x (Union[~X, Sequence[~X]]) – A sequence of sequences or literals

Return type

Sequence[Sequence[~X]]

Returns

An upgraded sequence of sequences

>>> ensure_tuple(1, (1,), (1, 2))
((1,), (1,), (1, 2))
estimate_cost_of_sequence(shape, *other_shapes)[source]

Cost of a sequence of broadcasted element-wise operations of tensors, given their shapes.

Return type

int

extend_batch(batch, all_ids, dim)[source]

Extend batch for 1-to-all scoring by explicit enumeration.

Parameters
  • batch (LongTensor) – shape: (batch_size, 2) The batch.

  • all_ids (List[int]) – len: num_choices The IDs to enumerate.

  • dim (int) – in {0,1,2} The column along which to insert the enumerated IDs.

Return type

LongTensor

Returns

shape: (batch_size * num_choices, 3) A large batch, where every pair from the original batch is combined with every ID.
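The enumeration can be illustrated on plain Python lists instead of tensors. The name extend_batch_sketch is hypothetical, and the row order (all IDs for the first pair, then all IDs for the second pair, and so on) is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple


def extend_batch_sketch(
    batch: Sequence[Tuple[int, int]], all_ids: Sequence[int], dim: int
) -> List[Tuple[int, ...]]:
    """Combine every pair in the batch with every ID, inserting the ID at column dim."""
    if dim not in {0, 1, 2}:
        raise ValueError("dim must be 0, 1, or 2")
    extended = []
    for pair in batch:
        for new_id in all_ids:
            row = list(pair)
            row.insert(dim, new_id)  # the enumerated ID becomes column dim
            extended.append(tuple(row))
    return extended


pairs = [(0, 1), (2, 3)]
print(extend_batch_sketch(pairs, all_ids=[7, 8, 9], dim=2))
```

The result has batch_size * num_choices rows of three columns each, matching the documented output shape.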

extended_einsum(eq, *tensors)[source]

Drop dimensions of size 1 to allow broadcasting.

Return type

FloatTensor

fix_dataclass_init_docs(cls)[source]

Fix the __init__ documentation for a dataclasses.dataclass.

Parameters

cls (Type) – The class whose docstring needs fixing

Return type

Type

Returns

The class that was passed so this function can be used as a decorator

flatten_dictionary(dictionary, prefix=None, sep='.')[source]

Flatten a nested dictionary.

Return type

Dict[str, Any]
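A minimal sketch of the flattening, assuming nested keys are joined with sep and that prefix, if given, is prepended to every key. The name flatten_dictionary_sketch is hypothetical; the real function may treat key types or edge cases differently.

```python
from typing import Any, Dict, Mapping, Optional


def flatten_dictionary_sketch(
    dictionary: Mapping[str, Any], prefix: Optional[str] = None, sep: str = "."
) -> Dict[str, Any]:
    """Flatten nested mappings into a single level with joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in dictionary.items():
        full_key = str(key) if prefix is None else f"{prefix}{sep}{key}"
        if isinstance(value, Mapping):
            # Recurse into nested mappings, carrying the joined key as the new prefix.
            flat.update(flatten_dictionary_sketch(value, prefix=full_key, sep=sep))
        else:
            flat[full_key] = value
    return flat


nested = {"model": {"embedding_dim": 50, "loss": {"margin": 1.0}}, "epochs": 5}
print(flatten_dictionary_sketch(nested))
```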

format_relative_comparison(part, total)[source]

Format a relative comparison.

Return type

str

get_batchnorm_modules(module)[source]

Return all submodules which are batch normalization layers.

Return type

List[Module]

get_benchmark(name)[source]

Get the benchmark directory for this version.

Return type

Path

get_df_io(df)[source]

Get the dataframe as bytes.

Return type

BytesIO

get_dropout_modules(module)[source]

Return all submodules which are dropout layers.

Return type

List[Module]

get_expected_norm(p, d)[source]

Compute the expected value of the L_p norm.

\[E[\|x\|_p] = d^{1/p} E[|x_1|^p]^{1/p}\]

under the assumption that \(x_i \sim N(0, 1)\), i.e.

\[E[|x_1|^p] = 2^{p/2} \cdot \Gamma\left(\frac{p+1}{2}\right) \cdot \pi^{-1/2}\]
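The two formulas above can be evaluated directly with the standard library. This sketch covers only finite numeric p; the function names are hypothetical and the code simply transcribes the stated formulas.

```python
import math


def expected_abs_moment(p: float) -> float:
    """E[|x_1|^p] for x_1 ~ N(0, 1), as given by the second formula."""
    return 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)


def expected_norm_sketch(p: float, d: int) -> float:
    """d^(1/p) * E[|x_1|^p]^(1/p), as given by the first formula."""
    return d ** (1 / p) * expected_abs_moment(p) ** (1 / p)


# For p = 2 the moment E[x_1^2] equals 1, so the value reduces to sqrt(d).
print(expected_norm_sketch(p=2, d=16))
```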
Parameters
  • p (Union[int, float, str]) – The parameter p of the norm.

  • d (int) – The dimension of the vector.

Return type

float

Returns

The expected value.

Raises
get_json_bytes_io(obj)[source]

Get the JSON as bytes.

Return type

BytesIO

get_model_io(model)[source]

Get the model as bytes.

Return type

BytesIO

get_optimal_sequence(*shapes)[source]

Find the optimal sequence in which to combine tensors elementwise based on the shapes.

Parameters

shapes (Tuple[int, …]) – The shapes of the tensors to combine.

Return type

Tuple[int, Tuple[int, …]]

Returns

The cost and the optimal execution order (as indices).
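One way to picture the search is a brute-force scan over all orderings. The cost model below is an assumption of this sketch: each pairwise step costs the number of elements in its broadcasted result, and the costs are summed. All names are hypothetical, and the (cost, order) layout follows the documented return type.

```python
import itertools
import math
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def broadcast_shape(first: Shape, second: Shape) -> Shape:
    """Elementwise-broadcast shape of two equal-rank shapes."""
    return tuple(b if a == 1 else a for a, b in zip(first, second))


def sequence_cost(shapes: Sequence[Shape]) -> int:
    """Sum of the sizes of all intermediate results when combining left to right."""
    cost = 0
    current = shapes[0]
    for shape in shapes[1:]:
        current = broadcast_shape(current, shape)
        cost += math.prod(current)
    return cost


def optimal_sequence_sketch(*shapes: Shape) -> Tuple[int, Tuple[int, ...]]:
    """Return (cost, order) for the cheapest ordering; ties go to the smaller index tuple."""
    return min(
        (sequence_cost([shapes[i] for i in order]), order)
        for order in itertools.permutations(range(len(shapes)))
    )


print(optimal_sequence_sketch((32, 1, 8), (1, 100, 8), (1, 1, 8)))  # (25856, (0, 2, 1))
```

Combining the two small tensors first keeps the first intermediate result small, which is why the order (0, 2, 1) is cheaper than (0, 1, 2) in this example. The scan is factorial in the number of tensors, so it only suits short argument lists.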

get_until_first_blank(s)[source]

Recapitulate all lines in the string until the first blank line.

Return type

str

invert_mapping(mapping)[source]

Invert a mapping.

Parameters

mapping (Mapping[~K, ~V]) – The mapping, key -> value.

Return type

Mapping[~V, ~K]

Returns

The inverse mapping, value -> key.

Raises

ValueError – if the mapping is not bijective
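A minimal sketch of the inversion with the bijectivity check. The name invert_mapping_sketch is hypothetical; this illustrates the documented behaviour and is not PyKEEN's code.

```python
from typing import Dict, Hashable, Mapping


def invert_mapping_sketch(mapping: Mapping[Hashable, Hashable]) -> Dict[Hashable, Hashable]:
    """Swap keys and values, refusing mappings in which a value occurs twice."""
    inverse = {value: key for key, value in mapping.items()}
    # If two keys shared a value, the inverse has fewer entries than the input.
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not bijective: some values occur more than once")
    return inverse


print(invert_mapping_sketch({"a": 0, "b": 1}))  # {0: 'a', 1: 'b'}
```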

is_cuda_oom_error(runtime_error)[source]

Check whether the caught RuntimeError was due to CUDA being out of memory.

Return type

bool

is_cudnn_error(runtime_error)[source]

Check whether the caught RuntimeError was due to a CUDNN error.

Return type

bool

lp_norm(x, p, dim, normalize)[source]

Return the \(L_p\) norm.

Return type

FloatTensor

negative_norm(x, p=2, power_norm=False)[source]

Evaluate negative norm of a vector.

Parameters
Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails) The scores.

negative_norm_of_sum(*x, p=2, power_norm=False)[source]

Evaluate negative norm of a sum of vectors on already broadcasted representations.

Parameters
Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails) The scores.

normalize_string(s, *, suffix=None)[source]

Normalize a string for lookup.

Return type

str

point_to_box_distance(points, box_lows, box_highs)[source]

Compute the point to box distance function proposed by [abboud2020] in an element-wise fashion.

Parameters
  • points (FloatTensor) – shape: (*, d) the positions of the points being scored against boxes

  • box_lows (FloatTensor) – shape: (*, d) the lower corners of the boxes

  • box_highs (FloatTensor) – shape: (*, d) the upper corners of the boxes

Return type

FloatTensor

Returns

Element-wise distance function scores as per the definition below

Given points \(p\), box_lows \(l\), and box_highs \(h\), the following quantities are defined:

  • Width \(w\) is the difference between the upper and lower box bound: \(w = h - l\)

  • Box centers \(c\) are the mean of the box bounds: \(c = (h + l) / 2\)

Finally, the point to box distance \(dist(p,l,h)\) is defined as the following piecewise function:

\[\begin{split}dist(p,l,h) = \begin{cases} |p-c|/(w+1) & l \leq p \leq h \\ |p-c| \cdot (w+1) - 0.5 \cdot w \cdot ((w+1)-1/(w+1)) & \text{otherwise} \\ \end{cases}\end{split}\]
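For a single coordinate, the piecewise definition can be transcribed into plain Python. This scalar sketch (hypothetical name point_to_box_distance_sketch) is meant for checking values by hand; the library function operates element-wise on tensors.

```python
import math


def point_to_box_distance_sketch(p: float, low: float, high: float) -> float:
    """Scalar version of the piecewise point-to-box distance."""
    width = high - low
    center = (high + low) / 2
    if low <= p <= high:
        # Inside the box: the distance to the center is shrunk by (w + 1).
        return abs(p - center) / (width + 1)
    # Outside the box: the distance is stretched, with an offset for continuity.
    return abs(p - center) * (width + 1) - 0.5 * width * ((width + 1) - 1 / (width + 1))


# Box [0, 2] has width w = 2 and center c = 1.
inside = point_to_box_distance_sketch(1.5, 0.0, 2.0)
outside = point_to_box_distance_sketch(3.0, 0.0, 2.0)
print(math.isclose(inside, 1 / 6), math.isclose(outside, 10 / 3))  # True True
```

At the box boundary (p = 2 in this example) both branches give 1/3, so the function is continuous there.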

powersum_norm(x, p, dim, normalize)[source]

Return the power sum norm.

Return type

FloatTensor

product_normalize(x, dim=-1)[source]

Normalize a tensor along a given dimension so that the geometric mean is 1.0.

Parameters
  • x (FloatTensor) – shape: s An input tensor

  • dim (int) – the dimension along which to normalize the tensor

Return type

FloatTensor

Returns

shape: s An output tensor where the given dimension is normalized to have a geometric mean of 1.0.
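The normalization can be illustrated on a one-dimensional list of strictly positive numbers. This is an assumption-laden sketch (hypothetical name product_normalize_sketch); the library function works on tensors along a chosen dimension and may handle signs differently.

```python
import math
from typing import List, Sequence


def product_normalize_sketch(values: Sequence[float]) -> List[float]:
    """Divide by the geometric mean so that the result has geometric mean 1.0."""
    # Geometric mean via logs: exp(mean(log(v))) for strictly positive v.
    geometric_mean = math.exp(sum(math.log(v) for v in values) / len(values))
    return [v / geometric_mean for v in values]


normalized_values = product_normalize_sketch([1.0, 4.0, 16.0])
# A geometric mean of 1.0 is equivalent to the product of the entries being 1.0.
print(math.isclose(math.prod(normalized_values), 1.0))  # True
```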

project_entity(e, e_p, r_p)[source]

Project an entity embedding into the relation-specific space.

\[e_{\bot} = M_{re} e = (r_p e_p^T + I^{d_r \times d_e}) e = r_p e_p^T e + I^{d_r \times d_e} e = r_p (e_p^T e) + e'\]

and additionally enforces

\[\|e_{\bot}\|_2 \leq 1\]
Parameters
  • e (FloatTensor) – shape: (…, d_e) The entity embedding.

  • e_p (FloatTensor) – shape: (…, d_e) The entity projection.

  • r_p (FloatTensor) – shape: (…, d_r) The relation projection.

Return type

FloatTensor

Returns

shape: (…, d_r)

random_non_negative_int()[source]

Generate a random non-negative integer.

Return type

int

resolve_device(device=None)[source]

Resolve a torch.device given a desired device (string).

Return type

device

set_random_seed(seed)[source]

Set the random seed on numpy, torch, and python.

Parameters

seed (int) – The seed that will be used in np.random.seed(), torch.manual_seed(), and random.seed().

Return type

Tuple[None, Generator, None]

Returns

A 3-tuple containing None, the torch generator, and None.

split_complex(x)[source]

Split a complex tensor into real and imaginary parts.

Return type

Tuple[FloatTensor, FloatTensor]

split_list_in_batches_iter(input_list, batch_size)[source]

Split a list of instances into batches of size batch_size.

Return type

Iterable[List[~X]]
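A minimal generator sketch of the batching, assuming the final batch may be shorter when the list length is not a multiple of batch_size. The name split_in_batches_sketch is hypothetical.

```python
from typing import Iterable, List, TypeVar

X = TypeVar("X")


def split_in_batches_sketch(input_list: List[X], batch_size: int) -> Iterable[List[X]]:
    """Yield consecutive slices of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(input_list), batch_size):
        yield input_list[start:start + batch_size]


print(list(split_in_batches_sketch([1, 2, 3, 4, 5], batch_size=2)))  # [[1, 2], [3, 4], [5]]
```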

strip_dim(*tensors, n=4)[source]

Strip the first dimensions.

Parameters
  • tensors (FloatTensor) – The tensors whose first n dimensions should be independently stripped

  • n (int) – The number of initial dimensions to strip

Return type

Sequence[FloatTensor]

Returns

A tuple of the reduced tensors

tensor_product(*tensors)[source]

Compute element-wise product of tensors in broadcastable shape.

Return type

FloatTensor

tensor_sum(*tensors)[source]

Compute element-wise sum of tensors in broadcastable shape.

Return type

FloatTensor

unpack_singletons(*xs)[source]

Unpack sequences of length one.

Parameters

xs (Tuple[~X]) – A sequence of tuples of length 1 or more

Return type

Sequence[Union[~X, Tuple[~X]]]

Returns

An unpacked sequence of sequences

>>> unpack_singletons((1,), (1, 2), (1, 2, 3))
(1, (1, 2), (1, 2, 3))
upgrade_to_sequence(x)[source]

Ensure that the input is a sequence.

Note

While strings are technically also a sequence, i.e.,

isinstance("test", typing.Sequence) is True

this may lead to unexpected behaviour when calling upgrade_to_sequence("test"). We thus handle strings as non-sequences. To recover the other behavior, the following may be used:

upgrade_to_sequence(tuple("test"))
Parameters

x (Union[~X, Sequence[~X]]) – A literal or sequence of literals

Return type

Sequence[~X]

Returns

If a literal was given, a one element tuple with it in it. Otherwise, return the given value.

>>> upgrade_to_sequence(1)
(1,)
>>> upgrade_to_sequence((1, 2, 3))
(1, 2, 3)
>>> upgrade_to_sequence("test")
('test',)
>>> upgrade_to_sequence(tuple("test"))
('t', 'e', 's', 't')
view_complex(x)[source]

Convert a PyKEEN complex tensor representation into a torch one.

Return type

Tensor

env(file=None)[source]

Print the env or output as HTML if in Jupyter.

Parameters

file – The file to print to if not in a Jupyter setting. Defaults to sys.stdout.

Returns

An IPython.display.HTML object if in a Jupyter notebook setting, otherwise None.

Version information for PyKEEN.

get_git_branch()[source]

Get the PyKEEN branch, if installed from git in editable mode.

Return type

Optional[str]

Returns

Returns the name of the current branch, or None if not installed in development mode.

get_git_hash(terse=True)[source]

Get the PyKEEN git hash.

Return type

str

Returns

The git hash, or ‘UNHASHED’ if a CalledProcessError was encountered, which signifies that the code is not installed in development mode.

get_version(with_git_hash=False)[source]

Get the PyKEEN version string, optionally including the git hash.

Parameters

with_git_hash (bool) – If set to True, the git hash will be appended to the version.

Return type

str

Returns

The PyKEEN version as well as the git hash, if the parameter with_git_hash was set to true.