Utilities

Utilities for PyKEEN.

class Bias(dim)[source]

A module wrapper for adding a bias.

Initialize the module.

Parameters: dim (int) – >0 The dimension of the input.

forward(x)[source]

Add the learned bias to the input.

Parameters: x (FloatTensor) – shape: (n, d) The input.
Return type: FloatTensor
Returns: x + b[None, :]

reset_parameters()[source]: Reset the layer’s parameters.
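
The broadcast in x + b[None, :] can be illustrated without PyTorch. The following is a minimal sketch that uses nested lists in place of tensors; the name add_bias is hypothetical and not part of PyKEEN.

```python
from typing import List


def add_bias(x: List[List[float]], b: List[float]) -> List[List[float]]:
    """Add the bias vector b (length d) to every row of x (shape (n, d))."""
    if any(len(row) != len(b) for row in x):
        raise ValueError("every row of x must have the same length as b")
    # b[None, :] broadcasts b over the n rows; here this is done explicitly
    return [[value + bias for value, bias in zip(row, b)] for row in x]


print(add_bias([[1.0, 2.0], [3.0, 4.0]], [0.5, -1.0]))
```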

class NoRandomSeedNecessary[source]: Used in pipeline when random seed is set automatically.

class Result[source]

A superclass of results that can be saved to a directory.

abstract save_to_directory(directory, **kwargs)[source]

Save the results to the directory.

Return type: None

abstract save_to_ftp(directory, ftp)[source]

Save the results to the directory in an FTP server.

Return type: None

abstract save_to_s3(directory, bucket, s3=None)[source]

Save all artifacts to the given directory in an S3 Bucket.

Parameters

directory (str) – The directory in the S3 bucket
bucket (str) – The name of the S3 bucket
s3 – A client from boto3.client(), if already instantiated

Return type

None

all_in_bounds(x, low=None, high=None, a_tol=0.0)[source]

Check if tensor values respect lower and upper bound.

Parameters

x (Tensor) – The tensor.
low (Optional[float]) – The lower bound.
high (Optional[float]) – The upper bound.
a_tol (float) – Absolute tolerance.

Return type

bool

Returns

If all values are within the given bounds
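
A minimal sketch of the bounds check on a flat list of floats. It assumes that a_tol widens the interval on both sides, i.e., values in [low - a_tol, high + a_tol] pass; the name all_in_bounds_sketch is hypothetical.

```python
from typing import Iterable, Optional


def all_in_bounds_sketch(
    values: Iterable[float],
    low: Optional[float] = None,
    high: Optional[float] = None,
    a_tol: float = 0.0,
) -> bool:
    """Check that all values lie within the (optional) bounds."""
    values = list(values)
    # assumption: the tolerance relaxes the lower bound downwards ...
    if low is not None and any(v < low - a_tol for v in values):
        return False
    # ... and the upper bound upwards
    if high is not None and any(v > high + a_tol for v in values):
        return False
    return True


print(all_in_bounds_sketch([0.0, 0.5, 1.0], low=0.0, high=1.0))
```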

at_least_eps(x)[source]

Make sure a tensor is greater than zero.

Return type: FloatTensor

broadcast_upgrade_to_sequences(*xs)[source]

Apply upgrade_to_sequence to each input, and afterwards repeat singletons to match the maximum length.

Parameters: xs (Union[~X, Sequence[~X]]) – length: m the inputs.
Return type: Sequence[Sequence[~X]]
Returns: a sequence of length m, where each element is a sequence and all elements have the same length.
Raises: ValueError – if there is a non-singleton sequence input with length different from the maximum sequence length.

>>> broadcast_upgrade_to_sequences(1)
((1,),)
>>> broadcast_upgrade_to_sequences(1, 2)
((1,), (2,))
>>> broadcast_upgrade_to_sequences(1, (2, 3))
((1, 1), (2, 3))
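
The behaviour shown in the examples can be sketched in plain Python as follows. This is an illustration under the stated description, not PyKEEN's implementation; the helper names upgrade and broadcast_upgrade are hypothetical.

```python
from collections.abc import Sequence
from typing import Any, Tuple


def upgrade(x: Any) -> Tuple[Any, ...]:
    """Wrap literals into a one-element tuple; strings count as literals."""
    if isinstance(x, Sequence) and not isinstance(x, str):
        return tuple(x)
    return (x,)


def broadcast_upgrade(*xs: Any) -> Tuple[Tuple[Any, ...], ...]:
    """Upgrade each input, then repeat singletons to the maximum length."""
    upgraded = [upgrade(x) for x in xs]
    max_len = max(len(x) for x in upgraded)
    result = []
    for x in upgraded:
        if len(x) == 1:
            x = x * max_len  # repeat the singleton
        elif len(x) != max_len:
            raise ValueError(f"length {len(x)} does not match maximum length {max_len}")
        result.append(x)
    return tuple(result)


print(broadcast_upgrade(1, (2, 3)))
```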

calculate_broadcasted_elementwise_result_shape(first, second)[source]

Determine the return shape of a broadcasted elementwise operation.

Return type: Tuple[int, …]
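
A minimal sketch of the usual broadcasting rule for two shapes, assuming both shapes have the same number of dimensions and that a dimension of size 1 is stretched to match the other. The name broadcast_shape is hypothetical.

```python
from typing import Sequence, Tuple


def broadcast_shape(first: Sequence[int], second: Sequence[int]) -> Tuple[int, ...]:
    """Return the shape of an elementwise operation on two broadcastable shapes."""
    if len(first) != len(second):
        raise ValueError("this sketch assumes shapes of equal rank")
    result = []
    for a, b in zip(first, second):
        if a != b and 1 not in (a, b):
            raise ValueError(f"sizes {a} and {b} cannot be broadcast")
        # a size-1 dimension takes the size of its counterpart
        result.append(b if a == 1 else a)
    return tuple(result)


print(broadcast_shape((2, 1, 5), (1, 3, 5)))
```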

check_shapes(*x, raise_on_errors=True)[source]

Verify that a sequence of tensors are of matching shapes.

Parameters

x (Tuple[Union[Tensor, Tuple[int, …]], str]) – A tuple (t, s), where t is a tensor, or an actual shape of a tensor (a tuple of integers), and s is a string, where each character corresponds to a (named) dimension. If the shapes of different tensors share a character, the corresponding dimensions are expected to be of equal size.
raise_on_errors (bool) – Whether to raise an exception in case of a mismatch.

Return type

bool

Returns

Whether the shapes matched.

Raises

ValueError – If the shapes mismatch and raise_on_errors is True.

Examples:

>>> check_shapes(((10, 20), "bd"), ((10, 20, 20), "bdd"))
True
>>> check_shapes(((10, 20), "bd"), ((10, 30, 20), "bdd"), raise_on_errors=False)
False
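
The named-dimension check can be sketched for plain shape tuples as follows. This is an illustration of the idea only; the name check_shapes_sketch and the error messages are hypothetical.

```python
from typing import Dict, Set, Tuple


def check_shapes_sketch(
    *pairs: Tuple[Tuple[int, ...], str],
    raise_on_errors: bool = True,
) -> bool:
    """Check that dimensions sharing a name have the same size."""
    sizes: Dict[str, Set[int]] = {}
    for shape, names in pairs:
        if len(shape) != len(names):
            raise ValueError(f"shape {shape} does not fit the names {names!r}")
        # collect every size observed for each named dimension
        for size, name in zip(shape, names):
            sizes.setdefault(name, set()).add(size)
    mismatched = {name: found for name, found in sizes.items() if len(found) > 1}
    if mismatched and raise_on_errors:
        raise ValueError(f"shape mismatch: {mismatched}")
    return not mismatched


print(check_shapes_sketch(((10, 20), "bd"), ((10, 20, 20), "bdd")))
```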

clamp_norm(x, maxnorm, p='fro', dim=None)[source]

Ensure that a tensor’s norm does not exceed some threshold.

Parameters

x (Tensor) – The vector.
maxnorm (float) – The maximum norm (>0).
p (Union[str, int]) – The norm type.
dim (Union[None, int, Iterable[int]]) – The dimension(s).

Return type

Tensor

Returns

A vector with \(|x| <= maxnorm\).
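
A minimal sketch of norm clamping for a single vector, assuming the Euclidean norm: vectors within the threshold are returned unchanged, longer ones are rescaled to have norm maxnorm. The name clamp_l2_norm is hypothetical.

```python
import math
from typing import List


def clamp_l2_norm(x: List[float], maxnorm: float) -> List[float]:
    """Rescale x so that its Euclidean norm is at most maxnorm."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= maxnorm:
        return list(x)  # already within the threshold
    scale = maxnorm / norm
    return [v * scale for v in x]


print(clamp_l2_norm([3.0, 4.0], maxnorm=1.0))
```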

combine_complex(x_re, x_im)[source]

Combine a complex tensor from real and imaginary part.

Return type: FloatTensor

compact_mapping(mapping)[source]

Update a mapping (key -> id) such that the IDs range from 0 to len(mapping) - 1.

Parameters: mapping (Mapping[~X, int]) – The mapping to compact.
Return type: Tuple[Mapping[~X, int], Mapping[int, int]]
Returns: A pair (translated, translation) where translated is the updated mapping, and translation a dictionary from old to new ids.
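
A minimal sketch of compaction, assuming that the relative order of the old IDs is preserved when assigning new ones. The name compact_mapping_sketch is hypothetical.

```python
from typing import Dict, Hashable, Mapping, Tuple


def compact_mapping_sketch(
    mapping: Mapping[Hashable, int],
) -> Tuple[Dict[Hashable, int], Dict[int, int]]:
    """Renumber the IDs of a mapping to the contiguous range 0..n-1."""
    # assumption: new IDs follow the sorted order of the old IDs
    translation = {old: new for new, old in enumerate(sorted(set(mapping.values())))}
    translated = {key: translation[old] for key, old in mapping.items()}
    return translated, translation


translated, translation = compact_mapping_sketch({"a": 3, "b": 7, "c": 10})
print(translated, translation)
```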

complex_normalize(x)[source]

Normalize a vector of complex numbers such that each element is of unit-length.

Let \(x \in \mathbb{C}^d\) denote a complex vector. Then, the operation computes

\[x_i' = \frac{x_i}{|x_i|}\]

where \(|x_i| = \sqrt{Re(x_i)^2 + Im(x_i)^2}\) is the modulus of the complex number \(x_i\).

Parameters: x (Tensor) – A tensor formulating complex numbers
Return type: Tensor
Returns: An elementwise normalized vector.

class compose(*operations, name)[source]

A class representing the composition of several functions.

Initialize the composition with a sequence of operations.

Parameters

operations (Callable[[~X], ~X]) – unary operations that will be applied in succession
name (str) – The name of the composed function.

compute_box(base, delta, size)[source]

Compute the lower and upper corners of a resulting box.

Parameters

base (FloatTensor) – shape: (*, d) the base position (box center) of the input relation embeddings
delta (FloatTensor) – shape: (*, d) the base shape of the input relation embeddings
size (FloatTensor) – shape: (*, d) the size scalar vectors of the input relation embeddings

Return type

Tuple[FloatTensor, FloatTensor]

Returns

each shape: (*, d) The lower and upper bounds of the box whose embeddings are provided as input.

convert_to_canonical_shape(x, dim, num=None, batch_size=1, suffix_shape=-1)[source]

Convert a tensor to canonical shape.

Parameters

x (FloatTensor) – The tensor in compatible shape.
dim (Union[int, str]) – The “num” dimension.
batch_size (int) – The batch size.
num (Optional[int]) – The number.
suffix_shape (Union[int, Sequence[int]]) – The suffix shape.

Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails, *) A tensor in canonical shape.

create_relation_to_entity_set_mapping(triples)[source]

Create mappings from relation IDs to the set of their head / tail entities.

Parameters: triples (Iterable[Tuple[int, int, int]]) – The triples.
Return type: Tuple[Mapping[int, Set[int]], Mapping[int, Set[int]]]
Returns: A pair of dictionaries, each mapping relation IDs to entity ID sets.
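
A minimal sketch of building both mappings in a single pass. It assumes that each triple is ordered as (head, relation, tail); the name relation_to_entity_sets is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def relation_to_entity_sets(
    triples: Iterable[Tuple[int, int, int]],
) -> Tuple[Dict[int, Set[int]], Dict[int, Set[int]]]:
    """Map each relation ID to the sets of its head and tail entity IDs."""
    heads: Dict[int, Set[int]] = defaultdict(set)
    tails: Dict[int, Set[int]] = defaultdict(set)
    # assumption: triples are given as (head, relation, tail)
    for head, relation, tail in triples:
        heads[relation].add(head)
        tails[relation].add(tail)
    return dict(heads), dict(tails)


print(relation_to_entity_sets([(0, 0, 1), (2, 0, 1), (1, 1, 3)]))
```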

ensure_complex(*xs)[source]

Ensure that all tensors are of complex dtype.

Reshape and convert if necessary.

Parameters: xs (Tensor) – the tensors
Yields: complex tensors.
Return type: Iterable[Tensor]

ensure_ftp_directory(*, ftp, directory)[source]

Ensure the directory exists on the FTP server.

Return type: None

ensure_torch_random_state(random_state)[source]

Prepare a random state for PyTorch.

Return type: Generator

ensure_tuple(*x)[source]

Ensure that all elements in the sequence are upgraded to sequences.

Parameters: x (Union[~X, Sequence[~X]]) – A sequence of sequences or literals
Return type: Sequence[Sequence[~X]]
Returns: An upgraded sequence of sequences

>>> ensure_tuple(1, (1,), (1, 2))
((1,), (1,), (1, 2))

estimate_cost_of_sequence(shape, *other_shapes)[source]

Cost of a sequence of broadcasted element-wise operations of tensors, given their shapes.

Return type: int

extend_batch(batch, max_id, dim)[source]

Extend batch for 1-to-all scoring by explicit enumeration.

Parameters

batch (LongTensor) – shape: (batch_size, 2) The batch.
max_id (int) – The maximum IDs to enumerate.
dim (int) – in {0,1,2} The column along which to insert the enumerated IDs.

Return type

LongTensor

Returns

shape: (batch_size * num_choices, 3) A large batch, where every pair from the original batch is combined with every ID.
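
The explicit enumeration can be sketched with plain tuples. This sketch assumes that the enumerated IDs are 0, ..., max_id - 1 and that all IDs for one pair are emitted before moving to the next pair; the actual row order in PyKEEN may differ. The name extend_batch_sketch is hypothetical.

```python
from typing import List, Sequence, Tuple


def extend_batch_sketch(
    batch: Sequence[Tuple[int, int]],
    max_id: int,
    dim: int,
) -> List[Tuple[int, ...]]:
    """Combine every pair in the batch with every ID, inserted at column dim."""
    if dim not in (0, 1, 2):
        raise ValueError("dim must be one of 0, 1, 2")
    extended = []
    for pair in batch:
        for new_id in range(max_id):  # assumption: IDs start at 0
            row = list(pair)
            row.insert(dim, new_id)
            extended.append(tuple(row))
    return extended


print(extend_batch_sketch([(5, 7)], max_id=3, dim=2))
```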

extended_einsum(eq, *tensors)[source]

Drop dimensions of size 1 to allow broadcasting.

Return type: FloatTensor

fix_dataclass_init_docs(cls)[source]

Fix the __init__ documentation for a dataclasses.dataclass.

Parameters: cls (Type) – The class whose docstring needs fixing
Return type: Type
Returns: The class that was passed so this function can be used as a decorator

See also

scipy.special.logsumexp() and torch.logcumsumexp()

lp_norm(x, p, dim, normalize)[source]

Return the \(L_p\) norm.

Return type: FloatTensor

negative_norm(x, p=2, power_norm=False)[source]

Evaluate negative norm of a vector.

Parameters

x (FloatTensor) – shape: (batch_size, num_heads, num_relations, num_tails, dim) The vectors.
p (Union[str, int, float]) – The p for the norm. cf. torch.linalg.vector_norm().
power_norm (bool) – Whether to return \(|x-y|_p^p\), cf. https://github.com/pytorch/pytorch/issues/28119

Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails) The scores.

negative_norm_of_sum(*x, p=2, power_norm=False)[source]

Evaluate negative norm of a sum of vectors on already broadcasted representations.

Parameters

x (FloatTensor) – shape: (batch_size, num_heads, num_relations, num_tails, dim) The representations.
p (Union[str, int, float]) – The p for the norm. cf. torch.linalg.vector_norm().
power_norm (bool) – Whether to return \(|x-y|_p^p\), cf. https://github.com/pytorch/pytorch/issues/28119

Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails) The scores.

normalize_path(path, *other, mkdir=False, is_file=False, default=None)[source]

Normalize a path.

Parameters

path (Union[str, Path, TextIO, None]) – the path in either of the valid forms.
other (Union[str, Path]) – additional parts to join to the path
mkdir (bool) – whether to ensure that the path refers to an existing directory by creating it if necessary
is_file (bool) – whether the path is intended to be a file - only relevant for creating directories
default (Union[str, Path, TextIO, None]) – the default to use if path is None

Raises

TypeError – if path is of unsuitable type
ValueError – if path and default are both None

Return type

Path

Returns

the absolute and resolved path

normalize_string(s, *, suffix=None)[source]

Normalize a string for lookup.

Return type: str

point_to_box_distance(points, box_lows, box_highs)[source]

Compute the point to box distance function proposed by [abboud2020] in an element-wise fashion.

Parameters

points (FloatTensor) – shape: (*, d) the positions of the points being scored against boxes
box_lows (FloatTensor) – shape: (*, d) the lower corners of the boxes
box_highs (FloatTensor) – shape: (*, d) the upper corners of the boxes

Return type

FloatTensor

Returns

Element-wise distance function scores as per the definition below

Given points \(p\), box_lows \(l\), and box_highs \(h\), the following quantities are defined:

Width \(w\) is the difference between the upper and lower box bound: \(w = h - l\)
Box centers \(c\) are the mean of the box bounds: \(c = (h + l) / 2\)

Finally, the point to box distance \(dist(p,l,h)\) is defined as the following piecewise function:

\[dist(p,l,h) = \begin{cases} |p-c|/(w+1) & \text{if } l \leq p \leq h \\ |p-c| \cdot (w+1) - 0.5 \cdot w \cdot ((w+1) - 1/(w+1)) & \text{otherwise} \end{cases}\]

powersum_norm(x, p, dim, normalize)[source]

Return the power sum norm.

Return type: FloatTensor

product_normalize(x, dim=-1)[source]

Normalize a tensor along a given dimension so that the geometric mean is 1.0.

Parameters

x (FloatTensor) – shape: s An input tensor
dim (int) – the dimension along which to normalize the tensor

Return type

FloatTensor

Returns

shape: s An output tensor where the given dimension is normalized to have a geometric mean of 1.0.
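
A minimal sketch for a single vector, assuming strictly positive entries: the geometric mean is computed in log space and every entry is divided by it. How the library treats zero or negative entries is not covered here. The name product_normalize_sketch is hypothetical.

```python
import math
from typing import List


def product_normalize_sketch(x: List[float]) -> List[float]:
    """Divide x by its geometric mean, so that the result has geometric mean 1."""
    # assumption: all entries are strictly positive, so the logarithm is defined
    log_mean = sum(math.log(v) for v in x) / len(x)
    geometric_mean = math.exp(log_mean)
    return [v / geometric_mean for v in x]


print(product_normalize_sketch([1.0, 4.0]))
```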

project_entity(e, e_p, r_p)[source]

Project an entity embedding into a relation-specific space.

\[e_{\bot} = M_{re} e = (r_p e_p^T + I^{d_r \times d_e}) e = r_p e_p^T e + I^{d_r \times d_e} e = r_p (e_p^T e) + e'\]

and additionally enforces

\[\|e_{\bot}\|_2 \leq 1\]

Parameters

e (FloatTensor) – shape: (…, d_e) The entity embedding.
e_p (FloatTensor) – shape: (…, d_e) The entity projection.
r_p (FloatTensor) – shape: (…, d_r) The relation projection.

Return type

FloatTensor

Returns

shape: (…, d_r)
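
The last form of the equation, \(r_p (e_p^T e) + e'\), avoids building the matrix \(M_{re}\). A minimal sketch for single vectors, where \(e'\) is taken to be \(e\) truncated or zero-padded to length \(d_r\) (the effect of multiplying with the rectangular identity), followed by rescaling to norm at most 1. The name project_entity_sketch is hypothetical.

```python
import math
from typing import List


def project_entity_sketch(e: List[float], e_p: List[float], r_p: List[float]) -> List[float]:
    """Compute r_p * (e_p . e) + e' and enforce an L2 norm of at most 1."""
    d_e, d_r = len(e), len(r_p)
    inner = sum(a * b for a, b in zip(e_p, e))  # scalar e_p^T e
    # I^{d_r x d_e} e: truncate or zero-pad e to length d_r
    e_prime = [e[i] if i < d_e else 0.0 for i in range(d_r)]
    projected = [r * inner + v for r, v in zip(r_p, e_prime)]
    norm = math.sqrt(sum(v * v for v in projected))
    if norm > 1.0:
        projected = [v / norm for v in projected]
    return projected


print(project_entity_sketch([0.1, 0.2], [1.0, 0.0], [0.5, 0.5, 0.5]))
```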

random_non_negative_int()[source]

Generate a random non-negative integer.

Return type: int

resolve_device(device=None)[source]

Resolve a torch.device given a desired device (string).

Return type: device

set_random_seed(seed)[source]

Set the random seed on numpy, torch, and python.

Parameters: seed (int) – The seed that will be used in np.random.seed(), torch.manual_seed(), and random.seed().
Return type: Tuple[None, Generator, None]
Returns: A three tuple with None, the torch generator, and None.

split_complex(x)[source]

Split a complex tensor into real and imaginary part.

Return type: Tuple[FloatTensor, FloatTensor]

split_list_in_batches_iter(input_list, batch_size)[source]

Split a list of instances in batches of size batch_size.

Return type: Iterable[List[~X]]
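
A minimal sketch of lazy batching via slicing. It assumes that the last batch may be shorter than batch_size when the list length is not a multiple of it; the name split_in_batches is hypothetical.

```python
from typing import Iterable, List, TypeVar

X = TypeVar("X")


def split_in_batches(input_list: List[X], batch_size: int) -> Iterable[List[X]]:
    """Yield consecutive slices of at most batch_size elements."""
    for start in range(0, len(input_list), batch_size):
        yield input_list[start : start + batch_size]


print(list(split_in_batches([1, 2, 3, 4, 5], batch_size=2)))
```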

tensor_product(*tensors)[source]

Compute element-wise product of tensors in broadcastable shape.

Return type: FloatTensor

tensor_sum(*tensors)[source]

Compute element-wise sum of tensors in broadcastable shape.

Return type: FloatTensor

triple_tensor_to_set(tensor)[source]

Convert a tensor of triples to a set of int-tuples.

Return type: Set[Tuple[int, …]]

unpack_singletons(*xs)[source]

Unpack sequences of length one.

Parameters: xs (Tuple[~X]) – A sequence of tuples of length 1 or more
Return type: Sequence[Union[~X, Tuple[~X]]]
Returns: An unpacked sequence of sequences

>>> unpack_singletons((1,), (1, 2), (1, 2, 3))
(1, (1, 2), (1, 2, 3))
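
The behaviour in the example can be sketched in one expression. This is an illustration only; the name unpack_singletons_sketch is hypothetical.

```python
from typing import Any, Tuple


def unpack_singletons_sketch(*xs: Tuple[Any, ...]) -> Tuple[Any, ...]:
    """Replace each length-one tuple by its only element; keep the others."""
    return tuple(x[0] if len(x) == 1 else x for x in xs)


print(unpack_singletons_sketch((1,), (1, 2), (1, 2, 3)))
```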

upgrade_to_sequence(x)[source]

Ensure that the input is a sequence.

Note

While strings are technically also a sequence, i.e.,

isinstance("test", typing.Sequence) is True

this may lead to unexpected behaviour when calling upgrade_to_sequence("test"). We thus handle strings as non-sequences. To recover the other behaviour, the following may be used:

upgrade_to_sequence(tuple("test"))

Parameters: x (Union[~X, Sequence[~X]]) – A literal or sequence of literals
Return type: Sequence[~X]
Returns: If a literal was given, a one element tuple with it in it. Otherwise, return the given value.

>>> upgrade_to_sequence(1)
(1,)
>>> upgrade_to_sequence((1, 2, 3))
(1, 2, 3)
>>> upgrade_to_sequence("test")
('test',)
>>> upgrade_to_sequence(tuple("test"))
('t', 'e', 's', 't')

view_complex(x)[source]

Convert a PyKEEN complex tensor representation into a torch one.

Return type: Tensor

env(file=None)[source]

Print the env or output as HTML if in Jupyter.

Parameters: file – The file to print to if not in a Jupyter setting. Defaults to sys.stdout.
Returns: An IPython.display.HTML if in a Jupyter notebook setting, otherwise None.

Version information for PyKEEN.

get_git_branch()[source]

Get the PyKEEN branch, if installed from git in editable mode.

Return type: Optional[str]
Returns: The name of the current branch, or None if not installed in development mode.

get_git_hash(terse=True)[source]

Get the PyKEEN git hash.

Return type: str
Returns: The git hash, or ‘UNHASHED’ if a CalledProcessError was encountered, signifying that the code is not installed in development mode.

get_version(with_git_hash=False)[source]

Get the PyKEEN version string, optionally including a git hash.

Parameters: with_git_hash (bool) – If set to True, the git hash will be appended to the version.
Return type: str
Returns: The PyKEEN version as well as the git hash, if the parameter with_git_hash was set to true.