Utilities

Utilities for PyKEEN.

class Bias(dim)[source]

A module wrapper for adding a bias.

Initialize the module.

Parameters: dim (int) – >0 The dimension of the input.

forward(x)[source]

Add the learned bias to the input.

Parameters: x (FloatTensor) – shape: (n, d) The input.
Return type: FloatTensor
Returns: x + b[None, :]

reset_parameters()[source]: Reset the layer’s parameters.

class NoRandomSeedNecessary[source]: Used in pipeline when random seed is set automatically.

class Result[source]

A superclass of results that can be saved to a directory.

abstract save_to_directory(directory, **kwargs)[source]

Save the results to the directory.

Return type: None

abstract save_to_ftp(directory, ftp)[source]

Save the results to the directory in an FTP server.

Return type: None

abstract save_to_s3(directory, bucket, s3=None)[source]

Save all artifacts to the given directory in an S3 Bucket.

Parameters

directory (str) – The directory in the S3 bucket
bucket (str) – The name of the S3 bucket
s3 – A client from boto3.client(), if already instantiated

Return type

None

all_in_bounds(x, low=None, high=None, a_tol=0.0)[source]

Check if tensor values respect lower and upper bound.

Parameters

x (Tensor) – The tensor.
low (Optional[float]) – The lower bound.
high (Optional[float]) – The upper bound.
a_tol (float) – Absolute tolerance.

Return type

bool

Returns

If all values are within the given bounds
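
The bounds check described above can be sketched in pure Python on a flat list of floats instead of a tensor. The helper name `all_in_bounds_sketch` is hypothetical, and the way `a_tol` widens both bounds is an assumption for illustration rather than PyKEEN's implementation.

```python
from typing import Iterable, Optional


def all_in_bounds_sketch(
    values: Iterable[float],
    low: Optional[float] = None,
    high: Optional[float] = None,
    a_tol: float = 0.0,
) -> bool:
    """Check that every value lies in [low - a_tol, high + a_tol]."""
    values = list(values)
    # A bound of None is treated as "unbounded" on that side (assumption).
    if low is not None and any(v < low - a_tol for v in values):
        return False
    if high is not None and any(v > high + a_tol for v in values):
        return False
    return True
```

With `high=1.0`, the value `1.05` fails the check unless a tolerance such as `a_tol=0.1` is allowed.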

at_least_eps(x)[source]

Make sure a tensor is greater than zero.

Return type: FloatTensor

broadcast_cat(tensors, dim)[source]

Concatenate tensors with broadcasting support.

Parameters

tensors (Sequence[FloatTensor]) – The tensors. Each of the tensors is required to have the same number of dimensions. For each dimension not equal to dim, the extent has to match the other tensors’, or be one. If it is one, the tensor is repeated to match the extent of the other tensors.
dim (int) – The concat dimension.

Return type

FloatTensor

Returns

A concatenated, broadcasted tensor.

Raises

ValueError – if the tensors do not all have the same number of dimensions
ValueError – if broadcasting is not possible
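
The broadcasting rule above can be illustrated at the level of shapes only, without any tensor library. The following is a minimal sketch that computes the shape a broadcasting concatenation would produce; the function name `broadcast_cat_shape` is hypothetical and not part of PyKEEN.

```python
from typing import List, Sequence, Tuple


def broadcast_cat_shape(shapes: Sequence[Tuple[int, ...]], dim: int) -> Tuple[int, ...]:
    """Compute the result shape of a broadcasting concatenation along ``dim``."""
    if not shapes:
        raise ValueError("at least one shape is required")
    ndim = len(shapes[0])
    if any(len(shape) != ndim for shape in shapes):
        raise ValueError("all tensors must have the same number of dimensions")
    dim = dim % ndim  # allow negative dimension indices
    result: List[int] = []
    for axis in range(ndim):
        extents = [shape[axis] for shape in shapes]
        if axis == dim:
            # Along the concatenation dimension the extents add up.
            result.append(sum(extents))
            continue
        # Elsewhere every extent must be one or equal to the shared extent.
        non_singleton = {extent for extent in extents if extent != 1}
        if len(non_singleton) > 1:
            raise ValueError(f"cannot broadcast extents {extents} on axis {axis}")
        result.append(non_singleton.pop() if non_singleton else 1)
    return tuple(result)
```

For example, shapes `(2, 1, 3)` and `(1, 4, 5)` concatenated along the last dimension give `(2, 4, 8)`.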

broadcast_upgrade_to_sequences(*xs)[source]

Apply upgrade_to_sequence to each input, and afterwards repeat singletons to match the maximum length.

Parameters: xs (Union[~X, Sequence[~X]]) – length: m the inputs.
Return type: Sequence[Sequence[~X]]
Returns: a sequence of length m, where each element is a sequence and all elements have the same length.
Raises: ValueError – if there is a non-singleton sequence input with length different from the maximum sequence length.

>>> broadcast_upgrade_to_sequences(1)
((1,),)
>>> broadcast_upgrade_to_sequences(1, 2)
((1,), (2,))
>>> broadcast_upgrade_to_sequences(1, (2, 3))
((1, 1), (2, 3))
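
The behaviour shown in the examples above can be reproduced with a short pure-Python sketch. The names `upgrade` and `broadcast_upgrade` are hypothetical stand-ins, and treating only lists and tuples as sequences is a simplifying assumption.

```python
from typing import Any, Tuple


def upgrade(x: Any) -> Tuple[Any, ...]:
    """Wrap anything that is not a list or tuple in a one-element tuple."""
    if isinstance(x, (list, tuple)):
        return tuple(x)
    return (x,)


def broadcast_upgrade(*xs: Any) -> Tuple[Tuple[Any, ...], ...]:
    """Upgrade each input, then repeat singletons to the maximum length."""
    seqs = [upgrade(x) for x in xs]
    max_len = max((len(s) for s in seqs), default=1)
    result = []
    for s in seqs:
        if len(s) == 1:
            s = s * max_len  # repeat the single element
        elif len(s) != max_len:
            raise ValueError(f"length {len(s)} does not match maximum length {max_len}")
        result.append(s)
    return tuple(result)
```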

calculate_broadcasted_elementwise_result_shape(first, second)[source]

Determine the return shape of a broadcasted elementwise operation.

Return type: Tuple[int, …]

check_shapes(*x, raise_on_errors=True)[source]

Verify that a sequence of tensors are of matching shapes.

Parameters

x (Tuple[Union[Tensor, Tuple[int, …]], str]) – A tuple (t, s), where t is a tensor, or an actual shape of a tensor (a tuple of integers), and s is a string, where each character corresponds to a (named) dimension. If the shapes of different tensors share a character, the corresponding dimensions are expected to be of equal size.
raise_on_errors (bool) – Whether to raise an exception in case of a mismatch.

Return type

bool

Returns

Whether the shapes matched.

Raises

ValueError – If the shapes mismatch and raise_on_errors is True.

Examples:

>>> check_shapes(((10, 20), "bd"), ((10, 20, 20), "bdd"))
True
>>> check_shapes(((10, 20), "bd"), ((10, 30, 20), "bdd"), raise_on_errors=False)
False
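
The named-dimension check can be sketched as follows, working on plain shape tuples only. The function name `check_shapes_sketch` is hypothetical, and this is an illustration of the idea rather than PyKEEN's implementation.

```python
from typing import Dict, Tuple


def check_shapes_sketch(
    *pairs: Tuple[Tuple[int, ...], str],
    raise_on_errors: bool = True,
) -> bool:
    """Check that dimensions sharing a name have the same size everywhere."""
    seen: Dict[str, int] = {}
    ok = True
    for shape, names in pairs:
        if len(shape) != len(names):
            raise ValueError(f"shape {shape} does not fit dimension names {names!r}")
        for size, name in zip(shape, names):
            # Remember the first size seen per name and compare later ones to it.
            if seen.setdefault(name, size) != size:
                ok = False
    if not ok and raise_on_errors:
        raise ValueError("shapes mismatch")
    return ok
```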

clamp_norm(x, maxnorm, p='fro', dim=None)[source]

Ensure that a tensor’s norm does not exceed some threshold.

Parameters

x (Tensor) – The vector.
maxnorm (float) – The maximum norm (>0).
p (Union[str, int]) – The norm type.
dim (Union[None, int, Iterable[int]]) – The dimension(s).

Return type

Tensor

Returns

A vector with \(\|x\| \leq maxnorm\).
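
A minimal sketch of norm clamping for a single vector, assuming the Euclidean norm only. The function name `clamp_norm_sketch` is hypothetical; the other norm types and the `dim` argument are not modelled here.

```python
import math
from typing import List


def clamp_norm_sketch(x: List[float], maxnorm: float) -> List[float]:
    """Rescale ``x`` so that its Euclidean norm is at most ``maxnorm``."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= maxnorm:
        return list(x)  # already within the threshold, keep as is
    scale = maxnorm / norm
    return [v * scale for v in x]
```

A vector that is already short enough is returned unchanged; a longer one keeps its direction and is shrunk onto the norm ball.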

combine_complex(x_re, x_im)[source]

Combine a complex tensor from real and imaginary part.

Return type: FloatTensor

compact_mapping(mapping)[source]

Update a mapping (key -> id) such that the IDs range from 0 to len(mapping) - 1.

Parameters: mapping (Mapping[~X, int]) – The mapping to compact.
Return type: Tuple[Mapping[~X, int], Mapping[int, int]]
Returns: A pair (translated, translation) where translated is the updated mapping, and translation a dictionary from old to new ids.
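
The compaction can be sketched in a few lines of pure Python. The function name `compact_mapping_sketch` is hypothetical, and assigning new IDs in the order of the sorted old IDs is an assumption made for illustration.

```python
from typing import Dict, Hashable, Mapping, Tuple


def compact_mapping_sketch(
    mapping: Mapping[Hashable, int],
) -> Tuple[Dict[Hashable, int], Dict[int, int]]:
    """Return (translated, translation) with consecutive IDs starting at 0."""
    # Assumption: new IDs follow the sorted order of the old IDs.
    translation = {old: new for new, old in enumerate(sorted(set(mapping.values())))}
    translated = {key: translation[old] for key, old in mapping.items()}
    return translated, translation
```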

complex_normalize(x)[source]

Normalize a vector of complex numbers such that each element is of unit-length.

Parameters: x (Tensor) – A tensor formulating complex numbers
Return type: Tensor
Returns: A normalized version according to the following definition.

The modulus of a complex number is given as:

\[|a + ib| = \sqrt{a^2 + b^2}\]

The \(l_2\) norm of a complex vector \(x \in \mathbb{C}^d\) is given as:

\[\|x\|^2 = \sum_{i=1}^d |x_i|^2 = \sum_{i=1}^d \left(\operatorname{Re}(x_i)^2 + \operatorname{Im}(x_i)^2\right) = \left(\sum_{i=1}^d \operatorname{Re}(x_i)^2\right) + \left(\sum_{i=1}^d \operatorname{Im}(x_i)^2\right) = \|\operatorname{Re}(x)\|^2 + \|\operatorname{Im}(x)\|^2 = \| [\operatorname{Re}(x); \operatorname{Im}(x)] \|^2\]
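
The definitions above can be checked numerically with Python's built-in complex type. The helper names `complex_normalize_sketch` and `squared_l2_norm` are hypothetical, and the sketch assumes that no element is zero.

```python
from typing import List


def complex_normalize_sketch(x: List[complex]) -> List[complex]:
    """Divide every element by its modulus so that each has unit length."""
    # Assumption: no element is zero, otherwise the division fails.
    return [z / abs(z) for z in x]


def squared_l2_norm(x: List[complex]) -> float:
    """Squared l2 norm as the sum of squared real and imaginary parts."""
    return sum(z.real ** 2 + z.imag ** 2 for z in x)
```

For `x = [3 + 4j, 2j]`, the moduli are 5 and 2, so the normalized elements have modulus 1, and the squared norm of `x` is 9 + 16 + 0 + 4 = 29.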

class compose(*operations, name)[source]

A class representing the composition of several functions.

Initialize the composition with a sequence of operations.

Parameters

operations (Callable[[~X], ~X]) – unary operations that will be applied in succession
name (str) – The name of the composed function.
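
A minimal sketch of such a composition, assuming the operations are applied from left to right. The class name `ComposeSketch` is hypothetical and only mirrors the constructor signature shown above.

```python
from functools import reduce
from typing import Callable, TypeVar

X = TypeVar("X")


class ComposeSketch:
    """Apply several unary operations in succession."""

    def __init__(self, *operations: Callable[[X], X], name: str) -> None:
        self.operations = operations
        self.name = name

    def __call__(self, x: X) -> X:
        # Feed the output of each operation into the next one, left to right.
        return reduce(lambda value, operation: operation(value), self.operations, x)
```

For instance, `ComposeSketch(lambda v: v + 1, lambda v: v * 2, name="inc_then_double")(3)` first adds one and then doubles, giving 8.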

compute_box(base, delta, size)[source]

Compute the lower and upper corners of a resulting box.

Parameters

base (FloatTensor) – shape: (*, d) the base position (box center) of the input relation embeddings
delta (FloatTensor) – shape: (*, d) the base shape of the input relation embeddings
size (FloatTensor) – shape: (*, d) the size scalar vectors of the input relation embeddings

Return type

Tuple[FloatTensor, FloatTensor]

Returns

shape: (*, d) each. The lower and upper bounds of the box whose embeddings are provided as input.

convert_to_canonical_shape(x, dim, num=None, batch_size=1, suffix_shape=-1)[source]

Convert a tensor to canonical shape.

Parameters

x (FloatTensor) – The tensor in compatible shape.
dim (Union[int, str]) – The “num” dimension.
batch_size (int) – The batch size.
num (Optional[int]) – The number.
suffix_shape (Union[int, Sequence[int]]) – The suffix shape.

Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails, *) A tensor in canonical shape.

create_relation_to_entity_set_mapping(triples)[source]

Create mappings from relation IDs to the set of their head / tail entities.

Parameters: triples (Iterable[Tuple[int, int, int]]) – The triples.
Return type: Tuple[Mapping[int, Set[int]], Mapping[int, Set[int]]]
Returns: A pair of dictionaries, each mapping relation IDs to entity ID sets.
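
A minimal sketch of building the two dictionaries, assuming each triple is ordered as (head, relation, tail). The function name `relation_to_entity_sets` is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple


def relation_to_entity_sets(
    triples: Iterable[Tuple[int, int, int]],
) -> Tuple[Dict[int, Set[int]], Dict[int, Set[int]]]:
    """Collect the head and tail entity sets for every relation ID."""
    heads: DefaultDict[int, Set[int]] = defaultdict(set)
    tails: DefaultDict[int, Set[int]] = defaultdict(set)
    # Assumption: triples are (head, relation, tail).
    for head, relation, tail in triples:
        heads[relation].add(head)
        tails[relation].add(tail)
    return dict(heads), dict(tails)
```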

ensure_ftp_directory(*, ftp, directory)[source]

Ensure the directory exists on the FTP server.

Return type: None

ensure_torch_random_state(random_state)[source]

Prepare a random state for PyTorch.

Return type: Generator

ensure_tuple(*x)[source]

Ensure that all elements in the sequence are upgraded to sequences.

Parameters: x (Union[~X, Sequence[~X]]) – A sequence of sequences or literals
Return type: Sequence[Sequence[~X]]
Returns: An upgraded sequence of sequences

>>> ensure_tuple(1, (1,), (1, 2))
((1,), (1,), (1, 2))

estimate_cost_of_sequence(shape, *other_shapes)[source]

Cost of a sequence of broadcasted element-wise operations of tensors, given their shapes.

Return type: int

extend_batch(batch, all_ids, dim)[source]

Extend batch for 1-to-all scoring by explicit enumeration.

Parameters

batch (LongTensor) – shape: (batch_size, 2) The batch.
all_ids (List[int]) – len: num_choices The IDs to enumerate.
dim (int) – in {0,1,2} The column along which to insert the enumerated IDs.

Return type

LongTensor

Returns

shape: (batch_size * num_choices, 3) A large batch, where every pair from the original batch is combined with every ID.
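
The explicit enumeration can be sketched with plain tuples instead of tensors. The function name `extend_batch_sketch` is hypothetical, and the row order (all IDs for the first pair, then all IDs for the second, and so on) is an assumption for illustration.

```python
from typing import List, Sequence, Tuple


def extend_batch_sketch(
    batch: Sequence[Tuple[int, int]],
    all_ids: Sequence[int],
    dim: int,
) -> List[Tuple[int, ...]]:
    """Combine every pair in ``batch`` with every ID, inserted at column ``dim``."""
    if dim not in (0, 1, 2):
        raise ValueError("dim must be one of 0, 1, 2")
    extended: List[Tuple[int, ...]] = []
    for pair in batch:
        for candidate in all_ids:
            row = list(pair)
            row.insert(dim, candidate)  # place the enumerated ID in the chosen column
            extended.append(tuple(row))
    return extended
```

A batch of two pairs combined with three candidate IDs yields six triples.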

extended_einsum(eq, *tensors)[source]

Drop dimensions of size 1 to allow broadcasting.

Return type: FloatTensor

fix_dataclass_init_docs(cls)[source]

Fix the __init__ documentation for a dataclasses.dataclass.

Parameters: cls (Type) – The class whose docstring needs fixing
Return type: Type
Returns: The class that was passed so this function can be used as a decorator

See also

scipy.special.logsumexp() and torch.logcumsumexp()

lp_norm(x, p, dim, normalize)[source]

Return the \(L_p\) norm.

Return type: FloatTensor

negative_norm(x, p=2, power_norm=False)[source]

Evaluate negative norm of a vector.

Parameters

x (FloatTensor) – shape: (batch_size, num_heads, num_relations, num_tails, dim) The vectors.
p (Union[str, int, float]) – The p for the norm. cf. torch.linalg.vector_norm().
power_norm (bool) – Whether to return \(|x-y|_p^p\), cf. https://github.com/pytorch/pytorch/issues/28119

Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails) The scores.
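
For a single vector, the negative norm and its powered variant can be sketched as below. The function name `negative_norm_sketch` is hypothetical, and the sketch assumes a finite numeric `p`.

```python
from typing import Sequence


def negative_norm_sketch(x: Sequence[float], p: float = 2, power_norm: bool = False) -> float:
    """Return the negative p-norm of ``x``, or the negative p-th power of it."""
    powered = sum(abs(v) ** p for v in x)
    if power_norm:
        # Skip the final root, i.e. return the negative of the norm raised to p.
        return -powered
    return -(powered ** (1.0 / p))
```

For `x = [3.0, 4.0]` and `p = 2`, the norm is 5, so the score is -5 without `power_norm` and -25 with it.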

negative_norm_of_sum(*x, p=2, power_norm=False)[source]

Evaluate negative norm of a sum of vectors on already broadcasted representations.

Parameters

x (FloatTensor) – shape: (batch_size, num_heads, num_relations, num_tails, dim) The representations.
p (Union[str, int, float]) – The p for the norm. cf. torch.linalg.vector_norm().
power_norm (bool) – Whether to return \(|x-y|_p^p\), cf. https://github.com/pytorch/pytorch/issues/28119

Return type

FloatTensor

Returns

shape: (batch_size, num_heads, num_relations, num_tails) The scores.

normalize_string(s, *, suffix=None)[source]

Normalize a string for lookup.

Return type: str

point_to_box_distance(points, box_lows, box_highs)[source]

Compute the point to box distance function proposed by [abboud2020] in an element-wise fashion.

Parameters

points (FloatTensor) – shape: (*, d) the positions of the points being scored against boxes
box_lows (FloatTensor) – shape: (*, d) the lower corners of the boxes
box_highs (FloatTensor) – shape: (*, d) the upper corners of the boxes

Return type

FloatTensor

Returns

Element-wise distance function scores as per the definition below

Given points \(p\), box_lows \(l\), and box_highs \(h\), the following quantities are defined:

Width \(w\) is the difference between the upper and lower box bound: \(w = h - l\)
Box centers \(c\) are the mean of the box bounds: \(c = (h + l) / 2\)

Finally, the point to box distance \(dist(p,l,h)\) is defined as the following piecewise function:

\[\begin{split}dist(p,l,h) = \begin{cases} |p-c|/(w+1) & l \leq p \leq h \\ |p-c| \cdot (w+1) - 0.5 \cdot w \cdot ((w+1) - 1/(w+1)) & \text{otherwise} \\ \end{cases}\end{split}\]
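
The piecewise definition above can be evaluated for a single coordinate with a short scalar sketch. The function name `point_to_box_distance_sketch` is hypothetical, and the inside-the-box condition is read as l <= p <= h.

```python
def point_to_box_distance_sketch(p: float, low: float, high: float) -> float:
    """Scalar point-to-box distance for one coordinate."""
    width = high - low
    center = (high + low) / 2
    offset = abs(p - center)
    # Inside the box (read here as low <= p <= high): shrink the distance.
    if low <= p <= high:
        return offset / (width + 1)
    # Outside the box: amplify the distance and subtract the width-dependent term.
    return offset * (width + 1) - 0.5 * width * ((width + 1) - 1 / (width + 1))
```

For a box with bounds 0 and 2 (width 2, center 1), the point 1.5 scores 0.5 / 3, and the point 3 scores 2 * 3 - 0.5 * 2 * (3 - 1/3) = 10/3. Both branches give 1/3 at the boundary p = 2, so the function is continuous there.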

powersum_norm(x, p, dim, normalize)[source]

Return the power sum norm.

Return type: FloatTensor

product_normalize(x, dim=-1)[source]

Normalize a tensor along a given dimension so that the geometric mean is 1.0.

Parameters

x (FloatTensor) – shape: s An input tensor
dim (int) – the dimension along which to normalize the tensor

Return type

FloatTensor

Returns

shape: s An output tensor where the given dimension is normalized to have a geometric mean of 1.0.
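
A minimal sketch for a single list of non-zero floats, dividing by the geometric mean computed in log space. The function name `product_normalize_sketch` is hypothetical, and using absolute values inside the logarithm is an assumption.

```python
import math
from typing import List


def product_normalize_sketch(x: List[float]) -> List[float]:
    """Divide ``x`` by the geometric mean of its absolute values."""
    # Geometric mean via the mean of logarithms; assumes no element is zero.
    log_mean = sum(math.log(abs(v)) for v in x) / len(x)
    geometric_mean = math.exp(log_mean)
    return [v / geometric_mean for v in x]
```

For `x = [1.0, 4.0]`, the geometric mean is 2, so the result is approximately `[0.5, 2.0]`, whose geometric mean is 1.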

project_entity(e, e_p, r_p)[source]

Project an entity in a relation-specific manner.

\[e_{\bot} = M_{re} e = (r_p e_p^T + I^{d_r \times d_e}) e = r_p e_p^T e + I^{d_r \times d_e} e = r_p (e_p^T e) + e'\]

and additionally enforces

\[\|e_{\bot}\|_2 \leq 1\]

Parameters

e (FloatTensor) – shape: (…, d_e) The entity embedding.
e_p (FloatTensor) – shape: (…, d_e) The entity projection.
r_p (FloatTensor) – shape: (…, d_r) The relation projection.

Return type

FloatTensor

Returns

shape: (…, d_r)
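
The last form of the projection formula above can be sketched for single vectors in pure Python. The function name `project_entity_sketch` is hypothetical. Interpreting the rectangular identity as truncating or zero-padding the entity embedding to d_r entries, and enforcing the norm constraint by rescaling, are assumptions for illustration.

```python
import math
from typing import List


def project_entity_sketch(e: List[float], e_p: List[float], r_p: List[float]) -> List[float]:
    """Compute r_p * (e_p^T e) + e' and rescale to norm at most 1."""
    d_e, d_r = len(e), len(r_p)
    inner = sum(a * b for a, b in zip(e_p, e))  # the scalar e_p^T e
    # e' = I e: truncate or zero-pad e to d_r entries (assumption).
    e_prime = [e[i] if i < d_e else 0.0 for i in range(d_r)]
    projected = [r * inner + ep for r, ep in zip(r_p, e_prime)]
    # Enforce the norm constraint by rescaling onto the unit ball (assumption).
    norm = math.sqrt(sum(v * v for v in projected))
    if norm > 1.0:
        projected = [v / norm for v in projected]
    return projected
```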

random_non_negative_int()[source]

Generate a random non-negative integer.

Return type: int

resolve_device(device=None)[source]

Resolve a torch.device given a desired device (string).

Return type: device

set_random_seed(seed)[source]

Set the random seed on numpy, torch, and python.

Parameters: seed (int) – The seed that will be used in np.random.seed(), torch.manual_seed(), and random.seed().
Return type: Tuple[None, Generator, None]
Returns: A three tuple with None, the torch generator, and None.

split_complex(x)[source]

Split a complex tensor into real and imaginary part.

Return type: Tuple[FloatTensor, FloatTensor]

split_list_in_batches_iter(input_list, batch_size)[source]

Split a list of instances in batches of size batch_size.

Return type: Iterable[List[~X]]
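
Batching a list can be sketched with a small generator. The function name `split_in_batches` is hypothetical, and letting the last batch be smaller than `batch_size` is an assumption.

```python
from typing import Iterator, List, Sequence, TypeVar

X = TypeVar("X")


def split_in_batches(input_list: Sequence[X], batch_size: int) -> Iterator[List[X]]:
    """Yield consecutive slices of at most ``batch_size`` elements."""
    for start in range(0, len(input_list), batch_size):
        # The final slice may be shorter when the length is not a multiple.
        yield list(input_list[start:start + batch_size])
```

Splitting `[1, 2, 3, 4, 5]` with a batch size of 2 yields `[1, 2]`, `[3, 4]`, and `[5]`.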

tensor_product(*tensors)[source]

Compute element-wise product of tensors in broadcastable shape.

Return type: FloatTensor

tensor_sum(*tensors)[source]

Compute element-wise sum of tensors in broadcastable shape.

Return type: FloatTensor

triple_tensor_to_set(tensor)[source]

Convert a tensor of triples to a set of int-tuples.

Return type: Set[Tuple[int, …]]

unpack_singletons(*xs)[source]

Unpack sequences of length one.

Parameters: xs (Tuple[~X]) – A sequence of tuples of length 1 or more
Return type: Sequence[Union[~X, Tuple[~X]]]
Returns: An unpacked sequence of sequences

>>> unpack_singletons((1,), (1, 2), (1, 2, 3))
(1, (1, 2), (1, 2, 3))
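
The example above can be reproduced with a one-line sketch; the function name `unpack_singletons_sketch` is hypothetical.

```python
from typing import Any, Sequence, Tuple


def unpack_singletons_sketch(*xs: Sequence[Any]) -> Tuple[Any, ...]:
    """Replace each length-one sequence by its only element."""
    return tuple(x[0] if len(x) == 1 else x for x in xs)
```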

upgrade_to_sequence(x)[source]

Ensure that the input is a sequence.

Note

While strings are technically also a sequence, i.e.,

isinstance("test", typing.Sequence) is True

this may lead to unexpected behaviour when calling upgrade_to_sequence("test"). We thus handle strings as non-sequences. To recover the other behavior, the following may be used:

upgrade_to_sequence(tuple("test"))

Parameters: x (Union[~X, Sequence[~X]]) – A literal or sequence of literals
Return type: Sequence[~X]
Returns: If a literal was given, a one element tuple with it in it. Otherwise, return the given value.

>>> upgrade_to_sequence(1)
(1,)
>>> upgrade_to_sequence((1, 2, 3))
(1, 2, 3)
>>> upgrade_to_sequence("test")
('test',)
>>> upgrade_to_sequence(tuple("test"))
('t', 'e', 's', 't')
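
The string handling described in the note can be sketched as follows; the function name `upgrade_to_sequence_sketch` is hypothetical.

```python
from collections.abc import Sequence
from typing import Any


def upgrade_to_sequence_sketch(x: Any) -> Sequence:
    """Return sequences unchanged and wrap everything else in a tuple."""
    # Strings are sequences of characters, but are treated as literals here.
    if isinstance(x, Sequence) and not isinstance(x, str):
        return x
    return (x,)
```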

view_complex(x)[source]

Convert a PyKEEN complex tensor representation into a torch one.

Return type: Tensor

env(file=None)[source]

Print the env or output as HTML if in Jupyter.

Parameters: file – The file to print to if not in a Jupyter setting. Defaults to sys.stdout.
Returns: An IPython.display.HTML if in a Jupyter notebook setting, otherwise None.

Version information for PyKEEN.

get_git_branch()[source]

Get the PyKEEN branch, if installed from git in editable mode.

Return type: Optional[str]
Returns: The name of the current branch, or None if not installed in development mode.

get_git_hash(terse=True)[source]

Get the PyKEEN git hash.

Return type: str
Returns: The git hash, or 'UNHASHED' if a CalledProcessError was encountered, signifying that the code is not installed in development mode.

get_version(with_git_hash=False)[source]

Get the PyKEEN version string, optionally including a git hash.

Parameters: with_git_hash (bool) – If set to True, the git hash will be appended to the version.
Return type: str
Returns: The PyKEEN version as well as the git hash, if the parameter with_git_hash was set to true.