# Representation

Embedding modules.

class CombinedCompGCNRepresentations(*, triples_factory, embedding_specification, num_layers=1, dims=None, layer_kwargs=None)[source]

A sequence of CompGCN layers.

Initialize the combined entity and relation representation module.

Parameters
forward()[source]

Compute enriched representations.

Return type

Tuple[FloatTensor, FloatTensor]

split()[source]

Return the separated representations.

Return type
train(mode=True)[source]

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Args:
mode (bool): whether to set training mode (True) or evaluation mode (False). Default: True.

Returns:

Module: self

class CompGCNLayer(input_dim, output_dim=None, dropout=0.0, use_bias=True, use_relation_bias=False, composition=None, activation=<class 'torch.nn.modules.linear.Identity'>, activation_kwargs=None, edge_weighting=<class 'pykeen.nn.weighting.SymmetricEdgeWeighting'>)[source]

A single layer of the CompGCN model.

Initialize the module.

Parameters
forward(x_e, x_r, edge_index, edge_type)[source]

Update entity and relation representations.

$X_E'[e] = \frac{1}{3} \left( X_E[e] W_s + \left( \sum_{(h,r,e) \in T} \alpha(h, e) \phi(X_E[h], X_R[r]) W_f \right) + \left( \sum_{(e,r,t) \in T} \alpha(e, t) \phi(X_E[t], X_R[r^{-1}]) W_b \right) \right)$

Parameters
• x_e (FloatTensor) – shape: (num_entities, input_dim) The entity representations.

• x_r (FloatTensor) – shape: (2 * num_relations, input_dim) The relation representations (including inverse relations).

• edge_index (LongTensor) – shape: (2, num_edges) The edge index, pairs of source and target entity for each triple.

• edge_type (LongTensor) – shape: (num_edges,) The edge type, i.e., relation ID, for each triple.

Return type

Tuple[FloatTensor, FloatTensor]

Returns

shape: (num_entities, output_dim) / (2 * num_relations, output_dim) The updated entity and relation representations.
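The entity update above can be traced by hand with a small dependency-free sketch. This is not PyKEEN's implementation: it assumes the composition φ is element-wise multiplication, fixes the edge weight α to 1, stores the inverse of relation r at index r + num_relations, and uses the hypothetical helper names `matvec`, `compose`, and `compgcn_entity_update`. It covers only the entity update from the formula, not the relation update.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute the row-vector/matrix product x @ w."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def compose(a: Vector, b: Vector) -> Vector:
    """Assumed composition phi: element-wise multiplication."""
    return [ai * bi for ai, bi in zip(a, b)]


def compgcn_entity_update(
    x_e: List[Vector],
    x_r: List[Vector],
    triples: List[Tuple[int, int, int]],
    w_s: Matrix,
    w_f: Matrix,
    w_b: Matrix,
    num_relations: int,
) -> List[Vector]:
    """Apply the entity update once, with alpha fixed to 1 (an assumption)."""
    out_dim = len(w_s[0])
    forward = [[0.0] * out_dim for _ in x_e]
    backward = [[0.0] * out_dim for _ in x_e]
    for h, r, t in triples:
        # Forward message: the head informs the tail, using relation r.
        msg_f = matvec(compose(x_e[h], x_r[r]), w_f)
        forward[t] = [a + b for a, b in zip(forward[t], msg_f)]
        # Backward message: the tail informs the head, using the inverse relation.
        msg_b = matvec(compose(x_e[t], x_r[r + num_relations]), w_b)
        backward[h] = [a + b for a, b in zip(backward[h], msg_b)]
    # Average the self-loop, forward, and backward terms.
    return [
        [(s + f + b) / 3.0 for s, f, b in zip(matvec(x, w_s), forward[e], backward[e])]
        for e, x in enumerate(x_e)
    ]


identity = [[1.0, 0.0], [0.0, 1.0]]
x_e = [[1.0, 2.0], [3.0, 4.0]]
x_r = [[1.0, 1.0], [2.0, 2.0]]  # relation 0 and its assumed inverse at index 1
updated = compgcn_entity_update(
    x_e, x_r, [(0, 0, 1)], identity, identity, identity, num_relations=1
)
```

With the single triple (0, 0, 1), entity 1 receives a forward message from entity 0, and entity 0 receives a backward message from entity 1 through the inverse relation.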

message(x_e, x_r, edge_index, edge_type, weight)[source]

Perform message passing.

Parameters
• x_e (FloatTensor) – shape: (num_entities, input_dim) The entity representations.

• x_r (FloatTensor) – shape: (2 * num_relations, input_dim) The relation representations (including inverse relations).

• edge_index (LongTensor) – shape: (2, num_edges) The edge index, pairs of source and target entity for each triple.

• edge_type (LongTensor) – shape: (num_edges,) The edge type, i.e., relation ID, for each triple.

• weight (Parameter) – The transformation weight.

Return type

FloatTensor

Returns

The updated entity representations.

reset_parameters()[source]

Reset the model’s parameters.

class Embedding(num_embeddings, embedding_dim=None, shape=None, initializer=None, initializer_kwargs=None, normalizer=None, normalizer_kwargs=None, constrainer=None, constrainer_kwargs=None, regularizer=None, regularizer_kwargs=None, trainable=True, dtype=None, dropout=None)[source]

Trainable embeddings.

This class provides the same interface as torch.nn.Embedding and can be used throughout PyKEEN as a more fully featured drop-in replacement.

It extends torch.nn.Embedding with additional options for normalizing, constraining, or applying dropout.

When a normalizer is selected, it is applied in every forward pass. It can be used, e.g., to ensure that the embedding vectors are of unit length. A constrainer can be used similarly, but it is applied after each parameter update (using the post_parameter_update hook), i.e., outside of the automatic gradient computation.
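The difference between the two hooks can be sketched without any dependencies. The class `TinyEmbedding` and the function `unit_length` below are hypothetical stand-ins, not PyKEEN code. They only illustrate that a normalizer changes what a lookup returns, while a constrainer overwrites the stored weights after an update.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


def unit_length(v: Vector) -> Vector:
    """Return a copy of v scaled to Euclidean norm 1 (zero vectors are kept)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


class TinyEmbedding:
    """Hypothetical stand-in for an embedding table; not the PyKEEN class."""

    def __init__(
        self,
        weight: List[Vector],
        normalizer: Optional[Callable[[Vector], Vector]] = None,
        constrainer: Optional[Callable[[Vector], Vector]] = None,
    ) -> None:
        self.weight = [list(row) for row in weight]
        self.normalizer = normalizer
        self.constrainer = constrainer

    def forward(self, indices: List[int]) -> List[Vector]:
        rows = [self.weight[i] for i in indices]
        # Normalizer: applied to the looked-up rows on every forward pass;
        # the stored weights stay untouched.
        if self.normalizer is not None:
            rows = [self.normalizer(row) for row in rows]
        return rows

    def post_parameter_update(self) -> None:
        # Constrainer: overwrites the stored weights, meant to be called
        # after each optimizer step, outside of gradient tracking.
        if self.constrainer is not None:
            self.weight = [self.constrainer(row) for row in self.weight]


normalized = TinyEmbedding([[3.0, 4.0]], normalizer=unit_length)
constrained = TinyEmbedding([[3.0, 4.0]], constrainer=unit_length)

out_normalized = normalized.forward([0])   # unit length, stored weight unchanged
out_before = constrained.forward([0])      # raw weight, constraint not yet applied
constrained.post_parameter_update()
out_after = constrained.forward([0])       # unit length, stored weight was rewritten
```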

The optional dropout can also be used as a regularization technique. Moreover, it makes it possible to obtain uncertainty estimates via techniques such as Monte-Carlo dropout. The following simple example shows how to obtain different scores for a single triple from an (untrained) model. These scores can be considered as samples from a distribution over the scores.

>>> from pykeen.datasets import Nations
>>> dataset = Nations()
>>> from pykeen.nn.emb import EmbeddingSpecification
>>> spec = EmbeddingSpecification(embedding_dim=3, dropout=0.1)
>>> from pykeen.models import ERModel
>>> model = ERModel(
...     triples_factory=dataset.training,
...     interaction='distmult',
...     entity_representations=spec,
...     relation_representations=spec,
... )
>>> import torch
>>> batch = torch.as_tensor(data=[[0, 1, 0]]).repeat(10, 1)
>>> scores = model.score_hrt(batch)
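How such samples could be summarised is sketched below in plain Python, without PyKEEN or PyTorch. The helper names `dropout` and `distmult`, the toy vectors, and the sample count are assumptions made for illustration. The DistMult score is taken as the sum over the element-wise product of head, relation, and tail vectors. Note that dropout layers in PyTorch are only active while the module is in training mode.

```python
import random
import statistics
from typing import List

Vector = List[float]


def dropout(vector: Vector, p: float, rng: random.Random) -> Vector:
    """Inverted dropout: zero each entry with probability p, rescale the rest by 1 / (1 - p)."""
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in vector]


def distmult(h: Vector, r: Vector, t: Vector) -> float:
    """DistMult score: sum of the element-wise product of h, r, and t."""
    return sum(a * b * c for a, b, c in zip(h, r, t))


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
h, r, t = [1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]

# Score the same triple repeatedly, drawing a fresh dropout mask each time.
samples = [
    distmult(dropout(h, 0.1, rng), dropout(r, 0.1, rng), dropout(t, 0.1, rng))
    for _ in range(100)
]

# The spread of the samples serves as a simple uncertainty estimate.
mean_score = statistics.mean(samples)
std_score = statistics.stdev(samples)
```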


Instantiate an embedding with extended functionality.

Parameters
property embedding_dim: int

The representation dimension.

Return type

int

forward(indices=None)[source]

Get representations for indices.

Parameters

indices (Optional[LongTensor]) – shape: s The indices, or None. If None, this is interpreted as torch.arange(self.max_id) (although implemented more efficiently).

Return type

FloatTensor

Returns

shape: (*s, *self.shape) The representations.

classmethod init_with_device(num_embeddings, embedding_dim, device, initializer=None, initializer_kwargs=None, normalizer=None, normalizer_kwargs=None, constrainer=None, constrainer_kwargs=None)[source]

Create an embedding object on the given device by wrapping __init__().

This method is a hotfix for not being able to pass a device during initialization of torch.nn.Embedding. Instead the weight is always initialized on CPU and has to be moved to GPU afterwards.

Return type

Embedding

Returns

The embedding.

property num_embeddings: int

The total number of representations (i.e. the maximum ID).

Return type

int

post_parameter_update()[source]

Apply constraints which should not be included in gradients.

reset_parameters()[source]

Reset the module’s parameters.

Return type

None

class EmbeddingSpecification(embedding_dim=None, shape=None, initializer=None, initializer_kwargs=None, normalizer=None, normalizer_kwargs=None, constrainer=None, constrainer_kwargs=None, regularizer=None, regularizer_kwargs=None, dtype=None, dropout=None)[source]

An embedding specification.

make(*, num_embeddings, device=None)[source]

Create an embedding with this specification.

Return type

Embedding

class LiteralRepresentation(numeric_literals)[source]

Literal representations.

Instantiate an embedding with extended functionality.

Parameters
• num_embeddings – >0 The number of embeddings.

• embedding_dim – >0 The embedding dimensionality.

• initializer

An optional initializer, which takes an uninitialized (num_embeddings, embedding_dim) tensor as input, and returns an initialized tensor of same shape and dtype (which may be the same, i.e. the initialization may be in-place). Can be passed as a function, or as string corresponding to a key in pykeen.nn.emb.initializers such as:

• "xavier_uniform"

• "xavier_uniform_norm"

• "xavier_normal"

• "xavier_normal_norm"

• "normal"

• "normal_norm"

• "uniform"

• "uniform_norm"

• "init_phases"

• initializer_kwargs – Additional keyword arguments passed to the initializer

• normalizer – A normalization function, which is applied in every forward pass.

• normalizer_kwargs – Additional keyword arguments passed to the normalizer

• constrainer

A function which is applied to the weights after each parameter update, without tracking gradients. It may be used to enforce model constraints outside of gradient-based training. The function does not need to be in-place, but the weight tensor is modified in-place. Can be passed as a function, or as a string corresponding to a key in pykeen.nn.emb.constrainers such as:

• 'normalize'

• 'complex_normalize'

• 'clamp'

• 'clamp_norm'

• constrainer_kwargs – Additional keyword arguments passed to the constrainer

• regularizer – A regularizer, which is applied to the selected embeddings in forward pass

• regularizer_kwargs – Additional keyword arguments passed to the regularizer

• dropout – A dropout value for the embeddings.
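The pattern of passing either a function or a string key, together with separate keyword arguments, can be sketched as follows. The registry `CONSTRAINERS`, the helper `resolve`, and the simplified `clamp_norm` are hypothetical and do not reproduce PyKEEN's lookup code or the signature of its constrainers. They only show how a name and its kwargs might combine into one callable.

```python
import functools
import math
from typing import Any, Callable, Dict, List, Mapping, Optional, Union

Vector = List[float]


def clamp_norm(v: Vector, maxnorm: float) -> Vector:
    """Simplified stand-in: rescale v so that its Euclidean norm is at most maxnorm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= maxnorm:
        return list(v)
    return [x * maxnorm / norm for x in v]


# Hypothetical registry mapping string keys to functions.
CONSTRAINERS: Dict[str, Callable[..., Vector]] = {"clamp_norm": clamp_norm}


def resolve(
    fn_or_name: Union[str, Callable[..., Vector]],
    registry: Mapping[str, Callable[..., Vector]],
    kwargs: Optional[Mapping[str, Any]] = None,
) -> Callable[[Vector], Vector]:
    """Look up a string key in the registry and bind the keyword arguments."""
    fn = registry[fn_or_name] if isinstance(fn_or_name, str) else fn_or_name
    return functools.partial(fn, **dict(kwargs or {}))


# By name, with kwargs bound once at construction time.
by_name = resolve("clamp_norm", CONSTRAINERS, {"maxnorm": 1.0})
clamped = by_name([3.0, 4.0])

# By function object; the registry is not consulted in this case.
by_function = resolve(clamp_norm, CONSTRAINERS, {"maxnorm": 10.0})
unchanged = by_function([3.0, 4.0])
```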

class RGCNRepresentations(triples_factory, embedding_specification, num_layers=2, use_bias=True, use_batch_norm=False, activation=None, activation_kwargs=None, edge_dropout=0.4, self_loop_dropout=0.2, edge_weighting=None, decomposition=None, decomposition_kwargs=None)[source]

Entity representations enriched by R-GCN.

The GCN employed by the entity encoder is adapted to include typed edges. The forward pass of the GCN is defined by:

$\textbf{e}_{i}^{l+1} = \sigma \left( \sum_{r \in \mathcal{R}}\sum_{j\in \mathcal{N}_{i}^{r}} \frac{1}{c_{i,r}} \textbf{W}_{r}^{l} \textbf{e}_{j}^{l} + \textbf{W}_{0}^{l} \textbf{e}_{i}^{l}\right)$

where $$\mathcal{N}_{i}^{r}$$ is the set of neighbors of node $$i$$ that are connected to $$i$$ by relation $$r$$, $$c_{i,r}$$ is a fixed normalization constant (but it can also be introduced as an additional parameter), and $$\textbf{W}_{r}^{l} \in \mathbb{R}^{d^{(l)} \times d^{(l)}}$$ and $$\textbf{W}_{0}^{l} \in \mathbb{R}^{d^{(l)} \times d^{(l)}}$$ are weight matrices of the l-th layer of the R-GCN.

The encoder aggregates for each node $$e_i$$ the latent representations of its neighbors and its own latent representation $$e_{i}^{l}$$ into a new latent representation $$e_{i}^{l+1}$$. In contrast to standard GCN, R-GCN defines relation specific transformations $$\textbf{W}_{r}^{l}$$ which depend on the type and direction of an edge.

Since having one matrix for each relation introduces a large number of additional parameters, the authors instead propose to use a decomposition, cf. pykeen.nn.message_passing.Decomposition.
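The forward pass defined above, without any decomposition, can be sketched for a single layer in plain Python. The sketch makes several assumptions that the text leaves open: σ is ReLU, the normalization constant is c_{i,r} = |N_i^r|, only incoming edges (j, r, i) are aggregated, and the names `rgcn_layer`, `matvec`, and `relu` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Compute the matrix/vector product w @ x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    """Assumed activation sigma."""
    return [max(a, 0.0) for a in v]


def rgcn_layer(
    x: List[Vector],
    triples: List[Tuple[int, int, int]],
    w_rel: List[Matrix],
    w_self: Matrix,
) -> List[Vector]:
    """Apply one R-GCN layer with one full weight matrix per relation."""
    # incoming[i][r] collects the neighbors j with an edge (j, r, i).
    incoming: Dict[int, Dict[int, List[int]]] = defaultdict(lambda: defaultdict(list))
    for j, r, i in triples:
        incoming[i][r].append(j)
    output = []
    for i, x_i in enumerate(x):
        total = matvec(w_self, x_i)  # self-connection W_0 e_i
        for r, neighbors in incoming[i].items():
            c_ir = len(neighbors)  # assumed normalization constant
            for j in neighbors:
                message = matvec(w_rel[r], x[j])  # relation-specific transformation
                total = [t + m / c_ir for t, m in zip(total, message)]
        output.append(relu(total))
    return output


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
triples = [(0, 0, 2), (1, 0, 2)]  # nodes 0 and 1 both point to node 2 via relation 0
w_rel = [[[2.0, 0.0], [0.0, 2.0]]]
w_self = [[1.0, 0.0], [0.0, 1.0]]
enriched = rgcn_layer(x, triples, w_rel, w_self)
```

Nodes 0 and 1 have no incoming edges, so only their self-connection contributes. Node 2 additionally receives the averaged, transformed representations of its two neighbors.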

Instantiate the R-GCN encoder.

Parameters
forward(indices=None)[source]

Enrich the entity embeddings of the decoder using R-GCN message propagation.

Return type

FloatTensor

post_parameter_update()[source]

Apply constraints which should not be included in gradients.

Return type

None

reset_parameters()[source]

Reset the module’s parameters.

class RepresentationModule(max_id, shape)[source]

A base class for obtaining representations for entities/relations.

A representation module maps integer IDs to representations, which are tensors of floats.

max_id defines the upper bound of indices we are allowed to request (exclusively). For simple embeddings this is equivalent to num_embeddings, but it is a more appropriate word for general non-embedding representations, where the representations could come from somewhere else, e.g. a GNN encoder.

shape describes the shape of a single representation. In case of a vector embedding, this is just a single dimension. For others, e.g. pykeen.models.RESCAL, we have 2-d representations, and in general it can be any fixed shape.

We can look at all representations as a tensor of shape (max_id, *shape), and this is exactly the result of passing indices=None to the forward method.

We can also pass multi-dimensional indices to the forward method, in which case the indices’ shape becomes the prefix of the result shape: (*indices.shape, *self.shape).
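The shape rule can be checked with nested Python lists standing in for tensors. The helpers `lookup` and `shape_of` are hypothetical and only mimic the indexing behaviour described above. They are not part of PyKEEN.

```python
from typing import Any, Tuple


def lookup(table: list, indices: Any) -> Any:
    """Index a table of representations with arbitrarily nested integer indices."""
    if indices is None:
        # Equivalent to requesting every ID in range(max_id).
        return list(table)
    if isinstance(indices, int):
        return table[indices]
    return [lookup(table, i) for i in indices]


def shape_of(nested: Any) -> Tuple[int, ...]:
    """Return the shape of a regular, non-empty nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


max_id, shape = 4, (2, 3)
# One 2 x 3 representation per ID, e.g. a small matrix per entity.
table = [[[float(i)] * shape[1] for _ in range(shape[0])] for i in range(max_id)]

all_shape = shape_of(lookup(table, None))                       # (max_id, *shape)
batch_shape = shape_of(lookup(table, [[0, 1, 2], [3, 0, 1]]))   # (*indices.shape, *shape)
```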

Initialize the representation module.

Parameters
property embedding_dim: int

Return the “embedding dimension”. Kept for backward compatibility.

Return type

int

abstract forward(indices=None)[source]

Get representations for indices.

Parameters

indices (Optional[LongTensor]) – shape: s The indices, or None. If None, this is interpreted as torch.arange(self.max_id) (although implemented more efficiently).

Return type

FloatTensor

Returns

shape: (*s, *self.shape) The representations.

get_in_canonical_shape(indices=None)[source]

Get representations in canonical shape.

Parameters

indices (Optional[LongTensor]) – None, shape: (b,) or (b, n) The indices. If None, return all representations.

Return type

FloatTensor

Returns

shape: (b?, n?, d) If indices is None, b=1, n=max_id. If indices is 1-dimensional, b=indices.shape[0] and n=1. If indices is 2-dimensional, b, n = indices.shape
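The three cases can be written down as a small shape calculator. The function `canonical_shape` is a hypothetical helper that restates the rule above for a vector representation of dimension d. It does not call PyKEEN.

```python
from typing import Optional, Tuple


def canonical_shape(
    indices_shape: Optional[Tuple[int, ...]], max_id: int, d: int
) -> Tuple[int, int, int]:
    """Return the (b, n, d) shape for the given shape of the indices."""
    if indices_shape is None:
        # All representations: a single "batch" containing every ID.
        return (1, max_id, d)
    if len(indices_shape) == 1:
        # One representation per batch element.
        return (indices_shape[0], 1, d)
    if len(indices_shape) == 2:
        b, n = indices_shape
        return (b, n, d)
    raise ValueError("indices must be None, 1-dimensional, or 2-dimensional")


shape_all = canonical_shape(None, max_id=10, d=20)
shape_1d = canonical_shape((5,), max_id=10, d=20)
shape_2d = canonical_shape((2, 3), max_id=10, d=20)
```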

get_in_more_canonical_shape(dim, indices=None)[source]

Get representations in canonical shape.

The canonical shape is given as

(batch_size, d_1, d_2, d_3, *)

fulfilling the following properties:

Let i = dim. If indices is None, the return shape is (1, d_1, d_2, d_3) with d_i = num_representations and d_j = 1 for all j ≠ i. If indices is not None, then batch_size = indices.shape[0], and d_i = 1 if indices.ndimension() = 1 else d_i = indices.shape[1].

The canonical shape is given by (batch_size, 1, *) if indices is not None, where batch_size=len(indices), or (1, num, *) if indices is None with num equal to the total number of embeddings.

Examples:

>>> import torch
>>> from pykeen.nn.emb import EmbeddingSpecification
>>> emb = EmbeddingSpecification(shape=(20,)).make(num_embeddings=10)
>>> # Get head representations for given batch indices
>>> emb.get_in_more_canonical_shape(dim="h", indices=torch.arange(5)).shape
(5, 1, 1, 1, 20)
>>> # Get head representations for given 2D batch indices, as e.g. used by fast sLCWA scoring
>>> emb.get_in_more_canonical_shape(dim="h", indices=torch.arange(6).view(2, 3)).shape
(2, 3, 1, 1, 20)
>>> # Get head representations for 1:n scoring
>>> emb.get_in_more_canonical_shape(dim="h", indices=None).shape
(1, 10, 1, 1, 20)

Parameters
Return type

FloatTensor

Returns

shape: (batch_size, d1, d2, d3, *self.shape)
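The shape rule can be restated as a small calculator that reproduces the shapes from the examples. The function `more_canonical_shape` is hypothetical, and the mapping of "h", "r", and "t" to axes 1, 2, and 3 is an assumption inferred from those examples rather than taken from the implementation.

```python
from typing import Optional, Tuple, Union

# Assumed mapping from dimension names to axes of the canonical shape.
DIM_TO_AXIS = {"h": 1, "r": 2, "t": 3}


def more_canonical_shape(
    dim: Union[int, str],
    indices_shape: Optional[Tuple[int, ...]],
    max_id: int,
    rep_shape: Tuple[int, ...],
) -> Tuple[int, ...]:
    """Return (batch_size, d_1, d_2, d_3, *rep_shape) for the given indices shape."""
    axis = DIM_TO_AXIS[dim] if isinstance(dim, str) else dim
    dims = [1, 1, 1, 1]
    if indices_shape is None:
        # All representations are placed along the selected axis.
        dims[axis] = max_id
    else:
        dims[0] = indices_shape[0]
        if len(indices_shape) > 1:
            dims[axis] = indices_shape[1]
    return (*dims, *rep_shape)


head_1d = more_canonical_shape("h", (5,), max_id=10, rep_shape=(20,))
head_2d = more_canonical_shape("h", (2, 3), max_id=10, rep_shape=(20,))
head_all = more_canonical_shape("h", None, max_id=10, rep_shape=(20,))
```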

max_id: int

the maximum ID (exclusively)

post_parameter_update()[source]

Apply constraints which should not be included in gradients.

reset_parameters()[source]

Reset the module’s parameters.

Return type

None

shape: Tuple[int, ...]

the shape of an individual representation

class SingleCompGCNRepresentation(combined, position=0)[source]

A wrapper around the combined representation module.

Initialize the module.

Parameters
forward(indices=None)[source]

Get representations for indices.

Parameters

indices (Optional[LongTensor]) – shape: s The indices, or None. If None, this is interpreted as torch.arange(self.max_id) (although implemented more efficiently).

Return type

FloatTensor

Returns

shape: (*s, *self.shape) The representations.