Representation

Representation modules.

class CombinedCompGCNRepresentations(*, triples_factory, entity_representations=None, entity_representations_kwargs=None, relation_representations=None, relation_representations_kwargs=None, num_layers=1, dims=None, layer_kwargs=None)[source]

A sequence of CompGCN layers.

Initialize the combined entity and relation representation module.

Parameters

triples_factory (CoreTriplesFactory) – The triples factory containing the training triples.
entity_representations (Union[str, Representation, Type[Representation], None]) – the base entity representations
entity_representations_kwargs (Optional[Mapping[str, Any]]) – additional keyword parameters for the base entity representations
relation_representations (Union[str, Representation, Type[Representation], None]) – the base relation representations
relation_representations_kwargs (Optional[Mapping[str, Any]]) – additional keyword parameters for the base relation representations
num_layers (Optional[int]) – The number of message passing layers to use. If None, it is inferred from len(dims), which requires dims to be a sequence / list.
dims (Union[int, Sequence[int], None]) – The hidden dimensions to use. If None, defaults to the embedding dimension of the base representations. If an integer, the same dimension is used for all layers. The last dimension is equal to the output dimension.
layer_kwargs (Optional[Mapping[str, Any]]) – Additional keyword-based parameters passed to the individual layers; cf. CompGCNLayer.

Raises

ValueError – for several invalid combinations of arguments:
1. If the dimensions were given as an integer but no number of layers was given.
2. If the dimensions were given as a list whose length does not match the given number of layers.
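The interplay of num_layers and dims can be sketched in plain Python. The helper name resolve_dims and its base_dim argument are hypothetical; the function only mirrors the rules documented above and is not taken from the PyKEEN implementation.

```python
from typing import List, Optional, Sequence, Union


def resolve_dims(
    base_dim: int,
    num_layers: Optional[int] = 1,
    dims: Union[None, int, Sequence[int]] = None,
) -> List[int]:
    """Return one output dimension per layer (hypothetical helper)."""
    if dims is None or isinstance(dims, int):
        # A single width (or none) cannot tell us how many layers to build.
        if num_layers is None:
            raise ValueError("num_layers is required unless dims is a sequence")
        # dims=None falls back to the dimension of the base representations.
        width = base_dim if dims is None else dims
        return [width] * num_layers
    dims = list(dims)
    # A sequence must agree with an explicitly given number of layers.
    if num_layers is not None and len(dims) != num_layers:
        raise ValueError(f"got {len(dims)} dims for {num_layers} layers")
    return dims


print(resolve_dims(16))                               # default: one layer
print(resolve_dims(16, num_layers=2, dims=8))         # same width for all layers
print(resolve_dims(16, num_layers=None, dims=[32, 8]))  # layers inferred from dims
```

The last entry of the returned list plays the role of the output dimension.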

forward()[source]

Compute enriched representations.

Return type: Tuple[FloatTensor, FloatTensor]

split()[source]

Return the separated representations.

Return type: Tuple[SingleCompGCNRepresentation, SingleCompGCNRepresentation]

train(mode=True)[source]

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Args:

mode (bool): whether to set training mode (True) or evaluation mode (False). Default: True.

Returns:

Module: self

class CompGCNLayer(input_dim, output_dim=None, dropout=0.0, use_bias=True, use_relation_bias=False, composition=None, attention_heads=4, attention_dropout=0.1, activation=<class 'torch.nn.modules.linear.Identity'>, activation_kwargs=None, edge_weighting=<class 'pykeen.nn.weighting.SymmetricEdgeWeighting'>)[source]

A single layer of the CompGCN model.

Initialize the module.

Parameters

input_dim (int) – The input dimension.
output_dim (Optional[int]) – The output dimension. If None, equals the input dimension.
dropout (float) – The dropout to use for forward and backward edges.
use_bias (bool) – Whether to use bias. Note that the bias comes before a mandatory batch norm layer, so it may be redundant.
use_relation_bias (bool) – Whether to use a bias for the relation transformation.
composition (Union[str, CompositionModule, None]) – The composition function.
attention_heads (int) – Number of attention heads when using the attention weighting
attention_dropout (float) – Dropout for the attention message weighting
activation (Union[str, Module, None]) – The activation to use.
activation_kwargs (Optional[Mapping[str, Any]]) – Additional keyword-based arguments passed to the activation.
edge_weighting (Union[str, Type[EdgeWeighting], None]) – A pre-instantiated EdgeWeighting, a class, or name to look up with class_resolver.

forward(x_e, x_r, edge_index, edge_type)[source]

Update entity and relation representations.

\[X_E'[e] = \frac{1}{3} \left( X_E[e] W_s + \left( \sum_{(h,r,e) \in T} \alpha(h, e) \phi(X_E[h], X_R[r]) W_f \right) + \left( \sum_{(e,r,t) \in T} \alpha(e, t) \phi(X_E[t], X_R[r^{-1}]) W_b \right) \right)\]

Parameters

x_e (FloatTensor) – shape: (num_entities, input_dim) The entity representations.
x_r (FloatTensor) – shape: (2 * num_relations, input_dim) The relation representations (including inverse relations).
edge_index (LongTensor) – shape: (2, num_edges) The edge index, pairs of source and target entity for each triple.
edge_type (LongTensor) – shape: (num_edges,) The edge type, i.e., relation ID, for each triple.

Return type

Tuple[FloatTensor, FloatTensor]

Returns

shape: (num_entities, output_dim) / (2 * num_relations, output_dim) The updated entity and relation representations.
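The entity update formula above can be traced by hand on a tiny graph. The following is a minimal sketch in plain Python, assuming element-wise multiplication for the composition function and a constant edge weight of 1; all function names are hypothetical, the inverse relation vectors are passed as a separate list for readability, and dropout, bias, batch normalization, activation, and the relation update are left out.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(x: Vector, w: Matrix) -> Vector:
    """Row vector times matrix: (d_in,) @ (d_in, d_out) -> (d_out,)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def multiply(a: Vector, b: Vector) -> Vector:
    """Element-wise product, one possible composition function phi (assumption)."""
    return [u * v for u, v in zip(a, b)]


def compgcn_entity_update(
    x_e: List[Vector],
    x_r: List[Vector],
    x_r_inv: List[Vector],
    triples: Sequence[Tuple[int, int, int]],
    w_s: Matrix,
    w_f: Matrix,
    w_b: Matrix,
    phi: Callable[[Vector, Vector], Vector] = multiply,
    alpha: Callable[[int, int], float] = lambda source, target: 1.0,
) -> List[Vector]:
    """Apply the documented update to every entity e."""
    updated = []
    for e in range(len(x_e)):
        total = matvec(x_e[e], w_s)  # self-loop term X_E[e] W_s
        for h, r, t in triples:
            if t == e:  # triple (h, r, e): message along the forward edge
                msg = matvec(phi(x_e[h], x_r[r]), w_f)
                total = [a + alpha(h, e) * m for a, m in zip(total, msg)]
            if h == e:  # triple (e, r, t): message along the inverse relation
                msg = matvec(phi(x_e[t], x_r_inv[r]), w_b)
                total = [a + alpha(e, t) * m for a, m in zip(total, msg)]
        updated.append([value / 3 for value in total])  # the 1/3 factor
    return updated


identity = [[1.0, 0.0], [0.0, 1.0]]
result = compgcn_entity_update(
    x_e=[[1.0, 2.0], [3.0, 4.0]],
    x_r=[[1.0, 1.0]],
    x_r_inv=[[2.0, 2.0]],
    triples=[(0, 0, 1)],
    w_s=identity,
    w_f=identity,
    w_b=identity,
)
print(result)
```

With the single triple (0, 0, 1), entity 1 receives a forward message from entity 0, and entity 0 receives a backward message from entity 1 through the inverse relation.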

message(x_e, x_r, edge_index, edge_type, weight)[source]

Perform message passing.

Parameters

x_e (FloatTensor) – shape: (num_entities, input_dim) The entity representations.
x_r (FloatTensor) – shape: (2 * num_relations, input_dim) The relation representations (including inverse relations).
edge_index (LongTensor) – shape: (2, num_edges) The edge index, pairs of source and target entity for each triple.
edge_type (LongTensor) – shape: (num_edges,) The edge type, i.e., relation ID, for each triple.
weight (Parameter) – The transformation weight.

Return type

FloatTensor

Returns

The updated entity representations.
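Both forward and message describe the graph through edge_index and edge_type. As a rough illustration of those shapes, the sketch below converts a list of (head, relation, tail) triples into nested lists with the documented layout. The helper name triples_to_edges is hypothetical, and plain lists stand in for LongTensors.

```python
from typing import List, Sequence, Tuple


def triples_to_edges(
    triples: Sequence[Tuple[int, int, int]],
) -> Tuple[List[List[int]], List[int]]:
    """Split (h, r, t) triples into an edge index and an edge type list."""
    sources = [h for h, _, _ in triples]
    targets = [t for _, _, t in triples]
    edge_index = [sources, targets]         # shape: (2, num_edges)
    edge_type = [r for _, r, _ in triples]  # shape: (num_edges,)
    return edge_index, edge_type


edge_index, edge_type = triples_to_edges([(0, 1, 2), (2, 0, 1)])
print(edge_index)
print(edge_type)
```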

reset_parameters()[source]: Reset the model’s parameters.

class Embedding(max_id=None, num_embeddings=None, embedding_dim=None, shape=None, initializer=None, initializer_kwargs=None, constrainer=None, constrainer_kwargs=None, trainable=True, dtype=None, **kwargs)[source]

Trainable embeddings.

This class provides the same interface as torch.nn.Embedding and can be used throughout PyKEEN as a more fully featured drop-in replacement.

It extends torch.nn.Embedding with additional options for normalizing, constraining, or applying dropout.

When a normalizer is selected, it is applied in every forward pass. It can be used, e.g., to ensure that the embedding vectors are of unit length. A constrainer can be used similarly, but it is applied after each parameter update (using the post_parameter_update hook), i.e., outside of the automatic gradient computation.
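The difference between the two hooks is easiest to see on a toy object. The sketch below is a simplified stand-in written in plain Python, not the PyKEEN Embedding class; ToyEmbedding and unit_length are hypothetical names. The normalizer changes only what forward returns, while the constrainer overwrites the stored weights when post_parameter_update is called.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def unit_length(x: Vector) -> Vector:
    """Scale a vector to Euclidean norm 1 (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


class ToyEmbedding:
    """Hypothetical stand-in used only to contrast the two hooks."""

    def __init__(
        self,
        weights: Sequence[Vector],
        normalizer: Optional[Callable[[Vector], Vector]] = None,
        constrainer: Optional[Callable[[Vector], Vector]] = None,
    ) -> None:
        self.weights = [list(row) for row in weights]
        self.normalizer = normalizer
        self.constrainer = constrainer

    def forward(self, indices: Sequence[int]) -> List[Vector]:
        rows = [list(self.weights[i]) for i in indices]
        if self.normalizer is not None:
            # Applied to the looked-up values on every call; stored weights stay untouched.
            rows = [self.normalizer(row) for row in rows]
        return rows

    def post_parameter_update(self) -> None:
        if self.constrainer is not None:
            # Applied to the stored weights themselves, e.g. after an optimizer step.
            self.weights = [self.constrainer(row) for row in self.weights]


normalized = ToyEmbedding([[3.0, 4.0]], normalizer=unit_length)
print(normalized.forward([0]), normalized.weights)

constrained = ToyEmbedding([[3.0, 4.0]], constrainer=unit_length)
constrained.post_parameter_update()
print(constrained.forward([0]), constrained.weights)
```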

The optional dropout can also be used as a regularization technique. Moreover, it makes it possible to obtain uncertainty estimates via techniques such as Monte-Carlo dropout. The following simple example shows how to obtain different scores for a single triple from an (untrained) model. These scores can be considered as samples from a distribution over the scores.

>>> from pykeen.datasets import Nations
>>> dataset = Nations()
>>> from pykeen.models import ERModel
>>> model = ERModel(
...     triples_factory=dataset.training,
...     interaction='distmult',
...     entity_representations_kwargs=dict(embedding_dim=3, dropout=0.1),
...     relation_representations_kwargs=dict(embedding_dim=3, dropout=0.1),
... )
>>> import torch
>>> batch = torch.as_tensor(data=[[0, 1, 0]]).repeat(10, 1)
>>> scores = model.score_hrt(batch)

Instantiate an embedding with extended functionality.

Parameters

max_id (Optional[int]) – >0 The number of embeddings.
num_embeddings (Optional[int]) – >0 The number of embeddings.
embedding_dim (Optional[int]) – >0 The embedding dimensionality.
shape (Union[int, Sequence[int], None]) – The shape of an individual representation.
initializer (Union[str, Callable[[FloatTensor], FloatTensor], None]) –
An optional initializer, which takes an uninitialized (num_embeddings, embedding_dim) tensor as input, and returns an initialized tensor of the same shape and dtype (which may be the same tensor, i.e. the initialization may be in-place). Can be passed as a function, or as a string corresponding to a key in pykeen.nn.representation.initializers such as:
- "xavier_uniform"
- "xavier_uniform_norm"
- "xavier_normal"
- "xavier_normal_norm"
- "normal"
- "normal_norm"
- "uniform"
- "uniform_norm"
- "init_phases"
initializer_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to the initializer
constrainer (Union[str, Callable[[FloatTensor], FloatTensor], None]) –
A function which is applied to the weights after each parameter update, without tracking gradients. It may be used to enforce model constraints outside of gradient-based training. The function does not need to be in-place, but the weight tensor is modified in-place. Can be passed as a function, or as a string corresponding to a key in pykeen.nn.representation.constrainers such as:
- 'normalize'
- 'complex_normalize'
- 'clamp'
- 'clamp_norm'
constrainer_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to the constrainer
trainable (bool) – Whether the wrapped embeddings should be marked as requiring gradients. Defaults to True.
dtype (Optional[dtype]) – The datatype (otherwise uses torch.get_default_dtype() to look up)
kwargs – additional keyword-based parameters passed to Representation.__init__

property embedding_dim: int

The representation dimension.

Return type: int

property num_embeddings: int

The total number of representations (i.e. the maximum ID).

Return type: int

post_parameter_update()[source]: Apply constraints which should not be included in gradients.

reset_parameters()[source]

Reset the module’s parameters.

Return type: None

class LabelBasedTransformerRepresentation(labels, pretrained_model_name_or_path='bert-base-cased', max_length=512, **kwargs)[source]

Label-based representations using a transformer encoder.

Example Usage:

Entity representations are obtained by encoding the labels with a Transformer model. The transformer model becomes part of the KGE model, and its parameters are trained jointly.

from pykeen.datasets import get_dataset
from pykeen.nn.representation import EmbeddingSpecification, LabelBasedTransformerRepresentation
from pykeen.models import ERModel

dataset = get_dataset(dataset="nations")
entity_representations = LabelBasedTransformerRepresentation.from_triples_factory(
    triples_factory=dataset.training,
)
model = ERModel(
    interaction="ermlp",
    entity_representations=entity_representations,
    relation_representations=EmbeddingSpecification(shape=entity_representations.shape),
)

Initialize the representation.

Parameters

labels (Sequence[str]) – the labels
pretrained_model_name_or_path (str) – the name of the pretrained model, or a path, cf. AutoModel.from_pretrained
max_length (int) – >0 the maximum number of tokens to pad/trim the labels to
kwargs – additional keyword-based parameters passed to super.__init__

classmethod from_triples_factory(triples_factory, for_entities=True, **kwargs)[source]

Prepare a label-based transformer representation with labels from a triples factory.

Parameters

triples_factory (TriplesFactory) – the triples factory
for_entities (bool) – whether to create the representation for entities (or relations)
kwargs – additional keyword-based arguments passed to LabelBasedTransformerRepresentation.__init__()

Return type

LabelBasedTransformerRepresentation

Returns

A label-based transformer from the triples factory

Raises

ImportError – if the transformers library could not be imported

class LowRankRepresentation(*, max_id, shape, num_bases=3, weight_initializer=<pykeen.utils.compose object>, **kwargs)[source]

Low-rank embedding factorization.

This representation reduces the number of trainable parameters by not learning independent weights for each index, but rather sharing bases among all indices and learning only the weights of the linear combination.

\[E[i] = \sum_k B[i, k] * W[k]\]
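A small worked example of the formula, in plain Python with nested lists instead of tensors. It reads B as the per-index combination weights and W as the shared bases, which is what the indexing in the formula suggests; the function names are hypothetical, and parameter_counts assumes vector-shaped bases.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def low_rank_lookup(b: Matrix, w: Matrix, i: int) -> List[float]:
    """Compute E[i] = sum_k B[i, k] * W[k] for one index i."""
    dim = len(w[0])
    return [sum(b[i][k] * w[k][d] for k in range(len(w))) for d in range(dim)]


def parameter_counts(max_id: int, dim: int, num_bases: int) -> Tuple[int, int]:
    """Compare a full embedding table with the low-rank factorization."""
    full = max_id * dim
    low_rank = num_bases * dim + max_id * num_bases  # bases plus mixing weights
    return full, low_rank


bases = [[2.0, 4.0], [6.0, 8.0]]    # W: (num_bases, dim)
mixing = [[1.0, 0.0], [0.5, 0.5]]   # B: (max_id, num_bases)
print(low_rank_lookup(mixing, bases, 0))
print(low_rank_lookup(mixing, bases, 1))
print(parameter_counts(max_id=1000, dim=64, num_bases=3))
```

Index 0 selects the first base unchanged, while index 1 averages the two bases. The parameter saving grows with max_id as long as num_bases stays small relative to the dimension.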

Initialize the representations.

Parameters

max_id (int) – the maximum ID (exclusive). Valid IDs are 0, …, max_id-1
shape (Union[int, Sequence[int]]) – the shape of an individual base representation.
num_bases (int) – the number of bases. More bases increase expressivity, but also increase the number of trainable parameters.
weight_initializer (Callable[[FloatTensor], FloatTensor]) – the initializer for basis weights
kwargs – additional keyword based arguments passed to pykeen.nn.representation.Embedding, which is used for the base representations.

reset_parameters()[source]

Reset the module’s parameters.

Return type: None

class Representation(max_id, shape, normalizer=None, normalizer_kwargs=None, regularizer=None, regularizer_kwargs=None, dropout=None)[source]

A base class for obtaining representations for entities/relations.

A representation module maps integer IDs to representations, which are tensors of floats.

max_id defines the (exclusive) upper bound of the indices we are allowed to request. For simple embeddings this is equivalent to num_embeddings, but it is a more appropriate term for general non-embedding representations, where the representations could come from somewhere else, e.g. a GNN encoder.

shape describes the shape of a single representation. In case of a vector embedding, this is just a single dimension. For others, e.g. pykeen.models.RESCAL, we have 2-d representations, and in general it can be any fixed shape.

We can look at all representations as a tensor of shape (max_id, *shape), and this is exactly the result of passing indices=None to the forward method.

We can also pass multi-dimensional indices to the forward method, in which case the indices’ shape becomes the prefix of the result shape: (*indices.shape, *self.shape).
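The shape rules above can be checked with nested lists standing in for tensors. This is only a sketch of the indexing semantics; the helpers lookup and shape_of are hypothetical and unrelated to the actual implementation.

```python
from typing import Any, Tuple


def lookup(table: list, indices: Any) -> Any:
    """Index a (max_id, *shape) table with None, an int, or nested lists of ints."""
    if indices is None:
        return table  # same as requesting every index 0, ..., max_id - 1
    if isinstance(indices, int):
        return table[indices]
    return [lookup(table, item) for item in indices]


def shape_of(nested: Any) -> Tuple[int, ...]:
    """Shape of a regular, non-empty nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


# max_id = 5, shape = (2,)
table = [[float(i), float(-i)] for i in range(5)]

print(shape_of(lookup(table, None)))                     # (max_id, *shape)
print(shape_of(lookup(table, [[0, 1, 4], [2, 2, 3]])))   # (*indices.shape, *shape)
```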

Initialize the representation module.

Parameters

max_id (int) – The maximum ID (exclusive). Valid IDs are 0, …, max_id-1
shape (Union[int, Sequence[int]]) – The shape of an individual representation.
normalizer (Union[str, Callable[[FloatTensor], FloatTensor], Type[Callable[[FloatTensor], FloatTensor]], None]) – A normalization function, which is applied to the selected representations in every forward pass.
normalizer_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to the normalizer
regularizer (Union[str, Regularizer, Type[Regularizer], None]) – An output regularizer, which is applied to the selected representations in the forward pass.
regularizer_kwargs (Optional[Mapping[str, Any]]) – Additional keyword arguments passed to the regularizer
dropout (Optional[float]) – The optional dropout probability

property device: torch.device

Return the device.

Return type: device

dropout: Optional[nn.Dropout]: dropout

property embedding_dim: int

Return the “embedding dimension”. Kept for backward compatibility.

Return type: int

forward(indices=None)[source]

Get representations for indices.

Note

This method is implemented in subclasses. Prefer forward_unique, which optimizes for duplicate indices.

Parameters: indices (Optional[LongTensor]) – shape: s The indices, or None. If None, this is interpreted as torch.arange(self.max_id) (although implemented more efficiently).
Return type: FloatTensor
Returns: shape: (*s, *self.shape) The representations.

forward_unique(indices=None)[source]

Get representations for indices.

Parameters: indices (Optional[LongTensor]) – shape: s The indices, or None. If None, this is interpreted as torch.arange(self.max_id) (although implemented more efficiently).
Return type: FloatTensor
Returns: shape: (*s, *self.shape) The representations.
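One way to optimize for duplicate indices is to compute each distinct index once and then copy the result back to every position that requested it. The sketch below illustrates that idea for a flat list of indices; it is an assumption about the general technique, not a copy of the library's code, and it does not handle indices=None or multi-dimensional indices.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def forward_unique(
    forward: Callable[[Sequence[int]], List[Vector]],
    indices: Sequence[int],
) -> List[Vector]:
    """Call `forward` on the distinct indices only, then gather per position."""
    unique = sorted(set(indices))
    position = {index: p for p, index in enumerate(unique)}
    unique_reps = forward(unique)  # one computation per distinct index
    return [unique_reps[position[i]] for i in indices]


calls: List[List[int]] = []


def expensive_forward(idx: Sequence[int]) -> List[Vector]:
    """Stand-in for a costly representation lookup; records what was requested."""
    calls.append(list(idx))
    return [[float(i), float(i) * 10] for i in idx]


reps = forward_unique(expensive_forward, [3, 1, 3, 3, 1])
print(calls)
print(reps)
```

Five positions are requested, but the underlying computation only sees the two distinct indices.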

max_id: int: the maximum ID (exclusively)

normalizer: Optional[Normalizer]: a normalizer for individual representations

post_parameter_update()[source]: Apply constraints which should not be included in gradients.

regularizer: Optional[Regularizer]: a regularizer for individual representations

reset_parameters()[source]

Reset the module’s parameters.

Return type: None

shape: Tuple[int, ...]: the shape of an individual representation

class SingleCompGCNRepresentation(combined, position=0, **kwargs)[source]

A wrapper around the combined representation module.

Initialize the module.

Parameters

combined (CombinedCompGCNRepresentations) – The combined representations.
position (int) – The position, either 0 for entities, or 1 for relations.
kwargs – additional keyword-based parameters passed to super.__init__

Raises

ValueError – If an invalid value is given for the position
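The role of position can be illustrated with a toy pair of classes. This is a hedged sketch in plain Python: ToyCombined and ToySingle are hypothetical stand-ins, and the real modules operate on tensors and message passing layers.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


class ToyCombined:
    """Stand-in whose forward returns (entity, relation) representations."""

    def forward(self) -> Tuple[List[Vector], List[Vector]]:
        entities = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
        relations = [[9.0, 9.1]]
        return entities, relations


class ToySingle:
    """Expose one half of the combined output, selected by position."""

    def __init__(self, combined: ToyCombined, position: int = 0) -> None:
        if position not in (0, 1):
            raise ValueError(f"position must be 0 (entities) or 1 (relations), got {position}")
        self.combined = combined
        self.position = position

    def forward(self, indices: Optional[Sequence[int]] = None) -> List[Vector]:
        x = self.combined.forward()[self.position]
        return x if indices is None else [x[i] for i in indices]


combined = ToyCombined()
entity_view = ToySingle(combined, position=0)
relation_view = ToySingle(combined, position=1)
print(entity_view.forward([2, 0]))
print(relation_view.forward())
```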

class SubsetRepresentation(max_id, base, base_kwargs=None, **kwargs)[source]

A representation module, which only exposes a subset of representations of its base.

Initialize the representations.

Parameters

max_id (int) – the maximum number of relations.
base (Union[str, Representation, Type[Representation], None]) – the base representations. They must provide a sufficient number of representations, i.e., at least max_id.
base_kwargs (Optional[Mapping[str, Any]]) – additional keyword arguments for the base representation
kwargs – additional keyword-based parameters passed to super.__init__

Raises

ValueError – if max_id is larger than the base representation’s max_id
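The size check and the subset behavior can be sketched with a plain list as the base. ToySubset is a hypothetical stand-in that assumes the subset consists of the first max_id entries of the base; it is not the PyKEEN class.

```python
from typing import List, Optional, Sequence

Vector = List[float]


class ToySubset:
    """Expose only the first max_id representations of a base table."""

    def __init__(self, max_id: int, base: Sequence[Vector]) -> None:
        if max_id > len(base):
            raise ValueError(
                f"base only provides {len(base)} representations, but max_id={max_id}"
            )
        self.max_id = max_id
        self.base = base

    def forward(self, indices: Optional[Sequence[int]] = None) -> List[Vector]:
        if indices is None:
            indices = range(self.max_id)  # all exposed representations
        return [self.base[i] for i in indices]


base = [[0.0], [1.0], [2.0], [3.0]]
subset = ToySubset(max_id=2, base=base)
print(subset.forward())
print(subset.forward([1]))
```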