ConvE

class ConvE(triples_factory, input_channels=None, output_channels=32, embedding_height=None, embedding_width=None, kernel_height=3, kernel_width=3, input_dropout=0.2, output_dropout=0.3, feature_map_dropout=0.2, embedding_dim=200, automatic_memory_optimization=None, loss=None, preferred_device=None, random_seed=None, regularizer=None, apply_batch_normalization=True)[source]

Bases: pykeen.models.base.EntityRelationEmbeddingModel

An implementation of ConvE from [dettmers2018].

ConvE is a CNN-based approach. For each triple \((h,r,t)\), the input to ConvE is a matrix \(\mathbf{A} \in \mathbb{R}^{2 \times d}\) whose first row represents \(\mathbf{h} \in \mathbb{R}^d\) and whose second row represents \(\mathbf{r} \in \mathbb{R}^d\). \(\mathbf{A}\) is reshaped to a matrix \(\mathbf{B} \in \mathbb{R}^{m \times n}\) in which the first \(m/2\) rows represent \(\mathbf{h}\) and the remaining \(m/2\) rows represent \(\mathbf{r}\). In the convolution layer, a set of 2-dimensional convolutional filters \(\Omega = \{\omega_i \mid \omega_i \in \mathbb{R}^{r \times c}\}\) is applied to \(\mathbf{B}\) to capture interactions between \(\mathbf{h}\) and \(\mathbf{r}\). The resulting feature maps are reshaped and concatenated to create a feature vector \(\mathbf{v} \in \mathbb{R}^{|\Omega|rc}\). In the next step, \(\mathbf{v}\) is mapped into the entity space using a linear transformation \(\mathbf{W} \in \mathbb{R}^{|\Omega|rc \times d}\), that is, \(\mathbf{e}_{h,r} = \mathbf{v}^{T} \mathbf{W}\). The score for the triple \((h,r,t) \in \mathbb{K}\) is then given by:

\[f(h,r,t) = \mathbf{e}_{h,r} \mathbf{t}\]

Since the interaction model can be decomposed into \(f(h,r,t) = \left\langle f'(\mathbf{h}, \mathbf{r}), \mathbf{t} \right\rangle\), the model is particularly suited for 1-N scoring, i.e., the efficient computation of scores for all triples \((h,r,t)\) with fixed \(h, r\) and many different \(t\).
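To make the interaction function concrete, here is a minimal PyTorch sketch of the ConvE scoring computation (the function name and shapes are illustrative; dropout, batch normalization, bias terms, and the second non-linearity of the full model are omitted):

>>> import torch
>>> import torch.nn.functional as F
>>> def conve_score(h, r, t, conv_weight, fc_weight, height=10, width=20):
...     batch_size = h.shape[0]
...     # Stack h and r and reshape them to the 2D input B
...     x = torch.cat([h, r], dim=1).view(batch_size, 1, 2 * height, width)
...     # Apply the 2D convolutional filters Omega
...     x = F.relu(F.conv2d(x, conv_weight))
...     # Flatten the feature maps into v and project into entity space: e_{h,r} = v^T W
...     e_hr = x.view(batch_size, -1) @ fc_weight
...     # f(h, r, t) = <e_{h,r}, t>
...     return (e_hr * t).sum(dim=-1)
>>> h, r, t = (torch.randn(4, 200) for _ in range(3))
>>> conv_weight = torch.randn(32, 1, 3, 3)      # 32 filters of size 3x3
>>> fc_weight = torch.randn(32 * 18 * 18, 200)  # valid 3x3 conv on a 20x20 input
>>> scores = conve_score(h, r, t, conv_weight, fc_weight)  # shape: (4,)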

The default setting uses batch normalization. Batch normalization normalizes the output of the activation functions in order to ensure that the weights of the neural network do not become imbalanced and to speed up training. However, batch normalization is not the only way to achieve more robust and effective training [1]. Therefore, we added the flag 'apply_batch_normalization' to turn batch normalization on or off (it is turned on by default).

[1] Santurkar, Shibani, et al. "How does batch normalization help optimization?" Advances in Neural Information Processing Systems. 2018.
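For example, batch normalization can be disabled at construction time; a minimal sketch (the dataset choice is illustrative):

>>> from pykeen.datasets import Nations
>>> from pykeen.models import ConvE
>>> model = ConvE(
...     triples_factory=Nations(create_inverse_triples=True).training,
...     apply_batch_normalization=False,
... )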

Example usage:

>>> # Step 1: Get triples
>>> from pykeen.datasets import Nations
>>> dataset = Nations(create_inverse_triples=True)
>>> # Step 2: Configure the model
>>> from pykeen.models import ConvE
>>> model = ConvE(
...     triples_factory     = dataset.training,
...     embedding_dim       = 200,
...     input_channels      = 1,
...     output_channels     = 32,
...     embedding_height    = 10,
...     embedding_width     = 20,
...     kernel_height       = 3,
...     kernel_width        = 3,
...     input_dropout       = 0.2,
...     feature_map_dropout = 0.2,
...     output_dropout      = 0.3,
...     preferred_device    = 'gpu',
... )
>>> # Step 3: Configure the loop
>>> from torch.optim import Adam
>>> optimizer = Adam(params=model.get_grad_params())
>>> from pykeen.training import LCWATrainingLoop
>>> training_loop = LCWATrainingLoop(model=model, optimizer=optimizer)
>>> # Step 4: Train
>>> losses = training_loop.train(num_epochs=5, batch_size=256)
>>> # Step 5: Evaluate the model
>>> from pykeen.evaluation import RankBasedEvaluator
>>> evaluator = RankBasedEvaluator()
>>> metric_result = evaluator.evaluate(
...     model=model,
...     mapped_triples=dataset.testing.mapped_triples,
...     batch_size=8192,
... )
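The returned metric results can then be queried by metric name; a minimal sketch (the key 'hits@10' follows PyKEEN's rank-based metric naming):

>>> # Step 6: Inspect a metric from the evaluation
>>> hits_at_10 = metric_result.get_metric('hits@10')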

Initialize the model.

Attributes Summary

hpo_default

The default strategy for optimizing the model’s hyper-parameters

loss_default_kwargs

The default parameters for the default loss function class

Methods Summary

score_h(rt_batch)

Forward pass using left side (head) prediction.

score_hrt(hrt_batch)

Forward pass.

score_t(hr_batch)

Forward pass using right side (tail) prediction.

Attributes Documentation

hpo_default: ClassVar[Mapping[str, Any]] = {'feature_map_dropout': {'high': 1.0, 'low': 0.0, 'type': <class 'float'>}, 'input_dropout': {'high': 1.0, 'low': 0.0, 'type': <class 'float'>}, 'output_channels': {'high': 64, 'low': 16, 'type': <class 'int'>}, 'output_dropout': {'high': 1.0, 'low': 0.0, 'type': <class 'float'>}}

The default strategy for optimizing the model’s hyper-parameters
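These ranges are consumed by PyKEEN's hyper-parameter optimization pipeline; a minimal sketch of running it over ConvE (the number of trials is illustrative and far too small for a real search):

>>> from pykeen.hpo import hpo_pipeline
>>> hpo_result = hpo_pipeline(
...     dataset='Nations',
...     model='ConvE',
...     n_trials=2,  # illustrative only
... )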

loss_default_kwargs: ClassVar[Optional[Mapping[str, Any]]] = {}

The default parameters for the default loss function class

Methods Documentation

score_h(rt_batch)[source]

Forward pass using left side (head) prediction.

This method calculates the score for all possible heads for each (relation, tail) pair.

Parameters

rt_batch (LongTensor) – shape: (batch_size, 2), dtype: long. The indices of (relation, tail) pairs.

Return type

FloatTensor

Returns

shape: (batch_size, num_entities), dtype: float. For each r-t pair, the scores for all possible heads.
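A minimal usage sketch, assuming the trained model from the example above and illustrative index values:

>>> import torch
>>> rt_batch = torch.as_tensor([[0, 1], [2, 3]], dtype=torch.long)  # (relation, tail) indices
>>> head_scores = model.score_h(rt_batch)  # shape: (2, num_entities)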

score_hrt(hrt_batch)[source]

Forward pass.

This method takes the head, relation, and tail of each triple and calculates the corresponding score.

Parameters

hrt_batch (LongTensor) – shape: (batch_size, 3), dtype: long. The indices of (head, relation, tail) triples.

Raises

NotImplementedError – If the method was not implemented for this class.

Return type

FloatTensor

Returns

shape: (batch_size, 1), dtype: float. The score for each triple.
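A minimal usage sketch, again with illustrative indices:

>>> import torch
>>> hrt_batch = torch.as_tensor([[0, 0, 1], [1, 2, 3]], dtype=torch.long)  # (head, relation, tail) indices
>>> triple_scores = model.score_hrt(hrt_batch)  # shape: (2, 1)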

score_t(hr_batch)[source]

Forward pass using right side (tail) prediction.

This method calculates the score for all possible tails for each (head, relation) pair.

Parameters

hr_batch (LongTensor) – shape: (batch_size, 2), dtype: long. The indices of (head, relation) pairs.

Return type

FloatTensor

Returns

shape: (batch_size, num_entities), dtype: float. For each h-r pair, the scores for all possible tails.
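A minimal usage sketch with illustrative indices:

>>> import torch
>>> hr_batch = torch.as_tensor([[0, 1]], dtype=torch.long)  # (head, relation) indices
>>> tail_scores = model.score_t(hr_batch)  # shape: (1, num_entities)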