Interaction Functions

In PyKEEN, an interaction function refers to a function that maps representations for head entities, relations, and tail entities to a scalar plausibility score. In the simplest case, head entities, relations, and tail entities are each represented by a single tensor. However, there are also interaction functions that use multiple tensors, e.g. NTNInteraction.

Interaction functions can also have trainable parameters that are global and not related to a single entity or relation. An example is TuckERInteraction with its core tensor. We call such functions stateful and all others stateless.

Base

Interaction is the base class for all interaction functions. It defines the API for the (broadcastable, batched) calculation of plausibility scores; see forward(). It also provides meta-information about the required symbolic shapes of the different arguments.
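
For illustration, the following sketch calls an interaction function directly on random tensors. The module path pykeen.nn.modules and the exact broadcasting behaviour are assumptions based on recent PyKEEN versions and may differ.

```python
import torch

from pykeen.nn.modules import DistMultInteraction

# a stateless interaction: no trainable parameters of its own
interaction = DistMultInteraction()

# broadcastable batch dimensions: score 3 head-relation pairs against 5 candidate tails
h = torch.rand(3, 1, 64)  # (batch, 1, dim)
r = torch.rand(3, 1, 64)  # (batch, 1, dim)
t = torch.rand(1, 5, 64)  # (1, num_tails, dim)

scores = interaction(h=h, r=r, t=t)
print(scores.shape)  # expected: torch.Size([3, 5]) if the batch dimensions broadcast as assumed
```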

Combinations & Adapters

The DirectionAverageInteraction calculates a plausibility score by averaging the scores of a base interaction over the forward and backward representations. It can be seen as a generalization of SimplEInteraction.

The MonotonicAffineTransformationInteraction adds trainable scalar scale and bias terms to an existing interaction function. The scale parameter is parametrized to take only positive values, which preserves the interpretation that larger values correspond to more plausible triples. This adapter is particularly useful for base interactions with a restricted range of values, such as norm-based interactions, and for loss functions with absolute decision thresholds, such as point-wise losses, e.g., BCEWithLogitsLoss.
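
A minimal plain-PyTorch sketch of the underlying idea (not the PyKEEN class itself), assuming a softplus parametrization to keep the scale positive:

```python
import torch
from torch import nn


class MonotonicAffineAdapter(nn.Module):
    """Hypothetical illustration: wrap a base interaction with a positive scale and a bias."""

    def __init__(self, base: nn.Module):
        super().__init__()
        self.base = base
        self.log_scale = nn.Parameter(torch.zeros(()))  # softplus(log_scale) > 0
        self.bias = nn.Parameter(torch.zeros(()))

    def forward(self, h, r, t):
        # a positive scale preserves "larger score = more plausible triple"
        return self.bias + nn.functional.softplus(self.log_scale) * self.base(h, r, t)
```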

The ClampedInteraction constrains the scores to a given range of values. While this ensures that scores cannot exceed the bounds, using torch.clamp() also means that no gradients are propagated for inputs with out-of-bounds scores. It can also lead to tied scores during evaluation, which can cause problems with some variants of the score functions, see Understanding the Evaluation.

Norm-Based Interactions

Norm-based interactions can be generally written as

\[-\|g(\mathbf{h}, \mathbf{r}, \mathbf{t})\|\]

for some (vector) norm \(\|\cdot\|\) and an inner function \(g\). Sometimes, the \(p\)-th power of a \(p\)-norm is used instead.
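
A plain-PyTorch sketch of this general form, assuming an inner function g that returns one vector per (broadcast) triple:

```python
import torch


def norm_based_score(h, r, t, g, p: float = 2.0) -> torch.Tensor:
    """Compute -||g(h, r, t)||_p along the last (representation) dimension."""
    return -torch.linalg.vector_norm(g(h, r, t), ord=p, dim=-1)


# example: the inner function of the unstructured model, g(h, r, t) = h - t
scores = norm_based_score(
    torch.rand(4, 8), torch.rand(4, 8), torch.rand(4, 8), g=lambda h, r, t: h - t
)
```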

Unstructured Model (UM)

The unstructured model (UM) interaction, UMInteraction, uses the distance between the head and tail representations \(\mathbf{h}, \mathbf{t} \in \mathbb{R}^d\) as the inner function:

\[\mathbf{h} - \mathbf{t}\]

Structure Embedding

SEInteraction can be seen as an extension of UM, where the head and tail representations \(\mathbf{h}, \mathbf{t} \in \mathbb{R}^d\) are first linearly transformed using relation-specific head and tail transformation matrices \(\mathbf{R}_h, \mathbf{R}_t \in \mathbb{R}^{k \times d}\):

\[\mathbf{R}_{h} \mathbf{h} - \mathbf{R}_t \mathbf{t}\]

TransE

TransEInteraction interprets the relation representation as a translation vector and defines

\[\mathbf{h} + \mathbf{r} - \mathbf{t}\]

for \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\).
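
A small usage sketch; that TransEInteraction takes the norm order as keyword argument p is an assumption based on recent PyKEEN versions:

```python
import torch

from pykeen.nn.modules import TransEInteraction

interaction = TransEInteraction(p=2)
h, r, t = torch.rand(3, 16), torch.rand(3, 16), torch.rand(3, 16)
scores = interaction(h=h, r=r, t=t)
# should match the plain-PyTorch formulation -||h + r - t||_2
expected = -torch.linalg.vector_norm(h + r - t, ord=2, dim=-1)
```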

TransR

TransRInteraction uses a relation-specific projection matrix \(\mathbf{R} \in \mathbb{R}^{k \times d}\) to project \(\mathbf{h}, \mathbf{t} \in \mathbb{R}^{d}\) into the relation subspace, and then applies a TransEInteraction-style translation by \(\mathbf{r} \in \mathbb{R}^{k}\):

\[c(\mathbf{R}\mathbf{h}) + \mathbf{r} - c(\mathbf{R}\mathbf{t})\]

\(c\) refers to an additional norm-clamping function.
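
A plain-PyTorch sketch for a single triple; the clamp_norm helper is an illustrative stand-in for the norm-clamping function \(c\), here rescaling vectors whose norm exceeds one:

```python
import torch


def clamp_norm(x: torch.Tensor, maxnorm: float = 1.0) -> torch.Tensor:
    # rescale x so that its Euclidean norm does not exceed maxnorm
    norm = torch.linalg.vector_norm(x, dim=-1, keepdim=True).clamp_min(1e-15)
    return x * (maxnorm / norm).clamp(max=1.0)


d, k = 16, 8
h, t = torch.rand(d), torch.rand(d)
R = torch.rand(k, d)  # relation-specific projection matrix
r = torch.rand(k)     # translation in the relation sub-space

score = -torch.linalg.vector_norm(clamp_norm(R @ h) + r - clamp_norm(R @ t), ord=2)
```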

TransD

TransDInteraction extends TransRInteraction by constructing separate head and tail projections, \(\mathbf{M}_{r, h}, \mathbf{M}_{r, t} \in \mathbb{R}^{k \times d}\), similar to SEInteraction. These projections are built (with low rank) from a shared relation-specific part \(\mathbf{r}_p \in \mathbb{R}^{k}\) and additional head/tail representations \(\mathbf{h}_p, \mathbf{t}_p \in \mathbb{R}^{d}\). The matrices project the base head and tail representations \(\mathbf{h}_v, \mathbf{t}_v \in \mathbb{R}^{d}\) into a relation-specific sub-space before a translation \(\mathbf{r}_v \in \mathbb{R}^{k}\) is applied.

\[c(\mathbf{M}_{r, h} \mathbf{h}_v) + \mathbf{r}_v - c(\mathbf{M}_{r, t} \mathbf{t}_v)\]

where

\[\begin{split}\mathbf{M}_{r, h} &=& \mathbf{r}_p \mathbf{h}_p^{T} + \tilde{\mathbf{I}} \\ \mathbf{M}_{r, t} &=& \mathbf{r}_p \mathbf{t}_p^{T} + \tilde{\mathbf{I}}\end{split}\]

\(c\) refers to an additional norm-clamping function.

TransH

TransHInteraction projects the head and tail representations \(\mathbf{h}, \mathbf{t} \in \mathbb{R}^{d}\) onto a relation-specific hyperplane with normal vector \(\mathbf{r}_{w} \in \mathbb{R}^d\), before applying the relation-specific translation \(\mathbf{r}_{d} \in \mathbb{R}^d\).

\[\mathbf{h}_{r} + \mathbf{r}_d - \mathbf{t}_{r}\]

where

\[\begin{split}\mathbf{h}_{r} &=& \mathbf{h} - (\mathbf{r}_{w}^T \mathbf{h}) \mathbf{r}_w \\ \mathbf{t}_{r} &=& \mathbf{t} - (\mathbf{r}_{w}^T \mathbf{t}) \mathbf{r}_w\end{split}\]
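
A plain-PyTorch sketch for a single triple; normalizing \(\mathbf{r}_w\) to unit length reflects the usual TransH constraint and is an assumption here:

```python
import torch

d = 16
h, t = torch.rand(d), torch.rand(d)
r_w = torch.nn.functional.normalize(torch.rand(d), dim=-1)  # hyperplane normal (unit length)
r_d = torch.rand(d)                                         # translation within the hyperplane

h_r = h - (r_w @ h) * r_w  # project h onto the hyperplane
t_r = t - (r_w @ t) * r_w  # project t onto the hyperplane
score = -torch.linalg.vector_norm(h_r + r_d - t_r, ord=2)
```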

PairRE

PairREInteraction modulates the head and tail representations \(\mathbf{h}, \mathbf{t} \in \mathbb{R}^{d}\) by element-wise multiplication with relation-specific vectors \(\mathbf{r}_h, \mathbf{r}_t \in \mathbb{R}^{d}\), before taking their difference:

\[\mathbf{h} \odot \mathbf{r}_h - \mathbf{t} \odot \mathbf{r}_t\]

LineaRE

LineaREInteraction adds an additional relation-specific translation \(\mathbf{r} \in \mathbb{R}^d\) to PairREInteraction.

\[\mathbf{h} \odot \mathbf{r}_h - \mathbf{t} \odot \mathbf{r}_t + \mathbf{r}\]

TripleRE

TripleREInteraction adds an additional global scalar term \(u \in \mathbb{R}\) to the modulation vectors of LineaREInteraction.

\[\mathbf{h} \odot (\mathbf{r}_h + u) - \mathbf{t} \odot (\mathbf{r}_t + u) + \mathbf{r}\]

RotatE

RotatEInteraction uses

\[\mathbf{h} \odot \mathbf{r} - \mathbf{t}\]

with complex representations \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^d\). When \(\mathbf{r}\) is element-wise normalized to unit length, this operation corresponds to dimension-wise rotation in the complex plane.
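
A plain-PyTorch sketch using complex tensors; constructing \(\mathbf{r}\) from phases via torch.polar guarantees unit modulus, so each dimension is rotated in the complex plane:

```python
import math

import torch

d = 8
h = torch.randn(d, dtype=torch.cfloat)
t = torch.randn(d, dtype=torch.cfloat)
phase = 2 * math.pi * torch.rand(d)
r = torch.polar(torch.ones(d), phase)  # |r_i| = 1 for every dimension

score = -torch.linalg.vector_norm(h * r - t, ord=2)
```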

Semantic Matching / Factorization

All semantic matching or factorization-based interactions can be expressed as

\[\sum_{i, j, k} \mathbf{Z}_{i, j, k} \mathbf{h}_i \mathbf{r}_j \mathbf{t}_k\]

for suitable tensor \(\mathbf{Z} \in \mathbb{R}^{d_h \times d_r \times d_t}\), and potentially re-shaped head entity, relation, and tail entity representations \(\mathbf{h} \in \mathbb{R}^{d_h}, \mathbf{r} \in \mathbb{R}^{d_r}, \mathbf{t} \in \mathbb{R}^{d_t}\). Many of the interactions have a regular structured choice for \(\mathbf{Z}\) which permits efficient calculation. We will use the simplified formulae where possible.
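
In plain PyTorch, the general form is a single einsum over a (hypothetical, dense) tensor \(\mathbf{Z}\); the structured interactions below avoid materializing \(\mathbf{Z}\):

```python
import torch

d_h, d_r, d_t = 4, 5, 6
Z = torch.rand(d_h, d_r, d_t)  # dense structure tensor (illustrative only)
h, r, t = torch.rand(d_h), torch.rand(d_r), torch.rand(d_t)

score = torch.einsum("ijk,i,j,k->", Z, h, r, t)  # sum_{i,j,k} Z_ijk * h_i * r_j * t_k
```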

DistMult

The DistMultInteraction uses the sum of products along each dimension

\[\sum_i \mathbf{h}_i \mathbf{r}_i \mathbf{t}_i\]

for \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\).

Canonical Tensor Decomposition

CPInteraction is equivalent to DistMultInteraction, except that it uses different sources for head and tail representations, while DistMultInteraction uses one shared entity embedding matrix.

\[\sum_{i, j} \mathbf{h}_{i, j} \mathbf{r}_{i, j} \mathbf{t}_{i, j}\]

SimplE

SimplEInteraction defines the interaction as

\[\frac{1}{2} \left( \langle \mathbf{h}_h, \mathbf{r}_{\rightarrow}, \mathbf{t}_t \rangle + \langle \mathbf{t}_h, \mathbf{r}_{\leftarrow}, \mathbf{h}_t \rangle \right)\]

for \(\mathbf{h}_h, \mathbf{h}_t, \mathbf{r}_{\rightarrow}, \mathbf{r}_{\leftarrow}, \mathbf{t}_{h}, \mathbf{t}_{t} \in \mathbb{R}^{d}\). In contrast to CPInteraction, SimplEInteraction introduces separate relation weights: \(\mathbf{r}_{\rightarrow}\) for the relation itself and \(\mathbf{r}_{\leftarrow}\) for its inverse.

RESCAL

RESCALInteraction operates on \(\mathbf{h}, \mathbf{t} \in \mathbb{R}^d\) and \(\mathbf{R} \in \mathbb{R}^{d \times d}\) by

\[\sum_{i, j} \mathbf{h}_{i} \mathbf{R}_{i,j} \mathbf{t}_{j}\]
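
The formula translates directly into an einsum; the shapes below are illustrative:

```python
import torch

d = 16
h, t = torch.rand(d), torch.rand(d)
R = torch.rand(d, d)  # relation-specific matrix

score = torch.einsum("i,ij,j->", h, R, t)  # sum_{i,j} h_i * R_ij * t_j
```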

Tucker Decomposition

TuckERInteraction / MultiLinearTuckerInteraction are stateful interaction functions which make \(\mathbf{Z}\) a trainable global parameter and set \(d_h = d_t\).

\[\sum_{i, j, k} \mathbf{Z}_{i, j, k} \mathbf{h}_i \mathbf{r}_j \mathbf{t}_k\]

Warning

Both additionally add batch normalization and dropout layers, which technically makes them neural models. However, the intuition behind the interaction is still similar to semantic matching based models, which is why we list them here.
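
A minimal sketch of the core idea with a trainable core tensor, omitting the batch normalization and dropout layers mentioned in the warning; the class name and initialization are illustrative only:

```python
import torch
from torch import nn


class TuckerCoreSketch(nn.Module):
    """Hypothetical illustration of a stateful Tucker-style interaction (d_h = d_t = d_e)."""

    def __init__(self, d_e: int, d_r: int):
        super().__init__()
        self.Z = nn.Parameter(0.1 * torch.randn(d_e, d_r, d_e))  # global, trainable core tensor

    def forward(self, h: torch.Tensor, r: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # sum_{i,j,k} Z_ijk * h_i * r_j * t_k, broadcast over leading batch dimensions
        return torch.einsum("ijk,...i,...j,...k->...", self.Z, h, r, t)
```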

DistMA

DistMAInteraction uses the sum of pairwise scalar products between \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^{d}\):

\[\langle \mathbf{h}, \mathbf{r} \rangle + \langle \mathbf{r}, \mathbf{t} \rangle + \langle \mathbf{t}, \mathbf{h} \rangle\]

TransF

TransFInteraction defines the interaction between \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^{d}\) as:

\[2 \cdot \langle \mathbf{h}, \mathbf{t} \rangle + \langle \mathbf{r}, \mathbf{t} \rangle - \langle \mathbf{h}, \mathbf{r} \rangle\]

ComplEx

ComplExInteraction extends DistMultInteraction to complex-valued representations, i.e., it operates on \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^{d}\), and defines

\[\textit{Re}\left( \sum_i \mathbf{h}_i \mathbf{r}_i \bar{\mathbf{t}}_i \right)\]

where Re refers to the real part, and \(\bar{\cdot}\) denotes the complex conjugate.
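
A plain-PyTorch sketch using complex tensors; the shapes are illustrative:

```python
import torch

d = 8
h = torch.randn(d, dtype=torch.cfloat)
r = torch.randn(d, dtype=torch.cfloat)
t = torch.randn(d, dtype=torch.cfloat)

score = torch.sum(h * r * torch.conj(t)).real  # Re(sum_i h_i * r_i * conj(t_i))
```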

QuatE

QuatEInteraction uses

\[\langle \mathbf{h} \otimes \mathbf{r}, \mathbf{t} \rangle\]

for quaternions \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{H}^{d}\) and the Hamilton product \(\otimes\).

HolE

HolEInteraction is given by

\[\langle \mathbf{r}, \mathbf{h} \star \mathbf{t}\rangle\]

where \(\star: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}^d\) denotes the circular correlation:

\[[\mathbf{a} \star \mathbf{b}]_i = \sum_{k=0}^{d-1} \mathbf{a}_{k} \cdot \mathbf{b}_{(i+k) \bmod d}\]
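
A plain-PyTorch sketch of the HolE score; computing the circular correlation via the FFT is mathematically equivalent to the sum above (cross-correlation theorem) and reduces the cost from \(\mathcal{O}(d^2)\) to \(\mathcal{O}(d \log d)\):

```python
import torch


def circular_correlation(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # (a * b)_i = sum_k a_k * b_{(i + k) mod d}, computed via the FFT
    return torch.fft.irfft(torch.conj(torch.fft.rfft(a)) * torch.fft.rfft(b), n=a.shape[-1])


d = 8
h, r, t = torch.rand(d), torch.rand(d), torch.rand(d)
score = torch.dot(r, circular_correlation(h, t))  # <r, h * t>
```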

AutoSF

AutoSFInteraction parametrizes block-based semantic matching interaction functions in order to enable an automated search over them. Its interaction is given as

\[\sum_{(i_h, i_r, i_t, s) \in \mathcal{C}} s \cdot \langle h[i_h], r[i_r], t[i_t] \rangle\]

where \(\mathcal{C}\) defines the block interactions, and \(h, r, t\) are lists of blocks.

Neural Interactions

The remaining interaction functions are usually called neural. They typically share a multi-layer architecture (often with two layers) and employ non-linearities. Many of them also introduce customized hidden layers, such as interpreting concatenated embedding vectors as an image, treating pairs of embedding vectors as normal distributions, or computing semantic-matching-inspired sums of linear products.

Moreover, some choose a form that can be decomposed into

\[f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = f_o(f_i(\mathbf{h}, \mathbf{r}), \mathbf{t})\]

with an expensive \(f_i\) and a cheap \(f_o\). Such a form allows efficient scoring of many tails for a given head-relation combination, and it can be combined with inverse relation modelling for an overall efficient training and inference architecture.
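
A sketch of why this decomposition helps, using a DistMult-style \(f_i(\mathbf{h}, \mathbf{r}) = \mathbf{h} \odot \mathbf{r}\) and an inner-product \(f_o\) as an example; the expensive part is computed once and re-used for all candidate tails:

```python
import torch

num_entities, d = 1000, 32
E = torch.rand(num_entities, d)  # representations of all candidate tail entities
h, r = torch.rand(d), torch.rand(d)

hr = h * r       # f_i: computed once per head-relation combination
scores = E @ hr  # f_o: one cheap inner product per candidate tail, shape (num_entities,)
```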

ConvE

ConvEInteraction uses an interaction of the form

\[\langle g(\mathbf{h}, \mathbf{r}), \mathbf{t} \rangle + t_b\]

where \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\) are the head entity, relation, and tail entity representations, and \(t_b \in \mathbb{R}\) is an entity bias. \(g\) is a CNN-based encoder, which first operates on a 2D-reshaped “image” and then flattens the output for a subsequent linear layer. Dropout and batch normalization are used as well.

ConvKB

ConvKBInteraction concatenates \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\) into a \(3 \times d\) “image” and applies a \(3 \times 1\) convolution. The output is flattened, and a linear layer predicts the score.

CrossE

CrossEInteraction uses

\[\langle g(\mathbf{h}, \mathbf{r}), \mathbf{t} \rangle\]

where

\[g(\mathbf{h}, \mathbf{r}) = \sigma( \mathbf{c}_r \odot \mathbf{h} + \mathbf{c}_r \odot \mathbf{h} \odot \mathbf{r} + \mathbf{b} )\]

with a relation-specific interaction vector \(\mathbf{c}_r \in \mathbb{R}^d\), a bias vector \(\mathbf{b} \in \mathbb{R}^d\), an activation function \(\sigma\), and \(\odot\) denoting the element-wise product. Moreover, dropout is applied to the output of \(g\).

ERMLP

ERMLPInteraction uses a simple 2-layer MLP on the concatenated head, relation, and tail representations \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\).
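
A minimal sketch of such an MLP; the hidden size and activation function are illustrative assumptions:

```python
import torch
from torch import nn

d, hidden = 16, 32
mlp = nn.Sequential(  # two-layer MLP over the concatenation [h; r; t]
    nn.Linear(3 * d, hidden),
    nn.ReLU(),
    nn.Linear(hidden, 1),
)

h, r, t = torch.rand(d), torch.rand(d), torch.rand(d)
score = mlp(torch.cat([h, r, t], dim=-1)).squeeze(-1)
```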

ERMLP (E)

ERMLPEInteraction adjusts ERMLPInteraction for a more efficient training and inference architecture by using

\[\langle g(\mathbf{h}, \mathbf{r}), \mathbf{t} \rangle\]

where \(g\) is a 2-layer MLP.

KG2E

KG2EInteraction interprets pairs of vectors \(\mathbf{h}_{\mu}, \mathbf{h}_{\Sigma}, \mathbf{r}_{\mu}, \mathbf{r}_{\Sigma}, \mathbf{t}_{\mu}, \mathbf{t}_{\Sigma} \in \mathbb{R}^d\) as normal distributions \(\mathcal{N}_h, \mathcal{N}_r, \mathcal{N}_t\) and determines a similarity between \(\mathcal{N}_h - \mathcal{N}_t\) and \(\mathcal{N}_r\).

Todo

This does not really fit well into the neural category.

Neural Tensor Network (NTN)

NTNInteraction defines the interaction function as

\[\left \langle \mathbf{r}_{u}, \sigma( \mathbf{h} \mathbf{R}_{3} \mathbf{t} + \mathbf{R}_{2} [\mathbf{h};\mathbf{t}] + \mathbf{r}_1 ) \right \rangle\]

where \(\mathbf{h}, \mathbf{t} \in \mathbb{R}^d\) are head and tail entity representations, \(\mathbf{r}_1, \mathbf{r}_u \in \mathbb{R}^k\), \(\mathbf{R}_2 \in \mathbb{R}^{k \times 2d}\), and \(\mathbf{R}_3 \in \mathbb{R}^{d \times d \times k}\) are relation-specific parameters, and \(\sigma\) is an activation function.

ProjE

ProjEInteraction uses

\[\sigma_1( \left \langle \sigma_2( \mathbf{d}_h \odot \mathbf{h} + \mathbf{d}_r \odot \mathbf{r} + \mathbf{b} ), \mathbf{t} \right \rangle + b_p )\]

where \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\) are the head entity, relation, and tail entity representations, \(\mathbf{d}_h, \mathbf{d}_r, \mathbf{b} \in \mathbb{R}^d\) and \(b_p \in \mathbb{R}\) are global parameters, and \(\sigma_1, \sigma_2\) are activation functions.

Transformer

TransformerInteraction uses

\[\langle g([\mathbf{h}; \mathbf{r}]), \mathbf{t} \rangle\]

with \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\) and \(g\) denoting a transformer encoder with learnable absolute positional embeddings, followed by sum pooling and a linear projection.