Interaction Functions
In PyKEEN, an interaction function refers to a function that maps representations for head entities, relations, and
tail entities to a scalar plausibility score. In the simplest case, head entities, relations, and tail entities are each
represented by a single tensor. However, there are also interaction functions that use multiple tensors, e.g.
NTNInteraction
.
Interaction functions can also have trainable parameters that are global and not related to a single entity or relation.
An example is TuckERInteraction
with its core tensor. We call such functions stateful and
all others stateless.
Base
Interaction
is the base class for all interactions. It defines the API for (broadcastable, batched) calculation
of plausibility scores, see forward(). It also provides some
meta-information about the required symbolic shapes of the different arguments.
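For illustration, a minimal usage sketch (assuming a recent PyKEEN version; the exact import path and broadcasting semantics may differ between releases):

```python
import torch
from pykeen.nn.modules import DistMultInteraction  # import path may vary across PyKEEN versions

# a stateless interaction function
interaction = DistMultInteraction()

# score one (head, relation) pair against five candidate tails via broadcasting
h = torch.rand(1, 1, 64)  # (*batch_dims, dim)
r = torch.rand(1, 1, 64)
t = torch.rand(1, 5, 64)

scores = interaction(h, r, t)  # broadcast batch dims, here (1, 5)
```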
Combinations & Adapters
The DirectionAverageInteraction
calculates a plausibility score by averaging the plausibility
scores of a base function over the forward and backward representations. It can be seen as a generalization of
SimplEInteraction
.
The MonotonicAffineTransformationInteraction
adds trainable scalar scale and bias terms to
an existing interaction function. The scale parameter is parametrized to take only positive values, preserving the
interpretation of larger values corresponding to more plausible triples. This adapter is particularly useful for base
interactions with a restricted range of values, such as norm-based interactions, and loss functions with absolute
decision thresholds, such as point-wise losses, e.g., BCEWithLogitsLoss
.
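Conceptually, the adapter learns a strictly positive scale and an unconstrained bias on top of the base score; a rough sketch (the actual parametrization in PyKEEN may differ, e.g., in how positivity is enforced):

```python
import torch
from torch import nn


class MonotonicAffineSketch(nn.Module):
    """Illustrative stand-in: score = softplus(log_scale) * base(h, r, t) + bias."""

    def __init__(self, base: nn.Module):
        super().__init__()
        self.base = base
        # softplus keeps the effective scale positive, so larger scores stay "more plausible"
        self.log_scale = nn.Parameter(torch.zeros(()))
        self.bias = nn.Parameter(torch.zeros(()))

    def forward(self, h, r, t):
        return nn.functional.softplus(self.log_scale) * self.base(h, r, t) + self.bias
```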
The ClampedInteraction
constrains the scores to a given range of values. While this ensures
that scores cannot exceed the bounds, using torch.clamp()
also means that no gradients are propagated for inputs
with out-of-bounds scores. It can also lead to tied scores during evaluation, which can cause problems with some
variants of the score functions, see Understanding the Evaluation.
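The gradient behaviour of torch.clamp() can be checked directly:

```python
import torch

scores = torch.tensor([-3.0, 0.5, 7.0], requires_grad=True)
clamped = torch.clamp(scores, min=-1.0, max=1.0)
clamped.sum().backward()

print(clamped)      # tensor([-1.0000, 0.5000, 1.0000], grad_fn=...)
print(scores.grad)  # tensor([0., 1., 0.]) - out-of-bounds scores receive no gradient
```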
Norm-Based Interactions
Norm-based interactions can generally be written as

\[f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = -\|g(\mathbf{h}, \mathbf{r}, \mathbf{t})\|\]

for some (vector) norm \(\|\cdot\|\) and inner function \(g\). Sometimes, the \(p\)-th power of a \(p\)-norm is used instead.
Unstructured Model (UM)
The unstructured model (UM) interaction, UMInteraction
, uses the distance between the head and
tail representations \(\mathbf{h}, \mathbf{t} \in \mathbb{R}^d\) as inner function:

\[g(\mathbf{h}, \mathbf{t}) = \mathbf{h} - \mathbf{t}\]
Structure Embedding
SEInteraction
can be seen as an extension of UM, where the head and tail representations
\(\mathbf{h}, \mathbf{t} \in \mathbb{R}^d\) are first linearly transformed using relation-specific head and tail
transformation matrices \(\mathbf{R}_h, \mathbf{R}_t \in \mathbb{R}^{k \times d}\):

\[g(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \mathbf{R}_h \mathbf{h} - \mathbf{R}_t \mathbf{t}\]
TransE
TransEInteraction
interprets the relation representation as a translation vector and defines

\[g(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \mathbf{h} + \mathbf{r} - \mathbf{t}\]

for \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\).
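A minimal torch sketch of the resulting score (the choice of \(p\) is an illustrative assumption):

```python
import torch


def transe_score(h: torch.Tensor, r: torch.Tensor, t: torch.Tensor, p: float = 2.0) -> torch.Tensor:
    """Score -||h + r - t||_p along the last dimension."""
    return -torch.linalg.vector_norm(h + r - t, ord=p, dim=-1)


h, r, t = torch.rand(3, 128), torch.rand(3, 128), torch.rand(3, 128)
print(transe_score(h, r, t).shape)  # torch.Size([3])
```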
TransR
TransRInteraction
uses a relation-specific projection matrix \(\mathbf{R} \in \mathbb{R}^{k
\times d}\) to project \(\mathbf{h}, \mathbf{t} \in \mathbb{R}^{d}\) into the relation subspace, and then applies a
TransEInteraction
-style translation by \(\mathbf{r} \in \mathbb{R}^{k}\):

\[g(\mathbf{h}, \mathbf{r}, \mathbf{t}) = c(\mathbf{R}\mathbf{h}) + \mathbf{r} - c(\mathbf{R}\mathbf{t})\]

where \(c\) refers to an additional norm-clamping function.
TransD
TransDInteraction
extends TransRInteraction
to construct
separate head and tail projections, \(\mathbf{M}_{r, h}, \mathbf{M}_{r, t} \in \mathbb{R}^{k \times d}\), similar to
SEInteraction
. These projections are built (low-rank) from a shared relation-specific part
\(\mathbf{r}_p \in \mathbb{R}^{k}\), and an additional head/tail representation, \(\mathbf{h}_p, \mathbf{t}_p \in
\mathbb{R}^{d}\). The matrices project the base head and tail representations \(\mathbf{h}_v, \mathbf{t}_v \in
\mathbb{R}^{d}\) into a relation-specific sub-space before a translation \(\mathbf{r}_v \in \mathbb{R}^{k}\) is applied.
\[g(\mathbf{h}, \mathbf{r}, \mathbf{t}) = c(\mathbf{M}_{r, h} \mathbf{h}_v) + \mathbf{r}_v - c(\mathbf{M}_{r, t} \mathbf{t}_v)\]

where

\[\mathbf{M}_{r, h} = \mathbf{r}_p \mathbf{h}_p^{T} + \mathbf{I}^{k \times d}, \qquad \mathbf{M}_{r, t} = \mathbf{r}_p \mathbf{t}_p^{T} + \mathbf{I}^{k \times d}\]

and \(c\) refers to an additional norm-clamping function.
TransH
TransHInteraction
projects the head and tail representations \(\mathbf{h}, \mathbf{t} \in
\mathbb{R}^{d}\) onto a relation-specific hyper-plane with normal vector \(\mathbf{r}_{w} \in \mathbb{R}^d\), before applying the
relation-specific translation \(\mathbf{r}_{d} \in \mathbb{R}^d\):

\[g(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \mathbf{h}_{\perp} + \mathbf{r}_d - \mathbf{t}_{\perp}\]

where the projections onto the hyper-plane are given by

\[\mathbf{h}_{\perp} = \mathbf{h} - (\mathbf{r}_w^{T} \mathbf{h}) \, \mathbf{r}_w, \qquad \mathbf{t}_{\perp} = \mathbf{t} - (\mathbf{r}_w^{T} \mathbf{t}) \, \mathbf{r}_w\]
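A small torch sketch of this project-then-translate pattern (normalizing \(\mathbf{r}_w\) to unit length is an assumption of the sketch):

```python
import torch


def transh_score(h, r_w, r_d, t, p: float = 2.0):
    """-||proj(h) + r_d - proj(t)||_p with proj(x) = x - (r_w . x) r_w."""
    r_w = torch.nn.functional.normalize(r_w, dim=-1)  # unit normal of the hyper-plane
    h_proj = h - (h * r_w).sum(dim=-1, keepdim=True) * r_w
    t_proj = t - (t * r_w).sum(dim=-1, keepdim=True) * r_w
    return -torch.linalg.vector_norm(h_proj + r_d - t_proj, ord=p, dim=-1)
```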
PairRE
PairREInteraction
modulates the head and tail representations \(\mathbf{h}, \mathbf{t} \in
\mathbb{R}^{d}\) by element-wise multiplication with relation-specific vectors \(\mathbf{r}_h, \mathbf{r}_t \in \mathbb{R}^{d}\),
before taking their difference:

\[g(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \mathbf{h} \odot \mathbf{r}_h - \mathbf{t} \odot \mathbf{r}_t\]
LineaRE
LineaREInteraction
adds an additional relation-specific translation \(\mathbf{r} \in
\mathbb{R}^d\) to PairREInteraction
.
TripleRE
TripleREInteraction
adds an additional global scalar term \(u \in \mathbb{R}\) to the
modulation vectors of LineaREInteraction
.
RotatE
RotatEInteraction
uses

\[g(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \mathbf{h} \odot \mathbf{r} - \mathbf{t}\]

with complex representations \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^d\). When \(\mathbf{r}\) is element-wise normalized to unit length, this operation corresponds to a dimension-wise rotation in the complex plane.
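A sketch using native complex tensors; real implementations often store real and imaginary parts separately and parametrize the relation by its phase:

```python
import torch


def rotate_score(h: torch.Tensor, r_phase: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """-||h * r - t|| with unit-modulus relation entries r_i = exp(i * phase_i)."""
    r = torch.polar(torch.ones_like(r_phase), r_phase)  # |r_i| = 1: a pure rotation per dimension
    return -torch.linalg.vector_norm(h * r - t, dim=-1)


d = 32
h = torch.randn(d, dtype=torch.cfloat)
t = torch.randn(d, dtype=torch.cfloat)
r_phase = torch.rand(d) * 2 * torch.pi
print(rotate_score(h, r_phase, t))
```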
Todo
has some extra projections
has some extra head/tail biases
Semantic Matching / Factorization
All semantic matching or factorization-based interactions can be expressed as

\[f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \sum_{i, j, k} \mathbf{Z}_{ijk} \, h_i \, r_j \, t_k\]

for a suitable tensor \(\mathbf{Z} \in \mathbb{R}^{d_h \times d_r \times d_t}\), and potentially re-shaped head entity, relation, and tail entity representations \(\mathbf{h} \in \mathbb{R}^{d_h}, \mathbf{r} \in \mathbb{R}^{d_r}, \mathbf{t} \in \mathbb{R}^{d_t}\). Many of the interactions have a regular, structured choice of \(\mathbf{Z}\) which permits efficient calculation. We will use the simplified formulae where possible.
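Written with torch.einsum, the general form reads as follows; materializing a full \(\mathbf{Z}\) is usually too expensive, which is why the interactions below impose structure on it:

```python
import torch

d_h, d_r, d_t = 16, 8, 16
Z = torch.rand(d_h, d_r, d_t)                    # core tensor
h, r, t = torch.rand(d_h), torch.rand(d_r), torch.rand(d_t)

score = torch.einsum("ijk,i,j,k->", Z, h, r, t)  # sum_{i,j,k} Z_ijk * h_i * r_j * t_k
```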
DistMult
The DistMultInteraction
uses the sum of products along each dimension

\[f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \sum_{i=1}^{d} h_i \cdot r_i \cdot t_i\]

for \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\).
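This corresponds to a (super-)diagonal core tensor \(\mathbf{Z}\) in the general form above; in torch it is a one-liner:

```python
import torch

h, r, t = torch.rand(64), torch.rand(64), torch.rand(64)
score = (h * r * t).sum(dim=-1)  # sum_i h_i * r_i * t_i
```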
Canonical Tensor Decomposition
CPInteraction
is equivalent to DistMultInteraction
, except that
it uses different sources for head and tail representations, while DistMultInteraction
uses
one shared entity embedding matrix.
SimplE
SimplEInteraction
defines the interaction as
for \(\mathbf{h}_h, \mathbf{h}_t, \mathbf{r}_{\rightarrow}, \mathbf{r}_{\leftarrow}, \mathbf{t}_{h}, \mathbf{t}_{t} \in
\mathbb{R}^{d}\). In contrast to CPInteraction
, SimplEInteraction
introduces, for each relation, a weight \(\mathbf{r}_{\rightarrow}\) for the forward relation and a separate weight
\(\mathbf{r}_{\leftarrow}\) for the inverse relation.
RESCAL
RESCALInteraction
operates on \(\mathbf{h}, \mathbf{t} \in \mathbb{R}^d\) and a relation matrix \(\mathbf{R} \in
\mathbb{R}^{d \times d}\) by

\[f(\mathbf{h}, \mathbf{R}, \mathbf{t}) = \mathbf{h}^{T} \mathbf{R} \mathbf{t} = \sum_{i, j} h_i \cdot R_{ij} \cdot t_j\]
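A quick torch sketch of the bilinear form:

```python
import torch

d = 32
h, t = torch.rand(d), torch.rand(d)
R = torch.rand(d, d)  # relation-specific matrix
score = h @ R @ t     # h^T R t
```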
Tucker Decomposition
TuckERInteraction
/ MultiLinearTuckerInteraction
are stateful
interaction functions which make \(\mathbf{Z}\) a trainable global parameter and set \(d_h = d_t\).
Warning
Both additionally add batch normalization and dropout layers, which technically makes them neural models. However, the intuition behind the interaction is still similar to semantic matching based models, which is why we list them here.
DistMA
DistMAInteraction
uses the sum of pairwise scalar products between \(\mathbf{h}, \mathbf{r},
\mathbf{t} \in \mathbb{R}^{d}\):

\[f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \langle \mathbf{h}, \mathbf{r} \rangle + \langle \mathbf{r}, \mathbf{t} \rangle + \langle \mathbf{h}, \mathbf{t} \rangle\]
TransF
TransFInteraction
defines the interaction between \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in
\mathbb{R}^{d}\) as:
ComplEx
ComplExInteraction
extends DistMultInteraction
to use complex
numbers instead, i.e., it operates on \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^{d}\), and defines

\[f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \operatorname{Re}\left(\sum_{i=1}^{d} h_i \cdot r_i \cdot \bar{t}_i\right)\]

where \(\operatorname{Re}\) refers to the real part, and \(\bar{\cdot}\) denotes the complex conjugate.
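With native complex tensors this is again compact (implementations frequently keep separate real and imaginary parts instead):

```python
import torch

d = 32
h = torch.randn(d, dtype=torch.cfloat)
r = torch.randn(d, dtype=torch.cfloat)
t = torch.randn(d, dtype=torch.cfloat)

score = torch.real((h * r * torch.conj(t)).sum(dim=-1))  # Re(sum_i h_i * r_i * conj(t_i))
```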
QuatE
QuatEInteraction
uses
for quaternions \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{H}^{d}\), and the Hamilton product \(\otimes\).
HolE
HolEInteraction
is given by

\[f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \langle \mathbf{r}, \mathbf{h} \star \mathbf{t} \rangle\]

where \(\star: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}^d\) denotes the circular correlation:

\[[\mathbf{a} \star \mathbf{b}]_i = \sum_{k=0}^{d-1} a_k \cdot b_{(i + k) \bmod d}\]
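Circular correlation can be computed efficiently via the FFT; a minimal sketch:

```python
import torch


def circular_correlation(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """[a * b]_i = sum_k a_k * b_{(i+k) mod d}, via the correlation theorem."""
    return torch.fft.irfft(torch.conj(torch.fft.rfft(a)) * torch.fft.rfft(b), n=a.shape[-1])


d = 16
h, r, t = torch.rand(d), torch.rand(d), torch.rand(d)
score = (r * circular_correlation(h, t)).sum(dim=-1)  # <r, h * t>
```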
AutoSF
AutoSFInteraction
is an attempt to parametrize block-based semantic matching interaction
functions to enable automated search across those. Its interaction is given as
where \(\mathcal{C}\) defines the block interactions, and \(h, r, t\) are lists of blocks.
Neural Interactions
All other interaction functions are usually called neural. They typically share a multi-layer architecture (usually two layers) and employ non-linearities. Many of them also introduce customized hidden layers, such as interpreting concatenated embedding vectors as an image, pairs of embedding vectors as normal distributions, or semantic-matching-inspired sums of linear products.
Moreover, some choose a form that can be decomposed into

\[f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = f_o(f_i(\mathbf{h}, \mathbf{r}), \mathbf{t})\]

with an expensive \(f_i\) and a cheap \(f_o\). This form allows efficient scoring of many tails for a given head-relation combination, and can be combined with inverse relation modelling for an overall efficient training and inference architecture.
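For example, with a DistMult-style cheap outer function (a dot product), scoring all candidate tails for one \((h, r)\) pair reduces to a single matrix-vector product; a sketch:

```python
import torch

num_entities, d = 10_000, 128
tail_embeddings = torch.rand(num_entities, d)  # all candidate tail representations

h, r = torch.rand(d), torch.rand(d)

hidden = h * r                     # expensive part f_i, computed once per (h, r) pair
scores = tail_embeddings @ hidden  # cheap f_o: one score per candidate tail
print(scores.shape)                # torch.Size([10000])
```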
ConvE
ConvEInteraction
uses an interaction of the form

\[f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \langle g(\mathbf{h}, \mathbf{r}), \mathbf{t} \rangle + t_b\]

where \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\) are the head entity, relation, and tail entity representations, and \(t_b \in \mathbb{R}\) is an entity bias. \(g\) is a CNN-based encoder, which first operates on a 2D-reshaped “image” and then flattens the output for a second, linear layer. Dropout and batch normalization are utilized, too.
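An illustrative sketch of the CNN encoder \(g\) (sizes, kernel, and channel counts are assumptions; dropout and batch normalization are omitted):

```python
import torch
from torch import nn

d, height, width = 200, 10, 20  # illustrative: d = height * width

h, r, t = torch.rand(d), torch.rand(d), torch.rand(d)
t_bias = torch.zeros(())

# stack the 2D-reshaped head and relation representations into a single-channel "image"
image = torch.cat([h.view(1, height, width), r.view(1, height, width)], dim=1)  # (1, 20, 20)

conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
project = nn.Linear(32 * 18 * 18, d)  # flatten the feature maps, map back to embedding size

hidden = project(torch.relu(conv(image.unsqueeze(0))).flatten(start_dim=1)).squeeze(0)
score = hidden @ t + t_bias  # <g(h, r), t> + t_b
```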
ConvKB
ConvKBInteraction
concatenates \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\) to a \(3
\times d\) “image” and applies a \(3 \times 1\) convolution. The output is flattened and a linear layer predicts the score.
CrossE
CrossEInteraction
uses
where
with an activation function \(\sigma\) and \(\odot\) denoting the element-wise product. Moreover, dropout is applied to the output of \(g\).
ERMLP
ERMLPInteraction
uses a simple 2-layer MLP on the concatenated head, relation, and tail
representations \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\).
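A sketch of such a two-layer MLP scorer (hidden size and activation are illustrative assumptions):

```python
import torch
from torch import nn

d, hidden_dim = 64, 128

mlp = nn.Sequential(
    nn.Linear(3 * d, hidden_dim),  # first layer on the concatenated triple
    nn.ReLU(),
    nn.Linear(hidden_dim, 1),      # second layer produces the scalar score
)

h, r, t = torch.rand(d), torch.rand(d), torch.rand(d)
score = mlp(torch.cat([h, r, t], dim=-1)).squeeze(-1)
```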
ERMLP (E)
ERMLPEInteraction
adjusts ERMLPInteraction
for a more efficient
training and inference architecture by using

\[f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \langle g(\mathbf{h}, \mathbf{r}), \mathbf{t} \rangle\]

where \(g\) is a 2-layer MLP.
KG2E
KG2EInteraction
interprets pairs of vectors \(\mathbf{h}_{\mu}, \mathbf{h}_{\Sigma},
\mathbf{r}_{\mu}, \mathbf{r}_{\Sigma}, \mathbf{t}_{\mu}, \mathbf{t}_{\Sigma} \in \mathbb{R}^d\) as normal distributions
\(\mathcal{N}_h, \mathcal{N}_r, \mathcal{N}_t\) and determines a similarity between \(\mathcal{N}_h - \mathcal{N}_t\) and
\(\mathcal{N}_r\).
Todo
This does not really fit well into the neural category.
Neural Tensor Network (NTN)
NTNInteraction
defines the interaction function as
where \(\mathbf{h}, \mathbf{t} \in \mathbb{R}^d\) are head and tail entity representations, \(\mathbf{r}_1, \mathbf{r}_u \in \mathbb{R}^k\), \(\mathbf{R}_2 \in \mathbb{R}^{k \times 2d}\), and \(\mathbf{R}_3 \in \mathbb{R}^{d \times d \times k}\) are relation-specific parameters, and \(\sigma\) is an activation function.
ProjE
ProjEInteraction
uses
where \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\) are the head entity, relation, and tail entity representations, \(\mathbf{d}_h, \mathbf{d}_r, \mathbf{b} \in \mathbb{R}^d\) and \(b_p \in \mathbb{R}\) are global parameters, and \(\sigma_1, \sigma_2\) are activation functions.
Transformer
It uses an interaction of the form

\[f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \langle g(\mathbf{h}, \mathbf{r}), \mathbf{t} \rangle\]

with \(\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d\) and \(g\) denoting a transformer encoder with learnable absolute positional embeddings, followed by sum pooling and a linear projection.