.. _interactions:

Interaction Functions
=====================

In PyKEEN, an *interaction function* refers to a function that maps *representations* for head entities, relations, and
tail entities to a scalar plausibility score. In the simplest case, head entities, relations, and tail entities are each
represented by a single tensor. However, there are also interaction functions that use multiple tensors, e.g.
:class:`~pykeen.nn.modules.NTNInteraction`.

Interaction functions can also have trainable parameters that are global and not related to a single entity or relation.
An example is :class:`~pykeen.nn.modules.TuckERInteraction` with its core tensor. We call such functions stateful and
all others stateless.

Base
----

:class:`~pykeen.nn.modules.Interaction` is the base class for all interactions. It defines the API for (broadcastable,
batch) calculation of plausibility scores, see :meth:`~pykeen.nn.modules.Interaction.forward`. It also provides some
meta information about required symbolic shapes of different arguments.

Combinations & Adapters
-----------------------

The :class:`~pykeen.nn.modules.DirectionAverageInteraction` calculates a plausibility by averaging the plausibility
scores of a base function over the forward and backward representations. It can be seen as a generalization of
:class:`~pykeen.nn.modules.SimplEInteraction`.

The :class:`~pykeen.nn.modules.MonotonicAffineTransformationInteraction` adds trainable scalar scale and bias terms to
an existing interaction function. The scale parameter is parametrized to take only positive values, preserving the
interpretation of larger values corresponding to more plausible triples. This adapter is particularly useful for base
interactions with a restricted range of values, such as norm-based interactions, and loss functions with absolute
decision thresholds, such as point-wise losses, e.g., :class:`~pykeen.losses.BCEWithLogitsLoss`.
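
The effect of this adapter can be sketched in plain Python. The exponential parametrization of the scale and the
helper name ``monotonic_affine`` are assumptions made for illustration; the actual class may parametrize the scale
differently.

.. code-block:: python

    import math


    def monotonic_affine(score, log_scale, bias):
        """Apply a strictly increasing affine map to a score (illustrative helper)."""
        # exp(.) is positive for every real input, so the ordering of scores is preserved
        return math.exp(log_scale) * score + bias


    # norm-based scores are never positive
    scores = [-3.0, -1.5, -0.2]
    transformed = [monotonic_affine(s, log_scale=0.5, bias=2.0) for s in scores]

In this sketch the ranking of the three scores is unchanged, but the transformed values can fall on either side of a
fixed decision threshold at zero.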

The :class:`~pykeen.nn.modules.ClampedInteraction` constrains the scores to a given range of values. While this ensures
that scores cannot exceed the bounds, using :func:`torch.clamp()` also means that no gradients are propagated for inputs
with out-of-bounds scores. It can also lead to tied scores during evaluation, which can cause problems with some
variants of the score functions, see :ref:`understanding-evaluation`.
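
A minimal plain-Python sketch of how clamping can introduce ties; the helper ``clamp`` and the bounds used here are
illustrative assumptions and only mimic the element-wise behavior of :func:`torch.clamp()`.

.. code-block:: python

    def clamp(x, low, high):
        """Restrict a scalar to the closed interval [low, high]."""
        return max(low, min(high, x))


    raw = [-7.0, -2.5, 0.3, 4.0, 9.0]
    clamped = [clamp(s, -3.0, 3.0) for s in raw]

    # the two largest raw scores are distinct, but both are mapped to the upper bound
    n_distinct_raw = len(set(raw))
    n_distinct_clamped = len(set(clamped))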

Norm-Based Interactions
-----------------------

Norm-based interactions can generally be written as

.. math::

    -\|g(\mathbf{h}, \mathbf{r}, \mathbf{t})\|

for some (vector) norm $\|\cdot\|$ and inner function $g$. Sometimes, the $p$-th power of a $p$-norm is used instead.

Unstructured Model (UM)
~~~~~~~~~~~~~~~~~~~~~~~

The unstructured model (UM) interaction, :class:`~pykeen.nn.modules.UMInteraction`, uses the difference between head
and tail representations $\mathbf{h}, \mathbf{t} \in \mathbb{R}^d$ as inner function:

.. math::

    \mathbf{h}  - \mathbf{t}

Structure Embedding
~~~~~~~~~~~~~~~~~~~

:class:`~pykeen.nn.modules.SEInteraction` can be seen as an extension of UM, where head and tail representations
$\mathbf{h}, \mathbf{t} \in \mathbb{R}^d$ are first linearly transformed using relation-specific head and tail
transformation matrices $\mathbf{R}_h, \mathbf{R}_t \in \mathbb{R}^{k \times d}$:

.. math::

    \mathbf{R}_{h} \mathbf{h}  - \mathbf{R}_t \mathbf{t}

TransE
~~~~~~

:class:`~pykeen.nn.modules.TransEInteraction` interprets the relation representation as a translation vector and defines

.. math::

    \mathbf{h} + \mathbf{r} - \mathbf{t}

for $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$.
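
As a minimal sketch of the norm-based pattern, the following plain-Python code combines the TransE inner function with
a negative $p$-norm. The helper names ``p_norm`` and ``transe_score`` are illustrative and do not correspond to
PyKEEN's batched implementation.

.. code-block:: python

    def p_norm(x, p=2):
        """Vector p-norm of a list of floats."""
        return sum(abs(v) ** p for v in x) ** (1.0 / p)


    def transe_score(h, r, t, p=2):
        """Negative p-norm of the inner function h + r - t."""
        inner = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
        return -p_norm(inner, p)


    h = [1.0, 0.0]
    r = [0.0, 1.0]
    t_good = [1.0, 1.0]  # exactly h + r, so the inner function vanishes
    t_bad = [4.0, 5.0]  # inner function is (-3, -4), which has Euclidean norm 5

    score_good = transe_score(h, r, t_good)
    score_bad = transe_score(h, r, t_bad)

The better-matching tail receives the larger (less negative) score, in line with the convention that larger values
correspond to more plausible triples.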

TransR
~~~~~~

:class:`~pykeen.nn.modules.TransRInteraction` uses a relation-specific projection matrix $\mathbf{R} \in \mathbb{R}^{k
\times d}$ to project $\mathbf{h}, \mathbf{t} \in \mathbb{R}^{d}$ into the relation subspace, and then applies a
:class:`~pykeen.nn.modules.TransEInteraction`-style translation by $\mathbf{r} \in \mathbb{R}^{k}$:

.. math::

    c(\mathbf{R}\mathbf{h}) + \mathbf{r} - c(\mathbf{R}\mathbf{t})

$c$ refers to an additional norm-clamping function.

TransD
~~~~~~

:class:`~pykeen.nn.modules.TransDInteraction` extends :class:`~pykeen.nn.modules.TransRInteraction` to construct
separate head and tail projections, $\mathbf{M}_{r, h}, \mathbf{M}_{r, t} \in \mathbb{R}^{k \times d}$, similar to
:class:`~pykeen.nn.modules.SEInteraction`. These projections are built (low-rank) from a shared relation-specific part
$\mathbf{r}_p \in \mathbb{R}^{k}$, and an additional head/tail representation, $\mathbf{h}_p, \mathbf{t}_p \in
\mathbb{R}^{d}$. The matrices project the base head and tail representations $\mathbf{h}_v, \mathbf{t}_v \in
\mathbb{R}^{d}$ into a relation-specific sub-space before a translation $\mathbf{r}_v \in \mathbb{R}^{k}$ is applied.

.. math::

    c(\mathbf{M}_{r, h} \mathbf{h}_v) + \mathbf{r}_v - c(\mathbf{M}_{r, t} \mathbf{t}_v)

where

.. math::

    \mathbf{M}_{r, h} &= \mathbf{r}_p \mathbf{h}_p^{T} + \tilde{\mathbf{I}} \\
    \mathbf{M}_{r, t} &= \mathbf{r}_p \mathbf{t}_p^{T} + \tilde{\mathbf{I}}

$c$ refers to an additional norm-clamping function.
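
Because the projection matrices are a rank-one term plus $\tilde{\mathbf{I}}$, the projection can be computed without
materializing the matrix. The sketch below assumes that $\tilde{\mathbf{I}}$ is the rectangular $k \times d$ matrix
with ones on its main diagonal, and omits the norm-clamping $c$; the function names are illustrative.

.. code-block:: python

    def transd_project(e_v, e_p, r_p):
        """Compute (r_p e_p^T + I~) e_v using only vector operations."""
        k, d = len(r_p), len(e_v)
        # r_p e_p^T e_v = r_p * <e_p, e_v>
        dot = sum(a * b for a, b in zip(e_p, e_v))
        # I~ e_v keeps the leading entries of e_v and pads with zeros if k > d
        padded = [e_v[i] if i < d else 0.0 for i in range(k)]
        return [r_p[i] * dot + padded[i] for i in range(k)]


    def transd_project_dense(e_v, e_p, r_p):
        """Reference version that builds the k x d projection matrix explicitly."""
        k, d = len(r_p), len(e_v)
        m = [[r_p[i] * e_p[j] + (1.0 if i == j else 0.0) for j in range(d)] for i in range(k)]
        return [sum(m[i][j] * e_v[j] for j in range(d)) for i in range(k)]


    h_v, h_p = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]  # d = 3
    r_p = [1.0, -2.0]  # k = 2
    fast = transd_project(h_v, h_p, r_p)
    dense = transd_project_dense(h_v, h_p, r_p)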

TransH
~~~~~~

:class:`~pykeen.nn.modules.TransHInteraction` projects head and tail representations $\mathbf{h}, \mathbf{t} \in
\mathbb{R}^{d}$ to a relation-specific hyper-plane defined by $\mathbf{r}_{w} \in \mathbb{R}^d$, before applying the
relation-specific translation $\mathbf{r}_{d} \in \mathbb{R}^d$.

.. math::

    \mathbf{h}_{r} + \mathbf{r}_d - \mathbf{t}_{r}

where

.. math::

    \mathbf{h}_{r} &= \mathbf{h} - \mathbf{r}_{w}^T \mathbf{h} \mathbf{r}_w \\
    \mathbf{t}_{r} &= \mathbf{t} - \mathbf{r}_{w}^T \mathbf{t} \mathbf{r}_w
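
The projection step can be sketched in plain Python as follows. The sketch assumes that the normal vector
$\mathbf{r}_w$ has unit length, in which case the projected vector is orthogonal to $\mathbf{r}_w$; the helper names
are illustrative.

.. code-block:: python

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))


    def project_to_hyperplane(e, r_w):
        """Remove the component of e along r_w (assumed to have unit length)."""
        scale = dot(r_w, e)
        return [ei - scale * wi for ei, wi in zip(e, r_w)]


    def transh_inner(h, t, r_w, r_d):
        """Inner function h_r + r_d - t_r of the TransH interaction."""
        h_r = project_to_hyperplane(h, r_w)
        t_r = project_to_hyperplane(t, r_w)
        return [a + b - c for a, b, c in zip(h_r, r_d, t_r)]


    r_w = [0.6, 0.8]  # unit length, since 0.36 + 0.64 = 1
    h = [2.0, 1.0]
    h_r = project_to_hyperplane(h, r_w)
    residual = dot(r_w, h_r)  # close to zero: h_r lies in the hyper-plane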

PairRE
~~~~~~

:class:`~pykeen.nn.modules.PairREInteraction` modulates the head and tail representations $\mathbf{h}, \mathbf{t} \in
\mathbb{R}^{d}$ by elementwise multiplication by relation-specific $\mathbf{r}_h, \mathbf{r}_t \in \mathbb{R}^{d}$,
before taking their difference

.. math::

    \mathbf{h} \odot \mathbf{r}_h - \mathbf{t} \odot \mathbf{r}_t

LineaRE
~~~~~~~

:class:`~pykeen.nn.modules.LineaREInteraction` adds an additional relation-specific translation $\mathbf{r} \in
\mathbb{R}^d$ to :class:`~pykeen.nn.modules.PairREInteraction`.

.. math::

    \mathbf{h} \odot \mathbf{r}_h - \mathbf{t} \odot \mathbf{r}_t + \mathbf{r}

TripleRE
~~~~~~~~

:class:`~pykeen.nn.modules.TripleREInteraction` adds an additional global scalar term $u \in \mathbb{R}$ to the
modulation vectors of :class:`~pykeen.nn.modules.LineaREInteraction`.

.. math::

    \mathbf{h} \odot (\mathbf{r}_h + u) - \mathbf{t} \odot (\mathbf{r}_t + u) + \mathbf{r}
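
The PairRE, LineaRE, and TripleRE inner functions are nested: setting $u = 0$ in TripleRE yields LineaRE, and
additionally setting $\mathbf{r} = \mathbf{0}$ yields PairRE. A plain-Python sketch with illustrative function names:

.. code-block:: python

    def triplere_inner(h, t, r_h, r_t, r, u=0.0):
        """h * (r_h + u) - t * (r_t + u) + r, evaluated element-wise."""
        return [
            hi * (rhi + u) - ti * (rti + u) + ri
            for hi, ti, rhi, rti, ri in zip(h, t, r_h, r_t, r)
        ]


    def linea_re_inner(h, t, r_h, r_t, r):
        """LineaRE is TripleRE without the global scalar term."""
        return triplere_inner(h, t, r_h, r_t, r, u=0.0)


    def pair_re_inner(h, t, r_h, r_t):
        """PairRE is LineaRE without the translation."""
        return linea_re_inner(h, t, r_h, r_t, r=[0.0] * len(h))


    h, t = [1.0, 2.0], [3.0, 4.0]
    r_h, r_t, r = [2.0, 0.5], [1.0, 1.0], [1.0, 1.0]
    pair = pair_re_inner(h, t, r_h, r_t)
    linea = linea_re_inner(h, t, r_h, r_t, r)
    triple = triplere_inner(h, t, r_h, r_t, r, u=1.0)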

RotatE
~~~~~~

:class:`~pykeen.nn.modules.RotatEInteraction` uses

.. math::

    \mathbf{h} \odot \mathbf{r} - \mathbf{t}

with complex representations $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^d$. When $\mathbf{r}$ is element-wise
normalized to unit length, this operation corresponds to dimension-wise rotation in the complex plane.
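
The rotation interpretation can be checked with Python's built-in complex numbers. This is a minimal sketch using the
Euclidean norm; ``rotate_score`` is an illustrative helper, not PyKEEN's implementation.

.. code-block:: python

    import cmath
    import math


    def rotate_score(h, r, t):
        """Negative Euclidean norm of h * r - t for complex vectors given as lists."""
        diff = [hi * ri - ti for hi, ri, ti in zip(h, r, t)]
        return -math.sqrt(sum(abs(x) ** 2 for x in diff))


    # unit-modulus entries exp(i * theta) rotate each dimension by the angle theta
    r = [cmath.exp(1j * math.pi / 2), cmath.exp(1j * math.pi)]
    h = [1 + 0j, 2 + 1j]
    rotated = [hi * ri for hi, ri in zip(h, r)]

    # rotation preserves the modulus in every dimension
    moduli_before = [abs(x) for x in h]
    moduli_after = [abs(x) for x in rotated]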

.. todo::

    - :class:`~pykeen.nn.modules.BoxEInteraction`
    - has some extra projections
    - :class:`~pykeen.nn.modules.MuREInteraction`
    - has some extra head/tail biases
    - :class:`~pykeen.nn.modules.TorusEInteraction`

Semantic Matching / Factorization
---------------------------------

All *semantic matching* or *factorization-based* interactions can be expressed as

.. math::

    \sum_{i, j, k} \mathbf{Z}_{i, j, k} \mathbf{h}_i \mathbf{r}_j \mathbf{t}_k

for suitable tensor $\mathbf{Z} \in \mathbb{R}^{d_h \times d_r \times d_t}$, and potentially re-shaped head entity,
relation, and tail entity representations $\mathbf{h} \in \mathbb{R}^{d_h}, \mathbf{r} \in \mathbb{R}^{d_r}, \mathbf{t}
\in \mathbb{R}^{d_t}$. Many of the interactions have a regular structured choice for $\mathbf{Z}$ which permits
efficient calculation. We will use the simplified formulae where possible.

DistMult
~~~~~~~~

The :class:`~pykeen.nn.modules.DistMultInteraction` uses the sum of products along each dimension

.. math::

    \sum_i \mathbf{h}_i \mathbf{r}_i \mathbf{t}_i

for $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$.
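
In terms of the general form, DistMult corresponds to a tensor $\mathbf{Z}$ that is one where all three indices
coincide and zero elsewhere. The following plain-Python sketch compares both formulations; the function names are
illustrative.

.. code-block:: python

    def trilinear(z, h, r, t):
        """General semantic matching form: sum over Z[i][j][k] * h[i] * r[j] * t[k]."""
        return sum(
            z[i][j][k] * h[i] * r[j] * t[k]
            for i in range(len(h))
            for j in range(len(r))
            for k in range(len(t))
        )


    def distmult(h, r, t):
        """Simplified formula: sum of products along each dimension."""
        return sum(a * b * c for a, b, c in zip(h, r, t))


    d = 3
    # "diagonal" tensor: one where i == j == k, zero elsewhere
    z_diag = [[[1.0 if i == j == k else 0.0 for k in range(d)] for j in range(d)] for i in range(d)]

    h, r, t = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0], [2.0, 0.0, 1.0]
    general = trilinear(z_diag, h, r, t)
    simplified = distmult(h, r, t)

The simplified formula needs $d$ multiplications instead of iterating over all $d^3$ entries of $\mathbf{Z}$. It is
also symmetric in head and tail.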

Canonical Tensor Decomposition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:class:`~pykeen.nn.modules.CPInteraction` is equivalent to :class:`~pykeen.nn.modules.DistMultInteraction`, except that
it uses different sources for head and tail representations, while :class:`~pykeen.nn.modules.DistMultInteraction` uses
one shared entity embedding matrix.

.. math::

    \sum_{i, j} \mathbf{h}_{i, j} \mathbf{r}_{i, j} \mathbf{t}_{i, j}

SimplE
~~~~~~

:class:`~pykeen.nn.modules.SimplEInteraction` defines the interaction as

.. math::

    \frac{1}{2} \left(
        \langle \mathbf{h}_h, \mathbf{r}_{\rightarrow}, \mathbf{t}_t \rangle
        + \langle \mathbf{t}_h, \mathbf{r}_{\leftarrow}, \mathbf{h}_t \rangle
    \right)

for $\mathbf{h}_h, \mathbf{h}_t, \mathbf{r}_{\rightarrow}, \mathbf{r}_{\leftarrow}, \mathbf{t}_{h}, \mathbf{t}_{t} \in
\mathbb{R}^{d}$. In contrast to :class:`~pykeen.nn.modules.CPInteraction`, :class:`~pykeen.nn.modules.SimplEInteraction`
introduces separate weights for each relation, $\mathbf{r}_{\rightarrow}$, and for its inverse relation,
$\mathbf{r}_{\leftarrow}$.
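
A plain-Python sketch of the SimplE formula, written as the average of a forward and a backward tri-linear product.
The helper names ``tri`` and ``simple_score`` and the argument names are illustrative.

.. code-block:: python

    def tri(a, b, c):
        """Tri-linear product <a, b, c>."""
        return sum(x * y * z for x, y, z in zip(a, b, c))


    def simple_score(h_h, h_t, r_fwd, r_bwd, t_h, t_t):
        """Average of the forward score <h_h, r_fwd, t_t> and the backward score <t_h, r_bwd, h_t>."""
        return 0.5 * (tri(h_h, r_fwd, t_t) + tri(t_h, r_bwd, h_t))


    # each entity has one representation for the head role (_h) and one for the tail role (_t)
    h_h, h_t = [1.0, 2.0], [0.0, 1.0]
    t_h, t_t = [2.0, 1.0], [1.0, 1.0]
    r_fwd, r_bwd = [1.0, 0.5], [-1.0, 2.0]
    score = simple_score(h_h, h_t, r_fwd, r_bwd, t_h, t_t)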

RESCAL
~~~~~~

:class:`~pykeen.nn.modules.RESCALInteraction` operates on $\mathbf{h}, \mathbf{t} \in \mathbb{R}^d$ and $\mathbf{R} \in
\mathbb{R}^{d \times d}$ by

.. math::

    \sum_{i, j} \mathbf{h}_{i} \mathbf{R}_{i,j} \mathbf{t}_{j}

Tucker Decomposition
~~~~~~~~~~~~~~~~~~~~

:class:`~pykeen.nn.modules.TuckERInteraction` / :class:`~pykeen.nn.modules.MultiLinearTuckerInteraction` are stateful
interaction functions which make $\mathbf{Z}$ a trainable global parameter and set $d_h = d_t$.

.. math::

    \sum_{i, j, k} \mathbf{Z}_{i, j, k} \mathbf{h}_i \mathbf{r}_j \mathbf{t}_k

.. warning::

    Both additionally add batch normalization and dropout layers, which technically makes them neural models. However,
    the intuition behind the interaction is still similar to semantic matching based models, which is why we list them
    here.

DistMA
~~~~~~

:class:`~pykeen.nn.modules.DistMAInteraction` uses the sum of pairwise scalar products between $\mathbf{h}, \mathbf{r},
\mathbf{t} \in \mathbb{R}^{d}$:

.. math::

    \langle \mathbf{h}, \mathbf{r} \rangle
    + \langle \mathbf{r}, \mathbf{t} \rangle
    + \langle \mathbf{t}, \mathbf{h} \rangle

TransF
~~~~~~

:class:`~pykeen.nn.modules.TransFInteraction` defines the interaction between $\mathbf{h}, \mathbf{r}, \mathbf{t} \in
\mathbb{R}^{d}$ as:

.. math::

    2 \cdot \langle \mathbf{h}, \mathbf{t} \rangle
    + \langle \mathbf{r}, \mathbf{t} \rangle
    - \langle \mathbf{h}, \mathbf{r} \rangle

ComplEx
~~~~~~~

:class:`~pykeen.nn.modules.ComplExInteraction` extends :class:`~pykeen.nn.modules.DistMultInteraction` to use complex
numbers instead, i.e., operate on $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^{d}$, and defines

.. math::

    \textit{Re}\left(
        \sum_i \mathbf{h}_i \mathbf{r}_i \bar{\mathbf{t}}_i
    \right)

where *Re* refers to the real part, and $\bar{\cdot}$ denotes the complex conjugate.
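
Due to the complex conjugate, the score is in general not symmetric in head and tail. A minimal sketch with Python's
built-in complex numbers; ``complex_score`` is an illustrative helper.

.. code-block:: python

    def complex_score(h, r, t):
        """Real part of the sum over h_i * r_i * conj(t_i)."""
        return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


    h = [1 + 1j, 2 + 0j]
    r = [0 + 1j, 1 + 0j]
    t = [0 + 1j, 1 - 1j]

    forward = complex_score(h, r, t)
    backward = complex_score(t, r, h)  # head and tail swapped

For this example the two directions receive different scores, whereas a DistMult-style score would be identical for
both.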

QuatE
~~~~~

:class:`~pykeen.nn.modules.QuatEInteraction` uses

.. math::

    \langle
        \mathbf{h} \otimes \mathbf{r},
        \mathbf{t}
    \rangle

for quaternions $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{H}^{d}$, and Hamilton product $\otimes$.
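
The Hamilton product can be sketched in plain Python by storing each quaternion as a tuple of its four real
components. The representation and the helper names are illustrative assumptions, and any normalization of the
relation quaternions is omitted.

.. code-block:: python

    def hamilton(p, q):
        """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
        a1, b1, c1, d1 = p
        a2, b2, c2, d2 = q
        return (
            a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
        )


    def quate_score(h, r, t):
        """Sum over dimensions of the real inner product between h (x) r and t."""
        total = 0.0
        for hq, rq, tq in zip(h, r, t):
            total += sum(x * y for x, y in zip(hamilton(hq, rq), tq))
        return total


    # basis quaternions: i * j = k, but j * i = -k (the product is not commutative)
    qi, qj, qk = (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0)
    ij = hamilton(qi, qj)
    ji = hamilton(qj, qi)
    score = quate_score([qi], [qj], [qk])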

HolE
~~~~

:class:`~pykeen.nn.modules.HolEInteraction` is given by

.. math::

    \langle \mathbf{r}, \mathbf{h} \star \mathbf{t}\rangle

where $\star: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}^d$ denotes the circular correlation:

.. math::

    [\mathbf{a} \star \mathbf{b}]_i = \sum_{k=0}^{d-1} \mathbf{a}_{k} \cdot \mathbf{b}_{(i+k) \bmod d}
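
A direct plain-Python transcription of the circular correlation and the resulting score. This is a quadratic-time
sketch for illustration with illustrative function names.

.. code-block:: python

    def circular_correlation(a, b):
        """[a * b]_i = sum_k a[k] * b[(i + k) mod d]."""
        d = len(a)
        return [sum(a[k] * b[(i + k) % d] for k in range(d)) for i in range(d)]


    def hole_score(h, r, t):
        """Inner product between r and the circular correlation of h and t."""
        return sum(ri * ci for ri, ci in zip(r, circular_correlation(h, t)))


    a, b = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
    ab = circular_correlation(a, b)
    ba = circular_correlation(b, a)  # differs from ab: the operation is not commutative
    score = hole_score(a, [0.0, 1.0, 0.0], b)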

AutoSF
~~~~~~

:class:`~pykeen.nn.modules.AutoSFInteraction` is an attempt to parametrize *block-based* semantic matching interaction
functions to enable automated search across those. Its interaction is given as

.. math::

    \sum_{(i_h, i_r, i_t, s) \in \mathcal{C}} s \cdot \langle h[i_h], r[i_r], t[i_t] \rangle

where $\mathcal{C}$ defines the block interactions, and $h, r, t$ are lists of blocks.
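
As an illustrative sketch, the block structure can be written in plain Python with $\mathcal{C}$ as a list of
``(i_h, i_r, i_t, s)`` tuples. The function names and the example configurations are assumptions chosen for
demonstration.

.. code-block:: python

    def tri(a, b, c):
        """Tri-linear product <a, b, c>."""
        return sum(x * y * z for x, y, z in zip(a, b, c))


    def autosf_score(h, r, t, coefficients):
        """Signed sum of tri-linear products between the selected blocks."""
        return sum(s * tri(h[i_h], r[i_r], t[i_t]) for i_h, i_r, i_t, s in coefficients)


    # two blocks of size two per representation
    h = [[1.0, 2.0], [3.0, 4.0]]
    r = [[0.5, 1.0], [2.0, -1.0]]
    t = [[1.0, 1.0], [0.0, 2.0]]

    # matching blocks with positive sign: same value as DistMult on the concatenated blocks
    diagonal = [(0, 0, 0, 1), (1, 1, 1, 1)]
    # flipping the sign of the second block gives a different interaction
    signed = [(0, 0, 0, 1), (1, 1, 1, -1)]

    score_diagonal = autosf_score(h, r, t, diagonal)
    score_signed = autosf_score(h, r, t, signed)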

Neural Interactions
-------------------

All other interaction functions are usually called *neural*. They typically share a multi-layer architecture (usually
two layers) and employ non-linearities. Many of them also introduce customized hidden layers, such as interpreting
concatenated embedding vectors as an image, pairs of embedding vectors as normal distributions, or semantic-matching
inspired sums of linear products.

Moreover, some choose a form that can be decomposed into

.. math::

    f(\mathbf{h}, \mathbf{r}, \mathbf{t}) = f_o(f_i(\mathbf{h}, \mathbf{r}), \mathbf{t})

with an expensive $f_i$ and a cheap $f_o$. Such a form allows efficient scoring of many tails for a given head-relation
combination, and can be combined with inverse relation modelling for an overall efficient training and inference
architecture.
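
The benefit of this decomposition can be sketched as follows: the expensive part is evaluated once per head-relation
pair and reused for every candidate tail. The concrete choices of ``f_i`` and ``f_o`` below are hypothetical
placeholders, and the call counter only serves to make the reuse visible.

.. code-block:: python

    import math

    calls = {"f_i": 0}


    def f_i(h, r):
        """Hypothetical stand-in for an expensive encoder of the head-relation pair."""
        calls["f_i"] += 1
        return [math.tanh(hi * ri) for hi, ri in zip(h, r)]


    def f_o(x, t):
        """Cheap outer function: a plain dot product with the tail representation."""
        return sum(xi * ti for xi, ti in zip(x, t))


    def score_all_tails(h, r, tails):
        hidden = f_i(h, r)  # evaluated once per (h, r) pair
        return [f_o(hidden, t) for t in tails]


    h, r = [0.5, -1.0], [2.0, 0.5]
    tails = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    scores = score_all_tails(h, r, tails)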

ConvE
~~~~~

:class:`~pykeen.nn.modules.ConvEInteraction` uses an interaction of the form

.. math::

    \langle g(\mathbf{h}, \mathbf{r}), \mathbf{t} \rangle + t_b

where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$ are the head entity, relation, and tail entity
representations, and $t_b \in \mathbb{R}$ is an entity bias. $g$ is a CNN-based encoder, which first operates on a
2D-reshaped "image" and then flattens the output for a second linear layer. Dropout and batch normalization are
utilized, too.

ConvKB
~~~~~~

:class:`~pykeen.nn.modules.ConvKBInteraction` concatenates $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$ to a $3
\times d$ "image" and applies a $3 \times 1$ convolution. The output is flattened and a linear layer predicts the score.

CrossE
~~~~~~

:class:`~pykeen.nn.modules.CrossEInteraction` uses

.. math::

    \langle g(\mathbf{h}, \mathbf{r}), \mathbf{t} \rangle

where

.. math::

    g(\mathbf{h}, \mathbf{r}) = \sigma(
        \mathbf{c}_r \odot \mathbf{h}
        + \mathbf{c}_r \odot \mathbf{h} \odot \mathbf{r}
        + \mathbf{b}
    )

with an activation function $\sigma$ and $\odot$ denoting the element-wise product. Moreover, dropout is applied to the
output of $g$.

ERMLP
~~~~~

:class:`~pykeen.nn.modules.ERMLPInteraction` uses a simple 2-layer MLP on the concatenated head, relation, and tail
representations $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$.

ERMLP (E)
~~~~~~~~~

:class:`~pykeen.nn.modules.ERMLPEInteraction` adjusts :class:`~pykeen.nn.modules.ERMLPInteraction` for a more efficient
training and inference architecture by using

.. math::

    \langle g(\mathbf{h}, \mathbf{r}), \mathbf{t} \rangle

where $g$ is a 2-layer MLP.

KG2E
~~~~

:class:`~pykeen.nn.modules.KG2EInteraction` interprets pairs of vectors $\mathbf{h}_{\mu}, \mathbf{h}_{\Sigma},
\mathbf{r}_{\mu}, \mathbf{r}_{\Sigma}, \mathbf{t}_{\mu}, \mathbf{t}_{\Sigma} \in \mathbb{R}^d$ as normal distributions
$\mathcal{N}_h, \mathcal{N}_r, \mathcal{N}_t$ and determines a similarity between $\mathcal{N}_h - \mathcal{N}_t$ and
$\mathcal{N}_r$.

.. todo:: This does not really fit well into the neural category.

Neural Tensor Network (NTN)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

:class:`~pykeen.nn.modules.NTNInteraction` defines the interaction function as

.. math::

    \left \langle
        \mathbf{r}_{u},
        \sigma(
            \mathbf{h} \mathbf{R}_{3} \mathbf{t}
            + \mathbf{R}_{2} [\mathbf{h};\mathbf{t}]
            + \mathbf{r}_1
        )
    \right \rangle

where $\mathbf{h}, \mathbf{t} \in \mathbb{R}^d$ are head and tail entity representations, and $\mathbf{r}_1,
\mathbf{r}_u \in \mathbb{R}^k, \mathbf{R}_2 \in \mathbb{R}^{k \times 2d}, \mathbf{R}_3 \in \mathbb{R}^{d \times d \times
k}$ are relation-specific parameters, and $\sigma$ is an activation.

ProjE
~~~~~

:class:`~pykeen.nn.modules.ProjEInteraction` uses

.. math::

    \sigma_1(
        \left \langle
        \sigma_2(
            \mathbf{d}_h \odot \mathbf{h}
            + \mathbf{d}_r \odot \mathbf{r}
            + \mathbf{b}
        ),
        \mathbf{t}
        \right \rangle
        + b_p
    )

where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$ are the head entity, relation, and tail entity
representations, $\mathbf{d}_h, \mathbf{d}_r, \mathbf{b} \in \mathbb{R}^d$ and $b_p \in \mathbb{R}$ are global
parameters, and $\sigma_1, \sigma_2$ activation functions.

Transformer
~~~~~~~~~~~

:class:`~pykeen.nn.modules.TransformerInteraction` uses

.. math::

    \langle g([\mathbf{h}; \mathbf{r}]), \mathbf{t} \rangle

with $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$ and $g$ denoting a transformer encoder with learnable
absolute positional embedding followed by sum pooling and a linear projection.