NTN

class NTN(*, embedding_dim=100, num_slices=4, non_linearity=None, non_linearity_kwargs=None, entity_initializer=None, **kwargs)

Bases: pykeen.models.nbase.ERModel

An implementation of NTN from [socher2013].

NTN uses a bilinear tensor layer instead of a standard linear neural network layer:

\[f(h,r,t) = \textbf{u}_{r}^{T} \cdot \tanh(\textbf{h} \mathfrak{W}_{r} \textbf{t} + \textbf{V}_r [\textbf{h};\textbf{t}] + \textbf{b}_r)\]

where \(\mathfrak{W}_r \in \mathbb{R}^{d \times d \times k}\) is the relation-specific tensor. The weight matrix \(\textbf{V}_r \in \mathbb{R}^{k \times 2d}\), the bias vector \(\textbf{b}_r \in \mathbb{R}^k\), and the weight vector \(\textbf{u}_r \in \mathbb{R}^k\) are the standard parameters of a neural network layer and are also relation-specific. The bilinear tensor product \(\textbf{h} \mathfrak{W}_{r} \textbf{t}\) yields a vector \(\textbf{x} \in \mathbb{R}^k\), where each entry \(x_i\) is computed from slice \(i\) of the tensor \(\mathfrak{W}_{r}\): \(x_i = \textbf{h}\mathfrak{W}_{r}^{i} \textbf{t}\). As the interaction function shows, NTN defines a separate neural network for each relation, which makes the model very expressive but also computationally expensive.
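The interaction function can be illustrated with a minimal, dependency-free sketch for a single triple. It is not PyKEEN's implementation: the function name `ntn_score` and the storage layout (the tensor as a list of \(k\) slices, each a \(d \times d\) nested list) are assumptions made for readability, and \(\tanh\) is hard-coded as the non-linearity.

```python
import math


def ntn_score(h, t, W, V, b, u):
    """Score one (h, r, t) triple, given the parameters of relation r.

    h, t: entity vectors of length d
    W:    k slices, each a d x d matrix (assumed layout: W[i][p][q])
    V:    k x 2d matrix
    b, u: vectors of length k
    """
    ht = h + t  # list concatenation, i.e. [h; t]
    hidden = []
    for W_i, V_i, b_i in zip(W, V, b):
        # Bilinear term for slice i: x_i = h W_r^i t
        bilinear = sum(
            h[p] * W_i[p][q] * t[q]
            for p in range(len(h))
            for q in range(len(t))
        )
        # Linear term: row i of V_r applied to [h; t]
        linear = sum(v * x for v, x in zip(V_i, ht))
        hidden.append(math.tanh(bilinear + linear + b_i))
    # Final projection with u_r
    return sum(u_i * x_i for u_i, x_i in zip(u, hidden))


# Toy example with d = 2 and k = 2 (all values are illustrative).
h = [1.0, 0.0]
t = [0.0, 1.0]
W = [
    [[0.0, 1.0], [0.0, 0.0]],  # slice 0 pairs h[0] with t[1]
    [[0.0, 0.0], [0.0, 0.0]],  # slice 1 contributes nothing
]
V = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
b = [0.0, 0.0]
u = [1.0, 1.0]
score = ntn_score(h, t, W, V, b, u)
```

In this toy setting only slice 0 is active, so the score reduces to \(\tanh(1) + \tanh(0) = \tanh(1)\). The nested loops also make the cost visible: each slice needs \(d^2\) multiplications per triple, and every relation carries its own \(k\) slices.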

Note

We split the original \(V_r\) matrix into two parts, \(V_r^h, V_r^t \in \mathbb{R}^{k \times d}\), so that \(V_r [h; t] = V_r^h h + V_r^t t\). The split form is more efficient when \(h\) and \(t\) do not have the same shape, e.g., in a score_h() / score_t() setting.
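The identity behind this split can be checked numerically. The following is a small sketch in plain Python; the helpers `matvec` and `split_V` are hypothetical names introduced here for illustration and are not part of PyKEEN.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def split_V(V, d):
    """Split a k x 2d matrix column-wise into V^h (first d columns) and V^t (last d)."""
    V_h = [row[:d] for row in V]
    V_t = [row[d:] for row in V]
    return V_h, V_t


# Toy values with d = 2 and k = 2.
V = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]]
h = [1.0, -1.0]
t = [2.0, 0.5]

# Original formulation: V_r applied to the concatenation [h; t].
full = matvec(V, h + t)

# Split formulation: V_r^h h + V_r^t t.
V_h, V_t = split_V(V, d=2)
split = [a + c for a, c in zip(matvec(V_h, h), matvec(V_t, t))]
```

Both `full` and `split` hold the same vector. The practical benefit of the split form is that the two products can be computed separately and then combined, so a single \(V_r^t t\) can be reused across many candidate heads (or vice versa) without first materializing every concatenated pair.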