# NTN

class NTN(*, embedding_dim=100, num_slices=4, non_linearity=None, non_linearity_kwargs=None, entity_initializer=None, **kwargs)

An implementation of NTN from [socher2013].

NTN uses a bilinear tensor layer instead of a standard linear neural network layer:

$f(h,r,t) = \textbf{u}_{r}^{T} \cdot \tanh(\textbf{h} \mathfrak{W}_{r} \textbf{t} + \textbf{V}_r [\textbf{h};\textbf{t}] + \textbf{b}_r)$

where $$\mathfrak{W}_r \in \mathbb{R}^{d \times d \times k}$$ is the relation-specific tensor. The weight matrix $$\textbf{V}_r \in \mathbb{R}^{k \times 2d}$$, the bias vector $$\textbf{b}_r \in \mathbb{R}^k$$, and the weight vector $$\textbf{u}_r \in \mathbb{R}^k$$ are the standard parameters of a neural network layer and are also relation-specific. The tensor product $$\textbf{h} \mathfrak{W}_{r} \textbf{t}$$ yields a vector $$\textbf{x} \in \mathbb{R}^k$$ whose $$i$$-th entry is computed from slice $$i$$ of the tensor $$\mathfrak{W}_{r}$$: $$x_i = \textbf{h}\mathfrak{W}_{r}^{i} \textbf{t}$$. As the interaction model shows, NTN defines a separate neural network for each relation, which makes the model very expressive but also computationally expensive.
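The interaction function above can be sketched in a few lines of NumPy. This is only an illustration of the formula for a single triple, not PyKEEN's implementation: the function name `ntn_score`, the array layout, and the random parameters are assumptions made for the example.

```python
import numpy as np


def ntn_score(h, t, W_r, V_r, b_r, u_r):
    """Score one (h, r, t) triple with relation-specific parameters.

    Assumed shapes: h, t: (d,); W_r: (d, d, k); V_r: (k, 2d); b_r, u_r: (k,).
    """
    # Bilinear tensor term: x_i = h^T W_r[:, :, i] t for every slice i.
    x = np.einsum("a,abi,b->i", h, W_r, t)
    # Standard linear layer applied to the concatenation [h; t].
    linear = V_r @ np.concatenate([h, t])
    # Non-linearity followed by the relation-specific output vector u_r.
    return float(u_r @ np.tanh(x + linear + b_r))


# Hypothetical toy parameters: d = 5 embedding dimensions, k = 4 slices.
rng = np.random.default_rng(0)
d, k = 5, 4
h, t = rng.normal(size=d), rng.normal(size=d)
W_r = rng.normal(size=(d, d, k))
V_r = rng.normal(size=(k, 2 * d))
b_r, u_r = rng.normal(size=k), rng.normal(size=k)

score = ntn_score(h, t, W_r, V_r, b_r, u_r)
```

Because every relation carries its own $$d \times d \times k$$ tensor, the parameter count per relation grows quadratically in the embedding dimension, which is where the computational cost comes from.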

Note

We split the original $$V_r$$ matrix into two parts, $$V_r^h$$ and $$V_r^t$$, so that $$V_r [h; t] = V_r^h h + V_r^t t$$. The split form is more efficient when $$h$$ and $$t$$ do not have the same shape, e.g., in a score_h() / score_t() setting.
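A small NumPy check of this identity, under assumed shapes, may help. The variable names (`V_h`, `V_t`, `tails`) are hypothetical and chosen for the example; the second half mimics a score_t()-style call in which one head is scored against several candidate tails.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 4

# Full matrix V_r of shape (k, 2d) and its two column blocks.
V_r = rng.normal(size=(k, 2 * d))
V_h, V_t = V_r[:, :d], V_r[:, d:]

# Single triple: applying V_r to [h; t] equals the sum of the two parts.
h = rng.normal(size=d)
t = rng.normal(size=d)
joint = V_r @ np.concatenate([h, t])
split = V_h @ h + V_t @ t
assert np.allclose(joint, split)

# One head against 7 candidate tails: the head part is computed once
# and broadcast, so h never has to be repeated and concatenated.
tails = rng.normal(size=(7, d))
split_all = V_h @ h + tails @ V_t.T  # shape (7, k)
```

With the unsplit matrix, the head would first have to be tiled to match the number of tails before concatenation; the split form avoids that copy.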

hpo_default: ClassVar[Mapping[str, Any]] = {'embedding_dim': {'high': 256, 'low': 16, 'q': 16, 'type': <class 'int'>}, 'num_slices': {'high': 4, 'low': 2, 'type': <class 'int'>}}

The default strategy for optimizing the model’s hyper-parameters