MonotonicAffineTransformationInteraction
- class MonotonicAffineTransformationInteraction(base: Interaction[HeadRepresentation, RelationRepresentation, TailRepresentation], initial_bias: float = 0.0, trainable_bias: bool = True, initial_scale: float = 1.0, trainable_scale: bool = True)[source]
Bases: Interaction[HeadRepresentation, RelationRepresentation, TailRepresentation]
An adapter of interaction functions which adds a scalar (trainable) monotonic affine transformation of the score.
\[score(h, r, t) = \alpha \cdot score'(h, r, t) + \beta\]
This adapter is useful for losses such as BCE, where there is a fixed decision threshold, or for margin-based losses, where the margin is not treated as a hyper-parameter, but rather as a trainable parameter. This is particularly useful if the value range of the score function is not known in advance, which makes choosing an appropriate margin difficult.
Monotonicity is required to preserve the ordering of the original scoring function, and thus ensures that more plausible triples are still more plausible after the transformation.
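To make the transformation concrete, the following is a minimal sketch of such an adapter in plain PyTorch; the class name AffineScoreAdapter is hypothetical, and parametrizing the scale in log-space is just one way to keep \(\alpha\) strictly positive (and hence the transformation monotonic), not necessarily this class's actual implementation:
>>> import torch
>>> from torch import nn
>>> class AffineScoreAdapter(nn.Module):
...     """Hypothetical sketch: score(h, r, t) = alpha * score'(h, r, t) + beta."""
...     def __init__(self, base, initial_bias=0.0, trainable_bias=True, initial_scale=1.0, trainable_scale=True):
...         super().__init__()
...         self.base = base
...         self.bias = nn.Parameter(torch.as_tensor(initial_bias), requires_grad=trainable_bias)
...         # store the scale in log-space so alpha = exp(log_scale) stays strictly
...         # positive during training, preserving the ordering of the base scores
...         self.log_scale = nn.Parameter(torch.as_tensor(initial_scale).log(), requires_grad=trainable_scale)
...     def forward(self, h, r, t):
...         return self.log_scale.exp() * self.base(h, r, t) + self.bias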
For example, we can add a bias to a distance-based interaction function to enable positive values:
>>> from pykeen.nn.modules import MonotonicAffineTransformationInteraction, TransEInteraction
>>> base = TransEInteraction(p=2)
>>> interaction = MonotonicAffineTransformationInteraction(base=base, trainable_bias=True, trainable_scale=False)
When combined with the BCE loss, we can think of this geometrically as predicting a (soft) sphere centered at \(h + r\) whose radius equals the bias of the transformation. With a trainable scale, the model can additionally control the "softness" of the decision boundary itself.
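As a hedged illustration of this pairing, continuing from the example above (the tensor shapes and labels are made up for the sketch), the transformed scores can be used directly as logits for PyTorch's binary cross-entropy loss:
>>> import torch
>>> h, r, t = (torch.randn(5, 64) for _ in range(3))
>>> scores = interaction(h=h, r=r, t=t)  # shape: (5,); raw TransE scores are non-positive, the bias can shift them
>>> labels = torch.tensor([1.0, 0.0, 1.0, 1.0, 0.0])
>>> loss = torch.nn.functional.binary_cross_entropy_with_logits(scores, labels)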
Initialize the interaction.
- Parameters:
base (Interaction[HeadRepresentation, RelationRepresentation, TailRepresentation]) – The base interaction.
initial_bias (float) – The initial value for the bias.
trainable_bias (bool) – Whether the bias should be trainable.
initial_scale (float) – The initial value for the scale. Must be strictly positive.
trainable_scale (bool) – Whether the scale should be trainable.
Methods Summary
forward(h, r, t)
Compute broadcasted triple scores given broadcasted representations for head, relation and tails.
reset_parameters()
Reset parameters the interaction function may have.
Methods Documentation
- forward(h: HeadRepresentation, r: RelationRepresentation, t: TailRepresentation) → Tensor [source]
Compute broadcasted triple scores given broadcasted representations for head, relation and tails.
In general, each interaction function (class) expects a certain format for each of head, relation and tail representations. This format is composed of the number and the shape of the representations.
Many simple interaction functions, such as TransEInteraction, operate on a single representation per slot. However, there are also interactions such as TransDInteraction, which requires two representations for each slot, or PairREInteraction, which requires two relation representations but only a single representation each for the head and tail entity. A short sketch of the multi-representation convention follows this paragraph.
Each individual representation has a shape. This can be a simple \(d\)-dimensional vector, but it may also comprise matrices, or even higher-order tensors.
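As a sketch of this convention, an interaction with two relation representations would receive them as a pair; the constructor argument and the tuple-passing detail below are assumptions for illustration:
>>> import torch
>>> from pykeen.nn.modules import PairREInteraction
>>> interaction = PairREInteraction(p=2)
>>> h = torch.randn(5, 16)
>>> r = (torch.randn(5, 16), torch.randn(5, 16))  # two relation representations per triple
>>> t = torch.randn(5, 16)
>>> scores = interaction(h=h, r=r, t=t)  # shape: (5,)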
This method supports general batched calculation, i.e., each of the representations can have preceding batch dimensions. These batch dimensions do not need to be exactly the same, but they do need to be broadcastable. A good explanation of the broadcasting rules can be found in NumPy's documentation.
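For instance, broadcasting makes 1:n scoring a matter of reshaping; a small sketch (the embedding dimension 16 and batch sizes are arbitrary):
>>> import torch
>>> from pykeen.nn.modules import TransEInteraction
>>> interaction = TransEInteraction(p=2)
>>> h = torch.randn(4, 1, 16)  # batch_dims = (4, 1)
>>> r = torch.randn(4, 1, 16)
>>> t = torch.randn(1, 7, 16)  # batch_dims = (1, 7), broadcastable against (4, 1)
>>> scores = interaction(h=h, r=r, t=t)  # each of the 4 (h, r) pairs scored against all 7 tails
>>> scores.shape
torch.Size([4, 7])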
See also
Representations for an overview of the different ways to obtain individual representations.
- Parameters:
h (HeadRepresentation) – shape: (*batch_dims, *dims) – The head representations.
r (RelationRepresentation) – shape: (*batch_dims, *dims) – The relation representations.
t (TailRepresentation) – shape: (*batch_dims, *dims) – The tail representations.
- Returns:
shape: batch_dims – The scores.
- Return type:
Tensor