# Training

Training loops for KGE models using multi-modal information.

Throughout the following explanations of training loops, we will assume the set of entities $$\mathcal{E}$$, set of relations $$\mathcal{R}$$, set of possible triples $$\mathcal{T} = \mathcal{E} \times \mathcal{R} \times \mathcal{E}$$. We stratify $$\mathcal{T}$$ into the disjoint union of positive triples $$\mathcal{T^{+}} \subseteq \mathcal{T}$$ and negative triples $$\mathcal{T^{-}} \subseteq \mathcal{T}$$ such that $$\mathcal{T^{+}} \cap \mathcal{T^{-}} = \emptyset$$ and $$\mathcal{T^{+}} \cup \mathcal{T^{-}} = \mathcal{T}$$.

A knowledge graph $$\mathcal{K}$$ constructed under the open world assumption contains a subset of all possible positive triples such that $$\mathcal{K} \subseteq \mathcal{T^{+}}$$.
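As a toy illustration of these definitions (all entity and relation names below are invented for the example):

```python
from itertools import product

# Toy universe (names invented for illustration).
entities = {"berlin", "germany"}
relations = {"capital_of"}

# All possible triples: T = E x R x E.
all_triples = set(product(entities, relations, entities))
assert len(all_triples) == len(entities) * len(relations) * len(entities)  # |T| = 4

# A knowledge graph K holds some of the (unknown) positive triples, so K ⊆ T+ ⊆ T.
kg = {("berlin", "capital_of", "germany")}
assert kg <= all_triples
```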

## Assumptions

### Open World Assumption

When training under the open world assumption (OWA), all triples that are not part of the knowledge graph are considered unknown (i.e., neither positive nor negative). This leads to under-fitting (i.e., over-generalization) and is therefore usually a poor choice for training knowledge graph embedding models [nickel2016review]. PyKEEN does not implement a training loop with the OWA.

> **Warning:** Many publications and software packages use OWA to incorrectly refer to the stochastic local closed world assumption (sLCWA). See below for an explanation.

### Closed World Assumption

When training under the closed world assumption (CWA), all triples that are not part of the knowledge graph are considered negative. As most knowledge graphs are inherently incomplete, this leads to over-fitting and is therefore usually a poor choice for training knowledge graph embedding models. PyKEEN does not implement a training loop with the CWA.
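Concretely, under the CWA every triple absent from the graph becomes a negative example. This can be sketched as follows (toy data invented for illustration):

```python
from itertools import product

# A tiny toy graph (names invented for illustration).
entities = {"a", "b", "c"}
relations = {"r"}
kg = {("a", "r", "b")}

# Under the CWA, every possible triple not in K is treated as negative.
all_triples = set(product(entities, relations, entities))
cwa_negatives = all_triples - kg

print(len(cwa_negatives))  # 3 * 1 * 3 - 1 = 8
```

Since real knowledge graphs are incomplete, many of these "negatives" are actually unobserved positives, which is exactly why the CWA over-fits.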

### Local Closed World Assumption

When training under the local closed world assumption (LCWA; introduced in [dong2014]), a particular subset of triples that are not part of the knowledge graph is considered negative.

| Strategy | Local Generator | Global Generator |
|----------|-----------------|------------------|
| Head | $$\mathcal{T}_h^-(r,t)=\{(h,r,t) \mid h \in \mathcal{E} \land (h,r,t) \notin \mathcal{K} \}$$ | $$\bigcup\limits_{(\_,r,t) \in \mathcal{K}} \mathcal{T}_h^-(r,t)$$ |
| Relation | $$\mathcal{T}_r^-(h,t)=\{(h,r,t) \mid r \in \mathcal{R} \land (h,r,t) \notin \mathcal{K} \}$$ | $$\bigcup\limits_{(h,\_,t) \in \mathcal{K}} \mathcal{T}_r^-(h,t)$$ |
| Tail | $$\mathcal{T}_t^-(h,r)=\{(h,r,t) \mid t \in \mathcal{E} \land (h,r,t) \notin \mathcal{K} \}$$ | $$\bigcup\limits_{(h,r,\_) \in \mathcal{K}} \mathcal{T}_t^-(h,r)$$ |

Most articles refer exclusively to the tail generation strategy when discussing the LCWA. The relation generation strategy, however, is a popular choice in the visual relation detection domain (see [zhang2017] and [sharifzadeh2019vrd]). PyKEEN additionally implements head generation since PR #602.
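The three generators in the table above can be sketched directly in Python (toy graph and function names are invented for illustration):

```python
def head_negatives(kg, entities, r, t):
    """T_h^-(r, t): corrupt the head of triples matching (_, r, t)."""
    return {(h, r, t) for h in entities if (h, r, t) not in kg}

def relation_negatives(kg, relations, h, t):
    """T_r^-(h, t): corrupt the relation of triples matching (h, _, t)."""
    return {(h, r, t) for r in relations if (h, r, t) not in kg}

def tail_negatives(kg, entities, h, r):
    """T_t^-(h, r): corrupt the tail of triples matching (h, r, _)."""
    return {(h, r, t) for t in entities if (h, r, t) not in kg}

entities = {"a", "b", "c"}
relations = {"r", "s"}
kg = {("a", "r", "b"), ("a", "s", "c")}

# Tail corruption of (a, r, _): (a, r, b) is in K, so only the other tails remain.
print(sorted(tail_negatives(kg, entities, "a", "r")))  # [('a', 'r', 'a'), ('a', 'r', 'c')]
```

The corresponding global generator is simply the union of these local sets over all matching triples in the graph.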

### Stochastic Local Closed World Assumption

When training under the stochastic local closed world assumption (sLCWA), a random subset of the union of the head and tail generation strategies from the LCWA is considered as negative triples. There are a few benefits of doing this:

1. Reduced computational cost, since only a small sample of negative triples is scored per positive triple
2. Sparse updates (i.e., only a few rows of the embedding are affected)
3. The ability to integrate new negative sampling strategies

There are two other major considerations when randomly sampling negative triples: the random sampling strategy and the filtering of positive triples. A full guide on negative sampling with the sLCWA can be found in pykeen.sampling. The chart in [ali2020a] demonstrates the different potential triples considered in the LCWA vs. the sLCWA based on the given true triples (in red).
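A minimal sketch of sLCWA-style negative sampling, uniformly corrupting either the head or the tail of each positive triple and optionally filtering out known positives (all names below are invented, not PyKEEN API):

```python
import random

def corrupt(triple, entities, kg=None, num_negatives=2, rng=random):
    """Sample negatives by replacing the head or tail with a random entity.

    If ``kg`` is given, candidates that are known positives are filtered out
    (the 'filtered' setting); otherwise such false negatives may slip through.
    """
    h, r, t = triple
    entity_list = sorted(entities)
    negatives = []
    while len(negatives) < num_negatives:
        e = rng.choice(entity_list)
        # Flip a coin: corrupt the head or the tail.
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate == triple:
            continue  # corruption must change the triple
        if kg is not None and candidate in kg:
            continue  # filter out known positives
        negatives.append(candidate)
    return negatives

entities = {"a", "b", "c", "d"}
kg = {("a", "rel", "b"), ("a", "rel", "c")}
negs = corrupt(("a", "rel", "b"), entities, kg=kg)
assert all(n not in kg for n in negs)
```

PyKEEN's actual samplers are vectorized over batches of triples; this scalar version only illustrates the sampling-and-filtering idea.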

## Classes

| Class | Description |
|-------|-------------|
| `TrainingLoop(model, triples_factory[, ...])` | A training loop. |
| `SLCWATrainingLoop([negative_sampler, ...])` | A training loop that uses the stochastic local closed world assumption training approach. |
| `LCWATrainingLoop(*[, target])` | A training loop that is based upon the local closed world assumption (LCWA). |
| `SymmetricLCWATrainingLoop(model, triples_factory)` | A "symmetric" LCWA scoring heads and tails at once. |
| `NonFiniteLossError` | An exception raised for non-finite loss values. |

## Callbacks

Training callbacks.

Training callbacks allow for arbitrary extension of the functionality of the pykeen.training.TrainingLoop without subclassing it. Each callback instance has a loop attribute that allows access to the parent training loop and all of its attributes, including the model. The interaction points are similar to those of Keras.

### Examples

The following are vignettes showing how PyKEEN's training loop can be arbitrarily extended using callbacks. If you find that none of the hooks in the TrainingCallback let you do what you want, feel free to open an issue.

#### Reporting Batch Loss

It was suggested in Issue #333 that it might be useful to log all batch losses. This could be accomplished with the following:

```python
from pykeen.training import TrainingCallback

class BatchLossReportCallback(TrainingCallback):
    def on_batch(self, epoch: int, batch, batch_loss: float):
        print(epoch, batch_loss)
```


#### Gradient Clipping

Gradient clipping is one technique used to avoid the exploding gradient problem. Despite being very simple, it has several theoretical implications.

In order to reproduce the reference experiments on R-GCN performed by [schlichtkrull2018], gradient clipping must be used before each step of the optimizer. The following example shows how to implement a gradient clipping callback:

```python
from typing import Any

import torch
from pykeen.training import TrainingCallback

class GradientClippingCallback(TrainingCallback):
    def __init__(self, clip_value: float = 1.0):
        super().__init__()
        self.clip_value = clip_value

    def pre_step(self, **kwargs: Any):
        # Clip all gradients to [-clip_value, clip_value] before the optimizer steps.
        torch.nn.utils.clip_grad_value_(self.loop.model.parameters(), self.clip_value)
```

| Class | Description |
|-------|-------------|
| `TrainingCallback` | An interface for training callbacks. |
| `StopperTrainingCallback(stopper, *, ...[, ...])` | An adapter for the `pykeen.stopper.Stopper`. |
| `TrackerTrainingCallback` | An adapter for the `pykeen.trackers.ResultTracker`. |
| `EvaluationLoopTrainingCallback(factory[, ...])` | A callback for regular evaluation using new-style evaluation loops. |
| `EvaluationTrainingCallback(*, evaluation_triples)` | A callback for regular evaluation. |
| `MultiTrainingCallback([callbacks, ...])` | A wrapper for calling multiple training callbacks together. |
| `GradientNormClippingTrainingCallback(max_norm)` | A callback for gradient clipping before stepping the optimizer with `torch.nn.utils.clip_grad_norm_()`. |
| `GradientAbsClippingTrainingCallback(clip_value)` | A callback for gradient clipping before stepping the optimizer with `torch.nn.utils.clip_grad_value_()`. |