Training
Training loops for KGE models using multimodal information.
Throughout the following explanations of training loops, we will assume the set of entities \(\mathcal{E}\), the set of relations \(\mathcal{R}\), and the set of possible triples \(\mathcal{T} = \mathcal{E} \times \mathcal{R} \times \mathcal{E}\). We stratify \(\mathcal{T}\) into the disjoint union of positive triples \(\mathcal{T}^{+} \subseteq \mathcal{T}\) and negative triples \(\mathcal{T}^{-} \subseteq \mathcal{T}\) such that \(\mathcal{T}^{+} \cap \mathcal{T}^{-} = \emptyset\) and \(\mathcal{T}^{+} \cup \mathcal{T}^{-} = \mathcal{T}\).
A knowledge graph \(\mathcal{K}\) constructed under the open world assumption contains a subset of all possible positive triples such that \(\mathcal{K} \subseteq \mathcal{T}^{+}\).
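As a concrete illustration, the set of all possible triples \(\mathcal{T}\) is just the Cartesian product \(\mathcal{E} \times \mathcal{R} \times \mathcal{E}\). The following is a minimal sketch with hypothetical toy data (the entity and relation names are invented for illustration; this is not PyKEEN's API):

```python
from itertools import product

# Toy entity and relation sets (hypothetical example data)
entities = {"berlin", "germany", "france"}
relations = {"capital_of", "borders"}

# All possible triples T = E x R x E
all_triples = set(product(entities, relations, entities))

# |T| = |E| * |R| * |E| = 3 * 2 * 3 = 18
assert len(all_triples) == len(entities) * len(relations) * len(entities)
```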
Assumptions
Open World Assumption
When training under the open world assumption (OWA), all triples that are not part of the knowledge graph are considered unknown (i.e., neither positive nor negative). This leads to underfitting (i.e., overgeneralization) and is therefore usually a poor choice for training knowledge graph embedding models [nickel2016review]. PyKEEN does not implement a training loop with the OWA.
Warning
Many publications and software packages use OWA to incorrectly refer to the stochastic local closed world assumption (sLCWA). See below for an explanation.
Closed World Assumption
When training under the closed world assumption (CWA), all triples that are not part of the knowledge graph are considered negative. As most knowledge graphs are inherently incomplete, this leads to overfitting and is therefore usually a poor choice for training knowledge graph embedding models. PyKEEN does not implement a training loop with the CWA.
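Under the CWA, the negative triples are simply the complement of the known graph within \(\mathcal{T}\). A minimal pure-Python sketch with hypothetical toy data (not PyKEEN's API):

```python
from itertools import product

# Toy data (hypothetical example)
entities = {"berlin", "germany", "paris"}
relations = {"capital_of"}

# Known (positive) triples in the graph K
graph = {("berlin", "capital_of", "germany")}

# Under the CWA, every possible triple not in K is treated as negative.
all_triples = set(product(entities, relations, entities))
cwa_negatives = all_triples - graph

# |E| * |R| * |E| - |K| = 9 - 1 = 8
assert len(cwa_negatives) == len(all_triples) - len(graph)
```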
Local Closed World Assumption
When training under the local closed world assumption (LCWA; introduced in [dong2014]), a particular subset of triples that are not part of the knowledge graph is considered negative.
| Strategy | Local Generator | Global Generator |
|----------|-----------------|------------------|
| Head | \(\mathcal{T}_h^{(r,t)}=\{(h,r,t) \mid h \in \mathcal{E} \land (h,r,t) \notin \mathcal{K}\}\) | \(\bigcup\limits_{(\_,r,t) \in \mathcal{K}} \mathcal{T}_h^{(r,t)}\) |
| Relation | \(\mathcal{T}_r^{(h,t)}=\{(h,r,t) \mid r \in \mathcal{R} \land (h,r,t) \notin \mathcal{K}\}\) | \(\bigcup\limits_{(h,\_,t) \in \mathcal{K}} \mathcal{T}_r^{(h,t)}\) |
| Tail | \(\mathcal{T}_t^{(h,r)}=\{(h,r,t) \mid t \in \mathcal{E} \land (h,r,t) \notin \mathcal{K}\}\) | \(\bigcup\limits_{(h,r,\_) \in \mathcal{K}} \mathcal{T}_t^{(h,r)}\) |
Most articles refer exclusively to the tail generation strategy when discussing the LCWA. However, the relation generation strategy is a popular choice in the visual relation detection domain (see [zhang2017] and [sharifzadeh2019vrd]). PyKEEN additionally implements head generation since PR #602.
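The tail generation strategy, for example, can be sketched in a few lines of pure Python with hypothetical toy data (PyKEEN's actual implementation operates on index tensors, not string triples):

```python
def lcwa_tail_negatives(h, r, entities, graph):
    """Generate the local tail set T_t^{(h, r)}: corrupt the tail of (h, r, _)
    with every entity that does not yield a known triple."""
    return {(h, r, t) for t in entities if (h, r, t) not in graph}


# Toy data (hypothetical example)
entities = {"berlin", "germany", "france"}
graph = {("berlin", "capital_of", "germany")}

negatives = lcwa_tail_negatives("berlin", "capital_of", entities, graph)
# "germany" is excluded because (berlin, capital_of, germany) is a known triple
assert negatives == {
    ("berlin", "capital_of", "berlin"),
    ("berlin", "capital_of", "france"),
}
```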
Stochastic Local Closed World Assumption
When training under the stochastic local closed world assumption (sLCWA), a random subset of the union of the head and tail generation strategies from the LCWA is considered as negative triples. There are a few benefits of doing this:
Reduced computational workload
Sparse updates (i.e., only a few rows of the embedding are affected)
Ability to integrate new negative sampling strategies
There are two other major considerations when randomly sampling negative triples: the random sampling strategy and the filtering of positive triples. A full guide on negative sampling with the sLCWA can be found in pykeen.sampling. The following chart from [ali2020a] demonstrates the different potential triples considered in the LCWA vs. the sLCWA based on the given true triples (in red):
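A minimal sketch of uniform negative sampling with filtering of positive triples, using hypothetical toy data (see pykeen.sampling for the real, tensor-based implementation):

```python
import random


def slcwa_negatives(positive, entities, graph, num_negatives, rng):
    """Corrupt the head or tail uniformly at random, filtering out known positives."""
    h, r, t = positive
    entity_list = sorted(entities)
    negatives = []
    while len(negatives) < num_negatives:
        if rng.random() < 0.5:
            corrupted = (rng.choice(entity_list), r, t)  # corrupt the head
        else:
            corrupted = (h, r, rng.choice(entity_list))  # corrupt the tail
        if corrupted not in graph:  # filter out false negatives
            negatives.append(corrupted)
    return negatives


# Toy data (hypothetical example)
entities = {"berlin", "germany", "france", "paris"}
graph = {
    ("berlin", "capital_of", "germany"),
    ("paris", "capital_of", "france"),
}

rng = random.Random(0)
negatives = slcwa_negatives(("berlin", "capital_of", "germany"), entities, graph, 4, rng)
assert len(negatives) == 4
assert all(triple not in graph for triple in negatives)
```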
Classes

TrainingLoop: A training loop.
SLCWATrainingLoop: A training loop that uses the stochastic local closed world assumption training approach.
LCWATrainingLoop: A training loop that is based upon the local closed world assumption (LCWA).
NonFiniteLossError: An exception raised for non-finite loss values.
Callbacks
Training callbacks.
Training callbacks allow for arbitrary extension of the functionality of the pykeen.training.TrainingLoop
without subclassing it. Each callback instance has a loop
attribute that allows access to the parent training
loop and all of its attributes, including the model. The interaction points are similar to those of
Keras.
Examples
The following are vignettes showing how PyKEEN’s training loop can be arbitrarily extended
using callbacks. If you find that none of the hooks in the TrainingCallback
help do what you want, feel free to open an issue.
Reporting Batch Loss
It was suggested in Issue #333 that it might be useful to log all batch losses. This could be accomplished with the following:
from pykeen.training import TrainingCallback


class BatchLossReportCallback(TrainingCallback):
    def on_batch(self, epoch: int, batch, batch_loss: float):
        print(epoch, batch_loss)
Implementing Gradient Clipping
Gradient clipping is one technique used to avoid the exploding gradient problem. Despite being very simple, it has several theoretical implications.
In order to reproduce the reference experiments on RGCN performed by [schlichtkrull2018], gradient clipping must be used before each step of the optimizer. The following example shows how to implement a gradient clipping callback:
from typing import Any

from pykeen.training import TrainingCallback
from torch.nn.utils import clip_grad_value_


class GradientClippingCallback(TrainingCallback):
    def __init__(self, clip_value: float = 1.0):
        super().__init__()
        self.clip_value = clip_value

    def pre_step(self, **kwargs: Any):
        clip_grad_value_(self.model.parameters(), clip_value=self.clip_value)
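For intuition, torch.nn.utils.clip_grad_value_ clamps every gradient entry into the interval [-clip_value, clip_value]. The same operation can be sketched in pure Python (an illustrative toy, not the torch implementation, which works in place on parameter tensors):

```python
def clip_gradients_by_value(gradients, clip_value):
    """Return a new list with each gradient entry clamped into [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in gradients]


# Entries with magnitude above 1.0 are clamped; the rest pass through unchanged.
grads = [0.5, -3.0, 1.2, 9.9]
clipped = clip_gradients_by_value(grads, clip_value=1.0)
assert clipped == [0.5, -1.0, 1.0, 1.0]
```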
Classes

An interface for training callbacks.
An adapter for the
A wrapper for calling multiple training callbacks together.
A callback for gradient clipping before stepping the optimizer with
A callback for gradient clipping before stepping the optimizer with
Class Inheritance Diagram
Learning Rate Schedulers
Learning Rate Schedulers available in PyKEEN.