First Steps
The easiest way to train and evaluate a model is with the pykeen.pipeline.pipeline() function,
which provides a high-level entry point into the extensible functionality of this package.
Training a Model
The following example shows how to train and evaluate the pykeen.models.TransE model
on the pykeen.datasets.Nations dataset. Throughout the documentation, you’ll notice
that each asset has a corresponding class in PyKEEN. You can follow the links to learn more
about each and see the reference on how to use them specifically. Don’t worry: in this part of
the tutorial, the pykeen.pipeline.pipeline() function will take care of everything for you.
>>> from pykeen.pipeline import pipeline
>>> pipeline_result = pipeline(
... dataset='Nations',
... model='TransE',
... )
>>> pipeline_result.save_to_directory('nations_transe')
The results are returned in a pykeen.pipeline.PipelineResult instance, which has
attributes for the trained model, the training loop, and the evaluation.
In this example, the model was given as a string. A list of available models can be found in
pykeen.models. Alternatively, the class corresponding to the implementation of the model
could be used as in:
>>> from pykeen.pipeline import pipeline
>>> from pykeen.models import TransE
>>> pipeline_result = pipeline(
... dataset='Nations',
... model=TransE,
... )
>>> pipeline_result.save_to_directory('nations_transe')
In this example, the dataset was given as a string. A list of available datasets can be found in
pykeen.datasets. Alternatively, the dataset class itself (a subclass of pykeen.datasets.Dataset)
could be used as in:
>>> from pykeen.pipeline import pipeline
>>> from pykeen.models import TransE
>>> from pykeen.datasets import Nations
>>> pipeline_result = pipeline(
... dataset=Nations,
... model=TransE,
... )
>>> pipeline_result.save_to_directory('nations_transe')
In each of the previous three examples, the training approach, optimizer, and evaluation scheme were omitted. By default, the stochastic local closed world assumption (sLCWA) training approach is used in training. This can be explicitly given as a string:
>>> from pykeen.pipeline import pipeline
>>> pipeline_result = pipeline(
... dataset='Nations',
... model='TransE',
... training_loop='sLCWA',
... )
>>> pipeline_result.save_to_directory('nations_transe')
Alternatively, the local closed world assumption (LCWA) training approach can be given with 'LCWA'.
No additional configuration is necessary, but it’s worth reading up on the differences between these training
approaches.
>>> from pykeen.pipeline import pipeline
>>> pipeline_result = pipeline(
... dataset='Nations',
... model='TransE',
... training_loop='LCWA',
... )
>>> pipeline_result.save_to_directory('nations_transe')
One of these differences is that the sLCWA relies on negative sampling. The type of negative sampling can be given as in:
>>> from pykeen.pipeline import pipeline
>>> pipeline_result = pipeline(
... dataset='Nations',
... model='TransE',
... training_loop='sLCWA',
... negative_sampler='basic',
... )
>>> pipeline_result.save_to_directory('nations_transe')
In this example, the negative sampler was given as a string. A list of available negative samplers
can be found in pykeen.sampling. Alternatively, the class corresponding to the implementation
of the negative sampler could be used as in:
>>> from pykeen.pipeline import pipeline
>>> from pykeen.sampling import BasicNegativeSampler
>>> pipeline_result = pipeline(
... dataset='Nations',
... model='TransE',
... training_loop='sLCWA',
... negative_sampler=BasicNegativeSampler,
... )
>>> pipeline_result.save_to_directory('nations_transe')
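To make the idea of negative sampling concrete, the following is a minimal pure-Python sketch. It is not PyKEEN's implementation: it assumes triples are (head, relation, tail) tuples of integer IDs, and the helper name corrupt_triples and its parameters are hypothetical. Each positive triple is corrupted by replacing either its head or its tail with a different, uniformly chosen entity.

```python
import random


def corrupt_triples(triples, num_entities, num_negs_per_pos=1, seed=None):
    """Return corrupted copies of each (head, relation, tail) triple.

    Hypothetical helper for illustration only: for every positive triple,
    either the head or the tail is replaced by a different, uniformly
    chosen entity ID. The relation is never changed.
    """
    rng = random.Random(seed)
    negatives = []
    for head, relation, tail in triples:
        for _ in range(num_negs_per_pos):
            corrupt_head = rng.random() < 0.5
            original = head if corrupt_head else tail
            # Draw from the other num_entities - 1 IDs so the replacement
            # is guaranteed to differ from the original entity.
            replacement = rng.randrange(num_entities - 1)
            if replacement >= original:
                replacement += 1
            if corrupt_head:
                negatives.append((replacement, relation, tail))
            else:
                negatives.append((head, relation, replacement))
    return negatives


positives = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
negatives = corrupt_triples(positives, num_entities=4, num_negs_per_pos=2, seed=0)
```

Note that this sketch does not filter: a corrupted triple may coincidentally be another known positive. By contrast, the LCWA treats every triple not seen in training as a negative and therefore needs no sampler at all.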
Warning
The negative_sampler keyword argument should not be used if the LCWA is being used.
In general, all other options are available under either training approach.
The type of evaluation performed can be specified with the evaluator keyword. By default,
rank-based evaluation is used. It can be given explicitly as in:
>>> from pykeen.pipeline import pipeline
>>> pipeline_result = pipeline(
... dataset='Nations',
... model='TransE',
... evaluator='RankBasedEvaluator',
... )
>>> pipeline_result.save_to_directory('nations_transe')
In this example, the evaluator was given as a string. A list of available evaluators can be found in
pykeen.evaluation. Alternatively, the class corresponding to the implementation
of the evaluator could be used as in:
>>> from pykeen.pipeline import pipeline
>>> from pykeen.evaluation import RankBasedEvaluator
>>> pipeline_result = pipeline(
... dataset='Nations',
... model='TransE',
... evaluator=RankBasedEvaluator,
... )
>>> pipeline_result.save_to_directory('nations_transe')
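As a rough illustration of what rank-based evaluation measures, the sketch below computes ranks and a few summary metrics by hand. It is an assumption-laden toy, not the RankBasedEvaluator itself: the function names are hypothetical, scores are plain Python lists with one score per candidate entity, and ties are resolved optimistically (only strictly higher scores push the true entity down).

```python
def rank_of_true(scores, true_index):
    """Rank of the true candidate: 1 plus the number of strictly higher scores."""
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def summarize_ranks(ranks, k=10):
    """Aggregate a list of ranks into common rank-based metrics."""
    n = len(ranks)
    return {
        "mean_rank": sum(ranks) / n,
        "mean_reciprocal_rank": sum(1.0 / r for r in ranks) / n,
        f"hits_at_{k}": sum(1 for r in ranks if r <= k) / n,
    }


# One row of candidate scores per test triple (higher means more plausible).
all_scores = [
    [0.1, 0.9, 0.3, 0.2],
    [0.5, 0.4, 0.8, 0.7],
    [0.6, 0.2, 0.1, 0.9],
]
true_indices = [1, 3, 1]

ranks = [rank_of_true(s, i) for s, i in zip(all_scores, true_indices)]
metrics = summarize_ranks(ranks, k=1)
```

Here the three true entities land at ranks 1, 2, and 3, so the mean rank is 2.0 and only one of three test triples counts as a hit at k=1.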
PyKEEN implements early stopping, which can be turned on with the stopper keyword
argument as in:
>>> from pykeen.pipeline import pipeline
>>> pipeline_result = pipeline(
... dataset='Nations',
... model='TransE',
... stopper='early',
... )
>>> pipeline_result.save_to_directory('nations_transe')
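The general logic behind early stopping can be sketched in a few lines. The class below is a hypothetical, patience-based stand-in written for this tutorial, not PyKEEN's stopper: it assumes a validation metric where higher is better and stops once the metric has failed to improve for a given number of consecutive evaluations.

```python
class PatienceStopper:
    """Toy early stopper: stop after `patience` evaluations without improvement."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_evaluations = 0

    def should_stop(self, metric):
        # An improvement resets the counter and records the new best value.
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_evaluations = 0
            return False
        # Otherwise count the evaluation as a failure to improve.
        self.bad_evaluations += 1
        return self.bad_evaluations >= self.patience


# Made-up validation scores, one per evaluation round.
validation_scores = [0.30, 0.35, 0.36, 0.36, 0.35, 0.37]
stopper = PatienceStopper(patience=2)
stopped_at = None
for evaluation_index, score in enumerate(validation_scores):
    if stopper.should_stop(score):
        stopped_at = evaluation_index
        break
```

With these made-up scores, training would halt at the fifth evaluation (index 4), before the later improvement to 0.37 is ever seen. That trade-off is why the patience value matters.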
Deeper Configuration
Arguments for the model can be given as a dictionary using model_kwargs.
>>> from pykeen.pipeline import pipeline
>>> pipeline_result = pipeline(
... dataset='Nations',
... model='TransE',
... model_kwargs=dict(
... scoring_fct_norm=2,
... ),
... )
>>> pipeline_result.save_to_directory('nations_transe')
The entries in model_kwargs correspond to the arguments given to pykeen.models.TransE.__init__(). For a
complete listing of models, see pykeen.models, where there are links to the reference for each
model that explain what kwargs are possible. Each model’s default hyper-parameters were chosen based on the
best reported values from the paper originally publishing the model unless otherwise noted on the model’s
reference page.
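To give a feel for what a setting like scoring_fct_norm controls, here is a small sketch of a TransE-style score on plain Python lists. It assumes the usual TransE formulation, in which a triple is scored by the negative L_p distance between head + relation and tail, with scoring_fct_norm selecting p. The function name transe_score and the toy vectors are hypothetical and do not come from PyKEEN.

```python
def transe_score(head, relation, tail, norm=1):
    """Negative L_p distance between (head + relation) and tail.

    Higher (less negative) scores mean the triple looks more plausible.
    """
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    distance = sum(abs(d) ** norm for d in diffs) ** (1.0 / norm)
    return -distance


# Toy two-dimensional embeddings, chosen so the arithmetic is easy to follow.
head = [1.0, 0.0]
relation = [0.0, 1.0]
tail = [4.0, 5.0]

score_l1 = transe_score(head, relation, tail, norm=1)  # -(3 + 4) = -7.0
score_l2 = transe_score(head, relation, tail, norm=2)  # -sqrt(9 + 16) = -5.0
```

The same embeddings yield different scores under different norms, which is why the norm is exposed as a hyper-parameter.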
Because the pipeline takes care of looking up classes and instantiating them,
there are several other parameters to pykeen.pipeline.pipeline() that
can be used to specify the parameters during their respective instantiations.
Arguments can be given to the dataset with dataset_kwargs. These are passed on to
the dataset class, here pykeen.datasets.Nations.
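The pattern of looking up a class by name and instantiating it with a dictionary of keyword arguments can be sketched as follows. This is only an illustration of the idea, not how PyKEEN is implemented: ToyDataset, REGISTRY, and resolve_and_instantiate are hypothetical names, and the single constructor argument is chosen purely for demonstration.

```python
class ToyDataset:
    """Stand-in for a dataset class; not a real PyKEEN dataset."""

    def __init__(self, create_inverse_triples=False):
        self.create_inverse_triples = create_inverse_triples


# Hypothetical lookup table from lowercase names to classes.
REGISTRY = {"toydataset": ToyDataset}


def resolve_and_instantiate(query, registry, kwargs=None):
    """Accept a name, a class, or an instance, and return an instance."""
    if isinstance(query, str):
        cls = registry[query.lower()]
    elif isinstance(query, type):
        cls = query
    else:
        # Already an instance: nothing to look up or construct.
        return query
    return cls(**(kwargs or {}))


from_string = resolve_and_instantiate(
    "ToyDataset", REGISTRY, kwargs={"create_inverse_triples": True}
)
from_class = resolve_and_instantiate(ToyDataset, REGISTRY)
```

Keeping the keyword arguments in a separate dictionary lets the same call accept either a name or a class without changing how the configuration is written.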
Loading a pre-trained Model
Many of the previous examples ended with saving the results using
pykeen.pipeline.PipelineResult.save_to_directory(). One of the
artifacts written to the given directory is the trained_model.pkl
file. Because all PyKEEN models inherit from torch.nn.Module,
we use the PyTorch mechanisms for saving and loading them. This means
that you can use torch.load() to load a model like:
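import torch
# The earlier examples saved their artifacts to the 'nations_transe' directory.
# Recent PyTorch versions may need torch.load(..., weights_only=False) for a fully pickled model.
my_pykeen_model = torch.load('nations_transe/trained_model.pkl')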
More information on PyTorch’s model persistence can be found at: https://pytorch.org/tutorials/beginner/saving_loading_models.html.
Beyond the Pipeline
While the pipeline provides a high-level interface, each aspect of the training process is encapsulated in classes that can be more finely tuned or subclassed. Below is an example of code that might have been executed with one of the previous examples.
# Get a training dataset
from pykeen.datasets import Nations
dataset = Nations()
training_triples_factory = dataset.training
# Pick a model
from pykeen.models import TransE
model = TransE(triples_factory=training_triples_factory)
# Pick an optimizer from Torch
from torch.optim import Adam
optimizer = Adam(params=model.get_grad_params())
# Pick a training approach (sLCWA or LCWA)
from pykeen.training import SLCWATrainingLoop
training_loop = SLCWATrainingLoop(model=model, optimizer=optimizer)
# Train like Cristiano Ronaldo
training_loop.train(num_epochs=5, batch_size=256)
# Pick an evaluator
from pykeen.evaluation import RankBasedEvaluator
evaluator = RankBasedEvaluator()
# Get triples to test
mapped_triples = dataset.testing.mapped_triples
# Evaluate
results = evaluator.evaluate(model, mapped_triples, batch_size=1024)
print(results)