Pipeline

pykeen.pipeline.pipeline(*, dataset=None, dataset_kwargs=None, training_triples_factory=None, testing_triples_factory=None, validation_triples_factory=None, model, model_kwargs=None, loss=None, loss_kwargs=None, regularizer=None, regularizer_kwargs=None, optimizer=None, optimizer_kwargs=None, clear_optimizer=True, training_loop=None, negative_sampler=None, negative_sampler_kwargs=None, training_kwargs=None, stopper=None, stopper_kwargs=None, evaluator=None, evaluator_kwargs=None, evaluation_kwargs=None, mlflow_tracking_uri=None, metadata=None, device=None, random_seed=None, use_testing_data=True)

Train and evaluate a model.

Parameters

dataset (Union[None, str, Type[DataSet]]) – The name of the dataset (a key from pykeen.datasets.datasets) or the pykeen.datasets.DataSet instance. Alternatively, the training_triples_factory and testing_triples_factory can be specified.
dataset_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the dataset upon instantiation
training_triples_factory (Optional[TriplesFactory]) – A triples factory with training instances if a dataset was not specified
testing_triples_factory (Optional[TriplesFactory]) – A triples factory with testing instances if a dataset was not specified
validation_triples_factory (Optional[TriplesFactory]) – A triples factory with validation instances if a dataset was not specified
model (Union[str, Type[Model]]) – The name of the model or the model class
model_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the model class on instantiation
loss (Union[None, str, Type[_Loss]]) – The name of the loss or the loss class.
loss_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the loss on instantiation
regularizer (Union[None, str, Type[Regularizer]]) – The name of the regularizer or the regularizer class.
regularizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the regularizer on instantiation
optimizer (Union[None, str, Type[Optimizer]]) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.
optimizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the optimizer on instantiation
clear_optimizer (bool) – Whether to delete the optimizer instance after training. This is the default because the optimizer can hold additional memory, e.g. the moment estimates in Adam. To continue training afterwards, set it to False; otherwise the optimizer’s internal state is lost.
training_loop (Union[None, str, Type[TrainingLoop]]) – The name of the training loop’s training approach ('slcwa' or 'lcwa') or the training loop class. Defaults to pykeen.training.SLCWATrainingLoop.
negative_sampler (Union[None, str, Type[NegativeSampler]]) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class. Only allowed when training with sLCWA. Defaults to pykeen.sampling.BasicNegativeSampler.
negative_sampler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the negative sampler class on instantiation
training_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the training loop’s train function on call
stopper (Union[None, str, Type[Stopper]]) – The kind of stopping to use. Defaults to no stopping; can be set to 'early'.
stopper_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the stopper upon instantiation.
evaluator (Union[None, str, Type[Evaluator]]) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.
evaluator_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator on instantiation
evaluation_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator’s evaluate function on call
mlflow_tracking_uri (Optional[str]) – The MLflow tracking URI. If None is given, MLflow is not used to track results.
metadata (Optional[Dict[str, Any]]) – A JSON dictionary to store with the experiment
use_testing_data (bool) – If True, evaluate on the testing triples; otherwise, evaluate on the validation triples. Defaults to True.

Return type

PipelineResult
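
As a rough illustration of how the parameters above fit together, the sketch below collects a set of keyword arguments and passes them to pipeline(). The dataset name 'Nations', the model name 'TransE', and the num_epochs training argument are assumptions that do not appear on this page; the helper names build_pipeline_kwargs and run_pipeline are hypothetical. The 'slcwa' and 'basic' values are the options documented above.

```python
from typing import Any, Dict


def build_pipeline_kwargs(num_epochs: int = 5, random_seed: int = 0) -> Dict[str, Any]:
    """Collect keyword arguments for pykeen.pipeline.pipeline() (illustrative values)."""
    return {
        "dataset": "Nations",         # assumption: a dataset name known to PyKEEN
        "model": "TransE",            # assumption: a model name known to PyKEEN
        "training_loop": "slcwa",     # documented options: 'slcwa' or 'lcwa'
        "negative_sampler": "basic",  # documented options: 'basic' or 'bernoulli' (sLCWA only)
        "training_kwargs": {"num_epochs": num_epochs},  # assumption: accepted by train()
        "random_seed": random_seed,
        "use_testing_data": True,     # evaluate on the testing triples
    }


def run_pipeline(kwargs: Dict[str, Any]):
    """Run the pipeline with the given keyword arguments.

    PyKEEN is imported lazily so that build_pipeline_kwargs() can be
    inspected without the library installed.
    """
    from pykeen.pipeline import pipeline

    return pipeline(**kwargs)
```

With PyKEEN installed and the dataset available, run_pipeline(build_pipeline_kwargs()) would be expected to return a PipelineResult. To evaluate on validation triples instead, set use_testing_data to False.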

class pykeen.pipeline.PipelineResult(random_seed, model, training_loop, losses, metric_results, train_seconds, evaluate_seconds, stopper=None, metadata=<factory>, version=<factory>, git_hash=<factory>)

A dataclass containing the results of running pykeen.pipeline.pipeline().

evaluate_seconds: float – The time evaluation took, in seconds

git_hash: str – The git hash of PyKEEN used to create these results

losses: List[float] – The losses during training

metadata: Optional[Mapping[str, Any]] – Any additional metadata as a dictionary

metric_results: pykeen.evaluation.evaluator.MetricResults – The results evaluated by the pipeline

model: pykeen.models.base.Model – The model trained by the pipeline

plot_losses() – Plot the losses per epoch.

random_seed: int – The random seed used at the beginning of the pipeline

save_model(path)

Save the trained model to the given path using torch.save().

The model contains within it the triples factory that was used for training.

Return type: None

save_to_directory(directory, save_metadata=True, save_replicates=True)

Save all artifacts in the given directory.

Return type: None
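
A minimal sketch of saving a result and listing what was written, assuming `result` is the PipelineResult returned by pipeline(). The helper name save_artifacts is hypothetical, and the names of the files that save_to_directory() produces are not stated on this page, so the sketch only lists whatever appears in the directory.

```python
from pathlib import Path
from typing import Any, List


def save_artifacts(result: Any, directory: str) -> List[str]:
    """Save all artifacts of a pipeline result and return the file names written."""
    target = Path(directory)
    # Creating the directory up front is harmless if the method also does so itself.
    target.mkdir(parents=True, exist_ok=True)
    result.save_to_directory(str(target))
    # List the directory contents rather than assuming specific file names.
    return sorted(p.name for p in target.iterdir())
```

Passing a fresh directory per run keeps the artifacts of different experiments apart.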

stopper: Optional[pykeen.stoppers.stopper.Stopper] = None – An early stopper

property title

The title of the experiment.

Return type: Optional[str]

train_seconds: float – The time training took, in seconds

training_loop: pykeen.training.training_loop.TrainingLoop – The training loop used by the pipeline

version: str – The version of PyKEEN used to create these results
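
The attributes above can be gathered into a small report. The sketch below is one possible way to do so; the helper name summarize_result is hypothetical, and the stand-in object at the end uses made-up values purely to show the shape of the output. A real PipelineResult would be passed in the same way.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_result(result: Any) -> Dict[str, Any]:
    """Condense the documented PipelineResult attributes into a flat dictionary."""
    losses = list(result.losses)
    return {
        "random_seed": result.random_seed,
        "num_losses": len(losses),
        "final_loss": losses[-1] if losses else None,
        "total_seconds": result.train_seconds + result.evaluate_seconds,
        "version": result.version,
    }


# Stand-in with made-up values; not output from an actual training run.
fake_result = SimpleNamespace(
    random_seed=0,
    losses=[0.9, 0.5, 0.25],
    train_seconds=1.5,
    evaluate_seconds=0.5,
    version="x.y.z",
)
summary = summarize_result(fake_result)
```

Because the function only reads attributes, it works with any object that exposes the same fields.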