Pipeline

pipeline(*, dataset=None, dataset_kwargs=None, training=None, testing=None, validation=None, evaluation_entity_whitelist=None, evaluation_relation_whitelist=None, model, model_kwargs=None, loss=None, loss_kwargs=None, regularizer=None, regularizer_kwargs=None, optimizer=None, optimizer_kwargs=None, clear_optimizer=True, training_loop=None, negative_sampler=None, negative_sampler_kwargs=None, training_kwargs=None, stopper=None, stopper_kwargs=None, evaluator=None, evaluator_kwargs=None, evaluation_kwargs=None, result_tracker=None, result_tracker_kwargs=None, metadata=None, device=None, random_seed=None, use_testing_data=True)[source]

Train and evaluate a model.

Parameters
  • dataset (Union[None, str, DataSet, Type[DataSet]]) – The name of the dataset (a key from pykeen.datasets.datasets) or a pykeen.datasets.DataSet instance. Alternatively, the training and testing triples factories can be specified directly.

  • dataset_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the dataset upon instantiation

  • training (Union[None, str, TriplesFactory]) – A triples factory with training instances or a path to the training file if a dataset was not specified

  • testing (Union[None, str, TriplesFactory]) – A triples factory with testing instances or a path to the test file if a dataset was not specified

  • validation (Union[None, str, TriplesFactory]) – A triples factory with validation instances or a path to the validation file if a dataset was not specified

  • evaluation_entity_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these entities. Useful if the downstream task is only interested in certain entities, but the relational patterns with other entities improve the entity embedding quality.

  • evaluation_relation_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these relations. Useful if the downstream task is only interested in certain relations, but the relational patterns with other relations improve the entity embedding quality.

  • model (Union[str, Type[Model]]) – The name of the model or the model class

  • model_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the model class on instantiation

  • loss (Union[None, str, Type[Loss]]) – The name of the loss or the loss class.

  • loss_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the loss on instantiation

  • regularizer (Union[None, str, Type[Regularizer]]) – The name of the regularizer or the regularizer class.

  • regularizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the regularizer on instantiation

  • optimizer (Union[None, str, Type[Optimizer]]) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.

  • optimizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the optimizer on instantiation

  • clear_optimizer (bool) – Whether to delete the optimizer instance after training. Since the optimizer may consume additional memory (e.g., for moments in Adam), this defaults to True. If you want to continue training afterwards, set it to False, as the optimizer's internal state would otherwise be lost.

  • training_loop (Union[None, str, Type[TrainingLoop]]) – The name of the training loop’s training approach ('slcwa' or 'lcwa') or the training loop class. Defaults to pykeen.training.SLCWATrainingLoop.

  • negative_sampler (Union[None, str, Type[NegativeSampler]]) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class. Only allowed when training with sLCWA. Defaults to pykeen.sampling.BasicNegativeSampler.

  • negative_sampler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the negative sampler class on instantiation

  • training_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the training loop’s train function on call

  • stopper (Union[None, str, Type[Stopper]]) – What kind of stopping to use. Defaults to no stopping; can be set to 'early' for early stopping.

  • stopper_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the stopper upon instantiation.

  • evaluator (Union[None, str, Type[Evaluator]]) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.

  • evaluator_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator on instantiation

  • evaluation_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator’s evaluate function on call

  • result_tracker (Union[None, str, Type[ResultTracker]]) – The name of the result tracker or the ResultTracker class

  • result_tracker_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the results tracker on instantiation

  • metadata (Optional[Dict[str, Any]]) – A JSON dictionary to store with the experiment

  • use_testing_data (bool) – If true (default), evaluate against the testing triples. Otherwise, evaluate against the validation triples.

Return type

PipelineResult

class PipelineResult(random_seed, model, training_loop, losses, metric_results, train_seconds, evaluate_seconds, stopper=None, metadata=<factory>, version=<factory>, git_hash=<factory>)[source]

A dataclass containing the results of running pykeen.pipeline.pipeline().

evaluate_seconds: float

How long in seconds did evaluation take?

git_hash: str

The git hash of PyKEEN used to create these results

losses: List[float]

The losses during training

metadata: Optional[Mapping[str, Any]]

Any additional metadata as a dictionary

metric_results: pykeen.evaluation.evaluator.MetricResults

The results evaluated by the pipeline

model: pykeen.models.base.Model

The model trained by the pipeline

plot(er_kwargs=None, figsize=(10, 4))[source]

Plot all plots.

plot_er(model=None, margin=0.4, ax=None, entities=None, relations=None, apply_limits=True, plot_entities=True, plot_relations=None, annotation_x_offset=0.02, annotation_y_offset=0.03, **kwargs)[source]

Plot the reduced entities and relation vectors in 2D.

Parameters
  • model (Optional[str]) – The dimensionality reduction model from sklearn. Defaults to PCA. Can also use KPCA, GRP, SRP, TSNE, LLE, ISOMAP, MDS, or SE.

  • kwargs – The keyword arguments passed to __init__() of the reducer class (e.g., PCA, TSNE)

  • plot_relations (Optional[bool]) – By default, this is only enabled on translational distance models like pykeen.models.TransE.

Warning

Plotting relations and entities on the same plot is only meaningful for translational distance models like TransE.

plot_losses(ax=None)[source]

Plot the losses per epoch.

random_seed: int

The random seed used at the beginning of the pipeline

save_model(path)[source]

Save the trained model to the given path using torch.save().

Parameters

path (str) – The path to which the model is saved. Should have an extension appropriate for a pickle, like *.pkl or *.pickle.

The model contains within it the triples factory that was used for training.

Return type

None

save_to_directory(directory, save_metadata=True, save_replicates=True)[source]

Save all artifacts in the given directory.

Return type

None

save_to_ftp(directory, ftp)[source]

Save all artifacts to the given directory in the FTP server.

Parameters
  • directory (str) – The directory in the FTP server to save to

  • ftp (FTP) – A connection to the FTP server

The following code will train a model and upload it to FTP using Python's built-in ftplib.FTP:

import ftplib
from pykeen.pipeline import pipeline

directory = 'test/test'
pipeline_result = pipeline(
    model='TransE',
    dataset='Kinships',
)
with ftplib.FTP(host='0.0.0.0', user='user', passwd='12345') as ftp:
    pipeline_result.save_to_ftp(directory, ftp)

If you want to try this with your own local server, run the following code, based on the example from Giampaolo Rodola's excellent library, pyftpdlib.

import os
from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer

authorizer = DummyAuthorizer()
authorizer.add_user("user", "12345", homedir=os.path.expanduser('~/ftp'), perm="elradfmwMT")

handler = FTPHandler
handler.authorizer = authorizer

address = '0.0.0.0', 21
server = FTPServer(address, handler)
server.serve_forever()
Return type

None

save_to_s3(directory, bucket, s3=None)[source]

Save all artifacts to the given directory in an S3 Bucket.

Parameters
  • directory (str) – The directory in the S3 bucket

  • bucket (str) – The name of the S3 bucket

  • s3 – A client from boto3.client(), if already instantiated

Note

You need to have a ~/.aws/credentials file set up. Read: https://realpython.com/python-boto3-aws-s3/

The following code will train a model and upload it to S3 using boto3:

import time
from pykeen.pipeline import pipeline
pipeline_result = pipeline(
    dataset='Kinships',
    model='TransE',
)
directory = f'tests/{time.strftime("%Y-%m-%d-%H%M%S")}'
bucket = 'pykeen'
pipeline_result.save_to_s3(directory, bucket=bucket)
Return type

None

stopper: Optional[pykeen.stoppers.stopper.Stopper] = None

An early stopper

property title: Optional[str]

The title of the experiment.

Return type

Optional[str]

train_seconds: float

How long in seconds did training take?

training_loop: pykeen.training.training_loop.TrainingLoop

The training loop used by the pipeline

version: str

The version of PyKEEN used to create these results