Pipeline
-
pipeline(*, dataset=None, dataset_kwargs=None, training=None, testing=None, validation=None, evaluation_entity_whitelist=None, evaluation_relation_whitelist=None, model, model_kwargs=None, loss=None, loss_kwargs=None, regularizer=None, regularizer_kwargs=None, optimizer=None, optimizer_kwargs=None, clear_optimizer=True, training_loop=None, negative_sampler=None, negative_sampler_kwargs=None, training_kwargs=None, stopper=None, stopper_kwargs=None, evaluator=None, evaluator_kwargs=None, evaluation_kwargs=None, result_tracker=None, result_tracker_kwargs=None, automatic_memory_optimization=True, metadata=None, device=None, random_seed=None, use_testing_data=True)
Train and evaluate a model.
- Parameters
dataset (Union[None, str, Dataset, Type[Dataset]]) – The name of the dataset (a key from pykeen.datasets.datasets) or the pykeen.datasets.Dataset instance. Alternatively, the training triples factory (training), testing triples factory (testing), and validation triples factory (validation; optional) can be specified.
dataset_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the dataset upon instantiation
training (Union[None, str, TriplesFactory]) – A triples factory with training instances or path to the training file if a dataset was not specified
testing (Union[None, str, TriplesFactory]) – A triples factory with testing instances or path to the test file if a dataset was not specified
validation (Union[None, str, TriplesFactory]) – A triples factory with validation instances or path to the validation file if a dataset was not specified
evaluation_entity_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these entities. Useful if the downstream task is only interested in certain entities, but the relational patterns with other entities improve the entity embedding quality.
evaluation_relation_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these relations. Useful if the downstream task is only interested in certain relations, but the relational patterns with other relations improve the entity embedding quality.
model (Union[str, Type[Model]]) – The name of the model or the model class
model_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the model class on instantiation
loss (Union[None, str, Type[Loss]]) – The name of the loss or the loss class.
loss_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the loss on instantiation
regularizer (Union[None, str, Type[Regularizer]]) – The name of the regularizer or the regularizer class.
regularizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the regularizer on instantiation
optimizer (Union[None, str, Type[Optimizer]]) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.
optimizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the optimizer on instantiation
clear_optimizer (bool) – Whether to delete the optimizer instance after training. As the optimizer might have additional memory consumption due to e.g. moments in Adam, this is the default option. If you want to continue training, you should set it to False, as the optimizer’s internal parameters will get lost otherwise.
training_loop (Union[None, str, Type[TrainingLoop]]) – The name of the training loop’s training approach ('slcwa' or 'lcwa') or the training loop class. Defaults to pykeen.training.SLCWATrainingLoop.
negative_sampler (Union[None, str, Type[NegativeSampler]]) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class. Only allowed when training with sLCWA. Defaults to pykeen.sampling.BasicNegativeSampler.
negative_sampler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the negative sampler class on instantiation
training_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the training loop’s train function on call
stopper (Union[None, str, Type[Stopper]]) – What kind of stopping to use. Defaults to no stopping; can be set to ‘early’.
stopper_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the stopper upon instantiation.
evaluator (Union[None, str, Type[Evaluator]]) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.
evaluator_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator on instantiation
evaluation_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator’s evaluate function on call
result_tracker (Union[None, str, Type[ResultTracker]]) – The result tracker class or name
result_tracker_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the result tracker on instantiation
metadata (Optional[Dict[str, Any]]) – A JSON dictionary to store with the experiment
use_testing_data (bool) – If true, use the testing triples. Otherwise, use the validation triples. Defaults to true (use the testing triples).
- Return type
PipelineResult
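As a rough illustration of how these parameters fit together, the sketch below collects a set of keyword arguments and forwards them to pipeline(). The helper names build_pipeline_kwargs and run_pipeline are hypothetical, and the specific values (embedding_dim, lr, num_epochs, and resolving the optimizer by the name 'Adam') are assumptions about what the chosen model, optimizer, and training loop accept in your installed version.

```python
from typing import Any, Dict


def build_pipeline_kwargs(num_epochs: int = 5, random_seed: int = 42) -> Dict[str, Any]:
    """Collect keyword arguments for pykeen.pipeline.pipeline() (hypothetical helper)."""
    return {
        # Either name a dataset, or pass training/testing (and optionally validation).
        'dataset': 'Kinships',
        'model': 'TransE',
        # Assumption: TransE accepts an ``embedding_dim`` keyword.
        'model_kwargs': {'embedding_dim': 50},
        # Assumption: the optimizer can be looked up by the name 'Adam'.
        'optimizer': 'Adam',
        'optimizer_kwargs': {'lr': 0.01},
        # A negative sampler is only allowed when training with sLCWA.
        'training_loop': 'slcwa',
        'negative_sampler': 'basic',
        # Assumption: the training loop's train function accepts ``num_epochs``.
        'training_kwargs': {'num_epochs': num_epochs},
        'random_seed': random_seed,
    }


def run_pipeline(**overrides: Any):
    """Train and evaluate with the collected arguments; requires PyKEEN."""
    from pykeen.pipeline import pipeline  # imported lazily so the helper above works without PyKEEN

    kwargs = build_pipeline_kwargs()
    kwargs.update(overrides)
    return pipeline(**kwargs)
```

Calling run_pipeline(model='TransE') would return the pipeline result, whose train_seconds and evaluate_seconds fields report how long each phase took.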
-
class PipelineResult(random_seed, model, training_loop, losses, metric_results, train_seconds, evaluate_seconds, stopper=None, metadata=<factory>, version=<factory>, git_hash=<factory>)
A dataclass containing the results of running pykeen.pipeline.pipeline().
-
metric_results: pykeen.evaluation.evaluator.MetricResults
The results evaluated by the pipeline
-
model: pykeen.models.base.Model
The model trained by the pipeline
-
plot(**kwargs)
Plot all plots.
- Parameters
kwargs – The keyword arguments passed to pykeen.pipeline_plot.plot()
-
plot_early_stopping(**kwargs)
Plot the evaluations during early stopping.
- Parameters
kwargs – The keyword arguments passed to pykeen.pipeline_plot.plot_early_stopping()
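The early-stopping plot only has evaluations to show when the pipeline was run with a stopper. A minimal sketch, assuming the 'early' stopper is configured through stopper_kwargs with frequency and patience keys (these key names are assumptions about the early stopper's constructor, and early_stopping_kwargs and train_and_plot are hypothetical helpers):

```python
from typing import Any, Dict


def early_stopping_kwargs(frequency: int = 5, patience: int = 2) -> Dict[str, Any]:
    """Build the stopper-related arguments for pipeline() (hypothetical helper)."""
    if frequency < 1 or patience < 1:
        raise ValueError('frequency and patience must be positive integers')
    return {
        'stopper': 'early',
        # Assumption: the early stopper accepts ``frequency`` and ``patience``.
        'stopper_kwargs': {'frequency': frequency, 'patience': patience},
    }


def train_and_plot():
    """Train with early stopping, then plot the intermediate evaluations; requires PyKEEN."""
    from pykeen.pipeline import pipeline  # imported lazily so the helper above works without PyKEEN

    result = pipeline(
        dataset='Kinships',
        model='TransE',
        training_kwargs={'num_epochs': 50},  # assumption: ``num_epochs`` is accepted
        **early_stopping_kwargs(),
    )
    return result.plot_early_stopping()
```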
-
plot_er(**kwargs)
Plot the reduced entities and relation vectors in 2D.
- Parameters
kwargs – The keyword arguments passed to pykeen.pipeline_plot.plot_er()
Warning
Plotting relations and entities on the same plot is only meaningful for translational distance models like TransE.
-
plot_losses(**kwargs)
Plot the losses per epoch.
- Parameters
kwargs – The keyword arguments passed to pykeen.pipeline_plot.plot_losses().
-
save_model(path)
Save the trained model to the given path using torch.save().
- Parameters
path (str) – The path to which the model is saved. Should have an extension appropriate for a pickle, like *.pkl or *.pickle.
The model contains within it the triples factory that was used for training.
- Return type
-
save_to_directory(directory, *, save_metadata=True, save_replicates=True, **_kwargs)
Save all artifacts in the given directory.
- Return type
-
save_to_ftp(directory, ftp)
Save all artifacts to the given directory in the FTP server.
- Parameters
The following code will train a model and upload it to FTP using Python’s built-in ftplib.FTP:

    import ftplib

    from pykeen.pipeline import pipeline

    # Target directory on the FTP server
    directory = 'test/test'

    pipeline_result = pipeline(
        model='TransE',
        dataset='Kinships',
    )

    # Open the connection and upload all artifacts
    with ftplib.FTP(host='0.0.0.0', user='user', passwd='12345') as ftp:
        pipeline_result.save_to_ftp(directory, ftp)

If you want to try this with your own local server, run this code based on the example from Giampaolo Rodola’s excellent library, pyftpdlib:

    import os

    from pyftpdlib.authorizers import DummyAuthorizer
    from pyftpdlib.handlers import FTPHandler
    from pyftpdlib.servers import FTPServer

    # Register a user whose home directory is ~/ftp
    authorizer = DummyAuthorizer()
    authorizer.add_user("user", "12345", homedir=os.path.expanduser('~/ftp'), perm="elradfmwMT")

    handler = FTPHandler
    handler.authorizer = authorizer

    # Listen on all interfaces on the standard FTP port
    address = '0.0.0.0', 21
    server = FTPServer(address, handler)
    server.serve_forever()

- Return type
-
save_to_s3(directory, bucket, s3=None)
Save all artifacts to the given directory in an S3 Bucket.
- Parameters
directory (str) – The directory in the S3 bucket
bucket (str) – The name of the S3 bucket
s3 – A client from boto3.client(), if already instantiated
Note
You need to have the ~/.aws/credentials file set up. Read: https://realpython.com/python-boto3-aws-s3/
The following code will train a model and upload it to S3 using boto3:

    import time

    from pykeen.pipeline import pipeline

    pipeline_result = pipeline(
        dataset='Kinships',
        model='TransE',
    )

    # Use a timestamped directory so repeated uploads do not collide
    directory = f'tests/{time.strftime("%Y-%m-%d-%H%M%S")}'
    bucket = 'pykeen'
    pipeline_result.save_to_s3(directory, bucket=bucket)

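When a boto3 client already exists, it can presumably be reused through the s3 parameter rather than letting the method create its own. The sketch below is one way to do that; timestamped_directory and upload are hypothetical helpers, and the bucket name is a placeholder.

```python
import time
from typing import Optional


def timestamped_directory(prefix: str = 'tests', when: Optional[time.struct_time] = None) -> str:
    """Build a directory name such as ``tests/2020-01-31-235959`` (hypothetical helper)."""
    if when is None:
        when = time.localtime()
    stamp = time.strftime('%Y-%m-%d-%H%M%S', when)
    return f'{prefix}/{stamp}'


def upload(pipeline_result, bucket: str = 'pykeen') -> str:
    """Upload all artifacts with a pre-instantiated client; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client('s3')
    directory = timestamped_directory()
    pipeline_result.save_to_s3(directory, bucket=bucket, s3=s3)
    return directory
```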
- Return type
-
stopper: Optional[pykeen.stoppers.stopper.Stopper] = None
An early stopper
-
training_loop: pykeen.training.training_loop.TrainingLoop
The training loop used by the pipeline
-
Plotting utilities for the pipeline results.
-
plot_early_stopping(pipeline_result, *, ax=None, lineplot_kwargs=None)
Plot the evaluations during early stopping.
-
plot_er(pipeline_result, *, model=None, entities=None, relations=None, apply_limits=True, margin=0.4, plot_entities=True, plot_relations=None, annotation_x_offset=0.02, annotation_y_offset=0.03, ax=None, **kwargs)
Plot the reduced entities and relation vectors in 2D.
- Parameters
pipeline_result – The result returned by pykeen.pipeline.pipeline().
model (Optional[str]) – The dimensionality reduction model from sklearn. Defaults to PCA. Can also use KPCA, GRP, SRP, TSNE, LLE, ISOMAP, MDS, or SE.
entities (Optional[Set[str]]) – A subset of entities to plot
relations (Optional[Set[str]]) – A subset of relations to plot
apply_limits (bool) – Should the x and y limits be applied?
margin (float) – The margin size around the minimum/maximum x and y values
plot_entities (bool) – If true, plot the entities based on their reduced embeddings
plot_relations (Optional[bool]) – By default, this is only enabled on translational distance models like pykeen.models.TransE.
annotation_x_offset (float) – X offset of label from entity position
annotation_y_offset (float) – Y offset of label from entity position
ax – The matplotlib axis, if pre-defined
kwargs – The keyword arguments passed to __init__() of the reducer class (e.g., PCA, TSNE)
- Returns
The axis
- Raises
ValueError – if entity plotting and relation plotting are both turned off
Warning
Plotting relations and entities on the same plot is only meaningful for translational distance models like TransE.
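To make the plot_er options concrete, the sketch below validates a few arguments before forwarding them to PipelineResult.plot_er(). The reducer names come from the parameter description; build_plot_er_kwargs and plot_subset are hypothetical helpers, and the up-front checks are an illustration rather than the library's own validation.

```python
from typing import Any, Dict, Iterable, Optional

# Reducer names listed in the ``model`` parameter description.
REDUCERS = frozenset({'PCA', 'KPCA', 'GRP', 'SRP', 'TSNE', 'LLE', 'ISOMAP', 'MDS', 'SE'})


def build_plot_er_kwargs(
    model: str = 'PCA',
    entities: Optional[Iterable[str]] = None,
    plot_entities: bool = True,
    plot_relations: Optional[bool] = None,
) -> Dict[str, Any]:
    """Collect and sanity-check arguments for plot_er() (hypothetical helper)."""
    if model not in REDUCERS:
        raise ValueError(f'unknown reducer: {model}')
    # Explicitly disabling both entity and relation plotting leaves nothing to draw.
    if not plot_entities and plot_relations is False:
        raise ValueError('entity plotting and relation plotting are both turned off')
    kwargs: Dict[str, Any] = {
        'model': model,
        'plot_entities': plot_entities,
        # None leaves the decision to the library's default behavior.
        'plot_relations': plot_relations,
    }
    if entities is not None:
        kwargs['entities'] = set(entities)
    return kwargs


def plot_subset(pipeline_result, **options: Any):
    """Plot reduced embeddings for a pipeline result; requires PyKEEN and matplotlib."""
    return pipeline_result.plot_er(**build_plot_er_kwargs(**options))
```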