Hyper-parameter Optimization

class HpoPipelineResult(study, objective)

A container for the results of the HPO pipeline.

objective: pykeen.hpo.hpo.Objective – The objective class, containing information on preset hyper-parameters and those to optimize

replicate_best_pipeline(*, directory, replicates, move_to_cpu=False, save_replicates=True)

Rerun the pipeline with the best configuration, this time evaluating on the “test” set instead of the “validation” set.

Parameters

directory (str) – Output directory
replicates (int) – The number of times to retrain the model
move_to_cpu (bool) – Should the model be moved back to the CPU? Only relevant if training on GPU.
save_replicates (bool) – Should the artifacts of the replicates be saved?

Return type

None

save_to_directory(directory, **kwargs)

Dump the results of a study to the given directory.

Return type: None

save_to_ftp(directory, ftp)

Save the results to the given directory on an FTP server.

Parameters

directory (str) – The directory in the FTP server to save to
ftp (FTP) – A connection to the FTP server
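A possible way to supply the `ftp` argument, sketched with the standard library's `ftplib.FTP`. The helper name `upload_results` is hypothetical, and the host, credentials, and directory are placeholders. `result` is assumed to be an `HpoPipelineResult`.

```python
from ftplib import FTP


def upload_results(result, host, user, password, directory):
    """Log in to an FTP server and upload the HPO results.

    Hypothetical helper: ``result`` is assumed to be an HpoPipelineResult.
    """
    # FTP(host) connects immediately; the context manager closes the
    # connection when the block exits.
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        result.save_to_ftp(directory, ftp)
```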

save_to_s3(directory, bucket, s3=None)

Save all artifacts to the given directory in an S3 Bucket.

Parameters

directory (str) – The directory in the S3 bucket
bucket (str) – The name of the S3 bucket
s3 – A client from boto3.client(), if already instantiated

Return type

None
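The optional `s3` argument makes it possible to reuse one client across several uploads. The sketch below illustrates that idea. The helper `upload_many` and the mapping of names to results are hypothetical, and each result is assumed to be an `HpoPipelineResult`.

```python
def upload_many(results, bucket, s3=None):
    """Upload several HPO results to one bucket with a shared client.

    Hypothetical helper: ``results`` maps a directory name to an object
    that is assumed to be an HpoPipelineResult.
    """
    if s3 is None:
        import boto3  # imported lazily; requires the boto3 package

        s3 = boto3.client("s3")
    for directory, result in results.items():
        # Passing the existing client avoids creating a new one per upload.
        result.save_to_s3(directory, bucket, s3=s3)
```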

study: optuna.study.Study – The optuna study object

hpo_pipeline_from_path(path, **kwargs)

Run an HPO study from the configuration at the given path.

Return type: HpoPipelineResult

hpo_pipeline_from_config(config, **kwargs)

Run the HPO pipeline using a properly formatted configuration dictionary.

Return type: HpoPipelineResult

hpo_pipeline(*, dataset=None, dataset_kwargs=None, training=None, testing=None, validation=None, model, model_kwargs=None, model_kwargs_ranges=None, loss=None, loss_kwargs=None, loss_kwargs_ranges=None, regularizer=None, regularizer_kwargs=None, regularizer_kwargs_ranges=None, optimizer=None, optimizer_kwargs=None, optimizer_kwargs_ranges=None, training_loop=None, negative_sampler=None, negative_sampler_kwargs=None, negative_sampler_kwargs_ranges=None, training_kwargs=None, training_kwargs_ranges=None, stopper=None, stopper_kwargs=None, evaluator=None, evaluator_kwargs=None, evaluation_kwargs=None, metric=None, result_tracker=None, result_tracker_kwargs=None, device=None, storage=None, sampler=None, sampler_kwargs=None, pruner=None, pruner_kwargs=None, study_name=None, direction=None, load_if_exists=False, n_trials=None, timeout=None, n_jobs=None, save_model_directory=None)

Run hyper-parameter optimization for a model on the given dataset.

Parameters

dataset (Union[None, str, DataSet, Type[DataSet]]) – The name of the dataset (a key from pykeen.datasets.datasets), a pykeen.datasets.DataSet instance, or a DataSet class. Alternatively, training and testing can be specified.
dataset_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the dataset upon instantiation
training (Union[None, str, TriplesFactory]) – A triples factory with training instances or path to the training file if a dataset was not specified
testing (Union[None, str, TriplesFactory]) – A triples factory with test instances or path to the test file if a dataset was not specified
validation (Union[None, str, TriplesFactory]) – A triples factory with validation instances or path to the validation file if a dataset was not specified
model (Union[str, Type[Model]]) – The name of the model or the model class to pass to pykeen.pipeline.pipeline()
model_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the model class on instantiation
model_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the models’ hyper-parameters to override the defaults
loss (Union[None, str, Type[Loss]]) – The name of the loss or the loss class to pass to pykeen.pipeline.pipeline()
loss_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the loss on instantiation
loss_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the losses’ hyper-parameters to override the defaults
regularizer (Union[None, str, Type[Regularizer]]) – The name of the regularizer or the regularizer class to pass to pykeen.pipeline.pipeline()
regularizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the regularizer on instantiation
regularizer_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the regularizers’ hyper-parameters to override the defaults
optimizer (Union[None, str, Type[Optimizer]]) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.
optimizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the optimizer on instantiation
optimizer_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the optimizers’ hyper-parameters to override the defaults
training_loop (Union[None, str, Type[TrainingLoop]]) – The name of the training approach ('slcwa' or 'lcwa') or the training loop class to pass to pykeen.pipeline.pipeline()
negative_sampler (Union[None, str, Type[NegativeSampler]]) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class to pass to pykeen.pipeline.pipeline(). Only allowed when training with sLCWA.
negative_sampler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the negative sampler class on instantiation
negative_sampler_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the negative samplers’ hyper-parameters to override the defaults
training_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the training loop’s train function on call
training_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the training loops’ hyper-parameters to override the defaults. Ranges for the batch size cannot be specified if early stopping is enabled.
stopper (Union[None, str, Type[Stopper]]) – What kind of stopping to use. Defaults to no stopping; can be set to ‘early’.
stopper_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the stopper upon instantiation.
evaluator (Union[None, str, Type[Evaluator]]) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.
evaluator_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator on instantiation
evaluation_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator’s evaluate function on call
result_tracker (Union[None, str, Type[ResultTracker]]) – The name of the result tracker or the ResultTracker class
result_tracker_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the result tracker on instantiation
metric (Optional[str]) – The metric to optimize over. Defaults to adjusted_mean_rank.
direction (Optional[str]) – The direction of optimization. Because the default metric is adjusted_mean_rank, the default direction is minimize.
n_jobs (Optional[int]) – The number of parallel jobs. If this argument is set to -1, the number of jobs is set to the CPU count. If None, defaults to 1.
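The `*_kwargs` and `*_kwargs_ranges` pairs above separate fixed values from values to be searched. The sketch below shows one plausible shape for a range specification, a dictionary with `type`, `low`, `high`, and `q` keys. This shape is an assumption about the expected format rather than something stated on this page. The `check_ranges` helper is hypothetical and only mirrors the documented rule that batch size ranges cannot be combined with early stopping.

```python
# Assumed range format: search embedding_dim over 50, 100, 150, 200.
MODEL_KWARGS_RANGES = dict(
    embedding_dim=dict(type=int, low=50, high=200, q=50),
)

# Fixed (not optimized) argument for the training loop's train function.
TRAINING_KWARGS = dict(num_epochs=100)

# Assumed range format for a searched training argument.
TRAINING_KWARGS_RANGES = dict(
    batch_size=dict(type=int, low=128, high=512, q=128),
)


def check_ranges(stopper, training_kwargs_ranges):
    """Reject batch size ranges when early stopping is requested by name.

    Hypothetical helper, not part of PyKEEN. It only handles the string
    form of ``stopper``; a Stopper class would need a separate check.
    """
    if stopper == "early" and "batch_size" in (training_kwargs_ranges or {}):
        raise ValueError(
            "batch_size ranges cannot be combined with early stopping"
        )
```

With this check, `check_ranges(None, TRAINING_KWARGS_RANGES)` passes silently, while `check_ranges("early", TRAINING_KWARGS_RANGES)` raises a `ValueError` before any trial is started.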

Note

The remaining parameters are passed to optuna.study.create_study() or optuna.study.Study.optimize().
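To illustrate the note, the sketch below groups a few of the remaining parameters by the optuna call that accepts an argument of the same name. The storage URL, study name, and budgets are placeholder values, and the dataset and model names are illustrative.

```python
# Arguments that optuna's create_study() accepts.
CREATE_STUDY_KWARGS = dict(
    storage="sqlite:///hpo_example.db",  # placeholder database URL
    study_name="transe-nations",         # placeholder study name
    load_if_exists=True,                 # reuse a stored study of this name
)

# Arguments that optuna's Study.optimize() accepts.
OPTIMIZE_KWARGS = dict(
    n_trials=20,   # stop after this many trials
    timeout=3600,  # or after this many seconds
    n_jobs=1,
)


def run_resumable_hpo():
    from pykeen.hpo import hpo_pipeline  # imported lazily; requires PyKEEN

    return hpo_pipeline(
        dataset="nations",
        model="TransE",
        **CREATE_STUDY_KWARGS,
        **OPTIMIZE_KWARGS,
    )
```

Combining a persistent `storage` with `load_if_exists=True` is the usual optuna way to continue a study across several invocations.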

Return type: HpoPipelineResult