Hyper-parameter Optimization

class HpoPipelineResult(study, objective)

A container for the results of the HPO pipeline.

objective: pykeen.hpo.hpo.Objective: The objective class, containing information on preset hyper-parameters and those to optimize

replicate_best_pipeline(*, directory, replicates, move_to_cpu=False, save_replicates=True)

Rerun the pipeline with the best configuration found, this time evaluating on the test set instead of the validation set used during optimization.

Parameters

directory (str) – Output directory
replicates (int) – The number of times to retrain the model
move_to_cpu (bool) – Should the model be moved back to the CPU? Only relevant if training on GPU.
save_replicates (bool) – Should the artifacts of the replicates be saved?

Return type

None
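A minimal usage sketch, assuming `result` is an HpoPipelineResult returned by one of the functions documented on this page. The helper name `replicate_best` and the up-front directory creation are assumptions made for illustration; only the keyword arguments come from the signature above.

```python
from pathlib import Path


def replicate_best(result, directory, replicates=5):
    """Retrain the best configuration several times (hypothetical helper).

    `result` is assumed to be an HpoPipelineResult. All arguments of
    replicate_best_pipeline are keyword-only, so they are passed by name.
    """
    out = Path(directory)
    # Assumption: create the output directory up front so it always exists.
    out.mkdir(parents=True, exist_ok=True)
    result.replicate_best_pipeline(
        directory=str(out),
        replicates=replicates,
        move_to_cpu=True,      # move each trained model back to the CPU
        save_replicates=True,  # keep the artifacts of every replicate
    )
    return out
```

Calling `replicate_best(result, "replicates", replicates=5)` would then retrain the best configuration five times and leave the artifacts under `replicates/`.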

save_to_directory(directory, **kwargs)

Dump the results of a study to the given directory.

Return type: None

save_to_ftp(directory, ftp)

Save the results to the directory in an FTP server.

Parameters

directory (str) – The directory in the FTP server to save to
ftp (FTP) – A connection to the FTP server
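One way to obtain the `ftp` argument is the standard library's `ftplib.FTP`. The sketch below assumes that is the expected connection type; the helper name `upload_to_ftp`, the host, and the credentials are hypothetical.

```python
from ftplib import FTP


def upload_to_ftp(result, directory, host, user, password, ftp_factory=FTP):
    """Open an FTP connection and hand it to save_to_ftp (hypothetical helper).

    `ftp_factory` defaults to ftplib.FTP and exists only so that a fake
    connection can be injected when no server is available.
    """
    with ftp_factory(host) as ftp:  # the connection is closed on exit
        ftp.login(user=user, passwd=password)
        result.save_to_ftp(directory, ftp)
```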

save_to_s3(directory, bucket, s3=None)

Save all artifacts to the given directory in an S3 Bucket.

Parameters

directory (str) – The directory in the S3 bucket
bucket (str) – The name of the S3 bucket
s3 – A client from boto3.client(), if already instantiated

Return type

None
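A small sketch of passing an already instantiated client, assuming boto3 is installed and credentials are configured in the environment. The helper name `upload_to_s3` and the bucket and directory names are hypothetical.

```python
def upload_to_s3(result, directory, bucket, s3=None):
    """Save HPO artifacts to S3, reusing a client if one is given (hypothetical helper)."""
    if s3 is None:
        # Imported lazily so this module can be loaded without boto3 installed.
        import boto3

        s3 = boto3.client("s3")
    result.save_to_s3(directory, bucket, s3=s3)
    return s3
```

Returning the client lets a caller reuse it for several uploads, for example `client = upload_to_s3(result, "hpo/run1", "my-bucket")`.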

study: optuna.study.Study: The optuna study object

hpo_pipeline_from_path(path, **kwargs)

Run an HPO study from the configuration at the given path.

Return type: HpoPipelineResult

hpo_pipeline_from_config(config, **kwargs)

Run the HPO pipeline using a properly formatted configuration dictionary.

Return type: HpoPipelineResult
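This page does not spell out what "properly formatted" means. The sketch below assumes a dictionary with two top-level keys, "pipeline" (modelling arguments of hpo_pipeline) and "optuna" (study arguments), and assumes the file read by hpo_pipeline_from_path is the same structure stored as JSON. Both the layout and the import path `pykeen.hpo` should be checked against the installed PyKEEN version; the dataset and model names are illustrative.

```python
import json
from pathlib import Path

# Assumed layout, not confirmed by this page: "pipeline" holds modelling
# arguments and "optuna" holds study arguments of hpo_pipeline.
config = {
    "optuna": {"n_trials": 5},
    "pipeline": {"dataset": "Nations", "model": "TransE"},
}


def write_config(path):
    """Store the configuration as JSON and return the path."""
    Path(path).write_text(json.dumps(config, indent=2))
    return path


def run_from_config():
    # Imported lazily so the configuration can be inspected without PyKEEN.
    from pykeen.hpo import hpo_pipeline_from_config

    return hpo_pipeline_from_config(config)


def run_from_path(path):
    from pykeen.hpo import hpo_pipeline_from_path

    return hpo_pipeline_from_path(write_config(path))
```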

hpo_pipeline(*, dataset=None, dataset_kwargs=None, training=None, testing=None, validation=None, evaluation_entity_whitelist=None, evaluation_relation_whitelist=None, model, model_kwargs=None, model_kwargs_ranges=None, loss=None, loss_kwargs=None, loss_kwargs_ranges=None, regularizer=None, regularizer_kwargs=None, regularizer_kwargs_ranges=None, optimizer=None, optimizer_kwargs=None, optimizer_kwargs_ranges=None, training_loop=None, negative_sampler=None, negative_sampler_kwargs=None, negative_sampler_kwargs_ranges=None, training_kwargs=None, training_kwargs_ranges=None, stopper=None, stopper_kwargs=None, evaluator=None, evaluator_kwargs=None, evaluation_kwargs=None, metric=None, result_tracker=None, result_tracker_kwargs=None, device=None, storage=None, sampler=None, sampler_kwargs=None, pruner=None, pruner_kwargs=None, study_name=None, direction=None, load_if_exists=False, n_trials=None, timeout=None, n_jobs=None, save_model_directory=None)

Run hyper-parameter optimization for a model on the given dataset.

Parameters

dataset (Union[None, str, Dataset, Type[Dataset]]) – The name of the dataset (a key from pykeen.datasets.datasets) or the pykeen.datasets.Dataset instance. Alternatively, the training triples factory (training), testing triples factory (testing), and validation triples factory (validation; optional) can be specified.
dataset_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the dataset upon instantiation
training (Union[None, str, TriplesFactory]) – A triples factory with training instances or path to the training file if a dataset was not specified
testing (Union[None, str, TriplesFactory]) – A triples factory with test instances or path to the test file if a dataset was not specified
validation (Union[None, str, TriplesFactory]) – A triples factory with validation instances or path to the validation file if a dataset was not specified
evaluation_entity_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these entities. Useful if the downstream task is only interested in certain entities, but the relational patterns with other entities improve the entity embedding quality. Passed to pykeen.pipeline.pipeline().
evaluation_relation_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these relations. Useful if the downstream task is only interested in certain relations, but the relational patterns with other relations improve the entity embedding quality. Passed to pykeen.pipeline.pipeline().
model (Union[str, Type[Model]]) – The name of the model or the model class to pass to pykeen.pipeline.pipeline()
model_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the model class on instantiation
model_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the models’ hyper-parameters to override the defaults
loss (Union[None, str, Type[Loss]]) – The name of the loss or the loss class to pass to pykeen.pipeline.pipeline()
loss_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the loss on instantiation
loss_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the losses’ hyper-parameters to override the defaults
regularizer (Union[None, str, Type[Regularizer]]) – The name of the regularizer or the regularizer class to pass to pykeen.pipeline.pipeline()
regularizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the regularizer on instantiation
regularizer_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the regularizers’ hyper-parameters to override the defaults
optimizer (Union[None, str, Type[Optimizer]]) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.
optimizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the optimizer on instantiation
optimizer_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the optimizers’ hyper-parameters to override the defaults
training_loop (Union[None, str, Type[TrainingLoop]]) – The name of the training approach ('slcwa' or 'lcwa') or the training loop class to pass to pykeen.pipeline.pipeline()
negative_sampler (Union[None, str, Type[NegativeSampler]]) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class to pass to pykeen.pipeline.pipeline(). Only allowed when training with sLCWA.
negative_sampler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the negative sampler class on instantiation
negative_sampler_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the negative samplers’ hyper-parameters to override the defaults
training_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the training loop’s train function on call
training_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the training loops’ hyper-parameters to override the defaults. Ranges cannot be specified for the batch size if early stopping is enabled.
stopper (Union[None, str, Type[Stopper]]) – What kind of stopping to use. Defaults to no stopping; can be set to ‘early’.
stopper_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the stopper upon instantiation.
evaluator (Union[None, str, Type[Evaluator]]) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.
evaluator_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator on instantiation
evaluation_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator’s evaluate function on call
result_tracker (Union[None, str, Type[ResultTracker]]) – The name of the result tracker or the ResultTracker class
result_tracker_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the results tracker on instantiation
metric (Optional[str]) – The metric to optimize over. Defaults to adjusted_mean_rank.
direction (Optional[str]) – The direction of optimization. Because the default metric is adjusted_mean_rank, the default direction is minimize.
n_jobs (Optional[int]) – The number of parallel jobs. If set to -1, the number of jobs is set to the CPU count. If None, defaults to 1.
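The restriction on training_kwargs_ranges can be checked before a study is started. The sketch below is a hypothetical pre-flight check, not PyKEEN's own validation. It assumes the batch size is keyed as "batch_size" and that a range is described by a mapping with type, low, high, and q entries, neither of which is specified on this page.

```python
def check_training_ranges(stopper, training_kwargs_ranges):
    """Reject a batch-size range when early stopping is on (hypothetical check)."""
    ranges = dict(training_kwargs_ranges or {})
    # Assumption: the batch size is keyed as "batch_size".
    if stopper == "early" and "batch_size" in ranges:
        raise ValueError("batch_size cannot be optimized when stopper='early'")
    return ranges


# Assumed range format: an integer searched between low and high in steps of q.
epoch_range = {"num_epochs": {"type": int, "low": 50, "high": 200, "q": 50}}

# Accepted: no batch-size range is given together with early stopping.
checked = check_training_ranges("early", epoch_range)
```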

Note

The remaining parameters are passed to optuna.study.create_study() or optuna.study.Study.optimize().

Return type: HpoPipelineResult
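A hedged end-to-end sketch. Since the default direction is described above as tied to the default metric, it seems prudent to set direction explicitly whenever a different metric is chosen. The dataset and model names, the "hits@10" metric key, the range format, and the import path `pykeen.hpo` are assumptions to verify against the installed PyKEEN version.

```python
# Keyword arguments for hpo_pipeline; every argument is keyword-only.
hpo_kwargs = dict(
    dataset="Nations",  # assumed dataset key
    model="TransE",     # assumed model name
    # Assumed range format: search embedding_dim over 16, 32, 48, 64.
    model_kwargs_ranges=dict(
        embedding_dim=dict(type=int, low=16, high=64, q=16),
    ),
    metric="hits@10",      # assumed metric key; higher is better
    direction="maximize",  # set explicitly because the metric was changed
    n_trials=10,           # passed through to the optuna study
    n_jobs=1,
)


def run_hpo(output_directory):
    """Run the study and dump its results (requires PyKEEN to be installed)."""
    # Imported lazily so the arguments above can be inspected without PyKEEN.
    from pykeen.hpo import hpo_pipeline

    result = hpo_pipeline(**hpo_kwargs)
    result.save_to_directory(output_directory)
    return result
```

Keeping the arguments in a plain dictionary makes it easy to log or serialize the exact search setup alongside the saved study.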