Hyper-parameter Optimization

class HpoPipelineResult(study: Study, objective: Objective)[source]

A container for the results of the HPO pipeline.

Parameters:

study (Study)
objective (Objective)

objective: Objective

The objective class, containing information on preset hyper-parameters and those to optimize

replicate_best_pipeline(*, directory: str | Path, replicates: int, move_to_cpu: bool = False, save_replicates: bool = True, save_training: bool = False) → None[source]

Run the pipeline on the best configuration, but this time on the “test” set instead of “evaluation” set.

Parameters:

directory (str | Path) – Output directory
replicates (int) – The number of times to retrain the model
move_to_cpu (bool) – Should the model be moved back to the CPU? Only relevant if training on GPU.
save_replicates (bool) – Should the artifacts of the replicates be saved?
save_training (bool) – Should the training triples be saved?

Raises:

ValueError – if "use_testing_data" is provided in the best pipeline’s config.

Return type:

None
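As a rough usage sketch, the helper below wraps replicate_best_pipeline() so that the output directory exists before retraining starts. The helper name replicate_best and its default values are hypothetical; only the keyword arguments come from the signature above, and result is assumed to be an HpoPipelineResult from a finished study.

```python
from pathlib import Path


def replicate_best(result, directory, replicates=5):
    """Retrain the best configuration several times (hypothetical helper).

    ``result`` is assumed to be an HpoPipelineResult from a finished study.
    """
    out = Path(directory)
    # Creating the directory up front is a precaution, not a documented requirement.
    out.mkdir(parents=True, exist_ok=True)
    # All arguments are keyword-only, matching the documented signature.
    result.replicate_best_pipeline(
        directory=out,
        replicates=replicates,
        move_to_cpu=True,      # only relevant if training ran on a GPU
        save_replicates=True,  # keep the artifacts of each replicate
        save_training=False,   # do not store the training triples
    )
    return out
```

Because the method raises ValueError when "use_testing_data" is present in the best pipeline's config, callers may want to wrap the call in a try/except block.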

save_to_directory(directory: str | Path, **kwargs) → None[source]

Dump the results of a study to the given directory.

Parameters: directory (str | Path)
Return type: None

save_to_ftp(directory: str | Path, ftp: FTP)[source]

Save the results to the directory in an FTP server.

Parameters:

directory (str | Path) – The directory in the FTP server to save to
ftp (FTP) – A connection to the FTP server

save_to_s3(directory: str, bucket: str, s3=None) → None[source]

Save all artifacts to the given directory in an S3 Bucket.

Parameters:

directory (str) – The directory in the S3 bucket
bucket (str) – The name of the S3 bucket
s3 – A client from boto3.client(), if already instantiated

Return type:

None
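The saving methods can be combined in a small wrapper. The sketch below is illustrative only: persist_results and its parameter names are hypothetical, and result is assumed to be an HpoPipelineResult. It always writes to a local directory and uploads to S3 only when a bucket name is given.

```python
from pathlib import Path


def persist_results(result, directory, bucket=None, s3_directory=None, s3=None):
    """Save a study locally and optionally to S3 (hypothetical helper)."""
    local = Path(directory)
    result.save_to_directory(local)
    if bucket is not None:
        # save_to_s3 documents ``directory`` as a plain string inside the bucket.
        # ``s3`` may be an existing boto3 client; None is the documented default.
        remote = s3_directory if s3_directory is not None else local.name
        result.save_to_s3(str(remote), bucket=bucket, s3=s3)
    return local
```

Passing an already instantiated client through s3 lets several uploads share one connection; leaving it as None defers to the method's own default behavior.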

study: Study

The optuna study object

hpo_pipeline_from_path(path: str | Path, **kwargs) → HpoPipelineResult[source]

Run an HPO study from the configuration at the given path.

Parameters: path (str | Path)
Return type: HpoPipelineResult

hpo_pipeline_from_config(config: Mapping[str, Any], **kwargs) → HpoPipelineResult[source]

Run the HPO pipeline using a properly formatted configuration dictionary.

Parameters: config (Mapping[str, Any])
Return type: HpoPipelineResult
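The page does not spell out what "properly formatted" means. The sketch below assumes a layout with two top-level keys, "pipeline" (model and dataset arguments) and "optuna" (study arguments), and assumes that the file read by hpo_pipeline_from_path() is JSON; both are assumptions to check against the installed PyKEEN version. The dataset, model, and range values are placeholders, and write_config and run_from_config are hypothetical helpers.

```python
import json
from pathlib import Path

# Assumed layout: "pipeline" holds pipeline arguments, "optuna" holds study arguments.
CONFIG = {
    "pipeline": {
        "dataset": "Nations",
        "model": "TransE",
        # Search range given with a string type so that it survives JSON serialization.
        "model_kwargs_ranges": {
            "embedding_dim": {"type": "int", "low": 16, "high": 64, "q": 16},
        },
    },
    "optuna": {"n_trials": 5},
}


def write_config(config, path):
    """Serialize the configuration so it could be passed to hpo_pipeline_from_path()."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2))
    return path


def run_from_config(config, **kwargs):
    """Start the study; PyKEEN is imported lazily so this module loads without it."""
    from pykeen.hpo import hpo_pipeline_from_config

    return hpo_pipeline_from_config(config, **kwargs)
```

Keeping the configuration in a file makes a study easier to rerun, since the same file can be handed to hpo_pipeline_from_path() later.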

Train a model on the given dataset.

Parameters:

dataset (None | str | Dataset | type[Dataset]) – The name of the dataset (a key for the pykeen.datasets.dataset_resolver) or the pykeen.datasets.Dataset instance. Alternatively, the training triples factory (training), testing triples factory (testing), and validation triples factory (validation; optional) can be specified.
dataset_kwargs (Mapping[str, Any] | None) – The keyword arguments passed to the dataset upon instantiation
training (str | CoreTriplesFactory | None) – A triples factory with training instances or path to the training file if a dataset was not specified
testing (str | CoreTriplesFactory | None) – A triples factory with test instances or path to the test file if a dataset was not specified
validation (str | CoreTriplesFactory | None) – A triples factory with validation instances or path to the validation file if a dataset was not specified
evaluation_entity_whitelist (Collection[str] | None) – Optional restriction of evaluation to triples containing only these entities. Useful if the downstream task is only interested in certain entities, but the relational patterns with other entities improve the entity embedding quality. Passed to pykeen.pipeline.pipeline().
evaluation_relation_whitelist (Collection[str] | None) – Optional restriction of evaluation to triples containing only these relations. Useful if the downstream task is only interested in certain relations, but the relational patterns with other relations improve the entity embedding quality. Passed to pykeen.pipeline.pipeline().
model (str | type[Model]) – The name of the model or the model class to pass to pykeen.pipeline.pipeline()
model_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the model class on instantiation
model_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the models’ hyper-parameters to override the defaults
loss (str | type[Loss] | None) – The name of the loss or the loss class to pass to pykeen.pipeline.pipeline()
loss_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the loss on instantiation
loss_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the losses’ hyper-parameters to override the defaults
regularizer (str | type[Regularizer] | None) – The name of the regularizer or the regularizer class to pass to pykeen.pipeline.pipeline()
regularizer_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the regularizer on instantiation
regularizer_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the regularizers’ hyper-parameters to override the defaults
optimizer (str | type[Optimizer] | None) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.
optimizer_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the optimizer on instantiation
optimizer_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the optimizers’ hyper-parameters to override the defaults
lr_scheduler (str | type[LRScheduler] | None) – The name of the lr_scheduler or the lr_scheduler class.
lr_scheduler_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the lr_scheduler on instantiation
lr_scheduler_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the lr_schedulers’ hyper-parameters to override the defaults
training_loop (str | type[TrainingLoop] | None) – The name of the training approach ('slcwa' or 'lcwa') or the training loop class to pass to pykeen.pipeline.pipeline()
training_loop_kwargs (Mapping[str, Any] | None) – additional keyword-based parameters passed to the training loop upon instantiation.
negative_sampler (str | type[NegativeSampler] | None) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class to pass to pykeen.pipeline.pipeline(). Only allowed when training with sLCWA.
negative_sampler_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the negative sampler class on instantiation
negative_sampler_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the negative samplers’ hyper-parameters to override the defaults
epochs (int | None) – A shortcut for setting the num_epochs key in the training_kwargs dict.
training_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the training loop’s train function on call
training_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the training loops’ hyper-parameters to override the defaults. Ranges for the batch size cannot be specified if early stopping is enabled.
stopper (str | type[Stopper] | None) – What kind of stopping to use. Defaults to no stopping; can be set to ‘early’.
stopper_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the stopper upon instantiation.
evaluator (str | type[Evaluator] | None) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.
evaluator_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the evaluator on instantiation
evaluation_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the evaluator’s evaluate function on call
filter_validation_when_testing (bool) – If true, when evaluating on the test dataset, validation triples are added to the set of known positive triples, which are filtered out when performing filtered evaluation following the approach described by [bordes2013]. Defaults to true.
result_tracker (str | type[ResultTracker] | None) – The ResultTracker class or name
result_tracker_kwargs (Mapping[str, Any] | None) – The keyword arguments passed to the results tracker on instantiation
metric (str | None) – The metric to optimize over. Defaults to mean reciprocal rank.
n_jobs (int | None) – The number of parallel jobs. If this argument is set to -1, the number is set to the CPU count. If None, defaults to 1.
save_model_directory (str | None) – If given, the final model of each trial is saved under this directory.
storage (str | BaseStorage | None) – the study’s storage, cf. optuna.study.create_study()
sampler (str | type[BaseSampler] | None) – the sampler, or a hint thereof, cf. optuna.study.create_study()
sampler_kwargs (Mapping[str, Any] | None) – additional keyword-based parameters for the sampler
pruner (str | type[BasePruner] | None) – the pruner, or a hint thereof, cf. optuna.study.create_study()
pruner_kwargs (Mapping[str, Any] | None) – additional keyword-based parameters for the pruner
device (str | device | None) – the device to use.
study_name (str | None) – the study’s name, cf. optuna.study.create_study()
direction (str | None) – The direction of optimization. Because the default metric is mean reciprocal rank, the default direction is maximize. cf. optuna.study.create_study()
load_if_exists (bool) – whether to load the study if it already exists, cf. optuna.study.create_study()
n_trials (int | None) – the number of trials, cf. optuna.study.Study.optimize().
timeout (int | None) – the timeout, cf. optuna.study.Study.optimize().
gc_after_trial (bool | None) – whether to run garbage collection after each trial, cf. optuna.study.Study.optimize().
n_jobs – the number of jobs, cf. optuna.study.Study.optimize(). Defaults to 1.

Returns:

the optimization result

Raises:

ValueError – if early stopping is enabled, but the number of epochs is to be optimized, too.

Return type:

HpoPipelineResult
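A minimal sketch of how these parameters might be combined, assuming they belong to pykeen.hpo.hpo_pipeline() (the signature is not shown on this page). The dataset, model, and range values are placeholders, and the range format (type, low, high, q) is an assumption based on PyKEEN's tutorial conventions. The helpers check_ranges, build_hpo_kwargs, and run_study are hypothetical; check_ranges only mirrors the documented ValueError, and the key names it looks for are guesses.

```python
def check_ranges(stopper, training_kwargs_ranges):
    """Mirror the documented constraint: no epoch search with early stopping.

    The key names checked here ("num_epochs", "epochs") are assumptions.
    """
    ranges = training_kwargs_ranges or {}
    if stopper == "early" and ({"num_epochs", "epochs"} & set(ranges)):
        raise ValueError("cannot optimize the number of epochs with early stopping")


def build_hpo_kwargs(n_trials=30):
    """Assemble keyword arguments for an HPO run (placeholder values)."""
    kwargs = dict(
        dataset="Nations",
        model="TransE",
        # Override the default search range for one model hyper-parameter.
        model_kwargs_ranges=dict(
            embedding_dim=dict(type=int, low=16, high=64, q=16),
        ),
        training_loop="slcwa",
        negative_sampler="basic",  # negative samplers are only allowed with sLCWA
        epochs=50,                 # shortcut for num_epochs in training_kwargs
        n_trials=n_trials,
        direction="maximize",      # matches the default metric, mean reciprocal rank
    )
    check_ranges(kwargs.get("stopper"), kwargs.get("training_kwargs_ranges"))
    return kwargs


def run_study(**overrides):
    """Run the study; PyKEEN is imported lazily so this module loads without it."""
    from pykeen.hpo import hpo_pipeline

    return hpo_pipeline(**{**build_hpo_kwargs(), **overrides})
```

Calling run_study(n_trials=5) would start a short study and return an HpoPipelineResult, which can then be stored with save_to_directory().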