Hyper-parameter Optimization

class HpoPipelineResult(study, objective)

A container for the results of the HPO pipeline.

objective: pykeen.hpo.hpo.Objective

The objective class, containing information on preset hyper-parameters and those to optimize

replicate_best_pipeline(*, directory, replicates, move_to_cpu=False, save_replicates=True)

Run the pipeline on the best configuration, but this time on the “test” set instead of the “evaluation” set.

Parameters
  • directory (Union[str, Path]) – Output directory

  • replicates (int) – The number of times to retrain the model

  • move_to_cpu (bool) – Should the model be moved back to the CPU? Only relevant if training on GPU.

  • save_replicates (bool) – Should the artifacts of the replicates be saved?

Return type

None
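A minimal sketch of calling replicate_best_pipeline() once a study has finished. All of its parameters are keyword-only. The helper replicate_best and the default directory name "replicates" are hypothetical; result is assumed to be the HpoPipelineResult returned by hpo_pipeline().

```python
from pathlib import Path
from typing import Union


def replicate_best(result, directory: Union[str, Path] = "replicates", replicates: int = 5) -> Path:
    """Retrain the best configuration of a finished study several times.

    ``result`` is assumed to be an HpoPipelineResult; ``replicate_best``
    itself is a hypothetical helper, not part of PyKEEN.
    """
    directory = Path(directory)
    # Every parameter of replicate_best_pipeline() is keyword-only.
    result.replicate_best_pipeline(
        directory=directory,
        replicates=replicates,
        move_to_cpu=True,       # only relevant when training on a GPU
        save_replicates=False,  # do not save each replicate's artifacts
    )
    return directory
```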

save_to_directory(directory, **kwargs)

Dump the results of a study to the given directory.

Return type

None

save_to_ftp(directory, ftp)

Save the results to the directory in an FTP server.

Parameters
  • directory (str) – The directory in the FTP server to save to

  • ftp (FTP) – A connection to the FTP server

save_to_s3(directory, bucket, s3=None)

Save all artifacts to the given directory in an S3 Bucket.

Parameters
  • directory (str) – The directory in the S3 bucket

  • bucket (str) – The name of the S3 bucket

  • s3 – A client from boto3.client(), if already instantiated

Return type

None
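The three saving methods can be combined so that a study is always kept locally and optionally mirrored to a remote store. This is a sketch under stated assumptions: persist_result is a hypothetical helper, the FTP server is assumed to accept anonymous login, and the S3 upload assumes boto3 is installed with credentials configured and that save_to_s3() creates its own client when s3 is left as None, as its parameter description suggests.

```python
from ftplib import FTP
from typing import List, Optional


def persist_result(
    result,
    local_dir: str,
    ftp_host: Optional[str] = None,
    s3_bucket: Optional[str] = None,
    remote_dir: str = "hpo",
) -> List[str]:
    """Save an HPO result locally and, optionally, to FTP and/or S3.

    ``persist_result`` is a hypothetical helper; it returns the destinations
    it wrote to so the caller can log them.
    """
    destinations = []
    # Always keep a local copy first.
    result.save_to_directory(local_dir)
    destinations.append(local_dir)
    if ftp_host is not None:
        # FTP(host) connects immediately; login() with no arguments is anonymous.
        with FTP(ftp_host) as ftp:
            ftp.login()
            result.save_to_ftp(remote_dir, ftp)
        destinations.append(f"ftp://{ftp_host}/{remote_dir}")
    if s3_bucket is not None:
        # Leaving s3=None assumes the method builds its own boto3 client.
        result.save_to_s3(remote_dir, bucket=s3_bucket)
        destinations.append(f"s3://{s3_bucket}/{remote_dir}")
    return destinations
```

Passing an already-open FTP connection or boto3 client, rather than credentials, keeps authentication decisions with the caller.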

study: optuna.study.Study

The optuna study object
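Because study is a regular Optuna Study, the usual Optuna attributes are available for inspecting the outcome. A small sketch that gathers the headline numbers; summarize_study is a hypothetical helper and assumes at least one trial has completed (Optuna raises ValueError from best_trial otherwise).

```python
from typing import Any, Dict


def summarize_study(study) -> Dict[str, Any]:
    """Collect the headline numbers from an optuna.study.Study.

    Assumes at least one completed trial; otherwise accessing
    ``best_trial`` raises ValueError.
    """
    best = study.best_trial
    return {
        "study_name": study.study_name,
        "n_trials": len(study.trials),   # includes pruned/failed trials
        "best_trial": best.number,
        "best_value": best.value,        # value of the optimized metric
        "best_params": dict(best.params),
    }
```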

hpo_pipeline_from_path(path, **kwargs)

Run an HPO study from the configuration at the given path.

Return type

HpoPipelineResult

hpo_pipeline_from_config(config, **kwargs)

Run the HPO pipeline using a properly formatted configuration dictionary.

Return type

HpoPipelineResult
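A sketch of a configuration file for hpo_pipeline_from_path(). It assumes the layout used by PyKEEN's bundled HPO configurations: a "pipeline" section holding arguments for hpo_pipeline() that describe the training setup, and an "optuna" section holding the study settings, both stored as JSON. Since JSON cannot hold Python types, range types are written as strings such as "int"; treat both the layout and the string form as assumptions to verify against your PyKEEN version. The file name and the helpers write_config and run_from_path are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical configuration; the "pipeline"/"optuna" split is an assumption.
CONFIG = {
    "pipeline": {
        "dataset": "Nations",
        "model": "TransE",
        # Range type given as a string because JSON cannot encode ``int``.
        "model_kwargs_ranges": {
            "embedding_dim": {"type": "int", "low": 16, "high": 128},
        },
        "training_kwargs": {"num_epochs": 5},
    },
    "optuna": {
        "n_trials": 10,
        "direction": "maximize",
    },
}


def write_config(path) -> Path:
    """Write CONFIG as JSON so it can be passed to hpo_pipeline_from_path()."""
    path = Path(path)
    path.write_text(json.dumps(CONFIG, indent=2))
    return path


def run_from_path(path):
    # Imported lazily so the rest of this sketch runs without PyKEEN installed.
    from pykeen.hpo import hpo_pipeline_from_path
    return hpo_pipeline_from_path(path)
```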

hpo_pipeline(*, dataset=None, dataset_kwargs=None, training=None, testing=None, validation=None, evaluation_entity_whitelist=None, evaluation_relation_whitelist=None, model, model_kwargs=None, model_kwargs_ranges=None, loss=None, loss_kwargs=None, loss_kwargs_ranges=None, regularizer=None, regularizer_kwargs=None, regularizer_kwargs_ranges=None, optimizer=None, optimizer_kwargs=None, optimizer_kwargs_ranges=None, lr_scheduler=None, lr_scheduler_kwargs=None, lr_scheduler_kwargs_ranges=None, training_loop=None, training_loop_kwargs=None, negative_sampler=None, negative_sampler_kwargs=None, negative_sampler_kwargs_ranges=None, epochs=None, training_kwargs=None, training_kwargs_ranges=None, stopper=None, stopper_kwargs=None, evaluator=None, evaluator_kwargs=None, evaluation_kwargs=None, metric=None, filter_validation_when_testing=True, result_tracker=None, result_tracker_kwargs=None, device=None, storage=None, sampler=None, sampler_kwargs=None, pruner=None, pruner_kwargs=None, study_name=None, direction=None, load_if_exists=False, n_trials=None, timeout=None, n_jobs=None, save_model_directory=None)

Run a hyper-parameter optimization study, training a model on the given dataset in each trial.

Parameters
  • dataset (Union[None, str, Dataset, Type[Dataset]]) – The name of the dataset (a key for the pykeen.datasets.dataset_resolver) or the pykeen.datasets.Dataset instance. Alternatively, the training triples factory (training), testing triples factory (testing), and validation triples factory (validation; optional) can be specified.

  • dataset_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the dataset upon instantiation

  • training (Union[str, CoreTriplesFactory, None]) – A triples factory with training instances or path to the training file if a dataset was not specified

  • testing (Union[str, CoreTriplesFactory, None]) – A triples factory with test instances or path to the test file if a dataset was not specified

  • validation (Union[str, CoreTriplesFactory, None]) – A triples factory with validation instances or path to the validation file if a dataset was not specified

  • evaluation_entity_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these entities. Useful if the downstream task is only interested in certain entities, but the relational patterns with other entities improve the entity embedding quality. Passed to pykeen.pipeline.pipeline().

  • evaluation_relation_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these relations. Useful if the downstream task is only interested in certain relations, but the relational patterns with other relations improve the entity embedding quality. Passed to pykeen.pipeline.pipeline().

  • model (Union[str, Type[Model]]) – The name of the model or the model class to pass to pykeen.pipeline.pipeline()

  • model_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the model class on instantiation

  • model_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the model’s hyper-parameters to override the defaults

  • loss (Union[str, Type[Loss], None]) – The name of the loss or the loss class to pass to pykeen.pipeline.pipeline()

  • loss_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the loss on instantiation

  • loss_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the loss’s hyper-parameters to override the defaults

  • regularizer (Union[str, Type[Regularizer], None]) – The name of the regularizer or the regularizer class to pass to pykeen.pipeline.pipeline()

  • regularizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the regularizer on instantiation

  • regularizer_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the regularizer’s hyper-parameters to override the defaults

  • optimizer (Union[str, Type[Optimizer], None]) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.

  • optimizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the optimizer on instantiation

  • optimizer_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the optimizer’s hyper-parameters to override the defaults

  • lr_scheduler (Union[str, Type[_LRScheduler], None]) – The name of the lr_scheduler or the lr_scheduler class.

  • lr_scheduler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the lr_scheduler on instantiation

  • lr_scheduler_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the lr_scheduler’s hyper-parameters to override the defaults

  • training_loop (Union[str, Type[TrainingLoop], None]) – The name of the training approach ('slcwa' or 'lcwa') or the training loop class to pass to pykeen.pipeline.pipeline()

  • negative_sampler (Union[str, Type[NegativeSampler], None]) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class to pass to pykeen.pipeline.pipeline(). Only allowed when training with sLCWA.

  • negative_sampler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the negative sampler class on instantiation

  • negative_sampler_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the negative sampler’s hyper-parameters to override the defaults

  • epochs (Optional[int]) – A shortcut for setting the num_epochs key in the training_kwargs dict.

  • training_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the training loop’s train function on call

  • training_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the training loop’s hyper-parameters to override the defaults. Ranges for the batch size cannot be specified if early stopping is enabled.

  • stopper (Union[str, Type[Stopper], None]) – What kind of stopping to use. Defaults to no stopping; can be set to ‘early’.

  • stopper_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the stopper upon instantiation.

  • evaluator (Union[str, Type[Evaluator], None]) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.

  • evaluator_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator on instantiation

  • evaluation_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator’s evaluate function on call

  • filter_validation_when_testing (bool) – If true, validation triples are added to the set of known positive triples when evaluating on the test dataset, so that they are filtered out when performing filtered evaluation following the approach described by [bordes2013]. Defaults to true.

  • result_tracker (Union[str, Type[ResultTracker], None]) – The ResultTracker class or name

  • result_tracker_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the result tracker on instantiation

  • metric (Optional[str]) – The metric to optimize over. Defaults to ADJUSTED_ARITHMETIC_MEAN_RANK_INDEX.

  • direction (Optional[str]) – The direction of optimization. Because the default metric is ADJUSTED_ARITHMETIC_MEAN_RANK_INDEX, the default direction is maximize.

  • n_jobs (Optional[int]) – The number of parallel jobs. If set to -1, the number of jobs is set to the CPU count. If None, defaults to 1.
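The filtered setting behind filter_validation_when_testing can be sketched in plain Python: before ranking the true tail of a test triple, every other tail known to form a true triple with the same head and relation is removed from the candidates. This is a simplified illustration (tail prediction only, higher score means more plausible, ties counted optimistically), not PyKEEN's evaluator; filtered_tail_rank and the toy triples are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def filtered_tail_rank(
    test_triple: Triple,
    scores: Dict[str, float],
    known_triples: Iterable[Triple],
) -> int:
    """Rank the true tail among all candidate tails, filtering other known positives.

    ``scores`` maps each candidate tail to the model's score for
    (head, relation, candidate). Ties count in the true tail's favour.
    """
    head, relation, true_tail = test_triple
    # Other tails that already form a known true triple with (head, relation).
    filtered: Set[str] = {
        t for h, r, t in known_triples
        if h == head and r == relation and t != true_tail
    }
    true_score = scores[true_tail]
    better = sum(
        1 for tail, score in scores.items()
        if tail != true_tail and tail not in filtered and score > true_score
    )
    return better + 1
```

For instance, if a validation triple (a, likes, b) outscores the test triple (a, likes, c), adding the validation triples to the known positives moves c from rank 2 to rank 1, because b no longer counts as a competing negative.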

Note

The remaining parameters are passed to optuna.study.create_study() or optuna.study.Study.optimize().

Return type

HpoPipelineResult
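A sketch of a small study: search the TransE embedding dimension and the Adam learning rate on the Nations dataset. The chosen dataset, model, ranges, and study name are illustrative, and build_hpo_kwargs and run_hpo are hypothetical helpers. Range dictionaries use a type plus low/high bounds, with scale="log" for a log-uniform search; check the accepted keys against your PyKEEN version. PyKEEN is imported lazily so the argument dictionary can be inspected without it.

```python
from typing import Any, Dict


def build_hpo_kwargs(n_trials: int = 20) -> Dict[str, Any]:
    """Assemble illustrative keyword arguments for pykeen.hpo.hpo_pipeline()."""
    return dict(
        dataset="Nations",
        model="TransE",
        # Search over the embedding dimension; other model defaults are kept.
        model_kwargs_ranges=dict(
            embedding_dim=dict(type=int, low=16, high=128),
        ),
        optimizer="Adam",
        # Search the learning rate on a log scale.
        optimizer_kwargs_ranges=dict(
            lr=dict(type=float, low=1e-4, high=1e-1, scale="log"),
        ),
        epochs=5,  # shortcut for training_kwargs={"num_epochs": 5}
        # Study settings used for the underlying Optuna study.
        study_name="transe-nations",
        n_trials=n_trials,
    )


def run_hpo(**overrides):
    # Imported lazily so build_hpo_kwargs() works without PyKEEN installed.
    from pykeen.hpo import hpo_pipeline
    kwargs = build_hpo_kwargs()
    kwargs.update(overrides)
    return hpo_pipeline(**kwargs)
```

Calling run_hpo(n_trials=5) would run a shorter study; the returned HpoPipelineResult can then be saved or replicated with the methods documented above.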