Hyper-parameter Optimization

class HpoPipelineResult(study, objective)[source]

A container for the results of the HPO pipeline.

objective: pykeen.hpo.hpo.Objective

The objective class, containing information on preset hyper-parameters and those to optimize

replicate_best_pipeline(*, directory, replicates, move_to_cpu=False, save_replicates=True)[source]

Run the pipeline on the best configuration, but this time evaluate on the “test” set instead of the “validation” set.

Parameters
  • directory (str) – Output directory

  • replicates (int) – The number of times to retrain the model

  • move_to_cpu (bool) – Should the model be moved back to the CPU? Only relevant if training on GPU.

  • save_replicates (bool) – Should the artifacts of the replicates be saved?

Return type

None
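The call above can be sketched as follows. This is a minimal illustration, not part of the library: the helper name replicate_best and the default of five replicates are hypothetical, and result is assumed to be an HpoPipelineResult returned by one of the HPO functions on this page.

```python
def replicate_best(result, directory, replicates=5):
    """Retrain the best configuration several times.

    `result` is assumed to be an HpoPipelineResult, or any object exposing
    the replicate_best_pipeline method with the signature documented above.
    The helper name and the default of 5 replicates are illustrative only.
    """
    # All arguments are keyword-only in the documented signature.
    result.replicate_best_pipeline(
        directory=directory,
        replicates=replicates,
        move_to_cpu=True,      # move each model back to the CPU (only matters on GPU)
        save_replicates=True,  # keep the artifacts of every replicate
    )
```

Because the arguments are keyword-only, a positional call such as result.replicate_best_pipeline("out", 5) would raise a TypeError.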

save_to_directory(directory, **kwargs)[source]

Dump the results of a study to the given directory.

Return type

None

save_to_ftp(directory, ftp)[source]

Save the results to the given directory on an FTP server.

Parameters
  • directory (str) – The directory in the FTP server to save to

  • ftp (FTP) – A connection to the FTP server

save_to_s3(directory, bucket, s3=None)[source]

Save all artifacts to the given directory in an S3 Bucket.

Parameters
  • directory (str) – The directory in the S3 bucket

  • bucket (str) – The name of the S3 bucket

  • s3 – A client from boto3.client(), if already instantiated

Return type

None
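A short sketch of how the local and S3 save methods might be combined. The helper name save_results and its argument names are hypothetical. If s3_client is left as None, it is passed through unchanged; what the method does in that case is not described here.

```python
def save_results(result, local_directory, bucket, s3_directory, s3_client=None):
    """Save HPO results locally and to an S3 bucket.

    `result` is assumed to be an HpoPipelineResult. `s3_client`, if given,
    is assumed to be a client created with boto3.client("s3").
    """
    # Dump the study to a local directory first.
    result.save_to_directory(local_directory)
    # Then upload the artifacts to the given directory in the bucket.
    result.save_to_s3(s3_directory, bucket, s3=s3_client)
```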

study: optuna.study.Study

The optuna study object

hpo_pipeline_from_path(path, **kwargs)[source]

Run an HPO study from the configuration at the given path.

Return type

HpoPipelineResult

hpo_pipeline_from_config(config, **kwargs)[source]

Run the HPO pipeline using a properly formatted configuration dictionary.

Return type

HpoPipelineResult
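The layout of a "properly formatted" configuration is not spelled out on this page. The sketch below assumes a dictionary with a "pipeline" section and an "optuna" section whose entries are forwarded as keyword arguments to hpo_pipeline(). Treat that layout, the dataset and model names, and the import path pykeen.hpo as assumptions to check against the installed version.

```python
def make_config(n_trials=30):
    """Build an HPO configuration dictionary (assumed layout, see lead-in)."""
    return {
        # Assumed: arguments describing what to train.
        "pipeline": {
            "dataset": "nations",
            "model": "TransE",
        },
        # Assumed: arguments controlling the optuna study.
        "optuna": {
            "n_trials": n_trials,
            "metric": "adjusted_mean_rank",
            "direction": "minimize",
        },
    }


def run_from_config(config):
    """Run the study; requires PyKEEN to be installed."""
    # Imported lazily so that building a config does not require PyKEEN.
    from pykeen.hpo import hpo_pipeline_from_config

    return hpo_pipeline_from_config(config)
```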

hpo_pipeline(*, dataset=None, dataset_kwargs=None, training=None, testing=None, validation=None, model, model_kwargs=None, model_kwargs_ranges=None, loss=None, loss_kwargs=None, loss_kwargs_ranges=None, regularizer=None, regularizer_kwargs=None, regularizer_kwargs_ranges=None, optimizer=None, optimizer_kwargs=None, optimizer_kwargs_ranges=None, training_loop=None, negative_sampler=None, negative_sampler_kwargs=None, negative_sampler_kwargs_ranges=None, training_kwargs=None, training_kwargs_ranges=None, stopper=None, stopper_kwargs=None, evaluator=None, evaluator_kwargs=None, evaluation_kwargs=None, metric=None, result_tracker=None, result_tracker_kwargs=None, device=None, storage=None, sampler=None, sampler_kwargs=None, pruner=None, pruner_kwargs=None, study_name=None, direction=None, load_if_exists=False, n_trials=None, timeout=None, n_jobs=None, save_model_directory=None)[source]

Optimize the hyper-parameters of a model on the given dataset.

Parameters
  • dataset (Union[None, str, DataSet, Type[DataSet]]) – The name of the dataset (a key from pykeen.datasets.datasets) or the pykeen.datasets.DataSet instance. Alternatively, the training and testing arguments can be specified.

  • dataset_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the dataset upon instantiation

  • training (Union[None, str, TriplesFactory]) – A triples factory with training instances or path to the training file if a dataset was not specified

  • testing (Union[None, str, TriplesFactory]) – A triples factory with test instances or path to the test file if a dataset was not specified

  • validation (Union[None, str, TriplesFactory]) – A triples factory with validation instances or path to the validation file if a dataset was not specified

  • model (Union[str, Type[Model]]) – The name of the model or the model class to pass to pykeen.pipeline.pipeline()

  • model_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the model class on instantiation

  • model_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the models’ hyper-parameters to override the defaults

  • loss (Union[None, str, Type[Loss]]) – The name of the loss or the loss class to pass to pykeen.pipeline.pipeline()

  • loss_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the loss on instantiation

  • loss_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the losses’ hyper-parameters to override the defaults

  • regularizer (Union[None, str, Type[Regularizer]]) – The name of the regularizer or the regularizer class to pass to pykeen.pipeline.pipeline()

  • regularizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the regularizer on instantiation

  • regularizer_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the regularizers’ hyper-parameters to override the defaults

  • optimizer (Union[None, str, Type[Optimizer]]) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.

  • optimizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the optimizer on instantiation

  • optimizer_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the optimizers’ hyper-parameters to override the defaults

  • training_loop (Union[None, str, Type[TrainingLoop]]) – The name of the training approach ('slcwa' or 'lcwa') or the training loop class to pass to pykeen.pipeline.pipeline()

  • negative_sampler (Union[None, str, Type[NegativeSampler]]) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class to pass to pykeen.pipeline.pipeline(). Only allowed when training with sLCWA.

  • negative_sampler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the negative sampler class on instantiation

  • negative_sampler_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the negative samplers’ hyper-parameters to override the defaults

  • training_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the training loop’s train function on call

  • training_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the training loops’ hyper-parameters to override the defaults. Ranges for the batch size cannot be specified if early stopping is enabled.

  • stopper (Union[None, str, Type[Stopper]]) – What kind of stopping to use. Defaults to no stopping; can be set to ‘early’.

  • stopper_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the stopper upon instantiation.

  • evaluator (Union[None, str, Type[Evaluator]]) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.

  • evaluator_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator on instantiation

  • evaluation_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator’s evaluate function on call

  • result_tracker (Union[None, str, Type[ResultTracker]]) – The name of the result tracker or the ResultTracker class

  • result_tracker_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the results tracker on instantiation

  • metric (Optional[str]) – The metric to optimize over. Defaults to adjusted_mean_rank.

  • direction (Optional[str]) – The direction of optimization. Because the default metric is adjusted_mean_rank, the default direction is minimize.

  • n_jobs (Optional[int]) – The number of parallel jobs. If this argument is set to -1, the number of jobs is set to the CPU count. If None, defaults to 1.
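The defaults described for metric, direction, and n_jobs can be summarized in a small sketch. The two helpers are hypothetical and only restate the rules in the list above; they are not the library's implementation.

```python
import os

DEFAULT_METRIC = "adjusted_mean_rank"
DEFAULT_DIRECTION = "minimize"  # lower adjusted mean rank is better


def resolve_objective(metric=None, direction=None):
    """Return the (metric, direction) pair after applying the documented defaults."""
    if metric is None:
        metric = DEFAULT_METRIC
    if direction is None:
        direction = DEFAULT_DIRECTION
    return metric, direction


def resolve_n_jobs(n_jobs=None):
    """Return the number of parallel jobs after applying the documented rules."""
    if n_jobs is None:
        return 1
    if n_jobs == -1:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        return os.cpu_count() or 1
    return n_jobs
```

Since the default direction is tied to the default metric, a metric where higher is better presumably needs direction set to "maximize" explicitly.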

Note

The remaining parameters are passed to optuna.study.create_study() or optuna.study.Study.optimize().

Return type

HpoPipelineResult
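A minimal usage sketch, under stated assumptions. The format of the *_kwargs_ranges entries is not specified on this page, so the dictionary with type, low, high, and q keys is an assumption, as are the dataset name "nations", the parameter name embedding_dim, and the import path pykeen.hpo. The helper names build_hpo_kwargs and run_hpo are hypothetical.

```python
def build_hpo_kwargs(n_trials=10):
    """Collect keyword arguments for hpo_pipeline (illustrative values only)."""
    return dict(
        dataset="nations",
        model="TransE",
        # Assumed range format: an integer sampled from 16..64 in steps of 16.
        model_kwargs_ranges=dict(
            embedding_dim=dict(type=int, low=16, high=64, q=16),
        ),
        metric="adjusted_mean_rank",
        direction="minimize",
        n_trials=n_trials,  # forwarded to the optuna study, per the note above
    )


def run_hpo(output_directory, n_trials=10):
    """Run the study and save its results; requires PyKEEN to be installed."""
    # Imported lazily so that building the arguments does not require PyKEEN.
    from pykeen.hpo import hpo_pipeline

    result = hpo_pipeline(**build_hpo_kwargs(n_trials=n_trials))
    result.save_to_directory(output_directory)
    return result
```

Only model is required by the signature; every other argument falls back to its default when omitted.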