Hyper-parameter Optimization
- class HpoPipelineResult(study: Study, objective: Objective)[source]
A container for the results of the HPO pipeline.
- Parameters:
study (Study)
objective (Objective)
- objective: Objective
The objective class, containing information on preset hyper-parameters and those to optimize
- replicate_best_pipeline(*, directory: str | Path, replicates: int, move_to_cpu: bool = False, save_replicates: bool = True, save_training: bool = False) None [source]
Run the pipeline on the best configuration, but this time on the “test” set instead of the “evaluation” set.
- Parameters:
directory (str | Path) – The output directory for the replicates
replicates (int) – The number of replicates to run
move_to_cpu (bool) – Whether to move the model back to the CPU after training
save_replicates (bool) – Whether to save the artifacts of the replicates
save_training (bool) – Whether to save the training triples
- Raises:
ValueError – if “use_testing_data” is provided in the best pipeline’s config.
- Return type:
None
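For example, the best configuration can be re-trained and evaluated on the test set several times. The sketch below assumes hpo_result is an HpoPipelineResult returned by hpo_pipeline(); the output directory and replicate count are illustrative:

```python
# A minimal sketch; `hpo_result` is assumed to be an HpoPipelineResult
# returned by hpo_pipeline(). The directory name and replicate count are
# illustrative choices, not defaults.
hpo_result.replicate_best_pipeline(
    directory="hpo_replicates",  # where replicate artifacts are written
    replicates=5,                # number of independent re-trainings
)
```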
- save_to_directory(directory: str | Path, **kwargs) None [source]
Dump the results of a study to the given directory.
- save_to_s3(directory: str, bucket: str, s3=None) None [source]
Save all artifacts to the given directory in an S3 Bucket.
- Parameters:
directory (str) – The directory in the S3 bucket
bucket (str) – The name of the S3 bucket
s3 – A client from boto3.client(), if already instantiated
- Return type:
None
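A minimal sketch of uploading the study artifacts to S3, assuming boto3 is installed; the bucket name and key prefix below are hypothetical:

```python
import boto3

# `hpo_result` is assumed to be an HpoPipelineResult from hpo_pipeline().
# Bucket name and directory prefix are hypothetical examples.
s3_client = boto3.client("s3")
hpo_result.save_to_s3(
    directory="experiments/nations-transe-hpo",  # key prefix inside the bucket
    bucket="my-pykeen-results",
    s3=s3_client,  # optional pre-instantiated client
)
```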
- hpo_pipeline_from_path(path: str | Path, **kwargs) HpoPipelineResult [source]
Run an HPO study from the configuration at the given path.
- Parameters:
path (str | Path) – The path to the configuration file
kwargs – Additional keyword arguments passed through to the HPO pipeline
- Return type:
HpoPipelineResult
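For instance, a study can be started from a stored configuration file; the file and directory names below are hypothetical:

```python
from pykeen.hpo import hpo_pipeline_from_path

# "my_hpo_config.json" is a hypothetical configuration file path.
hpo_result = hpo_pipeline_from_path("my_hpo_config.json")
hpo_result.save_to_directory("my_hpo_results")
```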
- hpo_pipeline_from_config(config: Mapping[str, Any], **kwargs) HpoPipelineResult [source]
Run the HPO pipeline using a properly formatted configuration dictionary.
- Parameters:
config (Mapping[str, Any]) – The HPO configuration dictionary
kwargs – Additional keyword arguments passed through to the HPO pipeline
- Return type:
HpoPipelineResult
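A sketch with an in-memory configuration. The two-section layout with top-level "pipeline" and "optuna" keys is an assumption based on PyKEEN's HPO configuration files; consult the HPO documentation for the exact schema. The dataset, model, and trial count are illustrative:

```python
from pykeen.hpo import hpo_pipeline_from_config

# Assumed configuration layout: "pipeline" arguments are forwarded to
# hpo_pipeline(), and "optuna" controls the study itself.
config = {
    "pipeline": {
        "dataset": "Nations",
        "model": "TransE",
        "training_kwargs": {"num_epochs": 100},
    },
    "optuna": {
        "n_trials": 10,
    },
}
hpo_result = hpo_pipeline_from_config(config)
```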
- hpo_pipeline(*, dataset: None | str | Dataset | type[Dataset] = None, dataset_kwargs: Mapping[str, Any] | None = None, training: str | CoreTriplesFactory | None = None, testing: str | CoreTriplesFactory | None = None, validation: str | CoreTriplesFactory | None = None, evaluation_entity_whitelist: Collection[str] | None = None, evaluation_relation_whitelist: Collection[str] | None = None, model: str | type[Model], model_kwargs: Mapping[str, Any] | None = None, model_kwargs_ranges: Mapping[str, Any] | None = None, loss: str | type[Loss] | None = None, loss_kwargs: Mapping[str, Any] | None = None, loss_kwargs_ranges: Mapping[str, Any] | None = None, regularizer: str | type[Regularizer] | None = None, regularizer_kwargs: Mapping[str, Any] | None = None, regularizer_kwargs_ranges: Mapping[str, Any] | None = None, optimizer: str | type[Optimizer] | None = None, optimizer_kwargs: Mapping[str, Any] | None = None, optimizer_kwargs_ranges: Mapping[str, Any] | None = None, lr_scheduler: str | type[LRScheduler] | None = None, lr_scheduler_kwargs: Mapping[str, Any] | None = None, lr_scheduler_kwargs_ranges: Mapping[str, Any] | None = None, training_loop: str | type[TrainingLoop] | None = None, training_loop_kwargs: Mapping[str, Any] | None = None, negative_sampler: str | type[NegativeSampler] | None = None, negative_sampler_kwargs: Mapping[str, Any] | None = None, negative_sampler_kwargs_ranges: Mapping[str, Any] | None = None, epochs: int | None = None, training_kwargs: Mapping[str, Any] | None = None, training_kwargs_ranges: Mapping[str, Any] | None = None, stopper: str | type[Stopper] | None = None, stopper_kwargs: Mapping[str, Any] | None = None, evaluator: str | type[Evaluator] | None = None, evaluator_kwargs: Mapping[str, Any] | None = None, evaluation_kwargs: Mapping[str, Any] | None = None, metric: str | None = None, filter_validation_when_testing: bool = True, result_tracker: str | type[ResultTracker] | None = None, result_tracker_kwargs: Mapping[str, Any] | None = None, device: str | device | None = None, storage: str | BaseStorage | None = None, sampler: str | type[BaseSampler] | None = None, sampler_kwargs: Mapping[str, Any] | None = None, pruner: str | type[BasePruner] | None = None, pruner_kwargs: Mapping[str, Any] | None = None, study_name: str | None = None, direction: str | None = None, load_if_exists: bool = False, n_trials: int | None = None, timeout: int | None = None, gc_after_trial: bool | None = None, n_jobs: int | None = None, save_model_directory: str | None = None) HpoPipelineResult [source]
Run hyper-parameter optimization, training and evaluating a model on the given dataset in each trial.
- Parameters:
dataset (None | str | Dataset | type[Dataset]) – The name of the dataset (a key for the pykeen.datasets.dataset_resolver) or the pykeen.datasets.Dataset instance. Alternatively, the training triples factory (training), testing triples factory (testing), and validation triples factory (validation; optional) can be specified.
dataset_kwargs (Mapping[str, Any] | None) – The keyword arguments passed to the dataset upon instantiation
training (str | CoreTriplesFactory | None) – A triples factory with training instances or path to the training file if a dataset was not specified
testing (str | CoreTriplesFactory | None) – A triples factory with test instances or path to the test file if a dataset was not specified
validation (str | CoreTriplesFactory | None) – A triples factory with validation instances or path to the validation file if a dataset was not specified
evaluation_entity_whitelist (Collection[str] | None) – Optional restriction of evaluation to triples containing only these entities. Useful if the downstream task is only interested in certain entities, but the relational patterns with other entities improve the entity embedding quality. Passed to pykeen.pipeline.pipeline().
evaluation_relation_whitelist (Collection[str] | None) – Optional restriction of evaluation to triples containing only these relations. Useful if the downstream task is only interested in certain relations, but the relational patterns with other relations improve the entity embedding quality. Passed to pykeen.pipeline.pipeline().
model (str | type[Model]) – The name of the model or the model class to pass to pykeen.pipeline.pipeline()
model_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the model class on instantiation
model_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the models’ hyper-parameters to override the defaults
loss (str | type[Loss] | None) – The name of the loss or the loss class to pass to pykeen.pipeline.pipeline()
loss_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the loss on instantiation
loss_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the losses’ hyper-parameters to override the defaults
regularizer (str | type[Regularizer] | None) – The name of the regularizer or the regularizer class to pass to pykeen.pipeline.pipeline()
regularizer_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the regularizer on instantiation
regularizer_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the regularizers’ hyper-parameters to override the defaults
optimizer (str | type[Optimizer] | None) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.
optimizer_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the optimizer on instantiation
optimizer_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the optimizers’ hyper-parameters to override the defaults
lr_scheduler (str | type[LRScheduler] | None) – The name of the lr_scheduler or the lr_scheduler class.
lr_scheduler_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the lr_scheduler on instantiation
lr_scheduler_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the lr_schedulers’ hyper-parameters to override the defaults
training_loop (str | type[TrainingLoop] | None) – The name of the training approach ('slcwa' or 'lcwa') or the training loop class to pass to pykeen.pipeline.pipeline()
training_loop_kwargs (Mapping[str, Any] | None) – additional keyword-based parameters passed to the training loop upon instantiation.
negative_sampler (str | type[NegativeSampler] | None) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class to pass to pykeen.pipeline.pipeline(). Only allowed when training with sLCWA.
negative_sampler_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the negative sampler class on instantiation
negative_sampler_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the negative samplers’ hyper-parameters to override the defaults
epochs (int | None) – A shortcut for setting the num_epochs key in the training_kwargs dict.
training_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the training loop’s train function on call
training_kwargs_ranges (Mapping[str, Any] | None) – Strategies for optimizing the training loops’ hyper-parameters to override the defaults. Ranges cannot be specified for the batch size if early stopping is enabled.
stopper (str | type[Stopper] | None) – What kind of stopping to use. Defaults to no stopping; can be set to ‘early’.
stopper_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the stopper upon instantiation.
evaluator (str | type[Evaluator] | None) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.
evaluator_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the evaluator on instantiation
evaluation_kwargs (Mapping[str, Any] | None) – Keyword arguments to pass to the evaluator’s evaluate function on call
filter_validation_when_testing (bool) – If true, validation triples are added to the set of known positive triples when evaluating on the test dataset, and are filtered out when performing filtered evaluation following the approach described by [bordes2013]. Defaults to true.
result_tracker (str | type[ResultTracker] | None) – The ResultTracker class or name
result_tracker_kwargs (Mapping[str, Any] | None) – The keyword arguments passed to the results tracker on instantiation
metric (str | None) – The metric to optimize over. Defaults to mean reciprocal rank.
n_jobs (int | None) – The number of parallel jobs. If this argument is set to -1, the number is set to the CPU count. If None, defaults to 1.
save_model_directory (str | None) – If given, the final model of each trial is saved under this directory.
storage (str | BaseStorage | None) – the study’s storage, cf. optuna.study.create_study()
sampler (str | type[BaseSampler] | None) – the sampler, or a hint thereof, cf. optuna.study.create_study()
sampler_kwargs (Mapping[str, Any] | None) – additional keyword-based parameters for the sampler
pruner (str | type[BasePruner] | None) – the pruner, or a hint thereof, cf. optuna.study.create_study()
pruner_kwargs (Mapping[str, Any] | None) – additional keyword-based parameters for the pruner
study_name (str | None) – the study’s name, cf. optuna.study.create_study()
direction (str | None) – The direction of optimization. Because the default metric is mean reciprocal rank, the default direction is maximize, cf. optuna.study.create_study()
load_if_exists (bool) – whether to load the study if it already exists, cf. optuna.study.create_study()
n_trials (int | None) – the number of trials, cf. optuna.study.Study.optimize().
timeout (int | None) – the timeout, cf. optuna.study.Study.optimize().
gc_after_trial (bool | None) – whether to collect garbage after each trial, cf. optuna.study.Study.optimize().
n_jobs – the number of jobs, cf. optuna.study.Study.optimize(). Defaults to 1.
- Returns:
the optimization result
- Raises:
ValueError – if early stopping is enabled, but the number of epochs is to be optimized, too.
- Return type:
HpoPipelineResult
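A minimal usage sketch, modeled on the example in PyKEEN’s HPO documentation; the dataset, model, range specification, trial count, and output directory are illustrative:

```python
from pykeen.hpo import hpo_pipeline

# Optimize TransE hyper-parameters on the small Nations dataset.
# The embedding_dim range overrides the default search strategy; the
# range keys (type/low/high/q) follow PyKEEN's kwargs_ranges format.
hpo_pipeline_result = hpo_pipeline(
    dataset="Nations",
    model="TransE",
    model_kwargs_ranges=dict(
        embedding_dim=dict(type=int, low=16, high=256, q=16),
    ),
    n_trials=30,
)

# Persist the study, best pipeline configuration, and trial artifacts.
hpo_pipeline_result.save_to_directory("nations_transe_hpo")
```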