Hyper-parameter Optimization¶
- class HpoPipelineResult(study, objective)[source]¶
A container for the results of the HPO pipeline.
- Parameters:
  - study (Study) – The completed Optuna study
  - objective (Objective) – The HPO objective
- objective: Objective¶
The objective class, containing information on preset hyper-parameters and those to optimize
- replicate_best_pipeline(*, directory, replicates, move_to_cpu=False, save_replicates=True, save_training=False)[source]¶
Run the pipeline on the best configuration, but this time on the “test” set instead of the “evaluation” set.
- Parameters:
  - directory (str) – The directory in which to save the replicates
  - replicates (int) – The number of replicates to run
  - move_to_cpu (bool) – Whether to move the model back to the CPU; only relevant when training on a GPU
  - save_replicates (bool) – Whether to save the trained models
  - save_training (bool) – Whether to save the training triples
- Raises:
  ValueError – If "use_testing_data" is provided in the best pipeline’s config.
- Return type:
  None
- save_to_s3(directory, bucket, s3=None)[source]¶
Save all artifacts to the given directory in an S3 Bucket.
- Parameters:
  - directory (str) – The directory in the S3 bucket
  - bucket (str) – The name of the S3 bucket
  - s3 – A client from boto3.client(), if already instantiated
- Return type:
  None
- hpo_pipeline_from_path(path, **kwargs)[source]¶
Run an HPO study from the configuration at the given path.
- Return type:
  HpoPipelineResult
- Parameters:
  path (Union[str, Path]) – The path to the configuration file
- hpo_pipeline_from_config(config, **kwargs)[source]¶
Run the HPO pipeline using a properly formatted configuration dictionary.
- Return type:
  HpoPipelineResult
- Parameters:
  config (Mapping[str, Any]) – The configuration dictionary
- hpo_pipeline(*, dataset=None, dataset_kwargs=None, training=None, testing=None, validation=None, evaluation_entity_whitelist=None, evaluation_relation_whitelist=None, model, model_kwargs=None, model_kwargs_ranges=None, loss=None, loss_kwargs=None, loss_kwargs_ranges=None, regularizer=None, regularizer_kwargs=None, regularizer_kwargs_ranges=None, optimizer=None, optimizer_kwargs=None, optimizer_kwargs_ranges=None, lr_scheduler=None, lr_scheduler_kwargs=None, lr_scheduler_kwargs_ranges=None, training_loop=None, training_loop_kwargs=None, negative_sampler=None, negative_sampler_kwargs=None, negative_sampler_kwargs_ranges=None, epochs=None, training_kwargs=None, training_kwargs_ranges=None, stopper=None, stopper_kwargs=None, evaluator=None, evaluator_kwargs=None, evaluation_kwargs=None, metric=None, filter_validation_when_testing=True, result_tracker=None, result_tracker_kwargs=None, device=None, storage=None, sampler=None, sampler_kwargs=None, pruner=None, pruner_kwargs=None, study_name=None, direction=None, load_if_exists=False, n_trials=None, timeout=None, gc_after_trial=None, n_jobs=None, save_model_directory=None)[source]¶
Optimize hyper-parameters for a model on the given dataset.
- Parameters:
  - dataset (Union[None, str, Dataset, Type[Dataset]]) – The name of the dataset (a key for the pykeen.datasets.dataset_resolver) or the pykeen.datasets.Dataset instance. Alternatively, the training triples factory (training), testing triples factory (testing), and validation triples factory (validation; optional) can be specified.
  - dataset_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the dataset upon instantiation
  - training (Union[str, CoreTriplesFactory, None]) – A triples factory with training instances or path to the training file if a dataset was not specified
  - testing (Union[str, CoreTriplesFactory, None]) – A triples factory with test instances or path to the test file if a dataset was not specified
  - validation (Union[str, CoreTriplesFactory, None]) – A triples factory with validation instances or path to the validation file if a dataset was not specified
  - evaluation_entity_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these entities. Useful if the downstream task is only interested in certain entities, but the relational patterns with other entities improve the entity embedding quality. Passed to pykeen.pipeline.pipeline().
  - evaluation_relation_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these relations. Useful if the downstream task is only interested in certain relations, but the relational patterns with other relations improve the entity embedding quality. Passed to pykeen.pipeline.pipeline().
  - model (Union[str, Type[Model]]) – The name of the model or the model class to pass to pykeen.pipeline.pipeline()
  - model_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the model class on instantiation
  - model_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the model’s hyper-parameters to override the defaults
  - loss (Union[str, Type[Loss], None]) – The name of the loss or the loss class to pass to pykeen.pipeline.pipeline()
  - loss_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the loss on instantiation
  - loss_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the loss’s hyper-parameters to override the defaults
  - regularizer (Union[str, Type[Regularizer], None]) – The name of the regularizer or the regularizer class to pass to pykeen.pipeline.pipeline()
  - regularizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the regularizer on instantiation
  - regularizer_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the regularizer’s hyper-parameters to override the defaults
  - optimizer (Union[str, Type[Optimizer], None]) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.
  - optimizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the optimizer on instantiation
  - optimizer_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the optimizer’s hyper-parameters to override the defaults
  - lr_scheduler (Union[str, Type[LRScheduler], None]) – The name of the lr_scheduler or the lr_scheduler class.
  - lr_scheduler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the lr_scheduler on instantiation
  - lr_scheduler_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the lr_scheduler’s hyper-parameters to override the defaults
  - training_loop (Union[str, Type[TrainingLoop], None]) – The name of the training approach ('slcwa' or 'lcwa') or the training loop class to pass to pykeen.pipeline.pipeline()
  - training_loop_kwargs (Optional[Mapping[str, Any]]) – Additional keyword-based parameters passed to the training loop upon instantiation.
  - negative_sampler (Union[str, Type[NegativeSampler], None]) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class to pass to pykeen.pipeline.pipeline(). Only allowed when training with sLCWA.
  - negative_sampler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the negative sampler class on instantiation
  - negative_sampler_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the negative sampler’s hyper-parameters to override the defaults
  - epochs (Optional[int]) – A shortcut for setting the num_epochs key in the training_kwargs dict.
  - training_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the training loop’s train function on call
  - training_kwargs_ranges (Optional[Mapping[str, Any]]) – Strategies for optimizing the training loop’s hyper-parameters to override the defaults. Cannot specify ranges for batch size if early stopping is enabled.
  - stopper (Union[str, Type[Stopper], None]) – What kind of stopping to use. Defaults to no stopping; can be set to 'early'.
  - stopper_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the stopper upon instantiation.
  - evaluator (Union[str, Type[Evaluator], None]) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.
  - evaluator_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator on instantiation
  - evaluation_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator’s evaluate function on call
  - filter_validation_when_testing (bool) – If true, during evaluation on the test dataset, validation triples are added to the set of known positive triples, which are filtered out when performing filtered evaluation, following the approach described by [bordes2013]. Defaults to true.
  - result_tracker (Union[str, Type[ResultTracker], None]) – The ResultTracker class or name
  - result_tracker_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the result tracker on instantiation
  - metric (Optional[str]) – The metric to optimize over. Defaults to mean reciprocal rank.
  - n_jobs (Optional[int]) – The number of parallel jobs, cf. optuna.study.Study.optimize(). If set to -1, the number is set to the CPU count. If None, defaults to 1.
  - save_model_directory (Optional[str]) – If given, the final model of each trial is saved under this directory.
  - storage (Union[str, BaseStorage, None]) – The study’s storage, cf. optuna.study.create_study()
  - sampler (Union[str, Type[BaseSampler], None]) – The sampler, or a hint thereof, cf. optuna.study.create_study()
  - sampler_kwargs (Optional[Mapping[str, Any]]) – Additional keyword-based parameters for the sampler
  - pruner (Union[str, Type[BasePruner], None]) – The pruner, or a hint thereof, cf. optuna.study.create_study()
  - pruner_kwargs (Optional[Mapping[str, Any]]) – Additional keyword-based parameters for the pruner
  - study_name (Optional[str]) – The study’s name, cf. optuna.study.create_study()
  - direction (Optional[str]) – The direction of optimization. Because the default metric is mean reciprocal rank, the default direction is maximize, cf. optuna.study.create_study()
  - load_if_exists (bool) – Whether to load the study if it already exists, cf. optuna.study.create_study()
  - n_trials (Optional[int]) – The number of trials, cf. optuna.study.Study.optimize().
  - timeout (Optional[int]) – The timeout, cf. optuna.study.Study.optimize().
  - gc_after_trial (Optional[bool]) – Whether to run garbage collection after each trial, cf. optuna.study.Study.optimize().
- Return type:
  HpoPipelineResult
- Returns:
the optimization result
- Raises:
ValueError – if early stopping is enabled, but the number of epochs is to be optimized, too.