Ablation

ablation_pipeline(datasets, directory, models, losses, optimizers, training_loops, *, epochs=None, create_inverse_triples=False, regularizers=None, negative_sampler=None, evaluator=None, stopper='NopStopper', model_to_model_kwargs=None, model_to_model_kwargs_ranges=None, model_to_loss_to_loss_kwargs=None, model_to_loss_to_loss_kwargs_ranges=None, model_to_optimizer_to_optimizer_kwargs=None, model_to_optimizer_to_optimizer_kwargs_ranges=None, model_to_negative_sampler_to_negative_sampler_kwargs=None, model_to_negative_sampler_to_negative_sampler_kwargs_ranges=None, model_to_training_loop_to_training_loop_kwargs=None, model_to_training_loop_to_training_kwargs=None, model_to_training_loop_to_training_kwargs_ranges=None, model_to_regularizer_to_regularizer_kwargs=None, model_to_regularizer_to_regularizer_kwargs_ranges=None, evaluator_kwargs=None, evaluation_kwargs=None, stopper_kwargs=None, n_trials=5, timeout=3600, metric='hits@10', direction='maximize', sampler='random', pruner='nop', metadata=None, save_artifacts=True, move_to_cpu=True, dry_run=False, best_replicates=None, discard_replicates=False, create_unique_subdir=False)[source]

Run an ablation study.

Parameters:
  • datasets (Union[str, List[str]]) – A dataset name or list of dataset names.

  • directory (Union[str, Path]) – The directory in which the experimental artifacts will be saved.

  • models (Union[str, List[str]]) – A model name or list of model names.

  • losses (Union[str, List[str]]) – A loss function name or list of loss function names.

  • optimizers (Union[str, List[str]]) – An optimizer name or list of optimizer names.

  • training_loops (Union[str, List[str]]) – A training loop name or list of training loop names.

  • epochs (Optional[int]) – A quick way to set the num_epochs in the training kwargs.

  • create_inverse_triples (Union[bool, List[bool]]) – Whether to create inverse triples; either a single boolean or a list of booleans.

  • regularizers (Union[None, str, List[str]]) – A regularizer name, list of regularizer names, or None if no regularizer is desired.

  • negative_sampler (Optional[str]) – A negative sampler name, or None if no negative sampler is desired. Negative sampling is used only in combination with pykeen.training.SLCWATrainingLoop.

  • evaluator (Optional[str]) – The name of the evaluator to be used. Defaults to the rank-based evaluator.

  • stopper (Optional[str]) – The name of the stopper to be used. Defaults to NopStopper, which doesn’t define a stopping criterion.

  • model_to_model_kwargs (Optional[Mapping[str, Mapping[str, Any]]]) – A mapping from model name to dictionaries of default keyword arguments for the instantiation of that model.

  • model_to_model_kwargs_ranges (Optional[Mapping[str, Mapping[str, Any]]]) – A mapping from model name to dictionaries of keyword argument ranges for that model to be used in HPO.

  • model_to_loss_to_loss_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of loss name to a mapping of default keyword arguments for the instantiation of that loss function. This is useful because some losses have hyper-parameters, e.g., the margin of pykeen.losses.MarginRankingLoss.

  • model_to_loss_to_loss_kwargs_ranges (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of loss name to a mapping of keyword argument ranges for that loss to be used in HPO.

  • model_to_optimizer_to_optimizer_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of optimizer name to a mapping of default keyword arguments for the instantiation of that optimizer. This is useful because optimizers have hyper-parameters such as the learning rate.

  • model_to_optimizer_to_optimizer_kwargs_ranges (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of optimizer name to a mapping of keyword argument ranges for that optimizer to be used in HPO.

  • model_to_regularizer_to_regularizer_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of regularizer name to a mapping of default keyword arguments for the instantiation of that regularizer. This is useful because regularizers have hyper-parameters such as the regularization weight.

  • model_to_regularizer_to_regularizer_kwargs_ranges (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of regularizer name to a mapping of keyword argument ranges for that regularizer to be used in HPO.

  • model_to_negative_sampler_to_negative_sampler_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of negative sampler name to a mapping of default keyword arguments for the instantiation of that negative sampler. This is useful because negative samplers have hyper-parameters such as the number of negatives generated for each positive training example.

  • model_to_negative_sampler_to_negative_sampler_kwargs_ranges (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of negative sampler name to a mapping of keyword argument ranges for that negative sampler to be used in HPO.

  • model_to_training_loop_to_training_loop_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of training loop name to a mapping of default keyword arguments for the training loop.

  • model_to_training_loop_to_training_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of trainer name to a mapping of default keyword arguments for the training procedure. This is useful because it allows setting hyper-parameters such as the number of training epochs and the batch size.

  • model_to_training_loop_to_training_kwargs_ranges (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of trainer name to a mapping of keyword argument ranges for that trainer to be used in HPO.

  • evaluator_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the evaluator.

  • evaluation_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed during evaluation.

  • stopper_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the stopper.

  • n_trials (Optional[int]) – Number of HPO trials.

  • timeout (Optional[int]) – The time (seconds) after which the ablation study will be terminated.

  • metric (Optional[str]) – The metric to optimize during HPO.

  • direction (Optional[str]) – Defines whether to ‘maximize’ or ‘minimize’ the metric during HPO.

  • sampler (Optional[str]) – The HPO sampler; defaults to random search.

  • pruner (Optional[str]) – Defines the approach for pruning trials. By default, no pruning is used, i.e., the pruner is set to ‘nop’.

  • metadata (Optional[Mapping]) – A mapping of metadata arguments, such as the name of the ablation study.

  • save_artifacts (bool) – Defines whether each trained model sampled during HPO should be saved.

  • move_to_cpu (bool) – Defines whether a replicate of the best model should be moved to CPU.

  • dry_run (bool) – Defines whether only the configurations for the individual experiments should be created, without running them.

  • best_replicates (Optional[int]) – Defines how often the final model should be re-trained and evaluated based on the best hyper-parameters, which enables measuring the variance in performance.

  • discard_replicates (bool) – Defines whether the best model should be discarded after training and evaluation.

  • create_unique_subdir (bool) – Defines whether a unique sub-directory for the experimental artifacts should be created. The sub-directory name is composed of the current date and a unique ID.
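
For example, a small ablation study over two models and two loss functions might be configured as sketched below, assuming the function is importable from pykeen.ablation. The dataset, component names, and hyper-parameter ranges are illustrative assumptions, and the range-dictionary format is assumed to follow the kwargs_ranges convention of PyKEEN’s HPO pipeline:

    from pykeen.ablation import ablation_pipeline  # assumed import path

    # Compare two models and two loss functions on a single dataset.
    # The dataset, components, and ranges below are illustrative only.
    ablation_pipeline(
        datasets="Nations",
        directory="~/ablation_results",
        models=["TransE", "DistMult"],
        losses=["MarginRankingLoss", "BCEAfterSigmoidLoss"],
        optimizers="Adam",
        training_loops="SLCWA",
        epochs=100,
        # Hyper-parameter ranges explored during HPO, keyed by model name
        # (range format assumed to mirror PyKEEN's HPO kwargs_ranges).
        model_to_model_kwargs_ranges={
            "TransE": {"embedding_dim": {"type": "int", "low": 64, "high": 256, "q": 64}},
            "DistMult": {"embedding_dim": {"type": "int", "low": 64, "high": 256, "q": 64}},
        },
        n_trials=10,
        metric="hits@10",
        direction="maximize",
        best_replicates=3,
    )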

prepare_ablation_from_config(config, directory, save_artifacts)[source]

Prepare a set of ablation study directories.

Parameters:
  • config (Mapping[str, Any]) – Dictionary defining the ablation studies.

  • directory (Union[str, Path]) – The directory in which the experimental artifacts (including the ablation configurations) will be saved.

  • save_artifacts (bool) – Defines whether the output directories for the trained models sampled during HPO should be created.

Return type:

List[Tuple[Path, Path]]

Returns:

Pairs of output directories and the HPO configuration paths inside those directories.
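
A sketch of its use follows. The configuration layout (here assumed to consist of "metadata", "ablation", and "optuna" sections) and all concrete values are illustrative assumptions and should be checked against the PyKEEN ablation documentation:

    from pykeen.ablation import prepare_ablation_from_config  # assumed import path

    # Illustrative configuration; the section layout ("metadata", "ablation",
    # "optuna") is an assumption and should be verified against the docs.
    config = {
        "metadata": {"title": "Loss function ablation on Nations"},
        "ablation": {
            "datasets": ["Nations"],
            "models": ["TransE"],
            "losses": ["MarginRankingLoss", "BCEAfterSigmoidLoss"],
            "optimizers": ["Adam"],
            "training_loops": ["SLCWA"],
        },
        "optuna": {"n_trials": 5, "timeout": 3600, "metric": "hits@10", "direction": "maximize"},
    }

    pairs = prepare_ablation_from_config(
        config=config,
        directory="~/ablation_configs",
        save_artifacts=True,
    )
    for output_directory, hpo_config_path in pairs:
        print(output_directory, hpo_config_path)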

prepare_ablation(datasets, models, losses, optimizers, training_loops, directory, *, epochs=None, create_inverse_triples=False, regularizers=None, negative_sampler=None, evaluator=None, model_to_model_kwargs=None, model_to_model_kwargs_ranges=None, model_to_loss_to_loss_kwargs=None, model_to_loss_to_loss_kwargs_ranges=None, model_to_optimizer_to_optimizer_kwargs=None, model_to_optimizer_to_optimizer_kwargs_ranges=None, model_to_training_loop_to_training_loop_kwargs=None, model_to_neg_sampler_to_neg_sampler_kwargs=None, model_to_neg_sampler_to_neg_sampler_kwargs_ranges=None, model_to_training_loop_to_training_kwargs=None, model_to_training_loop_to_training_kwargs_ranges=None, model_to_regularizer_to_regularizer_kwargs=None, model_to_regularizer_to_regularizer_kwargs_ranges=None, n_trials=5, timeout=3600, metric='hits@10', direction='maximize', sampler='random', pruner='nop', evaluator_kwargs=None, evaluation_kwargs=None, stopper='NopStopper', stopper_kwargs=None, metadata=None, save_artifacts=True)[source]

Prepare an ablation directory.

Parameters:
  • datasets (Union[str, List[str]]) – A dataset name or list of dataset names.

  • models (Union[str, List[str]]) – A model name or list of model names.

  • losses (Union[str, List[str]]) – A loss function name or list of loss function names.

  • optimizers (Union[str, List[str]]) – An optimizer name or list of optimizer names.

  • training_loops (Union[str, List[str]]) – A training loop name or list of training loop names.

  • epochs (Optional[int]) – A quick way to set the num_epochs in the training kwargs.

  • create_inverse_triples (Union[bool, List[bool]]) – Whether to create inverse triples; either a single boolean or a list of booleans.

  • regularizers (Union[None, str, List[str], List[None]]) – A regularizer name, list of regularizer names, or None if no regularizer is desired.

  • negative_sampler (Optional[str]) – A negative sampler name, or None if no negative sampler is desired. Negative sampling is used only in combination with pykeen.training.SLCWATrainingLoop.

  • evaluator (Optional[str]) – The name of the evaluator to be used. Defaults to the rank-based evaluator.

  • stopper (Optional[str]) – The name of the stopper to be used. Defaults to NopStopper, which doesn’t define a stopping criterion.

  • model_to_model_kwargs (Optional[Mapping[str, Mapping[str, Any]]]) – A mapping from model name to dictionaries of default keyword arguments for the instantiation of that model.

  • model_to_model_kwargs_ranges (Optional[Mapping[str, Mapping[str, Any]]]) – A mapping from model name to dictionaries of keyword argument ranges for that model to be used in HPO.

  • model_to_loss_to_loss_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of loss name to a mapping of default keyword arguments for the instantiation of that loss function. This is useful because some losses have hyper-parameters, e.g., the margin of pykeen.losses.MarginRankingLoss.

  • model_to_loss_to_loss_kwargs_ranges (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of loss name to a mapping of keyword argument ranges for that loss to be used in HPO.

  • model_to_optimizer_to_optimizer_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of optimizer name to a mapping of default keyword arguments for the instantiation of that optimizer. This is useful because optimizers have hyper-parameters such as the learning rate.

  • model_to_optimizer_to_optimizer_kwargs_ranges (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of optimizer name to a mapping of keyword argument ranges for that optimizer to be used in HPO.

  • model_to_regularizer_to_regularizer_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of regularizer name to a mapping of default keyword arguments for the instantiation of that regularizer. This is useful because regularizers have hyper-parameters such as the regularization weight.

  • model_to_regularizer_to_regularizer_kwargs_ranges (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of regularizer name to a mapping of keyword argument ranges for that regularizer to be used in HPO.

  • model_to_neg_sampler_to_neg_sampler_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of negative sampler name to a mapping of default keyword arguments for the instantiation of that negative sampler. This is useful because negative samplers have hyper-parameters such as the number of negatives generated for each positive training example.

  • model_to_neg_sampler_to_neg_sampler_kwargs_ranges (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of negative sampler name to a mapping of keyword argument ranges for that negative sampler to be used in HPO.

  • model_to_training_loop_to_training_loop_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of training loop name to a mapping of default keyword arguments for the training loop.

  • model_to_training_loop_to_training_kwargs (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of trainer name to a mapping of default keyword arguments for the training procedure. This is useful because it allows setting hyper-parameters such as the number of training epochs and the batch size.

  • model_to_training_loop_to_training_kwargs_ranges (Optional[Mapping[str, Mapping[str, Mapping[str, Any]]]]) – A mapping from model name to a mapping of trainer name to a mapping of keyword argument ranges for that trainer to be used in HPO.

  • evaluator_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the evaluator.

  • evaluation_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed during evaluation.

  • stopper_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the stopper.

  • n_trials (Optional[int]) – Number of HPO trials.

  • timeout (Optional[int]) – The time (seconds) after which the ablation study will be terminated.

  • metric (Optional[str]) – The metric to optimize during HPO.

  • direction (Optional[str]) – Defines whether to ‘maximize’ or ‘minimize’ the metric during HPO.

  • sampler (Optional[str]) – The HPO sampler; defaults to random search.

  • pruner (Optional[str]) – Defines the approach for pruning trials. By default, no pruning is used, i.e., the pruner is set to ‘nop’.

  • metadata (Optional[Mapping]) – A mapping of metadata arguments, such as the name of the ablation study.

  • directory (Union[str, Path]) – The directory in which the experimental artifacts will be saved.

  • save_artifacts (bool) – Defines whether each trained model sampled during HPO should be saved.

Return type:

List[Tuple[Path, Path]]

Returns:

Pairs of output directories and the HPO configuration paths inside those directories.

Raises:

ValueError – If the dataset is not specified correctly, i.e., it is neither a string nor a dictionary containing the paths to the training, testing, and validation data.
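
A minimal sketch of preparing (but not running) the ablation configurations follows, assuming the function is importable from pykeen.ablation; the dataset and component names are illustrative:

    from pykeen.ablation import prepare_ablation  # assumed import path

    # Prepare HPO configurations for each combination without training anything.
    # Dataset and component names are illustrative.
    pairs = prepare_ablation(
        datasets="Nations",
        models=["TransE", "ComplEx"],
        losses="MarginRankingLoss",
        optimizers="Adam",
        training_loops="SLCWA",
        directory="~/ablation_configs",
        epochs=50,
        n_trials=5,
    )

    # Each pair holds an output directory and the HPO configuration file inside it,
    # which can subsequently be run, e.g., with PyKEEN's HPO pipeline.
    for output_directory, hpo_config_path in pairs:
        print(output_directory, hpo_config_path)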