Ablation

ablation_pipeline(datasets: str | SplitToPathDict | Sequence[str | SplitToPathDict], directory: str | Path, models: str | list[str], losses: str | list[str], optimizers: str | list[str], training_loops: str | list[str], *, epochs: int | None = None, create_inverse_triples: bool | list[bool] = False, regularizers: None | str | list[str] = None, negative_sampler: str | None = None, evaluator: str | None = None, stopper: str | None = 'NopStopper', model_to_model_kwargs: Mapping[str, Mapping[str, Any]] | None = None, model_to_model_kwargs_ranges: Mapping[str, Mapping[str, Any]] | None = None, model_to_loss_to_loss_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_loss_to_loss_kwargs_ranges: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_optimizer_to_optimizer_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_optimizer_to_optimizer_kwargs_ranges: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_negative_sampler_to_negative_sampler_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_negative_sampler_to_negative_sampler_kwargs_ranges: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_training_loop_to_training_loop_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_training_loop_to_training_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_training_loop_to_training_kwargs_ranges: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_regularizer_to_regularizer_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_regularizer_to_regularizer_kwargs_ranges: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, evaluator_kwargs: Mapping[str, Any] | None = None, evaluation_kwargs: Mapping[str, Any] | None = None, stopper_kwargs: Mapping[str, Any] | None = None, n_trials: int | None = 5, timeout: int | None = 3600, metric: str | None = 'hits@10', direction: str | None = 'maximize', sampler: str | None = 'random', pruner: str | None = 'nop', metadata: Mapping | None = None, save_artifacts: bool = True, move_to_cpu: bool = True, dry_run: bool = False, best_replicates: int | None = None, discard_replicates: bool = False, create_unique_subdir: bool = False) → None

Run ablation study.

Parameters:

datasets (str | SplitToPathDict | Sequence[str | SplitToPathDict]) – A single dataset specification or a list of dataset specifications. Each dataset can be specified either by name (referring to a built-in dataset) or as a dictionary with paths for training, validation, and testing.
directory (str | Path) – The directory in which the experimental artifacts will be saved.
models (str | list[str]) – A model name or list of model names.
losses (str | list[str]) – A loss function name or list of loss function names.
optimizers (str | list[str]) – An optimizer name or list of optimizer names.
training_loops (str | list[str]) – A training loop name or list of training loop names.
epochs (int | None) – A quick way to set the num_epochs in the training kwargs.
create_inverse_triples (bool | list[bool]) – Either a boolean for a single entry or a list of booleans.
regularizers (None | str | list[str]) – A regularizer name, list of regularizer names, or None if no regularizer is desired.
negative_sampler (str | None) – A negative sampler name, or None if no negative sampler is desired. Negative sampling is used only in combination with pykeen.training.SLCWATrainingLoop.
evaluator (str | None) – The name of the evaluator to be used. Defaults to rank-based evaluator.
stopper (str | None) – The name of the stopper to be used. Defaults to NopStopper which doesn’t define a stopping criterion.
model_to_model_kwargs (Mapping[str, Mapping[str, Any]] | None) – A mapping from model name to dictionaries of default keyword arguments for the instantiation of that model.
model_to_model_kwargs_ranges (Mapping[str, Mapping[str, Any]] | None) – A mapping from model name to dictionaries of keyword argument ranges for that model to be used in HPO.
model_to_loss_to_loss_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of loss name to a mapping of default keyword arguments for the instantiation of that loss function. This is useful because some losses, such as pykeen.losses.MarginRankingLoss, have hyper-parameters.
model_to_loss_to_loss_kwargs_ranges (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of loss name to a mapping of keyword argument ranges for that loss to be used in HPO.
model_to_optimizer_to_optimizer_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of optimizer name to a mapping of default keyword arguments for the instantiation of that optimizer. This is useful because optimizers have hyper-parameters such as the learning rate.
model_to_optimizer_to_optimizer_kwargs_ranges (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of optimizer name to a mapping of keyword argument ranges for that optimizer to be used in HPO.
model_to_regularizer_to_regularizer_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of regularizer name to a mapping of default keyword arguments for the instantiation of that regularizer. This is useful because regularizers have hyper-parameters such as the regularization weight.
model_to_regularizer_to_regularizer_kwargs_ranges (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of regularizer name to a mapping of keyword argument ranges for that regularizer to be used in HPO.
model_to_negative_sampler_to_negative_sampler_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of negative sampler name to a mapping of default keyword arguments for the instantiation of that negative sampler. This is useful because negative samplers have hyper-parameters such as the number of negatives generated for each positive training example.
model_to_negative_sampler_to_negative_sampler_kwargs_ranges (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of negative sampler name to a mapping of keyword argument ranges for that negative sampler to be used in HPO.
model_to_training_loop_to_training_loop_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of training loop name to a mapping of default keyword arguments for the training loop.
model_to_training_loop_to_training_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of training loop name to a mapping of default keyword arguments for the training procedure. This is useful for setting hyper-parameters such as the number of training epochs and the batch size.
model_to_training_loop_to_training_kwargs_ranges (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of trainer name to a mapping of keyword argument ranges for that trainer to be used in HPO.
evaluator_kwargs (Mapping[str, Any] | None) – The keyword arguments passed to the evaluator.
evaluation_kwargs (Mapping[str, Any] | None) – The keyword arguments passed during evaluation.
stopper_kwargs (Mapping[str, Any] | None) – The keyword arguments passed to the stopper.
n_trials (int | None) – Number of HPO trials.
timeout (int | None) – The time (seconds) after which the ablation study will be terminated.
metric (str | None) – The metric to optimize during HPO.
direction (str | None) – Whether to ‘maximize’ or ‘minimize’ the metric during HPO.
sampler (str | None) – The HPO sampler. Defaults to random search.
pruner (str | None) – The approach for pruning trials. By default no pruning is used, i.e., pruner is set to ‘nop’.
metadata (Mapping | None) – A mapping of metadata arguments such as the name of the ablation study.
save_artifacts (bool) – Whether each trained model sampled during HPO should be saved.
move_to_cpu (bool) – Whether a replicate of the best model should be moved to CPU.
dry_run (bool) – Whether only the configurations for the individual experiments should be created, without running them.
best_replicates (int | None) – How often the final model should be re-trained and evaluated based on the best hyper-parameters, which makes it possible to measure the variance in performance.
discard_replicates (bool) – Whether the best model should be discarded after training and evaluation.
create_unique_subdir (bool) – Whether a unique sub-directory for the experimental artifacts should be created. The sub-directory name is defined by the current date plus a unique id.

Return type:

None
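The nested keyword-argument mappings listed above can be hard to picture from the type signatures alone. The following is a minimal sketch rather than a definitive recipe. The component names ("Nations", "ComplEx", "adam", "lcwa", and so on) and the keys of the range specifications (type, low, high, scale) are assumptions modeled on PyKEEN's ablation tutorial and are not stated in this reference; build_ablation_kwargs and run_dry are hypothetical helper names.

```python
from typing import Any


def build_ablation_kwargs(directory: str = "ablation_out") -> dict[str, Any]:
    """Assemble keyword arguments for ``ablation_pipeline`` (illustrative values only)."""
    return {
        "datasets": "Nations",
        "directory": directory,
        "models": ["ComplEx"],
        "losses": ["BCEAfterSigmoidLoss"],
        "optimizers": "adam",
        "training_loops": "lcwa",
        "create_inverse_triples": [True, False],
        "epochs": 5,
        # model name -> keyword argument -> range specification (keys are assumed)
        "model_to_model_kwargs_ranges": {
            "ComplEx": {
                "embedding_dim": {"type": "int", "low": 4, "high": 6, "scale": "power_two"},
            },
        },
        # model name -> optimizer name -> keyword argument -> range specification
        "model_to_optimizer_to_optimizer_kwargs_ranges": {
            "ComplEx": {
                "adam": {"lr": {"type": "float", "low": 0.001, "high": 0.1, "scale": "log"}},
            },
        },
        "n_trials": 2,
        "timeout": 300,
        "metric": "hits@10",
        "direction": "maximize",
        # Only write the experiment configurations; do not train anything.
        "dry_run": True,
    }


def run_dry(directory: str = "ablation_out") -> None:
    """Call the pipeline; PyKEEN is imported lazily so the helper above works without it."""
    from pykeen.ablation import ablation_pipeline

    ablation_pipeline(**build_ablation_kwargs(directory))
```

Keeping the arguments in a plain dictionary makes it easy to inspect or serialize a study before launching it; calling run_dry() requires PyKEEN to be installed.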

prepare_ablation_from_config(config: Mapping[str, Any], directory: str | Path, save_artifacts: bool) → list[tuple[Path, Path]]

Prepare a set of ablation study directories.

Parameters:

config (Mapping[str, Any]) – Dictionary defining the ablation studies.
directory (str | Path) – The directory in which the experimental artifacts (including the ablation configurations) will be saved.
save_artifacts (bool) – Whether the output directories for the trained models sampled during HPO should be created.

Returns:

pairs of output directories and HPO config paths inside those directories

Return type:

list[tuple[Path, Path]]

prepare_ablation(datasets: str | SplitToPathDict | Sequence[str | SplitToPathDict], models: str | Sequence[str], losses: str | Sequence[str], optimizers: str | Sequence[str], training_loops: str | Sequence[str], directory: str | Path, *, create_inverse_triples: bool | Sequence[bool] = False, regularizers: None | str | Sequence[None | str] = None, epochs: int | None = None, negative_sampler: str | None = None, evaluator: str | None = None, model_to_model_kwargs: Mapping[str, Mapping[str, Any]] | None = None, model_to_model_kwargs_ranges: Mapping[str, Mapping[str, Any]] | None = None, model_to_loss_to_loss_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_loss_to_loss_kwargs_ranges: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_optimizer_to_optimizer_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_optimizer_to_optimizer_kwargs_ranges: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_training_loop_to_training_loop_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_neg_sampler_to_neg_sampler_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_neg_sampler_to_neg_sampler_kwargs_ranges: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_training_loop_to_training_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_training_loop_to_training_kwargs_ranges: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_regularizer_to_regularizer_kwargs: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, model_to_regularizer_to_regularizer_kwargs_ranges: Mapping[str, Mapping[str, Mapping[str, Any]]] | None = None, n_trials: int | None = 5, timeout: int | None = 3600, metric: str | None = 'hits@10', direction: str | None = 'maximize', sampler: str | None = 'random', pruner: str | None = 'nop', evaluator_kwargs: Mapping[str, Any] | None = None, evaluation_kwargs: Mapping[str, Any] | None = None, stopper: str | None = 'NopStopper', stopper_kwargs: Mapping[str, Any] | None = None, metadata: Mapping | None = None, save_artifacts: bool = True) → list[tuple[Path, Path]]

Prepare an ablation directory.

Parameters:

datasets (str | SplitToPathDict | Sequence[str | SplitToPathDict]) – A single dataset specification or a list of dataset specifications. Each dataset can be specified either by name (referring to a built-in dataset) or as a dictionary with paths for training, validation, and testing.
models (str | Sequence[str]) – A model name or list of model names.
losses (str | Sequence[str]) – A loss function name or list of loss function names.
optimizers (str | Sequence[str]) – An optimizer name or list of optimizer names.
training_loops (str | Sequence[str]) – A training loop name or list of training loop names.
epochs (int | None) – A quick way to set the num_epochs in the training kwargs.
create_inverse_triples (bool | Sequence[bool]) – Either a boolean for a single entry or a list of booleans.
regularizers (None | str | Sequence[None | str]) – A regularizer name, list of regularizer names, or None if no regularizer is desired.
negative_sampler (str | None) – A negative sampler name, or None if no negative sampler is desired. Negative sampling is used only in combination with pykeen.training.SLCWATrainingLoop.
evaluator (str | None) – The name of the evaluator to be used. Defaults to rank-based evaluator.
stopper (str | None) – The name of the stopper to be used. Defaults to NopStopper which doesn’t define a stopping criterion.
model_to_model_kwargs (Mapping[str, Mapping[str, Any]] | None) – A mapping from model name to dictionaries of default keyword arguments for the instantiation of that model.
model_to_model_kwargs_ranges (Mapping[str, Mapping[str, Any]] | None) – A mapping from model name to dictionaries of keyword argument ranges for that model to be used in HPO.
model_to_loss_to_loss_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of loss name to a mapping of default keyword arguments for the instantiation of that loss function. This is useful because some losses, such as pykeen.losses.MarginRankingLoss, have hyper-parameters.
model_to_loss_to_loss_kwargs_ranges (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of loss name to a mapping of keyword argument ranges for that loss to be used in HPO.
model_to_optimizer_to_optimizer_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of optimizer name to a mapping of default keyword arguments for the instantiation of that optimizer. This is useful because optimizers have hyper-parameters such as the learning rate.
model_to_optimizer_to_optimizer_kwargs_ranges (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of optimizer name to a mapping of keyword argument ranges for that optimizer to be used in HPO.
model_to_regularizer_to_regularizer_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of regularizer name to a mapping of default keyword arguments for the instantiation of that regularizer. This is useful because regularizers have hyper-parameters such as the regularization weight.
model_to_regularizer_to_regularizer_kwargs_ranges (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of regularizer name to a mapping of keyword argument ranges for that regularizer to be used in HPO.
model_to_neg_sampler_to_neg_sampler_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of negative sampler name to a mapping of default keyword arguments for the instantiation of that negative sampler. This is useful because negative samplers have hyper-parameters such as the number of negatives generated for each positive training example.
model_to_neg_sampler_to_neg_sampler_kwargs_ranges (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of negative sampler name to a mapping of keyword argument ranges for that negative sampler to be used in HPO.
model_to_training_loop_to_training_loop_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of training loop name to a mapping of default keyword arguments for the training loop.
model_to_training_loop_to_training_kwargs (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of training loop name to a mapping of default keyword arguments for the training procedure. This is useful for setting hyper-parameters such as the number of training epochs and the batch size.
model_to_training_loop_to_training_kwargs_ranges (Mapping[str, Mapping[str, Mapping[str, Any]]] | None) – A mapping from model name to a mapping of trainer name to a mapping of keyword argument ranges for that trainer to be used in HPO.
evaluator_kwargs (Mapping[str, Any] | None) – The keyword arguments passed to the evaluator.
evaluation_kwargs (Mapping[str, Any] | None) – The keyword arguments passed during evaluation.
stopper_kwargs (Mapping[str, Any] | None) – The keyword arguments passed to the stopper.
n_trials (int | None) – Number of HPO trials.
timeout (int | None) – The time (seconds) after which the ablation study will be terminated.
metric (str | None) – The metric to optimize during HPO.
direction (str | None) – Whether to ‘maximize’ or ‘minimize’ the metric during HPO.
sampler (str | None) – The HPO sampler. Defaults to random search.
pruner (str | None) – The approach for pruning trials. By default no pruning is used, i.e., pruner is set to ‘nop’.
metadata (Mapping | None) – A mapping of metadata arguments such as the name of the ablation study.
directory (str | Path) – The directory in which the experimental artifacts will be saved.
save_artifacts (bool) – Whether each trained model sampled during HPO should be saved.

Returns:

pairs of output directories and HPO config paths inside those directories.

Raises:

ValueError – If a dataset is not specified correctly, i.e., it is neither a str nor a dictionary containing the paths to the training, testing, and validation data.

Return type:

list[tuple[Path, Path]]
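Because most arguments accept either a single entry or a sequence, the number of prepared experiments can grow quickly. The sketch below illustrates this under two assumptions that this reference does not confirm: that one HPO configuration is prepared per combination of dataset, inverse-triples flag, model, loss, regularizer, optimizer, and training loop, and that a dataset dictionary uses the keys "training", "validation", and "testing". All function names here are hypothetical and do not belong to PyKEEN.

```python
from itertools import product
from typing import Any, Mapping

# Assumed key names for a dictionary-style dataset specification.
REQUIRED_SPLITS = ("training", "validation", "testing")


def as_list(value: Any) -> list[Any]:
    """Wrap a single entry (str, bool, mapping, or None) in a list; copy sequences."""
    if value is None or isinstance(value, (str, bool, Mapping)):
        return [value]
    return list(value)


def check_dataset(spec: Any) -> None:
    """Raise ValueError unless spec is a name or a mapping with all split paths."""
    if isinstance(spec, str):
        return
    if isinstance(spec, Mapping) and all(key in spec for key in REQUIRED_SPLITS):
        return
    raise ValueError(f"Invalid dataset specification: {spec!r}")


def enumerate_configurations(
    datasets: Any,
    models: Any,
    losses: Any,
    optimizers: Any,
    training_loops: Any,
    create_inverse_triples: Any = False,
    regularizers: Any = None,
) -> list[tuple[Any, ...]]:
    """List every combination of the given components (Cartesian product)."""
    dataset_list = as_list(datasets)
    for spec in dataset_list:
        check_dataset(spec)
    return list(
        product(
            dataset_list,
            as_list(create_inverse_triples),
            as_list(models),
            as_list(losses),
            as_list(regularizers),
            as_list(optimizers),
            as_list(training_loops),
        )
    )
```

For instance, one dataset with two models, two training loops, and both inverse-triples settings gives 1 × 2 × 2 × 2 = 8 combinations, each of which would then run up to n_trials HPO trials.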