pipeline

pipeline(*, dataset=None, dataset_kwargs=None, training=None, testing=None, validation=None, evaluation_entity_whitelist=None, evaluation_relation_whitelist=None, model=None, model_kwargs=None, interaction=None, interaction_kwargs=None, dimensions=None, loss=None, loss_kwargs=None, regularizer=None, regularizer_kwargs=None, optimizer=None, optimizer_kwargs=None, clear_optimizer=True, training_loop=None, negative_sampler=None, negative_sampler_kwargs=None, training_kwargs=None, stopper=None, stopper_kwargs=None, evaluator=None, evaluator_kwargs=None, evaluation_kwargs=None, result_tracker=None, result_tracker_kwargs=None, automatic_memory_optimization=True, metadata=None, device=None, random_seed=None, use_testing_data=True)

Train and evaluate a model.

Parameters
  • dataset (Union[None, str, Dataset, Type[Dataset]]) – The name of the dataset (a key from pykeen.datasets.datasets) or the pykeen.datasets.Dataset instance. Alternatively, the training triples factory (training), testing triples factory (testing), and validation triples factory (validation; optional) can be specified.

  • dataset_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the dataset upon instantiation

  • training (Union[None, str, TriplesFactory]) – A triples factory with training instances or path to the training file if a dataset was not specified

  • testing (Union[None, str, TriplesFactory]) – A triples factory with testing instances or path to the test file if a dataset was not specified

  • validation (Union[None, str, TriplesFactory]) – A triples factory with validation instances or path to the validation file if a dataset was not specified

  • evaluation_entity_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these entities. Useful if the downstream task is only interested in certain entities, but the relational patterns with other entities improve the entity embedding quality.

  • evaluation_relation_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these relations. Useful if the downstream task is only interested in certain relations, but the relational patterns with other relations improve the entity embedding quality.

  • model (Union[None, str, Model, Type[Model]]) – The name of the model, subclass of pykeen.models.Model, or an instance of pykeen.models.Model. Can be given as None if the interaction keyword is used.

  • model_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the model class on instantiation

  • interaction (Union[None, str, Interaction, Type[Interaction]]) – The name of the interaction class, a subclass of pykeen.nn.modules.Interaction, or an instance of pykeen.nn.modules.Interaction. Cannot be given together with model.

  • interaction_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass during instantiation of the interaction class. Only use with interaction.

  • dimensions (Union[None, int, Mapping[str, int]]) – Dimensions to assign to the embeddings of the interaction. Only use with interaction.

  • loss (Union[None, str, Type[Loss]]) – The name of the loss or the loss class.

  • loss_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the loss on instantiation

  • regularizer (Union[None, str, Type[Regularizer]]) – The name of the regularizer or the regularizer class.

  • regularizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the regularizer on instantiation

  • optimizer (Union[None, str, Type[Optimizer]]) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.

  • optimizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the optimizer on instantiation

  • clear_optimizer (bool) – Whether to delete the optimizer instance after training. Because the optimizer may consume additional memory (e.g. the moment estimates in Adam), deletion is the default. To continue training afterwards, set this to False; otherwise the optimizer’s internal state is lost.

  • training_loop (Union[None, str, Type[TrainingLoop]]) – The name of the training loop’s training approach ('slcwa' or 'lcwa') or the training loop class. Defaults to pykeen.training.SLCWATrainingLoop.

  • negative_sampler (Union[None, str, Type[NegativeSampler]]) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class. Only allowed when training with sLCWA. Defaults to pykeen.sampling.BasicNegativeSampler.

  • negative_sampler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the negative sampler class on instantiation

  • training_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the training loop’s train function on call

  • stopper (Union[None, str, Type[Stopper]]) – What kind of stopping to use. Defaults to no stopping; can be set to 'early'.

  • stopper_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the stopper upon instantiation.

  • evaluator (Union[None, str, Type[Evaluator]]) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.

  • evaluator_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator on instantiation

  • evaluation_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator’s evaluate function on call

  • result_tracker (Union[None, str, Type[ResultTracker]]) – The name of the result tracker or the ResultTracker class

  • result_tracker_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the result tracker on instantiation

  • metadata (Optional[Dict[str, Any]]) – A JSON dictionary to store with the experiment

  • use_testing_data (bool) – If True (the default), evaluate on the testing triples. Otherwise, evaluate on the validation triples.

  • automatic_memory_optimization (bool) – Whether to perform automatic memory optimization during training and evaluation. See the arguments to pykeen.training.TrainingLoop and pykeen.evaluation.Evaluator.

  • device (Union[None, str, device]) – The device or device name to run on. If none is given, the device will be looked up with pykeen.utils.resolve_device().

  • random_seed (Optional[int]) – The random seed to use. If none is specified, one will be assigned before any code is run for reproducibility purposes. In the returned PipelineResult instance, it can be accessed through PipelineResult.random_seed.
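
To illustrate how these keyword arguments combine, the following sketch assembles a configuration for the built-in Nations dataset and the TransE model and forwards it to pipeline(). The concrete values (embedding dimension, learning rate, number of epochs, seed) are assumptions chosen for illustration rather than recommended settings, and EXAMPLE_KWARGS and run_pipeline are hypothetical names that are not part of PyKEEN. The PyKEEN import is deferred into the helper so that the configuration can be inspected without the library installed.

```python
from typing import Any, Dict

# Illustrative configuration; every value here is an assumption, not a tuned setting.
EXAMPLE_KWARGS: Dict[str, Any] = {
    "dataset": "Nations",                    # name of a built-in dataset
    "model": "TransE",                       # name of a model
    "model_kwargs": {"embedding_dim": 50},   # passed to the model on instantiation
    "optimizer_kwargs": {"lr": 0.01},        # passed to the default optimizer
    "training_loop": "slcwa",                # 'slcwa' or 'lcwa'
    "negative_sampler": "basic",             # only allowed with sLCWA
    "training_kwargs": {"num_epochs": 5},    # passed to the training loop's train call
    "random_seed": 1234,                     # fixed for reproducibility
    "use_testing_data": True,                # evaluate on the testing triples
}


def run_pipeline(kwargs: Dict[str, Any]):
    """Hypothetical helper: import PyKEEN lazily and run the pipeline."""
    from pykeen.pipeline import pipeline  # requires PyKEEN to be installed

    return pipeline(**kwargs)


# Usage (trains a model, so it is left commented out here):
# result = run_pipeline(EXAMPLE_KWARGS)
# print(result.random_seed)
```

Keeping the arguments in a plain dictionary makes it straightforward to store the same configuration alongside the experiment, for example through the metadata argument.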

Return type

PipelineResult

Returns

A pipeline result package.

Raises

ValueError – if a negative sampler is specified with LCWA
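
The constraint behind this error can be checked before a long run starts. The following is a minimal sketch of such a pre-flight check, not PyKEEN's own validation code: check_pipeline_kwargs is a hypothetical helper, it inspects only string names for the training loop, and raising ValueError for the model/interaction conflict is an assumption of this sketch (the documentation above only states that the two cannot be combined).

```python
from typing import Any, Mapping


def check_pipeline_kwargs(kwargs: Mapping[str, Any]) -> None:
    """Hypothetical pre-flight check mirroring two documented constraints."""
    training_loop = kwargs.get("training_loop")
    # Only string names are inspected; training loop classes pass through unchecked.
    is_lcwa = isinstance(training_loop, str) and training_loop.lower() == "lcwa"

    # Documented: a negative sampler is only allowed when training with sLCWA.
    if is_lcwa and kwargs.get("negative_sampler") is not None:
        raise ValueError("negative_sampler is only allowed when training with sLCWA")

    # Documented: interaction cannot be given when there is also a model.
    # Using ValueError for this case is an assumption of this sketch.
    if kwargs.get("model") is not None and kwargs.get("interaction") is not None:
        raise ValueError("interaction cannot be given together with model")


# A valid combination passes silently.
check_pipeline_kwargs({"training_loop": "slcwa", "negative_sampler": "basic"})

# An invalid combination raises ValueError.
try:
    check_pipeline_kwargs({"training_loop": "lcwa", "negative_sampler": "basic"})
except ValueError as error:
    print(f"rejected: {error}")
```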