pipeline
- pipeline(*, dataset=None, dataset_kwargs=None, training=None, testing=None, validation=None, evaluation_entity_whitelist=None, evaluation_relation_whitelist=None, model=None, model_kwargs=None, interaction=None, interaction_kwargs=None, dimensions=None, loss=None, loss_kwargs=None, regularizer=None, regularizer_kwargs=None, optimizer=None, optimizer_kwargs=None, clear_optimizer=True, lr_scheduler=None, lr_scheduler_kwargs=None, training_loop=None, training_loop_kwargs=None, negative_sampler=None, negative_sampler_kwargs=None, epochs=None, training_kwargs=None, stopper=None, stopper_kwargs=None, evaluator=None, evaluator_kwargs=None, evaluation_kwargs=None, result_tracker=None, result_tracker_kwargs=None, metadata=None, device=None, random_seed=None, use_testing_data=True, evaluation_fallback=False, filter_validation_when_testing=True, use_tqdm=None)[source]
Train and evaluate a model.
- Parameters:
  - dataset (Union[None, str, Dataset, Type[Dataset]]) – The name of the dataset (a key for the pykeen.datasets.dataset_resolver) or a pykeen.datasets.Dataset instance. Alternatively, the training triples factory (training), testing triples factory (testing), and validation triples factory (validation; optional) can be specified.
  - dataset_kwargs (Optional[Mapping[str, Any]]) – The keyword arguments passed to the dataset upon instantiation.
  - training (Union[str, CoreTriplesFactory, None]) – A triples factory with training instances, or the path to the training file if a dataset was not specified.
  - testing (Union[str, CoreTriplesFactory, None]) – A triples factory with testing instances, or the path to the test file if a dataset was not specified.
  - validation (Union[str, CoreTriplesFactory, None]) – A triples factory with validation instances, or the path to the validation file if a dataset was not specified.
  - evaluation_entity_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these entities. Useful if the downstream task is only interested in certain entities, but the relational patterns with other entities improve the entity embedding quality.
  - evaluation_relation_whitelist (Optional[Collection[str]]) – Optional restriction of evaluation to triples containing only these relations. Useful if the downstream task is only interested in certain relations, but the relational patterns with other relations improve the entity embedding quality.
  - model (Union[None, str, Model, Type[Model]]) – The name of the model, a subclass of pykeen.models.Model, or an instance of pykeen.models.Model. May be None if the interaction keyword is used.
  - model_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the model class on instantiation.
  - interaction (Union[None, str, Interaction, Type[Interaction]]) – The name of the interaction class, a subclass of pykeen.nn.modules.Interaction, or an instance of pykeen.nn.modules.Interaction. Cannot be given when a model is also given.
  - interaction_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass during instantiation of the interaction class. Only use with interaction.
  - dimensions (Union[None, int, Mapping[str, int]]) – Dimensions to assign to the embeddings of the interaction. Only use with interaction.
  - loss (Union[str, Type[Loss], None]) – The name of the loss or the loss class.
  - loss_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the loss on instantiation.
  - regularizer (Union[str, Type[Regularizer], None]) – The name of the regularizer or the regularizer class.
  - regularizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the regularizer on instantiation.
  - optimizer (Union[str, Type[Optimizer], None]) – The name of the optimizer or the optimizer class. Defaults to torch.optim.Adagrad.
  - optimizer_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the optimizer on instantiation.
  - clear_optimizer (bool) – Whether to delete the optimizer instance after training. Since the optimizer may consume additional memory, e.g., for moments in Adam, this is the default. If you want to continue training, set it to False, as the optimizer's internal parameters will otherwise be lost.
  - lr_scheduler (Union[str, Type[LRScheduler], None]) – The name of the learning rate scheduler or the scheduler class. Defaults to torch.optim.lr_scheduler.ExponentialLR.
  - lr_scheduler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the learning rate scheduler on instantiation.
  - training_loop (Union[str, Type[TrainingLoop], None]) – The name of the training loop's training approach ('slcwa' or 'lcwa') or the training loop class. Defaults to pykeen.training.SLCWATrainingLoop.
  - training_loop_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the training loop on instantiation.
  - negative_sampler (Union[str, Type[NegativeSampler], None]) – The name of the negative sampler ('basic' or 'bernoulli') or the negative sampler class. Only allowed when training with sLCWA. Defaults to pykeen.sampling.BasicNegativeSampler.
  - negative_sampler_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the negative sampler class on instantiation.
  - epochs (Optional[int]) – A shortcut for setting the num_epochs key in the training_kwargs dict.
  - training_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the training loop's train function on call.
  - stopper (Union[str, Type[Stopper], None]) – What kind of stopping to use. Defaults to no stopping; can be set to 'early'.
  - stopper_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the stopper upon instantiation.
  - evaluator (Union[str, Type[Evaluator], None]) – The name of the evaluator or an evaluator class. Defaults to pykeen.evaluation.RankBasedEvaluator.
  - evaluator_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator on instantiation.
  - evaluation_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to pass to the evaluator's evaluate function on call.
  - result_tracker (Union[str, ResultTracker, Type[ResultTracker], None, Sequence[Union[str, ResultTracker, Type[ResultTracker], None]]]) – Either None (which results in a Python result tracker), a single tracker (as a class, an instance, or a class-name string), or a list of trackers (each as a class, an instance, or a class-name string).
  - result_tracker_kwargs (Union[Mapping[str, Any], None, Sequence[Optional[Mapping[str, Any]]]]) – Either None (which uses all defaults), a single dictionary (which is used for all trackers), or a list of dictionaries with the same length as the result trackers.
  - metadata (Optional[Dict[str, Any]]) – A JSON dictionary to store with the experiment.
  - use_testing_data (bool) – If True, use the testing triples; otherwise, use the validation triples. Defaults to True.
  - device (Union[str, device, None]) – The device or device name to run on. If None, the device will be looked up with pykeen.utils.resolve_device().
  - random_seed (Optional[int]) – The random seed to use. If None, one will be assigned before any code is run, for reproducibility purposes. On the returned PipelineResult instance, it can be accessed through PipelineResult.random_seed.
  - evaluation_fallback (bool) – If True, when evaluation fails on the GPU, fall back to a smaller batch size and, in the last instance, evaluate on the CPU if even the smallest possible batch size is too big for the GPU.
  - filter_validation_when_testing (bool) – If True, during evaluation on the test dataset, validation triples are added to the set of known positive triples, which are filtered out when performing filtered evaluation following the approach described by [bordes2013]. This should be explicitly set to False only in the scenario that you are training a single model using the pipeline and evaluating with the testing set, but never using the validation set for optimization at all. This is a very atypical scenario, so it is left as True by default to promote comparability to previous publications.
  - use_tqdm (Optional[bool]) – Globally set the usage of tqdm progress bars. Typically more useful to set to False, since the training loop and evaluation have them turned on by default.
- Return type:
PipelineResult
- Returns:
A pipeline result package.