Model
- class Model(triples_factory, loss=None, predict_with_sigmoid=False, automatic_memory_optimization=None, preferred_device=None, random_seed=None, regularizer=None)
Bases: torch.nn.modules.module.Module
A base module for all of the KGE models.
Initialize the module.
- Parameters
triples_factory (TriplesFactory) – The triples factory facilitates access to the dataset.
loss (Optional[Loss]) – The loss to use. If None is given, the default loss of the model subclass is used.
predict_with_sigmoid (bool) – Whether to apply sigmoid to the scores when predicting. Applying sigmoid at prediction time may lead to exactly equal scores for triples with very high or very low scores. If the model was not trained with sigmoid applied (or with BCEWithLogitsLoss), the scores are not calibrated to perform well with sigmoid.
automatic_memory_optimization (Optional[bool]) – If set to True, the model derives the maximum possible batch sizes for scoring triples during evaluation, and also during training if no batch size was given. This makes full use of the available hardware and gives the fastest possible computation.
preferred_device (Optional[str]) – The preferred device for model training and inference.
random_seed (Optional[int]) – A random seed to use for initialising the model’s weights. Should be set when aiming for reproducibility.
regularizer (Optional[Regularizer]) – A regularizer to use for training.
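The note on predict_with_sigmoid can be illustrated with a small standard-library sketch. It is not PyKEEN code: the sigmoid helper is a hypothetical stand-in, and it uses Python's double-precision floats, whereas models typically run in single precision, where saturation sets in at smaller magnitudes.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid (hypothetical helper, not part of PyKEEN)."""
    return 1.0 / (1.0 + math.exp(-x))


# Moderate raw scores keep their ordering after sigmoid.
print(sigmoid(1.0) < sigmoid(2.0))

# Very high raw scores differ, but both saturate to exactly 1.0,
# so their relative ranking is lost.
print(sigmoid(40.0) == sigmoid(50.0))
```

Ranking on the raw scores avoids such ties, which is one reason to leave predict_with_sigmoid at its default unless probabilities are needed.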
Attributes Summary
Whether score_h supports slicing.
Whether score_r supports slicing.
Whether score_t supports slicing.
The default parameters for the default loss function class
Return all modules not supporting sub-batching.
The number of entities in the knowledge graph.
Calculate the number of bytes used for all parameters of the model.
The number of unique relation types in the knowledge graph.
The default parameters for the default regularizer class
Does this model support sub-batching?
Methods Summary
compute_label_loss(predictions, labels) – Compute the classification loss.
compute_mr_loss(positive_scores, negative_scores) – Compute the margin ranking loss for the positive and negative scores.
compute_self_adversarial_negative_sampling_loss(positive_scores, negative_scores) – Compute self adversarial negative sampling loss.
Get the parameters that require gradients.
load_state(path) – Load the state of the model.
make_labeled_df(tensor, **kwargs) – Take a tensor of triples and make a pandas dataframe with labels.
Has to be called after each parameter update.
predict_heads(relation_label, tail_label[, …]) – Predict heads for the given relation and tail (given by label).
predict_scores(triples) – Calculate the scores for triples.
predict_scores_all_heads(rt_batch[, slice_size]) – Forward pass using left side (head) prediction for obtaining scores of all possible heads.
predict_scores_all_relations(ht_batch[, …]) – Forward pass using middle (relation) prediction for obtaining scores of all possible relations.
predict_scores_all_tails(hr_batch[, slice_size]) – Forward pass using right side (tail) prediction for obtaining scores of all possible tails.
predict_tails(head_label, relation_label[, …]) – Predict tails for the given head and relation (given by label).
regularize_if_necessary(*tensors) – Update the regularizer’s term given some tensors, if regularization is requested.
reset_parameters_() – Reset all parameters of the model and enforce model constraints.
save_state(path) – Save the state of the model.
score_all_triples([k, batch_size, …]) – Compute scores for all triples, optionally returning only the k highest scoring.
score_h(rt_batch) – Forward pass using left side (head) prediction.
score_hrt(hrt_batch) – Forward pass.
score_r(ht_batch) – Forward pass using middle (relation) prediction.
score_t(hr_batch) – Forward pass using right side (tail) prediction.
to_cpu_() – Transfer the entire model to CPU.
Transfer model to device.
to_embeddingdb([session, use_tqdm]) – Upload to the embedding database.
to_gpu_() – Transfer the entire model to GPU.
Attributes Documentation
- loss_default_kwargs: ClassVar[Optional[Mapping[str, Any]]] = {'margin': 1.0, 'reduction': 'mean'}
The default parameters for the default loss function class.
- modules_not_supporting_sub_batching
Return all modules not supporting sub-batching.
- Return type
Collection[Module]
- num_parameter_bytes
Calculate the number of bytes used for all parameters of the model.
- Return type
- regularizer_default_kwargs: ClassVar[Optional[Mapping[str, Any]]] = None
The default parameters for the default regularizer class.
Methods Documentation
- compute_label_loss(predictions, labels)
Compute the classification loss.
- Parameters
predictions (FloatTensor) – shape: s. The tensor containing predictions.
labels (FloatTensor) – shape: s. The tensor containing labels.
- Return type
FloatTensor
- Returns
dtype: float, scalar. The label loss value.
- compute_mr_loss(positive_scores, negative_scores)
Compute the margin ranking loss for the positive and negative scores.
- Parameters
positive_scores (FloatTensor) – shape: s, dtype: float. The scores for positive triples.
negative_scores (FloatTensor) – shape: s, dtype: float. The scores for negative triples.
- Raises
RuntimeError – If the chosen loss function does not allow the calculation of a margin ranking loss.
- Return type
FloatTensor
- Returns
dtype: float, scalar. The margin ranking loss value.
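As a rough illustration of what a margin ranking loss computes, the following plain-Python sketch uses the standard hinge formulation with the defaults listed in loss_default_kwargs (margin 1.0, mean reduction). It assumes higher scores are better and that each positive score is paired elementwise with one negative score; the function name is hypothetical and this is not the library's implementation.

```python
def margin_ranking_loss(positive_scores, negative_scores, margin=1.0):
    """Mean of max(0, margin + negative - positive) over paired scores."""
    if len(positive_scores) != len(negative_scores):
        raise ValueError("expected one negative score per positive score")
    terms = [
        max(0.0, margin + negative - positive)
        for positive, negative in zip(positive_scores, negative_scores)
    ]
    return sum(terms) / len(terms)


# First pair: the positive beats the negative by more than the margin, so it adds 0.
# Second pair: the negative scores higher, so it adds 1.0 + 1.0 - 0.5 = 1.5.
loss = margin_ranking_loss([2.0, 0.5], [0.5, 1.0])
print(loss)  # 0.75
```

A pair only contributes to the loss while the positive triple fails to outscore its negative by at least the margin.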
- compute_self_adversarial_negative_sampling_loss(positive_scores, negative_scores)
Compute self adversarial negative sampling loss.
- Parameters
positive_scores (FloatTensor) – shape: s. The tensor containing the positive scores.
negative_scores (FloatTensor) – shape: s. The tensor containing the negative scores.
- Raises
RuntimeError – If the chosen loss does not allow the calculation of self adversarial negative sampling losses.
- Return type
FloatTensor
- Returns
dtype: float, scalar. The loss value.
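For orientation, here is a plain-Python sketch of a self adversarial negative sampling loss in the style of the RotatE paper, for one positive triple and its negatives. The function names, the margin and temperature parameters, their default values, and the final averaging of the two terms are all assumptions made for illustration; the loss class configured on the model may differ in its details.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def softmax(values):
    """Softmax with the usual max-shift for numerical stability."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_adversarial_loss(positive_score, negative_scores, margin=1.0, temperature=1.0):
    """Weight each negative by a softmax over the negative scores (assumed form)."""
    # Higher-scoring (harder) negatives receive larger weights.
    weights = softmax([temperature * s for s in negative_scores])
    positive_term = -log_sigmoid(margin + positive_score)
    negative_term = -sum(
        w * log_sigmoid(-s - margin) for w, s in zip(weights, negative_scores)
    )
    return (positive_term + negative_term) / 2.0, weights


loss, weights = self_adversarial_loss(2.0, [-1.0, 0.5, -3.0])
print(loss, weights)
```

With temperature set to 0, every negative gets the same weight and the negative term reduces to a plain average.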
- make_labeled_df(tensor, **kwargs)
Take a tensor of triples and make a pandas dataframe with labels.
- Parameters
tensor (LongTensor) – shape: (n, 3). The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Union[Tensor, ndarray, Sequence]) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.
- Return type
DataFrame
- Returns
A dataframe with n rows, and 6 + len(kwargs) columns.
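The column layout described above can be sketched without pandas. The helper below is hypothetical and builds a list of row dictionaries instead of a DataFrame; the ID-to-label mappings are toy data standing in for what the triples factory would provide.

```python
RESERVED_COLUMNS = {
    "head_id", "head_label", "relation_id", "relation_label", "tail_id", "tail_label",
}


def make_labeled_rows(triples, entity_labels, relation_labels, **extra_columns):
    """Build one row per triple with six fixed columns plus any extra columns."""
    for name, column in extra_columns.items():
        if name in RESERVED_COLUMNS:
            raise ValueError(f"column name {name!r} is reserved")
        if len(column) != len(triples):
            raise ValueError(f"column {name!r} needs one value per triple")
    rows = []
    for i, (h, r, t) in enumerate(triples):
        row = {
            "head_id": h, "head_label": entity_labels[h],
            "relation_id": r, "relation_label": relation_labels[r],
            "tail_id": t, "tail_label": entity_labels[t],
        }
        for name, column in extra_columns.items():
            row[name] = column[i]
        rows.append(row)
    return rows


# Toy mappings (illustrative only).
entity_labels = {0: "brazil", 1: "burma"}
relation_labels = {0: "accusation"}
triples = [(0, 0, 1), (1, 0, 0)]
rows = make_labeled_rows(triples, entity_labels, relation_labels, score=[0.7, 0.1])
print(rows[0])
```

Each row has 6 + len(kwargs) entries, which matches the column count stated for the returned dataframe.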
- predict_heads(relation_label, tail_label, add_novelties=True, remove_known=False, testing=None)
Predict heads for the given relation and tail (given by label).
- Parameters
relation_label (str) – The string label for the relation.
tail_label (str) – The string label for the tail entity.
add_novelties (bool) – Should the dataframe include a column denoting if the ranked head entities correspond to novel triples?
remove_known (bool) – Should non-novel triples (those appearing in the training set) be removed from the results? Keeping them lets you better assess the quality of the predictions, since you want non-novel triples to have generally higher scores. If you’re doing hypothesis generation, however, they may be a distraction. If this is set to True, non-novel triples are removed and the column denoting novelty is excluded, since all remaining triples are novel. Defaults to False.
testing (Optional[LongTensor]) – The mapped_triples from the testing triples factory (TriplesFactory.mapped_triples).
The following example shows that after you train a model on the Nations dataset, you can score all entities w.r.t. a given relation and tail entity.
>>> from pykeen.pipeline import pipeline
>>> result = pipeline(
...     dataset='Nations',
...     model='RotatE',
... )
>>> df = result.model.predict_heads('accusation', 'brazil')
- Return type
DataFrame
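The interplay of the novelty column and remove_known can be sketched in plain Python. The helper rank_heads, the toy scores, and the toy training set below are hypothetical; the sketch only mirrors the behaviour described in the parameter list, not the library's code.

```python
def rank_heads(score_fn, entities, relation, tail, training_triples, remove_known=False):
    """Rank candidate heads for (relation, tail), flagging or dropping known triples."""
    rows = []
    for head in entities:
        novel = (head, relation, tail) not in training_triples
        if remove_known and not novel:
            continue
        rows.append({"head": head, "score": score_fn(head, relation, tail), "novel": novel})
    rows.sort(key=lambda row: row["score"], reverse=True)
    if remove_known:
        # Every remaining triple is novel, so the column carries no information.
        for row in rows:
            del row["novel"]
    return rows


# Toy scores and training data (illustrative only).
toy_scores = {"china": 0.2, "cuba": 0.9, "egypt": 0.5}
training = {("cuba", "accusation", "brazil")}


def toy_score(head, relation, tail):
    return toy_scores[head]


ranked = rank_heads(toy_score, list(toy_scores), "accusation", "brazil", training)
filtered = rank_heads(toy_score, list(toy_scores), "accusation", "brazil", training, remove_known=True)
print([row["head"] for row in ranked])    # ['cuba', 'egypt', 'china']
print([row["head"] for row in filtered])  # ['egypt', 'china']
```

In the unfiltered ranking the known triple comes first, which is the sanity check the remove_known description refers to.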
- predict_scores(triples)
Calculate the scores for triples.
This method takes head, relation and tail of each triple and calculates the corresponding score.
Additionally, the model is set to evaluation mode.
- Parameters
triples (LongTensor) – shape: (number of triples, 3), dtype: long. The indices of (head, relation, tail) triples.
- Return type
FloatTensor
- Returns
shape: (number of triples, 1), dtype: float. The score for each triple.
- predict_scores_all_heads(rt_batch, slice_size=None)
Forward pass using left side (head) prediction for obtaining scores of all possible heads.
This method calculates the score for all possible heads for each (relation, tail) pair.
Additionally, the model is set to evaluation mode.
- Parameters
- Return type
FloatTensor
- Returns
shape: (batch_size, num_entities), dtype: float For each r-t pair, the scores for all possible heads.
- predict_scores_all_relations(ht_batch, slice_size=None)
Forward pass using middle (relation) prediction for obtaining scores of all possible relations.
This method calculates the score for all possible relations for each (head, tail) pair.
Additionally, the model is set to evaluation mode.
- Parameters
- Return type
FloatTensor
- Returns
shape: (batch_size, num_relations), dtype: float For each h-t pair, the scores for all possible relations.
- predict_scores_all_tails(hr_batch, slice_size=None)
Forward pass using right side (tail) prediction for obtaining scores of all possible tails.
This method calculates the score for all possible tails for each (head, relation) pair.
Additionally, the model is set to evaluation mode.
- Parameters
- Return type
FloatTensor
- Returns
shape: (batch_size, num_entities), dtype: float For each h-r pair, the scores for all possible tails.
- predict_tails(head_label, relation_label, add_novelties=True, remove_known=False, testing=None)
Predict tails for the given head and relation (given by label).
- Parameters
head_label (str) – The string label for the head entity.
relation_label (str) – The string label for the relation.
add_novelties (bool) – Should the dataframe include a column denoting if the ranked tail entities correspond to novel triples?
remove_known (bool) – Should non-novel triples (those appearing in the training set) be removed from the results? Keeping them lets you better assess the quality of the predictions, since you want non-novel triples to have generally higher scores. If you’re doing hypothesis generation, however, they may be a distraction. If this is set to True, non-novel triples are removed and the column denoting novelty is excluded, since all remaining triples are novel. Defaults to False.
testing (Optional[LongTensor]) – The mapped_triples from the testing triples factory (TriplesFactory.mapped_triples).
The following example shows that after you train a model on the Nations dataset, you can score all entities w.r.t. a given head entity and relation.
>>> from pykeen.pipeline import pipeline
>>> result = pipeline(
...     dataset='Nations',
...     model='RotatE',
... )
>>> df = result.model.predict_tails('brazil', 'accusation')
- Return type
DataFrame
- regularize_if_necessary(*tensors)
Update the regularizer’s term given some tensors, if regularization is requested.
- Parameters
tensors (FloatTensor) – The tensors that should be passed to the regularizer to update its term.
- Return type
- reset_parameters_()
Reset all parameters of the model and enforce model constraints.
- Return type
- score_all_triples(k=None, batch_size=1, return_tensors=False, add_novelties=True, remove_known=False, testing=None)
Compute scores for all triples, optionally returning only the k highest scoring.
Note
This operation is computationally very expensive for reasonably-sized knowledge graphs.
Warning
Setting k=None may lead to huge memory requirements.
- Parameters
- Return type
- Returns
shape: (k, 3) A tensor containing the k highest scoring triples, or all possible triples if k=None.
Example usage:
from pykeen.pipeline import pipeline

# Train a model (quickly)
result = pipeline(model='RotatE', dataset='Nations', training_kwargs=dict(num_epochs=5))
model = result.model

# Get scores for *all* triples
tensor = model.score_all_triples()
df = model.make_labeled_df(tensor)

# Get scores for top 15 triples
top_df = model.score_all_triples(k=15)
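To see why this operation is expensive, consider the brute-force sketch below. It enumerates every (head, relation, tail) combination, so the work grows with num_entities² × num_relations, and it keeps either everything or only the k best. The helper top_k_triples and the toy scoring function are hypothetical stand-ins, not the model's interaction function.

```python
import heapq
from itertools import product


def top_k_triples(score_fn, num_entities, num_relations, k=None):
    """Score every possible triple; return (score, triple) pairs, best first."""
    candidates = product(range(num_entities), range(num_relations), range(num_entities))
    scored = ((score_fn(h, r, t), (h, r, t)) for h, r, t in candidates)
    if k is None:
        # Materialises all num_entities**2 * num_relations entries at once.
        return sorted(scored, reverse=True)
    # Keeps only k entries in memory while streaming over the candidates.
    return heapq.nlargest(k, scored)


def toy_score(h, r, t):
    """Toy score: best (zero) when t == h + r. Illustrative only."""
    return -abs(h + r - t)


everything = top_k_triples(toy_score, num_entities=4, num_relations=2)
best_five = top_k_triples(toy_score, num_entities=4, num_relations=2, k=5)
print(len(everything), len(best_five))  # 32 5
```

Passing a finite k bounds the size of the result, which is the motivation behind the warning about k=None.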
- score_h(rt_batch)
Forward pass using left side (head) prediction.
This method calculates the score for all possible heads for each (relation, tail) pair.
- Parameters
rt_batch (LongTensor) – shape: (batch_size, 2), dtype: long. The indices of (relation, tail) pairs.
- Return type
FloatTensor
- Returns
shape: (batch_size, num_entities), dtype: float. For each r-t pair, the scores for all possible heads.
- abstract score_hrt(hrt_batch)
Forward pass.
This method takes head, relation and tail of each triple and calculates the corresponding score.
- Parameters
hrt_batch (LongTensor) – shape: (batch_size, 3), dtype: long. The indices of (head, relation, tail) triples.
- Raises
NotImplementedError – If the method was not implemented for this class.
- Return type
FloatTensor
- Returns
shape: (batch_size, 1), dtype: float. The score for each triple.
- score_r(ht_batch)
Forward pass using middle (relation) prediction.
This method calculates the score for all possible relations for each (head, tail) pair.
- Parameters
ht_batch (LongTensor) – shape: (batch_size, 2), dtype: long. The indices of (head, tail) pairs.
- Return type
FloatTensor
- Returns
shape: (batch_size, num_relations), dtype: float. For each h-t pair, the scores for all possible relations.
- score_t(hr_batch)
Forward pass using right side (tail) prediction.
This method calculates the score for all possible tails for each (head, relation) pair.
- Parameters
hr_batch (LongTensor) – shape: (batch_size, 2), dtype: long. The indices of (head, relation) pairs.
- Return type
FloatTensor
- Returns
shape: (batch_size, num_entities), dtype: float. For each h-r pair, the scores for all possible tails.
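The relationship between score_hrt and score_t can be sketched with nested lists in place of tensors. The TransE-style scoring (negative L1 distance) and the tiny embedding tables are assumptions chosen for illustration, and the extra embedding arguments exist only to keep the sketch self-contained; only the input and output shapes follow the documentation above.

```python
def score_hrt(hrt_batch, entity_emb, relation_emb):
    """Score each (head, relation, tail) triple; result shape (batch_size, 1)."""
    scores = []
    for h, r, t in hrt_batch:
        distance = sum(
            abs(eh + er - et)
            for eh, er, et in zip(entity_emb[h], relation_emb[r], entity_emb[t])
        )
        scores.append([-distance])
    return scores


def score_t(hr_batch, entity_emb, relation_emb):
    """Score every entity as a tail; result shape (batch_size, num_entities)."""
    num_entities = len(entity_emb)
    return [
        [score_hrt([(h, r, t)], entity_emb, relation_emb)[0][0] for t in range(num_entities)]
        for h, r in hr_batch
    ]


# Toy embeddings (illustrative only): 3 entities, 2 relations, dimension 2.
entity_emb = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
relation_emb = [[1.0, 0.0], [0.0, 1.0]]

scores = score_t([(0, 0), (1, 1)], entity_emb, relation_emb)
print(len(scores), len(scores[0]))  # 2 3
```

Subclasses can usually compute the same matrix much faster with one batched operation; the naive loop here is only meant to make the shapes concrete.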