get_all_prediction_df

get_all_prediction_df(model, *, k=None, batch_size=1, return_tensors=False, add_novelties=True, remove_known=False, testing=None)

Compute scores for all triples, optionally returning only the k highest-scoring ones.

Note

This operation is computationally very expensive for reasonably sized knowledge graphs, since it scores every possible (head, relation, tail) combination.

Warning

Setting k=None keeps every scored triple and may therefore require a huge amount of memory.
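To see why k=None is risky, note that the number of candidate triples grows as num_entities² × num_relations. The back-of-the-envelope sketch below assumes each kept triple is stored as three 8-byte integer IDs plus one 4-byte float score; these sizes and the helper estimate_all_triples_memory are illustrative assumptions, not PyKEEN internals.

```python
def estimate_all_triples_memory(num_entities, num_relations, k=None,
                                id_bytes=8, score_bytes=4):
    """Return (number of candidate triples, bytes needed for the kept ones).

    Hypothetical helper: assumes every (head, relation, tail) combination is
    scored and each kept triple stores three integer IDs plus one float score.
    """
    num_candidates = num_entities * num_relations * num_entities
    num_kept = num_candidates if k is None else min(k, num_candidates)
    return num_candidates, num_kept * (3 * id_bytes + score_bytes)


# Illustrative graph size (not a specific dataset).
candidates, all_bytes = estimate_all_triples_memory(10_000, 100)
_, top_bytes = estimate_all_triples_memory(10_000, 100, k=1_000)
print(f"{candidates:,} candidate triples")
print(f"k=None: {all_bytes / 1e9:,.0f} GB, k=1000: {top_bytes / 1e3:,.0f} kB")
```

Even under these modest assumptions, keeping everything for a graph of this size needs hundreds of gigabytes, while a bounded k needs only kilobytes for the result itself.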

Parameters
  • model (Model) – A PyKEEN model.

  • k (Optional[int]) – The number of triples to return. Set to None to keep all.

  • batch_size (int) – The batch size to use for calculating scores.

  • return_tensors (bool) – If True, return only tensors (a ScorePack). If False (default), return a pandas DataFrame.

  • add_novelties (bool) – Should the DataFrame include a column denoting whether each triple is novel, i.e., not contained in the training set?

  • remove_known (bool) – Should non-novel triples (those appearing in the training set) be removed from the results? Keeping them helps you assess the quality of the predictions, since you want to see that the non-novel triples generally receive higher scores. If you’re doing hypothesis generation, however, they may be a distraction. If this is set to True, non-novel triples are removed and the column denoting novelty is excluded, since all remaining triples are novel. Defaults to False.

  • testing (Optional[LongTensor]) – The mapped_triples from the testing triples factory (TriplesFactory.mapped_triples).
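The interplay of k and batch_size can be pictured as a streaming top-k: candidates are scored one batch at a time, and only a bounded min-heap of the best k is retained between batches, so memory grows with k plus one batch rather than with all possible triples. This is a plain-Python sketch under stated assumptions, not PyKEEN's actual implementation; the function top_k_triples and the scoring callable score_fn are hypothetical stand-ins for the model.

```python
import heapq
from itertools import islice, product


def top_k_triples(score_fn, num_entities, num_relations, k=None, batch_size=1):
    """Score every (head, relation, tail) ID combination batch by batch.

    Hypothetical sketch: `score_fn` takes a list of (h, r, t) tuples and returns
    one float per tuple. With k=None every scored triple is kept; otherwise only
    a min-heap of the k best survives between batches. Ties are broken
    arbitrarily.
    """
    candidates = product(range(num_entities), range(num_relations), range(num_entities))
    heap = []      # (score, triple) pairs; heap[0] is the worst triple kept so far
    kept_all = []  # only used when k is None
    while True:
        batch = list(islice(candidates, batch_size))
        if not batch:
            break
        for triple, score in zip(batch, score_fn(batch)):
            if k is None:
                kept_all.append((score, triple))
            elif len(heap) < k:
                heapq.heappush(heap, (score, triple))
            elif heap and score > heap[0][0]:
                # Evict the current worst triple in favour of the better one.
                heapq.heapreplace(heap, (score, triple))
    kept = kept_all if k is None else heap
    # Return the highest score first.
    return sorted(kept, reverse=True)


def toy_score(batch):
    # Toy stand-in for a model: favour triples whose head and tail IDs are close.
    return [-abs(h - t) - 0.1 * r for h, r, t in batch]


print(top_k_triples(toy_score, num_entities=4, num_relations=2, k=3, batch_size=5))
```

A larger batch_size mainly trades memory for fewer scoring calls; the size of the retained result is governed by k alone.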

Return type

Union[ScorePack, DataFrame]

Returns

A DataFrame whose columns depend on the settings above, or, if return_tensors is True, a ScorePack of tensors in which the triples have shape (k, 3). Contains either the k highest-scoring triples or all possible triples if k is None.
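How add_novelties and remove_known shape the returned rows can be sketched in plain Python. The helper to_rows and the column names (head_id, relation_id, tail_id, score, in_training) are hypothetical and may not match PyKEEN's DataFrame; only the behavior (flag triples seen in training, or drop them and omit the flag) comes from the parameter descriptions above.

```python
def to_rows(scored_triples, training_triples, add_novelties=True, remove_known=False):
    """Turn (score, (h, r, t)) pairs into row dicts, mimicking the options above.

    Hypothetical sketch: column names are illustrative, not PyKEEN's.
    """
    known = set(training_triples)
    rows = []
    for score, (h, r, t) in scored_triples:
        in_training = (h, r, t) in known
        if remove_known and in_training:
            continue  # drop non-novel triples entirely
        row = {"head_id": h, "relation_id": r, "tail_id": t, "score": score}
        # The novelty column is redundant once known triples have been removed.
        if add_novelties and not remove_known:
            row["in_training"] = in_training
        rows.append(row)
    return rows


scored = [(0.9, (0, 1, 2)), (0.7, (2, 0, 1)), (0.4, (1, 1, 0))]
training = [(0, 1, 2)]
for row in to_rows(scored, training):
    print(row)
print(to_rows(scored, training, remove_known=True))
```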

Example usage:

from pykeen.pipeline import pipeline
from pykeen.models.predict import get_all_prediction_df

# Train a model (quickly)
result = pipeline(model='RotatE', dataset='Nations', training_kwargs=dict(num_epochs=5))
model = result.model

# Get scores for *all* triples (feasible here because Nations is a very small dataset)
df = get_all_prediction_df(model)

# Get scores for top 15 triples
top_df = get_all_prediction_df(model, k=15)