get_all_prediction_df

get_all_prediction_df(model, *, triples_factory, k=None, batch_size=1, return_tensors=False, add_novelties=True, remove_known=False, testing=None, mode=None)

Compute scores for all triples, optionally returning only the k highest scoring.

Note

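This operation scores every combination of head, relation, and tail, so its cost grows with the number of relations times the square of the number of entities. It is computationally very expensive for all but small knowledge graphs.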

Warning

Setting k=None keeps every scored triple and may lead to huge memory requirements.
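To get a feel for the scale behind the note and warning above, the following is a minimal back-of-the-envelope sketch. It is not part of PyKEEN: the helper names, the graph sizes, and the per-triple byte counts (one 4-byte score plus three 8-byte IDs) are assumptions chosen only for illustration.

```python
def count_candidate_triples(num_entities: int, num_relations: int) -> int:
    """Number of (head, relation, tail) combinations that would be scored."""
    return num_entities * num_relations * num_entities


def estimate_result_bytes(
    num_entities: int,
    num_relations: int,
    bytes_per_score: int = 4,  # assumption: 32-bit float score
    bytes_per_id: int = 8,     # assumption: 64-bit integer IDs
) -> int:
    """Rough size of keeping every triple: one score plus three IDs each."""
    n = count_candidate_triples(num_entities, num_relations)
    return n * (bytes_per_score + 3 * bytes_per_id)


# Hypothetical graph sizes, not taken from any particular dataset.
small = count_candidate_triples(num_entities=14, num_relations=55)
large = count_candidate_triples(num_entities=15_000, num_relations=237)

print(f"small graph: {small:,} candidate triples")
print(f"large graph: {large:,} candidate triples")
print(f"large graph, k=None: ~{estimate_result_bytes(15_000, 237) / 1e9:,.1f} GB")
```

Under these assumptions, even a mid-sized graph yields tens of billions of candidates, which is why passing a finite k is usually the safer choice.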

Parameters
  • model (Model) – A PyKEEN model

  • triples_factory (CoreTriplesFactory) – Training triples factory

  • k (Optional[int]) – The number of triples to return. Set to None to keep all.

  • batch_size (int) – The batch size to use for calculating scores

  • return_tensors (bool) – If true, only return tensors. If false (default), return as a pandas DataFrame

  • add_novelties (bool) – Should the DataFrame include a column denoting whether each scored triple is novel?

  • remove_known (bool) – Should non-novel triples (those appearing in the training set) be removed from the results? Keeping them lets you better assess the quality of the predictions, since you want to see that non-novel triples generally receive higher scores. If you are doing hypothesis generation, however, they may be a distraction. If this is set to True, non-novel triples are removed and the column denoting novelty is excluded, since all remaining triples are novel. Defaults to False.

  • testing (Optional[LongTensor]) – The mapped_triples from the testing triples factory (TriplesFactory.mapped_triples)

  • mode (Optional[Literal['training', 'validation', 'testing']]) – The pass mode, which is None in the transductive setting and one of "training", "validation", or "testing" in the inductive setting.
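The interplay between add_novelties, remove_known, and testing can be illustrated with a small pure-Python sketch. It does not call PyKEEN and is not its implementation: the function names, the dictionary keys (in_training, in_testing), and the choice to also treat testing triples as known when a testing set is supplied are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[int, int, int]  # (head_id, relation_id, tail_id)


def annotate_novelty(
    scored: List[Tuple[Triple, float]],
    training: Set[Triple],
    testing: Optional[Set[Triple]] = None,
) -> List[Dict[str, object]]:
    """Attach membership flags to each scored triple (analogous to add_novelties)."""
    rows = []
    for triple, score in scored:
        row: Dict[str, object] = {
            "triple": triple,
            "score": score,
            "in_training": triple in training,
        }
        if testing is not None:
            row["in_testing"] = triple in testing
        rows.append(row)
    return rows


def drop_known(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only novel triples and drop the flags (analogous to remove_known)."""
    return [
        {"triple": r["triple"], "score": r["score"]}
        for r in rows
        if not r["in_training"] and not r.get("in_testing", False)
    ]


# Hypothetical scores and triples, for illustration only.
scored = [((0, 0, 1), 0.9), ((1, 0, 2), 0.7), ((2, 1, 0), 0.4)]
training = {(0, 0, 1)}
testing = {(2, 1, 0)}

annotated = annotate_novelty(scored, training, testing)
novel_only = drop_known(annotated)
print(annotated)
print(novel_only)
```

Keeping the flags is useful for sanity-checking a model, while dropping known triples gives a shorter list when only new hypotheses are of interest.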

Return type

Union[ScorePack, DataFrame]

Returns

A DataFrame with columns based on the settings, or, if return_tensors is true, a ScorePack of tensors whose triples have shape (k, 3). Contains either the k highest scoring triples, or all possible triples if k is None.
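One way to see why a finite k keeps memory bounded is to track only the best k candidates while scores arrive batch by batch. The sketch below uses Python's heapq for this. It illustrates the general technique under assumed inputs and does not claim to be how PyKEEN implements it; the function name and the toy batches are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

Triple = Tuple[int, int, int]  # (head_id, relation_id, tail_id)


def top_k_over_batches(
    batches: Iterable[List[Tuple[Triple, float]]],
    k: int,
) -> List[Tuple[Triple, float]]:
    """Return the k highest-scoring triples, holding at most k items in memory."""
    if k <= 0:
        return []
    heap: List[Tuple[float, Triple]] = []  # min-heap ordered by score
    for batch in batches:
        for triple, score in batch:
            item = (score, triple)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                # The new item beats the current worst of the best k.
                heapq.heapreplace(heap, item)
    # Sort descending so the best triple comes first.
    return [(triple, score) for score, triple in sorted(heap, reverse=True)]


# Hypothetical scored batches, for illustration only.
batches = [
    [((0, 0, 1), 0.2), ((0, 0, 2), 0.9)],
    [((1, 0, 2), 0.5), ((2, 1, 0), 0.7)],
]
best_two = top_k_over_batches(batches, k=2)
print(best_two)
```

The heap never grows beyond k entries, so the memory used for results depends on k rather than on the total number of candidate triples.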

Example usage:

from pykeen.pipeline import pipeline
from pykeen.models.predict import get_all_prediction_df

# Train a model (quickly)
result = pipeline(model='RotatE', dataset='Nations', epochs=5)
model = result.model

# Get scores for *all* triples
df = get_all_prediction_df(model, triples_factory=result.training)

# Get scores for top 15 triples
top_df = get_all_prediction_df(model, k=15, triples_factory=result.training)