get_all_prediction_df
- get_all_prediction_df(model, *, triples_factory, k=None, batch_size=1, return_tensors=False, add_novelties=True, remove_known=False, testing=None, mode=None)
Compute scores for all triples, optionally returning only the k highest-scoring triples.
Note
This operation is computationally very expensive for reasonably-sized knowledge graphs.
Warning
Setting k=None may lead to huge memory requirements.
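As a rough back-of-envelope sketch of why the warning above matters: scoring all triples means scoring every (head, relation, tail) combination, so the number of candidates grows with the square of the number of entities. The helper names and the assumed per-row storage (three int64 IDs plus one float32 score) are illustrative assumptions and are not part of PyKEEN.

```python
def count_all_triples(num_entities: int, num_relations: int) -> int:
    """Number of (head, relation, tail) combinations over the full vocabulary."""
    return num_entities * num_entities * num_relations


def estimate_bytes(num_triples: int, bytes_per_row: int = 3 * 8 + 4) -> int:
    """Rough size if each row keeps three int64 IDs and one float32 score.

    The row layout is an assumption for illustration only; the real
    footprint depends on the model, the device, and the output format.
    """
    return num_triples * bytes_per_row


# Nations is tiny: 14 entities and 55 relations.
nations = count_all_triples(14, 55)
print(nations, estimate_bytes(nations))
```

Running the same estimate with the entity and relation counts of a larger dataset shows how quickly k=None becomes impractical.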
- Parameters
model (Model) – A PyKEEN model
triples_factory (CoreTriplesFactory) – Training triples factory
k (Optional[int]) – The number of triples to return. Set to None to keep all.
batch_size (int) – The batch size to use for calculating scores
return_tensors (bool) – If true, only return tensors. If false (default), return as a pandas DataFrame
add_novelties (bool) – Should the dataframe include a column denoting if the ranked triples are novel?
remove_known (bool) – Should non-novel triples (those appearing in the training set) be removed from the results? On one hand, keeping them allows you to better assess the quality of the predictions: you want to see that the non-novel triples generally have higher scores. On the other hand, if you're doing hypothesis generation, they may be a distraction. If this is set to True, then non-novel triples will be removed and the column denoting novelty will be excluded, since all remaining triples will be novel. Defaults to false.
testing (Optional[LongTensor]) – The mapped_triples from the testing triples factory (TriplesFactory.mapped_triples)
mode (Optional[Literal['training', 'validation', 'testing']]) – The pass mode, which is None in the transductive setting and one of "training", "validation", or "testing" in the inductive setting.
- Return type
- Returns
shape: (k, 3). A dataframe with columns based on the settings, or a tensor. It contains either the k highest-scoring triples, or all possible triples if k is None.
Example usage:
    from pykeen.pipeline import pipeline
    from pykeen.models.predict import get_all_prediction_df

    # Train a model (quickly)
    result = pipeline(model='RotatE', dataset='Nations', epochs=5)
    model = result.model

    # Get scores for *all* triples
    df = get_all_prediction_df(model, triples_factory=result.training)

    # Get scores for top 15 triples
    top_df = get_all_prediction_df(model, k=15, triples_factory=result.training)
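To make the roles of k and remove_known concrete, here is a minimal pure-Python sketch of the idea on integer IDs. It is not PyKEEN's implementation: toy_all_predictions, score_fn, and the row layout are hypothetical names chosen for illustration, and unlike the documented behaviour this toy keeps the "in training" flag even when known triples are removed.

```python
import heapq
from itertools import product


def toy_all_predictions(score_fn, num_entities, num_relations, known,
                        k=None, remove_known=False):
    """Score every (h, r, t) ID triple; return rows sorted by descending score.

    Each row is (score, (h, r, t), in_training). `known` is a set of
    training triples given as ID tuples.
    """
    rows = []
    ids = product(range(num_entities), range(num_relations), range(num_entities))
    for h, r, t in ids:
        in_training = (h, r, t) in known
        if remove_known and in_training:
            continue  # drop non-novel triples entirely
        rows.append((score_fn(h, r, t), (h, r, t), in_training))
    if k is None:
        # Keep everything: memory grows with num_entities**2 * num_relations.
        return sorted(rows, key=lambda row: row[0], reverse=True)
    return heapq.nlargest(k, rows, key=lambda row: row[0])


# Hypothetical scoring function with no ties, for demonstration only.
def demo_score(h, r, t):
    return -(100 * h + 10 * r + t)


known = {(0, 0, 0)}
top = toy_all_predictions(demo_score, 3, 2, known, k=2)
novel = toy_all_predictions(demo_score, 3, 2, known, k=2, remove_known=True)
print(top)
print(novel)
```

With k set, only the best k rows are returned; with remove_known=True, the training triple that would otherwise rank first no longer appears.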