- get_all_prediction_df(model, *, triples_factory, k=None, batch_size=1, return_tensors=False, add_novelties=True, remove_known=False, testing=None)
Compute scores for all triples, optionally returning only the k highest scoring.
This operation is computationally very expensive for reasonably-sized knowledge graphs.
Setting k=None may lead to huge memory requirements.
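To give a feel for the warning above, the following minimal sketch counts the candidate triples for a graph of a given size and estimates how large an unfiltered result could be. The helper names, the example sizes, and the byte widths (8 bytes per ID, 4 bytes per score) are assumptions made for illustration and are not part of PyKEEN.

```python
def count_candidate_triples(num_entities: int, num_relations: int) -> int:
    """Number of (head, relation, tail) combinations that would be scored."""
    return num_entities * num_relations * num_entities


def estimate_result_bytes(num_triples: int, bytes_per_id: int = 8, bytes_per_score: int = 4) -> int:
    """Rough result size: three integer IDs plus one score per triple (assumed widths)."""
    return num_triples * (3 * bytes_per_id + bytes_per_score)


# Illustrative graph sizes (assumed values, chosen only for the arithmetic)
small = count_candidate_triples(num_entities=14, num_relations=55)
large = count_candidate_triples(num_entities=15_000, num_relations=237)

print(small, estimate_result_bytes(small))
print(large, estimate_result_bytes(large))
```

Because the count grows with the square of the number of entities, passing a finite k is usually the safer choice for anything beyond toy graphs.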
model (Model) – A PyKEEN model
triples_factory (CoreTriplesFactory) – Training triples factory
batch_size (int) – The batch size to use for calculating scores
return_tensors (bool) – If true, only return tensors. If false (default), return as a pandas DataFrame
add_novelties (bool) – Should the dataframe include a column denoting whether each ranked triple is novel?
remove_known (bool) – Should non-novel triples (those appearing in the training set) be removed from the results? On one hand, keeping them lets you better assess the quality of the predictions: you want to see that the non-novel triples generally have higher scores. On the other hand, if you're doing hypothesis generation, they may be a distraction. If this is set to True, non-novel triples are removed and the column denoting novelty is omitted, since all remaining triples are novel. Defaults to false.
testing (Optional[LongTensor]) – The mapped_triples from the testing triples factory (TriplesFactory.mapped_triples)
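The interplay between the novelty column and the removal of known triples can be sketched in plain Python. This is only a conceptual illustration of the behavior described above, not PyKEEN's implementation; the function name `annotate_novelty` and the `in_training` key are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[int, int, int]  # (head_id, relation_id, tail_id)


def annotate_novelty(
    scored: List[Tuple[Triple, float]],
    training: Set[Triple],
    remove_known: bool = False,
) -> List[Dict[str, object]]:
    """Flag triples seen in training, or drop them when remove_known is set."""
    rows: List[Dict[str, object]] = []
    for triple, score in scored:
        in_training = triple in training
        if remove_known:
            if in_training:
                continue  # known triple: leave it out entirely
            # All remaining triples are novel, so no novelty flag is needed
            rows.append({"triple": triple, "score": score})
        else:
            rows.append({"triple": triple, "score": score, "in_training": in_training})
    return rows


# Toy data (made up for illustration)
scored = [((0, 0, 1), 0.9), ((1, 0, 2), 0.7), ((2, 1, 0), 0.1)]
training = {(0, 0, 1)}

flagged = annotate_novelty(scored, training)
novel_only = annotate_novelty(scored, training, remove_known=True)
```

Keeping the flag is useful for sanity-checking a model, while dropping known triples suits hypothesis generation, where only unseen candidates are of interest.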
- Return type
shape: (k, 3). A dataframe with columns based on the settings, or a tensor. Contains either the k highest-scoring triples, or all possible triples if k is None.
    from pykeen.pipeline import pipeline
    from pykeen.models.predict import get_all_prediction_df

    # Train a model (quickly)
    result = pipeline(model='RotatE', dataset='Nations', epochs=5)
    model = result.model

    # Get scores for *all* triples
    df = get_all_prediction_df(model, triples_factory=result.training)

    # Get scores for top 15 triples
    top_df = get_all_prediction_df(model, k=15, triples_factory=result.training)
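To illustrate why a finite k keeps memory bounded, the sketch below scores candidate triples in batches and retains only the k best seen so far. It is a conceptual stand-in rather than PyKEEN's implementation: `top_k_triples`, `batched`, and `toy_score` are hypothetical names, and the toy scoring function replaces a trained model.

```python
import heapq
from itertools import islice, product
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Triple = Tuple[int, int, int]  # (head_id, relation_id, tail_id)


def batched(items: Iterable[Triple], batch_size: int) -> Iterator[List[Triple]]:
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def top_k_triples(
    score_fn: Callable[[Sequence[Triple]], List[float]],
    num_entities: int,
    num_relations: int,
    k: int,
    batch_size: int = 1,
) -> List[Tuple[float, Triple]]:
    """Score every (head, relation, tail) combination, keeping only the k best."""
    candidates = product(range(num_entities), range(num_relations), range(num_entities))
    best: List[Tuple[float, Triple]] = []
    for batch in batched(candidates, batch_size):
        scores = score_fn(batch)
        # Merge the new batch with the current best and truncate to k entries
        best = heapq.nlargest(k, best + list(zip(scores, batch)))
    return best


def toy_score(batch: Sequence[Triple]) -> List[float]:
    """Hypothetical scorer standing in for a trained model."""
    return [float(100 * h + 10 * r + t) for h, r, t in batch]


top = top_k_triples(toy_score, num_entities=3, num_relations=2, k=3, batch_size=4)
```

At no point does the sketch hold more than k plus one batch of scored triples, which is the intuition behind preferring a finite k over k=None.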