Uncertainty

Analyze the uncertainty of model predictions.

Currently, all implemented approaches are based on Monte-Carlo dropout [gal2016]. Monte-Carlo dropout relies on the model containing dropout layers. While dropout is usually turned off in inference / evaluation mode, Monte-Carlo dropout leaves it enabled. Consequently, running the same prediction method \(k\) times yields \(k\) different predictions. The variance of these predictions can be used as an approximation of uncertainty, where a larger variance indicates higher uncertainty in the predicted score.

The absolute variance is usually hard to interpret on its own, but comparing the variances across predictions helps to identify which scores are more uncertain than others.
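The following minimal sketch illustrates the general idea in plain PyTorch, independent of PyKEEN: a toy scoring module with a dropout layer is kept in training mode, the same batch is scored \(k\) times, and the per-example variance serves as the uncertainty estimate. The module, shapes, and variable names are illustrative assumptions, not part of the PyKEEN API.

import torch
from torch import nn

# toy scorer with a dropout layer (illustrative only, not a PyKEEN model)
torch.manual_seed(42)
toy_scorer = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(32, 1),
)
toy_scorer.train()  # keep dropout enabled on purpose (Monte-Carlo dropout)

x = torch.randn(8, 16)  # a batch of 8 hypothetical triple representations
k = 30
with torch.no_grad():
    # score the same batch k times; dropout makes each pass stochastic
    samples = torch.stack([toy_scorer(x) for _ in range(k)], dim=0)  # (k, 8, 1)

mean_score = samples.mean(dim=0)   # approximate predicted score
uncertainty = samples.var(dim=0)   # larger variance ~ higher uncertainty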

The following code block sketches an example use case in which we train a model with a classification loss, i.e., on the triple classification task.

from pykeen.pipeline import pipeline
from pykeen.models.uncertainty import predict_hrt_uncertain

# train model
# note: as this is an example, the model is only trained for a few epochs
#       and not until convergence. In practice, you would usually first verify
#       that the model predicts sufficiently well before looking at uncertainty scores.
result = pipeline(dataset="nations", model="ERMLPE", loss="bcewithlogits")

# predict triple scores with uncertainty
prediction_with_uncertainty = predict_hrt_uncertain(
    model=result.model,
    hrt_batch=result.training.mapped_triples[0:8],
)

# use a larger number of samples to increase the quality of the uncertainty estimate
prediction_with_uncertainty = predict_hrt_uncertain(
    model=result.model,
    hrt_batch=result.training.mapped_triples[0:8],
    num_samples=100,
)

# get the most and least uncertain predictions on the training set
prediction_with_uncertainty = predict_hrt_uncertain(
    model=result.model,
    hrt_batch=result.training.mapped_triples,
    num_samples=100,
)
df = result.training.tensor_to_df(
    result.training.mapped_triples,
    logits=prediction_with_uncertainty.score[:, 0],
    probability=prediction_with_uncertainty.score[:, 0].sigmoid(),
    uncertainty=prediction_with_uncertainty.uncertainty[:, 0],
)
print(df.nlargest(5, columns="uncertainty"))
print(df.nsmallest(5, columns="uncertainty"))

A collection of related work on uncertainty quantification can be found here: https://github.com/uncertainty-toolbox/uncertainty-toolbox/blob/master/docs/paper_list.md

Functions

predict_hrt_uncertain(model, hrt_batch[, ...])
    Calculate the scores with uncertainty quantification via Monte-Carlo dropout.

predict_h_uncertain(model, rt_batch[, ...])
    Forward pass using left side (head) prediction for obtaining scores of all possible heads.

predict_t_uncertain(model, hr_batch[, ...])
    Forward pass using right side (tail) prediction for obtaining scores of all possible tails.

predict_r_uncertain(model, ht_batch[, ...])
    Forward pass using middle (relation) prediction for obtaining scores of all possible relations.

predict_uncertain_helper(model, batch, ...)
    Predict with uncertainty estimates via Monte-Carlo dropout.
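As a brief illustration of the side-prediction variants, the sketch below scores all possible tails for a batch of (head, relation) pairs, reusing the result object from the pipeline example above. Passing num_samples here is assumed to work analogously to predict_hrt_uncertain, and the column slicing relies on mapped_triples being stored as (head, relation, tail) columns.

from pykeen.models.uncertainty import predict_t_uncertain

# take the (head, relation) columns of the first 4 training triples
hr_batch = result.training.mapped_triples[:4, :2]

tail_prediction = predict_t_uncertain(
    model=result.model,
    hr_batch=hr_batch,
    num_samples=20,  # assumed to be accepted as in predict_hrt_uncertain
)

# both tensors have one row per (head, relation) pair and one column per entity
print(tail_prediction.score.shape, tail_prediction.uncertainty.shape)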

Classes

MissingDropoutError
    Raised during uncertainty analysis if no dropout modules are present.

UncertainPrediction(score, uncertainty)
    A pair of predicted scores and corresponding uncertainty.
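Since all of these functions rely on dropout modules, calling them on a model without any dropout raises MissingDropoutError. The sketch below shows one way to guard against that; using TransE as the dropout-free example is an assumption about its default configuration.

from pykeen.models.uncertainty import MissingDropoutError, predict_hrt_uncertain
from pykeen.pipeline import pipeline

# assumption: TransE's default configuration contains no dropout modules,
# so Monte-Carlo dropout cannot be applied to it
result_without_dropout = pipeline(dataset="nations", model="TransE")

try:
    predict_hrt_uncertain(
        model=result_without_dropout.model,
        hrt_batch=result_without_dropout.training.mapped_triples[0:8],
    )
except MissingDropoutError:
    print("The model has no dropout layers; Monte-Carlo dropout is not applicable.")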