get_relation_pattern_types_df

get_relation_pattern_types_df(dataset: Dataset, *, min_support: int = 0, min_confidence: float = 0.95, drop_confidence: bool = False, parts: Collection[str] | None = None, force: bool = False, add_labels: bool = True) → DataFrame

Categorize relations based on patterns from RotatE [sun2019].

The relation classifications are based on checking whether the corresponding rules hold with sufficient support and confidence. By default, no minimum support is required (min_support=0), but a relatively high confidence is (min_confidence=0.95).

The following four non-exclusive classes for relations are considered:

symmetry
anti-symmetry
inversion
composition
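
For orientation, the four classes can be written as rules over triples, with \(r(h, t)\) denoting a triple \((h, r, t)\). This is a sketch of the definitions commonly given for RotatE-style patterns; the exact rule templates checked by the implementation are an assumption here, and the labels on the left are informal.

\[
\begin{aligned}
\text{symmetry:} \quad & r(h, t) \implies r(t, h) \\
\text{anti-symmetry:} \quad & r(h, t) \implies \lnot r(t, h) \\
\text{inversion:} \quad & r'(h, t) \implies r(t, h) \\
\text{composition:} \quad & r'(h, x) \land r''(x, t) \implies r(h, t)
\end{aligned}
\]

The classes are non-exclusive because a single relation can satisfy several of these rules at once, for example by being symmetric and also the inverse of another relation.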

This method generally follows the terminology of association rule mining. The patterns are expressed as

\[X_1 \land \cdots \land X_k \implies Y\]

where each \(X_i\) is of the form \(r_i(h_i, t_i)\), and some of the \(h_i\) / \(t_i\) may re-occur in other atoms. The support of a pattern is the number of distinct instantiations of all variables on the left-hand side. The confidence is the proportion of these instantiations for which the right-hand side is also true.
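
To make the definitions of support and confidence concrete, the following minimal sketch computes both for the symmetry rule \(r(h, t) \implies r(t, h)\) on a toy set of triples. It is pure Python and does not reproduce the library's implementation; the helper `symmetry_stats`, the toy triples, and the use of `>=` when comparing against the thresholds are all assumptions made for illustration.

```python
from collections import defaultdict


def symmetry_stats(triples):
    """Return {relation: (support, confidence)} for the rule r(h, t) => r(t, h).

    Hypothetical helper for illustration only, not part of the library.
    """
    triple_set = set(triples)  # count distinct instantiations only
    support = defaultdict(int)
    hits = defaultdict(int)
    for h, r, t in triple_set:
        # Every distinct (h, t) pair for relation r instantiates the left-hand side.
        support[r] += 1
        # The right-hand side holds if the reversed triple is also present.
        if (t, r, h) in triple_set:
            hits[r] += 1
    return {r: (support[r], hits[r] / support[r]) for r in support}


# Toy data: "married_to" is stated in both directions, "parent_of" is not.
triples = [
    ("a", "married_to", "b"),
    ("b", "married_to", "a"),
    ("c", "married_to", "d"),
    ("d", "married_to", "c"),
    ("a", "parent_of", "c"),
    ("b", "parent_of", "c"),
]

stats = symmetry_stats(triples)

# Apply thresholds analogous to the documented defaults
# (the >= comparison is an assumption of this sketch).
min_support, min_confidence = 0, 0.95
symmetric = sorted(
    r for r, (s, c) in stats.items() if s >= min_support and c >= min_confidence
)
print(stats)
print(symmetric)
```

Here "married_to" has support 4 and confidence 1.0, so it passes the thresholds, while "parent_of" has support 2 and confidence 0.0 and does not. Raising `min_support` in such a setup filters out rules that hold only on a handful of instantiations.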

Parameters:

dataset (Dataset) – The dataset to investigate.
min_support (int) – A minimum support for patterns.
min_confidence (float) – A minimum confidence for the tested patterns.
drop_confidence (bool) – Whether to drop the support/confidence information from the result frame, and also drop duplicates.
parts (Collection[str] | None) – Only use certain parts of the dataset, e.g., the training triples. Defaults to using all triples, i.e., {“training”, “validation”, “testing”}.
force (bool) – Whether to enforce re-calculation even if a cached version is available.
add_labels (bool) – Whether to add relation labels (if available).

Return type:

DataFrame

Warning

If you intend to use the relation categorization as input to your model or for hyper-parameter selection, exclude the testing triples (via parts) to avoid leakage!

Returns:

A dataframe with columns {“relation_id”, “pattern”, “support”?, “confidence”?}. The columns marked with “?” are omitted when drop_confidence is true.
