CleanupSplitter

class CleanupSplitter(cleaner: str | Cleaner | type[Cleaner] | None = None)

Bases: Splitter

The cleanup splitter first splits the triples randomly and then cleans up the resulting parts.

During cleanup, triples are moved into the training part until every entity occurs at least once in it.

The splitter supports two cleanup variants; see cleaner_resolver.
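The cleanup step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not PyKEEN's implementation. It assumes triples are (head, relation, tail) tuples of integer IDs rather than a Tensor, and the helper names `entities_of` and `cleanup_deterministic` are hypothetical.

The sketch applies one simple deterministic rule: in a single pass, it moves every triple of another part that mentions an entity missing from the training part. Because all triples mentioning such an entity move together, one pass is enough to cover every entity.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (head_id, relation_id, tail_id).
Triple = Tuple[int, int, int]


def entities_of(triples: List[Triple]) -> Set[int]:
    """Return all entity IDs that occur as head or tail."""
    return {e for h, _, t in triples for e in (h, t)}


def cleanup_deterministic(
    train: List[Triple], other: List[Triple]
) -> Tuple[List[Triple], List[Triple]]:
    """Move every triple of `other` whose head or tail is unseen in `train`.

    Hypothetical helper: a single pass suffices because all triples that
    mention a missing entity are moved at once.
    """
    known = entities_of(train)
    move = [tr for tr in other if tr[0] not in known or tr[2] not in known]
    keep = [tr for tr in other if tr[0] in known and tr[2] in known]
    return train + move, keep


if __name__ == "__main__":
    new_train, new_test = cleanup_deterministic(
        [(0, 0, 1)], [(1, 0, 2), (2, 1, 3), (1, 1, 0)]
    )
    print(new_train)  # [(0, 0, 1), (1, 0, 2), (2, 1, 3)]
    print(new_test)   # [(1, 1, 0)]
```

In the example, entities 2 and 3 are unseen in the training part, so the two test triples that mention them move. Only (1, 1, 0) stays in the test part.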

Initialize the splitter.

Parameters:

cleaner (str | Cleaner | type[Cleaner] | None) – the cleanup method to use. Defaults to the fast deterministic cleaner, which may cause larger deviations between the desired and actual triple counts.
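The parameter description suggests that the non-default variant keeps the actual part sizes closer to the requested ones. One way to achieve this is to move a single randomly chosen offending triple at a time and re-check coverage after each move. The sketch below assumes the alternative works this way.

The function name `cleanup_randomized` and the tuple-based triple format are hypothetical, and `random.Random` stands in for the torch `Generator`.

```python
import random
from typing import List, Set, Tuple

# Hypothetical representation: (head_id, relation_id, tail_id).
Triple = Tuple[int, int, int]


def entities_of(triples: List[Triple]) -> Set[int]:
    """Return all entity IDs that occur as head or tail."""
    return {e for h, _, t in triples for e in (h, t)}


def cleanup_randomized(
    train: List[Triple], other: List[Triple], rng: random.Random
) -> Tuple[List[Triple], List[Triple]]:
    """Move one random offending triple at a time until train covers all entities.

    Hypothetical helper. It terminates because every iteration removes one
    triple from `other`.
    """
    train, other = list(train), list(other)
    while True:
        known = entities_of(train)
        # Indices of triples in `other` that still mention an unseen entity.
        candidates = [
            i for i, (h, _, t) in enumerate(other) if h not in known or t not in known
        ]
        if not candidates:
            return train, other
        train.append(other.pop(rng.choice(candidates)))
```

Consider a training part [(0, 0, 1)] and a test part [(0, 0, 2), (1, 0, 2), (2, 1, 0)]. Every test triple mentions the unseen entity 2, yet this sketch moves exactly one of them, whichever the generator picks. A single-pass rule that moves all offending triples at once would move all three.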

Methods Summary

split_absolute_size(mapped_triples, sizes, ...)

Split triples into clean groups.

Methods Documentation

split_absolute_size(mapped_triples: Tensor, sizes: Sequence[int], generator: Generator) → Sequence[Tensor]

Split triples into clean groups.

This method partitions the triples, i.e., each triple is assigned to exactly one group. Moreover, it ensures that every entity occurs at least once in the first group.
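Both guarantees stated here can be checked directly. The hypothetical helper `is_clean_partition` below is a minimal sketch that assumes tuple-based triples rather than Tensors. It verifies two things:

- The parts together contain each input triple exactly as often as the input does.
- Every entity of the input occurs in the first part.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

# Hypothetical representation: (head_id, relation_id, tail_id).
Triple = Tuple[int, int, int]


def entities_of(triples: Sequence[Triple]) -> Set[int]:
    """Return all entity IDs that occur as head or tail."""
    return {e for h, _, t in triples for e in (h, t)}


def is_clean_partition(
    original: Sequence[Triple], parts: Sequence[Sequence[Triple]]
) -> bool:
    """Check the partition property and entity coverage of the first part."""
    # Partition: the multiset union of all parts must equal the input.
    if Counter(original) != Counter(tr for part in parts for tr in part):
        return False
    # Coverage: every entity of the input must occur in the first part.
    return entities_of(original) <= entities_of(parts[0])
```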

Parameters:
  • mapped_triples (Tensor) – shape: (n, 3); the ID-based triples

  • sizes (Sequence[int]) – the absolute number of triples for each split part.

  • generator (Generator) – the random state used for splitting

Returns:

A sequence of ID-based triples, one for each split part. The actual sizes may differ from the requested ones in order to satisfy the entity-coverage constraint.

Return type:

Sequence[Tensor]
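The whole procedure can be sketched without PyTorch as follows. This is an illustration under stated assumptions, not PyKEEN's code:

- Triples are tuples.
- `random.Random` replaces the torch `Generator`.
- `sizes` must sum to the number of triples.
- The function name `split_absolute_size_sketch` is hypothetical.

The sketch shuffles the triples and cuts them at the requested sizes. It then moves triples from each later part into the first whenever they mention an entity the first part does not yet contain.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: (head_id, relation_id, tail_id).
Triple = Tuple[int, int, int]


def split_absolute_size_sketch(
    triples: Sequence[Triple], sizes: Sequence[int], rng: random.Random
) -> List[List[Triple]]:
    """Randomly split `triples` into parts of `sizes`, then clean up."""
    if sum(sizes) != len(triples):
        raise ValueError("sizes must sum to the number of triples")
    # 1) Random split: shuffle a copy and cut it at the requested sizes.
    shuffled = list(triples)
    rng.shuffle(shuffled)
    parts: List[List[Triple]] = []
    start = 0
    for size in sizes:
        parts.append(shuffled[start:start + size])
        start += size
    # 2) Cleanup: move offending triples from every later part into the first.
    train = parts[0]
    cleaned: List[List[Triple]] = []
    for part in parts[1:]:
        known = {e for h, _, t in train for e in (h, t)}
        train = train + [tr for tr in part if tr[0] not in known or tr[2] not in known]
        cleaned.append([tr for tr in part if tr[0] in known and tr[2] in known])
    return [train] + cleaned
```

Take a chain of ten triples (i, 0, i + 1) with sizes [6, 2, 2]. The first returned part has at least six triples and, depending on the shuffle, may have more. This is the deviation from the requested sizes that the return value allows.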