CoreTriplesFactory

class CoreTriplesFactory(mapped_triples: Tensor | ndarray, num_entities: int, num_relations: int, create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None)

Bases: KGInfo

Create instances from ID-based triples.

Create the triples factory.

Parameters:
  • mapped_triples (Tensor | ndarray) – shape: (n, 3) A three-column matrix where each row contains the head identifier, relation identifier, and tail identifier, in that order.

  • num_entities (int) – The number of entities.

  • num_relations (int) – The number of relations.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • metadata (Mapping[str, Any] | None) – Arbitrary metadata to go with the graph.

Raises:
  • TypeError – if the mapped_triples are of non-integer dtype

  • ValueError – if the mapped_triples are of invalid shape
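
The two error conditions above can be illustrated with a minimal pure-Python sketch. The helper `check_mapped_triples` is hypothetical and not part of PyKEEN; it applies the documented checks to plain nested lists rather than to a Tensor or ndarray, and the order of the checks is an assumption.

```python
def check_mapped_triples(rows):
    """Validate ID-based triples (hypothetical helper, not PyKEEN code)."""
    for row in rows:
        # Every row must be (head_id, relation_id, tail_id).
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}")
        # Identifiers must be integers; bool is rejected although it subclasses int.
        if not all(isinstance(x, int) and not isinstance(x, bool) for x in row):
            raise TypeError("identifiers must be integers")
    return rows


check_mapped_triples([[0, 0, 1], [1, 1, 2]])  # passes silently
```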

Attributes Summary

base_file_name

num_triples

The number of triples.

triples_file_name

Methods Summary

apply_condenser(condenser)

Apply the triple condenser.

clone_and_exchange_triples(mapped_triples[, ...])

Create a new triples factory sharing everything except the triples.

condense([entities, relations])

Drop all IDs which are not present in the triples.

create(mapped_triples[, num_entities, ...])

Create a triples factory without any label information.

entities_to_ids(entities)

Normalize entities to IDs.

from_path_binary(path)

Load triples factory from a binary file.

get_inverse_relation_id(relation)

Get the inverse relation identifier for the given relation.

get_mask_for_relations(relations[, invert])

Get a boolean mask for triples with the given relations.

get_most_frequent_relations(n)

Get the IDs of the n most frequent relations.

iter_extra_repr()

Iterate over extra_repr components.

make_condenser([entities, relations])

Create a triple condenser from the factory's triples without applying it.

merge(*others)

Merge the triples factory with others.

new_with_restriction([entities, relations, ...])

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

relations_to_ids(relations)

Normalize relations to IDs.

split([ratios, random_state, ...])

Split a triples factory into a training part and a variable number of (transductive) evaluation parts.

split_fully_inductive([...])

Create a fully inductive split.

split_semi_inductive([ratios, random_state])

Create a semi-inductive split.

tensor_to_df(tensor, **kwargs)

Take a tensor of triples and make a pandas dataframe with labels.

to_path_binary(path)

Save triples factory to path in (PyTorch's .pt) binary format.

with_labels(entity_to_id, relation_to_id)

Add labeling to the TriplesFactory.

Attributes Documentation

base_file_name: ClassVar[str] = 'base.pth'
num_triples

The number of triples.

triples_file_name: ClassVar[str] = 'numeric_triples.tsv.gz'

Methods Documentation

apply_condenser(condenser: TripleCondenser) → Self

Apply the triple condenser.

Parameters:

condenser (TripleCondenser) – The condenser.

Return type:

Self

Warning

This creates a triples factory that may have a new entity-to-ID or relation-to-ID mapping.

Returns:

A condensed version with potentially smaller num_entities or num_relations.

clone_and_exchange_triples(mapped_triples: Tensor, extra_metadata: dict[str, Any] | None = None, keep_metadata: bool = True, create_inverse_triples: bool | None = None) → Self

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters:
  • mapped_triples (Tensor) – The new mapped triples.

  • extra_metadata (dict[str, Any] | None) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, and values from extra_metadata take precedence for keys present in both.

  • keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.

  • create_inverse_triples (bool | None) – Change inverse triple creation flag. If None, use flag from this factory.

Returns:

The new factory.

Return type:

Self
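
The interaction between keep_metadata and extra_metadata can be sketched with plain dictionaries. The helper `merge_metadata` is hypothetical and only mirrors the behavior described in the parameter list; it is not PyKEEN's implementation.

```python
def merge_metadata(current, extra=None, keep_metadata=True):
    """Combine metadata dictionaries (hypothetical helper, not PyKEEN code)."""
    # Start from a shallow copy of the current metadata, or from nothing.
    merged = dict(current) if keep_metadata else {}
    # Keys from `extra` win over keys that are already present.
    merged.update(extra or {})
    return merged


current = {"source": "train.tsv", "version": 1}
print(merge_metadata(current, {"version": 2}))
# {'source': 'train.tsv', 'version': 2}
print(merge_metadata(current, {"version": 2}, keep_metadata=False))
# {'version': 2}
```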

condense(entities: bool = True, relations: bool = False) → Self

Drop all IDs which are not present in the triples.

Parameters:
  • entities (bool) – Whether to condense entity IDs.

  • relations (bool) – Whether to condense relation IDs.

Return type:

Self

Warning

This creates a triples factory that may have a new entity-to-ID or relation-to-ID mapping.

Returns:

A condensed version with potentially smaller num_entities or num_relations.
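
To illustrate what dropping unused IDs means, here is a minimal pure-Python sketch. The helper `condense_ids` is hypothetical; it assumes that the remaining IDs are renumbered consecutively from 0 in ascending order, which may differ from what the library does internally.

```python
def condense_ids(triples, entities=True, relations=False):
    """Renumber the IDs that occur in `triples` to 0..k-1 (hypothetical helper)."""

    def remap(values):
        # Assumption: ascending order of the old IDs is preserved.
        return {old: new for new, old in enumerate(sorted(set(values)))}

    e_map = remap([x for h, _, t in triples for x in (h, t)]) if entities else None
    r_map = remap([r for _, r, _ in triples]) if relations else None
    return [
        (
            e_map[h] if e_map is not None else h,
            r_map[r] if r_map is not None else r,
            e_map[t] if e_map is not None else t,
        )
        for h, r, t in triples
    ]


# Entities 0, 7, 9 become 0, 1, 2; relation IDs are left untouched by default.
print(condense_ids([(0, 5, 7), (7, 2, 9)]))
# [(0, 5, 1), (1, 2, 2)]
```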

classmethod create(mapped_triples: Tensor, num_entities: int | None = None, num_relations: int | None = None, create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None) → Self

Create a triples factory without any label information.

Parameters:
  • mapped_triples (Tensor) – shape: (n, 3) The ID-based triples.

  • num_entities (int | None) – The number of entities. If not given, inferred from mapped_triples.

  • num_relations (int | None) – The number of relations. If not given, inferred from mapped_triples.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • metadata (Mapping[str, Any] | None) – Additional metadata to store in the factory.

Returns:

A new triples factory.

Return type:

Self

entities_to_ids(entities: Iterable[int] | Iterable[str]) → Sequence[int]

Normalize entities to IDs.

It raises a TypeError if the factory does not support the given data type, e.g. you cannot use str with CoreTriplesFactory.

Parameters:

entities (Iterable[int] | Iterable[str]) – A sequence of either integer identifiers for entities or string labels for entities (that will get auto-converted)

Returns:

Integer identifiers for entities, in the same order.

Return type:

Sequence[int]

classmethod from_path_binary(path: str | Path | TextIO) → Self

Load triples factory from a binary file.

Parameters:

path (str | Path | TextIO) – The path, pointing to an existing PyTorch .pt file.

Returns:

The loaded triples factory.

Return type:

Self

get_inverse_relation_id(relation: int) → int

Get the inverse relation identifier for the given relation.

Parameters:

relation (int)

Return type:

int

get_mask_for_relations(relations: Collection[int], invert: bool = False) → Tensor

Get a boolean mask for triples with the given relations.

Parameters:
  • relations (Collection[int])

  • invert (bool)

Return type:

Tensor
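
The idea of a boolean relation mask can be sketched in pure Python. The helper `relation_mask` is hypothetical and works on a list of tuples instead of a Tensor; it assumes that invert simply flips the mask.

```python
def relation_mask(triples, relations, invert=False):
    """Return one boolean per triple (hypothetical helper, not PyKEEN code)."""
    wanted = set(relations)
    # `!=` on two booleans acts as XOR, so invert=True flips every entry.
    return [(r in wanted) != invert for _, r, _ in triples]


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
print(relation_mask(triples, [0]))
# [True, False, True]
print(relation_mask(triples, [0], invert=True))
# [False, True, False]
```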

get_most_frequent_relations(n: int | float) → set[int]

Get the IDs of the n most frequent relations.

Parameters:

n (int | float) – Either the (integer) number of top relations to keep or the (float) fraction of top relations to keep.

Returns:

A set of IDs for the n most frequent relations.

Raises:

TypeError – If n is neither an int nor a float.

Return type:

set[int]
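
A minimal sketch of the selection logic, using only the standard library. The helper `most_frequent_relations` is hypothetical; in particular, treating a float as a fraction of num_relations that is rounded down is an assumption, not documented behavior.

```python
from collections import Counter


def most_frequent_relations(triples, n, num_relations):
    """Pick the IDs of the most frequent relations (hypothetical helper)."""
    if isinstance(n, float):
        # Assumption: a float is a fraction of all relations, rounded down.
        n = int(num_relations * n)
    elif not isinstance(n, int):
        raise TypeError("n must be an int or a float")
    # Count how often each relation ID appears in the middle column.
    counts = Counter(r for _, r, _ in triples)
    return {r for r, _ in counts.most_common(n)}


triples = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (0, 1, 2), (1, 1, 3), (0, 2, 3)]
print(most_frequent_relations(triples, 2, num_relations=3))
# {0, 1}
```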

iter_extra_repr() → Iterable[str]

Iterate over extra_repr components.

Return type:

Iterable[str]

make_condenser(entities: bool = True, relations: bool = False) → TripleCondenser

Create a triple condenser from the factory’s triples without applying it.

Parameters:
  • entities (bool) – Whether to condense entity IDs.

  • relations (bool) – Whether to condense relation IDs.

Returns:

The triple condenser.

Return type:

TripleCondenser

merge(*others: Self) → Self

Merge the triples factory with others.

The other triples factories have to be compatible.

Parameters:

others (Self) – The other factories.

Returns:

A new factory with the combined triples.

Raises:

ValueError – If any of the other factories has incompatible settings (number of entities, number of relations, or creation of inverse triples).

Return type:

Self
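
The compatibility requirement can be sketched with a small stand-in class. `FactorySketch` is hypothetical and not a PyKEEN class; it assumes that merging concatenates the triples, and it says nothing about how the library handles duplicates.

```python
from dataclasses import dataclass


@dataclass
class FactorySketch:
    """Stand-in for a triples factory (hypothetical, not a PyKEEN class)."""

    triples: list
    num_entities: int
    num_relations: int
    create_inverse_triples: bool = False

    def merge(self, *others):
        settings = ("num_entities", "num_relations", "create_inverse_triples")
        for other in others:
            for name in settings:
                # All three settings must agree before triples can be combined.
                if getattr(self, name) != getattr(other, name):
                    raise ValueError(f"incompatible {name}")
        combined = list(self.triples)
        for other in others:
            combined.extend(other.triples)
        return FactorySketch(
            combined, self.num_entities, self.num_relations, self.create_inverse_triples
        )


a = FactorySketch([(0, 0, 1)], num_entities=3, num_relations=1)
b = FactorySketch([(1, 0, 2)], num_entities=3, num_relations=1)
print(a.merge(b).triples)
# [(0, 0, 1), (1, 0, 2)]
```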

new_with_restriction(entities: None | Collection[int] | Collection[str] = None, relations: None | Collection[int] | Collection[str] = None, invert_entity_selection: bool = False, invert_relation_selection: bool = False) → Self

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

Parameters:
  • entities (None | Collection[int] | Collection[str]) – The entities of interest. If None, defaults to all entities.

  • relations (None | Collection[int] | Collection[str]) – The relations of interest. If None, defaults to all relations.

  • invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.

  • invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.

Returns:

A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.

Return type:

Self

relations_to_ids(relations: Iterable[int] | Iterable[str]) → Sequence[int]

Normalize relations to IDs.

It raises a TypeError if the factory does not support the given data type, e.g. you cannot use str with CoreTriplesFactory.

Parameters:

relations (Iterable[int] | Iterable[str]) – A sequence of either integer identifiers for relations or string labels for relations (that will get auto-converted)

Returns:

Integer identifiers for relations, in the same order.

Return type:

Sequence[int]

split(ratios: float | Sequence[float] = 0.8, *, random_state: None | int | Generator = None, randomize_cleanup: bool = False, method: str | None = None) → list[Self]

Split a triples factory into a training part and a variable number of (transductive) evaluation parts.

Warning

This method is not suitable to create inductive splits.

Parameters:
  • ratios (float | Sequence[float]) –

    There are three options for this argument:

    1. A float can be given between 0 and 1.0, non-inclusive. The first set of triples will get this ratio and the second will get the rest.

    2. A list of ratios can be given, one per set in order, as in [0.8, 0.1]. The final ratio can be omitted because it can be calculated from the others.

    3. All ratios can be explicitly set in order such as in [0.8, 0.1, 0.1] where the sum of all ratios is 1.0.

  • random_state (None | int | Generator) – The random state used to shuffle and split the triples.

  • randomize_cleanup (bool) – This parameter is forwarded to the underlying pykeen.triples.splitting.split().

  • method (str | None) – This parameter is forwarded to the underlying pykeen.triples.splitting.split().

Returns:

A partition of the triples, split (approximately) according to the ratios and stored in triples factories that share everything else with this root triples factory.

Return type:

list[Self]
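
The three accepted forms of ratios can be reduced to one explicit list. The helper `normalize_ratios` below is a hypothetical sketch of that reduction and is not the library's own normalization code.

```python
import math


def normalize_ratios(ratios):
    """Expand the accepted forms into an explicit list (hypothetical helper)."""
    if isinstance(ratios, float):
        ratios = [ratios]
    ratios = list(ratios)
    total = sum(ratios)
    if math.isclose(total, 1.0):
        # All ratios were given explicitly.
        return ratios
    if total > 1.0:
        raise ValueError("ratios must not sum to more than 1.0")
    # The final ratio was omitted, so compute it from the others.
    return ratios + [1.0 - total]


for example in (0.8, [0.8, 0.1], [0.8, 0.1, 0.1]):
    print([round(r, 2) for r in normalize_ratios(example)])
```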

ratio = 0.8  # makes a [0.8, 0.2] split
training_factory, testing_factory = factory.split(ratio)

ratios = [0.8, 0.1]  # makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)

ratios = [0.8, 0.1, 0.1]  # also makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)

split_fully_inductive(entity_split_train_ratio: float = 0.5, evaluation_triples_ratios: float | Sequence[float] = 0.8, random_state: None | int | Generator = None) → list[Self]

Create a fully inductive split.

In a fully inductive split, we first split the entities into two disjoint sets: training entities and inference entities. We use the induced subgraph of the training entities for training. The triples of the inference graph are then further split into inference triples and evaluation triples.

Parameters:
  • entity_split_train_ratio (float) – The ratio of entities to use for the training part. The remainder will be used for the inference/evaluation graph.

  • evaluation_triples_ratios (float | Sequence[float]) – The split ratio for the inference graph split.

  • random_state (None | int | Generator) – The random state used to shuffle and split the triples.

Returns:

A (transductive) training triples factory, the inductive inference triples factory, as well as the evaluation triples factories.

Return type:

list[Self]
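
The procedure described above can be sketched in pure Python. The function `fully_inductive_sketch` is hypothetical; it assumes a single evaluation ratio whose first share goes to the inference triples, and it uses Python's `random` module rather than the library's random state handling.

```python
import random


def fully_inductive_sketch(triples, num_entities, entity_split_train_ratio=0.5,
                           evaluation_triples_ratio=0.8, seed=0):
    """Split entities first, then triples (hypothetical, not PyKEEN code)."""
    rng = random.Random(seed)
    entities = list(range(num_entities))
    rng.shuffle(entities)
    cut = int(num_entities * entity_split_train_ratio)
    train_entities = set(entities[:cut])
    inference_entities = set(entities[cut:])

    # Induced subgraphs: head and tail must both lie in the same entity set.
    training = [(h, r, t) for h, r, t in triples
                if h in train_entities and t in train_entities]
    inference_graph = [(h, r, t) for h, r, t in triples
                       if h in inference_entities and t in inference_entities]

    # Assumption: the first share becomes inference triples, the rest evaluation.
    rng.shuffle(inference_graph)
    k = int(len(inference_graph) * evaluation_triples_ratio)
    return training, inference_graph[:k], inference_graph[k:]


triples = [(i, 0, (i + 1) % 10) for i in range(10)]
training, inference, evaluation = fully_inductive_sketch(triples, num_entities=10)
```

Triples that connect a training entity with an inference entity belong to neither induced subgraph, so the three parts together can contain fewer triples than the input.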

split_semi_inductive(ratios: float | Sequence[float] = 0.8, *, random_state: None | int | Generator = None) → list[Self]

Create a semi-inductive split.

In a semi-inductive split, we first split the entities into training and evaluation entities. The training graph is then composed of all triples involving only training entities. The evaluation graphs are built by looking at the triples that involve exactly one training and one evaluation entity.

Parameters:
  • ratios (float | Sequence[float]) – The entity split ratio(s).

  • random_state (None | int | Generator) – The random state used to shuffle and split the triples.

Returns:

A partition of the triples, split (approximately) according to the ratios and stored in triples factories that share everything else with this root triples factory.

Return type:

list[Self]
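
The rule for assigning triples in a semi-inductive split can be sketched for the two-way case. The function `semi_inductive_sketch` is hypothetical and only covers a single ratio; how the library distributes triples across several evaluation sets is not shown here.

```python
import random


def semi_inductive_sketch(triples, num_entities, ratio=0.8, seed=0):
    """Two-way semi-inductive split (hypothetical, not PyKEEN code)."""
    rng = random.Random(seed)
    entities = list(range(num_entities))
    rng.shuffle(entities)
    train_entities = set(entities[: int(num_entities * ratio)])

    training, evaluation = [], []
    for h, r, t in triples:
        # Count how many of the two endpoints are training entities.
        in_train = (h in train_entities) + (t in train_entities)
        if in_train == 2:
            training.append((h, r, t))
        elif in_train == 1:
            evaluation.append((h, r, t))
        # Triples between two evaluation entities are used in neither part.
    return training, evaluation, train_entities


triples = [(i, 0, (i + 1) % 10) for i in range(10)]
training, evaluation, train_entities = semi_inductive_sketch(triples, num_entities=10)
```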

tensor_to_df(tensor: Tensor, **kwargs: Tensor | ndarray | Sequence) → DataFrame

Take a tensor of triples and make a pandas dataframe with labels.

Parameters:
  • tensor (Tensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).

  • kwargs (Tensor | ndarray | Sequence) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.

Returns:

A dataframe with n rows, and 6 + len(kwargs) columns.

Return type:

DataFrame

to_path_binary(path: str | Path | TextIO) → Path

Save triples factory to path in (PyTorch’s .pt) binary format.

Parameters:

path (str | Path | TextIO) – The path to store the triples factory to.

Returns:

The path to the file that was written.

Return type:

Path

with_labels(entity_to_id: Mapping[str, int], relation_to_id: Mapping[str, int]) → TriplesFactory

Add labeling to the TriplesFactory.

Parameters:
  • entity_to_id (Mapping[str, int])

  • relation_to_id (Mapping[str, int])

Return type:

TriplesFactory