CoreTriplesFactory

class CoreTriplesFactory(mapped_triples: Tensor | ndarray, num_entities: int, num_relations: int, create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None)

Bases: KGInfo

Create instances from ID-based triples.

Create the triples factory.

Parameters:
  • mapped_triples (Tensor | ndarray) – shape: (n, 3) A three-column matrix in which each row contains the head identifier, the relation identifier, and the tail identifier, in that order.

  • num_entities (int) – The number of entities.

  • num_relations (int) – The number of relations.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • metadata (Mapping[str, Any] | None) – Arbitrary metadata to store with the graph.

Raises:
  • TypeError – If mapped_triples has a non-integer dtype.

  • ValueError – If mapped_triples does not have shape (n, 3).
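The two documented error conditions can be illustrated with a minimal plain-Python sketch. It assumes a list of tuples stands in for the Tensor/ndarray, and the name validate_mapped_triples is hypothetical. The real class checks the array's dtype and shape as a whole rather than iterating over rows.

```python
from typing import Sequence


def validate_mapped_triples(mapped_triples: Sequence[Sequence[object]]) -> None:
    """Hypothetical check mirroring the documented TypeError and ValueError cases."""
    for row in mapped_triples:
        # Each row must hold exactly (head_id, relation_id, tail_id).
        if len(row) != 3:
            raise ValueError(f"expected shape (n, 3), got a row of length {len(row)}")
        for value in row:
            # bool subclasses int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"expected integer IDs, got {type(value).__name__}")


validate_mapped_triples([(0, 0, 1), (1, 1, 2)])  # passes silently
```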

Attributes Summary

base_file_name

num_triples

The number of triples.

triples_file_name

Methods Summary

clone_and_exchange_triples(mapped_triples[, ...])

Create a new triples factory sharing everything except the triples.

create(mapped_triples[, num_entities, ...])

Create a triples factory without any label information.

entities_to_ids(entities)

Normalize entities to IDs.

from_path_binary(path)

Load triples factory from a binary file.

get_inverse_relation_id(relation)

Get the inverse relation identifier for the given relation.

get_mask_for_relations(relations[, invert])

Get a boolean mask for triples with the given relations.

get_most_frequent_relations(n)

Get the IDs of the n most frequent relations.

iter_extra_repr()

Iterate over extra_repr components.

new_with_restriction([entities, relations, ...])

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

relations_to_ids(relations)

Normalize relations to IDs.

split([ratios, random_state, ...])

Split a triples factory into a training part and a variable number of (transductive) evaluation parts.

tensor_to_df(tensor, **kwargs)

Take a tensor of triples and make a pandas dataframe with labels.

to_path_binary(path)

Save triples factory to path in (PyTorch's .pt) binary format.

with_labels(entity_to_id, relation_to_id)

Add labeling to the TriplesFactory.

Attributes Documentation

base_file_name: ClassVar[str] = 'base.pth'
num_triples

The number of triples.

triples_file_name: ClassVar[str] = 'numeric_triples.tsv.gz'

Methods Documentation

clone_and_exchange_triples(mapped_triples: Tensor, extra_metadata: dict[str, Any] | None = None, keep_metadata: bool = True, create_inverse_triples: bool | None = None) → CoreTriplesFactory

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters:
  • mapped_triples (Tensor) – The new mapped triples.

  • extra_metadata (dict[str, Any] | None) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, and values from extra_metadata take precedence on shared keys.

  • keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.

  • create_inverse_triples (bool | None) – Change inverse triple creation flag. If None, use flag from this factory.

Returns:

The new factory.

Return type:

CoreTriplesFactory
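The metadata rules above can be sketched with plain dictionaries. merge_metadata is a hypothetical helper and is not part of PyKEEN. It follows the documented behavior: with keep_metadata=True, the two mappings are merged and extra_metadata wins on shared keys. The sketch also assumes that with keep_metadata=False only extra_metadata survives. The actual implementation may differ in detail.

```python
from typing import Any, Dict, Mapping, Optional


def merge_metadata(
    current: Mapping[str, Any],
    extra_metadata: Optional[Mapping[str, Any]] = None,
    keep_metadata: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper sketching the documented metadata rules."""
    extra = dict(extra_metadata or {})
    if not keep_metadata:
        # Assumption: the current factory's metadata is dropped entirely.
        return extra
    # Later unpacking wins, so keys from extra_metadata take precedence.
    return {**current, **extra}


print(merge_metadata({"source": "a", "version": 1}, {"version": 2}))
```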

classmethod create(mapped_triples: Tensor, num_entities: int | None = None, num_relations: int | None = None, create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None) → CoreTriplesFactory

Create a triples factory without any label information.

Parameters:
  • mapped_triples (Tensor) – shape: (n, 3) The ID-based triples.

  • num_entities (int | None) – The number of entities. If not given, inferred from mapped_triples.

  • num_relations (int | None) – The number of relations. If not given, inferred from mapped_triples.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • metadata (Mapping[str, Any] | None) – Additional metadata to store in the factory.

Returns:

A new triples factory.

Return type:

CoreTriplesFactory
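When num_entities or num_relations is omitted, a natural inference rule is "largest observed ID plus one", which treats IDs as contiguous from 0. This rule is an assumption here and is not quoted from the PyKEEN source. The following is a minimal sketch over a list of (head, relation, tail) tuples, and infer_counts is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple

Triple = Tuple[int, int, int]


def infer_counts(
    mapped_triples: Sequence[Triple],
    num_entities: Optional[int] = None,
    num_relations: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical sketch: fill in missing counts as (largest observed ID + 1)."""
    if num_entities is None:
        # Entities occur in the head (column 0) and tail (column 2) positions.
        num_entities = max(max(h, t) for h, _, t in mapped_triples) + 1
    if num_relations is None:
        # Relations occur only in column 1.
        num_relations = max(r for _, r, _ in mapped_triples) + 1
    return num_entities, num_relations
```

Under this rule, an entity whose ID is larger than every ID appearing in the triples cannot be inferred. Passing the counts explicitly avoids that problem.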

entities_to_ids(entities: Collection[int] | Collection[str]) → Collection[int]

Normalize entities to IDs.

Parameters:

entities (Collection[int] | Collection[str]) – A collection of either integer identifiers for entities or string labels for entities (which are converted automatically)

Returns:

Integer identifiers for entities

Raises:

ValueError – If the entities passed are string labels and this triples factory does not have an entity label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)

Return type:

Collection[int]
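The normalization rule can be sketched as follows. to_ids and label_to_id are hypothetical names, and the sketch assumes a plain dict plays the role of the factory's label-to-ID mapping. The same logic applies to relations_to_ids.

```python
from typing import Collection, List, Mapping, Optional, Union


def to_ids(
    items: Union[Collection[int], Collection[str]],
    label_to_id: Optional[Mapping[str, int]] = None,
) -> List[int]:
    """Hypothetical sketch of the documented normalization rules."""
    values = list(items)
    if all(isinstance(value, int) for value in values):
        # Integer identifiers are passed through unchanged.
        return values
    if label_to_id is None:
        # A label-free factory (e.g., a plain CoreTriplesFactory) cannot resolve labels.
        raise ValueError("string labels given, but no label-to-ID mapping is available")
    return [label_to_id[value] for value in values]


print(to_ids(["bob", "alice"], {"alice": 0, "bob": 1}))
```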

classmethod from_path_binary(path: str | Path | TextIO) → CoreTriplesFactory

Load triples factory from a binary file.

Parameters:

path (str | Path | TextIO) – The path, pointing to an existing PyTorch .pt file.

Returns:

The loaded triples factory.

Return type:

CoreTriplesFactory

get_inverse_relation_id(relation: int) → int

Get the inverse relation identifier for the given relation.

Parameters:

relation (int) – The relation ID.

Return type:

int

get_mask_for_relations(relations: Collection[int], invert: bool = False) → Tensor

Get a boolean mask for triples with the given relations.

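Parameters:
  • relations (Collection[int]) – The relation IDs of interest.

  • invert (bool) – Whether to invert the mask, i.e., select the triples whose relation is not among relations.
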
Return type:

Tensor
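On a list of triples instead of a Tensor, the mask reduces to a membership test on the relation column. The following is a hypothetical sketch that returns a list of booleans. mask_for_relations is not a PyKEEN function.

```python
from typing import Collection, List, Sequence, Tuple


def mask_for_relations(
    mapped_triples: Sequence[Tuple[int, int, int]],
    relations: Collection[int],
    invert: bool = False,
) -> List[bool]:
    """Hypothetical list-based analogue of get_mask_for_relations."""
    wanted = set(relations)
    # True where the triple's relation (column 1) is selected;
    # comparing with invert via != flips every entry when invert=True.
    return [(r in wanted) != invert for _, r, _ in mapped_triples]
```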

get_most_frequent_relations(n: int | float) → set[int]

Get the IDs of the n most frequent relations.

Parameters:

n (int | float) – Either the (integer) number of top relations to keep or the (float) fraction of top relations to keep.

Returns:

A set of IDs for the n most frequent relations

Raises:

TypeError – If n is neither an int nor a float

Return type:

set[int]
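The following is a minimal sketch using collections.Counter. It assumes that a float n is a fraction in (0, 1) of the distinct relations present, rounded down, and that ties are broken in first-seen order. Both points are assumptions about behavior the description above does not pin down. most_frequent_relations is a hypothetical name.

```python
from collections import Counter
from typing import Sequence, Set, Tuple, Union


def most_frequent_relations(
    mapped_triples: Sequence[Tuple[int, int, int]],
    n: Union[int, float],
) -> Set[int]:
    """Hypothetical sketch of get_most_frequent_relations on a list of triples."""
    counts = Counter(r for _, r, _ in mapped_triples)
    if isinstance(n, float):
        # Assumption: a float is a fraction of the distinct relations, rounded down.
        if not 0.0 < n < 1.0:
            raise ValueError("a float n must lie strictly between 0 and 1")
        n = int(len(counts) * n)
    elif not isinstance(n, int):
        raise TypeError("n must be an int or a float")
    # most_common sorts by count, descending; equal counts keep first-seen order.
    return {relation for relation, _ in counts.most_common(n)}
```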

iter_extra_repr() → Iterable[str]

Iterate over extra_repr components.

Return type:

Iterable[str]

new_with_restriction(entities: None | Collection[int] | Collection[str] = None, relations: None | Collection[int] | Collection[str] = None, invert_entity_selection: bool = False, invert_relation_selection: bool = False) → CoreTriplesFactory

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

Parameters:
  • entities (None | Collection[int] | Collection[str]) – The entities of interest. If None, defaults to all entities.

  • relations (None | Collection[int] | Collection[str]) – The relations of interest. If None, defaults to all relations.

  • invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.

  • invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.

Returns:

A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.

Return type:

CoreTriplesFactory
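The following sketch shows the filtering on ID-based triples and uses hypothetical names. It assumes an entity selection matches a triple only when both its head and its tail are selected, which the documentation above does not spell out. IDs are never re-indexed, which mirrors the unchanged label-to-ID mapping.

```python
from typing import Collection, List, Optional, Tuple

Triple = Tuple[int, int, int]


def restrict(
    mapped_triples: List[Triple],
    entities: Optional[Collection[int]] = None,
    relations: Optional[Collection[int]] = None,
    invert_entity_selection: bool = False,
    invert_relation_selection: bool = False,
) -> List[Triple]:
    """Hypothetical sketch of new_with_restriction on ID-based triples only."""
    entity_set = None if entities is None else set(entities)
    relation_set = None if relations is None else set(relations)
    kept = []
    for h, r, t in mapped_triples:
        if entity_set is not None:
            # Assumption: a triple matches only if both head and tail are selected.
            # Keep it when (match != invert), i.e., skip when they are equal.
            if (h in entity_set and t in entity_set) == invert_entity_selection:
                continue
        if relation_set is not None:
            if (r in relation_set) == invert_relation_selection:
                continue
        # IDs are kept as-is: no re-indexing takes place.
        kept.append((h, r, t))
    return kept
```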

relations_to_ids(relations: Collection[int] | Collection[str]) → Collection[int]

Normalize relations to IDs.

Parameters:

relations (Collection[int] | Collection[str]) – A collection of either integer identifiers for relations or string labels for relations (which are converted automatically)

Returns:

Integer identifiers for relations

Raises:

ValueError – If the relations passed are string labels and this triples factory does not have a relation label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)

Return type:

Collection[int]

split(ratios: float | Sequence[float] = 0.8, *, random_state: None | int | Generator = None, randomize_cleanup: bool = False, method: str | None = None) → list[CoreTriplesFactory]

Split a triples factory into a training part and a variable number of (transductive) evaluation parts.

Warning

This method is not suitable to create inductive splits.

Parameters:
  • ratios (float | Sequence[float]) –

    There are three options for this argument:

    1. A float strictly between 0 and 1.0. The first set of triples gets this ratio and the second gets the rest.

    2. A list of ratios, one per split and in order, e.g., [0.8, 0.1]. The final ratio can be omitted, since it is 1.0 minus the sum of the others.

    3. A list of all ratios, set explicitly and in order, e.g., [0.8, 0.1, 0.1], summing to 1.0.

  • random_state (None | int | Generator) – The random state used to shuffle and split the triples.

  • randomize_cleanup (bool) – This parameter is forwarded to the underlying pykeen.triples.splitting.split().

  • method (str | None) – This parameter is forwarded to the underlying pykeen.triples.splitting.split().

Returns:

A partition of the triples, split (approximately) according to the ratios and stored in triples factories that share everything else with this root triples factory.

Return type:

list[CoreTriplesFactory]

```python
# `factory` is assumed to be an existing CoreTriplesFactory.
ratio = 0.8  # makes a [0.8, 0.2] split
training_factory, testing_factory = factory.split(ratio)

ratios = [0.8, 0.1]  # makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)

ratios = [0.8, 0.1, 0.1]  # also makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)
```

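The three ways of passing ratios, and one way to turn them into absolute split sizes, can be sketched in plain Python. normalize_ratios and split_sizes are hypothetical helpers. In particular, rounding every part except the last one down and giving the remainder to the last part is an assumption. Rounding of this kind is why the split sizes match the ratios only approximately.

```python
from typing import List, Sequence, Union


def normalize_ratios(ratios: Union[float, Sequence[float]], epsilon: float = 1e-6) -> List[float]:
    """Hypothetical sketch of the three documented ways to pass ratios."""
    # Option 1: a single float r is treated as the list [r].
    parts = [ratios] if isinstance(ratios, float) else list(ratios)
    total = sum(parts)
    if total < 1.0 - epsilon:
        # Options 1 and 2: the omitted last ratio is the remainder.
        parts.append(1.0 - total)
    elif total > 1.0 + epsilon:
        raise ValueError(f"ratios sum to {total}, which exceeds 1.0")
    # Option 3: ratios that already sum to 1.0 are returned unchanged.
    return parts


def split_sizes(num_triples: int, ratios: Sequence[float]) -> List[int]:
    """Hypothetical: round every part but the last down; the last takes the rest."""
    sizes = [int(ratio * num_triples) for ratio in ratios[:-1]]
    sizes.append(num_triples - sum(sizes))
    return sizes


print(split_sizes(100, normalize_ratios([0.8, 0.1])))
```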
tensor_to_df(tensor: Tensor, **kwargs: Tensor | ndarray | Sequence) → DataFrame

Take a tensor of triples and make a pandas dataframe with labels.

Parameters:
  • tensor (Tensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).

  • kwargs (Tensor | ndarray | Sequence) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.

Returns:

A dataframe with n rows, and 6 + len(kwargs) columns.

Return type:

DataFrame
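Without pandas, the table this method describes can be approximated as a list of row dictionaries. triples_to_rows is hypothetical, and the labels are passed in as plain dicts. Rejecting reserved or mis-sized extra columns with a ValueError is also an assumption. The real method returns a DataFrame.

```python
from typing import Any, Dict, List, Mapping, Sequence, Tuple

RESERVED = {"head_id", "head_label", "relation_id", "relation_label", "tail_id", "tail_label"}


def triples_to_rows(
    triples: Sequence[Tuple[int, int, int]],
    entity_labels: Mapping[int, str],
    relation_labels: Mapping[int, str],
    **kwargs: Sequence[Any],
) -> List[Dict[str, Any]]:
    """Hypothetical, pandas-free sketch of the table tensor_to_df describes."""
    for name, column in kwargs.items():
        # Assumption: reserved names and wrong-length columns are rejected.
        if name in RESERVED:
            raise ValueError(f"column name {name!r} is reserved")
        if len(column) != len(triples):
            raise ValueError(f"column {name!r} must have length {len(triples)}")
    rows = []
    for i, (h, r, t) in enumerate(triples):
        row = {
            "head_id": h, "head_label": entity_labels[h],
            "relation_id": r, "relation_label": relation_labels[r],
            "tail_id": t, "tail_label": entity_labels[t],
        }
        # Extra columns contribute one value per triple.
        row.update({name: column[i] for name, column in kwargs.items()})
        rows.append(row)
    return rows
```

Each row therefore has 6 + len(kwargs) entries, matching the column count stated above.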

to_path_binary(path: str | Path | TextIO) → Path

Save triples factory to path in (PyTorch’s .pt) binary format.

Parameters:

path (str | Path | TextIO) – The path to store the triples factory to.

Returns:

The path to the file that was written

Return type:

Path

with_labels(entity_to_id: Mapping[str, int], relation_to_id: Mapping[str, int]) → TriplesFactory

Add labeling to the TriplesFactory.

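Parameters:
  • entity_to_id (Mapping[str, int]) – A mapping from entity labels to entity IDs.

  • relation_to_id (Mapping[str, int]) – A mapping from relation labels to relation IDs.
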
Return type:

TriplesFactory