TriplesFactory

class TriplesFactory(mapped_triples: Tensor | ndarray, entity_to_id: Mapping[str, int], relation_to_id: Mapping[str, int], create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None, num_entities: int | None = None, num_relations: int | None = None)[source]

Bases: CoreTriplesFactory

Create instances given the path to triples.

Create the triples factory.

Parameters:
  • mapped_triples (Tensor | ndarray) – shape: (n, 3). A three-column matrix where each row contains the head identifier, relation identifier, and tail identifier.

  • entity_to_id (Mapping[str, int]) – The mapping from entities’ labels to their indices.

  • relation_to_id (Mapping[str, int]) – The mapping from relations’ labels to their indices.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • metadata (Mapping[str, Any] | None) – Arbitrary metadata to go with the graph.

  • num_entities (int | None) – The number of entities. If None, the number is inferred from the label mapping.

  • num_relations (int | None) – The number of relations. If None, the number is inferred from the label mapping.

Raises:

ValueError – If the explicitly provided number of entities or relations does not match the number given by the label mapping.
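The interplay between num_entities (or num_relations) and the label mapping can be sketched in plain Python. This is not the library's code: the helper name resolve_count is hypothetical, and the rule that the mapping implies "largest ID + 1" elements is an assumption made for illustration.

```python
from typing import Mapping, Optional


def resolve_count(explicit: Optional[int], label_to_id: Mapping[str, int], kind: str) -> int:
    """Return the element count, validating an explicit value against the mapping."""
    # Assumption: the mapping implies "largest ID + 1" elements.
    inferred = max(label_to_id.values()) + 1 if label_to_id else 0
    if explicit is None:
        # Nothing was provided, so fall back to the mapping.
        return inferred
    if explicit != inferred:
        raise ValueError(
            f"num_{kind}={explicit} does not match the label mapping ({inferred})"
        )
    return explicit


entity_to_id = {"berlin": 0, "germany": 1, "paris": 2}
num_entities = resolve_count(None, entity_to_id, "entities")  # inferred from the mapping
```

Passing an explicit value that disagrees with the mapping, such as resolve_count(5, entity_to_id, "entities"), raises a ValueError in this sketch.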

Attributes Summary

entity_id_to_label

Return the mapping from entity IDs to labels.

entity_to_id

Return the mapping from entity labels to IDs.

file_name_entity_to_id

file_name_relation_to_id

relation_id_to_label

Return the mapping from relation IDs to labels.

relation_to_id

Return the mapping from relation labels to IDs.

triples

The labeled triples, a 3-column matrix where each row contains the head label, relation label, and tail label.

Methods Summary

apply_condenser(condenser)

Apply the triple condenser.

clone_and_exchange_triples(mapped_triples[, ...])

Create a new triples factory sharing everything except the triples.

entities_to_ids(entities)

Normalize entities to IDs.

entity_word_cloud([top])

Make a word cloud based on the frequency of occurrence of each entity in a Jupyter notebook.

from_labeled_triples(triples, *[, ...])

Create a new triples factory from label-based triples.

from_path(path, *[, create_inverse_triples, ...])

Create a new triples factory from triples stored in a file.

get_inverse_relation_id(relation)

Get the inverse relation identifier for the given relation.

get_mask_for_relations(relations[, invert])

Get a boolean mask for triples with the given relations.

label_triples(triples[, ...])

Convert ID-based triples to label-based ones.

map_triples(triples)

Convert label-based triples to ID-based triples.

merge(*others)

Merge the triples factory with others.

new_with_restriction([entities, relations, ...])

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

relation_word_cloud([top])

Make a word cloud based on the frequency of occurrence of each relation in a Jupyter notebook.

relations_to_ids(relations)

Normalize relations to IDs.

tensor_to_df(tensor, **kwargs)

Take a tensor of triples and make a pandas dataframe with labels.

to_core_triples_factory()

Return this factory as a core factory.

to_path_binary(path)

Save triples factory to path in (PyTorch's .pt) binary format.

Attributes Documentation

entity_id_to_label

Return the mapping from entity IDs to labels.

entity_to_id

Return the mapping from entity labels to IDs.

file_name_entity_to_id: ClassVar[str] = 'entity_to_id'
file_name_relation_to_id: ClassVar[str] = 'relation_to_id'
relation_id_to_label

Return the mapping from relation IDs to labels.

relation_to_id

Return the mapping from relation labels to IDs.

triples

The labeled triples, a 3-column matrix where each row contains the head label, relation label, and tail label.

Methods Documentation

apply_condenser(condenser: TripleCondenser) Self[source]

Apply the triple condenser.

Parameters:

condenser (TripleCondenser) – The condenser.

Return type:

Self

Warning

This creates a triples factory that may have a new entity-to-ID or relation-to-ID mapping.

Returns:

A condensed version with potentially smaller num_entities or num_relations.

clone_and_exchange_triples(mapped_triples: Tensor, extra_metadata: dict[str, Any] | None = None, keep_metadata: bool = True, create_inverse_triples: bool | None = None) Self[source]

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters:
  • mapped_triples (Tensor) – The new mapped triples.

  • extra_metadata (dict[str, Any] | None) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, and values from extra_metadata take precedence for duplicate keys.

  • keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.

  • create_inverse_triples (bool | None) – Overrides the inverse triple creation flag. If None, the flag from this factory is used.

Returns:

The new factory.

Return type:

Self
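The metadata handling described for extra_metadata and keep_metadata can be illustrated with ordinary dictionaries. The function combine_metadata below is a hypothetical helper written for this example, not part of the library; it only mirrors the documented precedence rule.

```python
from typing import Any, Dict, Optional


def combine_metadata(
    current: Dict[str, Any],
    extra: Optional[Dict[str, Any]] = None,
    keep_metadata: bool = True,
) -> Dict[str, Any]:
    """Merge metadata so that keys from `extra` override keys from `current`."""
    merged: Dict[str, Any] = {}
    if keep_metadata:
        # Shallow copy of the existing entries; the input dict is left untouched.
        merged.update(current)
    if extra:
        # Applied last, so duplicate keys take their value from `extra`.
        merged.update(extra)
    return merged


current = {"path": "train.tsv", "split": "train"}
merged = combine_metadata(current, {"split": "validation"})
```

With keep_metadata=False, only the entries of extra_metadata would survive in this sketch.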

entities_to_ids(entities: Iterable[int] | Iterable[str]) Sequence[int][source]

Normalize entities to IDs.

It raises a TypeError if the factory does not support the given data type, e.g. you cannot use str with CoreTriplesFactory.

Parameters:

entities (Iterable[int] | Iterable[str]) – A sequence of either integer identifiers for entities or string labels for entities (that will get auto-converted)

Returns:

Integer identifiers for entities, in the same order.

Return type:

Sequence[int]
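As a rough illustration of what "normalizing" means here, the following plain-Python sketch converts string labels through a label-to-ID mapping and passes integers through unchanged. The helper to_ids is hypothetical and is not the library's implementation.

```python
from typing import Iterable, List, Mapping, Union


def to_ids(items: Iterable[Union[int, str]], label_to_id: Mapping[str, int]) -> List[int]:
    """Return integer IDs in input order, looking up string labels in the mapping."""
    ids: List[int] = []
    for item in items:
        if isinstance(item, int):
            ids.append(item)  # already an ID
        elif isinstance(item, str):
            ids.append(label_to_id[item])  # label, converted via the mapping
        else:
            raise TypeError(f"unsupported identifier type: {type(item).__name__}")
    return ids


entity_to_id = {"berlin": 0, "germany": 1, "paris": 2}
ids_from_labels = to_ids(["paris", "berlin"], entity_to_id)
ids_from_ints = to_ids([1, 0], entity_to_id)
```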

entity_word_cloud(top: int | None = None)[source]

Make a word cloud based on the frequency of occurrence of each entity in a Jupyter notebook.

Parameters:

top (int | None) – The number of top entities to show. Defaults to 100.

Returns:

A word cloud object for a Jupyter notebook

Warning

This function requires the wordcloud package. Use pip install pykeen[wordcloud] to install it.

classmethod from_labeled_triples(triples: ndarray, *, create_inverse_triples: bool = False, entity_to_id: Mapping[str, int] | None = None, relation_to_id: Mapping[str, int] | None = None, compact_id: bool = True, filter_out_candidate_inverse_relations: bool = True, metadata: dict[str, Any] | None = None) Self[source]

Create a new triples factory from label-based triples.

Parameters:
  • triples (ndarray) – shape: (n, 3), dtype: str The label-based triples.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • entity_to_id (Mapping[str, int] | None) – The mapping from entity labels to ID. If None, create a new one from the triples.

  • relation_to_id (Mapping[str, int] | None) – The mapping from relation labels to ID. If None, create a new one from the triples.

  • compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.

  • filter_out_candidate_inverse_relations (bool) – Whether to remove triples with relations with the inverse suffix.

  • metadata (dict[str, Any] | None) – Arbitrary key/value pairs to store as metadata

Returns:

A new triples factory.

Return type:

Self
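The steps implied by these parameters (dropping candidate inverse relations, creating mappings when none are given, and converting labels to IDs) can be sketched without the library. Everything below is illustrative: the "_inverse" suffix, the sorted assignment of consecutive IDs, and both helper names are assumptions rather than confirmed library behavior.

```python
from typing import Dict, List, Sequence, Tuple

LabeledTriple = Tuple[str, str, str]

INVERSE_SUFFIX = "_inverse"  # assumption: suffix that marks candidate inverse relations


def build_mappings(triples: Sequence[LabeledTriple]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Assign consecutive IDs to entity and relation labels (sorted order is an assumption)."""
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    entity_to_id = {label: i for i, label in enumerate(entities)}
    relation_to_id = {label: i for i, label in enumerate(relations)}
    return entity_to_id, relation_to_id


def map_labeled_triples(
    triples: Sequence[LabeledTriple],
    entity_to_id: Dict[str, int],
    relation_to_id: Dict[str, int],
) -> List[Tuple[int, int, int]]:
    """Convert label-based triples to ID-based triples."""
    return [(entity_to_id[h], relation_to_id[r], entity_to_id[t]) for h, r, t in triples]


labeled = [
    ("berlin", "capital_of", "germany"),
    ("paris", "capital_of", "france"),
    ("germany", "capital_of_inverse", "berlin"),
    ("france", "borders", "germany"),
]
# Mirrors filter_out_candidate_inverse_relations=True.
kept = [t for t in labeled if not t[1].endswith(INVERSE_SUFFIX)]
entity_to_id, relation_to_id = build_mappings(kept)
mapped = map_labeled_triples(kept, entity_to_id, relation_to_id)
```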

classmethod from_path(path: str | Path | TextIO, *, create_inverse_triples: bool = False, entity_to_id: Mapping[str, int] | None = None, relation_to_id: Mapping[str, int] | None = None, compact_id: bool = True, metadata: dict[str, Any] | None = None, load_triples_kwargs: Mapping[str, Any] | None = None, **kwargs) Self[source]

Create a new triples factory from triples stored in a file.

Parameters:
  • path (str | Path | TextIO) – The path where the label-based triples are stored.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • entity_to_id (Mapping[str, int] | None) – The mapping from entity labels to ID. If None, create a new one from the triples.

  • relation_to_id (Mapping[str, int] | None) – The mapping from relation labels to ID. If None, create a new one from the triples.

  • compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.

  • metadata (dict[str, Any] | None) – Arbitrary key/value pairs to store as metadata with the triples factory. Do not include path as a key because it is automatically taken from the path kwarg to this function.

  • load_triples_kwargs (Mapping[str, Any] | None) – Optional keyword arguments to pass to load_triples(). Could include the delimiter or a column_remapping.

  • kwargs – additional keyword-based parameters, which are ignored.

Returns:

A new triples factory.

Return type:

Self
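To make the delimiter and column_remapping options more concrete, here is a standard-library sketch of reading label-based triples from a text stream. It does not call load_triples(); the helper read_triples is hypothetical, and it assumes that column_remapping lists, for the head, relation, and tail positions in turn, the index of the input column to read.

```python
import csv
import io
from typing import List, Optional, Sequence, TextIO, Tuple


def read_triples(
    file: TextIO,
    delimiter: str = "\t",
    column_remapping: Optional[Sequence[int]] = None,
) -> List[Tuple[str, ...]]:
    """Read one triple per line, optionally reordering the columns."""
    rows = [tuple(row) for row in csv.reader(file, delimiter=delimiter) if row]
    if column_remapping is not None:
        # Assumed semantics: output column j is taken from input column column_remapping[j].
        rows = [tuple(row[i] for i in column_remapping) for row in rows]
    return rows


# Input stored as (head, tail, relation), separated by commas.
text = io.StringIO("berlin,germany,capital_of\nparis,france,capital_of\n")
triples = read_triples(text, delimiter=",", column_remapping=[0, 2, 1])
```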

get_inverse_relation_id(relation: str | int) int[source]

Get the inverse relation identifier for the given relation.

Parameters:

relation (str | int)

Return type:

int

get_mask_for_relations(relations: Collection[int] | Collection[str], invert: bool = False) Tensor[source]

Get a boolean mask for triples with the given relations.

Parameters:
  • relations (Collection[int] | Collection[str])

  • invert (bool)

Return type:

Tensor
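The idea of a boolean mask over triples can be shown with plain lists in place of tensors. The function relation_mask is a hypothetical stand-in that takes relation IDs only; it is a sketch of the concept, not the method's implementation.

```python
from typing import Collection, List, Sequence, Tuple


def relation_mask(
    mapped_triples: Sequence[Tuple[int, int, int]],
    relation_ids: Collection[int],
    invert: bool = False,
) -> List[bool]:
    """Return one boolean per triple: True if its relation is selected."""
    wanted = set(relation_ids)
    # `!= invert` flips every entry when invert is True.
    return [(r in wanted) != invert for _, r, _ in mapped_triples]


mapped_triples = [(0, 1, 2), (3, 1, 1), (1, 0, 2)]
mask = relation_mask(mapped_triples, [1])
inverted = relation_mask(mapped_triples, [1], invert=True)
selected = [t for t, keep in zip(mapped_triples, mask) if keep]
```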

label_triples(triples: Tensor, unknown_entity_label: str = '[UNKNOWN]', unknown_relation_label: str | None = None) ndarray[source]

Convert ID-based triples to label-based ones.

Parameters:
  • triples (Tensor) – The ID-based triples.

  • unknown_entity_label (str) – The label to use for unknown entity IDs.

  • unknown_relation_label (str | None) – The label to use for unknown relation IDs.

Returns:

The same triples, but labeled.

Return type:

ndarray
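A plain-Python sketch of the labeling step, including the fallback labels for unknown IDs, might look as follows. The helper label_id_triples is hypothetical, and reusing the entity fallback when unknown_relation_label is None is an assumption about the default behavior.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def label_id_triples(
    triples: Sequence[Tuple[int, int, int]],
    entity_id_to_label: Dict[int, str],
    relation_id_to_label: Dict[int, str],
    unknown_entity_label: str = "[UNKNOWN]",
    unknown_relation_label: Optional[str] = None,
) -> List[Tuple[str, str, str]]:
    """Replace IDs by labels, substituting a fallback label for unknown IDs."""
    if unknown_relation_label is None:
        # Assumption: fall back to the entity label when no relation label is given.
        unknown_relation_label = unknown_entity_label
    return [
        (
            entity_id_to_label.get(h, unknown_entity_label),
            relation_id_to_label.get(r, unknown_relation_label),
            entity_id_to_label.get(t, unknown_entity_label),
        )
        for h, r, t in triples
    ]


entity_id_to_label = {0: "berlin", 1: "germany"}
relation_id_to_label = {0: "capital_of"}
labeled = label_id_triples(
    [(0, 0, 1), (0, 0, 7), (1, 5, 0)], entity_id_to_label, relation_id_to_label
)
```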

map_triples(triples: ndarray) Tensor[source]

Convert label-based triples to ID-based triples.

Parameters:

triples (ndarray)

Return type:

Tensor

merge(*others: Self) Self[source]

Merge the triples factory with others.

The other triples factories have to be compatible.

Parameters:

others (Self) – The other factories.

Returns:

A new factory with the combined triples.

Raises:

ValueError – If any of the other factories has incompatible settings (number of entities or relations, or creation of inverse triples).

Return type:

Self

new_with_restriction(entities: None | Collection[int] | Collection[str] = None, relations: None | Collection[int] | Collection[str] = None, invert_entity_selection: bool = False, invert_relation_selection: bool = False) Self[source]

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

Parameters:
  • entities (None | Collection[int] | Collection[str]) – The entities of interest. If None, defaults to all entities.

  • relations (None | Collection[int] | Collection[str]) – The relations of interest. If None, defaults to all relations.

  • invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.

  • invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.

Returns:

A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.

Return type:

Self
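The selection logic can be approximated on a list of ID-based triples. This sketch rests on two assumptions that the text above does not spell out: a triple is kept only if both its head and its tail satisfy the entity criterion, and inverting the entity selection means that neither head nor tail is among the given entities. The function restrict is hypothetical and accepts IDs only.

```python
from typing import Collection, List, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]


def restrict(
    mapped_triples: Sequence[Triple],
    entities: Optional[Collection[int]] = None,
    relations: Optional[Collection[int]] = None,
    invert_entity_selection: bool = False,
    invert_relation_selection: bool = False,
) -> List[Triple]:
    """Keep only the triples matching the entity and relation selections; IDs are unchanged."""
    entity_set = None if entities is None else set(entities)
    relation_set = None if relations is None else set(relations)
    kept: List[Triple] = []
    for h, r, t in mapped_triples:
        if entity_set is not None:
            # Assumption: head and tail must both pass the (possibly inverted) test.
            if not all((e in entity_set) != invert_entity_selection for e in (h, t)):
                continue
        if relation_set is not None:
            if (r in relation_set) == invert_relation_selection:
                continue
        kept.append((h, r, t))
    return kept


mapped_triples = [(0, 1, 2), (3, 1, 1), (1, 0, 2)]
only_entities = restrict(mapped_triples, entities=[0, 2])
without_relation = restrict(mapped_triples, relations=[1], invert_relation_selection=True)
without_entity = restrict(mapped_triples, entities=[0], invert_entity_selection=True)
```

Because the triples keep their original IDs, the same label-to-ID mappings remain valid for the restricted set.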

relation_word_cloud(top: int | None = None)[source]

Make a word cloud based on the frequency of occurrence of each relation in a Jupyter notebook.

Parameters:

top (int | None) – The number of top relations to show. Defaults to 100.

Returns:

A word cloud object for a Jupyter notebook

Warning

This function requires the wordcloud package. Use pip install pykeen[wordcloud] to install it.

relations_to_ids(relations: Iterable[int] | Iterable[str]) Sequence[int][source]

Normalize relations to IDs.

It raises a TypeError if the factory does not support the given data type, e.g. you cannot use str with CoreTriplesFactory.

Parameters:

relations (Iterable[int] | Iterable[str]) – A sequence of either integer identifiers for relations or string labels for relations (that will get auto-converted)

Returns:

Integer identifiers for relations, in the same order.

Return type:

Sequence[int]

tensor_to_df(tensor: Tensor, **kwargs: Tensor | ndarray | Sequence) DataFrame[source]

Take a tensor of triples and make a pandas dataframe with labels.

Parameters:
  • tensor (Tensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).

  • kwargs (Tensor | ndarray | Sequence) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.

Returns:

A dataframe with n rows, and 6 + len(kwargs) columns.

Return type:

DataFrame
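The "6 + len(kwargs) columns" layout can be pictured with a list of dictionaries standing in for the dataframe. The helper triples_to_rows is hypothetical; the column names come from the reserved set listed above, while their order and the error raised on a name clash are assumptions.

```python
from typing import Any, Dict, List, Sequence, Tuple

RESERVED = {"head_id", "head_label", "relation_id", "relation_label", "tail_id", "tail_label"}


def triples_to_rows(
    triples: Sequence[Tuple[int, int, int]],
    entity_id_to_label: Dict[int, str],
    relation_id_to_label: Dict[int, str],
    **columns: Sequence[Any],
) -> List[Dict[str, Any]]:
    """Build one record per triple with IDs, labels, and any extra columns."""
    clash = RESERVED.intersection(columns)
    if clash:
        # Assumption: reserved names must not be reused for extra columns.
        raise ValueError(f"reserved column names used: {sorted(clash)}")
    rows: List[Dict[str, Any]] = []
    for i, (h, r, t) in enumerate(triples):
        row: Dict[str, Any] = {
            "head_id": h,
            "head_label": entity_id_to_label[h],
            "relation_id": r,
            "relation_label": relation_id_to_label[r],
            "tail_id": t,
            "tail_label": entity_id_to_label[t],
        }
        for name, values in columns.items():
            row[name] = values[i]  # each extra column has one value per triple
        rows.append(row)
    return rows


rows = triples_to_rows(
    [(0, 0, 1)], {0: "berlin", 1: "germany"}, {0: "capital_of"}, score=[0.9]
)
```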

to_core_triples_factory() CoreTriplesFactory[source]

Return this factory as a core factory.

Return type:

CoreTriplesFactory

to_path_binary(path: str | Path | TextIO) Path[source]

Save triples factory to path in (PyTorch’s .pt) binary format.

Parameters:

path (str | Path | TextIO) – The path to store the triples factory to.

Returns:

The path to the file that was written.

Return type:

Path