TriplesFactory

class TriplesFactory(mapped_triples: Tensor | ndarray, entity_to_id: Mapping[str, int], relation_to_id: Mapping[str, int], create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None, num_entities: int | None = None, num_relations: int | None = None)

Bases: CoreTriplesFactory

Create instances given the path to triples.

Create the triples factory.

Parameters:

mapped_triples (Tensor | ndarray) – shape: (n, 3). A three-column matrix where each row holds the head identifier, relation identifier, then tail identifier.
entity_to_id (Mapping[str, int]) – The mapping from entities’ labels to their indices.
relation_to_id (Mapping[str, int]) – The mapping from relations’ labels to their indices.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Mapping[str, Any] | None) – Arbitrary metadata to go with the graph.
num_entities (int | None) – The number of entities. If None, the number is inferred from the label mapping.
num_relations (int | None) – The number of relations. If None, the number is inferred from the label mapping.

Raises:

ValueError – If the explicitly provided number of entities or relations does not match the number given by the label mapping.
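
The count check described above can be pictured with a small plain-Python sketch. This is not PyKEEN's code: `resolve_count` is a hypothetical helper, and the sketch assumes the inferred number is simply the size of the label mapping.

```python
from __future__ import annotations

from collections.abc import Mapping


def resolve_count(label_to_id: Mapping[str, int], explicit: int | None, name: str) -> int:
    """Return the explicit count, or infer it from the mapping (hypothetical helper)."""
    inferred = len(label_to_id)  # assumption: the mapping's size is the inferred count
    if explicit is None:
        return inferred
    if explicit != inferred:
        raise ValueError(
            f"num_{name}={explicit} does not match the label mapping ({inferred})"
        )
    return explicit


entity_to_id = {"berlin": 0, "germany": 1, "paris": 2}
num_entities = resolve_count(entity_to_id, None, "entities")
```

Passing an explicit count that agrees with the mapping returns it unchanged; a mismatching count raises ValueError.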

Attributes Summary

`entity_id_to_label`	Return the mapping from entity IDs to labels.
`entity_to_id`	Return the mapping from entity labels to IDs.
`file_name_entity_to_id`
`file_name_relation_to_id`
`relation_id_to_label`	Return the mapping from relation IDs to labels.
`relation_to_id`	Return the mapping from relation labels to IDs.
`triples`	The labeled triples, a 3-column matrix where each row holds the head label, relation label, then tail label.

Methods Summary

`apply_condenser`(condenser)	Apply the triple condenser.
`clone_and_exchange_triples`(mapped_triples[, ...])	Create a new triples factory sharing everything except the triples.
`entities_to_ids`(entities)	Normalize entities to IDs.
`entity_word_cloud`([top])	Make a word cloud based on the frequency of occurrence of each entity in a Jupyter notebook.
`from_labeled_triples`(triples, *[, ...])	Create a new triples factory from label-based triples.
`from_path`(path, *[, create_inverse_triples, ...])	Create a new triples factory from triples stored in a file.
`get_inverse_relation_id`(relation)	Get the inverse relation identifier for the given relation.
`get_mask_for_relations`(relations[, invert])	Get a boolean mask for triples with the given relations.
`label_triples`(triples[, ...])	Convert ID-based triples to label-based ones.
`map_triples`(triples)	Convert label-based triples to ID-based triples.
`merge`(*others)	Merge the triples factory with others.
`new_with_restriction`([entities, relations, ...])	Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.
`relation_word_cloud`([top])	Make a word cloud based on the frequency of occurrence of each relation in a Jupyter notebook.
`relations_to_ids`(relations)	Normalize relations to IDs.
`tensor_to_df`(tensor, **kwargs)	Take a tensor of triples and make a pandas dataframe with labels.
`to_core_triples_factory`()	Return this factory as a core factory.
`to_path_binary`(path)	Save triples factory to path in (PyTorch's .pt) binary format.

Attributes Documentation

entity_id_to_label: Return the mapping from entity IDs to labels.

entity_to_id: Return the mapping from entity labels to IDs.

file_name_entity_to_id: ClassVar[str] = 'entity_to_id'

file_name_relation_to_id: ClassVar[str] = 'relation_to_id'

relation_id_to_label: Return the mapping from relations IDs to labels.

relation_to_id: Return the mapping from relations labels to IDs.

triples: The labeled triples, a 3-column matrix where each row are the head label, relation label, then tail label.

Methods Documentation

apply_condenser(condenser: TripleCondenser) → Self

Apply the triple condenser.

Parameters: condenser (TripleCondenser) – The condenser.

Warning

This creates a triples factory that may have a new entity-to-ID or relation-to-ID mapping.

Returns: A condensed version with potentially smaller num_entities or num_relations.
Return type: Self

clone_and_exchange_triples(mapped_triples: Tensor, extra_metadata: dict[str, Any] | None = None, keep_metadata: bool = True, create_inverse_triples: bool | None = None) → Self

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters:

mapped_triples (Tensor) – The new mapped triples.
extra_metadata (dict[str, Any] | None) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, and keys from extra_metadata take precedence.
keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.
create_inverse_triples (bool | None) – Change inverse triple creation flag. If None, use flag from this factory.

Returns:

The new factory.

Return type:

Self
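
The interaction between keep_metadata and extra_metadata can be sketched in plain Python. `combine_metadata` is a hypothetical helper written for illustration only; it mirrors the documented rule (keys from extra_metadata win) rather than PyKEEN's internals.

```python
from __future__ import annotations

from typing import Any


def combine_metadata(
    current: dict[str, Any],
    extra: dict[str, Any] | None = None,
    keep_metadata: bool = True,
) -> dict[str, Any]:
    """Merge metadata so that keys from ``extra`` take precedence (hypothetical helper)."""
    # A shallow copy, so the original dictionary is left untouched.
    merged = dict(current) if keep_metadata else {}
    merged.update(extra or {})
    return merged


current = {"path": "train.tsv", "split": "train"}
kept = combine_metadata(current, {"split": "validation"})
dropped = combine_metadata(current, {"split": "validation"}, keep_metadata=False)
```

Here `kept` retains the "path" key and takes "split" from the extra metadata, while `dropped` contains only the extra metadata.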

entities_to_ids(entities: Iterable[int] | Iterable[str]) → Sequence[int]

Normalize entities to IDs.

It raises a TypeError if the factory does not support the given data type, e.g. you cannot use str with CoreTriplesFactory.

Parameters: entities (Iterable[int] | Iterable[str]) – A sequence of either integer identifiers for entities or string labels for entities (that will get auto-converted).
Returns: Integer identifiers for entities, in the same order.
Return type: Sequence[int]
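
A minimal sketch of this normalization, in plain Python and under stated assumptions: `normalize_to_ids` is a hypothetical helper, and passing `label_to_id=None` stands in for a factory without label mappings (such as CoreTriplesFactory).

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def normalize_to_ids(
    items: Iterable[int] | Iterable[str],
    label_to_id: Mapping[str, int] | None = None,
) -> list[int]:
    """Pass integer IDs through and look up string labels (hypothetical helper)."""
    ids = []
    for item in items:
        if isinstance(item, int):
            ids.append(item)
        elif label_to_id is None:
            # Models a factory that has no label-to-ID mapping.
            raise TypeError(f"labels are not supported here: {item!r}")
        else:
            ids.append(label_to_id[item])
    return ids


entity_to_id = {"berlin": 0, "germany": 1, "paris": 2}
ids_from_labels = normalize_to_ids(["paris", "berlin"], entity_to_id)
ids_from_ints = normalize_to_ids([1, 2], entity_to_id)
```

The output order follows the input order, as the Returns entry above states.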

entity_word_cloud(top: int | None = None)

Make a word cloud based on the frequency of occurrence of each entity in a Jupyter notebook.

Parameters: top (int | None) – The number of top entities to show. Defaults to 100.
Returns: A word cloud object for a Jupyter notebook.

Warning

This function requires the wordcloud package. Use pip install pykeen[wordcloud] to install it.

classmethod from_labeled_triples(triples: ndarray, *, create_inverse_triples: bool = False, entity_to_id: Mapping[str, int] | None = None, relation_to_id: Mapping[str, int] | None = None, compact_id: bool = True, filter_out_candidate_inverse_relations: bool = True, metadata: dict[str, Any] | None = None) → Self

Create a new triples factory from label-based triples.

Parameters:

triples (ndarray) – shape: (n, 3), dtype: str The label-based triples.
create_inverse_triples (bool) – Whether to create inverse triples.
entity_to_id (Mapping[str, int] | None) – The mapping from entity labels to ID. If None, create a new one from the triples.
relation_to_id (Mapping[str, int] | None) – The mapping from relation labels to ID. If None, create a new one from the triples.
compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.
filter_out_candidate_inverse_relations (bool) – Whether to remove triples with relations with the inverse suffix.
metadata (dict[str, Any] | None) – Arbitrary key/value pairs to store as metadata.

Returns:

A new triples factory.

Return type:

Self
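
When entity_to_id and relation_to_id are None, new mappings are created from the triples. The plain-Python sketch below shows one way such mappings could be derived and applied. Assigning IDs in sorted label order is an assumption made for the illustration, not a statement about PyKEEN's behavior.

```python
# Label-based triples in (head, relation, tail) order.
triples = [
    ("berlin", "capital_of", "germany"),
    ("paris", "capital_of", "france"),
    ("paris", "located_in", "france"),
]

# Collect the distinct labels; sorted order is an assumption of this sketch.
entities = sorted({label for h, _, t in triples for label in (h, t)})
relations = sorted({r for _, r, _ in triples})

# Consecutive IDs starting at 0.
entity_to_id = {label: i for i, label in enumerate(entities)}
relation_to_id = {label: i for i, label in enumerate(relations)}

# Convert the label-based triples to ID-based triples.
mapped_triples = [
    (entity_to_id[h], relation_to_id[r], entity_to_id[t]) for h, r, t in triples
]
```

The resulting `mapped_triples` has the (n, 3) layout that the constructor expects for its mapped_triples parameter.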

classmethod from_path(path: str | Path | TextIO, *, create_inverse_triples: bool = False, entity_to_id: Mapping[str, int] | None = None, relation_to_id: Mapping[str, int] | None = None, compact_id: bool = True, metadata: dict[str, Any] | None = None, load_triples_kwargs: Mapping[str, Any] | None = None, **kwargs) → Self

Create a new triples factory from triples stored in a file.

Parameters:

path (str | Path | TextIO) – The path where the label-based triples are stored.
create_inverse_triples (bool) – Whether to create inverse triples.
entity_to_id (Mapping[str, int] | None) – The mapping from entity labels to ID. If None, create a new one from the triples.
relation_to_id (Mapping[str, int] | None) – The mapping from relation labels to ID. If None, create a new one from the triples.
compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.
metadata (dict[str, Any] | None) – Arbitrary key/value pairs to store as metadata with the triples factory. Do not include path as a key because it is automatically taken from the path kwarg to this function.
load_triples_kwargs (Mapping[str, Any] | None) – Optional keyword arguments to pass to load_triples(). Could include the delimiter or a column_remapping.
kwargs – Additional keyword-based parameters, which are ignored.

Returns:

A new triples factory.

Return type:

Self
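
To illustrate what a delimiter and a column_remapping might do while loading, here is a standard-library sketch. `read_label_triples` is a hypothetical stand-in for the loading step, not the real load_triples(), and it assumes that column_remapping lists the file's column indices in head, relation, tail order.

```python
from __future__ import annotations

import csv
import io
from collections.abc import Sequence
from typing import TextIO


def read_label_triples(
    file: TextIO,
    delimiter: str = "\t",
    column_remapping: Sequence[int] | None = None,
) -> list[tuple[str, ...]]:
    """Read label-based triples from an open text file (hypothetical helper)."""
    rows = [row for row in csv.reader(file, delimiter=delimiter) if row]
    if column_remapping is not None:
        # Assumption: the remapping gives the positions of head, relation, tail.
        rows = [[row[i] for i in column_remapping] for row in rows]
    return [tuple(row) for row in rows]


# A comma-separated file whose columns are ordered head, tail, relation.
text = io.StringIO("berlin,germany,capital_of\nparis,france,capital_of\n")
triples = read_label_triples(text, delimiter=",", column_remapping=(0, 2, 1))
```

After remapping, every row is back in head, relation, tail order.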

get_inverse_relation_id(relation: str | int) → int

Get the inverse relation identifier for the given relation.

Parameters: relation (str | int)
Return type: int

get_mask_for_relations(relations: Collection[int] | Collection[str], invert: bool = False) → Tensor

Get a boolean mask for triples with the given relations.

Parameters:

relations (Collection[int] | Collection[str])
invert (bool)

Return type:

Tensor
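
The idea of a boolean mask over triples can be sketched with plain lists. `mask_for_relations` is a hypothetical helper that works on relation IDs only and returns a list instead of a Tensor. It assumes that invert flips every entry of the mask.

```python
from __future__ import annotations

from collections.abc import Collection


def mask_for_relations(
    mapped_triples: list[tuple[int, int, int]],
    relation_ids: Collection[int],
    invert: bool = False,
) -> list[bool]:
    """Mark triples whose relation is among ``relation_ids`` (hypothetical helper)."""
    wanted = set(relation_ids)
    # `!= invert` flips each entry when invert is True.
    return [(r in wanted) != invert for _, r, _ in mapped_triples]


mapped_triples = [(0, 0, 2), (3, 0, 1), (3, 1, 1)]
mask = mask_for_relations(mapped_triples, {1})
inverted = mask_for_relations(mapped_triples, {1}, invert=True)
```

The mask has one entry per triple, in the same order as the triples.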

label_triples(triples: Tensor, unknown_entity_label: str = '[UNKNOWN]', unknown_relation_label: str | None = None) → ndarray

Convert ID-based triples to label-based ones.

Parameters:

triples (Tensor) – The ID-based triples.
unknown_entity_label (str) – The label to use for unknown entity IDs.
unknown_relation_label (str | None) – The label to use for unknown relation IDs.

Returns:

The same triples, but labeled.

Return type:

ndarray
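
A plain-Python sketch of labeling with fallbacks for unknown IDs follows. `label_id_triples` is a hypothetical helper. Reusing the entity fallback when unknown_relation_label is None is an assumption of this sketch, since the text above does not say what None means.

```python
from __future__ import annotations

from collections.abc import Mapping


def label_id_triples(
    mapped_triples: list[tuple[int, int, int]],
    entity_id_to_label: Mapping[int, str],
    relation_id_to_label: Mapping[int, str],
    unknown_entity_label: str = "[UNKNOWN]",
    unknown_relation_label: str | None = None,
) -> list[tuple[str, str, str]]:
    """Replace IDs by labels, using fallbacks for unknown IDs (hypothetical helper)."""
    if unknown_relation_label is None:
        # Assumption: fall back to the entity placeholder.
        unknown_relation_label = unknown_entity_label
    return [
        (
            entity_id_to_label.get(h, unknown_entity_label),
            relation_id_to_label.get(r, unknown_relation_label),
            entity_id_to_label.get(t, unknown_entity_label),
        )
        for h, r, t in mapped_triples
    ]


entity_id_to_label = {0: "berlin", 1: "germany"}
relation_id_to_label = {0: "capital_of"}
labeled = label_id_triples([(0, 0, 1), (0, 7, 9)], entity_id_to_label, relation_id_to_label)
```

In the second triple, relation ID 7 and entity ID 9 are not in the mappings, so both receive the placeholder label.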

map_triples(triples: ndarray) → Tensor

Convert label-based triples to ID-based triples.

Parameters: triples (ndarray)
Return type: Tensor

merge(*others: Self) → Self

Merge the triples factory with others.

The other triples factories have to be compatible.

Parameters: others (Self) – The other factories.
Returns: A new factory with the combined triples.
Raises: ValueError – If any of the other factories has incompatible settings (number of entities or relations, or creation of inverse triples).
Return type: Self
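
The compatibility requirement can be illustrated with a toy stand-in. `TinyFactory` and `merge_factories` are hypothetical names used only for this sketch. The sketch concatenates the triples and keeps duplicates, which the text above neither confirms nor rules out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TinyFactory:
    """A toy stand-in holding only the settings that must agree (hypothetical)."""

    mapped_triples: tuple[tuple[int, int, int], ...]
    num_entities: int
    num_relations: int
    create_inverse_triples: bool = False


def merge_factories(first: TinyFactory, *others: TinyFactory) -> TinyFactory:
    """Combine triples after checking that all settings agree (hypothetical helper)."""
    settings = (first.num_entities, first.num_relations, first.create_inverse_triples)
    for other in others:
        if (other.num_entities, other.num_relations, other.create_inverse_triples) != settings:
            raise ValueError("incompatible factory settings")
    combined = first.mapped_triples + tuple(t for o in others for t in o.mapped_triples)
    return TinyFactory(combined, *settings)


train = TinyFactory(((0, 0, 1),), num_entities=3, num_relations=1)
valid = TinyFactory(((2, 0, 1),), num_entities=3, num_relations=1)
merged = merge_factories(train, valid)
```

A factory with a different number of entities or relations, or a different inverse-triples setting, makes the sketch raise ValueError.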

new_with_restriction(entities: None | Collection[int] | Collection[str] = None, relations: None | Collection[int] | Collection[str] = None, invert_entity_selection: bool = False, invert_relation_selection: bool = False) → Self

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

Parameters:

entities (None | Collection[int] | Collection[str]) – The entities of interest. If None, defaults to all entities.
relations (None | Collection[int] | Collection[str]) – The relations of interest. If None, defaults to all relations.
invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.

Returns:

A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.

Return type:

Self
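
The filtering can be sketched on ID-based triples in plain Python. `restrict` is a hypothetical helper. It assumes that head and tail are checked separately and that both must pass the (possibly inverted) entity test, which the text above does not spell out.

```python
from __future__ import annotations

from collections.abc import Collection


def restrict(
    mapped_triples: list[tuple[int, int, int]],
    entities: Collection[int] | None = None,
    relations: Collection[int] | None = None,
    invert_entity_selection: bool = False,
    invert_relation_selection: bool = False,
) -> list[tuple[int, int, int]]:
    """Keep only triples matching the selection; IDs are unchanged (hypothetical helper)."""
    kept = []
    for h, r, t in mapped_triples:
        if entities is not None:
            # Assumption: both head and tail must pass the entity test.
            if not all((e in entities) != invert_entity_selection for e in (h, t)):
                continue
        if relations is not None and (r in relations) == invert_relation_selection:
            continue
        kept.append((h, r, t))
    return kept


mapped_triples = [(0, 0, 2), (3, 0, 1), (3, 1, 1), (0, 1, 3)]
only_entities = restrict(mapped_triples, entities={1, 3})
both = restrict(mapped_triples, entities={1, 3}, relations={0})
without = restrict(mapped_triples, entities={1, 3}, invert_entity_selection=True)
```

The surviving triples keep their original IDs, matching the note that the label-to-ID mapping is not modified.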

relation_word_cloud(top: int | None = None)

Make a word cloud based on the frequency of occurrence of each relation in a Jupyter notebook.

Parameters: top (int | None) – The number of top relations to show. Defaults to 100.
Returns: A word cloud object for a Jupyter notebook.

Warning

This function requires the wordcloud package. Use pip install pykeen[wordcloud] to install it.

relations_to_ids(relations: Iterable[int] | Iterable[str]) → Sequence[int]

Normalize relations to IDs.

It raises a TypeError if the factory does not support the given data type, e.g. you cannot use str with CoreTriplesFactory.

Parameters: relations (Iterable[int] | Iterable[str]) – A sequence of either integer identifiers for relations or string labels for relations (that will get auto-converted).
Returns: Integer identifiers for relations, in the same order.
Return type: Sequence[int]

tensor_to_df(tensor: Tensor, **kwargs: Tensor | ndarray | Sequence) → DataFrame

Take a tensor of triples and make a pandas dataframe with labels.

Parameters:

tensor (Tensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Tensor | ndarray | Sequence) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.

Returns:

A dataframe with n rows, and 6 + len(kwargs) columns.

Return type:

DataFrame
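
The column layout (six fixed columns plus one per keyword argument) can be sketched without pandas. `triples_to_columns` is a hypothetical helper that returns a dictionary of columns instead of a DataFrame. Raising ValueError for a reserved name or a wrong column length is an assumption of this sketch.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence

RESERVED = ("head_id", "head_label", "relation_id", "relation_label", "tail_id", "tail_label")


def triples_to_columns(
    mapped_triples: list[tuple[int, int, int]],
    entity_id_to_label: Mapping[int, str],
    relation_id_to_label: Mapping[int, str],
    **extra_columns: Sequence,
) -> dict[str, list]:
    """Build the six fixed columns plus any extra ones (hypothetical helper)."""
    clash = set(extra_columns) & set(RESERVED)
    if clash:
        raise ValueError(f"reserved column names used: {sorted(clash)}")
    columns: dict[str, list] = {
        "head_id": [h for h, _, _ in mapped_triples],
        "head_label": [entity_id_to_label[h] for h, _, _ in mapped_triples],
        "relation_id": [r for _, r, _ in mapped_triples],
        "relation_label": [relation_id_to_label[r] for _, r, _ in mapped_triples],
        "tail_id": [t for _, _, t in mapped_triples],
        "tail_label": [entity_id_to_label[t] for _, _, t in mapped_triples],
    }
    for name, values in extra_columns.items():
        if len(values) != len(mapped_triples):
            raise ValueError(f"column {name!r} must have one value per triple")
        columns[name] = list(values)
    return columns


columns = triples_to_columns(
    [(0, 0, 1)], {0: "berlin", 1: "germany"}, {0: "capital_of"}, score=[0.9]
)
```

With one extra column, the result has 6 + 1 = 7 columns, each with one entry per triple.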

to_core_triples_factory() → CoreTriplesFactory

Return this factory as a core factory.

Return type: CoreTriplesFactory

to_path_binary(path: str | Path | TextIO) → Path

Save triples factory to path in (PyTorch’s .pt) binary format.

Parameters: path (str | Path | TextIO) – The path to store the triples factory to.
Returns: The path to the file that got dumped.
Return type: Path