TriplesFactory
- class TriplesFactory(mapped_triples: Tensor | ndarray, entity_to_id: Mapping[str, int], relation_to_id: Mapping[str, int], create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None, num_entities: int | None = None, num_relations: int | None = None)
Bases: CoreTriplesFactory
Create instances from ID-based triples and label-to-ID mappings, or from a file of label-based triples via from_path().
Create the triples factory.
- Parameters:
mapped_triples (Tensor | ndarray) – shape: (n, 3). A three-column matrix where each row contains the head ID, relation ID, and tail ID.
entity_to_id (Mapping[str, int]) – The mapping from entities’ labels to their indices.
relation_to_id (Mapping[str, int]) – The mapping from relations’ labels to their indices.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Mapping[str, Any] | None) – Arbitrary metadata to store with the graph.
num_entities (int | None) – The number of entities. If None, it is inferred from the entity label mapping.
num_relations (int | None) – The number of relations. If None, it is inferred from the relation label mapping.
- Raises:
ValueError – If the explicitly provided number of entities or relations does not match the number given by the label mapping.
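As an illustration of the layout of mapped_triples and of the count rule above, here is a minimal stdlib sketch. `resolve_count` is a hypothetical helper, not part of PyKEEN, and it assumes the inferred count is simply the number of labels in the mapping.

```python
from typing import Mapping, Optional


def resolve_count(explicit: Optional[int], label_to_id: Mapping[str, int], name: str) -> int:
    """Infer a count from a label mapping, or validate an explicitly given one (sketch)."""
    # Assumption: the inferred count is the number of labels in the mapping.
    inferred = len(label_to_id)
    if explicit is None:
        return inferred
    if explicit != inferred:
        raise ValueError(f"{name}={explicit} does not match the label mapping ({inferred} labels)")
    return explicit


# Each row of mapped_triples is (head_id, relation_id, tail_id).
entity_to_id = {"berlin": 0, "germany": 1}
relation_to_id = {"capital_of": 0}
mapped_triples = [(0, 0, 1)]  # berlin --capital_of--> germany

num_entities = resolve_count(None, entity_to_id, "num_entities")
num_relations = resolve_count(None, relation_to_id, "num_relations")
```

Passing num_entities=3 together with this two-label mapping would raise a ValueError, mirroring the Raises entry.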
Attributes Summary
entity_id_to_label – Return the mapping from entity IDs to labels.
entity_to_id – Return the mapping from entity labels to IDs.
relation_id_to_label – Return the mapping from relation IDs to labels.
relation_to_id – Return the mapping from relation labels to IDs.
triples – The labeled triples, a 3-column matrix where each row contains the head label, relation label, and tail label.
Methods Summary
apply_condenser(condenser) – Apply the triple condenser.
clone_and_exchange_triples(mapped_triples[, ...]) – Create a new triples factory sharing everything except the triples.
entities_to_ids(entities) – Normalize entities to IDs.
entity_word_cloud([top]) – Make a word cloud based on the frequency of occurrence of each entity in a Jupyter notebook.
from_labeled_triples(triples, *[, ...]) – Create a new triples factory from label-based triples.
from_path(path, *[, create_inverse_triples, ...]) – Create a new triples factory from triples stored in a file.
get_inverse_relation_id(relation) – Get the inverse relation identifier for the given relation.
get_mask_for_relations(relations[, invert]) – Get a boolean mask for triples with the given relations.
label_triples(triples[, ...]) – Convert ID-based triples to label-based ones.
map_triples(triples) – Convert label-based triples to ID-based triples.
merge(*others) – Merge the triples factory with others.
new_with_restriction([entities, relations, ...]) – Make a new triples factory keeping only the given entities and relations, but keeping the ID mapping.
relation_word_cloud([top]) – Make a word cloud based on the frequency of occurrence of each relation in a Jupyter notebook.
relations_to_ids(relations) – Normalize relations to IDs.
tensor_to_df(tensor, **kwargs) – Take a tensor of triples and make a pandas DataFrame with labels.
to_core_triples_factory() – Return this factory as a core factory.
to_path_binary(path) – Save the triples factory to path in PyTorch's binary (.pt) format.
Attributes Documentation
- entity_id_to_label
Return the mapping from entity IDs to labels.
- entity_to_id
Return the mapping from entity labels to IDs.
- relation_id_to_label
Return the mapping from relation IDs to labels.
- relation_to_id
Return the mapping from relation labels to IDs.
- triples
The labeled triples, a 3-column matrix where each row contains the head label, relation label, and tail label.
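The ID-to-label attributes are the inverses of the label-to-ID mappings, and the labeled triples follow from applying them row by row. The stdlib sketch below shows that relationship; `invert_mapping` and `to_labeled` are hypothetical helpers, not PyKEEN functions.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def invert_mapping(label_to_id: Mapping[str, int]) -> Dict[int, str]:
    """Turn a label-to-ID mapping into an ID-to-label mapping."""
    return {idx: label for label, idx in label_to_id.items()}


def to_labeled(
    mapped_triples: Sequence[Tuple[int, int, int]],
    entity_to_id: Mapping[str, int],
    relation_to_id: Mapping[str, int],
) -> List[Tuple[str, str, str]]:
    """Map each (head_id, relation_id, tail_id) row to (head, relation, tail) labels."""
    entity_id_to_label = invert_mapping(entity_to_id)
    relation_id_to_label = invert_mapping(relation_to_id)
    return [
        (entity_id_to_label[h], relation_id_to_label[r], entity_id_to_label[t])
        for h, r, t in mapped_triples
    ]


labeled = to_labeled([(0, 0, 1)], {"berlin": 0, "germany": 1}, {"capital_of": 0})
```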
Methods Documentation
- apply_condenser(condenser: TripleCondenser) → Self
Apply the triple condenser.
- Parameters:
condenser (TripleCondenser) – The condenser.
- Returns:
A condensed version with potentially smaller num_entities or num_relations.
- Return type:
Self
Warning
This creates a triples factory that may have a new entity-to-ID or relation-to-ID mapping.
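The exact strategy of a TripleCondenser is not described here. As a hedged illustration of why condensing can shrink num_entities and change the ID mapping, the sketch below drops entity IDs that no triple uses and renumbers the rest consecutively. `condense_entities` is hypothetical and is not PyKEEN's implementation.

```python
from typing import Dict, List, Mapping, Tuple

Triple = Tuple[int, int, int]


def condense_entities(
    mapped_triples: List[Triple], entity_to_id: Mapping[str, int]
) -> Tuple[List[Triple], Dict[str, int]]:
    """Drop entity IDs no triple uses and renumber the rest consecutively (hypothetical)."""
    used = sorted({h for h, _, _ in mapped_triples} | {t for _, _, t in mapped_triples})
    old_to_new = {old: new for new, old in enumerate(used)}
    new_triples = [(old_to_new[h], r, old_to_new[t]) for h, r, t in mapped_triples]
    # Labels whose ID is no longer used disappear from the mapping.
    new_entity_to_id = {
        label: old_to_new[old] for label, old in entity_to_id.items() if old in old_to_new
    }
    return new_triples, new_entity_to_id


condensed, mapping = condense_entities([(0, 0, 2)], {"a": 0, "b": 1, "c": 2})
```

Here "b" is unused, so the result has two entities and "c" moves from ID 2 to ID 1, which is why the warning about a new mapping matters.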
- clone_and_exchange_triples(mapped_triples: Tensor, extra_metadata: dict[str, Any] | None = None, keep_metadata: bool = True, create_inverse_triples: bool | None = None) → Self
Create a new triples factory sharing everything except the triples.
Note
We use shallow copies.
- Parameters:
mapped_triples (Tensor) – The new mapped triples.
extra_metadata (dict[str, Any] | None) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, with keys from extra_metadata taking precedence.
keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.
create_inverse_triples (bool | None) – Change the inverse triple creation flag. If None, use the flag from this factory.
- Returns:
The new factory.
- Return type:
Self
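The metadata and flag rules above can be sketched with plain dictionaries. `combine_metadata` and `resolve_inverse_flag` are hypothetical helpers that only model the documented behavior; the shallow dict copy echoes the note about shallow copies.

```python
from typing import Any, Dict, Mapping, Optional


def combine_metadata(
    current: Mapping[str, Any],
    extra_metadata: Optional[Mapping[str, Any]] = None,
    keep_metadata: bool = True,
) -> Dict[str, Any]:
    """Union the metadata dicts; keys from extra_metadata win on conflicts (sketch)."""
    # dict(...) makes a shallow copy: nested values are shared, not duplicated.
    combined: Dict[str, Any] = dict(current) if keep_metadata else {}
    if extra_metadata:
        combined.update(extra_metadata)
    return combined


def resolve_inverse_flag(current: bool, requested: Optional[bool]) -> bool:
    """None means: inherit the flag from the original factory."""
    return current if requested is None else requested


merged = combine_metadata({"path": "train.tsv", "split": "train"}, {"split": "subset"})
```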
- entities_to_ids(entities: Iterable[int] | Iterable[str]) → Sequence[int]
Normalize entities to IDs.
Raises a TypeError if the factory does not support the given data type; e.g., str labels cannot be used with CoreTriplesFactory.
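The normalization rule (IDs pass through, labels are looked up, labels are rejected when no label mapping exists) can be sketched as below. `normalize_to_ids` is a hypothetical helper; rejecting mixed int/str input is this sketch's own choice, not documented PyKEEN behavior. The same idea applies to relations_to_ids.

```python
from typing import Iterable, List, Mapping, Optional, Union


def normalize_to_ids(
    items: Iterable[Union[int, str]], label_to_id: Optional[Mapping[str, int]]
) -> List[int]:
    """Return IDs unchanged and translate labels via label_to_id (hypothetical helper)."""
    values = list(items)
    if all(isinstance(v, int) for v in values):
        return [int(v) for v in values]
    if all(isinstance(v, str) for v in values):
        if label_to_id is None:
            # A factory without labels (like a core factory) cannot resolve strings.
            raise TypeError("labels require a label-to-ID mapping")
        return [label_to_id[v] for v in values]
    # Sketch choice: mixed input is rejected rather than resolved element-wise.
    raise TypeError("expected only IDs or only labels")


ids = normalize_to_ids(["germany", "berlin"], {"berlin": 0, "germany": 1})
```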
- entity_word_cloud(top: int | None = None)
Make a word cloud based on the frequency of occurrence of each entity in a Jupyter notebook.
- Parameters:
top (int | None) – The number of top entities to show. Defaults to 100.
- Returns:
A word cloud object for a Jupyter notebook.
Warning
This function requires the wordcloud package. Use pip install pykeen[wordcloud] to install it.
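The frequencies behind such a word cloud can be sketched with collections.Counter. `entity_frequencies` is hypothetical, and counting an entity once per head and once per tail occurrence is an assumption about how occurrences are counted; only the default of 100 comes from the parameter description above.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def entity_frequencies(
    labeled_triples: Iterable[Tuple[str, str, str]], top: Optional[int] = None
) -> List[Tuple[str, int]]:
    """Count how often each entity occurs as head or tail; keep the `top` most frequent."""
    counts: Counter = Counter()
    for head, _, tail in labeled_triples:
        counts[head] += 1
        counts[tail] += 1
    # Mirrors the documented default: show 100 entities when top is None.
    return counts.most_common(100 if top is None else top)


frequencies = entity_frequencies([("a", "r", "b"), ("a", "r", "c")], top=1)
```

Converted to a dict, such pairs could be rendered with WordCloud.generate_from_frequencies from the wordcloud package.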
- classmethod from_labeled_triples(triples: ndarray, *, create_inverse_triples: bool = False, entity_to_id: Mapping[str, int] | None = None, relation_to_id: Mapping[str, int] | None = None, compact_id: bool = True, filter_out_candidate_inverse_relations: bool = True, metadata: dict[str, Any] | None = None) → Self
Create a new triples factory from label-based triples.
- Parameters:
triples (ndarray) – shape: (n, 3), dtype: str The label-based triples.
create_inverse_triples (bool) – Whether to create inverse triples.
entity_to_id (Mapping[str, int] | None) – The mapping from entity labels to ID. If None, create a new one from the triples.
relation_to_id (Mapping[str, int] | None) – The mapping from relations labels to ID. If None, create a new one from the triples.
compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.
filter_out_candidate_inverse_relations (bool) – Whether to remove triples with relations with the inverse suffix.
metadata (dict[str, Any] | None) – Arbitrary key/value pairs to store as metadata.
- Returns:
A new triples factory.
- Return type:
Self
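When no mappings are passed, new ones are created from the triples. The sketch below shows one way to do that with consecutive (compact) IDs and the inverse-suffix filter. `build_mappings` is hypothetical; sorting labels before numbering and the suffix string "_inverse" are assumptions, not confirmed details of PyKEEN.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: the suffix that marks candidate inverse relations.
INVERSE_SUFFIX = "_inverse"


def build_mappings(
    triples: Sequence[Tuple[str, str, str]],
    filter_out_candidate_inverse_relations: bool = True,
) -> Tuple[List[Tuple[int, int, int]], Dict[str, int], Dict[str, int]]:
    """Create compact label-to-ID mappings and map the triples (sketch)."""
    if filter_out_candidate_inverse_relations:
        triples = [tr for tr in triples if not tr[1].endswith(INVERSE_SUFFIX)]
    # Sorting the unique labels and enumerating them yields consecutive IDs 0..n-1.
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    entity_to_id = {label: i for i, label in enumerate(entities)}
    relation_to_id = {label: i for i, label in enumerate(relations)}
    mapped = [(entity_to_id[h], relation_to_id[r], entity_to_id[t]) for h, r, t in triples]
    return mapped, entity_to_id, relation_to_id


mapped, entity_to_id, relation_to_id = build_mappings([
    ("berlin", "capital_of", "germany"),
    ("paris", "capital_of", "france"),
    ("germany", "capital_of_inverse", "berlin"),  # dropped by the suffix filter
])
```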
- classmethod from_path(path: str | Path | TextIO, *, create_inverse_triples: bool = False, entity_to_id: Mapping[str, int] | None = None, relation_to_id: Mapping[str, int] | None = None, compact_id: bool = True, metadata: dict[str, Any] | None = None, load_triples_kwargs: Mapping[str, Any] | None = None, **kwargs) → Self
Create a new triples factory from triples stored in a file.
- Parameters:
path (str | Path | TextIO) – The path where the label-based triples are stored.
create_inverse_triples (bool) – Whether to create inverse triples.
entity_to_id (Mapping[str, int] | None) – The mapping from entity labels to ID. If None, create a new one from the triples.
relation_to_id (Mapping[str, int] | None) – The mapping from relations labels to ID. If None, create a new one from the triples.
compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.
metadata (dict[str, Any] | None) – Arbitrary key/value pairs to store as metadata with the triples factory. Do not include path as a key because it is automatically taken from the path kwarg to this function.
load_triples_kwargs (Mapping[str, Any] | None) – Optional keyword arguments to pass to load_triples(). Could include the delimiter or a column_remapping.
kwargs – Additional keyword-based parameters, which are ignored.
- Returns:
A new triples factory.
- Return type:
Self
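Since path may also be a TextIO, a file-like object works in place of an opened file. The sketch below reads one labeled triple per line with the csv module; `read_labeled_triples` is hypothetical, and the tab default and strict three-column layout are assumptions standing in for load_triples() and its delimiter option.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_labeled_triples(handle: TextIO, delimiter: str = "\t") -> List[Tuple[str, str, str]]:
    """Read one (head, relation, tail) triple per line (sketch; assumes exactly three columns)."""
    triples: List[Tuple[str, str, str]] = []
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        head, relation, tail = row  # raises ValueError if a line has a different column count
        triples.append((head, relation, tail))
    return triples


data = io.StringIO("berlin\tcapital_of\tgermany\nparis\tcapital_of\tfrance\n")
triples = read_labeled_triples(data)
```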
- get_inverse_relation_id(relation: str | int) → int
Get the inverse relation identifier for the given relation.
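- get_mask_for_relations(relations: Collection[int] | Collection[str], invert: bool = False) → Tensor
Get a boolean mask for triples with the given relations.
- Parameters:
relations (Collection[int] | Collection[str]) – The relations of interest, given as IDs or labels.
invert (bool) – Whether to invert the mask, i.e., select triples whose relation is not among the given ones.
- Return type:
Tensor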
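The mask has one entry per triple and checks only the relation column. This stdlib sketch uses a list of bools in place of a Tensor; `relation_mask` is a hypothetical helper that accepts relation IDs only.

```python
from typing import Collection, List, Sequence, Tuple


def relation_mask(
    mapped_triples: Sequence[Tuple[int, int, int]],
    relations: Collection[int],
    invert: bool = False,
) -> List[bool]:
    """True where a triple's relation ID is in `relations` (or not in it, if invert)."""
    selected = set(relations)
    # Comparing with invert flips every entry when invert=True.
    return [(r in selected) != invert for _, r, _ in mapped_triples]


example_triples = [(0, 0, 1), (1, 1, 2), (2, 0, 0)]
mask = relation_mask(example_triples, {0})
```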
- label_triples(triples: Tensor, unknown_entity_label: str = '[UNKNOWN]', unknown_relation_label: str | None = None) → ndarray
Convert ID-based triples to label-based ones.
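A sketch of the conversion with placeholder labels for IDs that have no label follows. `label_ids` is hypothetical, and falling back to the entity placeholder when unknown_relation_label is None is an assumption about the signature's default, not documented behavior.

```python
from typing import List, Mapping, Optional, Sequence, Tuple


def label_ids(
    mapped_triples: Sequence[Tuple[int, int, int]],
    entity_id_to_label: Mapping[int, str],
    relation_id_to_label: Mapping[int, str],
    unknown_entity_label: str = "[UNKNOWN]",
    unknown_relation_label: Optional[str] = None,
) -> List[Tuple[str, str, str]]:
    """Translate ID triples to labels, substituting placeholders for unknown IDs (sketch)."""
    if unknown_relation_label is None:
        # Assumption: fall back to the entity placeholder for unknown relations.
        unknown_relation_label = unknown_entity_label
    return [
        (
            entity_id_to_label.get(h, unknown_entity_label),
            relation_id_to_label.get(r, unknown_relation_label),
            entity_id_to_label.get(t, unknown_entity_label),
        )
        for h, r, t in mapped_triples
    ]


labels = label_ids([(0, 0, 5)], {0: "berlin"}, {0: "capital_of"})
```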
- merge(*others: Self) → Self
Merge the triples factory with others.
The other triples factories have to be compatible.
- Parameters:
others (Self) – The other factories.
- Returns:
A new factory with the combined triples.
- Raises:
ValueError – If any of the other factories has incompatible settings (number of entities or relations, or creation of inverse triples).
- Return type:
Self
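The compatibility rule can be sketched with a small stand-in class. `MiniFactory` and `merge_factories` are hypothetical; the sketch checks exactly the settings listed under Raises and then concatenates the triples.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MiniFactory:
    """Hypothetical stand-in holding only what the compatibility check needs."""
    mapped_triples: List[Tuple[int, int, int]]
    num_entities: int
    num_relations: int
    create_inverse_triples: bool = False


def merge_factories(first: MiniFactory, *others: MiniFactory) -> MiniFactory:
    settings = (first.num_entities, first.num_relations, first.create_inverse_triples)
    for other in others:
        if (other.num_entities, other.num_relations, other.create_inverse_triples) != settings:
            raise ValueError("incompatible triples factory")
    combined = list(first.mapped_triples)
    for other in others:
        combined.extend(other.mapped_triples)  # plain concatenation
    return MiniFactory(combined, *settings)


a = MiniFactory([(0, 0, 1)], num_entities=2, num_relations=1)
b = MiniFactory([(1, 0, 0)], num_entities=2, num_relations=1)
merged = merge_factories(a, b)
```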
- new_with_restriction(entities: None | Collection[int] | Collection[str] = None, relations: None | Collection[int] | Collection[str] = None, invert_entity_selection: bool = False, invert_relation_selection: bool = False) → Self
Make a new triples factory keeping only the given entities and relations, but keeping the ID mapping.
- Parameters:
entities (None | Collection[int] | Collection[str]) – The entities of interest. If None, defaults to all entities.
relations (None | Collection[int] | Collection[str]) – The relations of interest. If None, defaults to all relations.
invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.
- Returns:
A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.
- Return type:
Self
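The key point is that triples are filtered while IDs stay untouched. In the sketch below, `restrict` is hypothetical, and it assumes a triple is kept only when its head, its tail, and its relation are all in the (possibly inverted) selection; PyKEEN's exact rule may differ.

```python
from typing import Collection, List, Optional, Tuple

Triple = Tuple[int, int, int]


def restrict(
    mapped_triples: List[Triple],
    num_entities: int,
    num_relations: int,
    entities: Optional[Collection[int]] = None,
    relations: Optional[Collection[int]] = None,
    invert_entity_selection: bool = False,
    invert_relation_selection: bool = False,
) -> List[Triple]:
    """Keep triples whose head, tail and relation are all selected; IDs are left untouched."""
    all_entities, all_relations = set(range(num_entities)), set(range(num_relations))
    ent = all_entities if entities is None else set(entities)
    rel = all_relations if relations is None else set(relations)
    # Inverting a selection means keeping triples that avoid the given IDs.
    if entities is not None and invert_entity_selection:
        ent = all_entities - ent
    if relations is not None and invert_relation_selection:
        rel = all_relations - rel
    return [(h, r, t) for h, r, t in mapped_triples if h in ent and t in ent and r in rel]


example_triples = [(0, 0, 1), (1, 0, 2), (2, 1, 0)]
kept = restrict(example_triples, num_entities=3, num_relations=2, entities={0, 1})
```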
- relation_word_cloud(top: int | None = None)
Make a word cloud based on the frequency of occurrence of each relation in a Jupyter notebook.
- Parameters:
top (int | None) – The number of top relations to show. Defaults to 100.
- Returns:
A word cloud object for a Jupyter notebook.
Warning
This function requires the wordcloud package. Use pip install pykeen[wordcloud] to install it.
- relations_to_ids(relations: Iterable[int] | Iterable[str]) → Sequence[int]
Normalize relations to IDs.
Raises a TypeError if the factory does not support the given data type; e.g., str labels cannot be used with CoreTriplesFactory.
- tensor_to_df(tensor: Tensor, **kwargs: Tensor | ndarray | Sequence) → DataFrame
Take a tensor of triples and make a pandas dataframe with labels.
- Parameters:
tensor (Tensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Tensor | ndarray | Sequence) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.
- Returns:
A dataframe with n rows, and 6 + len(kwargs) columns.
- Return type:
DataFrame
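The column layout (six ID/label columns plus one per keyword argument) can be sketched without pandas by building one record per triple. `triples_to_records` is hypothetical; raising ValueError for reserved names or mismatched lengths is this sketch's choice. Passing the records to pandas.DataFrame would give a frame with 6 + len(kwargs) columns.

```python
from typing import Any, Dict, List, Mapping, Sequence, Tuple

RESERVED = {"head_id", "head_label", "relation_id", "relation_label", "tail_id", "tail_label"}


def triples_to_records(
    mapped_triples: Sequence[Tuple[int, int, int]],
    entity_id_to_label: Mapping[int, str],
    relation_id_to_label: Mapping[int, str],
    **extra_columns: Sequence[Any],
) -> List[Dict[str, Any]]:
    """One dict per triple: IDs, labels, and any extra per-row columns (sketch)."""
    clash = RESERVED.intersection(extra_columns)
    if clash:
        raise ValueError(f"reserved column names: {sorted(clash)}")
    n = len(mapped_triples)
    for name, column in extra_columns.items():
        if len(column) != n:
            raise ValueError(f"column {name!r} must have length {n}")
    records = []
    for i, (h, r, t) in enumerate(mapped_triples):
        row: Dict[str, Any] = {
            "head_id": h, "head_label": entity_id_to_label[h],
            "relation_id": r, "relation_label": relation_id_to_label[r],
            "tail_id": t, "tail_label": entity_id_to_label[t],
        }
        for name, column in extra_columns.items():
            row[name] = column[i]
        records.append(row)
    return records


records = triples_to_records([(0, 0, 1)], {0: "berlin", 1: "germany"}, {0: "capital_of"}, score=[0.9])
```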
- to_core_triples_factory() → CoreTriplesFactory
Return this factory as a core factory.
- Return type:
CoreTriplesFactory