CoreTriplesFactory

class CoreTriplesFactory(mapped_triples: Tensor | ndarray, num_entities: int, num_relations: int, create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None)[source]

Bases: KGInfo

Create instances from ID-based triples.

Create the triples factory.

Parameters:

mapped_triples (Tensor | ndarray) – shape: (n, 3) A three-column matrix where each row contains the head identifier, the relation identifier, and the tail identifier, in that order.
num_entities (int) – The number of entities.
num_relations (int) – The number of relations.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Mapping[str, Any] | None) – Arbitrary metadata to go with the graph.

Raises:

TypeError – if the mapped_triples are of non-integer dtype
ValueError – if the mapped_triples are of invalid shape
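
The two error conditions above can be illustrated with a plain-Python sketch. This is not PyKEEN's implementation: `validate_triples` is a hypothetical helper, and it works on lists of tuples rather than tensors. It only mirrors the documented contract (three columns, integer IDs).

```python
def validate_triples(rows):
    """Check that rows look like an (n, 3) matrix of integer IDs.

    Hypothetical helper for illustration; returns the number of triples.
    """
    # Shape check first: every row needs exactly three columns.
    for row in rows:
        if len(row) != 3:
            raise ValueError(f"expected 3 columns per row, got {len(row)}")
    # Type check second: every identifier must be an integer.
    for row in rows:
        if not all(isinstance(x, int) for x in row):
            raise TypeError("triples must contain integer identifiers")
    return len(rows)


num_triples = validate_triples([(0, 0, 1), (1, 1, 2)])
```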

Attributes Summary

`base_file_name`
`num_triples`	The number of triples.
`triples_file_name`

Methods Summary

`apply_condenser`(condenser)	Apply the triple condenser.
`clone_and_exchange_triples`(mapped_triples[, ...])	Create a new triples factory sharing everything except the triples.
`condense`([entities, relations])	Drop all IDs which are not present in the triples.
`create`(mapped_triples[, num_entities, ...])	Create a triples factory without any label information.
`entities_to_ids`(entities)	Normalize entities to IDs.
`from_path_binary`(path)	Load triples factory from a binary file.
`get_inverse_relation_id`(relation)	Get the inverse relation identifier for the given relation.
`get_mask_for_relations`(relations[, invert])	Get a boolean mask for triples with the given relations.
`get_most_frequent_relations`(n)	Get the IDs of the n most frequent relations.
`iter_extra_repr`()	Iterate over extra_repr components.
`make_condenser`([entities, relations])	Create a triple condenser from the factory's triples without applying it.
`merge`(*others)	Merge the triples factory with others.
`new_with_restriction`([entities, relations, ...])	Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.
`relations_to_ids`(relations)	Normalize relations to IDs.
`split`([ratios, random_state, ...])	Split a triples factory into a training part and a variable number of (transductive) evaluation parts.
`split_fully_inductive`([...])	Create a fully inductive split.
`split_semi_inductive`([ratios, random_state])	Create a semi-inductive split.
`tensor_to_df`(tensor, **kwargs)	Take a tensor of triples and make a pandas dataframe with labels.
`to_path_binary`(path)	Save triples factory to path in (PyTorch's .pt) binary format.
`with_labels`(entity_to_id, relation_to_id)	Add labeling to the TriplesFactory.

Attributes Documentation

base_file_name: ClassVar[str] = 'base.pth'

num_triples: The number of triples.

triples_file_name: ClassVar[str] = 'numeric_triples.tsv.gz'

Methods Documentation

apply_condenser(condenser: TripleCondenser) → Self[source]

Apply the triple condenser.

Parameters: condenser (TripleCondenser) – The condenser.

Warning

This creates a triples factory that may have a new entity-to-ID or relation-to-ID mapping.

Returns: A condensed version with potentially smaller num_entities or num_relations.
Return type: Self

clone_and_exchange_triples(mapped_triples: Tensor, extra_metadata: dict[str, Any] | None = None, keep_metadata: bool = True, create_inverse_triples: bool | None = None) → Self[source]

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters:

mapped_triples (Tensor) – The new mapped triples.
extra_metadata (dict[str, Any] | None) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, and keys from extra_metadata take precedence.
keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.
create_inverse_triples (bool | None) – Override the inverse triple creation flag. If None, use the flag from this factory.

Returns:

The new factory.

Return type:

Self
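
The interaction between keep_metadata and extra_metadata described above can be sketched in plain Python. This is an illustration of the documented precedence rule only, not the library's code; `merge_metadata` is a hypothetical name.

```python
def merge_metadata(current, extra_metadata=None, keep_metadata=True):
    """Combine metadata dictionaries; keys from extra_metadata win.

    Hypothetical helper mirroring the documented behaviour.
    """
    # Start from a shallow copy of the current metadata, or from nothing.
    merged = dict(current) if keep_metadata else {}
    # Extra keys overwrite existing ones.
    merged.update(extra_metadata or {})
    return merged


kept = merge_metadata({"name": "toy", "version": 1}, {"version": 2})
dropped = merge_metadata({"name": "toy", "version": 1}, {"version": 2}, keep_metadata=False)
```

With keep_metadata left at its default, `kept` retains the `name` key and takes `version` from the extra dictionary; with keep_metadata set to false, only the extra keys remain.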

condense(entities: bool = True, relations: bool = False) → Self[source]

Drop all IDs which are not present in the triples.

Parameters:

entities (bool) – Whether to condense entity IDs.
relations (bool) – Whether to condense relation IDs.

Warning

This creates a triples factory that may have a new entity-to-ID or relation-to-ID mapping.

Returns:

A condensed version with potentially smaller num_entities or num_relations.

Return type:

Self
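
To see why condensing can change the ID mapping, consider this plain-Python sketch. It is not PyKEEN's implementation: `condense_entities` is a hypothetical helper, and the choice to assign new IDs in ascending order of the old IDs is an assumption made for the example.

```python
def condense_entities(triples):
    """Remap entity IDs so that only IDs occurring in the triples remain.

    Hypothetical helper; new IDs follow the sorted order of the old ones
    (an assumption for this sketch).
    """
    # Collect every entity ID that appears as a head or a tail.
    used = sorted({e for h, _, t in triples for e in (h, t)})
    # Map old IDs to consecutive new IDs starting at 0.
    old_to_new = {old: new for new, old in enumerate(used)}
    condensed = [(old_to_new[h], r, old_to_new[t]) for h, r, t in triples]
    return condensed, len(used)


condensed, num_entities = condense_entities([(0, 0, 5), (5, 1, 9)])
```

Here the entity IDs 0, 5, and 9 are the only ones in use, so they are renumbered to 0, 1, and 2 and the entity count shrinks to 3. Relation IDs are left untouched, matching the default relations=False.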

classmethod create(mapped_triples: Tensor, num_entities: int | None = None, num_relations: int | None = None, create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None) → Self[source]

Create a triples factory without any label information.

Parameters:

mapped_triples (Tensor) – shape: (n, 3) The ID-based triples.
num_entities (int | None) – The number of entities. If not given, inferred from mapped_triples.
num_relations (int | None) – The number of relations. If not given, inferred from mapped_triples.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Mapping[str, Any] | None) – Additional metadata to store in the factory.

Returns:

A new triples factory.

Return type:

Self
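
When num_entities and num_relations are omitted, they are inferred from mapped_triples. One plausible rule, assumed here rather than confirmed by this page, is "largest ID plus one". The sketch below uses a hypothetical helper `infer_counts` on plain tuples.

```python
def infer_counts(triples):
    """Infer (num_entities, num_relations) as the largest ID plus one.

    Hypothetical helper; the max-plus-one rule is an assumption.
    """
    num_entities = max(max(h, t) for h, _, t in triples) + 1
    num_relations = max(r for _, r, _ in triples) + 1
    return num_entities, num_relations


counts = infer_counts([(0, 0, 1), (3, 2, 1)])
```

Under this rule, IDs that never occur in the triples but lie below the maximum still count, so pass the numbers explicitly if the intended vocabulary is larger than what the triples show.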

entities_to_ids(entities: Iterable[int] | Iterable[str]) → Sequence[int][source]

Normalize entities to IDs.

It raises a TypeError if the factory does not support the given data type, e.g. you cannot use str with CoreTriplesFactory.

Parameters: entities (Iterable[int] | Iterable[str]) – A sequence of either integer identifiers for entities or string labels for entities (that will get auto-converted)
Returns: Integer identifiers for entities, in the same order.
Return type: Sequence[int]

classmethod from_path_binary(path: str | Path | TextIO) → Self[source]

Load triples factory from a binary file.

Parameters: path (str | Path | TextIO) – The path, pointing to an existing PyTorch .pt file.
Returns: The loaded triples factory.
Return type: Self

get_inverse_relation_id(relation: int) → int[source]

Get the inverse relation identifier for the given relation.

Parameters: relation (int)
Return type: int

get_mask_for_relations(relations: Collection[int], invert: bool = False) → Tensor[source]

Get a boolean mask for triples with the given relations.

Parameters:

relations (Collection[int])
invert (bool)

Return type:

Tensor
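
A boolean mask of this kind has one entry per triple. The plain-Python sketch below illustrates the idea with a list of booleans instead of a Tensor; `mask_for_relations` is a hypothetical helper, not the library method.

```python
def mask_for_relations(triples, relations, invert=False):
    """Return one boolean per triple: True if its relation is selected.

    Hypothetical helper; with invert=True the selection is flipped.
    """
    wanted = set(relations)
    # (r in wanted) != invert flips the result exactly when invert is True.
    return [(r in wanted) != invert for _, r, _ in triples]


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
mask = mask_for_relations(triples, relations=[0])
inverted = mask_for_relations(triples, relations=[0], invert=True)
```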

get_most_frequent_relations(n: int | float) → set[int][source]

Get the IDs of the n most frequent relations.

Parameters: n (int | float) – Either the (integer) number of top relations to keep or the (float) percentage of top relations to keep.
Returns: A set of IDs for the n most frequent relations.
Raises: TypeError – If n is of the wrong type.
Return type: set[int]
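
The int and float cases can be sketched in plain Python as follows. This is an illustration, not the library's code: `most_frequent_relations` is a hypothetical helper, and converting a float to a count via `int(num_relations * n)` is an assumption.

```python
from collections import Counter


def most_frequent_relations(triples, n, num_relations):
    """Return the IDs of the n most frequent relations as a set.

    Hypothetical helper; the float-to-count conversion is an assumption.
    """
    if isinstance(n, float):
        # Interpret a float as a share of all relations.
        n = int(num_relations * n)
    elif not isinstance(n, int):
        raise TypeError("n must be either an int or a float")
    counts = Counter(r for _, r, _ in triples)
    return {relation for relation, _ in counts.most_common(n)}


triples = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (0, 1, 2), (1, 1, 3), (0, 2, 3)]
top_two = most_frequent_relations(triples, 2, num_relations=3)
top_half = most_frequent_relations(triples, 0.5, num_relations=3)
```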

iter_extra_repr() → Iterable[str][source]

Iterate over extra_repr components.

Return type: Iterable[str]

make_condenser(entities: bool = True, relations: bool = False) → TripleCondenser[source]

Create a triple condenser from the factory’s triples without applying it.

Parameters:

entities (bool) – Whether to condense entity IDs.
relations (bool) – Whether to condense relation IDs.

Returns:

The triple condenser.

Return type:

TripleCondenser

merge(*others: Self) → Self[source]

Merge the triples factory with others.

The other triples factories have to be compatible.

Parameters: others (Self) – The other factories.
Returns: A new factory with the combined triples.
Raises: ValueError – If any of the other factories has incompatible settings (number of entities or relations, or creation of inverse triples).
Return type: Self
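
The compatibility requirement can be illustrated with a plain-Python sketch in which each factory is modelled as a dictionary. Both the dictionary layout and the name `merge_factories` are hypothetical and chosen only for this example.

```python
def merge_factories(first, *others):
    """Concatenate triples of compatible factories (modelled as dicts).

    Hypothetical helper; raises ValueError on mismatching settings.
    """
    settings = ("num_entities", "num_relations", "create_inverse_triples")
    for other in others:
        for key in settings:
            if other[key] != first[key]:
                raise ValueError(f"incompatible {key}: {other[key]} != {first[key]}")
    # All settings agree, so the triples can simply be concatenated.
    triples = list(first["triples"])
    for other in others:
        triples.extend(other["triples"])
    return {**first, "triples": triples}


a = {"num_entities": 3, "num_relations": 2, "create_inverse_triples": False, "triples": [(0, 0, 1)]}
b = {"num_entities": 3, "num_relations": 2, "create_inverse_triples": False, "triples": [(1, 1, 2)]}
merged = merge_factories(a, b)
```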

new_with_restriction(entities: None | Collection[int] | Collection[str] = None, relations: None | Collection[int] | Collection[str] = None, invert_entity_selection: bool = False, invert_relation_selection: bool = False) → Self[source]

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

Parameters:

entities (None | Collection[int] | Collection[str]) – The entities of interest. If None, defaults to all entities.
relations (None | Collection[int] | Collection[str]) – The relations of interest. If None, defaults to all relations.
invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.

Returns:

A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.

Return type:

Self
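
The selection logic can be sketched in plain Python. This is not the library's implementation: `restrict` is a hypothetical helper, and requiring both head and tail to pass the entity test is an assumption of this sketch. Note that the IDs in the kept triples are unchanged, in line with the description above.

```python
def restrict(triples, entities=None, relations=None,
             invert_entity_selection=False, invert_relation_selection=False):
    """Keep only triples matching the entity and relation selections.

    Hypothetical helper; assumes head AND tail must both pass the entity test.
    """
    kept = []
    for h, r, t in triples:
        if entities is not None:
            # Each of head and tail must be selected (or unselected, if inverted).
            if not all((e in entities) != invert_entity_selection for e in (h, t)):
                continue
        if relations is not None:
            if (r in relations) == invert_relation_selection:
                continue
        kept.append((h, r, t))
    return kept


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
by_entities = restrict(triples, entities={0, 1, 2})
by_both = restrict(triples, entities={0, 1, 2}, relations={0})
without_zero = restrict(triples, entities={0}, invert_entity_selection=True)
```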

relations_to_ids(relations: Iterable[int] | Iterable[str]) → Sequence[int][source]

Normalize relations to IDs.

It raises a TypeError if the factory does not support the given data type, e.g. you cannot use str with CoreTriplesFactory.

Parameters: relations (Iterable[int] | Iterable[str]) – A sequence of either integer identifiers for relations or string labels for relations (that will get auto-converted)
Returns: Integer identifiers for relations, in the same order.
Return type: Sequence[int]

split(ratios: float | Sequence[float] = 0.8, *, random_state: None | int | Generator = None, randomize_cleanup: bool = False, method: str | None = None) → list[Self][source]

Split a triples factory into a training part and a variable number of (transductive) evaluation parts.

Warning

This method is not suitable to create inductive splits.

Parameters:

ratios (float | Sequence[float]) –
There are three options for this argument:
1. A single float strictly between 0 and 1.0. The first set of triples gets this ratio and the second gets the rest.
2. A list of ratios, given in the order of the resulting sets, as in [0.8, 0.1]. The final ratio can be omitted because it can be calculated from the others.
3. A list with all ratios set explicitly, as in [0.8, 0.1, 0.1], where the sum of all ratios is 1.0.
random_state (None | int | Generator) – The random state used to shuffle and split the triples.
randomize_cleanup (bool) – This parameter is forwarded to the underlying pykeen.triples.splitting.split().
method (str | None) – This parameter is forwarded to the underlying pykeen.triples.splitting.split().

Returns:

A partition of the triples, split (approximately) according to the ratios and stored in triples factories that share everything else with this root triples factory.

Return type:

list[Self]
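
The three accepted forms of ratios can be reduced to one explicit list, as the following plain-Python sketch shows. It illustrates the documented options only and is not the code in pykeen.triples.splitting; `normalize_ratios` is a hypothetical helper.

```python
import math


def normalize_ratios(ratios=0.8):
    """Turn any of the three documented forms into an explicit list of ratios.

    Hypothetical helper for illustration.
    """
    # Option 1: a single float becomes a one-element list.
    if isinstance(ratios, float):
        ratios = [ratios]
    ratios = list(ratios)
    if any(r <= 0.0 for r in ratios):
        raise ValueError("ratios must be positive")
    total = sum(ratios)
    # Option 3: the ratios already sum to 1.0 (a lone 1.0 is not allowed).
    if len(ratios) > 1 and math.isclose(total, 1.0):
        return ratios
    if total >= 1.0:
        raise ValueError("ratios must sum to less than or exactly 1.0")
    # Options 1 and 2: append the remaining share as the final ratio.
    return ratios + [1.0 - total]


single = normalize_ratios(0.8)
partial = normalize_ratios([0.8, 0.1])
explicit = normalize_ratios([0.8, 0.1, 0.1])
```

All three calls yield a list whose entries sum to 1.0: two parts for the single float and three parts for both list forms.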