CoreTriplesFactory
- class CoreTriplesFactory(mapped_triples: Tensor | ndarray, num_entities: int, num_relations: int, create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None)
Bases: KGInfo
Create instances from ID-based triples.
Create the triples factory.
- Parameters:
mapped_triples (Tensor | ndarray) – shape: (n, 3) A three-column matrix where each row contains the head identifier, relation identifier, and tail identifier, in that order.
num_entities (int) – The number of entities.
num_relations (int) – The number of relations.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Mapping[str, Any] | None) – Arbitrary metadata to go with the graph
- Raises:
TypeError – if the mapped_triples are of non-integer dtype
ValueError – if the mapped_triples are of invalid shape
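The two documented failure modes can be illustrated on plain Python rows. This is only a sketch: `validate_mapped_triples` is a hypothetical helper, not part of the library, and it checks lists of tuples rather than a Tensor or ndarray.

```python
def validate_mapped_triples(rows):
    """Hypothetical helper mirroring the documented checks on plain Python rows."""
    for row in rows:
        # Every row must be a (head, relation, tail) triple.
        if len(row) != 3:
            raise ValueError(f"expected rows of length 3, got length {len(row)}")
        for value in row:
            # bool is a subclass of int, so it is excluded explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"expected integer IDs, got {type(value).__name__}")
    return rows


triples = validate_mapped_triples([(0, 0, 1), (1, 1, 2)])
```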
Attributes Summary
num_triples – The number of triples.
Methods Summary
clone_and_exchange_triples(mapped_triples[, ...]) – Create a new triples factory sharing everything except the triples.
create(mapped_triples[, num_entities, ...]) – Create a triples factory without any label information.
entities_to_ids(entities) – Normalize entities to IDs.
from_path_binary(path) – Load triples factory from a binary file.
get_inverse_relation_id(relation) – Get the inverse relation identifier for the given relation.
get_mask_for_relations(relations[, invert]) – Get a boolean mask for triples with the given relations.
get_most_frequent_relations(n) – Get the IDs of the n most frequent relations.
Iterate over extra_repr components.
new_with_restriction([entities, relations, ...]) – Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.
relations_to_ids(relations) – Normalize relations to IDs.
split([ratios, random_state, ...]) – Split a triples factory into a training part and a variable number of (transductive) evaluation parts.
tensor_to_df(tensor, **kwargs) – Take a tensor of triples and make a pandas dataframe with labels.
to_path_binary(path) – Save triples factory to path in (PyTorch's .pt) binary format.
with_labels(entity_to_id, relation_to_id) – Add labeling to the TriplesFactory.
Attributes Documentation
- num_triples
The number of triples.
Methods Documentation
- clone_and_exchange_triples(mapped_triples: Tensor, extra_metadata: dict[str, Any] | None = None, keep_metadata: bool = True, create_inverse_triples: bool | None = None) -> CoreTriplesFactory
Create a new triples factory sharing everything except the triples.
Note
We use shallow copies.
- Parameters:
mapped_triples (Tensor) – The new mapped triples.
extra_metadata (dict[str, Any] | None) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, and keys from extra_metadata take precedence.
keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.
create_inverse_triples (bool | None) – Change inverse triple creation flag. If None, use flag from this factory.
- Returns:
The new factory.
- Return type:
CoreTriplesFactory
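The metadata rule for clone_and_exchange_triples can be sketched with plain dictionaries. `merge_metadata` is a hypothetical helper written for illustration only; it assumes the union behaves like a shallow dictionary copy followed by an update.

```python
def merge_metadata(current, extra=None, keep_metadata=True):
    """Hypothetical helper: combine metadata as described for keep_metadata."""
    # Start from a shallow copy of the current metadata, or from nothing.
    merged = dict(current) if keep_metadata else {}
    # Keys from the extra metadata take precedence on conflicts.
    merged.update(extra or {})
    return merged


current = {"name": "toy", "version": 1}
merged = merge_metadata(current, {"version": 2})
```

With keep_metadata set to False, only the extra metadata would end up in the result.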
- classmethod create(mapped_triples: Tensor, num_entities: int | None = None, num_relations: int | None = None, create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None) -> CoreTriplesFactory
Create a triples factory without any label information.
- Parameters:
mapped_triples (Tensor) – shape: (n, 3) The ID-based triples.
num_entities (int | None) – The number of entities. If not given, inferred from mapped_triples.
num_relations (int | None) – The number of relations. If not given, inferred from mapped_triples.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Mapping[str, Any] | None) – Additional metadata to store in the factory.
- Returns:
A new triples factory.
- Return type:
CoreTriplesFactory
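How the counts might be inferred when num_entities and num_relations are omitted can be sketched as follows. The rule used here (largest ID plus one, assuming IDs start at zero and are contiguous) is an assumption, and `infer_counts` is a hypothetical helper operating on plain tuples rather than a Tensor.

```python
def infer_counts(mapped_triples):
    """Hypothetical helper: infer (num_entities, num_relations) from ID triples."""
    heads, relations, tails = zip(*mapped_triples)
    # Assumption: IDs are zero-based, so the count is the largest ID plus one.
    num_entities = max(max(heads), max(tails)) + 1
    num_relations = max(relations) + 1
    return num_entities, num_relations


num_entities, num_relations = infer_counts([(0, 0, 1), (2, 1, 3)])
```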
- entities_to_ids(entities: Collection[int] | Collection[str]) -> Collection[int]
Normalize entities to IDs.
- Parameters:
entities (Collection[int] | Collection[str]) – A collection of either integer identifiers for entities or string labels for entities (that will get auto-converted)
- Returns:
Integer identifiers for entities
- Raises:
ValueError – If the entities passed are string labels and this triples factory does not have an entity label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)
- Return type:
Collection[int]
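The normalization contract can be sketched in plain Python. `normalize_to_ids` and its `label_to_id` argument are hypothetical names used for illustration; a base CoreTriplesFactory corresponds to the case where no mapping is available.

```python
def normalize_to_ids(items, label_to_id=None):
    """Hypothetical helper: pass integer IDs through, convert labels via a mapping."""
    items = list(items)
    # Integer identifiers need no conversion.
    if all(isinstance(item, int) for item in items):
        return items
    # Labels can only be converted when a label-to-ID mapping exists.
    if label_to_id is None:
        raise ValueError("string labels given, but no label-to-ID mapping is available")
    return [label_to_id[item] for item in items]


ids = normalize_to_ids(["berlin", "paris"], {"berlin": 0, "paris": 1})
```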
- classmethod from_path_binary(path: str | Path | TextIO) -> CoreTriplesFactory
Load triples factory from a binary file.
- get_inverse_relation_id(relation: int) -> int
Get the inverse relation identifier for the given relation.
- get_mask_for_relations(relations: Collection[int], invert: bool = False) -> Tensor
Get a boolean mask for triples with the given relations.
- Parameters:
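relations (Collection[int]) – The relation IDs to match.
invert (bool) – Whether to invert the mask, i.e. mark the triples without the given relations.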
- Return type:
Tensor
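The idea of a relation mask can be shown on plain tuples. This sketch returns a list of booleans instead of a Tensor, and `mask_for_relations` is a hypothetical stand-in rather than the library method.

```python
def mask_for_relations(mapped_triples, relations, invert=False):
    """Hypothetical helper: one boolean per triple, True where the relation matches."""
    wanted = set(relations)
    # Comparing with `invert` flips every entry when invert is True.
    return [(relation in wanted) != invert for _, relation, _ in mapped_triples]


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
mask = mask_for_relations(triples, [0])
```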
- get_most_frequent_relations(n: int | float) -> set[int]
Get the IDs of the n most frequent relations.
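A possible reading of this method, sketched with the standard library. The handling of a float n (as a fraction of the distinct relations) is an assumption that this page does not confirm, and `most_frequent_relations` is a hypothetical helper.

```python
from collections import Counter


def most_frequent_relations(mapped_triples, n):
    """Hypothetical helper: IDs of the n relations that occur in the most triples."""
    counts = Counter(relation for _, relation, _ in mapped_triples)
    if isinstance(n, float):
        # Assumption: a float is interpreted as a fraction of the distinct relations.
        n = int(n * len(counts))
    return {relation for relation, _ in counts.most_common(n)}


triples = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (0, 1, 2), (1, 1, 3), (0, 2, 3)]
top = most_frequent_relations(triples, 2)
```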
- new_with_restriction(entities: None | Collection[int] | Collection[str] = None, relations: None | Collection[int] | Collection[str] = None, invert_entity_selection: bool = False, invert_relation_selection: bool = False) -> CoreTriplesFactory
Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.
- Parameters:
entities (None | Collection[int] | Collection[str]) – The entities of interest. If None, defaults to all entities.
relations (None | Collection[int] | Collection[str]) – The relations of interest. If None, defaults to all relations.
invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.
- Returns:
A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.
- Return type:
CoreTriplesFactory
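The restriction logic can be sketched on plain tuples. This is an illustration under assumptions: `restrict_triples` is a hypothetical helper, and it assumes that an entity restriction requires both head and tail to be selected (or both to be unselected when inverted). The IDs in the kept triples are left unchanged, matching the note that the ID mapping is kept.

```python
def _selected(values, selection, invert):
    """True if every value passes the (possibly inverted) selection."""
    if selection is None:
        return True  # None means "no restriction"
    return all((value in selection) != invert for value in values)


def restrict_triples(
    mapped_triples,
    entities=None,
    relations=None,
    invert_entity_selection=False,
    invert_relation_selection=False,
):
    """Hypothetical helper: keep only triples matching the entity/relation selection."""
    entity_set = None if entities is None else set(entities)
    relation_set = None if relations is None else set(relations)
    return [
        (h, r, t)
        for h, r, t in mapped_triples
        # Assumption: head and tail must both satisfy the entity selection.
        if _selected((h, t), entity_set, invert_entity_selection)
        and _selected((r,), relation_set, invert_relation_selection)
    ]


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
subset = restrict_triples(triples, entities=[0, 1, 2], relations=[0])
```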
- relations_to_ids(relations: Collection[int] | Collection[str]) -> Collection[int]
Normalize relations to IDs.
- Parameters:
relations (Collection[int] | Collection[str]) – A collection of either integer identifiers for relations or string labels for relations (that will get auto-converted)
- Returns:
Integer identifiers for relations
- Raises:
ValueError – If the relations passed are string labels and this triples factory does not have a relation label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)
- Return type:
Collection[int]
- split(ratios: float | Sequence[float] = 0.8, *, random_state: None | int | Generator = None, randomize_cleanup: bool = False, method: str | None = None) -> list[CoreTriplesFactory]
Split a triples factory into a training part and a variable number of (transductive) evaluation parts.
Warning
This method is not suitable to create inductive splits.
- Parameters:
ratios (float | Sequence[float]) –
There are three options for this argument:
1. A float can be given between 0 and 1.0, non-inclusive. The first set of triples will get this ratio and the second will get the rest.
2. A list of ratios can be given, specifying which set gets which ratio, in order, as in [0.8, 0.1]. The final ratio can be omitted because it can be calculated.
3. All ratios can be explicitly set in order, such as in [0.8, 0.1, 0.1], where the sum of all ratios is 1.0.
random_state (None | int | Generator) – The random state used to shuffle and split the triples.
randomize_cleanup (bool) – This parameter is forwarded to the underlying pykeen.triples.splitting.split().
method (str | None) – This parameter is forwarded to the underlying pykeen.triples.splitting.split().
- Returns:
A partition of the triples, split (approximately) according to the ratios and stored in triples factories that share everything else with this root triples factory.
- Return type:
list[CoreTriplesFactory]
ratio = 0.8  # makes a [0.8, 0.2] split
training_factory, testing_factory = factory.split(ratio)

ratios = [0.8, 0.1]  # makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)

ratios = [0.8, 0.1, 0.1]  # also makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)
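The three accepted forms of ratios can be reduced to one explicit list. The following is a minimal sketch of that normalization, not the library's implementation; `normalize_ratios` is a hypothetical helper, and the error raised for ratios summing to more than 1.0 is an assumption.

```python
import math


def normalize_ratios(ratios):
    """Hypothetical helper: expand a float or partial list into explicit ratios."""
    # Option 1: a single float becomes a one-element list.
    if isinstance(ratios, float):
        ratios = [ratios]
    ratios = list(ratios)
    total = sum(ratios)
    # Option 3: the ratios already sum to 1.0, so nothing is appended.
    if math.isclose(total, 1.0):
        return ratios
    if total > 1.0:
        raise ValueError(f"ratios sum to {total}, which is more than 1.0")
    # Options 1 and 2: the omitted final ratio is the remainder.
    return ratios + [1.0 - total]


two_way = normalize_ratios(0.8)
three_way = normalize_ratios([0.8, 0.1])
```

Because of floating-point rounding, the computed remainder is only approximately equal to the nominal value (for example, approximately 0.2 for a ratio of 0.8).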
- tensor_to_df(tensor: Tensor, **kwargs: Tensor | ndarray | Sequence) -> DataFrame
Take a tensor of triples and make a pandas dataframe with labels.
- Parameters:
tensor (Tensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Tensor | ndarray | Sequence) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.
- Returns:
A dataframe with n rows, and 6 + len(kwargs) columns.
- Return type:
DataFrame
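The column layout described above (six reserved columns plus one per keyword argument) can be sketched with plain dictionaries instead of pandas. `triples_to_columns`, `entity_labels`, and `relation_labels` are hypothetical names, and rejecting reserved column names with a ValueError is an assumption about how reserved names are handled.

```python
RESERVED = ("head_id", "head_label", "relation_id", "relation_label", "tail_id", "tail_label")


def triples_to_columns(triples, entity_labels, relation_labels, **kwargs):
    """Hypothetical helper: build a column dictionary with 6 + len(kwargs) columns."""
    clash = set(kwargs) & set(RESERVED)
    if clash:
        # Assumption: reserved names may not be reused for extra columns.
        raise ValueError(f"reserved column names used: {sorted(clash)}")
    for name, column in kwargs.items():
        # Each extra column needs one value per triple, i.e. shape (n,).
        if len(column) != len(triples):
            raise ValueError(f"column {name!r} must have {len(triples)} entries")
    columns = {
        "head_id": [h for h, _, _ in triples],
        "head_label": [entity_labels[h] for h, _, _ in triples],
        "relation_id": [r for _, r, _ in triples],
        "relation_label": [relation_labels[r] for _, r, _ in triples],
        "tail_id": [t for _, _, t in triples],
        "tail_label": [entity_labels[t] for _, _, t in triples],
    }
    columns.update(kwargs)
    return columns


triples = [(0, 0, 1), (1, 0, 2)]
columns = triples_to_columns(triples, {0: "a", 1: "b", 2: "c"}, {0: "likes"}, score=[0.9, 0.1])
```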