CoreTriplesFactory
- class CoreTriplesFactory(mapped_triples: Tensor | ndarray, num_entities: int, num_relations: int, create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None)
Bases: KGInfo
Create instances from ID-based triples.
Create the triples factory.
- Parameters:
mapped_triples (Tensor | ndarray) – shape: (n, 3) A three-column matrix where each row contains the head identifier, relation identifier, and tail identifier, in that order.
num_entities (int) – The number of entities.
num_relations (int) – The number of relations.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Mapping[str, Any] | None) – Arbitrary metadata to go with the graph.
- Raises:
TypeError – if the mapped_triples are of non-integer dtype
ValueError – if the mapped_triples are of invalid shape
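The two documented failure modes can be illustrated without the library. This is a minimal sketch on plain Python lists rather than tensors; `validate_mapped_triples` is a hypothetical helper, not part of PyKEEN, and the order of the checks is an assumption.

```python
def validate_mapped_triples(rows):
    """Mimic the documented checks on a list of (head, relation, tail) rows.

    Hypothetical helper for illustration only; the real class works on tensors.
    """
    for row in rows:
        # Every row needs exactly three columns: head, relation, tail.
        if len(row) != 3:
            raise ValueError(f"expected rows of length 3, got {len(row)}")
        # Identifiers must be integers (bool is excluded on purpose).
        if not all(isinstance(x, int) and not isinstance(x, bool) for x in row):
            raise TypeError("identifiers must be integers")
    return rows


validate_mapped_triples([(0, 0, 1), (1, 0, 2)])  # passes silently
```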
Attributes Summary
num_triples – The number of triples.
Methods Summary
apply_condenser(condenser) – Apply the triple condenser.
clone_and_exchange_triples(mapped_triples[, ...]) – Create a new triples factory sharing everything except the triples.
condense([entities, relations]) – Drop all IDs which are not present in the triples.
create(mapped_triples[, num_entities, ...]) – Create a triples factory without any label information.
entities_to_ids(entities) – Normalize entities to IDs.
from_path_binary(path) – Load triples factory from a binary file.
get_inverse_relation_id(relation) – Get the inverse relation identifier for the given relation.
get_mask_for_relations(relations[, invert]) – Get a boolean mask for triples with the given relations.
get_most_frequent_relations(n) – Get the IDs of the n most frequent relations.
Iterate over extra_repr components.
make_condenser([entities, relations]) – Create a triple condenser from the factory's triples without applying it.
merge(*others) – Merge the triples factory with others.
new_with_restriction([entities, relations, ...]) – Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.
relations_to_ids(relations) – Normalize relations to IDs.
split([ratios, random_state, ...]) – Split a triples factory into a training part and a variable number of (transductive) evaluation parts.
split_fully_inductive([...]) – Create a fully inductive split.
split_semi_inductive([ratios, random_state]) – Create a semi-inductive split.
tensor_to_df(tensor, **kwargs) – Take a tensor of triples and make a pandas dataframe with labels.
to_path_binary(path) – Save triples factory to path in (PyTorch's .pt) binary format.
with_labels(entity_to_id, relation_to_id) – Add labeling to the TriplesFactory.
Attributes Documentation
- num_triples
The number of triples.
Methods Documentation
- apply_condenser(condenser: TripleCondenser) -> Self
Apply the triple condenser.
Warning
This creates a triples factory that may have a new entity-to-ID or relation-to-ID mapping.
- Parameters:
condenser (TripleCondenser) – The condenser.
- Returns:
A condensed version with potentially smaller num_entities or num_relations.
- Return type:
Self
- clone_and_exchange_triples(mapped_triples: Tensor, extra_metadata: dict[str, Any] | None = None, keep_metadata: bool = True, create_inverse_triples: bool | None = None) -> Self
Create a new triples factory sharing everything except the triples.
Note
We use shallow copies.
- Parameters:
mapped_triples (Tensor) – The new mapped triples.
extra_metadata (dict[str, Any] | None) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, and keys from extra_metadata take precedence.
keep_metadata (bool) – Pass the current factory’s metadata to the new triples factory.
create_inverse_triples (bool | None) – Change inverse triple creation flag. If None, use flag from this factory.
- Returns:
The new factory.
- Return type:
Self
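The interaction between keep_metadata and extra_metadata can be sketched with plain dictionaries. `combine_metadata` is a hypothetical name used only for illustration; it shows the documented precedence rule, not the library's actual code.

```python
def combine_metadata(current, extra=None, keep_metadata=True):
    """Merge metadata dicts; keys from `extra` win on conflict.

    Hypothetical helper mirroring the documented rule.
    """
    # Start from a shallow copy of the current metadata, or from nothing.
    combined = dict(current) if keep_metadata else {}
    # Later updates override earlier keys, so `extra` takes precedence.
    combined.update(extra or {})
    return combined


merged = combine_metadata({"path": "a.tsv", "split": "train"}, {"split": "test"})
print(merged)
```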
- condense(entities: bool = True, relations: bool = False) -> Self
Drop all IDs which are not present in the triples.
Warning
This creates a triples factory that may have a new entity-to-ID or relation-to-ID mapping.
- Parameters:
entities (bool)
relations (bool)
- Return type:
Self
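Why condensing changes the ID mapping can be seen in a small library-free sketch. `condense_ids` is a hypothetical function on plain tuples; that surviving IDs keep their relative order is an assumption made for the illustration.

```python
def condense_ids(triples, entities=True, relations=False):
    """Drop unused IDs and relabel the remaining ones consecutively.

    Hypothetical sketch; assumes surviving IDs keep their relative order.
    """
    used_entities = sorted({x for h, _, t in triples for x in (h, t)})
    used_relations = sorted({r for _, r, _ in triples})
    ent_map = {old: new for new, old in enumerate(used_entities)}
    rel_map = {old: new for new, old in enumerate(used_relations)}
    return [
        (
            ent_map[h] if entities else h,
            rel_map[r] if relations else r,
            ent_map[t] if entities else t,
        )
        for h, r, t in triples
    ]


# Entity IDs 0, 7, 9 are in use, so they are relabeled to 0, 1, 2.
condensed = condense_ids([(0, 5, 7), (7, 2, 9)])
```

After condensing, an old entity ID such as 7 no longer refers to the same entity, which is why the warning above applies.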
- classmethod create(mapped_triples: Tensor, num_entities: int | None = None, num_relations: int | None = None, create_inverse_triples: bool = False, metadata: Mapping[str, Any] | None = None) -> Self
Create a triples factory without any label information.
- Parameters:
mapped_triples (Tensor) – shape: (n, 3) The ID-based triples.
num_entities (int | None) – The number of entities. If not given, inferred from mapped_triples.
num_relations (int | None) – The number of relations. If not given, inferred from mapped_triples.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Mapping[str, Any] | None) – Additional metadata to store in the factory.
- Returns:
A new triples factory.
- Return type:
Self
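One plausible way to infer the counts from the triples is to take the largest ID plus one. The sketch below assumes exactly that, on plain tuples; `infer_counts` is a hypothetical name and the rule is an assumption, not a statement about the library's internals.

```python
def infer_counts(triples):
    """Infer (num_entities, num_relations) as the largest ID plus one.

    Hypothetical helper; the max-plus-one rule is an assumption.
    """
    num_entities = max(max(h, t) for h, _, t in triples) + 1
    num_relations = max(r for _, r, _ in triples) + 1
    return num_entities, num_relations


counts = infer_counts([(0, 0, 1), (3, 2, 1)])
```

Under this rule, IDs that never occur below the maximum still count, so passing explicit values is the safer choice when the triples are a subset of a larger graph.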
- entities_to_ids(entities: Iterable[int] | Iterable[str]) -> Sequence[int]
Normalize entities to IDs.
Raises a TypeError if the factory does not support the given data type, e.g. you cannot use str with CoreTriplesFactory.
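The behaviour for a factory without label information can be sketched as follows. `normalize_ids` is a hypothetical stand-in: integer IDs pass through, and anything else is rejected because no label-to-ID mapping exists.

```python
def normalize_ids(values):
    """Return integer IDs unchanged; reject labels.

    Hypothetical sketch of an ID-only factory's normalization.
    """
    values = list(values)
    if any(not isinstance(v, int) for v in values):
        raise TypeError("labels require a factory with a label-to-ID mapping")
    return values


ids = normalize_ids([3, 1, 4])
```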
- classmethod from_path_binary(path: str | Path | TextIO) -> Self
Load triples factory from a binary file.
- get_inverse_relation_id(relation: int) -> int
Get the inverse relation identifier for the given relation.
- get_mask_for_relations(relations: Collection[int], invert: bool = False) -> Tensor
Get a boolean mask for triples with the given relations.
- Parameters:
relations (Collection[int])
invert (bool)
- Return type:
Tensor
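The mask semantics can be illustrated on plain tuples. `mask_for_relations` below is a hypothetical list-based sketch; the real method returns a tensor.

```python
def mask_for_relations(triples, relations, invert=False):
    """Return one boolean per triple: is its relation in `relations`?

    With invert=True the mask is flipped. Hypothetical list-based sketch.
    """
    wanted = set(relations)
    # `!=` acts as XOR here: membership is flipped when invert is True.
    return [(r in wanted) != invert for _, r, _ in triples]


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
mask = mask_for_relations(triples, relations=[0])
```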
- get_most_frequent_relations(n: int | float) -> set[int]
Get the IDs of the n most frequent relations.
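A frequency count over the relation column is enough to sketch this method. The function below is hypothetical; in particular, treating a float n as a fraction of the distinct relations, truncated to an integer, is an assumption, since the text above does not say how a float is interpreted.

```python
from collections import Counter


def most_frequent_relations(triples, n):
    """Return the IDs of the n most frequent relations.

    Hypothetical sketch. Assumption: a float n is read as a fraction of
    the number of distinct relations and truncated to an integer.
    """
    counts = Counter(r for _, r, _ in triples)
    if isinstance(n, float):
        n = int(n * len(counts))
    return {relation for relation, _ in counts.most_common(n)}


triples = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (0, 1, 2), (1, 1, 3), (0, 2, 3)]
top_two = most_frequent_relations(triples, 2)
```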
- make_condenser(entities: bool = True, relations: bool = False) -> TripleCondenser
Create a triple condenser from the factory’s triples without applying it.
- merge(*others: Self) -> Self
Merge the triples factory with others.
The other triples factories have to be compatible.
- Parameters:
others (Self) – The other factories.
- Returns:
A new factory with the combined triples.
- Raises:
ValueError – If any of the other factories has incompatible settings (number of entities or relations, or creation of inverse triples).
- Return type:
Self
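The compatibility rule can be made concrete with a toy stand-in. `TinyFactory` is a hypothetical dataclass, not a PyKEEN class; it checks the three documented settings and then concatenates the triples.

```python
from dataclasses import dataclass


@dataclass
class TinyFactory:
    """Hypothetical stand-in holding only what merge() needs."""

    triples: list
    num_entities: int
    num_relations: int
    create_inverse_triples: bool = False

    def merge(self, *others):
        # All factories must agree on the three documented settings.
        for other in others:
            compatible = (
                other.num_entities == self.num_entities
                and other.num_relations == self.num_relations
                and other.create_inverse_triples == self.create_inverse_triples
            )
            if not compatible:
                raise ValueError("incompatible factory settings")
        # Concatenate the triples into a new factory; inputs stay untouched.
        combined = list(self.triples)
        for other in others:
            combined.extend(other.triples)
        return TinyFactory(
            combined, self.num_entities, self.num_relations, self.create_inverse_triples
        )


merged = TinyFactory([(0, 0, 1)], 3, 1).merge(TinyFactory([(1, 0, 2)], 3, 1))
```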
- new_with_restriction(entities: None | Collection[int] | Collection[str] = None, relations: None | Collection[int] | Collection[str] = None, invert_entity_selection: bool = False, invert_relation_selection: bool = False) -> Self
Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.
- Parameters:
entities (None | Collection[int] | Collection[str]) – The entities of interest. If None, defaults to all entities.
relations (None | Collection[int] | Collection[str]) – The relations of interest. If None, defaults to all relations.
invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.
- Returns:
A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.
- Return type:
Self
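The selection logic can be sketched on plain tuples. `restrict` is a hypothetical function; requiring both head and tail to pass the entity test (or both to fail it when inverted) is an assumption about the exact semantics.

```python
def restrict(triples, entities=None, relations=None,
             invert_entity_selection=False, invert_relation_selection=False):
    """Keep only triples matching the entity and relation selection.

    Hypothetical sketch. Assumption: head and tail must both satisfy the
    (possibly inverted) entity test. IDs are never relabeled.
    """
    entity_set = None if entities is None else set(entities)
    relation_set = None if relations is None else set(relations)
    kept = []
    for h, r, t in triples:
        if entity_set is not None and not all(
            (e in entity_set) != invert_entity_selection for e in (h, t)
        ):
            continue
        if relation_set is not None and (r in relation_set) == invert_relation_selection:
            continue
        kept.append((h, r, t))
    return kept


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
subset = restrict(triples, entities=[0, 1, 2], relations=[0])
```

Unlike condensing, the kept triples carry their original IDs, which matches the statement that the ID mapping is not modified.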
- relations_to_ids(relations: Iterable[int] | Iterable[str]) -> Sequence[int]
Normalize relations to IDs.
Raises a TypeError if the factory does not support the given data type, e.g. you cannot use str with CoreTriplesFactory.
- split(ratios: float | Sequence[float] = 0.8, *, random_state: None | int | Generator = None, randomize_cleanup: bool = False, method: str | None = None) -> list[Self]
Split a triples factory into a training part and a variable number of (transductive) evaluation parts.
Warning
This method is not suitable to create inductive splits.
- Parameters:
ratios (float | Sequence[float]) – There are three options for this argument:
A float can be given between 0 and 1.0, non-inclusive. The first set of triples will get this ratio and the second will get the rest.
A list of ratios can be given, one per set in order, as in [0.8, 0.1]. The final ratio can be omitted because it can be calculated.
All ratios can be explicitly set in order, such as in [0.8, 0.1, 0.1], where the sum of all ratios is 1.0.
random_state (None | int | Generator) – The random state used to shuffle and split the triples.
randomize_cleanup (bool) – This parameter is forwarded to the underlying pykeen.triples.splitting.split().
method (str | None) – This parameter is forwarded to the underlying pykeen.triples.splitting.split().
- Returns:
A partition of triples, which are split (approximately) according to the ratios, stored in TriplesFactory instances which share everything else with this root triples factory.
- Return type:
list[Self]
See also
    ratio = 0.8  # makes a [0.8, 0.2] split
    training_factory, testing_factory = factory.split(ratio)

    ratios = [0.8, 0.1]  # makes a [0.8, 0.1, 0.1] split
    training_factory, testing_factory, validation_factory = factory.split(ratios)

    ratios = [0.8, 0.1, 0.1]  # also makes a [0.8, 0.1, 0.1] split
    training_factory, testing_factory, validation_factory = factory.split(ratios)
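The three accepted forms of ratios reduce to one explicit list. The sketch below shows that normalization in isolation; `normalize_ratios` and its tolerance parameter are hypothetical and are not claimed to match the library's own helper.

```python
def normalize_ratios(ratios, tol=1e-9):
    """Expand a float or a partial list into a full list of ratios.

    Hypothetical sketch of the three documented options.
    """
    if isinstance(ratios, float):
        ratios = [ratios]  # option 1: a single float
    ratios = list(ratios)
    total = sum(ratios)
    if total > 1.0 + tol:
        raise ValueError(f"ratios sum to {total}, which exceeds 1.0")
    if total < 1.0 - tol:
        ratios.append(1.0 - total)  # option 2: append the omitted final ratio
    return ratios  # option 3: already complete, returned unchanged


print(normalize_ratios(0.8))
print(normalize_ratios([0.8, 0.1]))
print(normalize_ratios([0.8, 0.1, 0.1]))
```

The tolerance is there because floating-point sums such as 0.8 + 0.1 + 0.1 are not guaranteed to equal 1.0 exactly.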
- split_fully_inductive(entity_split_train_ratio: float = 0.5, evaluation_triples_ratios: float | Sequence[float] = 0.8, random_state: None | int | Generator = None) -> list[Self]
Create a fully inductive split.
In a fully inductive split, we first split the entities into two disjoint sets: training entities and inference entities. We use the induced subgraph of the training entities for training. The triples of the inference graph are then further split into inference triples and evaluation triples.
- Parameters:
entity_split_train_ratio (float) – The ratio of entities to use for the training part. The remainder will be used for the inference/evaluation graph.
evaluation_triples_ratios (float | Sequence[float]) – The split ratio for the inference graph split.
random_state (None | int | Generator) – The random state used to shuffle and split the triples.
- Returns:
A (transductive) training triples factory, the inductive inference triples factory, as well as the evaluation triples factories.
- Return type:
list[Self]
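The procedure described above can be sketched on plain tuples with the standard library. `fully_inductive_sketch` is hypothetical; it uses a single evaluation part, and the rounding of the split points is an assumption.

```python
import random


def fully_inductive_sketch(triples, num_entities, entity_split_train_ratio=0.5,
                           evaluation_triples_ratio=0.8, seed=0):
    """Return (training, inference, evaluation) triples.

    Hypothetical sketch with a single evaluation part.
    """
    rng = random.Random(seed)
    entities = list(range(num_entities))
    rng.shuffle(entities)
    # Step 1: split the entities into two disjoint sets.
    cut = int(entity_split_train_ratio * num_entities)
    train_entities = set(entities[:cut])
    inference_entities = set(entities[cut:])
    # Step 2: take the induced subgraph of each entity set.
    training = [x for x in triples if x[0] in train_entities and x[2] in train_entities]
    inference_graph = [
        x for x in triples if x[0] in inference_entities and x[2] in inference_entities
    ]
    # Step 3: split the inference graph into inference and evaluation triples.
    rng.shuffle(inference_graph)
    k = int(evaluation_triples_ratio * len(inference_graph))
    return training, inference_graph[:k], inference_graph[k:]


ring = [(i, 0, (i + 1) % 10) for i in range(10)] + [(i, 1, (i + 2) % 10) for i in range(10)]
training, inference, evaluation = fully_inductive_sketch(ring, num_entities=10)
```

Triples that connect a training entity with an inference entity belong to neither induced subgraph and are dropped in this sketch.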
- split_semi_inductive(ratios: float | Sequence[float] = 0.8, *, random_state: None | int | Generator = None) -> list[Self]
Create a semi-inductive split.
In a semi-inductive split, we first split the entities into training and evaluation entities. The training graph is then composed of all triples involving only training entities. The evaluation graphs are built by looking at the triples that involve exactly one training and one evaluation entity.
- Parameters:
ratios (float | Sequence[float])
random_state (None | int | Generator)
- Returns:
A partition of triples, which are split (approximately) according to the ratios, stored in TriplesFactory instances which share everything else with this root triples factory.
- Return type:
list[Self]
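The semi-inductive construction differs from the fully inductive one in which triples end up in the evaluation part. The sketch below is hypothetical, uses a single evaluation part, and works on plain tuples.

```python
import random


def semi_inductive_sketch(triples, num_entities, train_ratio=0.8, seed=0):
    """Return (training, evaluation, train_entities).

    Hypothetical sketch with a single evaluation part.
    """
    rng = random.Random(seed)
    entities = list(range(num_entities))
    rng.shuffle(entities)
    train_entities = set(entities[: int(train_ratio * num_entities)])
    # Training graph: both endpoints are training entities.
    training = [
        (h, r, t) for h, r, t in triples
        if h in train_entities and t in train_entities
    ]
    # Evaluation graph: exactly one endpoint is a training entity.
    evaluation = [
        (h, r, t) for h, r, t in triples
        if (h in train_entities) != (t in train_entities)
    ]
    return training, evaluation, train_entities


ring = [(i, 0, (i + 1) % 10) for i in range(10)]
training, evaluation, train_entities = semi_inductive_sketch(ring, num_entities=10)
```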
- tensor_to_df(tensor: Tensor, **kwargs: Tensor | ndarray | Sequence) -> DataFrame
Take a tensor of triples and make a pandas dataframe with labels.
- Parameters:
tensor (Tensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Tensor | ndarray | Sequence) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.
- Returns:
A dataframe with n rows, and 6 + len(kwargs) columns.
- Return type:
DataFrame