Triples

Classes for creating and storing training data from triples.

class CoreTriplesFactory(mapped_triples, num_entities, num_relations, entity_ids, relation_ids, create_inverse_triples=False, metadata=None)

Create instances from ID-based triples.

Create the triples factory.

Parameters

mapped_triples (LongTensor) – shape: (n, 3) A three-column matrix in which each row contains the head identifier, relation identifier, and tail identifier.
num_entities (int) – The number of entities.
num_relations (int) – The number of relations.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Optional[Mapping[str, Any]]) – Arbitrary metadata to go with the graph

clone_and_exchange_triples(mapped_triples, extra_metadata=None, keep_metadata=True, create_inverse_triples=None)

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters

mapped_triples (LongTensor) – The new mapped triples.
extra_metadata (Optional[Dict[str, Any]]) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, and values from extra_metadata take precedence on duplicate keys.
keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.
create_inverse_triples (Optional[bool]) – Change inverse triple creation flag. If None, use flag from this factory.

Return type

CoreTriplesFactory

Returns

The new factory.
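A minimal sketch of the metadata rule described above, in plain Python: when keep_metadata is true, the result behaves like a dictionary union in which extra_metadata wins on duplicate keys. The helper merge_metadata is hypothetical and not part of PyKEEN; the library's internals may differ.

```python
from typing import Any, Dict, Mapping, Optional


def merge_metadata(
    current: Mapping[str, Any],
    extra: Optional[Mapping[str, Any]] = None,
    keep: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the documented keep_metadata/extra_metadata rule."""
    # Start from the current factory's metadata only if it should be kept.
    merged: Dict[str, Any] = dict(current) if keep else {}
    # Keys from `extra` overwrite keys that are already present.
    merged.update(extra or {})
    return merged


print(merge_metadata({"path": "train.tsv", "split": "train"}, {"split": "test"}))
# {'path': 'train.tsv', 'split': 'test'}
```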

classmethod create(mapped_triples, num_entities=None, num_relations=None, entity_ids=None, relation_ids=None, create_inverse_triples=False, metadata=None)

Create a triples factory without any label information.

Parameters

mapped_triples (LongTensor) – shape: (n, 3) The ID-based triples.
num_entities (Optional[int]) – The number of entities. If not given, inferred from mapped_triples.
num_relations (Optional[int]) – The number of relations. If not given, inferred from mapped_triples.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Optional[Mapping[str, Any]]) – Additional metadata to store in the factory.

Return type

CoreTriplesFactory

Returns

A new triples factory.
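When num_entities or num_relations is omitted, a natural way to infer it is the largest ID that occurs, plus one. The sketch below assumes that rule (this reference does not spell it out) and uses plain tuples instead of a LongTensor; infer_counts is a hypothetical helper. One consequence worth noting: if the highest IDs never occur in mapped_triples (for example in a small test split), an inferred count comes out too small, so passing the counts explicitly is safer.

```python
from typing import Optional, Sequence, Tuple

Triple = Tuple[int, int, int]


def infer_counts(
    mapped_triples: Sequence[Triple],
    num_entities: Optional[int] = None,
    num_relations: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper: fill in missing counts as (largest ID + 1)."""
    if num_entities is None:
        # Entities can appear in the head (column 0) or the tail (column 2).
        num_entities = 1 + max(max(h, t) for h, _, t in mapped_triples)
    if num_relations is None:
        num_relations = 1 + max(r for _, r, _ in mapped_triples)
    return num_entities, num_relations


triples = [(0, 0, 1), (1, 1, 4), (2, 0, 3)]
print(infer_counts(triples))                   # (5, 2)
print(infer_counts(triples, num_entities=10))  # (10, 2)
```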

create_lcwa_instances(use_tqdm=None, target=None)

Create LCWA instances for this factory’s triples.

Return type: Dataset

create_slcwa_instances(*, sampler=None, **kwargs)

Create sLCWA instances for this factory’s triples.

Return type: Dataset

entities_to_ids(entities)

Normalize entities to IDs.

Return type: Collection[int]

extra_repr()

Extra representation string.

Return type: str

classmethod from_path_binary(path)

Load triples factory from a binary file.

Parameters: path (Union[str, Path, TextIO]) – The path, pointing to an existing PyTorch .pt file.
Return type: CoreTriplesFactory
Returns: The loaded triples factory.

get_inverse_relation_id(relation)

Get the inverse relation identifier for the given relation.

Return type: int

get_mask_for_relations(relations, invert=False)

Get a boolean mask for triples with the given relations.

Return type: BoolTensor
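The mask can be pictured as a per-triple membership test on the relation column. A sketch with plain lists, assuming invert=True simply negates the mask; mask_for_relations is hypothetical, and the documented method returns a BoolTensor rather than a list.

```python
from typing import Collection, List, Sequence, Tuple


def mask_for_relations(
    mapped_triples: Sequence[Tuple[int, int, int]],
    relations: Collection[int],
    invert: bool = False,
) -> List[bool]:
    """Hypothetical helper: True where the relation is (or, if invert, is not) selected."""
    wanted = set(relations)  # set lookup keeps this linear in the number of triples
    # `!= invert` acts as XOR, flipping every entry when invert is True.
    return [(r in wanted) != invert for _, r, _ in mapped_triples]


triples = [(0, 0, 1), (1, 1, 2), (2, 2, 0)]
print(mask_for_relations(triples, {0, 2}))               # [True, False, True]
print(mask_for_relations(triples, {0, 2}, invert=True))  # [False, True, False]
```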

get_most_frequent_relations(n)

Get the IDs of the n most frequent relations.

Parameters: n (Union[int, float]) – Either the (integer) number of top relations to keep or the (float) fraction of top relations to keep.
Return type: Set[int]
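A sketch of the frequency ranking using collections.Counter, assuming that a float n is read as a fraction of the number of distinct relations and rounded to the nearest integer. The helper is hypothetical. Ties between equally frequent relations are broken here by first occurrence (Counter.most_common's behavior), which PyKEEN need not share.

```python
from collections import Counter
from typing import Sequence, Set, Tuple, Union


def most_frequent_relations(
    mapped_triples: Sequence[Tuple[int, int, int]],
    n: Union[int, float],
) -> Set[int]:
    """Hypothetical helper: IDs of the n most frequent relations."""
    counts = Counter(r for _, r, _ in mapped_triples)
    if isinstance(n, float):
        if not 0.0 < n <= 1.0:
            raise ValueError(f"fraction must be in (0, 1], got {n}")
        # Assumption: the fraction is relative to the number of distinct relations.
        n = round(n * len(counts))
    return {relation for relation, _ in counts.most_common(n)}


triples = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (0, 1, 2), (1, 1, 3), (3, 2, 0)]
print(most_frequent_relations(triples, 1))    # {0}
print(most_frequent_relations(triples, 0.7))  # {0, 1}
```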

new_with_restriction(entities=None, relations=None, invert_entity_selection=False, invert_relation_selection=False)

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

Parameters

entities (Union[None, Collection[int], Collection[str]]) – The entities of interest. If None, defaults to all entities.
relations (Union[None, Collection[int], Collection[str]]) – The relations of interest. If None, defaults to all relations.
invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.

Return type

CoreTriplesFactory

Returns

A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.
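The key property of this restriction is that IDs are not renumbered: the kept triples carry their original IDs, so they stay compatible with the unchanged label-to-ID mapping. The sketch below reads the parameter descriptions literally, which is an assumption: a triple is kept only if both its head and tail are among the given entities, and with invert_entity_selection only if neither is. restrict is a hypothetical helper operating on tuples.

```python
from typing import Collection, List, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]


def restrict(
    mapped_triples: Sequence[Triple],
    entities: Optional[Collection[int]] = None,
    relations: Optional[Collection[int]] = None,
    invert_entity_selection: bool = False,
    invert_relation_selection: bool = False,
) -> List[Triple]:
    """Hypothetical ID-preserving restriction; no ID is ever renumbered."""
    entity_set = None if entities is None else set(entities)
    relation_set = None if relations is None else set(relations)

    def selected(x: int, chosen: set, invert: bool) -> bool:
        # XOR with `invert` flips the membership test.
        return (x in chosen) != invert

    kept = []
    for h, r, t in mapped_triples:
        # Assumption: both endpoints must pass the (possibly inverted) entity test.
        if entity_set is not None and not (
            selected(h, entity_set, invert_entity_selection)
            and selected(t, entity_set, invert_entity_selection)
        ):
            continue
        if relation_set is not None and not selected(r, relation_set, invert_relation_selection):
            continue
        kept.append((h, r, t))  # IDs are copied unchanged
    return kept


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
print(restrict(triples, entities={0, 1, 2}))  # [(0, 0, 1), (1, 1, 2)]
print(restrict(triples, relations={0}))       # [(0, 0, 1), (2, 0, 3)]
```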

property num_entities: int

The number of unique entities.

Return type: int

property num_relations: int

The number of unique relations.

Return type: int

property num_triples: int

The number of triples.

Return type: int

property real_num_relations: int

The number of relations without inverse relations.

Return type: int

relations_to_ids(relations)

Normalize relations to IDs.

Return type: Collection[int]

split(ratios=0.8, *, random_state=None, randomize_cleanup=False, method=None)

Split a triples factory into two or more parts, e.g. training and testing sets.

Parameters

ratios (Union[float, Sequence[float]]) –
There are three options for this argument:
1. A float can be given between 0 and 1.0, non-inclusive. The first set of triples will get this ratio and the second will get the rest.
2. A list of ratios can be given, assigning a ratio to each set in order, as in [0.8, 0.1]. The final ratio can be omitted because it can be calculated from the others.
3. All ratios can be explicitly set in order such as in [0.8, 0.1, 0.1] where the sum of all ratios is 1.0.
random_state (Union[None, int, Generator]) – The random state used to shuffle and split the triples.
randomize_cleanup (bool) – If true, uses the non-deterministic method for moving triples to the training set. This has the advantage that it does not necessarily have to move all of them, but it might be significantly slower since it moves one triple at a time.
method (Optional[str]) – The name of the method to use, from SPLIT_METHODS. Defaults to “coverage”.

Return type

List[CoreTriplesFactory]

Returns

A partition of the triples, split (approximately) according to the ratios and stored in triples factories that share everything else with this root triples factory.

from pykeen.datasets import Nations

factory = Nations().training  # any triples factory can be split this way

ratio = 0.8  # makes a [0.8, 0.2] split
training_factory, testing_factory = factory.split(ratio)

ratios = [0.8, 0.1]  # makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)

ratios = [0.8, 0.1, 0.1]  # also makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)
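The three accepted forms of ratios can all be normalized into one explicit list that sums to 1. A sketch of that normalization, assuming a small tolerance for floating-point sums; normalize_ratios is hypothetical, and PyKEEN's own validation and rounding may differ.

```python
from typing import List, Sequence, Union


def normalize_ratios(ratios: Union[float, Sequence[float]], tol: float = 1e-8) -> List[float]:
    """Hypothetical helper: expand the documented forms of `ratios` into a full list."""
    if isinstance(ratios, float):
        ratios = [ratios]  # option 1: a single float for the first set
    ratios = list(ratios)
    total = sum(ratios)
    if any(r <= 0 for r in ratios) or total > 1.0 + tol:
        raise ValueError(f"invalid ratios: {ratios}")
    if total < 1.0 - tol:
        # Options 1 and 2: the last ratio is implied by the others.
        ratios.append(1.0 - total)
    return ratios


print(normalize_ratios(0.8))              # two entries, approximately [0.8, 0.2]
print(normalize_ratios([0.8, 0.1, 0.1]))  # already complete, returned unchanged
```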

tensor_to_df(tensor, **kwargs)

Take a tensor of triples and make a pandas dataframe from their IDs (a core triples factory has no label information).

Parameters

tensor (LongTensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Union[Tensor, ndarray, Sequence]) – Any number of additional columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.

Return type

DataFrame

Returns

A dataframe with n rows, and 3 + len(kwargs) columns.

to_path_binary(path)

Save triples factory to path in (PyTorch’s .pt) binary format.

Parameters: path (Union[str, Path, TextIO]) – The path to store the triples factory to.
Return type: Path

with_labels(entity_to_id, relation_to_id)

Add labeling to the TriplesFactory.

Return type: TriplesFactory

class Instances(*args, **kwds)

Base class for training instances.

classmethod from_triples(mapped_triples, *, num_entities, num_relations, **kwargs)

Create instances from mapped triples.

Parameters

mapped_triples (LongTensor) – shape: (num_triples, 3) The ID-based triples.
num_entities (int) – >0 The number of entities.
num_relations (int) – >0 The number of relations.
kwargs – additional keyword-based parameters.

Return type

Instances

Returns

The instances.

get_collator()

Get a collator.

Return type: Optional[Callable[[List[~SampleType]], ~BatchType]]

class LCWAInstances(*, pairs, compressed)

Triples and mappings to their indices for LCWA.

Initialize the LCWA instances.

Parameters

pairs (ndarray) – The unique pairs
compressed (csr_matrix) – The compressed triples in CSR format

classmethod from_triples(mapped_triples, *, num_entities, num_relations, target=None, **kwargs)

Create LCWA instances from triples.

Parameters

mapped_triples (LongTensor) – shape: (num_triples, 3) The ID-based triples.
num_entities (int) – The number of entities.
num_relations (int) – The number of relations.
target (Optional[int]) – The column to predict

Return type

Instances

Returns

The instances.
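Under the LCWA, each unique (head, relation) pair becomes one training instance whose target is the set of all known tails, which fits naturally into a CSR layout: row i holds the tails of pairs[i]. A plain-Python sketch, assuming the tail column is the one being predicted (the default behind target=None is not stated here); lcwa_pairs_and_csr is hypothetical. The returned indptr and indices could be passed, together with an all-ones data array, to scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(pairs), num_entities)) to obtain a real sparse matrix.

```python
from typing import Dict, List, Sequence, Set, Tuple


def lcwa_pairs_and_csr(
    mapped_triples: Sequence[Tuple[int, int, int]],
) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Hypothetical helper: group tails by (head, relation) and pack them as CSR rows."""
    tails: Dict[Tuple[int, int], Set[int]] = {}
    for h, r, t in mapped_triples:
        tails.setdefault((h, r), set()).add(t)
    pairs = sorted(tails)  # one row per unique (head, relation) pair
    indptr: List[int] = [0]
    indices: List[int] = []
    for pair in pairs:
        indices.extend(sorted(tails[pair]))  # column indices of the multi-hot target row
        indptr.append(len(indices))
    return pairs, indptr, indices


pairs, indptr, indices = lcwa_pairs_and_csr([(0, 0, 1), (0, 0, 2), (1, 0, 2), (0, 1, 3)])
print(pairs)                         # [(0, 0), (0, 1), (1, 0)]
print(indices[indptr[0]:indptr[1]])  # [1, 2], the known tails of (0, 0)
```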

class RelationInverter

An interface for inverse-relation ID mapping.

abstract get_inverse_id(relation_id)

Get the inverse ID for a given relation.

Return type: ~RelationID

abstract invert_(batch, index=1)

Invert relations in a batch (in-place).

Return type: LongTensor

map(batch, index=1, invert=False)

Map relations of batch, optionally also inverting them.

Return type: LongTensor
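One possible inverse-relation scheme interleaves IDs: relation r is stored as 2r and its inverse as 2r + 1, so the inverse of a mapped ID is simply the next integer. The sketch below assumes that scheme for illustration only; InterleavedRelationInverter is a hypothetical class that works on lists of rows instead of a LongTensor and does not subclass the documented RelationInverter.

```python
from typing import List


class InterleavedRelationInverter:
    """Hypothetical inverter: relation r -> 2r, its inverse -> 2r + 1."""

    def get_inverse_id(self, relation_id: int) -> int:
        # Expects an already-mapped (even) ID and returns the odd ID right after it.
        return relation_id + 1

    def invert_(self, batch: List[List[int]], index: int = 1) -> List[List[int]]:
        # Modifies the rows in place, like the documented trailing-underscore method.
        for row in batch:
            row[index] = self.get_inverse_id(row[index])
        return batch

    def map(self, batch: List[List[int]], index: int = 1, invert: bool = False) -> List[List[int]]:
        mapped = [list(row) for row in batch]  # copy so the caller's batch is untouched
        for row in mapped:
            row[index] *= 2  # relation r is stored at the even position 2r
        return self.invert_(mapped, index=index) if invert else mapped


inverter = InterleavedRelationInverter()
batch = [[0, 0, 1], [2, 3, 4]]
print(inverter.map(batch))               # [[0, 0, 1], [2, 6, 4]]
print(inverter.map(batch, invert=True))  # [[0, 1, 1], [2, 7, 4]]
```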

class SLCWAInstances(*, mapped_triples, num_entities=None, num_relations=None, negative_sampler=None, negative_sampler_kwargs=None)

Training instances for the sLCWA.

Initialize the sLCWA instances.

Parameters

mapped_triples (LongTensor) – shape: (num_triples, 3) the ID-based triples, passed to the negative sampler
num_entities (Optional[int]) – >0 the number of entities, passed to the negative sampler
num_relations (Optional[int]) – >0 the number of relations, passed to the negative sampler
negative_sampler (Union[str, NegativeSampler, Type[NegativeSampler], None]) – the negative sampler, or a hint thereof
negative_sampler_kwargs (Optional[Mapping[str, Any]]) – additional keyword-based arguments passed to the negative sampler
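The negative sampler turns each positive triple into one or more negatives. A common baseline corrupts either the head or the tail with a uniformly drawn, different entity; the sketch below implements that baseline for illustration, without claiming it matches any particular PyKEEN sampler. corrupt_uniformly is hypothetical, and it does not filter out corruptions that happen to be true triples.

```python
import random
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]


def corrupt_uniformly(
    positives: Sequence[Triple],
    num_entities: int,
    num_negs_per_pos: int = 1,
    seed: Optional[int] = None,
) -> List[Triple]:
    """Hypothetical baseline: replace the head or the tail with a different random entity."""
    if num_entities < 2:
        raise ValueError("need at least two entities to corrupt a triple")
    rng = random.Random(seed)
    negatives: List[Triple] = []
    for h, r, t in positives:
        for _ in range(num_negs_per_pos):
            # Draw from num_entities - 1 candidates, then skip over the original ID,
            # which gives a uniform choice among all other entities.
            candidate = rng.randrange(num_entities - 1)
            if rng.random() < 0.5:
                negatives.append((candidate + (candidate >= h), r, t))  # corrupt head
            else:
                negatives.append((h, r, candidate + (candidate >= t)))  # corrupt tail
    return negatives


negatives = corrupt_uniformly([(0, 0, 1), (2, 1, 3)], num_entities=5, num_negs_per_pos=2, seed=42)
print(negatives)  # four triples; relations are never changed
```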

static collate(samples)

Collate samples.

Return type: SLCWABatch

classmethod from_triples(mapped_triples, *, num_entities, num_relations, **kwargs)

Create instances from mapped triples.

Parameters

mapped_triples (LongTensor) – shape: (num_triples, 3) The ID-based triples.
num_entities (int) – >0 The number of entities.
num_relations (int) – >0 The number of relations.
kwargs – additional keyword-based parameters.

Return type

Instances

Returns

The instances.

get_collator()

Get a collator.

Return type: Optional[Callable[[List[Tuple[LongTensor, LongTensor, Optional[BoolTensor]]]], SLCWABatch]]

class TriplesFactory(mapped_triples, entity_to_id, relation_to_id, create_inverse_triples=False, metadata=None)

Create instances given the path to triples.

Create the triples factory.

Parameters

mapped_triples (LongTensor) – shape: (n, 3) A three-column matrix in which each row contains the head identifier, relation identifier, and tail identifier.
entity_to_id (Mapping[str, int]) – The mapping from entities’ labels to their indices.
relation_to_id (Mapping[str, int]) – The mapping from relations’ labels to their indices.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Optional[Mapping[str, Any]]) – Arbitrary metadata to go with the graph

clone_and_exchange_triples(mapped_triples, extra_metadata=None, keep_metadata=True, create_inverse_triples=None)

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters

mapped_triples (LongTensor) – The new mapped triples.
extra_metadata (Optional[Dict[str, Any]]) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, and values from extra_metadata take precedence on duplicate keys.
keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.
create_inverse_triples (Optional[bool]) – Change inverse triple creation flag. If None, use flag from this factory.

Return type

TriplesFactory

Returns

The new factory.

entities_to_ids(entities)

Normalize entities to IDs.

Return type: Collection[int]

property entity_id_to_label: Mapping[int, str]

Return the mapping from entity IDs to labels.

Return type: Mapping[int, str]

property entity_to_id: Mapping[str, int]

Return the mapping from entity labels to IDs.

Return type: Mapping[str, int]

entity_word_cloud(top=None)

Make a word cloud, for display in a Jupyter notebook, based on how frequently each entity occurs.

Parameters: top (Optional[int]) – The number of top entities to show. Defaults to 100.

Warning

This function requires the word_cloud package. Use pip install pykeen[plotting] to install it automatically, or install it yourself with pip install git+https://github.com/kavgan/word_cloud.git.

classmethod from_labeled_triples(triples, create_inverse_triples=False, entity_to_id=None, relation_to_id=None, compact_id=True, filter_out_candidate_inverse_relations=True, metadata=None)

Create a new triples factory from label-based triples.

Parameters

triples (ndarray) – shape: (n, 3), dtype: str The label-based triples.
create_inverse_triples (bool) – Whether to create inverse triples.
entity_to_id (Optional[Mapping[str, int]]) – The mapping from entity labels to IDs. If None, create a new one from the triples.
relation_to_id (Optional[Mapping[str, int]]) – The mapping from relation labels to IDs. If None, create a new one from the triples.
compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.
filter_out_candidate_inverse_relations (bool) – Whether to remove triples with relations with the inverse suffix.
metadata (Optional[Dict[str, Any]]) – Arbitrary key/value pairs to store as metadata

Return type

TriplesFactory

Returns

A new triples factory.
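When no mappings are supplied, one straightforward construction assigns consecutive IDs to the sorted labels, which yields the consecutive IDs that compact_id describes. The sketch assumes sorted-label order and skips the inverse-relation filtering; build_mappings is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_mappings(
    triples: Sequence[Tuple[str, str, str]],
) -> Tuple[Dict[str, int], Dict[str, int], List[Tuple[int, int, int]]]:
    """Hypothetical helper: consecutive IDs for sorted labels, then ID-based triples."""
    entity_labels = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relation_labels = sorted({r for _, r, _ in triples})
    entity_to_id = {label: i for i, label in enumerate(entity_labels)}
    relation_to_id = {label: i for i, label in enumerate(relation_labels)}
    mapped = [(entity_to_id[h], relation_to_id[r], entity_to_id[t]) for h, r, t in triples]
    return entity_to_id, relation_to_id, mapped


labeled = [("brazil", "borders", "uruguay"), ("uruguay", "borders", "argentina")]
e2id, r2id, mapped = build_mappings(labeled)
print(e2id)    # {'argentina': 0, 'brazil': 1, 'uruguay': 2}
print(mapped)  # [(1, 0, 2), (2, 0, 0)]
```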

classmethod from_path(path, create_inverse_triples=False, entity_to_id=None, relation_to_id=None, compact_id=True, metadata=None, load_triples_kwargs=None)

Create a new triples factory from triples stored in a file.

Parameters

path (Union[str, Path, TextIO]) – The path where the label-based triples are stored.
create_inverse_triples (bool) – Whether to create inverse triples.
entity_to_id (Optional[Mapping[str, int]]) – The mapping from entity labels to IDs. If None, create a new one from the triples.
relation_to_id (Optional[Mapping[str, int]]) – The mapping from relation labels to IDs. If None, create a new one from the triples.
compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.
metadata (Optional[Dict[str, Any]]) – Arbitrary key/value pairs to store as metadata with the triples factory. Do not include path as a key because it is automatically taken from the path kwarg to this function.
load_triples_kwargs (Optional[Mapping[str, Any]]) – Optional keyword arguments to pass to load_triples(). Could include the delimiter or a column_remapping.

Return type

TriplesFactory

Returns

A new triples factory.

get_inverse_relation_id(relation)

Get the inverse relation identifier for the given relation.

Return type: int

get_mask_for_relations(relations, invert=False)

Get a boolean mask for triples with the given relations.

Return type: BoolTensor

label_triples(triples, unknown_entity_label='[UNKNOWN]', unknown_relation_label=None)

Convert ID-based triples to label-based ones.

Parameters

triples (LongTensor) – The ID-based triples.
unknown_entity_label (str) – The label to use for unknown entity IDs.
unknown_relation_label (Optional[str]) – The label to use for unknown relation IDs.

Return type

ndarray

Returns

The same triples, but labeled.

map_triples(triples)

Convert label-based triples to ID-based triples.

Return type: LongTensor

new_with_restriction(entities=None, relations=None, invert_entity_selection=False, invert_relation_selection=False)

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

Parameters

entities (Union[None, Collection[int], Collection[str]]) – The entities of interest. If None, defaults to all entities.
relations (Union[None, Collection[int], Collection[str]]) – The relations of interest. If None, defaults to all relations.
invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.

Return type

TriplesFactory

Returns

A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.

property relation_id_to_label: Mapping[int, str]

Return the mapping from relation IDs to labels.

Return type: Mapping[int, str]

property relation_to_id: Mapping[str, int]

Return the mapping from relation labels to IDs.

Return type: Mapping[str, int]

relation_word_cloud(top=None)

Make a word cloud, for display in a Jupyter notebook, based on how frequently each relation occurs.

Parameters: top (Optional[int]) – The number of top relations to show. Defaults to 100.

Warning

This function requires the word_cloud package. Use pip install pykeen[plotting] to install it automatically, or install it yourself with pip install git+https://github.com/kavgan/word_cloud.git.

relations_to_ids(relations)

Normalize relations to IDs.

Return type: Collection[int]

tensor_to_df(tensor, **kwargs)

Take a tensor of triples and make a pandas dataframe with labels.

Parameters

tensor (LongTensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Union[Tensor, ndarray, Sequence]) – Any number of additional columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.

Return type

DataFrame

Returns

A dataframe with n rows, and 6 + len(kwargs) columns.

to_core_triples_factory()

Return this factory as a core factory.

Return type: CoreTriplesFactory

to_path_binary(path)

Save triples factory to path in (PyTorch’s .pt) binary format.

Parameters: path (Union[str, Path, TextIO]) – The path to store the triples factory to.
Return type: Path

property triples: numpy.ndarray

The labeled triples, a 3-column matrix in which each row contains the head label, relation label, and tail label.

Return type: ndarray

class TriplesNumericLiteralsFactory(*, path=None, triples=None, path_to_numeric_triples=None, numeric_triples=None, **kwargs)

Create multi-modal instances given the path to triples.

Initialize the multi-modal triples factory.

Parameters

path (Union[None, str, Path, TextIO]) – The path to a 3-column TSV file with triples in it. If not specified, you should specify triples.
triples (Optional[ndarray]) – A 3-column numpy array with triples in it. If not specified, you should specify path.
path_to_numeric_triples (Union[None, str, Path, TextIO]) – The path to a 3-column TSV file with numeric literal triples in it. If not specified, you should specify numeric_triples.
numeric_triples (Optional[ndarray]) – A 3-column numpy array with numeric triples in it. If not specified, you should specify path_to_numeric_triples.

clone_and_exchange_triples(mapped_triples, extra_metadata=None, keep_metadata=True, create_inverse_triples=None)

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters

mapped_triples (LongTensor) – The new mapped triples.
extra_metadata (Optional[Dict[str, Any]]) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, and values from extra_metadata take precedence on duplicate keys.
keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.
create_inverse_triples (Optional[bool]) – Change inverse triple creation flag. If None, use flag from this factory.

Return type

TriplesNumericLiteralsFactory

Returns

The new factory.

extra_repr()

Extra representation string.

Return type: str

get_numeric_literals_tensor()

Return the numeric literals as a tensor.

Return type: FloatTensor

Instance creation utilities.

compute_compressed_adjacency_list(mapped_triples, num_entities=None)

Compute compressed undirected adjacency list representation for efficient sampling.

The compressed adjacency list format is inspired by CSR sparse matrix format.

Parameters

mapped_triples (LongTensor) – the ID-based triples
num_entities (Optional[int]) – the number of entities.

Return type

Tuple[LongTensor, LongTensor, LongTensor]

Returns

a tuple (degrees, offsets, compressed_adj_list) where

degrees: shape: (num_entities,)

offsets: shape: (num_entities,)

compressed_adj_list: shape: (2 * num_triples, 2)

with

adj_list[i] = compressed_adj_list[offsets[i]:offsets[i] + degrees[i]]
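A plain-Python sketch of the same CSR-style packing, useful for seeing why compressed_adj_list has 2 * num_triples rows: every triple is appended once to its head's list and once to its tail's list. Storing each row as (triple index, neighbor entity) is an assumption made here for illustration; compressed_adjacency is hypothetical. Reading entity i's list as compressed_adj_list[offsets[i]:offsets[i] + degrees[i]] also works for the last entity, for which offsets[i+1] does not exist.

```python
from typing import List, Optional, Sequence, Tuple


def compressed_adjacency(
    mapped_triples: Sequence[Tuple[int, int, int]],
    num_entities: Optional[int] = None,
) -> Tuple[List[int], List[int], List[Tuple[int, int]]]:
    """Hypothetical helper: undirected adjacency lists packed into one array."""
    if num_entities is None:
        num_entities = 1 + max(max(h, t) for h, _, t in mapped_triples)
    adj: List[List[Tuple[int, int]]] = [[] for _ in range(num_entities)]
    for i, (h, _, t) in enumerate(mapped_triples):
        # Undirected: each triple is listed under its head and under its tail.
        adj[h].append((i, t))
        adj[t].append((i, h))
    degrees = [len(a) for a in adj]
    offsets = [0] * num_entities
    for e in range(1, num_entities):
        offsets[e] = offsets[e - 1] + degrees[e - 1]  # exclusive prefix sum
    compressed = [row for a in adj for row in a]  # length 2 * num_triples
    return degrees, offsets, compressed


degrees, offsets, compressed = compressed_adjacency([(0, 0, 1), (1, 1, 2), (0, 2, 2)])
# Adjacency list of entity 2, recovered from the packed array:
print(compressed[offsets[2]:offsets[2] + degrees[2]])  # [(1, 1), (2, 0)]
```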

get_entities(triples)

Get all entities from the triples.

Return type: Set[int]

get_relations(triples)

Get all relations from the triples.

Return type: Set[int]

load_triples(path, delimiter='\t', encoding=None, column_remapping=None)

Load triples saved as tab separated values.

Parameters

path (Union[str, Path, TextIO]) – The key for the data to be loaded. Typically, this will be a file path ending in .tsv that points to a file with three columns - the head, relation, and tail. This can also be used to invoke PyKEEN data importer entrypoints (see below).
delimiter (str) – The delimiter between the columns in the file
encoding (Optional[str]) – The encoding for the file. Defaults to utf-8.
column_remapping (Optional[Sequence[int]]) – A remapping if the three columns do not follow the order head-relation-tail. For example, if the order is head-tail-relation, pass (0, 2, 1)

Return type

ndarray

Returns

A numpy array representing “labeled” triples.

Raises

ValueError – if a column remapping was passed but it was not a length 3 sequence

Besides TSV handling, PyKEEN does not come with any importers pre-installed. A few can be found at:

pybel.io.pykeen
bio2bel.io.pykeen
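A sketch of the TSV parsing and column_remapping behavior using the standard csv module. It reads the remapping as "output column j comes from input column column_remapping[j]", which agrees with the head-tail-relation example in the parameter description. read_labeled_triples is hypothetical; the real load_triples returns a numpy array and also dispatches to importer entrypoints, and neither is shown here.

```python
import csv
import io
from typing import List, Optional, Sequence, TextIO, Tuple


def read_labeled_triples(
    handle: TextIO,
    delimiter: str = "\t",
    column_remapping: Optional[Sequence[int]] = None,
) -> List[Tuple[str, str, str]]:
    """Hypothetical reader: returns (head, relation, tail) label tuples."""
    if column_remapping is not None and len(column_remapping) != 3:
        # Mirrors the documented ValueError for a remapping that is not of length 3.
        raise ValueError(f"column_remapping must have length 3, got {column_remapping!r}")
    order = tuple(column_remapping) if column_remapping is not None else (0, 1, 2)
    triples = []
    # QUOTE_NONE keeps labels verbatim, even if they contain quote characters.
    for row in csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        triples.append((row[order[0]], row[order[1]], row[order[2]]))
    return triples


# A file stored in head-tail-relation order, as in the column_remapping example.
data = io.StringIO("brazil\turuguay\tborders\nuruguay\targentina\tborders\n")
print(read_labeled_triples(data, column_remapping=(0, 2, 1)))
# [('brazil', 'borders', 'uruguay'), ('uruguay', 'borders', 'argentina')]
```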

tensor_to_df(tensor, **kwargs)

Take a tensor of triples and make a pandas dataframe with labels.

Parameters

tensor (LongTensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Union[Tensor, ndarray, Sequence]) – Any number of additional columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.

Return type

DataFrame

Returns

A dataframe with n rows, and 3 + len(kwargs) columns.

Raises

ValueError – If a reserved column name appears in kwargs.
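The documented behavior (ID columns, extra columns from kwargs, a ValueError on reserved names) can be sketched without pandas by building the columns as a dictionary; pandas.DataFrame(columns) would then turn the result into a dataframe. triples_to_columns is hypothetical, and it adds only the three ID columns, matching the 3 + len(kwargs) count given for this function.

```python
from typing import Dict, List, Sequence, Tuple

# Column names the documentation lists as reserved.
RESERVED = {"head_id", "head_label", "relation_id", "relation_label", "tail_id", "tail_label"}


def triples_to_columns(triples: Sequence[Tuple[int, int, int]], **kwargs: Sequence) -> Dict[str, List]:
    """Hypothetical helper: three ID columns plus one column per keyword argument."""
    clash = RESERVED.intersection(kwargs)
    if clash:
        raise ValueError(f"reserved column names in kwargs: {sorted(clash)}")
    columns: Dict[str, List] = {
        "head_id": [h for h, _, _ in triples],
        "relation_id": [r for _, r, _ in triples],
        "tail_id": [t for _, _, t in triples],
    }
    for name, values in kwargs.items():
        # Every extra column must have one value per triple, i.e. shape (n,).
        if len(values) != len(triples):
            raise ValueError(f"column {name!r} has {len(values)} values, expected {len(triples)}")
        columns[name] = list(values)
    return columns


columns = triples_to_columns([(0, 1, 2), (3, 0, 1)], score=[0.9, 0.2])
print(list(columns))  # ['head_id', 'relation_id', 'tail_id', 'score']
```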

Remixing and dataset distance utilities.

Most datasets are given with a pre-defined split, but how this split was created is often not discussed. This module contains utilities for investigating the effects of remixing pre-split datasets like pykeen.datasets.Nations.

Further, it defines a metric for the “distance” between two splits of a given dataset. Later, this will be used to map the landscape and see if there is a smooth, continuous relationship between datasets’ splits’ distances and their maximum performance.

remix(*triples_factories, **kwargs)

Remix the triples from the training, testing, and validation set.

Parameters

triples_factories (CoreTriplesFactory) – A sequence of triples factories
kwargs – Keyword arguments to be passed to split()

Return type

List[CoreTriplesFactory]

Returns

A sequence of triples factories of the same sizes but randomly re-assigned triples

Raises

NotImplementedError – if any of the triples factories has create_inverse_triples enabled
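The core idea of remixing is to pool all triples, shuffle them, and re-cut pieces of the original sizes. A minimal sketch of that idea; remix_splits is hypothetical. Because the documented remix passes its keyword arguments to split(), it likely also applies split()'s cleanup step, which this sketch omits, so here nothing guarantees that every entity appears in the first (training) piece.

```python
import random
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]


def remix_splits(*splits: Sequence[Triple], seed: Optional[int] = None) -> List[List[Triple]]:
    """Hypothetical helper: pool, shuffle, and re-cut splits to their original sizes."""
    pooled = [triple for split in splits for triple in split]
    random.Random(seed).shuffle(pooled)  # seeded for reproducibility
    remixed, start = [], 0
    for split in splits:
        remixed.append(pooled[start:start + len(split)])
        start += len(split)
    return remixed


train = [(0, 0, 1), (1, 0, 2), (2, 0, 3)]
test = [(3, 1, 0)]
valid = [(1, 1, 3), (2, 1, 0)]
new_train, new_test, new_valid = remix_splits(train, test, valid, seed=0)
print(len(new_train), len(new_test), len(new_valid))  # 3 1 2
```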

Deterioration algorithm.

deteriorate(reference, *others, n, random_state=None)

Remove n triples from the reference set.

TODO: avoid removing triples that are the only ones containing a given entity.

Return type: List[TriplesFactory]