Triples

A module to handle triples.

A knowledge graph can be thought of as a collection of facts, where each individual fact is represented as a triple of a head entity \(h\), a relation \(r\) and a tail entity \(t\). To operate on them efficiently, most of PyKEEN assumes that the set of entities has been transformed so that they are identified by integers from \([0, \ldots, E)\), where \(E\) is the number of entities. A similar assumption is made for relations, with indices from \([0, \ldots, R)\), where \(R\) is the number of relations.
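
The index-based representation can be illustrated with a minimal sketch in plain Python. This is not PyKEEN's own implementation: the helper name `build_index` and the choice to order IDs by sorted label are assumptions made only for this example.

```python
def build_index(labels):
    """Assign consecutive integer IDs, starting at 0, to the sorted unique labels."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


labeled_triples = [
    ("berlin", "capital_of", "germany"),
    ("paris", "capital_of", "france"),
    ("germany", "borders", "france"),
]

# Entities come from both the head and the tail column.
entity_to_id = build_index(
    [h for h, _, _ in labeled_triples] + [t for _, _, t in labeled_triples]
)
relation_to_id = build_index(r for _, r, _ in labeled_triples)

# Index-based triples: every entity ID lies in [0, E), every relation ID in [0, R).
mapped_triples = [
    (entity_to_id[h], relation_to_id[r], entity_to_id[t])
    for h, r, t in labeled_triples
]
```

Here \(E = 4\) and \(R = 2\), so all entity IDs fall in \([0, 4)\) and all relation IDs in \([0, 2)\).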

This module includes classes and methods for loading and transforming triples from other formats into the index-based format, as well as advanced methods for creating leakage-free training and test splits and analyzing data distribution.

Basic Handling

The most basic information about a knowledge graph is stored in KGInfo. It contains the minimal information needed to create a knowledge graph embedding: the number of entities and relations, as well as information about the use of inverse relations (which artificially increases the number of relations).
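
The effect of inverse relations on the relation count can be sketched as follows. `GraphInfoSketch` is a hypothetical stand-in written for this example, not PyKEEN's KGInfo class; it only mirrors the attributes described above.

```python
from dataclasses import dataclass


@dataclass
class GraphInfoSketch:
    """Hypothetical stand-in for the bookkeeping described above (not PyKEEN's KGInfo)."""

    num_entities: int
    real_num_relations: int  # relations actually present in the data
    create_inverse_triples: bool = False

    @property
    def num_relations(self) -> int:
        # Each real relation gets one artificial inverse, doubling the count.
        if self.create_inverse_triples:
            return 2 * self.real_num_relations
        return self.real_num_relations


info = GraphInfoSketch(num_entities=14, real_num_relations=55, create_inverse_triples=True)
```

With inverse triples enabled, `info.num_relations` reports twice the number of real relations, which is the size an embedding table for relations would need.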

To store information about triples, there is the CoreTriplesFactory. It extends KGInfo by additionally storing a set of index-based triples, in the form of a 3-column matrix. It also allows storing arbitrary metadata in the form of a (JSON-compatible) dictionary. In addition, it supports serialization, i.e. saving to and loading from a file, as well as filter operations and utility methods to create dataframes for further (external) processing.

Finally, there is TriplesFactory, which adds mapping of string-based entity and relation names to IDs. This class also provides rich factory methods that allow creating mappings from string-based triples alone, loading triples from sufficiently similar external file formats such as TSV or CSV, and converting back and forth between label-based and index-based formats. It also extends serialization to ensure that the string-to-index mappings are included along with the files.

Splitting

To evaluate knowledge graph embedding models, we need training, validation, and test sets. In classical machine learning settings, we often have a large number of independent samples and can use simple random sampling. In graph learning settings, however, we are interested in learning relational patterns, i.e. patterns between different samples. In these settings, we need to be more careful.

For example, to evaluate models in a transductive setting, we need to make sure that all entities and relations of the triples used in the evaluation are also present in the training triples. PyKEEN includes methods to construct splits that ensure the presence of all entities and relations in the training part. Those can be found in pykeen.triples.splitting.
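
The idea behind such a coverage-preserving split can be sketched in plain Python. This is an illustration under simplifying assumptions, not the code in pykeen.triples.splitting: the function name `coverage_split` is hypothetical, and the cleanup step simply moves every offending triple into training.

```python
import random


def coverage_split(triples, ratio=0.8, seed=0):
    """Randomly split triples, then move test triples with unseen IDs into training."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    cut = int(ratio * len(shuffled))
    train, candidates = shuffled[:cut], shuffled[cut:]

    seen_entities = {e for h, _, t in train for e in (h, t)}
    seen_relations = {r for _, r, _ in train}

    test = []
    for h, r, t in candidates:
        if h in seen_entities and t in seen_entities and r in seen_relations:
            test.append((h, r, t))
        else:
            # Keeping this triple in test would break the transductive assumption.
            train.append((h, r, t))
            seen_entities.update((h, t))
            seen_relations.add(r)
    return train, test


triples = [(0, 0, 1), (1, 0, 2), (2, 1, 3), (3, 1, 0), (0, 0, 2), (4, 1, 1), (1, 1, 3), (2, 0, 4)]
train, test = coverage_split(triples, ratio=0.5)
```

After the cleanup, the training part may be larger than the requested ratio, which is why splits are only approximately proportional.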

In addition, knowledge graphs may contain inverse relationships, such as a _predecessor_ and _successor_ relationship. In this case, careless splitting can lead to test leakage: a model that only checks whether the inverse relationship exists in training can produce deceptively strong results, inflating scores without learning meaningful relational patterns. PyKEEN includes methods to check knowledge graph splits for leakage, which can be found in pykeen.triples.leakage.
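
A minimal sketch of what such a leakage check measures is given below. It is plain Python and not the API of pykeen.triples.leakage; the function `inverse_leakage` and the integer relation IDs are assumptions made for illustration.

```python
def inverse_leakage(train, test, relation, candidate_inverse):
    """Fraction of test triples (h, relation, t) whose flipped form
    (t, candidate_inverse, h) already appears in training."""
    train_set = set(train)
    relevant = [(h, r, t) for h, r, t in test if r == relation]
    if not relevant:
        return 0.0
    leaked = sum((t, candidate_inverse, h) in train_set for h, _, t in relevant)
    return leaked / len(relevant)


PREDECESSOR, SUCCESSOR = 0, 1
train = [(1, SUCCESSOR, 2), (2, SUCCESSOR, 3), (3, SUCCESSOR, 4)]
test = [(2, PREDECESSOR, 1), (3, PREDECESSOR, 2), (5, PREDECESSOR, 4)]
score = inverse_leakage(train, test, PREDECESSOR, SUCCESSOR)
```

A score close to 1 means that most test triples for this relation could be answered by looking up the flipped triple in training, which is the leakage described above.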

In pykeen.triples.remix, we offer methods to examine the effects of a particular choice of splits.

Analysis

We also provide methods for analyzing knowledge graphs. These include simple statistics such as the number of entities or relations (in pykeen.triples.stats), as well as advanced analysis of relational patterns (pykeen.triples.analysis).

class CoreTriplesFactory(mapped_triples, num_entities, num_relations, create_inverse_triples=False, metadata=None)[source]

Create instances from ID-based triples.

Create the triples factory.

Parameters:
  • mapped_triples (Union[LongTensor, ndarray]) – shape: (n, 3) A three-column matrix where each row contains the head identifier, the relation identifier, and the tail identifier, in that order.

  • num_entities (int) – The number of entities.

  • num_relations (int) – The number of relations.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • metadata (Optional[Mapping[str, Any]]) – Arbitrary metadata to go with the graph

Raises:
  • TypeError – if the mapped_triples are of non-integer dtype

  • ValueError – if the mapped_triples are of invalid shape

clone_and_exchange_triples(mapped_triples, extra_metadata=None, keep_metadata=True, create_inverse_triples=None)[source]

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters:
  • mapped_triples (LongTensor) – The new mapped triples.

  • extra_metadata (Optional[Dict[str, Any]]) – Extra metadata to include in the new triples factory. If keep_metadata is true, the dictionaries will be unioned with precedence taken on keys from extra_metadata.

  • keep_metadata (bool) – Pass the current factory’s metadata to the new triples factory

  • create_inverse_triples (Optional[bool]) – Change inverse triple creation flag. If None, use flag from this factory.

Return type:

CoreTriplesFactory

Returns:

The new factory.
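
The metadata handling described for extra_metadata and keep_metadata can be sketched as a dictionary union. The helper `merge_metadata` is hypothetical and only illustrates the stated precedence rule.

```python
def merge_metadata(current, extra=None, keep_metadata=True):
    """Union two metadata dictionaries; keys from `extra` take precedence."""
    merged = dict(current) if keep_metadata else {}
    merged.update(extra or {})
    return merged


current = {"path": "train.tsv", "split": "train"}
merged = merge_metadata(current, {"split": "filtered"})
replaced = merge_metadata(current, {"note": "subset"}, keep_metadata=False)
```

The original dictionary is left untouched, and a key present in both dictionaries takes its value from the extra metadata.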

classmethod create(mapped_triples, num_entities=None, num_relations=None, create_inverse_triples=False, metadata=None)[source]

Create a triples factory without any label information.

Parameters:
  • mapped_triples (LongTensor) – shape: (n, 3) The ID-based triples.

  • num_entities (Optional[int]) – The number of entities. If not given, inferred from mapped_triples.

  • num_relations (Optional[int]) – The number of relations. If not given, inferred from mapped_triples.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • metadata (Optional[Mapping[str, Any]]) – Additional metadata to store in the factory.

Return type:

CoreTriplesFactory

Returns:

A new triples factory.

create_lcwa_instances(use_tqdm=None, target=None)[source]

Create LCWA instances for this factory’s triples.

Return type:

Dataset

Parameters:
  • use_tqdm (bool | None) –

  • target (int | None) –

create_slcwa_instances(*, sampler=None, **kwargs)[source]

Create sLCWA instances for this factory’s triples.

Return type:

Dataset

Parameters:

sampler (str | None) –

entities_to_ids(entities)[source]

Normalize entities to IDs.

Parameters:

entities (Union[Collection[int], Collection[str]]) – A collection of either integer identifiers for entities or string labels for entities (that will get auto-converted)

Return type:

Collection[int]

Returns:

Integer identifiers for entities

Raises:

ValueError – If the entities passed are string labels and this triples factory does not have an entity label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)

classmethod from_path_binary(path)[source]

Load triples factory from a binary file.

Parameters:

path (Union[str, Path, TextIO]) – The path, pointing to an existing PyTorch .pt file.

Return type:

CoreTriplesFactory

Returns:

The loaded triples factory.

get_inverse_relation_id(relation)[source]

Get the inverse relation identifier for the given relation.

Return type:

int

Parameters:

relation (int) –

get_mask_for_relations(relations, invert=False)[source]

Get a boolean mask for triples with the given relations.

Return type:

BoolTensor

get_most_frequent_relations(n)[source]

Get the IDs of the n most frequent relations.

Parameters:

n (Union[int, float]) – Either the (integer) number of top relations to keep or the (float) percentage of top relationships to keep.

Return type:

Set[int]

Returns:

A set of IDs for the n most frequent relations

Raises:

TypeError – If the n is the wrong type
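
The two interpretations of n can be sketched with a counter over the relation column. This is plain Python rather than PyKEEN's implementation; in particular, truncating the fractional count with int() is an assumption of this sketch.

```python
from collections import Counter


def most_frequent_relations(triples, n):
    """Return the set of the n most frequent relation IDs.

    An integer n is a count; a float n is a fraction of all distinct relations.
    Truncating the fraction with int() is an assumption of this sketch.
    """
    counts = Counter(r for _, r, _ in triples)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n, bool) or not isinstance(n, (int, float)):
        raise TypeError(f"n must be an int or float, got {type(n).__name__}")
    if isinstance(n, float):
        n = int(n * len(counts))
    return {relation for relation, _ in counts.most_common(n)}


triples = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (0, 1, 2), (1, 1, 3), (3, 2, 0), (2, 3, 1)]
top_two = most_frequent_relations(triples, 2)
top_half = most_frequent_relations(triples, 0.5)
```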

iter_extra_repr()[source]

Iterate over extra_repr components.

Return type:

Iterable[str]

new_with_restriction(entities=None, relations=None, invert_entity_selection=False, invert_relation_selection=False)[source]

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

Parameters:
  • entities (Union[None, Collection[int], Collection[str]]) – The entities of interest. If None, defaults to all entities.

  • relations (Union[None, Collection[int], Collection[str]]) – The relations of interest. If None, defaults to all relations.

  • invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.

  • invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.

Return type:

CoreTriplesFactory

Returns:

A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.

property num_triples: int

The number of triples.

Return type:

int

relations_to_ids(relations)[source]

Normalize relations to IDs.

Parameters:

relations (Union[Collection[int], Collection[str]]) – A collection of either integer identifiers for relations or string labels for relations (that will get auto-converted)

Return type:

Collection[int]

Returns:

Integer identifiers for relations

Raises:

ValueError – If the relations passed are string labels and this triples factory does not have a relation label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)

split(ratios=0.8, *, random_state=None, randomize_cleanup=False, method=None)[source]

Split a triples factory into training and testing factories (and optionally further parts such as validation).

Parameters:
  • ratios (Union[float, Sequence[float]]) –

    There are three options for this argument:

    1. A float can be given between 0 and 1.0, non-inclusive. The first set of triples will get this ratio and the second will get the rest.

    2. A list of ratios can be given for which set in which order should get what ratios as in [0.8, 0.1]. The final ratio can be omitted because that can be calculated.

    3. All ratios can be explicitly set in order such as in [0.8, 0.1, 0.1] where the sum of all ratios is 1.0.

  • random_state (Union[None, int, Generator]) – The random state used to shuffle and split the triples.

  • randomize_cleanup (bool) – If true, uses the non-deterministic method for moving triples to the training set. This has the advantage that it does not necessarily have to move all of them, but it might be significantly slower since it moves one triple at a time.

  • method (Optional[str]) – The name of the method to use, from SPLIT_METHODS. Defaults to “coverage”.

Return type:

List[CoreTriplesFactory]

Returns:

A partition of the triples, split (approximately) according to the ratios. Each part is stored in a triples factory that shares everything else with this root triples factory.

ratio = 0.8  # makes a [0.8, 0.2] split
training_factory, testing_factory = factory.split(ratio)

ratios = [0.8, 0.1]  # makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)

ratios = [0.8, 0.1, 0.1]  # also makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)
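
The three accepted forms of the ratios argument can be reduced to one explicit list. The following sketch shows one way to do that; `normalize_ratios` and its tolerance parameter are hypothetical names for this example and not part of PyKEEN.

```python
def normalize_ratios(ratios, tolerance=1e-9):
    """Return an explicit list of ratios that sums to 1.0."""
    if isinstance(ratios, float):
        ratios = [ratios]
    ratios = list(ratios)
    total = sum(ratios)
    if any(r <= 0 for r in ratios) or total > 1.0 + tolerance:
        raise ValueError(f"invalid ratios: {ratios}")
    if total < 1.0 - tolerance:
        # The final ratio was omitted, so it is whatever remains.
        ratios.append(1.0 - total)
    return ratios


two_way = normalize_ratios(0.8)
three_way = normalize_ratios([0.8, 0.1])
explicit = normalize_ratios([0.8, 0.1, 0.1])
```

The tolerance guards against floating-point rounding when the given ratios already sum to 1.0.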

tensor_to_df(tensor, **kwargs)[source]

Take a tensor of triples and make a pandas dataframe with labels.

Parameters:
  • tensor (LongTensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).

  • kwargs (Union[Tensor, ndarray, Sequence]) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.

Return type:

DataFrame

Returns:

A dataframe with n rows, and 6 + len(kwargs) columns.

to_path_binary(path)[source]

Save triples factory to path in (PyTorch’s .pt) binary format.

Parameters:

path (Union[str, Path, TextIO]) – The path to store the triples factory to.

Return type:

Path

Returns:

The path to the file that got dumped

with_labels(entity_to_id, relation_to_id)[source]

Add labeling to the TriplesFactory.

Return type:

TriplesFactory

class Instances[source]

Base class for training instances.

classmethod from_triples(mapped_triples, *, num_entities, num_relations, **kwargs)[source]

Create instances from mapped triples.

Parameters:
  • mapped_triples (LongTensor) – shape: (num_triples, 3) The ID-based triples.

  • num_entities (int) – >0 The number of entities.

  • num_relations (int) – >0 The number of relations.

  • kwargs – additional keyword-based parameters.

Return type:

Instances

Returns:

The instances.

get_collator()[source]

Get a collator.

Return type:

Optional[Callable[[List[~SampleType]], ~BatchType]]

class KGInfo(num_entities, num_relations, create_inverse_triples)[source]

An object storing information about the number of entities and relations.

Initialize the information object.

Parameters:
  • num_entities (int) – the number of entities.

  • num_relations (int) – the number of relations, excluding artificial inverse relations.

  • create_inverse_triples (bool) – whether to create inverse triples

create_inverse_triples: bool

whether to create inverse triples

iter_extra_repr()[source]

Iterate over extra_repr components.

Return type:

Iterable[str]

num_entities: int

the number of unique entities

num_relations: int

the number of relations (maybe including “artificial” inverse relations)

real_num_relations: int

the number of real relations, i.e., without artificial inverses

class LCWAInstances(*, pairs, compressed)[source]

Triples and mappings to their indices for LCWA.

Initialize the LCWA instances.

Parameters:
  • pairs (ndarray) – The unique pairs

  • compressed (csr_matrix) – The compressed triples in CSR format

classmethod from_triples(mapped_triples, *, num_entities, num_relations, target=None, **kwargs)[source]

Create LCWA instances from triples.

Parameters:
  • mapped_triples (LongTensor) – shape: (num_triples, 3) The ID-based triples.

  • num_entities (int) – The number of entities.

  • num_relations (int) – The number of relations.

  • target (Optional[int]) – The column to predict

  • kwargs – Keyword arguments (thrown out)

Return type:

Instances

Returns:

The instances.

class SLCWAInstances(*, mapped_triples, num_entities=None, num_relations=None, negative_sampler=None, negative_sampler_kwargs=None)[source]

Training instances for the sLCWA.

Initialize the sLCWA instances.

Parameters:
  • mapped_triples (LongTensor) – shape: (num_triples, 3) the ID-based triples, passed to the negative sampler

  • num_entities (Optional[int]) – >0 the number of entities, passed to the negative sampler

  • num_relations (Optional[int]) – >0 the number of relations, passed to the negative sampler

  • negative_sampler (Union[str, NegativeSampler, Type[NegativeSampler], None]) – the negative sampler, or a hint thereof

  • negative_sampler_kwargs (Optional[Mapping[str, Any]]) – additional keyword-based arguments passed to the negative sampler

static collate(samples)[source]

Collate samples.

Return type:

SLCWABatch

Parameters:

samples (Iterable[Tuple[LongTensor, LongTensor, BoolTensor | None]]) –

classmethod from_triples(mapped_triples, *, num_entities, num_relations, **kwargs)[source]

Create instances from mapped triples.

Parameters:
  • mapped_triples (LongTensor) – shape: (num_triples, 3) The ID-based triples.

  • num_entities (int) – >0 The number of entities.

  • num_relations (int) – >0 The number of relations.

  • kwargs – additional keyword-based parameters.

Return type:

Instances

Returns:

The instances.

get_collator()[source]

Get a collator.

Return type:

Optional[Callable[[List[Tuple[LongTensor, LongTensor, Optional[BoolTensor]]]], SLCWABatch]]

class TriplesFactory(mapped_triples, entity_to_id, relation_to_id, create_inverse_triples=False, metadata=None, num_entities=None, num_relations=None)[source]

Create instances given the path to triples.

Create the triples factory.

Parameters:
  • mapped_triples (Union[LongTensor, ndarray]) – shape: (n, 3) A three-column matrix where each row contains the head identifier, the relation identifier, and the tail identifier, in that order.

  • entity_to_id (Mapping[str, int]) – The mapping from entities’ labels to their indices.

  • relation_to_id (Mapping[str, int]) – The mapping from relations’ labels to their indices.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • metadata (Optional[Mapping[str, Any]]) – Arbitrary metadata to go with the graph

  • num_entities (Optional[int]) – the number of entities. May be None, in which case this number is inferred by the label mapping

  • num_relations (Optional[int]) – the number of relations. May be None, in which case this number is inferred by the label mapping

Raises:

ValueError – if the explicitly provided number of entities or relations does not match with the one given by the label mapping

clone_and_exchange_triples(mapped_triples, extra_metadata=None, keep_metadata=True, create_inverse_triples=None)[source]

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters:
  • mapped_triples (LongTensor) – The new mapped triples.

  • extra_metadata (Optional[Dict[str, Any]]) – Extra metadata to include in the new triples factory. If keep_metadata is true, the dictionaries will be unioned with precedence taken on keys from extra_metadata.

  • keep_metadata (bool) – Pass the current factory’s metadata to the new triples factory

  • create_inverse_triples (Optional[bool]) – Change inverse triple creation flag. If None, use flag from this factory.

Return type:

TriplesFactory

Returns:

The new factory.

entities_to_ids(entities)[source]

Normalize entities to IDs.

Parameters:

entities (Union[Collection[int], Collection[str]]) – A collection of either integer identifiers for entities or string labels for entities (that will get auto-converted)

Return type:

Collection[int]

Returns:

Integer identifiers for entities

Raises:

ValueError – If the entities passed are string labels and this triples factory does not have an entity label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)

property entity_id_to_label: Mapping[int, str]

Return the mapping from entity IDs to labels.

Return type:

Mapping[int, str]

property entity_to_id: Mapping[str, int]

Return the mapping from entity labels to IDs.

Return type:

Mapping[str, int]

entity_word_cloud(top=None)[source]

Make a word cloud based on the frequency of occurrence of each entity in a Jupyter notebook.

Parameters:

top (Optional[int]) – The number of top entities to show. Defaults to 100.

Returns:

A word cloud object for a Jupyter notebook

Warning

This function requires the wordcloud package. Use pip install pykeen[wordcloud] to install it.

classmethod from_labeled_triples(triples, *, create_inverse_triples=False, entity_to_id=None, relation_to_id=None, compact_id=True, filter_out_candidate_inverse_relations=True, metadata=None)[source]

Create a new triples factory from label-based triples.

Parameters:
  • triples (ndarray) – shape: (n, 3), dtype: str The label-based triples.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • entity_to_id (Optional[Mapping[str, int]]) – The mapping from entity labels to ID. If None, create a new one from the triples.

  • relation_to_id (Optional[Mapping[str, int]]) – The mapping from relations labels to ID. If None, create a new one from the triples.

  • compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.

  • filter_out_candidate_inverse_relations (bool) – Whether to remove triples with relations with the inverse suffix.

  • metadata (Optional[Dict[str, Any]]) – Arbitrary key/value pairs to store as metadata

Return type:

TriplesFactory

Returns:

A new triples factory.

classmethod from_path(path, *, create_inverse_triples=False, entity_to_id=None, relation_to_id=None, compact_id=True, metadata=None, load_triples_kwargs=None, **kwargs)[source]

Create a new triples factory from triples stored in a file.

Parameters:
  • path (Union[str, Path, TextIO]) – The path where the label-based triples are stored.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • entity_to_id (Optional[Mapping[str, int]]) – The mapping from entity labels to ID. If None, create a new one from the triples.

  • relation_to_id (Optional[Mapping[str, int]]) – The mapping from relations labels to ID. If None, create a new one from the triples.

  • compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.

  • metadata (Optional[Dict[str, Any]]) – Arbitrary key/value pairs to store as metadata with the triples factory. Do not include path as a key because it is automatically taken from the path kwarg to this function.

  • load_triples_kwargs (Optional[Mapping[str, Any]]) – Optional keyword arguments to pass to load_triples(). Could include the delimiter or a column_remapping.

  • kwargs – additional keyword-based parameters, which are ignored.

Return type:

TriplesFactory

Returns:

A new triples factory.

get_inverse_relation_id(relation)[source]

Get the inverse relation identifier for the given relation.

Return type:

int

Parameters:

relation (str | int) –

get_mask_for_relations(relations, invert=False)[source]

Get a boolean mask for triples with the given relations.

Return type:

BoolTensor

label_triples(triples, unknown_entity_label='[UNKNOWN]', unknown_relation_label=None)[source]

Convert ID-based triples to label-based ones.

Parameters:
  • triples (LongTensor) – The ID-based triples.

  • unknown_entity_label (str) – The label to use for unknown entity IDs.

  • unknown_relation_label (Optional[str]) – The label to use for unknown relation IDs.

Return type:

ndarray

Returns:

The same triples, but labeled.
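
The conversion from IDs back to labels amounts to a dictionary lookup with a fallback. In the sketch below, `label_triples_sketch` is a hypothetical helper, and reusing the entity placeholder when no relation placeholder is given is an assumption rather than documented behavior.

```python
def label_triples_sketch(triples, entity_id_to_label, relation_id_to_label,
                         unknown_entity_label="[UNKNOWN]", unknown_relation_label=None):
    """Translate ID-based triples to labels, substituting a placeholder for unknown IDs."""
    if unknown_relation_label is None:
        # Assumption of this sketch: reuse the entity placeholder for relations.
        unknown_relation_label = unknown_entity_label
    return [
        (
            entity_id_to_label.get(h, unknown_entity_label),
            relation_id_to_label.get(r, unknown_relation_label),
            entity_id_to_label.get(t, unknown_entity_label),
        )
        for h, r, t in triples
    ]


entity_labels = {0: "berlin", 1: "germany"}
relation_labels = {0: "capital_of"}
labeled = label_triples_sketch([(0, 0, 1), (0, 0, 7), (1, 3, 0)], entity_labels, relation_labels)
```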

map_triples(triples)[source]

Convert label-based triples to ID-based triples.

Return type:

LongTensor

Parameters:

triples (ndarray) –

new_with_restriction(entities=None, relations=None, invert_entity_selection=False, invert_relation_selection=False)[source]

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

Parameters:
  • entities (Union[None, Collection[int], Collection[str]]) – The entities of interest. If None, defaults to all entities.

  • relations (Union[None, Collection[int], Collection[str]]) – The relations of interest. If None, defaults to all relations.

  • invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.

  • invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.

Return type:

TriplesFactory

Returns:

A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.

property relation_id_to_label: Mapping[int, str]

Return the mapping from relations IDs to labels.

Return type:

Mapping[int, str]

property relation_to_id: Mapping[str, int]

Return the mapping from relations labels to IDs.

Return type:

Mapping[str, int]

relation_word_cloud(top=None)[source]

Make a word cloud based on the frequency of occurrence of each relation in a Jupyter notebook.

Parameters:

top (Optional[int]) – The number of top relations to show. Defaults to 100.

Returns:

A word cloud object for a Jupyter notebook

Warning

This function requires the wordcloud package. Use pip install pykeen[wordcloud] to install it.

relations_to_ids(relations)[source]

Normalize relations to IDs.

Parameters:

relations (Union[Collection[int], Collection[str]]) – A collection of either integer identifiers for relations or string labels for relations (that will get auto-converted)

Return type:

Collection[int]

Returns:

Integer identifiers for relations

Raises:

ValueError – If the relations passed are string labels and this triples factory does not have a relation label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)

tensor_to_df(tensor, **kwargs)[source]

Take a tensor of triples and make a pandas dataframe with labels.

Parameters:
  • tensor (LongTensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).

  • kwargs (Union[Tensor, ndarray, Sequence]) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.

Return type:

DataFrame

Returns:

A dataframe with n rows, and 6 + len(kwargs) columns.

to_core_triples_factory()[source]

Return this factory as a core factory.

Return type:

CoreTriplesFactory

to_path_binary(path)[source]

Save triples factory to path in (PyTorch’s .pt) binary format.

Parameters:

path (Union[str, Path, TextIO]) – The path to store the triples factory to.

Return type:

Path

Returns:

The path to the file that got dumped

property triples: ndarray

The labeled triples, a 3-column matrix where each row contains the head label, the relation label, and the tail label, in that order.

Return type:

ndarray

class TriplesNumericLiteralsFactory(*, numeric_literals, literals_to_id, **kwargs)[source]

Create multi-modal instances given the path to triples.

Initialize the multi-modal triples factory.

Parameters:
  • numeric_literals (ndarray) – shape: (num_entities, num_literals) the numeric literals as a dense matrix.

  • literals_to_id (Mapping[str, int]) – a mapping from literal names to their IDs, i.e., the columns in the numeric_literals matrix.

  • kwargs – additional keyword-based parameters passed to TriplesFactory.__init__().

clone_and_exchange_triples(mapped_triples, extra_metadata=None, keep_metadata=True, create_inverse_triples=None)[source]

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters:
  • mapped_triples (LongTensor) – The new mapped triples.

  • extra_metadata (Optional[Dict[str, Any]]) – Extra metadata to include in the new triples factory. If keep_metadata is true, the dictionaries will be unioned with precedence taken on keys from extra_metadata.

  • keep_metadata (bool) – Pass the current factory’s metadata to the new triples factory

  • create_inverse_triples (Optional[bool]) – Change inverse triple creation flag. If None, use flag from this factory.

Return type:

TriplesNumericLiteralsFactory

Returns:

The new factory.

classmethod from_labeled_triples(triples, *, numeric_triples=None, **kwargs)[source]

Create a new triples factory from label-based triples.

Parameters:
  • triples (ndarray) – shape: (n, 3), dtype: str The label-based triples.

  • create_inverse_triples – Whether to create inverse triples.

  • entity_to_id – The mapping from entity labels to ID. If None, create a new one from the triples.

  • relation_to_id – The mapping from relations labels to ID. If None, create a new one from the triples.

  • compact_id – Whether to compact IDs such that the IDs are consecutive.

  • filter_out_candidate_inverse_relations – Whether to remove triples with relations with the inverse suffix.

  • metadata – Arbitrary key/value pairs to store as metadata

  • numeric_triples (ndarray) –

Return type:

TriplesNumericLiteralsFactory

Returns:

A new triples factory.

classmethod from_path(path, *, path_to_numeric_triples=None, **kwargs)[source]

Create a new triples factory from triples stored in a file.

Parameters:
  • path (Union[str, Path, TextIO]) – The path where the label-based triples are stored.

  • create_inverse_triples – Whether to create inverse triples.

  • entity_to_id – The mapping from entity labels to ID. If None, create a new one from the triples.

  • relation_to_id – The mapping from relations labels to ID. If None, create a new one from the triples.

  • compact_id – Whether to compact IDs such that the IDs are consecutive.

  • metadata – Arbitrary key/value pairs to store as metadata with the triples factory. Do not include path as a key because it is automatically taken from the path kwarg to this function.

  • load_triples_kwargs – Optional keyword arguments to pass to load_triples(). Could include the delimiter or a column_remapping.

  • kwargs – additional keyword-based parameters, which are ignored.

  • path_to_numeric_triples (None | str | Path | TextIO) –

Return type:

TriplesNumericLiteralsFactory

Returns:

A new triples factory.

get_numeric_literals_tensor()[source]

Return the numeric literals as a tensor.

Return type:

FloatTensor

iter_extra_repr()[source]

Iterate over extra_repr components.

Return type:

Iterable[str]

property literal_shape: Tuple[int, ...]

Return the shape of the literals.

Return type:

Tuple[int, …]

to_path_binary(path)[source]

Save triples factory to path in (PyTorch’s .pt) binary format.

Parameters:

path (Union[str, Path, TextIO]) – The path to store the triples factory to.

Return type:

Path

Returns:

The path to the file that got dumped

get_mapped_triples(x=None, *, mapped_triples=None, triples=None, factory=None)[source]

Get ID-based triples either directly, or from a factory.

Preference order: 1. mapped_triples 2. triples (converted using factory) 3. x 4. factory.mapped_triples

Raises:

ValueError – if all inputs are None, or provided inputs are invalid.

Return type:

LongTensor

Returns:

the ID-based triples

Instance creation utilities.

compute_compressed_adjacency_list(mapped_triples, num_entities=None)[source]

Compute compressed undirected adjacency list representation for efficient sampling.

The compressed adjacency list format is inspired by CSR sparse matrix format.

Parameters:
  • mapped_triples (LongTensor) – the ID-based triples

  • num_entities (Optional[int]) – the number of entities.

Return type:

Tuple[LongTensor, LongTensor, LongTensor]

Returns:

a tuple (degrees, offsets, compressed_adj_list) where

  • degrees: shape: (num_entities,)

  • offsets: shape: (num_entities,)

  • compressed_adj_list: shape: (2 * num_triples, 2)

with

adj_list[i] = compressed_adj_list[offsets[i]:offsets[i+1]]
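
The layout can be illustrated with a plain-Python sketch that uses lists instead of tensors. The meaning of the two columns, here (index of the triple, neighboring entity), is an assumption of this example, as is the function name `compressed_adjacency`.

```python
def compressed_adjacency(triples, num_entities):
    """Build (degrees, offsets, compressed_adj_list) for an undirected view of the graph."""
    adj_lists = [[] for _ in range(num_entities)]
    for triple_id, (h, _, t) in enumerate(triples):
        # Assumed entry layout: (index of the triple, neighboring entity).
        adj_lists[h].append((triple_id, t))
        adj_lists[t].append((triple_id, h))

    degrees = [len(neighbors) for neighbors in adj_lists]
    offsets, position = [], 0
    for degree in degrees:
        offsets.append(position)
        position += degree

    compressed = [entry for neighbors in adj_lists for entry in neighbors]
    return degrees, offsets, compressed


triples = [(0, 0, 1), (1, 0, 2), (0, 1, 2)]
degrees, offsets, compressed = compressed_adjacency(triples, num_entities=4)

# Neighbors of entity 1; offsets[i] + degrees[i] also works for the last entity.
neighbors_of_1 = compressed[offsets[1]:offsets[1] + degrees[1]]
```

Because offsets has one entry per entity, the slice end for the last entity is computed here as offsets[i] + degrees[i] rather than offsets[i+1].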

get_entities(triples)[source]

Get all entities from the triples.

Return type:

Set[int]

Parameters:

triples (LongTensor) –

get_relations(triples)[source]

Get all relations from the triples.

Return type:

Set[int]

Parameters:

triples (LongTensor) –

load_triples(path, delimiter='\t', encoding=None, column_remapping=None)[source]

Load triples saved as tab separated values.

Parameters:
  • path (Union[str, Path, TextIO]) – The key for the data to be loaded. Typically, this will be a file path ending in .tsv that points to a file with three columns - the head, relation, and tail. This can also be used to invoke PyKEEN data importer entrypoints (see below).

  • delimiter (str) – The delimiter between the columns in the file

  • encoding (Optional[str]) – The encoding for the file. Defaults to utf-8.

  • column_remapping (Optional[Sequence[int]]) – A remapping if the three columns do not follow the order head-relation-tail. For example, if the order is head-tail-relation, pass (0, 2, 1)

Return type:

ndarray

Returns:

A numpy array representing “labeled” triples.

Raises:

ValueError – if a column remapping was passed but it was not a length 3 sequence
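
The effect of delimiter and column_remapping can be sketched with the standard csv module. `read_triples` is a hypothetical helper for illustration; it returns a list of tuples rather than a numpy array and does not cover the importer entrypoints.

```python
import csv
import io


def read_triples(file, delimiter="\t", column_remapping=None):
    """Read three-column rows and return them in head-relation-tail order."""
    if column_remapping is not None and len(column_remapping) != 3:
        raise ValueError("column_remapping must be a sequence of length 3")
    rows = [row for row in csv.reader(file, delimiter=delimiter) if row]
    if column_remapping is None:
        return [tuple(row) for row in rows]
    return [tuple(row[i] for i in column_remapping) for row in rows]


# A file whose columns are ordered head, tail, relation.
data = io.StringIO("berlin\tgermany\tcapital_of\nparis\tfrance\tcapital_of\n")
triples = read_triples(data, column_remapping=(0, 2, 1))
```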

Besides TSV handling, PyKEEN does not come with any importers pre-installed. A few can be found at:

tensor_to_df(tensor, **kwargs)[source]

Take a tensor of triples and make a pandas dataframe with labels.

Parameters:
  • tensor (LongTensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).

  • kwargs (Union[Tensor, ndarray, Sequence]) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.

Return type:

DataFrame

Returns:

A dataframe with n rows, and 3 + len(kwargs) columns.

Raises:

ValueError – If a reserved column name appears in kwargs.
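
The column layout and the reserved-name check can be sketched without pandas. `triples_to_columns` is a hypothetical helper that returns a dictionary of columns; such a dictionary could then be passed to the pandas.DataFrame constructor.

```python
RESERVED = {"head_id", "head_label", "relation_id", "relation_label", "tail_id", "tail_label"}


def triples_to_columns(triples, **extra_columns):
    """Arrange ID-based triples column-wise, plus any extra per-triple columns."""
    forbidden = RESERVED.intersection(extra_columns)
    if forbidden:
        raise ValueError(f"reserved column names used: {sorted(forbidden)}")
    for name, values in extra_columns.items():
        if len(values) != len(triples):
            raise ValueError(f"column {name!r} needs one value per triple")
    columns = {
        "head_id": [h for h, _, _ in triples],
        "relation_id": [r for _, r, _ in triples],
        "tail_id": [t for _, _, t in triples],
    }
    columns.update({name: list(values) for name, values in extra_columns.items()})
    return columns


columns = triples_to_columns([(0, 0, 1), (1, 0, 2)], score=[0.9, 0.4])
```

The result has the three ID columns plus one column per keyword argument, matching the 3 + len(kwargs) count stated above.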

Remixing and dataset distance utilities.

Most datasets are given with a pre-defined split, but it is often not discussed how this split was created. This module contains utilities for investigating the effects of remixing pre-split datasets like pykeen.datasets.Nations.

Further, it defines a metric for the “distance” between two splits of a given dataset. Later, this will be used to map the landscape and see if there is a smooth, continuous relationship between datasets’ splits’ distances and their maximum performance.

remix(*triples_factories, **kwargs)[source]

Remix the triples from the training, testing, and validation set.

Parameters:
  • triples_factories (CoreTriplesFactory) – A sequence of triples factories

  • kwargs – Keyword arguments to be passed to split()

Return type:

List[CoreTriplesFactory]

Returns:

A sequence of triples factories of the same sizes but randomly re-assigned triples

Raises:

NotImplementedError – if any of the triples factories have create_inverse_triples
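
The core idea of remixing, pooling all triples and re-assigning them to parts of the original sizes, can be sketched as follows. `remix_sketch` is a simplified illustration on plain lists: it performs a purely random re-assignment and leaves out whatever additional handling split() applies.

```python
import random


def remix_sketch(*splits, seed=0):
    """Pool all triples, shuffle them, and cut new splits of the original sizes."""
    pooled = [triple for split in splits for triple in split]
    random.Random(seed).shuffle(pooled)
    remixed, start = [], 0
    for split in splits:
        remixed.append(pooled[start:start + len(split)])
        start += len(split)
    return remixed


training = [(0, 0, 1), (1, 0, 2), (2, 1, 3), (3, 1, 0)]
testing = [(0, 0, 2)]
validation = [(1, 1, 3)]
new_training, new_testing, new_validation = remix_sketch(training, testing, validation)
```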

Deterioration algorithm.

deteriorate(reference, *others, n, random_state=None)[source]

Remove n triples from the reference set.

Parameters:
  • reference (TriplesFactory) – The reference triples factory

  • others (TriplesFactory) – Other triples factories to deteriorate

  • n (Union[int, float]) – The number or ratio of triples to deteriorate. If given as a float, it is a ratio and should be between 0 and 1. If given as an integer, that many triples are deteriorated.

  • random_state (Union[None, int, Generator]) – The random state

Return type:

List[TriplesFactory]

Returns:

A concatenated list of the processed reference and other triples factories

Raises:
  • NotImplementedError – if the reference triples factory has inverse triples

  • ValueError – If a float is given for n that isn’t between 0 and 1