Triples

Classes for creating and storing training data from triples.

class Instances[source]

Triples and mappings to their indices.

class LCWAInstances(pairs, compressed)[source]

Triples and mappings to their indices for LCWA.

compressed: scipy.sparse.csr.csr_matrix

The compressed triples in CSR format

classmethod from_triples(mapped_triples, num_entities)[source]

Create LCWA instances from triples.

Parameters
  • mapped_triples (LongTensor) – shape: (num_triples, 3) The ID-based triples.

  • num_entities (int) – The number of entities.

Return type

Instances

Returns

The instances.
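
As an illustration of what the pairs and compressed attributes hold, here is a minimal, dependency-free sketch. It assumes that each unique pair is a (head, relation) combination and that the compressed matrix records which tails occur with each pair. The helper name lcwa_pairs_and_targets is hypothetical, and plain lists stand in for the numpy array and the CSR matrix.

```python
from collections import defaultdict


def lcwa_pairs_and_targets(mapped_triples):
    """Group ID-based (head, relation, tail) triples by their (head, relation) pair.

    Hypothetical helper for illustration only; not part of the library.
    """
    tails_by_pair = defaultdict(set)
    for head, relation, tail in mapped_triples:
        tails_by_pair[(head, relation)].add(tail)
    # Sort the unique pairs so that targets[i] belongs to pairs[i].
    pairs = sorted(tails_by_pair)
    targets = [sorted(tails_by_pair[pair]) for pair in pairs]
    return pairs, targets


# The duplicate triple (0, 0, 1) collapses into a single target entry.
triples = [(0, 0, 1), (0, 0, 2), (1, 1, 2), (0, 0, 1)]
pairs, targets = lcwa_pairs_and_targets(triples)
print(pairs)    # [(0, 0), (1, 1)]
print(targets)  # [[1, 2], [2]]
```

In a sparse representation, row i of the matrix would carry non-zero entries at the column indices listed in targets[i].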

pairs: numpy.ndarray

The unique pairs

class MultimodalInstances(numeric_literals, literals_to_id)[source]

Triples and mappings to their indices as well as multimodal data.

numeric_literals: Mapping[str, numpy.ndarray]

TODO: do we need these?

class MultimodalLCWAInstances(numeric_literals, literals_to_id, pairs, compressed)[source]

Triples and mappings to their indices as well as multimodal data for LCWA.

class MultimodalSLCWAInstances(numeric_literals, literals_to_id, mapped_triples)[source]

Triples and mappings to their indices as well as multimodal data for sLCWA.

class SLCWAInstances(mapped_triples)[source]

Triples and mappings to their indices for sLCWA.

mapped_triples: torch.LongTensor

The mapped triples, shape: (num_triples, 3)

class TriplesFactory(entity_to_id, relation_to_id, mapped_triples, create_inverse_triples=False, metadata=None)[source]

Create instances given the path to triples.

clone_and_exchange_triples(mapped_triples, extra_metadata=None, keep_metadata=True)[source]

Create a new triples factory sharing everything except the triples.

Note

We use shallow copies.

Parameters
  • mapped_triples (LongTensor) – The new mapped triples.

  • extra_metadata (Optional[Dict[str, Any]]) – Extra metadata to include in the new triples factory. If keep_metadata is true, the two dictionaries are merged, and keys from extra_metadata take precedence.

  • keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.

Return type

TriplesFactory

Returns

The new factory.
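
The interaction between extra_metadata and keep_metadata described above can be sketched with plain dictionaries. The function merge_metadata is a hypothetical stand-in written for this example, not the library's code.

```python
def merge_metadata(current, extra=None, keep=True):
    """Combine metadata dictionaries; keys from ``extra`` win on conflict.

    Hypothetical helper for illustration only.
    """
    # Start from a shallow copy of the current metadata, or from nothing.
    merged = dict(current or {}) if keep else {}
    merged.update(extra or {})
    return merged


current = {"path": "train.tsv", "version": 1}
print(merge_metadata(current, {"version": 2}))
# {'path': 'train.tsv', 'version': 2}
print(merge_metadata(current, {"version": 2}, keep=False))
# {'version': 2}
```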

create_inverse_triples: bool = False

Whether to create inverse triples

create_lcwa_instances(use_tqdm=None)[source]

Create LCWA instances for this factory’s triples.

Return type

Instances

create_slcwa_instances()[source]

Create sLCWA instances for this factory’s triples.

Return type

Instances

entities_to_ids(entities)[source]

Normalize entities to IDs.

Return type

Collection[int]

entity_id_to_label: Mapping[int, str]

The inverse mapping for entity_to_id; initialized automatically

entity_to_id: Mapping[str, int]

The mapping from entities’ labels to their indices

entity_word_cloud(top=None)[source]

Make a word cloud, displayed in a Jupyter notebook, based on how frequently each entity occurs.

Parameters

top (Optional[int]) – The number of top entities to show. Defaults to 100.

Warning

This function requires the word_cloud package. Use pip install pykeen[plotting] to install it automatically, or install it yourself with pip install git+https://github.com/kavgan/word_cloud.git.

extra_repr()[source]

Extra representation string.

Return type

str

classmethod from_labeled_triples(triples, create_inverse_triples=False, entity_to_id=None, relation_to_id=None, compact_id=True, filter_out_candidate_inverse_relations=True, metadata=None)[source]

Create a new triples factory from label-based triples.

Parameters
  • triples (ndarray) – shape: (n, 3), dtype: str The label-based triples.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • entity_to_id (Optional[Mapping[str, int]]) – The mapping from entity labels to ID. If None, create a new one from the triples.

  • relation_to_id (Optional[Mapping[str, int]]) – The mapping from relations labels to ID. If None, create a new one from the triples.

  • compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.

  • filter_out_candidate_inverse_relations (bool) – Whether to remove triples with relations with the inverse suffix.

  • metadata (Optional[Dict[str, Any]]) – Arbitrary key/value pairs to store as metadata

Return type

TriplesFactory

Returns

A new triples factory.
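
To show what creating a mapping "from the triples" can look like, the sketch below builds both label-to-ID mappings and the ID-based triples using only the standard library. It assumes IDs are assigned to the sorted unique labels, which yields consecutive IDs; the actual ordering used by the library may differ. The helper build_label_to_id and the example labels are made up for this illustration.

```python
def build_label_to_id(labels):
    """Assign consecutive IDs to the sorted unique labels (an assumed ordering)."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


labeled_triples = [
    ("berlin", "capital_of", "germany"),
    ("paris", "capital_of", "france"),
    ("germany", "borders", "france"),
]

# Entities come from both the head and the tail column.
entity_to_id = build_label_to_id(
    label for head, _, tail in labeled_triples for label in (head, tail)
)
relation_to_id = build_label_to_id(relation for _, relation, _ in labeled_triples)

# Translate every label-based triple into an ID-based one.
mapped = [
    (entity_to_id[h], relation_to_id[r], entity_to_id[t])
    for h, r, t in labeled_triples
]
print(entity_to_id)    # {'berlin': 0, 'france': 1, 'germany': 2, 'paris': 3}
print(relation_to_id)  # {'borders': 0, 'capital_of': 1}
print(mapped)          # [(0, 1, 2), (3, 1, 1), (2, 0, 1)]
```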

classmethod from_path(path, create_inverse_triples=False, entity_to_id=None, relation_to_id=None, compact_id=True, metadata=None)[source]

Create a new triples factory from triples stored in a file.

Parameters
  • path (Union[str, TextIO]) – The path where the label-based triples are stored.

  • create_inverse_triples (bool) – Whether to create inverse triples.

  • entity_to_id (Optional[Mapping[str, int]]) – The mapping from entity labels to ID. If None, create a new one from the triples.

  • relation_to_id (Optional[Mapping[str, int]]) – The mapping from relations labels to ID. If None, create a new one from the triples.

  • compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.

  • metadata (Optional[Dict[str, Any]]) – Arbitrary key/value pairs to store as metadata with the triples factory. Do not include path as a key because it is automatically taken from the path kwarg to this function.

Return type

TriplesFactory

Returns

A new triples factory.

get_inverse_relation_id(relation)[source]

Get the inverse relation identifier for the given relation.

Return type

int

get_mask_for_entities(entities, invert=False)[source]

Get a boolean mask for triples with the given entities.

Return type

BoolTensor

get_mask_for_relations(relations, invert=False)[source]

Get a boolean mask for triples with the given relations.

Return type

BoolTensor

get_most_frequent_relations(n)[source]

Get the IDs of the n most frequent relations.

Parameters

n (Union[int, float]) – Either the (integer) number of top relations to keep or the (float) percentage of top relations to keep.

Return type

Set[int]
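
The two ways of interpreting n can be sketched as follows. This is an illustration only: the helper most_frequent_relations is hypothetical, and it assumes that a float is a fraction between 0 and 1 of the number of distinct relations.

```python
from collections import Counter


def most_frequent_relations(mapped_triples, n):
    """Return the IDs of the n most frequent relations as a set.

    Hypothetical helper for illustration only.
    """
    counts = Counter(relation for _, relation, _ in mapped_triples)
    if isinstance(n, float):
        # Assumption: a float is a fraction in (0, 1) of the distinct relations.
        n = int(len(counts) * n)
    return {relation for relation, _ in counts.most_common(n)}


# Relation 0 occurs three times, relation 1 twice, relations 2 and 3 once each.
triples = [
    (0, 0, 1), (1, 0, 2), (2, 0, 3),
    (0, 1, 2), (1, 1, 3),
    (0, 2, 3),
    (3, 3, 0),
]
print(most_frequent_relations(triples, 2))    # {0, 1}
print(most_frequent_relations(triples, 0.5))  # {0, 1}
```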

label_triples(triples, unknown_entity_label='[UNKNOWN]', unknown_relation_label=None)[source]

Convert ID-based triples to label-based ones.

Parameters
  • triples (LongTensor) – The ID-based triples.

  • unknown_entity_label (str) – The label to use for unknown entity IDs.

  • unknown_relation_label (Optional[str]) – The label to use for unknown relation IDs.

Return type

ndarray

Returns

The same triples, but labeled.
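
A minimal sketch of the ID-to-label conversion, using plain dictionaries and tuples instead of tensors and arrays. The function name label_triples_sketch is hypothetical, and the fallback of unknown_relation_label to unknown_entity_label when it is None is an assumption made for this example.

```python
def label_triples_sketch(
    triples,
    entity_id_to_label,
    relation_id_to_label,
    unknown_entity_label="[UNKNOWN]",
    unknown_relation_label=None,
):
    """Convert ID-based triples to label-based ones (illustration only)."""
    if unknown_relation_label is None:
        # Assumption: reuse the entity placeholder for unknown relations.
        unknown_relation_label = unknown_entity_label
    return [
        (
            entity_id_to_label.get(head, unknown_entity_label),
            relation_id_to_label.get(relation, unknown_relation_label),
            entity_id_to_label.get(tail, unknown_entity_label),
        )
        for head, relation, tail in triples
    ]


entity_id_to_label = {0: "berlin", 1: "germany"}
relation_id_to_label = {0: "capital_of"}
labeled = label_triples_sketch(
    [(0, 0, 1), (0, 5, 7)], entity_id_to_label, relation_id_to_label
)
print(labeled)
# [('berlin', 'capital_of', 'germany'), ('berlin', '[UNKNOWN]', '[UNKNOWN]')]
```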

mapped_triples: torch.LongTensor

A three-column matrix where each row contains the head identifier, relation identifier, then tail identifier

metadata: Optional[Dict[str, Any]] = None

Arbitrary metadata to go with the graph

new_with_restriction(entities=None, relations=None, invert_entity_selection=False, invert_relation_selection=False)[source]

Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.

Parameters
  • entities (Union[None, Collection[int], Collection[str]]) – The entities of interest. If None, defaults to all entities.

  • relations (Union[None, Collection[int], Collection[str]]) – The relations of interest. If None, defaults to all relations.

  • invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.

  • invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.

Return type

TriplesFactory

Returns

A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.
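
The selection logic can be sketched on plain tuples. This example assumes a triple passes the entity filter only when both its head and its tail satisfy the (possibly inverted) selection; the library's exact rule may differ. The helper restrict_triples is hypothetical and works on IDs only.

```python
def restrict_triples(
    mapped_triples,
    entities=None,
    relations=None,
    invert_entity_selection=False,
    invert_relation_selection=False,
):
    """Keep only triples matching the entity and relation selections (illustration only)."""
    kept = []
    for head, relation, tail in mapped_triples:
        keep = True
        if entities is not None:
            # Assumption: head and tail must both satisfy the selection.
            keep = keep and all(
                (entity in entities) != invert_entity_selection
                for entity in (head, tail)
            )
        if relations is not None:
            keep = keep and ((relation in relations) != invert_relation_selection)
        if keep:
            kept.append((head, relation, tail))
    return kept


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
print(restrict_triples(triples, entities={0, 1, 2}))
# [(0, 0, 1), (1, 1, 2)]
print(restrict_triples(triples, relations={0}, invert_relation_selection=True))
# [(1, 1, 2)]
```

Because the triples keep their original IDs, the label-to-ID mappings stay valid for the restricted set.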

property num_entities

The number of unique entities.

Return type

int

property num_relations

The number of unique relations.

Return type

int

property num_triples

The number of triples.

Return type

int

property real_num_relations

The number of relations without inverse relations.

Return type

int

relation_id_to_label: Mapping[int, str]

The inverse mapping for relation_to_id; initialized automatically

relation_to_id: Mapping[str, int]

The mapping from relations’ labels to their indices

relation_word_cloud(top=None)[source]

Make a word cloud, displayed in a Jupyter notebook, based on how frequently each relation occurs.

Parameters

top (Optional[int]) – The number of top relations to show. Defaults to 100.

Warning

This function requires the word_cloud package. Use pip install pykeen[plotting] to install it automatically, or install it yourself with pip install git+https://github.com/kavgan/word_cloud.git.

relations_to_ids(relations)[source]

Normalize relations to IDs.

Return type

Collection[int]

split(ratios=0.8, *, random_state=None, randomize_cleanup=False, method=None)[source]

Split a triples factory into train/test sets.

Parameters
  • ratios (Union[float, Sequence[float]]) –

    There are three options for this argument:

    1. A float can be given between 0 and 1.0, non-inclusive. The first set of triples will get this ratio and the second will get the rest.

    2. A list of ratios can be given, one per set in order, as in [0.8, 0.1]. The final ratio can be omitted because it can be calculated from the others.

    3. All ratios can be explicitly set in order such as in [0.8, 0.1, 0.1] where the sum of all ratios is 1.0.

  • random_state (Union[None, int, Generator]) – The random state used to shuffle and split the triples.

  • randomize_cleanup (bool) – If true, uses the non-deterministic method for moving triples to the training set. This has the advantage that it does not necessarily have to move all of them, but it might be significantly slower since it moves one triple at a time.

  • method (Optional[str]) – The name of the method to use, from SPLIT_METHODS. Defaults to “coverage”.

Return type

List[TriplesFactory]

Returns

A partition of the triples, split (approximately) according to the ratios and stored in TriplesFactory instances that share everything else with this root triples factory.

ratio = 0.8  # makes a [0.8, 0.2] split
training_factory, testing_factory = factory.split(ratio)

ratios = [0.8, 0.1]  # makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)

ratios = [0.8, 0.1, 0.1]  # also makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)
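
The three accepted forms of ratios can be reduced to one explicit list. The following is a sketch under the assumption that a missing final ratio is simply the remainder up to 1.0; normalize_ratios is a hypothetical helper, not the library's function.

```python
def normalize_ratios(ratios, tolerance=1e-9):
    """Expand a float or a partial list of ratios into a list that sums to 1.0.

    Hypothetical helper for illustration only.
    """
    if isinstance(ratios, float):
        ratios = [ratios]
    ratios = list(ratios)
    total = sum(ratios)
    if total > 1.0 + tolerance:
        raise ValueError(f"ratios sum to {total}, which is more than 1.0")
    if total < 1.0 - tolerance:
        # The final ratio was omitted, so it is whatever remains.
        ratios.append(1.0 - total)
    return ratios


# Rounded only for display, because of floating point arithmetic.
print([round(r, 2) for r in normalize_ratios(0.8)])              # [0.8, 0.2]
print([round(r, 2) for r in normalize_ratios([0.8, 0.1])])       # [0.8, 0.1, 0.1]
print([round(r, 2) for r in normalize_ratios([0.8, 0.1, 0.1])])  # [0.8, 0.1, 0.1]
```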

tensor_to_df(tensor, **kwargs)[source]

Take a tensor of triples and make a pandas dataframe with labels.

Parameters
  • tensor (LongTensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).

  • kwargs (Union[Tensor, ndarray, Sequence]) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.

Return type

DataFrame

Returns

A dataframe with n rows, and 6 + len(kwargs) columns.
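
The resulting column layout (six reserved columns plus one per keyword argument) can be sketched without pandas by building a dictionary of columns. The helper triples_to_columns and the score column are hypothetical; a dictionary like this could be passed to pandas.DataFrame to obtain an actual dataframe.

```python
RESERVED = (
    "head_id", "head_label",
    "relation_id", "relation_label",
    "tail_id", "tail_label",
)


def triples_to_columns(triples, entity_id_to_label, relation_id_to_label, **extra_columns):
    """Build a column dictionary with IDs, labels, and any extra columns (illustration only)."""
    clashes = set(extra_columns) & set(RESERVED)
    if clashes:
        raise ValueError(f"reserved column names used: {sorted(clashes)}")
    columns = {name: [] for name in RESERVED}
    for head, relation, tail in triples:
        columns["head_id"].append(head)
        columns["head_label"].append(entity_id_to_label[head])
        columns["relation_id"].append(relation)
        columns["relation_label"].append(relation_id_to_label[relation])
        columns["tail_id"].append(tail)
        columns["tail_label"].append(entity_id_to_label[tail])
    for name, values in extra_columns.items():
        values = list(values)
        # Every extra column needs exactly one value per triple.
        if len(values) != len(triples):
            raise ValueError(f"column {name!r} must have {len(triples)} values")
        columns[name] = values
    return columns


columns = triples_to_columns(
    [(0, 0, 1), (1, 0, 0)],
    {0: "a", 1: "b"},
    {0: "r"},
    score=[0.9, 0.1],
)
print(list(columns))
# ['head_id', 'head_label', 'relation_id', 'relation_label', 'tail_id', 'tail_label', 'score']
```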

property triples

The labeled triples, a 3-column matrix where each row contains the head label, relation label, then tail label.

Return type

ndarray

class TriplesNumericLiteralsFactory(*, path=None, triples=None, path_to_numeric_triples=None, numeric_triples=None, **kwargs)[source]

Create multi-modal instances given the path to triples.

Initialize the multi-modal triples factory.

Parameters
  • path (Union[None, str, TextIO]) – The path to a 3-column TSV file with triples in it. If not specified, you should specify triples.

  • triples (Optional[ndarray]) – A 3-column numpy array with triples in it. If not specified, you should specify path.

  • path_to_numeric_triples (Union[None, str, TextIO]) – The path to a 3-column TSV file with numeric triples in it. If not specified, you should specify numeric_triples.

  • numeric_triples (Optional[ndarray]) – A 3-column numpy array with numeric triples in it. If not specified, you should specify path_to_numeric_triples.

create_lcwa_instances(use_tqdm=None)[source]

Create multi-modal LCWA instances for this factory’s triples.

Return type

MultimodalLCWAInstances

create_slcwa_instances()[source]

Create multi-modal sLCWA instances for this factory’s triples.

Return type

MultimodalSLCWAInstances

extra_repr()[source]

Extra representation string.

Return type

str

Instance creation utilities.

get_entities(triples)[source]

Get all entities from the triples.

Return type

Set[int]

get_relations(triples)[source]

Get all relations from the triples.

Return type

Set[int]

load_triples(path, delimiter='\t', encoding=None)[source]

Load triples saved as tab separated values.

Besides TSV handling, PyKEEN does not come with any importers pre-installed. A few can be found at:

Return type

ndarray
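
For readers unfamiliar with the expected file layout, the sketch below parses tab-separated triples with the standard library's csv module. It is an illustration of the format only: read_triples is a hypothetical helper that returns a list of tuples rather than an ndarray, and an in-memory io.StringIO stands in for a file on disk.

```python
import csv
import io


def read_triples(handle, delimiter="\t"):
    """Read (head, relation, tail) rows from an open text file (illustration only)."""
    reader = csv.reader(handle, delimiter=delimiter)
    # Skip empty lines; every remaining row is expected to have three columns.
    return [tuple(row) for row in reader if row]


example_file = io.StringIO(
    "berlin\tcapital_of\tgermany\n"
    "paris\tcapital_of\tfrance\n"
)
loaded = read_triples(example_file)
print(loaded)
# [('berlin', 'capital_of', 'germany'), ('paris', 'capital_of', 'france')]
```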