Triples
Classes for creating and storing training data from triples.
- class CoreTriplesFactory(mapped_triples, num_entities, num_relations, create_inverse_triples=False, metadata=None)
Create instances from ID-based triples.
Create the triples factory.
- Parameters:
mapped_triples (Union[LongTensor, ndarray]) – shape: (n, 3) A three-column matrix where each row is the head identifier, relation identifier, then tail identifier.
num_entities (int) – The number of entities.
num_relations (int) – The number of relations.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Optional[Mapping[str, Any]]) – Arbitrary metadata to go with the graph.
- Raises:
TypeError – if the mapped_triples are of non-integer dtype
ValueError – if the mapped_triples are of invalid shape
- clone_and_exchange_triples(mapped_triples, extra_metadata=None, keep_metadata=True, create_inverse_triples=None)
Create a new triples factory sharing everything except the triples.
Note
We use shallow copies.
- Parameters:
mapped_triples (LongTensor) – The new mapped triples.
extra_metadata (Optional[Dict[str, Any]]) – Extra metadata to include in the new triples factory. If keep_metadata is true, the dictionaries will be unioned, with keys from extra_metadata taking precedence.
keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.
create_inverse_triples (Optional[bool]) – Change the inverse triple creation flag. If None, use the flag from this factory.
- Return type:
- Returns:
The new factory.
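The interaction between keep_metadata and extra_metadata can be illustrated with plain dictionaries. This is a minimal sketch of the behaviour described in the parameter list, not the library's code; merge_metadata is a hypothetical helper name.

```python
from typing import Any, Dict, Mapping, Optional


def merge_metadata(
    current: Mapping[str, Any],
    extra_metadata: Optional[Mapping[str, Any]] = None,
    keep_metadata: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the documented metadata handling."""
    merged: Dict[str, Any] = {}
    if keep_metadata:
        # Shallow copy: values are shared with the original mapping.
        merged.update(current)
    if extra_metadata:
        # Applied last, so extra_metadata wins on shared keys.
        merged.update(extra_metadata)
    return merged


current = {"path": "train.tsv", "version": 1}
print(merge_metadata(current, {"version": 2}))
print(merge_metadata(current, {"version": 2}, keep_metadata=False))
```

With keep_metadata=False only the extra entries survive, which is useful when the old metadata no longer describes the exchanged triples.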
- classmethod create(mapped_triples, num_entities=None, num_relations=None, create_inverse_triples=False, metadata=None)
Create a triples factory without any label information.
- Parameters:
mapped_triples (LongTensor) – shape: (n, 3) The ID-based triples.
num_entities (Optional[int]) – The number of entities. If not given, inferred from mapped_triples.
num_relations (Optional[int]) – The number of relations. If not given, inferred from mapped_triples.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Optional[Mapping[str, Any]]) – Additional metadata to store in the factory.
- Return type:
- Returns:
A new triples factory.
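One plausible way to infer the counts from ID-based triples is to take the largest ID plus one. The sketch below assumes zero-based consecutive IDs and uses plain tuples instead of tensors; infer_counts is a hypothetical name, and the library may infer the numbers differently.

```python
from typing import Sequence, Tuple

Triple = Tuple[int, int, int]


def infer_counts(mapped_triples: Sequence[Triple]) -> Tuple[int, int]:
    """Assumption: IDs are zero-based, so each count is the largest ID plus one."""
    if not mapped_triples:
        raise ValueError("cannot infer counts from an empty set of triples")
    max_entity = max(max(h, t) for h, _, t in mapped_triples)
    max_relation = max(r for _, r, _ in mapped_triples)
    return max_entity + 1, max_relation + 1


triples = [(0, 0, 1), (1, 1, 2), (4, 0, 2)]
num_entities, num_relations = infer_counts(triples)
print(num_entities, num_relations)
```

Under this assumption, entity 3 is counted even though it never occurs in a triple, which is one reason to pass num_entities and num_relations explicitly when they are known.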
- create_lcwa_instances(use_tqdm=None, target=None)
Create LCWA instances for this factory’s triples.
- create_slcwa_instances(*, sampler=None, **kwargs)
Create sLCWA instances for this factory’s triples.
- entities_to_ids(entities)
Normalize entities to IDs.
- Parameters:
entities (Union[Collection[int], Collection[str]]) – A collection of either integer identifiers for entities or string labels for entities (that will get auto-converted)
- Return type:
- Returns:
Integer identifiers for entities
- Raises:
ValueError – If the entities passed are string labels and this triples factory does not have an entity label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)
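The normalization described above can be sketched without the library: integers pass through, strings are looked up, and a missing mapping raises ValueError. The helper to_ids and its label_to_id argument are illustrative names, not part of the documented API.

```python
from typing import Collection, List, Mapping, Optional, Union


def to_ids(
    items: Union[Collection[int], Collection[str]],
    label_to_id: Optional[Mapping[str, int]] = None,
) -> List[int]:
    """Hypothetical helper: map labels to IDs, pass integer IDs through."""
    ids: List[int] = []
    for item in items:
        if isinstance(item, str):
            if label_to_id is None:
                # Mirrors the documented error for a factory without labels.
                raise ValueError("string labels given, but no label-to-ID mapping is available")
            ids.append(label_to_id[item])
        else:
            ids.append(int(item))
    return ids


print(to_ids([2, 0]))
print(to_ids(["berlin", "paris"], {"berlin": 0, "paris": 1}))
```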
- classmethod from_path_binary(path)
Load triples factory from a binary file.
- get_inverse_relation_id(relation)
Get the inverse relation identifier for the given relation.
- get_mask_for_relations(relations, invert=False)
Get a boolean mask for triples with the given relations.
- Return type:
BoolTensor
- Parameters:
relations (Collection[int]) –
invert (bool) –
- new_with_restriction(entities=None, relations=None, invert_entity_selection=False, invert_relation_selection=False)
Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.
- Parameters:
entities (Union[None, Collection[int], Collection[str]]) – The entities of interest. If None, defaults to all entities.
relations (Union[None, Collection[int], Collection[str]]) – The relations of interest. If None, defaults to all relations.
invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.
- Return type:
- Returns:
A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.
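The selection logic can be sketched on plain ID triples. The sketch assumes that a triple passes the entity filter only if both its head and its tail satisfy the criterion; that rule, and the function name restrict, are assumptions for illustration rather than statements about the library's code.

```python
from typing import Collection, List, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]


def restrict(
    triples: Sequence[Triple],
    entities: Optional[Collection[int]] = None,
    relations: Optional[Collection[int]] = None,
    invert_entity_selection: bool = False,
    invert_relation_selection: bool = False,
) -> List[Triple]:
    """Keep only triples matching the entity and relation selections (IDs unchanged)."""
    entity_set = None if entities is None else set(entities)
    relation_set = None if relations is None else set(relations)
    kept: List[Triple] = []
    for h, r, t in triples:
        if entity_set is not None:
            # Assumption: head AND tail must both satisfy the (possibly inverted) test.
            if invert_entity_selection:
                ok = h not in entity_set and t not in entity_set
            else:
                ok = h in entity_set and t in entity_set
            if not ok:
                continue
        if relation_set is not None and (r in relation_set) == invert_relation_selection:
            continue
        kept.append((h, r, t))
    return kept


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 0)]
print(restrict(triples, entities=[0, 1]))
print(restrict(triples, relations=[0], invert_relation_selection=True))
```

Because the IDs are left untouched, the restricted triples remain compatible with models and mappings built for the original factory.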
- relations_to_ids(relations)
Normalize relations to IDs.
- Parameters:
relations (Union[Collection[int], Collection[str]]) – A collection of either integer identifiers for relations or string labels for relations (that will get auto-converted)
- Return type:
- Returns:
Integer identifiers for relations
- Raises:
ValueError – If the relations passed are string labels and this triples factory does not have a relation label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)
- split(ratios=0.8, *, random_state=None, randomize_cleanup=False, method=None)
Split a triples factory into train/test subsets.
- Parameters:
ratios (Union[float, Sequence[float]]) – There are three options for this argument:
A float can be given between 0 and 1.0, non-inclusive. The first set of triples will get this ratio and the second will get the rest.
A list of ratios can be given for which set in which order should get what ratios, as in [0.8, 0.1]. The final ratio can be omitted because it can be calculated.
All ratios can be explicitly set in order, such as in [0.8, 0.1, 0.1], where the sum of all ratios is 1.0.
random_state (Union[None, int, Generator]) – The random state used to shuffle and split the triples.
randomize_cleanup (bool) – If true, uses the non-deterministic method for moving triples to the training set. This has the advantage that it does not necessarily have to move all of them, but it might be significantly slower since it moves one triple at a time.
method (Optional[str]) – The name of the method to use, from SPLIT_METHODS. Defaults to “coverage”.
- Return type:
- Returns:
A partition of triples, which are split (approximately) according to the ratios and stored in TriplesFactory instances that share everything else with this root triples factory.
ratio = 0.8  # makes a [0.8, 0.2] split
training_factory, testing_factory = factory.split(ratio)

ratios = [0.8, 0.1]  # makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)

ratios = [0.8, 0.1, 0.1]  # also makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)
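The three accepted forms of ratios all reduce to one explicit list. The following is a minimal sketch of that normalization as described in the parameter list; normalize_ratios and its epsilon tolerance are illustrative assumptions, not the library's implementation.

```python
from typing import List, Sequence, Union


def normalize_ratios(
    ratios: Union[float, Sequence[float]], epsilon: float = 1e-9
) -> List[float]:
    """Hypothetical helper: expand a float or partial list into explicit ratios."""
    values = [ratios] if isinstance(ratios, float) else list(ratios)
    if any(r <= 0.0 or r >= 1.0 for r in values):
        raise ValueError("each ratio must lie strictly between 0 and 1")
    total = sum(values)
    if total > 1.0 + epsilon:
        raise ValueError(f"ratios sum to {total}, which exceeds 1.0")
    if total < 1.0 - epsilon:
        # The omitted final ratio is whatever remains.
        values.append(1.0 - total)
    return values


for spec in (0.8, [0.8, 0.1], [0.8, 0.1, 0.1]):
    print(spec, "->", len(normalize_ratios(spec)), "subsets")
```

The tolerance matters because sums such as 0.8 + 0.1 + 0.1 are not exactly 1.0 in floating-point arithmetic.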
- tensor_to_df(tensor, **kwargs)
Take a tensor of triples and make a pandas dataframe with labels.
- Parameters:
tensor (LongTensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Union[Tensor, ndarray, Sequence]) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.
- Return type:
DataFrame
- Returns:
A dataframe with n rows, and 6 + len(kwargs) columns.
- class Instances(*args, **kwds)
Base class for training instances.
- class KGInfo(num_entities, num_relations, create_inverse_triples)
An object storing information about the number of entities and relations.
Initialize the information object.
- Parameters:
- class LCWAInstances(*, pairs, compressed)
Triples and mappings to their indices for LCWA.
Initialize the LCWA instances.
- Parameters:
pairs (ndarray) – The unique pairs
compressed (csr_matrix) – The compressed triples in CSR format
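The idea behind the two arguments can be sketched with plain Python containers: every unique (head, relation) pair is listed once, together with all tails observed for it. Grouping by head and relation is an assumption here (the target argument of create_lcwa_instances suggests other groupings exist), and nested lists stand in for the CSR matrix.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Triple = Tuple[int, int, int]


def group_by_pair(
    triples: Sequence[Triple],
) -> Tuple[List[Tuple[int, int]], List[List[int]]]:
    """Collect, for each unique (head, relation) pair, the sorted list of tails."""
    tails_by_pair: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
    for h, r, t in triples:
        tails_by_pair[(h, r)].add(t)
    pairs = sorted(tails_by_pair)
    targets = [sorted(tails_by_pair[pair]) for pair in pairs]
    return pairs, targets


pairs, targets = group_by_pair([(0, 0, 1), (0, 0, 2), (1, 0, 2)])
print(pairs)    # unique (head, relation) pairs
print(targets)  # row i holds the tails for pairs[i]
```

A CSR matrix stores the same row-wise lists compactly, with one row per pair and one column per candidate entity.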
- class SLCWAInstances(*, mapped_triples, num_entities=None, num_relations=None, negative_sampler=None, negative_sampler_kwargs=None)
Training instances for the sLCWA.
Initialize the sLCWA instances.
- Parameters:
mapped_triples (LongTensor) – shape: (num_triples, 3) the ID-based triples, passed to the negative sampler
num_entities (Optional[int]) – >0 the number of entities, passed to the negative sampler
num_relations (Optional[int]) – >0 the number of relations, passed to the negative sampler
negative_sampler (Union[str, NegativeSampler, Type[NegativeSampler], None]) – the negative sampler, or a hint thereof
negative_sampler_kwargs (Optional[Mapping[str, Any]]) – additional keyword-based arguments passed to the negative sampler
- class TriplesFactory(mapped_triples, entity_to_id, relation_to_id, create_inverse_triples=False, metadata=None, num_entities=None, num_relations=None)
Create instances given the path to triples.
Create the triples factory.
- Parameters:
mapped_triples (Union[LongTensor, ndarray]) – shape: (n, 3) A three-column matrix where each row is the head identifier, relation identifier, then tail identifier.
entity_to_id (Mapping[str, int]) – The mapping from entities’ labels to their indices.
relation_to_id (Mapping[str, int]) – The mapping from relations’ labels to their indices.
create_inverse_triples (bool) – Whether to create inverse triples.
metadata (Optional[Mapping[str, Any]]) – Arbitrary metadata to go with the graph.
num_entities (Optional[int]) – The number of entities. May be None, in which case this number is inferred from the label mapping.
num_relations (Optional[int]) – The number of relations. May be None, in which case this number is inferred from the label mapping.
- Raises:
ValueError – if the explicitly provided number of entities or relations does not match with the one given by the label mapping
- clone_and_exchange_triples(mapped_triples, extra_metadata=None, keep_metadata=True, create_inverse_triples=None)
Create a new triples factory sharing everything except the triples.
Note
We use shallow copies.
- Parameters:
mapped_triples (LongTensor) – The new mapped triples.
extra_metadata (Optional[Dict[str, Any]]) – Extra metadata to include in the new triples factory. If keep_metadata is true, the dictionaries will be unioned, with keys from extra_metadata taking precedence.
keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.
create_inverse_triples (Optional[bool]) – Change the inverse triple creation flag. If None, use the flag from this factory.
- Return type:
- Returns:
The new factory.
- entities_to_ids(entities)
Normalize entities to IDs.
- Parameters:
entities (Union[Collection[int], Collection[str]]) – A collection of either integer identifiers for entities or string labels for entities (that will get auto-converted)
- Return type:
- Returns:
Integer identifiers for entities
- Raises:
ValueError – If the entities passed are string labels and this triples factory does not have an entity label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)
- entity_word_cloud(top=None)
Make a word cloud based on the frequency of occurrence of each entity in a Jupyter notebook.
- Parameters:
top (Optional[int]) – The number of top entities to show. Defaults to 100.
- Returns:
A word cloud object for a Jupyter notebook
Warning
This function requires the wordcloud package. Use pip install pykeen[wordcloud] to install it.
- classmethod from_labeled_triples(triples, *, create_inverse_triples=False, entity_to_id=None, relation_to_id=None, compact_id=True, filter_out_candidate_inverse_relations=True, metadata=None)
Create a new triples factory from label-based triples.
- Parameters:
triples (ndarray) – shape: (n, 3), dtype: str The label-based triples.
create_inverse_triples (bool) – Whether to create inverse triples.
entity_to_id (Optional[Mapping[str, int]]) – The mapping from entity labels to ID. If None, create a new one from the triples.
relation_to_id (Optional[Mapping[str, int]]) – The mapping from relation labels to ID. If None, create a new one from the triples.
compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.
filter_out_candidate_inverse_relations (bool) – Whether to remove triples with relations with the inverse suffix.
metadata (Optional[Dict[str, Any]]) – Arbitrary key/value pairs to store as metadata
- Return type:
- Returns:
A new triples factory.
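The step from label-based to ID-based triples can be sketched as follows. Assigning IDs in sorted label order is an assumption made for reproducibility of the example; build_mapping and map_labeled_triples are hypothetical helpers, and inverse-relation filtering and ID compaction are left out.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Sequence, Tuple

LabeledTriple = Tuple[str, str, str]


def build_mapping(labels: Iterable[str]) -> Dict[str, int]:
    """Assign consecutive IDs to the unique labels, in sorted order (an assumption)."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def map_labeled_triples(
    triples: Sequence[LabeledTriple],
    entity_to_id: Optional[Mapping[str, int]] = None,
    relation_to_id: Optional[Mapping[str, int]] = None,
) -> Tuple[List[Tuple[int, int, int]], Mapping[str, int], Mapping[str, int]]:
    """Create missing mappings from the triples, then translate labels to IDs."""
    if entity_to_id is None:
        entity_to_id = build_mapping(x for h, _, t in triples for x in (h, t))
    if relation_to_id is None:
        relation_to_id = build_mapping(r for _, r, _ in triples)
    mapped = [(entity_to_id[h], relation_to_id[r], entity_to_id[t]) for h, r, t in triples]
    return mapped, entity_to_id, relation_to_id


mapped, e2i, r2i = map_labeled_triples([("b", "likes", "a"), ("a", "knows", "c")])
print(mapped, e2i, r2i)
```

Passing existing mappings instead of None is what keeps training, validation, and testing factories on the same ID space.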
- classmethod from_path(path, *, create_inverse_triples=False, entity_to_id=None, relation_to_id=None, compact_id=True, metadata=None, load_triples_kwargs=None, **kwargs)
Create a new triples factory from triples stored in a file.
- Parameters:
path (Union[str, Path, TextIO]) – The path where the label-based triples are stored.
create_inverse_triples (bool) – Whether to create inverse triples.
entity_to_id (Optional[Mapping[str, int]]) – The mapping from entity labels to ID. If None, create a new one from the triples.
relation_to_id (Optional[Mapping[str, int]]) – The mapping from relation labels to ID. If None, create a new one from the triples.
compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.
metadata (Optional[Dict[str, Any]]) – Arbitrary key/value pairs to store as metadata with the triples factory. Do not include path as a key because it is automatically taken from the path kwarg to this function.
load_triples_kwargs (Optional[Mapping[str, Any]]) – Optional keyword arguments to pass to load_triples(). Could include the delimiter or a column_remapping.
kwargs – additional keyword-based parameters, which are ignored.
- Return type:
- Returns:
A new triples factory.
- get_inverse_relation_id(relation)
Get the inverse relation identifier for the given relation.
- get_mask_for_relations(relations, invert=False)
Get a boolean mask for triples with the given relations.
- Return type:
BoolTensor
- Parameters:
relations (Collection[int] | Collection[str]) –
invert (bool) –
- label_triples(triples, unknown_entity_label='[UNKNOWN]', unknown_relation_label=None)
Convert ID-based triples to label-based ones.
- map_triples(triples)
Convert label-based triples to ID-based triples.
- Return type:
LongTensor
- Parameters:
triples (ndarray) –
- new_with_restriction(entities=None, relations=None, invert_entity_selection=False, invert_relation_selection=False)
Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.
- Parameters:
entities (Union[None, Collection[int], Collection[str]]) – The entities of interest. If None, defaults to all entities.
relations (Union[None, Collection[int], Collection[str]]) – The relations of interest. If None, defaults to all relations.
invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.
- Return type:
- Returns:
A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.
- relation_word_cloud(top=None)
Make a word cloud based on the frequency of occurrence of each relation in a Jupyter notebook.
- Parameters:
top (Optional[int]) – The number of top relations to show. Defaults to 100.
- Returns:
A word cloud object for a Jupyter notebook
Warning
This function requires the wordcloud package. Use pip install pykeen[wordcloud] to install it.
- relations_to_ids(relations)
Normalize relations to IDs.
- Parameters:
relations (Union[Collection[int], Collection[str]]) – A collection of either integer identifiers for relations or string labels for relations (that will get auto-converted)
- Return type:
- Returns:
Integer identifiers for relations
- Raises:
ValueError – If the relations passed are string labels and this triples factory does not have a relation label to identifier mapping (e.g., it’s just a base CoreTriplesFactory instance)
- tensor_to_df(tensor, **kwargs)
Take a tensor of triples and make a pandas dataframe with labels.
- Parameters:
tensor (LongTensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Union[Tensor, ndarray, Sequence]) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.
- Return type:
DataFrame
- Returns:
A dataframe with n rows, and 6 + len(kwargs) columns.
- class TriplesNumericLiteralsFactory(*, numeric_literals, literals_to_id, **kwargs)
Create multi-modal instances given the path to triples.
Initialize the multi-modal triples factory.
- Parameters:
numeric_literals (ndarray) – shape: (num_entities, num_literals) the numeric literals as a dense matrix.
literals_to_id (Mapping[str, int]) – a mapping from literal names to their IDs, i.e., the columns in the numeric_literals matrix.
kwargs – additional keyword-based parameters passed to TriplesFactory.__init__().
- clone_and_exchange_triples(mapped_triples, extra_metadata=None, keep_metadata=True, create_inverse_triples=None)
Create a new triples factory sharing everything except the triples.
Note
We use shallow copies.
- Parameters:
mapped_triples (LongTensor) – The new mapped triples.
extra_metadata (Optional[Dict[str, Any]]) – Extra metadata to include in the new triples factory. If keep_metadata is true, the dictionaries will be unioned, with keys from extra_metadata taking precedence.
keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.
create_inverse_triples (Optional[bool]) – Change the inverse triple creation flag. If None, use the flag from this factory.
- Return type:
- Returns:
The new factory.
- classmethod from_labeled_triples(triples, *, numeric_triples=None, **kwargs)
Create a new triples factory from label-based triples.
- Parameters:
triples (ndarray) – shape: (n, 3), dtype: str The label-based triples.
create_inverse_triples – Whether to create inverse triples.
entity_to_id – The mapping from entity labels to ID. If None, create a new one from the triples.
relation_to_id – The mapping from relation labels to ID. If None, create a new one from the triples.
compact_id – Whether to compact IDs such that the IDs are consecutive.
filter_out_candidate_inverse_relations – Whether to remove triples with relations with the inverse suffix.
metadata – Arbitrary key/value pairs to store as metadata
numeric_triples (ndarray | None) –
- Return type:
- Returns:
A new triples factory.
- classmethod from_path(path, *, path_to_numeric_triples=None, **kwargs)
Create a new triples factory from triples stored in a file.
- Parameters:
path (Union[str, Path, TextIO]) – The path where the label-based triples are stored.
create_inverse_triples – Whether to create inverse triples.
entity_to_id – The mapping from entity labels to ID. If None, create a new one from the triples.
relation_to_id – The mapping from relation labels to ID. If None, create a new one from the triples.
compact_id – Whether to compact IDs such that the IDs are consecutive.
metadata – Arbitrary key/value pairs to store as metadata with the triples factory. Do not include path as a key because it is automatically taken from the path kwarg to this function.
load_triples_kwargs – Optional keyword arguments to pass to load_triples(). Could include the delimiter or a column_remapping.
kwargs – additional keyword-based parameters, which are ignored.
- Return type:
- Returns:
A new triples factory.
- get_numeric_literals_tensor()
Return the numeric literals as a tensor.
- Return type:
FloatTensor
- get_mapped_triples(x=None, *, mapped_triples=None, triples=None, factory=None)
Get ID-based triples either directly, or from a factory.
Preference order:
1. mapped_triples
2. triples (converted using factory)
3. x
4. factory.mapped_triples
- Parameters:
x (Union[Tuple[str, str, str], Sequence[Tuple[str, str, str]], ndarray, LongTensor, CoreTriplesFactory, None]) – either of label-based triples, ID-based triples, a factory, or None.
mapped_triples (Optional[LongTensor]) – shape: (n, 3) the ID-based triples
triples (Union[None, ndarray, Tuple[str, str, str], Sequence[Tuple[str, str, str]]]) – the label-based triples
factory (Optional[CoreTriplesFactory]) – the triples factory
- Raises:
ValueError – if all inputs are None, or provided inputs are invalid.
- Return type:
LongTensor
- Returns:
the ID-based triples
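The preference order can be made concrete with a small resolver that only reports which input would be used. It is a sketch of the documented order, not the library's code; pick_source is a hypothetical name, and requiring a factory for label-based triples is an assumption drawn from the phrase "converted using factory".

```python
from typing import Any, Optional


def pick_source(
    x: Optional[Any] = None,
    *,
    mapped_triples: Optional[Any] = None,
    triples: Optional[Any] = None,
    factory: Optional[Any] = None,
) -> str:
    """Return the name of the input that takes precedence."""
    if mapped_triples is not None:
        return "mapped_triples"
    if triples is not None:
        if factory is None:
            # Assumption: labels cannot be converted without a factory.
            raise ValueError("label-based triples need a factory for conversion")
        return "triples"
    if x is not None:
        return "x"
    if factory is not None:
        return "factory.mapped_triples"
    raise ValueError("all inputs are None")


print(pick_source(mapped_triples=[(0, 0, 1)], factory=object()))
print(pick_source(factory=object()))
```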
Instance creation utilities.
- compute_compressed_adjacency_list(mapped_triples, num_entities=None)
Compute compressed undirected adjacency list representation for efficient sampling.
The compressed adjacency list format is inspired by CSR sparse matrix format.
- Parameters:
- Return type:
Tuple[LongTensor, LongTensor, LongTensor]
- Returns:
a tuple (degrees, offsets, compressed_adj_lists) where
degrees: shape: (num_entities,)
offsets: shape: (num_entities,)
compressed_adj_list: shape: (2 * num_triples, 2)
with
adj_list[i] = compressed_adj_list[offsets[i]:offsets[i+1]]
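The layout can be reproduced with plain lists. In this sketch each adjacency entry is a (triple index, neighbour entity) pair, which is an assumption about what the two columns hold; compress_adjacency and neighbors are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]
Entry = Tuple[int, int]  # assumed layout: (triple index, neighbour entity)


def compress_adjacency(
    mapped_triples: Sequence[Triple], num_entities: Optional[int] = None
) -> Tuple[List[int], List[int], List[Entry]]:
    """Build degrees, offsets, and one flat list of undirected adjacency entries."""
    if num_entities is None:
        num_entities = max(max(h, t) for h, _, t in mapped_triples) + 1
    adj_lists: List[List[Entry]] = [[] for _ in range(num_entities)]
    for index, (h, _, t) in enumerate(mapped_triples):
        adj_lists[h].append((index, t))  # undirected: record both directions
        adj_lists[t].append((index, h))
    degrees = [len(adj) for adj in adj_lists]
    offsets, running = [], 0
    for degree in degrees:
        offsets.append(running)  # start of entity i's slice in the flat list
        running += degree
    compressed = [entry for adj in adj_lists for entry in adj]
    return degrees, offsets, compressed


def neighbors(i: int, degrees: List[int], offsets: List[int], compressed: List[Entry]) -> List[Entry]:
    """Slice entity i's entries; offsets[i] + degrees[i] also works for the last entity."""
    return compressed[offsets[i]: offsets[i] + degrees[i]]


degrees, offsets, compressed = compress_adjacency([(0, 0, 1), (1, 0, 2)])
print(degrees, offsets, compressed)
print(neighbors(1, degrees, offsets, compressed))
```

Since offsets has exactly num_entities entries, the upper bound for the last entity is taken from offsets[i] + degrees[i] rather than offsets[i+1].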
- load_triples(path, delimiter='\\t', encoding=None, column_remapping=None)
Load triples saved as tab separated values.
- Parameters:
path (Union[str, Path, TextIO]) – The key for the data to be loaded. Typically, this will be a file path ending in .tsv that points to a file with three columns - the head, relation, and tail. This can also be used to invoke PyKEEN data importer entrypoints (see below).
delimiter (str) – The delimiter between the columns in the file
encoding (Optional[str]) – The encoding for the file. Defaults to utf-8.
column_remapping (Optional[Sequence[int]]) – A remapping if the three columns do not follow the order head-relation-tail. For example, if the order is head-tail-relation, pass (0, 2, 1)
- Return type:
- Returns:
A numpy array representing “labeled” triples.
- Raises:
ValueError – if a column remapping was passed but it was not a length 3 sequence
Besides TSV handling, PyKEEN does not come with any importers pre-installed. A few can be found at:
bio2bel.io.pykeen
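The effect of column_remapping on a delimited file can be sketched with the standard library alone. read_triples below is a hypothetical stand-in that returns a list of tuples rather than a numpy array and does not handle importer entrypoints.

```python
import csv
import io
from typing import List, Optional, Sequence, TextIO, Tuple


def read_triples(
    handle: TextIO,
    delimiter: str = "\t",
    column_remapping: Optional[Sequence[int]] = None,
) -> List[Tuple[str, str, str]]:
    """Read delimited rows and reorder their columns to head-relation-tail."""
    if column_remapping is not None and len(column_remapping) != 3:
        raise ValueError("column_remapping must be a sequence of length 3")
    rows: List[Tuple[str, str, str]] = []
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if column_remapping is not None:
            row = [row[i] for i in column_remapping]
        rows.append((row[0], row[1], row[2]))
    return rows


# The file stores head-tail-relation, so (0, 2, 1) restores head-relation-tail.
data = io.StringIO("berlin\tgermany\tcapital_of\nparis\tfrance\tcapital_of\n")
print(read_triples(data, column_remapping=(0, 2, 1)))
```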
- tensor_to_df(tensor, **kwargs)
Take a tensor of triples and make a pandas dataframe with labels.
- Parameters:
tensor (LongTensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Union[Tensor, ndarray, Sequence]) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.
- Return type:
DataFrame
- Returns:
A dataframe with n rows, and 3 + len(kwargs) columns.
- Raises:
ValueError – If a reserved column name appears in kwargs.
Remixing and dataset distance utilities.
Most datasets are given with a pre-defined split, but it’s often not discussed how this split was created. This module contains utilities for investigating the effects of remixing pre-split datasets like pykeen.datasets.Nations.
Further, it defines a metric for the “distance” between two splits of a given dataset. Later, this will be used to map the landscape and see if there is a smooth, continuous relationship between datasets’ splits’ distances and their maximum performance.
- remix(*triples_factories, **kwargs)
Remix the triples from the training, testing, and validation set.
- Parameters:
triples_factories (CoreTriplesFactory) – A sequence of triples factories
kwargs – Keyword arguments to be passed to split()
- Return type:
- Returns:
A sequence of triples factories of the same sizes but randomly re-assigned triples
- Raises:
NotImplementedError – if any of the triples factories have create_inverse_triples
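Conceptually, remixing pools all triples and deals them out again into subsets of the original sizes. The sketch below does this with a plain shuffle on lists of triples; it does not reproduce the behaviour of split() that the library delegates to, and remix_sketch with its seed argument is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[int, int, int]


def remix_sketch(*splits: Sequence[Triple], seed: Optional[int] = None) -> List[List[Triple]]:
    """Pool all triples, shuffle them, and cut new subsets of the original sizes."""
    pool = [triple for split in splits for triple in split]
    random.Random(seed).shuffle(pool)
    remixed, start = [], 0
    for split in splits:
        remixed.append(pool[start:start + len(split)])
        start += len(split)
    return remixed


train = [(0, 0, 1), (1, 0, 2), (2, 1, 3), (3, 1, 0)]
test = [(0, 1, 2)]
new_train, new_test = remix_sketch(train, test, seed=0)
print(len(new_train), len(new_test))
```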
Deterioration algorithm.
- deteriorate(reference, *others, n, random_state=None)
Remove n triples from the reference set.
- Parameters:
reference (TriplesFactory) – The reference triples factory
others (TriplesFactory) – Other triples factories to deteriorate
n (Union[int, float]) – The ratio to deteriorate. If given as a float, should be between 0 and 1. If an integer, deteriorates that many triples
random_state (Union[None, int, Generator]) – The random state
- Return type:
- Returns:
A concatenated list of the processed reference and other triples factories
- Raises:
NotImplementedError – if the reference triples factory has inverse triples
ValueError – If a float is given for n that isn’t between 0 and 1
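How n is interpreted can be sketched independently of the factories. The sketch assumes that a float is turned into a count by truncating n * num_triples and that the bounds are exclusive; resolve_n and drop_random are hypothetical helpers, and what happens to the removed triples afterwards is not modelled.

```python
import random
from typing import List, Optional, Sequence, Tuple, Union

Triple = Tuple[int, int, int]


def resolve_n(n: Union[int, float], num_triples: int) -> int:
    """Turn a ratio or an absolute number into a count of triples to remove."""
    if isinstance(n, float):
        if not 0.0 < n < 1.0:
            raise ValueError("a float n must lie between 0 and 1")
        return int(n * num_triples)  # assumption: truncate towards zero
    return n


def drop_random(
    reference: Sequence[Triple], n: Union[int, float], seed: Optional[int] = None
) -> Tuple[List[Triple], List[Triple]]:
    """Randomly remove the resolved number of triples; return (kept, removed)."""
    count = resolve_n(n, len(reference))
    removed_idx = set(random.Random(seed).sample(range(len(reference)), count))
    kept = [t for i, t in enumerate(reference) if i not in removed_idx]
    removed = [t for i, t in enumerate(reference) if i in removed_idx]
    return kept, removed


reference = [(0, 0, 1), (1, 0, 2), (2, 1, 3), (3, 1, 0)]
kept, removed = drop_random(reference, 0.5, seed=0)
print(len(kept), len(removed))
```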