Triples
Classes for creating and storing training data from triples.
-
class
LCWAInstances
(pairs, compressed)
Triples and mappings to their indices for training under the local closed world assumption (LCWA).
-
compressed
: scipy.sparse.csr.csr_matrix
The compressed triples in CSR (compressed sparse row) format.
-
classmethod
from_triples
(mapped_triples, num_entities)
Create LCWA instances from triples.
- Parameters
mapped_triples (LongTensor) – shape: (num_triples, 3). The ID-based triples.
num_entities (int) – The number of entities.
- Return type
Instances
- Returns
The instances.
-
pairs
: numpy.ndarray
The unique (head, relation) pairs.
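To give a feel for what `pairs` and `compressed` hold, here is a minimal sketch in plain Python (no NumPy or SciPy) that groups ID-based triples by (head, relation) and stores each pair's tails in CSR-style `indptr`/`indices` lists. The helpers `build_lcwa_instances` and `multi_hot_row` are hypothetical and only approximate the idea; in particular, duplicate triples are collapsed here, which may differ from the library's behavior.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Set, Tuple

Triple = Tuple[int, int, int]
Pair = Tuple[int, int]


def build_lcwa_instances(
    mapped_triples: Sequence[Triple],
) -> Tuple[List[Pair], List[int], List[int]]:
    """Group triples by (head, relation) and compress the tails CSR-style.

    The tails of pairs[i] are indices[indptr[i]:indptr[i + 1]].
    """
    tails_by_pair: DefaultDict[Pair, Set[int]] = defaultdict(set)
    for head, relation, tail in mapped_triples:
        tails_by_pair[(head, relation)].add(tail)  # duplicates collapse in the set

    pairs = sorted(tails_by_pair)
    indptr = [0]
    indices: List[int] = []
    for pair in pairs:
        indices.extend(sorted(tails_by_pair[pair]))
        indptr.append(len(indices))
    return pairs, indptr, indices


def multi_hot_row(indptr: List[int], indices: List[int], row: int, num_entities: int) -> List[int]:
    """Expand one compressed row into a dense 0/1 target vector over all entities."""
    target = [0] * num_entities
    for tail in indices[indptr[row]:indptr[row + 1]]:
        target[tail] = 1
    return target


pairs, indptr, indices = build_lcwa_instances([(0, 0, 1), (0, 0, 2), (1, 0, 2)])
print(pairs, indptr, indices)  # [(0, 0), (1, 0)] [0, 2, 3] [1, 2, 2]
print(multi_hot_row(indptr, indices, 0, num_entities=3))  # [0, 1, 1]
```

Each row of the compressed structure thus acts as a multi-label target: every entity not listed as a tail for that (head, relation) pair is treated as a negative.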
-
-
class
MultimodalInstances
(numeric_literals, literals_to_id)
Triples and mappings to their indices as well as multimodal data.
-
numeric_literals
: Mapping[str, numpy.ndarray]
TODO: do we need these?
-
-
class
MultimodalLCWAInstances
(numeric_literals, literals_to_id, pairs, compressed)
Triples and mappings to their indices as well as multimodal data for LCWA.
-
class
MultimodalSLCWAInstances
(numeric_literals, literals_to_id, mapped_triples)
Triples and mappings to their indices as well as multimodal data for training under the stochastic local closed world assumption (sLCWA).
-
class
SLCWAInstances
(mapped_triples)
Triples and mappings to their indices for sLCWA.
-
mapped_triples
: torch.LongTensor
The mapped triples, shape: (num_triples, 3).
-
-
class
TriplesFactory
(entity_to_id, relation_to_id, mapped_triples, create_inverse_triples=False, metadata=None)
Create instances from ID-based triples and the label-to-ID mappings for entities and relations.
-
clone_and_exchange_triples
(mapped_triples, extra_metadata=None, keep_metadata=True)
Create a new triples factory sharing everything except the triples.
Note
We use shallow copies.
- Parameters
mapped_triples (LongTensor) – The new mapped triples.
extra_metadata (Optional[Dict[str, Any]]) – Extra metadata to include in the new triples factory. If keep_metadata is true, the dictionaries are unioned, with precedence given to keys from extra_metadata.
keep_metadata (bool) – Whether to pass the current factory’s metadata to the new triples factory.
- Return type
TriplesFactory
- Returns
The new factory.
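The documented merge rule for `extra_metadata` and `keep_metadata` can be sketched with plain dictionaries. `merge_metadata` is a hypothetical helper that only mirrors the parameter description; it is not the factory's own code.

```python
from typing import Any, Dict, Mapping, Optional


def merge_metadata(
    current: Mapping[str, Any],
    extra_metadata: Optional[Mapping[str, Any]] = None,
    keep_metadata: bool = True,
) -> Dict[str, Any]:
    """Combine metadata, letting keys from extra_metadata win on conflicts."""
    # dict(...) makes a shallow copy, so nested values are shared, not duplicated
    merged: Dict[str, Any] = dict(current) if keep_metadata else {}
    if extra_metadata:
        merged.update(extra_metadata)  # update() overwrites shared keys
    return merged


print(merge_metadata({"source": "a.tsv", "version": 1}, {"version": 2}))
# {'source': 'a.tsv', 'version': 2}
```

Applying `extra_metadata` last is what gives its keys precedence; with `keep_metadata=False` only the extra keys survive.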
-
create_lcwa_instances
(use_tqdm=None)
Create LCWA instances for this factory’s triples.
- Return type
Instances
-
create_slcwa_instances
()
Create sLCWA instances for this factory’s triples.
- Return type
Instances
-
entity_id_to_label
: Mapping[int, str]
The inverse mapping for entity_label_to_id; initialized automatically.
-
entity_word_cloud
(top=None)
In a Jupyter notebook, make a word cloud based on how frequently each entity occurs in the triples.
Warning
This function requires the word_cloud package. Use pip install pykeen[plotting] to install it automatically, or install it yourself with pip install git+https://github.com/kavgan/word_cloud.git.
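The frequencies that drive the word cloud can be sketched with `collections.Counter`. This assumes an entity is counted once each time it appears as a head or a tail, and that `top` keeps only the most frequent entities; `entity_frequencies` is a hypothetical helper, not the library's implementation.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def entity_frequencies(
    labeled_triples: Iterable[Tuple[str, str, str]],
    top: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Count how often each entity occurs as a head or a tail, most frequent first."""
    counts: Counter = Counter()
    for head, _relation, tail in labeled_triples:
        counts[head] += 1
        counts[tail] += 1
    # most_common(None) returns every entity; an int keeps only that many
    return counts.most_common(top)


triples = [("berlin", "capital_of", "germany"), ("berlin", "located_in", "europe")]
print(entity_frequencies(triples, top=1))  # [('berlin', 2)]
```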
-
classmethod
from_labeled_triples
(triples, create_inverse_triples=False, entity_to_id=None, relation_to_id=None, compact_id=True, filter_out_candidate_inverse_relations=True, metadata=None)
Create a new triples factory from label-based triples.
- Parameters
triples (ndarray) – shape: (n, 3), dtype: str. The label-based triples.
create_inverse_triples (bool) – Whether to create inverse triples.
entity_to_id (Optional[Mapping[str, int]]) – The mapping from entity labels to IDs. If None, create a new one from the triples.
relation_to_id (Optional[Mapping[str, int]]) – The mapping from relation labels to IDs. If None, create a new one from the triples.
compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.
filter_out_candidate_inverse_relations (bool) – Whether to remove triples whose relations carry the inverse suffix.
metadata (Optional[Dict[str, Any]]) – Arbitrary key/value pairs to store as metadata.
- Return type
TriplesFactory
- Returns
A new triples factory.
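The label-to-ID step can be sketched in plain Python as follows. Several details are assumptions rather than documented behavior: new mappings assign IDs in sorted label order, the inverse suffix is taken to be `"_inverse"`, compaction preserves the relative order of existing IDs, and triples whose labels are missing from a supplied mapping are dropped. `compact_mapping` and `map_labeled_triples` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

LabeledTriple = Tuple[str, str, str]
MappedTriple = Tuple[int, int, int]

# Assumption: the suffix marking candidate inverse relations; the library's constant may differ.
INVERSE_SUFFIX = "_inverse"


def compact_mapping(mapping: Dict[str, int]) -> Dict[str, int]:
    """Reassign IDs so they are consecutive (0, 1, 2, ...), keeping their order."""
    ordered = sorted(mapping, key=mapping.__getitem__)
    return {label: new_id for new_id, label in enumerate(ordered)}


def map_labeled_triples(
    triples: Sequence[LabeledTriple],
    entity_to_id: Optional[Dict[str, int]] = None,
    relation_to_id: Optional[Dict[str, int]] = None,
    compact_id: bool = True,
    filter_out_candidate_inverse_relations: bool = True,
) -> Tuple[List[MappedTriple], Dict[str, int], Dict[str, int]]:
    """Convert label-based triples to ID-based ones, creating mappings when missing."""
    if filter_out_candidate_inverse_relations:
        triples = [t for t in triples if not t[1].endswith(INVERSE_SUFFIX)]
    if entity_to_id is None:
        entity_labels = sorted({t[0] for t in triples} | {t[2] for t in triples})
        entity_to_id = {label: i for i, label in enumerate(entity_labels)}
    if relation_to_id is None:
        relation_labels = sorted({t[1] for t in triples})
        relation_to_id = {label: i for i, label in enumerate(relation_labels)}
    if compact_id:
        entity_to_id = compact_mapping(entity_to_id)
        relation_to_id = compact_mapping(relation_to_id)
    mapped = [
        (entity_to_id[h], relation_to_id[r], entity_to_id[t])
        for h, r, t in triples
        # Assumption: triples with labels unknown to a supplied mapping are dropped
        if h in entity_to_id and r in relation_to_id and t in entity_to_id
    ]
    return mapped, entity_to_id, relation_to_id


mapped, e2id, r2id = map_labeled_triples(
    [("berlin", "capital_of", "germany"), ("germany", "capital_of_inverse", "berlin")],
)
print(mapped, e2id)  # [(0, 0, 1)] {'berlin': 0, 'germany': 1}
```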
-
classmethod
from_path
(path, create_inverse_triples=False, entity_to_id=None, relation_to_id=None, compact_id=True, metadata=None)
Create a new triples factory from triples stored in a file.
- Parameters
path (Union[str, TextIO]) – The path where the label-based triples are stored.
create_inverse_triples (bool) – Whether to create inverse triples.
entity_to_id (Optional[Mapping[str, int]]) – The mapping from entity labels to IDs. If None, create a new one from the triples.
relation_to_id (Optional[Mapping[str, int]]) – The mapping from relation labels to IDs. If None, create a new one from the triples.
compact_id (bool) – Whether to compact IDs such that the IDs are consecutive.
metadata (Optional[Dict[str, Any]]) – Arbitrary key/value pairs to store as metadata with the triples factory. Do not include path as a key because it is automatically taken from the path kwarg to this function.
- Return type
TriplesFactory
- Returns
A new triples factory.
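Because `path` may be either a string or an open `TextIO`, a reader has to handle both. The sketch below assumes a tab-separated file with head, relation, and tail in the first three columns (the TSV layout described for TriplesNumericLiteralsFactory); `read_labeled_triples` is a hypothetical helper, not the library's loader.

```python
import csv
import io
from typing import List, TextIO, Tuple, Union


def read_labeled_triples(path: Union[str, TextIO]) -> List[Tuple[str, str, str]]:
    """Read label-based triples from a tab-separated file path or an open text stream."""
    if isinstance(path, str):
        with open(path, newline="", encoding="utf-8") as handle:
            return read_labeled_triples(handle)
    reader = csv.reader(path, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Assumption: one triple per row, head/relation/tail in the first three columns;
    # blank lines (empty rows) are skipped.
    return [(row[0], row[1], row[2]) for row in reader if row]


example = io.StringIO("berlin\tcapital_of\tgermany\nparis\tcapital_of\tfrance\n")
print(read_labeled_triples(example))
# [('berlin', 'capital_of', 'germany'), ('paris', 'capital_of', 'france')]
```

`QUOTE_NONE` keeps quote characters in labels as literal text instead of interpreting them as CSV quoting.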
-
get_inverse_relation_id
(relation)
Get the inverse relation identifier for the given relation.
- Return type
int
-
get_mask_for_entities
(entities, invert=False)
Get a boolean mask for triples with the given entities.
- Return type
BoolTensor
-
get_mask_for_relations
(relations, invert=False)
Get a boolean mask for triples with the given relations.
- Return type
BoolTensor
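A list-based sketch of the two masks follows. The entity semantics are an assumption: without `invert`, a triple matches only when both its head and tail are in `entities`; with `invert`, it matches only when neither is. Under that reading the inverted entity mask is not simply the negation of the plain one, while for relations (a single column) it is. `mask_for_entities` and `mask_for_relations` are hypothetical helpers returning Python lists instead of tensors.

```python
from typing import Collection, List, Sequence, Tuple

Triple = Tuple[int, int, int]


def mask_for_entities(
    mapped_triples: Sequence[Triple], entities: Collection[int], invert: bool = False
) -> List[bool]:
    """Flag triples whose head and tail both pass the (possibly inverted) entity test.

    Assumption: without invert, both head and tail must be in ``entities``;
    with invert, neither may be.
    """
    wanted = set(entities)

    def passes(entity: int) -> bool:
        return (entity in wanted) != invert  # XOR with invert flips membership

    return [passes(h) and passes(t) for h, _r, t in mapped_triples]


def mask_for_relations(
    mapped_triples: Sequence[Triple], relations: Collection[int], invert: bool = False
) -> List[bool]:
    """Flag triples whose relation is (or, with invert, is not) in ``relations``."""
    wanted = set(relations)
    return [(r in wanted) != invert for _h, r, _t in mapped_triples]


triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3), (3, 1, 4)]
print(mask_for_entities(triples, {0, 1, 2}))               # [True, True, False, False]
print(mask_for_entities(triples, {0, 1, 2}, invert=True))  # [False, False, False, True]
```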
-
label_triples
(triples, unknown_entity_label='[UNKNOWN]', unknown_relation_label=None)
Convert ID-based triples to label-based ones.
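A minimal sketch of the conversion, using dictionary lookups with a placeholder for IDs that have no label. What happens when `unknown_relation_label` is None is not documented here; the sketch assumes it falls back to the entity placeholder. `label_id_triples` is a hypothetical stand-alone function, not the method itself.

```python
from typing import List, Mapping, Optional, Sequence, Tuple


def label_id_triples(
    mapped_triples: Sequence[Tuple[int, int, int]],
    entity_id_to_label: Mapping[int, str],
    relation_id_to_label: Mapping[int, str],
    unknown_entity_label: str = "[UNKNOWN]",
    unknown_relation_label: Optional[str] = None,
) -> List[Tuple[str, str, str]]:
    """Map each ID to its label, substituting a placeholder for unknown IDs."""
    if unknown_relation_label is None:
        # Assumption: reuse the entity placeholder when no relation placeholder is given
        unknown_relation_label = unknown_entity_label
    return [
        (
            entity_id_to_label.get(h, unknown_entity_label),
            relation_id_to_label.get(r, unknown_relation_label),
            entity_id_to_label.get(t, unknown_entity_label),
        )
        for h, r, t in mapped_triples
    ]


print(label_id_triples([(0, 0, 1), (0, 7, 9)], {0: "berlin", 1: "germany"}, {0: "capital_of"}))
# [('berlin', 'capital_of', 'germany'), ('berlin', '[UNKNOWN]', '[UNKNOWN]')]
```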
-
mapped_triples
: torch.LongTensor
A three-column matrix where each row contains the head identifier, relation identifier, and tail identifier, in that order.
-
new_with_restriction
(entities=None, relations=None, invert_entity_selection=False, invert_relation_selection=False)
Make a new triples factory that keeps only the given entities and relations but retains the original ID mapping.
- Parameters
entities (Union[None, Collection[int], Collection[str]]) – The entities of interest. If None, defaults to all entities.
relations (Union[None, Collection[int], Collection[str]]) – The relations of interest. If None, defaults to all relations.
invert_entity_selection (bool) – Whether to invert the entity selection, i.e. select those triples without the provided entities.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e. select those triples without the provided relations.
- Return type
TriplesFactory
- Returns
A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.
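The key property, that the label-to-ID mapping is not modified, means the surviving triples keep their original IDs even when lower IDs disappear. The sketch below works on plain lists and accepts either labels or IDs, as the parameter types allow. The entity rule is an assumption: head and tail must both pass the (possibly inverted) entity check. `resolve_ids` and `restrict_triples` are hypothetical helpers.

```python
from typing import Collection, Dict, List, Optional, Set, Tuple, Union

Triple = Tuple[int, int, int]
Selection = Optional[Collection[Union[int, str]]]


def resolve_ids(selection: Collection[Union[int, str]], label_to_id: Dict[str, int]) -> Set[int]:
    """Accept labels or IDs, matching the Collection[int] / Collection[str] parameter types."""
    return {label_to_id[x] if isinstance(x, str) else x for x in selection}


def restrict_triples(
    mapped_triples: List[Triple],
    entity_to_id: Dict[str, int],
    relation_to_id: Dict[str, int],
    entities: Selection = None,
    relations: Selection = None,
    invert_entity_selection: bool = False,
    invert_relation_selection: bool = False,
) -> List[Triple]:
    """Keep a subset of triples; IDs are reused unchanged, so both mappings stay valid."""
    kept = list(mapped_triples)
    if entities is not None:
        entity_ids = resolve_ids(entities, entity_to_id)
        # Assumption: head and tail must both pass the (possibly inverted) entity check
        kept = [
            (h, r, t)
            for h, r, t in kept
            if ((h in entity_ids) != invert_entity_selection)
            and ((t in entity_ids) != invert_entity_selection)
        ]
    if relations is not None:
        relation_ids = resolve_ids(relations, relation_to_id)
        kept = [(h, r, t) for h, r, t in kept if (r in relation_ids) != invert_relation_selection]
    return kept


entity_to_id = {"a": 0, "b": 1, "c": 2, "d": 3}
relation_to_id = {"r": 0, "s": 1}
triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3), (3, 1, 3)]
print(restrict_triples(triples, entity_to_id, relation_to_id, entities=["c", "d"]))
# [(2, 0, 3), (3, 1, 3)]  (entity IDs 2 and 3 are not renumbered)
```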
-
relation_id_to_label
: Mapping[int, str]
The inverse mapping for relation_label_to_id; initialized automatically.
-
relation_word_cloud
(top=None)
In a Jupyter notebook, make a word cloud based on how frequently each relation occurs in the triples.
Warning
This function requires the word_cloud package. Use pip install pykeen[plotting] to install it automatically, or install it yourself with pip install git+https://github.com/kavgan/word_cloud.git.
-
split
(ratios=0.8, *, random_state=None, randomize_cleanup=False, method=None)
Split a triples factory into training and testing factories, and optionally a validation factory.
- Parameters
ratios (Union[float, Sequence[float]]) – There are three options for this argument:
  - A float between 0 and 1.0, non-inclusive. The first set of triples gets this ratio and the second gets the rest.
  - A list of ratios, given in the order of the sets they apply to, as in [0.8, 0.1]. The final ratio can be omitted because it can be calculated.
  - All ratios set explicitly in order, as in [0.8, 0.1, 0.1], where the ratios sum to 1.0.
random_state (Union[None, int, Generator]) – The random state used to shuffle and split the triples.
randomize_cleanup (bool) – If true, uses the non-deterministic method for moving triples to the training set. This has the advantage that it does not necessarily have to move all of them, but it might be significantly slower since it moves one triple at a time.
method (Optional[str]) – The name of the method to use, from SPLIT_METHODS. Defaults to “coverage”.
- Return type
List[TriplesFactory]
- Returns
A partition of the triples, split (approximately) according to the ratios and stored in TriplesFactory instances that share everything else with this root triples factory.
ratio = 0.8  # makes a [0.8, 0.2] split
training_factory, testing_factory = factory.split(ratio)

ratios = [0.8, 0.1]  # makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)

ratios = [0.8, 0.1, 0.1]  # also makes a [0.8, 0.1, 0.1] split
training_factory, testing_factory, validation_factory = factory.split(ratios)
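How the three accepted forms of `ratios` expand into a full partition can be sketched as follows. This is a naive shuffle-and-cut split, not the default “coverage” method, which (per the randomize_cleanup description) also moves triples into the training set afterwards. `normalize_ratios` and `naive_split` are hypothetical helpers, and the 1e-8 tolerance is an arbitrary choice for float round-off.

```python
import random
from typing import List, Optional, Sequence, TypeVar, Union

T = TypeVar("T")


def normalize_ratios(ratios: Union[float, Sequence[float]]) -> List[float]:
    """Expand a float or a (possibly incomplete) list of ratios into a full list."""
    if isinstance(ratios, float):
        if not 0.0 < ratios < 1.0:
            raise ValueError("a single ratio must lie strictly between 0 and 1")
        ratios = [ratios]
    full = list(ratios)
    total = sum(full)
    if total > 1.0 + 1e-8:
        raise ValueError(f"ratios sum to {total}, which exceeds 1.0")
    if total < 1.0 - 1e-8:
        full.append(1.0 - total)  # the omitted final ratio gets the remainder
    return full


def naive_split(
    items: Sequence[T],
    ratios: Union[float, Sequence[float]],
    seed: Optional[int] = None,
) -> List[List[T]]:
    """Shuffle, then cut into consecutive chunks sized by the ratios."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    bounds = [0]
    cumulative = 0.0
    for ratio in normalize_ratios(ratios)[:-1]:
        cumulative += ratio
        bounds.append(round(cumulative * len(shuffled)))
    bounds.append(len(shuffled))
    return [shuffled[start:stop] for start, stop in zip(bounds, bounds[1:])]


parts = naive_split(range(10), [0.8, 0.1], seed=42)
print([len(part) for part in parts])  # [8, 1, 1]
```

Cutting at rounded cumulative boundaries, rather than rounding each part separately, guarantees the parts always add up to the full set.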
-
tensor_to_df
(tensor, **kwargs)
Take a tensor of triples and make a pandas dataframe with labels.
- Parameters
tensor (LongTensor) – shape: (n, 3). The ID-based triples, in the format (head_id, relation_id, tail_id).
kwargs (Union[Tensor, ndarray, Sequence]) – Any number of additional columns. Each column must have shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.
- Return type
DataFrame
- Returns
A dataframe with n rows, and 6 + len(kwargs) columns.
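The column layout (six ID and label columns plus one per keyword argument, with the six names reserved) can be sketched without pandas by building one dictionary per triple. `triples_to_rows` is a hypothetical helper; passing its result to `pandas.DataFrame(rows)` would produce a frame with the same columns.

```python
from typing import Any, Dict, List, Mapping, Sequence, Tuple

RESERVED = {"head_id", "head_label", "relation_id", "relation_label", "tail_id", "tail_label"}


def triples_to_rows(
    triples: Sequence[Tuple[int, int, int]],
    entity_id_to_label: Mapping[int, str],
    relation_id_to_label: Mapping[int, str],
    **columns: Sequence[Any],
) -> List[Dict[str, Any]]:
    """Build one dict per triple: 6 ID/label columns plus any extra columns."""
    clash = RESERVED.intersection(columns)
    if clash:
        raise ValueError(f"reserved column names used: {sorted(clash)}")
    for name, values in columns.items():
        if len(values) != len(triples):
            raise ValueError(f"column {name!r} has {len(values)} values, expected {len(triples)}")
    rows = []
    for i, (h, r, t) in enumerate(triples):
        row = {
            "head_id": h,
            "head_label": entity_id_to_label[h],
            "relation_id": r,
            "relation_label": relation_id_to_label[r],
            "tail_id": t,
            "tail_label": entity_id_to_label[t],
        }
        row.update({name: values[i] for name, values in columns.items()})
        rows.append(row)
    return rows


rows = triples_to_rows([(0, 0, 1)], {0: "berlin", 1: "germany"}, {0: "capital_of"}, score=[0.9])
print(len(rows[0]))  # 7: six ID/label columns plus "score"
```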
-
-
class
TriplesNumericLiteralsFactory
(*, path=None, triples=None, path_to_numeric_triples=None, numeric_triples=None, **kwargs)
Create multi-modal instances given the path to triples.
Initialize the multi-modal triples factory.
- Parameters
path (Union[None, str, TextIO]) – The path to a 3-column TSV file with triples in it. If not specified, you should specify triples.
triples (Optional[ndarray]) – A 3-column numpy array with triples in it. If not specified, you should specify path.
path_to_numeric_triples (Union[None, str, TextIO]) – The path to a 3-column TSV file with numeric triples in it. If not specified, you should specify numeric_triples.
numeric_triples (Optional[ndarray]) – A 3-column numpy array with numeric triples in it. If not specified, you should specify path_to_numeric_triples.
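One plausible way to turn 3-column numeric triples into the `literals_to_id` mapping and a per-entity feature matrix is sketched below. It assumes each row is (entity label, attribute label, value), that attribute columns follow sorted label order, that missing values become 0.0, and that entity IDs run consecutively from 0. `build_numeric_literals` is a hypothetical helper and may not match how the factory actually builds its multimodal data.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def build_numeric_literals(
    numeric_triples: Sequence[Tuple[str, str, str]],
    entity_to_id: Mapping[str, int],
) -> Tuple[List[List[float]], Dict[str, int]]:
    """Turn (entity, attribute, value) rows into a dense entity x attribute matrix."""
    attributes = sorted({attribute for _entity, attribute, _value in numeric_triples})
    literals_to_id = {name: i for i, name in enumerate(attributes)}
    # Assumption: entity IDs are 0..n-1 and missing values are filled with 0.0
    matrix = [[0.0] * len(attributes) for _ in range(len(entity_to_id))]
    for entity, attribute, value in numeric_triples:
        if entity in entity_to_id:  # skip literals of entities outside the graph
            matrix[entity_to_id[entity]][literals_to_id[attribute]] = float(value)
    return matrix, literals_to_id


matrix, literals_to_id = build_numeric_literals(
    [("berlin", "population", "3.6"), ("paris", "area", "105.4"), ("rome", "area", "1285")],
    entity_to_id={"berlin": 0, "paris": 1},
)
print(literals_to_id)  # {'area': 0, 'population': 1}
print(matrix)          # [[0.0, 3.6], [105.4, 0.0]]
```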
-
create_lcwa_instances
(use_tqdm=None)
Create multi-modal LCWA instances for this factory’s triples.
- Return type
MultimodalLCWAInstances
Instance creation utilities.