Triples
Classes for creating and storing training data from triples.
- class Instances(mapped_triples, entity_to_id, relation_to_id)
Triples and mappings to their indices.
- mapped_triples: torch.LongTensor
A PyTorch tensor of triples
- class LCWAInstances(mapped_triples, entity_to_id, relation_to_id, labels)
Triples and mappings to their indices for LCWA.
- class MultimodalInstances(mapped_triples, entity_to_id, relation_to_id, numeric_literals, literals_to_id)
Triples and mappings to their indices as well as multimodal data.
- class MultimodalLCWAInstances(mapped_triples, entity_to_id, relation_to_id, numeric_literals, literals_to_id, labels)
Triples and mappings to their indices as well as multimodal data for LCWA.
- class MultimodalSLCWAInstances(mapped_triples, entity_to_id, relation_to_id, numeric_literals, literals_to_id)
Triples and mappings to their indices as well as multimodal data for sLCWA.
- class SLCWAInstances(mapped_triples, entity_to_id, relation_to_id)
Triples and mappings to their indices for sLCWA.
- class TriplesFactory(*, path=None, triples=None, create_inverse_triples=False, entity_to_id=None, relation_to_id=None, compact_id=True)
Create instances given the path to triples.
Initialize the triples factory.
- Parameters
path (Union[None, str, TextIO]) – The path to a 3-column TSV file with triples in it. If not specified, you should specify triples.
triples (Optional[ndarray]) – A 3-column numpy array with triples in it. If not specified, you should specify path.
create_inverse_triples (bool) – Should inverse triples be created? Defaults to False.
compact_id (bool) – Whether to compact the IDs such that they range from 0 to (num_entities or num_relations)-1.
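To make the compact_id description concrete, here is a minimal plain-Python sketch of compacting an ID mapping that has gaps. The helper name compact_mapping is hypothetical, and keeping the relative order of the old IDs is an assumption. This illustrates the idea only and is not PyKEEN's implementation.

```python
def compact_mapping(label_to_id):
    """Remap IDs to 0..n-1, preserving their relative order (hypothetical helper)."""
    # Sort labels by their current ID, then hand out consecutive new IDs.
    ordered = sorted(label_to_id.items(), key=lambda item: item[1])
    return {label: new_id for new_id, (label, _old_id) in enumerate(ordered)}


# A mapping with gaps, e.g. after some entities were removed.
entity_to_id = {"berlin": 0, "germany": 3, "paris": 7}
compacted = compact_mapping(entity_to_id)
print(compacted)
```

After compaction, the largest ID equals the number of labels minus one, which is the range the parameter description refers to.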
- create_lcwa_instances(use_tqdm=None)
Create LCWA instances for this factory’s triples.
- Return type
- property entity_id_to_label: Mapping[int, str]
The mapping from entity IDs to their labels.
- entity_word_cloud(top=None)
Make a word cloud, for display in a Jupyter notebook, based on how often each entity occurs.
Warning
This function requires the word_cloud package. Use pip install pykeen[plotting] to install it automatically, or install it yourself with pip install git+https://github.com/kavgan/word_cloud.git.
- get_idx_for_entities(entities, invert=False)
Get an np.array index for triples with the given entities.
- get_idx_for_relations(relations, invert=False)
Get an np.array index for triples with the given relations.
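The effect of the invert flag can be sketched in plain Python as a boolean index over the triples. The function name index_for_relations is hypothetical, and a list of booleans stands in for the np.array that the method is documented to return.

```python
def index_for_relations(triples, relations, invert=False):
    """Return one boolean per triple (hypothetical helper).

    True marks triples whose relation is in `relations`;
    with invert=True the selection is flipped.
    """
    wanted = set(relations)
    # XOR with `invert` flips the membership test when invert is True.
    return [(relation in wanted) != invert for _head, relation, _tail in triples]


triples = [
    ("alice", "likes", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "likes", "alice"),
]
mask = index_for_relations(triples, {"likes"})
inverted_mask = index_for_relations(triples, {"likes"}, invert=True)
selected = [triple for triple, keep in zip(triples, mask) if keep]
```

Such an index can then be used to select the matching rows, as the last line does for the non-inverted case.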
- get_inverse_relation_id(relation)
Get the inverse relation identifier for the given relation.
- Return type
- get_triples_for_relations(relations, invert=False)
Get the labeled triples containing the given relations.
- Return type
- map_triples_to_id(triples)
Load triples and map to ids based on the existing id mappings of the triples factory.
Works from either the path to a file containing triples given as string or a numpy array containing triples.
- Return type
LongTensor
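Conceptually, mapping label-based triples to IDs with existing mappings looks like the following plain-Python sketch. The function name map_triples is hypothetical, and tuples stand in for the LongTensor rows. How the library treats labels that are missing from the mappings is not stated here; this sketch simply raises a KeyError.

```python
def map_triples(triples, entity_to_id, relation_to_id):
    """Translate (head, relation, tail) labels into ID triples (hypothetical helper)."""
    return [
        (entity_to_id[head], relation_to_id[relation], entity_to_id[tail])
        for head, relation, tail in triples
    ]


entity_to_id = {"berlin": 0, "germany": 1, "paris": 2, "france": 3}
relation_to_id = {"capital_of": 0}
labeled = [("berlin", "capital_of", "germany"), ("paris", "capital_of", "france")]
mapped = map_triples(labeled, entity_to_id, relation_to_id)
```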
- mapped_triples: torch.LongTensor
A three-column matrix where each row is the head identifier, relation identifier, and tail identifier
- new_with_relations(relations)
Make a new triples factory only keeping the given relations.
- Return type
- new_with_restriction(entities=None, relations=None)
Make a new triples factory only keeping the given entities and relations, but keeping the ID mapping.
- Parameters
entities (Optional[Collection[str]]) – The entities of interest. If None, defaults to all entities.
relations (Optional[Collection[str]]) – The relations of interest. If None, defaults to all relations.
- Return type
- Returns
A new triples factory, which has only a subset of the triples containing the entities and relations of interest. The label-to-ID mapping is not modified.
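The restriction can be pictured with the plain-Python sketch below. The function name restrict_triples is hypothetical. The sketch assumes that a triple is kept only when both its head and its tail are among the entities of interest, which the description above does not spell out. The label-to-ID mappings are deliberately left untouched.

```python
def restrict_triples(triples, entities=None, relations=None):
    """Keep only triples matching the given entities/relations (hypothetical helper).

    None means "no restriction" for that argument.
    Assumption: both head and tail must be entities of interest.
    """
    entity_set = None if entities is None else set(entities)
    relation_set = None if relations is None else set(relations)
    kept = []
    for head, relation, tail in triples:
        if entity_set is not None and (head not in entity_set or tail not in entity_set):
            continue
        if relation_set is not None and relation not in relation_set:
            continue
        kept.append((head, relation, tail))
    return kept


triples = [
    ("berlin", "capital_of", "germany"),
    ("berlin", "located_in", "europe"),
    ("paris", "capital_of", "france"),
]
entity_to_id = {"berlin": 0, "europe": 1, "france": 2, "germany": 3, "paris": 4}

only_capitals = restrict_triples(triples, relations={"capital_of"})
only_german = restrict_triples(triples, entities={"berlin", "germany"})
# entity_to_id is not rebuilt: IDs stay valid across the restricted views.
```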
- new_without_relations(relations)
Make a new triples factory without the given relations.
- Return type
- property relation_id_to_label: Mapping[int, str]
The mapping from relation IDs to their labels.
- relation_to_inverse: Optional[Mapping[str, str]]
A dictionary mapping each relation to its inverse, if inverse triples were created
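As an illustration of what inverse triples and a relation-to-inverse dictionary can look like, consider the sketch below. The suffix "_inverse" and the function name add_inverse_triples are assumptions made for this example, not confirmed details of the library.

```python
INVERSE_SUFFIX = "_inverse"  # assumed naming convention, for illustration only


def add_inverse_triples(triples):
    """Append a reversed triple for every input triple (hypothetical helper)."""
    relation_to_inverse = {
        relation: relation + INVERSE_SUFFIX for _head, relation, _tail in triples
    }
    # For (h, r, t), the inverse triple is (t, r_inverse, h).
    inverse_triples = [
        (tail, relation_to_inverse[relation], head) for head, relation, tail in triples
    ]
    return list(triples) + inverse_triples, relation_to_inverse


triples = [("berlin", "capital_of", "germany")]
all_triples, relation_to_inverse = add_inverse_triples(triples)
```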
- relation_word_cloud(top=None)
Make a word cloud, for display in a Jupyter notebook, based on how often each relation occurs.
Warning
This function requires the word_cloud package. Use pip install pykeen[plotting] to install it automatically, or install it yourself with pip install git+https://github.com/kavgan/word_cloud.git.
- split(ratios=0.8, *, random_state=None, randomize_cleanup=False)
Split a triples factory into train/test factories.
- Parameters
ratios (Union[float, Sequence[float]]) – There are three options for this argument. First, a float between 0 and 1.0, non-inclusive, can be given. The first triples factory will get this ratio and the second will get the rest. Second, a list of ratios can be given, specifying which factory gets which ratio and in which order, as in [0.8, 0.1]. The final ratio can be omitted because it can be calculated. Third, all ratios can be explicitly set in order, as in [0.8, 0.1, 0.1], where the sum of all ratios is 1.0.
random_state (Union[None, int, RandomState]) – The random state used to shuffle and split the triples in this factory.
randomize_cleanup (bool) – If true, uses the non-deterministic method for moving triples to the training set. This has the advantage that it doesn’t necessarily have to move all of them, but it might be slower.
    ratio = 0.8  # makes a [0.8, 0.2] split
    training_factory, testing_factory = factory.split(ratio)

    ratios = [0.8, 0.1]  # makes a [0.8, 0.1, 0.1] split
    training_factory, testing_factory, validation_factory = factory.split(ratios)

    ratios = [0.8, 0.1, 0.1]  # also makes a [0.8, 0.1, 0.1] split
    training_factory, testing_factory, validation_factory = factory.split(ratios)
- Return type
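The three ways of giving ratios can be sketched in plain Python as follows. The names normalize_ratios and split_triples are hypothetical, and the standard library's random module stands in for the random state. The sketch only shuffles and cuts the triples; it does not attempt the cleanup step that moves triples to the training set.

```python
import random


def normalize_ratios(ratios, tolerance=1e-9):
    """Turn a float or a partial list into a full list of ratios (hypothetical helper)."""
    ratios = [ratios] if isinstance(ratios, float) else list(ratios)
    total = sum(ratios)
    if total > 1.0 + tolerance:
        raise ValueError("ratios must not sum to more than 1.0")
    if total < 1.0 - tolerance:
        # The final ratio was omitted, so compute it.
        ratios.append(1.0 - total)
    return ratios


def split_triples(triples, ratios=0.8, seed=None):
    """Shuffle the triples and cut them into consecutive parts (hypothetical helper)."""
    ratios = normalize_ratios(ratios)
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    parts, start, cumulative = [], 0, 0.0
    for ratio in ratios[:-1]:
        cumulative += ratio
        end = round(cumulative * n)
        parts.append(shuffled[start:end])
        start = end
    parts.append(shuffled[start:])  # the last part takes whatever remains
    return parts


triples = [(f"e{i}", "r", f"e{i + 1}") for i in range(10)]
two_way = split_triples(triples, 0.8, seed=0)
three_way = split_triples(triples, [0.8, 0.1], seed=0)
explicit = split_triples(triples, [0.8, 0.1, 0.1], seed=0)
```

With the same seed, the [0.8, 0.1] and [0.8, 0.1, 0.1] calls yield the same parts, mirroring the equivalence noted in the parameter description.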
- tensor_to_df(tensor, **kwargs)
Take a tensor of triples and make a pandas dataframe with labels.
- Parameters
tensor (LongTensor) – shape: (n, 3) The triples, ID-based and in format (head_id, relation_id, tail_id).
kwargs (Union[Tensor, ndarray, Sequence]) – Any additional number of columns. Each column needs to be of shape (n,). Reserved column names: {“head_id”, “head_label”, “relation_id”, “relation_label”, “tail_id”, “tail_label”}.
- Return type
DataFrame
- Returns
A dataframe with n rows, and 6 + len(kwargs) columns.
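The resulting table layout can be sketched without pandas as a dictionary of columns. The function name triples_to_columns is hypothetical, and the column order shown is an assumption. Only the six reserved names and the 6 + len(kwargs) column count come from the description above.

```python
RESERVED = {
    "head_id", "head_label",
    "relation_id", "relation_label",
    "tail_id", "tail_label",
}


def triples_to_columns(id_triples, entity_id_to_label, relation_id_to_label, **kwargs):
    """Build labeled columns from ID triples (hypothetical helper, no pandas needed)."""
    clash = RESERVED.intersection(kwargs)
    if clash:
        raise ValueError(f"reserved column names used: {sorted(clash)}")
    n = len(id_triples)
    columns = {
        "head_id": [h for h, _r, _t in id_triples],
        "head_label": [entity_id_to_label[h] for h, _r, _t in id_triples],
        "relation_id": [r for _h, r, _t in id_triples],
        "relation_label": [relation_id_to_label[r] for _h, r, _t in id_triples],
        "tail_id": [t for _h, _r, t in id_triples],
        "tail_label": [entity_id_to_label[t] for _h, _r, t in id_triples],
    }
    for name, values in kwargs.items():
        if len(values) != n:
            raise ValueError(f"column {name!r} must have {n} entries")
        columns[name] = list(values)
    return columns


entity_id_to_label = {0: "berlin", 1: "germany"}
relation_id_to_label = {0: "capital_of"}
columns = triples_to_columns(
    [(0, 0, 1)], entity_id_to_label, relation_id_to_label, score=[0.93]
)
```

A dictionary of equally long columns like this one is a form that pandas.DataFrame accepts directly.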
- triples: numpy.ndarray
A three-column matrix where each row is the head label, relation label, and tail label
- class TriplesNumericLiteralsFactory(*, path=None, triples=None, path_to_numeric_triples=None, numeric_triples=None)
Create multi-modal instances given the path to triples.
Initialize the multi-modal triples factory.
- Parameters
path (Union[None, str, TextIO]) – The path to a 3-column TSV file with triples in it. If not specified, you should specify triples.
triples (Optional[ndarray]) – A 3-column numpy array with triples in it. If not specified, you should specify path.
path_to_numeric_triples (Union[None, str, TextIO]) – The path to a 3-column TSV file with numeric triples in it. If not specified, you should specify numeric_triples.
numeric_triples (Optional[ndarray]) – A 3-column numpy array with numeric triples in it. If not specified, you should specify path_to_numeric_triples.
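One way to picture how numeric triples relate to the numeric_literals and literals_to_id arguments of the multimodal instance classes is the sketch below. It assumes each numeric triple has the form (entity, literal relation, value) and that the literals form an entity-by-literal matrix with 0.0 for missing values. Both the layout and the function name build_numeric_literals are assumptions for illustration.

```python
def build_numeric_literals(numeric_triples, entity_to_id):
    """Arrange numeric triples as an entity-by-literal matrix (hypothetical helper)."""
    literal_labels = sorted({relation for _entity, relation, _value in numeric_triples})
    literals_to_id = {label: index for index, label in enumerate(literal_labels)}
    # One row per entity, one column per literal relation; missing values stay 0.0.
    matrix = [[0.0] * len(literals_to_id) for _ in entity_to_id]
    for entity, relation, value in numeric_triples:
        matrix[entity_to_id[entity]][literals_to_id[relation]] = float(value)
    return matrix, literals_to_id


entity_to_id = {"berlin": 0, "paris": 1}
numeric_triples = [
    ("berlin", "population", "3600000"),
    ("paris", "area_km2", "105.4"),
]
numeric_literals, literals_to_id = build_numeric_literals(numeric_triples, entity_to_id)
```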
- create_lcwa_instances(use_tqdm=None)
Create multi-modal LCWA instances for this factory’s triples.
- Return type
Instance creation utilities.