Dataset
- class Dataset
Bases: ExtraReprMixin
The base dataset class.
Attributes Summary
create_inverse_triples – Return whether inverse triples are created for the training factory.
entity_to_id – The mapping of entity labels to IDs.
factory_dict – Return a dictionary of the three factories.
The dataset's name.
num_entities – The number of entities.
num_relations – The number of relations.
relation_to_id – The mapping of relation labels to IDs.
Methods Summary
cli() – Run the CLI.
deteriorate(n[, random_state]) – Deteriorate n triples from the dataset's training with pykeen.triples.deteriorate.deteriorate().
docdata(*parts) – Get docdata for this class.
from_directory_binary(path) – Load a dataset from a directory.
from_path(path[, ratios]) – Create a dataset from a single triples factory by splitting it in 3.
from_tf(tf[, ratios]) – Create a dataset from a single triples factory by splitting it in 3.
Get the normalized name of the dataset.
iter_extra_repr() – Yield extra entries for the instance's string representation.
remix([random_state]) – Remix a dataset using pykeen.triples.remix.remix().
restrict([entities, relations, ...]) – Restrict a dataset to the given entities/relations.
similarity(other[, metric]) – Compute the similarity between two shuffles of the same dataset.
summarize([title, show_examples, file]) – Print a summary of the dataset.
summary_str([title, show_examples, end]) – Make a summary string of all of the factories.
to_directory_binary(path) – Store a dataset to a path in binary format.
triples_pair_sort_key(pair) – Get the number of triples for sorting in an iterator context.
triples_sort_key(cls) – Get the number of triples for sorting.
Attributes Documentation
- create_inverse_triples
Return whether inverse triples are created for the training factory.
- entity_to_id
The mapping of entity labels to IDs.
- factory_dict
Return a dictionary of the three factories.
- num_entities
The number of entities.
- num_relations
The number of relations.
- relation_to_id
The mapping of relation labels to IDs.
Methods Documentation
- deteriorate(n: int | float, random_state: None | int | Generator = None) → Dataset
Deteriorate n triples from the dataset’s training with pykeen.triples.deteriorate.deteriorate().
- classmethod from_directory_binary(path: str | Path) → Dataset
Load a dataset from a directory.
- classmethod from_path(path: str | Path, ratios: list[float] | None = None) → Dataset
Create a dataset from a single triples factory by splitting it in 3.
- static from_tf(tf: TriplesFactory, ratios: list[float] | None = None) → Dataset
Create a dataset from a single triples factory by splitting it in 3.
- Parameters:
tf (TriplesFactory)
ratios (list[float] | None)
- Return type:
Dataset
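To make "splitting it in 3" concrete, here is a minimal pure-Python sketch of a ratio-based three-way split. It is not PyKEEN's implementation: the helper name `split_in_three`, the default ratios, the (training, testing, validation) ordering, and the sample triples are all assumptions made for illustration, and any extra guarantees a real splitter might provide (for example, keeping every entity in the training part) are not modeled.

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]


def split_in_three(
    triples: Sequence[Triple],
    ratios: Sequence[float] = (0.8, 0.1, 0.1),  # assumed default, not taken from the docs
    seed: int = 0,
) -> Tuple[List[Triple], List[Triple], List[Triple]]:
    """Hypothetical helper: shuffle the triples and cut them into three parts."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("expected three ratios that sum to 1")
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_first = int(ratios[0] * len(shuffled))
    n_second = int(ratios[1] * len(shuffled))
    training = shuffled[:n_first]
    testing = shuffled[n_first:n_first + n_second]
    validation = shuffled[n_first + n_second:]  # the remainder goes to the last part
    return training, testing, validation


# Made-up sample data: ten distinct triples.
sample = [(f"e{i}", "related_to", f"e{i + 1}") for i in range(10)]
training, testing, validation = split_in_three(sample)
print(len(training), len(testing), len(validation))
```

Every input triple ends up in exactly one of the three parts, which is the property a dataset built from a single triples factory relies on.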
- iter_extra_repr() → Iterable[str]
Yield extra entries for the instance’s string representation.
- remix(random_state: None | int | Generator = None, **kwargs) → Dataset
Remix a dataset using pykeen.triples.remix.remix().
- restrict(entities: None | Collection[int] | Collection[str] = None, relations: None | Collection[int] | Collection[str] = None, invert_entity_selection: bool = False, invert_relation_selection: bool = False) → EagerDataset | Self
Restrict a dataset to the given entities/relations.
Example:
>>> from pykeen.datasets import get_dataset
>>> full_dataset = get_dataset(dataset="nations")
>>> restricted_dataset = full_dataset.restrict(entities={"burma", "china", "india", "indonesia"})
- Parameters:
entities (None | Collection[int] | Collection[str]) – The entities to keep (or discard, cf. invert_entity_selection). None corresponds to selecting all entities (but is handled more efficiently).
relations (None | Collection[int] | Collection[str]) – The relations to keep (or discard, cf. invert_relation_selection). None corresponds to selecting all relations (but is handled more efficiently).
invert_entity_selection (bool) – Whether to invert the entity selection, i.e., discard the selected entities rather than all remaining ones.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e., discard the selected relations rather than all remaining ones.
- Returns:
A new dataset with different entity and relation mappings and a restricted set of triples.
- Return type:
EagerDataset | Self
Warning
This differs from pykeen.triples.triples_factory.CoreTriplesFactory.new_with_restriction() in that it does modify the label-to-ID mapping.
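The warning above is easier to see with a toy example. The following pure-Python sketch restricts a list of label triples to a set of entities and then rebuilds the label-to-ID mapping, so the surviving entities get new IDs. It is an illustration only, not PyKEEN's code: the helper `restrict_and_remap`, the rule that a triple survives only if both its head and tail are selected, the choice to renumber in order of the old IDs, and the sample data are all assumptions.

```python
from typing import Collection, Dict, List, Tuple

Triple = Tuple[str, str, str]


def restrict_and_remap(
    triples: List[Triple],
    entity_to_id: Dict[str, int],
    entities: Collection[str],
    invert_entity_selection: bool = False,
) -> Tuple[List[Triple], Dict[str, int]]:
    """Hypothetical helper: filter triples by entity and rebuild the ID mapping."""
    if invert_entity_selection:
        # Discard the given entities and keep all remaining ones.
        selected = set(entity_to_id) - set(entities)
    else:
        selected = set(entities) & set(entity_to_id)
    # Assumption: a triple is kept only if both head and tail are selected.
    kept = [(h, r, t) for h, r, t in triples if h in selected and t in selected]
    # Renumber the surviving labels from 0, preserving the old ID order.
    ordered = sorted(selected, key=entity_to_id.__getitem__)
    new_entity_to_id = {label: new_id for new_id, label in enumerate(ordered)}
    return kept, new_entity_to_id


# Made-up sample data.
entity_to_id = {"burma": 0, "china": 1, "india": 2, "usa": 3}
triples = [
    ("burma", "treaties", "china"),
    ("china", "embassy", "usa"),
    ("india", "treaties", "china"),
]
kept, new_mapping = restrict_and_remap(triples, entity_to_id, {"china", "india"})
print(kept)
print(new_mapping)
```

In this sketch "china" has ID 1 before the restriction and ID 0 afterwards, which is the kind of mapping change the warning refers to; code that cached old IDs would need to translate them.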
- similarity(other: Dataset, metric: str | None = None) → float
Compute the similarity between two shuffles of the same dataset.
- Parameters:
other (Dataset)
metric (str | None)
- Returns:
A float of the similarity.
- Return type:
float
See also
pykeen.triples.triples_factory.splits_similarity().
- summarize(title: str | None = None, show_examples: int | None = 5, file=None) → None
Print a summary of the dataset.
- summary_str(title: str | None = None, show_examples: int | None = 5, end='\n') → str
Make a summary string of all of the factories.