Dataset
- class Dataset[source]
Bases: ExtraReprMixin
The base dataset class.
Attributes Summary
create_inverse_triples – Return whether inverse triples are created for the training factory.
entity_to_id – The mapping of entity labels to IDs.
factory_dict – Return a dictionary of the three factories.
the dataset's name
num_entities – The number of entities.
num_relations – The number of relations.
relation_to_id – The mapping of relation labels to IDs.
Methods Summary
cli() – Run the CLI.
deteriorate(n[, random_state]) – Deteriorate n triples from the dataset's training with pykeen.triples.deteriorate.deteriorate().
docdata(*parts) – Get docdata for this class.
from_directory_binary(path) – Load a dataset from a directory.
from_path(path[, ratios]) – Create a dataset from a single triples factory by splitting it in 3.
from_tf(tf[, ratios]) – Create a dataset from a single triples factory by splitting it in 3.
Get the normalized name of the dataset.
iter_extra_repr() – Yield extra entries for the instance's string representation.
remix([random_state]) – Remix a dataset using pykeen.triples.remix.remix().
restrict([entities, relations, ...]) – Restrict a dataset to the given entities/relations.
similarity(other[, metric]) – Compute the similarity between two shuffles of the same dataset.
summarize([title, show_examples, file]) – Print a summary of the dataset.
summary_str([title, show_examples, end]) – Make a summary string of all of the factories.
to_directory_binary(path) – Store a dataset to a path in binary format.
triples_pair_sort_key(pair) – Get the number of triples for sorting in an iterator context.
triples_sort_key(cls) – Get the number of triples for sorting.
Attributes Documentation
- create_inverse_triples
Return whether inverse triples are created for the training factory.
- entity_to_id
The mapping of entity labels to IDs.
- factory_dict
Return a dictionary of the three factories.
- num_entities
The number of entities.
- num_relations
The number of relations.
- relation_to_id
The mapping of relation labels to IDs.
Methods Documentation
- deteriorate(n: int | float, random_state: None | int | Generator = None) Dataset [source]
Deteriorate n triples from the dataset’s training with pykeen.triples.deteriorate.deteriorate().
- classmethod from_directory_binary(path: str | Path) Dataset [source]
Load a dataset from a directory.
- classmethod from_path(path: str | Path, ratios: list[float] | None = None) Dataset [source]
Create a dataset from a single triples factory by splitting it in 3.
- static from_tf(tf: TriplesFactory, ratios: list[float] | None = None) Dataset [source]
Create a dataset from a single triples factory by splitting it in 3.
- Parameters:
tf (TriplesFactory)
ratios (list[float] | None)
- Return type:
Dataset
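The idea of splitting one set of triples in 3 can be illustrated with a minimal standard-library sketch. It does not call PyKEEN and is not the library's implementation: the function name `split_triples`, the ratio values, the training/testing/validation ordering, and the seeded shuffle are all assumptions made for illustration.

```python
import random


def split_triples(triples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle triples and cut them into three consecutive parts.

    Hypothetical helper: the ratio order (training, testing, validation)
    is an assumption, not taken from the library.
    """
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("expected three ratios summing to 1")
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(ratios[0] * len(shuffled))
    n_test = int(ratios[1] * len(shuffled))
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return training, testing, validation


# Ten toy triples in a chain: e0 -> e1 -> ... -> e10
triples = [(f"e{i}", "r", f"e{i + 1}") for i in range(10)]
training, testing, validation = split_triples(triples)
```

Every triple ends up in exactly one of the three parts, which is the property a split into separate factories needs.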
- iter_extra_repr() Iterable[str] [source]
Yield extra entries for the instance’s string representation.
- remix(random_state: None | int | Generator = None, **kwargs) Dataset [source]
Remix a dataset using pykeen.triples.remix.remix().
- restrict(entities: None | Collection[int] | Collection[str] = None, relations: None | Collection[int] | Collection[str] = None, invert_entity_selection: bool = False, invert_relation_selection: bool = False) EagerDataset | Self [source]
Restrict a dataset to the given entities/relations.
Example:
>>> from pykeen.datasets import get_dataset
>>> full_dataset = get_dataset(dataset="nations")
>>> restricted_dataset = full_dataset.restrict(entities={"burma", "china", "india", "indonesia"})
- Parameters:
entities (None | Collection[int] | Collection[str]) – The entities to keep (or discard, cf. invert_entity_selection). None corresponds to selecting all entities (but is handled more efficiently).
relations (None | Collection[int] | Collection[str]) – The relations to keep (or discard, cf. invert_relation_selection). None corresponds to selecting all relations (but is handled more efficiently).
invert_entity_selection (bool) – Whether to invert the entity selection, i.e., discard the selected entities rather than all remaining ones.
invert_relation_selection (bool) – Whether to invert the relation selection, i.e., discard the selected relations rather than all remaining ones.
- Returns:
A new dataset with different entity and relation mappings and a restricted set of triples.
- Return type:
EagerDataset | Self
Warning
This is different from pykeen.triples.triples_factory.CoreTriplesFactory.new_with_restriction(), as it does modify the label-to-ID mapping.
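What it means for a restriction to modify the label-to-ID mapping can be shown with a small standard-library sketch. It is not PyKEEN code: `restrict_triples` is a hypothetical helper, and requiring both head and tail to be selected is an assumption about the filtering rule.

```python
def restrict_triples(triples, keep_entities, invert=False):
    """Filter labeled triples by entity and rebuild compact ID mappings.

    Hypothetical helper; assumes a triple is kept only if both its head
    and its tail pass the entity selection.
    """
    def selected(entity):
        # With invert=True the given entities are discarded instead of kept.
        return (entity in keep_entities) != invert

    kept = [(h, r, t) for h, r, t in triples if selected(h) and selected(t)]
    # Rebuild the mappings from the surviving triples only, so IDs are
    # consecutive again and may differ from those of the full data.
    entities = sorted({e for h, _, t in kept for e in (h, t)})
    relations = sorted({r for _, r, _ in kept})
    entity_to_id = {label: i for i, label in enumerate(entities)}
    relation_to_id = {label: i for i, label in enumerate(relations)}
    return kept, entity_to_id, relation_to_id


triples = [
    ("burma", "embassy", "china"),
    ("china", "treaties", "india"),
    ("india", "embassy", "usa"),
]
kept, entity_to_id, relation_to_id = restrict_triples(
    triples, {"burma", "china", "india"}
)
```

Because the mappings are rebuilt, an ID taken from the restricted data should not be assumed to refer to the same entity in the full data.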
- similarity(other: Dataset, metric: str | None = None) float [source]
Compute the similarity between two shuffles of the same dataset.
- Parameters:
other (Dataset)
metric (str | None)
- Returns:
The similarity as a float.
- Return type:
float
See also
pykeen.triples.triples_factory.splits_similarity().
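One plausible way to compare two shuffles is a set-overlap score on their training triples. The sketch below uses the Tanimoto (Jaccard) coefficient as an assumed metric; this page does not state which metric the library uses, and the function `tanimoto` is a hypothetical stand-in, not a PyKEEN call.

```python
def tanimoto(a, b):
    """Return |A & B| / |A | B| for two collections of triples."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty collections are treated as identical.
    return len(a & b) / len(union) if union else 1.0


# Training triples of two hypothetical shuffles of the same data
train_a = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d")]
train_b = [("a", "r", "b"), ("b", "r", "c"), ("d", "r", "e")]
score = tanimoto(train_a, train_b)
```

A score of 1.0 means the two training sets are identical, and lower values mean the shuffles assigned more triples differently.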
- summarize(title: str | None = None, show_examples: int | None = 5, file=None) None [source]
Print a summary of the dataset.
- summary_str(title: str | None = None, show_examples: int | None = 5, end='\n') str [source]
Make a summary string of all of the factories.