Dataset

Bases: object

The base dataset class.

Attributes Summary

`entity_to_id`	The mapping of entity labels to IDs.
`factory_dict`	Return a dictionary of the three factories.
`metadata`	Optional metadata about the dataset, such as its name.
`metadata_file_name`
`num_entities`	The number of entities.
`num_relations`	The number of relations.
`relation_to_id`	The mapping of relation labels to IDs.

Methods Summary

`cli`()	Run the CLI.
`deteriorate`(n[, random_state])	Deteriorate n triples from the dataset's training with `pykeen.triples.deteriorate.deteriorate()`.
`docdata`(*parts)	Get docdata for this class.
`from_directory_binary`(path)	Load a dataset from a directory.
`from_path`(path[, ratios])	Create a dataset by loading a single triples factory from `path` and splitting it in 3.
`from_tf`(tf[, ratios])	Create a dataset from a single triples factory by splitting it in 3.
`get_normalized_name`()	Get the normalized name of the dataset.
`remix`([random_state])	Remix a dataset using `pykeen.triples.remix.remix()`.
`similarity`(other[, metric])	Compute the similarity between two shuffles of the same dataset.
`summarize`([title, show_examples, file])	Print a summary of the dataset.
`summary_str`([title, show_examples, end])	Make a summary string of all of the factories.
`to_directory_binary`(path)	Store a dataset to a path in binary format.
`triples_pair_sort_key`(pair)	Get the number of triples for sorting in an iterator context.
`triples_sort_key`(cls)	Get the number of triples for sorting.

Attributes Documentation

entity_to_id: The mapping of entity labels to IDs.

factory_dict

Return a dictionary of the three factories.

Return type: Mapping[str, CoreTriplesFactory]

metadata: Optional[Mapping[str, Any]] = None: Optional metadata about the dataset, such as its name.

metadata_file_name: ClassVar[str] = 'metadata.pth'

num_entities: The number of entities.

num_relations: The number of relations.

relation_to_id: The mapping of relation labels to IDs.

Methods Documentation

classmethod cli()

Run the CLI.

Return type: None

deteriorate(n, random_state=None)

Deteriorate n triples from the dataset’s training with pykeen.triples.deteriorate.deteriorate().

Return type: Dataset

classmethod docdata(*parts)

Get docdata for this class.

Return type: Any

classmethod from_directory_binary(path)

Load a dataset from a directory.

Return type: Dataset

classmethod from_path(path, ratios=None)

Create a dataset by loading a single triples factory from path and splitting it in 3.

Return type: Dataset

static from_tf(tf, ratios=None)

Create a dataset from a single triples factory by splitting it in 3.

Return type: Dataset
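The idea of splitting one collection of triples into three parts by ratios can be sketched in plain Python as follows. This is an illustration only, not the library's implementation: the helper `split_in_three`, the (training, testing, validation) ordering, and the 0.8/0.1/0.1 default are assumptions, and any extra guarantees a real split may make (for example about which entities appear in training) are ignored.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_in_three(
    items: Sequence[T],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),  # assumed default
    random_state: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: shuffle items and cut them into three parts."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three numbers that sum to 1")
    shuffled = list(items)
    random.Random(random_state).shuffle(shuffled)
    n = len(shuffled)
    n_first = round(ratios[0] * n)
    n_second = round(ratios[1] * n)
    first = shuffled[:n_first]
    second = shuffled[n_first:n_first + n_second]
    # The last part takes whatever remains, so no item is lost to rounding.
    third = shuffled[n_first + n_second:]
    return first, second, third


all_triples = [(f"e{i}", "r", f"e{i + 1}") for i in range(20)]
training, testing, validation = split_in_three(all_triples)
```

Fixing `random_state` makes the split reproducible, which is useful when two runs need to be compared on identical partitions.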

get_normalized_name()

Get the normalized name of the dataset.

Return type: str

remix(random_state=None, **kwargs)

Remix a dataset using pykeen.triples.remix.remix().

Return type: Dataset
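As a rough mental model of remixing, the following pure-Python sketch pools the triples of all splits, shuffles them, and cuts the pool back into splits of the original sizes. It assumes that this is what "remix" means here; the helper `remix_splits` is hypothetical, and the behavior of `pykeen.triples.remix.remix()` may differ in its details.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def remix_splits(*splits: Sequence[T], random_state: int = 0) -> List[List[T]]:
    """Hypothetical helper: reshuffle items across splits, keeping split sizes."""
    pooled = [item for split in splits for item in split]
    random.Random(random_state).shuffle(pooled)
    remixed = []
    start = 0
    for split in splits:
        stop = start + len(split)
        remixed.append(pooled[start:stop])  # same size as the original split
        start = stop
    return remixed


training = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d"), ("d", "r", "e")]
testing = [("e", "r", "f")]
validation = [("f", "r", "g")]

new_training, new_testing, new_validation = remix_splits(
    training, testing, validation, random_state=42
)
```

The overall set of triples and the size of each split are unchanged; only the assignment of triples to splits varies with `random_state`.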

similarity(other, metric=None)

Compute the similarity between two shuffles of the same dataset.

Parameters

other (Dataset) – The other shuffling of the dataset
metric (Optional[str]) – The metric to use. Defaults to tanimoto.

Return type: float

Returns: A float of the similarity
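For intuition about the default metric, the Tanimoto (Jaccard) similarity of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to two sets of triples. Which triples the library actually compares, and how it handles edge cases, are not stated above, so treat the helper `tanimoto` and its empty-set convention as assumptions.

```python
from typing import Hashable, Iterable


def tanimoto(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Hypothetical helper: |A intersect B| / |A union B| for two collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


shuffle_1 = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d"), ("d", "r", "e")]
shuffle_2 = [("c", "r", "d"), ("d", "r", "e"), ("e", "r", "f"), ("f", "r", "g")]

# Two shared triples out of six distinct triples overall.
score = tanimoto(shuffle_1, shuffle_2)
```

A score of 1.0 means both shuffles contain exactly the same triples, and 0.0 means they share none.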