Dataset

class Dataset[source]

Bases: ExtraReprMixin

The base dataset class.

Attributes Summary

create_inverse_triples

Return whether inverse triples are created for the training factory.

entity_to_id

The mapping of entity labels to IDs.

factory_dict

Return a dictionary of the three factories.

metadata

Optional metadata associated with the dataset.

metadata_file_name

num_entities

The number of entities.

num_relations

The number of relations.

relation_to_id

The mapping of relation labels to IDs.

Methods Summary

cli()

Run the CLI.

deteriorate(n[, random_state])

Deteriorate n triples from the dataset's training with pykeen.triples.deteriorate.deteriorate().

docdata(*parts)

Get docdata for this class.

from_directory_binary(path)

Load a dataset from a directory.

from_path(path[, ratios])

Create a dataset from a single triples file at the given path by splitting it in 3.

from_tf(tf[, ratios])

Create a dataset from a single triples factory by splitting it in 3.

get_normalized_name()

Get the normalized name of the dataset.

iter_extra_repr()

Yield extra entries for the instance's string representation.

remix([random_state])

Remix a dataset using pykeen.triples.remix.remix().

similarity(other[, metric])

Compute the similarity between two shuffles of the same dataset.

summarize([title, show_examples, file])

Print a summary of the dataset.

summary_str([title, show_examples, end])

Make a summary string of all of the factories.

to_directory_binary(path)

Store a dataset to a path in binary format.

triples_pair_sort_key(pair)

Get the number of triples for sorting in an iterator context.

triples_sort_key(cls)

Get the number of triples for sorting.

Attributes Documentation

create_inverse_triples

Return whether inverse triples are created for the training factory.

entity_to_id

The mapping of entity labels to IDs.

factory_dict

Return a dictionary of the three factories.

Return type

Mapping[str, CoreTriplesFactory]

metadata: Optional[Mapping[str, Any]] = None

Optional metadata associated with the dataset.

metadata_file_name: ClassVar[str] = 'metadata.pth'

num_entities

The number of entities.

num_relations

The number of relations.

relation_to_id

The mapping of relation labels to IDs.
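
The label-to-ID mappings and the two counts above can be illustrated without the library. The following is a minimal sketch in plain Python, not PyKEEN's implementation: the helper build_label_to_id and the choice to assign IDs in sorted label order are assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (head label, relation label, tail label)


def build_label_to_id(labels: Iterable[str]) -> Dict[str, int]:
    """Assign consecutive integer IDs to labels in sorted order (an assumption)."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


triples = [
    ("berlin", "capital_of", "germany"),
    ("paris", "capital_of", "france"),
    ("germany", "borders", "france"),
]

# Entities are all labels that appear as a head or a tail.
entity_to_id = build_label_to_id(
    label for head, _, tail in triples for label in (head, tail)
)
relation_to_id = build_label_to_id(relation for _, relation, _ in triples)

# The counts follow directly from the sizes of the mappings.
num_entities = len(entity_to_id)
num_relations = len(relation_to_id)
```

In this sketch, num_entities and num_relations are simply the sizes of the two mappings, which mirrors how the attributes are described here.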

Methods Documentation

classmethod cli()[source]

Run the CLI.

Return type

None

deteriorate(n, random_state=None)[source]

Deteriorate n triples from the dataset’s training with pykeen.triples.deteriorate.deteriorate().

Return type

Dataset

Parameters

n –

random_state –

classmethod docdata(*parts)[source]

Get docdata for this class.

Return type

Any

Parameters

parts (str) –

classmethod from_directory_binary(path)[source]

Load a dataset from a directory.

Return type

Dataset

Parameters

path (Union[str, Path]) –

classmethod from_path(path, ratios=None)[source]

Create a dataset from a single triples file at the given path by splitting it in 3.

Return type

Dataset

Parameters

path –

ratios –

static from_tf(tf, ratios=None)[source]

Create a dataset from a single triples factory by splitting it in 3.

Return type

Dataset

Parameters

tf –

ratios –

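Splitting one collection of triples "in 3" can be sketched in plain Python as follows. This is an illustration only, not the library's code: the function split_in_three, the seed parameter, the default ratios of 0.8/0.1/0.1, and the training/testing/validation order are all assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]


def split_in_three(
    triples: Sequence[Triple],
    ratios: Sequence[float] = (0.8, 0.1, 0.1),  # assumed default, not taken from the docs
    seed: int = 0,
) -> Tuple[List[Triple], List[Triple], List[Triple]]:
    """Shuffle the triples and cut them into training, testing, and validation parts."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three numbers that sum to 1")
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(ratios[0] * len(shuffled))
    n_test = int(ratios[1] * len(shuffled))
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    # Whatever remains after the first two cuts becomes the validation part.
    validation = shuffled[n_train + n_test:]
    return training, testing, validation


triples = [(f"e{i}", "r", f"e{i + 1}") for i in range(20)]
training, testing, validation = split_in_three(triples)
```

Every input triple ends up in exactly one of the three parts, so the parts can be recombined into the original collection.
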
get_normalized_name()[source]

Get the normalized name of the dataset.

Return type

str

iter_extra_repr()[source]

Yield extra entries for the instance’s string representation.

Return type

Iterable[str]

remix(random_state=None, **kwargs)[source]

Remix a dataset using pykeen.triples.remix.remix().

Return type

Dataset

Parameters

random_state (Union[None, int, Generator]) –

similarity(other, metric=None)[source]

Compute the similarity between two shuffles of the same dataset.

Parameters
  • other (Dataset) – The other shuffling of the dataset

  • metric (Optional[str]) – The metric to use. Defaults to tanimoto.

Return type

float

Returns

A float of the similarity

See also

pykeen.triples.triples_factory.splits_similarity().
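
For intuition about the default metric, the Tanimoto (Jaccard) similarity of two sets is the size of their intersection divided by the size of their union. The sketch below applies that definition to two hypothetical training sets; the function tanimoto_similarity is an assumption for illustration, and how the library combines the training, testing, and validation parts into one number is not shown in this section.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def tanimoto_similarity(a: Set[Triple], b: Set[Triple]) -> float:
    """Return |a & b| / |a | b|, treating two empty sets as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Two shuffles of the same data that agree on two training triples.
training_a = {("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d")}
training_b = {("a", "r", "b"), ("b", "r", "c"), ("d", "r", "e")}

score = tanimoto_similarity(training_a, training_b)  # 2 shared of 4 distinct -> 0.5
```

A score of 1.0 means the two sets contain exactly the same triples, and 0.0 means they share none.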

summarize(title=None, show_examples=5, file=None)[source]

Print a summary of the dataset.

Return type

None

Parameters

title –

show_examples –

file –

summary_str(title=None, show_examples=5, end='\n')[source]

Make a summary string of all of the factories.

Return type

str

Parameters

title –

show_examples –

end –

to_directory_binary(path)[source]

Store a dataset to a path in binary format.

Return type

None

Parameters

path (Union[str, Path]) –

classmethod triples_pair_sort_key(pair)[source]

Get the number of triples for sorting in an iterator context.

Return type

int

Parameters

pair (Tuple[str, Type[Dataset]]) –

static triples_sort_key(cls)[source]

Get the number of triples for sorting.

Return type

int

Parameters

cls (Type[Dataset]) –
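
The two sort keys differ only in their input: one takes a dataset class, the other a (name, class) pair as produced when iterating over a mapping. The sketch below shows how such keys are typically passed to sorted. The classes ToyDataset, Small, and Large and the num_triples class attribute are hypothetical stand-ins; how the real class looks up its triple count is not shown in this section.

```python
from typing import Tuple, Type


class ToyDataset:
    """Hypothetical stand-in for a dataset class that knows its triple count."""

    num_triples: int = 0


class Small(ToyDataset):
    num_triples = 100


class Large(ToyDataset):
    num_triples = 10_000


def triples_sort_key(cls: Type[ToyDataset]) -> int:
    """Key for sorting dataset classes by number of triples."""
    return cls.num_triples


def triples_pair_sort_key(pair: Tuple[str, Type[ToyDataset]]) -> int:
    """Key for sorting (name, class) pairs, for example from dict.items()."""
    return triples_sort_key(pair[1])


registry = {"large": Large, "small": Small}

# Sort the classes directly, or sort the (name, class) pairs and keep the names.
by_size = sorted(registry.values(), key=triples_sort_key)
names_by_size = [name for name, _ in sorted(registry.items(), key=triples_pair_sort_key)]
```

Keeping the pair variant as a thin wrapper around the class variant means both orderings stay consistent.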