TarFileSingleDataset

class TarFileSingleDataset(url: str, relative_path: str | PurePosixPath, name: str | None = None, cache_root: str | None = None, eager: bool = False, create_inverse_triples: bool = False, delimiter: str | None = None, random_state: None | int | Generator = None, read_csv_kwargs: dict[str, Any] | None = None)

Bases: CompressedSingleDataset

Loads a dataset that’s a single file inside a tar.gz archive.

Initialize dataset.

Parameters:
  • url (str) – The URL from which to download the dataset.

  • relative_path (str | pathlib.PurePosixPath) – The path inside the archive to the contained dataset.

  • name (str | None) – The name of the file. If not given, the name is inferred from the end of the URL.

  • cache_root (Path) – An optional directory to store the extracted files. If none is given, the default PyKEEN directory is used. This is defined either by the environment variable PYKEEN_HOME or defaults to ~/.pykeen.

  • create_inverse_triples (bool) – Should inverse triples be created? Defaults to False.

  • eager (bool) – Should the data be loaded eagerly? Defaults to False.

  • random_state (TorchRandomHint) – An optional random state to make the training/testing/validation split reproducible.

  • delimiter (str | None) – The delimiter for the contained dataset.

  • read_csv_kwargs (dict[str, Any] | None) – Keyword arguments to pass through to pandas.read_csv().
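
Example

The parameters above can be illustrated with a minimal standard-library sketch. It is not PyKEEN's implementation: the helpers resolve_cache_root, build_demo_archive, and read_triples_from_tar, as well as the archive layout demo/triples.tsv, are hypothetical names chosen for illustration. The sketch shows how a relative_path addresses one member inside a tar.gz archive, how a delimiter splits each row into a triple, and how the documented cache_root fallback (explicit value, then PYKEEN_HOME, then ~/.pykeen) might be resolved.

```python
from __future__ import annotations

import csv
import io
import os
import tarfile
import tempfile
from pathlib import Path, PurePosixPath


def resolve_cache_root(cache_root: str | None = None) -> Path:
    """Hypothetical helper: explicit value, then PYKEEN_HOME, then ~/.pykeen."""
    if cache_root is not None:
        return Path(cache_root)
    return Path(os.environ.get("PYKEEN_HOME", Path.home() / ".pykeen"))


def build_demo_archive(directory: Path) -> Path:
    """Write a tiny tar.gz archive containing one tab-separated triples file."""
    archive = directory / "demo.tar.gz"
    payload = "a\tlikes\tb\nb\tlikes\tc\n".encode("utf-8")
    info = tarfile.TarInfo(name="demo/triples.tsv")
    info.size = len(payload)
    with tarfile.open(archive, mode="w:gz") as tar:
        tar.addfile(info, io.BytesIO(payload))
    return archive


def read_triples_from_tar(
    archive: Path,
    relative_path: str | PurePosixPath,
    delimiter: str = "\t",
) -> list[tuple[str, ...]]:
    """Read the single file at relative_path inside the archive as rows."""
    with tarfile.open(archive, mode="r:gz") as tar:
        # Member names inside a tar archive always use forward slashes,
        # which is why a PurePosixPath is a natural fit for relative_path.
        member = tar.extractfile(str(relative_path))
        if member is None:
            raise ValueError(f"{relative_path} is not a regular file")
        text = io.TextIOWrapper(member, encoding="utf-8", newline="")
        # Rows must be consumed before the archive is closed.
        return [tuple(row) for row in csv.reader(text, delimiter=delimiter)]


with tempfile.TemporaryDirectory() as tmp:
    archive = build_demo_archive(Path(tmp))
    triples = read_triples_from_tar(archive, PurePosixPath("demo/triples.tsv"))

print(triples)
```

In the class itself, the file is parsed with pandas.read_csv(), so any extra options (for example a header setting) would go through read_csv_kwargs rather than through a custom reader like the one sketched here.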