TarFileSingleDataset

class TarFileSingleDataset(url, relative_path, name=None, cache_root=None, eager=False, create_inverse_triples=False, delimiter=None, random_state=None, read_csv_kwargs=None)

Bases: CompressedSingleDataset

Loads a dataset that’s a single file inside a tar.gz archive.

Initialize dataset.

Parameters:
  • url (str) – The URL from which to download the dataset.

  • relative_path (Union[str, PurePosixPath]) – The path inside the archive to the contained dataset.

  • name (Optional[str]) – The name of the file. If not given, the name is taken from the end of the URL.

  • cache_root (Optional[str]) – An optional directory in which to store the extracted files. If none is given, the default PyKEEN directory is used. This is defined by the environment variable PYKEEN_HOME, or defaults to ~/.pykeen if that variable is not set.

  • create_inverse_triples (bool) – Whether to create inverse triples. Defaults to False.

  • eager (bool) – Whether to load the data eagerly. Defaults to False.

  • random_state (Union[None, int, Generator]) – An optional random state to make the training/testing/validation split reproducible.

  • delimiter (Optional[str]) – The delimiter for the contained dataset.

  • read_csv_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass through to pandas.read_csv().
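
The parameters above describe two mechanisms that a short sketch may make more concrete: choosing the cache directory, and reading a single delimited file at relative_path inside a tar.gz archive. The following is a minimal standard-library illustration, not PyKEEN's implementation. The helper names resolve_cache_root, read_triples_from_tar and make_example_archive, the member path data/triples.tsv, the toy triples, and the tab delimiter are all assumptions made for this example. The real class downloads the archive from url and passes read_csv_kwargs through to pandas.read_csv().

```python
import csv
import io
import os
import tarfile
from pathlib import Path, PurePosixPath


def resolve_cache_root(cache_root=None):
    """Hypothetical helper: choose a cache directory as the parameter docs describe."""
    if cache_root is not None:
        return Path(cache_root)
    env_value = os.environ.get("PYKEEN_HOME")
    if env_value:
        return Path(env_value)
    return Path.home() / ".pykeen"


def read_triples_from_tar(archive_bytes, relative_path, delimiter="\t"):
    """Hypothetical helper: read one delimited file stored inside a tar.gz archive."""
    # Paths inside a tar archive always use forward slashes.
    member_name = str(PurePosixPath(relative_path))
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        extracted = archive.extractfile(member_name)
        if extracted is None:  # the member exists but is not a regular file
            raise FileNotFoundError(member_name)
        text = io.TextIOWrapper(extracted, encoding="utf-8", newline="")
        return [tuple(row) for row in csv.reader(text, delimiter=delimiter)]


def make_example_archive():
    """Build a tiny in-memory tar.gz holding one tab-separated file (toy data)."""
    payload = "a\tlikes\tb\nb\tlikes\tc\n".encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="data/triples.tsv")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


triples = read_triples_from_tar(make_example_archive(), "data/triples.tsv")
```

In this sketch the archive is built in memory so the example runs without network access. An explicit cache_root argument takes precedence over PYKEEN_HOME, which in turn takes precedence over ~/.pykeen, mirroring the order given in the cache_root description.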