Datasets

Sample datasets for use with PyKEEN, borrowed from https://github.com/ZhenfengLei/KGDatasets.

To register a new dataset with PyKEEN, subclass pykeen.datasets.base.DataSet and declare the subclass under the pykeen.datasets entry point group in your own package's setup.py or setup.cfg. Registered datasets are loaded automatically with pkg_resources.iter_entry_points().
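As a rough sketch of the registration mechanism described above, a package could declare its dataset in setup.cfg, and the entry point group could then be enumerated at runtime. The names `my_package`, `my_dataset`, and `MyDataset` are hypothetical placeholders. The lookup below uses the standard library's `importlib.metadata` instead of `pkg_resources`, so it illustrates the idea and is not PyKEEN's own loading code.

```python
# Hypothetical setup.cfg fragment (placeholder names, not part of PyKEEN):
#
#   [options.entry_points]
#   pykeen.datasets =
#       my_dataset = my_package.datasets:MyDataset

from importlib.metadata import entry_points


def discover_entry_points(group="pykeen.datasets"):
    """Return a dict mapping entry point names to entry point objects."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns a collection that supports select().
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


if __name__ == "__main__":
    for name, ep in sorted(discover_entry_points().items()):
        # Calling ep.load() would import the module and return the class.
        print(name, "->", ep.value)
```

If no installed package declares anything in the group, the function returns an empty dict.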

Functions

get_dataset(*[, dataset, dataset_kwargs, …])

Get the dataset.

has_dataset(key)

Return whether the dataset is registered in PyKEEN.
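The two functions above follow a familiar name-keyed registry pattern. The self-contained sketch below imitates that pattern with a toy registry. `ToyDataset`, `_TOY_REGISTRY`, `toy_has_dataset`, `toy_get_dataset`, and the lowercase normalization rule are all assumptions made for illustration; they do not reproduce PyKEEN's implementation or its full signatures.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset class."""

    def __init__(self, create_inverse_triples=False):
        self.create_inverse_triples = create_inverse_triples


# Hypothetical registry mapping lowercase names to dataset classes.
_TOY_REGISTRY = {"toy": ToyDataset}


def toy_has_dataset(key):
    """Return whether a dataset class is registered under the given name."""
    return key.lower() in _TOY_REGISTRY


def toy_get_dataset(*, dataset, dataset_kwargs=None):
    """Instantiate a registered dataset by name, or pass an instance through."""
    if isinstance(dataset, str):
        if not toy_has_dataset(dataset):
            raise ValueError(f"Unknown dataset: {dataset}")
        return _TOY_REGISTRY[dataset.lower()](**(dataset_kwargs or {}))
    # Anything that is not a string is assumed to be a ready-made instance.
    return dataset
```

The keyword-only signature mirrors the leading `*` shown in the summary above, which keeps call sites explicit about which argument is the dataset and which are its options.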

Classes

Hetionet([create_inverse_triples, eager, …])

The Hetionet dataset is a large biological network.

Kinships(**kwargs)

The Kinships dataset.

Nations(**kwargs)

The Nations dataset.

OpenBioLink([create_inverse_triples, eager])

The OpenBioLink dataset.

OpenBioLinkF1([create_inverse_triples, eager])

The PyKEEN First Filtered OpenBioLink 2020 Dataset.

OpenBioLinkF2([create_inverse_triples, eager])

The PyKEEN Second Filtered OpenBioLink 2020 Dataset.

OpenBioLinkLQ([create_inverse_triples, eager])

The low-quality variant of the OpenBioLink dataset.

UMLS(**kwargs)

The UMLS dataset.

FB15k([cache_root])

The FB15k dataset.

FB15k237([cache_root])

The FB15k-237 dataset.

WN18([cache_root])

The WN18 dataset.

WN18RR([cache_root])

The WN18-RR dataset.

YAGO310([cache_root])

The YAGO3-10 dataset is a subset of YAGO3 that contains only entities with at least 10 relations.
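To make the "at least 10 relations" criterion concrete, here is a minimal sketch of one plausible reading: count how many triples each entity takes part in, then keep only the triples whose head and tail both meet the threshold. The function name `filter_by_min_relations` and the single-pass counting rule are assumptions; this is not the procedure used to build YAGO3-10.

```python
from collections import Counter


def filter_by_min_relations(triples, min_relations=10):
    """Keep triples whose head and tail each occur in enough triples."""
    counts = Counter()
    for head, _relation, tail in triples:
        counts[head] += 1
        counts[tail] += 1  # a self-loop therefore counts twice
    return [
        (head, relation, tail)
        for head, relation, tail in triples
        if counts[head] >= min_relations and counts[tail] >= min_relations
    ]
```

Note that this is a single pass: removing triples lowers the counts of the remaining entities, so a stricter variant would repeat the filter until nothing changes.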

Class Inheritance Diagram

Inheritance diagram of pykeen.datasets.hetionet.Hetionet, pykeen.datasets.kinships.Kinships, pykeen.datasets.nations.Nations, pykeen.datasets.openbiolink.OpenBioLink, pykeen.datasets.openbiolink.OpenBioLinkF1, pykeen.datasets.openbiolink.OpenBioLinkF2, pykeen.datasets.openbiolink.OpenBioLinkLQ, pykeen.datasets.umls.UMLS, pykeen.datasets.freebase.FB15k, pykeen.datasets.freebase.FB15k237, pykeen.datasets.wordnet.WN18, pykeen.datasets.wordnet.WN18RR, pykeen.datasets.yago.YAGO310

Utility classes for constructing datasets.

Classes

DataSet()

Contains a lazy reference to a training, testing, and validation dataset.

EagerDataset(training, testing, validation)

A dataset that has already been loaded.

LazyDataSet()

A dataset that has lazy loading.

PathDataSet(training_path, testing_path, …)

Contains a lazy reference to a training, testing, and validation dataset.

RemoteDataSet(url, relative_training_path, …)

Contains a lazy reference to a remote dataset that is loaded if needed.

TarFileRemoteDataSet(url, …[, cache_root, …])

A remote dataset stored as a tar file.

ZipFileRemoteDataSet(url, …[, cache_root, …])

A remote dataset stored as a zip file.

PackedZipRemoteDataSet(…[, url, name, …])

Contains a lazy reference to a remote dataset that is loaded if needed.

SingleTabbedDataset([url, name, cache_root, …])

Use this class when you have a single TSV file of edges and want it split automatically.
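The automatic split mentioned for SingleTabbedDataset can be pictured with the following minimal sketch, which shuffles the rows of a tab-separated edge list and cuts them into three parts. The function name `split_edges`, the 80/10/10 ratios, and the fixed seed are assumptions chosen for illustration, not values taken from PyKEEN.

```python
import csv
import io
import random


def split_edges(tsv_text, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split tab-separated (head, relation, tail) rows into three lists."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [tuple(row) for row in reader if row]  # skip blank lines
    random.Random(seed).shuffle(rows)  # seeded for reproducibility
    n_train = int(ratios[0] * len(rows))
    n_test = int(ratios[1] * len(rows))
    training = rows[:n_train]
    testing = rows[n_train:n_train + n_test]
    validation = rows[n_train + n_test:]  # whatever remains
    return training, testing, validation
```

A purely random split like this one can leave entities in the testing or validation part that never occur in training, so a production split usually needs an extra step to handle that case.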

Class Inheritance Diagram

Inheritance diagram of pykeen.datasets.base.DataSet, pykeen.datasets.base.EagerDataset, pykeen.datasets.base.LazyDataSet, pykeen.datasets.base.PathDataSet, pykeen.datasets.base.RemoteDataSet, pykeen.datasets.base.TarFileRemoteDataSet, pykeen.datasets.base.ZipFileRemoteDataSet, pykeen.datasets.base.PackedZipRemoteDataSet, pykeen.datasets.base.SingleTabbedDataset
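Several of the classes above are described as holding a lazy reference that is loaded only if needed. The sketch below shows that general pattern with a hypothetical `LazySplits` class: the loader callable runs on first access and its result is cached. It is an illustration of lazy loading under these assumptions, not a copy of LazyDataSet.

```python
class LazySplits:
    """Hypothetical container that loads its three splits on first access."""

    def __init__(self, loader):
        # loader is a callable returning (training, testing, validation).
        self._loader = loader
        self._splits = None

    @property
    def loaded(self):
        """Return whether the loader has already been run."""
        return self._splits is not None

    def _ensure_loaded(self):
        if self._splits is None:
            self._splits = self._loader()  # runs at most once
        return self._splits

    @property
    def training(self):
        return self._ensure_loaded()[0]

    @property
    def testing(self):
        return self._ensure_loaded()[1]

    @property
    def validation(self):
        return self._ensure_loaded()[2]
```

Deferring the load keeps object construction cheap, which matters when the loader has to download or unpack a large archive.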