Sample datasets for use with PyKEEN, borrowed from

New datasets (inheriting from pykeen.datasets.base.Dataset) can be registered with PyKEEN under the pykeen.datasets entry point group in your own package configuration (for example, setup.cfg). They are loaded automatically with pkg_resources.iter_entry_points().
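Registration and discovery can be sketched with the standard library's importlib.metadata, which performs the same kind of group lookup as pkg_resources.iter_entry_points(). This is a minimal sketch, not PyKEEN's own loader: discover_datasets is a hypothetical helper, and my_package and MyDataset in the setup.cfg comment are placeholder names.

```python
"""Minimal sketch of entry-point discovery for the ``pykeen.datasets`` group.

A third-party package could declare its dataset in setup.cfg like this
(hypothetical package and class names):

    [options.entry_points]
    pykeen.datasets =
        mydataset = my_package.datasets:MyDataset
"""
from importlib.metadata import entry_points
from typing import Dict


def discover_datasets(group: str = "pykeen.datasets") -> Dict[str, object]:
    """Map entry-point names to EntryPoint objects (hypothetical helper)."""
    try:
        # Python 3.10+: select the group directly.
        eps = entry_points(group=group)
    except TypeError:
        # Python 3.8/3.9: entry_points() takes no arguments and returns a dict.
        eps = entry_points().get(group, [])
    # Nothing is imported yet; call ep.load() to obtain the dataset class.
    return {ep.name: ep for ep in eps}


if __name__ == "__main__":
    for name, ep in sorted(discover_datasets().items()):
        print(name, "->", ep.value)
```

Deferring ep.load() keeps discovery cheap: only the dataset a user actually requests needs to be imported.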


get_dataset(*[, dataset, dataset_kwargs, …])

Get the dataset.


Return whether the dataset is registered in PyKEEN.
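Name-based lookup of the kind get_dataset and has_dataset describe can be sketched with a plain dictionary registry. This is a hypothetical illustration, not PyKEEN's implementation: the toy classes, the _REGISTRY dictionary, and the normalization rule (lowercase, drop non-alphanumeric characters) are assumptions, and the sketch covers only the dataset and dataset_kwargs arguments.

```python
import re
from typing import Any, Dict, Mapping, Optional, Type, Union


class ToyDataset:
    """Hypothetical stand-in for a dataset base class."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class FB15k237(ToyDataset):
    pass


class Nations(ToyDataset):
    pass


def normalize(key: str) -> str:
    """Assumed normalization: lowercase and strip non-alphanumerics."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


# Hypothetical registry built from class names.
_REGISTRY: Dict[str, Type[ToyDataset]] = {
    normalize(cls.__name__): cls for cls in (FB15k237, Nations)
}


def has_dataset(key: str) -> bool:
    """Return whether a dataset is registered under the normalized key."""
    return normalize(key) in _REGISTRY


def get_dataset(
    *,
    dataset: Union[str, Type[ToyDataset], ToyDataset],
    dataset_kwargs: Optional[Mapping[str, Any]] = None,
) -> ToyDataset:
    """Resolve a name, class, or instance to a dataset instance (sketch)."""
    if isinstance(dataset, ToyDataset):
        return dataset  # already instantiated; kwargs are ignored
    if isinstance(dataset, str):
        if not has_dataset(dataset):
            raise KeyError(f"unknown dataset: {dataset!r}")
        dataset = _REGISTRY[normalize(dataset)]
    return dataset(**(dataset_kwargs or {}))
```

With this normalization, "FB15k-237", "fb15k237", and "FB15K_237" all resolve to the same registry entry.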


Hetionet([create_inverse_triples, eager, …])

The Hetionet dataset is a large biological network.


The Kinships dataset.


The Nations dataset.

OpenBioLink([create_inverse_triples, eager])

The OpenBioLink dataset.

OpenBioLinkF1([create_inverse_triples, eager])

The PyKEEN First Filtered OpenBioLink 2020 Dataset.

OpenBioLinkF2([create_inverse_triples, eager])

The PyKEEN Second Filtered OpenBioLink 2020 Dataset.

OpenBioLinkLQ([create_inverse_triples, eager])

The low-quality variant of the OpenBioLink dataset.


The CoDEx small dataset.


The CoDEx medium dataset.


The CoDEx large dataset.

OGBBioKG([cache_root, create_inverse_triples])

The OGB BioKG dataset.

OGBWikiKG([cache_root, create_inverse_triples])

The OGB WikiKG dataset.


The UMLS dataset.


The FB15k dataset.


The FB15k-237 dataset.


The WN18 dataset.


The WN18-RR dataset.


The YAGO3-10 dataset is a subset of YAGO3 that only contains entities with at least 10 relations.
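The selection criterion can be illustrated with a degree filter over (head, relation, tail) triples. This is a simplified sketch, assuming "at least 10 relations" means an entity occurs in at least 10 triples; filter_by_degree is a hypothetical helper that makes a single pass, and it is not the procedure used to build YAGO3-10.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def filter_by_degree(triples: Iterable[Triple], min_degree: int = 10) -> List[Triple]:
    """Keep triples whose head and tail each occur in >= min_degree triples.

    Degrees are counted once on the input, so dropping triples can leave
    some surviving entities below the threshold (no iteration to a fixpoint).
    """
    triples = list(triples)
    degree: Counter = Counter()
    for head, _, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return [
        (h, r, t)
        for h, r, t in triples
        if degree[h] >= min_degree and degree[t] >= min_degree
    ]
```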

DRKG([create_inverse_triples, random_state])

The DRKG dataset.

ConceptNet([create_inverse_triples, …])

The ConceptNet dataset from [speer2017].

CKG([eager, create_inverse_triples, …])

The Clinical Knowledge Graph (CKG) dataset from [santos2020].

Class Inheritance Diagram

Inheritance diagram of pykeen.datasets.hetionet.Hetionet, pykeen.datasets.kinships.Kinships, pykeen.datasets.nations.Nations, pykeen.datasets.openbiolink.OpenBioLink, pykeen.datasets.openbiolink.OpenBioLinkF1, pykeen.datasets.openbiolink.OpenBioLinkF2, pykeen.datasets.openbiolink.OpenBioLinkLQ, pykeen.datasets.codex.CoDExSmall, pykeen.datasets.codex.CoDExMedium, pykeen.datasets.codex.CoDExLarge, pykeen.datasets.ogb.OGBBioKG, pykeen.datasets.ogb.OGBWikiKG, pykeen.datasets.umls.UMLS, pykeen.datasets.freebase.FB15k, pykeen.datasets.freebase.FB15k237, pykeen.datasets.wordnet.WN18, pykeen.datasets.wordnet.WN18RR, pykeen.datasets.yago.YAGO310, pykeen.datasets.drkg.DRKG, pykeen.datasets.conceptnet.ConceptNet, pykeen.datasets.ckg.CKG

Utility classes for constructing datasets.



Contains a lazy reference to a training, testing, and validation dataset.

EagerDataset(training, testing, validation)

A dataset that has already been loaded.


A dataset whose training, testing, and validation sets are loaded lazily.
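Lazy loading defers reading each split until it is first accessed. A minimal sketch using functools.cached_property; the class LazySplits and its loader callable are hypothetical and stand in for however a concrete subclass reads its data.

```python
from functools import cached_property
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]


class LazySplits:
    """Hypothetical lazy container: each split is loaded on first access only."""

    def __init__(self, loader: Callable[[str], List[Triple]]) -> None:
        # loader is called with "training", "testing", or "validation".
        self._loader = loader

    @cached_property
    def training(self) -> List[Triple]:
        return self._loader("training")

    @cached_property
    def testing(self) -> List[Triple]:
        return self._loader("testing")

    @cached_property
    def validation(self) -> List[Triple]:
        return self._loader("validation")
```

An eager dataset, by contrast, receives the three already-loaded splits in its constructor, as the EagerDataset(training, testing, validation) signature suggests.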

PathDataset(training_path, testing_path, …)

Contains a lazy reference to a training, testing, and validation dataset.
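Reading labeled triples from three local files might look like the following sketch. It assumes each file is tab-separated with one head, relation, tail triple per line; read_triples and load_paths are hypothetical helper names, not PyKEEN functions.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple, Union

Triple = Tuple[str, str, str]
PathLike = Union[str, Path]


def read_triples(path: PathLike) -> List[Triple]:
    """Parse a TSV file with one head<TAB>relation<TAB>tail triple per line."""
    triples: List[Triple] = []
    with open(path, newline="", encoding="utf-8") as file:
        reader = csv.reader(file, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:
                continue  # skip blank lines
            head, relation, tail = row  # ValueError on malformed lines
            triples.append((head, relation, tail))
    return triples


def load_paths(
    training_path: PathLike, testing_path: PathLike, validation_path: PathLike
) -> Dict[str, List[Triple]]:
    """Load the three splits from their paths (sketch)."""
    return {
        "training": read_triples(training_path),
        "testing": read_triples(testing_path),
        "validation": read_triples(validation_path),
    }
```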

RemoteDataset(url, relative_training_path, …)

Contains a lazy reference to a remote dataset that is loaded if needed.
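Loading "if needed" amounts to downloading once into a cache directory and reusing the local copy afterward. A minimal standard-library sketch; ensure_downloaded and its arguments are hypothetical, with cache_root only mirroring the parameter name used by several classes on this page.

```python
import urllib.request
from pathlib import Path
from typing import Union


def ensure_downloaded(url: str, cache_root: Union[str, Path], filename: str) -> Path:
    """Download url to cache_root/filename unless that file already exists."""
    cache_root = Path(cache_root)
    cache_root.mkdir(parents=True, exist_ok=True)
    path = cache_root / filename
    if not path.is_file():
        # Write to a temporary name first so an interrupted download
        # does not leave a truncated file that looks cached.
        tmp = path.with_name(path.name + ".part")
        urllib.request.urlretrieve(url, str(tmp))
        tmp.replace(path)
    return path
```

Later calls return the cached path without touching the network, so changes at the remote URL are not picked up until the cached file is deleted.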

UnpackedRemoteDataset(training_url, …[, …])

A dataset with all three of train, test, and validation sets as URLs.

TarFileRemoteDataset(url, …[, cache_root, …])

A remote dataset stored as a tar file.

ZipFileRemoteDataset(url, …[, cache_root, …])

A remote dataset stored as a zip file.

PackedZipRemoteDataset(…[, url, name, …])

Contains a lazy reference to a remote dataset that is loaded if needed.

TarFileSingleDataset(url, relative_path[, …])

Loads a dataset that’s a single file inside a tar.gz archive.
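Reading one member of a tar.gz archive needs only the standard library's tarfile module. A sketch, assuming the member is a UTF-8 TSV of triples; read_tsv_from_tar is a hypothetical helper, and relative_path mirrors the parameter name in the TarFileSingleDataset signature.

```python
import tarfile
from pathlib import Path
from typing import List, Tuple, Union

Triple = Tuple[str, str, str]


def read_tsv_from_tar(archive: Union[str, Path], relative_path: str) -> List[Triple]:
    """Return the triples stored in one TSV member of a .tar.gz archive."""
    with tarfile.open(archive, mode="r:gz") as tar:
        member = tar.extractfile(relative_path)  # KeyError if the name is absent
        if member is None:
            raise ValueError(f"{relative_path!r} is not a regular file")
        text = member.read().decode("utf-8")
    triples: List[Triple] = []
    for line in text.splitlines():
        if line:
            head, relation, tail = line.split("\t")
            triples.append((head, relation, tail))
    return triples
```

Reading the member directly avoids extracting the whole archive to disk.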

TabbedDataset([cache_root, eager, …])

This class is for a single TSV file of edges that should be split automatically into training, testing, and validation sets.

SingleTabbedDataset(url[, name, cache_root, …])

This class is for a single TSV file of edges that should be split automatically into training, testing, and validation sets.
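An automatic split can be sketched as a seeded shuffle followed by slicing by ratio. This is a simplification: split_triples, the 80/10/10 default ratios, and the seed are assumptions (random_state mirrors the parameter name in the DRKG signature), and a real splitter may additionally ensure that every entity in the testing and validation sets also appears in training, which this sketch does not do.

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]


def split_triples(
    triples: Sequence[Triple],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    random_state: int = 0,
) -> Tuple[List[Triple], List[Triple], List[Triple]]:
    """Shuffle with a fixed seed, then cut into training/testing/validation."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(triples)
    random.Random(random_state).shuffle(shuffled)  # seeded, so reproducible
    n_train = int(ratios[0] * len(shuffled))
    n_test = int(ratios[1] * len(shuffled))
    training = shuffled[:n_train]
    testing = shuffled[n_train : n_train + n_test]
    validation = shuffled[n_train + n_test :]  # remainder absorbs rounding
    return training, testing, validation
```

Using a private random.Random instance keeps the split reproducible without touching the global random state.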

Class Inheritance Diagram

Inheritance diagram of pykeen.datasets.base.Dataset, pykeen.datasets.base.EagerDataset, pykeen.datasets.base.LazyDataset, pykeen.datasets.base.PathDataset, pykeen.datasets.base.RemoteDataset, pykeen.datasets.base.UnpackedRemoteDataset, pykeen.datasets.base.TarFileRemoteDataset, pykeen.datasets.base.ZipFileRemoteDataset, pykeen.datasets.base.PackedZipRemoteDataset, pykeen.datasets.base.TarFileSingleDataset, pykeen.datasets.base.TabbedDataset, pykeen.datasets.base.SingleTabbedDataset