Datasets

pykeen.datasets Package

Built-in datasets for PyKEEN.

New datasets (classes inheriting from pykeen.datasets.Dataset) can be registered with PyKEEN by declaring them in the pykeen.datasets entry point group of your own package's setup.py or setup.cfg configuration. Registered datasets are loaded automatically with pkg_resources.iter_entry_points().
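As a rough illustration of the registration mechanism, the sketch below lists whatever is currently registered under the pykeen.datasets entry point group. It uses the standard library's importlib.metadata instead of pkg_resources, which is a choice made for this example. The setup.cfg fragment in the comment uses hypothetical names (my_dataset, my_package, MyDataset).

```python
from importlib.metadata import entry_points

# A hypothetical setup.cfg fragment that would register a dataset class:
#
#   [options.entry_points]
#   pykeen.datasets =
#       my_dataset = my_package.datasets:MyDataset
#
# "my_dataset", "my_package", and "MyDataset" are placeholder names.

GROUP = "pykeen.datasets"


def list_registered_entry_points(group=GROUP):
    """Return sorted (name, target) pairs for every entry point in the group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a dict mapping group names to entry points.
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


for name, target in list_registered_entry_points():
    print(f"{name} -> {target}")
```

If no installed package declares the group, the loop prints nothing.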

Functions

get_dataset(*[, dataset, dataset_kwargs, ...])

Get a dataset, cached based on the given kwargs.

has_dataset(key)

Return whether the dataset is registered in PyKEEN.
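The phrase "cached based on the given kwargs" in the description of get_dataset can be pictured as deriving a stable key from the keyword arguments, so that the same configuration always maps to the same cache entry. The following is a minimal sketch of that idea only. The function name kwargs_digest, the use of JSON with SHA-256, and the truncation to 16 characters are assumptions made for illustration and are not taken from PyKEEN.

```python
import hashlib
import json


def kwargs_digest(**kwargs):
    """Derive a short, order-independent key from keyword arguments."""
    # sort_keys makes the digest independent of argument order;
    # default=str is a fallback for values JSON cannot encode directly.
    payload = json.dumps(kwargs, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


a = kwargs_digest(dataset="nations", create_inverse_triples=True)
b = kwargs_digest(create_inverse_triples=True, dataset="nations")
c = kwargs_digest(dataset="nations", create_inverse_triples=False)
print(a == b)  # same configuration, same key
print(a == c)  # different configuration, different key
```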

Classes

Dataset()

The base dataset class.

AristoV4(**kwargs)

The Aristo-v4 dataset from [chen2021].

Hetionet([create_inverse_triples, random_state])

The Hetionet dataset from [himmelstein2017].

Kinships([create_inverse_triples])

The Kinships dataset.

Nations([create_inverse_triples])

The Nations dataset.

OpenBioLink([create_inverse_triples])

The OpenBioLink dataset.

OpenBioLinkLQ([create_inverse_triples])

The low-quality variant of the OpenBioLink dataset.

CoDExSmall([create_inverse_triples])

The CoDEx small dataset.

CoDExMedium([create_inverse_triples])

The CoDEx medium dataset.

CoDExLarge([create_inverse_triples])

The CoDEx large dataset.

CN3l([graph_pair])

The CN3l dataset family.

OGBBioKG([cache_root, create_inverse_triples])

The OGB BioKG dataset.

OGBWikiKG2([cache_root, create_inverse_triples])

The OGB WikiKG2 dataset.

UMLS([create_inverse_triples])

The UMLS dataset.

FB15k([create_inverse_triples])

The FB15k dataset.

FB15k237([create_inverse_triples])

The FB15k-237 dataset.

WK3l15k([graph_pair])

The WK3l-15k dataset family.

WK3l120k([graph_pair])

The WK3l-120k dataset family.

WN18([create_inverse_triples])

The WN18 dataset.

WN18RR([create_inverse_triples])

The WN18-RR dataset.

YAGO310([create_inverse_triples])

The YAGO3-10 dataset is a subset of YAGO3 that only contains entities with at least 10 relations.

DRKG([create_inverse_triples, random_state])

The DRKG dataset.

BioKG([create_inverse_triples, random_state])

The BioKG dataset from [walsh2020].

ConceptNet([create_inverse_triples, ...])

The ConceptNet dataset from [speer2017].

CKG([create_inverse_triples, random_state])

The Clinical Knowledge Graph (CKG) dataset from [santos2020].

CSKG([create_inverse_triples, random_state])

The CSKG dataset.

DBpedia50([create_inverse_triples])

The DBpedia50 dataset.

DB100K([create_inverse_triples])

The DB100K dataset from [ding2018].

OpenEA(*[, graph_pair, size, version])

The OpenEA dataset family.

Countries([create_inverse_triples])

The Countries dataset.

WD50KT([create_inverse_triples])

The triples-only version of WD50K.

Wikidata5M([create_inverse_triples])

The Wikidata5M dataset from [wang2019].

PharmKG8k([create_inverse_triples])

The PharmKG8k dataset from [zheng2020].

PharmKG([create_inverse_triples, random_state])

The PharmKGFull dataset from [zheng2020].

PrimeKG([create_inverse_triples, random_state])

The Precision Medicine Knowledge Graph (PrimeKG) dataset from [chandak2022].

Class Inheritance Diagram

Inheritance diagram of pykeen.datasets.base.Dataset, pykeen.datasets.aristo.AristoV4, pykeen.datasets.hetionet.Hetionet, pykeen.datasets.kinships.Kinships, pykeen.datasets.nations.Nations, pykeen.datasets.openbiolink.OpenBioLink, pykeen.datasets.openbiolink.OpenBioLinkLQ, pykeen.datasets.codex.CoDExSmall, pykeen.datasets.codex.CoDExMedium, pykeen.datasets.codex.CoDExLarge, pykeen.datasets.ea.wk3l.CN3l, pykeen.datasets.ogb.OGBBioKG, pykeen.datasets.ogb.OGBWikiKG2, pykeen.datasets.umls.UMLS, pykeen.datasets.freebase.FB15k, pykeen.datasets.freebase.FB15k237, pykeen.datasets.ea.wk3l.WK3l15k, pykeen.datasets.ea.wk3l.WK3l120k, pykeen.datasets.wordnet.WN18, pykeen.datasets.wordnet.WN18RR, pykeen.datasets.yago.YAGO310, pykeen.datasets.drkg.DRKG, pykeen.datasets.biokg.BioKG, pykeen.datasets.conceptnet.ConceptNet, pykeen.datasets.ckg.CKG, pykeen.datasets.cskg.CSKG, pykeen.datasets.dbpedia.DBpedia50, pykeen.datasets.db100k.DB100K, pykeen.datasets.ea.openea.OpenEA, pykeen.datasets.countries.Countries, pykeen.datasets.wd50k.WD50KT, pykeen.datasets.wikidata5m.Wikidata5M, pykeen.datasets.pharmkg.PharmKG8k, pykeen.datasets.pharmkg.PharmKG, pykeen.datasets.primekg.PrimeKG

pykeen.datasets.base Module

Utility classes for constructing datasets.

Functions

dataset_similarity(a, b[, metric])

Calculate the similarity between two datasets.
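One plausible notion of similarity between two datasets is the overlap between their sets of triples. The sketch below computes a Jaccard (Tanimoto) coefficient over toy triples. It is an assumption chosen for illustration, and it is not necessarily the metric that dataset_similarity implements. The function name jaccard_similarity is hypothetical.

```python
def jaccard_similarity(triples_a, triples_b):
    """Return |A & B| / |A | B| for two collections of (head, relation, tail) triples."""
    a, b = set(triples_a), set(triples_b)
    union = a | b
    if not union:
        # Two empty collections are treated as identical.
        return 1.0
    return len(a & b) / len(union)


triples_a = [("a", "r", "b"), ("b", "r", "c")]
triples_b = [("a", "r", "b"), ("c", "r", "d")]
similarity = jaccard_similarity(triples_a, triples_b)
print(round(similarity, 3))
```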

Classes

Dataset()

The base dataset class.

EagerDataset(training, testing[, ...])

A dataset whose training, testing, and optional validation factories are pre-loaded.

LazyDataset()

A dataset whose training, testing, and optional validation factories are lazily loaded.

PathDataset(training_path, testing_path, ...)

Contains a lazy reference to a training, testing, and validation dataset.

RemoteDataset(url, relative_training_path, ...)

Contains a lazy reference to a remote dataset that is loaded if needed.

UnpackedRemoteDataset(training_url, ...[, ...])

A dataset whose training, testing, and validation sets are all given as URLs.

TarFileRemoteDataset(url, ...[, cache_root, ...])

A remote dataset stored as a tar file.

PackedZipRemoteDataset(...[, url, name, ...])

Contains a lazy reference to a remote dataset that is loaded if needed.

CompressedSingleDataset(url, relative_path)

Loads a dataset that's a single file inside an archive.

TarFileSingleDataset(url, relative_path[, ...])

Loads a dataset that's a single file inside a tar.gz archive.

ZipSingleDataset(url, relative_path[, name, ...])

Loads a dataset that's a single file inside a zip archive.

TabbedDataset([cache_root, eager, ...])

This class is for when you've got a single TSV of edges and want them to get auto-split.

SingleTabbedDataset(url[, name, cache_root, ...])

This class is for when you've got a single TSV of edges and want them to get auto-split.
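The auto-splitting described for TabbedDataset and SingleTabbedDataset can be sketched with the standard library: read tab-separated (head, relation, tail) rows, shuffle them with a fixed seed, and cut the result into three parts. The function name split_edges, the 0.8/0.1/0.1 ratios, and the seed are assumptions for this example. The sketch does not attempt to guarantee that every entity or relation ends up in the training portion, which a production splitter would usually care about.

```python
import csv
import io
import random


def split_edges(tsv_text, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle (head, relation, tail) rows and cut them into three parts."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [tuple(row) for row in reader if row]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(rows)
    n_train = int(ratios[0] * len(rows))
    n_test = int(ratios[1] * len(rows))
    training = rows[:n_train]
    testing = rows[n_train:n_train + n_test]
    validation = rows[n_train + n_test:]  # remainder
    return training, testing, validation


# Twenty toy edges standing in for a single TSV file.
edges = "\n".join(f"e{i}\tlinked_to\te{i + 1}" for i in range(20))
training, testing, validation = split_edges(edges)
print(len(training), len(testing), len(validation))
```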

Class Inheritance Diagram

Inheritance diagram of pykeen.datasets.base.Dataset, pykeen.datasets.base.EagerDataset, pykeen.datasets.base.LazyDataset, pykeen.datasets.base.PathDataset, pykeen.datasets.base.RemoteDataset, pykeen.datasets.base.UnpackedRemoteDataset, pykeen.datasets.base.TarFileRemoteDataset, pykeen.datasets.base.PackedZipRemoteDataset, pykeen.datasets.base.CompressedSingleDataset, pykeen.datasets.base.TarFileSingleDataset, pykeen.datasets.base.ZipSingleDataset, pykeen.datasets.base.TabbedDataset, pykeen.datasets.base.SingleTabbedDataset

pykeen.datasets.analysis Module

Dataset analysis utilities.

Functions

get_relation_count_df(dataset[, ...])

Create a dataframe with relation counts.

get_entity_count_df(dataset[, merge_sides, ...])

Create a dataframe with entity counts.

get_entity_relation_co_occurrence_df(dataset)

Create a dataframe of entity/relation co-occurrence.

get_relation_functionality_df(*, dataset[, ...])

Calculate the functionality and inverse functionality score per relation.

get_relation_pattern_types_df(dataset, *[, ...])

Categorize relations based on patterns from RotatE [sun2019].

get_relation_cardinality_types_df(*, dataset)

Determine the relation cardinality types.
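Two of the analyses listed above can be illustrated on toy triples. In this sketch, the functionality of a relation is the number of distinct heads divided by the number of distinct (head, tail) pairs, and the inverse functionality uses distinct tails instead. The cardinality type is decided by a strict rule: a side is "many" as soon as one entity on the other side is linked to more than one partner. Both function names and the strict rule are assumptions for illustration. The PyKEEN functions return dataframes and may apply different thresholds.

```python
from collections import defaultdict


def relation_functionality(triples):
    """Map each relation to (functionality, inverse functionality)."""
    pairs = defaultdict(set)
    for h, r, t in triples:
        pairs[r].add((h, t))
    scores = {}
    for r, hts in pairs.items():
        heads = {h for h, _ in hts}
        tails = {t for _, t in hts}
        scores[r] = (len(heads) / len(hts), len(tails) / len(hts))
    return scores


def relation_cardinality_types(triples):
    """Map each relation to one-to-one, one-to-many, many-to-one, or many-to-many."""
    tails_of = defaultdict(lambda: defaultdict(set))  # relation -> head -> tails
    heads_of = defaultdict(lambda: defaultdict(set))  # relation -> tail -> heads
    for h, r, t in triples:
        tails_of[r][h].add(t)
        heads_of[r][t].add(h)
    types = {}
    for r in tails_of:
        head_has_many_tails = any(len(ts) > 1 for ts in tails_of[r].values())
        tail_has_many_heads = any(len(hs) > 1 for hs in heads_of[r].values())
        left = "many" if tail_has_many_heads else "one"
        right = "many" if head_has_many_tails else "one"
        types[r] = f"{left}-to-{right}"
    return types


toy_triples = [
    ("alice", "born_in", "paris"),
    ("bob", "born_in", "paris"),
    ("carol", "born_in", "rome"),
    ("alice", "married_to", "dave"),
    ("dave", "married_to", "alice"),
]
functionality = relation_functionality(toy_triples)
cardinality = relation_cardinality_types(toy_triples)
print(functionality)
print(cardinality)
```

Here born_in has three distinct heads and two distinct tails over three pairs, so it is fully functional but not inverse functional, and the strict rule labels it many-to-one.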

Inductive Datasets

pykeen.datasets.inductive Package

Inductive datasets in PyKEEN.

Classes

InductiveDataset()

Contains transductive train and inductive inference/validation/test datasets.

EagerInductiveDataset(transductive_training, ...)

An eager inductive dataset.

LazyInductiveDataset()

An inductive dataset that has lazy loading.

DisjointInductivePathDataset(...[, eager, ...])

A disjoint inductive dataset specified by paths.

UnpackedRemoteDisjointInductiveDataset(...)

A dataset whose training, inductive inference, inductive testing, and inductive validation sets are all given as URLs.

InductiveFB15k237([version, ...])

The inductive FB15k-237 dataset in 4 versions.

InductiveWN18RR([version, ...])

The inductive WN18RR dataset in 4 versions.

InductiveNELL([version, create_inverse_triples])

The inductive NELL dataset in 4 versions.

ILPC2022Large(**kwargs)

An inductive link prediction dataset for the ILPC 2022 Challenge.

ILPC2022Small(**kwargs)

An inductive link prediction dataset for the ILPC 2022 Challenge.
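The structure of a disjoint inductive dataset can be made concrete with a small consistency check on toy triples. The sketch assumes the usual inductive link prediction setting: the training graph and the inference graph share no entities, the inference graph reuses relations seen in training, and the validation and test triples only mention entities of the inference graph. These conditions and the function names are assumptions for illustration. The code does not use the PyKEEN classes.

```python
def entities(triples):
    """Collect every entity that appears as head or tail."""
    return {h for h, _, _ in triples} | {t for _, _, t in triples}


def relations(triples):
    """Collect every relation label."""
    return {r for _, r, _ in triples}


def is_disjoint_inductive_split(training, inference, validation, testing):
    """Check the assumed conditions of a disjoint inductive split."""
    train_entities = entities(training)
    inference_entities = entities(inference)
    # 1. No entity is shared between the training and inference graphs.
    if train_entities & inference_entities:
        return False
    # 2. The inference graph only uses relations seen during training.
    if not relations(inference) <= relations(training):
        return False
    # 3. Evaluation triples are stated over inference-graph entities.
    evaluation_entities = entities(validation) | entities(testing)
    return evaluation_entities <= inference_entities


training = [("a", "likes", "b"), ("b", "knows", "c")]
inference = [("x", "likes", "y"), ("y", "knows", "z")]
validation = [("x", "knows", "z")]
testing = [("z", "likes", "x")]
print(is_disjoint_inductive_split(training, inference, validation, testing))
```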

Class Inheritance Diagram

Inheritance diagram of pykeen.datasets.inductive.base.InductiveDataset, pykeen.datasets.inductive.base.EagerInductiveDataset, pykeen.datasets.inductive.base.LazyInductiveDataset, pykeen.datasets.inductive.base.DisjointInductivePathDataset, pykeen.datasets.inductive.base.UnpackedRemoteDisjointInductiveDataset, pykeen.datasets.inductive.ilp_teru.InductiveFB15k237, pykeen.datasets.inductive.ilp_teru.InductiveWN18RR, pykeen.datasets.inductive.ilp_teru.InductiveNELL, pykeen.datasets.inductive.ilpc2022.ILPC2022Large, pykeen.datasets.inductive.ilpc2022.ILPC2022Small

Entity Alignment

pykeen.datasets.ea.combination Module

Combination strategies for entity alignment datasets.

Classes

GraphPairCombinator()

A base class for combination of a graph pair into a single graph.

DisjointGraphPairCombinator()

This combinator keeps both graphs as disconnected components.

SwapGraphPairCombinator()

Add extra triples by swapping aligned entities.

ExtraRelationGraphPairCombinator()

This combinator keeps all entities, but introduces a novel alignment relation.

CollapseGraphPairCombinator()

This combinator merges all matching entity pairs into a single ID.

ProcessedTuple(mapped_triples, alignment, ...)

The result of processing a pair of triples factories.
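The four combination strategies described above can be sketched on label-based toy triples. The functions below are plain-Python illustrations of the one-line descriptions, not the PyKEEN implementations, which operate on ID-mapped triples. The function names, the same_as relation label, and the en:/fr: prefixes are assumptions for this example.

```python
def disjoint(left, right):
    """Keep both graphs side by side without connecting them."""
    return list(left) + list(right)


def extra_relation(left, right, alignment, relation="same_as"):
    """Keep all entities and link aligned pairs with a new relation."""
    return disjoint(left, right) + [(a, relation, b) for a, b in alignment]


def collapse(left, right, alignment):
    """Merge each aligned pair into a single entity (the left one)."""
    to_left = {b: a for a, b in alignment}
    merged = list(left) + [
        (to_left.get(h, h), r, to_left.get(t, t)) for h, r, t in right
    ]
    return list(dict.fromkeys(merged))  # drop duplicates, keep order


def swap(left, right, alignment):
    """Add triples in which an aligned entity is replaced by its counterpart."""
    counterpart = dict(alignment)
    counterpart.update({b: a for a, b in alignment})
    extra = []
    for h, r, t in disjoint(left, right):
        if h in counterpart:
            extra.append((counterpart[h], r, t))
        if t in counterpart:
            extra.append((h, r, counterpart[t]))
    return list(dict.fromkeys(disjoint(left, right) + extra))


left = [("en:Paris", "capital_of", "en:France")]
right = [("fr:Paris", "capitale_de", "fr:France")]
alignment = [("en:Paris", "fr:Paris")]

for name, result in [
    ("disjoint", disjoint(left, right)),
    ("extra_relation", extra_relation(left, right, alignment)),
    ("collapse", collapse(left, right, alignment)),
    ("swap", swap(left, right, alignment)),
]:
    print(name, result)
```

In this toy case the disjoint result has two unconnected triples, the extra-relation result adds one linking triple, the collapsed result attaches both facts to en:Paris, and the swapped result adds one cross-graph triple per original triple.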

Class Inheritance Diagram

Inheritance diagram of pykeen.datasets.ea.combination.GraphPairCombinator, pykeen.datasets.ea.combination.DisjointGraphPairCombinator, pykeen.datasets.ea.combination.SwapGraphPairCombinator, pykeen.datasets.ea.combination.ExtraRelationGraphPairCombinator, pykeen.datasets.ea.combination.CollapseGraphPairCombinator, pykeen.datasets.ea.combination.ProcessedTuple