Datasets
pykeen.datasets Package
Built-in datasets for PyKEEN.
New datasets (inheriting from pykeen.datasets.Dataset) can be registered with PyKEEN using the pykeen.datasets group in Python entry points in your own setup.py or setup.cfg package configuration. They are loaded automatically with pkg_resources.iter_entry_points().
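As a rough sketch of the registration described above: the group name pykeen.datasets comes from the text, while the package my_package, the class MyDataset, and the key mydataset are hypothetical. The lookup uses the standard library's importlib.metadata instead of pkg_resources, purely for illustration; it is not claimed to be how PyKEEN performs the lookup.

```python
from importlib.metadata import entry_points

# Hypothetical mapping one might pass as `entry_points=` to setuptools.setup()
# in setup.py; "my_package.datasets:MyDataset" is an assumed module path.
ENTRY_POINTS = {
    "pykeen.datasets": [
        "mydataset = my_package.datasets:MyDataset",
    ],
}


def discover(group: str = "pykeen.datasets") -> dict:
    """Map entry point names in ``group`` to their "module:attr" targets."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a selection interface.
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}
```

Calling discover() in an environment where a package declaring the mapping above is installed should list mydataset alongside any other registered datasets.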
Functions
get_dataset | Get a dataset, cached based on the given kwargs. |
has_dataset | Return if the dataset is registered in PyKEEN. |
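The idea of caching a dataset based on its keyword arguments can be illustrated with a small, self-contained sketch. Everything here (kwargs_digest, get_cached, load_toy) is a hypothetical helper written for this example and is not PyKEEN's implementation; it only shows how a stable key can be derived from kwargs regardless of their order.

```python
import hashlib
import json

_CACHE = {}


def kwargs_digest(**kwargs) -> str:
    """Return a short, order-independent digest of keyword arguments."""
    payload = json.dumps(kwargs, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def get_cached(loader, **kwargs):
    """Call ``loader(**kwargs)`` once per distinct set of kwargs."""
    key = (loader.__qualname__, kwargs_digest(**kwargs))
    if key not in _CACHE:
        _CACHE[key] = loader(**kwargs)
    return _CACHE[key]


def load_toy(name: str, create_inverse_triples: bool = False) -> dict:
    """Hypothetical stand-in for an expensive dataset load."""
    return {"name": name, "inverse": create_inverse_triples}
```

Two calls with the same kwargs in a different order hit the same cache entry, while changing any value produces a new one.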
Classes
Dataset | The base dataset class. |
AristoV4 | The Aristo-v4 dataset from [chen2021]. |
Hetionet | The Hetionet dataset from [himmelstein2017]. |
Kinships | The Kinships dataset. |
Nations | The Nations dataset. |
OpenBioLink | The OpenBioLink dataset. |
OpenBioLinkLQ | The low-quality variant of the OpenBioLink dataset. |
CoDExSmall | The CoDEx small dataset. |
CoDExMedium | The CoDEx medium dataset. |
CoDExLarge | The CoDEx large dataset. |
CN3l | The CN3l dataset family. |
OGBBioKG | The OGB BioKG dataset. |
OGBWikiKG2 | The OGB WikiKG2 dataset. |
UMLS | The UMLS dataset. |
FB15k | The FB15k dataset. |
FB15k237 | The FB15k-237 dataset. |
WK3l15k | The WK3l-15k dataset family. |
WK3l120k | The WK3l-120k dataset family. |
WN18 | The WN18 dataset. |
WN18RR | The WN18-RR dataset. |
YAGO310 | The YAGO3-10 dataset is a subset of YAGO3 that only contains entities with at least 10 relations. |
DRKG | The DRKG dataset. |
BioKG | The BioKG dataset from [walsh2020]. |
ConceptNet | The ConceptNet dataset from [speer2017]. |
CKG | The Clinical Knowledge Graph (CKG) dataset from [santos2020]. |
CSKG | The CSKG dataset. |
DBpedia50 | The DBpedia50 dataset. |
DB100K | The DB100K dataset from [ding2018]. |
OpenEA | The OpenEA dataset family. |
Countries | The Countries dataset. |
WD50KT | The triples-only version of WD50K. |
Wikidata5M | The Wikidata5M dataset from [wang2019]. |
PharmKG8k | The PharmKG8k dataset from [zheng2020]. |
PharmKG | The PharmKGFull dataset from [zheng2020]. |
PrimeKG | The Precision Medicine Knowledge Graph (PrimeKG) dataset from [chandak2022]. |
Globi | The Global Biotic Interactions (GloBI) dataset. |
PharMeBINet | The PharMeBINet dataset from [koenigs2022]. |
Class Inheritance Diagram
Inheritance diagram for pykeen.datasets (base class -> subclasses):
ExtraReprMixin -> Dataset
Dataset -> EagerDataset, LazyDataset
EagerDataset -> EADataset
EADataset -> MTransEDataset, OpenEA
ABC -> MTransEDataset
MTransEDataset -> CN3l, WK3l120k, WK3l15k
LazyDataset -> CompressedSingleDataset, OGBLoader, PackedZipRemoteDataset, PathDataset, TabbedDataset
Generic -> OGBLoader
OGBLoader -> OGBBioKG, OGBWikiKG2
CompressedSingleDataset -> TarFileSingleDataset, ZipSingleDataset
TarFileSingleDataset -> DRKG, PharMeBINet
ZipSingleDataset -> BioKG
PackedZipRemoteDataset -> AristoV4, FB15k237, OpenBioLink, OpenBioLinkLQ
PathDataset -> Kinships, Nations, RemoteDataset, UMLS, UnpackedRemoteDataset
RemoteDataset -> TarFileRemoteDataset
TarFileRemoteDataset -> FB15k, WN18, WN18RR, Wikidata5M, YAGO310
UnpackedRemoteDataset -> CoDExLarge, CoDExMedium, CoDExSmall, Countries, DB100K, DBpedia50, PharmKG8k, WD50KT
TabbedDataset -> CKG, SingleTabbedDataset
SingleTabbedDataset -> CSKG, ConceptNet, Globi, Hetionet, PharmKG, PrimeKG
pykeen.datasets.base Module
Utility classes for constructing datasets.
Functions
dataset_similarity | Calculate the similarity between two datasets. |
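This section does not say which similarity metric is used. As one plausible illustration, the sketch below assumes a Jaccard (Tanimoto) overlap between the sets of triples of two datasets; the function name and the convention for two empty sets are choices made for this example only.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def triple_set_similarity(a: Iterable[Triple], b: Iterable[Triple]) -> float:
    """Jaccard/Tanimoto overlap of two collections of (head, relation, tail)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty datasets are identical.
        return 1.0
    return len(set_a & set_b) / len(union)
```

The score is 1.0 for identical triple sets and 0.0 when they share nothing, which makes it easy to spot near-duplicate benchmark variants.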
Classes
Dataset | The base dataset class. |
EagerDataset | A dataset whose training, testing, and optional validation factories are pre-loaded. |
LazyDataset | A dataset whose training, testing, and optional validation factories are lazily loaded. |
PathDataset | Contains a lazy reference to a training, testing, and validation dataset. |
RemoteDataset | Contains a lazy reference to a remote dataset that is loaded if needed. |
UnpackedRemoteDataset | A dataset with all three of train, test, and validation sets as URLs. |
TarFileRemoteDataset | A remote dataset stored as a tar file. |
PackedZipRemoteDataset | Contains a lazy reference to a remote dataset that is loaded if needed. |
CompressedSingleDataset | Loads a dataset that's a single file inside an archive. |
TarFileSingleDataset | Loads a dataset that's a single file inside a tar.gz archive. |
ZipSingleDataset | Loads a dataset that's a single file inside a zip archive. |
TabbedDataset | This class is for when you've got a single TSV of edges and want them to get auto-split. |
SingleTabbedDataset | This class is for when you've got a single TSV of edges and want them to get auto-split. |
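To make the "single TSV of edges that gets auto-split" idea concrete, here is a minimal standalone sketch. The 80/10/10 ratios, the seed, and the helper names are assumptions for illustration; a real splitter would typically also make sure every entity and relation appears in the training part, which this sketch does not do.

```python
import csv
import random
from typing import Iterable, List, Sequence, Tuple

Triple = Tuple[str, ...]


def read_tsv(handle) -> List[Triple]:
    """Read (head, relation, tail) rows from a tab-separated file handle."""
    reader = csv.reader(handle, delimiter="\t")
    # Rows that do not have exactly three columns are skipped.
    return [tuple(row) for row in reader if len(row) == 3]


def split_triples(
    triples: Iterable[Triple],
    ratios: Sequence[float] = (0.8, 0.1, 0.1),
    seed: int = 0,
):
    """Shuffle deterministically and cut into train/test/validation parts."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three numbers summing to 1")
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(ratios[0] * len(shuffled))
    n_test = int(ratios[1] * len(shuffled))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    # Whatever remains after train and test goes to validation.
    validation = shuffled[n_train + n_test:]
    return train, test, validation
```

Fixing the seed keeps the split reproducible across runs, which matters when results on the derived test set are compared.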
Class Inheritance Diagram
Inheritance diagram for pykeen.datasets.base (base class -> subclasses):
ExtraReprMixin -> Dataset
Dataset -> EagerDataset, LazyDataset
LazyDataset -> CompressedSingleDataset, PackedZipRemoteDataset, PathDataset, TabbedDataset
CompressedSingleDataset -> TarFileSingleDataset, ZipSingleDataset
PathDataset -> RemoteDataset, UnpackedRemoteDataset
RemoteDataset -> TarFileRemoteDataset
TabbedDataset -> SingleTabbedDataset
pykeen.datasets.analysis Module
Dataset analysis utilities.
Functions
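get_relation_count_df | Create a dataframe with relation counts. |
get_entity_count_df | Create a dataframe with entity counts. |
get_entity_relation_co_occurrence_df | Create a dataframe of entity/relation co-occurrence. |
get_relation_functionality | Calculate the functionality and inverse functionality score per relation. |
get_relation_pattern_types_df | Categorize relations based on patterns from RotatE [sun2019]. |
get_relation_cardinality_types_df | Determine the relation cardinality types. |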
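The functionality scores listed above can be illustrated with a small pure-Python sketch. It assumes one common definition: functionality is the number of distinct heads of a relation divided by its number of distinct (head, tail) pairs, and inverse functionality uses distinct tails instead. The exact definition used by the library may differ, and the function name here is made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]


def relation_functionality(
    triples: Iterable[Triple],
) -> Dict[str, Tuple[float, float]]:
    """Return {relation: (functionality, inverse_functionality)}."""
    heads = defaultdict(set)
    tails = defaultdict(set)
    pairs = defaultdict(set)
    for head, relation, tail in triples:
        heads[relation].add(head)
        tails[relation].add(tail)
        pairs[relation].add((head, tail))
    return {
        relation: (
            len(heads[relation]) / len(pairs[relation]),
            len(tails[relation]) / len(pairs[relation]),
        )
        for relation in pairs
    }
```

A functionality of 1.0 means each head has exactly one tail for that relation; lower values indicate heads that point to several tails.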
Inductive Datasets
pykeen.datasets.inductive Package
Inductive datasets in PyKEEN.
Classes
InductiveDataset | Contains transductive train and inductive inference/validation/test datasets. |
EagerInductiveDataset | An eager inductive dataset. |
LazyInductiveDataset | An inductive dataset that has lazy loading. |
DisjointInductivePathDataset | A disjoint inductive dataset specified by paths. |
UnpackedRemoteDisjointInductiveDataset | A dataset with all four of train, inductive_inference, inductive test, and inductive validation sets as URLs. |
InductiveFB15k237 | The inductive FB15k-237 dataset in 4 versions. |
InductiveWN18RR | The inductive WN18RR dataset in 4 versions. |
InductiveNELL | The inductive NELL dataset in 4 versions. |
ILPC2022Small | An inductive link prediction dataset for the ILPC 2022 Challenge. |
ILPC2022Large | An inductive link prediction dataset for the ILPC 2022 Challenge. |
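The four parts named above (transductive training, inductive inference, inductive validation, inductive testing) can be pictured with a toy container. This is a sketch under the usual assumptions of the disjoint inductive setting: the training and inference graphs share relations but no entities, and validation and test triples only use entities from the inference graph. The class ToyInductiveSplit and its check method are hypothetical and are not part of PyKEEN.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def entities(triples: Iterable[Triple]) -> Set[str]:
    """Collect every head and tail entity."""
    return {e for head, _, tail in triples for e in (head, tail)}


def relations(triples: Iterable[Triple]) -> Set[str]:
    """Collect every relation label."""
    return {relation for _, relation, _ in triples}


@dataclass(frozen=True)
class ToyInductiveSplit:
    transductive_training: FrozenSet[Triple]
    inductive_inference: FrozenSet[Triple]
    inductive_validation: FrozenSet[Triple]
    inductive_testing: FrozenSet[Triple]

    def check(self) -> None:
        """Raise ValueError if the assumed inductive constraints are violated."""
        inference_entities = entities(self.inductive_inference)
        if entities(self.transductive_training) & inference_entities:
            raise ValueError("training and inference entities overlap")
        if not relations(self.inductive_inference) <= relations(
            self.transductive_training
        ):
            raise ValueError("inference graph uses unseen relations")
        for part in (self.inductive_validation, self.inductive_testing):
            if not entities(part) <= inference_entities:
                raise ValueError("evaluation triples use unknown entities")
```

A model trained on the transductive part is then evaluated on entities it has never seen, using the inference graph as context.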
Class Inheritance Diagram
Inheritance diagram of the inductive dataset classes (parent -> child):
InductiveDataset -> EagerInductiveDataset
InductiveDataset -> LazyInductiveDataset
LazyInductiveDataset -> DisjointInductivePathDataset
DisjointInductivePathDataset -> UnpackedRemoteDisjointInductiveDataset
UnpackedRemoteDisjointInductiveDataset -> ILPC2022Large
UnpackedRemoteDisjointInductiveDataset -> ILPC2022Small
UnpackedRemoteDisjointInductiveDataset -> InductiveFB15k237
UnpackedRemoteDisjointInductiveDataset -> InductiveNELL
UnpackedRemoteDisjointInductiveDataset -> InductiveWN18RR
Entity Alignment
pykeen.datasets.ea.combination Module
Combination strategies for entity alignment datasets.
Classes
A base class for combination of a graph pair into a single graph. |
|
This combinator keeps both graphs as disconnected components. |
|
Add extra triples by swapping aligned entities. |
|
This combinator keeps all entities, but introduces a novel alignment relation. |
|
This combinator merges all matching entity pairs into a single ID. |
|
|
The result of processing a pair of triples factories. |
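The four combination strategies described above can be illustrated on toy data. The following is a minimal sketch in plain Python and does not use PyKEEN's classes or reproduce their implementation: the function names (`disjoint`, `extra_relation`, `collapse`, `swap`), the `L:`/`R:` label prefixes, and the `same-as` relation name are all assumptions made for illustration. It only shows, under those assumptions, what each strategy does to a pair of graphs given a list of aligned entity pairs.

```python
# Toy illustration of graph-pair combination; all names here are hypothetical.
Triple = tuple[str, str, str]
Pair = tuple[str, str]


def tag(triples: list[Triple], prefix: str) -> set[Triple]:
    """Prefix entity labels so the two graphs never share an entity by accident."""
    return {(f"{prefix}:{h}", r, f"{prefix}:{t}") for h, r, t in triples}


def disjoint(left: list[Triple], right: list[Triple]) -> set[Triple]:
    """Keep both graphs as disconnected components (the alignment is ignored)."""
    return tag(left, "L") | tag(right, "R")


def extra_relation(left, right, alignment: list[Pair], relation: str = "same-as") -> set[Triple]:
    """Keep all entities and link each aligned pair with a new alignment relation."""
    links = {(f"L:{a}", relation, f"R:{b}") for a, b in alignment}
    return disjoint(left, right) | links


def collapse(left, right, alignment: list[Pair]) -> set[Triple]:
    """Merge each aligned pair into a single entity (here: the left-hand label)."""
    merged = {f"R:{b}": f"L:{a}" for a, b in alignment}
    return {(merged.get(h, h), r, merged.get(t, t)) for h, r, t in disjoint(left, right)}


def swap(left, right, alignment: list[Pair]) -> set[Triple]:
    """Add copies of triples in which an aligned entity is replaced by its counterpart."""
    counterpart: dict[str, str] = {}
    for a, b in alignment:
        counterpart[f"L:{a}"] = f"R:{b}"
        counterpart[f"R:{b}"] = f"L:{a}"
    base = disjoint(left, right)
    extra: set[Triple] = set()
    for h, r, t in base:
        if h in counterpart:
            extra.add((counterpart[h], r, t))
        if t in counterpart:
            extra.add((h, r, counterpart[t]))
    return base | extra


left = [("berlin", "capital_of", "germany")]
right = [("Berlin", "located_in", "Europe")]
alignment = [("berlin", "Berlin")]

for name, graph in [
    ("disjoint", disjoint(left, right)),
    ("extra relation", extra_relation(left, right, alignment)),
    ("collapse", collapse(left, right, alignment)),
    ("swap", swap(left, right, alignment)),
]:
    print(name, sorted(graph))
```

In this sketch the disjoint strategy leaves the alignment unused, the extra-relation strategy adds one triple per aligned pair, the collapse strategy reduces the number of distinct entities, and the swap strategy enlarges the triple set without adding new relations.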