class Hetionet(create_inverse_triples=False, eager=False, random_state=0)

Bases: pykeen.datasets.base.SingleTabbedDataset

The Hetionet dataset is a large biological network.

In its publication [himmelstein2017], Hetionet is shown to be useful for link prediction in drug repositioning, and it is publicly available in several formats through its GitHub repository. The link prediction algorithm showcased there does not rely on embeddings, which leaves room for comparison with embedding-based approaches. One such comparison was made in the master’s thesis of Lingling Xu [xu2019].

For reproducibility, the random_state argument defaults to 0. For permutation studies, pass a different value.
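A minimal usage sketch of the arguments described above. The helper names load_hetionet and split_sizes are hypothetical and exist only for illustration. The sketch assumes that PyKEEN is installed, that the dataset exposes training, testing, and validation triples factories with a num_triples attribute, and that the first access to a split may download the data.

```python
def load_hetionet(random_state=0):
    """Build the Hetionet dataset with an explicit random_state (hypothetical helper)."""
    # Imported lazily so that this sketch can be imported without PyKEEN installed.
    from pykeen.datasets import Hetionet

    # random_state=0 mirrors the documented default; pass another value
    # for permutation studies.
    return Hetionet(create_inverse_triples=False, random_state=random_state)


def split_sizes(dataset):
    """Return the number of triples in each split (hypothetical helper).

    Assumes each split is a triples factory exposing ``num_triples``.
    """
    return {
        "training": dataset.training.num_triples,
        "testing": dataset.testing.num_triples,
        "validation": dataset.validation.num_triples,
    }
```

Calling split_sizes(load_hetionet(random_state=0)) and then repeating the call with a different random_state is one way to check how the split changes between runs.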


[himmelstein2017] Himmelstein, D. S., et al. (2017). Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife, 6.


[xu2019] Xu, L. (2019). A Comparison of Learned and Engineered Features in Network-Based Drug Repositioning. Master’s Thesis.

Initialize dataset.

  • url – The URL from which to download the dataset.

  • name – The name of the file. If not given, the name is taken from the end of the URL.

  • cache_root – An optional directory in which to store the extracted files. If none is given, the default PyKEEN directory is used. This is defined either by the environment variable PYKEEN_HOME or defaults to ~/.pykeen.
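The defaulting rules for name and cache_root can be sketched with the standard library alone. This is an illustration of the documented behaviour, not PyKEEN's actual implementation. The function names name_from_url and resolve_cache_root, and the example URL, are hypothetical.

```python
import os
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def name_from_url(url):
    """Use the last path segment of the URL as the file name (illustrative only)."""
    return PurePosixPath(urlparse(url).path).name


def resolve_cache_root(cache_root=None):
    """Pick the cache directory following the documented precedence (illustrative only).

    1. An explicitly given ``cache_root``.
    2. The ``PYKEEN_HOME`` environment variable.
    3. ``~/.pykeen`` as the fallback.
    """
    if cache_root is not None:
        return Path(cache_root)
    return Path(os.environ.get("PYKEEN_HOME", Path.home() / ".pykeen"))
```

For example, name_from_url("https://example.org/data/edges.tsv.gz") yields "edges.tsv.gz", and resolve_cache_root() falls back to ~/.pykeen when PYKEEN_HOME is unset.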