Using Neptune.ai

Neptune is a graphical tool for tracking the results of machine learning experiments. PyKEEN integrates Neptune into both the training pipeline and the hyper-parameter optimization (HPO) pipeline.

Preparation

  1. Install Neptune’s client with pip install neptune-client, or install PyKEEN with the neptune extra using pip install pykeen[neptune].

  2. Create an account at Neptune.

    • Get an API token following this tutorial.

    • [Optional] Set the NEPTUNE_API_TOKEN environment variable to your API token.

  3. [Optional] Create a new project by following this tutorial for project and user management. Neptune automatically creates a project called sandbox for every new user, which you can use directly.
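
The preparation steps above can be sanity-checked from Python before starting a long training run. The following is a minimal sketch, not part of PyKEEN: the helper name check_neptune_setup is hypothetical, and it assumes that the neptune-client package is importable under the module name neptune.

```python
import importlib.util
import os


def check_neptune_setup(env=None):
    """Report whether the Neptune client and API token appear to be available.

    Hypothetical helper for illustration; ``env`` defaults to ``os.environ``.
    """
    env = os.environ if env is None else env
    return {
        # Assumption: neptune-client installs a top-level module named "neptune".
        "client_installed": importlib.util.find_spec("neptune") is not None,
        # An empty string counts as "not set".
        "token_set": bool(env.get("NEPTUNE_API_TOKEN")),
    }


if __name__ == "__main__":
    print(check_neptune_setup())
```

If token_set is False, the token has to be supplied explicitly when configuring the tracker.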

Pipeline Example

This example shows how to use Neptune with the pykeen.pipeline.pipeline() function. At a minimum, the project_qualified_name and experiment_name must be set.

from pykeen.pipeline import pipeline

pipeline_result = pipeline(
    model='RotatE',
    dataset='Kinships',
    result_tracker='neptune',
    result_tracker_kwargs=dict(
        project_qualified_name='cthoyt/sandbox',
        experiment_name='Tutorial Training of RotatE on Kinships',
    ),
)

Warning

If you haven’t set the NEPTUNE_API_TOKEN environment variable, the api_token becomes a mandatory key.
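
One way to handle this rule is to assemble the tracker keyword arguments in a small function that only requires an explicit token when the environment variable is missing. This is a sketch under stated assumptions: the function neptune_tracker_kwargs is hypothetical and not part of PyKEEN, and it only uses the keys named in this tutorial (project_qualified_name, experiment_name, api_token).

```python
import os


def neptune_tracker_kwargs(project_qualified_name, experiment_name,
                           api_token=None, env=None):
    """Build a ``result_tracker_kwargs`` dict for the Neptune tracker.

    Hypothetical helper for illustration; ``env`` defaults to ``os.environ``.
    """
    env = os.environ if env is None else env
    kwargs = dict(
        project_qualified_name=project_qualified_name,
        experiment_name=experiment_name,
    )
    if api_token is not None:
        # An explicitly passed token is always forwarded.
        kwargs["api_token"] = api_token
    elif not env.get("NEPTUNE_API_TOKEN"):
        # Neither the environment variable nor an explicit token is available.
        raise ValueError(
            "Set NEPTUNE_API_TOKEN or pass api_token explicitly."
        )
    return kwargs
```

The returned dictionary can then be passed as result_tracker_kwargs to pipeline() together with result_tracker='neptune'.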

Reusing Experiments

In the Neptune web application, you’ll see that experiments are assigned an ID. This means you can re-use the same ID to group different sub-experiments together using the experiment_id keyword argument instead of experiment_name.

from pykeen.pipeline import pipeline

experiment_id = 4  # must already exist in the project, otherwise an error is raised
pipeline_result = pipeline(
    model='RotatE',
    dataset='Kinships',
    result_tracker='neptune',
    result_tracker_kwargs=dict(
        project_qualified_name='cthoyt/sandbox',
        experiment_id=experiment_id,
    ),
)

Don’t worry - you can keep using the experiment_name argument and the experiment’s identifier will be automatically looked up each time.

Adding Tags

Tags are additional information that you might want to add to the experiment and store in Neptune. Note this is different from MLflow, which considers tags as key/value pairs.

For example, if you’re using custom input, you might want to add labels indicating whether the experiment is cool or not.

from pykeen.pipeline import pipeline

data_version = ...

pipeline_result = pipeline(
    model='RotatE',
    training=...,
    testing=...,
    validation=...,
    result_tracker='neptune',
    result_tracker_kwargs=dict(
        project_qualified_name='cthoyt/sandbox',
        experiment_name='Tutorial Training of RotatE on Kinships',
        tags={'cool', 'doggo'},
    ),
)
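
Because Neptune tags are plain labels rather than key/value pairs, metadata that is naturally keyed (such as a data version) has to be flattened into strings before it can be used as tags. The sketch below shows one possible convention; the helper as_neptune_tags and the "key:value" format are assumptions made for illustration, not something PyKEEN or Neptune prescribes.

```python
def as_neptune_tags(metadata):
    """Flatten a key/value mapping into a set of 'key:value' tag strings.

    Hypothetical helper; the 'key:value' convention is an assumption.
    """
    return {f"{key}:{value}" for key, value in metadata.items()}


# Combine plain labels with flattened key/value metadata.
tags = {'cool', 'doggo'} | as_neptune_tags({'data_version': 'v1'})
```

The resulting set can be passed as the tags entry of result_tracker_kwargs in the same way as the literal set in the example above.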

Additional documentation of the valid keyword arguments can be found under pykeen.trackers.NeptuneResultTracker.