Using MLflow

MLflow is a graphical tool for tracking the results of machine learning. PyKEEN integrates MLflow into the pipeline and HPO pipeline.

To use it, you’ll first have to install MLflow with pip install mlflow and run it in the background with mlflow ui. More information can be found on the MLflow Quickstart. It’ll be running at http://localhost:5000 by default.

Pipeline Example

This example shows using MLflow with the pykeen.pipeline.pipeline() function. Minimally, the tracking_uri and experiment_name are required in the result_tracker_kwargs.

from pykeen.pipeline import pipeline

pipeline_result = pipeline(
    model='RotatE',
    dataset='Kinships',
    result_tracker='mlflow',
    result_tracker_kwargs=dict(
        tracking_uri='http://localhost:5000',
        experiment_name='Tutorial Training of RotatE on Kinships',
    ),
)

If you navigate to the MLflow UI at http://localhost:5000, you’ll see the experiment appeared in the left column.

If you click on the experiment, you’ll see this:

HPO Example

This example shows using MLflow with the pykeen.hpo.hpo_pipeline() function.

from pykeen.hpo import hpo_pipeline

pipeline_result = hpo_pipeline(
    model='RotatE',
    dataset='Kinships',
    result_tracker='mlflow',
    result_tracker_kwargs=dict(
        tracking_uri='http://localhost:5000',
        experiment_name='Tutorial HPO Training of RotatE on Kinships',
    ),
)

The same navigation through MLflow can be done for this example.

Reusing Experiments

In the MLflow UI, you’ll see that experiments are assigned an ID. This means you can re-use the same ID to group different sub-experiments together using the experiment_id keyword argument instead of experiment_name.

from pykeen.pipeline import pipeline

experiment_id = 4  # if doesn't already exist, will throw an error!
pipeline_result = pipeline(
    model='RotatE',
    dataset='Kinships',
    result_tracker='mlflow'
    result_tracker_kwargs=dict(
        tracking_uri='http://localhost:5000',
        experiment_id=4,
    ),
)

Adding Tags

Tags are additional key/value information that you might want to add to the experiment and store in MLflow. By default, MLflow adds the tags listed on https://www.mlflow.org/docs/latest/tracking.html#id41.

For example, if you’re using custom input, you might want to add which version of the input file produced the results as follows:

from pykeen.pipeline import pipeline

data_version = ...

pipeline_result = pipeline(
    model='RotatE',
    training=...,
    testing=...,
    validation=...,
    result_tracker='mlflow',
    result_tracker_kwargs=dict(
        tracking_uri='http://localhost:5000',
        experiment_name='Tutorial Training of RotatE on Kinships',
        tags={
            "data_version": md5_hash,
        },
    ),
)

Additional documentation of the valid keyword arguments can be found under pykeen.trackers.MLFlowResultTracker.