Using MLflow
MLflow is a graphical tool for tracking the results of machine learning. PyKEEN integrates MLflow into the pipeline and HPO pipeline.
To use it, you’ll first have to install MLflow with pip install mlflow and run it in the background with mlflow
ui. More information can be found on the MLflow Quickstart. It’ll
be running at http://localhost:5000 by default.
Pipeline Example
This example shows using MLflow with the pykeen.pipeline.pipeline() function. Minimally, the tracking_uri and
experiment_name are required in the result_tracker_kwargs.
from pykeen.pipeline import pipeline
pipeline_result = pipeline(
model="RotatE",
dataset="Kinships",
result_tracker="mlflow",
result_tracker_kwargs=dict(
tracking_uri="http://localhost:5000",
experiment_name="Tutorial Training of RotatE on Kinships",
),
)
If you navigate to the MLflow UI at http://localhost:5000, you’ll see the experiment appeared in the left column.
If you click on the experiment, you’ll see this:
HPO Example
This example shows using MLflow with the pykeen.hpo.hpo_pipeline() function.
from pykeen.hpo import hpo_pipeline
pipeline_result = hpo_pipeline(
model="RotatE",
dataset="Kinships",
result_tracker="mlflow",
result_tracker_kwargs=dict(
tracking_uri="http://localhost:5000",
experiment_name="Tutorial HPO Training of RotatE on Kinships",
),
)
The same navigation through MLflow can be done for this example.
Reusing Experiments
In the MLflow UI, you’ll see that experiments are assigned an ID. This means you can re-use the same ID to group
different sub-experiments together using the experiment_id keyword argument instead of experiment_name.
from pykeen.pipeline import pipeline
experiment_id = 4 # if doesn't already exist, will throw an error!
pipeline_result = pipeline(
model='RotatE',
dataset='Kinships',
result_tracker='mlflow'
result_tracker_kwargs=dict(
tracking_uri='http://localhost:5000',
experiment_id=4,
),
)