dataset_similarity

dataset_similarity(a, b, metric=None)

Calculate the similarity between two datasets.

Parameters

a (Dataset) – The reference dataset
b (Dataset) – The target dataset
metric (Optional[str]) – The similarity metric to use. Defaults to tanimoto when None. Can be either a symmetric or an asymmetric metric.
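The distinction between symmetric and asymmetric metrics can be illustrated with a small sketch. It assumes that each dataset is a non-empty list of binary fingerprints, each stored as the set of its "on" bit indices. The helpers `tanimoto`, `mean_nearest_neighbor_similarity` and `symmetric_similarity` are hypothetical and are not part of this API. The asymmetric variant averages, over each item in the target, its best Tanimoto match in the reference. The symmetric variant averages the two directions.

```python
from typing import FrozenSet, Sequence

# Hypothetical representation (assumption): a fingerprint is the set of its "on" bits,
# and a dataset is a non-empty sequence of fingerprints.
Fingerprint = FrozenSet[int]


def tanimoto(x: Fingerprint, y: Fingerprint) -> float:
    """Tanimoto coefficient |x & y| / |x | y| between two binary fingerprints."""
    union = len(x | y)
    if union == 0:
        return 1.0  # treat two all-zero fingerprints as identical
    return len(x & y) / union


def mean_nearest_neighbor_similarity(
    reference: Sequence[Fingerprint], target: Sequence[Fingerprint]
) -> float:
    """Asymmetric: best match in `reference` for each target item, averaged over `target`."""
    return sum(max(tanimoto(t, r) for r in reference) for t in target) / len(target)


def symmetric_similarity(a: Sequence[Fingerprint], b: Sequence[Fingerprint]) -> float:
    """Symmetric: mean of both directions, so swapping a and b gives the same score."""
    return 0.5 * (
        mean_nearest_neighbor_similarity(a, b) + mean_nearest_neighbor_similarity(b, a)
    )


a = [frozenset({1, 2, 3}), frozenset({4, 5})]
b = [frozenset({1, 2, 3})]
print(mean_nearest_neighbor_similarity(a, b))  # 1.0: every item in b has an exact match in a
print(mean_nearest_neighbor_similarity(b, a))  # 0.5: {4, 5} has no overlap with b
print(symmetric_similarity(a, b))              # 0.75
```

With an asymmetric metric, the argument order matters. This is why the parameters distinguish the reference dataset `a` from the target dataset `b`.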

Return type

float

Returns

A scalar value in the range [0, 1]. Values closer to 1 indicate that the datasets are more similar under the chosen metric.

Raises

ValueError – If an invalid metric type is passed. Currently, only tanimoto is supported, but this may change in later versions.
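The documented contract can be sketched as a small registry lookup. Here, `metric=None` falls back to tanimoto, and an unknown name raises `ValueError`. The names `dataset_similarity_sketch`, `_METRICS` and `_tanimoto_dataset_similarity` are hypothetical. The way datasets are aggregated is also an assumption: this sketch computes Tanimoto on the union of on-bits of each dataset. It is not necessarily how the real function computes its score.

```python
from typing import Callable, Dict, FrozenSet, Optional, Sequence

Dataset = Sequence[FrozenSet[int]]  # assumption: list of fingerprints as sets of on-bits


def _tanimoto_dataset_similarity(a: Dataset, b: Dataset) -> float:
    """Illustrative only: Tanimoto between the pooled on-bits of each dataset."""
    pooled_a = frozenset().union(*a)
    pooled_b = frozenset().union(*b)
    total = len(pooled_a | pooled_b)
    return 1.0 if total == 0 else len(pooled_a & pooled_b) / total


# Hypothetical registry: adding a metric later only requires a new entry here.
_METRICS: Dict[str, Callable[[Dataset, Dataset], float]] = {
    "tanimoto": _tanimoto_dataset_similarity,
}


def dataset_similarity_sketch(a: Dataset, b: Dataset, metric: Optional[str] = None) -> float:
    name = "tanimoto" if metric is None else metric  # documented default
    if name not in _METRICS:
        raise ValueError(f"Unknown metric {name!r}; expected one of {sorted(_METRICS)}")
    return _METRICS[name](a, b)


a = [frozenset({1, 2}), frozenset({3})]
b = [frozenset({2, 3, 4})]
print(dataset_similarity_sketch(a, b))  # 0.5: pooled bits {1, 2, 3} vs {2, 3, 4}
try:
    dataset_similarity_sketch(a, b, metric="cosine")
except ValueError as err:
    print(err)
```

With a registry, the error message can list the supported metrics. The check also stays correct as more metrics are added.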