dataset_similarity

dataset_similarity(a: Dataset, b: Dataset, metric: str | None = None) -> float

Calculate the similarity between two datasets.

Parameters:
  • a (Dataset) – The reference dataset.

  • b (Dataset) – The target dataset.

  • metric (str | None) – The similarity metric to use. If None, defaults to tanimoto. The metric can be either symmetric or asymmetric; for an asymmetric metric, the order of a and b affects the result.

Returns:

A scalar between 0 and 1, where values closer to 1 indicate that the datasets are more similar under the chosen metric.
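To see why the default metric stays within [0, 1], here is the Tanimoto coefficient written out. This assumes each dataset can be reduced to a set of features A and B (for example, fingerprint bits); how Dataset is actually represented internally is not stated in this reference.

```latex
T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|},
\qquad 0 \le |A \cap B| \le |A \cup B| \;\Rightarrow\; 0 \le T(A, B) \le 1
```

T(A, B) = T(B, A), so Tanimoto is a symmetric metric, and it reaches 1 exactly when the two non-empty feature sets are identical.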

Raises:

ValueError – If an unsupported metric is passed. Currently only tanimoto is supported, but more metrics may be added in later versions.

Return type:

float