Integrate with Annoy

Comet integrates with Annoy.

Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are [mmapped](https://en.wikipedia.org/wiki/Mmap) into memory so that many processes may share the same data.
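To make "points that are close to a given query point" concrete, here is a minimal pure-Python sketch of an exact, brute-force search. It does not use Annoy and is not how Annoy works internally; Annoy approximates this kind of result with tree-based indexes. The sketch assumes the definition of the "angular" metric given in the Annoy README, the Euclidean distance between normalized vectors, sqrt(2(1 - cos(u, v))). The names `angular_distance` and `exact_nearest` are hypothetical helpers chosen for illustration.

```python
import math


def angular_distance(u, v):
    """Angular distance as described in the Annoy README: sqrt(2 * (1 - cos(u, v)))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos = max(-1.0, min(1.0, cos))
    return math.sqrt(2.0 * (1.0 - cos))


def exact_nearest(points, query, n):
    """Return the indices of the n points closest to query (exact, O(len(points)))."""
    ranked = sorted(
        range(len(points)),
        key=lambda i: angular_distance(points[i], query),
    )
    return ranked[:n]


# Hypothetical toy data: four 2-dimensional vectors and one query.
points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
nearest = exact_nearest(points, [0.9, 0.1], 2)
print(nearest)
```

A scan like this compares the query against every point, which becomes slow for large collections. That cost is what an approximate index is meant to avoid.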

Configure Comet for Annoy

When using Comet with Annoy, no additional data is logged automatically. You log the index parameters and the saved index file explicitly, as in the example below.

End-to-end example

import random

import comet_ml

from annoy import AnnoyIndex

comet_ml.init()

experiment = comet_ml.Experiment()

# Annoy hyper-parameters
f = 40  # Length of item vector that will be indexed
metric = "angular"
seed = 42
output_file = "test.ann"

# Create and fill Annoy Index
t = AnnoyIndex(f, metric)
t.set_seed(seed)

for i in range(1000):
    v = [random.gauss(0, 1) for _ in range(f)]
    t.add_item(i, v)

t.build(10)  # 10 trees

t.save(output_file)

# Comet logging
index_metadata = {
    "f": f,
    "metric": metric,
    "n_items": t.get_n_items(),
    "n_trees": t.get_n_trees(),
    "seed": seed,
}

experiment.log_parameters(index_metadata, prefix="annoy_index_1")

experiment.log_asset(output_file, metadata=index_metadata)

This example will log the following hyperparameters:

  • annoy_index_1_f: The length of each item vector that will be indexed
  • annoy_index_1_metric: The distance metric used to create the Annoy index
  • annoy_index_1_n_items: The number of items in the index
  • annoy_index_1_n_trees: The number of trees in the index
  • annoy_index_1_seed: The random number generator seed
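The names in the list above follow a simple pattern: the prefix passed to log_parameters, an underscore, then the dictionary key. The sketch below only reproduces that pattern for illustration. `prefixed_names` is a hypothetical helper, not part of comet_ml, and the underscore join is an assumption read off the names listed above rather than taken from Comet's implementation.

```python
def prefixed_names(parameters, prefix):
    """Build "<prefix>_<key>" names, mirroring the pattern in the list above.

    Hypothetical helper for illustration; it does not call comet_ml.
    """
    return {f"{prefix}_{name}": value for name, value in parameters.items()}


# A subset of the example's parameters, with the values used in the script.
example = prefixed_names({"f": 40, "n_trees": 10, "seed": 42}, "annoy_index_1")

for name, value in example.items():
    print(name, value)
```

Giving each index its own prefix (for example a hypothetical "annoy_index_2" for a second index) keeps the parameters of several indexes apart within one Experiment.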

An asset named test.ann will be logged to the Experiment. It contains the index content saved as a file. All of the hyperparameters are also saved as JSON metadata on that asset.

Mar. 27, 2024