Comet Embedding Projector

Embedding Visualization

The word "embedding" has been used in a variety of ways in Machine Learning. Here, we mean something very general: a vector representation of a category, where a category can be almost anything (a number, string, label, or description). We'll call that category a "label".

Our main goal is to visualize each of these vectors in a 2D or 3D plot, where each vector is a point in the space and each point is associated with its label. The example below takes the MNIST digit vectors (28 x 28 pixels each, so 784 dimensions), maps them to 3D using Principal Component Analysis (PCA), and shows an image at each point based on the associated label:

Embeddings Asset

In the above image, note that each vector's image has been color coded so that every digit class has its own color. We'll see how to do that easily below.

In NLP, an embedding layer (and its associated output vector) is often a representation that has been reduced in dimensions. However, for our uses here it does not matter whether the embedding has 3 dimensions or 1,000 (or more). For embeddings with more than three dimensions, we will use a dimensionality reduction algorithm (such as PCA, UMAP, or t-SNE) to visualize them in 3D.
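To make the idea of dimensionality reduction concrete, here is a minimal NumPy sketch of PCA that projects high-dimensional vectors down to three coordinates. It illustrates the general technique only; it is not the projector's internal implementation, and `pca_project` and the random sample data are hypothetical names and values chosen for this example.

```python
import numpy as np


def pca_project(vectors, n_components=3):
    """Project row vectors onto their top principal components."""
    data = np.asarray(vectors, dtype=float)
    # Center each dimension so the components describe variance, not offset.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# 100 hypothetical 50-dimensional vectors, reduced to 3D points.
rng = np.random.default_rng(0)
high_dim = rng.normal(size=(100, 50))
points_3d = pca_project(high_dim)
print(points_3d.shape)
```

Each row of `points_3d` is the position of one vector in the reduced space, which is the kind of point a 3D plot would display.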

Logging Embeddings

The simplest method of creating and logging an embedding is just to pass Experiment.log_embedding() a list of vectors (e.g., a matrix, with one row for each item), and a list/array of labels (one number or string per item).

experiment.log_embedding(vectors, labels)

For example, consider these four points in four dimensions:

from comet_ml import Experiment

experiment = Experiment()

vectors = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
labels = ["one", "two", "three", "four"]

experiment.log_embedding(vectors, labels)

When you run this script, you should see something similar to the following in the Assets tab:

Embeddings Asset

By using the Embedding Projector Panel (available under Public Panels in the Panels Gallery) or by selecting Open in the Assets Tab next to the embedding template name, you can then launch the Embedding Projector:

Embeddings Example

These are the four points shown in three dimensions.
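Because each vector must be paired with exactly one label, it can help to check the inputs before logging them. The following is a small sketch of such a check; `check_embedding_inputs` is a hypothetical helper written for this example and is not part of the Comet SDK.

```python
def check_embedding_inputs(vectors, labels):
    """Return (count, dimensions) or raise ValueError if the inputs do not line up."""
    if len(vectors) != len(labels):
        raise ValueError(
            "expected one label per vector, got %d vectors and %d labels"
            % (len(vectors), len(labels))
        )
    # Every vector should have the same number of dimensions.
    dimensions = {len(vector) for vector in vectors}
    if len(dimensions) > 1:
        raise ValueError("vectors have mixed lengths: %s" % sorted(dimensions))
    return len(vectors), (dimensions.pop() if dimensions else 0)


vectors = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
labels = ["one", "two", "three", "four"]

count, dims = check_embedding_inputs(vectors, labels)
print(count, dims)
```

Running a check like this before `experiment.log_embedding(vectors, labels)` surfaces a mismatch locally rather than in the projector.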

Groups of Embeddings

You can include more than one visualization in each logged embedding by passing a group name and a title (so that you can easily distinguish among the different embeddings). Consider:

for epoch in range(max_epoch):
    model.train()
    experiment.log_embedding(
        vectors,
        labels,
        title="epoch-%s" % epoch,
        group="hidden-layer-1",
    )

Here, the vectors logged at each epoch could be, for example, the output activations of a particular hidden layer.
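The same pattern extends to several layers, with one group per layer and one title per epoch. The sketch below shows which `title` and `group` values such a loop would produce. `RecordingExperiment` is a hypothetical stand-in that only records calls so the example runs without a Comet account, and `log_layer_embeddings` and the activation values are likewise invented for illustration.

```python
class RecordingExperiment:
    """Hypothetical stand-in for an Experiment; it only records keyword arguments."""

    def __init__(self):
        self.calls = []

    def log_embedding(self, vectors, labels, **kwargs):
        self.calls.append(kwargs)


def log_layer_embeddings(experiment, activations_by_layer, labels, epoch):
    """Log one embedding per layer, grouped by layer name and titled by epoch."""
    for layer_name, vectors in activations_by_layer.items():
        experiment.log_embedding(
            vectors,
            labels,
            title="epoch-%s" % epoch,
            group=layer_name,
        )


recorder = RecordingExperiment()
labels = ["one", "two"]
for epoch in range(3):
    # Made-up activations; in practice these would come from the model.
    activations = {
        "hidden-layer-1": [[0.1 * epoch, 0.2], [0.3, 0.4]],
        "hidden-layer-2": [[0.5, 0.6 * epoch], [0.7, 0.8]],
    }
    log_layer_embeddings(recorder, activations, labels, epoch)

for call in recorder.calls:
    print(call["group"], call["title"])
```

With a real experiment object in place of the recorder, each group would collect one embedding per epoch.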

Embedding Images

Each embedding can also include a set of images, one for each vector. You can either create the images automatically with each logged embedding:

experiment.log_embedding(
    vectors,
    labels,
    image_data,
    image_size)

For example, to continue with the above example, we make little 2x2 images for each vector:

image_data = [
    [[255, 0],
     [0,   0]],
    [[0, 255],
     [0,   0]],
    [[0,   0],
     [255, 0]],
    [[0,   0],
     [0, 255]],
]
image_size = (2, 2)

# Log the images with the vectors:

experiment.log_embedding(
    vectors,
    labels,
    image_data,
    image_size)
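The 2x2 images above follow directly from the vectors, so they can also be generated rather than typed by hand. This is a minimal sketch; `vector_to_image` is a hypothetical helper, and it assumes the size tuple is read as (width, height), which makes no difference for the square images used here.

```python
def vector_to_image(vector, size, on_value=255):
    """Reshape a flat vector into rows of pixels; non-zero entries become on_value."""
    # Assumption: size is (width, height). For square images the order is irrelevant.
    width, height = size
    if len(vector) != width * height:
        raise ValueError("vector length does not match image size")
    pixels = [on_value if value else 0 for value in vector]
    return [pixels[row * width:(row + 1) * width] for row in range(height)]


vectors = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
image_size = (2, 2)
image_data = [vector_to_image(vector, image_size) for vector in vectors]
print(image_data)
```

The resulting `image_data` matches the hand-written list, with one bright pixel per image marking the position of the non-zero entry.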

Or, you can create the images once and use the resulting image_url wherever image_data is used:

image, image_url = experiment.create_embedding_image(image_data, image_size)

For example, the image_url can be used as the image_data in subsequent loggings, like:

experiment.log_embedding(
    vectors,
    labels,
    image_data=image_url,
    image_size=image_size)
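Creating the image once pays off when the same labels are logged repeatedly, for example once per epoch. The sketch below combines `create_embedding_image` with the `title` and `group` arguments shown earlier. `RecordingExperiment` is a hypothetical stand-in that returns a placeholder URL so the example runs offline, and `log_with_shared_image` is a helper invented for this illustration.

```python
class RecordingExperiment:
    """Hypothetical stand-in for an Experiment; it counts and records calls."""

    def __init__(self):
        self.images_created = 0
        self.logged = []

    def create_embedding_image(self, image_data, image_size):
        self.images_created += 1
        return None, "placeholder://embedding-image"

    def log_embedding(self, vectors, labels, image_data=None, image_size=None, **kwargs):
        self.logged.append((kwargs.get("title"), image_data))


def log_with_shared_image(experiment, embeddings, labels, image_data, image_size, group):
    """Build the image once, then reference its URL from every logged embedding."""
    _image, image_url = experiment.create_embedding_image(image_data, image_size)
    for title, vectors in embeddings:
        experiment.log_embedding(
            vectors,
            labels,
            image_data=image_url,
            image_size=image_size,
            title=title,
            group=group,
        )
    return image_url


recorder = RecordingExperiment()
labels = ["one", "two"]
image_data = [[[255, 0], [0, 0]], [[0, 255], [0, 0]]]
embeddings = [("epoch-%s" % epoch, [[epoch, 0], [0, epoch]]) for epoch in range(3)]
url = log_with_shared_image(recorder, embeddings, labels, image_data, (2, 2), "hidden-layer-1")
print(recorder.images_created, len(recorder.logged))
```

The image is built a single time, while every logged embedding points at the same URL.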

Advanced Embedding Image Options

You can supply the following three options when creating an embedding image:

  • image_preprocess_function: if image_data is an array, apply this function to each element first (numpy functions ok)
  • image_transparent_color: a (red, green, blue) tuple that indicates the color to be transparent
  • image_background_color_function: a function that takes an index, and returns a (red, green, blue) color tuple to be used as background color
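To see what an `image_preprocess_function` can do to pixel values, the following sketch applies a round-then-scale function to a few sample values in the range 0 to 1. The name `binarize_and_scale` and the sample array are assumptions made for this example; how the resulting values are rendered is up to the library.

```python
import numpy as np


def binarize_and_scale(matrix):
    """Round each pixel to 0 or 1, then multiply by 2."""
    return np.round(matrix, 0) * 2


# Hypothetical normalized pixel values.
sample = np.array([0.0, 0.2, 0.7, 1.0])
result = binarize_and_scale(sample)
print(result.tolist())
```

Values below one half collapse to 0 and the rest become 2, so the function reduces each image to two levels.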

For instance, consider the standard MNIST dataset, loaded as follows:

import keras
from keras.datasets import mnist
import numpy as np

def get_dataset():
    num_classes = 10

    # the data, shuffled and split between train and test sets
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_train = x_train.reshape(60000, 784)
    x_test = x_test.reshape(10000, 784)
    x_train = x_train.astype("float32")
    x_test = x_test.astype("float32")
    x_train /= 255
    x_test /= 255
    print(x_train.shape[0], "train samples")
    print(x_test.shape[0], "test samples")

    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)

    return x_train, y_train, x_test, y_test

x_train, y_train, x_test, y_test = get_dataset()

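def label_to_color(index):
    # labels are one-hot encoded (see to_categorical above),
    # so recover the digit with argmax before comparing
    label = int(np.argmax(labels[index]))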
    if label == 0:
        return (255, 0, 0)
    elif label == 1:
        return (0, 255, 0)
    elif label == 2:
        return (0, 0, 255)
    elif label == 3:
        return (255, 255, 0)
    elif label == 4:
        return (0, 255, 255)
    elif label == 5:
        return (128, 128, 0)
    elif label == 6:
        return (0, 128, 128)
    elif label == 7:
        return (128, 0, 128)
    elif label == 8:
        return (255, 0, 255)
    elif label == 9:
        return (255, 255, 255)

image_data = x_test
labels = y_test
image_size = (28, 28)

# And use the optional arguments:

image, image_url = experiment.create_embedding_image(
    image_data,
    image_size,
    # we round the pixels to 0 and 1, and multiply by 2
    # to keep the non-zero colors dark (if they were 1, they
    # would get distributed between 0 and 255):
    image_preprocess_function=lambda matrix: np.round(matrix, 0) * 2,
    # Set the transparent color:
    image_transparent_color=(0, 0, 0),
    # Fill in the transparent color with a background color:
    image_background_color_function=label_to_color,
)

That will create this image, which you can then use for all of the embeddings with these labels:

Embeddings Sprite Sheet
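As a more compact alternative to the if/elif chain above, the background color function can be built from a lookup table. This is a sketch under stated assumptions: `PALETTE` and `make_label_to_color` are hypothetical names, the colors are copied from the example above, and the function accepts either integer class labels or one-hot rows such as those produced by `to_categorical`.

```python
# One (red, green, blue) tuple per digit, in the same order as the chain above.
PALETTE = [
    (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (0, 255, 255),
    (128, 128, 0), (0, 128, 128), (128, 0, 128), (255, 0, 255), (255, 255, 255),
]


def make_label_to_color(labels, palette=PALETTE):
    """Return a function mapping an item index to its background color."""

    def label_to_color(index):
        label = labels[index]
        # A one-hot row is converted to the position of its largest entry.
        if hasattr(label, "__len__"):
            label = max(range(len(label)), key=lambda i: label[i])
        return palette[int(label)]

    return label_to_color


integer_labels = [0, 3, 9]
one_hot_labels = [[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]]

color_for_integers = make_label_to_color(integer_labels)
color_for_one_hot = make_label_to_color(one_hot_labels)
print(color_for_integers(1), color_for_one_hot(0))
```

The returned function takes an index and returns a color tuple, which is the shape expected of `image_background_color_function`.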

Mar. 27, 2024