
Integrate with Hugging Face Transformers

Comet integrates with Hugging Face Transformers.

The Transformers library provides general-purpose machine learning models for Natural Language Processing (NLP). It gives you easy access to pre-trained model weights and interoperability between PyTorch and TensorFlow.


Log automatically

By integrating with Hugging Face's Trainer object, Comet automatically logs the following items:

  • Metrics (such as loss and accuracy)
  • Hyperparameters
  • Assets (such as checkpoints and log files)
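For example, here is a minimal sketch of what this looks like in practice. It assumes comet_ml is installed and configured, and that model, train_dataset, and eval_dataset are already defined (as they are in the end-to-end example below); no Comet-specific logging code is needed beyond the import:

# Minimal sketch: with comet_ml installed and configured, the Trainer's Comet
# integration logs metrics, hyperparameters, and assets automatically.
import comet_ml  # import before transformers so the integration can hook in

from transformers import Trainer, TrainingArguments

# model, train_dataset, and eval_dataset are assumed to be defined as in the
# end-to-end example below.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./results", num_train_epochs=1),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()  # training metrics and checkpoints are sent to Comet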

Configure Comet for Hugging Face

To enable Comet's logging functionality for Hugging Face, set the following environment variables:

Environment Setting | Description
--- | ---
COMET_MODE | Creates either an online or offline Experiment, or disables Comet logging. Set to ONLINE, OFFLINE, or DISABLE.
COMET_LOG_ASSET | Logs training checkpoints and log files created during a Transformer training run. Set to either True or False.

For more information about using environment parameters in Comet, see Configure Comet.
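For example, you can set these variables from Python before training starts. This is a minimal sketch using the standard os.environ approach:

import os

# Create an online Experiment and upload checkpoints and log files to Comet
os.environ["COMET_MODE"] = "ONLINE"
os.environ["COMET_LOG_ASSET"] = "True"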

End-to-end example

# Import comet_ml before transformers so the Comet integration can hook in
import comet_ml

from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

PRE_TRAINED_MODEL_NAME = "distilbert-base-uncased"

raw_datasets = load_dataset("imdb")

tokenizer = AutoTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Use small, shuffled subsets so the example runs quickly
train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))

# IMDB is a binary sentiment task, so the classification head has two labels;
# the model must exist before it is passed to the Trainer below
model = AutoModelForSequenceClassification.from_pretrained(
    PRE_TRAINED_MODEL_NAME, num_labels=2
)

def get_example(index):
    # Map a confusion-matrix cell index back to the raw evaluation text
    return eval_dataset[index]["text"]

def compute_metrics(pred):
    experiment = comet_ml.get_global_experiment()

    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro"
    )
    acc = accuracy_score(labels, preds)

    if experiment:
        # Log a confusion matrix to Comet for each evaluation pass
        epoch = int(experiment.curr_epoch) if experiment.curr_epoch is not None else 0
        experiment.set_epoch(epoch)
        experiment.log_confusion_matrix(
            y_true=labels,
            y_predicted=preds,
            file_name=f"confusion-matrix-epoch-{epoch}.json",
            labels=["negative", "positive"],
            index_to_example_function=get_example,
        )

    return {
        "accuracy": acc,
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }

training_args = TrainingArguments(
    seed=42,
    output_dir="./results",
    overwrite_output_dir=True,
    num_train_epochs=1,
    eval_steps=100,
    evaluation_strategy="steps",
    save_total_limit=10,
    save_steps=100,
    do_train=True,
    do_eval=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    data_collator=data_collator,
)
trainer.train()

Try it out!

Here's an example of using Comet with Hugging Face Transformers.

Open In Colab

Apr. 20, 2022