Alerts | Opik Documentation

Alerts allow you to configure automated webhook notifications for important events in your Opik workspace. When specific events occur — such as trace errors, new feedback scores, or prompt changes — Opik sends HTTP POST requests to your configured endpoint with detailed event data.

Creating an alert

Prerequisites

Access to the Opik Configuration page
A webhook endpoint that can receive HTTP POST requests
(Optional) An HTTPS endpoint with a valid SSL certificate for production use

Step-by-step guide

1. Navigate to Alerts
   - Go to Configuration → Alerts tab
   - Click “Create new alert” button
2. Configure basic settings
   - Name: Give your alert a descriptive name (e.g., “Production Errors Slack”)
   - Enable alert: Toggle on to activate the alert immediately
3. Configure webhook settings
   - Endpoint URL: Enter your webhook URL (must start with http:// or https://)
     - Example: https://hooks.slack.com/services/
4. Advanced webhook settings (optional)
   - Secret token: Add a secret token to verify webhook authenticity
   - Custom headers: Add HTTP headers for authentication or routing
     - Example: X-Custom-Auth: Bearer your-token-here
5. Add triggers
   - Click “Add trigger” to select event types
   - Choose one or more event types from the list
   - Configure project scope for observability events (optional)
6. Test your configuration
   - Click “Test connection” to send a sample webhook
   - Verify your endpoint receives the test payload
   - Check the response status in the Opik UI
7. Create the alert
   - Click “Create alert” to save your configuration
   - The alert will start monitoring events immediately
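
To check that an endpoint receives the test payload, a throwaway receiver can be enough. The following is a minimal sketch using only the Python standard library. The `summarize`, `LoggingHandler`, and `run` names and the port are illustrative choices, and it assumes the test payload carries the same `eventType` and `id` fields as regular events, which this page does not state explicitly.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(body: bytes) -> str:
    """Return a one-line summary of a webhook body, tolerating non-JSON input."""
    try:
        data = json.loads(body)
    except ValueError:
        return "non-JSON body received"
    if not isinstance(data, dict):
        return "JSON body is not an object"
    return f"eventType={data.get('eventType')} id={data.get('id')}"


class LoggingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(summarize(body))
        # Any 2xx response signals a successful delivery.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"status": "ok"}')


def run(port: int = 8080) -> None:
    """Serve until interrupted; the port is an arbitrary choice."""
    HTTPServer(("", port), LoggingHandler).serve_forever()
```

Calling `run()` starts the receiver. Opik must be able to reach it over the network, so a local machine usually needs a tunnel or a publicly routable host.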

Integration examples

Slack integration

Send alerts to a Slack channel using Slack’s Incoming Webhooks:

Create a Slack app and enable Incoming Webhooks
Create a webhook URL (e.g., https://hooks.slack.com/services/T00000000/B00000000/XXXX)
In Opik, create an alert with your Slack webhook URL
Format the payload (Slack will display JSON by default)

For better formatting, create a middleware service that transforms Opik’s payload into Slack’s Block Kit format:

import requests
from flask import Flask, request

app = Flask(__name__)

# Placeholder: replace with your own Slack Incoming Webhook URL
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T00000000/B00000000/XXXX"

def transform_to_slack(opik_payload):
    """Convert an Opik webhook payload into a Slack Block Kit message."""
    event_type = opik_payload.get('eventType')
    alert_name = opik_payload['payload']['alertName']
    event_count = opik_payload['payload']['eventCount']

    return {
        "blocks": [
            {
                "type": "header",
                "text": {
                    "type": "plain_text",
                    "text": f"🚨 {alert_name}"
                }
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*{event_count}* new `{event_type}` events"
                }
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": "View in Opik: https://www.comet.com/opik"
                }
            }
        ]
    }

@app.route('/opik-to-slack', methods=['POST'])
def opik_to_slack():
    opik_data = request.json
    slack_payload = transform_to_slack(opik_data)

    # Forward to Slack; the timeout keeps a slow Slack response from hanging the handler
    requests.post(
        SLACK_WEBHOOK_URL,
        json=slack_payload,
        timeout=10
    )

    return {'status': 'success'}, 200

PagerDuty integration

Send critical alerts to PagerDuty for on-call incident management:

import requests
from flask import Flask, request

app = Flask(__name__)

PAGERDUTY_ROUTING_KEY = "your-routing-key"
PAGERDUTY_URL = "https://events.pagerduty.com/v2/enqueue"

@app.route('/opik-to-pagerduty', methods=['POST'])
def opik_to_pagerduty():
    data = request.json
    event_type = data.get('eventType')

    # Only send critical errors to PagerDuty
    if event_type != 'trace:errors':
        return {'status': 'ignored'}, 200

    payload = data['payload']
    errors = payload.get('metadata', [])

    # Create PagerDuty event
    pagerduty_payload = {
        "routing_key": PAGERDUTY_ROUTING_KEY,
        "event_action": "trigger",
        "payload": {
            "summary": f"{payload['alertName']}: {len(errors)} errors",
            "severity": "error",
            "source": "opik",
            "custom_details": {
                "event_count": payload['eventCount'],
                "errors": errors[:5]  # First 5 errors
            }
        }
    }

    # Forward to PagerDuty; the timeout keeps a slow response from hanging the handler
    requests.post(PAGERDUTY_URL, json=pagerduty_payload, timeout=10)
    return {'status': 'success'}, 200

Using no-code automation platforms

No-code automation tools like n8n, Make.com, and IFTTT provide an easy way to connect Opik alerts to other services—without writing or deploying code. These platforms can receive webhooks from Opik, apply filters or conditions, and trigger actions such as sending Slack messages, logging data in Google Sheets, or creating incidents in PagerDuty.

To use them:

Create a new workflow or scenario and add a Webhook trigger node/module
Copy the webhook URL generated by the platform and paste it into your Opik alert configuration
Secure the connection by validating the Authorization header or including a secret token parameter
Add filters or routing logic to handle different eventType values from Opik (for example, trace:errors or trace:feedback_score)
Chain the desired actions, such as notifications, database updates, or analytics tracking

These tools also provide built-in monitoring, retries, and visual flow editors, making them suitable for both technical and non-technical users who want to automate Opik alert handling securely and efficiently.

Custom dashboard integration

Build a custom monitoring dashboard that receives alerts:

from collections import Counter
from datetime import datetime, timezone

from fastapi import FastAPI, Request

app = FastAPI()

# In-memory storage (use a database in production)
alert_history = []

def group_by_type(alerts):
    """Count stored alerts per event type."""
    return dict(Counter(alert['event_type'] for alert in alerts))

@app.post("/webhook")
async def receive_webhook(request: Request):
    data = await request.json()

    # Store alert
    alert_history.append({
        'timestamp': datetime.now(timezone.utc),
        'event_type': data.get('eventType'),
        'alert_name': data['payload']['alertName'],
        'event_count': data['payload']['eventCount'],
        'data': data
    })

    # Keep only last 1000 alerts
    if len(alert_history) > 1000:
        alert_history.pop(0)

    return {"status": "success"}

@app.get("/dashboard")
async def get_dashboard():
    # Return aggregated statistics
    return {
        'total_alerts': len(alert_history),
        'by_type': group_by_type(alert_history),
        'recent_alerts': alert_history[-10:]
    }

Supported event types

Opik supports seven types of alert events:

Observability events

New error in trace

Event type: trace:errors
Triggered when: A trace is logged with error information
Project scope: Can be configured to specific projects
Payload: Array of trace objects with error details
Use case: Monitor production errors, debug issues in real-time

New score added to trace

Event type: trace:feedback_score
Triggered when: A feedback score is added to a trace
Project scope: Can be configured to specific projects
Payload: Array of feedback score objects
Use case: Track model performance, monitor user satisfaction

New score added to thread

Event type: trace_thread:feedback_score
Triggered when: A feedback score is added to a conversation thread
Project scope: Can be configured to specific projects
Payload: Array of thread feedback score objects
Use case: Monitor conversation quality, track multi-turn interactions

Guardrails triggered

Event type: trace:guardrails_triggered
Triggered when: A guardrail check fails for a trace
Project scope: Can be configured to specific projects
Payload: Array of guardrail result objects
Use case: Security monitoring, compliance tracking, PII detection

Prompt engineering events

New prompt added

Event type: prompt:created
Triggered when: A new prompt is created in the prompt library
Project scope: Workspace-wide
Payload: Prompt object with metadata
Use case: Track prompt library changes, audit prompt creation

New prompt version created

Event type: prompt:committed
Triggered when: A new version (commit) is added to a prompt
Project scope: Workspace-wide
Payload: Prompt version object with template and metadata
Use case: Monitor prompt iterations, track version history

Prompt deleted

Event type: prompt:deleted
Triggered when: A prompt is removed from the prompt library
Project scope: Workspace-wide
Payload: Array of deleted prompt objects
Use case: Audit prompt deletions, maintain prompt governance
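
A handler often needs to tell the two event families apart, for example to decide whether project scope applies. The sketch below groups the seven event type strings listed above. The `categorize` function and the category labels are illustrative names, not part of Opik.

```python
# Event type strings as listed in this section.
OBSERVABILITY_EVENTS = {
    "trace:errors",
    "trace:feedback_score",
    "trace_thread:feedback_score",
    "trace:guardrails_triggered",
}
PROMPT_EVENTS = {"prompt:created", "prompt:committed", "prompt:deleted"}


def categorize(event_type: str) -> str:
    """Map an event type to its family; unrecognized types yield 'unknown'."""
    if event_type in OBSERVABILITY_EVENTS:
        return "observability"
    if event_type in PROMPT_EVENTS:
        return "prompt_engineering"
    return "unknown"
```

Returning `"unknown"` rather than raising lets a handler acknowledge event types it does not recognize yet instead of failing the delivery.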

Want us to support more event types?

If you need additional event types for your use case, please create an issue on GitHub and let us know what you’d like to monitor.

Webhook payload structure

All webhook events follow a consistent payload structure:

{
  "id": "webhook-event-id",
  "eventType": "trace:errors",
  "alertId": "alert-uuid",
  "alertName": "Production Errors Alert",
  "workspaceId": "workspace-uuid",
  "createdAt": "2025-01-15T10:30:00Z",
  "payload": {
    "alertId": "alert-uuid",
    "alertName": "Production Errors Alert",
    "eventType": "trace:errors",
    "eventIds": ["event-id-1", "event-id-2"],
    "userNames": ["user@example.com"],
    "eventCount": 2,
    "aggregationType": "consolidated",
    "message": "Alert 'Production Errors Alert': 2 trace:errors events aggregated",
    "metadata": [
      {
        "id": "trace-uuid",
        "name": "handle_query",
        "project_id": "project-uuid",
        "project_name": "Demo Project",
        "start_time": "2025-01-15T10:29:45Z",
        "end_time": "2025-01-15T10:29:50Z",
        "input": {
          "query": "User question"
        },
        "output": {
          "response": "LLM response"
        },
        "error_info": {
          "exception_type": "ValidationException",
          "message": "Validation failed",
          "traceback": "Full traceback..."
        },
        "metadata": {
          "customer_id": "customer_123"
        },
        "tags": ["production"]
      }
    ]
  }
}

Payload fields

Field	Type	Description
`id`	string	Unique webhook event identifier
`eventType`	string	Type of event (e.g., `trace:errors`)
`alertId`	string (UUID)	Alert configuration identifier
`alertName`	string	Name of the alert
`workspaceId`	string	Workspace identifier
`createdAt`	string (ISO 8601)	Timestamp when webhook was created
`payload.eventIds`	array	List of aggregated event IDs
`payload.userNames`	array	Users associated with the events
`payload.eventCount`	number	Number of aggregated events
`payload.aggregationType`	string	Always “consolidated”
`payload.metadata`	array	Event-specific data (varies by event type)
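
The fields in the table can be read into a typed structure before any further handling. This is a minimal sketch: the `AlertEnvelope` dataclass and `parse_envelope` function are hypothetical names, and it assumes every field in the table is present in each delivery.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass
class AlertEnvelope:
    id: str
    event_type: str
    alert_id: str
    alert_name: str
    workspace_id: str
    created_at: datetime
    event_ids: list[str]
    user_names: list[str]
    event_count: int
    metadata: Any  # shape varies by event type


def parse_envelope(data: dict) -> AlertEnvelope:
    """Build an AlertEnvelope from a decoded webhook body."""
    inner = data["payload"]
    return AlertEnvelope(
        id=data["id"],
        event_type=data["eventType"],
        alert_id=data["alertId"],
        alert_name=data["alertName"],
        workspace_id=data["workspaceId"],
        # Replace the trailing "Z" so older Python versions can parse the timestamp.
        created_at=datetime.fromisoformat(data["createdAt"].replace("Z", "+00:00")),
        event_ids=list(inner["eventIds"]),
        user_names=list(inner["userNames"]),
        event_count=int(inner["eventCount"]),
        metadata=inner.get("metadata"),
    )
```

Parsing up front means a malformed delivery fails with a `KeyError` at one well-defined place rather than deep inside the processing logic.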

Event-specific payloads

Trace errors payload

{
  "metadata": [
    {
      "id": "trace-uuid",
      "name": "trace-name",
      "project_id": "project-uuid",
      "project_name": "Project Name",
      "start_time": "2025-01-15T10:00:00Z",
      "end_time": "2025-01-15T10:00:05Z",
      "input": { "query": "..." },
      "output": { "response": "..." },
      "error_info": {
        "exception_type": "ExceptionName",
        "message": "Error message",
        "traceback": "Full traceback"
      },
      "metadata": { "custom": "data" },
      "tags": ["tag1", "tag2"]
    }
  ]
}

Feedback score payload

{
  "metadata": [
    {
      "id": "score-uuid",
      "name": "score-name",
      "value": 0.85,
      "reason": "Explanation of the score",
      "category_name": "quality",
      "source": "sdk",
      "author": "user@example.com"
    }
  ]
}

Thread feedback score payload

{
  "metadata": [
    {
      "thread_id": "thread-uuid",
      "name": "score-name",
      "value": 0.90,
      "reason": "Explanation of the score",
      "category_name": "satisfaction",
      "source": "sdk",
      "author": "user@example.com"
    }
  ]
}

Prompt created payload

{
  "metadata": {
    "id": "prompt-uuid",
    "name": "Prompt Name",
    "description": "Prompt description",
    "tags": ["system", "assistant"],
    "created_at": "2025-01-15T10:00:00Z",
    "created_by": "user@example.com",
    "last_updated_at": "2025-01-15T10:00:00Z",
    "last_updated_by": "user@example.com"
  }
}

Prompt version created payload

{
  "metadata": {
    "id": "version-uuid",
    "prompt_id": "prompt-uuid",
    "commit": "abc12345",
    "template": "You are a helpful assistant. {{question}}",
    "type": "mustache",
    "metadata": {
      "version": "1.0",
      "model": "gpt-4"
    },
    "created_at": "2025-01-15T10:00:00Z",
    "created_by": "user@example.com"
  }
}

Prompt deleted payload

{
  "metadata": [
    {
      "id": "prompt-uuid",
      "name": "Prompt Name",
      "description": "Prompt description",
      "tags": ["deprecated"],
      "created_at": "2025-01-10T10:00:00Z",
      "created_by": "user@example.com",
      "last_updated_at": "2025-01-15T10:00:00Z",
      "last_updated_by": "user@example.com",
      "latest_version": {
        "id": "version-uuid",
        "commit": "abc12345",
        "template": "Template content",
        "type": "mustache",
        "created_at": "2025-01-15T10:00:00Z",
        "created_by": "user@example.com"
      }
    }
  ]
}

Guardrails triggered payload

{
  "metadata": [
    {
      "id": "guardrail-check-uuid",
      "entity_id": "trace-uuid",
      "project_id": "project-uuid",
      "project_name": "Project Name",
      "name": "PII",
      "result": "failed",
      "details": {
        "detected_entities": ["EMAIL", "PHONE_NUMBER"],
        "message": "PII detected in response: email and phone number"
      }
    }
  ]
}
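
In the examples above, `metadata` is an array for most event types but a single object for the prompt created and prompt version created events. A handler that iterates over `metadata` may want to normalize both shapes first. The helper below is a small sketch of that idea; `metadata_items` is a hypothetical name.

```python
def metadata_items(payload: dict) -> list:
    """Return the payload's metadata as a list, whether it arrived as an array or an object."""
    metadata = payload.get("metadata")
    if metadata is None:
        return []
    if isinstance(metadata, list):
        return metadata
    # Single-object form, as in the prompt created examples.
    return [metadata]
```

With this in place, `for item in metadata_items(data["payload"])` works the same way for every event type.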

Securing your webhooks

Using secret tokens

Add a secret token to your webhook configuration to verify that incoming requests are from Opik:

Generate a secure random token (e.g., using openssl rand -hex 32)
Add it to your alert’s “Secret token” field
Opik will send it in the Authorization header: Authorization: Bearer your-secret-token
Validate the token in your webhook handler before processing the request
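
If `openssl` is not available, the token from the first step can also be generated with Python's standard `secrets` module. In this sketch `generate_secret_token` is an illustrative helper name; 32 random bytes yield 64 hexadecimal characters, the same length as the output of `openssl rand -hex 32`.

```python
import secrets


def generate_secret_token(num_bytes: int = 32) -> str:
    """Return a cryptographically strong random token as a hex string."""
    return secrets.token_hex(num_bytes)


token = generate_secret_token()
```

Store the resulting value somewhere your webhook handler can read it, such as an environment variable, rather than hard-coding it.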

Example validation (Python/Flask)

from flask import Flask, request, abort
import hmac

app = Flask(__name__)
SECRET_TOKEN = "your-secret-token-here"

def handle_trace_errors(data):
    """Placeholder: replace with your own trace error handling."""

def handle_feedback_score(data):
    """Placeholder: replace with your own feedback score handling."""

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    # Verify the secret token
    auth_header = request.headers.get('Authorization', '')
    if not auth_header.startswith('Bearer '):
        abort(401, 'Missing or invalid Authorization header')

    token = auth_header.split(' ', 1)[1]
    # Constant-time comparison; comparing bytes avoids a TypeError on non-ASCII input
    if not hmac.compare_digest(token.encode(), SECRET_TOKEN.encode()):
        abort(401, 'Invalid secret token')

    # Process the webhook
    data = request.json
    event_type = data.get('eventType')

    # Handle different event types
    if event_type == 'trace:errors':
        handle_trace_errors(data)
    elif event_type == 'trace:feedback_score':
        handle_feedback_score(data)

    return {'status': 'success'}, 200

Using custom headers

You can add custom headers for additional authentication or routing:

# In your webhook handler (fragment: assumes Flask's request and abort,
# the parsed request body in data, and an EXPECTED_API_KEY that you define)
api_key = request.headers.get('X-API-Key')
environment = request.headers.get('X-Environment')

if api_key != EXPECTED_API_KEY:
    abort(401, 'Invalid API key')

# Route to different handlers based on environment
if environment == 'production':
    handle_production_webhook(data)
else:
    handle_staging_webhook(data)

Troubleshooting

Webhooks not being delivered

Check endpoint accessibility:

Ensure your endpoint is publicly accessible (if using cloud)
Verify firewall rules allow incoming connections
Test your endpoint with curl: curl -X POST -H "Content-Type: application/json" -d '{"test": "data"}' https://your-endpoint.com/webhook

Check webhook configuration:

Verify the URL starts with http:// or https://
Check that the endpoint returns 2xx status codes
Review custom headers for syntax errors

Check alert status:

Ensure the alert is enabled
Verify at least one trigger is configured
Check that project scope matches your events (for observability events)

Webhook timeouts

Opik expects webhooks to respond within the configured timeout (typically 30 seconds). If your endpoint takes longer:

Optimize your handler:

Return a 200 response immediately
Process the webhook asynchronously in the background
Use a queue system (e.g., Celery, RabbitMQ) for long-running tasks

Example async processing:

from flask import Flask, request
from threading import Thread

app = Flask(__name__)

def process_webhook_async(data):
    # Long-running processing (send_to_slack, update_dashboard, and
    # log_to_database are placeholders for your own functions)
    send_to_slack(data)
    update_dashboard(data)
    log_to_database(data)

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    data = request.json

    # Start background processing
    thread = Thread(target=process_webhook_async, args=(data,))
    thread.start()

    # Return immediately
    return {'status': 'accepted'}, 200
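
Starting one thread per request is simple but unbounded under load. A bounded queue with a single worker is one alternative, sketched below with the standard library only. The `work_queue`, `worker`, `enqueue`, and `processed` names and the queue size are illustrative, and an in-process queue is only a stand-in for a durable system such as Celery or RabbitMQ, since queued work is lost if the process exits.

```python
import queue
import threading

# Bounded so a burst of webhooks cannot exhaust memory; the size is an arbitrary choice.
work_queue = queue.Queue(maxsize=1000)
processed = []  # stand-in for the real side effects of processing


def worker() -> None:
    """Process queued webhook payloads one at a time, forever."""
    while True:
        data = work_queue.get()
        try:
            processed.append(data.get("id"))  # replace with real processing
        finally:
            work_queue.task_done()


# A daemon thread does not keep the process alive at shutdown.
threading.Thread(target=worker, daemon=True).start()


def enqueue(data: dict) -> bool:
    """Queue a payload without blocking; return False if the queue is full."""
    try:
        work_queue.put_nowait(data)
        return True
    except queue.Full:
        return False
```

A webhook handler would call `enqueue(data)` and return a 2xx response when it succeeds. Returning a 5xx response when it fails would hand the delivery back to Opik's retry mechanism.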

Duplicate webhooks

If you receive duplicate webhooks:

Check retry configuration:

Opik retries failed webhooks with exponential backoff
Ensure your endpoint returns 2xx status codes on success
Implement idempotency using the webhook id field

Example idempotent handler:

from flask import Flask, request

app = Flask(__name__)

# In-memory set (use a persistent store such as a database in production)
processed_webhook_ids = set()

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    data = request.json
    webhook_id = data.get('id')

    # Skip if already processed
    if webhook_id in processed_webhook_ids:
        return {'status': 'already_processed'}, 200

    # Process webhook (process_alert is a placeholder for your own logic)
    process_alert(data)

    # Mark as processed
    processed_webhook_ids.add(webhook_id)

    return {'status': 'success'}, 200
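
A plain set of processed IDs grows without limit in a long-running process. One way to bound it is to evict the oldest IDs once a size cap is reached, as in this sketch. The `SeenIds` class and its default cap are assumptions for illustration, and a deployment with several workers would need a shared store instead.

```python
from collections import OrderedDict


class SeenIds:
    """Remember the most recent webhook IDs, evicting the oldest beyond max_size."""

    def __init__(self, max_size: int = 10_000) -> None:
        self._ids = OrderedDict()
        self._max_size = max_size

    def check_and_add(self, webhook_id: str) -> bool:
        """Return True if the ID is new (and record it), False if already seen."""
        if webhook_id in self._ids:
            return False
        self._ids[webhook_id] = None
        if len(self._ids) > self._max_size:
            # Drop the oldest entry (insertion order).
            self._ids.popitem(last=False)
        return True
```

The cap should comfortably exceed the number of webhooks that can arrive within the retry window, otherwise a retried delivery could be processed twice after its ID has been evicted.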

Events not triggering alerts

Check event type matching:

Verify the alert has a trigger for this event type
For observability events, check project scope configuration
Review project IDs in trigger configuration

Check workspace context:

Ensure events are logged to the correct workspace
Verify the alert is in the same workspace as your events

Check alert evaluation:

View backend logs for alert evaluation messages
Confirm events are being published to the event bus
Check Redis for alert buckets (self-hosted deployments)

SSL certificate errors

If you see SSL certificate errors in logs:

For development/testing:

Use self-signed certificates with proper configuration
Or use HTTP endpoints (not recommended for production)

For production:

Use valid SSL certificates from trusted CAs
Ensure certificate chain is complete
Check certificate expiry dates
Use services like Let’s Encrypt for free SSL

Architecture and internals

Understanding Opik’s alert architecture can help with troubleshooting and optimization.

How alerts work

The Opik Alerts system monitors your workspace for specific events and sends consolidated webhook notifications to your configured endpoints. Here’s the flow:

Event occurs: An event happens in your workspace (e.g., a trace error, new feedback score)
Alert evaluation: The system checks if any enabled alerts match this event type
Event aggregation: Multiple events are aggregated over a short time window (debouncing)
Webhook delivery: A consolidated HTTP POST request is sent to your webhook URL
Retry handling: Failed requests are automatically retried with exponential backoff

Event debouncing

To prevent overwhelming your webhook endpoint, Opik aggregates multiple events of the same type within a short time window (typically 30-60 seconds) and sends them as a single consolidated webhook. This is particularly useful for high-frequency events like feedback scores.

Event flow

1. Event occurs (e.g., trace error logged)
   ↓
2. Service publishes AlertEvent to EventBus
   ↓
3. AlertEventListener receives event
   ↓
4. AlertEventEvaluationService evaluates against configured alerts
   ↓
5. Matching events added to AlertBucketService (Redis)
   ↓
6. AlertJob (runs every 5 seconds) processes ready buckets
   ↓
7. WebhookPublisher publishes to Redis stream
   ↓
8. WebhookSubscriber consumes from stream
   ↓
9. WebhookHttpClient sends HTTP POST request
   ↓
10. Retries on failure with exponential backoff

Debouncing mechanism

Opik uses Redis-based buckets to aggregate events:

Bucket key format: alert_bucket:{alertId}:{eventType}
Window size: Configurable (default 30-60 seconds)
Index: Redis Sorted Set for efficient bucket retrieval
TTL: Buckets expire automatically after processing

This prevents overwhelming your webhook endpoint with individual events and reduces costs for high-frequency events.
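
To make the bucketing idea concrete, here is a simplified in-memory model. It is not Opik's implementation, which uses Redis. It only reuses the bucket key format shown above and assumes the window is measured from the first event in a bucket. The `DebounceBuckets` class and its method names are hypothetical.

```python
from collections import defaultdict


class DebounceBuckets:
    """Toy model of per-alert, per-event-type aggregation windows."""

    def __init__(self, window_seconds: float = 30.0) -> None:
        self.window = window_seconds
        self._events = defaultdict(list)  # bucket key -> event IDs
        self._opened_at = {}              # bucket key -> time of first event

    @staticmethod
    def key(alert_id: str, event_type: str) -> str:
        return f"alert_bucket:{alert_id}:{event_type}"

    def add(self, alert_id: str, event_type: str, event_id: str, now: float) -> None:
        """Add an event to its bucket, opening the bucket on the first event."""
        k = self.key(alert_id, event_type)
        self._opened_at.setdefault(k, now)
        self._events[k].append(event_id)

    def pop_ready(self, now: float) -> dict:
        """Remove and return every bucket whose window has elapsed."""
        ready = [k for k, t in self._opened_at.items() if now - t >= self.window]
        result = {}
        for k in ready:
            result[k] = self._events.pop(k)
            del self._opened_at[k]
        return result
```

In this model two events arriving 10 seconds apart land in the same bucket and are returned together by a single `pop_ready` call once the window has passed, which corresponds to one consolidated webhook.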

Retry strategy

Failed webhooks are automatically retried:

Max retries: Configurable (default 3)
Initial delay: 1 second
Max delay: 60 seconds
Backoff: Exponential with jitter
Retryable errors: 5xx status codes, network errors
Non-retryable errors: 4xx status codes (except 429)
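
The parameters above can be turned into a small model of the retry schedule, which is useful when sizing an idempotency window or estimating how long a failing endpoint keeps receiving attempts. This is a sketch under assumptions: the exact backoff formula and jitter variant Opik uses are not stated here, so the code uses the common "full jitter" scheme, and it treats 429 as retryable based on the "(except 429)" note. Both function names are illustrative.

```python
import random
from typing import Callable, Optional


def is_retryable(status_code: Optional[int]) -> bool:
    """Classify a delivery result; None stands for a network error."""
    if status_code is None:
        return True
    if status_code == 429:
        return True
    return 500 <= status_code <= 599


def backoff_delay(
    attempt: int,
    initial: float = 1.0,
    max_delay: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Delay before retry number `attempt` (0-based): random in [0, capped exponential]."""
    cap = min(max_delay, initial * (2 ** attempt))
    return rng() * cap
```

With the listed defaults, the upper bound of the delay doubles from 1 second on each attempt and never exceeds 60 seconds.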

Best practices

Alert design

Create focused alerts:

Use separate alerts for different purposes (e.g., one for errors, one for feedback)
Configure project scope to avoid noise from test projects
Use descriptive names that explain the alert’s purpose

Optimize for your workflow:

Send critical errors to PagerDuty or on-call systems
Route feedback scores to analytics platforms
Send prompt changes to audit logs or Slack channels

Test thoroughly:

Use the “Test connection” feature before enabling alerts
Monitor webhook delivery in your endpoint logs
Start with a small project scope and expand gradually

Webhook endpoint design

Handle failures gracefully:

Return 2xx status codes immediately
Process webhooks asynchronously
Implement retry logic in your handler
Use dead letter queues for permanent failures

Implement security:

Always validate secret tokens
Use HTTPS endpoints with valid certificates
Implement rate limiting to prevent abuse
Log all webhook attempts for auditing

Monitor performance:

Track webhook processing time
Alert on handler failures
Monitor queue lengths for async processing
Set up dead letter queue monitoring

Scaling considerations

For high-volume workspaces:

Use event debouncing (built-in)
Implement batch processing in your handler
Use message queues for async processing
Consider using serverless functions (AWS Lambda, Cloud Functions)

For multiple projects:

Create project-specific alerts with scope configuration
Use custom headers to route to different handlers
Implement filtering in your webhook handler
Consider separate endpoints for different event types
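
Filtering inside the handler can be as simple as keeping only the metadata items for projects you care about. The sketch below relies on the `project_name` field that appears in the trace error and guardrail payload examples; other event types may not carry it, in which case their items are dropped. `filter_by_project` is a hypothetical helper name.

```python
def filter_by_project(data: dict, allowed_projects: set) -> list:
    """Return the metadata items whose project_name is in allowed_projects."""
    metadata = data.get("payload", {}).get("metadata", [])
    if not isinstance(metadata, list):
        # Some event types deliver a single object instead of an array.
        metadata = [metadata]
    return [
        item
        for item in metadata
        if isinstance(item, dict) and item.get("project_name") in allowed_projects
    ]
```

For example, `filter_by_project(data, {"Demo Project"})` keeps only the items logged to that project, so a single endpoint can serve several alerts without acting on test traffic.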

Next steps

Configure your first alert for production error monitoring
Set up Slack integration for team notifications
Explore Online Evaluation Rules for automated model monitoring
Learn about Guardrails for proactive risk detection
Review Production Monitoring best practices