Alerts
Alerts allow you to configure automated webhook notifications for important events in your Opik workspace. When specific events occur (such as trace errors, new feedback scores, or prompt changes), Opik sends an HTTP POST request to your configured endpoint with detailed event data.

Creating an alert
Prerequisites
- Access to the Opik Configuration page
- A webhook endpoint that can receive HTTP POST requests
- (Optional) An HTTPS endpoint with a valid SSL certificate for production use
Step-by-step guide

- Navigate to Alerts
  - Go to Configuration → Alerts tab
  - Click “Create new alert” button
- Configure basic settings
  - Name: Give your alert a descriptive name (e.g., “Production Errors Slack”)
  - Enable alert: Toggle on to activate the alert immediately
- Configure webhook settings
  - Endpoint URL: Enter your webhook URL (must start with http:// or https://)
    - Example: https://hooks.slack.com/services/
- Advanced webhook settings (optional)
  - Secret token: Add a secret token to verify webhook authenticity
  - Custom headers: Add HTTP headers for authentication or routing
    - Example: X-Custom-Auth: Bearer your-token-here
- Add triggers
  - Click “Add trigger” to select event types
  - Choose one or more event types from the list
  - Configure project scope for observability events (optional)
- Test your configuration
  - Click “Test connection” to send a sample webhook
  - Verify your endpoint receives the test payload
  - Check the response status in the Opik UI
- Create the alert
  - Click “Create alert” to save your configuration
  - The alert will start monitoring events immediately
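
When checking what “Test connection” delivers, a throwaway receiver can be handy. The following is a minimal sketch using only Python's standard library; the `handle_webhook` helper, the response body, and port 8080 are illustrative assumptions rather than anything Opik requires.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_webhook(raw_body: bytes) -> tuple[int, dict]:
    """Parse a webhook body and return (status_code, response_json)."""
    try:
        event = json.loads(raw_body or b"{}")
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "invalid JSON"}
    # Print the payload so it can be compared with what the Opik UI reports.
    print(json.dumps(event, indent=2))
    return 200, {"status": "received"}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_webhook(self.rfile.read(length))
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver; the alert's endpoint URL then needs to point at an address that your Opik deployment can actually reach.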
Integration examples
Slack integration
Send alerts to a Slack channel using Slack’s Incoming Webhooks:
- Create a Slack app and enable Incoming Webhooks
- Create a webhook URL (e.g., https://hooks.slack.com/services/T00000000/B00000000/XXXX)
- In Opik, create an alert with your Slack webhook URL
- Format the payload (Slack will display JSON by default)
For better formatting, create a middleware service that transforms Opik’s payload into Slack’s Block Kit format:
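
One possible shape for such a middleware is sketched below. It relies only on the `eventType` and `id` fields mentioned on this page; the `payload` key and the message layout are assumptions, so the code reads them defensively. The `header` and `section` block types come from Slack's Block Kit.

```python
import json
import urllib.request

FENCE = "`" * 3  # Slack mrkdwn code fence, built here to keep this example readable


def to_slack_message(event: dict) -> dict:
    """Build a Slack Block Kit message from an Opik webhook body."""
    event_type = event.get("eventType", "unknown")
    # `payload` is an assumed key; fall back to the whole event if it is absent.
    # The text is truncated to stay well under Slack's section text limit.
    details = json.dumps(event.get("payload", event), indent=2)[:2500]
    title = f"Opik alert: {event_type}"
    return {
        "text": title,  # fallback text shown in notifications
        "blocks": [
            {"type": "header", "text": {"type": "plain_text", "text": title}},
            {"type": "section",
             "text": {"type": "mrkdwn", "text": f"*Event ID:* {event.get('id', 'n/a')}"}},
            {"type": "section",
             "text": {"type": "mrkdwn", "text": f"{FENCE}{details}{FENCE}"}},
        ],
    }


def post_to_slack(slack_url: str, event: dict) -> int:
    """POST the formatted message to a Slack Incoming Webhook URL."""
    request = urllib.request.Request(
        slack_url,
        data=json.dumps(to_slack_message(event)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

The middleware would receive the Opik webhook, call `post_to_slack` with your Incoming Webhook URL, and return a 2xx response to Opik.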
PagerDuty integration
Send critical alerts to PagerDuty for on-call incident management:
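
A small relay can translate an Opik webhook into a PagerDuty Events API v2 request. The sketch below is one way to do that, not an official integration: the severity mapping and the use of the webhook `id` as the deduplication key are assumptions, and `routing_key` is the integration key from your PagerDuty service.

```python
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def to_pagerduty_event(event: dict, routing_key: str) -> dict:
    """Map an Opik webhook body onto a PagerDuty 'trigger' event."""
    event_type = event.get("eventType", "unknown")
    body = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": f"Opik alert: {event_type}",
            "source": "opik",
            # Assumed mapping: trace errors page as critical, the rest as warning.
            "severity": "critical" if event_type == "trace:errors" else "warning",
            "custom_details": event,
        },
    }
    if event.get("id"):
        # Reusing the webhook id lets PagerDuty collapse retried deliveries.
        body["dedup_key"] = str(event["id"])
    return body


def send_to_pagerduty(event: dict, routing_key: str) -> int:
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(to_pagerduty_event(event, routing_key)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```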
Using no-code automation platforms
No-code automation tools like n8n, Make.com, and IFTTT provide an easy way to connect Opik alerts to other services—without writing or deploying code. These platforms can receive webhooks from Opik, apply filters or conditions, and trigger actions such as sending Slack messages, logging data in Google Sheets, or creating incidents in PagerDuty.

To use them:
- Create a new workflow or scenario and add a Webhook trigger node/module
- Copy the webhook URL generated by the platform and paste it into your Opik alert configuration
- Secure the connection by validating the Authorization header or including a secret token parameter
- Add filters or routing logic to handle different eventType values from Opik (for example, trace:errors or trace:feedback_score)
- Chain the desired actions, such as notifications, database updates, or analytics tracking
These tools also provide built-in monitoring, retries, and visual flow editors, making them suitable for both technical and non-technical users who want to automate Opik alert handling securely and efficiently.
Custom dashboard integration
Build a custom monitoring dashboard that receives alerts:
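
As a starting point, the dashboard backend could keep per-type counters and a short list of recent events. The `AlertFeed` class below is a hypothetical in-memory sketch (its contents are lost on restart); it uses only the `eventType` field named on this page.

```python
from collections import Counter, deque


class AlertFeed:
    """In-memory store that a dashboard view could read from."""

    def __init__(self, max_recent: int = 100):
        self.counts = Counter()
        self.recent = deque(maxlen=max_recent)  # newest first, oldest dropped

    def record(self, event: dict) -> None:
        """Call this from the webhook handler for every received event."""
        self.counts[event.get("eventType", "unknown")] += 1
        self.recent.appendleft(event)

    def summary(self) -> dict:
        """Return data suitable for rendering counters and a recent-events list."""
        return {"counts": dict(self.counts), "recent": list(self.recent)}
```

A production dashboard would more likely persist events in a database so that history survives restarts.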
Supported event types
Opik supports seven types of alert events:
Observability events
New error in trace
- Event type: trace:errors
- Triggered when: A trace is logged with error information
- Project scope: Can be configured to specific projects
- Payload: Array of trace objects with error details
- Use case: Monitor production errors, debug issues in real-time
New score added to trace
- Event type: trace:feedback_score
- Triggered when: A feedback score is added to a trace
- Project scope: Can be configured to specific projects
- Payload: Array of feedback score objects
- Use case: Track model performance, monitor user satisfaction
New score added to thread
- Event type: trace_thread:feedback_score
- Triggered when: A feedback score is added to a conversation thread
- Project scope: Can be configured to specific projects
- Payload: Array of thread feedback score objects
- Use case: Monitor conversation quality, track multi-turn interactions
Guardrails triggered
- Event type: trace:guardrails_triggered
- Triggered when: A guardrail check fails for a trace
- Project scope: Can be configured to specific projects
- Payload: Array of guardrail result objects
- Use case: Security monitoring, compliance tracking, PII detection
Prompt engineering events
New prompt added
- Event type: prompt:created
- Triggered when: A new prompt is created in the prompt library
- Project scope: Workspace-wide
- Payload: Prompt object with metadata
- Use case: Track prompt library changes, audit prompt creation
New prompt version created
- Event type: prompt:committed
- Triggered when: A new version (commit) is added to a prompt
- Project scope: Workspace-wide
- Payload: Prompt version object with template and metadata
- Use case: Monitor prompt iterations, track version history
Prompt deleted
- Event type: prompt:deleted
- Triggered when: A prompt is removed from the prompt library
- Project scope: Workspace-wide
- Payload: Array of deleted prompt objects
- Use case: Audit prompt deletions, maintain prompt governance
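
In a webhook handler it can help to keep the seven event types above in one place. The enum below is a convenience sketch for your own code; the names `AlertEventType`, `PROJECT_SCOPED`, and `is_project_scoped` are hypothetical and not part of any Opik SDK.

```python
from enum import Enum


class AlertEventType(str, Enum):
    TRACE_ERRORS = "trace:errors"
    TRACE_FEEDBACK_SCORE = "trace:feedback_score"
    THREAD_FEEDBACK_SCORE = "trace_thread:feedback_score"
    GUARDRAILS_TRIGGERED = "trace:guardrails_triggered"
    PROMPT_CREATED = "prompt:created"
    PROMPT_COMMITTED = "prompt:committed"
    PROMPT_DELETED = "prompt:deleted"


# Observability events can be scoped to projects; prompt events are workspace-wide.
PROJECT_SCOPED = frozenset({
    AlertEventType.TRACE_ERRORS,
    AlertEventType.TRACE_FEEDBACK_SCORE,
    AlertEventType.THREAD_FEEDBACK_SCORE,
    AlertEventType.GUARDRAILS_TRIGGERED,
})


def is_project_scoped(event_type: str) -> bool:
    """Raise ValueError for unknown event types, otherwise report the scope."""
    return AlertEventType(event_type) in PROJECT_SCOPED
```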
Want us to support more event types?
If you need additional event types for your use case, please create an issue on GitHub and let us know what you’d like to monitor.
Webhook payload structure
All webhook events follow a consistent payload structure:
Payload fields
Event-specific payloads
Trace errors payload
Feedback score payload
Thread feedback score payload
Prompt created payload
Prompt version created payload
Prompt deleted payload
Guardrails triggered payload
Securing your webhooks
Using secret tokens
Add a secret token to your webhook configuration to verify that incoming requests are from Opik:
- Generate a secure random token (e.g., using openssl rand -hex 32)
- Add it to your alert’s “Secret token” field
- Opik will send it in the Authorization header: Authorization: Bearer your-secret-token
- Validate the token in your webhook handler before processing the request
Example validation (Python/Flask)
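
The check itself can be written as a small framework-independent function, sketched below with the standard library only. The function names are illustrative; the header format follows the `Authorization: Bearer your-secret-token` convention described above.

```python
import hmac
import secrets


def generate_token() -> str:
    """Return 32 random bytes as hex, comparable to `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def is_authorized(authorization_header, expected_token: str) -> bool:
    """Check an `Authorization: Bearer <token>` header value against the secret."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme != "Bearer":
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```

In a Flask view, this could be called as `is_authorized(request.headers.get("Authorization"), SECRET_TOKEN)`, followed by `abort(401)` when it returns `False`; `SECRET_TOKEN` stands for wherever you load the secret from.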
Using custom headers
You can add custom headers for additional authentication or routing:
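
On the receiving side, custom headers can be checked like any other request header. The sketch below assumes the `X-Custom-Auth` header from the example earlier on this page plus a hypothetical `X-Alert-Route` header that you would define yourself for routing.

```python
import hmac


def check_custom_headers(headers: dict, expected_auth: str):
    """Return the requested route, or None if the custom auth header is wrong."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-custom-auth", "")
    if not hmac.compare_digest(supplied.encode(), expected_auth.encode()):
        return None
    # `X-Alert-Route` is a hypothetical routing header, not an Opik default.
    return normalized.get("x-alert-route", "default")
```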
Troubleshooting
Webhooks not being delivered
Check endpoint accessibility:
- Ensure your endpoint is publicly accessible (if using cloud)
- Verify firewall rules allow incoming connections
- Test your endpoint with curl:
curl -X POST -H "Content-Type: application/json" -d '{"test": "data"}' https://your-endpoint.com/webhook
Check webhook configuration:
- Verify the URL starts with http:// or https://
- Check that the endpoint returns 2xx status codes
- Review custom headers for syntax errors
Check alert status:
- Ensure the alert is enabled
- Verify at least one trigger is configured
- Check that project scope matches your events (for observability events)
Webhook timeouts
Opik expects webhooks to respond within the configured timeout (typically 30 seconds). If your endpoint takes longer:
Optimize your handler:
- Return a 200 response immediately
- Process the webhook asynchronously in the background
- Use a queue system (e.g., Celery, RabbitMQ) for long-running tasks
Example async processing:
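
A minimal sketch of the acknowledge-then-process pattern, using an in-process queue and a worker thread from the standard library. The `process_event` body is a placeholder, and an in-process queue loses pending work on restart, so a durable queue such as the ones listed above is the safer choice for production.

```python
import queue
import threading

work_queue = queue.Queue()
processed = []  # stands in for real side effects in this sketch


def process_event(event: dict) -> None:
    """Placeholder for slow work (database writes, API calls, ...)."""
    processed.append(event.get("eventType"))


def worker() -> None:
    """Drain the queue forever; one failing event must not stop the worker."""
    while True:
        event = work_queue.get()
        try:
            process_event(event)
        except Exception as exc:
            print(f"processing failed: {exc}")
        finally:
            work_queue.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_webhook(event: dict) -> int:
    """Enqueue the event and acknowledge immediately with a 200."""
    work_queue.put(event)
    return 200
```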
Duplicate webhooks
If you receive duplicate webhooks:
Check retry configuration:
- Opik retries failed webhooks with exponential backoff
- Ensure your endpoint returns 2xx status codes on success
- Implement idempotency using the webhook id field
Example idempotent handler:
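
One way to deduplicate is to remember the `id` of every webhook already handled. The class below is an in-memory sketch with a hypothetical name; a shared store (for example a database table with a unique key) would be needed once several handler instances run in parallel.

```python
import threading


class IdempotentHandler:
    """Run `process` at most once per webhook `id`."""

    def __init__(self, process):
        self._process = process
        self._seen = set()
        self._lock = threading.Lock()

    def handle(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        event_id = event.get("id")
        if event_id is None:
            # Without an id there is nothing to deduplicate on.
            self._process(event)
            return True
        with self._lock:
            if event_id in self._seen:
                return False
            self._seen.add(event_id)
        try:
            self._process(event)
        except Exception:
            # Forget the id so that a later retry can still be processed.
            with self._lock:
                self._seen.discard(event_id)
            raise
        return True
```

Duplicates should still be answered with a 2xx status so that the sender stops retrying.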
Events not triggering alerts
Check event type matching:
- Verify the alert has a trigger for this event type
- For observability events, check project scope configuration
- Review project IDs in trigger configuration
Check workspace context:
- Ensure events are logged to the correct workspace
- Verify the alert is in the same workspace as your events
Check alert evaluation:
- View backend logs for alert evaluation messages
- Confirm events are being published to the event bus
- Check Redis for alert buckets (self-hosted deployments)
SSL certificate errors
If you see SSL certificate errors in logs:
For development/testing:
- Use self-signed certificates with proper configuration
- Or use HTTP endpoints (not recommended for production)
For production:
- Use valid SSL certificates from trusted CAs
- Ensure certificate chain is complete
- Check certificate expiry dates
- Use services like Let’s Encrypt for free SSL
Architecture and internals
Understanding Opik’s alert architecture can help with troubleshooting and optimization.
How alerts work
The Opik Alerts system monitors your workspace for specific events and sends consolidated webhook notifications to your configured endpoints. Here’s the flow:
- Event occurs: An event happens in your workspace (e.g., a trace error, new feedback score)
- Alert evaluation: The system checks if any enabled alerts match this event type
- Event aggregation: Multiple events are aggregated over a short time window (debouncing)
- Webhook delivery: A consolidated HTTP POST request is sent to your webhook URL
- Retry handling: Failed requests are automatically retried with exponential backoff
Event debouncing
To prevent overwhelming your webhook endpoint, Opik aggregates multiple events of the same type within a short time window (typically 30-60 seconds) and sends them as a single consolidated webhook. This is particularly useful for high-frequency events like feedback scores.
Event flow
Debouncing mechanism
Opik uses Redis-based buckets to aggregate events:
- Bucket key format:
alert_bucket:{alertId}:{eventType}
- Window size: Configurable (default 30-60 seconds)
- Index: Redis Sorted Set for efficient bucket retrieval
- TTL: Buckets expire automatically after processing
This prevents overwhelming your webhook endpoint with individual events and reduces costs for high-frequency events.
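
To make the bucket idea concrete, here is an in-memory illustration. It is not Opik's implementation (which uses Redis, as described above): only the key format is taken from this page, while the class name, the rule that a window opens with the first event, and the explicit `now` argument are assumptions made for the sketch.

```python
class DebounceBuckets:
    """Group events per alert and event type until their window has elapsed."""

    def __init__(self, window_seconds: float = 30.0):
        self.window = window_seconds
        self._buckets = {}  # bucket key -> (opened_at, [events])

    @staticmethod
    def bucket_key(alert_id: str, event_type: str) -> str:
        return f"alert_bucket:{alert_id}:{event_type}"

    def add(self, alert_id: str, event_type: str, event: dict, now: float) -> None:
        """Append the event to its bucket, opening the bucket if needed."""
        key = self.bucket_key(alert_id, event_type)
        _, events = self._buckets.setdefault(key, (now, []))
        events.append(event)

    def flush_due(self, now: float) -> dict:
        """Remove and return every bucket whose window has elapsed."""
        due = {
            key: events
            for key, (opened_at, events) in self._buckets.items()
            if now - opened_at >= self.window
        }
        for key in due:
            del self._buckets[key]
        return due
```

Each entry returned by `flush_due` corresponds to one consolidated webhook carrying all events collected in that window.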
Retry strategy
Failed webhooks are automatically retried:
- Max retries: Configurable (default 3)
- Initial delay: 1 second
- Max delay: 60 seconds
- Backoff: Exponential with jitter
- Retryable errors: 5xx status codes, network errors
- Non-retryable errors: 4xx status codes (except 429)
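
The parameters above can be read as the following sketch. The retryable/non-retryable split and the 1 second and 60 second bounds come from the list; the doubling factor and the "up to 25% extra" jitter are assumptions, since the exact formula is not stated here.

```python
import random


def is_retryable(status_code) -> bool:
    """Pass None for a network error (no HTTP response was received)."""
    if status_code is None:
        return True
    return status_code >= 500 or status_code == 429


def backoff_delay(attempt: int, initial: float = 1.0, maximum: float = 60.0,
                  rng=random.random) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    base = min(maximum, initial * 2 ** attempt)
    # Assumed jitter: add up to 25% of the base delay, never exceeding the cap.
    return min(maximum, base + base * 0.25 * rng())
```

With the default of 3 retries, the delays under these assumptions would start at roughly 1, 2, and 4 seconds, each plus jitter.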
Best practices
Alert design
Create focused alerts:
- Use separate alerts for different purposes (e.g., one for errors, one for feedback)
- Configure project scope to avoid noise from test projects
- Use descriptive names that explain the alert’s purpose
Optimize for your workflow:
- Send critical errors to PagerDuty or on-call systems
- Route feedback scores to analytics platforms
- Send prompt changes to audit logs or Slack channels
Test thoroughly:
- Use the “Test connection” feature before enabling alerts
- Monitor webhook delivery in your endpoint logs
- Start with a small project scope and expand gradually
Webhook endpoint design
Handle failures gracefully:
- Return 2xx status codes immediately
- Process webhooks asynchronously
- Implement retry logic in your handler
- Use dead letter queues for permanent failures
Implement security:
- Always validate secret tokens
- Use HTTPS endpoints with valid certificates
- Implement rate limiting to prevent abuse
- Log all webhook attempts for auditing
Monitor performance:
- Track webhook processing time
- Alert on handler failures
- Monitor queue lengths for async processing
- Set up dead letter queue monitoring
Scaling considerations
For high-volume workspaces:
- Use event debouncing (built-in)
- Implement batch processing in your handler
- Use message queues for async processing
- Consider using serverless functions (AWS Lambda, Cloud Functions)
For multiple projects:
- Create project-specific alerts with scope configuration
- Use custom headers to route to different handlers
- Implement filtering in your webhook handler
- Consider separate endpoints for different event types
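
Filtering inside a single handler can be as simple as a lookup table keyed by `eventType`. The sketch below uses a hypothetical `on` decorator and placeholder handlers that return a label instead of calling a real service.

```python
HANDLERS = {}


def on(event_type: str):
    """Register the decorated function as the handler for one event type."""
    def register(func):
        HANDLERS[event_type] = func
        return func
    return register


@on("trace:errors")
def page_on_call(event: dict) -> str:
    return "pagerduty"  # placeholder for an on-call notification


@on("trace:feedback_score")
def record_score(event: dict) -> str:
    return "analytics"  # placeholder for an analytics write


def dispatch(event: dict):
    """Route the event to its handler; unknown types are ignored (None)."""
    handler = HANDLERS.get(event.get("eventType"))
    return handler(event) if handler else None
```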
Next steps
- Configure your first alert for production error monitoring
- Set up Slack integration for team notifications
- Explore Online Evaluation Rules for automated model monitoring
- Learn about Guardrails for proactive risk detection
- Review Production Monitoring best practices