Anonymizers are available in both cloud and self-hosted installations of Opik.
Anonymizers help you protect sensitive information in your LLM applications by automatically detecting and replacing personally identifiable information (PII) and other sensitive data before it’s logged to Opik. This ensures compliance with privacy regulations and prevents accidental exposure of sensitive information in your trace data.

Anonymizers work by processing all data that flows through Opik’s tracing system - including inputs, outputs, and metadata - before it’s stored or displayed. They apply a set of rules to detect and replace sensitive information with anonymized placeholders.
The anonymization happens automatically and transparently:
The most common type of anonymizer uses pattern-matching rules to identify and replace sensitive information. Rules can be defined in several formats:
Use regular expressions to match specific patterns:
Use custom Python functions for more complex anonymization logic:
Combine different rule types for comprehensive anonymization:
For advanced use cases, create custom anonymizers by extending the Anonymizer base class.
When implementing custom anonymizers, you need to implement the anonymize() method with the following signature:
The kwargs parameters:
The anonymize() method also receives additional context through **kwargs:
field_name: Indicates which field is being anonymized ("input", "output", "metadata", or nested field names in dots notation such as "metadata.email")object_type: The type of the object being processed ("span", "trace")When are kwargs available?
These kwargs are automatically passed by Opik’s internal data processors when anonymizing trace and span data before sending it to the backend. This allows you to apply different anonymization strategies based on the field being processed.
Example: Field-specific anonymization
The field_name and object_type kwargs are primarily useful for implementing context-aware anonymization logic. If you don’t need field-specific behavior, you can safely ignore these kwargs.
Example: Anonymization of nested data structures
Also, you can extend the RecursiveAnonymizer base class to work with nested data structures.
This allows you to apply the same anonymization logic to all nested fields. In this case you
need to implement the anonymize_text() method instead of anonymize().
Here’s a complete example showing how to set up anonymization for a simple LLM application:
For more sophisticated anonymization scenarios:
In addition to regex and custom Python functions, you can reuse existing PII detection / redaction tools such as Microsoft Presidio or cloud APIs (AWS Comprehend, Google Cloud DLP, Azure AI Language). These tools can be wrapped inside an Opik anonymizer so that all trace data is pre-redacted before it’s logged. You typically integrate third-party tools in one of two ways:
scrubadub).Third-party anonymizers are just custom anonymizers under the hood. You call the
external engine inside anonymize() or a function rule, then return the
redacted data back to Opik.
First, install Presidio in your environment:
Then create an Anonymizer that delegates to Presidio:
You can combine a Presidio anonymizer with existing regex/function rules by registering multiple anonymizers; they will be applied in sequence.
Anonymizers work seamlessly with all Opik integrations:
Control how deeply nested data structures are processed:
Register multiple anonymizers that will be applied in sequence:
Rules are applied in the order they’re defined. More specific patterns should come before general ones:
RegexRule automatically compiles patterns when the rule is created.Always test your anonymization rules to ensure they work correctly:
Anonymizer not working:
opik.hooks.add_anonymizer()opik.flush_tracker() is called if neededPerformance issues:
False positives:
Remember that anonymization is a one-way process — once data is anonymized in Opik, the original values cannot be recovered. Plan your anonymization strategy accordingly.