Enabling Large CSV Uploads
By default, Opik supports CSV file uploads up to 20MB for dataset creation. For self-hosted deployments that need to process larger CSV files (up to 2GB), you can enable the large CSV upload feature with additional configuration.
Overview
When enabled, this feature allows:
- CSV files up to 2GB in size
- Asynchronous processing - files are processed in the background after upload
Configuration Steps
1. Enable the Feature Toggle
Set the following environment variable for the Opik backend service:
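For example (the variable name below is illustrative, not confirmed; check the release notes for your Opik version for the exact feature-toggle name):

```bash
# Hypothetical variable name, shown for illustration only.
# Confirm the exact feature-toggle name for your Opik version.
export TOGGLE_CSV_LARGE_FILE_UPLOAD_ENABLED=true
```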
2. Increase Idle Timeout
Large file uploads require more time to transfer. Increase the server idle timeout:
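For example, using the SERVER_IDLE_TIMEOUT variable referenced in the troubleshooting section below (the duration format follows the 15m/20m examples used there):

```bash
# Give large uploads up to 10 minutes to transfer before the
# connection is considered idle.
export SERVER_IDLE_TIMEOUT=10m
```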
The default timeout is 30 seconds, which is insufficient for large file uploads. We recommend setting it to 10 minutes for files up to 2GB.
3. Configure Nginx (Kubernetes/Helm Deployments)
If you’re using the Helm chart deployment, add the following configuration to your values.yaml:
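A sketch of the relevant values.yaml entries, assuming the chart routes custom nginx directives through the upstreamConfig block mentioned in the troubleshooting section (the surrounding key names are assumptions; adapt them to your chart version):

```yaml
# Illustrative values.yaml fragment; the key layout is an assumption,
# adapt it to the structure of your Opik Helm chart version.
component:
  frontend:
    upstreamConfig: |
      client_max_body_size 2g;
      proxy_read_timeout   600s;
      proxy_send_timeout   600s;
```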
4. Ensure Adequate Disk Space
The backend service temporarily buffers uploaded CSV files to disk before processing them. Ensure your backend pods/containers have:
- Minimum 50GB of disk space available
- Sufficient IOPS for concurrent file operations
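On Kubernetes, one way to express the disk requirement is an ephemeral-storage request on the backend container (a generic Kubernetes sketch; where this block lives in values.yaml depends on your chart):

```yaml
# Generic Kubernetes resource block for the backend container;
# the exact values.yaml path for it depends on your chart.
resources:
  requests:
    ephemeral-storage: "50Gi"
  limits:
    ephemeral-storage: "100Gi"
```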
5. Optional: Adjust Batch Size
You can optionally configure the batch size for CSV processing:
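For example, using the variable referenced in the troubleshooting section:

```bash
# Batch size for CSV processing; 500 is an illustrative value.
export BATCH_OPERATIONS_DATASETS_CSV_BATCH_SIZE=500
```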
The default batch size is 1000 rows per batch. Adjust this based on your:
- Available memory
- Row complexity (number of columns, data size)
- Desired processing speed
Docker Compose Deployments
For Docker Compose deployments, the configuration is slightly different:
1. Update docker-compose.yml
Add the environment variables to the backend service:
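A sketch of the backend service entry (the service name and the feature-toggle variable name are assumptions; the other two variables are the ones used throughout this guide):

```yaml
# Illustrative docker-compose.yml fragment. The service name and the
# feature-toggle variable name are assumptions for your deployment.
services:
  backend:
    environment:
      - TOGGLE_CSV_LARGE_FILE_UPLOAD_ENABLED=true   # hypothetical toggle name
      - SERVER_IDLE_TIMEOUT=10m
      - BATCH_OPERATIONS_DATASETS_CSV_BATCH_SIZE=1000
```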
2. Update Nginx Configuration
The nginx configuration files already include the 2GB limit for local deployments. No additional changes are needed for nginx_default_local.conf or nginx_local_be_local.conf.
Kubernetes/Helm Deployment Example
Here’s a complete example for Helm chart deployments:
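The sketch below combines the backend environment variables and nginx settings from the steps above (the key layout and the toggle name are assumptions; adapt them to your chart version):

```yaml
# Illustrative values.yaml. The key layout and the feature-toggle
# variable name are assumptions; adapt to your Opik Helm chart version.
component:
  backend:
    env:
      TOGGLE_CSV_LARGE_FILE_UPLOAD_ENABLED: "true"   # hypothetical toggle name
      SERVER_IDLE_TIMEOUT: "10m"
      BATCH_OPERATIONS_DATASETS_CSV_BATCH_SIZE: "1000"
  frontend:
    upstreamConfig: |
      client_max_body_size 2g;
      proxy_read_timeout   600s;
      proxy_send_timeout   600s;
```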
Then upgrade your Helm release:
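For example, assuming a release named opik installed from the opik/opik chart into the opik namespace:

```bash
# Adjust the release name, chart reference, and namespace to your setup.
helm upgrade --install opik opik/opik -n opik -f values.yaml
```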
Verification
After applying the configuration:
- Restart services to apply the changes
- Test with a small CSV first (< 100MB) to verify the feature works
- Monitor logs during upload to ensure proper processing:
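On Kubernetes this could be done with kubectl (the namespace and label selector are assumptions; for Docker Compose, follow the backend service with docker compose logs -f instead):

```bash
# Follow backend logs while the upload is in progress.
# Namespace and label selector are assumptions; adjust to your deployment.
kubectl logs -n opik -l app.kubernetes.io/name=opik-backend -f
```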
You should see log messages confirming that the upload was received and that the dataset items are being processed in the background.
Troubleshooting
Upload Fails with 413 Error
Problem: HTTP 413 Request Entity Too Large
Solution: Verify nginx configuration includes client_max_body_size: 2g at the server level, not just in location blocks.
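In raw nginx terms, that means the directive belongs in the server block rather than only inside individual location blocks (a minimal sketch; the upstream address is illustrative):

```nginx
server {
    listen 80;
    # Applies to every location in this server block.
    client_max_body_size 2g;

    location /api/ {
        proxy_pass http://backend:8080;   # illustrative upstream
    }
}
```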
Upload Succeeds but Processing Fails
Problem: File uploads successfully but items don’t appear in the dataset
Solution:
- Check backend logs for processing errors
- Verify adequate disk space is available
- Check memory limits - large CSV files require sufficient memory for processing
Timeout Errors
Problem: Upload times out before completing
Solution:
- Increase `SERVER_IDLE_TIMEOUT` further (e.g., to 15m or 20m)
- Increase nginx proxy timeouts in `upstreamConfig`
- Check network bandwidth between client and server
Out of Memory Errors
Problem: Backend service crashes or restarts during processing
Solution:
- Reduce `BATCH_OPERATIONS_DATASETS_CSV_BATCH_SIZE` to process smaller batches
- Increase backend service memory limits
- Process smaller CSV files or split large files into multiple uploads
Additional Resources
- Scaling Opik - General scaling guidelines
- Kubernetes Deployment - Helm chart documentation
- Troubleshooting - Common issues and solutions