ClickHouse Backup Guide
This guide covers the two backup options available for ClickHouse in Opik’s Kubernetes deployment:
- SQL-based Backup - Uses ClickHouse’s native BACKUP command with S3
- ClickHouse Backup Tool - Uses the dedicated clickhouse-backup tool
Overview
ClickHouse backup is essential for data protection and disaster recovery. Opik provides two different approaches to handle backups, each with its own advantages:
- SQL-based Backup: Simple, uses ClickHouse’s built-in backup functionality
- ClickHouse Backup Tool: More advanced, provides additional features like compression and incremental backups
Option 1: SQL-based Backup (Default)
This is the default backup method that uses ClickHouse’s native BACKUP command to create backups directly to S3-compatible storage.
Features
- Uses ClickHouse’s built-in BACKUP ALL EXCEPT DATABASE system command
- Direct S3 upload with timestamped backup names
- Configurable schedule via CronJob
- Supports both AWS S3 and S3-compatible storage (like MinIO)
Configuration
Basic Setup
With AWS S3 Credentials
Create a Kubernetes secret with your S3 credentials:
Then configure the backup:
With IAM Role (AWS EKS)
For AWS EKS clusters, you can use IAM roles instead of access keys:
Required IAM Policy:
Trust Relationship Policy:
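As an illustration of the two policy documents named above, the following sketch generates a permissions policy and an IRSA-style trust policy as JSON. The exact action list, the bucket name, the account ID, the OIDC provider path, the namespace, and the service account name are all assumptions chosen for the example, not values taken from the Opik chart; adjust them to match your cluster.

```python
import json


def backup_bucket_policy(bucket: str) -> dict:
    """Permissions policy sketch: list the bucket and read/write/delete its objects.

    The action list is an assumption about what a backup job typically needs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Trust policy sketch for IAM Roles for Service Accounts on EKS."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:sub": subject,
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # All identifiers below are hypothetical placeholders.
    print(json.dumps(backup_bucket_policy("my-opik-backups"), indent=2))
    print(json.dumps(
        irsa_trust_policy(
            "111122223333",
            "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
            "opik",
            "opik-clickhouse",
        ),
        indent=2,
    ))
```

Splitting bucket-level and object-level actions into separate statements keeps each `Resource` ARN matched to the actions that apply to it.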
Custom Backup Command
You can customize the backup command if needed:
Backup Process
The SQL-based backup:
- Creates a timestamped backup name (format: backupYYYYMMDDHHMM)
- Executes the BACKUP ALL EXCEPT DATABASE system TO S3(...) command
- Uploads all databases except the system database to S3
- Uses ClickHouse’s native backup format
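The steps above can be sketched as a small helper that derives the timestamped name and assembles the SQL statement. This is a minimal sketch, not the CronJob's actual script: the function names, the example bucket URL, and the choice to pass credentials as optional arguments are assumptions for illustration.

```python
from datetime import datetime
from typing import Optional


def backup_name(now: datetime) -> str:
    """Format a backup name as backupYYYYMMDDHHMM."""
    return now.strftime("backup%Y%m%d%H%M")


def _quote(value: str) -> str:
    """Quote a value as a SQL string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def backup_statement(base_url: str, name: str,
                     access_key: Optional[str] = None,
                     secret_key: Optional[str] = None) -> str:
    """Build the BACKUP statement targeting <base_url>/<name> on S3.

    Credentials are appended only when both are given; with an IAM role
    they would be left out (an assumption about how the role is used).
    """
    target = base_url.rstrip("/") + "/" + name
    args = [_quote(target)]
    if access_key and secret_key:
        args += [_quote(access_key), _quote(secret_key)]
    return "BACKUP ALL EXCEPT DATABASE system TO S3(" + ", ".join(args) + ")"


if __name__ == "__main__":
    # Hypothetical bucket URL; replace with your own endpoint.
    name = backup_name(datetime.now())
    print(backup_statement("https://my-bucket.s3.amazonaws.com/backups", name))
```

The resulting string could then be passed to clickhouse-client or any ClickHouse driver.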
Restore Process
To restore from a SQL-based backup:
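A restore mirrors the backup statement with ClickHouse's RESTORE command. The sketch below assumes the backup was written to `<base_url>/<name>` and that restoring everything in the backup with `RESTORE ALL` is what you want; the function name, the name check, and the example URL are illustrative assumptions. Restoring over existing tables may need additional settings that are not shown here.

```python
import re
from typing import Optional

# Names produced by the SQL-based backup follow backupYYYYMMDDHHMM.
_NAME_RE = re.compile(r"^backup\d{12}$")


def _quote(value: str) -> str:
    """Quote a value as a SQL string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def restore_statement(base_url: str, name: str,
                      access_key: Optional[str] = None,
                      secret_key: Optional[str] = None) -> str:
    """Build a RESTORE statement for a backup stored at <base_url>/<name>."""
    if not _NAME_RE.match(name):
        raise ValueError(f"unexpected backup name: {name!r}")
    source = base_url.rstrip("/") + "/" + name
    args = [_quote(source)]
    if access_key and secret_key:
        args += [_quote(access_key), _quote(secret_key)]
    return "RESTORE ALL FROM S3(" + ", ".join(args) + ")"


if __name__ == "__main__":
    # Hypothetical bucket URL and backup name.
    print(restore_statement("https://my-bucket.s3.amazonaws.com/backups",
                            "backup202403051407"))
```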
Option 2: ClickHouse Backup Tool
The ClickHouse Backup Tool provides more advanced backup features including compression, incremental backups, and better restore capabilities.
Features
- Advanced backup management with compression
- Incremental backup support
- REST API for backup operations
- Better restore capabilities
- Backup metadata and validation
Configuration
Enable Backup Server
Configure S3 Storage
Set up S3 configuration for the backup tool:
With Kubernetes Secrets
Use Kubernetes secrets for sensitive data:
(The credentials secret can be ignored when using IAM roles.)
Using the Backup Tool
Create Backup
Upload Backup to S3
Download and Restore
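The create, upload, download, and restore operations can be driven through the clickhouse-backup REST API. The sketch below builds the corresponding POST requests with the standard library. It assumes the API server is reachable at `http://localhost:7171` (for example through a port-forward, which is an assumption about your setup) and uses a hypothetical backup name; the helper names are illustrative.

```python
import urllib.parse
import urllib.request


def build_request(base_url: str, action: str, name: str) -> urllib.request.Request:
    """Build a POST request for one clickhouse-backup API operation."""
    if action == "create":
        # The backup name is passed as an optional query argument.
        path = "/backup/create?" + urllib.parse.urlencode({"name": name})
    elif action in ("upload", "download", "restore"):
        # These operations take the backup name as a path segment.
        path = f"/backup/{action}/" + urllib.parse.quote(name, safe="")
    else:
        raise ValueError(f"unsupported action: {action!r}")
    return urllib.request.Request(base_url.rstrip("/") + path, method="POST")


def run(base_url: str, action: str, name: str, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body."""
    request = build_request(base_url, action, name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    base = "http://localhost:7171"   # assumed address of the API server
    backup = "opik_manual_backup"    # hypothetical backup name
    for step in ("create", "upload"):
        print(build_request(base, step, backup).full_url)
```

The API typically acknowledges these operations and runs them in the background, so it is worth checking `GET /backup/status` before starting the next step rather than assuming the previous one has finished.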
Automated Backup with CronJob
You can create a custom CronJob to automate the backup tool:
Comparison
Best Practices
General Recommendations
- Test Restores: Regularly test backup restoration procedures
- Monitor Backup Jobs: Set up monitoring for backup job failures
- Retention Policy: Implement backup retention policies
- Cross-Region: Consider cross-region backup replication for disaster recovery
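A retention policy can be as simple as selecting backups older than a cutoff. The sketch below assumes backup names follow the backupYYYYMMDDHHMM format used by the SQL-based method and that a list of names has already been obtained (for example from an S3 listing, which is not shown). The function name and the 14-day window are illustrative. S3 lifecycle rules are an alternative that needs no code.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List

_NAME_RE = re.compile(r"^backup(\d{12})$")


def expired_backups(names: Iterable[str], now: datetime, keep_days: int) -> List[str]:
    """Return the backup names whose embedded timestamp is older than keep_days."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in names:
        match = _NAME_RE.match(name)
        if not match:
            continue  # ignore objects that do not follow the naming scheme
        try:
            created = datetime.strptime(match.group(1), "%Y%m%d%H%M")
        except ValueError:
            continue  # twelve digits that do not form a valid timestamp
        if created < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    listing = ["backup202401010000", "backup202401200000", "manual-snapshot"]
    print(expired_backups(listing, datetime(2024, 1, 31), keep_days=14))
```

Names that do not match the scheme are skipped rather than deleted, so manually created backups are never selected by accident.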
Security
- Access Control: Use IAM roles when possible instead of access keys
- Encryption: Enable S3 server-side encryption for backup storage
- Network Security: Use VPC endpoints for S3 access when available
Performance
- Schedule: Run backups during low-traffic periods
- Resource Limits: Set appropriate resource limits for backup jobs
- Storage Class: Use appropriate S3 storage classes for cost optimization
Troubleshooting
Common Issues
Backup Job Fails
S3 Access Issues
Backup Tool API Issues
Monitoring
Set up monitoring for backup operations:
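One check that complements job-failure alerts is verifying that a recent backup actually exists. The following sketch assumes backup names follow the backupYYYYMMDDHHMM format and that the list of names comes from elsewhere (an S3 listing or the backup tool). The function names and the staleness threshold are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

_NAME_RE = re.compile(r"^backup(\d{12})$")


def latest_backup_age(names: Iterable[str], now: datetime) -> Optional[timedelta]:
    """Return the age of the newest recognizable backup, or None if there is none."""
    timestamps = []
    for name in names:
        match = _NAME_RE.match(name)
        if not match:
            continue
        try:
            timestamps.append(datetime.strptime(match.group(1), "%Y%m%d%H%M"))
        except ValueError:
            continue  # digits that are not a valid timestamp
    if not timestamps:
        return None
    return now - max(timestamps)


def is_stale(names: Iterable[str], now: datetime, max_age: timedelta) -> bool:
    """True when no backup exists or the newest one is older than max_age."""
    age = latest_backup_age(names, now)
    return age is None or age > max_age


if __name__ == "__main__":
    listing = ["backup202401300200", "backup202401310200"]
    # With a daily schedule, a threshold slightly above 24 hours leaves slack.
    print(is_stale(listing, datetime(2024, 1, 31, 12, 0), timedelta(hours=26)))
```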
Migration Between Backup Methods
From SQL-based to ClickHouse Backup Tool
- Enable the backup server:
- Create initial backup with the tool
- Disable SQL-based backup:
From ClickHouse Backup Tool to SQL-based
- Disable backup server:
- Enable SQL-based backup:
Support
For additional help with ClickHouse backups: