Resilience
Built-in circuit breaker, retry logic, and monitoring for production-grade reliability.
Three-layer protection
Circuit breaker + exponential backoff retry + connection pool
Circuit Breaker
The circuit breaker prevents cascading failures by rejecting operations when Redis is consistently failing. It has three states:
| State | Behaviour |
|---|---|
| CLOSED | Normal operation. Requests pass through. |
| OPEN | Rejects all operations with CircuitBreakerOpenError. Fail-fast. |
| HALF_OPEN | Allows one probe request to test if Redis recovered. |
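The state machine above can be sketched in a few lines. This is a simplified illustration of the transitions, not the library's internal implementation; the `threshold` and `resetTimeout` names mirror the configuration options below.

```typescript
type State = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Minimal circuit-breaker sketch: trips OPEN after `threshold`
// consecutive failures, allows a probe after `resetTimeout` ms.
class Breaker {
  private state: State = 'CLOSED';
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private resetTimeout = 30_000) {}

  current(now = Date.now()): State {
    // OPEN transitions to HALF_OPEN once the cool-down elapses
    if (this.state === 'OPEN' && now - this.openedAt >= this.resetTimeout) {
      this.state = 'HALF_OPEN';
    }
    return this.state;
  }

  onSuccess(): void {
    this.failures = 0;
    this.state = 'CLOSED'; // a successful probe closes the circuit
  }

  onFailure(now = Date.now()): void {
    this.failures += 1;
    // A failed probe in HALF_OPEN, or reaching the threshold, trips OPEN
    if (this.state === 'HALF_OPEN' || this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = now;
    }
  }
}
```

With `threshold: 2`, the breaker stays CLOSED after one failure, trips OPEN on the second, and moves to HALF_OPEN once the cool-down passes.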
Configuration
| Option | Default | Description |
|---|---|---|
| threshold | 5 | Consecutive failures before tripping OPEN |
| timeout | 60000 ms | Max operation duration before it is counted as a failure |
| resetTimeout | 30000 ms | OPEN → HALF_OPEN cool-down |
```js
const cache = new RedisGraphCache(schema, {
  redis: { host: 'localhost', port: 6379 },
  resilience: {
    circuitBreaker: {
      threshold: 10,      // Trip after 10 consecutive failures
      timeout: 30000,     // 30s timeout per operation
      resetTimeout: 60000 // 1 minute before probing
    }
  }
});
```
Retry with Exponential Backoff
Transient failures are automatically retried with exponential backoff before surfacing an error.
| Option | Default | Description |
|---|---|---|
| maxAttempts | 3 | Max retries per operation |
| baseDelay | 100 ms | Initial backoff |
| maxDelay | 2000 ms | Cap on exponential backoff |
| backoffFactor | 2 | Multiplier per attempt |
```js
const cache = new RedisGraphCache(schema, {
  redis: { host: 'localhost', port: 6379 },
  resilience: {
    retry: {
      maxAttempts: 5,  // Retry up to 5 times
      baseDelay: 50,   // Start with 50ms
      maxDelay: 5000,  // Cap at 5s
      backoffFactor: 2 // Double each time
    }
  }
});
```
Retry sequence example
With baseDelay: 100, maxDelay: 2000, backoffFactor: 2, maxAttempts: 3: 100ms → 200ms → 400ms (total ~700ms of backoff before error).
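The schedule can be reproduced with a small helper. This is illustrative arithmetic only; the parameter names mirror the retry options above.

```typescript
// Delay before retry attempt n (1-based): baseDelay * backoffFactor^(n-1),
// capped at maxDelay.
function backoffDelays(
  maxAttempts: number,
  baseDelay: number,
  maxDelay: number,
  backoffFactor: number,
): number[] {
  return Array.from({ length: maxAttempts }, (_, i) =>
    Math.min(baseDelay * backoffFactor ** i, maxDelay),
  );
}

// Defaults from the table:
backoffDelays(3, 100, 2000, 2); // [100, 200, 400]
```

Raising `maxAttempts` shows the cap kicking in: with six retries the last delays flatten at `maxDelay` instead of doubling forever.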
Connection Pool
Round-robin across multiple ioredis clients for higher throughput and partial failure tolerance.
```js
const cache = new RedisGraphCache(schema, {
  redis: {
    host: 'localhost',
    port: 6379,
    poolSize: 8, // 8 concurrent connections
  }
});
```
Pool benefits
- Higher throughput under high concurrency (a single ioredis client serializes commands on its socket)
- Partial failure tolerance (one bad socket doesn't take down the cache; health is "OR" across pool members)
- Atomicity preserved across the pool: Lua scripts operate on a single key (or two related keys), and atomicity is enforced on the Redis side regardless of which connection runs them
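Round-robin selection itself is straightforward. A minimal sketch of the idea, with plain strings standing in for ioredis clients (not the library's internals):

```typescript
// Cycle through pool members so consecutive operations use
// different connections.
class RoundRobinPool<T> {
  private next = 0;
  constructor(private clients: T[]) {}

  acquire(): T {
    const client = this.clients[this.next];
    this.next = (this.next + 1) % this.clients.length;
    return client;
  }
}

const pool = new RoundRobinPool(['conn0', 'conn1', 'conn2']);
pool.acquire(); // 'conn0'
pool.acquire(); // 'conn1'
pool.acquire(); // 'conn2'
pool.acquire(); // 'conn0' (wraps around)
```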
Sizing
At low traffic (a few hundred ops/sec or less), extra connections waste resources; the default of 1 is the right choice for most apps.
| Sustained ops/sec | Recommended poolSize |
|---|---|
| < 1k | 1 (default) |
| 1k–10k | 4 |
| 10k–30k | 8 |
| 30k+ | 16 |
Going much higher than 16 rarely helps and consumes Redis connection slots (default Redis maxclients is 10,000).
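If you want to pick `poolSize` programmatically, the sizing table can be encoded as a small helper. `recommendPoolSize` is a hypothetical name for illustration, not part of the library's API.

```typescript
// Hypothetical helper encoding the sizing table above.
function recommendPoolSize(opsPerSec: number): number {
  if (opsPerSec < 1_000) return 1;   // default is fine
  if (opsPerSec < 10_000) return 4;
  if (opsPerSec < 30_000) return 8;
  return 16;                         // rarely worth going higher
}
```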
Monitoring
Real-time metrics for cache performance and health.
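One field in the metrics below, `hitRate`, is derived from the hit and miss counters: hits divided by total lookups, rounded to three decimal places. An illustrative calculation, not the library's source:

```typescript
// hitRate = cacheHits / (cacheHits + cacheMisses), rounded to 3 d.p.
function hitRate(hits: number, misses: number): number {
  const total = hits + misses;
  return total === 0 ? 0 : Math.round((hits / total) * 1000) / 1000;
}

hitRate(9523, 847); // 0.918, matching the sample metrics below
```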
```js
const metrics = cache.getMetrics();
console.log(metrics);
// {
//   cacheHits: 9523,
//   cacheMisses: 847,
//   hitRate: 0.918,        // 0..1, rounded to 3 d.p.
//   totalOperations: 10370,
//   avgResponseTime: 12.5, // ms
//   failedOperations: 23,
//   activeConnections: 8,
//   memoryUsage: 0,        // cache-side; reserved
//   lastUpdated: 1714567890123
// }
```
Health check
```js
const health = await cache.getHealthStatus();
console.log(health);
// {
//   status: 'healthy' | 'degraded' | 'unhealthy',
//   redis: {
//     connected: true,
//     latency: 5,            // ms; running average
//     memoryUsage: 123456789 // bytes; INFO memory used_memory
//   },
//   cache: {
//     activeOperations: 12,  // currently in-flight cache calls
//     memoryUsage: 0,        // reserved
//     errorRate: 0.002       // failed / total
//   },
//   timestamp: 1714567890123
// }
// Status mapping:
// - 'unhealthy' → PING failed
// - 'degraded'  → error rate > 5% OR circuit breaker non-CLOSED
// - 'healthy'   → neither of the above
```
Production Safety
Additional safeguards for production deployments.
Production mode
When NODE_ENV === 'production', clearAllCache requires an explicit allowProduction: true flag.
```js
// In production, this is required:
await cache.clearAllCache({
  confirm: 'YES_WIPE_ALL',
  allowProduction: true,
});
```
Resilience Best Practices
- Set appropriate thresholds based on your Redis latency and failure tolerance
- Use connection pooling in production (poolSize: 4-8)
- Monitor metrics and wire them into your observability stack
- Handle CircuitBreakerOpenError by falling back to your primary data store
- Always use keyPrefix when sharing Redis with other applications
- Enable compression for large payloads to reduce memory usage
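The fallback advice above might look like this in practice. This is a sketch only: `fetchFromDb`, the `user:` key shape, and the error-name check are assumptions, not part of the library's API.

```typescript
// Fall back to the primary data store when the breaker is open.
// `fetchFromDb` stands in for your primary-store query.
async function getUser(
  cache: { get(key: string): Promise<unknown> },
  fetchFromDb: (id: string) => Promise<unknown>,
  id: string,
): Promise<unknown> {
  try {
    const hit = await cache.get(`user:${id}`);
    if (hit !== null) return hit;
  } catch (err) {
    // Fail-fast path: skip the cache entirely while the circuit is open,
    // but rethrow anything that isn't the breaker rejecting the call.
    if ((err as Error).name !== 'CircuitBreakerOpenError') throw err;
  }
  return fetchFromDb(id);
}
```

Because the OPEN state rejects immediately, this pattern keeps request latency bounded during a Redis outage instead of waiting on timeouts.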