Resilience

Built-in circuit breaker, retry logic, and monitoring for production-grade reliability.

Three-layer protection

Circuit breaker + exponential backoff retry + connection pool

Circuit Breaker

The circuit breaker prevents cascading failures by rejecting operations when Redis is consistently failing. It has three states:

State       Behaviour
CLOSED      Normal operation. Requests pass through.
OPEN        Rejects all operations with CircuitBreakerOpenError. Fail-fast.
HALF_OPEN   Allows one probe request to test if Redis recovered.
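The transitions above can be sketched as a small state machine. This is a simplified illustration of the documented behaviour, not the library's actual implementation (the real breaker also applies a per-operation timeout):

```javascript
// Simplified three-state circuit breaker. `threshold` and `resetTimeout`
// mirror the config options documented below; the current time is passed
// in explicitly so the transitions are easy to follow without real clocks.
class CircuitBreaker {
  constructor({ threshold, resetTimeout }) {
    this.threshold = threshold;
    this.resetTimeout = resetTimeout;
    this.failures = 0;
    this.state = 'CLOSED';
    this.openedAt = 0;
  }

  // Called before each operation. OPEN fails fast unless the cool-down
  // has elapsed, in which case one probe is allowed (HALF_OPEN).
  canRequest(now = Date.now()) {
    if (this.state === 'OPEN' && now - this.openedAt >= this.resetTimeout) {
      this.state = 'HALF_OPEN';
    }
    return this.state !== 'OPEN';
  }

  onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  onFailure(now = Date.now()) {
    this.failures += 1;
    // A failed probe re-opens immediately; otherwise trip after
    // `threshold` consecutive failures.
    if (this.state === 'HALF_OPEN' || this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = now;
      this.failures = 0;
    }
  }
}
```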

Configuration

Option        Default    Description
threshold     5          Consecutive failures before tripping OPEN
timeout       60000 ms   Max operation duration before it is counted as a failure
resetTimeout  30000 ms   OPEN → HALF_OPEN cool-down

const cache = new RedisGraphCache(schema, {
  redis: { host: 'localhost', port: 6379 },
  resilience: {
    circuitBreaker: {
      threshold: 10,      // Trip after 10 consecutive failures
      timeout: 30000,     // 30s timeout per operation
      resetTimeout: 60000 // 1 minute before probing
    }
  }
});

Retry with Exponential Backoff

Transient failures are automatically retried with exponential backoff before surfacing an error.

Option         Default   Description
maxAttempts    3         Max retries per operation
baseDelay      100 ms    Initial backoff
maxDelay       2000 ms   Cap on exponential backoff
backoffFactor  2         Multiplier per attempt

const cache = new RedisGraphCache(schema, {
  redis: { host: 'localhost', port: 6379 },
  resilience: {
    retry: {
      maxAttempts: 5,      // Retry up to 5 times
      baseDelay: 50,       // Start with 50ms
      maxDelay: 5000,      // Cap at 5s
      backoffFactor: 2     // Double each time
    }
  }
});

Retry sequence example

With baseDelay: 100, maxDelay: 2000, backoffFactor: 2, and maxAttempts: 3, the delays are 100 ms → 200 ms → 400 ms (~700 ms of total backoff before the error surfaces).
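Assuming each delay is baseDelay × backoffFactor^(attempt − 1), capped at maxDelay, the schedule above can be reproduced in a few lines:

```javascript
// Backoff schedule sketch: delay before the n-th retry (1-based), assuming
// delay_n = min(baseDelay * backoffFactor^(n - 1), maxDelay).
function backoffDelay(n, { baseDelay, maxDelay, backoffFactor }) {
  return Math.min(baseDelay * Math.pow(backoffFactor, n - 1), maxDelay);
}

const opts = { baseDelay: 100, maxDelay: 2000, backoffFactor: 2 };
const delays = [1, 2, 3].map((n) => backoffDelay(n, opts));
console.log(delays); // [100, 200, 400]
```

Note how maxDelay only matters for longer retry runs: with these settings the cap of 2000 ms is first hit on the sixth attempt.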

Connection Pool

Round-robin across multiple ioredis clients for higher throughput and partial failure tolerance.

const cache = new RedisGraphCache(schema, {
  redis: {
    host: 'localhost',
    port: 6379,
    poolSize: 8, // 8 concurrent connections
  }
});

Pool benefits

  • Higher throughput under high concurrency (a single ioredis client serializes commands on its socket)
  • Partial failure tolerance (one bad socket doesn't take down the cache; health is "OR" across pool members)
  • Atomicity preserved across the pool — Lua scripts are single-key (or two related keys) and atomicity is a Redis-side property
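Round-robin dispatch itself is simple; a sketch, with plain strings standing in for ioredis clients:

```javascript
// Round-robin selection over a fixed pool. In the real cache the entries
// are ioredis clients; placeholders are used here so the rotation is visible.
class RoundRobinPool {
  constructor(clients) {
    this.clients = clients;
    this.next = 0;
  }

  acquire() {
    const client = this.clients[this.next];
    this.next = (this.next + 1) % this.clients.length;
    return client;
  }
}

const pool = new RoundRobinPool(['conn-0', 'conn-1', 'conn-2']);
// Four acquisitions wrap back around to the first connection:
// conn-0, conn-1, conn-2, conn-0
const order = [pool.acquire(), pool.acquire(), pool.acquire(), pool.acquire()];
console.log(order);
```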

Sizing

At low traffic (a few hundred ops/sec or less), the extra connections waste resources; the default of 1 is the right choice for most apps.

Sustained ops/sec   Recommended poolSize
< 1k                1 (default)
1k–10k              4
10k–30k             8
30k+                16

Going much higher than 16 rarely helps and consumes Redis connection slots (default Redis maxclients is 10,000).

Monitoring

Real-time metrics for cache performance and health.

const metrics = cache.getMetrics();
console.log(metrics);
// {
//   cacheHits: 9523,
//   cacheMisses: 847,
//   hitRate: 0.918,             // 0..1, rounded to 3 d.p.
//   totalOperations: 10370,
//   avgResponseTime: 12.5,      // ms
//   failedOperations: 23,
//   activeConnections: 8,
//   memoryUsage: 0,             // cache-side; reserved
//   lastUpdated: 1714567890123
// }
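The hitRate in the snippet above is consistent with hits / (hits + misses), rounded to three decimal places, which can be checked directly:

```javascript
// Hit rate as shown above: hits / (hits + misses), rounded to 3 d.p.
function hitRate(hits, misses) {
  const total = hits + misses;
  return total === 0 ? 0 : Math.round((hits / total) * 1000) / 1000;
}

console.log(hitRate(9523, 847)); // 0.918
```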

Health check

const health = await cache.getHealthStatus();
console.log(health);
// {
//   status: 'healthy' | 'degraded' | 'unhealthy',
//   redis: {
//     connected: true,
//     latency: 5,                // ms; running average
//     memoryUsage: 123456789     // bytes; INFO memory used_memory
//   },
//   cache: {
//     activeOperations: 12,      // currently in-flight cache calls
//     memoryUsage: 0,            // reserved
//     errorRate: 0.002           // failed / total
//   },
//   timestamp: 1714567890123
// }
// Status mapping:
// - 'unhealthy' → PING failed
// - 'degraded'  → error rate > 5% OR circuit breaker non-CLOSED
// - 'healthy'   → neither of the above
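The status mapping above can be expressed as a small pure function. The input names here are illustrative stand-ins for the cache's internal checks, not part of its API:

```javascript
// Health status per the mapping above: a failed PING wins, then the
// degraded conditions (error rate > 5% or a non-CLOSED breaker),
// otherwise healthy.
function deriveStatus({ pingOk, errorRate, breakerState }) {
  if (!pingOk) return 'unhealthy';
  if (errorRate > 0.05 || breakerState !== 'CLOSED') return 'degraded';
  return 'healthy';
}
```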

Production Safety

Additional safeguards for production deployments.

Production mode

When NODE_ENV === 'production', clearAllCache requires an explicit allowProduction: true flag.

// In production, this is required:
await cache.clearAllCache({
  confirm: 'YES_WIPE_ALL',
  allowProduction: true,
});

Resilience Best Practices

  • Set appropriate thresholds based on your Redis latency and failure tolerance
  • Use connection pooling in production (poolSize: 4-8)
  • Monitor metrics and wire them into your observability stack
  • Handle CircuitBreakerOpenError by falling back to your primary data store
  • Always use keyPrefix when sharing Redis with other applications
  • Enable compression for large payloads to reduce memory usage
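The fallback for CircuitBreakerOpenError is a plain try/catch. The stub error class and the cache/db interfaces below are illustrative, not the library's API; only the error name comes from this document:

```javascript
// Fallback sketch: serve from the primary data store when the breaker
// is open. In real code, CircuitBreakerOpenError would be imported from
// the cache library rather than declared locally.
class CircuitBreakerOpenError extends Error {}

async function getUser(id, cache, db) {
  try {
    return await cache.getUser(id);
  } catch (err) {
    if (err instanceof CircuitBreakerOpenError) {
      return db.findUser(id); // degrade gracefully: skip the cache
    }
    throw err; // anything else is a real error
  }
}
```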