Resilience

Built-in circuit breaker, retry logic, and monitoring for production-grade reliability.

Three-layer protection

Circuit breaker + exponential backoff retry + connection pool

Circuit Breaker

The circuit breaker prevents cascading failures by rejecting operations when Redis is consistently failing. It has three states:

State       Behaviour
CLOSED      Normal operation. Requests pass through.
OPEN        Rejects all operations with CircuitBreakerOpenError. Fail-fast.
HALF_OPEN   Allows one probe request to test if Redis recovered.
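The transitions above can be sketched as a small state machine. This is a simplified illustration of the documented behaviour, not the library's actual implementation (the real breaker also applies a per-operation timeout):

```javascript
// Simplified three-state circuit breaker. `threshold` and `resetTimeout`
// mirror the config options documented below; the current time is passed
// in explicitly so the transitions are easy to follow without real clocks.
class CircuitBreaker {
  constructor({ threshold, resetTimeout }) {
    this.threshold = threshold;
    this.resetTimeout = resetTimeout;
    this.failures = 0;
    this.state = 'CLOSED';
    this.openedAt = 0;
  }

  // Called before each operation. OPEN fails fast unless the cool-down
  // has elapsed, in which case one probe is allowed (HALF_OPEN).
  canRequest(now = Date.now()) {
    if (this.state === 'OPEN' && now - this.openedAt >= this.resetTimeout) {
      this.state = 'HALF_OPEN';
    }
    return this.state !== 'OPEN';
  }

  onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  onFailure(now = Date.now()) {
    this.failures += 1;
    // A failed probe re-opens immediately; otherwise trip after
    // `threshold` consecutive failures.
    if (this.state === 'HALF_OPEN' || this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = now;
      this.failures = 0;
    }
  }
}
```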

Configuration

Option        Default    Description
threshold     5          Consecutive failures before tripping OPEN
timeout       60000 ms   Max operation duration before it is counted as a failure
resetTimeout  30000 ms   OPEN → HALF_OPEN cool-down

const cache = new RedisGraphCache(schema, {
  redis: { host: 'localhost', port: 6379 },
  resilience: {
    circuitBreaker: {
      threshold: 10,      // Trip after 10 consecutive failures
      timeout: 30000,     // 30s timeout per operation
      resetTimeout: 60000 // 1 minute before probing
    }
  }
});

Retry with Exponential Backoff

Transient failures are automatically retried with exponential backoff before surfacing an error.

Option         Default   Description
maxAttempts    3         Max retries per operation
baseDelay      100 ms    Initial backoff
maxDelay       2000 ms   Cap on exponential backoff
backoffFactor  2         Multiplier per attempt

const cache = new RedisGraphCache(schema, {
  redis: { host: 'localhost', port: 6379 },
  resilience: {
    retry: {
      maxAttempts: 5,      // Retry up to 5 times
      baseDelay: 50,       // Start with 50ms
      maxDelay: 5000,      // Cap at 5s
      backoffFactor: 2     // Double each time
    }
  }
});

Retry sequence example

With baseDelay: 100, maxDelay: 2000, backoffFactor: 2, and maxAttempts: 3, the delays are 100 ms → 200 ms → 400 ms (~700 ms of total backoff before the error surfaces).
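Assuming each delay is baseDelay × backoffFactor^(attempt − 1), capped at maxDelay, the schedule above can be reproduced in a few lines:

```javascript
// Backoff schedule sketch: delay before the n-th retry (1-based), assuming
// delay_n = min(baseDelay * backoffFactor^(n - 1), maxDelay).
function backoffDelay(n, { baseDelay, maxDelay, backoffFactor }) {
  return Math.min(baseDelay * Math.pow(backoffFactor, n - 1), maxDelay);
}

const opts = { baseDelay: 100, maxDelay: 2000, backoffFactor: 2 };
const delays = [1, 2, 3].map((n) => backoffDelay(n, opts));
console.log(delays); // [100, 200, 400]
```

Note how maxDelay only matters for longer retry runs: with these settings the cap of 2000 ms is first hit on the sixth attempt.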

Connection Pool

Round-robin across multiple ioredis clients for higher throughput and partial failure tolerance.

const cache = new RedisGraphCache(schema, {
  redis: {
    host: 'localhost',
    port: 6379,
    poolSize: 8, // 8 concurrent connections
  }
});

Pool benefits

  • Higher throughput under high concurrency (a single ioredis client serializes commands on its socket)
  • Partial failure tolerance (one bad socket doesn't take down the cache; health is "OR" across pool members)
  • Atomicity preserved across the pool — Lua scripts are single-key (or two related keys) and atomicity is a Redis-side property
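Round-robin dispatch itself is simple; a sketch, with plain strings standing in for ioredis clients:

```javascript
// Round-robin selection over a fixed pool. In the real cache the entries
// are ioredis clients; placeholders are used here so the rotation is visible.
class RoundRobinPool {
  constructor(clients) {
    this.clients = clients;
    this.next = 0;
  }

  acquire() {
    const client = this.clients[this.next];
    this.next = (this.next + 1) % this.clients.length;
    return client;
  }
}

const pool = new RoundRobinPool(['conn-0', 'conn-1', 'conn-2']);
// Four acquisitions wrap back around to the first connection:
// conn-0, conn-1, conn-2, conn-0
const order = [pool.acquire(), pool.acquire(), pool.acquire(), pool.acquire()];
console.log(order);
```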

Sizing

At low traffic (a few hundred ops/sec or less), the extra connections waste resources; the default of 1 is the right choice for most apps.

Sustained ops/sec   Recommended poolSize
< 1k                1 (default)
1k–10k              4
10k–30k             8
30k+                16

Going much higher than 16 rarely helps and consumes Redis connection slots (default Redis maxclients is 10,000).

Monitoring

Real-time metrics for cache performance and health.

const metrics = cache.getMetrics();
console.log(metrics);
// {
//   cacheHits: 9523,
//   cacheMisses: 847,
//   hitRate: 0.918,             // 0..1, rounded to 3 d.p.
//   totalOperations: 10370,
//   avgResponseTime: 12.5,      // ms
//   failedOperations: 23,
//   activeConnections: 8,
//   memoryUsage: 0,             // cache-side; reserved
//   lastUpdated: 1714567890123
// }
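The hitRate in the snippet above is consistent with hits / (hits + misses), rounded to three decimal places, which can be checked directly:

```javascript
// Hit rate as shown above: hits / (hits + misses), rounded to 3 d.p.
function hitRate(hits, misses) {
  const total = hits + misses;
  return total === 0 ? 0 : Math.round((hits / total) * 1000) / 1000;
}

console.log(hitRate(9523, 847)); // 0.918
```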

Health check

const health = await cache.getHealthStatus();
console.log(health);
// {
//   status: 'healthy' | 'degraded' | 'unhealthy',
//   redis: {
//     connected: true,
//     latency: 5,                // ms; running average
//     memoryUsage: 123456789     // bytes; INFO memory used_memory
//   },
//   cache: {
//     activeOperations: 12,      // currently in-flight cache calls
//     memoryUsage: 0,            // reserved
//     errorRate: 0.002           // failed / total
//   },
//   timestamp: 1714567890123
// }
// Status mapping:
// - 'unhealthy' → PING failed
// - 'degraded'  → error rate > 5% OR circuit breaker non-CLOSED
// - 'healthy'   → neither of the above
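The status mapping above can be expressed as a small pure function. The input names here are illustrative stand-ins for the cache's internal checks, not part of its API:

```javascript
// Health status per the mapping above: a failed PING wins, then the
// degraded conditions (error rate > 5% or a non-CLOSED breaker),
// otherwise healthy.
function deriveStatus({ pingOk, errorRate, breakerState }) {
  if (!pingOk) return 'unhealthy';
  if (errorRate > 0.05 || breakerState !== 'CLOSED') return 'degraded';
  return 'healthy';
}
```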

Production Safety

Additional safeguards for production deployments.

Production mode

When NODE_ENV === 'production', clearAllCache requires an explicit allowProduction: true flag.

// In production, this is required:
await cache.clearAllCache({
  confirm: 'YES_WIPE_ALL',
  allowProduction: true,
});

Resilience Best Practices

  • Set appropriate thresholds based on your Redis latency and failure tolerance
  • Use connection pooling in production (poolSize: 4-8)
  • Monitor metrics and wire them into your observability stack
  • Handle CircuitBreakerOpenError by falling back to your primary data store
  • Always use keyPrefix when sharing Redis with other applications
  • Enable compression for large payloads to reduce memory usage
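The fallback for CircuitBreakerOpenError is a plain try/catch. The stub error class and the cache/db interfaces below are illustrative, not the library's API; only the error name comes from this document:

```javascript
// Fallback sketch: serve from the primary data store when the breaker
// is open. In real code, CircuitBreakerOpenError would be imported from
// the cache library rather than declared locally.
class CircuitBreakerOpenError extends Error {}

async function getUser(id, cache, db) {
  try {
    return await cache.getUser(id);
  } catch (err) {
    if (err instanceof CircuitBreakerOpenError) {
      return db.findUser(id); // degrade gracefully: skip the cache
    }
    throw err; // anything else is a real error
  }
}
```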