ryanlecompte's comments

ryanlecompte · on Oct 12, 2012

redis_failover provides a full automatic master/slave failover solution for Ruby.

Changes in the 1.0 release:

redis_failover now supports distributed monitoring among the Node Managers! Previously, the Node Managers were only used as a means of redundancy in case a particular Node Manager crashed. Starting with version 1.0 of redis_failover, all Node Managers periodically report their health snapshots. The primary Node Manager uses a configurable "node strategy" to determine whether a particular node is available or unavailable.
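
To make the "node strategy" idea concrete, here is a minimal sketch, not the gem's actual implementation. It assumes each Node Manager reports a snapshot as a hash of node name to `:available` or `:unavailable`, and that a node is treated as unavailable only when a strict majority of reports say so. The class name `MajorityStrategy` and the snapshot layout are hypothetical.

```ruby
# Hypothetical sketch of a "majority" node strategy. The class name and the
# snapshot format are illustrative assumptions, not the redis_failover API.
class MajorityStrategy
  # snapshots: array of hashes, one per Node Manager, for example
  #   { "redis1:6379" => :available, "redis2:6379" => :unavailable }
  # Returns true only when more than half of the reports that mention
  # the node say it is unavailable.
  def unavailable?(node, snapshots)
    reports = snapshots.map { |snapshot| snapshot[node] }.compact
    return false if reports.empty?

    reports.count(:unavailable) > reports.size / 2
  end
end

snapshots = [
  { "redis1:6379" => :available,   "redis2:6379" => :unavailable },
  { "redis1:6379" => :available,   "redis2:6379" => :unavailable },
  { "redis1:6379" => :unavailable, "redis2:6379" => :available }
]

strategy = MajorityStrategy.new
puts strategy.unavailable?("redis1:6379", snapshots) # one of three reports it down
puts strategy.unavailable?("redis2:6379", snapshots) # two of three report it down
```

A single Node Manager with a flaky network link cannot take a node out of rotation on its own under this rule, which is the point of distributing the monitoring.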

redis_failover now supports a configurable "failover strategy" that's consulted when performing a failover. Currently, a single strategy is provided that takes into account the average latency of the last health check to the redis server.
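
As a rough illustration of a latency-based failover strategy, the sketch below picks the slave with the lowest average latency. It is an assumption-laden example rather than the gem's code: the class name `LatencyStrategy`, the method `find_candidate`, and the input format (slave name mapped to the latencies each Node Manager measured in its last health check) are all hypothetical.

```ruby
# Hypothetical sketch of a latency-based failover strategy. Names and the
# input format are assumptions for illustration only.
class LatencyStrategy
  # candidates: hash of slave name => array of latencies in seconds,
  # one entry per Node Manager's most recent health check.
  # Returns the name of the slave with the lowest average latency,
  # or nil when no slave has any measurements.
  def find_candidate(candidates)
    usable = candidates.reject { |_, latencies| latencies.empty? }
    best = usable.min_by { |_, latencies| latencies.sum / latencies.size.to_f }
    best && best.first
  end
end

candidates = {
  "redis2:6379" => [0.004, 0.006],
  "redis3:6379" => [0.002, 0.003]
}

puts LatencyStrategy.new.find_candidate(candidates)
```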

Improved handling of underlying ZK client connection in RedisFailover::NodeManager

Add support for passing in an existing ZK client instance to RedisFailover::Client.new

Reduce unnecessary writes to ZK

ryanlecompte · on July 27, 2012

Hey Steven,

ZooKeeper is actually a proven technology for solving distributed configuration. It's widely used by Yahoo! and Netflix, among other large companies. It's a Paxos-like implementation, and definitely isn't a single point of failure. Check it out!

ryanlecompte · on April 17, 2012

FYI, redis_failover has now been rewritten to sit on top of ZooKeeper to deal with network partitions, stability, and data consistency. From the README:

redis_failover attempts to provide a full automatic master/slave failover solution for Ruby. Redis does not provide an automatic failover capability when configured for master/slave replication. When the master node dies, a new master must be manually brought online and assigned as the slave's new master. This manual switch-over is not desirable in high-traffic sites where Redis is a critical part of the overall architecture. The existing standard Redis client for Ruby also only supports configuration for a single Redis server. When using master/slave replication, it is desirable to have all writes go to the master, and all reads go to one of the N configured slaves.

This gem attempts to address these failover scenarios. A redis failover Node Manager daemon runs as a background process and monitors all of your configured master/slave nodes. When the daemon starts up, it automatically discovers the current master/slaves. Background watchers are set up for each of the redis nodes. As soon as a node is detected as being offline, it will be moved to an "unavailable" state. If the node that went offline was the master, then one of the slaves will be promoted as the new master. All existing slaves will be automatically reconfigured to point to the new master for replication. All nodes marked as unavailable will be periodically checked to see if they have been brought back online. If so, the newly available nodes will be configured as slaves and brought back into the list of available nodes. Note that detection of a node going down should be nearly instantaneous, since the mechanism used to keep tabs on a node is a blocking Redis BLPOP call (no polling). This call fails nearly immediately when the node actually goes offline. To avoid false positives (e.g., an intermittent, flaky network interruption), the Node Manager will only mark a node as unavailable if it fails to communicate with it 3 times (this is configurable via --max-failures, see configuration options below).
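
The state transitions described above can be modeled in a few lines. This is only an in-memory sketch under stated assumptions: `Cluster`, `handle_node_down`, and `handle_node_up` are hypothetical names, the new master is simply the first remaining slave, and no real Redis or ZooKeeper calls are made (the actual daemon would also reconfigure replication on each node).

```ruby
# Hypothetical in-memory model of the Node Manager's bookkeeping.
# Nothing here talks to Redis or ZooKeeper; names are illustrative.
Cluster = Struct.new(:master, :slaves, :unavailable)

# Called once a node has been judged offline.
def handle_node_down(cluster, node)
  cluster.unavailable << node
  if cluster.master == node
    # Assumption: promote the first remaining slave. The remaining slaves
    # stay in the list and would be re-pointed at the new master.
    cluster.master = cluster.slaves.shift
  else
    cluster.slaves.delete(node)
  end
  cluster
end

# Called when a previously unavailable node answers again:
# it rejoins as a slave, never directly as the master.
def handle_node_up(cluster, node)
  cluster.unavailable.delete(node)
  cluster.slaves << node
  cluster
end

cluster = Cluster.new("redis1:6379", ["redis2:6379", "redis3:6379"], [])
handle_node_down(cluster, "redis1:6379")
puts cluster.master
handle_node_up(cluster, "redis1:6379")
puts cluster.slaves.inspect
```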

This gem provides a RedisFailover::Client wrapper that is master/slave aware. The client is configured with a list of ZooKeeper servers. The client will automatically contact the ZooKeeper cluster to find out the current state of the world (i.e., who is the current master and who are the current slaves). The client also sets up a ZooKeeper watcher for the set of redis nodes controlled by the Node Manager daemon. When the daemon promotes a new master or detects a node as going down, ZooKeeper will notify the client near-instantaneously so that it can rebuild its set of Redis connections. The client also acts as a load balancer in that it will automatically dispatch Redis read operations to one of N slaves, and Redis write operations to the master. If it fails to communicate with any node, it will go back and fetch the current list of available servers, and then optionally retry the operation.

ryanlecompte · on April 12, 2012

That's right. The client still maintains direct connections with the actual master/slaves. It's only when it fails to connect with one of them that it goes to the failover daemon to ask for the current set of available nodes. The split of the reads/writes is handled by the client, as it knows where to dispatch commands (to master for writes, and to one of the slaves for reads). I'll make this clearer in the README.
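
For anyone curious what the read/write split looks like in principle, here is a minimal sketch. It is not the gem's code: `SplittingClient`, `node_for`, and the short `WRITE_COMMANDS` list are hypothetical, and nodes are plain strings rather than real connections.

```ruby
# Hypothetical sketch of client-side read/write splitting.
# Nodes are represented as strings; a real client would hold connections.
class SplittingClient
  # A small, deliberately incomplete set of write commands for illustration.
  WRITE_COMMANDS = [:set, :del, :incr, :lpush, :rpush, :expire].freeze

  def initialize(master, slaves)
    @master = master
    @slaves = slaves
  end

  # Writes always go to the master. Reads go to a randomly chosen slave,
  # falling back to the master when no slave is available.
  def node_for(command)
    return @master if WRITE_COMMANDS.include?(command)

    @slaves.sample || @master
  end
end

client = SplittingClient.new("redis1:6379", ["redis2:6379", "redis3:6379"])
puts client.node_for(:set) # always the master
puts client.node_for(:get) # one of the slaves
```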

ryanlecompte · on April 12, 2012

The gem has a configurable --max-failures option that can be passed to the failover daemon. The daemon will only mark a node as unreachable if it fails to ping it that many times (default 3). This might be something that can be improved too, but it was meant to avoid false positives.
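
Roughly, the threshold logic works like the sketch below. This is an illustration, not the daemon's source: `FailureTracker` and its methods are hypothetical names, and resetting the count after a successful ping is an assumption on my part.

```ruby
# Hypothetical sketch of a max-failures threshold. Resetting the counter
# on success is an assumption, not confirmed behavior of the daemon.
class FailureTracker
  def initialize(max_failures = 3)
    @max_failures = max_failures
    @failures = Hash.new(0)
  end

  # Records one failed ping and returns true once the node has
  # reached the threshold and should be marked unreachable.
  def record_failure(node)
    @failures[node] += 1
    unreachable?(node)
  end

  # A successful ping clears the node's failure count.
  def record_success(node)
    @failures.delete(node)
  end

  def unreachable?(node)
    @failures[node] >= @max_failures
  end
end

tracker = FailureTracker.new(3)
puts tracker.record_failure("redis1:6379") # first failure, below threshold
puts tracker.record_failure("redis1:6379") # second failure, below threshold
puts tracker.record_failure("redis1:6379") # third failure, threshold reached
```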

ryanlecompte · on April 12, 2012

From the README:

Redis Failover attempts to provide a full automatic master/slave failover solution for Ruby. Redis does not provide an automatic failover capability when configured for master/slave replication. When the master node dies, a new master must be manually brought online and assigned as the slave's new master. This manual switch-over is not desirable in high-traffic sites where Redis is a critical part of the overall architecture. The existing standard Redis client for Ruby also only supports configuration for a single Redis server. When using master/slave replication, it is desirable to have all writes go to the master, and all reads go to one of the N configured slaves.

This gem attempts to address both the server and client problems. A redis failover server runs as a background daemon and monitors all of your configured master/slave nodes. When the server starts up, it automatically discovers who is the master and who are the slaves. Watchers are set up for each of the redis nodes. As soon as a node is detected as being offline, it will be moved to an "unreachable" state. If the node that went offline was the master, then one of the slaves will be promoted as the new master. All existing slaves will be automatically reconfigured to point to the new master for replication. All nodes marked as unreachable will be periodically checked to see if they have been brought back online. If so, the newly reachable nodes will be configured as slaves and brought back into the list of live servers. Note that detection of a node going down should be nearly instantaneous, since the mechanism used to keep tabs on a node is a blocking Redis BLPOP call (no polling). This call fails nearly immediately when the node actually goes offline.

This gem provides a RedisFailover::Client wrapper that is master/slave aware. The client is configured with a single host/port pair that points to the redis failover server. The client will automatically connect to the server to find out the current state of the world (i.e., who's the current master and who are the current slaves). The client also acts as a load balancer in that it will automatically dispatch Redis read operations to one of N slaves, and Redis write operations to the master. If it fails to communicate with any node, it will go back and ask the server for the current list of available servers, and then optionally retry the operation.