Redis monitoring guide

Redis is a widely used in-memory data store that provides high performance and low latency for applications. Its advanced data structures and programmability make it a leading choice for caching, real-time analytics, and other use cases that require fast and efficient data processing.

In the following article, we share a detailed guide on how to monitor Redis effectively. We start by introducing Redis and some of its alternatives, after which we share a list of important performance metrics and discuss how to monitor them using different commands and tools.

What is Redis?

Redis is an open-source, in-memory, data structure store that also offers optional persistence. Because Redis stores data in memory, it is inherently faster than traditional, disk-based databases. Additionally, Redis's replication and clustering capabilities enable developers to build distributed systems that can handle high traffic and large data volumes.

It’s called a “data structure” store because it natively supports several data structures, including strings, sets, lists, hashes, bitmaps, geospatial indexes, and sorted sets. This enables developers to store and process a wide range of data types in Redis.

For example, developers can use sorted sets to work with data that requires sorting or ranking, such as leaderboards or rate limiters. They can use the same Redis instance to also store location coordinates, using the geospatial data type.

Like traditional database systems, Redis supports transactions, which allow users to execute multiple commands in one step. Developers can also perform server-side scripting using Lua, or write stored procedures using the Redis Functions API.

Redis is often deployed as a distributed application cache, but can also be used as a standalone database. Another common use case for Redis is as a messaging system for real-time applications. Its publish/subscribe implementation is packed with useful features, like pattern-matching subscriptions and sharding. The Redis stream data type can be used to power data streaming and messaging use cases at scale.

Redis Alternatives – Redis vs. Memcached, Aerospike, Elasticsearch, and MongoDB

While searching for an in-memory data store, you may find yourself comparing Redis with alternatives like Memcached, Aerospike, Elasticsearch, and MongoDB. Each data store has its own features and capabilities, which make it well suited for different use cases.

Redis vs. Memcached

Redis and Memcached are both key-value stores that are optimized for caching. In terms of read/write speed and memory usage, both offer similar performance and efficiency. However, Redis comes with configurable persistence, whereas Memcached doesn’t.

Redis also supports a wider range of data structures, which makes it more flexible than Memcached, and offers more features, such as transactions and atomicity, pipelining, pub-sub, Lua scripting, and clustering.

Redis vs. Aerospike

Aerospike is an in-memory, NoSQL database optimized for high data volumes. Like Redis, it offers a rich set of features, including server-side scripting, elastic scaling, optional persistence, and multi-cloud support. Both technologies deliver similar levels of performance, availability, atomicity, and resilience.

While Redis supports a wider range of data structures than Aerospike, the latter offers more flexibility with regard to replication and access control. In addition, client library support for Aerospike is limited when compared to Redis.

Redis vs. Elasticsearch

Elasticsearch is a full-text search engine that is optimized for fast and efficient search queries. While Redis, Aerospike, and Memcached are primarily used for caching, Elasticsearch is designed for searching and analyzing data.

Elasticsearch's advanced search capabilities and support for structured and unstructured data make it an ideal choice for use cases like analytics, fuzzy searching, and document processing. With that said, if desired, you can also utilize Elasticsearch as a cache for your primary storage. Just remember that since it’s not optimized for caching by default, you will have to manually tune the cluster for better indexing and concurrency.

Redis vs. MongoDB

Even though MongoDB is primarily a traditional, disk-based database, it also offers an in-memory engine for faster read and write operations. This in-memory engine is optimized to query large volumes of data in real time.

Both MongoDB and Redis provide high-level concurrency, transactions, and flexible deployment architectures. However, compared to Redis, the support for data structures and advanced caching features in MongoDB is limited.

In short, there is no outright winner when comparing Redis, Aerospike, Memcached, Elasticsearch, and MongoDB. The one you end up choosing will depend on your specific business needs and developer preferences.

Important Redis performance metrics to monitor

Redis exposes several metrics related to its status, performance, availability, and replication. Administrators can use these metrics to monitor health, predict malfunctions, resolve bottlenecks, and identify avenues for improvement.

Cluster and node metrics

Monitoring metrics related to individual nodes and the overall cluster is important in ensuring that Redis is performing as expected. You can use the GET /v1/nodes/stats endpoint of the Redis Enterprise REST API to obtain the following real-time statistics of a node:

  • available_memory: Total available RAM on the node, in bytes. An anomalous drop in the value of this metric should be investigated immediately, especially if you are not holding data in the cache for a long time.
  • avg_latency: Average latency of all the requests handled by this Redis node, in microseconds. Too high a value of this metric can lead to bottlenecks in the larger system.
  • conns: Total number of clients that are connected to this node. Strive to keep this metric’s value to the bare minimum; i.e., identify and remove any zombie connections.
  • ingress_bytes: The rate, in bytes per second, at which the node is receiving network traffic.
  • total_req: The request rate, in operations per second, that this node has handled.
  • free_memory: Total free memory on the node, in bytes. A rapid decrease in this metric’s value typically points to poor memory management.
  • available_flash: Available flash memory, in bytes.
  • cpu_idle: A float value between 0 and 1 representing the portion of time the CPU spent in the idle state. Multiplying this value by 100 gives the percentage CPU idle time.
  • cpu_system: A float value between 0 and 1 representing the portion of time the CPU spent in the kernel space. Multiplying this value by 100 gives the percentage system CPU time.
  • cpu_user: A float value between 0 and 1 representing the portion of time the CPU spent running user-space processes. Multiplying this value by 100 gives the percentage user CPU time.
  • persistent_storage_avail: The total disk space, in bytes, that’s available to the Redis Enterprise processes.
  • persistent_storage_free: The total free disk space, in bytes.
  • ephemeral_storage_free: The total free disk space, in bytes, on the configured ephemeral disk.
  • ephemeral_storage_avail: The total available disk space on the configured ephemeral disk that’s available to Redis.

To obtain the same statistics at the cluster level, change the endpoint to GET /v1/cluster/stats.
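
The two endpoints above can be queried with a few lines of Python. The following is a minimal sketch, not a definitive client: the helper names (stats_url, fetch_stats, cpu_percentages) are hypothetical, port 9443 and HTTP basic authentication are assumptions about a typical Redis Enterprise setup, and the exact nesting of the JSON response may differ between versions, so cpu_percentages works on a single flat sample of metric names and values.

```python
import base64
import json
import ssl
import urllib.request


def stats_url(host: str, scope: str = "nodes", port: int = 9443) -> str:
    """Build the stats URL. scope is 'nodes' or 'cluster' (port is an assumption)."""
    if scope not in ("nodes", "cluster"):
        raise ValueError("scope must be 'nodes' or 'cluster'")
    return f"https://{host}:{port}/v1/{scope}/stats"


def fetch_stats(url: str, user: str, password: str, verify_tls: bool = True):
    """GET the stats endpoint with basic auth and return the decoded JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for test clusters that use self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)


def cpu_percentages(sample: dict) -> dict:
    """Convert the 0-1 CPU fractions of one stats sample into percentages."""
    names = ("cpu_idle", "cpu_system", "cpu_user")
    return {name: round(sample[name] * 100, 1) for name in names if name in sample}
```

For example, cpu_percentages({"cpu_idle": 0.5, "cpu_user": 0.25}) returns {"cpu_idle": 50.0, "cpu_user": 25.0}. Keeping the conversion separate from the network call makes it easy to test without a running cluster.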

Database and shard metrics

Database metrics paint a clear picture of how well a Redis instance is dealing with data. The GET /v1/bdbs/stats endpoint of the Redis Enterprise REST API returns the following key statistics in its response:

  • avg_latency: The average latency of all database operations, in microseconds. On a healthy Redis instance, the value of this metric should stay low. If you are experiencing high average latency, use the Redis Slow Log feature to identify any slow-running commands.
  • avg_read_latency: The average latency of all read operations performed on the database.
  • avg_write_latency: The average latency of all write operations performed on the database.
  • avg_other_latency: The average latency of all other (not read or write) operations performed on the database.
  • conns: The total number of client connections to the endpoints exposed by the database. Strive to keep this metric’s value to the bare minimum; i.e., identify and remove any zombie connections.
  • egress_bytes: The rate, in bytes/second, at which traffic is flowing out of the database.
  • ingress_bytes: The rate, in bytes/second, at which traffic is coming into the database.
  • expired_objects: The per-second rate of expired keys in the database.
  • evicted_objects: The per-second rate at which keys are being evicted out of the database.
  • last_res_time: The time at which the last response was generated by the database.
  • last_req_time: The time at which the database received the last request.
  • no_of_expires: The total number of keys that have an expiration set (volatile keys) and will be removed when they expire.
  • no_of_keys: The total number of keys in the database.
  • pubsub_channels: The total number of publish-subscribe channels in the database.
  • total_req: The rate, in operations/second, at which the database is receiving requests.
  • total_res: The rate, in operations/second, at which the database is generating responses.
  • write_req: The rate, in operations/second, at which the database is receiving write requests.
  • write_res: The rate, in operations/second, at which the database is generating write responses.
  • read_req: The rate, in operations/second, at which the database is receiving read requests.
  • read_res: The rate, in operations/second, at which the database is generating read responses.
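
As an illustration of how these database metrics might be turned into simple checks, here is a minimal sketch. The function names and the 1,000-microsecond latency limit are hypothetical choices made for the example, and the input is assumed to be one flat sample of the metrics listed above.

```python
def database_warnings(stats: dict, max_latency_us: float = 1000.0) -> list:
    """Return human-readable warnings for one database stats sample."""
    warnings = []

    # avg_latency is reported in microseconds.
    latency = stats.get("avg_latency")
    if latency is not None and latency > max_latency_us:
        warnings.append(
            f"avg_latency is {latency:.0f} us (limit {max_latency_us:.0f} us)"
        )

    # Any eviction means the database is removing keys to free memory.
    evicted = stats.get("evicted_objects", 0)
    if evicted > 0:
        warnings.append(f"keys are being evicted at {evicted:.1f}/sec")

    return warnings


def expiring_key_share(stats: dict):
    """Fraction of keys that have an expiration set, or None for an empty database."""
    keys = stats.get("no_of_keys", 0)
    if keys == 0:
        return None
    return stats.get("no_of_expires", 0) / keys
```

A sample with avg_latency of 2500 and evicted_objects of 3 produces two warnings, while a sample with low latency and no evictions produces an empty list.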

To view shard-level metrics, you can use the GET /v1/shards/stats endpoint. Some of the metrics it returns are:

  • aof_rewrite_inprog: The total number of concurrently running append-only file (AOF) rewrites.
  • avg_ttl: The average time-to-live (TTL) of a random key.
  • blocked_clients: A count of the clients that are waiting on a blocking call. Too many blocking calls can significantly impact a cluster’s performance.
  • connected_clients: A count of the number of connected clients to the shard.
  • shard_cpu_system: Percentage core utilization of the Redis shard process, in the system mode.
  • shard_cpu_user: Percentage core utilization of the shard process, in the user mode.
  • main_thread_cpu_system: Percentage core utilization of the shard main thread in the system mode.
  • main_thread_cpu_user: Percentage core utilization of the shard main thread in the user mode.
  • fork_cpu_system: Percentage core utilization of the shard fork child process in the system mode.
  • fork_cpu_user: Percentage core utilization of the shard fork child process in the user mode.
  • last_save_time: The time at which the Redis Database (RDB) snapshot file was last saved.
  • used_memory: Total memory that this shard has used, in bytes.
  • used_memory_peak: The maximum amount of memory, in bytes, that this shard has used since it has been initialized.

Memory metrics

The Redis INFO command outputs the following list of memory-related statistics that can be key in detecting issues and optimizing throughput.

  • mem_allocator: The memory allocator chosen while compiling the Redis core.
  • used_memory: The total amount of memory, in bytes, that the Redis core has allocated. This metric should show an even progression over time. Unexpected fluctuations typically point to critical issues.
  • used_memory_startup: The amount of memory, in bytes, that Redis used at startup.
  • used_memory_peak: The maximum amount of memory, in bytes, that the Redis instance has used since startup.
  • total_system_memory: The total amount of memory, in bytes, that’s available to the Redis host.
  • used_memory_lua: The total amount of memory that the Lua engine has used, in bytes.
  • mem_clients_slaves: The total amount of memory used by the replica clients, in bytes.
  • mem_clients_normal: The total amount of memory used by the normal clients, in bytes.
  • mem_aof_buffer: The total amount of transient memory, in bytes, that’s used by AOF.
  • mem_replication_backlog: The total amount of memory used by the replication backlog, in bytes.
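
The INFO command returns plain text made of "# Section" headers and field:value lines, so the memory fields above are easy to extract. The sketch below assumes you have already captured that text (for example, from redis-cli INFO memory); the function names are hypothetical.

```python
def parse_info(text: str) -> dict:
    """Parse raw INFO output into a dict of field name -> string value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# Memory"-style section headers.
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition(":")
        if separator:
            fields[key] = value
    return fields


def memory_summary(fields: dict) -> dict:
    """Express used_memory as a percentage of system memory and of the peak."""
    used = int(fields["used_memory"])
    peak = int(fields["used_memory_peak"])
    total = int(fields["total_system_memory"])
    return {
        "used_pct_of_system": round(100 * used / total, 2),
        "used_pct_of_peak": round(100 * used / peak, 2),
    }
```

Tracking the two percentages over time is one way to spot the unexpected fluctuations in used_memory mentioned above.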

Replication metrics

The output of the Redis INFO command contains several metrics related to replication. Some of the most important metrics are:

  • role: If the instance isn’t a replica of any other node, this field returns “master”. Otherwise, it returns “slave”.
  • master_failover_state: If any failover is in progress, this metric will return the state of that failover.
  • repl_backlog_size: The total size of the replication backlog buffer, in bytes.
  • total_net_repl_input_bytes: The total number of incoming bytes for replication purposes.
  • total_net_repl_output_bytes: The total number of outgoing bytes for replication purposes.
  • instantaneous_input_repl_kbps: The rate at which data is being read from the network for replication purposes, measured in KB/sec.
  • instantaneous_output_repl_kbps: The rate at which data is being written to the network for replication purposes, measured in KB/sec.
  • master_replid: The replication id of the Redis instance.
  • repl_backlog_active: A flag (0 or 1) indicating whether the replication backlog is active.

If the instance is a replica, some additional fields are returned, including:

  • master_sync_in_progress: This field indicates whether a master is syncing to this replica instance.
  • slave_priority: The priority of this instance to be promoted as a master, in case of a failover.
  • slave_read_only: A Boolean value indicating whether the slave is running in a read-only state.
  • master_last_io_seconds_ago: The number of seconds since this replica had its last interaction with the master node.
  • master_link_status: The status of the connection with the master node.
  • slave_repl_offset: The replication offset of this replica node.
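
The replica-only fields lend themselves to a basic health check. The following sketch assumes the INFO replication output has already been parsed into a dict of strings; the function name and the 10-second idle limit are hypothetical.

```python
def replica_issues(fields: dict, max_idle_seconds: int = 10) -> list:
    """Return a list of problems found in a replica's replication fields."""
    # The checks below only apply to replicas, which report the role "slave".
    if fields.get("role") != "slave":
        return []

    issues = []
    if fields.get("master_link_status") != "up":
        issues.append("link to master is down")
    if fields.get("master_sync_in_progress") == "1":
        issues.append("sync with master is in progress")

    # Redis reports -1 here when the link to the master is down.
    idle = int(fields.get("master_last_io_seconds_ago", "-1"))
    if idle > max_idle_seconds:
        issues.append(f"no interaction with master for {idle} seconds")

    return issues
```

A healthy replica yields an empty list; a replica whose link is down yields at least the "link to master is down" entry.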

Latency monitoring

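The LATENCY group of commands can be used to monitor the latency of a Redis server. Latency monitoring is disabled by default; to enable it, set the latency-monitor-threshold configuration option to a value in milliseconds. The LATENCY LATEST command outputs the latest latency sample logged for each monitored event. For every event, the output includes the name of the event, the Unix timestamp at which the latest spike happened, the latest latency (in milliseconds), and the all-time highest latency for the event.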

The LATENCY HISTORY command displays the latency time series for an event. It comes in handy when you want to analyze an event’s historical latency trends. If you want to reset the time series for an event, you can use the LATENCY RESET command.

If you want to visualize the trends of a latency event, the LATENCY GRAPH command is the one to use. It returns an ASCII-art style graph. Lastly, if you want a detailed, human-readable analysis of latency issues and potential solutions, you can issue the LATENCY DOCTOR command.
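
If you collect the LATENCY LATEST reply programmatically, it can be converted into named records. This is a minimal sketch that assumes the reply arrives as a list of entries whose first four elements are the event name, the Unix timestamp, the latest latency, and the maximum latency, as described above; the LatencyEvent type and the function name are hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class LatencyEvent(NamedTuple):
    name: str
    last_spike: datetime
    latest_ms: int
    max_ms: int


def parse_latency_latest(reply) -> list:
    """Convert a LATENCY LATEST reply into a list of LatencyEvent records."""
    events = []
    for entry in reply:
        # Only the first four elements are used; any extra ones are ignored.
        name, timestamp, latest_ms, max_ms = entry[:4]
        if isinstance(name, bytes):
            name = name.decode()
        events.append(
            LatencyEvent(
                name=name,
                last_spike=datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
                latest_ms=int(latest_ms),
                max_ms=int(max_ms),
            )
        )
    return events
```

Sorting the resulting records by max_ms is a quick way to decide which event to inspect further with LATENCY HISTORY.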

Monitor Redis using the admin console

The Redis admin console is a web-based application that shows metrics related to the cluster, nodes, databases, shards, replication, and memory. When Redis is deployed on Kubernetes, you log into the admin console by retrieving the login credentials stored in a Kubernetes Secret and forwarding a local port to the port of the UI service (default port is 8443). After that, you can view the console in a browser at the forwarded local address, for example https://localhost:8443 if you forwarded local port 8443.

The admin console is a great place to get a holistic overview of a Redis instance. It displays information related to connections, CPU usage, incoming and outgoing traffic, disk space, RAM, latency, memory limit, and more. It also shows several database-specific metrics, such as evicted objects/second, hit ratio, command latency, RAM fragmentation, and total keys.

Monitor Redis using the MONITOR command

The MONITOR command in Redis provides a continuous stream of all commands that are being executed by the Redis server. It can act as a powerful debugging tool by allowing users to observe and understand what’s happening to a live Redis server.

For example, if you are debugging an application, you may enter MONITOR to see which commands the application is executing on Redis. Or if you are experiencing an exceptionally high request rate, you can use the MONITOR command to see the nature of the incoming requests.

You can issue the MONITOR command via the Redis CLI or telnet. Use Ctrl+C to stop a MONITOR stream started via the CLI. To close a telnet stream, send the RESET or QUIT command. MONITOR doesn’t log administrative commands and redacts sensitive data. Note that continuously streaming MONITOR on a production instance can impact performance.
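
To see the nature of incoming requests at a glance, you can count the commands in a captured MONITOR stream. The sketch below assumes each line has the usual form of a timestamp, a bracketed database number and client address, and the quoted command with its arguments; the function names are hypothetical, and arguments containing unusual escape sequences may not be decoded exactly.

```python
import re
import shlex
from collections import Counter

# Example line: 1339518083.107412 [0 127.0.0.1:60866] "keys" "*"
MONITOR_LINE = re.compile(
    r'^(?P<ts>\d+\.\d+) \[(?P<db>\d+) (?P<client>[^\]]+)\] (?P<args>.*)$'
)


def parse_monitor_line(line: str):
    """Parse one MONITOR line into a dict, or return None if it does not match."""
    match = MONITOR_LINE.match(line.strip())
    if match is None:
        return None
    try:
        args = shlex.split(match["args"])
    except ValueError:
        # Unbalanced quotes: skip the line instead of failing.
        return None
    if not args:
        return None
    return {
        "timestamp": float(match["ts"]),
        "db": int(match["db"]),
        "client": match["client"],
        "command": args[0].upper(),
        "args": args[1:],
    }


def command_counts(lines) -> Counter:
    """Count how often each command appears in an iterable of MONITOR lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_monitor_line(line)
        if parsed is not None:
            counts[parsed["command"]] += 1
    return counts
```

Lines that do not match, such as the initial OK reply, are skipped. Calling command_counts(lines).most_common(5) lists the five most frequent commands in the capture.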

Monitor Redis using the Site24x7 monitoring plugin

Site24x7's monitoring plugin provides great insight into the health, performance, and availability of a Redis cluster. The plugin includes a web-based dashboard that lets you visualize important metrics as graphs and charts.

For example, you can monitor used and peak used memory, CPU utilization in system and user modes, key-space hits and misses, rejected connections, connected slaves, connected clients, and more. The Python-based plugin can be downloaded directly from GitHub, and installed in a few simple steps.

Conclusion

Redis is a popular in-memory data store that excels in performance, resilience, scalability, and durability. Its easy-to-use query interface, support for multiple data structures, built-in replication and clustering features, and data persistence make it a staple of several distributed IT infrastructures. Whether you are building for the cloud or on-premise, you can use Redis for caching, session management, or real-time data streaming.
