Cassandra On AWS - Part 3 - Backup Strategy

This third part focuses on the data backup strategy for Cassandra on AWS.

##Cassandra Data Backup Strategy

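When a node in a cluster fails, Cassandra can automatically repair it and bring its data up to date from the replicas held on the other nodes. Replication alone does not protect against accidental deletes or data corruption, since those are replicated too, so backups are still needed.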

It is advisable to enable incremental backups and then take snapshots at a regular interval (say, every 24 hours).
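As a rough sketch of what this could look like on a node, the functions below wrap the two `nodetool` subcommands involved: `enablebackup` and `snapshot -t`. The `daily-YYYY-MM-DD` tag format, the function names, and the `NODETOOL` override variable are assumptions made for illustration, not something Cassandra prescribes.

```shell
#!/bin/sh
# Sketch only. NODETOOL can be overridden (e.g. NODETOOL=echo) to preview
# the commands without a running Cassandra node.

# Hypothetical tag format: daily-YYYY-MM-DD
snapshot_tag() {
    printf 'daily-%s\n' "$1"
}

# Turn on incremental backups for the running node.
# To make this permanent, set "incremental_backups: true" in cassandra.yaml.
enable_incremental() {
    "${NODETOOL:-nodetool}" enablebackup
}

# Take a snapshot of all keyspaces, tagged with the given day.
take_snapshot() {
    "${NODETOOL:-nodetool}" snapshot -t "$(snapshot_tag "$1")"
}

# Example cron entry for a midnight run (the script path is hypothetical):
#   0 0 * * * /usr/local/bin/cassandra-daily-snapshot.sh
```

Calling `take_snapshot "$(date +%Y-%m-%d)"` from a cron-driven script would produce one tagged snapshot per day; the tag makes it easy to find and clear that snapshot later.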

The snapshot backups and incremental backups can be collected from the nodes and then pushed into S3 for use during restore.
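One possible way to collect the files is sketched below. Cassandra writes snapshots to `<data_dir>/<keyspace>/<table>/snapshots/<tag>` and incremental backups to `<data_dir>/<keyspace>/<table>/backups`, so the sketch finds those directories and hands each one to `aws s3 sync`. The data directory default, the function names, the `AWS` override variable, and the S3 prefix are assumptions; adjust them to your installation.

```shell
#!/bin/sh
# Sketch only. Data directory used by many package installs; adjust as needed.
DATA_DIR=${DATA_DIR:-/var/lib/cassandra/data}

# List the snapshot directories for one tag plus all incremental backup
# directories under DATA_DIR, in a stable (sorted) order.
list_backup_dirs() {
    tag=$1
    find "$DATA_DIR" -type d \( -path "*/snapshots/$tag" -o -path '*/backups' \) | sort
}

# Sync each directory to S3, keeping the keyspace/table path under the prefix.
# AWS can be overridden (e.g. AWS=echo) for a dry run.
push_backups() {
    tag=$1
    s3_prefix=$2   # hypothetical, e.g. s3://my-bucket/us-west-2/us-west-2a/node1
    list_backup_dirs "$tag" | while IFS= read -r dir; do
        rel=${dir#"$DATA_DIR"/}
        "${AWS:-aws}" s3 sync "$dir" "$s3_prefix/$rel"
    done
}
```

Running it first with `AWS=echo` prints the `s3 sync` commands that would be executed, which is a cheap way to check the paths before uploading anything.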

  • A snapshot backup should be taken daily at 12:00 AM (midnight).
  • Snapshots and incremental backup files older than 7 days can be deleted from S3.
  • S3 backup locations should be named and organized by cluster and node name. On AWS, these correspond to the region and AZ.
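The naming and retention rules above could be sketched as follows. The first function builds an S3 prefix from region, AZ, node, and day; the exact ordering of the path segments is an assumption. For the 7-day cleanup, this sketch swaps in an S3 lifecycle expiration rule instead of deleting files by hand, which is one option among several; the rule ID and function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. Build the S3 prefix for one node's backups on a given day.
backup_prefix() {
    bucket=$1 region=$2 az=$3 node=$4 day=$5
    printf 's3://%s/%s/%s/%s/%s\n' "$bucket" "$region" "$az" "$node" "$day"
}

# Emit an S3 lifecycle configuration that expires objects after N days
# (default 7). An empty prefix applies the rule to the whole bucket.
lifecycle_json() {
    days=${1:-7}
    cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-cassandra-backups",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Expiration": { "Days": $days }
    }
  ]
}
EOF
}

# Possible usage (bucket name is hypothetical):
#   lifecycle_json 7 > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#       --bucket my-bucket --lifecycle-configuration file://lifecycle.json
```

Putting the date last in the prefix keeps all backups for one node together, so a restore only has to look under a single node path and pick the day it needs.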

##Cassandra Node Failure Recovery Strategy

If a node in a cluster fails, or restarts and stays out of the cluster for some time, the node repair tool must be run on that node.

Execute the following command on the failed node soon after startup:

nodetool repair -dc <datacenter or AZ name> -h localhost

Example:
nodetool repair -dc us-west-2 -h localhost

Node repair ensures that old deleted (tombstoned) data is not resurrected as new data on this node.


Elankumaran Srinivasan

I am a software engineer and a Java and open source technology enthusiast.
