Cassandra On AWS - Part 1 - Setup
Deployment Model
Cassandra will be set up as a multi-node cluster on AWS. AWS has several regions, and each region consists of multiple availability zones (AZs). The AZs in a region are connected to each other with low-latency links. This makes it possible to build a fault-tolerant system by distributing the service or data across different zones in a region. Most AWS regions have three AZs. As a best practice, run at least one EC2 instance in each AZ of the region. If you are working with a region that has only two AZs, one AZ will host one more EC2 instance than the other.
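The placement rule above can be sketched as a simple round-robin assignment. This is only an illustration and not part of any AWS tooling: the function name `spread_nodes`, the node labels, and the zone names are assumptions made for the example.

```python
from collections import Counter
from itertools import cycle


def spread_nodes(node_count, zones):
    """Assign node labels to availability zones in round-robin order."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    zone_cycle = cycle(zones)
    return {f"node-{i + 1}": next(zone_cycle) for i in range(node_count)}


# Three nodes over three AZs: one node per zone.
three_az = spread_nodes(3, ["us-west-2a", "us-west-2b", "us-west-2c"])

# Three nodes over two AZs (illustrative zone names): the first zone
# ends up hosting one more node than the other.
two_az = spread_nodes(3, ["us-west-1a", "us-west-1b"])

print(Counter(three_az.values()))
print(Counter(two_az.values()))
```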
Setup
AWS
- Choose a region which has three availability zones. For the purpose of this post, I have chosen us-west-2.
- Spin up one EC2 instance in each AZ.
- Place the instances in private subnets of the VPC. A database should never be reachable directly from the internet.
- Update the firewall settings on the security group associated with the EC2 instances to allow incoming traffic on the following ports:
  - Port 9042: used by CQL clients
  - Port 7000: used for inter-node communication within the cluster
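As a rough sketch, the two firewall rules can be written down as data before they are applied. The helper name `cassandra_ingress_rules` and the CIDR block below are assumptions for illustration; substitute the ranges of your own VPC subnets.

```python
def cassandra_ingress_rules(client_cidr, cluster_cidr):
    """Build ingress rules for the CQL port and the inter-node port."""

    def tcp_rule(port, cidr, description):
        return {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": description}],
        }

    return [
        tcp_rule(9042, client_cidr, "CQL clients"),
        tcp_rule(7000, cluster_cidr, "Cassandra inter-node communication"),
    ]


# Hypothetical CIDR block covering the private subnets of the VPC.
rules = cassandra_ingress_rules("10.101.0.0/16", "10.101.0.0/16")
for rule in rules:
    print(rule["FromPort"], rule["IpRanges"][0]["CidrIp"])
```

The dictionaries follow the shape that the boto3 EC2 client accepts as the `IpPermissions` argument of `authorize_security_group_ingress`, so the list could be passed there together with the `GroupId` of the security group. Restricting both rules to private ranges keeps the nodes closed to the internet.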
Installation
- Download Cassandra version 2.2.4 from the URL below.
http://downloads.datastax.com/community/dsc-cassandra-2.2.4-bin.tar.gz
- Extract the archive to a location on the EC2 machine.
- Do this on all three nodes.
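Since the same steps are repeated on every node, they can be scripted. The following is a minimal sketch using only the Python standard library; the function names `download` and `unpack` and any target paths you pass in are assumptions, not part of the Cassandra distribution.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the download step above.
CASSANDRA_URL = "http://downloads.datastax.com/community/dsc-cassandra-2.2.4-bin.tar.gz"


def download(url, target):
    """Fetch the tarball to `target` unless the file is already there."""
    target = Path(target)
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target


def unpack(tarball, install_dir):
    """Extract a .tar.gz archive and return the top-level entries created."""
    install_dir = Path(install_dir)
    install_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(tarball), "r:gz") as archive:
        archive.extractall(str(install_dir))
    return sorted(entry.name for entry in install_dir.iterdir())
```

On each node you would call `download(CASSANDRA_URL, ...)` with a local file name and then `unpack(...)` with the directory where Cassandra should live. Skipping the download when the file already exists makes the script safe to re-run.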
Cassandra Configuration
Update the following configuration files under the conf folder of the Cassandra installation.
File : cassandra-rackdc.properties
dc_suffix = 2a_cassandra # This suffix is appended to the datacenter (DC) name. DC names are assigned automatically by Cassandra from the AWS region when using Ec2Snitch/Ec2MultiRegionSnitch, so nodes that belong to the same DC must use the same suffix.
prefer_local = true
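To see what `dc_suffix` does to the names, here is a small sketch of how the EC2 snitches in the Cassandra 2.x line derive the datacenter and rack from the availability zone, as far as I understand the legacy naming. The function `ec2_snitch_names` is a hypothetical re-implementation for illustration, not Cassandra code; confirm the real names on a running cluster with `nodetool status`.

```python
def ec2_snitch_names(availability_zone, dc_suffix=""):
    """Return (datacenter, rack) for an AZ name, mimicking the legacy naming."""
    # The rack is the last dash-separated part: "us-west-2a" -> "2a".
    rack = availability_zone.split("-")[-1]
    # The datacenter is the region: drop the trailing zone letter.
    region = availability_zone[:-1]
    # Legacy quirk: regions ending in "1" lose the number, "us-east-1a" -> "us-east".
    if region.endswith("1"):
        region = availability_zone[:-3]
    return region + dc_suffix, rack


for az in ["us-west-2a", "us-west-2b", "us-west-2c"]:
    # Same suffix as in the properties file above, used on every node.
    print(az, ec2_snitch_names(az, "2a_cassandra"))
```

Under this reading, all three nodes report the same datacenter and three different racks as long as they share one suffix. A different suffix per node would place each node in its own datacenter.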
File : cassandra.yaml
partitioner: org.apache.cassandra.dht.Murmur3Partitioner # This is the default and we will keep it as it is. Used to hash and distribute the keys across different nodes.
endpoint_snitch: Ec2Snitch # Use Ec2MultiRegionSnitch if the cluster spans multiple regions. Otherwise use Ec2Snitch.
listen_address: 10.101.212.201 # This will be the private IP address of the EC2 instance. Will vary from instance to instance.
broadcast_address: 10.101.212.201 # private IP address of the EC2 instance.
rpc_address: 10.101.212.201 # private IP address of the EC2 instance
seeds: "10.101.212.206,10.101.214.60" # Comma-separated private IP addresses of the nodes acting as seeds. In cassandra.yaml this key sits under seed_provider > parameters.
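key_cache_size_in_mb: 100000 # Size of the key cache in MB. Note that 100000 MB is close to 100 GB, so check that this fits the memory of your instance; when left empty, Cassandra uses the smaller of 5% of the heap and 100 MB.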
data_file_directories: # Location where the database files are stored.
    - /local/mnt/cassandra/data
commitlog_directory: /local/mnt/cassandra/commitlog # Location where the commit logs are stored.
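Only the three address settings differ from node to node, so the per-node values can be generated from the private IP. This is a minimal sketch: the helpers `node_settings` and `render` are hypothetical, and the output is a flat listing for review rather than a drop-in cassandra.yaml (in the real file, `seeds` is nested under `seed_provider`).

```python
def node_settings(private_ip, seed_ips, data_dir, commitlog_dir):
    """Collect the cassandra.yaml values discussed above for one node."""
    return {
        "endpoint_snitch": "Ec2Snitch",
        "listen_address": private_ip,
        "broadcast_address": private_ip,
        "rpc_address": private_ip,
        "seeds": ",".join(seed_ips),
        "data_file_directories": [data_dir],
        "commitlog_directory": commitlog_dir,
    }


def render(settings):
    """Format the settings as 'key: value' lines for a quick review."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"    - {item}" for item in value)
        elif key == "seeds":
            lines.append(f'{key}: "{value}"')
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)


# Values taken from the example configuration in this post.
settings = node_settings(
    "10.101.212.201",
    ["10.101.212.206", "10.101.214.60"],
    "/local/mnt/cassandra/data",
    "/local/mnt/cassandra/commitlog",
)
print(render(settings))
```

Running this once per node with that node's private IP gives a checklist to compare against the edited file. The seed list stays identical on every node.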