Updated: Mar 7, 2019
Cassandra is a peer to peer NOSQL database with no single point of failure if any one node goes down in the setup of minimum three node cluster then there will be no data loss and system will continue to read and write data to the system. Performance is directly proportional to the number of node. Increasing the number of nodes will increase the read write speed of the backend setup.
Cassandra nodes communicate with each other using a gossiping protocol it is an efficient and lightweight way of communication between node about their health and status of data.
Gossiping protocol also helps in failure detection. If any node goes down it first send the message to other nodes about its status is going down. So other nodes know about the node which is going down or any node which is up again from the down state.
Cassandra keeps the data on each node in the form of token and ranges. Where token is generated by using MurMur3 hash function and it is used to distribute data on different nodes while ranges are the division of data in virtual nodes on the node. Each node can have maximum 256 vnodes so when a read or write request is server it get aware which node and DC/rack placement. Each token resolute the node's position in the ring and its share of data rendering to its hash value. Suppose there are 6 nodes cluster setup and vnode is set to 8 on each of these nodes and replication factor of 3 then each of these nodes will have 8 subsets of vnodes on which random data is stored using the hashing function but there will be three copies of the same row across the cluster. As you can see there is C range which is stored on three nodes 3,5 and 6. There is a very good calculator hosted by this link for calculating and configuring your cluster setup. http://www.ecyrd.com/cassandracalculator/
As Cassandra is a peer to peer nosql database with no single point of failure. Due to replication factor which keeps more than one copy of data on different nodes across the cluster. But the replication factor should not be greater then number of nodes.
Cassandra has a tunable consistency level. Cassandra consistency level is defined as the minimum number of nodes that must acknowledge a read or write operation before the operation can be considered successful. ALL- Writes/Reads must to the commit log and memtable across all the cluster.
EACH_QUORUM- Writes/Reads must to the commit log and memtable on each quorum of nodes.
QUORUM- Writes/Reads must to the commit log and memtable on a quorum of nodes on all clusters.
LOCAL_QUORUM- Writes/Reads must to the commit log and memtable on a quorum of nodes in the same dc.
ONE- Writes must/Reads to the commit log and memtable of at least one node.
TWO- Writes/Reads must to the commit log and memtable of at least two nodes.
THREE- Writes/Reads must to the commit log and memtable of at least three nodes.
LOCAL_ONE- Writes/Reads must to and successfully acknowledged by at least one node in the local dc.
ANY- Writes/Reads must go to at least one node.
SSTables are the immutable data files that Cassandra uses for persisting data on disk.
As SSTables are moved to disk from Memtables or are moved from other nodes, Cassandra triggers compactions which combine several SSTables into one. When the new SSTable has been written, the old SSTables can be removed.
Each SSTable is comprised of multiple components stored in separate files:
Data.db Stores the actual rows data.
Index.db Stored index for the partition keys
Summary.db A specimen of every 128th entry in the Index.db file.
Filter.db Stores Bloom Filter of partition keys in the SSTable.
CompressionInfo.db Metadata about the offsets and lengths of density portions in the Data.db file.
Statistics.db Stores metadata about the SSTable, including material about timestamps, repair, compression, tombstones, clustering keys, compaction, TTLs etc.
Digest.crc32 Stores the CRC-32 abstract of the Data.db.
TOC.txt Stores plain text list of the element files for the SSTable.
Cassandra writes data to the memtable and also pushes the data to commitlog. Commit log receives every write request made to the Cassandra node. The commit log is a crash-recovery apparatus in Cassandra.
Every write request made to the Cassandra node is written to the memtable before that data also get pushed to the commitlog. Which flushed the data to the disk after certain period of time or when memory gets full. A mem-table is a memory-resident data structure.
SSTable stands for Sorted Strings Table . Cassandra stores data into the these tables on disk when data get flushed from memtables . which stores a set of immutable row fragments in sorted order based on row keys.
SSTable versions are incremented when the format changes.
Sstable compatibility and upgrade version
Cassandra guarantees no loss of data and data not get lost for single point of failure. In a setup of multi mode cluster if any node goes down then data still got intact and reads get response. Up sizing or downsizing the Cassandra cluster any time is super easy and in production environment and there will be no effect in the working of the application.