
Cassandra data models

Updated: Mar 7, 2019



Objectives

Data modeling is the hardest part of Cassandra for people coming from an RDBMS background. The goal here is to define the basic rules to keep in mind before designing a schema. Following these guidelines will help you design a schema that performs well.


Cassandra Data Model

To design a good schema, first carefully understand the use cases for the data you need, then design the schema around the queries. Cassandra is a peer-to-peer NoSQL database that distributes data across multiple machines and data centers, so data needs to be spread evenly across the cluster. At the same time, related data should be grouped together so that it can be read from one node instead of being gathered from several. Storage is cheap nowadays, but streaming large amounts of data is time-consuming. We must therefore choose the partition key and clustering key carefully when designing the schema.


Partition Key

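Cassandra is a distributed database that stores data on multiple nodes, and the partition key decides which nodes store each row. The default partitioner, Murmur3Partitioner, hashes the partition key (the first part of the primary key, not the whole primary key) with MurmurHash3 to produce a token, and each node owns a range of tokens. All rows that share a partition key therefore end up on the same nodes.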
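The token-to-node mapping can be pictured as a hash ring. The following is a simplified Python sketch under stated assumptions: four hypothetical nodes with one token each, no virtual nodes and no replication, and MD5 from `hashlib` standing in for Murmur3 (Python's standard library has no Murmur3; Cassandra's older RandomPartitioner did use MD5). `token_for` and `owner_of` are illustrative names, not Cassandra APIs.

```python
import bisect
import hashlib

# Hypothetical 4-node cluster: one token per node, no vnodes, no replication.
# Tokens are spread evenly over 0..2**127, roughly the range used by the
# MD5-based RandomPartitioner (Murmur3Partitioner uses -2**63..2**63-1).
RING_SIZE = 2**127
NODES = ["node-a", "node-b", "node-c", "node-d"]
NODE_TOKENS = [i * RING_SIZE // len(NODES) for i in range(len(NODES))]


def token_for(partition_key: str) -> int:
    """Hash the partition key to a token (MD5 stands in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner_of(partition_key: str) -> str:
    """Return the node whose token range contains the key's token.

    Each node owns (previous node's token, its own token]; tokens above
    the highest node token wrap around to the first node.
    """
    token = token_for(partition_key)
    index = bisect.bisect_left(NODE_TOKENS, token)
    return NODES[index % len(NODES)]


# Every row with partition key "user-42" is stored on the same node.
print(owner_of("user-42"))
```

Because the token depends only on the partition key, every row sharing a partition key lands on the same node, which is what lets a single-partition query be served by one node.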


Clustering Key

The clustering key determines how rows are stored within a partition on a single node. Rows in a partition are kept on disk sorted by the clustering columns, so related rows sit next to each other and can be retrieved in a minimal number of operations. This makes clustering keys especially useful for time-series data.
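A toy model of a single partition shows why this helps time-series reads. It is a sketch under simplifying assumptions, not Cassandra's storage engine (which writes to memtables and flushes sorted SSTables): the hypothetical `Partition` class keeps rows sorted by clustering key, so a time-range query becomes one contiguous slice.

```python
import bisect
from datetime import datetime


class Partition:
    """Rows of one partition kept sorted by clustering key (toy model)."""

    def __init__(self):
        self._keys = []    # clustering keys, always kept sorted
        self._values = []  # values aligned index-for-index with _keys

    def upsert(self, clustering_key, value):
        i = bisect.bisect_left(self._keys, clustering_key)
        if i < len(self._keys) and self._keys[i] == clustering_key:
            self._values[i] = value  # same key: overwrite, like a CQL upsert
        else:
            self._keys.insert(i, clustering_key)
            self._values.insert(i, value)

    def slice(self, start, end):
        """Return rows with start <= key < end as one contiguous run."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


# Hypothetical partition for one sensor; readings arrive out of order.
readings = Partition()
readings.upsert(datetime(2019, 3, 7, 10, 5), 21.4)
readings.upsert(datetime(2019, 3, 7, 10, 0), 21.0)
readings.upsert(datetime(2019, 3, 7, 11, 0), 22.3)

# Readings from 10:00 up to (not including) 11:00: the 10:00 and 10:05 rows.
window = readings.slice(datetime(2019, 3, 7, 10, 0), datetime(2019, 3, 7, 11, 0))
print(window)
```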

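-(P1): P1 is the partition key and also the entire primary key.

-(P1, C1): P1 is the partition key and C1 is the clustering key; together they form the primary key.

-((P1, P2), C1, C2): both the partition key and the clustering key can be compound and consist of more than one column. Here (P1, P2) is the partition key and C1, C2 are the clustering columns.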
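In CQL, these three shapes are declared with the `PRIMARY KEY` clause of `CREATE TABLE`. The hypothetical helper below renders that clause, mainly to show where the parentheses go: a compound partition key needs its own inner parentheses, while clustering columns follow it without extra parentheses.

```python
def primary_key_clause(partition_key, clustering_key=()):
    """Render a CQL PRIMARY KEY clause (hypothetical helper for illustration)."""
    if not partition_key:
        raise ValueError("a primary key needs at least one partition key column")
    # A compound partition key is wrapped in its own parentheses.
    if len(partition_key) == 1:
        partition = partition_key[0]
    else:
        partition = "(" + ", ".join(partition_key) + ")"
    # Clustering columns follow the partition key as a plain list.
    columns = [partition, *clustering_key]
    return "PRIMARY KEY (" + ", ".join(columns) + ")"


print(primary_key_clause(["p1"]))                      # PRIMARY KEY (p1)
print(primary_key_clause(["p1"], ["c1"]))              # PRIMARY KEY (p1, c1)
print(primary_key_clause(["p1", "p2"], ["c1", "c2"]))  # PRIMARY KEY ((p1, p2), c1, c2)
```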


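Not to do

  1. Don't optimize for fewer writes. Writes in Cassandra are cheap, so writing the same data to several tables is acceptable.

  2. Don't optimize for less data duplication. Denormalizing and duplicating data is expected, since disk space is cheap.

Basic Guidelines

  1. Spread data evenly across the cluster.

  2. Minimize the number of partitions each query reads.

  3. Model around your use cases (queries).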
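The first guideline can be illustrated by counting rows per node for two candidate partition keys. This sketch makes simplifying assumptions: a hypothetical four-node cluster, MD5 standing in for Murmur3, node placement by a plain modulo instead of a token ring, and an invented data set in which 95% of users live in one country.

```python
import hashlib
from collections import Counter

NODE_COUNT = 4  # hypothetical cluster size


def node_for(partition_key: str) -> int:
    """Map a partition key to a node index (MD5 modulo, a simplification)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODE_COUNT


def rows_per_node(rows, partition_key_of):
    """Count how many rows each node would store for a given key choice."""
    return Counter(node_for(partition_key_of(row)) for row in rows)


# Invented data set: 10,000 users, 95% in "GB" and 5% in "FR".
users = [
    {"user_id": f"user-{i}", "country": "GB" if i % 20 else "FR"}
    for i in range(10_000)
]

# Partitioning by country: only two partitions exist, so at most two
# nodes hold any data and one partition holds 95% of the rows.
by_country = rows_per_node(users, lambda u: u["country"])

# Partitioning by user_id: 10,000 small partitions spread over all nodes.
by_user = rows_per_node(users, lambda u: u["user_id"])

print("by country:", dict(by_country))
print("by user_id:", dict(by_user))
```

With `country` as the partition key, one node ends up storing at least 95% of the rows; with `user_id`, each node stores roughly a quarter. A low-cardinality partition key creates hot spots no matter how many nodes are added.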


Model around your queries: determine exactly which queries you need to run to fetch the data you need.


Typical query requirements that shape the model include:

  1. Grouping by fields

  2. Ordering by fields

  3. Filtering by some fields

  4. Distinct (unique) results

A change in any of these requirements will change the design of your model.

What queries to support

  1. Determine the queries to support.

  2. Design the table and write each query so that the query reads from only one partition.
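Putting the guidelines together: create one table per query, keyed by what that query looks up, and accept duplicate writes so that every read touches a single partition. The sketch below models this with in-memory dictionaries; `Table`, `save_user`, `users_by_email`, and `users_by_city` are hypothetical names for illustration, and a real Cassandra schema would declare two CQL tables that the application writes to.

```python
from collections import defaultdict


class Table:
    """Toy table: partition key -> list of rows stored together."""

    def __init__(self, partition_key):
        self.partition_key = partition_key  # function: row -> partition key
        self.partitions = defaultdict(list)

    def insert(self, row):
        self.partitions[self.partition_key(row)].append(row)

    def read_partition(self, key):
        """Single-partition read: the query pattern to design for."""
        return list(self.partitions.get(key, []))


# One table per query, each partitioned by what that query looks up.
# Query 1: "find a user by email"; Query 2: "list users in a city".
users_by_email = Table(lambda row: row["email"])
users_by_city = Table(lambda row: row["city"])


def save_user(row):
    # Denormalized write: the same row goes to every table that serves a query.
    users_by_email.insert(row)
    users_by_city.insert(row)


save_user({"email": "ann@example.com", "name": "Ann", "city": "London"})
save_user({"email": "bob@example.com", "name": "Bob", "city": "Leeds"})
save_user({"email": "cat@example.com", "name": "Cat", "city": "London"})

# Each query is answered by reading exactly one partition.
print(users_by_email.read_partition("bob@example.com"))
print([u["name"] for u in users_by_city.read_partition("London")])  # ['Ann', 'Cat']
```

Each user is written twice, which the "Not to do" list says is acceptable, and in exchange neither query has to gather rows from multiple partitions.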









© 2020 by GoplarDB 

All Rights Reserved

Powered by: Goplar LTD
