Cassandra Tutorial for Beginners | Learn Apache Cassandra
1. Apache Cassandra Tutorial – Objective
In this Apache Cassandra tutorial, we will learn what Cassandra is, what a NoSQL database is, and how relational (SQL) and NoSQL databases differ. We will also cover Cassandra's features, history, and architecture, and then look at its applications and data model.
As technology has advanced, the volume of data has grown exponentially, and there is a need for databases built to handle it. Apache Cassandra is one of the databases designed to meet this need.
So, let's start the Cassandra tutorial.
2. What is a NoSQL Database?
Broadly, there are two types of databases: relational (SQL) databases and NoSQL databases. A relational database stores and retrieves data through tabular relations; in other words, it holds relational data. A NoSQL database, in contrast, holds non-relational data and uses different data structures than a relational database does. NoSQL databases have a few advantages over relational databases: they can handle huge amounts of data, support easy replication, and offer a simple API.
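To make the difference in data structures concrete, here is a minimal sketch in plain Python, with no database involved. The users, orders, and function names are made up for illustration. It contrasts a relational layout, where data is split across tables and combined at read time, with a denormalized layout where everything for one key is stored together.

```python
# Relational style: data is split across two "tables" and joined at read time.
users = [
    {"user_id": 1, "name": "Asha"},
    {"user_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 10, "user_id": 1, "item": "keyboard"},
    {"order_id": 11, "user_id": 1, "item": "monitor"},
    {"order_id": 12, "user_id": 2, "item": "mouse"},
]


def orders_for_user_relational(user_id):
    """Emulate a JOIN: find the user, then scan the orders for matches."""
    user = next(u for u in users if u["user_id"] == user_id)
    items = [o["item"] for o in orders if o["user_id"] == user_id]
    return {"name": user["name"], "items": items}


# NoSQL style: the record is denormalized and stored under a single key,
# so one lookup answers the same question.
orders_by_user = {
    1: {"name": "Asha", "items": ["keyboard", "monitor"]},
    2: {"name": "Ben", "items": ["mouse"]},
}


def orders_for_user_nosql(user_id):
    """Single key lookup, no join needed."""
    return orders_by_user[user_id]


print(orders_for_user_relational(1))
print(orders_for_user_nosql(1))
```

Both functions return the same answer. The denormalized layout trades some duplicated data for a simpler, faster read.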
3. What is Apache Cassandra?
Apache Cassandra is a NoSQL database. It is a distributed, decentralized, open-source storage system designed to manage very large amounts of structured data. Because it has no single point of failure, it provides highly available service.
4. Cassandra Features
In this part of Cassandra Tutorial, we discuss some important features of Cassandra:
Every node in the cluster is identical. There are no single points of failure.
b. Fault Tolerance
Since data is replicated to multiple nodes, fault tolerance is high. Also, failed nodes can be replaced with no downtime.
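The idea of replication can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's actual code: the `Ring` class, the node names, and the use of MD5 as the hash are all assumptions made for the example. Each key is written to several nodes on a ring, so a read still succeeds when one of them is down.

```python
import hashlib
from bisect import bisect_right


def token(value):
    """Map a string to a ring position (MD5 is only a stand-in hash here)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    """Hypothetical cluster: nodes sit on a ring, keys are copied to rf nodes."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]
        self.storage = {n: {} for n in nodes}  # per-node key/value store
        self.down = set()                      # nodes that are currently failed

    def replicas(self, key):
        """Walk clockwise from the key's token and take the next rf nodes."""
        start = bisect_right(self.tokens, token(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def write(self, key, value):
        """Store the value on every replica that is up."""
        for node in self.replicas(key):
            if node not in self.down:
                self.storage[node][key] = value

    def read(self, key):
        """Return the value from the first live replica that has it."""
        for node in self.replicas(key):
            if node not in self.down and key in self.storage[node]:
                return self.storage[node][key]
        raise KeyError(key)


ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=3)
ring.write("user:42", "Asha")
ring.down.add(ring.replicas("user:42")[0])  # simulate failure of one replica
print(ring.read("user:42"))                 # still served by another replica
```

The sketch leaves out consistency levels, hinted handoff, and repair, which a real cluster uses to keep replicas in sync.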
The fundamental architecture of Apache Cassandra is very robust. Because every node plays the same role and data is replicated, the cluster keeps serving requests when individual nodes fail.
It is linearly scalable. In other words, the throughput is increased as you increase the number of nodes in the Cassandra cluster.
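One reason adding nodes helps is that each node ends up owning a smaller share of the keys. The sketch below illustrates this with a hash ring and virtual nodes. It is an assumption-laden toy: the function names and key names are hypothetical, and MD5 stands in for Cassandra's own partitioner.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def token(value):
    """Map a string to a ring position (MD5 is only a stand-in hash here)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def build_ring(node_count, vnodes=64):
    """Give every node several positions on the ring to even out its share."""
    return sorted(
        (token(f"node{n}-vnode{v}"), f"node{n}")
        for n in range(node_count)
        for v in range(vnodes)
    )


def keys_per_node(node_count, key_count=10_000, vnodes=64):
    """Count how many of key_count sample keys each node would own."""
    ring = build_ring(node_count, vnodes)
    tokens = [t for t, _ in ring]
    counts = Counter()
    for i in range(key_count):
        idx = bisect_right(tokens, token(f"key{i}")) % len(ring)
        counts[ring[idx][1]] += 1
    return counts


for n in (2, 4, 8):
    counts = keys_per_node(n)
    print(n, "nodes -> busiest node owns", max(counts.values()), "keys")
```

As the node count doubles, the busiest node's share drops to roughly half. That is the intuition behind throughput growing with cluster size, although real throughput also depends on hardware, replication, and workload.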
Apache Cassandra is used at various companies. Some of them are Netflix, GoDaddy, GitHub, eBay etc.
These features show that Apache Cassandra is powerful and reliable.
5. Cassandra Tutorial – History
Avinash Lakshman and Prashant Malik initially developed Cassandra at Facebook. In July 2008, Facebook released Cassandra as an open-source project on Google Code. In March 2009, it became an Apache Incubator project, and in February 2010 it became a top-level Apache project.
Since then, there have been many releases of the Apache project:
Table no.1 Apache Cassandra Tutorial – Cassandra History
|S. No.|Version|Release Date|
|---|---|---|
|1|0.6|April 12, 2010|
|2|0.7|January 08, 2011|
|3|0.8|June 02, 2011|
|4|1.0|October 17, 2011|
|5|1.1|April 23, 2012|
|6|1.2|January 02, 2013|
|7|2.0|September 01, 2013|
|8|2.1|September 10, 2014|
|9|2.2|July 20, 2015|
|10|3.0|November 11, 2015|
|12|3.11|June 23, 2017|
6. Cassandra Architecture
The architecture of Cassandra has various components. Some of them are:
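a. Node
A node is the basic unit of the cluster. Data is stored here.
b. Data Center
A data center is a collection of related nodes.
c. Commit Log
The commit log is Cassandra's crash-recovery mechanism. Every write is appended to it, so writes that had not yet reached disk files can be replayed after a crash.
d. Cluster
A cluster is a collection of one or more data centers.
e. Mem-table
A mem-table is a memory-resident data structure that holds recent writes.
f. SSTable
An SSTable is a disk file. When the contents of a mem-table reach the threshold value, the data is flushed here.
g. Bloom Filter
A Bloom filter is a fast, probabilistic structure for testing whether an element is a member of a set. Cassandra consults it on reads to skip SSTables that cannot contain the requested data.
h. Compaction
Compaction is the process of freeing up space by merging the accumulated data files.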
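The way these components work together on a single node can be sketched as follows. This is a toy model under stated assumptions, not Cassandra's implementation: the class names, the tiny flush threshold, and the SHA-256 based Bloom filter are all chosen for illustration, and deletes are left out.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable, sorted snapshot of a flushed mem-table."""

    def __init__(self, data):
        self.rows = dict(sorted(data.items()))
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # definitely absent: skip the (simulated) disk read
        return self.rows.get(key)


class Node:
    """Hypothetical single node with a commit log, mem-table, and SSTables."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []  # append-only record used for crash recovery
        self.memtable = {}    # in-memory table of recent writes
        self.sstables = []    # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write first
        self.memtable[key] = value            # 2. then update the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when the threshold is hit

    def flush(self):
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest data wins
            value = sstable.get(key)
            if value is not None:
                return value
        return None

    def compact(self):
        merged = {}
        for sstable in self.sstables:  # oldest first, newer values overwrite
            merged.update(sstable.rows)
        self.sstables = [SSTable(merged)]


node = Node(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    node.write(key, value)
print(len(node.sstables), node.read("a"))  # two SSTables, newest value of "a"
node.compact()
print(len(node.sstables), node.read("a"))  # one merged SSTable, same answer
```

Note how compaction keeps only the newest value for `"a"` and reduces the number of files a read has to check.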
7. Cassandra Tutorial – Data Model
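Below, we discuss the two basic elements of the Cassandra data model:
a. Cluster
A Cassandra cluster is made up of one or more data centers, each of which contains the nodes that store data.
b. Keyspace
A keyspace is the outermost storage container for data in Cassandra. It groups tables and defines how their data is replicated.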
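The nesting described above can be pictured with a few Python dataclasses. This is only a mental model with hypothetical names (`demo_cluster`, `shop`, `users`). It is not how you would talk to a real Cassandra cluster, and it assumes the common case where a keyspace holds tables and a replication setting.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    rows: dict = field(default_factory=dict)  # partition key -> row


@dataclass
class Keyspace:
    name: str
    replication_factor: int                    # replication is set per keyspace
    tables: dict = field(default_factory=dict)

    def create_table(self, name):
        self.tables[name] = Table(name)
        return self.tables[name]


@dataclass
class Cluster:
    name: str
    keyspaces: dict = field(default_factory=dict)

    def create_keyspace(self, name, replication_factor):
        self.keyspaces[name] = Keyspace(name, replication_factor)
        return self.keyspaces[name]


# Hypothetical example: one cluster, one keyspace, one table, one row.
cluster = Cluster("demo_cluster")
shop = cluster.create_keyspace("shop", replication_factor=3)
users = shop.create_table("users")
users.rows[42] = {"name": "Asha", "city": "Pune"}

print(cluster.keyspaces["shop"].tables["users"].rows[42])
```

Reading the last line from left to right follows the same path as the data model: cluster, then keyspace, then table, then row.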
8. Cassandra Applications
Cassandra is used for many applications. Some of Cassandra Applications are:
- AppScale: Back-end for Google App Engine applications.
- Cisco's WebEx: Storage for user feed and activity in near real time.
- Globo.com: Back-end database for their streaming services.
- Mahalo.com: Record user activity logs and topics for their Q&A website.
- Netflix: Back-end database for their streaming services.
- Nutanix: Store metadata and stats.
9. Conclusion – Apache Cassandra Tutorial
Hence, in this Apache Cassandra tutorial, we learned that Cassandra is an open-source database that can manage large amounts of structured data. We went through the two basic elements of its data model, the cluster and the keyspace, and looked at Cassandra's features and applications. The next article will cover books on Cassandra that can help you deepen your knowledge. If you have any questions, feel free to ask in the comment section.