Cassandra ClusterVM by Anarion Technologies
A Cassandra cluster is a sophisticated and robust distributed database system engineered to manage vast quantities of data efficiently across many servers. The architecture of a Cassandra cluster is built to provide high availability and fault tolerance without any single point of failure. This is achieved through several key mechanisms and design principles:
- Scalability: One of the standout features of a Cassandra cluster is its ability to scale horizontally. As data volume or request rates grow, new nodes can be added to the cluster seamlessly, without downtime.
- Data Replication: To ensure fault tolerance and high availability, Cassandra automatically replicates data across multiple nodes. The replication factor (the number of copies kept of each row) is configurable per keyspace and trades storage cost against fault tolerance; how many replicas must answer a given read or write is tuned separately through the consistency level. If a node fails, its data remains accessible from the other nodes that hold replicas, so no single node is a point of failure.
- Partitioning and Data Distribution: Data in Cassandra is partitioned across the cluster using consistent hashing: the partitioner (Murmur3Partitioner by default) hashes each row’s partition key to a token, and each node owns a range of tokens. As long as partition keys are well distributed, this spreads data and load evenly across the nodes, avoids hotspots, and makes good use of the cluster’s resources.
- Column Family Store: Cassandra uses a wide-column (column-family) data model, a hybrid between a key-value store and a tabular database; in CQL, column families are called tables. Rows are grouped into partitions by their partition key, and not every row has to populate every column. This makes the model particularly effective for time-series data, sensor data, and other write-heavy big data workloads.
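The partitioning and replication bullets above can be made concrete with a toy token ring. This bash sketch is illustrative only and rests on stated assumptions: it uses cksum’s CRC32 as a stand-in for the Murmur3 hash used by Cassandra’s default partitioner, gives each hypothetical node (node-a to node-d) a single token instead of many virtual-node tokens, and places extra replicas SimpleStrategy-style on the next distinct nodes clockwise around the ring.

```bash
#!/usr/bin/env bash
# Toy token ring, for illustration only. Assumptions: cksum's CRC32 stands in
# for Murmur3, each hypothetical node owns one token (no vnodes), and replicas
# are placed SimpleStrategy-style on the next distinct nodes clockwise.
set -euo pipefail

NODES=(node-a node-b node-c node-d)   # hypothetical node names

# token_of VALUE: print a 32-bit hash of VALUE (first field of cksum's output).
token_of() {
  printf '%s' "$1" | cksum | cut -d' ' -f1
}

# replicas_for KEY RF: print the RF nodes that would store KEY, primary first.
replicas_for() {
  local key=$1 rf=$2 key_token ring
  key_token=$(token_of "$key")
  # The ring: one "token node" line per node, sorted by token.
  ring=$(for n in "${NODES[@]}"; do echo "$(token_of "$n") $n"; done | sort -n)
  # A node owns the keys whose token is <= its own token, so walk clockwise:
  # nodes with tokens >= the key token first, then wrap around to the rest.
  awk -v t="$key_token" -v rf="$rf" '
    $1 + 0 >= t + 0 { order[++n] = $2 }     # reached before wrapping
    $1 + 0 <  t + 0 { wrapped[++m] = $2 }   # reached after wrapping
    END {
      for (i = 1; i <= m; i++) order[++n] = wrapped[i]
      for (i = 1; i <= rf && i <= n; i++) print order[i]
    }' <<<"$ring"
}

replicas_for John 3
```

Running it prints the three nodes that would hold the partition key John with a replication factor of 3, primary replica first. Adding a name to NODES changes the owners of only some keys rather than all of them, which is the property that lets a real cluster grow one node at a time.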
To subscribe to this product from Azure Marketplace and initiate an instance using the Azure compute service, follow these steps:
1. Navigate to Azure Marketplace and subscribe to the desired product.
2. Search for “virtual machines” and select “Virtual machines” under Services.
3. Click on “Add” in the Virtual machines page, which will lead you to the Create a virtual machine page.
4. In the Basics tab:
- Ensure the correct subscription is chosen under Project details.
- Create a new resource group by selecting “Create new” under Resource group, and name it “myResourceGroup.”
5. Under Instance details:
- Enter “myVM” as the Virtual machine name.
- Choose “East US” as the Region.
- Select “Ubuntu 18.04 LTS” as the Image.
- Leave other settings as default.
6. For Administrator account:
- Pick “SSH public key.”
- Provide your user name and paste your public key, ensuring no leading or trailing white spaces.
7. Under Inbound port rules > Public inbound ports:
- Choose “Allow selected ports.”
- Select “SSH (22)” and “HTTP (80)” from the drop-down.
8. Keep the remaining settings at their defaults and click on “Review + create” at the bottom of the page.
9. The “Create a virtual machine” page will display the details of the VM you’re about to create. Once ready, click on “Create.”
10. The deployment process will take a few minutes. Once it’s finished, proceed to the next section.
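The portal steps above can also be scripted with the Azure CLI. This is a sketch, not the vendor’s procedure: it assumes the CLI is installed and az login has been run, reuses the names from the steps (myResourceGroup, myVM, East US), and uses a stock Ubuntu image alias (Ubuntu2204) and an admin user name (azureuser) purely as placeholders. Replace IMAGE with the image of the Marketplace product you subscribed to; az vm image list shows the aliases your CLI version accepts.

```bash
#!/usr/bin/env bash
# Sketch of the portal steps using the Azure CLI. Assumes `az login` was run.
# IMAGE and ADMIN_USER are placeholders, not values from the product listing.
set -euo pipefail

RESOURCE_GROUP=myResourceGroup
VM_NAME=myVM
LOCATION=eastus
IMAGE=${IMAGE:-Ubuntu2204}                   # placeholder: use the product's image
ADMIN_USER=${ADMIN_USER:-azureuser}          # placeholder: any valid Linux user name
SSH_PUBLIC_KEY=${SSH_PUBLIC_KEY:-~/.ssh/id_rsa.pub}

create_vm() {
  # Step 4: a new resource group in the chosen region.
  az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
  # Steps 5-6: the VM itself, with SSH public-key authentication.
  az vm create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --image "$IMAGE" \
    --admin-username "$ADMIN_USER" \
    --ssh-key-values "$SSH_PUBLIC_KEY"
  # Step 7: SSH (22) is allowed by default for Linux VMs; also open HTTP (80).
  az vm open-port --resource-group "$RESOURCE_GROUP" --name "$VM_NAME" --port 80
}
```

Source the script and call create_vm. When az vm create builds a new network security group for a Linux VM it adds an SSH rule, so only port 80 needs an explicit az vm open-port call.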
To connect to the virtual machine:
1. Access the overview page of your VM and click on “Connect.”
2. On the “Connect to virtual machine” page:
- Keep the default options for connecting via IP address over port 22.
- A connection command for logging in will be displayed. Click the button to copy the command. Here’s an example of what the SSH connection command looks like:
```
ssh <username>@<public-ip-address>
```
3. Open the same Bash shell you used to generate your SSH key pair: either reopen Cloud Shell by selecting >_ again, or go to https://shell.azure.com/bash.
4. Paste the SSH connection command into the shell to initiate an SSH session.
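Instead of copying the command from the portal, the public IP can be looked up with the Azure CLI and passed straight to ssh. A minimal sketch, assuming the resource names above, an admin user called azureuser (a placeholder for the name you entered in step 6), and the matching private key at ~/.ssh/id_rsa:

```bash
#!/usr/bin/env bash
# Sketch: look up the VM's public IP with the Azure CLI and connect over SSH.
# ADMIN_USER and SSH_KEY are placeholders; adjust them to your own values.
set -euo pipefail

RESOURCE_GROUP=myResourceGroup
VM_NAME=myVM
ADMIN_USER=${ADMIN_USER:-azureuser}   # placeholder: the user name from step 6
SSH_KEY=${SSH_KEY:-~/.ssh/id_rsa}     # private key matching the pasted public key

# vm_public_ip: print the VM's public IP address, as shown on its Overview page.
vm_public_ip() {
  az vm show --show-details --resource-group "$RESOURCE_GROUP" --name "$VM_NAME" \
    --query publicIps --output tsv
}

# connect_vm: open an SSH session to the VM (port 22).
connect_vm() {
  local ip
  ip=$(vm_public_ip)
  ssh -i "$SSH_KEY" "$ADMIN_USER@$ip"
}
```

Source it and run connect_vm. On its own, vm_public_ip is also handy wherever this guide asks you to copy the public IP address.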
Usage/Deployment Instructions
Anarion Technologies – Cassandra Cluster
Note: Search for the product on Azure Marketplace and click “Get it now”.
Click on Continue
Click on Create
On the Create a virtual machine page, enter or select appropriate values for the zone, machine type, resource group, and other settings.
When the virtual machine has been created, select “Go to resource group”.
Open the virtual machine and copy its public IP address.
Connect to the VM over SSH from your terminal and update the package index:
$ sudo apt update
Open the cassandra.service file for editing:
$ sudo nano /etc/systemd/system/cassandra.service
Verify that the User and Group directives are correctly set to a valid user and group. For example:
[Unit]
Description=Apache Cassandra
[Service]
Type=simple
User=cassandra
Group=cassandra
ExecStart=/opt/cassandra/bin/cassandra -f
Restart=on-failure
[Install]
WantedBy=multi-user.target
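Ensure the user and group specified (cassandra in this example) exist on your system. If they don’t, create the group first and then a system user that belongs to it (on Ubuntu, running useradd first would already create a cassandra group, making a later groupadd fail):
$ sudo groupadd -r cassandra
$ sudo useradd -r -g cassandra -s /bin/false cassandra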
Set appropriate ownership for the Cassandra directories:
$ sudo chown -R cassandra:cassandra /opt/cassandra
Reload the systemd daemon to apply any changes:
$ sudo systemctl daemon-reload
Start the Cassandra service:
$ sudo systemctl start cassandra.service
Check the status to ensure it’s running:
$ sudo systemctl status cassandra.service
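Cassandra can take a while after systemctl start before it accepts client connections. The following sketch polls a TCP port until it opens, using bash’s /dev/tcp feature; port 9042 is Cassandra’s default native (CQL) port, the one cqlsh connects to. The helper name wait_for_port is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: poll until a TCP port accepts connections, e.g. Cassandra's CQL port.
set -euo pipefail

# wait_for_port HOST PORT TIMEOUT_SECONDS
# Returns 0 as soon as HOST:PORT accepts a TCP connection, 1 after the timeout.
wait_for_port() {
  local host=$1 port=$2 timeout=$3 waited=0
  # bash's /dev/tcp pseudo-path opens a TCP connection; the subshell closes it.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if (( waited >= timeout )); then
      echo "nothing listening on $host:$port after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, to start Cassandra now and at every boot, then wait up to two minutes for it (after sourcing the script):
$ sudo systemctl enable --now cassandra.service
$ wait_for_port 127.0.0.1 9042 120 && nodetool status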
Cassandra’s default configuration is suitable for a single node. When Cassandra runs as a cluster of several nodes, you need to modify the configuration file on each node.
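The Cassandra configuration file is called cassandra.yaml. Package installations keep it in /etc/cassandra/, while a tarball installation such as the /opt/cassandra layout used in the service file above keeps it in /opt/cassandra/conf/. Open it with your preferred text editor and modify the following settings: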
$ sudo nano /etc/cassandra/cassandra.yaml
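First, change the name of the cluster. Look for the cluster_name parameter and assign a name. All nodes in the cluster must use the same name, and a node that has already started once under the default name will refuse to start with a new one until its system data is cleared: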
cluster_name: [cluster_name]
Optionally, change the inter-node data storage port with the storage_port parameter. The port must be open in the Ubuntu firewall (and in the VM’s network security group on Azure) so that the other nodes can reach it. In our case, the port is set to 7000, which is also the default.
storage_port: [port]
Finally, look for the seed_provider parameter and, in its seeds entry, list the IP addresses of the nodes that make up the cluster, separated by commas (in larger clusters, usually only a few nodes act as seeds):
- seeds: "[node_ip]:[node_port],[node_ip]:[node_port]…[node_ip]:[node_port]"
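Once done, save the file and restart Cassandra. cassandra.yaml is read at startup, and the service defined above has no reload action, so systemctl reload would fail:
$ sudo systemctl restart cassandra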
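When several nodes need the same changes, editing cassandra.yaml by hand gets repetitive. Here is a hedged sketch of a helper that applies the cluster_name, storage_port and seeds edits with GNU sed (the default on Ubuntu). The function name and argument order are hypothetical, and the patterns assume the stock file layout, in which the seeds entry looks like - seeds: "127.0.0.1:7000".

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply the cluster_name, storage_port and seeds edits to a
# cassandra.yaml file. Uses GNU sed's -i (as on Ubuntu) and assumes the values
# contain no '/', '&', quotes or newlines.
set -euo pipefail

# configure_cassandra FILE CLUSTER_NAME SEEDS [STORAGE_PORT]
#   CLUSTER_NAME  must be the same on every node of the cluster
#   SEEDS         comma-separated list, e.g. "10.0.0.4:7000,10.0.0.5:7000"
configure_cassandra() {
  local file=$1 cluster=$2 seeds=$3 port=${4:-7000}
  cp "$file" "$file.bak"                      # keep the original for rollback
  sed -i \
    -e "s/^cluster_name:.*/cluster_name: '$cluster'/" \
    -e "s/^storage_port:.*/storage_port: $port/" \
    -e "s/^\([[:space:]]*- seeds:\).*/\1 \"$seeds\"/" \
    "$file"
}
```

Run it as root on each node, for example configure_cassandra /etc/cassandra/cassandra.yaml "My Cluster" "10.0.0.4:7000,10.0.0.5:7000" (hypothetical addresses), then restart Cassandra. The .bak copy lets you roll back if the node fails to start.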
Now check the state of the cluster with the following command. Every node should be listed with the status UN (Up/Normal):
$ nodetool status
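In nodetool status output, each node line starts with a two-letter code: U or D (Up or Down) followed by N, L, J or M (Normal, Leaving, Joining or Moving). A small sketch that turns this into a pass/fail check for scripts; the function name all_nodes_up is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: check `nodetool status` output for nodes that are not Up/Normal.
set -euo pipefail

# all_nodes_up: read `nodetool status` output on stdin, print a summary, and
# succeed only if at least one node is listed and every node line starts with UN.
all_nodes_up() {
  awk '
    # Node lines: U/D (Up/Down) + N/L/J/M (Normal/Leaving/Joining/Moving).
    /^[UD][NLJM] / { total++; if ($1 == "UN") up++ }
    END {
      printf "%d of %d nodes Up/Normal\n", up, total
      exit !(total > 0 && up == total)
    }'
}
```

Usage, after sourcing it: $ nodetool status | all_nodes_up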
Learn Basic Cassandra Query Language (CQL) Commands
The following section shows Cassandra’s most common basic CQL commands, with practical examples.
cqlsh, the Cassandra Query Language shell, is the command-line client used to communicate with Cassandra and run CQL statements. To start it, run the cqlsh command; by default it connects to the local node on port 9042:
root@myawesomevps:/# cqlsh
Connected to Test Cluster at 127.0.0.1:9042
[cqlsh 6.0.0 | Cassandra 4.0.5 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.
cqlsh>
HELP
The HELP command lists the available cqlsh commands and CQL help topics. Add a topic name to see its details; for example, the output for HELP SHOW looks like this:
cqlsh> HELP SHOW
SHOW [cqlsh only]
Displays information about the current cqlsh session. Can be called in the
following ways:
SHOW VERSION
Shows the version and build of the connected Cassandra
instance, as well as the version of the CQL spec
that the connected Cassandra instance understands.
SHOW HOST
Shows where cqlsh is currently connected.
SHOW SESSION <sessionid>
Pretty-prints the requested tracing session.
cqlsh>
SHOW
The SHOW command displays information about the current cqlsh session. You can choose between showing host, version, and session information:
cqlsh> SHOW VERSION
[cqlsh 6.0.0 | Cassandra 4.0.5 | CQL spec 3.4.5 | Native protocol v5]
cqlsh> SHOW HOST
Connected to Test Cluster at 127.0.0.1:9042
cqlsh>
CREATE KEYSPACE
A keyspace is the top-level container for tables and defines how their data is replicated. In the following example, we create a new keyspace and specify its replication strategy and replication factor:
cqlsh> CREATE KEYSPACE testingout
WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 1
};
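SimpleStrategy with a replication factor of 1, as in the example above, keeps a single copy of each row. That suits a single-node test but gives none of the fault tolerance described at the top of this guide. For a multi-node cluster, NetworkTopologyStrategy with a replication factor per data center is the usual choice. A sketch that creates such a keyspace non-interactively with cqlsh -e; the function name and the clusterdemo keyspace are hypothetical, and datacenter1 (Cassandra’s default data center name) must match the name shown by nodetool status:

```bash
#!/usr/bin/env bash
# Sketch: create a replicated keyspace non-interactively with cqlsh.
# Assumptions: the keyspace name is hypothetical; DATACENTER must match the
# data center name shown by `nodetool status`.
set -euo pipefail

# create_keyspace HOST KEYSPACE DATACENTER RF
create_keyspace() {
  local host=$1 keyspace=$2 dc=$3 rf=$4
  cqlsh -e "CREATE KEYSPACE IF NOT EXISTS $keyspace
    WITH replication = {'class': 'NetworkTopologyStrategy', '$dc': $rf};" "$host"
}
```

For example, create_keyspace 10.0.0.4 clusterdemo datacenter1 3 keeps three copies of each row, which needs at least three nodes in that data center.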
USE
The USE command sets the current working keyspace:
cqlsh> USE testingout;
cqlsh:testingout>
CREATE TABLE
To create a table, use the CREATE TABLE command and specify the column names, data types, and primary key:
cqlsh:testingout> CREATE TABLE tabletest (
name TEXT PRIMARY KEY,
surname TEXT,
phone INT
);
INSERT
The INSERT command adds a row to a table. Columns you leave out are not written and are read back as null:
cqlsh:testingout> INSERT INTO tabletest (name, surname, phone)
VALUES ('John', 'Johnson', 456123789);
cqlsh:testingout>
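In Cassandra, INSERT is an upsert: writing a row whose primary key already exists overwrites the columns you specify instead of adding a second row. The following sketch demonstrates this on the tabletest table above by writing the key John twice, the second time without surname, and reading the row back. It assumes cqlsh can reach a node at the given host (127.0.0.1 by default); upsert_demo is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: show that INSERT is an upsert on testingout.tabletest.
# Assumes cqlsh can reach a node at HOST (default 127.0.0.1).
set -euo pipefail

upsert_demo() {
  local host=${1:-127.0.0.1} cql
  cql=$(mktemp)
  cat > "$cql" <<'EOF'
INSERT INTO testingout.tabletest (name, surname, phone) VALUES ('John', 'Johnson', 456123789);
-- Same primary key, surname omitted: only phone is overwritten.
INSERT INTO testingout.tabletest (name, phone) VALUES ('John', 111222333);
-- Expect a single row: John | Johnson | 111222333
SELECT name, surname, phone FROM testingout.tabletest WHERE name = 'John';
EOF
  cqlsh -f "$cql" "$host"   # run the statements from the temporary file
  rm -f "$cql"
}
```

The SELECT should return one row in which phone has been overwritten and surname kept, because the second INSERT never touched surname.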