Apache Spark - Anarion Technologies

Apache Spark VM by Anarion Technologies

Application installed: Apache Spark

About

Apache Spark is an open-source, distributed computing framework designed for big data processing and analytics. Originally developed at UC Berkeley's AMPLab, Spark has become one of the most widely used platforms for large-scale data processing. It keeps intermediate data in memory where possible, which reduces disk I/O and makes it considerably faster than disk-based batch systems such as Hadoop MapReduce, particularly for iterative workloads. Spark processes both batch and real-time data and supports diverse workloads such as data querying, machine learning, graph processing, and stream processing.

Apache Spark offers a unified analytics engine that supports multiple programming languages, including Java, Scala, Python, and R, enabling a broad range of users, from developers to data scientists, to interact with the system using their preferred language. The platform includes several key libraries, such as MLlib for machine learning, Spark SQL for querying structured data, GraphX for graph processing, and Structured Streaming for real-time stream processing.

One of Spark’s major advantages is its ability to process data in memory, which significantly speeds up iterative algorithms and complex analytics tasks. Spark does not provide its own storage layer; instead, it reads from and writes to external systems such as Hadoop’s HDFS (Hadoop Distributed File System), NoSQL databases, cloud storage, and relational databases. It scales from gigabytes to petabytes of data, making it a go-to solution for industries dealing with vast amounts of data.

How it works

To subscribe to this product from Azure Marketplace and initiate an instance using the Azure compute service, follow these steps:

1. Navigate to Azure Marketplace and subscribe to the desired product.
2. Search for “virtual machines” and select “Virtual machines” under Services.
3. Click on “Add” in the Virtual machines page, which will lead you to the Create a virtual machine page.
4. In the Basics tab:

- Ensure the correct subscription is chosen under Project details.
- Under Resource group, select “Create new” and name the group “myResourceGroup.”

5. Under Instance details:

- Enter “myVM” as the Virtual machine name.
- Choose “East US” as the Region.
- Select “Ubuntu 18.04 LTS” as the Image.
- Leave other settings as default.

6. For Administrator account:

- Pick “SSH public key.”
- Enter your user name and paste your public key, making sure it has no leading or trailing whitespace.

7. Under Inbound port rules > Public inbound ports:

- Choose “Allow selected ports.”
- Select “SSH (22)” and “HTTP (80)” from the drop-down.

8. Keep the remaining settings at their defaults and click on “Review + create” at the bottom of the page.
9. The “Create a virtual machine” page will display the details of the VM you’re about to create. Once ready, click on “Create.”
10. The deployment process will take a few minutes. Once it’s finished, proceed to the next section.
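
The portal steps above can also be expressed as Azure CLI commands. The following is a minimal Python sketch, not part of the original instructions, that only assembles the equivalent `az` command lines so they can be reviewed before anything is run. The image URN and the key path are assumptions, and the `azureuser` name is borrowed from the SSH example in this section; the Marketplace image for this product has its own identifier, which is not stated here.

```python
import os
import shlex

# Values taken from the portal walkthrough above.
RESOURCE_GROUP = "myResourceGroup"
VM_NAME = "myVM"
LOCATION = "eastus"  # shown as "East US" in the portal

# Assumptions, not stated in the walkthrough:
IMAGE = "Canonical:UbuntuServer:18.04-LTS:latest"  # assumed URN for Ubuntu 18.04 LTS
ADMIN_USER = "azureuser"                            # user name from the SSH example
SSH_KEY_PATH = "~/.ssh/id_rsa.pub"                  # assumed public key location


def build_vm_commands(key_path=SSH_KEY_PATH):
    """Return az commands (as argument lists) that mirror the portal steps."""
    # Expand "~" here so the rendered command does not rely on shell expansion.
    key_path = os.path.expanduser(key_path)
    create_group = [
        "az", "group", "create",
        "--name", RESOURCE_GROUP,
        "--location", LOCATION,
    ]
    create_vm = [
        "az", "vm", "create",
        "--resource-group", RESOURCE_GROUP,
        "--name", VM_NAME,
        "--location", LOCATION,
        "--image", IMAGE,
        "--admin-username", ADMIN_USER,
        "--authentication-type", "ssh",
        "--ssh-key-values", key_path,
    ]
    # HTTP (80) is opened explicitly; check that SSH (22) is also allowed
    # in your environment after the VM is created.
    open_http = [
        "az", "vm", "open-port",
        "--resource-group", RESOURCE_GROUP,
        "--name", VM_NAME,
        "--port", "80",
    ]
    return [create_group, create_vm, open_http]


def render(commands):
    """Join each argument list into a shell-safe command line."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


print(render(build_vm_commands()))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; paste the output into a shell where the Azure CLI is installed and signed in only after checking each value.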

To connect to the virtual machine:

1. Access the overview page of your VM and click on “Connect.”
2. On the “Connect to virtual machine” page:

- Keep the default options for connecting via IP address over port 22.
- A connection command for logging in will be displayed. Click the button to copy the command. Here’s an example of what the SSH connection command looks like:
  ```
  ssh azureuser@10.111.12.123
  ```

3. Use the same bash shell in which you generated your SSH key pair. To reopen Cloud Shell, either select >_ again or go to https://shell.azure.com/bash.
4. Paste the SSH connection command into the shell to initiate an SSH session.
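
If the SSH session does not open, it can help to confirm that port 22 on the VM is reachable at all. The sketch below is an illustration using only the Python standard library; the function name `port_is_open` and the default timeout are arbitrary choices, not part of the original instructions.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("10.111.12.123", 22)` uses the example address from the SSH command above; substitute your VM's IP address. A result of `False` usually points to the inbound port rules or to a wrong address rather than to the SSH key itself.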

Deployment

Usage/Deployment Instructions

Anarion Technologies – Apache Spark

Note: Search for the product on Azure Marketplace and click “Get it now.”

1. Click “Continue.”
2. Click “Create.”
3. On the Create a virtual machine page, enter or select appropriate values for zone, machine type, resource group, and the other settings.
4. When the virtual machine has been created, click “Go to resource group.”
5. Copy the public IP address.
6. Click the network security group “spark-nsg.”
7. Click “Inbound security rules,” then click “Add.”
8. In the “Destination port ranges” field (default value 8080), enter 8080.
9. Set Protocol to TCP and Action to Allow.
10. Click “Add,” then click “Refresh.”
11. Copy the public IP address for use in the steps below.
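
The inbound rule for port 8080 described above could also be created with the Azure CLI. The following Python sketch only builds the command line for review. The rule name `AllowSparkUI`, the priority `1010`, and the resource group name are hypothetical placeholders; only the NSG name `spark-nsg`, the port, the protocol, and the action come from the steps above.

```python
import shlex


def build_nsg_rule_command(resource_group, nsg_name="spark-nsg", port=8080,
                           rule_name="AllowSparkUI", priority=1010):
    """Return the az command (as an argument list) allowing inbound TCP on `port`.

    rule_name and priority are illustrative values, not taken from the guide.
    """
    return [
        "az", "network", "nsg", "rule", "create",
        "--resource-group", resource_group,
        "--nsg-name", nsg_name,
        "--name", rule_name,
        "--priority", str(priority),
        "--direction", "Inbound",
        "--access", "Allow",
        "--protocol", "Tcp",
        "--destination-port-ranges", str(port),
    ]


# "myResourceGroup" is a placeholder; use the group that contains your VM.
print(shlex.join(build_nsg_rule_command("myResourceGroup")))
```

The priority must be unique among the rules in the security group, so adjust it if another rule already uses the same number.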

Open an SSH session to the VM and run these commands:

$ sudo su
$ apt update
$ cd /opt/spark

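Start Spark: To start the Spark master in standalone mode, run:

$ start-master.sh

If the shell reports that the command is not found, run the script from Spark’s sbin directory instead (in a standard Spark installation, the start scripts live under sbin):

$ ./sbin/start-master.sh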

You can now open the Spark master web UI in your browser on port 8080 of your server’s public IP address:

http://<Instance IP Address>:8080
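
To check from a script that the web UI answers, something like the following could be used. It is a minimal sketch with the Python standard library; the helper names `spark_ui_url` and `ui_is_up` are illustrative, and the check assumes that the UI answers a plain HTTP GET on port 8080 with status 200.

```python
import urllib.error
import urllib.request


def spark_ui_url(ip_address, port=8080):
    """Build the web UI address from the VM's public IP address."""
    return f"http://{ip_address}:{port}"


def ui_is_up(url, timeout=5.0):
    """Return True if an HTTP GET to url answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Call `ui_is_up(spark_ui_url("<Instance IP Address>"))` with the public IP address copied earlier. If it returns `False`, confirm that the master was started and that the inbound rule for port 8080 is in place.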

Apache Spark is used for fast, distributed data processing and analytics on large-scale datasets across clusters, enabling high-performance computations and real-time stream processing.

Thank you!

Support

All your queries are important to us, so please feel free to get in touch. 24x7 support is provided for all customers, and we are happy to help.
Contact Number: +1 (415) 800-4585
Support E-mail: support@anariontech.com
