Hadoop VM by Anarion Technologies
Hadoop is a robust, open-source framework designed to handle large-scale data processing across distributed computing environments. At its core, Hadoop consists of two main components: the Hadoop Distributed File System (HDFS) and the MapReduce programming model. HDFS is a scalable, fault-tolerant file system that provides high-throughput access to data by distributing it across a cluster of computers. This distributed nature ensures that data is replicated across multiple nodes, making it resilient to hardware failures and enabling efficient data retrieval.
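In practice, applications interact with HDFS much like an ordinary file system, through the hdfs shell or its APIs. Here is a brief sketch of a few common commands (the paths and file name are illustrative, not part of the product):
```
# create an HDFS directory, upload a local file, and list the result;
# the file is transparently split into blocks and replicated across nodes
$ hdfs dfs -mkdir -p /user/hdoop/input
$ hdfs dfs -put data.txt /user/hdoop/input/
$ hdfs dfs -ls /user/hdoop/input
```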
The MapReduce programming model allows for the parallel processing of vast amounts of data by breaking down tasks into smaller, manageable pieces. In this model, data processing tasks are divided into “Map” tasks, which process data in parallel, and “Reduce” tasks, which aggregate the results from the Map tasks. This approach optimizes performance and scalability, allowing Hadoop to handle complex data processing and analysis at a scale that traditional systems cannot achieve.
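As a small illustration, the classic word count fits this model directly: the Map step splits text into words and the Reduce step totals each word. With Hadoop Streaming, plain shell commands can play both roles; the following is a minimal sketch (the jar path and HDFS paths are assumptions that vary by installation):
```
# word count with Hadoop Streaming: the mapper splits lines into words,
# the framework sorts them by key, and the reducer counts each run
$ hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input /user/hdoop/input \
    -output /user/hdoop/output \
    -mapper 'tr -s "[:space:]" "\n"' \
    -reducer 'uniq -c'
```
Because the framework sorts map output by key before the reduce phase, a simple `uniq -c` on the reducer side is enough to total each word.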
Hadoop is particularly well-suited for big data analytics, handling both structured and unstructured data with ease. It supports a wide range of data processing and analysis tasks, from simple aggregations to complex machine learning algorithms. Its ability to scale from a single server to thousands of machines makes it a versatile tool for organizations looking to leverage large datasets for insights and decision-making. Whether used in a small-scale environment or across vast clusters, Hadoop provides the necessary infrastructure for processing and analyzing large volumes of data efficiently and effectively.
To subscribe to this product from Azure Marketplace and initiate an instance using the Azure compute service, follow these steps:
1. Navigate to Azure Marketplace and subscribe to the desired product.
2. Search for “virtual machines” and select “Virtual machines” under Services.
3. Click on “Add” in the Virtual machines page, which will lead you to the Create a virtual machine page.
4. In the Basics tab:
- Ensure the correct subscription is chosen under Project details.
- Opt to create a new resource group by selecting “Create new” and name it “myResourceGroup.”
5. Under Instance details:
- Enter “myVM” as the Virtual machine name.
- Choose “East US” as the Region.
- Select “Ubuntu 18.04 LTS” as the Image.
- Leave other settings as default.
6. For Administrator account:
- Pick “SSH public key.”
- Provide your username and paste in your public key, making sure there is no leading or trailing whitespace.
7. Under Inbound port rules > Public inbound ports:
- Choose “Allow selected ports.”
- Select “SSH (22)” and “HTTP (80)” from the drop-down.
8. Keep the remaining settings at their defaults and click on “Review + create” at the bottom of the page.
9. The “Create a virtual machine” page will display the details of the VM you’re about to create. Once ready, click on “Create.”
10. The deployment process will take a few minutes. Once it’s finished, proceed to the next section.
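If you prefer scripting these steps, an equivalent VM can be created with the Azure CLI. The following is a sketch using the example names above (the image alias and admin username are assumptions; adjust to your CLI version and preferences):
```
$ az group create --name myResourceGroup --location eastus
$ az vm create \
    --resource-group myResourceGroup \
    --name myVM \
    --image UbuntuLTS \
    --admin-username azureuser \
    --generate-ssh-keys
# open HTTP; the default NSG created by az vm create already allows SSH (22)
$ az vm open-port --resource-group myResourceGroup --name myVM --port 80
```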
To connect to the virtual machine:
1. Access the overview page of your VM and click on “Connect.”
2. On the “Connect to virtual machine” page:
- Keep the default options for connecting via IP address over port 22.
- A connection command for logging in will be displayed. Click the button to copy the command. Here’s an example of what the SSH connection command looks like:
```
ssh azureuser@<public-ip-address>
```
3. Using the same bash shell that you used to generate your SSH key pair, reopen the Cloud Shell by selecting >_ again or by going to https://shell.azure.com/bash.
4. Paste the SSH connection command into the shell to initiate an SSH session.
Usage/Deployment Instructions
Anarion Technologies – Hadoop
Note: Search for the product on the Azure Marketplace and click on “Get it now.”
Click on Continue
Click on Create
When creating the virtual machine, enter or select appropriate values for the zone, machine type, resource group, and so on, as per your choice.
After the virtual machine has been created, you will see the option “Go to resource group”; click it.
Click on the Network Security Group: hadoop-nsg
Click on “Inbound security rules”
Click on Add
In the “Destination port ranges” field (where the default value is 8080), enter the ports 9870, 9864 and 8088
Select “TCP” as the Protocol
Set the Action to “Allow”
Click on Add
Click on Refresh
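The same inbound rule can also be added from the Azure CLI; the sketch below assumes the resource group and rule names (adjust them to match your deployment):
```
$ az network nsg rule create \
    --resource-group myResourceGroup \
    --nsg-name hadoop-nsg \
    --name Allow-Hadoop-Web-UIs \
    --priority 1010 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --destination-port-ranges 9870 9864 8088
```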
SSH into the instance from a terminal and run the following commands:
$ sudo su
$ sudo apt update
$ nano /etc/shadow
Uncomment the hdoop line (check near the bottom of the file), for example a line beginning with:
#hdoop
Save and exit the file.
You are free to use any username and password you see fit. Switch to the newly created user and enter the corresponding password:
$ su - hdoop
Password: admin
The user now needs to be able to SSH to localhost without being prompted for a password.
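If password-less login is not already configured on the image, a common way to set it up for the current user is the following minimal sketch (using the OpenSSH default key paths):
```
# generate a key pair with an empty passphrase and authorize it locally
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
```
Then verify that you can connect without a password prompt: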
$ ssh localhost
Start the Hadoop cluster:
$ ./start-dfs.sh
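Optionally, you can confirm that HDFS came up by asking the NameNode for a cluster report:
```
$ hdfs dfsadmin -report
```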
Once the NameNode, DataNodes, and SecondaryNameNode are up and running, start the YARN ResourceManager and NodeManagers by typing:
$ ./start-yarn.sh
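Optionally, confirm that the NodeManager has registered with the ResourceManager:
```
$ yarn node -list
```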
Type this simple command to check if all the daemons are active and running as Java processes:
$ jps
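If everything started correctly, the output should list the Hadoop daemons as Java processes, along the lines of the following (the process IDs will differ on your instance):
```
12001 NameNode
12205 DataNode
12499 SecondaryNameNode
12766 ResourceManager
13031 NodeManager
13321 Jps
```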
Copy the public IP address of the instance.
Use the browser to access the YARN ResourceManager UI at http://<instance-ip-address>:8088
Use the browser to access the HDFS NameNode UI at http://<instance-ip-address>:9870
Use the browser to access the DataNode UI at http://<instance-ip-address>:9864
Thank You!