Azure Cloud Platform

HDInsight

HDInsight is a cloud distribution of the Hadoop components, based on the Hortonworks Data Platform (HDP). Apache Hadoop was the original open-source framework for distributed processing and analysis of big data sets on clusters of computers.


The Hadoop technology stack includes related software and utilities such as Apache Hive, HBase, Spark, Kafka, and many others.


Types of clusters in HDInsight:

Apache Hadoop - Uses HDFS, YARN resource management, and a simple programming model called MapReduce to process and analyze batch data in parallel.


Apache Spark - A parallel processing framework that supports in-memory processing to boost the performance of big data analysis applications.


Apache HBase - A NoSQL database built on Hadoop that provides random access and strong consistency for semi-structured and unstructured data.


Microsoft R Server - A server for hosting and managing parallel, distributed R processes. It gives R programmers, data scientists, and statisticians on-demand access to scalable, distributed methods of analytics on HDInsight.


Apache Storm - A distributed, real-time computing system for processing large streams of data quickly.


Apache Interactive Hive (preview) - In-memory caching for interactive, faster Hive queries.


Apache Kafka - An open-source platform used to build streaming data pipelines and applications.
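To make the MapReduce programming model mentioned for the Apache Hadoop cluster type more concrete, here is a minimal sketch in plain Python. It does not use Hadoop or HDInsight at all; it only imitates the three stages (map, shuffle, reduce) on a single machine with the classic word count example. The function names map_phase, shuffle, reduce_phase and word_count are made up for this illustration and are not part of any Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in order on an in-memory list of lines."""
    # On a real cluster each line (or block of lines) could be mapped
    # on a different node; here everything runs in one process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data on Hadoop", "Hadoop processes big data"]
    print(word_count(sample))
```

In a real Hadoop job the map and reduce steps run in parallel on many machines and the framework performs the shuffle between them; the sketch only shows how the work is split into independent steps.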


Components in HDInsight clusters:

  • Ambari - Cluster provisioning, management, monitoring, and utilities.
  • Avro - Data serialization for the Microsoft .NET environment.
  • Hive and HCatalog - SQL-like querying, plus a table and storage management layer.
  • Mahout - Scalable machine learning applications.
  • MapReduce - A framework for distributed processing and resource management on Hadoop.
  • Oozie - Workflow management.
  • Phoenix - A relational database layer over HBase.
  • Pig - Simpler scripting for MapReduce transformations.
  • Sqoop - Data import and export.
  • Tez - Allows data-intensive processes to run efficiently at scale.
  • YARN - Resource management; part of the Hadoop core library.
  • ZooKeeper - Coordination of processes in distributed systems.
