The most renowned technology in the field of big data is Hadoop, a large-scale batch data processing system. Hadoop is widely preferred because it runs as a distributed cluster system and enables parallel data processing, making it a platform for massively scalable applications. Doug Cutting and Michael J. Cafarella developed Hadoop.
Many well-known companies such as Facebook, Apple, Google, and HP use Hadoop. Its popularity stems from its features: it provides access to a distributed file system, and the package ships with source code, documentation, and a contribution section that includes projects from the Hadoop community.
The Hadoop ecosystem includes the Hadoop Distributed File System (HDFS), Pig (data flow), Hive (SQL), Sqoop (data transfer to and from relational databases), MapReduce (job scheduling and execution), ETL tools, BI reporting, RDBMS integration, ZooKeeper (coordination), and Avro (serialization). However, when we talk about the main components, the Hadoop Distributed File System and MapReduce are the two core ones.
Now let us learn more about the Hadoop Distributed File System.
HDFS presents a traditional hierarchical file organization, but files follow a write-once, read-many model: a file is written once and can then be read any number of times. The distributed file system is also aware of the network topology, which lets it place block replicas sensibly across racks. No detailed study of HDFS is complete without its architecture: the NameNode and the JobTracker act as master nodes, whereas the DataNode and the TaskTracker are slave nodes.
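To make the write-once, read-many model concrete, here is a minimal sketch using Hadoop's FileSystem Java API. It assumes a reachable cluster whose NameNode address is already configured (for example, fs.defaultFS set in a core-site.xml on the classpath); the /demo/greeting.txt path is purely hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteOnceReadMany {
        public static void main(String[] args) throws Exception {
            // Load cluster settings; fs.defaultFS must point at the NameNode
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/demo/greeting.txt"); // hypothetical path

            // Write once: the file is created, written sequentially, and closed
            try (FSDataOutputStream out = fs.create(file)) {
                out.writeUTF("hello hdfs");
            }

            // Read many times: any number of readers can now open the file
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
            }
        }
    }

Once the output stream is closed, the file's contents are fixed under the classic HDFS model; readers can reopen it as often as needed, but it is not rewritten in place.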
Hadoop MapReduce is implemented by the JobTracker and the TaskTrackers. The JobTracker splits a submitted job into map and reduce tasks and schedules those tasks on cluster nodes, while each TaskTracker runs its assigned map and reduce tasks and periodically reports status back to the JobTracker.
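To make the map and reduce phases concrete, here is a minimal sketch of the classic word-count job written against the org.apache.hadoop.mapreduce Java API; the input and output directories are supplied as command-line arguments and are assumptions for illustration.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every word in an input line
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }

        // Reduce phase: sum the counts emitted for each word
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir in HDFS
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Submitted with the hadoop jar command, the job's map tasks are scheduled close to the input blocks on the slave nodes, and the reduce tasks then aggregate the per-word counts into the output directory.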
Hadoop comes with numerous benefits such as:
• Cost-effective and reliable data processing.
• An economically scalable solution.
• A data-grid operating system.
• Storage and processing of very large volumes of data.