Explain Data Locality in Hadoop?

    • #5683
      DataFlair Team
      Spectator

      What does the term ‘Data Locality’ mean in Hadoop?
      What is data locality, and why does Hadoop MapReduce need it?

    • #5685
      DataFlair Team
      Spectator

      What does the term Data Locality mean in Hadoop?
      Data Locality is one of the design principles of Hadoop. Under this principle, moving large data sets across the network is avoided wherever possible. Instead, the computation code is moved to the DataNode that holds the data, processes the data there, and writes the output to HDFS.

      What is data locality, and why does Hadoop MapReduce need it?
      Data locality means that a MapReduce task is scheduled on, or as close as possible to, the DataNode that holds its input. Small computation code (KBs) then crosses the network instead of huge data sets (GBs or TBs), which makes better use of network resources and reduces the time needed to run the MapReduce job.
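
The size argument above can be made concrete with some rough arithmetic. The sketch below is only an illustration: the 1 Gbit/s link, the 50 KB job code, and the 1 TB input are assumed figures, not Hadoop defaults, and it models a single link with no parallel transfers or protocol overhead.

```python
def transfer_seconds(size_bytes: int, bandwidth_bits_per_sec: float) -> float:
    """Idealized time to push size_bytes across one link of the given bandwidth."""
    return size_bytes * 8 / bandwidth_bits_per_sec


KB = 1024
GB = 1024 ** 3

# All three figures below are assumptions chosen for illustration.
LINK_BPS = 1e9             # 1 Gbit/s network link
JOB_CODE_BYTES = 50 * KB   # size of the computation code shipped to a node
INPUT_BYTES = 1024 * GB    # 1 TB of input data

code_time = transfer_seconds(JOB_CODE_BYTES, LINK_BPS)
data_time = transfer_seconds(INPUT_BYTES, LINK_BPS)

print(f"ship code to data: {code_time * 1000:.2f} ms")
print(f"ship data to code: {data_time / 3600:.2f} hours")
```

Under these assumptions the code moves in well under a millisecond, while the data would take more than two hours over the same link.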


    • #5687
      DataFlair Team
      Spectator

      Hadoop works on huge volumes of data, so moving that data over the network for every job is not practical.

      Hadoop therefore follows the principle of moving the algorithm to the data rather than the data to the algorithm. This is called Data Locality.

      So whenever a MapReduce job is invoked, the logic usually goes to the data for computation rather than the data being moved to the MapReduce job.

      Running the map code on the node where the data resides significantly reduces the amount of data that has to cross the network.
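
To illustrate how a task might be placed close to its data, here is a minimal sketch of a locality-aware choice. It prefers a node that holds a replica of the input block, then a node in the same rack as a replica, then any other node. The function names, node names, and rack map are hypothetical, and the real Hadoop scheduler is considerably more involved than this.

```python
from typing import Dict, List, Set


def locality_level(candidate: str, replica_nodes: Set[str], rack_of: Dict[str, str]) -> int:
    """Return 0 for node-local, 1 for rack-local, 2 for off-rack."""
    if candidate in replica_nodes:
        return 0
    replica_racks = {rack_of[node] for node in replica_nodes}
    if rack_of[candidate] in replica_racks:
        return 1
    return 2


def pick_node(free_nodes: List[str], replica_nodes: Set[str], rack_of: Dict[str, str]) -> str:
    """Choose the free node closest to the data (lowest locality level)."""
    return min(free_nodes, key=lambda node: locality_level(node, replica_nodes, rack_of))


# Hypothetical cluster: four nodes spread over three racks.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r3"}
replicas = {"n1", "n3"}  # nodes holding a copy of the input block

print(pick_node(["n4", "n2", "n1"], replicas, rack_of))  # node-local choice
print(pick_node(["n4", "n2"], replicas, rack_of))        # rack-local fallback
print(pick_node(["n4"], replicas, rack_of))              # off-rack as last resort
```

Only the last case forces the input block to cross racks; in the first case no input data leaves the node at all.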
