What is Data Locality?

    • #5846
      DataFlair Team
      Spectator

      What is data locality? What purpose does it serve?

    • #5847
      DataFlair Team
      Spectator

      A computation requested by an application is most efficient when it runs close to the data it operates on.

      Hadoop works with huge volumes of data spread across the cluster. If that data had to be copied to the node where a MapReduce job runs, the transfer would create a network bottleneck and take a long time.

      Hadoop’s solution, which also lets the computations run in parallel, is to move the algorithm to the data rather than the data to the algorithm. This is called data locality.
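
      As a rough illustration of "moving the algorithm to the data", the toy function below picks a node for a map task and prefers a free node that already stores a replica of the input block. This is a minimal sketch and not Hadoop's actual scheduler; `pick_node`, the node names, and the block layout are all hypothetical.

```python
from typing import Dict, List, Optional


def pick_node(block_replicas: List[str], free_nodes: List[str]) -> Optional[str]:
    """Pick a free node for a map task, preferring one that stores the block."""
    replica_set = set(block_replicas)
    for node in free_nodes:
        if node in replica_set:
            return node  # data-local: the block is read from local disk
    # No free node holds a replica: fall back to any free node,
    # which means the block has to be fetched over the network.
    return free_nodes[0] if free_nodes else None


# Hypothetical layout: each block is stored on three nodes.
block_locations: Dict[str, List[str]] = {
    "block-0": ["node-a", "node-b", "node-c"],
    "block-1": ["node-b", "node-d", "node-e"],
}
free = ["node-d", "node-a"]

assignments = {
    block: pick_node(replicas, free)
    for block, replicas in block_locations.items()
}
print(assignments)  # {'block-0': 'node-a', 'block-1': 'node-d'}
```

      Both tasks land on a node that already holds their block, so in this toy setup no input data has to cross the network.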

      Follow the link to learn more about: Data Locality in Hadoop

    • #5849
      DataFlair Team
      Spectator

      Data locality is a Hadoop feature that improves MapReduce job performance. It moves the computation close to the data rather than the data to the computation.

      Any computation is more efficient when it is executed near the data, especially when the data is very large. Because Hadoop works on large datasets, it is not feasible to move that much data over the network. Moving the code to the data instead of the data to the code is called data locality.
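
      A back-of-the-envelope calculation shows why moving the code is so much cheaper than moving the data. The sizes and the link speed below are assumptions chosen only for illustration; they are not measurements from a real cluster.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_sec: float) -> float:
    """Idealised time to push size_bytes through a single link (no overhead)."""
    return size_bytes * 8 / bandwidth_bits_per_sec


# Assumed figures, for illustration only.
DATASET_BYTES = 1e12    # 1 TB of input data
JOB_CODE_BYTES = 10e6   # 10 MB of job code
LINK_BPS = 1e9          # 1 Gbit/s network link

move_data = transfer_seconds(DATASET_BYTES, LINK_BPS)
move_code = transfer_seconds(JOB_CODE_BYTES, LINK_BPS)

print(f"move data: {move_data:.0f} s")  # move data: 8000 s
print(f"move code: {move_code:.2f} s")  # move code: 0.08 s
```

      Under these assumptions, shipping the dataset takes over two hours while shipping the code takes a fraction of a second. A real cluster sends the code to many nodes and transfers data in parallel, so the exact numbers differ, but the gap remains several orders of magnitude.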

    • #5851
      DataFlair Team
      Spectator

      Hadoop is designed specifically to solve big data problems, so it has to deal with very large datasets, and moving them to the computation is not feasible. Hadoop therefore provides a feature called data locality.
      The computation is moved to the data instead, which reduces network congestion.
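
      The effect on network traffic can be made concrete by counting how much input data has to cross the network under two task placements. This is a toy model: the block size, the replica layout, and both plans are assumed values, and `network_mb` is a hypothetical helper rather than a Hadoop API.

```python
from typing import Dict, List

BLOCK_SIZE_MB = 128  # assumed block size for this example


def network_mb(plan: Dict[str, str], replicas: Dict[str, List[str]]) -> int:
    """MB fetched over the network: one block per task whose node lacks a replica."""
    return sum(
        BLOCK_SIZE_MB
        for block, node in plan.items()
        if node not in replicas[block]
    )


# Hypothetical layout: which nodes store each block.
replicas = {
    "b0": ["n1", "n2"],
    "b1": ["n2", "n3"],
    "b2": ["n3", "n1"],
}

local_plan = {"b0": "n1", "b1": "n2", "b2": "n3"}   # every task runs next to its block
remote_plan = {"b0": "n3", "b1": "n1", "b2": "n2"}  # every task runs away from its block

print(network_mb(local_plan, replicas))   # 0
print(network_mb(remote_plan, replicas))  # 384
```

      With the data-local plan no input block is transferred at all, while the other plan pulls every block across the network.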
