Explain Distributed Cache in Apache Hadoop?

Viewing 2 reply threads
  • Author
    Posts
    • #6302
      DataFlair Team
      Spectator

      What is Distributed Cache in Hadoop?
      What is the need of distributed cache in Hadoop?

    • #6303
      DataFlair Team
      Spectator

      In Hadoop, data chunks are processed independently and in parallel across the DataNodes by a program written by the user. If a file needs to be accessed from all the DataNodes, we put that file in the Distributed Cache.

      The MapReduce framework provides a service called Distributed Cache to cache the files needed by applications. It can cache read-only text files, archives, jar files, etc.

      This saves a great deal of redundant I/O; for example, sometimes every Mapper needs to read the same single file.

      Follow the link for more detail: Distributed Cache.
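      To make this concrete, here is a minimal driver sketch using Hadoop's `Job` API (Hadoop 2.x and later). The class name `WordFilterJob`, the HDFS path `/cache/stopwords.txt`, and the `#stopwords` symlink name are hypothetical; it assumes the Hadoop client libraries are on the classpath and is not a complete, submittable job.

      ```java
      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.mapreduce.Job;

      public class WordFilterJob {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Job job = Job.getInstance(conf, "word-filter");
              job.setJarByClass(WordFilterJob.class);

              // Register a read-only file in the Distributed Cache.
              // The framework copies it once per job to every worker node
              // before any task starts; the "#stopwords" fragment exposes
              // it as a symlink of that name in each task's working directory.
              job.addCacheFile(new URI("/cache/stopwords.txt#stopwords"));

              // Archives and extra jars can be cached the same way:
              // job.addCacheArchive(new URI("/cache/lookup.zip"));

              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }
      ```

      Registering the file in the driver, rather than reading it from HDFS inside every task, is what lets the framework copy it only once per node per job.
      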

    • #6304
      DataFlair Team
      Spectator

      Some MapReduce applications need certain files to be shared across all the worker nodes where that application's map/reduce tasks run. The MapReduce framework provides this facility through the Distributed Cache: before any task executes, it distributes copies of the necessary files to the slave nodes where the job's map/reduce tasks will run.

      The Distributed Cache can hold simple read-only text files, archives, jars, etc. Its advantage is that it reduces network traffic, because the files are copied only once per job rather than once per task.

      After the successful completion of the job, the cached files are deleted from the worker nodes.
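      On the task side, a Mapper can read a cached file in its `setup()` method, which runs once per task before any records are processed. This is a sketch only: it assumes a driver registered a cache file with a `#stopwords` symlink name, and the class name `FilterMapper` and the stop-word filtering logic are illustrative.

      ```java
      import java.io.BufferedReader;
      import java.io.FileReader;
      import java.io.IOException;
      import java.net.URI;
      import java.util.HashSet;
      import java.util.Set;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class FilterMapper
              extends Mapper<LongWritable, Text, Text, LongWritable> {

          private final Set<String> stopwords = new HashSet<>();
          private static final LongWritable ONE = new LongWritable(1);

          @Override
          protected void setup(Context context) throws IOException {
              // getCacheFiles() returns the URIs the driver registered.
              // Because the driver used a "#stopwords" fragment, the file
              // is available in the task's working directory as "stopwords".
              URI[] cached = context.getCacheFiles();
              if (cached != null && cached.length > 0) {
                  try (BufferedReader r =
                          new BufferedReader(new FileReader("stopwords"))) {
                      String line;
                      while ((line = r.readLine()) != null) {
                          stopwords.add(line.trim());
                      }
                  }
              }
          }

          @Override
          protected void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              // Emit only words that are not in the cached stop-word list.
              for (String word : value.toString().split("\\s+")) {
                  if (!word.isEmpty() && !stopwords.contains(word)) {
                      context.write(new Text(word), ONE);
                  }
              }
          }
      }
      ```

      Loading the file once in `setup()`, instead of inside `map()`, avoids re-reading it for every input record.
      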

      For more detail follow: Distributed Cache in Hadoop.
