Free Online Certification Courses – Learn Today. Lead Tomorrow. › Forums › Apache Hadoop › Explain Distributed Cache in Apache Hadoop?
This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 5:40 pm #6302 · DataFlair Team (Spectator)
What is Distributed Cache in Hadoop?
Why do we need a Distributed Cache in Hadoop?
September 20, 2018 at 5:40 pm #6303 · DataFlair Team (Spectator)
In Hadoop, data chunks are processed independently and in parallel across the DataNodes by a user-written program. If a file must be accessible from all the DataNodes, we put it into the Distributed Cache.
The MapReduce framework provides a service called the Distributed Cache to cache files needed by applications. It can cache read-only text files, archives, JAR files, etc.
This saves many redundant I/O operations; for example, sometimes every Mapper needs to read the same small file.
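As a concrete illustration of the "every Mapper reads the same file" case, here is a minimal Hadoop Streaming style mapper in Python. It assumes a small lookup file has been shipped to each task via the Distributed Cache (e.g. with Streaming's `-files` option), so it appears in the task's working directory; the file name `lookup.txt` and the tab-separated record format are assumptions for this sketch, not anything fixed by Hadoop.

```python
import sys

def load_lookup(path="lookup.txt"):
    """Load the cached lookup file (shipped once per node by the
    Distributed Cache) into an in-memory dict: key -> label."""
    table = {}
    with open(path) as f:
        for line in f:
            key, label = line.rstrip("\n").split("\t", 1)
            table[key] = label
    return table

def run_mapper(lines, table):
    """Map-side join: annotate each input record with its cached label."""
    out = []
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        out.append(f"{key}\t{table.get(key, 'UNKNOWN')}\t{value}")
    return out

if __name__ == "__main__":
    # In a real Streaming job, stdin carries the input split for this task.
    table = load_lookup()
    for record in run_mapper(sys.stdin, table):
        print(record)
```

Without the cache, every mapper would have to fetch the lookup file itself (for example, from HDFS) on every task launch; with it, the file is copied once per node and then read locally.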
Follow the link for more detail: Distributed Cache.
September 20, 2018 at 5:41 pm #6304 · DataFlair Team (Spectator)
For some MapReduce applications, certain files need to be shared across all the worker nodes where that application’s map/reduce tasks run. The MapReduce framework provides this facility through the Distributed Cache: it distributes copies of the necessary files to the slave nodes where the map/reduce tasks for the job will run, before any task executes.
The Distributed Cache can hold simple read-only text files, archives, JARs, etc. Its advantage is reduced network traffic, because the files are copied only once per job rather than once per task.
After the job completes successfully, the cached files are deleted from the worker nodes.
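The lifecycle described above (copy once per node before the job, local reads by every task, cleanup after the job) can be sketched as a toy simulation in plain Python. This is not the Hadoop API; the node names and counts are invented for illustration only.

```python
def run_job(cache_file_contents, nodes, tasks_per_node):
    """Simulate the Distributed Cache lifecycle for one job."""
    # 1. Before any task runs, copy the file once to each worker node.
    local_copies = {node: cache_file_contents for node in nodes}
    copies_made = len(local_copies)  # one network copy per node, not per task

    # 2. Each task reads its node's local copy; no further network traffic.
    reads = 0
    for node in nodes:
        for _ in range(tasks_per_node):
            _ = local_copies[node]
            reads += 1

    # 3. After the job completes, the cached copies are deleted.
    local_copies.clear()
    return copies_made, reads

copies, reads = run_job("stopwords", ["node1", "node2", "node3"], tasks_per_node=4)
# 3 network copies serve 12 task reads
```

The point of the simulation is the ratio: the number of network copies scales with the number of nodes, while the number of reads scales with the (much larger) number of tasks.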
For more detail follow: Distributed Cache in Hadoop.