What is a “Distributed Cache” in Apache Hadoop?
September 20, 2018 at 3:10 pm #5433, DataFlair Team (Spectator)
Explain the “Distributed Cache” in the MapReduce framework.
What is the need for a distributed cache in Hadoop?
September 20, 2018 at 3:11 pm #5434, DataFlair Team (Spectator)
Distributed Cache is a facility provided by the MapReduce framework to cache small files (a few kilobytes or megabytes in size) needed by an application. The files can be jars, text files, archives, etc.
Once you cache a file for your job, the Hadoop framework makes it available on each and every data node (in the file system, not in memory) where your map/reduce tasks are running. Thus, we can access the files from all the data nodes in our map/reduce job.
We can control the size of the distributed cache with the local.cache.size property in mapred-site.xml. The benefit of using the distributed cache is that it minimizes network data transfer. The framework also tracks the modification timestamps of the cache files and expects that the files are not changed while the job is executing.
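As a minimal sketch of how this typically looks with the Hadoop 2.x Job API (not taken from the post itself): the driver adds a small lookup file to the cache, and each mapper reads its localized copy in setup(). The HDFS path /user/hadoop/cache/lookup.txt, the symlink name "lookup", and the tab-separated file layout are illustrative assumptions.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CacheExample {

    public static class LookupMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, String> lookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // The cached file has been copied to this node's local file system;
            // the "#lookup" fragment in the URI below exposes it as a symlink
            // named "lookup" in the task's working directory.
            try (BufferedReader reader = new BufferedReader(new FileReader("lookup"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t", 2);   // assumed key<TAB>value layout
                    if (parts.length == 2) {
                        lookup.put(parts[0], parts[1]);
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Use the in-memory lookup table instead of re-reading data over the network.
            String enriched = lookup.getOrDefault(value.toString(), "UNKNOWN");
            context.write(value, new Text(enriched));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "distributed cache example");
        job.setJarByClass(CacheExample.class);
        job.setMapperClass(LookupMapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Cache a small lookup file stored on HDFS; every task attempt reads
        // its localized copy without pulling the file from HDFS again.
        job.addCacheFile(new URI("/user/hadoop/cache/lookup.txt#lookup"));

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}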
September 20, 2018 at 3:11 pm #5436, DataFlair Team (Spectator)
DistributedCache is a mechanism supported by the MapReduce framework for sharing files across all data nodes in a Hadoop cluster so they can be used while map/reduce tasks are running. A cached file can be a simple properties file or an executable jar file.
These files are stored locally on every data node. The distributed cache is intended for small data files.
After a successful run of the job, the distributed cache files (which are temporary files) are deleted from the slave nodes. By default, the cache size is 10 GB; if you need more, configure local.cache.size in mapred-site.xml.
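As an illustration of the setting mentioned above: the property name and its 10 GB default come from the answer, while the 20 GB value shown here is a hypothetical override (the value is given in bytes). This applies to classic MRv1-style configurations; YARN clusters typically size the localized cache with yarn.nodemanager.localizer.cache.target-size-mb instead.

<!-- mapred-site.xml: raise the local distributed-cache limit from the
     default 10 GB to a hypothetical 20 GB (21474836480 bytes). -->
<property>
  <name>local.cache.size</name>
  <value>21474836480</value>
</property>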