What is a “Distributed Cache” in Apache Hadoop?
September 20, 2018 at 3:10 pm #5433, DataFlair Team (Spectator)
Explain the “Distributed Cache” in the MapReduce framework.
What is the need for a distributed cache in Hadoop?
September 20, 2018 at 3:11 pm #5434, DataFlair Team (Spectator)
Distributed Cache is a facility provided by the MapReduce framework to cache small files (a few kilobytes or megabytes in size) needed by an application. The files can be jars, text files, archives, etc.
Once you cache a file for your job, the Hadoop framework makes it available on each and every data node (in the file system, not in memory) where your map/reduce tasks are running. Thus, we can access the files from all the data nodes in our map/reduce job.
We can control the size of the distributed cache with the local.cache.size property in mapred-site.xml. The benefit of using the distributed cache is that it minimizes network data transfer. The framework also tracks the modification timestamps of the cache files and expects that the files are not changed while the job is executing.
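As a minimal sketch of how this typically looks with the Hadoop 2.x Job API (not taken from the post itself): the driver adds a small lookup file to the cache, and each mapper reads its localized copy in setup(). The HDFS path /user/hadoop/cache/lookup.txt, the symlink name "lookup", and the tab-separated file layout are illustrative assumptions.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CacheExample {

    public static class LookupMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, String> lookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // The cached file has been copied to this node's local file system;
            // the "#lookup" fragment in the URI below exposes it as a symlink
            // named "lookup" in the task's working directory.
            try (BufferedReader reader = new BufferedReader(new FileReader("lookup"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t", 2);   // assumed key<TAB>value layout
                    if (parts.length == 2) {
                        lookup.put(parts[0], parts[1]);
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Use the in-memory lookup table instead of re-reading data over the network.
            String enriched = lookup.getOrDefault(value.toString(), "UNKNOWN");
            context.write(value, new Text(enriched));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "distributed cache example");
        job.setJarByClass(CacheExample.class);
        job.setMapperClass(LookupMapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Cache a small lookup file stored on HDFS; every task attempt reads
        // its localized copy without pulling the file from HDFS again.
        job.addCacheFile(new URI("/user/hadoop/cache/lookup.txt#lookup"));

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}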
September 20, 2018 at 3:11 pm #5436, DataFlair Team (Spectator)
DistributedCache is a mechanism supported by the MapReduce framework for sharing files across all data nodes in a Hadoop cluster so they can be used while map/reduce tasks are running. A cached file can be a simple properties file or an executable jar file.
These files are stored locally on every data node. The distributed cache is intended for small data files.
After a successful run of the job, the distributed cache files (which are temporary files) are deleted from the slave nodes. By default, the cache size is 10 GB; if you need more, configure local.cache.size in mapred-site.xml.
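As an illustration of the setting mentioned above: the property name and its 10 GB default come from the answer, while the 20 GB value shown here is a hypothetical override (the value is given in bytes). This applies to classic MRv1-style configurations; YARN clusters typically size the localized cache with yarn.nodemanager.localizer.cache.target-size-mb instead.

<!-- mapred-site.xml: raise the local distributed-cache limit from the
     default 10 GB to a hypothetical 20 GB (21474836480 bytes). -->
<property>
  <name>local.cache.size</name>
  <value>21474836480</value>
</property>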