From where does the Distributed Cache pick up the file?


  • Author
    Posts
    • #4890
      DataFlair Team
      Spectator

      From where does the Distributed Cache pick up the file?
      From the local file system or from HDFS…

    • #4891
      DataFlair Team
      Spectator

      Distributed Cache is a facility provided by the Hadoop MapReduce framework: it caches files that applications need. It can cache read-only text files, archives, JAR files, and more. Once we have cached a file for our job, Hadoop makes it available on each datanode where map/reduce tasks are running.

      Hence, the file can be accessed locally on every datanode from within our map and reduce tasks.
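As an illustration of that task-side access, a map task can look up the locally cached copies through the same `DistributedCache` class. This is a minimal sketch against the classic `org.apache.hadoop.filecache` API; the class name and the parsing placeholder are illustrative, not from the original post:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheReaderSketch {
    // Sketch: typically called once per task (e.g. from a Mapper's
    // configure/setup method) to read the copies Hadoop has already
    // localized on this datanode's disk.
    public static void readCachedFiles(Configuration conf) throws IOException {
        // These are paths on the task node's local filesystem, not HDFS
        Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
        if (localFiles != null) {
            for (Path p : localFiles) {
                try (BufferedReader in = new BufferedReader(new FileReader(p.toString()))) {
                    // ... parse the read-only side data here ...
                }
            }
        }
    }
}
```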

      – Working and Implementation of Distributed Cache in Hadoop

      An application that needs to distribute a file via the distributed cache:

      i. Must ensure that the file is available.
      ii. Must also ensure that the file can be accessed via a URL, either hdfs:// or http://.

      Then, provided the file is present at one of the above URLs, the user registers it as a cache file with the distributed cache.

      The process is:

      i. First, copy the requisite file to HDFS:
      $ hdfs dfs -put jar_file.jar /user/dataflair/lib/jar_file.jar

      ii. Then set up the application's JobConf:
      DistributedCache.addFileToClassPath(new Path("/user/dataflair/lib/jar_file.jar"), conf);

      iii. Finally, add it in the Driver class.
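Put together, the driver might look roughly like the sketch below. This is a hedged example for the old `JobConf` API; the class name, job name, and the extra cached text file are illustrative assumptions, not from the original post:

```java
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class CacheDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheDriver.class);
        conf.setJobName("distributed-cache-demo"); // illustrative name

        // Step ii: put the HDFS-resident jar on every task's classpath
        DistributedCache.addFileToClassPath(
                new Path("/user/dataflair/lib/jar_file.jar"), conf);

        // A plain (non-classpath) file can be cached similarly
        DistributedCache.addCacheFile(
                new URI("/user/dataflair/cache/some_file.txt"), conf); // illustrative path

        // ... set mapper/reducer classes and input/output paths as usual ...
        JobClient.runJob(conf);
    }
}
```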

      To learn more about the Distributed Cache, follow the link: Distributed Cache in Hadoop: Most Comprehensive Guide
