September 20, 2018 at 12:38 pm (#4890) by DataFlair Team (Spectator)
From where does the Distributed Cache pick up the file?
From the local file system or from HDFS?
September 20, 2018 at 12:39 pm (#4891) by DataFlair Team (Spectator)
Distributed Cache is a facility provided by the Hadoop MapReduce framework. It caches the files that an application needs: read-only text files, archives, jar files, and so on. Once a file has been cached for a job, Hadoop makes it available on every datanode where that job's map/reduce tasks run.
Hence, the file can be accessed in our map and reduce tasks from any of those datanodes.
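For illustration, here is a minimal sketch of how a job can ship a read-only file through the distributed cache and read it inside a mapper, using the newer org.apache.hadoop.mapreduce API (Job.addCacheFile). The HDFS path, the lookup.txt file name, and the CacheExample class are illustrative assumptions, not part of the original answer.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheExample {

    public static class CacheMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Files placed in the distributed cache are localized on every node
            // that runs a task for this job, so they can be opened like local files.
            URI[] cacheFiles = context.getCacheFiles();
            if (cacheFiles != null && cacheFiles.length > 0) {
                // "lookup.txt" is the symlink name requested with the #fragment below.
                try (BufferedReader reader = new BufferedReader(new FileReader("lookup.txt"))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // ... load the lookup data into memory for use in map() ...
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "distributed cache example");
        job.setJarByClass(CacheExample.class);
        // Cache a read-only file already stored in HDFS; Hadoop copies it to each
        // datanode where this job's map/reduce tasks run.
        job.addCacheFile(new URI("/user/dataflair/lookup.txt#lookup.txt"));
        // ... set mapper class, input/output formats and paths, then submit the job ...
    }
}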
– Working and Implementation of Distributed Cache in Hadoop
An application that needs the distributed cache to distribute a file:
i. Must ensure that the file is available.
ii. Must also ensure that the file can be accessed via a URL, either hdfs:// or http://. If the file is present at such a URL, the user registers it as a cache file with the distributed cache (a quick availability check is sketched below).
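As a small illustration of point i, the availability check could look like the following sketch. The FileSystem calls are standard Hadoop API, but the path and the error handling are assumptions, not part of the original answer.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CacheFileCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Resolve the default file system (normally HDFS on a cluster).
        FileSystem fs = FileSystem.get(conf);
        Path cacheFile = new Path("hdfs:///user/dataflair/lib/jar_file.jar");
        // Fail fast if the file is not reachable at the hdfs:// URL, since the
        // distributed cache can only ship files that actually exist there.
        if (!fs.exists(cacheFile)) {
            throw new IllegalStateException("Cache file not found: " + cacheFile);
        }
    }
}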
The process is:
i. First, copy the requisite file to HDFS:
$ hdfs dfs -put /user/dataflair/lib/jar_file.jar
ii. Then set up the application's JobConf:
DistributedCache.addFileToClassPath(new Path("/user/dataflair/lib/jar_file.jar"), conf)
iii. Finally, add it in the Driver class (see the driver sketch below).
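Putting steps ii and iii together, a driver class using the older mapred/JobConf API might look roughly like this. The class name and job name are illustrative, and DistributedCache is deprecated in newer Hadoop releases in favour of Job.addCacheFile shown earlier.

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheDriver {
    public static void main(String[] args) throws Exception {
        // JobConf for the old mapred API; the jar path matches step i above.
        JobConf conf = new JobConf(CacheDriver.class);
        conf.setJobName("distributed-cache-classpath-example");

        // Steps ii and iii: register the jar already present in HDFS so that it is
        // shipped through the distributed cache and added to the classpath of
        // every task launched for this job.
        DistributedCache.addFileToClassPath(new Path("/user/dataflair/lib/jar_file.jar"), conf);

        // ... configure mapper/reducer, input and output paths, then submit, e.g.
        // JobClient.runJob(conf);
    }
}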
To learn more about the Distributed Cache, follow the link: Distributed Cache in Hadoop: Most Comprehensive Guide