How to submit extra files (jars, static files) for a MapReduce job during runtime?


  • #5818

    dfbdteam3
    Moderator

    How to distribute extra files or data for a MapReduce job during runtime in Hadoop?

    #5821

    dfbdteam3
    Moderator

    The Hadoop MapReduce framework provides the Distributed Cache to cache files needed by applications. It can cache read-only text files, archives, jar files, etc.
    First of all, an application that wants to distribute a file through the distributed cache must make sure the file is available at a URL, which can be either hdfs:// or http://. Once the file is present at such a URL, the user declares it as a cache file to distribute. The framework copies the cache file to all the nodes before any tasks start on those nodes. The files are copied only once per job, and applications should not modify them.
    By default, the size of the distributed cache is 10 GB. We can adjust it using the local.cache.size property.
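
    For example, with the newer MapReduce API the driver can register a cache file like this (a minimal sketch; the HDFS path and job name are only placeholders):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheFileDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "cache-file-example");
            // the file must already be reachable at an hdfs:// (or http://) URL
            job.addCacheFile(new URI("hdfs:///user/dataflair/cache/lookup.txt"));
            // ... set mapper/reducer, input/output paths, then job.waitForCompletion(true)
        }
    }

    Inside a mapper or reducer, the cached files can then be listed with context.getCacheFiles().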

    To learn in detail how to distribute extra files in a MapReduce job at runtime, follow:
    Distributed Cache

    #5823

    dfbdteam3
    Moderator

    There are two ways to add jars and static files:

    1. Using the distributed cache
    2. Adding jars at runtime via the CLI

    Distributed cache:

    It is used when an application needs some cache files to run. The distributed cache can hold read-only files, jars, and archives.

    Hadoop copies these files to all data nodes and makes them available locally when they are required to run the map and reduce tasks.

    The distributed cache entry is added in the driver class as shown below:

    DistributedCache.addFileToClassPath(new Path("/user/dataflair/lib/jar_file.jar"), conf);

    Before adding it in the driver class, copy the requisite file to HDFS so that it is available:

    $ hdfs dfs -put jar_file.jar /user/dataflair/lib/jar_file.jar
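
    Putting the two steps together, a driver sketch might look like this (class name and paths are illustrative; DistributedCache comes from the older org.apache.hadoop.filecache API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class JarCacheDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // the jar was copied to HDFS beforehand with hdfs dfs -put
            DistributedCache.addFileToClassPath(new Path("/user/dataflair/lib/jar_file.jar"), conf);
            Job job = Job.getInstance(conf, "jar-cache-example");
            // ... configure mapper/reducer and input/output paths, then submit
        }
    }

    On newer Hadoop releases the equivalent call is job.addFileToClassPath(new Path("/user/dataflair/lib/jar_file.jar")).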

    In this way, we can add the extra files.

    2. Add jars at runtime using the CLI:

    Add the libraries using the -libjars parameter on the CLI:

    $ export LIBJARS=/path/jar1,/path/jar2
    $ hadoop jar /path/to/my.jar com.wordpress.hadoopi.MyClass -libjars ${LIBJARS} value
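
    Note that -libjars is handled by Hadoop's GenericOptionsParser, so the driver should run through ToolRunner for the option to take effect. A minimal sketch of such a driver (it mirrors com.wordpress.hadoopi.MyClass above; the job details are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyClass extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already carries the jars passed with -libjars
            Job job = Job.getInstance(getConf(), "libjars-example");
            // ... configure mapper/reducer and input/output paths
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new MyClass(), args));
        }
    }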
