How to submit extra files (jars, static files) for a MapReduce job at runtime?


Viewing 2 reply threads
  • Author
    • #5818
DataFlair Team

      How to distribute extra files or data for a MapReduce job during runtime in Hadoop?

    • #5821
DataFlair Team

The Hadoop MapReduce framework provides a distributed cache for files needed by applications. It can cache read-only text files, archives, jar files, etc.
An application that wants to distribute a file through the distributed cache must first make sure the file is available at a URL, which can be either hdfs:// or http://. Once the file is present at such a URL, the user registers it as a cache file. The framework then copies the cache file to every node before any tasks start there. Files are copied only once per job, and applications should not modify them.
By default, the size of the distributed cache is 10 GB. We can adjust it with the local.cache.size property.
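As a rough sketch, the limit could be raised in the cluster configuration like this (this assumes a classic MRv1-style setup where local.cache.size is read from mapred-site.xml; newer releases may manage this limit differently):

```xml
<!-- mapred-site.xml (sketch): raise the per-node distributed-cache limit -->
<property>
  <name>local.cache.size</name>
  <!-- value is in bytes: 20 GB instead of the 10 GB default -->
  <value>21474836480</value>
</property>
```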

To learn in detail how to distribute extra files to a MapReduce job at runtime, follow:
      Distributed Cache

    • #5823
DataFlair Team

There are two ways to add jars and static files:

1. Using the distributed cache
2. Adding jars at runtime via the CLI

Distributed cache:

It is used when an application needs some cached files to run. The distributed cache can hold read-only files, jars, and archives.

Hadoop copies the cached files to every data node and makes them available when the map and reduce tasks run.

The distributed cache is set up in the driver class as shown below:

DistributedCache.addFileToClassPath(new Path("/user/dataflair/lib/jar_file.jar"), conf);

Before adding it in the driver class, copy the requisite file to HDFS and make it available as shown:

$ hdfs dfs -put jar_file.jar /user/dataflair/lib/

      In this way, we can add the extra files.

2. Adding jars at runtime using the CLI:

Libraries can be added with the -libjars parameter on the command line. Note that -libjars is handled by GenericOptionsParser, so the driver class must run through ToolRunner (i.e., implement the Tool interface) for the option to take effect.

      $ export LIBJARS=/path/jar1,/path/jar2
      $ hadoop jar /path/to/my.jar com.wordpress.hadoopi.MyClass -libjars ${LIBJARS} value
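Before the job runs, GenericOptionsParser strips "-libjars jar1,jar2" out of the argument list and registers each jar with the job. The stand-in helper below is only a minimal pure-Java sketch of that parsing step, not Hadoop's actual code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper sketching how a "-libjars a,b" option is consumed:
// find the flag, split its comma-separated value into individual jar paths.
public class LibJarsSketch {
    static List<String> extractLibJars(String[] args) {
        List<String> jars = new ArrayList<>();
        for (int i = 0; i < args.length - 1; i++) {
            if ("-libjars".equals(args[i])) {
                jars.addAll(Arrays.asList(args[i + 1].split(",")));
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        String[] argv = {"-libjars", "/path/jar1,/path/jar2", "input", "output"};
        System.out.println(extractLibJars(argv)); // prints [/path/jar1, /path/jar2]
    }
}
```

In a real driver, this work is done for you: implement Tool, launch with ToolRunner.run(), and read the remaining arguments from the String[] passed to run().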
