How to submit extra files (jars, static files) for a MapReduce job during runtime?


  • #5818

    dfbdteam3
    Moderator

    How to distribute extra files or data for a MapReduce job during runtime in Hadoop?

    #5821

    dfbdteam3
    Moderator

    The Hadoop MapReduce framework provides the Distributed Cache to cache files needed by applications. It can cache read-only text files, archives, jar files, etc.
    First of all, an application that wants to distribute a file through the distributed cache must make sure the file is available at a URL, which can be either hdfs:// or http://. Once the file is present at such a URL, the user declares it as a cache file to distribute. The framework copies the cache file to all the nodes before any tasks start on those nodes. The files are copied only once per job, and applications should not modify them.
    By default, the size of the distributed cache is 10 GB. We can adjust it using the local.cache.size property.
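
    For example, with the newer MapReduce API the driver can register a cache file like this (a minimal sketch; the HDFS path and job name are only placeholders):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheFileDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "cache-file-example");
            // the file must already be reachable at an hdfs:// (or http://) URL
            job.addCacheFile(new URI("hdfs:///user/dataflair/cache/lookup.txt"));
            // ... set mapper/reducer, input/output paths, then job.waitForCompletion(true)
        }
    }

    Inside a mapper or reducer, the cached files can then be listed with context.getCacheFiles().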

    To learn in detail how to distribute extra files in a MapReduce job at runtime, follow:
    Distributed Cache

    #5823

    dfbdteam3
    Moderator

    There are two ways to add jars and static files:

    1. Using the distributed cache
    2. Adding jars at runtime via the CLI

    Distributed cache:

    It is used when an application needs some cache files to run. The distributed cache can hold read-only files, jars, and archives.

    Hadoop copies these files to all data nodes and makes them available locally when they are required to run the map and reduce tasks.

    The distributed cache entry is added in the driver class as shown below:

    DistributedCache.addFileToClassPath(new Path("/user/dataflair/lib/jar_file.jar"), conf);

    Before adding it in the driver class, copy the requisite file to HDFS so that it is available:

    $ hdfs dfs -put jar_file.jar /user/dataflair/lib/jar_file.jar
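
    Putting the two steps together, a driver sketch might look like this (class name and paths are illustrative; DistributedCache comes from the older org.apache.hadoop.filecache API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class JarCacheDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // the jar was copied to HDFS beforehand with hdfs dfs -put
            DistributedCache.addFileToClassPath(new Path("/user/dataflair/lib/jar_file.jar"), conf);
            Job job = Job.getInstance(conf, "jar-cache-example");
            // ... configure mapper/reducer and input/output paths, then submit
        }
    }

    On newer Hadoop releases the equivalent call is job.addFileToClassPath(new Path("/user/dataflair/lib/jar_file.jar")).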

    In this way, we can add the extra files.

    2. Add jars at runtime using the CLI:

    Add the libraries using the -libjars parameter on the CLI:

    $ export LIBJARS=/path/jar1,/path/jar2
    $ hadoop jar /path/to/my.jar com.wordpress.hadoopi.MyClass -libjars ${LIBJARS} value
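
    Note that -libjars is handled by Hadoop's GenericOptionsParser, so the driver should run through ToolRunner for the option to take effect. A minimal sketch of such a driver (it mirrors com.wordpress.hadoopi.MyClass above; the job details are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyClass extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already carries the jars passed with -libjars
            Job job = Job.getInstance(getConf(), "libjars-example");
            // ... configure mapper/reducer and input/output paths
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new MyClass(), args));
        }
    }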
