In which scenarios are MapReduce jobs more useful than Pig?

This topic contains 2 replies, has 1 voice, and was last updated by dfbdteam3 1 year, 6 months ago.

Viewing 3 posts - 1 through 3 (of 3 total)
    #5684

    dfbdteam3
    Moderator

    Are there any problems that can be solved only with MapReduce and not with Pig? If so, what are some examples?

    #5686

    dfbdteam3
    Moderator

    MapReduce is a programming model based on the principle of parallel processing of data. Hadoop MapReduce lets programmers filter and aggregate data stored in HDFS to gain business insights from big data. MapReduce jobs are written natively in Java, and other languages such as Python or C can be used through Hadoop Streaming.
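
    To make the model concrete, here is a minimal, framework-free sketch of the map, shuffle, and reduce steps as a word count. It runs in a single process and does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit one (word, 1) pair for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate every value seen for a single key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data big insights", "data in HDFS"]))
```

    On a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; the sketch only shows the data flow.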

    On the other hand, Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating those programs. It makes programming easier by letting developers express complex processing as a simple sequence of data-flow steps in a compact textual language.

    Most jobs can be run using Pig or Hive, but to make use of the lower-level application programming interfaces, developers may turn to MapReduce instead. MapReduce is the better choice over Pig in situations such as the following:

    1) When Hadoop developers need explicit control over the driver program, they should use Hadoop MapReduce instead of Pig and Hive.
    2) When a Hadoop developer needs to implement a custom partitioner, MapReduce is the usual choice over Pig and Hive.
    3) If a pre-defined library of Java Mappers or Reducers already exists for a job, it is wise to use Hadoop MapReduce instead of Pig and Hive.
    4) Hadoop MapReduce can be a better coding approach than Pig and Hive if the job requires optimization at a particular stage of processing, using tricks such as in-mapper combining.
    5) If the job makes tricky use of the distributed cache (for example a replicated join), cross products, groupings, or joins, then Hadoop MapReduce is a better programming approach than Pig.
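
    Points 2 and 4 can be sketched without Hadoop. The example below is only an illustration in plain Python, and `partition` and `InMapperCombiningMapper` are hypothetical names. In Hadoop's Java API the corresponding hooks are `Partitioner.getPartition` and the `Mapper` class's `map` and `cleanup` methods.

```python
from collections import Counter


def partition(key, num_reducers):
    """Hypothetical custom partitioner: route keys by their first character,
    so that all words sharing a first letter reach the same reducer."""
    if not key:
        return 0
    return ord(key[0]) % num_reducers


class InMapperCombiningMapper:
    """Word-count mapper that aggregates in memory instead of emitting
    one (word, 1) pair per occurrence."""

    def __init__(self):
        # State that lives across all map() calls of one map task.
        self.counts = Counter()

    def map(self, line):
        # Update the local tally; nothing is emitted here.
        for word in line.lower().split():
            self.counts[word] += 1

    def cleanup(self):
        # Emit one (word, partial_count) pair per distinct word at task end.
        yield from sorted(self.counts.items())


if __name__ == "__main__":
    mapper = InMapperCombiningMapper()
    for line in ["pig hive pig", "hive pig"]:
        mapper.map(line)
    for word, count in mapper.cleanup():
        print(partition(word, 4), word, count)
```

    The in-memory tally reduces the number of pairs sent through the shuffle, at the cost of holding per-task state in memory, which is the kind of stage-level control a hand-written job gives you.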

    #5688

    dfbdteam3
    Moderator

    Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin.
    Pig was initially developed at Yahoo to allow people using Apache Hadoop to focus more on analyzing large data sets and spend less time writing mapper and reducer programs.

    Scenarios where MapReduce jobs are more useful than Pig:

    1) Pig is a poor fit for complicated, low-level operations. For example, when the output of one job acts as the input to another job (for instance as a file in SequenceFile format), or when querying an image file, Pig is not useful.

    2) Pig is most useful when the data is structured. It is not a good tool for unstructured data, whereas with MapReduce you can work on any kind of data set.

    3) Debugging code is difficult in Pig, whereas with MapReduce the user can rely on the debugging facilities of an IDE such as Eclipse.

    4) When a job combines many large data sets and needs a good degree of testability, use MapReduce instead of Pig.

    5) If a pre-defined library of Java Mappers or Reducers already exists for a job, it is a good option to use Hadoop MapReduce instead of Pig.
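
    The job chaining mentioned in point 1 can be illustrated with a small, framework-free sketch. It is not Hadoop code: `run_job`, `run_pipeline`, and the mapper and reducer functions are hypothetical names, and everything runs in one process. The first job counts words, and its output records become the input of a second job that groups words by their count.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one map/shuffle/reduce pass over the records in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Job 1: count how often each word occurs.
def count_mapper(line):
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, ones):
    return word, sum(ones)


# Job 2: consume job 1's (word, count) records and group words by count.
def invert_mapper(record):
    word, count = record
    yield count, word


def invert_reducer(count, words):
    return count, sorted(words)


def run_pipeline(lines):
    """Driver: the output of job 1 is passed directly as the input of job 2."""
    word_counts = run_job(lines, count_mapper, count_reducer)
    return run_job(word_counts, invert_mapper, invert_reducer)


if __name__ == "__main__":
    print(run_pipeline(["pig hadoop pig", "hadoop hive"]))
```

    In an actual Hadoop driver the hand-off would typically go through files, with the first job's output directory used as the second job's input path.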

