What are the main configuration parameters in a MapReduce program?


    • #5990
      DataFlair Team
      Spectator

      What are configuration parameters in a MapReduce program?

    • #5991
      DataFlair Team
      Spectator

      Assuming the question is about configuration inside a MapReduce program, here is the answer.

      In general, a MapReduce program has to take care of the following parameters in its driver:

      1) Add the input HDFS path in the driver program. In general, this is done using the FileInputFormat class.
      2) Add the output HDFS path in the driver program. In general, this is done using the FileOutputFormat class.
      3) Set the InputFormat class if your data is in a format other than plain text (such as SequenceFile). By default, the framework uses TextInputFormat if you do not set an input format class.
      4) Set the OutputFormat class if you want to write something other than text data to the output path.
      5) Set the output key and value classes. If the map and reduce output key/value types are different, you need to set the map-side key/value classes and the job-level key/value classes separately on the Job object using the available methods.
      6) If you have written a custom partitioner, comparator (for sorting or grouping), or combiner, you need to set each of these classes on the Job object in the driver program.
      7) Decide the number of reducers. This should be based on your requirements, and it also depends on the partitioner.
      8) Set the split size if your requirements call for it. For text input data, the CombineTextInputFormat class can be used to set the split size.
      9) A few more things need to be considered when writing a MapReduce program: runtime configuration parameters, which can be passed as part of the yarn command (for example with -D); enabling compression; and using the Hadoop core APIs to filter input, for example selecting input files by a matched pattern at the driver level itself.
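      The points above can be sketched as a minimal driver using the standard org.apache.hadoop.mapreduce Job API. The class names (WordCountDriver, TokenMapper, SumReducer) and the word-count logic are just an illustrative example, not part of the original answer; the Job methods used (setMapperClass, setCombinerClass, setReducerClass, setOutputKeyClass, setOutputValueClass, setNumReduceTasks, FileInputFormat.addInputPath, FileOutputFormat.setOutputPath) are the standard Hadoop API.

      ```java
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
      import org.apache.hadoop.util.GenericOptionsParser;

      public class WordCountDriver {

        public static class TokenMapper
            extends Mapper<Object, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          protected void map(Object key, Text value, Context context)
              throws java.io.IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
              if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);         // map output: (Text, IntWritable)
              }
            }
          }
        }

        public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
          @Override
          protected void reduce(Text key, Iterable<IntWritable> values,
                                Context context)
              throws java.io.IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
          }
        }

        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // (9) Picks up -D key=value runtime parameters from the command line.
          String[] rest = new GenericOptionsParser(conf, args).getRemainingArgs();

          Job job = Job.getInstance(conf, "word count");
          job.setJarByClass(WordCountDriver.class);

          job.setMapperClass(TokenMapper.class);
          job.setCombinerClass(SumReducer.class); // (6) combiner, if you have one
          job.setReducerClass(SumReducer.class);

          // (5) Map output and final output types happen to match here; if they
          // differed, you would also call setMapOutputKeyClass / setMapOutputValueClass.
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);

          // (3)/(4) TextInputFormat and TextOutputFormat are the defaults, so no
          // setInputFormatClass / setOutputFormatClass calls are needed for plain text.

          job.setNumReduceTasks(2);               // (7) number of reducers

          // (1)/(2) input and output HDFS paths
          FileInputFormat.addInputPath(job, new Path(rest[0]));
          FileOutputFormat.setOutputPath(job, new Path(rest[1]));

          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }
      ```

      A job like this would be submitted with something like `hadoop jar wordcount.jar WordCountDriver -D mapreduce.job.reduces=2 /input /output`, where the -D option illustrates point 9: because GenericOptionsParser is used, such parameters land in the Configuration without code changes.
      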
