What are the main configuration parameters in a MapReduce program?


    • #5990
      DataFlair Team
      Spectator

      What are configuration parameters in a MapReduce program?

    • #5991
      DataFlair Team
      Spectator

      Assuming the question is about configuration inside a MapReduce program, here is the answer.

      In general, a MapReduce program has to take care of the following parameters in its driver:

      1) Add the input HDFS path in the driver program. In general, this is done using the FileInputFormat class.
      2) Add the output HDFS path in the driver program. In general, this is done using the FileOutputFormat class.
      3) Set the InputFormat class if your data is in a format other than plain text (such as SequenceFile). By default, the framework uses TextInputFormat if you do not set an input format class.
      4) Set the OutputFormat class if you want to write something other than text data to the output path.
      5) Set the output key and value classes. If the map and reduce output key/value types are different, you need to set the map-side key/value classes and the job-level key/value classes separately on the Job object using the available methods.
      6) If you have written a custom partitioner, comparator (for sorting or grouping), or combiner, you need to set each of these classes on the Job object in the driver program.
      7) Decide the number of reducers. This should be based on your requirements, and it also depends on the partitioner.
      8) Set the split size if your requirements call for it. For text input data, the CombineTextInputFormat class can be used to set the split size.
      9) A few more things need to be considered when writing a MapReduce program: runtime configuration parameters, which can be passed as part of the yarn command (for example with -D); enabling compression; and using the Hadoop core APIs to filter input, for example selecting input files by a matched pattern at the driver level itself.
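      The points above can be sketched as a minimal driver using the standard org.apache.hadoop.mapreduce Job API. The class names (WordCountDriver, TokenMapper, SumReducer) and the word-count logic are just an illustrative example, not part of the original answer; the Job methods used (setMapperClass, setCombinerClass, setReducerClass, setOutputKeyClass, setOutputValueClass, setNumReduceTasks, FileInputFormat.addInputPath, FileOutputFormat.setOutputPath) are the standard Hadoop API.

      ```java
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
      import org.apache.hadoop.util.GenericOptionsParser;

      public class WordCountDriver {

        public static class TokenMapper
            extends Mapper<Object, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          protected void map(Object key, Text value, Context context)
              throws java.io.IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
              if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);         // map output: (Text, IntWritable)
              }
            }
          }
        }

        public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
          @Override
          protected void reduce(Text key, Iterable<IntWritable> values,
                                Context context)
              throws java.io.IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
          }
        }

        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // (9) Picks up -D key=value runtime parameters from the command line.
          String[] rest = new GenericOptionsParser(conf, args).getRemainingArgs();

          Job job = Job.getInstance(conf, "word count");
          job.setJarByClass(WordCountDriver.class);

          job.setMapperClass(TokenMapper.class);
          job.setCombinerClass(SumReducer.class); // (6) combiner, if you have one
          job.setReducerClass(SumReducer.class);

          // (5) Map output and final output types happen to match here; if they
          // differed, you would also call setMapOutputKeyClass / setMapOutputValueClass.
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);

          // (3)/(4) TextInputFormat and TextOutputFormat are the defaults, so no
          // setInputFormatClass / setOutputFormatClass calls are needed for plain text.

          job.setNumReduceTasks(2);               // (7) number of reducers

          // (1)/(2) input and output HDFS paths
          FileInputFormat.addInputPath(job, new Path(rest[0]));
          FileOutputFormat.setOutputPath(job, new Path(rest[1]));

          System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
      }
      ```

      A job like this would be submitted with something like `hadoop jar wordcount.jar WordCountDriver -D mapreduce.job.reduces=2 /input /output`, where the -D option illustrates point 9: because GenericOptionsParser is used, such parameters land in the Configuration without code changes.
      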
