How to decide input split?

    • #4892
      DataFlair Team
      Spectator

      How are input splits decided? On what basis are input splits created?

    • #4893
      DataFlair Team
      Spectator

      InputSplit in Hadoop

      In Hadoop MapReduce, an InputSplit is the logical representation of the data. It represents the unit of work that is processed by a single map task in a MapReduce program.

      In other words, an InputSplit represents the data processed by an individual Mapper. The split is further divided into records, and the mapper processes each record as a key-value pair.
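
      For illustration, here is a minimal Mapper sketch (the class name LineLengthMapper and the emitted output are hypothetical), showing that each record handed to map() is a key-value pair, here the byte offset of a line and the line text:

          import java.io.IOException;
          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.LongWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Mapper;

          // The framework calls map() once per record in the InputSplit
          // assigned to this map task.
          public class LineLengthMapper
                  extends Mapper<LongWritable, Text, Text, IntWritable> {

              @Override
              protected void map(LongWritable key, Text value, Context context)
                      throws IOException, InterruptedException {
                  // key   = byte offset of the line within the file
                  // value = the line itself; emit (line, length) as an example
                  context.write(value, new IntWritable(value.getLength()));
              }
          }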

      Every InputSplit also carries storage locations (hostname strings). The MapReduce framework uses these locations to place each map task as close to the split’s data as possible.
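
      As a rough sketch (the job name and the input path argument are only placeholders), the splits produced by an InputFormat and their locations can be inspected through InputFormat.getSplits() and InputSplit.getLocations():

          import java.util.List;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.mapreduce.InputSplit;
          import org.apache.hadoop.mapreduce.Job;
          import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
          import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

          public class SplitLocations {
              public static void main(String[] args) throws Exception {
                  Job job = Job.getInstance(new Configuration(), "split-locations");
                  FileInputFormat.addInputPath(job, new Path(args[0]));

                  // Ask the InputFormat for the logical splits of the input
                  List<InputSplit> splits = new TextInputFormat().getSplits(job);
                  for (InputSplit split : splits) {
                      // getLocations() returns hostnames used for data locality
                      System.out.println(split.getLength() + " bytes on "
                              + String.join(", ", split.getLocations()));
                  }
              }
          }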

      InputSplits are created by an InputFormat (the InputFormat creates the splits and divides them into records), so as a user we do not need to deal with InputSplit directly. By default, FileInputFormat breaks a file into 128 MB chunks (the same size as HDFS blocks). We can control this value by setting the mapred.min.split.size parameter in mapred-site.xml, or by overriding it in the Job object used to submit a particular MapReduce job. We can also control how a file is broken up into splits by writing a custom InputFormat, as sketched below.
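
      As a sketch of per-job control (rather than editing mapred-site.xml), FileInputFormat exposes helpers that set the minimum and maximum split size on the Job; the 256 MB and 512 MB values below are arbitrary examples:

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.mapreduce.Job;
          import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

          public class SplitSizeConfig {
              public static void main(String[] args) throws Exception {
                  Job job = Job.getInstance(new Configuration(), "custom-split-size");

                  // Raise the minimum split size to 256 MB so small blocks are
                  // combined into fewer, larger splits (and fewer map tasks).
                  FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);
                  // Optionally cap the maximum split size as well.
                  FileInputFormat.setMaxInputSplitSize(job, 512L * 1024 * 1024);
              }
          }

      These helpers set the split-size properties on the job configuration, which is the per-job equivalent of changing mapred.min.split.size cluster-wide.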

      To learn more about InputSplit, follow the link: InputSplit in Hadoop MapReduce
