A reduce-only job is not possible. If we look at how data flows from HDFS storage to the Mapper, the out-of-the-box components InputFormat, InputSplit, and RecordReader run in sequence to supply the input data to the Mapper as key-value pairs first.
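The input path described above can be modelled with a small toy sketch in plain Python. It is not Hadoop code: the function names (get_splits, record_reader, mapper, run_map_phase) are hypothetical stand-ins for InputFormat, RecordReader, and Mapper, the word-count map function is just an example, and using the line index as the record key is a simplification made for this sketch.

```python
from typing import Iterator, List, Tuple

# (index of first line, number of lines): a toy stand-in for an InputSplit
Split = Tuple[int, int]


def get_splits(lines: List[str], lines_per_split: int) -> List[Split]:
    """Toy InputFormat step: divide the input into logical splits."""
    return [(start, min(lines_per_split, len(lines) - start))
            for start in range(0, len(lines), lines_per_split)]


def record_reader(lines: List[str], split: Split) -> Iterator[Tuple[int, str]]:
    """Toy RecordReader step: turn one split into (key, value) records.

    The key is the line index, a simplification chosen for this sketch.
    """
    start, length = split
    for i in range(start, start + length):
        yield i, lines[i]


def mapper(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Example map function (word count): emit (word, 1) for every word."""
    for word in value.split():
        yield word, 1


def run_map_phase(lines: List[str], lines_per_split: int = 2) -> List[Tuple[str, int]]:
    """Run the three steps in order and collect the intermediate pairs."""
    intermediate: List[Tuple[str, int]] = []
    for split in get_splits(lines, lines_per_split):    # one map task per split
        for key, value in record_reader(lines, split):  # records for that task
            intermediate.extend(mapper(key, value))     # intermediate key-value pairs
    return intermediate


if __name__ == "__main__":
    print(run_map_phase(["a b", "b c", "a"]))
```

The point of the sketch is the ordering: the intermediate key-value pairs only exist once the splits have been computed, read as records, and passed through the map function.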
The Reducer takes the set of intermediate key-value pairs produced by the mapper as its input. It then runs a reduce function on each key and its values to generate the output. The output of the reducer is therefore the final output, which is stored in HDFS. Usually, the reducer performs an aggregation or summation type of computation.
The Reducer has three primary phases-
Shuffle- In this phase, the Hadoop framework fetches, for each reducer, the relevant partition of the output of all the Mappers over HTTP.
Sort- In this phase, the framework groups the Reducer's inputs by key.
The shuffle and sort phases occur simultaneously.
Reduce- After shuffling and sorting, the reduce task aggregates the key-value pairs. In this phase, the reduce(Object, Iterator, OutputCollector, Reporter) method is called for each <key, (list of values)> pair in the grouped inputs.
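The three phases can be illustrated with a toy simulation in plain Python. This is only a sketch under stated assumptions, not Hadoop's implementation: the names partition, shuffle, sort_and_group, reduce_fn, and run_reduce_phase are hypothetical, the partitioner is a simple deterministic stand-in for key hashing, the reduce function is a plain sum, and the phases run one after another here rather than simultaneously.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def partition(key: str, num_reducers: int) -> int:
    """Toy partitioner: a deterministic stand-in for hashing the key."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs: List[List[Pair]], num_reducers: int) -> List[List[Pair]]:
    """Shuffle: each reducer collects its partition from every mapper's output."""
    reducer_inputs: List[List[Pair]] = [[] for _ in range(num_reducers)]
    for output in map_outputs:            # one list per mapper
        for key, value in output:
            reducer_inputs[partition(key, num_reducers)].append((key, value))
    return reducer_inputs


def sort_and_group(pairs: List[Pair]) -> List[Tuple[str, List[int]]]:
    """Sort: group one reducer's input by key, ordered by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_fn(key: str, values: List[int]) -> Pair:
    """Reduce: aggregate the list of values for one key (here, a sum)."""
    return key, sum(values)


def run_reduce_phase(map_outputs: List[List[Pair]], num_reducers: int = 2) -> Dict[str, int]:
    """Run shuffle, sort, and reduce over the collected mapper outputs."""
    results: Dict[str, int] = {}
    for pairs in shuffle(map_outputs, num_reducers):
        for key, values in sort_and_group(pairs):
            out_key, total = reduce_fn(key, values)
            results[out_key] = total
    return results


if __name__ == "__main__":
    mapper_outputs = [[("a", 1), ("b", 1)], [("b", 1), ("c", 1), ("a", 1)]]
    print(run_reduce_phase(mapper_outputs))
    print(run_reduce_phase([]))  # no mapper output: nothing to shuffle or reduce
```

In this sketch, calling run_reduce_phase with no mapper outputs yields an empty result, which mirrors the argument here: every phase of the reducer operates on data that the map side has produced.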
So a reduce-only job would have to skip the map-side steps that feed these phases, leaving the shuffle with no mapper output to collect, which Hadoop won’t allow.
No, a reduce-only job can’t exist, because the reducer’s input is the intermediate data, in the form of key-value pairs, produced by the Mapper. Since the mapper output is the source of the reducer’s input, and the reducer produces the final output by applying an aggregation or summation type of function to it, the reduce phase cannot run without a map phase before it.