What is Counter in Hadoop MapReduce?

Viewing 2 reply threads
  • Author
    Posts
    • #5611
      DataFlair Team
      Spectator

      What is a Counter? What are its types in Hadoop MapReduce?
      Why are Counters needed in Hadoop?

    • #5612
      DataFlair Team
      Spectator

      Suppose you have a dataset of consumer complaints. The dataset consists of complaint number, date, category, sub-category, complaint location, description, etc. If you want to monitor “delivery issue”, “return issue”, and “refund issue” complaints in parallel, tracking them by hand is very challenging, and this is where Hadoop Counters come into the picture.

      Counters in Hadoop are used to keep track of occurrences of events. Whenever a job is executed, the Hadoop framework initializes Counters to keep track of job statistics such as the number of bytes read, the number of records read, and the number of records written.

      There are two types of Counters:

      1) Built-in Counters: Hadoop maintains some built-in Counters for every job. For example, there are Counters for the number of bytes and records processed, which allow us to confirm that the expected amount of input was consumed and the expected amount of output was produced.

      They are of three types:

      MapReduce Task Counters: These collect information about tasks over the course of their execution, and the results are then aggregated over all the tasks in a job.
      File System Counters: These track two main details: the number of bytes read by the file system and the number of bytes written.
      Job Counters: These are maintained by the JobTracker (or the application master in YARN), so they don’t need to be sent across the network. They measure job-level statistics rather than values that change while a task is running.
      2) Custom Counters: If we want to track any kind of statistic about the records handled by the logic in Mappers and Reducers, custom counters come into the picture. Another use of custom counters is in debugging, where they can be used to count the number of bad records.
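
      As a rough illustration of the custom-counter idea applied to the complaint dataset above, here is a minimal plain-Java sketch. It does not use the Hadoop API: the class `ComplaintCounterSketch`, the enum `ComplaintCounter`, the comma-separated record layout, and the sample rows are all assumptions made up for this example. In a real Mapper, the increment would typically be written as `context.getCounter(ComplaintCounter.DELIVERY_ISSUE).increment(1)`, and the framework would do the aggregation across tasks.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: models custom counters without any Hadoop dependency.
class ComplaintCounterSketch {

    // Hypothetical counter names; in Hadoop a custom counter group is commonly a Java enum.
    enum ComplaintCounter { DELIVERY_ISSUE, RETURN_ISSUE, REFUND_ISSUE, BAD_RECORD }

    // Assumed record layout: number,date,category,sub-category,location,description
    // Returns the counter to increment, or null if the record is valid but not tracked.
    static ComplaintCounter classify(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length < 6) {
            return ComplaintCounter.BAD_RECORD;
        }
        String category = fields[2].trim();
        if (category.equalsIgnoreCase("delivery issue")) return ComplaintCounter.DELIVERY_ISSUE;
        if (category.equalsIgnoreCase("return issue")) return ComplaintCounter.RETURN_ISSUE;
        if (category.equalsIgnoreCase("refund issue")) return ComplaintCounter.REFUND_ISSUE;
        return null;
    }

    // Stands in for one map task: counts events seen in its own input split.
    static Map<ComplaintCounter, Long> runTask(List<String> split) {
        Map<ComplaintCounter, Long> counters = new EnumMap<>(ComplaintCounter.class);
        for (String line : split) {
            ComplaintCounter c = classify(line);
            if (c != null) {
                counters.merge(c, 1L, Long::sum);
            }
        }
        return counters;
    }

    // Stands in for the framework summing per-task counters into job totals.
    static Map<ComplaintCounter, Long> aggregate(List<Map<ComplaintCounter, Long>> perTask) {
        Map<ComplaintCounter, Long> totals = new EnumMap<>(ComplaintCounter.class);
        for (Map<ComplaintCounter, Long> task : perTask) {
            task.forEach((name, value) -> totals.merge(name, value, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample rows, split across two simulated tasks.
        List<String> split1 = Arrays.asList(
            "1,2020-01-02,Delivery issue,Late,Pune,Package arrived late",
            "2,2020-01-03,Refund issue,Pending,Delhi,Refund not received");
        List<String> split2 = Arrays.asList(
            "3,2020-01-04,Delivery issue,Lost,Mumbai,Package lost",
            "garbled line");
        System.out.println(aggregate(Arrays.asList(runTask(split1), runTask(split2))));
    }
}
```

      Using an enum keeps the counter names fixed at compile time, and the `BAD_RECORD` counter shows the debugging use mentioned above: malformed rows are counted instead of failing the task.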

      Follow the link to learn more about Counters in Hadoop

    • #5613
      DataFlair Team
      Spectator

      Hadoop Counters are a way to measure the progress of a MapReduce program.

      Why do we need counters?
      The answer is simple: to investigate whether there was any issue in a MapReduce job that has completed.
      Counters let us confirm that the expected amount of input was consumed to produce the expected amount of output.
      There are three types of counters in Hadoop:
      1) Hadoop Built-In Counters: These are defined by the MapReduce framework itself and are maintained for every job.
      2) User-Defined Java Counters: Users can define their own counters in Java code.
      3) User-Defined Streaming Counters: These are used with Hadoop Streaming programs.
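
      For the streaming case, a Hadoop Streaming process updates a counter by writing a line of the form `reporter:counter:<group>,<counter>,<amount>` to standard error. The sketch below shows that convention in Java; the class name `StreamingCounterSketch`, the group name `Complaints`, the counter name `BAD_RECORD`, and the six-field comma-separated input layout are assumptions chosen for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical streaming mapper: reads records on stdin, emits key/value pairs on stdout,
// and reports counter updates on stderr.
class StreamingCounterSketch {

    // Builds the stderr line that Hadoop Streaming interprets as a counter increment.
    // Group and counter names must not contain commas, since commas separate the fields.
    static String counterLine(String group, String counter, long amount) {
        return "reporter:counter:" + group + "," + counter + "," + amount;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.split(",", -1);
            if (fields.length < 6) {
                // Assumed layout has at least six fields; count anything shorter as bad.
                System.err.println(counterLine("Complaints", "BAD_RECORD", 1));
                continue;
            }
            // Emit "category<TAB>1" so a reducer could sum complaints per category.
            System.out.println(fields[2].trim() + "\t1");
        }
    }
}
```

      Because the counter update travels over standard error as plain text, the same convention works from any language that a streaming job can run, not only Java.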
