Difference between map-side join and reduce-side join in Hadoop?

    • #4780
      DataFlair Team
      Spectator

      What is the difference between a map-side join and a reduce-side join?
      How do the two compare?

    • #4781
      DataFlair Team
      Spectator

      Reduce join

      A reduce-side join is also known as a repartitioned join or a repartitioned sort-merge join. It is the most commonly used join type. Because the join is carried out in the reducer, the data has to go through the sort and shuffle phase first, which incurs network overhead. A reduce-side join relies on three notions: the data source, a tag that marks which source each record came from, and the group key (the join key).
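The flow described above (tag each record with its data source, group by key in the shuffle, join in the reducer) can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Hadoop code: the function names, the `customers` and `orders` tags, and the sample data are all hypothetical, and `shuffle_and_sort` only stands in for the work the framework does across the network.

```python
from collections import defaultdict

def map_phase(source_tag, records):
    # Emit (group key, (tag, value)) so the reducer can tell the data sources apart.
    for key, value in records:
        yield key, (source_tag, value)

def shuffle_and_sort(mapped):
    # Stand-in for the framework's shuffle and sort: group tagged values by key.
    groups = defaultdict(list)
    for key, tagged in mapped:
        groups[key].append(tagged)
    return sorted(groups.items())

def reduce_phase(key, tagged_values):
    # Separate the values by tag, then pair every left value with every right value.
    left = [v for tag, v in tagged_values if tag == "customers"]
    right = [v for tag, v in tagged_values if tag == "orders"]
    return [(key, l, r) for l in left for r in right]

def reduce_side_join(customers, orders):
    mapped = list(map_phase("customers", customers)) + list(map_phase("orders", orders))
    joined = []
    for key, tagged_values in shuffle_and_sort(mapped):
        joined.extend(reduce_phase(key, tagged_values))
    return joined

# Hypothetical sample data: (join key, payload)
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(2, "book"), (1, "pen"), (1, "ink"), (4, "cup")]
print(reduce_side_join(customers, orders))
# [(1, 'Ann', 'pen'), (1, 'Ann', 'ink'), (2, 'Bob', 'book')]
```

Note that every record from both inputs passes through `shuffle_and_sort`, including keys 3 and 4, which match nothing. On a real cluster that step moves data between nodes, which is where the network overhead comes from.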

      Map Join
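      A map-side join is performed in the map phase, before the data is consumed by the map function, so no shuffle or reduce phase is needed. It comes with strict prerequisites on the input: in plain MapReduce the datasets must be sorted and partitioned identically on the join key, and in Hive one of the tables must be small enough to fit in memory.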

      In Apache Hive, Map Join is also called Auto Map Join, Map Side Join, or Broadcast Join.

      Map Join can be used to speed up Hive queries. It applies when one of the tables in the join is small enough to be loaded into memory. Each mapper can then perform the join itself, so the join completes without a reduce step.
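The idea of loading the small table into memory and joining inside each mapper can be sketched as a hash lookup. This is a minimal illustration in plain Python, not Hive or Hadoop code: the function names and sample data are hypothetical, and each inner list in `order_splits` stands in for the input split handled by one mapper.

```python
def build_lookup(small_table):
    # Load the small table into an in-memory hash map keyed by the join key.
    lookup = {}
    for key, value in small_table:
        lookup.setdefault(key, []).append(value)
    return lookup

def map_join(big_split, lookup):
    # Each mapper streams its split of the big table and probes the lookup.
    # No shuffle or reducer is involved.
    for key, value in big_split:
        for small_value in lookup.get(key, []):
            yield key, small_value, value

def map_side_join(small_table, big_splits):
    lookup = build_lookup(small_table)  # broadcast once to every mapper
    joined = []
    for split in big_splits:            # one iteration per simulated mapper
        joined.extend(map_join(split, lookup))
    return joined

# Hypothetical sample data: (join key, payload)
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
order_splits = [[(2, "book"), (1, "pen")], [(1, "ink"), (4, "cup")]]
print(map_side_join(customers, order_splits))
# [(2, 'Bob', 'book'), (1, 'Ann', 'pen'), (1, 'Ann', 'ink')]
```

The output keeps the order of the big table because nothing is sorted or regrouped. In Hive, whether a join is converted to a map join automatically is controlled by the `hive.auto.convert.join` setting, together with a size threshold for the small table.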

      In conclusion, a map-side join is more efficient than a reduce-side join, but it places strict requirements on the input data.

      To learn more about Map Join in Hive, follow the link: Map Join in Hive | Map Side Join
