Difference between map-side join and reduce side join in Hadoop?

- September 20, 2018 at 12:13 pm #4780
  
  DataFlair Team
  
  What is the difference between a map-side join and a reduce-side join in Hadoop, and how do the two compare?
- September 20, 2018 at 12:14 pm #4781
  
  DataFlair Team
  
  Reduce-side join
  
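  Reduce-side join is also known as repartitioned join or repartitioned sort-merge join. It is the most commonly used join type because it places no special requirements on how the inputs are sorted, partitioned, or sized. The join is performed in the reducers, so every record has to go through the shuffle and sort phase, which can incur significant network overhead. It relies on three concepts: the data source (each input dataset being joined), the tag (a marker attached to each record that identifies its data source), and the group key (the join key, which the framework uses to send matching records from all sources to the same reducer).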
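  To make the data source, tag, and group key concrete, here is a minimal sketch of a reduce-side join in plain Python rather than the Hadoop Java API. The function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`) and the toy `customers`/`orders` data are hypothetical, and the shuffle is simulated in memory with a dictionary; on a real cluster that step moves data over the network.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy inputs: (join_key, value) records from two data sources.
customers = [(1, "Alice"), (2, "Bob"), (3, "Carol")]
orders = [(1, "book"), (1, "pen"), (3, "lamp"), (4, "mug")]


def map_phase(tag, records):
    """Mapper: emit (group_key, (tag, value)) so the reducer can tell the sources apart."""
    for key, value in records:
        yield key, (tag, value)


def shuffle_and_sort(mapped):
    """Stand-in for the shuffle and sort phase: group tagged values by key, ordered by key."""
    groups = defaultdict(list)
    for key, tagged_value in mapped:
        groups[key].append(tagged_value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reducer: for each group key, pair every 'left' value with every 'right' value (inner join)."""
    for key, tagged_values in grouped:
        left_values = [v for tag, v in tagged_values if tag == "left"]
        right_values = [v for tag, v in tagged_values if tag == "right"]
        for left_value, right_value in product(left_values, right_values):
            yield key, left_value, right_value


def reduce_side_join(left_records, right_records):
    mapped = list(map_phase("left", left_records)) + list(map_phase("right", right_records))
    return list(reduce_phase(shuffle_and_sort(mapped)))


print(reduce_side_join(customers, orders))
# -> [(1, 'Alice', 'book'), (1, 'Alice', 'pen'), (3, 'Carol', 'lamp')]
```

  Note that every record, including the unmatched ones (keys 2 and 4 here), is shuffled before the reducer can discard it, which is where the overhead comes from.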
  
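  Map-side join
  
  The join is performed in the map phase, before the data reaches any reducer. Because of this, map-side join imposes a strict prerequisite on its inputs: either all inputs must already be sorted by the join key and partitioned the same way into the same number of partitions, or one input must be small enough to fit in each mapper's memory.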
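  The sorted-and-identically-partitioned prerequisite can be illustrated with a minimal simulation in plain Python (not Hadoop's API). It assumes integer join keys and a hypothetical `key % num_partitions` partitioner applied to both inputs in advance by the hypothetical `prepare` helper; each mapper then merges one pair of matching partitions in a single pass, with no shuffle.

```python
# Hypothetical toy inputs: (join_key, value) records; integer keys assumed.
customers = [(1, "Alice"), (2, "Bob"), (3, "Carol")]
orders = [(1, "book"), (1, "pen"), (3, "lamp"), (4, "mug")]


def prepare(records, num_partitions):
    """The prerequisite, done ahead of time: partition by key (key % n), then sort each partition."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[key % num_partitions].append((key, value))
    return [sorted(part) for part in parts]


def merge_join_partition(left, right):
    """One mapper: single-pass sort-merge join of two partitions already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == left_key:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == right_key:
                j_end += 1
            for _, left_value in left[i:i_end]:
                for _, right_value in right[j:j_end]:
                    out.append((left_key, left_value, right_value))
            i, j = i_end, j_end
    return out


def map_side_join(left_parts, right_parts):
    """Pair partition k of one input with partition k of the other; no shuffle is needed."""
    if len(left_parts) != len(right_parts):
        raise ValueError("inputs must have the same number of partitions")
    result = []
    for left, right in zip(left_parts, right_parts):
        result.extend(merge_join_partition(left, right))
    return result


print(map_side_join(prepare(customers, 2), prepare(orders, 2)))
# -> [(1, 'Alice', 'book'), (1, 'Alice', 'pen'), (3, 'Carol', 'lamp')]
```

  If the two inputs were partitioned with different functions, matching keys could land in different mappers and the join would silently miss rows; the code can only check the partition count, which is why the format requirement has to be guaranteed up front.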
  
  In Apache Hive, map join is also called auto map join, map-side join, or broadcast join.
  
  Map join can be used to speed up Hive queries when one of the tables in the join is small enough to be loaded into memory. The small table is loaded into an in-memory hash table, and each mapper joins its portion of the large table against it, so the join is performed entirely within the mappers, without a reduce step.
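  The broadcast idea behind Hive's map join can be sketched in plain Python as a hash join with no reduce step. This is a simplified model, not Hive's implementation: the dict stands in for the in-memory hash table built from the small table, and the hypothetical `order_splits` list stands in for the large table's input splits, one per mapper.

```python
# Hypothetical toy inputs: a small table and a large table split across mappers.
customers = [(1, "Alice"), (2, "Bob"), (3, "Carol")]
order_splits = [
    [(1, "book"), (4, "mug")],   # input split handled by mapper 1
    [(1, "pen"), (3, "lamp")],   # input split handled by mapper 2
]


def build_hash_table(small_table):
    """Load the small table into memory as key -> list of values."""
    table = {}
    for key, value in small_table:
        table.setdefault(key, []).append(value)
    return table


def mapper(split, hash_table):
    """Stream one split of the large table and probe the in-memory table (inner join)."""
    for key, large_value in split:
        for small_value in hash_table.get(key, []):
            yield key, small_value, large_value


def broadcast_hash_join(small_table, large_splits):
    hash_table = build_hash_table(small_table)  # the same table is "broadcast" to every mapper
    joined = []
    for split in large_splits:                   # map-only: no shuffle, no reducers
        joined.extend(mapper(split, hash_table))
    return joined


print(broadcast_hash_join(customers, order_splits))
# -> [(1, 'Alice', 'book'), (1, 'Alice', 'pen'), (3, 'Carol', 'lamp')]
```

  The large table is never moved or sorted; only the small table is copied to each mapper, which is why this works only when that table fits in memory.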
  
  In conclusion, map-side join is more efficient than reduce-side join because it avoids the shuffle and sort phase, but it requires the input data to meet strict format requirements.
  
  To learn more about map join in Hive, follow the link: Map Join in Hive | Map Side Join