Explain Map-side joins?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 4:55 pm #5999
  
  DataFlair Team
  Spectator
  
  Explain map-side joins in Hadoop.
  Discuss in detail what map-side joins in MapReduce are.
- September 20, 2018 at 4:55 pm #6001
  DataFlair Team
  Spectator
  A join is an operation that combines two or more datasets based on a column or a set of columns.
  
  In MapReduce, if the join is performed by the mapper, it is called a map-side join;
  if it is performed by the reducer, it is called a reduce-side join.
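For contrast, a reduce-side join can be sketched as a small in-memory Python simulation (not Hadoop code). The lists `employees` and `projects` are hypothetical stand-ins for the two datasets in the example that follows, and the `"EMP"`/`"PROJ"` source tags are an illustrative assumption. The point is that the join only happens after the shuffle has grouped every record by ProjectID.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for DS-1 (employees) and DS-2 (projects).
employees = [("101", "E-1"), ("101", "E-2"), ("102", "E-3"), ("102", "E-4")]
projects = [("101", "P1"), ("102", "P2")]


def reduce_side_join(employees, projects):
    """Simulate a reduce-side join: mappers tag records, the shuffle groups
    them by key, and the reducer performs the actual join."""
    # Map phase: emit (join key, (source tag, value)) for every record.
    mapped = [(pid, ("EMP", emp)) for pid, emp in employees]
    mapped += [(pid, ("PROJ", name)) for pid, name in projects]

    # Shuffle phase: group all values by key.
    groups = defaultdict(list)
    for key, tagged_value in mapped:
        groups[key].append(tagged_value)

    # Reduce phase: pair each project name with each employee of that key.
    joined = []
    for key in sorted(groups):  # Hadoop also delivers keys to a reducer sorted
        names = [value for tag, value in groups[key] if tag == "PROJ"]
        emp_ids = [value for tag, value in groups[key] if tag == "EMP"]
        for name in names:
            for emp_id in emp_ids:
                joined.append((key, name, emp_id))
    return joined


for row in reduce_side_join(employees, projects):
    print(*row)
```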
  
  A map-side join between large inputs works by performing the join before the data
  reaches the map function.
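"Before the data reaches the map function" can be illustrated with a merge join over two inputs that are already sorted by key. This is a simplified Python sketch, assuming the project dataset has at most one record per ProjectID; `merge_join` and `map_fn` are hypothetical names. In Hadoop itself this merging is done by an input format (`CompositeInputFormat` in the `org.apache.hadoop.mapreduce.lib.join` package), so the mapper is handed records that are already combined.

```python
# Hypothetical in-memory inputs, each already sorted by ProjectID.
projects = [("101", "P1"), ("102", "P2")]
employees = [("101", "E-1"), ("101", "E-2"), ("102", "E-3"), ("102", "E-4")]


def merge_join(left, right):
    """Join two (key, value) lists that are both sorted by key.
    Assumes `left` has at most one record per key (one row per project)."""
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, left_value = left[i]
        right_key, right_value = right[j]
        if left_key < right_key:
            i += 1  # no more matches for this left key
        elif left_key > right_key:
            j += 1  # this right record has no partner
        else:
            yield (left_key, left_value, right_value)
            j += 1  # stay on left[i]: the next right record may share its key


def map_fn(project_id, project_name, emp_id):
    # The map function only sees records that were joined on the way in.
    return (project_id, project_name, emp_id)


output = [map_fn(*record) for record in merge_join(projects, employees)]
for row in output:
    print(*row)
```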
  
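  A map-side join is more efficient than a reduce-side join, because the records do not have to be shuffled and sorted across the network to the reducers before they can be joined.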
  
  Now let's understand this with the help of an example.
  
  Suppose we have two datasets:
  
  DS-1 (Employees working on projects)
  
  ProjectID EmpID
  101 E-1
  101 E-2
  102 E-3
  102 E-4
  
  DS-2 (Project details)
  
  ProjectID ProjectName
  101 P1
  102 P2
  
  Now let's assume we want to join these datasets on ProjectID, so that each employee record is shown together with its project details:
  
  ProjectID ProjectName EmpID
  101 P1 E-1
  101 P1 E-2
  102 P2 E-3
  102 P2 E-4
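The expected result can be reproduced with a plain in-memory lookup, which is handy as a reference when checking a map-side join implementation. This Python sketch only illustrates the join semantics and is not a MapReduce job; `expected_join` is a hypothetical helper, and it assumes ProjectID is unique in DS-2.

```python
# DS-1 and DS-2 from the example, as (ProjectID, value) pairs.
employees = [("101", "E-1"), ("101", "E-2"), ("102", "E-3"), ("102", "E-4")]
projects = [("101", "P1"), ("102", "P2")]


def expected_join(employees, projects):
    """Inner join on ProjectID; assumes ProjectID is unique in DS-2."""
    names = dict(projects)  # ProjectID -> ProjectName
    return [(pid, names[pid], emp) for pid, emp in employees if pid in names]


for row in expected_join(employees, projects):
    print(*row)
```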
  
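  In a map-side join, each map task receives the matching partition of both datasets, already sorted by ProjectID, so all records with the same key arrive at the same map task together. The input to the map tasks therefore looks like this:
  
  Map task 1:
  101 P1
  101 E-1
  101 E-2
  
  Map task 2:
  102 P2
  102 E-3
  102 E-4
  
  Because the project record for a key arrives right next to that key's employee records, each map task can emit the joined records directly, with no shuffle or reduce step needed for the join itself:
  
  Map task 1: 101 P1 E-1, 101 P1 E-2
  Map task 2: 102 P2 E-3, 102 P2 E-4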
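The per-map-task behaviour can be simulated in Python. This is a sketch under stated assumptions: `partition` uses a hypothetical modulo-on-ProjectID rule in place of Hadoop's partitioner, `NUM_PARTITIONS` is 2 to match the two map tasks, and each "map task" is just a generator. Within a task, `heapq.merge` interleaves the two sorted partitions so that each project record comes right before its employees.

```python
import heapq

NUM_PARTITIONS = 2  # assumption: one partition per map task in the example

projects = [("101", "P1"), ("102", "P2")]
employees = [("101", "E-1"), ("101", "E-2"), ("102", "E-3"), ("102", "E-4")]


def partition(records, num_partitions):
    """Hypothetical partitioner: ProjectID modulo the partition count.
    Using the same rule for every dataset keeps equal keys in equal partitions."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[int(key) % num_partitions].append((key, value))
    return [sorted(part) for part in parts]  # sort each partition by key


def map_task(project_part, employee_part):
    """Join one partition of each dataset; both must be sorted by ProjectID."""
    # Tag 0 (project) sorts before tag 1 (employee) for the same key,
    # so each project record is seen before that project's employees.
    stream = heapq.merge(
        ((pid, 0, name) for pid, name in project_part),
        ((pid, 1, emp) for pid, emp in employee_part),
    )
    current_key, current_name = None, None
    for key, tag, value in stream:
        if tag == 0:
            current_key, current_name = key, value  # remember the project
        elif key == current_key:
            yield (key, current_name, value)  # employee of that project


project_parts = partition(projects, NUM_PARTITIONS)
employee_parts = partition(employees, NUM_PARTITIONS)
for i in range(NUM_PARTITIONS):
    print(f"map task {i}:", list(map_task(project_parts[i], employee_parts[i])))
```

With this modulo rule ProjectID 102 lands in partition 0 and 101 in partition 1, so the task numbering differs from the listing above, but each task joins the same records.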
  
  From this example we can infer the strict requirements for a map-side join:
  1. All input datasets must be sorted by the same key, and that key must be the join key. In the example above it is ProjectID.
  2. Each input dataset must be divided into the same number of partitions.
  3. All records with the same key must be in the same partition.
  The joined map output can be written out directly as the result of a map-only job, or passed to a reducer if further processing (for example, aggregation) is needed.
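The three requirements can be expressed as a small check. In this Python sketch (a hypothetical helper, not a Hadoop API), each dataset is modelled as a list of partitions and each partition as a list of `(key, value)` records. Requirement 3 is read as "a key may occupy only one partition number, across all datasets", which is what lets partition i of every dataset be joined by the same map task.

```python
def satisfies_map_side_join_requirements(datasets):
    """Check the three map-side join requirements.

    `datasets` is a list of datasets; each dataset is a list of partitions and
    each partition is a list of (key, value) records."""
    # Requirement 2: every dataset has the same number of partitions.
    if len({len(partitions) for partitions in datasets}) != 1:
        return False
    key_to_partition = {}
    for partitions in datasets:
        for index, part in enumerate(partitions):
            keys = [key for key, _ in part]
            # Requirement 1: records are sorted by the join key.
            if keys != sorted(keys):
                return False
            # Requirement 3: a key lives in exactly one partition number,
            # the same one in every dataset.
            for key in keys:
                if key_to_partition.setdefault(key, index) != index:
                    return False
    return True


# The example data split into two partitions per dataset (hypothetical layout).
good = [
    [[("101", "P1")], [("102", "P2")]],
    [[("101", "E-1"), ("101", "E-2")], [("102", "E-3"), ("102", "E-4")]],
]
# Same data, but DS-1's partitions are in the opposite order.
misplaced = [good[0], [good[1][1], good[1][0]]]

print(satisfies_map_side_join_requirements(good))       # True
print(satisfies_map_side_join_requirements(misplaced))  # False
```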