Explain Map-side joins?

    • #5999
      DataFlair Team

      Explain map-side joins in Hadoop. What is a map-side join in MapReduce? Please discuss it in detail.

    • #6001
      DataFlair Team

      A join is an operation that combines two or more datasets based on a column or a set of columns (the join key).

      In MapReduce, if the join is performed by the mapper, it is called a map-side join;
      if it is performed by the reducer, it is called a reduce-side join.

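      A map-side join between large inputs works by performing the join before the data reaches the map function:
      matching records from each input are combined as the input is read, so the map function receives records that are already joined.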
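      To make "join before the map function" concrete, here is a minimal Python simulation (not Hadoop code) using the project/employee data from the example in this answer. It assumes both inputs are small in-memory lists of (key, value) pairs already sorted by key; the names joined_input and map_fn are hypothetical. In Hadoop itself this role is played by the join support in the org.apache.hadoop.mapreduce.lib.join package (for example CompositeInputFormat), whose API is not reproduced here.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (join key, value)


def joined_input(left: List[Record], right: List[Record]) -> Iterator[Tuple[str, str, str]]:
    """Merge-join two inputs that are already sorted by key.

    Simulates a reader that assembles joined records *before* they reach
    the map function: it walks both sorted inputs once and yields
    (key, left_value, right_value) for every pair of matching records.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right-side records that share this key.
            run_end = j
            while run_end < len(right) and right[run_end][0] == left_key:
                run_end += 1
            # Pair every left record with this key against that run.
            while i < len(left) and left[i][0] == left_key:
                for k in range(j, run_end):
                    yield left_key, left[i][1], right[k][1]
                i += 1
            j = run_end


def map_fn(key: str, project_name: str, emp_id: str) -> Tuple[str, str, str]:
    """The map function only sees already-joined records and just emits them."""
    return key, project_name, emp_id


# Project details and employee assignments, both sorted by ProjectID.
projects = [("101", "P1"), ("102", "P2")]
employees = [("101", "E-1"), ("101", "E-2"), ("102", "E-3"), ("102", "E-4")]

output = [map_fn(*rec) for rec in joined_input(projects, employees)]
for row in output:
    print(*row)
```

      A single forward pass over each input is enough only because both are sorted on the join key, which is one reason sorting is required for a map-side join.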

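      A map-side join is more efficient than a reduce-side join because the data does not have to go through the shuffle and sort phase and be transferred across the network to the reducers.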

      Now let's understand this with the help of an example.

      Suppose we have two datasets:

      DS-1 (Employees Working on Projects)

      ProjectID  EmpID
      101        E-1
      101        E-2
      102        E-3
      102        E-4

      DS-2 (Project Details)

      ProjectID  ProjectName
      101        P1
      102        P2

      Now assume we want to join these datasets on ProjectID, so that each employee appears together with the details of the project they work on:

      ProjectID  ProjectName  EmpID
      101        P1           E-1
      101        P1           E-2
      102        P2           E-3
      102        P2           E-4
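      As a point of reference, the expected result is simply an inner join on ProjectID. The short Python sketch below computes it in memory with a dictionary lookup (a hash join); this is a different technique from the sorted, partitioned map-side join this answer describes and is shown only to pin down the target output.

```python
# DS-1: (ProjectID, EmpID) and DS-2: (ProjectID, ProjectName), as in the example.
employees = [("101", "E-1"), ("101", "E-2"), ("102", "E-3"), ("102", "E-4")]
projects = [("101", "P1"), ("102", "P2")]

# Index DS-2 by the join key, then look up each employee's project.
name_by_project = dict(projects)
joined = [
    (project_id, name_by_project[project_id], emp_id)
    for project_id, emp_id in employees
    if project_id in name_by_project  # inner join: skip employees with no matching project
]

print("ProjectID ProjectName EmpID")
for row in joined:
    print(*row)
```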

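      Now, in a map-side join, the map tasks themselves produce this joined output. With one map task per partition (one partition per ProjectID in this small example), the map output is:

      Map task 1:  101 P1 E-1
                   101 P1 E-2
      Map task 2:  102 P2 E-3
                   102 P2 E-4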
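      For the maps to produce this output, the input to each map task must have the following form, with the project record and all employee records for the same ProjectID in the same partition:

      Map task 1 input:  101 P1
                         101 E-1
                         101 E-2

      Map task 2 input:  102 P2
                         102 E-3
                         102 E-4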
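      The work done by one map task on such a partition can be sketched in Python as follows. This is a simulation, not Hadoop code: the tagged record format (ProjectID, source, value), with source "P" for a DS-2 project record and "E" for a DS-1 employee record, and the function name map_task are hypothetical. It also assumes ProjectID is unique in DS-2 and that, within a partition, each project record comes before that project's employee records.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical tagged record: (ProjectID, source, value), where source is
# "P" for a project record (DS-2) and "E" for an employee record (DS-1).
TaggedRecord = Tuple[str, str, str]


def map_task(partition: Iterable[TaggedRecord]) -> Iterator[Tuple[str, str, str]]:
    """Join the records of one partition in a single sequential pass.

    Assumes the partition is sorted by ProjectID and that, for each ProjectID,
    the project record comes before that project's employee records.
    """
    current_id, current_name = None, None
    for project_id, source, value in partition:
        if source == "P":
            # Remember the project so the following employee records can be joined.
            current_id, current_name = project_id, value
        elif project_id == current_id:
            yield project_id, current_name, value
        # Employee records with no matching project are dropped (inner join).


partition_1 = [("101", "P", "P1"), ("101", "E", "E-1"), ("101", "E", "E-2")]
partition_2 = [("102", "P", "P2"), ("102", "E", "E-3"), ("102", "E", "E-4")]

for task_no, partition in enumerate([partition_1, partition_2], start=1):
    for row in map_task(partition):
        print(f"map task {task_no}:", *row)
```

      The map task never needs to look outside its own partition, which is why every record for a given ProjectID has to be placed in the same partition.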

      From this example we can infer the strict requirements that must hold for a map-side join:

      1. All input datasets must be sorted by the same key, and that key must be the join key. In the example above, it is ProjectID.
      2. Each input dataset must be divided into the same number of partitions.
      3. All records with the same key must be in the same partition.
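      The three requirements can be met by preparing both inputs in the same way. The Python sketch below (a simulation, not Hadoop code) uses a hypothetical partition function based on zlib.crc32 of the key, standing in for whatever partitioner produced the real inputs; Python's built-in hash() is avoided because string hashing is randomized between runs. It partitions and sorts both datasets, then runs one join per partition pair with no shuffle or reduce phase. The function names partition_and_sort and join_partition are illustrative only.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, str]  # (ProjectID, value)


def partition_and_sort(records: List[Record], num_partitions: int) -> List[List[Record]]:
    """Split records into num_partitions buckets by key, then sort each bucket.

    Using the same deterministic partition function for every dataset puts
    all records with a given key in the same-numbered partition
    (requirements 2 and 3); sorting each bucket gives requirement 1.
    """
    buckets: List[List[Record]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[zlib.crc32(key.encode()) % num_partitions].append((key, value))
    return [sorted(bucket) for bucket in buckets]


def join_partition(projects: List[Record], employees: List[Record]) -> List[Tuple[str, str, str]]:
    """One map task: join partition i of DS-2 with partition i of DS-1.

    Both inputs are sorted by key and ProjectID is assumed unique in DS-2,
    so a single forward pass over each list is enough.
    """
    out: List[Tuple[str, str, str]] = []
    p = 0
    for project_id, emp_id in employees:
        # Skip project records whose key is smaller than the current employee's key.
        while p < len(projects) and projects[p][0] < project_id:
            p += 1
        if p < len(projects) and projects[p][0] == project_id:
            out.append((project_id, projects[p][1], emp_id))
    return out


employees = [("102", "E-3"), ("101", "E-1"), ("102", "E-4"), ("101", "E-2")]  # DS-1, unsorted
projects = [("102", "P2"), ("101", "P1")]                                     # DS-2, unsorted

NUM_PARTITIONS = 2
emp_parts = partition_and_sort(employees, NUM_PARTITIONS)
proj_parts = partition_and_sort(projects, NUM_PARTITIONS)

# Map-only "job": one map task per partition pair, no shuffle or reduce phase.
result = []
for i in range(NUM_PARTITIONS):
    result.extend(join_partition(proj_parts[i], emp_parts[i]))

for row in sorted(result):
    print(*row)
```

      In Hadoop, a common way to obtain inputs that already satisfy these conditions is to use the outputs of earlier MapReduce jobs that ran with the same number of reducers and the same keys.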

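      Because the join is completed in the map phase, no reducer is needed for the join itself: the job can run map-only and write the map output directly as the result, or the map output can be passed to a reducer if further processing, such as aggregation, is required.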
