How SAS Merge Datasets – Joining / Combining Data Sets in SAS

In the last article, we studied Proc Sort Data Set, today we will be learning about how SAS Merge Datasets and how to merge two or more datasets in SAS. Moreover, we will discuss different variable and the ways for merging datasets in SAS Programming language and some SAS Merge Datasets examples to clear our queries.

Let’s start with SAS Merge Datasets.

SAS Merge Datasets

Keeping you updated with latest technology trends, Join DataFlair on Telegram

1. What is SAS Merge?

i. SAS Merging combines observations from two or more SAS datasets based on the values of specified common variables (SAS merges more than 2 Datasets).
ii. SAS Merging creates a new data set (the merged dataset).
iii. It is done in a data step with the statements.

  • MERGE is used to name the input data sets.
  • BY is used to name the common variable(s) to be used for matching

Prerequisites for a match-merge in SAS Programming.

  • Input data sets must have at least one common variable to merge on.

2. Examples of SAS Merge Datasets

When we want to combine two data sets by merging in SAS and we know beforehand that each observation in the data set has a match in the other dataset we can do a very straight-forward merge. Here are a couple of examples.

SAS Merge Datasets Example:

Often different data on the same cases are stored in two or more different data sets. For example, you may have two person-level data sets on exactly the same individuals but containing different information on those individuals. To combine the data on those individuals into one data set requires a merge. Here is the way the merge would work using the simplest example:

Dataset one            Dataset two
ID   A   B                           ID   C
10   1   2                           10   0
20   3   4                           20   5
30   5   6                           30   7

This data step does a merge of data set one and two by ID:

  data three;
          merge one two;
          by id;
        run;

The output data set three looks like this:

ID    A   B   C
10    1   2   0
20    3   4   5
30    5   6   7

The data sets being merged in the example above contain different data on the same cases and the variable ID is a unique identifier (no duplicates).

Must Read: SAS Array – Basic Syntax & Operators

3. When Matching is Not Perfect

What happens when there is no match for SAS Merge Datasets?

Here is a simple example:

Dataset one         Dataset two
ID   A    B                    ID   C
10   1    2                   10   0
20   3    4                   30   1
30   5    6                   40   1

The second observation (ID=20) in data set one does not have a match on ID in data set two. A match-merge of one and two by ID like this:

  data three;
           merge one two;
           by id;
         run;

Results in an output data set three which looks like this:

ID   A   B     C
10   1   2     0    both datasets contributed to this observation
20   3   4     .     only data set one (left-hand dataset) contributed to this observation
30   5   6     1    both datasets contributed to this observation
40   .   .       1    only data set two (right-hand dataset) contributed to this observation

Explore the Steps of SAS Program Structure

4. IN= Variables

What if you want to keep in the output data set of a merge only the matches? (only those observations to which both input datasets contribute). SAS will set up for you special temporary variables, called the “IN=” variables so that you can do this and more.

Here’s what you have to do:

  • Signal to SAS on the MERGE statement that you need the IN= variables for the input data set(s).
  • Use the IN= variables in the data step appropriately.

So, to keep only the matches in the match-merge above, ask for the IN= variables and use them:

data  three;
           merge one(in=x) two(in=y);       /* x & y are your choices of names */
           by id;                           /* for the IN= variables for data  */
           if x=1 and y=1;                  /* sets one and two respectively   */
         run;

SAS Dataset three will now consist of only the matches on ID:

ID   A   B     C
10   1   2     0
30   5   6     1

Must Learn – Different types of SAS Functions

Only the matches are kept in the output data set above because of the way the IN= variables X and Y take on values in the PDV:

  • 1 if the data set contributes to the observation
  • 0 if the data set does not contribute to the observation

For the above example, you can picture the IN= variables X and Y taking on values like this:

ID   A   B   C   X    Y
10   1   2   0   1    1
20   3   4   .   1    0
30   5   6   1   1    1

If you want to keep not only the matches but also to keep track of separate data sets of the non-matches, you can let the data step create three data sets like this:

    data x1y1          /* x1y1, x1y0, x0y1 are your choices of data set names */
          x1y0
          x0y1;
      merge one(in= x) two(in= y);
       by id;
       if x = 1 and y = 1 then output x1y1; /* write all matches to x1y1 */
       if x = 1 and y = 0 then output x1y0;
       if x = 0 and y = 1 then output x0y1;
     run;

This was all about the SAS Merge Datasets Tutorial. Hope you like our explanation.

5. Summary

Hence, this section was all about learning what is SAS merge Datasets, how to merge two or more datasets in SAS and what happens when there is no match in Data Sets.

Hope you understood the topic nicely. If you have any query, please ask in a comments section below.

After learning these concepts, it’s the right time to test your SAS knowledge through SAS Programming Quiz.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.