Different Cases of SAS Concatenate Data Sets with SET Statement
Job-ready Online Courses: Dive into Knowledge. Learn More!
Previously, we studied SAS Functions, now we will move on to a very easy and interesting method to combine two data sets in SAS programming i.e. SAS concatenate data sets.
After the end of this tutorial, you will be able to understand SAS concatenate datasets and different cases of concatenating data sets.
What is SAS Concatenate Data Sets?
We come across different types of SAS datasets and many times we would want to combine two data sets to have a clear idea about our analysis. The SET statement is used to concatenate two datasets in SAS.
If there are two data sets, say one has 2 observations and the other has 3 observations, then, our SAS concatenate dataset will have 5 observations. The order of observations is sequential. All observations from the first data set are followed by all observations from the second data set, and so on.
Below is the syntax of a SET statement in SAS
SET data-set 1 data-set 2 data-set 3.....;
Data-set 1, data-set 2 and data-set 3 are names of the data sets we wish to combine.
For example – suppose the data set store1 contains three variables, store (number), day (of the week), and sales (in dollars):
Store    Day      Sales
1Â Â Â Â Â Â Â Â MÂ Â Â Â Â Â Â 1200
1Â Â Â Â Â Â Â Â TÂ Â Â Â Â Â Â 1435
1Â Â Â Â Â Â Â Â WÂ Â Â Â Â Â Â 1712
1Â Â Â Â Â Â Â Â RÂ Â Â Â Â Â Â 1529
1Â Â Â Â Â Â Â Â FÂ Â Â Â Â Â Â 1920
1Â Â Â Â Â Â Â Â SÂ Â Â Â Â Â Â 2325
and the data set store2 contains the same three variables:
Store     Day      Sales
2Â Â Â Â Â Â Â Â MÂ Â Â Â Â Â 2215
2Â Â Â Â Â Â Â Â TÂ Â Â Â Â Â Â 2458
2Â Â Â Â Â Â Â Â WÂ Â Â Â Â Â 1798
2Â Â Â Â Â Â Â Â RÂ Â Â Â Â Â Â 1692
2Â Â Â Â Â Â Â Â FÂ Â Â Â Â Â Â 2105
2Â Â Â Â Â Â Â Â SÂ Â Â Â Â Â Â 2847
For SAS concatenate data sets, you simply specify a list of data set names in one SET statement.
DATA store1;      input Store Day $ Sales;      DATALINES; 1 M 1200 1 T  1435 1 W  1712 1  R 1529 1  F  1920 1  S  2325 ; RUN; DATA store2;      input Store Day $ Sales;      DATALINES; 2 M 2215 2 T  2458 2 W 1798 2  R 1692 2  F  2105 2  S  2847 ; RUN; DATA bothstores;      set store1 store2; RUN; PROC PRINT data = bothstores;       title ‘ The bothstores dataset’; RUN;
The output of the above program will look like:
Store   Day    Sales
1Â Â Â Â Â Â Â MÂ Â Â Â Â Â 1200
1Â Â Â Â Â Â Â TÂ Â Â Â Â Â Â 1435
1Â Â Â Â Â Â Â WÂ Â Â Â Â Â 1712
1Â Â Â Â Â Â Â RÂ Â Â Â Â Â Â 1529
1Â Â Â Â Â Â Â FÂ Â Â Â Â Â Â 1920
1Â Â Â Â Â Â Â SÂ Â Â Â Â Â Â 2325
2Â Â Â Â Â Â Â MÂ Â Â Â Â Â Â 2215
2Â Â Â Â Â Â Â TÂ Â Â Â Â Â Â 2458
2Â Â Â Â Â Â Â WÂ Â Â Â Â Â 1798
2Â Â Â Â Â Â Â RÂ Â Â Â Â Â Â 1692
2Â Â Â Â Â Â Â FÂ Â Â Â Â Â Â 2105
2Â Â Â Â Â Â Â SÂ Â Â Â Â Â Â 2847
Different Cases of Concatenating Data Sets
In the above example, we saw that both datasets had three variables so it was easy to concatenate them. But, what happens if the variables are different, or of different lengths or of different names? Let us understand this from SAS Concatenate Datasets in different cases.
1. Different Names of Variables
In this case, both the data sets have different names for their variables. So, when we try and concatenate, an error might get displayed.
In order to overcome this, when we concatenate the datasets, we will rename the variables so that their name appears as one in the concatenated data set. Let us look at how we can do this.
Example:
In the below example data set x1 has the variable name x1inc whereas the data set x2 has the variable name x2inc. But both of these variables represent the same type(numeric). We apply the RENAME function in the SET statement as shown below:
DATA x1; INPUT empid name $ x1inc ; DATALINES; 2 Art 22000 1 Bill 30000 3 Paul 25000 ; RUN; DATA x2; INPUT empid name $ x2inc ; DATALINES; 1 Bess 15000 3 Pat 50000 2 Amy 18000 ; RUN; DATA x3; SET x1(RENAME=(x1inc=inc))x2(RENAME=(x2inc=inc)); RUN; PROC PRINT DATA=x3; RUN;
Output-
OBSÂ Â Â Â Â Â EMPIDÂ Â Â Â Â NAMEÂ Â Â Â Â Â Â INC
1         2         Art       22000
2         1         Bill       30000
3         3         Paul       25000
4         1         Bess      15000
5         3         Pat       50000
6         2         Amy      18000
2. Different Variable Attributes
We talked about variable attributes in earlier classes which are a type, length, label, etc. Now, what happens if these differ. Again, while combining datasets, we can specify the length in our SET statement, that we want to keep or the data type of the variables.
Let us look at an example:
In the below example, the variable name is of length 3 in the first data set and 4 in the second. When concatenating, we apply the LENGTH statement in SAS concatenate data set to set the name length to 7 or the sum of two lengths.
DATA x1;   INPUT empid name $3. inc; DATALINES;  2 Art 22000  1 Bob 30000  3 Tom 25000  RUN; DATA x2;  INPUT empid name $4. inc; DATALINES;  1 Bess 15000  3 Rory 50000  2 Jane 18000 RUN;  DATA x3;  SET x1 x2; RUN; PROC PRINT DATA=x1andx2; RUN;
Output-
OBSÂ Â Â Â empIDÂ Â Â Â Â NAMEÂ Â Â Â Â Â INC
1       2        Art       22000
2       1        Bob      30000
3       3        Tom      25000
4       1        Bes      15000
5       3        Ror      50000
6       2        Jan      18000
3. Different Number of Variables
If one of the two datasets has more number of variables than the other, then the data set which has less number of variables shows missing values in place of that extra variable.
Summary
Today we learned what is concatenation in SAS, how SAS concatenate data sets in SAS programming. We also discussed different cases of concatenating data sets in SAS.
Hope you find the explanation helpful. If you have any queries, feel free to ask in the comment section.
If you are Happy with DataFlair, do not forget to make us happy with your positive feedback on Google