Splitting and Subsetting Datasets in SAS – A Brief Guide of 7 Mins!

FREE Online Courses: Enroll Now, Thank us Later!

Now, we will be learning how we can extract some part of our dataset for our use by using statements like IF, WHERE, DROP and KEEP.

These statements when applied to the dataset, help us extract variables and observations of our interest only. Here, we will study what is subsetting datasets in SAS, SAS split dataset by variable and SAS split dataset by observations; SAS subsetting WHERE statement & IF Statement.

Subsetting Datasets in SAS

We come across different types of datasets in SAS and many times we would want to have only a part of the dataset that may be useful. Let’s say we want dates after 1/04/2011 only, all dates before this are not of our interest.

Then, the feature, subsetting datasets in SAS, helps us extract only that data which is relevant to us, thereby improving our analysis. Subsetting can be achieved by extracting either columns or rows(observations) of our choice.

SAS Subsetting Variables

You can create a new dataset with only a subset of the variables in the original dataset using a keep or drop statement.

Subsetting Datasets in SAS - Variables

Suppose you want to print just three of the variables in this data set: study id, age, and height.

data one;
      input studyid name $ sex $ age weight height;
cards;
1      Carol      f    22     120    64
6      Julie      f    55     125    63
2      Joe        m    34     130    68
7      Kelsey     f    43     130    64
4      Ann        f    29     .      65
11     Tracee     f    21     110     .
9      Bob        m    32     155    71
3      Ed         m    40     120    69
8      Karl       m    35     160    68
10     Gayla      f    40     130    68
5      John       m    17     175    73
run;

i. Using ‘var’

You can do this by specifying the variables in the VAR statement in proc print.

proc print data=one;
   var studyid age height;
run;

ii. Using ‘keep’

Another way to do this is to use a keep statement to create a new dataset only with the selected variables.

data two;
set one;
keep studyid age height;
proc print;
run;

iii. Using ‘drop’

Yet another way to do this is to use a drop statement to drop the other variables from your new data set.

data three;
set one;
drop name sex weight;
proc print;
run;

These will all produce the same output:

Output - Subsetting Variables

SAS Subsetting Observations

You can create a new dataset with only a subset of the observations in the original data set using an IF or WHERE statement.

To create a new data set that only includes a subset of observations from an existing data set, use a SET statement along with a subsetting IF statement.

The SET statement creates a duplicate of the original data step and the IF statement selects only the observations that meet the IF statement requirement.

The WHERE statement can be used similarly in a data step.

Subsetting Datasets in SAS - Observations

i. Using IF Statement

Example:-

There is a data set called abc, assume it is a SAS data set with 76 boys and 48 girls.

First, we will select only girls, i.e., those with sex=2

data abc;
    set pbkid;
     if sex=2;
run;
proc means data=abc;
      var iq;
run;

Output - IF statement (For Girls)

Next, we will select only boys, those with sex=1, using a WHERE statement, and produce their mean IQ score:

data abc;
        set pbkid;
     where sex=1;
run;
proc means data=abc;
             var iq;
run;

Output - IF statement (For Boys)

ii. Using WHERE Statement

In the above example, to produce statistics on a subset of our observations only, we created a subset dataset using an IF (or WHERE) statement in the data step and then applied the proc.

Similarly, we can use a where statement directly in the proc as shown below:

proc means data=abc;
var iq;
where sex = 1;
title1 'MALE (1) IQ SCORES';
run;
proc means data=abc;
var iq;
where sex=2;
title1 'FEMALE (2) IQ SCORES';
run;

Summary

We learned the subsetting datasets in SAS, statements that can be used to extract variables of our interest and statements that can be used to extract observations of our interest. Hope you understood it clearly.

If you have any queries related to the tutorial, feel free to ask in the comment section below. Awaiting for your reply!

Did you like this article? If Yes, please give DataFlair 5 Stars on Google

follow dataflair on YouTube

Leave a Reply

Your email address will not be published. Required fields are marked *