Splitting and Subsetting Datasets in SAS – A Brief Guide of 7 Mins!
Now, we will be learning how we can extract some part of our dataset for our use by using statements like IF, WHERE, DROP and KEEP.
These statements when applied to the dataset, help us extract variables and observations of our interest only. Here, we will study what is subsetting datasets in SAS, SAS split dataset by variable and SAS split dataset by observations; SAS subsetting WHERE statement & IF Statement.
Subsetting Datasets in SAS
We come across different types of datasets in SAS and many times we would want to have only a part of the dataset that may be useful. Let’s say we want dates after 1/04/2011 only, all dates before this are not of our interest.
Then, the feature, subsetting datasets in SAS, helps us extract only that data which is relevant to us, thereby improving our analysis. Subsetting can be achieved by extracting either columns or rows(observations) of our choice.
Stay updated with latest technology trends
Join DataFlair on Telegram!!
SAS Subsetting Variables
You can create a new dataset with only a subset of the variables in the original dataset using a keep or drop statement.
Suppose you want to print just three of the variables in this data set: study id, age, and height.
data one; input studyid name $ sex $ age weight height; cards; 1 Carol f 22 120 64 6 Julie f 55 125 63 2 Joe m 34 130 68 7 Kelsey f 43 130 64 4 Ann f 29 . 65 11 Tracee f 21 110 . 9 Bob m 32 155 71 3 Ed m 40 120 69 8 Karl m 35 160 68 10 Gayla f 40 130 68 5 John m 17 175 73 run;
i. Using ‘var’
You can do this by specifying the variables in the VAR statement in proc print.
proc print data=one; var studyid age height; run;
ii. Using ‘keep’
Another way to do this is to use a keep statement to create a new dataset only with the selected variables.
data two; set one; keep studyid age height; proc print; run;
iii. Using ‘drop’
Yet another way to do this is to use a drop statement to drop the other variables from your new data set.
data three; set one; drop name sex weight; proc print; run;
These will all produce the same output:
SAS Subsetting Observations
You can create a new dataset with only a subset of the observations in the original data set using an IF or WHERE statement.
To create a new data set that only includes a subset of observations from an existing data set, use a SET statement along with a subsetting IF statement.
The SET statement creates a duplicate of the original data step and the IF statement selects only the observations that meet the IF statement requirement.
The WHERE statement can be used similarly in a data step.
i. Using IF Statement
There is a data set called abc, assume it is a SAS data set with 76 boys and 48 girls.
First, we will select only girls, i.e., those with sex=2
data abc; set pbkid; if sex=2; run; proc means data=abc; var iq; run;
Next, we will select only boys, those with sex=1, using a WHERE statement, and produce their mean IQ score:
data abc; set pbkid; where sex=1; run; proc means data=abc; var iq; run;
ii. Using WHERE Statement
In the above example, to produce statistics on a subset of our observations only, we created a subset dataset using an IF (or WHERE) statement in the data step and then applied the proc.
Similarly, we can use a where statement directly in the proc as shown below:
proc means data=abc; var iq; where sex = 1; title1 'MALE (1) IQ SCORES'; run; proc means data=abc; var iq; where sex=2; title1 'FEMALE (2) IQ SCORES'; run;
We learned the subsetting datasets in SAS, statements that can be used to extract variables of our interest and statements that can be used to extract observations of our interest. Hope you understood it clearly.
If you have any queries related to the tutorial, feel free to ask in the comment section below. Awaiting for your reply!