SAS/STAT Discriminant Analysis Procedure
In our previous tutorial we looked at SAS/STAT Longitudinal Data Analysis Procedures. Today we will look at discriminant analysis in SAS/STAT and how to use it. Our focus is on the three procedures for performing discriminant analysis, PROC DISCRIM, PROC CANDISC, and PROC STEPDISC, each illustrated with an example.
So, let’s start with the SAS/STAT discriminant analysis procedures.
2. What is SAS/STAT Discriminant Analysis?
Discriminant analysis is a statistical technique used to analyze data when the criterion (dependent) variable is categorical and the predictor (independent) variables are interval-scaled, that is, quantitative.
Discriminant analysis is closely related to analysis of variance (ANOVA). Consider a simple example: suppose we measure height in a random sample of 50 males and 50 females. Females are, on average, not as tall as males, and this difference is reflected in the group means of the variable Height. Height therefore allows us to discriminate between males and females with better-than-chance probability: a tall person is more likely to be male, and a short person is more likely to be female.
The most common application of discriminant analysis is to include many measures in a study in order to determine which ones discriminate between groups. For example, an educational researcher interested in predicting high school graduates' choices for further education would probably include as many measures of personality, achievement, motivation, academic performance, and so on as possible, in order to learn which one(s) offer the best prediction.
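The height example can be tried out with a short sketch. The data below are simulated, not measured: the data set name heights, the seed, and the means and standard deviations (in cm) are all assumptions chosen only for illustration.

```sas
/* Hypothetical data: 50 males and 50 females with simulated heights (cm). */
/* The means (176, 163) and standard deviations (7, 6) are assumed values. */
data heights;
    call streaminit(2024);            /* fixed seed so the run is repeatable */
    length sex $6;
    do i = 1 to 50;
        sex = 'Male';   height = rand('normal', 176, 7); output;
        sex = 'Female'; height = rand('normal', 163, 6); output;
    end;
    drop i;
run;

/* Compare the two group means of Height. */
proc means data=heights mean std;
    class sex;
    var height;
run;

/* Classify each person as Male or Female using Height alone. */
proc discrim data=heights;
    class sex;
    var height;
run;
```

Because the two simulated height distributions overlap, the classification summary should show some misclassified observations in each group while still doing better than chance.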
3. Procedures for Performing Discriminant Analysis in SAS/STAT
The following procedures perform discriminant analysis in SAS/STAT. Each procedure has its own syntax and suits a different kind of task. Let us explore each one.
a. PROC CANDISC
PROC CANDISC performs canonical discriminant analysis, a dimension reduction technique that finds linear combinations of the quantitative variables (called canonical variables) that provide maximum separation between the classes. With the DISTANCE option it also reports the squared Mahalanobis distances between the class means.
Syntax of PROC CANDISC
PROC CANDISC DATA=dataset <options>;
    CLASS variable;
    VAR variables;
RUN;
Example Of PROC CANDISC
data iris;
    set sashelp.iris;
run;

proc candisc data=iris out=outcan distance anova;
    class species;
    var sepallength sepalwidth petallength petalwidth;
run;
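The OUT= data set written by PROC CANDISC holds the canonical variable scores, named Can1, Can2, and so on by default. As a minimal sketch, the first two canonical variables can be plotted to see how well the species separate. The use of PROC SGPLOT here is our own addition, not part of the procedure itself.

```sas
/* Keep the first two canonical variables and suppress the printed output. */
proc candisc data=sashelp.iris out=outcan ncan=2 noprint;
    class species;
    var sepallength sepalwidth petallength petalwidth;
run;

/* Scatter plot of the canonical scores, colored by class. */
proc sgplot data=outcan;
    scatter x=Can1 y=Can2 / group=species;
run;
```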
b. PROC DISCRIM
PROC DISCRIM performs discriminant analysis: it develops a discriminant criterion from the quantitative variables and uses it to classify observations into groups. Like logistic regression, it predicts a categorical outcome. Logistic regression is most often used with two categories, whereas discriminant analysis handles two or more classes in the same way.
Syntax for PROC DISCRIM
PROC DISCRIM DATA=dataset <options>;
    CLASS variable;
    VAR variables;
RUN;
Example of PROC DISCRIM
data iris;
    set sashelp.iris;
run;

proc discrim data=iris distance anova manova crosslisterr;
    class species;
    var sepallength sepalwidth petallength petalwidth;
run;
The DISCRIM procedure begins by displaying summary information about the variables in the analysis. This information includes the number of observations, the number of quantitative variables in the analysis (specified with the VAR statement), and the number of classes in the classification variable (specified with the CLASS statement). The frequency of each class, its weight, the proportion of the total sample, and the prior probability are also displayed.
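The prior probabilities mentioned above can be set with the PRIORS statement, and the classification can be checked on data the model has not seen. The following is a sketch under our own assumptions: the 70/30 split, the seed, and the data set names split, train, test, and scored are all hypothetical choices.

```sas
/* Leave-one-out cross validation, with priors proportional to class sizes. */
proc discrim data=sashelp.iris method=normal pool=yes crossvalidate;
    class species;
    var sepallength sepalwidth petallength petalwidth;
    priors proportional;
run;

/* Hold-out check: flag a random 70% of the rows as the training sample. */
proc surveyselect data=sashelp.iris out=split samprate=0.7
                  seed=12345 outall noprint;
run;

data train test;
    set split;
    if selected = 1 then output train;   /* Selected is added by OUTALL */
    else output test;
run;

/* Fit on TRAIN, classify TEST, and save the predicted class per row. */
proc discrim data=train testdata=test testout=scored;
    class species;
    var sepallength sepalwidth petallength petalwidth;
run;
```

The scored data set should contain the variable _INTO_, which holds the class assigned to each test observation.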
c. PROC STEPDISC
The PROC STEPDISC procedure in SAS/STAT performs a stepwise discriminant analysis to select a subset of the quantitative variables for use in discriminating among the classes. The STEPDISC procedure can be used for forward selection, backward elimination, or stepwise selection.
Syntax for PROC STEPDISC
PROC STEPDISC DATA=dataset <options>;
    CLASS variable;
    VAR variables;
RUN;
Example of PROC STEPDISC
data iris;
    set sashelp.iris;
run;

proc stepdisc data=iris;
    class species;
    var sepallength sepalwidth petallength petalwidth;
run;
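The example above relies on the default selection method, which is stepwise. The other two methods can be requested with the METHOD= option, as in the sketch below. The significance levels of 0.05 are assumed values for illustration, not recommendations.

```sas
/* Forward selection: add variables one at a time while each entry    */
/* is significant at the (assumed) 0.05 level.                        */
proc stepdisc data=sashelp.iris method=forward slentry=0.05;
    class species;
    var sepallength sepalwidth petallength petalwidth;
run;

/* Backward elimination: start with all variables and remove those    */
/* that are not significant at the (assumed) 0.05 level.              */
proc stepdisc data=sashelp.iris method=backward slstay=0.05;
    class species;
    var sepallength sepalwidth petallength petalwidth;
run;
```

The variables that survive the selection can then be passed to PROC DISCRIM or PROC CANDISC in the VAR statement.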
To conclude, we covered the three procedures that SAS/STAT offers for discriminant analysis: PROC DISCRIM, PROC CANDISC, and PROC STEPDISC. We looked at each one, its syntax, and how it is used. Hope you enjoyed it. Stay tuned for more topics in SAS/STAT, and if you have any questions, post them in the comments section below.