How to Enter and Read Raw Data in SAS
In the last article, we learned how SAS merges data sets. Today we will look at how to enter and read raw data in SAS. As discussed earlier, a raw data file is a file of data that has not yet been read into a SAS data set.
So, let’s start our journey of Entering & Reading Raw Data in SAS Programming.
6 Best Ways to Enter & Read Raw Data in SAS
Below are some of the ways in which we can enter and read Raw Data in SAS.
1. Instream Data
If you are planning to enter a small amount of data, it will be convenient to type the data in the SAS program rather than reading it from another file. This is known as instream data. It is a quick and easy way to enter data into SAS for analysis.
You will need 4 basic types of statements to enter data of this type:
- A data statement
- An input statement
- A cards (or datalines) statement
- A semicolon on a line by itself to end the data
Note: There should be at least one blank between each data value. More than one blank is acceptable. It is important to have something as a placeholder for each variable, even when the value is missing.
A period (.) will serve to indicate a missing value for both numeric and character variables entered in this way. The data do not need to be lined up exactly in columns.
data salary;
   input lname $ id sex $ salary age;
   cards;
Smith 1028 M . .
Williams 1337 F 3500 49
Brun 1829 . 14800 56
Agassi 1553 F 11800 65
Vernon 1626 M 129000 60
;
proc print data=salary;
run;
2. Entering data for more than one case on the same line
If you want to enter raw data in SAS on the same line for several cases, you can use the @@ symbol:
data test;
   input x y group $ @@;
   cards;
1 2 A 3 12 A 15 22 B 17 29 B 11 44 C 13 29 C
7 21 D 11 29 D 16 19 E 25 27 E 41 12 F 17 19 F
;
proc print data=test;
run;
This results in the following output, which shows that data have been entered for 12 cases:
OBS X Y GROUP
1 1 2 A
2 3 12 A
3 15 22 B
4 17 29 B
5 11 44 C
6 13 29 C
7 7 21 D
8 11 29 D
9 16 19 E
10 25 27 E
11 41 12 F
12 17 19 F
3. Reading data from external files
Raw data files (sometimes called ASCII files, flat files, text files, or unformatted files) can come from many different sources, for example when exported from a database program such as Access or from a spreadsheet program such as Excel.
The first step is to be sure that you know the characteristics of the raw data file. You can inspect a raw data file with a text editor or word processing program.
For small files, you can use Windows Notepad. For larger files, you can use Microsoft Word or WordPerfect. If you open your raw data file with a word processing program, be sure to save it as text only or unformatted text when you quit.
To be able to read a raw data file, you will need a codebook that gives information about the data contained in the file. Some commonly used raw data file types are:
i. Blank separated values (with data in list form)
ii. Comma-separated values (.csv files–these typically come from Excel)
iii. Tab-separated values (.txt files–these may come from a number of different applications, including Excel)
iv. Fixed-column data (often the form of data from government agencies, or research groups, such as ICPSR–the Inter-University Consortium for Political and Social Research)
The part of SAS that creates a new data set is the data step, as we discussed before. The data step for reading raw data from a file has 3 essential statements:
- A data statement
- An infile statement
- An input statement
Other statements may be added to the data step to create new variables, carry out data transformations, or recode variables.
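Fixed-column data, the last file type listed above, has no worked example in this article. As a minimal sketch, assuming a hypothetical layout in which the last name occupies columns 1-10, sex is in column 12, and age is in columns 14-15, column input might look like the following. The data set name fixed, the layout, and the three records are illustrative assumptions, and instream data lines are used so that the step runs without an external file.

```sas
/* Hypothetical fixed-column layout (an assumption for illustration): */
/* lname in columns 1-10, sex in column 12, age in columns 14-15      */
data fixed;
   input lname $ 1-10 sex $ 12 age 14-15;
   datalines;
Warren     F 29
Pierce     M
Walker     F 22
;

proc print data=fixed;
run;
```

With column input, a blank field is read as missing, so the age for Pierce needs no period placeholder. To read a real fixed-column file, an infile statement naming the file would replace the datalines block, and the column ranges would come from the codebook.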
4. Reading blank separated values (list or free-form data)
Raw data values separated by blanks are often called list or free-form data. Each value is separated from the next by one or more blanks. If there are any missing values, they must be indicated by a placeholder, such as a period.
Note that a period can be used to indicate a missing value for either character or numeric variables. Missing values can also be denoted by a missing value code, such as 99 or 999. The data do not need to be lined up in columns, so lines can be of unequal length, and can appear “ragged”.
Here is an excerpt from a raw data file that is separated by blanks. Note that the values in the file are not lined up in columns. The name of the raw data file is class.dat. Missing values are indicated by a period (.), with a blank between periods for contiguous missing values.
Warren F 29 68 139
Kalbfleisch F 35 64 120
Pierce M . . 112
Walker F 22 56 133
Rogers M 45 68 145
Baldwin M 47 72 128
Mims F 48 67 152
Lambini F 36 . 120
Gossert M . 73 139
The SAS data step to read this raw data is very simple. The data statement names the data set to create, and the infile statement indicates the raw data file to read. The input statement lists the variables to read in the order in which they appear in the raw data file.
No variables can be skipped at the beginning of the variable list, but you may stop reading variables before reaching the end of the list. Here are the SAS commands for reading this data:
data class;
   infile "class.dat";
   input lname $ sex $ age height sbp;
run;
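As noted earlier, some files record missing values with a code such as 99 or 999 instead of a period. SAS reads such codes as ordinary numbers, so they are usually recoded to missing in the data step. The sketch below assumes, purely for illustration, that 99 is the missing code for age and 999 is the missing code for height. The data set name class_coded and the coded records are made up for this example and are not part of class.dat.

```sas
/* Assumption: 99 (age) and 999 (height) are missing-value codes */
/* in this illustrative data; they are not taken from class.dat  */
data class_coded;
   input lname $ sex $ age height sbp;
   if age = 99 then age = .;          /* recode the age code to missing    */
   if height = 999 then height = .;   /* recode the height code to missing */
   datalines;
Warren F 29 68 139
Pierce M 99 999 112
Lambini F 36 999 120
;

proc print data=class_coded;
run;
```

Recoding before any analysis keeps the codes from being treated as real ages or heights in means and other statistics.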
5. Reading raw data separated by commas (.csv files):
Raw data files often come in the form of CSV (comma-separated values) files. These files can be created by Excel and are very easy to read into SAS. The example here uses a CSV file called PULSE.CSV.
Note that the first line of the file contains the variable names.
SAS commands to read in this raw data file:
data pulse;
   infile "pulse.csv" firstobs=2 delimiter = "," dsd;
   input pulse1 pulse2 ran smokes sex height weight activity;
run;
There are several modifications to the infile statement in the previous example:
- delimiter = "," or dlm = "," tells SAS that commas are used to separate the values in the raw data file, rather than the default, which is a blank.
- firstobs = 2 tells SAS to begin reading the raw data file at line 2, which is where the actual values begin.
- dsd allows SAS to read consecutive commas as an indication of missing values.
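The effect of the dsd option is easiest to see with a record that contains two commas in a row. The following is a minimal sketch using instream data lines instead of pulse.csv. The data set name dsd_demo and the three records are made-up values for illustration only.

```sas
/* Illustrative records (not from pulse.csv); the second record */
/* has an empty field between two consecutive commas            */
data dsd_demo;
   infile datalines delimiter = "," dsd;
   input pulse1 pulse2 ran;
   datalines;
64,88,1
58,,1
62,76,2
;

proc print data=dsd_demo;
run;
```

With dsd, the empty field in the second record is read as a missing value for pulse2. Without dsd, SAS treats consecutive delimiters as a single delimiter, so the values would shift into the wrong variables.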
The delimiter option may be shortened to dlm, as shown below:
data pulse;
   infile "pulse.csv" firstobs=2 dlm = "," dsd;
   input pulse1 pulse2 ran smokes sex height weight activity;
run;
6. Reading in raw data separated by tabs (.txt files):
Raw data separated by tabs may be created by Excel (saving a file with the text option) or by other applications. The example below shows how tab-separated data appear when viewed without the tabs visible. This is a portion of the raw data file, iris.txt:
51 38 15 3 Setosa
54 34 17 2 Setosa
51 37 15 4 Setosa
52 35 15 2 Setosa
53 37 15 2 Setosa
65 28 46 15 Versicolor
62 22 45 15 Versicolor
59 32 48 18 Versicolor
61 30 46 14 Versicolor
It is not obvious to the naked eye that tabs separate the values in this file, but you still need to specify this for SAS to read the raw data correctly.
To do this, modify the infile statement to tell SAS that the delimiters are tabs. Since the tab has no printable character equivalent, its hexadecimal code is given in the delimiter = option, as shown below:
data iris;
   infile "c:\temp\labdata\iris.txt" dsd missover dlm="09"X;
   length species $ 10;
   input sepallen sepalwid petallen petalwid species $;
run;
proc print data=iris;
run;
Note that SPECIES is read as a character variable. We also use a length statement to be sure we get the correct length of the variable SPECIES.
Even though this variable appears last in the raw data, it will be first in the SAS data set, because the length statement comes before the input statement. Partial output from these commands is shown below:
Obs species sepallen sepalwid petallen petalwid
1 Setosa 50 33 14 2
2 Setosa 46 34 14 3
3 Setosa 46 36 10 2
4 Setosa 51 33 17 5
5 Setosa 55 35 13 2
6 Setosa 48 31 16 2
7 Setosa 52 34 14 2
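If you would rather keep species as the last variable, one alternative is to set its length with an informat in the input statement instead of a separate length statement. The sketch below is an illustration and not part of the original example. It uses the colon modifier with the $10. informat, and it reads two blank-separated instream records taken from the iris.txt excerpt above so that it runs without the external file. The data set name iris_order is an assumption.

```sas
/* Sketch: the :$10. informat gives species a length of 10, so no  */
/* length statement is needed and species stays last in the order  */
data iris_order;
   input sepallen sepalwid petallen petalwid species :$10.;
   datalines;
51 38 15 3 Setosa
65 28 46 15 Versicolor
;

proc print data=iris_order;
run;
```

Variables are added to the data set in the order in which SAS first encounters them, which is why moving the length information into the input statement changes the column order. The same input statement should also work with the tab-delimited infile statement shown earlier.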
So, this was all about how to enter and read raw data in SAS and the different ways of reading raw data in SAS programming. Next, we will learn how to write data in SAS.
If you have any questions about how to enter and read raw data in SAS, comment below and stay tuned for more.