How to Enter and Read Raw Data in SAS

In the last article, we learned how SAS merges data sets. Today we will look at how to enter and read raw data in SAS. A raw data file is a plain-text file of data values that has not yet been read into a SAS data set; the values can be typed directly into a program or stored in an external file.

So, let’s start our journey of Entering & Reading Raw Data in SAS Programming.

6 Best Ways to Enter & Read Raw Data in SAS

Below are some of the ways in which we can enter and read Raw Data in SAS.

1. Instream Data

If you are planning to enter a small amount of data, it will be convenient to type the data in the SAS program rather than reading it from another file. This is known as instream data. It is a quick and easy way to enter data into SAS for analysis.

You will need 4 basic types of statements to enter data of this type:

  • Data
  • Input
  • Cards or datalines
  • A semicolon on a line by itself to end the data

Note: There should be at least one blank between each data value. More than one blank is acceptable. It is important to have something as a placeholder for each variable, even when the value is missing.

A period (.) will serve to indicate a missing value for both numeric and character variables entered in this way. The data do not need to be lined up exactly in columns.

Example-

data salary;
input lname $ id sex $ salary age;
cards;
Smith      1028 M     .       .
Williams   1337 F    3500    49
Brun       1829 .   14800    56
Agassi     1553 F   11800    65
Vernon     1626 M  129000    60
  ;
proc print data=salary;
run;

2. Entering data for more than one case on the same line

If you want to enter data for several cases on the same line, you can end the input statement with the @@ symbol, which holds the current line so that the next case is read from it:

Example-

data test;
   input x y group $ @@;   /* group holds letters, so it must be read as character ($) */
   cards;
1  2 A   3 12 A   15 22 B   17 29 B   11 44 C   13 29 C
7 21 D  11 29 D   16 19 E   25 27 E   41 12 F   17 19 F
;
proc print data=test;
run;

This results in the following output, which shows that data have been entered for 12 cases:

OBS     X     Y    GROUP
  1     1     2      A
  2     3    12      A
  3    15    22      B
  4    17    29      B
  5    11    44      C
  6    13    29      C
  7     7    21      D
  8    11    29      D
  9    16    19      E
 10    25    27      E
 11    41    12      F
 12    17    19      F

3. Reading data from external files

Raw data files (sometimes called ASCII files, flat files, text files or unformatted files) can come from many different sources, for example when exported from a database program such as Access, or from a spreadsheet program such as Excel.

The first step is to be sure that you know the characteristics of the raw data file. You can inspect a raw data file by opening it in a text editor or word processing program.

For small files, you can use Windows Notepad; for larger files you can use Microsoft Word or WordPerfect (if you open your raw data file with a word processing program, be sure to save it as text only or unformatted text when you quit).
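As an alternative to a text editor, a short data step can echo the first few lines of a file to the SAS log. This is a minimal sketch, assuming a file named class.dat (the file used later in this article) in the current working directory; adjust the path for your own system.

```sas
/* Write the first 5 lines of a raw data file to the SAS log. */
/* data _null_ runs the step without creating a data set.     */
data _null_;
   infile "class.dat" obs=5;   /* obs=5 stops reading after line 5 */
   input;                      /* read a line without naming variables */
   put _infile_;               /* write the line, as read, to the log */
run;
```

Looking at the log this way shows whether values are separated by blanks, commas, or something else before you write the real input statement.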

To be able to read a raw data file, you will need a codebook that gives information about the data contained in the file. Some commonly used raw data file types are:

i. Blank-separated values (with data in list form)

ii. Comma-separated values (.csv files; these typically come from Excel)

iii. Tab-separated values (.txt files; these may come from a number of different applications, including Excel)

iv. Fixed-column data (often the form of data from government agencies or research groups, such as ICPSR, the Inter-University Consortium for Political and Social Research)

The part of SAS that creates a new data set is the data step as we discussed before. The data step for reading raw data from a file has 3 essential statements:

  • Data
  • Infile
  • Input

Other statements may be added to the data step to create new variables, carry out data transformations, or recode variables.
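The three essential statements, plus a couple of added statements, can be sketched as follows. This example reads the class.dat file described in the next section. The data set name class2, the new variables height_cm and highbp, the cutoff of 140, and the assumption that height is recorded in inches are all illustrative and do not come from a codebook.

```sas
data class2;                            /* DATA: names the data set to create */
   infile "class.dat";                  /* INFILE: names the raw data file    */
   input lname $ sex $ age height sbp;  /* INPUT: lists the variables to read */

   /* Added statement: create a new variable (assumes height is in inches) */
   height_cm = height * 2.54;

   /* Added statements: recode sbp into a 0/1 indicator,  */
   /* leaving highbp missing when sbp itself is missing   */
   if sbp >= 140 then highbp = 1;
   else if not missing(sbp) then highbp = 0;
run;
```

When height is missing, height_cm is also set to missing and SAS writes a note about it to the log.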

4. Reading blank-separated values (list or free-form data)

Raw data values separated by blanks are often called list or free-form data. Each value is separated from the next by one or more blanks. If there are any missing values, they must be indicated by a placeholder, such as a period.

Note that a period can be used to indicate a missing value for either character or numeric variables. Missing values can also be denoted by a missing value code, such as 99 or 999, although SAS reads such codes as ordinary numbers until you recode them. The data do not need to be lined up in columns, so lines can be of unequal length and can appear “ragged”.

Here is an excerpt from a raw data file that is separated by blanks. Note that the values in the file are not lined up in columns. The name of the raw data file is class.dat. Missing values are indicated by a period (.), with a blank between periods for contiguous missing values.

Warren F 29 68 139
Kalbfleisch F 35 64 120
Pierce M . . 112
Walker F 22 56 133
Rogers M 45 68 145
Baldwin M 47 72 128
Mims F 48 67 152
Lambini F 36 . 120
Gossert M . 73 139

The data step to read this file is very simple. The data statement names the data set to create, and the infile statement indicates the raw data file to read. The input statement lists the variables to read in the order in which they appear in the raw data file.

No variables can be skipped at the beginning of the variable list, but you may stop reading variables before reaching the end of the list. Here are the SAS commands for reading this data:

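data class;
      infile "class.dat";
      /* List input stores character values in 8 bytes by default; */
      /* a longer length keeps names such as Kalbfleisch intact.   */
      length lname $ 12;
      input lname $ sex $ age height sbp;
run;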
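If a file uses a numeric code rather than a period for missing values, the code can be converted to a SAS missing value inside the same data step. The sketch below is hypothetical: class.dat as shown uses periods, so the code 99 for age and the data set name class_clean are assumptions for illustration only.

```sas
data class_clean;
   infile "class.dat";
   length lname $ 12;
   input lname $ sex $ age height sbp;

   /* Hypothetical: treat 99 as the missing value code for age */
   if age = 99 then age = .;
run;
```

Without a recode like this, a code such as 99 would be included in means and other statistics as if it were a real age.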

5. Reading raw data separated by commas (.csv files):

Raw data files often come in the form of CSV (comma-separated values) files. These files can be created by Excel and are easy to read into SAS. An excerpt of a CSV file called PULSE.CSV is shown below.

Note that the first line of data contains the variable names.

pulse1,pulse2,ran,smokes,sex,height,weight,activity
64,88,1,2,1,66,140,2
58,70,1,2,1,72,145,2
62,76,1,1,1,73,160,3
66,78,1,1,1,73,190,1

SAS commands to read in this raw data file:

data pulse;
   infile "pulse.csv" firstobs=2 delimiter = ","  dsd;
   input pulse1 pulse2 ran smokes sex height weight activity;
run;

There are several modifications to the infile statement in the previous example:

  1. delimiter = "," (or dlm = ",") tells SAS that commas are used to separate the values in the raw data file, rather than the default, which is a blank.
  2. firstobs = 2 tells SAS to begin reading the raw data file at line 2, which is where the actual values begin.
  3. dsd tells SAS to treat two consecutive commas as a missing value. It also removes quotation marks from around values and makes the comma the default delimiter.

The delimiter option may be shortened to dlm, as shown below:

data pulse;
   infile "pulse.csv" firstobs=2 dlm = "," dsd;
   input pulse1 pulse2 ran smokes sex height weight activity;
run;
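To see what dsd does with consecutive commas, the option can be tried on a few instream lines. This is a small sketch: the data set name pulse_demo is made up, and the second and third data lines were altered by hand to contain empty fields.

```sas
data pulse_demo;
   infile datalines dlm="," dsd;   /* read the instream lines as comma-separated */
   input pulse1 pulse2 ran smokes;
   datalines;
64,88,1,2
58,,1,2
62,76,,1
;
run;

proc print data=pulse_demo;
run;
```

With dsd, pulse2 is missing in the second observation and ran is missing in the third. Without dsd, SAS would treat the two adjacent commas as a single delimiter, so the remaining values would shift into the wrong variables.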

6. Reading in raw data separated by tabs (.txt files):

Raw data separated by tabs may be created by Excel (by saving a file with the text option) or by other applications. The example below shows how tab-separated data appear when viewed without the tabs visible. This is a portion of the raw data file, iris.txt:

51    38    15    3     Setosa
54    34    17    2     Setosa
51    37    15    4     Setosa
52    35    15    2     Setosa
53    37    15    2     Setosa
65    28    46    15    Versicolor
62    22    45    15    Versicolor
59    32    48    18    Versicolor
61    30    46    14    Versicolor

It is not obvious to the naked eye that tabs separate the values in this file, but you still need to specify this for SAS to read the raw data correctly.

To do this, modify the infile statement to tell SAS that the delimiters are tabs. Since the tab has no printable character, its hexadecimal code ("09"x) is given in the delimiter = option, as shown below:

data iris;
  infile "c:\temp\labdata\iris.txt"  dsd missover dlm="09"X  ;
  length species $ 10;
  input     sepallen
              sepalwid
              petallen
              petalwid
              species $;
run;
proc print data=iris;
run;

Note that SPECIES is read as a character variable. We also use a length statement to be sure we get the correct length for the variable SPECIES.

Even though this variable appears last in the raw data, it will be first in the SAS data set, because the length statement is given before the data are read in. Partial output from these commands is shown below:

Obs    species    sepallen    sepalwid    petallen    petalwid
1    Setosa        50          33          14           2
2    Setosa        46          34          14           3
3    Setosa        46          36          10           2
4    Setosa        51          33          17           5
5    Setosa        55          35          13           2
6    Setosa        48          31          16           2
7    Setosa        52          34          14           2
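If you would rather keep the variables in the order they appear in the raw file, one option is to drop the length statement and give SPECIES an informat with the colon modifier in the input statement. A minimal sketch, reusing the file path from the example above; the data set name iris2 is made up for illustration.

```sas
data iris2;
   infile "c:\temp\labdata\iris.txt" dsd missover dlm="09"x;
   /* :$10. reads species up to the next delimiter and sets its length to 10 */
   input sepallen sepalwid petallen petalwid species :$10.;
run;

proc print data=iris2;
run;
```

Because species is first mentioned at the end of the input statement, it should appear as the last variable in the data set.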

That concludes this tutorial on how to enter and read raw data in SAS. We hope you found the explanation useful.

Summary

This article covered how to enter raw data directly in a SAS program and how to read raw data from blank-, comma-, and tab-separated files. Next, we will learn how to write data in SAS.

If you have any questions about how to enter and read raw data in SAS, leave a comment below.

DataFlair Team

2 Responses

  1. Phan Thành Phúc says:

    Dear Admin, how to enter a character variable with space in between. For example “Steve Roger” with space ‘ ‘ b/t ‘Steve’ and ‘Roger’. Thank you & Regards,

  2. Kathryn Araya says:

    i believe you can use a space after the quoted rogers like – ” rogers” or “steve ” or use @ but there are a few ways
