Importing Data in R – A Complete Tutorial
1. Objective – Importing Data in R
In this R tutorial, we are going to learn how to import data from external sources into the R programming language. We will look at the c() and scan() commands, compare read.csv() and read.csv2() for importing CSV files, and see how read.table() reads tabular data and read.delim() reads tab-delimited files. We will also learn how R handles blank data when two or more samples of unequal size are combined for data analysis.
So, let’s start Importing Data in R.
2. Importing Data in R
Now, let’s see the process of importing data in R –
a. Using the Combine Command
As we know, the c() function is used to concatenate or combine items in R, as shown below:
>c(item.1, item.2, item.n)
(The c() function combines all the specified items into one object.)
>sample.name = c(item.1, item.2, item.n)
(The concatenated values can be assigned to a named object, as shown in the command.)
Everything in the parentheses is joined to create a single item. Usually, the joined items are assigned to a named object.
b. Entering Numerical Items as Data
Numerical data can be entered simply by typing the values, separated by commas, into the c() command.
Let us create a data set. The following command does this:
>data1 = c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)
This creates a new object to hold the data; the values are typed inside the parentheses and separated by commas. The result is not displayed automatically. To see the dataset, type its name in the R console as follows:
>data1
This command displays the contents of the object data1:
[1] 3 5 7 5 3 2 6 8 5 6 9
As R supports different types of data, all data types can be imported into it for computation.
Existing data objects can be combined with new values to make new objects, simply by including them as if they were values themselves. In the following example, we take the numerical sample made earlier and incorporate it into a larger sample.
>data2 = c(data1, 4, 5, 7, 3, 4)
>data2
This displays the contents of the object data2:
[1] 3 5 7 5 3 2 6 8 5 6 9 4 5 7 3 4
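The two commands above can be checked with a short script. This is a minimal sketch that reuses the tutorial's own object names; it uses the `<-` assignment operator, which behaves the same as `=` here.

```r
# Build a small numeric sample with c()
data1 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)

# Reuse the existing object as if it were a series of values
data2 <- c(data1, 4, 5, 7, 3, 4)

length(data1)  # 11 values in the original sample
length(data2)  # 16 values after adding five more
```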
c. Entering Text Items as Data
Data that is not numerical is distinguished from numbers by using quotes. There is no difference between using single and double quotes; R displays them all as double quotes. Either or both can be used, as long as the opening and closing quotes around any single item match.
Just as numerical data can be entered, text values can also be entered and manipulated in R.
>day1 = c('Mon', 'Tue', 'Wed', 'Thu')
>day1
This displays the contents of day1 as below:
[1] "Mon" "Tue" "Wed" "Thu"
Just as we joined numerical data, we can join text data in the same manner, as shown below:
>day1 = c(day1, 'Fri')
>day1
This displays the updated contents of the object day1:
[1] "Mon" "Tue" "Wed" "Thu" "Fri"
When text and numbers are combined, the entire data object becomes a text variable, and the numbers are also converted to text.
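A small sketch of this conversion follows. The object name `mixed` is only illustrative and does not come from the tutorial.

```r
# Combining numbers and text turns every item into text
mixed <- c(1, 2, "three")

mixed         # "1" "2" "three"
class(mixed)  # "character"

# The numbers are now text, so they must be converted back before any arithmetic
as.numeric(mixed[1:2])  # 1 2
```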
The c() command is a quick way of getting a series of values stored in a data object. This command is useful when the samples are small, but it can be tedious when a lot of typing is involved.
d. Using the scan() Command
When using the c() command, you may find typing all the commas to separate the values a little tedious. Instead, you can use the scan() command to do the same job, but without the commas. In addition to entering values from the keyboard, the scan() command can be used with the clipboard and can take data from files.
Unlike the c() command, the scan() command is called with empty parentheses. The command then prompts you to enter the desired data, and a blank line ends the entry. The entered data can be stored in a new variable.
Let us see this with the help of an example:
>file_name = scan()
(This is the syntax for using the scan() command.)
You can also use the scan() command to enter text into datasets. Simply entering the items in quotes will generate an error message, because scan() expects numbers by default. The modified syntax for entering text as data is as follows:
>day1 = scan(what = 'character')
>day1
[1] "Mon" "Tue" "Wed" "Thu" "Fri"
Two of the arguments that scan() accepts are:
file: the name of a file
what: the type of data, including logical, integer, numeric, complex, character, and raw
In R, the user must specify that the items entered are characters, and not numbers. To do so, the what = 'character' part must be added.
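Because scan() normally waits for keyboard input, it is awkward to demonstrate in a script. The sketch below uses the `text` argument of scan() to stand in for what would be typed at the prompt; the object names `nums` and `days` are illustrative.

```r
# Numbers: no commas are needed, white space separates the values
nums <- scan(text = "3 5 7 5 3", quiet = TRUE)

# Text: what = "character" tells scan() to expect text, not numbers
days <- scan(text = "Mon Tue Wed Thu Fri", what = "character", quiet = TRUE)

nums  # 3 5 7 5 3
days  # "Mon" "Tue" "Wed" "Thu" "Fri"
```

Setting `quiet = TRUE` only suppresses the "Read N items" message that scan() prints by default.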
e. Using the Clipboard to Make Data
Another way of importing data interactively into R is to use the Clipboard to copy and paste data.
The scan() command can be used together with other programs, such as a spreadsheet, to enter data into R.
The steps to import data are:
- If the spreadsheet data is in the form of numbers, simply type the scan() command in R as usual before switching to the spreadsheet containing the data.
- Highlight the necessary cells in the spreadsheet and copy them to the clipboard.
- Then return to R and paste the data from the clipboard into R. As usual, R waits until a blank line is entered before ending the data entry, so you can continue to copy and paste more data as required.
- Enter a blank line to complete data entry.
If the data is text, add the what = 'character' instruction to the scan() command. If the file can be opened in a spreadsheet, proceed with the four steps above. If the file opens in a text editor or word processor instead, check how the data items are separated before continuing.
If the data is separated by simple spaces, simply copy and paste. If the data is separated by some other character, R needs to be told which character is used as the separator.
f. Using scan() to Retrieve Data from a CSV File
The scan() command can be used to retrieve comma-separated data, such as the contents of a CSV file, as follows:
>File_Name = scan(sep = ',')
(The sep argument tells scan() which character separates the values.)
The separator must be enclosed in quotes. You need to press Enter on a blank line to finish the data entry.
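As a rough illustration of the `sep` argument, the sketch below again uses the `text` argument in place of typed or pasted input. The values and the name `readings` are made up for the example.

```r
# Comma-separated values, as they would appear in one line of a CSV file
readings <- scan(text = "23,17,12.5,11", sep = ",", quiet = TRUE)

readings          # 23.0 17.0 12.5 11.0
length(readings)  # 4
```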
g. Reading a File of Data from a Disk
The scan() command can also be used to read a data file stored on disk.
scan() can read data into a vector or list from the console or from a file. To read a file with the scan() command, simply add file = 'filename' to the command as shown below:
>Object_Name = scan(file = 'File_Name.txt')
The filename must be enclosed in quotes.
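The sketch below shows the same idea end to end. It first writes a small temporary file so that it can run anywhere; the file contents and the names `data_file` and `from_disk` are assumptions made for the example.

```r
# Create a throwaway text file holding space-separated numbers
data_file <- tempfile(fileext = ".txt")
writeLines("23 17 12.5 11 17 12 14.5 9 11", data_file)

# Read the file into a numeric vector
from_disk <- scan(file = data_file, quiet = TRUE)
from_disk  # 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0

# Remove the temporary file again
unlink(data_file)
```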
R looks for the data file in the current working directory. To find out which directory that is, use the getwd() command:
>getwd()
This shows the current working directory as below:
[1] "C:/Documents and Settings/Administrator/My Documents"
The directories in the example are separated by forward slashes. R accepts forward slashes in paths on Windows as well; because the backslash is an escape character inside R strings, a backslash in a path has to be doubled.
We can alter the working directory in R. If you want to load files from a particular directory by typing only their names, the task becomes easier once that directory is set as the working directory. We can alter the directory with the setwd() command, which takes the path of the new directory in quotes.
For example, passing the path of the Desktop folder to setwd() changes the current working directory to the Desktop; calling getwd() afterwards displays the new directory.
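A minimal sketch of switching directories and switching back follows. It uses the session's temporary directory as a stand-in for a real folder such as the Desktop, so the names `new_dir`, `old_dir` and `moved_to` are illustrative only.

```r
# A directory that is guaranteed to exist: the session's temporary directory
new_dir <- tempdir()

# setwd() returns the previous working directory invisibly, so keep it
old_dir <- setwd(new_dir)

# Confirm where we are now
moved_to <- getwd()
moved_to

# Restore the original working directory
setwd(old_dir)
```

Keeping the value returned by setwd() makes it easy to return to the starting directory once the files have been read.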
In the Windows and Mac operating systems, there is an alternative method that enables file selection. The instruction file.choose() can be included as part of the scan() command. This opens a browser-type window where users can navigate to and select the file to read.
>Object_Name = scan(file.choose())
>Object_Name
The output is displayed as below:
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0
On Linux, file.choose() may not open a graphical file browser, depending on how R is being run; it may prompt for the file name at the console instead. With file.choose(), files from different directories can be selected without having to alter the working directory or type their names in full.
h. Reading Bigger Data Files
Let us now see how to read bigger data files in R.
The scan() command is helpful for reading simple vectors. Larger amounts of data usually come from more complicated data files that contain multiple items, and such data is most likely stored in a spreadsheet. R provides the means to read data stored in a range of text formats, all of which a spreadsheet is able to create.
- Command to read from CSV files: read.csv() or read.csv2()
- Command to read from tables: read.table()
- Command to read from tab-separated value files: read.delim()
The difference between read.csv() and read.csv2() in R lies in their default separators. The former is used when the values in your data file are separated by a comma (','), while the latter is used when they are separated by a semicolon (';'); read.csv2() also treats a comma as the decimal mark.
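The sketch below compares these functions on the same small table written in three formats. The column names, values and file names are invented for the illustration.

```r
# Three temporary files holding the same two-row table
comma_file <- tempfile(fileext = ".csv")
semi_file  <- tempfile(fileext = ".csv")
tab_file   <- tempfile(fileext = ".txt")

writeLines(c("id,height", "1,1.5", "2,2.25"), comma_file)    # comma separator
writeLines(c("id;height", "1;1,5", "2;2,25"), semi_file)     # semicolon separator, comma decimals
writeLines(c("id\theight", "1\t1.5", "2\t2.25"), tab_file)   # tab separator

from_csv   <- read.csv(comma_file)    # sep = ",", dec = "."
from_csv2  <- read.csv2(semi_file)    # sep = ";", dec = ","
from_delim <- read.delim(tab_file)    # sep = "\t"
from_table <- read.table(tab_file, header = TRUE, sep = "\t")

# All four calls should produce the same data frame
from_csv

unlink(c(comma_file, semi_file, tab_file))
```

read.csv(), read.csv2() and read.delim() are wrappers around read.table() with different defaults, which is why the last call has to state `header` and `sep` explicitly.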
i. Missing Values in Data Files
In the real world, samples are often of unequal size. So now we are going to see how R handles missing values in data files.
Let us consider two samples, mow and unmow.
The mow sample contains five values, whereas the unmow sample contains four values. When this data is read into R from a spreadsheet or text file, the program recognizes the multiple columns of data and sets them up accordingly.
R converts the data into a neat rectangular item and fills in any gaps with NA.
NOTE: The NA item is a special object in its own right and can be read as “Not Applicable” or “Not Available.”
>Grass = read.csv(file.choose())
>Grass
  mow unmow
1  12     8
2  15     9
3  17     7
4  11     9
5  15    NA
The dataset has been called Grass, and R has filled in the gap by using NA.
R always pads out the shorter samples with NA to produce a rectangular object. This object is called a data frame in R. The data frame is an important kind of object because it is used so often in statistical data manipulation.
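The padding can be reproduced without a file chooser. The sketch below writes the mow and unmow values shown above to a temporary CSV file, leaving the last unmow field blank; the file name and the lower-case object name `grass` are illustrative.

```r
# Write the two samples to a CSV file; the final unmow value is missing
grass_file <- tempfile(fileext = ".csv")
writeLines(c("mow,unmow", "12,8", "15,9", "17,7", "11,9", "15,"), grass_file)

grass <- read.csv(grass_file)
grass

# The blank field has been read as NA
is.na(grass$unmow)      # FALSE FALSE FALSE FALSE TRUE

# Many functions need to be told to skip NA values
mean(grass$unmow, na.rm = TRUE)  # 8.25

unlink(grass_file)
```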
So, this was all about Importing Data in R. We hope you liked our explanation.
In this R tutorial, we discussed the process of importing data in R. We also saw the different commands for importing data in R and worked through each of them with the help of examples. If you still have a query regarding Importing Data in R, ask in the comments.