Introduction to Contingency Tables in R
This tutorial covers contingency tables in R. It introduces contingency tables and the different ways to create them, and then covers complex (flat) tables, cross-tabulation, recreating the original data from a contingency table, and other operations on R table objects.
2. What are Contingency Tables in R?
A contingency table is particularly useful when a large number of observations need to be condensed into a smaller format. A complex (flat) table is a type of contingency table that presents the result as one single table rather than several separate ones.
You can produce, manipulate, and alter table objects using the table() command, which summarizes a data sample held in a vector, matrix, list, or data frame object. You can also create a few special kinds of table objects, such as contingency tables with the table() command and complex (flat) contingency tables with the ftable() command.
Additionally, you can use cross-tabulation to reassemble data into a tabular format as necessary.
3. Making Contingency Tables in R
A contingency table is a way to rearrange data into a table that gives the reader an overall summary of the original data. The table() command can be used to create contingency tables in R because it can handle data in simple vectors as well as in more complex matrix and data frame objects. The more complex the original data, the more complex the resulting contingency table will be.
4. Creating Contingency Tables from Vectors in R
A vector is the simplest data object from which you can create a contingency table. In the following example, you have a simple numeric vector of values.
> data2
[1] 3 5 7 5 3 2 6 8 5 6 9 4 5 7 3 4
> table(data2)
data2
2 3 4 5 6 7 8 9
1 3 2 4 2 2 1 1
> sort(data2)
[1] 2 3 3 3 4 4 5 5 5 5 6 6 7 7 8 9
The sort() command reorders the data values. Comparing the sorted values with the table above shows what the table() command does to the same dataset: the first row of the table lists each distinct value in the vector as a label, and the second row gives the number of times that value occurs.
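As a minimal sketch of the vector case, the block below rebuilds the data2 vector from the values printed above and tabulates it. The object name tab is only an illustrative choice.

```r
# Same values as the data2 vector shown above
data2 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9, 4, 5, 7, 3, 4)

# Count how often each distinct value occurs
tab <- table(data2)
print(tab)

# The labels of the table are the distinct values, stored as character strings
print(names(tab))

# Look up a single count by its label (note the quotes around the value)
print(tab["5"])
```

Because the labels are character strings, a count is looked up with tab["5"] rather than tab[5]; the latter would return the fifth cell of the table instead.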
5. Creating R Contingency Tables from Data
As a simple example, consider a data frame containing a column of numeric values and two columns of factors (character variables). The first rows are shown below:
> pw
  height    plant water
1      9 vulgaris    lo
2     11 vulgaris    lo
3      6 vulgaris    lo
4     14 vulgaris   mid
5     17 vulgaris   mid
6     19 vulgaris   mid
7     28 vulgaris    hi
Using the table() command on these data produces three tables, one for each of the water treatments. Here, only the heading of each table is shown, starting with the first one.
> table(pw)
, , water = hi
The first table examines the situation for the first treatment; the factor levels are considered in order, so the hi treatment comes first (as a result of alphabetical sorting). The other levels are shown in separate tables, with the next one being for the lo treatment:
> table(pw)
, , water = lo
And the last one is for the mid treatment:
> table(pw)
, , water = mid
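The following sketch shows how table() turns a three-column data frame into a three-dimensional table with one slice per water treatment. The first seven rows of pw match the rows printed above; the remaining rows (the sativa plants) are made up for illustration, and the name tab3 is hypothetical.

```r
# Hypothetical plant-watering data: rows 1-7 follow the listing above,
# rows 8-12 are invented so that both plants and all treatments appear
pw <- data.frame(
  height = c(9, 11, 6, 14, 17, 19, 28, 6, 7, 14, 15, 30),
  plant  = c(rep("vulgaris", 7), rep("sativa", 5)),
  water  = c("lo", "lo", "lo", "mid", "mid", "mid", "hi",
             "lo", "lo", "mid", "mid", "hi")
)

# One dimension per column: height x plant x water
tab3 <- table(pw)
print(dim(tab3))

# The water levels are sorted alphabetically, so "hi" comes first
print(dimnames(tab3)$water)

# Each water treatment is one two-dimensional slice of the table
print(tab3[, , "hi"])
```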
6. How to create Custom Contingency Tables in R?
You could create an R contingency table that uses only part of the data rather than using all rows or columns. In this situation, you can actually select each separate row and column to use.
There are various ways of creating custom contingency tables in R:
- Selecting Columns to Use in a Contingency Table
- Selecting Rows to Use in a Contingency Table
- Rotating Data Frames
- Using Matrix Objects
- Using Rows of a Data Frame
We will learn about each of these in detail below:
6.1. Selecting Columns to Use in an R Contingency Table
The table() command enables you to specify which columns of data to use to create a contingency table. You simply need to provide the names of the vector objects in the command instruction:
> table(column_name1, column_name2, ...)
An example of using the table() command is shown below:
> table(pw$height, pw$water)
This command cross-tabulates just two columns of the data, height and water.
If you provide the following command instead, R reports an error:
> table(height, water)
Error in table(height, water) : object 'height' not found
In this example, the command cannot find the objects you want because they are part of the pw data frame. You can get around this in one of several ways: you could use $ and specify the full name, or you could use the attach() command to “open up” the data frame.
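A short sketch of two ways to reach the columns is given below. It uses a hypothetical pw data frame (only the first seven rows follow the listing earlier in this tutorial). Note that with() is swapped in here as an alternative to attach(); it is not the approach described above, but it makes the columns visible only for the duration of one command.

```r
# Hypothetical plant-watering data (rows 8-12 are invented for illustration)
pw <- data.frame(
  height = c(9, 11, 6, 14, 17, 19, 28, 6, 7, 14, 15, 30),
  plant  = c(rep("vulgaris", 7), rep("sativa", 5)),
  water  = c("lo", "lo", "lo", "mid", "mid", "mid", "hi",
             "lo", "lo", "mid", "mid", "hi")
)

# Option 1: give the full name of each column with $
tab_a <- table(pw$height, pw$water)

# Option 2: evaluate the command inside the data frame with with()
tab_b <- with(pw, table(height, water))

# Both hold the same counts; only the second one keeps the variable names
print(names(dimnames(tab_a)))
print(names(dimnames(tab_b)))
```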
6.2. Selecting Rows to Use in R Contingency Table
If you want to use only certain rows of a data frame to form the basis for a contingency table, you need a slightly different approach. Essentially, this involves either creating a matrix object and making a contingency table from that (section 6.4), or coercing the data frame to a matrix and selecting its rows (section 6.5).
6.3. Rotating Data Frames in R
You can rotate the data so that the rows become the columns and vice versa. You can use the t() command to transpose a data frame as follows:
> t(fw)
      Taw Torridge Ouse Exe Lyn Brook Ditch Fal
count   9       25   15   2  14    25    24  47
speed   2        3    5   9  14    24    29  34
The result is a matrix. You can then use square brackets to select its columns (i.e., the original rows) and build a contingency table from them:
> table(t(fw)[,1], t(fw)[,2], dnn = c('Taw', 'Torridge'))
   Torridge
Taw 3 25
  2 1  0
  9 0  1
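The sketch below reconstructs the fw data frame from the values printed above and repeats the rotation. Selecting the columns by name instead of by position is an illustrative choice, and the names fw.t and tab are hypothetical.

```r
# fw rebuilt from the transposed output shown above
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34),
  row.names = c("Taw", "Torridge", "Ouse", "Exe",
                "Lyn", "Brook", "Ditch", "Fal")
)

# t() returns a matrix whose columns are the original rows
fw.t <- t(fw)
print(is.matrix(fw.t))

# Cross-tabulate two of the original rows, selected by column name
tab <- table(fw.t[, "Taw"], fw.t[, "Torridge"], dnn = c("Taw", "Torridge"))
print(tab)
```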
6.4. Creating Contingency Tables from Matrix Objects in R
Let us see the creation of contingency tables in R from matrix objects with an example:
Here is a matrix of bird observation data:
> bird
              Garden Hedgerow Parkland Pasture Woodland
Blackbird         47       10       40       2        2
Chaffinch         19        3        5       0        2
Great Tit         50        0       10       7        0
House Sparrow     46       16        8       4        0
Robin              9        3        0       0        2
Song Thrush        4        0        6       0        0
Using the table() command, you will get the result shown below:
> table(bird)
bird
 0  2  3  4  5  6  7  8  9 10 16 19 40 46 47 50
 9  4  2  2  1  1  1  1  1  2  1  1  1  1  1  1
The $ convention does not work with a matrix, so the attach() command is of no help either; however, using square brackets to pick out rows and columns solves the issue.
To pick out the required columns, you can use square brackets ([]), as follows (only the first row of the result is shown):
> table(bird[,1], bird[,2], dnn = c('Gdn', 'Hedge'))
    Hedge
Gdn  0 3 10 16
  4  1 0  0  0
The dnn = instruction is used to specify the names for display.
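As a minimal sketch of the matrix case, the block below builds a cut-down bird matrix with only the Garden and Hedgerow columns (values taken from the listing above) and tabulates it both as a whole and column against column. The object names all.counts and pair.tab are hypothetical.

```r
# Cut-down bird matrix: only the Garden and Hedgerow columns
bird <- matrix(
  c(47, 10,
    19,  3,
    50,  0,
    46, 16,
     9,  3,
     4,  0),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Blackbird", "Chaffinch", "Great Tit",
      "House Sparrow", "Robin", "Song Thrush"),
    c("Garden", "Hedgerow")
  )
)

# table() on a whole matrix treats every cell as one value in a single vector
all.counts <- table(bird)
print(all.counts)

# Square brackets pick out columns; dnn supplies the display names
pair.tab <- table(bird[, 1], bird[, 2], dnn = c("Gdn", "Hedge"))
print(pair.tab)
```

Columns can also be selected by name, for example bird[, "Garden"], which is less fragile than a position if the column order ever changes.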
6.5. Using Rows of a Data Frame in a Contingency Table
A matrix allows you to pick out rows to construct a contingency table. If you have a data frame, you cannot use the same square bracket convention in the same way, because a row selected from a data frame is itself a data frame rather than a simple vector.
One way to get this to work is to coerce the data frame to a matrix each time you select a row:
> table(as.matrix(fw)[1,], as.matrix(fw)[2,], dnn = c('Taw', 'Torridge'))
   Torridge
Taw 3 25
  2 1  0
  9 0  1
You could also create a new matrix and then apply the table() command to the new object:
> fw.mat = as.matrix(fw)
> table(fw.mat[1,], fw.mat[4,], dnn = c('Taw', 'Exe'))
   Exe
Taw 2 9
  2 0 1
  9 1 0
> rm(fw.mat)
7. Selecting Parts of R Table Object
A table is a special sort of matrix. Dealing with tables is similar to matrix objects. As we extract elements of a matrix object, in the similar way we can extract elements of a table object.
Below are listed some commands for selecting parts of a table object:
> str(pw.tab) – Examines the structure of the table object named pw.tab
> pw.tab[1:3,] – Displays the first three rows of the contingency table
> pw.tab[1:3,1] – Displays the first three rows of the first column
> pw.tab[1:3,1:2] – Displays the first three rows of the first and second columns
> pw.tab[,'hi'] – Displays the column labeled hi
> pw.tab[1:3, c('hi', 'mid')] – Displays the first three rows of two of the columns
> pw.tab[1:3, c('mid', 'hi')] – Displays some of the columns in a new order
> pw.tab[,c('hi',3)] – Attempts to select two columns using a mix of name and number; this fails because the number is converted to the name '3', which does not exist
> length(pw.tab) – Displays the length of the table object
The first step is to create the table object using two of the columns to produce a simple contingency table. The str() command validates that the resulting object is a table. The table can be displayed much like a matrix by using square brackets to define rows and columns as required. The rows and columns can be specified as numbers or names (if appropriate), but you cannot mix names and numbers in the same command.
The length() command produces a result that reflects the number of items in the table; this is similar to a matrix but different from a data frame (where the command produced the number of columns).
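The selection commands above can be tried on a small table. This sketch assumes pw.tab is a cross-tabulation of height against water; the pw data frame is hypothetical (only its first seven rows follow the listing earlier in this tutorial), and the name mixed is an illustrative choice.

```r
# Hypothetical plant-watering data (rows 8-12 are invented for illustration)
pw <- data.frame(
  height = c(9, 11, 6, 14, 17, 19, 28, 6, 7, 14, 15, 30),
  plant  = c(rep("vulgaris", 7), rep("sativa", 5)),
  water  = c("lo", "lo", "lo", "mid", "mid", "mid", "hi",
             "lo", "lo", "mid", "mid", "hi")
)

# Assumed construction of pw.tab: heights as rows, water treatments as columns
pw.tab <- with(pw, table(height, water))

print(pw.tab[1:3, ])                  # first three rows, all columns
print(pw.tab[, "hi"])                 # one column, selected by name
print(pw.tab[1:3, c("mid", "hi")])    # two columns in a new order
print(length(pw.tab))                 # number of cells, not number of columns

# Mixing a name and a number turns the number into the name "3",
# which is not a column, so the selection fails
mixed <- tryCatch(pw.tab[, c("hi", 3)], error = function(e) "error")
print(mixed)
```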
8. Converting an Object into a Table
A table object is a special kind of object in its own right, but it also has certain properties of a matrix.
To convert an object into a table, you can use the as.table() command if the object is already a matrix; however, if it is a data frame, you need to first convert it to a matrix and then convert that into a table. This can be done in a single step as illustrated below:
> as.table(as.matrix(mf))
     len     sp    alg    no3    bod
A  20.00  12.00  40.00   2.25 200.00
B  21.00  14.00  45.00   2.15 180.00
C  22.00  12.00  45.00   1.75 135.00
D  23.00  16.00  80.00   1.95 120.00
E  21.00  20.00  75.00   1.95 110.00
F  20.00  21.00  65.00   2.75 120.00
G  19.00  17.00  65.00   1.85  95.00
In this case, the rows are labeled with uppercase letters; only the first seven rows of the result are shown. If you try to convert a data frame directly into a table, you get an error.
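A brief sketch of the two-step conversion follows. It uses a three-row data frame named mf whose values are the first three rows printed above; the names mf.tab and direct are hypothetical.

```r
# Three rows taken from the mf listing above
mf <- data.frame(
  len = c(20, 21, 22),
  sp  = c(12, 14, 12),
  alg = c(40, 45, 45),
  no3 = c(2.25, 2.15, 1.75),
  bod = c(200, 180, 135)
)

# Data frame -> matrix -> table in a single step
mf.tab <- as.table(as.matrix(mf))
print(is.table(mf.tab))

# as.table() labels the unnamed rows with uppercase letters
print(rownames(mf.tab))

# Converting the data frame directly fails; capture the message instead of stopping
direct <- tryCatch(as.table(mf), error = function(e) conditionMessage(e))
print(direct)
```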
9. Testing for R Table Objects
Test to see if an object is a table by using the is.table() command. This produces a TRUE result if you have a table and a FALSE result if not:
> is.table(bird)
[1] FALSE
> is.table(gr.tab)
[1] TRUE
You can also use the class() command to see if an object is a table. The result of class() can form the basis of a logical test by using the if() command in the following manner:
> if(class(gr.tab) == 'table') TRUE else FALSE
[1] TRUE
The if() command is useful when you want to choose between options. The basic form of the command is as follows:
if(condition) what.to.do.if.TRUE else what.to.do.if.FALSE
10. Complex Tables in R / Flat Tables in R
In a flat table, several rows or columns are subdivided to create a single table. It can be created using an alternate version of table() command. The command is ftable() and can be used in several ways.
Flat tables can be created in the following two ways:
- Making “Flat” Contingency Tables in R, using the ftable() command
- Making Selective “Flat” Contingency Tables in R
10.1. Making “Flat” Contingency Tables in R
You can create a “flat” contingency table by using the ftable() command.
The following example uses the plant-watering data frame. This has a column of numerical height data and two columns of factors, plant and water. When you create the “flat” contingency table you get something like the following:
> ftable(pw)
height plant    hi lo mid
       vulgaris  0  0   0
6      sativa    0  1   0
       vulgaris  0  1   0
7      sativa    0  1   0
Contingency tables in R can also be constructed by applying the table() command and specifying two or more columns of data to use in a table. A slightly different syntax can be employed to define a custom output as required. The general form of the command is:
ftable(column.items~row.items, data = data.object)
The tilde (~) character is used to create a formula. The left side of the ~ contains the variables that form the column headings, and the right side contains the names of the variables that form the row items; where there is more than one variable on a side, separate them with plus signs. These commands give great flexibility in creating contingency tables in R. The variables specified before the ~ form the columns in the main body of the table, whereas those to the right of the ~ form the row groupings of the table in the order they were specified.
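The formula form can be sketched as follows, with water as the column variable and plant as the row variable. The pw data frame is hypothetical (only its first seven rows follow the listing earlier in this tutorial), and the names ft and ft2 are illustrative.

```r
# Hypothetical plant-watering data (rows 8-12 are invented for illustration)
pw <- data.frame(
  height = c(9, 11, 6, 14, 17, 19, 28, 6, 7, 14, 15, 30),
  plant  = c(rep("vulgaris", 7), rep("sativa", 5)),
  water  = c("lo", "lo", "lo", "mid", "mid", "mid", "hi",
             "lo", "lo", "mid", "mid", "hi")
)

# Left of ~ : column variable; right of ~ : row variable
ft <- ftable(water ~ plant, data = pw)
print(ft)

# Two row variables, joined with a plus sign, are nested in the order given
ft2 <- ftable(water ~ plant + height, data = pw)
print(ft2)

# The row and column variables are stored as attributes of the flat table
print(names(attr(ft2, "row.vars")))
print(names(attr(ft2, "col.vars")))
```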
10.2. Making Selective “Flat” Contingency Tables in R
Selective “flat” contingency tables can also be created in R. However, this is not a straightforward process. The commands below show a complete process for creating a selective contingency table.
> with(pw, ftable(height == 14, water, plant)) – Creates a “flat” contingency table with a conditional column
> with(pw, ftable(height == 14, water == 'hi', plant)) – Adds an additional condition to another column
> pw.t = pw[which(pw$height == 14),] – Makes a new data object as a subset of the original data
> with(pw.t, ftable(height, plant, water)) – Creates a “flat” table from the new data
When you insert a conditional column into the ftable() command, the resulting contingency table includes data for both the TRUE and the FALSE results of the condition. By adding conditional statements for other columns (or more complex conditional statements for a single column), you produce more TRUE and FALSE results.
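The two approaches above, a conditional column versus a subset of the data, are contrasted in this sketch. The pw data frame is hypothetical (only its first seven rows follow the listing earlier in this tutorial), and the names ft.cond and ft.sub are illustrative.

```r
# Hypothetical plant-watering data (rows 8-12 are invented for illustration)
pw <- data.frame(
  height = c(9, 11, 6, 14, 17, 19, 28, 6, 7, 14, 15, 30),
  plant  = c(rep("vulgaris", 7), rep("sativa", 5)),
  water  = c("lo", "lo", "lo", "mid", "mid", "mid", "hi",
             "lo", "lo", "mid", "mid", "hi")
)

# Conditional column: every observation is kept, split into FALSE and TRUE
ft.cond <- with(pw, ftable(height == 14, water, plant))
print(ft.cond)

# Subset first: only the observations that meet the condition are tabulated
pw.t <- pw[which(pw$height == 14), ]
ft.sub <- with(pw.t, ftable(height, plant, water))
print(ft.sub)
```

The conditional version still counts all twelve observations, whereas the subset version counts only the rows where height is 14.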
11. Testing R Flat Table Objects
To see what kind of object you are dealing with, you can use the class() command. It gives a label for each kind of object. R uses the class of an object to determine how to handle it; you can use class() both to find out what an object is and to set the class of an object.
The command you can use for testing flat table objects is as follows:
> if(class(gr.t) == 'ftable') TRUE else FALSE
[1] TRUE
The preceding command checks whether the class is “ftable”; if it is, you get a TRUE result; otherwise you get a FALSE result.
12. R Summary Commands for Tables
A table is a way to summarize data and is often the end point of an operation, for example, making a contingency table. However, you may also want to perform certain actions on the table itself.
Some useful summary commands for tables are as follows:
- prop.table(x, margin = NULL) – Returns the contents of a data frame, matrix, or table as proportions of the total for the specified margin. The default uses the grand total, margin = 1 uses row totals, and margin = 2 uses column totals.
You can use the prop.table() command to display the table data as proportions of the total sum. You can add an index for the rows or columns; in this way, you can express the data in your table as proportions of the various row or column sums.
- addmargins(A, margin = c(1, 2), FUN = sum) – Applies a function to the rows and/or columns of a matrix or table and appends the results as extra margins. The addmargins() command enables you to use any function on rows or columns. The margin part defaults to both rows and columns, and the function applied defaults to sum. A value of 1 refers to the rows: the function is applied across the row items.
Essentially, you get an extra row of results. In most situations, you are going to use the function to produce summaries for both rows and columns.
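A minimal sketch of both summary commands on a small plant-by-water table follows. The pw data frame is hypothetical (only its first seven rows follow the listing earlier in this tutorial), and the object names are illustrative.

```r
# Hypothetical plant-watering data (rows 8-12 are invented for illustration)
pw <- data.frame(
  height = c(9, 11, 6, 14, 17, 19, 28, 6, 7, 14, 15, 30),
  plant  = c(rep("vulgaris", 7), rep("sativa", 5)),
  water  = c("lo", "lo", "lo", "mid", "mid", "mid", "hi",
             "lo", "lo", "mid", "mid", "hi")
)

# 2 x 3 contingency table: plants as rows, water treatments as columns
tab <- with(pw, table(plant, water))

p.all <- prop.table(tab)       # proportions of the grand total
p.row <- prop.table(tab, 1)    # proportions of each row total
print(p.all)
print(p.row)

with.both <- addmargins(tab)     # adds a Sum row and a Sum column
with.row  <- addmargins(tab, 1)  # adds only the extra row of column sums
print(with.both)
print(with.row)
```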
13. Cross Tabulation in R
Cross-tabulation in R means reassembling raw data into a tabular format.
For cross-tabulation, you can use the xtabs() command, which takes a formula followed by the name of the data object.
Notice that the tilde (~) symbol is used much as it is in the ftable() command. To the left of the ~ put the name of the frequency data; to the right, put the categories you want to cross-tabulate, separated by plus signs. The first variable after the ~ forms the row categories and the next variable you type forms the column categories.
At the end, type the name of the data object (so that R can “find” the variables).
For bird observation data, this command can be used as follows:
> birds.t = xtabs(Qty ~ Species + Habitat, data = birds)
> birds.t
               Habitat
Species         Garden Hedgerow Parkland Pasture Woodland
  Blackbird         47       10       40       2        2
  Chaffinch         19        3        5       0        2
  Great Tit         50        0       10       7        0
  House Sparrow     46       16        8       4        0
  Robin              9        3        0       0        2
  Song Thrush        4        0        6       0        0
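To make the input format concrete, the sketch below builds a small data frame in "long" form, with one row per species and habitat combination, and cross-tabulates it. Only five of the observations from the table above are included, so this is an illustrative subset rather than the full birds data.

```r
# Long-format frequency data: one row per observed species/habitat pair
birds <- data.frame(
  Species = c("Blackbird", "Blackbird", "Chaffinch", "Robin", "Robin"),
  Habitat = c("Garden", "Hedgerow", "Garden", "Garden", "Woodland"),
  Qty     = c(47, 10, 19, 9, 2)
)

# Frequency column on the left of ~, row then column categories on the right
birds.t <- xtabs(Qty ~ Species + Habitat, data = birds)
print(birds.t)

# Combinations that never occur in the data are filled in with zero
print(birds.t["Chaffinch", "Woodland"])
```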
14. Testing Cross-Table (xtabs) Objects
When you use the xtabs() command, the object you create is a kind of table and gives a TRUE result using the is.table() command. It also gives a TRUE result if you use the is.matrix() command.
As far as R is concerned, the object holds two classes. You can see this using the class() command:
> class(birds.t)
[1] "xtabs" "table"
If you want to test for the object being an xtabs object, you may face an issue because the class result now has two elements:
> if(class(birds.t) == 'xtabs') TRUE else FALSE
[1] TRUE
Warning message:
In if (class(birds.t) == "xtabs") TRUE else FALSE :
  the condition has length > 1 and only the first element will be used
Although a result is there, it comes with a warning message (in R 4.2.0 and later, a condition of length greater than one is an error rather than a warning). The result is TRUE only because "xtabs" happens to be the first element; some refinement is required to scan the entire result and pick out the element desired. This can be done with the following:
> if(any(class(birds.t) == 'xtabs')) TRUE else FALSE
[1] TRUE
The any() command enables matching any of the elements in a vector. In this case, the upshot is that you will pick out the xtabs item even if it is not the first in the bunch.
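The class test can be sketched on a tiny xtabs object. The data frame obs and its columns are made up for illustration. The inherits() function is swapped in here as an alternative to the any() approach described above; it checks every element of the class vector and always returns a single TRUE or FALSE.

```r
# Tiny made-up frequency data, just to obtain an xtabs object
obs <- data.frame(g = c("a", "b", "a"), n = c(1, 2, 3))
x <- xtabs(n ~ g, data = obs)

# The class vector has two elements
print(class(x))

# The approach from the text: match any element of the class vector
is.xtabs.any <- any(class(x) == "xtabs")

# Alternative: inherits() returns one logical value, safe to use inside if()
is.xtabs.inh <- inherits(x, "xtabs")

if (is.xtabs.inh) print("x is an xtabs object") else print("x is something else")
```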
15. Recreating Original Data from a Contingency Table
You can reassemble an xtabs object into a data frame using the as.data.frame() command:
> as.data.frame(birds.t)
        Species Habitat Freq
1     Blackbird  Garden   47
2     Chaffinch  Garden   19
3     Great Tit  Garden   50
4 House Sparrow  Garden   46
The original data has been recreated with minor differences: the Qty column has been renamed Freq, and the rows with zero frequency are included. Neither is a significant issue. The name of the Freq column can be set with the responseName instruction:
> as.data.frame(birds.t, responseName = 'Qty')
If you want to remove the zero data, you need to take your new data frame and select those rows with a Freq greater than zero:
> birds.td = as.data.frame(birds.t)
> birds.td = birds.td[which(birds.td$Freq > 0),]
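The round trip from long data to a cross-table and back can be sketched as follows. The birds data frame here is an illustrative five-row subset of the observations shown earlier, not the full data set, and the names birds.all and birds.td are hypothetical.

```r
# Illustrative long-format data (a subset of the bird observations)
birds <- data.frame(
  Species = c("Blackbird", "Blackbird", "Chaffinch", "Robin", "Robin"),
  Habitat = c("Garden", "Hedgerow", "Garden", "Garden", "Woodland"),
  Qty     = c(47, 10, 19, 9, 2)
)
birds.t <- xtabs(Qty ~ Species + Habitat, data = birds)

# Back to a data frame: every cell of the table becomes a row, zeros included
birds.all <- as.data.frame(birds.t, responseName = "Qty")
print(nrow(birds.all))

# Drop the zero rows to get back to the observed combinations only
birds.td <- birds.all[birds.all$Qty > 0, ]
print(birds.td)
```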
16. Switching Class in R
You can use the class() command both to see the current class of an object and to alter it. This can be useful in instances where an object needs to be in a certain class for a command to operate. In the following example, the bird object is queried and then reset using the class() command:
> class(bird)
[1] "matrix"
> class(bird) = 'table'
The matrix of bird observations is now classed as a table. You can now proceed to create a data frame from the table using the as.data.frame() command:
> bird.df = as.data.frame(bird)
Var1 Var2 Freq
The columns are not labeled appropriately and the zero data are still present in the result of the as.data.frame() command. You can fix this by using the names() command and then reconstructing the data without the zero rows, as follows:
> bird.tt = bird
> class(bird.tt) = 'table'
> bird.tt = as.data.frame(bird.tt)
> names(bird.tt) = c('Species', 'Habitat', 'Qty')
> bird.tt = bird.tt[which(bird.tt$Qty > 0),]
> rownames(bird.tt) = as.numeric(1:length(rownames(bird.tt)))
The first command simply creates a duplicate matrix to work on, keeping the original intact. The second command changes the class to “table”. The third command creates the data frame of original values.
The fourth command alters the names of the columns. The penultimate command selects out the data that are greater than zero, effectively deleting 0 observations. The final command reinstates the row index labels to a continuous sequence.
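The six steps described above can be run on a cut-down example. The bird matrix here holds only two species and two habitats (values taken from the bird listing earlier in this tutorial), and the intermediate name raw.df is hypothetical.

```r
# Cut-down bird matrix: two species, two habitats
bird <- matrix(
  c(47, 2,
    19, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Blackbird", "Chaffinch"), c("Garden", "Pasture"))
)

bird.tt <- bird                      # work on a copy, keep the original intact
class(bird.tt) <- "table"            # switch the class from matrix to table
raw.df <- as.data.frame(bird.tt)     # default column names: Var1, Var2, Freq
print(names(raw.df))

bird.tt <- raw.df
names(bird.tt) <- c("Species", "Habitat", "Qty")   # meaningful column names
bird.tt <- bird.tt[which(bird.tt$Qty > 0), ]       # drop zero observations
rownames(bird.tt) <- 1:nrow(bird.tt)               # continuous row index
print(bird.tt)
```

Here as.table(bird) would give the same table object without setting the class by hand, which may be the safer choice when the input is already a numeric matrix.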
If you have any questions about contingency tables in R, leave a comment in the section below and we will be happy to answer them.