r:contingency-tables

r:contingency-tables [2020/04/23 14:16] – [Reordering rows and columns of a matrix] astefanowitsch
r:contingency-tables [2024/06/20 13:53] (current) – external edit 127.0.0.1
 +**[ [[R:introduction|Collection: Introduction to R]] ]**
  
 +====== Contingency tables ======
 +
 +===== Introduction =====
 +
 +Contingency tables (also called //cross tabulations//) are tables showing the intersections of two variables. For example, there are two variants of the preposition //toward(s)// (“in the direction of”): one with an //s// at the end and one without. There are several national varieties of English, very prominent among them British English and American English. Both variants of the preposition occur in both varieties, so we have two variables: Variant of the Preposition (with the values //toward// and //towards//) and Variety of English (with the values British and American). This gives us four intersections: British ∩ //toward//, British ∩ //towards//, American ∩ //toward// and American ∩ //towards//. If we check the frequency of these intersections in the LOB corpus (British English) and the BROWN corpus (American English) and represent the results as a contingency table, we get the following:
 +
 +^ ^ //towards// ^ //toward// ^ Total ^
 +^ British |  **318**  |  **14**  |  332  |
 +^ American |  **64**  |  **386**  |  450  |
 +^ Total |  382  |  400  |  782  |
 +
 +===== Prerequisites =====
 +
 +  * There are two ways of creating a contingency table, shown below. For the second of these ways, your data must be in the form of a data frame (see [[r:data-frames|Data Frames]]).
 +
 +===== How to create a contingency table =====
 +
 +There are two ways of creating a contingency table in R: you can enter the values manually, or you can create the table from a raw data list in the form of a data frame.
 +
 +==== Creating a contingency table manually ====
 +
 +In order to create a contingency table manually, you first have to create a vector (see [[r:vectors|Vectors]]) containing the values, and store this vector in a variable. For the table above, this vector would look like this (if we call the variable ''myvector'' -- of course, we can give it any name we want):
 +
 + c(318, 64, 14, 386) -> myvector
 +
 +These are the values in the first column followed by those in the second column. The totals are not part of the table; if we need them, we can have R calculate them later.
 +
 +The first step in transforming this vector to a table is to use the command ''matrix()'', which takes a vector as input and transforms it to a table with a certain number of columns, specified using the ''ncol'' option. In our case, this would look as follows (if we call the variable ''mytable''):
 +
 + matrix(myvector, ncol=2) -> mytable
 +
 +If you display this variable (by typing ''mytable'' and hitting return), you get the following:
 +
 +      [,1] [,2]
 + [1,]  318   14
 + [2,]   64  386
 +
 +The values are displayed in the right way, but the rows and columns do not have names yet. You can refer to them by using the indices shown: for example, to display the first row of the table, type ''mytable[1,]'', to display the second column, type ''mytable[,2]'', and to display a specific cell, give both the row and the column number, e.g. ''mytable[1,2]'' to display the second cell in the first row.
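The steps so far can be sketched as a single script. R fills a matrix column by column by default; the ''byrow'' option of ''matrix()'' is not used in the text above and is shown here only as an alternative for entering the values row by row. The variable names ''by_column'' and ''by_row'' are chosen for illustration.

```r
# Frequencies from the introduction, entered column by column
myvector <- c(318, 64, 14, 386)
by_column <- matrix(myvector, ncol = 2)

# The same table, entered row by row (as you would read it off the page)
by_row <- matrix(c(318, 14, 64, 386), ncol = 2, byrow = TRUE)

# Check that both ways produce the same matrix
identical(by_column, by_row)

# Accessing parts of the table by index
by_column[1, ]   # first row
by_column[, 2]   # second column
by_column[1, 2]  # second cell in the first row
```

Entering the values row by row can be less error-prone when you copy them from a printed table, since it follows the normal reading order.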
 +
 +Strictly speaking, this is all you need to use this contingency table in other contexts, but you may want to add row and column labels so that you and others know what information is contained in this table. To add row and column labels, you use the functions ''rownames()'' and ''colnames()'': as their names suggest, these functions provide access to the parts of a contingency table that contain the row and column labels, so you can simply construct a vector that contains the correct number of text strings and assign this vector to the relevant part of the table:
 +
 + rownames(mytable) <- c("British", "American")
 + colnames(mytable) <- c("towards", "toward")
 +
 +If you now display the table (by typing ''mytable'' and hitting return), you get the following:
 +
 +          towards toward
 + British      318     14
 + American      64    386
 +
 +You can still refer to the columns, rows and cells in the way just described, but you can also use the labels instead of numbers (you need to put them in quotation marks, as they are text strings). For example, to display the first row of the table, you can type ''mytable["British",]'', to display the second column, you can type ''mytable[,"toward"]'', and to display a specific cell, give both the row and the column label, e.g. ''mytable["British","toward"]'' to display the second cell in the first row.
 +
 +==== From a data frame ====
 +
 +If you have imported a raw data table as a data frame (see [[r:data-frames|Data Frames]]), you can crosstabulate two columns of this data frame to create a contingency table. There is a sample csv file containing the distribution of the word forms //toward// and //towards// across different genres in British and American English (from the LOB and BROWN corpora) here: {{ :r:data-towards.zip |data-towards.csv}}. Import it into a data frame called ''Toward'' (as described in [[r:importing-data|Importing Data]]).
 +
 +You can now create a contingency table using the ''table()'' command, which needs two columns from the data frame as input. To see which columns are available, first use the command ''head()'' to display the first few rows of the data frame:
 +
 + head(Toward)
 +
 +You will see the following:
 +
 +   Variety           Genre Variant
 + 1 British Press_Reportage towards
 + 2 British Press_Reportage towards
 + 3 British Press_Reportage  toward
 + 4 British Press_Reportage towards
 + 5 British Press_Reportage towards
 + 6 British Press_Reportage towards
 +
 +The first and the third column are relevant to our contingency table. They can be referred to by ''Toward$Variety'' and ''Toward$Variant'', so the following command will produce a contingency table:
 +
 + table(Toward$Variety,Toward$Variant) -> mytable
 +
 +Type ''mytable'' to display it:
 +   
 +          toward towards
 + American    386      64
 + British      14     318
 +
 +As you can see, this is the same table you created manually in the preceding section, but the rows and columns are ordered differently: the ''table'' command orders rows and columns alphabetically. If you don't like this order, you can reorder them (see below).
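If the sample file is not at hand, the following sketch rebuilds a data frame with the same two columns from the four frequencies shown above and cross-tabulates it. The ''Genre'' column is left out, and generating the rows with ''rep()'' is only a stand-in for importing the real csv file.

```r
# Stand-in for the imported data: one row per corpus hit,
# generated from the four frequencies in the table above
Toward <- data.frame(
  Variety = rep(c("American", "British"), times = c(450, 332)),
  Variant = c(rep(c("toward", "towards"), times = c(386, 64)),
              rep(c("toward", "towards"), times = c(14, 318)))
)

# Cross-tabulate the two columns
mytable <- table(Toward$Variety, Toward$Variant)
mytable

# table() also accepts named arguments, which label the two dimensions
table(Variety = Toward$Variety, Variant = Toward$Variant)
```

Naming the arguments makes the printed table easier to read, because the variable names appear above the row and column labels.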
 +
 +===== Adding rows and columns to a matrix =====
 +
 +==== Adding data ====
 +
 +You can add rows or columns to an existing matrix, no matter how you created it. For rows, this is done by using the ''rbind()'' command. First, create a variable containing a vector with the numbers you want to add as a row, and name this variable as you want the new row to be named. For example, to add the frequencies of //toward// and //towards// in Indian English to the table (the data are from the KOLHAPUR corpus):
 +
 + Indian <- c(18,337)
 +
 +The ''rbind()'' command needs two arguments: the matrix to which you want to add a row, and the vector containing the row you want to add. Let us write the result to the same variable ''mytable'':
 +
 + rbind(mytable, Indian) -> mytable
 +
 +If you now type ''mytable'', you will get the following:
 +
 +          toward towards
 + American    386      64
 + British      14     318
 + Indian       18     337
 +
 +The ''cbind()'' command works in the same way. For example, to add a column containing the frequencies of the expression //in the direction of//, we create a corresponding variable (the frequencies are from BROWN, LOB and KOLHAPUR):
 +
 + in_the_direction_of <- c(11,12,11)
 +
 +We then add this column to our table:
 +
 + cbind(mytable, in_the_direction_of) -> mytable
 +
 +Typing ''mytable'' gives us:
 +
 +          toward towards in_the_direction_of
 + American    386      64                  11
 + British      14     318                  12
 + Indian       18     337                  11
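As a sketch of an alternative to creating separate variables first: ''rbind()'' and ''cbind()'' also accept named arguments, in which case the argument name becomes the row or column label. The starting matrix is rebuilt here from the frequencies above (using the ''dimnames'' option of ''matrix()'') so that the example runs on its own.

```r
# Starting point: the two-by-two table from the previous section
mytable <- matrix(c(386, 14, 64, 318), ncol = 2,
                  dimnames = list(c("American", "British"),
                                  c("toward", "towards")))

# Add a row; the argument name "Indian" becomes the row label
mytable <- rbind(mytable, Indian = c(18, 337))

# Add a column in the same way
mytable <- cbind(mytable, in_the_direction_of = c(11, 12, 11))

mytable
dim(mytable)  # number of rows and columns
```

The vector you add must contain one value per existing column (for ''rbind()'') or per existing row (for ''cbind()''), in the same order as the existing labels.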
 +
 +==== Adding row and column totals ====
 +
 +As mentioned above, your contingency table should contain only the intersections of your variables (shown in bold in the introduction), not the row totals, column totals and table total. Statistical procedures expect a matrix to contain only data; if the totals are needed, they will be calculated internally. Also, if you want to create a bar plot from a contingency table (see [[r:bar-plots|Bar Plots]]), it should not contain any totals. However, when you show a table in a research report, it should contain totals, so here is how to add them.
 +
 +R has special commands for creating these totals: ''rowSums()'' and ''colSums()'', which take a matrix as an argument and return a named vector containing the row or column totals. For example, typing ''rowSums(mytable)'' produces the following output:
 +
 + American  British   Indian 
 +      461      344      366 
 +
 +Let us add these row totals to our table using the ''cbind()'' command (note that the //row// totals must be added as a //column// and vice versa) and store the result in a new variable ''mytable_totals'':
 +
 + cbind(mytable,rowSums(mytable)) -> mytable_totals
 +
 +Typing ''mytable_totals'' displays the following:
 +
 +          toward towards in_the_direction_of    
 + American    386      64                  11 461
 + British      14     318                  12 344
 + Indian       18     337                  11 366
 +
 +If you want to add the column name //Total//, use the ''colnames()'' command introduced above. Since you only want to change the fourth position, attach ''[4]'' to the end:
 +
 + colnames(mytable_totals)[4] <- "Total"
 +
 +Now, let us add the column totals to ''mytable_totals'' as a new row, storing the result in the same variable:
 +
 + rbind(mytable_totals,colSums(mytable_totals)) -> mytable_totals
 +
 +Let's add the row name //Total// using ''rownames()'':
 +
 + rownames(mytable_totals)[4] <- "Total"
 +
 +Typing ''mytable_totals'' now gives us the following:
 +
 +          toward towards in_the_direction_of Total
 + American    386      64                  11   461
 + British      14     318                  12   344
 + Indian       18     337                  11   366
 + Total       418     719                  34  1171
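The result can be cross-checked with ''addmargins()'', a function from R's ''stats'' package that is not covered in the text above and that adds both kinds of totals in one step (labelling them ''Sum'' by default). This is only a sketch; the matrix is rebuilt from the frequencies above so that the example runs on its own, and the name ''with_margins'' is chosen for illustration.

```r
# The table without totals, rebuilt from the frequencies above
mytable <- rbind(American = c(386, 64, 11),
                 British  = c(14, 318, 12),
                 Indian   = c(18, 337, 11))
colnames(mytable) <- c("toward", "towards", "in_the_direction_of")

# Step-by-step version: row totals as a column, then column totals as a row
mytable_totals <- cbind(mytable, Total = rowSums(mytable))
mytable_totals <- rbind(mytable_totals, Total = colSums(mytable_totals))

# One-step version; as.table() converts the matrix to a table object
with_margins <- addmargins(as.table(mytable))

mytable_totals
with_margins
```

Keeping the totals in a separate variable, as in both versions here, leaves ''mytable'' itself free of totals for statistical tests and plots.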
 +
 +===== Reordering rows and columns of a matrix =====
 +
 +As mentioned above, ''table()'' puts your rows and columns in alphabetical order. If you want them in a different order, there are various ways of doing so, none of them very straightforward, but also not very complicated. The easiest way is to exploit the possibility of accessing individual cells by giving the row and column number in square brackets, as shown above.
 +
 +Instead of giving an individual row and column number, you can give a vector of numbers. For example, to display the top left cell of our table (the cell in the first row and first column), you type ''mytable[1,1]''; to display all cells of our table, you type ''mytable[c(1,2,3), c(1,2,3)]'' (try it).
 +
 +Now, if you want rows and/or columns displayed in a different order, you simply order them differently in the vectors. For example, the order in the variable ''mytable'' is as follows:
 +
 +          toward towards in_the_direction_of
 + American    386      64                  11
 + British      14     318                  12
 + Indian       18     337                  11
 +
 +If you want to change the order of rows to //British//, //Indian//, //American//, and the order of columns to //in_the_direction_of//, //towards//, //toward//, type:
 +
 + mytable[c(2,3,1), c(3,2,1)]
 +
 +This gives you:
 +
 +          in_the_direction_of towards toward
 + British                   12     318     14
 + Indian                    11     337     18
 + American                  11      64    386
 +
 +Of course, you can store this new order in a variable, if you want.
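Since rows and columns can also be addressed by their labels, the same reordering can be written with vectors of labels instead of numbers. The following sketch rebuilds the table from the frequencies above so that it runs on its own; the names ''by_number'' and ''by_label'' are chosen for illustration.

```r
# The table in its original (alphabetical) row order
mytable <- rbind(American = c(386, 64, 11),
                 British  = c(14, 318, 12),
                 Indian   = c(18, 337, 11))
colnames(mytable) <- c("toward", "towards", "in_the_direction_of")

# Reordering with index numbers, as described above
by_number <- mytable[c(2, 3, 1), c(3, 2, 1)]

# The same reordering with labels
by_label <- mytable[c("British", "Indian", "American"),
                    c("in_the_direction_of", "towards", "toward")]

# Check that both versions give the same result
identical(by_number, by_label)
```

Using labels is more verbose, but it keeps working if rows or columns are later added to the table in a different position.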
 +
 +===== Additional information =====
 +
 +  * Matrices in R behave much like matrices in the mathematical sense, for example in addition and subtraction. Note, however, that ''*'' multiplies two matrices cell by cell; matrix multiplication in the mathematical sense uses the operator ''%*%''.
 +  * You can transpose a matrix (i.e., turn its rows into columns and vice versa) using the command ''t()'', with the matrix as argument.
 +  * The best way of displaying the content of a contingency table visually is usually a bar plot (see [[r:bar-plots|Bar Plots]]).
 +  * There are various statistical tests for contingency tables that use a matrix as input in R, for example, the famous chi-square test (see [[r:chi-square-test|Chi-Square Test]]).
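A minimal sketch of the matrix operations mentioned in this list, using the frequencies of //toward// and //towards// from above; the variable name ''m'' is chosen for illustration.

```r
# A 3 x 2 matrix of frequencies
m <- rbind(American = c(386, 64),
           British  = c(14, 318),
           Indian   = c(18, 337))
colnames(m) <- c("toward", "towards")

# Transposing: rows become columns and vice versa
t(m)
dim(t(m))

# Addition and "*" work cell by cell
m + m
m * m

# Matrix multiplication in the mathematical sense uses %*%
# (here a 2 x 3 matrix times a 3 x 2 matrix)
t(m) %*% m
```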
