Creating a Contingency Table From Data in R

Time:10-27

I have a dataset that looks something like this:

   name  status
1  john    sick
2  john    sick
3  john healthy
4  john    sick
5  john healthy
6  alex    sick
7  alex    sick
8   tim healthy
9   tim healthy
10  tim    sick
11  tim    sick
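
To make the question reproducible, the printed data can be rebuilt as a data frame. This is only a sketch: the name `df1` is borrowed from the first answer below, and the columns are assumed to be plain character vectors rather than factors.

```r
# Rebuild the example data shown above.
# `df1` is an assumed name (it matches the first answer's code).
df1 <- data.frame(
  name   = c(rep("john", 5), rep("alex", 2), rep("tim", 4)),
  status = c("sick", "sick", "healthy", "sick", "healthy",  # john
             "sick", "sick",                                # alex
             "healthy", "healthy", "sick", "sick"),         # tim
  stringsAsFactors = FALSE
)

df1
```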

For this dataset, I want to count, across consecutive rows for the same person, the number of times people went from:

  • sick to sick
  • sick to healthy
  • healthy to healthy
  • healthy to sick

For example:

  • Sick to sick: John (sick, sick), Alex (sick, sick), Tim (sick, sick) = occurs 3 times in the dataset
  • Sick to healthy: John (sick, healthy), John (sick, healthy) = occurs 2 times in the dataset
  • Healthy to healthy: Tim (healthy, healthy) = occurs 1 time in the dataset
  • Healthy to sick: John (healthy, sick), Tim (healthy, sick) = occurs 2 times in the dataset
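
The pairs listed above can be enumerated directly in base R. The sketch below assumes the rows are already in time order and that each person's rows are contiguous. The names `name`, `status`, `same_person` and `transitions` are illustrative only.

```r
# Example data from the question, as two parallel vectors.
name   <- c(rep("john", 5), rep("alex", 2), rep("tim", 4))
status <- c("sick", "sick", "healthy", "sick", "healthy",
            "sick", "sick",
            "healthy", "healthy", "sick", "sick")

# Compare each row with the row after it; keep only pairs
# where both rows belong to the same person.
same_person <- head(name, -1) == tail(name, -1)

transitions <- data.frame(
  name = head(name, -1)[same_person],
  from = head(status, -1)[same_person],
  to   = tail(status, -1)[same_person]
)

# One row per transition, e.g. john: sick -> sick
transitions
```

Each row of `transitions` is one of the pairs written out by hand in the list, so tallying its `from` and `to` columns should reproduce those counts.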

I am not sure how to approach this problem in R - can someone please suggest how to do this?

Thank you!

CodePudding user response:

I would approach this using dplyr::lag() and count():

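library(dplyr)
library(tidyr)  # for drop_na()

df1 %>%
  group_by(name) %>%
  # previous status within each person; NA on a person's first row
  mutate(from = dplyr::lag(status)) %>%
  ungroup() %>%
  # tally each (from, to) pair
  count(from, to = status) %>%
  # drop the rows that have no previous status
  drop_na()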

Output:

# A tibble: 4 × 3
  from    to          n
  <chr>   <chr>   <int>
1 healthy healthy     1
2 healthy sick        2
3 sick    healthy     2
4 sick    sick        3

CodePudding user response:

Technically, a contingency table is a table with entries in two (or more) dimensions. Thus:

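# `data` is the data frame from the question.
# Within each person, pair every status with the one that follows it,
# then stack the pairs and cross-tabulate them.
ct <- table(
  do.call(rbind, by(data, data$name, function(x) 
    data.frame(from = head(x$status, -1), to = tail(x$status, -1)))))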

ct
#>          to
#> from      healthy sick
#>   healthy       1    2
#>   sick          2    3
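
The two answers give the same counts in different shapes. As a rough sketch, the long-format counts from the first answer can be reshaped into this two-way layout with `xtabs()` from base R. The data frame `counts` below is typed in by hand from the output shown earlier, and `wide` is an illustrative name.

```r
# Long-format counts, copied from the first answer's output.
counts <- data.frame(
  from = c("healthy", "healthy", "sick", "sick"),
  to   = c("healthy", "sick", "healthy", "sick"),
  n    = c(1, 2, 2, 3)
)

# Sum n within each (from, to) cell to get a two-way table.
wide <- xtabs(n ~ from + to, data = counts)

wide
```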