I have a dataset that looks something like this:
   name  status
 1 john    sick
 2 john    sick
 3 john healthy
 4 john    sick
 5 john healthy
 6 alex    sick
 7 alex    sick
 8  tim healthy
 9  tim healthy
10  tim    sick
11  tim    sick
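To make the example reproducible, the data above can be rebuilt as follows. This is a sketch that assumes status is stored as character and that rows are already in time order within each person. The name df1 matches the first answer below; the second answer calls the same data frame data.

```r
# Rebuild the question's data (assumption: rows are in chronological
# order within each person, which the transition counts rely on)
df1 <- data.frame(
  name   = c(rep("john", 5), rep("alex", 2), rep("tim", 4)),
  status = c("sick", "sick", "healthy", "sick", "healthy",  # john
             "sick", "sick",                                # alex
             "healthy", "healthy", "sick", "sick"),         # tim
  stringsAsFactors = FALSE
)
```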
For this dataset, I want to find out the number of times people went from:
- sick to sick
- sick to healthy
- healthy to healthy
- healthy to sick
For example:
- Sick to sick: John (sick, sick), Alex (sick, sick), Tim (sick, sick) = occurs 3 times in the dataset
- Sick to healthy: John (sick, healthy), John (sick, healthy) = occurs 2 times in the dataset
- Healthy to healthy: Tim (healthy, healthy) = occurs 1 time in the dataset
- Healthy to sick: John (healthy, sick), Tim (healthy, sick) = occurs 2 times in the dataset
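Each count above comes from pairing every status with the next status recorded for the same person, so a person with k rows contributes k - 1 transitions (John 4, Alex 1, Tim 3, 8 in total, which matches 3 + 2 + 1 + 2). A minimal sketch of that pairing for John's rows, using base R's head() and tail(); the vector name john is hypothetical:

```r
# John's statuses in time order, copied from the data above
john <- c("sick", "sick", "healthy", "sick", "healthy")

# Each transition pairs a status with the one that follows it
from <- head(john, -1)  # every status except the last
to   <- tail(john, -1)  # every status except the first
transitions <- paste(from, to, sep = " -> ")

# Tally how often each transition occurs for John
table(transitions)
```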
I am not sure how to approach this problem in R. Can someone please suggest how to do this?
Thank you!
Answer 1:
I would approach this using dplyr::lag() and count():
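library(dplyr)
library(tidyr)  # for drop_na()

df1 %>%
  group_by(name) %>%
  # previous status of the same person; NA on each person's first row
  # (assumes rows are in chronological order within each person)
  mutate(from = dplyr::lag(status)) %>%
  ungroup() %>%
  # tally each (from, to) pair
  count(from, to = status) %>%
  # drop the first-row pairs, which have no previous status
  drop_na()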
Output:
# A tibble: 4 × 3
  from    to          n
  <chr>   <chr>   <int>
1 healthy healthy     1
2 healthy sick        2
3 sick    healthy     2
4 sick    sick        3
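If the two-dimensional layout is preferred, the long output above can be reshaped with base R's xtabs(). A minimal sketch: counts is a hypothetical name for the tibble printed above, rebuilt here from the printed values so the snippet runs on its own; in practice you would assign the result of the pipeline to it.

```r
# Long-format counts, copied from the output above
# (hypothetical name; normally: counts <- df1 %>% ... %>% drop_na())
counts <- data.frame(
  from = c("healthy", "healthy", "sick", "sick"),
  to   = c("healthy", "sick", "healthy", "sick"),
  n    = c(1L, 2L, 2L, 3L)
)

# Sum n over each combination of from (rows) and to (columns)
ct <- xtabs(n ~ from + to, data = counts)
ct
```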
Answer 2:
Technically, a contingency table is a table with entries in two (or more) dimensions, so here the counts can be laid out with from in the rows and to in the columns (base R only):
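# `data` is the question's data frame, in time order within each name
ct <- table(
  do.call(rbind, by(data, data$name, function(x)
    # pair each status with the next one for the same person
    data.frame(from = head(x$status, -1), to = tail(x$status, -1)))))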
ct
#>          to
#> from      healthy sick
#>   healthy       1    2
#>   sick          2    3
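To get back the long format of the first answer (one row per from/to pair), the table can be flattened with as.data.frame(), which names the count column Freq. A self-contained sketch that rebuilds the question's data and reuses the code above; the name long is hypothetical:

```r
# Question's data, assumed to be in time order within each name
data <- data.frame(
  name   = c(rep("john", 5), rep("alex", 2), rep("tim", 4)),
  status = c("sick", "sick", "healthy", "sick", "healthy",
             "sick", "sick",
             "healthy", "healthy", "sick", "sick"),
  stringsAsFactors = FALSE
)

# Contingency table of consecutive (from, to) pairs, as in the answer above
ct <- table(
  do.call(rbind, by(data, data$name, function(x)
    data.frame(from = head(x$status, -1), to = tail(x$status, -1)))))

# One row per cell of the 2 x 2 table; the count column is named Freq
long <- as.data.frame(ct)
long
```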