I'm struggling with filtering options in R. I have this dataset:
patient_id period TREAT_CAT Outcome
1 -3228 days pre-treatment Pink
1 -3170 days pre-treatment Pink
1 100 days post-treatment Blue
1 200 days post-treatment Pink
2 -2900 days pre-treatment Blue
2 0 days post-treatment Pink
2 100 days post-treatment Pink
structure(list(patient_id = c(1, 1, 1, 1, 2, 2, 2),
period = structure(c(-3228, -3170, 100, 200, -2900,
0, 100), class = "difftime", units = "days"),
TREAT_CAT = structure(c( 1L, 1L, 2L, 2L, 1L, 2L, 2L), levels = c("pre-treatment", "post-treatment"), class = "factor"),
Outcome = c("Pink", "Pink", "Blue", "Pink",
"Blue", "Pink", "Pink"), row.names = c(19L, 26L, 24L, 3L, 7L, 29L, 20L), class = "data.frame")
I would like to filter the closest "period" to 0 for the pre-treatment group and the farest "period" from 0 on the post-treatment group.
I've tried something like this
df2 <- df %>%
group_by(patient_id) %>%
filter((TREAT_CAT=="pre-treatment" & period == min(period)) | (TREAT_CAT=="post-treatment" & period == max(period))) %>%
filter(n() == 2)
But obviously it gaves me the farest from 0 from both periods. I've also tried with max(period) for both groups but it's not working because the max (period) it's only happening for the post-treatment group, resulting in 0 variables.
I would expect something like
patient_id period TREAT_CAT Outcome
1 -3170 days pre-treatment Pink
1 200 days post-treatment Pink
2 -2900 days pre-treatment Blue
2 100 days post-treatment Pink
Could you please help?
Thanks in advance
CodePudding user response:
I assume that pre-treatment
is always negative. That way, you can slice_max()
on each group
df %>%
group_by(patient_id, TREAT_CAT) %>%
slice_max(period)
# A tibble: 4 × 4
# Groups: patient_id, TREAT_CAT [4]
patient_id period TREAT_CAT Outcome
<dbl> <drtn> <fct> <chr>
1 1 -3170 days pre-treatment Pink
2 1 200 days post-treatment Pink
3 2 -2900 days pre-treatment Blue
4 2 100 days post-treatment Pink