Here's a reprex for illustration.
library(tidyverse)
set.seed(1337)
df <- tibble(
  date_visit = sample(
    seq(as.Date("2020/01/01"), as.Date("2021/01/01"), by = "day"),
    400,
    replace = TRUE
  ),
  patient_id = as.factor(paste("patient", sample(seq(1, 13), 400, replace = TRUE), sep = "_")),
  type_of_visit = as.factor(sample(c("medical", "veterinary"), 400, replace = TRUE))
)
What I'm trying to do is create a data frame that keeps the patient_id (via group_by, I assume) and the visit types for any patient who has had 2 different visits in less than 24 hours. Alternatively, I could add a variable that is TRUE/FALSE depending on whether that condition is met.
I tried a left join on patient_id so I could compare the visits as 2 separate variables, but that takes too much computing time (my original data frame is much longer than this one).
Can someone point me in the right direction?
Thank you
CodePudding user response:
Maybe this will help -
library(dplyr)
df %>%
  group_by(patient_id, date_visit) %>%
  summarise(flag = n_distinct(type_of_visit) >= 2) %>%
  summarise(flag = any(flag))
# patient_id flag
# <fct> <lgl>
# 1 patient_1 TRUE
# 2 patient_10 FALSE
# 3 patient_11 TRUE
# 4 patient_12 FALSE
# 5 patient_13 FALSE
# 6 patient_2 FALSE
# 7 patient_3 FALSE
# 8 patient_4 FALSE
# 9 patient_5 TRUE
#10 patient_6 FALSE
#11 patient_7 TRUE
#12 patient_8 TRUE
#13 patient_9 TRUE
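If you want to keep all the original rows for those patient IDs, use mutate instead of summarise so that no rows or columns are collapsed:
df %>%
  group_by(patient_id, date_visit) %>%
  mutate(flag = n_distinct(type_of_visit) >= 2) %>%
  group_by(patient_id) %>%
  filter(any(flag)) %>%
  ungroup()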
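The question also asks for a TRUE/FALSE variable rather than a filtered result. Here is a minimal sketch of that variant on a small made-up tibble (`toy`, `same_day_mix` and `patient_flag` are hypothetical names, not from the question), assuming "2 different visits" means two different visit types on the same date. Patient "b" has two visits of the same type on one day, so it is not flagged.

```r
library(dplyr)

# Made-up data: "a" has both visit types on one day, "b" has the same type
# twice on one day, "c" has a single visit.
toy <- tibble(
  patient_id = c("a", "a", "b", "b", "c"),
  date_visit = as.Date(rep("2020-01-01", 5)),
  type_of_visit = c("medical", "veterinary", "medical", "medical", "medical")
)

toy_flagged <- toy %>%
  group_by(patient_id, date_visit) %>%
  # TRUE on every row of a patient-day that has at least two visit types
  mutate(same_day_mix = n_distinct(type_of_visit) >= 2) %>%
  group_by(patient_id) %>%
  # TRUE on every row of a patient who has at least one such day
  mutate(patient_flag = any(same_day_mix)) %>%
  ungroup()

toy_flagged
```

Because mutate keeps every row, the result can still be filtered or summarised later without going back to the original data.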
CodePudding user response:
library(tidyverse)
set.seed(1337)
df <- tibble(
  date_visit = sample(
    seq(as.Date("2020/01/01"), as.Date("2021/01/01"), by = "day"),
    400,
    replace = TRUE
  ),
  patient_id = as.factor(paste("patient", sample(seq(1, 13), 400, replace = TRUE), sep = "_")),
  type_of_visit = as.factor(sample(c("medical", "veterinary"), 400, replace = TRUE))
)
df
#> # A tibble: 400 x 3
#> date_visit patient_id type_of_visit
#> <date> <fct> <fct>
#> 1 2020-05-26 patient_11 medical
#> 2 2020-08-29 patient_4 medical
#> 3 2020-02-18 patient_6 medical
#> 4 2020-07-28 patient_9 veterinary
#> 5 2020-05-31 patient_9 veterinary
#> 6 2020-07-29 patient_1 veterinary
#> 7 2020-12-21 patient_11 veterinary
#> 8 2020-07-06 patient_9 veterinary
#> 9 2020-04-10 patient_3 medical
#> 10 2020-11-08 patient_12 medical
#> # … with 390 more rows
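df %>%
  group_by(patient_id, date_visit) %>%
  # the data only has dates, so "less than 24h" is treated as "same date"
  # n() == 2 keeps patient-days with exactly two visits, whatever their types;
  # use n_distinct(type_of_visit) >= 2 if the two visits must differ in type
  filter(n() == 2) %>%
  ungroup() %>%
  distinct(patient_id, type_of_visit)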
#> # A tibble: 15 x 2
#> patient_id type_of_visit
#> <fct> <fct>
#> 1 patient_9 veterinary
#> 2 patient_2 veterinary
#> 3 patient_11 medical
#> 4 patient_12 veterinary
#> 5 patient_2 medical
#> 6 patient_3 veterinary
#> 7 patient_5 veterinary
#> 8 patient_7 veterinary
#> 9 patient_6 veterinary
#> 10 patient_11 veterinary
#> 11 patient_9 medical
#> 12 patient_10 veterinary
#> 13 patient_5 medical
#> 14 patient_1 veterinary
#> 15 patient_3 medical
Created on 2021-10-07 by the reprex package (v2.0.1)
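Treating "less than 24h" as "same date" is forced by the example data, which stores dates without times. If the real data has visit timestamps, a stricter check is possible without a self-join. The sketch below is only an illustration: the POSIXct column `datetime_visit` and the four-row tibble `visits` are hypothetical and do not appear in the question. It sorts each patient's visits and compares every visit with the previous one.

```r
library(dplyr)

# Made-up data: patient_1 switches type after 11 hours (across midnight),
# patient_2 switches type after 48 hours.
visits <- tibble(
  patient_id = c("patient_1", "patient_1", "patient_2", "patient_2"),
  datetime_visit = as.POSIXct(
    c("2020-01-01 22:00:00", "2020-01-02 09:00:00",
      "2020-01-01 08:00:00", "2020-01-03 08:00:00"),
    tz = "UTC"
  ),
  type_of_visit = c("medical", "veterinary", "medical", "veterinary")
)

flagged <- visits %>%
  arrange(patient_id, datetime_visit) %>%
  group_by(patient_id) %>%
  mutate(
    # hours elapsed since this patient's previous visit (NA for the first one)
    hours_since_prev = as.numeric(
      difftime(datetime_visit, lag(datetime_visit), units = "hours")
    ),
    # TRUE when the previous visit was under 24 hours ago and of another type
    switched_within_24h = !is.na(hours_since_prev) &
      hours_since_prev < 24 &
      type_of_visit != lag(type_of_visit)
  ) %>%
  summarise(flag = any(switched_within_24h))

flagged
```

Comparing only consecutive visits is enough for the per-patient flag: if two visits of different types fall within 24 hours of each other, the type must change between some pair of consecutive visits inside that window. Note that patient_1 would be missed by a same-date grouping, because the two visits fall on different calendar days.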