Keep rows that are within specific interval for different conditions and grouped by

Time:10-07

Here's a reprex for illustration.

library(tidyverse)

set.seed(1337)
df <- tibble(
  date_visit = sample(seq(as.Date("2020/01/01"),
    as.Date("2021/01/01"),
    by = "day"
  ), 400, replace = TRUE),
  patient_id = as.factor(paste("patient", sample(seq(1, 13), 400, replace = TRUE), sep = "_")),
  type_of_visit = as.factor(sample(c("medical", "veterinary"), 400, replace = TRUE))
)

What I'm trying to do is create a data frame that keeps the patient_id (grouped by, I assume) and the visit types if that patient has had 2 different types of visit less than 24 hours apart. Alternatively, I could add a TRUE/FALSE variable that says whether that condition is met.

I tried a left join by patient_id to work with 2 different variables, but that takes too much computing time (my original data frame is much longer than this).

Can someone point me in the right direction?

Thank you
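The reprex only stores dates, so "less than 24 hours" can only be read as "same date". If the real data has visit times, one join-free option is sketched below, assuming a hypothetical POSIXct column `visit_time` that is not in the reprex: sort each patient's visits by time and compare each visit only with the previous one using dplyr's `lag()`. Checking consecutive visits is enough: if two visits of different types are less than 24 hours apart, then somewhere between them two neighboring visits also differ in type, and their gap can only be shorter.

```r
library(dplyr)

# Hypothetical data: `visit_time` is an assumed POSIXct column holding the
# actual time of each visit (the reprex above only has dates).
visits <- tibble(
  patient_id    = c("p1", "p1", "p2", "p2", "p3", "p3"),
  visit_time    = as.POSIXct(c("2020-01-01 09:00:00", "2020-01-01 20:00:00",
                               "2020-01-01 09:00:00", "2020-01-02 10:00:00",
                               "2020-01-01 09:00:00", "2020-01-01 10:00:00"),
                             tz = "UTC"),
  type_of_visit = c("medical", "veterinary",
                    "medical", "veterinary",
                    "medical", "medical")
)

flags <- visits %>%
  arrange(patient_id, visit_time) %>%
  group_by(patient_id) %>%
  mutate(
    # hours since this patient's previous visit (NA for the first visit)
    gap_h = as.numeric(difftime(visit_time, lag(visit_time), units = "hours")),
    # TRUE when the previous visit was under 24 h ago and of a different type;
    # the first visit gives FALSE because !is.na(gap_h) is FALSE
    close_pair = !is.na(gap_h) & gap_h < 24 &
      type_of_visit != lag(type_of_visit)
  ) %>%
  # one row per patient: TRUE if any consecutive pair qualifies
  summarise(flag = any(close_pair))

flags
```

Here p1 is flagged (11 hours apart, different types), p2 is not (25 hours apart) and p3 is not (same type). The cost is one sort plus a single pass per patient, which should scale better than joining every visit to every other visit of the same patient.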

CodePudding user response:

Maybe this will help. Since the data only has dates, "less than 24 hours apart" is treated here as "on the same date":

library(dplyr)

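df %>%
  # one row per patient and date: TRUE if both visit types occur that day
  group_by(patient_id, date_visit) %>%
  summarise(flag = n_distinct(type_of_visit) >= 2) %>%
  # summarise() drops the last grouping level, so this collapses per patient
  summarise(flag = any(flag))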

#  patient_id flag 
#   <fct>      <lgl>
# 1 patient_1  TRUE 
# 2 patient_10 FALSE
# 3 patient_11 TRUE 
# 4 patient_12 FALSE
# 5 patient_13 FALSE
# 6 patient_2  FALSE
# 7 patient_3  FALSE
# 8 patient_4  FALSE
# 9 patient_5  TRUE 
#10 patient_6  FALSE
#11 patient_7  TRUE 
#12 patient_8  TRUE 
#13 patient_9  TRUE 
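The two chained `summarise()` calls rely on grouping being peeled off one level at a time. A small sketch on hypothetical toy data (not the reprex) makes that explicit with the `.groups` argument, which assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Hypothetical toy data: patient_a has both visit types on one date,
# patient_b has two visits on one date, but both are "medical".
toy <- tibble(
  patient_id    = c("patient_a", "patient_a", "patient_b", "patient_b"),
  date_visit    = as.Date(c("2020-03-01", "2020-03-01",
                            "2020-03-02", "2020-03-02")),
  type_of_visit = c("medical", "veterinary", "medical", "medical")
)

result <- toy %>%
  group_by(patient_id, date_visit) %>%
  # one row per patient and date; "drop_last" removes date_visit from the
  # grouping and keeps patient_id (what dplyr does by default here anyway,
  # stated explicitly to silence the grouping message)
  summarise(flag = n_distinct(type_of_visit) >= 2, .groups = "drop_last") %>%
  # still grouped by patient_id: one row per patient, TRUE if any date qualifies
  summarise(flag = any(flag))

result
```

patient_a ends up TRUE and patient_b FALSE, because two visits on the same date only count when their types differ.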

If you want to keep all the rows for those patients:

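df %>%
  group_by(patient_id, date_visit) %>%
  # flag every visit on a date that has both visit types
  mutate(flag = n_distinct(type_of_visit) >= 2) %>%
  # keep all visits of patients with at least one flagged date
  group_by(patient_id) %>%
  filter(any(flag)) %>%
  ungroup()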

CodePudding user response:

library(tidyverse)

set.seed(1337)
df <- tibble(
  date_visit = sample(seq(as.Date("2020/01/01"),
    as.Date("2021/01/01"),
    by = "day"
  ), 400, replace = TRUE),
  patient_id = as.factor(paste("patient", sample(seq(1, 13), 400, replace = TRUE), sep = "_")),
  type_of_visit = as.factor(sample(c("medical", "veterinary"), 400, replace = TRUE))
)
df
#> # A tibble: 400 x 3
#>    date_visit patient_id type_of_visit
#>    <date>     <fct>      <fct>        
#>  1 2020-05-26 patient_11 medical      
#>  2 2020-08-29 patient_4  medical      
#>  3 2020-02-18 patient_6  medical      
#>  4 2020-07-28 patient_9  veterinary   
#>  5 2020-05-31 patient_9  veterinary   
#>  6 2020-07-29 patient_1  veterinary   
#>  7 2020-12-21 patient_11 veterinary   
#>  8 2020-07-06 patient_9  veterinary   
#>  9 2020-04-10 patient_3  medical      
#> 10 2020-11-08 patient_12 medical      
#> # … with 390 more rows

df %>%
  group_by(patient_id, date_visit) %>%
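  # with date-only data, "less than 24h apart" is treated as "same date"
  # note: n() == 2 counts visits, not visit types, so two visits of the same
  # type also pass (and 3+ visits fail); n_distinct(type_of_visit) >= 2
  # would require two different types
  filter(n() == 2) %>%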
  ungroup() %>%
  distinct(patient_id, type_of_visit)
#> # A tibble: 15 x 2
#>    patient_id type_of_visit
#>    <fct>      <fct>        
#>  1 patient_9  veterinary   
#>  2 patient_2  veterinary   
#>  3 patient_11 medical      
#>  4 patient_12 veterinary   
#>  5 patient_2  medical      
#>  6 patient_3  veterinary   
#>  7 patient_5  veterinary   
#>  8 patient_7  veterinary   
#>  9 patient_6  veterinary   
#> 10 patient_11 veterinary   
#> 11 patient_9  medical      
#> 12 patient_10 veterinary   
#> 13 patient_5  medical      
#> 14 patient_1  veterinary   
#> 15 patient_3  medical

Created on 2021-10-07 by the reprex package (v2.0.1)
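The difference between counting visits with `n() == 2` and counting visit types with `n_distinct(type_of_visit) >= 2` (the test used in the first answer) can be seen on a small hypothetical toy data set; the patient names and dates below are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data, all on one date:
# patient_a: two visits of different types
# patient_b: two visits of the same type
# patient_c: three visits covering both types
toy <- tibble(
  patient_id    = c("patient_a", "patient_a",
                    "patient_b", "patient_b",
                    "patient_c", "patient_c", "patient_c"),
  date_visit    = as.Date("2020-03-01"),
  type_of_visit = c("medical", "veterinary",
                    "medical", "medical",
                    "medical", "medical", "veterinary")
)

# Counting rows: keeps patient_b (same type twice), drops patient_c (3 visits)
by_count <- toy %>%
  group_by(patient_id, date_visit) %>%
  filter(n() == 2) %>%
  ungroup() %>%
  distinct(patient_id, type_of_visit)

# Counting distinct visit types: keeps patient_a and patient_c
by_type <- toy %>%
  group_by(patient_id, date_visit) %>%
  filter(n_distinct(type_of_visit) >= 2) %>%
  ungroup() %>%
  distinct(patient_id, type_of_visit)
```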
