I am currently working on a dataset with multiple biopsies per patient ID. I need to find the biopsy result closest to a specific date (individual per patient). A dummy dataset can be seen below
df <- data.frame(m1 = c("1","1","1","2","2","2"),
patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"),
baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
biopsy=c("1","2","3","1","2","3"))
I have then calculated the time difference between patodate and baselinedate
df$patodate <- as.Date(df$patodate)
df$baselinedate <- as.Date(df$baselinedate)
df <- df%>%
group_by(m1) %>%
mutate(diff = baselinedate-recdate)
My question is now - I want to add a new column called 'status' which shows (by group m1) the 'biopsy' result with the time difference closest to 0. The end result would be
df <- data.frame(m1 = c("1","1","1","2","2","2"),
patodate=c("2013-06-03","2014-01-06","2018-11-23","2004-03-03","2018-06-25","2018-12-19"),
baselinedate=c("2018-11-09","2018-11-09","2018-11-09","2018-07-24","2018-07-24","2018-07-24"),
biopsy=c("1","2","3","1","2","3"),
status=c("3","3","3","2","2","2"))
I hope someone understands the issue and is able to help. Many thanks
Kind regards,
Tobias Berg
CodePudding user response:
You can get index of minimum absolute value of difference between the dates for each group.
library(dplyr)
df %>%
group_by(m1) %>%
mutate(status = which.min(abs(patodate - baselinedate))) %>%
ungroup
# m1 patodate baselinedate biopsy status
# <chr> <date> <date> <chr> <int>
#1 1 2013-06-03 2018-11-09 1 3
#2 1 2014-01-06 2018-11-09 2 3
#3 1 2018-11-23 2018-11-09 3 3
#4 2 2004-03-03 2018-07-24 1 2
#5 2 2018-06-25 2018-07-24 2 2
#6 2 2018-12-19 2018-07-24 3 2
CodePudding user response:
Here is an alternative way, with a hepler
column:
library(dplyr)
library(lubridate)
df %>%
group_by(m1) %>%
mutate(across(contains("date"), ymd),
helper = abs(difftime(baselinedate,patodate))) %>%
mutate(status = biopsy[helper==min(helper)], .keep="unused")
m1 patodate baselinedate status
<chr> <date> <date> <chr>
1 1 2013-06-03 2018-11-09 3
2 1 2014-01-06 2018-11-09 3
3 1 2018-11-23 2018-11-09 3
4 2 2004-03-03 2018-07-24 2
5 2 2018-06-25 2018-07-24 2
6 2 2018-12-19 2018-07-24 2