I need help with coding a binary column indicating that one measurement is above a threshold and found in a one-year interval of a second measurement. The dataframe has 4 colums (ID, result type, date and value of result) and could look like this:
IDrow type Date value
1 ID1 NP1 2018-06-11 4.5
2 ID1 NP2 2018-08-21 55
3 ID2 NP1 2016-04-23 4.7
4 ID3 NP1 2015-11-24 9.5
5 ID3 NP2 2015-11-24 7390
6 ID3 NP2 2016-11-24 308
7 ID4 NP1 2015-01-01 6.0
8 ID4 NP2 2018-01-01 100
9 ID5 NP1 2013-04-04 4.0
10 ID5 NP2 2013-04-04 13
As listed, there are two types of results. Some ID's don't have either two types, which is fine. I need help to code a new column indicating that NP2 occurs within - one year of measurement of NP1. If this is true, and the value for the NP2 is above > 25, then the new column should be 1. If the NP2 result is not avaible for the ID, the new column shoud state 0. If the date for NP2 is not found in the NP1 date interval, the coloumn should state 0 (as for ID4), and if the date is found in the interval, but the value for NP2 is < 25, the coloumn should also state 0 (as for ID5). My desired outcome looks like this
IDrow type Date value positive
1 ID1 NP1 2018-06-11 4.5 1
2 ID1 NP2 2018-08-21 55 1
3 ID2 NP1 2016-04-23 4.7 0
4 ID3 NP1 2015-11-24 9.5 1
5 ID3 NP2 2015-11-24 7390 1
6 ID3 NP2 2016-11-24 308 1
7 ID4 NP1 2015-01-01 6.0 0
8 ID4 NP2 2018-01-01 100 0
9 ID5 NP1 2013-04-04 4.0 0
10 ID5 NP2 2013-04-04 13 0
I have tried dplyr with mutate and case_when, however my skills with dplyr fall short when working with dates.
IDrow <- c("ID1", "ID1", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4", "ID5", "ID5")
type <- c("NP1", "NP2", "NP1", "NP1", "NP2", "NP2", "NP1", "NP2", "NP1", "NP2")
Date <- c("2018-06-11","2018-08-21",
"2016-04-23",
"2015-11-24", "2015-11-24", "2016-11-24",
"2015-01-01", "2018-01-01",
"2013-04-04", "2013-04-04")
value <- c("4.5", "55",
"4.7",
"9.5", "7390", "308",
"6.0", "100",
"4.0", "13")
df <- as.data.frame(cbind(IDrow, type, Date, value))
df$Date <- ymd(df$Date)
df <- df %>% group_by(IDrow, Date) %>%
mutate(positive = case_when
("NP2" %in% type & Date %within% Date -duration(365, units="days") & "NP1" %in% type ~ 1))
Thanks in advance - any help is appreciated.
CodePudding user response:
# Load the packages
library(lubridate)
library(dplyr)
# Load the data
IDrow <- c("ID1", "ID1", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4", "ID5", "ID5")
type <- c("NP1", "NP2", "NP1", "NP1", "NP2", "NP2", "NP1", "NP2", "NP1", "NP2")
Date <- c(
"2018-06-11", "2018-08-21",
"2016-04-23",
"2015-11-24", "2015-11-24", "2016-11-24",
"2015-01-01", "2018-01-01",
"2013-04-04", "2013-04-04"
)
value <- c(
4.5, 55,
4.7,
9.5, 7390, 308,
6.0, 100,
4.0, 13
)
# Transform the date variable
df <- data.frame(IDrow, type, Date = ymd(Date), value)
# Add a new variable based on your conditions
df %>%
group_by(IDrow) %>%
mutate(positive = ifelse(
length(type) > 1,
(Date[type == "NP2"] %within% interval(Date[type == "NP1"] - years(1), Date[type == "NP1"] years(1))) & value[type == "NP2"] > 25,
0
))