Home > Back-end >  Create a score based on value and date within interval
Create a score based on value and date within interval

Time:01-31

I need help with coding a binary column indicating that one measurement is above a threshold and found in a one-year interval of a second measurement. The dataframe has 4 colums (ID, result type, date and value of result) and could look like this:

   IDrow type       Date value
1    ID1  NP1 2018-06-11   4.5
2    ID1  NP2 2018-08-21    55
3    ID2  NP1 2016-04-23   4.7
4    ID3  NP1 2015-11-24   9.5
5    ID3  NP2 2015-11-24  7390
6    ID3  NP2 2016-11-24   308
7    ID4  NP1 2015-01-01   6.0
8    ID4  NP2 2018-01-01   100
9    ID5  NP1 2013-04-04   4.0
10   ID5  NP2 2013-04-04    13

As listed, there are two types of results. Some ID's don't have either two types, which is fine. I need help to code a new column indicating that NP2 occurs within - one year of measurement of NP1. If this is true, and the value for the NP2 is above > 25, then the new column should be 1. If the NP2 result is not avaible for the ID, the new column shoud state 0. If the date for NP2 is not found in the NP1 date interval, the coloumn should state 0 (as for ID4), and if the date is found in the interval, but the value for NP2 is < 25, the coloumn should also state 0 (as for ID5). My desired outcome looks like this

   IDrow type       Date value positive
1    ID1  NP1 2018-06-11   4.5        1
2    ID1  NP2 2018-08-21    55        1
3    ID2  NP1 2016-04-23   4.7        0
4    ID3  NP1 2015-11-24   9.5        1
5    ID3  NP2 2015-11-24  7390        1
6    ID3  NP2 2016-11-24   308        1
7    ID4  NP1 2015-01-01   6.0        0
8    ID4  NP2 2018-01-01   100        0
9    ID5  NP1 2013-04-04   4.0        0
10   ID5  NP2 2013-04-04    13        0

I have tried dplyr with mutate and case_when, however my skills with dplyr fall short when working with dates.

IDrow <- c("ID1", "ID1", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4", "ID5", "ID5")
type <- c("NP1", "NP2", "NP1", "NP1", "NP2", "NP2", "NP1", "NP2", "NP1", "NP2")
Date <- c("2018-06-11","2018-08-21",
          "2016-04-23", 
          "2015-11-24", "2015-11-24", "2016-11-24",
          "2015-01-01", "2018-01-01",
          "2013-04-04", "2013-04-04")
value <- c("4.5", "55", 
           "4.7",
           "9.5", "7390", "308",
           "6.0", "100",
           "4.0", "13")
df <- as.data.frame(cbind(IDrow, type, Date, value))
df$Date <- ymd(df$Date)
df <- df %>% group_by(IDrow, Date) %>% 
  mutate(positive = case_when
         ("NP2" %in% type & Date %within% Date -duration(365, units="days") & "NP1" %in% type ~ 1))

Thanks in advance - any help is appreciated.

CodePudding user response:

# Load the packages
library(lubridate)
library(dplyr)

# Load the data
IDrow <- c("ID1", "ID1", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4", "ID5", "ID5")
type <- c("NP1", "NP2", "NP1", "NP1", "NP2", "NP2", "NP1", "NP2", "NP1", "NP2")
Date <- c(
  "2018-06-11", "2018-08-21",
  "2016-04-23",
  "2015-11-24", "2015-11-24", "2016-11-24",
  "2015-01-01", "2018-01-01",
  "2013-04-04", "2013-04-04"
)
value <- c(
  4.5, 55,
  4.7,
  9.5, 7390, 308,
  6.0, 100,
  4.0, 13
)

# Transform the date variable
df <- data.frame(IDrow, type, Date = ymd(Date), value)

# Add a new variable based on your conditions
df %>%
  group_by(IDrow) %>%
  mutate(positive = ifelse(
    length(type) > 1,
    (Date[type == "NP2"] %within% interval(Date[type == "NP1"] - years(1), Date[type == "NP1"]   years(1))) & value[type == "NP2"] > 25,
    0
  ))

  • Related