I have a data.frame that contains a variable for an obvservation date (Portfolio.Date
), IDs (Loan.Number
) and a status indicator (1-8) (Current.Delinquency.Code
). I want to calculate a new variable that is true only the first time that the status indicator goes above 3 for any ID. In Excel I would write =[Portfolio.Date]=minif([Portfolio.Date], [Loan.Number], [@Loan.Number], [Current.Delinquency.Code], ">3")
, but I cannot figure out how to do this in R. Can anybody help me with this?
Thank you very much!
What I am looking for is the formula for the [Delinquent
] column in the below example Data, which jumps to "TRUE" the first time that the an observation of "Current.Delinquency.Code" for tha "Loan.Number" is above 3.
Portfolio.Date Loan.Number Current.Delinquency.Code Delinquent
2022/01/01 1 1 FALSE
2022/02/01 1 4 TRUE
2022/03/01 1 4 FALSE
2022/04/01 1 4 FALSE
2022/01/01 2 1 FALSE
2022/02/01 2 1 FALSE
2022/03/01 2 1 FALSE
2022/04/01 2 1 FALSE
2022/01/01 3 1 FALSE
2022/02/01 3 3 FALSE
2022/03/01 3 4 TRUE
2022/04/01 3 4 FALSE
CodePudding user response:
if I understood you correctly this is one possible solution:
library(dplyr)
# to make it reproducable:
set.seed(1)
# sample data
df <- data.frame(Portfolio.Date = seq.Date(from = as.Date("2022-01-01"), to = as.Date("2022-01-30"), by = "days"),
Loan.Number = rep(c(1,2), 15),
Current.Delinquency.Code = sample(1:8, size = 30, replace = TRUE)) %>%
# group by Loan.Number
dplyr::group_by(df, Loan.Number) %>%
# order by Portfolio.Date
dplyr::arrange(Portfolio.Date) %>%
# check the condition and make a cumsum of it, returning only those where cumsum is 1 (frist occurence)
dplyr::mutate(nc = ifelse(Current.Delinquency.Code > 3 & cumsum(Current.Delinquency.Code > 3) == 1, 1, 0)) %>%
# ungroup to prevent unwanted behaviour down stream
dplyr::ungroup()
# A tibble: 30 x 4
Portfolio.Date Loan.Number Current.Delinquency.Code nc
<date> <dbl> <int> <dbl>
1 2022-01-01 1 1 0
2 2022-01-02 2 4 1
3 2022-01-03 1 7 1
4 2022-01-04 2 1 0
5 2022-01-05 1 2 0
6 2022-01-06 2 5 0
7 2022-01-07 1 7 0
8 2022-01-08 2 3 0
9 2022-01-09 1 6 0
10 2022-01-10 2 2 0
# ... with 20 more rows
# i Use `print(n = ...)` to see more rows