Home > Software engineering >  R: Generate variable in data.frame using minif on other columns
R: Generate variable in data.frame using minif on other columns

Time:01-28

I have a data.frame that contains a variable for an obvservation date (Portfolio.Date), IDs (Loan.Number) and a status indicator (1-8) (Current.Delinquency.Code). I want to calculate a new variable that is true only the first time that the status indicator goes above 3 for any ID. In Excel I would write =[Portfolio.Date]=minif([Portfolio.Date], [Loan.Number], [@Loan.Number], [Current.Delinquency.Code], ">3"), but I cannot figure out how to do this in R. Can anybody help me with this?

Thank you very much!

What I am looking for is the formula for the [Delinquent] column in the below example Data, which jumps to "TRUE" the first time that the an observation of "Current.Delinquency.Code" for tha "Loan.Number" is above 3.

Portfolio.Date  Loan.Number  Current.Delinquency.Code  Delinquent
    2022/01/01      1            1                     FALSE
    2022/02/01      1            4                     TRUE
    2022/03/01      1            4                     FALSE
    2022/04/01      1            4                     FALSE
    2022/01/01      2            1                     FALSE
    2022/02/01      2            1                     FALSE
    2022/03/01      2            1                     FALSE
    2022/04/01      2            1                     FALSE
    2022/01/01      3            1                     FALSE
    2022/02/01      3            3                     FALSE
    2022/03/01      3            4                     TRUE
    2022/04/01      3            4                     FALSE

CodePudding user response:

if I understood you correctly this is one possible solution:

library(dplyr)

# to make it reproducable:
set.seed(1)
# sample data
df <- data.frame(Portfolio.Date = seq.Date(from = as.Date("2022-01-01"), to = as.Date("2022-01-30"), by = "days"),
                 Loan.Number = rep(c(1,2), 15),
                 Current.Delinquency.Code = sample(1:8, size = 30, replace = TRUE)) %>%

# group by Loan.Number
dplyr::group_by(df, Loan.Number) %>%
    # order by Portfolio.Date
    dplyr::arrange(Portfolio.Date) %>%
    # check the condition and make a cumsum of it, returning only those where cumsum is 1 (frist occurence)
    dplyr::mutate(nc = ifelse(Current.Delinquency.Code > 3 & cumsum(Current.Delinquency.Code > 3) == 1, 1, 0)) %>%
    # ungroup to prevent unwanted behaviour down stream
    dplyr::ungroup()

# A tibble: 30 x 4
   Portfolio.Date Loan.Number Current.Delinquency.Code    nc
   <date>               <dbl>                    <int> <dbl>
 1 2022-01-01               1                        1     0
 2 2022-01-02               2                        4     1
 3 2022-01-03               1                        7     1
 4 2022-01-04               2                        1     0
 5 2022-01-05               1                        2     0
 6 2022-01-06               2                        5     0
 7 2022-01-07               1                        7     0
 8 2022-01-08               2                        3     0
 9 2022-01-09               1                        6     0
10 2022-01-10               2                        2     0
# ... with 20 more rows
# i Use `print(n = ...)` to see more rows
  •  Tags:  
  • rmin
  • Related