How to create a new variable name with a pre-defined prefix in a function

Time:12-29

I am trying to calculate the population at risk (PAR) of a particular type of cancer by year. I have a data.table with a column cancer indicating whether each patient had cancer (1/0) and a column cancerDate giving the date their cancer was detected. My data span 2015 to 2021.

I have written a function for this:

add_par_column <- function(dt, year) {
  dt[, `:=`(PAR = cancer == 0 | (cancer == 1 & cancerDate >= paste0(year, "-01-01")))]
}

Then I applied it like this:

DT <- add_par_column(DT, 2015)
DT <- add_par_column(DT, 2016)
DT <- add_par_column(DT, 2017)
#etc.

The problem is that the PAR column my function creates is overwritten every time I run it for a new year, instead of a separate PAR column being kept for each year in the data.table.

I have tried to edit the function so that the column name is built from the prefix PAR plus the year, like this:

add_par_column <- function(dt, year) {
  dt[, `:=`(
    paste0("PAR", year) = cancer == 0 | (cancer == 1 & cancerDate >= paste0(year, "-01-01"))
    )]
}

but I keep getting error messages.

Without the function, I would create the new PAR variables in the data.table like this:

DT <- DT[, `:=`(
  PAR2015 = cancer == 0 | (cancer == 1 & cancerDate >= "2015-01-01"),
  PAR2016 = cancer == 0 | (cancer == 1 & cancerDate >= "2016-01-01"),
  PAR2017 = cancer == 0 | (cancer == 1 & cancerDate >= "2017-01-01"),
  PAR2018 = cancer == 0 | (cancer == 1 & cancerDate >= "2018-01-01"),
  PAR2019 = cancer == 0 | (cancer == 1 & cancerDate >= "2019-01-01"),
  PAR2020 = cancer == 0 | (cancer == 1 & cancerDate >= "2020-01-01"),
  PAR2021 = cancer == 0 | (cancer == 1 & cancerDate >= "2021-01-01")
)]

but I want to avoid the repetition.

Answer:

If we want to keep a single PAR column, preserving the values already computed while updating it for each new year, we can combine the new condition with the existing PAR column using an OR (|):

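add_par_column <- function(dt, year) {
  # create the PAR column (all FALSE) the first time the function is called
  if (!("PAR" %in% names(dt))) {
    dt[, PAR := FALSE]
  }
  # for rows whose cancerDate falls in `year`, OR the new condition with the
  # existing PAR so that values set by earlier calls are kept
  dt[year(cancerDate) == year,
     PAR := (cancer == 0 |
               (cancer == 1 & cancerDate >= paste0(year, "-01-01"))) | PAR]
  dt
}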

Testing:

> add_par_column(DT, 2015)
> DT
    cancer cancerDate   PAR
 1:      0 2015-01-01  TRUE
 2:      0 2015-04-01  TRUE
 3:      1 2015-07-01  TRUE
 4:      0 2015-10-01  TRUE
 5:      1 2016-01-01 FALSE
 6:      0 2016-04-01 FALSE
 7:      0 2016-07-01 FALSE
 8:      1 2016-10-01 FALSE
 9:      1 2017-01-01 FALSE
10:      1 2017-04-01 FALSE
11:      1 2017-07-01 FALSE
12:      0 2017-10-01 FALSE
13:      1 2018-01-01 FALSE
14:      0 2018-04-01 FALSE
15:      1 2018-07-01 FALSE
16:      0 2018-10-01 FALSE
17:      1 2019-01-01 FALSE
18:      0 2019-04-01 FALSE
19:      0 2019-07-01 FALSE
20:      1 2019-10-01 FALSE
> add_par_column(DT, 2016)
> DT
    cancer cancerDate   PAR
 1:      0 2015-01-01  TRUE
 2:      0 2015-04-01  TRUE
 3:      1 2015-07-01  TRUE
 4:      0 2015-10-01  TRUE
 5:      1 2016-01-01  TRUE
 6:      0 2016-04-01  TRUE
 7:      0 2016-07-01  TRUE
 8:      1 2016-10-01  TRUE
 9:      1 2017-01-01 FALSE
10:      1 2017-04-01 FALSE
11:      1 2017-07-01 FALSE
12:      0 2017-10-01 FALSE
13:      1 2018-01-01 FALSE
14:      0 2018-04-01 FALSE
15:      1 2018-07-01 FALSE
16:      0 2018-10-01 FALSE
17:      1 2019-01-01 FALSE
18:      0 2019-04-01 FALSE
19:      0 2019-07-01 FALSE
20:      1 2019-10-01 FALSE

Data:

library(data.table)
set.seed(24)
DT <- data.table(cancer = sample(0:1, size = 20, replace = TRUE),
                 cancerDate = seq(as.Date('2015-01-01'), length.out = 20, by = '3 months'))
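If what is ultimately needed is the number of people at risk in each year, rather than one flag column per year, a per-year count can also be computed directly. This is a minimal sketch, not part of the answer above: it uses small made-up data, and the names years, start and par_by_year are hypothetical. It applies the question's PAR condition with 1 January of each year as the cut-off.

```r
library(data.table)

# Made-up example data (illustration only): cancer flag (1/0) and detection date
DT <- data.table(
  cancer     = c(0L, 1L, 1L, 0L, 1L),
  cancerDate = as.Date(c("2015-03-01", "2015-09-01", "2016-06-01",
                         "2017-01-01", "2017-08-01"))
)

years <- 2015:2017  # hypothetical range of years to summarise

# One row per year: how many patients meet the question's PAR condition
# on 1 January of that year
par_by_year <- data.table(
  year = years,
  PAR  = sapply(years, function(y) {
    start <- as.Date(paste0(y, "-01-01"))
    DT[, sum(cancer == 0 | (cancer == 1 & cancerDate >= start))]
  })
)
par_by_year
```

For these five rows the counts fall from 5 in 2015 to 4 in 2016 and 3 in 2017, as patients diagnosed before a given 1 January drop out of that year's population at risk.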

Answer:

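You could use the LHS := RHS form instead of the functional form `:=`(LHS = RHS).
In the functional form each LHS is an argument name, and R only accepts a plain name there, not a call such as paste0("PAR", year); that is why you get errors. In the LHS := RHS form, data.table evaluates a computed LHS to obtain the column name.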

add_par_column <- function(dt, year) {
  # the LHS evaluates to a name such as "PAR2015", so each year gets its own column
  dt[, paste0("PAR", year) := cancer == 0 | (cancer == 1 & cancerDate >= paste0(year, "-01-01"))]
}

DT <- add_par_column(DT, 2015)
DT <- add_par_column(DT, 2016)
DT <- add_par_column(DT, 2017)
DT[]

#    cancer cancerDate PAR2015 PAR2016 PAR2017
#     <int>     <Date>  <lgcl>  <lgcl>  <lgcl>
# 1:      0 2015-01-01    TRUE    TRUE    TRUE
# 2:      0 2015-04-01    TRUE    TRUE    TRUE
# 3:      1 2015-07-01    TRUE   FALSE   FALSE
# 4:      0 2015-10-01    TRUE    TRUE    TRUE
# 5:      1 2016-01-01    TRUE    TRUE   FALSE
...
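The LHS := RHS form also accepts several column names at once, which removes the need to call the function once per year. A minimal sketch, assuming a made-up DT and a hypothetical years vector: wrapping cols in parentheses makes data.table treat it as a character vector of column names, and lapply() supplies one logical vector per year.

```r
library(data.table)

# Made-up data in the same shape as the question (illustration only)
DT <- data.table(
  cancer     = c(0L, 1L, 1L, 0L),
  cancerDate = as.Date(c("2015-03-01", "2015-09-01", "2017-02-01", "2018-05-01"))
)

years <- 2015:2018             # hypothetical range of years
cols  <- paste0("PAR", years)  # "PAR2015" "PAR2016" "PAR2017" "PAR2018"

# (cols) := ... assigns each element of the list returned by lapply()
# to the matching column name, adding all PAR columns in one call
DT[, (cols) := lapply(years, function(y)
  cancer == 0 | (cancer == 1 & cancerDate >= as.Date(paste0(y, "-01-01"))))]
DT[]
```

Because := adds columns by reference, calling the answer's add_par_column() in a loop such as for (y in 2015:2021) add_par_column(DT, y) should also work without reassigning DT.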