Home > Software engineering >  Automating the creation of new columns for a list of dataframes - R
Automating the creation of new columns for a list of dataframes - R

Time:11-25

I have been tasked with the creation of lagging indicators for a series of data frames. Each dataframe has two columns: date and value. For each "value" column I need to create 12 new columns, where the lag increases by one unit.

A given data frame looks like this:

    date   | value |
01-01-2021     1
02-01-2021     2
03-01-2021     3
04-01-2021     4
05-01-2021     5
06-01-2021     6
07-01-2021     7
08-01-2021     8

And I need to add new columns so they end up looking like this:

    date   | value | lag1 | ... | lag12
01-01-2021     1       2            13
02-01-2021     2       3            14
03-01-2021     3       4            15
04-01-2021     4       5            16
05-01-2021     5       6            17
06-01-2021     6       7            18
07-01-2021     7       8            19
08-01-2021     8       9            20

The values shown are for illustrative purposes, the real data is of economic nature and therefore follows no particular pattern.

So far, I have only managed to figure out how to create one column at a time for a single data frame, but this isn't very efficient. I need a way to loop through all of the data frames and create the additional columns.

Thanks in advance for any help!

CodePudding user response:

Loop over however many lags you need, creating a list of the new column values, then add them back all at once.

lagfun <- function(x,n) c(tail(x,-n), rep(NA,n))
dat[paste0("lag", 1:7)] <- lapply(1:7, lagfun, x=dat$value)
dat

#        date value lag1 lag2 lag3 lag4 lag5 lag6 lag7
#1 01-01-2021     1    2    3    4    5    6    7    8
#2 02-01-2021     2    3    4    5    6    7    8   NA
#3 03-01-2021     3    4    5    6    7    8   NA   NA
#4 04-01-2021     4    5    6    7    8   NA   NA   NA
#5 05-01-2021     5    6    7    8   NA   NA   NA   NA
#6 06-01-2021     6    7    8   NA   NA   NA   NA   NA
#7 07-01-2021     7    8   NA   NA   NA   NA   NA   NA
#8 08-01-2021     8   NA   NA   NA   NA   NA   NA   NA

You could of course use the lag function from dplyr or a similar function instead of my lagfun, but this will be self-contained.

  • Related