Using a for loop to apply a function to a list in R

I have four data frames, each with two columns, one for the date and another for values. I want to generate 24 new columns in each data frame, 12 for lagging indicators and 12 for leading indicators. I have managed to make this work one data frame at a time using the following code:

df[paste0("lag", 1:12)] = lapply(1:12, lag, x=df[,2])
df[paste0("lead", 1:12)] = lapply(1:12, lead, x=df[,2])

However, I would like to automate this using a for loop that goes through a list of the data frames. So far, I have tried the following:

dataframes = list(df1,df2,df3,df4)

for (df in dataframes){
        df[paste0("lag", 1:12)] = lapply(1:12, lag, x=df[,2])
        df[paste0("lead", 1:12)] = lapply(1:12, lead, x=df[,2])
}

Sadly, this doesn't work: the data frames remain unchanged after the for loop. Any suggestions as to how to make this work?

CodePudding user response：

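Inside the loop, df is a copy of each list element, so modifying it doesn't update the original objects 'df1', 'df2', etc. in the global environment. If we want to update them, we can use assign (although it is better to keep the data frames in a list):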

# // create a named `list`
dataframes = list(df1,df2,df3,df4)
names(dataframes) <- c("df1", "df2", "df3", "df4")
# // loop over the names of the list
for(nm in names(dataframes)) {
    # // get the value of the object from the names
    df <- get(nm)
    # // create the new columns
    df[paste0("lag", 1:12)] <- lapply(1:12, lag, x=df[,2])
    df[paste0("lead", 1:12)] <- lapply(1:12, lead, x=df[,2])
    # // assign to update the original object 
    assign(nm, df)
}
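
To see why the loop in the question has no effect, here is a minimal base R sketch. The toy list `frames` and the columns `x` and `y` are made up for illustration and are not part of the question. The loop variable receives a copy of each element, so an assignment to it never reaches the list, whereas assigning through the list index does.

```r
# Toy data; the names `frames`, `x` and `y` are illustrative only
frames <- list(a = data.frame(x = 1:3), b = data.frame(x = 4:6))

# Modifying the loop variable changes a copy, not the list element
for (df in frames) {
  df$y <- df$x * 2
}
ncol(frames$a)  # still 1

# Assigning through the list index updates the element itself
for (nm in names(frames)) {
  frames[[nm]]$y <- frames[[nm]]$x * 2
}
ncol(frames$a)  # now 2
```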

It may be better to keep the data frames in a list:

dataframes2 <- lapply(dataframes, function(df) {
    df[paste0("lag", 1:12)] <- lapply(1:12, lag, x=df[,2])
    df[paste0("lead", 1:12)] <- lapply(1:12, lead, x=df[,2])
    df
})

The list output can be used to update the original objects with list2env, though this is not recommended:

list2env(dataframes2, .GlobalEnv)
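
As a hedged illustration of how list2env behaves, the sketch below writes a named list into a fresh environment rather than the global one. The names `frames`, `env`, `df1` and `df2` are made up here. list2env uses the list names as the object names, which is why the list has to be named.

```r
# A named list; list2env() uses the names as the object names
frames <- list(df1 = data.frame(x = 1:2), df2 = data.frame(x = 3:4))

# Write into a scratch environment instead of .GlobalEnv so nothing
# existing gets overwritten; pass .GlobalEnv to update the originals
env <- new.env()
list2env(frames, envir = env)

sort(ls(env))   # "df1" "df2"
env$df2$x       # 3 4
```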

CodePudding user response：

Let nms be a vector of the data frame names and from that create a list L of the data frames themselves. We use 2 lags and 2 leads to keep the example small.

Please read the info at the top of the r tag page. In particular, examples should be self-contained, complete (including all inputs and library statements), reproducible so that anyone else can easily run them, and minimal.

The library statements are missing. There is no lead function in base R, so we assume that dplyr is being used.
The data frames themselves are missing so we construct sample data frames based on the BOD data frame which is included with R.
To make this minimal we use 2 rather than 12.
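
Because base R has no lead function, and stats::lag shifts only the time base of a time series rather than the values of a plain vector, a dependency-free version needs its own shifting helpers. The sketch below uses two hypothetical helpers, shift_lag and shift_lead, which are not from any package. They are meant to mimic the default behaviour of dplyr's lag and lead for a vector and a positive shift n.

```r
# Hypothetical helpers (not from any package): shift a vector by n
# positions and pad with NA, like dplyr::lag()/lead() with defaults
shift_lag <- function(x, n = 1) {
  n <- min(n, length(x))
  c(rep(NA, n), head(x, length(x) - n))
}
shift_lead <- function(x, n = 1) {
  n <- min(n, length(x))
  c(tail(x, length(x) - n), rep(NA, n))
}

# BOD ships with R; add two lags and two leads of its second column
d <- BOD
d[paste0("lag", 1:2)]  <- lapply(1:2, shift_lag,  x = d[, 2])
d[paste0("lead", 1:2)] <- lapply(1:2, shift_lead, x = d[, 2])
d$lag1   # NA 8.3 10.3 19.0 16.0 15.6
```

Swapping these helpers for dplyr's lag and lead should give the same columns for this data, at the cost of the extra dependency.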

We create a named list L of data frames using mget and then iterate through the names, creating new data frames in L that overwrite the old ones. Although it is not recommended unless there is a good reason to do so, we could write the data frames in L back out to the global environment using list2env(L, .GlobalEnv).

library(dplyr)

# test data
for(i in 1:4) assign(paste0("df", i), i * BOD)

nms <- paste0("df", 1:4)  
L <- mget(nms)
for (nm in names(L)) {
  L[[nm]][paste0("lag", 1:2)] = lapply(1:2, lag, x=L[[nm]][,2])
  L[[nm]][paste0("lead", 1:2)] = lapply(1:2, lead, x=L[[nm]][,2])
}

giving:

> str(L)
List of 4
 $ df1:'data.frame':    6 obs. of  6 variables:
  ..$ Time  : num [1:6] 1 2 3 4 5 7
  ..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
  ..$ lag1  : num [1:6] NA 8.3 10.3 19 16 15.6
  ..$ lag2  : num [1:6] NA NA 8.3 10.3 19 16
  ..$ lead1 : num [1:6] 10.3 19 16 15.6 19.8 NA
  ..$ lead2 : num [1:6] 19 16 15.6 19.8 NA NA
 $ df2:'data.frame':    6 obs. of  6 variables:
  ..$ Time  : num [1:6] 2 4 6 8 10 14
  ..$ demand: num [1:6] 16.6 20.6 38 32 31.2 39.6
  ..$ lag1  : num [1:6] NA 16.6 20.6 38 32 31.2
  ..$ lag2  : num [1:6] NA NA 16.6 20.6 38 32
  ..$ lead1 : num [1:6] 20.6 38 32 31.2 39.6 NA
  ..$ lead2 : num [1:6] 38 32 31.2 39.6 NA NA
 $ df3:'data.frame':    6 obs. of  6 variables:
  ..$ Time  : num [1:6] 3 6 9 12 15 21
  ..$ demand: num [1:6] 24.9 30.9 57 48 46.8 59.4
  ..$ lag1  : num [1:6] NA 24.9 30.9 57 48 46.8
  ..$ lag2  : num [1:6] NA NA 24.9 30.9 57 48
  ..$ lead1 : num [1:6] 30.9 57 48 46.8 59.4 NA
  ..$ lead2 : num [1:6] 57 48 46.8 59.4 NA NA
 $ df4:'data.frame':    6 obs. of  6 variables:
  ..$ Time  : num [1:6] 4 8 12 16 20 28
  ..$ demand: num [1:6] 33.2 41.2 76 64 62.4 79.2
  ..$ lag1  : num [1:6] NA 33.2 41.2 76 64 62.4
  ..$ lag2  : num [1:6] NA NA 33.2 41.2 76 64
  ..$ lead1 : num [1:6] 41.2 76 64 62.4 79.2 NA
  ..$ lead2 : num [1:6] 76 64 62.4 79.2 NA NA

CodePudding user response：

Writing a more explicit function (if required) gives you a lot more flexibility. Using your example, but simplifying the tables:

library(dplyr)  # provides tibble(), lag(), lead() and %>%
library(purrr)  # provides map_dfr()

a <- tibble(x = 1:50)
b <- tibble(x = 51:75)
dflist <- list(a, b)

# quick function using single lag on single column, but easily extendible
cv <- function(a)
{
  nca <- ncol(a)
  for(i in seq(from = 1, to = 23, by = 2))
  {  
   a[nca + i] = lag(a$x)
   a[nca + i + 1] = lead(a$x)
  }
  return(a)
}

# simple to apply to create your new columns (or put in loop)
na <- cv(a)

# or simple to do all df at once and concatenate the results
f <- dflist %>% map_dfr(cv)