I have four data frames, each with two columns, one for the date and another for values. I want to generate 24 new columns in each data frame, 12 for lagging indicators and 12 for leading indicators. I have managed to make this work one data frame at a time using the following code:
df[paste0("lag", 1:12)] = lapply(1:12, lag, x=df[,2])
df[paste0("lead", 1:12)] = lapply(1:12, lead, x=df[,2])
However, I would like to automate this using a for loop that goes through a list of the data frames. So far, I have tried the following:
dataframes = list(df1,df2,df3,df4)
for (df in dataframes){
df[paste0("lag", 1:12)] = lapply(1:12, lag, x=df[,2])
df[paste0("lead", 1:12)] = lapply(1:12, lead, x=df[,2])
}
Sadly, this doesn't work: the data frames are unchanged after the for loop. Any suggestions as to how to make this work?
CodePudding user response:
The loop variable df is a copy, so modifying it doesn't update the original objects df1, df2, etc. in the global environment. If we want to update them, we can use assign (although it is better to keep the data frames in a list).
# // create a named `list`
dataframes = list(df1,df2,df3,df4)
names(dataframes) <- c("df1", "df2", "df3", "df4")
# // loop over the names of the list
for(nm in names(dataframes)) {
# // get the value of the object from the names
df <- get(nm)
# // create the new columns
df[paste0("lag", 1:12)] <- lapply(1:12, lag, x=df[,2])
df[paste0("lead", 1:12)] <- lapply(1:12, lead, x=df[,2])
# // assign to update the original object
assign(nm, df)
}
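To see why the loop in the question has no effect, here is a minimal sketch in base R. The toy data frame and the `double` column are made up for illustration; the point is only that `for (df in dataframes)` works on a copy, while indexing into the list updates the list itself.

```r
# Toy data frame (hypothetical, for illustration only)
df1 <- data.frame(date = as.Date("2024-01-01") + 0:2, value = c(10, 20, 30))
dataframes <- list(df1)

for (df in dataframes) {
  df$double <- df$value * 2   # changes only the loop variable `df`
}

names(df)               # "date" "value" "double"  (the loop's working copy)
names(df1)              # "date" "value"           (original untouched)
names(dataframes[[1]])  # "date" "value"           (list element untouched)

# Assigning through the index modifies the list element itself
for (i in seq_along(dataframes)) {
  dataframes[[i]]$double <- dataframes[[i]]$value * 2
}

names(dataframes[[1]])  # "date" "value" "double"
```

Note that even the index-based loop only updates the list, not the separate object df1.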
It may be better to keep them in a list:
dataframes2 <- lapply(dataframes, function(df) {
df[paste0("lag", 1:12)] <- lapply(1:12, lag, x=df[,2])
df[paste0("lead", 1:12)] <- lapply(1:12, lead, x=df[,2])
df
})
The list output can be used to update the original objects with list2env, though this is not recommended:
list2env(dataframes2, .GlobalEnv)
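As a small sketch of how list2env behaves: it needs a named list, and each element becomes an object of that name in the target environment. The example below writes into a fresh environment (`target`, a name chosen here for illustration) rather than the global one, so nothing existing is overwritten.

```r
# Toy inputs (hypothetical)
df1 <- data.frame(value = 1:3)
df2 <- data.frame(value = 4:6)
dataframes <- list(df1 = df1, df2 = df2)   # names are required by list2env

# Any per-data-frame transformation; a doubled column stands in for the lags/leads
updated <- lapply(dataframes, function(d) {
  d$double <- d$value * 2
  d
})

# Copy each list element into an environment under its list name
target <- new.env()
list2env(updated, envir = target)

ls(target)                          # "df1" "df2"
names(get("df1", envir = target))   # "value" "double"
```

Replacing `target` with `.GlobalEnv` would overwrite df1 and df2 in the workspace, which is why this is usually discouraged.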
CodePudding user response:
Let nms be a vector of the data frame names and from that create a list L of the data frames themselves. We use 2 lags and 2 leads to keep the example small.
Please read the info at the top of the r tag page. In particular, examples should be self-contained, complete (including all inputs and library statements), reproducible so that anyone else can easily run them, and minimal. In this question:
- The library statements are missing. There is no lead function in base R, so we assume that dplyr is being used.
- The data frames themselves are missing so we construct sample data frames based on the BOD data frame which is included with R.
- To make this minimal we use 2 rather than 12.
We create a named list L of data frames using mget and then iterate through the names, creating new data frames in L that overwrite the old ones in L. Although it is not recommended unless there is a good reason to do so, we could write the data frames in L back out to the global environment using list2env(L, .GlobalEnv).
library(dplyr)
# test data
for(i in 1:4) assign(paste0("df", i), i * BOD)
nms <- paste0("df", 1:4)
L <- mget(nms)
for (nm in names(L)) {
L[[nm]][paste0("lag", 1:2)] = lapply(1:2, lag, x=L[[nm]][,2])
L[[nm]][paste0("lead", 1:2)] = lapply(1:2, lead, x=L[[nm]][,2])
}
giving:
> str(L)
List of 4
$ df1:'data.frame': 6 obs. of 6 variables:
..$ Time : num [1:6] 1 2 3 4 5 7
..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
..$ lag1 : num [1:6] NA 8.3 10.3 19 16 15.6
..$ lag2 : num [1:6] NA NA 8.3 10.3 19 16
..$ lead1 : num [1:6] 10.3 19 16 15.6 19.8 NA
..$ lead2 : num [1:6] 19 16 15.6 19.8 NA NA
$ df2:'data.frame': 6 obs. of 6 variables:
..$ Time : num [1:6] 2 4 6 8 10 14
..$ demand: num [1:6] 16.6 20.6 38 32 31.2 39.6
..$ lag1 : num [1:6] NA 16.6 20.6 38 32 31.2
..$ lag2 : num [1:6] NA NA 16.6 20.6 38 32
..$ lead1 : num [1:6] 20.6 38 32 31.2 39.6 NA
..$ lead2 : num [1:6] 38 32 31.2 39.6 NA NA
$ df3:'data.frame': 6 obs. of 6 variables:
..$ Time : num [1:6] 3 6 9 12 15 21
..$ demand: num [1:6] 24.9 30.9 57 48 46.8 59.4
..$ lag1 : num [1:6] NA 24.9 30.9 57 48 46.8
..$ lag2 : num [1:6] NA NA 24.9 30.9 57 48
..$ lead1 : num [1:6] 30.9 57 48 46.8 59.4 NA
..$ lead2 : num [1:6] 57 48 46.8 59.4 NA NA
$ df4:'data.frame': 6 obs. of 6 variables:
..$ Time : num [1:6] 4 8 12 16 20 28
..$ demand: num [1:6] 33.2 41.2 76 64 62.4 79.2
..$ lag1 : num [1:6] NA 33.2 41.2 76 64 62.4
..$ lag2 : num [1:6] NA NA 33.2 41.2 76 64
..$ lead1 : num [1:6] 41.2 76 64 62.4 79.2 NA
..$ lead2 : num [1:6] 76 64 62.4 79.2 NA NA
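The same steps can also be wrapped in a small function and applied with lapply. This is a sketch only: `add_lags_leads` and its arguments `n` and `col` are hypothetical names, and it calls `dplyr::lag` and `dplyr::lead` explicitly so that it does not accidentally pick up `stats::lag` when dplyr is not attached.

```r
# Hypothetical helper: adds lag1..lagN and lead1..leadN for one column.
# Assumes the dplyr package is installed.
add_lags_leads <- function(df, n = 12, col = 2) {
  x <- df[[col]]   # extract the value column as a plain vector
  df[paste0("lag", seq_len(n))]  <- lapply(seq_len(n), function(k) dplyr::lag(x, k))
  df[paste0("lead", seq_len(n))] <- lapply(seq_len(n), function(k) dplyr::lead(x, k))
  df
}

# Sample data built from BOD, which ships with R
L <- list(df1 = BOD, df2 = 2 * BOD)
L2 <- lapply(L, add_lags_leads, n = 2)

names(L2$df1)   # "Time" "demand" "lag1" "lag2" "lead1" "lead2"
```

Using `df[[col]]` rather than `df[, 2]` means the helper gets a vector for both data frames and tibbles.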
CodePudding user response:
Writing a more explicit function (if required) gives you a lot more flexibility. Using your example, but simplifying the tables:
library(dplyr)  # tibble(), lag(), lead(), %>%
library(purrr)  # map_dfr()
a <- tibble(x = 1:50)
b <- tibble(x = 51:75)
dflist <- list(a, b)
# quick function using single lag on single column, but easily extendible
cv <- function(a)
{
nca <- ncol(a)
for(i in seq(from = 1, to = 23, by = 2))
{
a[nca + i] = lag(a$x)
a[nca + i + 1] = lead(a$x)
}
return(a)
}
# simple to apply to create your new columns (or put in loop)
na <- cv(a)
# or simple to do all df at once and concatenate the results
f <- dflist %>% map_dfr(cv)
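One design point worth keeping in mind: map_dfr row-binds the results into a single table, whereas lapply (or purrr's map) keeps one result per input data frame. The base R sketch below illustrates the difference; `add_prev` is a hypothetical stand-in for cv that adds a single lagged column.

```r
# Toy inputs (hypothetical)
dflist <- list(a = data.frame(x = 1:3), b = data.frame(x = 4:5))

# Stand-in for cv(): adds one column holding the previous value of x
add_prev <- function(d) {
  d$prev <- c(NA, head(d$x, -1))
  d
}

separate <- lapply(dflist, add_prev)    # list with one data frame per input
combined <- do.call(rbind, separate)    # single data frame, rows stacked

length(separate)   # 2
nrow(combined)     # 5
combined$prev      # NA 1 2 NA 4  (lags are computed within each table)
```

If the tables need to stay separate, as in the original question, the list form is the one to keep.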