Via this post I discovered the fascinating list2env() function which, in conjunction with the named lists created by dplyr::lst(), allows the user to move the contents of a named list out into the global environment. However, I have run into some problems, best illustrated by the example below.
Say we have three datasets:
df1 <- data.frame(x = letters[1:6], y = rnorm(6))
df2 <- data.frame(x = letters[1:6], y = rnorm(6))
df3 <- data.frame(x = letters[1:6], y = rnorm(6))
Now say we want to apply a function to each one, in this case converting the character column x to a factor. To do this, we create a function:
factorFunct <- function(d) d %>% mutate(x = factor(x))
Now, using dplyr::lst(), which names each element after the object passed to it, we can create a list containing all the datasets and apply our function to each:
library(dplyr)
dfList <- dplyr::lst(df1, df2, df3)
dfList <- lapply(dfList, factorFunct)
When we examine the datasets in the list, we see that the character vectors have been converted to factors:
glimpse(dfList[[2]])
# Rows: 6
# Columns: 2
# $ x <fct> a, b, c, d, e, f
# $ y <dbl> -0.5809778, 0.8465600, 0.1022410, -0.9117389, 1.0635876, 0.3148138
What is really cool is that, using list2env(), we can move the transformed datasets back out into the global environment:
list2env(dfList, .GlobalEnv)
This overwrites the original datasets, so they themselves have now been transformed too:
glimpse(df2)
# Rows: 6
# Columns: 2
# $ x <fct> a, b, c, d, e, f
# $ y <dbl> -0.5809778, 0.8465600, 0.1022410, -0.9117389, 1.0635876, 0.3148138
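A minimal base-R sketch of what list2env() does: it creates one variable per list element, using each element's name as the variable name, and overwrites any existing variable with that name (which is why df1, df2 and df3 were replaced above). A fresh environment `e` is used here instead of .GlobalEnv so nothing gets overwritten; `e` and `datasets` are illustrative names.

```r
# Illustrative objects: a named list of two small data frames
datasets <- list(a = data.frame(x = 1:2), b = data.frame(x = 3:4))

# Copy each element into a fresh environment, one variable per name
e <- new.env()
list2env(datasets, envir = e)

ls(e)
# [1] "a" "b"
```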
A small but important thing to note is that our list of datasets is named:
names(dfList)
# [1] "df1" "df2" "df3"
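Those names come from dplyr::lst(); a plain list() would not supply them. As a base-R comparison (a sketch, not part of the original code), mget() looks objects up by name and returns a list named after them, which gives the same naming without dplyr:

```r
df1 <- data.frame(x = letters[1:3])
df2 <- data.frame(x = letters[4:6])

# list() leaves elements unnamed unless you name them explicitly
names(list(df1, df2))
# NULL

# mget() returns a list whose names are the object names it looked up
named <- mget(c("df1", "df2"))
names(named)
# [1] "df1" "df2"
```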
So far so good. My issue arises when I try to apply a different transformation function to the list, one that adds a prefix to a subset of the variables in each dataset. This function requires a list of prefixes:
preList <- dplyr::lst(one = "one", two = "two", three = "three")
Next, we create the function that adds the prefixes, using dplyr's rename_with() function:
renameCols <- function(d, prefix) d %>% rename_with(.fn = ~ paste0(prefix, .x), .cols = matches("x"))
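For readers without dplyr, here is a sketch of a base-R counterpart; `renameColsBase` is a hypothetical name, not from the post. It prefixes every column whose name matches the regex "x". Since tidyselect's matches() ignores case by default, `ignore.case = TRUE` is used to mirror that.

```r
# Hypothetical base-R counterpart of renameCols()
renameColsBase <- function(d, prefix) {
  # Columns whose names match "x" (case-insensitive, like matches("x"))
  hit <- grepl("x", names(d), ignore.case = TRUE)
  names(d)[hit] <- paste0(prefix, names(d)[hit])
  d
}

d <- data.frame(x = letters[1:3], y = 1:3)
names(renameColsBase(d, "three"))
# [1] "threex" "y"
```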
Now let's apply the function to the list of datasets we created above:
dfList <- lapply(1:length(dfList), function(i) renameCols(dfList[[i]], preList[[i]]))
Within the list, this has worked: the x variable has had a prefix added to it.
glimpse(dfList[[3]])
# Rows: 6
# Columns: 2
# $ threex <fct> a, b, c, d, e, f
# $ y <dbl> 2.47582743, -0.04879153, -0.13771671, -0.41982932, -0.36765793, 0.21731860
However, this time, when we run list2env() to move the transformed datasets back out into the global environment, we get an error:
list2env(dfList, .GlobalEnv)
# Error in list2env(dfList, .GlobalEnv) : names(x) must be a character vector of the same length as x
I think what is going on is that my function has stripped dfList of the names that list2env() needs to move each element of the list back out into the global environment:
names(dfList)
# NULL
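To check that the missing names alone trigger this error, here is a minimal sketch with a two-element unnamed list; `unnamed` and `msg` are illustrative names, and a fresh environment is used so nothing global is touched.

```r
# An unnamed list, like the result of lapply(1:length(dfList), ...)
unnamed <- list(data.frame(x = 1), data.frame(x = 2))

# Capture the error message instead of stopping
msg <- tryCatch(
  list2env(unnamed, envir = new.env()),
  error = function(e) conditionMessage(e)
)
msg
```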
I suspect this is because I passed a numeric index to lapply() rather than the list itself, but I am not sure whether this is truly the source of the problem, nor, even if it is, how to get the result I need.
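The suspicion can be checked directly. lapply() copies the names of its input to its result, so iterating over 1:length(x) (or seq_along(x)), an unnamed integer vector, produces an unnamed list. One possible fix, sketched here with an illustrative list `x`, is to name the index vector first:

```r
x <- list(df1 = 1, df2 = 2, df3 = 3)

# Unnamed index vector: the result has no names
names(lapply(seq_along(x), function(i) x[[i]]))
# NULL

# Named index vector: lapply() carries the names through
idx <- setNames(seq_along(x), names(x))
names(lapply(idx, function(i) x[[i]]))
# [1] "df1" "df2" "df3"
```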
Answer:
You could use mget() to avoid the temporary list, and the renaming problem practically calls for Map().
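Part of why Map() fits here: it is mapply(..., SIMPLIFY = FALSE), and with the default USE.NAMES = TRUE the result takes the names of its first vector argument. A small sketch with illustrative objects `dfs` and `prefixes`:

```r
dfs <- list(df1 = data.frame(x = 1), df2 = data.frame(x = 2))
prefixes <- c("one", "two")

# Elements are paired positionally; names come from dfs
out <- Map(function(d, p) { names(d) <- paste0(p, names(d)); d }, dfs, prefixes)

names(out)
# [1] "df1" "df2"
names(out$df2)
# [1] "twox"
```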
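# Map() pairs each data frame with its prefix; the result keeps the
# names from mget() (df1, df2, df3), so list2env() can use them
Map(\(x, y)
      setNames(factorFunct(x), paste0(c(y, ''), names(df1))),
    mget(ls(pattern='^df\\d$')),  # df1, df2, df3 as a named list
    preList) |>
  list2env(.GlobalEnv)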
df1
# onex y
# 1 a 1.3709584
# 2 b -0.5646982
# 3 c 0.3631284
# 4 d 0.6328626
# 5 e 0.4042683
# 6 f -0.1061245
str(df1)
# 'data.frame': 6 obs. of 2 variables:
# $ onex: Factor w/ 6 levels "a","b","c","d",..: 1 2 3 4 5 6
# $ y : num 1.371 -0.565 0.363 0.633 0.404 ...
Note: this requires R >= 4.1, for the native pipe |> and the \(x) lambda shorthand.
In the function, instead of dplyr::mutate(), we could alternatively use base R's transform():
factorFunct <- function(d) transform(d, x=factor(x))
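A quick check of the transform() variant; `factorFunctBase` is a hypothetical name used so it does not clash with factorFunct. transform() evaluates `x = factor(x)` using the data frame's own columns and returns a modified copy, leaving the input unchanged.

```r
d <- data.frame(x = letters[1:3], y = c(1.5, 2.5, 3.5), stringsAsFactors = FALSE)

# Hypothetical base-R version of factorFunct()
factorFunctBase <- function(d) transform(d, x = factor(x))

res <- factorFunctBase(d)
str(res)
```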
Answer:
I think the issue is that your list of data frames 'lost' its names when you applied your renaming function: lapply() copies the names of its input, and 1:length(dfList) is an unnamed integer vector. If you reassign the names, list2env() works, e.g.
library(dplyr)
set.seed(123)
df1 <- data.frame(x = letters[1:6], y = rnorm(6))
df2 <- data.frame(x = letters[1:6], y = rnorm(6))
df3 <- data.frame(x = letters[1:6], y = rnorm(6))
preList <- lst(one = "one", two = "two", three = "three")
dfList <- dplyr::lst(df1, df2, df3)
renameCols <- function(d, prefix) {
  d %>%
    rename_with(.fn = ~ paste0(prefix, .x),
                .cols = matches("x"))
}
dfList2 <- lapply(seq_along(dfList), function(i) {
  renameCols(d = dfList[[i]], prefix = preList[[i]])
})
names(dfList2)
#> NULL
names(dfList2) <- names(dfList)
list2env(dfList2, .GlobalEnv)
#> <environment: R_GlobalEnv>
df1
#> onex y
#> 1 a -0.56047565
#> 2 b -0.23017749
#> 3 c 1.55870831
#> 4 d 0.07050839
#> 5 e 0.12928774
#> 6 f 1.71506499
Created on 2022-03-01 by the reprex package (v2.0.1)
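A variation not taken from either answer: iterating with Map() over dfList and preList keeps dfList's names, so nothing has to be reassigned afterwards. With dplyr loaded, Map(renameCols, dfList, preList) should behave the same way; to keep this sketch free of dplyr, `addPrefix` is a hypothetical base-R stand-in for renameCols, and a fresh environment `e` replaces .GlobalEnv.

```r
# Hypothetical base-R stand-in for renameCols()
addPrefix <- function(d, prefix) {
  hit <- grepl("x", names(d), ignore.case = TRUE)
  names(d)[hit] <- paste0(prefix, names(d)[hit])
  d
}

dfList <- list(df1 = data.frame(x = "a", y = 1),
               df2 = data.frame(x = "b", y = 2),
               df3 = data.frame(x = "c", y = 3))
preList <- list(one = "one", two = "two", three = "three")

# Map() pairs elements positionally and keeps the names of dfList
dfList2 <- Map(addPrefix, dfList, preList)
names(dfList2)
# [1] "df1" "df2" "df3"

# list2env() now succeeds because the names survived
e <- new.env()
invisible(list2env(dfList2, envir = e))
names(e$df3)
# [1] "threex" "y"
```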