Indexed list creates problems for the list2env() function

Time:03-02

Via this post I discovered the fascinating list2env() function which, combined with the named lists produced by dplyr::lst(), lets you move the contents of a named list out into the global environment, with each element becoming a variable that carries the element's name. However, I have run into some problems, best illustrated by the example below.
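A minimal sketch of what list2env() does, using a throwaway environment instead of .GlobalEnv so that nothing in the workspace is touched. The list `vals` and the environment `scratch` are illustrative names, not part of the question:

```r
# Illustrative named list: each name becomes a variable name
vals <- list(a = 1, b = "two")

# Assign into a fresh environment rather than .GlobalEnv,
# so nothing in the workspace gets overwritten
scratch <- new.env()
invisible(list2env(vals, envir = scratch))

ls(scratch)                 # returns c("a", "b")
get("b", envir = scratch)   # returns "two"
```

Passing .GlobalEnv as `envir`, as in the question, does the same thing in the workspace and silently replaces any existing variable of the same name.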

Say we have three datasets:

df1 <- data.frame(x = letters[1:6], y = rnorm(6))
df2 <- data.frame(x = letters[1:6], y = rnorm(6))
df3 <- data.frame(x = letters[1:6], y = rnorm(6))

Now say we want to apply a function to each one, in this case converting the character column x to a factor. To do this we create a function:

factorFunct <- function(d) d %>% mutate(x = factor(x))

Now, using dplyr::lst(), we can create a list containing all the datasets and apply our function to each:

library(dplyr)
dfList <- dplyr::lst(df1, df2, df3)
dfList <- lapply(dfList, factorFunct)  # lapply() over the list itself keeps its names

Now, when we examine any dataset in the list, its character column x has been converted to a factor:

glimpse(dfList[[2]])

# Rows: 6
# Columns: 2
# $ x <fct> a, b, c, d, e, f
# $ y <dbl> -0.5809778, 0.8465600, 0.1022410, -0.9117389, 1.0635876, 0.3148138

Now, what is really cool is that, using list2env(), we can move the transformed datasets back out into the global environment:

list2env(dfList, .GlobalEnv)

The original datasets themselves are now overwritten with their transformed versions:

glimpse(df2)

# Rows: 6
# Columns: 2
# $ x <fct> a, b, c, d, e, f
# $ y <dbl> -0.5809778, 0.8465600, 0.1022410, -0.9117389, 1.0635876, 0.3148138

A small but important thing to note is that our list of datasets is named:

names(dfList)

# [1] "df1" "df2" "df3"
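Those names come from dplyr::lst(), which takes them from the argument expressions. Base list() does not do this, which is easy to miss. A small base-R sketch of the difference; `dfA` and `dfB` are hypothetical stand-ins for the datasets above:

```r
# Hypothetical stand-ins for the question's datasets
dfA <- data.frame(x = letters[1:3])
dfB <- data.frame(x = letters[4:6])

# Base list() does not name its elements after the arguments
names(list(dfA, dfB))                 # returns NULL

# Two base-R ways to get a named list instead
names(list(dfA = dfA, dfB = dfB))     # returns c("dfA", "dfB")
names(mget(c("dfA", "dfB")))          # returns c("dfA", "dfB")
```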

So far so good. My issue is that I have tried to apply a different transformation function to the list, one that adds a prefix to a subset of the variables in each of the datasets. This function requires a list of prefixes:

preList <- dplyr::lst(one = "one", two = "two", three = "three")

Now we create the function that adds the prefixes, using dplyr's rename_with():

renameCols <- function(d, prefix) d %>% rename_with(.fn = ~ paste0(prefix, .x), .cols = matches("x"))
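For readers unfamiliar with rename_with(), here is a rough base-R counterpart of renameCols(). The name `renameColsBase` is hypothetical and only for illustration; it assumes tidyselect's matches() behaviour of treating "x" as a case-insensitive regular expression:

```r
# Hypothetical base-R counterpart of renameCols(), for illustration only.
# tidyselect's matches() is case-insensitive by default, hence ignore.case = TRUE.
renameColsBase <- function(d, prefix) {
  hit <- grepl("x", names(d), ignore.case = TRUE)  # columns whose name contains "x"
  names(d)[hit] <- paste0(prefix, names(d)[hit])   # prepend the prefix to those only
  d
}

d <- data.frame(x = letters[1:3], y = 1:3)
names(renameColsBase(d, "three"))   # returns c("threex", "y")
```

Like the original, this prefixes every column whose name contains an x anywhere (a column called "max" would be renamed too); a pattern such as "^x$" would restrict it to the x column alone.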

Now, let's apply the function to the list of datasets we created above

dfList <- lapply(1:length(dfList), function(i) renameCols(dfList[[i]], preList[[i]]))

Within the list it has worked: the x variable has had a prefix added to it.

glimpse(dfList[[3]]) 

# Rows: 6
# Columns: 2
# $ threex <fct> a, b, c, d, e, f
# $ y      <dbl> 2.47582743, -0.04879153, -0.13771671, -0.41982932, -0.36765793, 0.21731860

However, this time, when we run list2env() to move the transformed datasets back out into the global environment, we get an error:

list2env(dfList, .GlobalEnv)

# Error in list2env(dfList, .GlobalEnv) : names(x) must be a character vector of the same length as x

I think what is going on is that my function has stripped dfList of the names that list2env() needs to assign each element back to the global environment:

names(dfList)

# NULL

I suspect this is because I used a numerical index in the lapply() function rather than the list itself, but I do not know if this is truly the source of the problem, nor, even if it is, how to get the result I need.
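This suspicion can be checked without the datasets at all: lapply() names its result after the object it iterates over, and 1:length(dfList) is an unnamed integer vector. A minimal base-R sketch with hypothetical numeric stand-ins for the data frames:

```r
# Hypothetical named list standing in for dfList
lst_named <- list(df1 = 1, df2 = 2, df3 = 3)

# Iterating over the list itself keeps the names
names(lapply(lst_named, function(v) v * 10))   # returns c("df1", "df2", "df3")

# Iterating over an integer index returns an unnamed list
byIndex <- lapply(seq_along(lst_named), function(i) lst_named[[i]] * 10)
names(byIndex)                                  # returns NULL

# list2env() then has no variable names to use, so it errors
res <- tryCatch(list2env(byIndex, envir = new.env()),
                error = function(e) conditionMessage(e))
res
```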

Answer:

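You could use mget() to avoid building the temporary list by hand, and the renaming problem practically calls for Map(), which iterates over several lists in parallel and keeps the names of the first one, so its result can go straight into list2env().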

Map(\(x, y) 
    setNames(factorFunct(x), paste0(c(y, ''), names(df1))),
    mget(ls(pattern='^df\\d$')),
    preList) |> 
  list2env(.GlobalEnv)

df1
#   onex          y
# 1    a  1.3709584
# 2    b -0.5646982
# 3    c  0.3631284
# 4    d  0.6328626
# 5    e  0.4042683
# 6    f -0.1061245

str(df1)
# 'data.frame': 6 obs. of  2 variables:
#  $ onex: Factor w/ 6 levels "a","b","c","d",..: 1 2 3 4 5 6
#  $ y   : num  1.371 -0.565 0.363 0.633 0.404 ...

Note: R >= 4.1 is required for the \(x) lambda shorthand and the native |> pipe.

Inside factorFunct(), base R's transform() could be used instead of dplyr::mutate():

factorFunct <- function(d) transform(d, x=factor(x))
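Taking that one step further, the whole pipeline can be written with base R only. This is a sketch, not the answer's exact code: `dfs`, `prefixes` and `scratch` are hypothetical names, it renames the exact column x rather than matching a regex, and it writes into a scratch environment instead of .GlobalEnv:

```r
# Hypothetical stand-ins for the question's data
set.seed(1)
dfs <- list(df1 = data.frame(x = letters[1:3], y = rnorm(3)),
            df2 = data.frame(x = letters[1:3], y = rnorm(3)))
prefixes <- c("one", "two")

# Map() walks both inputs in parallel and keeps the names of the first
out <- Map(function(d, p) {
  d <- transform(d, x = factor(x))               # base-R replacement for mutate()
  names(d)[names(d) == "x"] <- paste0(p, "x")    # prefix only the x column
  d
}, dfs, prefixes)

# Works because `out` is named df1, df2
scratch <- new.env()
invisible(list2env(out, envir = scratch))
str(scratch$df2)
```

Swapping `scratch` for .GlobalEnv reproduces the answer's behaviour of overwriting df1 and df2 in place.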

Answer:

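I think the issue is that your list of data frames 'lost' its names after applying your renaming function: lapply() names its result after the vector it iterates over, and 1:length(dfList) (like seq_along(dfList)) is an unnamed integer vector. If you reassign the names, list2env() works, e.g.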

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
set.seed(123)
df1 <- data.frame(x = letters[1:6], y = rnorm(6))
df2 <- data.frame(x = letters[1:6], y = rnorm(6))
df3 <- data.frame(x = letters[1:6], y = rnorm(6))

preList <- lst(one = "one", two = "two", three = "three")
dfList <- dplyr::lst(df1, df2, df3)

renameCols <- function(d, prefix) {
  d %>%
    rename_with(.fn = ~ paste0(prefix, .x),
                .cols = matches("x"))
}

dfList2 <- lapply(seq_along(dfList), function(i) {renameCols(d = dfList[[i]],
                                                             prefix = preList[[i]])})
names(dfList2)
#> NULL

names(dfList2) <- names(dfList)

list2env(dfList2, .GlobalEnv)
#> <environment: R_GlobalEnv>

df1
#>   onex           y
#> 1    a -0.56047565
#> 2    b -0.23017749
#> 3    c  1.55870831
#> 4    d  0.07050839
#> 5    e  0.12928774
#> 6    f  1.71506499

Created on 2022-03-01 by the reprex package (v2.0.1)
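Instead of restoring the names afterwards, the index itself can carry them, since lapply() names its result after its input. A minimal base-R sketch; the two small data frames and the inline renaming stand in for the real data and renameCols() and are hypothetical:

```r
# Hypothetical small stand-ins for dfList and preList
dfList  <- list(df1 = data.frame(x = 1), df2 = data.frame(x = 2))
preList <- list(one = "one", two = "two")

# A named index: lapply() carries these names into its result
idx <- setNames(seq_along(dfList), names(dfList))
dfList2 <- lapply(idx, function(i) {
  d <- dfList[[i]]
  names(d) <- paste0(preList[[i]], names(d))  # prefix every column (only x here)
  d
})
names(dfList2)   # returns c("df1", "df2")
```

With the question's own function, Map(renameCols, dfList, preList) should give a similarly named list, because Map() keeps the names of its first argument.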
