Home > Mobile >  Return the names of data frames from a loop into a new data frame as IDs
Return the names of data frames from a loop into a new data frame as IDs

Time:05-06

Suppose I have multiple data frames with the same prefixes and same structure.

mydf_1 <- data.frame('fruit' = 'apples', 'n' = 2)
mydf_2 <- data.frame('fruit' = 'pears', 'n' = 0)
mydf_3 <- data.frame('fruit' = 'oranges', 'n' = 3)

I have a for-loop that grabs all the tables with this prefix, and appends those that match a certain condition.

res <- data.frame()

for(i in mget(apropos("^mydf_"), envir = .GlobalEnv)){
  
  if(sum(i$n) > 0){
    res <- rbind.data.frame(res, data.frame('name' = paste0(i[1]),
                                            'n' = sum(i$n)))
  }
}

res

This works fine, but I want my 'res' table to identify the name of the original data frame itself in the 'name' column, instead of the column name. My desired result is:

enter image description here

The closest I have gotten to solving this issue is:

'name' = paste0(substitute(i))

instead of

'name' = paste0(i[1])

but it just returns 'i'.

Any simple solution? Base preferred but not essential.

CodePudding user response:

As mentioned in the comments, it is better to put dataframes into a list as it much easier to handle and manipulate them. However, we could still grab the dataframes from the global environment, get the sum for each dataframe, then bind them together and add the dataframe name as a row.

library(tidyverse)

df_list <-
  do.call("list", mget(grep("^mydf_", names(.GlobalEnv), value = TRUE))) %>%
  map(., ~ .x %>% summarise(n = sum(n))) %>%
  discard(~ .x == 0) %>% 
  bind_rows(., .id = "name")

Or we could use map_dfr to bind together and summarise, then filter out the 0 values:

map_dfr(mget(ls(pattern = "^mydf_")), ~ c(n = sum(.x$n)), .id = "name") %>%
  filter(n != 0)

Output

    name n
1 mydf_1 2
2 mydf_3 3

CodePudding user response:

To bind a list of data.frames and store the list names as a new column, a convenient way is to set the arg .id in dplyr::bind_rows().

library(dplyr)

mget(apropos("^mydf_")) %>%
  bind_rows(.id = "name") %>%
  count(name, wt = n) %>%
  filter(n > 0)

#     name n
# 1 mydf_1 2
# 2 mydf_3 3
  • Related