Use dataframe name while mutating across many dataframes in R

Time:08-18

I need to add a new column to each of 96 different dataframes that contains the name of that dataframe (the name is informative). It's easiest to just show you what I mean.

> wolf <- data.frame(test1 = c(3,2,4,3),
                     test2 = c(4,5,2,4))
> bear <- data.frame(test1 = c(3,5,6,1),
                     test2 = c(4,6,2,4))
> wolf
  test1 test2
1     3     4
2     2     5
3     4     2
4     3     4
> bear
  test1 test2
1     3     4
2     5     6
3     6     2
4     1     4

I would like the output to be:

> wolf
  test1 test2 animal
1     3     4   wolf
2     2     5   wolf
3     4     2   wolf
4     3     4   wolf
> bear
  test1 test2 animal
1     3     4   bear
2     5     6   bear
3     6     2   bear
4     1     4   bear

Obviously, writing a separate dplyr::mutate() call for each dataframe would take ages. I'm sure there's a way to do this with for loops and/or lapply, but I don't have a good handle on how to use those functions. I also know that it's bad practice to have so many dataframes in my global environment; I'm all ears if you have suggestions for a more organized way of reading in this data to begin with (it comes from Excel spreadsheets).
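One more organized approach, sketched here under assumptions rather than taken from the thread: read every spreadsheet straight into a named list instead of into 96 separate global variables. To keep the sketch runnable with base R alone, it writes two hypothetical CSV files to a temporary directory; for Excel files you would swap read.csv() for readxl::read_excel() and use the pattern "\\.xlsx$".

```r
# Hypothetical setup: two small CSV files standing in for the spreadsheets
dir <- file.path(tempdir(), "animals")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(test1 = c(3, 2, 4, 3), test2 = c(4, 5, 2, 4)),
          file.path(dir, "wolf.csv"), row.names = FALSE)
write.csv(data.frame(test1 = c(3, 5, 6, 1), test2 = c(4, 6, 2, 4)),
          file.path(dir, "bear.csv"), row.names = FALSE)

# Find the files and read each one into an element of a list
paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
dfs <- lapply(paths, read.csv)

# Name each element after its file, minus the directory and extension
names(dfs) <- tools::file_path_sans_ext(basename(paths))
```

Because the file names become the list names, the informative name is never lost, and nothing but dfs lands in the global environment.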

The reason I'm doing this is that I want to combine all these DFs into one DF. But if I just rbind them immediately, I'll lose the important information in each DF's name. Thanks so much for your help.
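A base R sketch of the for-loop route the question mentions, with no packages: gather the dataframes into a named list, add each name as a column, and only then rbind. It assumes the object names are known; with 96 of them, ls(pattern = ...) could build the name vector if they happen to share a naming pattern (an assumption about your data).

```r
wolf <- data.frame(test1 = c(3, 2, 4, 3), test2 = c(4, 5, 2, 4))
bear <- data.frame(test1 = c(3, 5, 6, 1), test2 = c(4, 6, 2, 4))

# mget() looks the objects up by name and returns a list named after them
dfs <- mget(c("wolf", "bear"))

# Loop over the names, adding each name as a new column
for (nm in names(dfs)) {
  dfs[[nm]]$animal <- nm
}

# Stack everything; the animal column preserves the names
combined <- do.call(rbind, dfs)
rownames(combined) <- NULL  # drop the "wolf.1"-style row names rbind creates
```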

Answer:

A possible solution, based on tibble::lst (to create a list of the dataframes, named after the objects) and purrr::imap (to iterate over that list, passing each dataframe as .x and its name as .y):

library(tidyverse)

imap(lst(bear, wolf), ~ mutate(.x, animal = .y))

#> $bear
#>   test1 test2 animal
#> 1     3     4   bear
#> 2     5     6   bear
#> 3     6     2   bear
#> 4     1     4   bear
#> 
#> $wolf
#>   test1 test2 animal
#> 1     3     4   wolf
#> 2     2     5   wolf
#> 3     4     2   wolf
#> 4     3     4   wolf
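Since the end goal is a single combined dataframe, tagging and stacking can also be done in one step: dplyr::bind_rows() accepts a named list, and its .id argument turns the list names into a column. A sketch, assuming dplyr and tibble are installed (both ship with tidyverse):

```r
library(dplyr)

wolf <- data.frame(test1 = c(3, 2, 4, 3), test2 = c(4, 5, 2, 4))
bear <- data.frame(test1 = c(3, 5, 6, 1), test2 = c(4, 6, 2, 4))

# lst() builds a list named after the objects: list(bear = bear, wolf = wolf)
dfs <- tibble::lst(bear, wolf)

# .id turns the list names into a column while stacking the rows
combined <- bind_rows(dfs, .id = "animal")
```

Note that the identifier column comes first here, rather than last as with mutate().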

With many dataframes, load all of them first and then do the following (make sure the global environment contains only the dataframes you need, because every dataframe there will be picked up):

# collect all dataframes in the global environment into a named list
# (eapply() already returns a named list; Filter() keeps only the dataframes)
l <- Filter(is.data.frame, eapply(.GlobalEnv, identity))

imap(l, ~ mutate(.x, animal = .y))
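If other dataframes might be lying around, an alternative (a sketch, not part of the original answer) is to name the wanted objects explicitly and fetch them with mget(), so an unrelated dataframe, such as the hypothetical lookup below, is never picked up. Here Map() with transform() plays the role of imap() in base R.

```r
wolf <- data.frame(test1 = c(3, 2, 4, 3), test2 = c(4, 5, 2, 4))
bear <- data.frame(test1 = c(3, 5, 6, 1), test2 = c(4, 6, 2, 4))
lookup <- data.frame(code = 1:2)  # hypothetical unrelated dataframe to skip

# Explicitly list the objects to keep (hypothetical name vector)
animals <- c("wolf", "bear")
l <- mget(animals)

# Add each list name as a column; Map() keeps the names of l
l <- Map(function(df, nm) transform(df, animal = nm), l, names(l))
```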