After being convinced that it's best practice to use a list to manage multiple data frames in R, I decided to put all my data frames that share the same column names into a list.
sample list:
# create sample df
df_1 <- data.frame(item = c("a", "b", "c"),
                   measure = c(1, 2, 3)
)
df_2 <- data.frame(item = c("x", "y", "z"),
                   measure = c(4, 5, 6)
)
# use names as my df has names
data_list <- list(df_1 = df_1, df_2 = df_2)
I wanted to apply the same operation to each of these data frames without combining them into one, because later on I'll need to save each one to a separate output.
Then it became a nightmare, because I did not know how to manipulate a column across all the data frames inside a list.
While we can select a specific element and also the specific df in a list, how do we select by column name?
As an example, I need to change the values in the item column to upper case. For a single data frame I'd do
df_1 <- df_1 %>% mutate(item = toupper(item))
I am still learning to write functions and to use the apply family in R. For this simple task I believe I can just pass an existing function to lapply, like this
data_list = lapply(x, toupper)
The question is: what is x here? Is there a way to subset by column, the same way data_list$df_1 or data_list[[1]] gives me the whole df_1?
I hope I can use lapply with a function to work column by column across the data frames in a list.
CodePudding user response:
Another option is to use map from purrr. If you have already written what you want to do for one data frame in your list, you can pass it to map as a function. You use .x in place of the specific data frame.
library(tidyverse)
map(data_list, ~ .x %>%
      mutate(item = toupper(item)))
Output
$df_1
item measure
1 A 1
2 B 2
3 C 3
$df_2
item measure
1 X 4
2 Y 5
3 Z 6
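For comparison, here is a minimal base R sketch of the same idea with lapply, which is what the question asked about. It assumes the sample data_list from the question; the names df, data_list_upper and items are placeholders chosen for illustration. The x in the question's lapply(x, toupper) would be the list itself, and the function you pass receives one whole data frame at a time, so the column has to be picked out inside that function.

```r
# Same sample data as in the question
df_1 <- data.frame(item = c("a", "b", "c"), measure = c(1, 2, 3))
df_2 <- data.frame(item = c("x", "y", "z"), measure = c(4, 5, 6))
data_list <- list(df_1 = df_1, df_2 = df_2)

# lapply() hands each data frame to the function in turn;
# `df` is just a placeholder name for "the current data frame"
data_list_upper <- lapply(data_list, function(df) {
  df$item <- toupper(df$item)  # modify one column by name
  df                           # return the whole data frame
})

# Pull a single column out of every data frame by name
items <- lapply(data_list_upper, `[[`, "item")
```

The result is still a named list of separate data frames, so each element can be saved to its own output later. Passing toupper directly to lapply would not work here, because it would be applied to each entire data frame instead of to the item column.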