I'm currently working with 8 data frames that all have the same structure, and I would like to know how to apply the same steps and modifications to all of them at once.
I know this is possible by putting the data frames in a list and using the lapply function, but I can't work out how to write it.
The steps I need to perform are as follows:
df1$EMAIL <- str_to_lower(df1$EMAIL)
df2$EMAIL <- str_to_lower(df2$EMAIL)
dfn$EMAIL <- str_to_lower(dfn$EMAIL)
df8$EMAIL <- str_to_lower(df8$EMAIL)
df1$EMAIL <- stri_trans_general(df1$EMAIL, "Latin-ASCII")
df2$EMAIL <- stri_trans_general(df2$EMAIL, "Latin-ASCII")
dfn$EMAIL <- stri_trans_general(dfn$EMAIL, "Latin-ASCII")
df8$EMAIL <- stri_trans_general(df8$EMAIL, "Latin-ASCII")
df1$CATEGORY <- str_to_title(df1$CATEGORY)
df2$CATEGORY <- str_to_title(df2$CATEGORY)
dfn$CATEGORY <- str_to_title(dfn$CATEGORY)
df8$CATEGORY <- str_to_title(df8$CATEGORY)
df1_e <- select(df1, EMAIL, CATEGORY, COMPANY)
df2_e <- select(df2, EMAIL, CATEGORY, COMPANY)
dfn_e <- select(dfn, EMAIL, CATEGORY, COMPANY)
df8_e <- select(df8, EMAIL, CATEGORY, COMPANY)
EMAILS <- bind_rows(df1_e, df2_e, dfn_e, df8_e) %>% distinct(EMAIL, .keep_all = TRUE)
They are simple steps that do not require much time to perform one by one. But I would like to learn how to be more efficient and save space and time in the script.
Thanks in advance
CodePudding user response:
A general solution, as you have already identified, is to put the dataframes in a list and use lapply/map on each dataframe.
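To make the list-plus-lapply idea concrete, here is a minimal sketch. The toy data frames, the OTHER column, and the helper name clean_emails are made up for illustration; only the EMAIL, CATEGORY and COMPANY columns and the cleaning steps come from the question.

```r
library(stringr)   # str_to_lower(), str_to_title()
library(stringi)   # stri_trans_general()

# Toy stand-ins for the real data (invented for this example)
df1 <- data.frame(EMAIL = c("Jos\u00e9@Mail.COM", "ana@mail.com"),
                  CATEGORY = c("retail store", "ONLINE"),
                  COMPANY = c("A", "B"),
                  OTHER = 1:2,
                  stringsAsFactors = FALSE)
df2 <- data.frame(EMAIL = c("ANA@mail.com", "bob@mail.com"),
                  CATEGORY = c("online", "wholesale"),
                  COMPANY = c("B", "C"),
                  OTHER = 3:4,
                  stringsAsFactors = FALSE)

# Hypothetical helper: every per-data-frame step lives in one function
clean_emails <- function(df) {
  df$EMAIL <- stri_trans_general(str_to_lower(df$EMAIL), "Latin-ASCII")
  df$CATEGORY <- str_to_title(df$CATEGORY)
  df[, c("EMAIL", "CATEGORY", "COMPANY")]
}

dfs <- list(df1 = df1, df2 = df2)              # add the remaining data frames here
cleaned <- lapply(dfs, clean_emails)           # same steps applied to each element
EMAILS <- do.call(rbind, cleaned)              # stack the cleaned data frames
EMAILS <- EMAILS[!duplicated(EMAILS$EMAIL), ]  # keep the first row per EMAIL
```

Keeping the steps in a single function means a change to the cleaning logic only has to be made once, however many data frames are in the list.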
Here's a solution using map_df from purrr. If the dataframes are called df1, df2 ... df8, then you can use mget to create a list of dataframes. I have also created an id variable which gives the name of the source dataframe for each row.
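library(dplyr)
library(purrr)
library(stringr)  # str_to_lower(), str_to_title()
library(stringi)  # stri_trans_general()
# Collect df1 ... df8 into a named list, clean each one, and row-bind the results
EMAILS <- map_df(mget(paste0('df', 1:8)), function(x) {
  x %>%
    transmute(EMAIL = str_to_lower(EMAIL) %>% stri_trans_general("Latin-ASCII"),
              CATEGORY = str_to_title(CATEGORY),
              COMPANY)
}, .id = 'id') %>% distinct(EMAIL, .keep_all = TRUE)  # keep the first row per EMAIL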
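As of purrr 1.0.0, map_df is superseded in favour of map followed by list_rbind, so the same pipeline could be sketched as below. This is one possible variant rather than the only way to write it; the two toy data frames are invented so the example runs on its own, and with the real data the index would be 1:8.

```r
library(dplyr)
library(purrr)     # list_rbind() requires purrr >= 1.0.0
library(stringr)
library(stringi)

# Toy stand-ins for the real data (invented for this example)
df1 <- data.frame(EMAIL = c("Jos\u00e9@Mail.COM", "ana@mail.com"),
                  CATEGORY = c("retail store", "ONLINE"),
                  COMPANY = c("A", "B"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(EMAIL = c("ANA@mail.com", "bob@mail.com"),
                  CATEGORY = c("online", "wholesale"),
                  COMPANY = c("B", "C"),
                  stringsAsFactors = FALSE)

# mget() looks the objects up by name and returns a named list (df1, df2)
dfs <- mget(paste0("df", 1:2))

EMAILS <- dfs %>%
  map(function(x) {
    x %>%
      transmute(EMAIL = stri_trans_general(str_to_lower(EMAIL), "Latin-ASCII"),
                CATEGORY = str_to_title(CATEGORY),
                COMPANY)
  }) %>%
  list_rbind(names_to = "id") %>%       # list names become the id column
  distinct(EMAIL, .keep_all = TRUE)     # keep the first row per EMAIL
```

Because distinct keeps the first occurrence, the id column shows the first data frame in which each email address appeared.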