Apply same function to several Dataframes - R

Time:03-18

I'm currently working with 8 data frames that share the same structure. I would like to know how to apply the same steps and modifications to all of them at once.

I know this is possible by putting the data frames in a list and using lapply, but I can't work out how to write it.

The steps I need to perform are as follows:

df1$EMAIL <- str_to_lower(df1$EMAIL)
df2$EMAIL <- str_to_lower(df2$EMAIL)
dfn$EMAIL <- str_to_lower(dfn$EMAIL)
df8$EMAIL <- str_to_lower(df8$EMAIL)

df1$EMAIL <- stri_trans_general(df1$EMAIL, "Latin-ASCII")
df2$EMAIL <- stri_trans_general(df2$EMAIL, "Latin-ASCII")
dfn$EMAIL <- stri_trans_general(dfn$EMAIL, "Latin-ASCII")
df8$EMAIL <- stri_trans_general(df8$EMAIL, "Latin-ASCII")

df1$CATEGORY <- str_to_title(df1$CATEGORY)
df2$CATEGORY <- str_to_title(df2$CATEGORY)
dfn$CATEGORY <- str_to_title(dfn$CATEGORY)
df8$CATEGORY <- str_to_title(df8$CATEGORY)

df1_e <- select(df1, EMAIL, CATEGORY, COMPANY)
df2_e <- select(df2, EMAIL, CATEGORY, COMPANY)
dfn_e <- select(dfn, EMAIL, CATEGORY, COMPANY)
df8_e <- select(df8, EMAIL, CATEGORY, COMPANY)

EMAILS <- bind_rows(df1_e, df2_e, dfn_e, df8_e) %>% distinct(EMAIL, .keep_all = TRUE)

These are simple steps that don't take long to run one by one, but I would like to learn how to be more efficient and keep the script shorter.

Thanks in advance

CodePudding user response:

A general solution, as you have already identified, is to put the data frames in a list and apply lapply/map to each one.
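As a rough illustration of the list-plus-lapply idea, here is a minimal sketch. The two toy data frames, their values, and the helper name clean_emails are made up for the example; only the column names and the cleaning steps come from the question.

```r
library(dplyr)
library(stringr)
library(stringi)

# Toy stand-ins for the real data frames (hypothetical values).
df1 <- data.frame(EMAIL = c("Ana@Example.com", "JOS\u00c9@example.com"),
                  CATEGORY = c("retail store", "WHOLESALE"),
                  COMPANY = c("A", "B"),
                  OTHER = 1:2)
df2 <- data.frame(EMAIL = c("ana@example.com", "new@example.com"),
                  CATEGORY = c("Retail Store", "online"),
                  COMPANY = c("A", "C"),
                  OTHER = 3:4)

# Hypothetical helper: every cleaning step for ONE data frame.
clean_emails <- function(df) {
  df$EMAIL <- stri_trans_general(str_to_lower(df$EMAIL), "Latin-ASCII")
  df$CATEGORY <- str_to_title(df$CATEGORY)
  select(df, EMAIL, CATEGORY, COMPANY)
}

# Apply the helper to each element of the list, then stack the
# results and keep the first row seen for each EMAIL.
cleaned <- lapply(list(df1 = df1, df2 = df2), clean_emails)
EMAILS <- distinct(bind_rows(cleaned), EMAIL, .keep_all = TRUE)
```

Writing the steps once as a function means a change to the cleaning logic only has to be made in one place, however many data frames are in the list.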

Here's a solution using map_df from purrr. If the data frames are named df1, df2, ..., df8, you can use mget to collect them into a list. I have also created an id column that records which data frame each row came from.

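library(dplyr)
library(purrr)
library(stringr)  # str_to_lower(), str_to_title()
library(stringi)  # stri_trans_general()

# Collect df1..df8 into a named list, clean each one, and row-bind the results.
EMAILS <- map_df(mget(paste0('df', 1:8)), function(x) {
  x %>%
    transmute(EMAIL = str_to_lower(EMAIL) %>% stri_trans_general("Latin-ASCII"), 
              CATEGORY = str_to_title(CATEGORY), 
              COMPANY)
}, .id = 'id') %>% distinct(EMAIL, .keep_all = TRUE)  # keep the first row per EMAIL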
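
For what it's worth, recent purrr releases mark map_df as superseded in favour of map() followed by list_rbind(). The sketch below assumes purrr 1.0.0 or later and uses two made-up toy data frames in place of the real df1 to df8; it is one possible equivalent, not the only way to write it.

```r
library(dplyr)
library(purrr)   # assumes purrr >= 1.0.0 for list_rbind()
library(stringr)
library(stringi)

# Toy stand-ins for the real data frames (hypothetical values).
df1 <- data.frame(EMAIL = c("Ana@Example.com", "JOS\u00c9@example.com"),
                  CATEGORY = c("retail store", "WHOLESALE"),
                  COMPANY = c("A", "B"))
df2 <- data.frame(EMAIL = c("ana@example.com", "new@example.com"),
                  CATEGORY = c("Retail Store", "online"),
                  COMPANY = c("A", "C"))

# mget() looks the objects up by name and returns a named list,
# so the list names ("df1", "df2") can later become the id column.
dfs <- mget(paste0("df", 1:2))

EMAILS <- dfs %>%
  map(function(x) {
    x %>%
      transmute(EMAIL = str_to_lower(EMAIL) %>% stri_trans_general("Latin-ASCII"),
                CATEGORY = str_to_title(CATEGORY),
                COMPANY)
  }) %>%
  list_rbind(names_to = "id") %>%           # stack rows, storing list names in id
  distinct(EMAIL, .keep_all = TRUE)         # keep the first row per EMAIL
```

Because distinct keeps the first occurrence, the id column shows the first data frame in which each email address appeared.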