Home > OS >  Merge multiple data frames with similar names and add a new column indicating origin
Merge multiple data frames with similar names and add a new column indicating origin

Time:09-17

I have several data frames that I want to merge into one large dataframe. In addition, I want to add a column to this new dataframe that indicates which data frame the data originated from. I did make a rather crude way of doing this:

exports_all<- rbind((mutate(Dai15_Dai3_1, sentiment = factor("week1"))),
                    (mutate(Dai15_Dai3_2, sentiment = factor("week2"))),
                    (mutate(Dai15_Dai3_3, sentiment = factor("week3"))),
                    (mutate(Dai15_Dai3_4, sentiment = factor("week4"))),
                    (mutate(Dai15_Dai3_5, sentiment = factor("week5"))),
                    (mutate(Dai15_Dai3_6, sentiment = factor("week6"))),
                    (mutate(Dai15_Dai3_7, sentiment = factor("week7"))))

Produces:

     Region_ID  Sv_mean  Week
1        10    -64.01115 week1
2        10    -64.96363 week1
3        10    -67.98841 week1
4        13    -66.88734 week2
5        13    -69.79789 week2
6        13    -69.94071 week2
7        15    -66.04855 week3
8        15    -68.31167 week3
9        15    -68.67383 week3

I'm trying to find a more elegant way of doing this. The data I used here was broken into 7 weeks, but next years data could be different (also the year before had 12 weeks). I'm trying to find a solution that will not choke up when encountering differing week numbers.

I did make some progress but the only solution I found inserts a column with the dataframe of origin, not exactly what I need.

exports_all<- mget(ls(pattern = 'Dai15_Dai3_*')) %>%
  map_df(I, .id = 'Week')

CodePudding user response:

We may have to modify the names of the list as mget returns a named list with the names being the object names.

library(dplyr)
mget(ls(pattern = 'Dai15_Dai3_*')) %>% 
  setNames(paste0("Week", seq_along(.))) %>%
  bind_rows(.id = "Week") %>%
  relocate(Week, .after = last_col())

If it should replace the substring in the original names

library(stringr)
mget(ls(pattern = 'Dai15_Dai3_*')) %>%
  setNames(str_replace(names(.), ".*_(\\d )$", "Week\\1")) %>% 
  bind_rows(.id = "Week") %>%
  relocate(Week, .after = last_col())

NOTE: bind_rows is enough as we are not applying any functions in each of the list elements (in the OP's code, I (identity) was used as a placeholder for .f)

  • Related