Home > Net >  How to join multiple data.frames with different numbers of rows at once in R? [duplicate]
How to join multiple data.frames with different numbers of rows at once in R? [duplicate]

Time:09-28

I have many data.frames who have different row numbers. I would like to merge them all into one data.frame. I know the inner_join command does the job for two data.frames, but in my case I have a lot of data.frames.

reproducible example for my data.frames:

df1<-data.frame(s=c(1,2,3,4,5,6,7),
                x=c('a','b','c','d','e', 'f', 'g'))

df2<-data.frame(k=c(6,7,8,9),
                x=c('a','b','c','d'))


df3<-data.frame(y=c(10,11,12,13,14),
                x=c('f','g','h','i','j'),
                u=c(8,7,4,5,3))

df4<-data.frame(z=c(8,7,9,1,3,4,6,2),
                x=c('a','b','c','d','f','e','h', 'j'))

df_aaa<-data.frame(w=c(1000,500),
                x=c('a','b'))

ghj_df<-data.frame(q=c(36,32,31,35,34,34,31,35,36),
                x=c('a','b','c','d','e','f','g','h','j'))

CodePudding user response:

  1. Clean your environment.
  2. load all desired dataframes into your environment
  3. make a list of all your dataframes in your environment with code below dfs ....
  4. use bind_rows to combine all of them to one dataframe
library(dplyr)

dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))
bind_rows(dfs, .id="column_label")
column_label  s x  k  y  u  z    w  q
1           df1  1 a NA NA NA NA   NA NA
2           df1  2 b NA NA NA NA   NA NA
3           df1  3 c NA NA NA NA   NA NA
4           df1  4 d NA NA NA NA   NA NA
5           df1  5 e NA NA NA NA   NA NA
6           df1  6 f NA NA NA NA   NA NA
7           df1  7 g NA NA NA NA   NA NA
8           df2 NA a  6 NA NA NA   NA NA
9           df2 NA b  7 NA NA NA   NA NA
10          df2 NA c  8 NA NA NA   NA NA
11          df2 NA d  9 NA NA NA   NA NA
12          df3 NA f NA 10  8 NA   NA NA
13          df3 NA g NA 11  7 NA   NA NA
14          df3 NA h NA 12  4 NA   NA NA
15          df3 NA i NA 13  5 NA   NA NA
16          df3 NA j NA 14  3 NA   NA NA
17          df4 NA a NA NA NA  8   NA NA
18          df4 NA b NA NA NA  7   NA NA
19          df4 NA c NA NA NA  9   NA NA
20          df4 NA d NA NA NA  1   NA NA
21          df4 NA f NA NA NA  3   NA NA
22          df4 NA e NA NA NA  4   NA NA
23          df4 NA h NA NA NA  6   NA NA
24          df4 NA j NA NA NA  2   NA NA
25          df5 NA a NA NA NA NA 1000 NA
26          df5 NA b NA NA NA NA  500 NA
27          df6 NA a NA NA NA NA   NA 36
28          df6 NA b NA NA NA NA   NA 32
29          df6 NA c NA NA NA NA   NA 31
30          df6 NA d NA NA NA NA   NA 35
31          df6 NA e NA NA NA NA   NA 34
32          df6 NA f NA NA NA NA   NA 34
33          df6 NA g NA NA NA NA   NA 31
34          df6 NA h NA NA NA NA   NA 35
35          df6 NA j NA NA NA NA   NA 36

CodePudding user response:

If inner_join does the trick for a pair of dataframes, we can use Reduce do apply inner_join to all data.frames. I would first put all the data.frames in a list (here with mget(ls())), then call Reduce with inner_join as the function to be reduced

library(dplyr)

Reduce(inner_join, mget(ls(pattern='df\\d ')))

I prefer using tidyverse for everything, so I usually use purrr::reduce here:

library(dplyr)
library(purrr)

reduce(mget(ls(pattern='df\\d ')), inner_join)

As an alternative to name matchin with ls(pattern=)), we can use a function to select all dataframes from your global environment. We can use Filter() or purrr::keep:

library(purrr)
library(dplyr)
mget(ls()) %>% keep(is.data.frame) %>% reduce(inner_join)
  •  Tags:  
  • r
  • Related