I have many data.frames who have different row numbers.
I would like to merge them all into one data.frame.
I know the inner_join
command does the job for two data.frames, but in my case I have a lot of data.frames.
reproducible example for my data.frames:
df1<-data.frame(s=c(1,2,3,4,5,6,7),
x=c('a','b','c','d','e', 'f', 'g'))
df2<-data.frame(k=c(6,7,8,9),
x=c('a','b','c','d'))
df3<-data.frame(y=c(10,11,12,13,14),
x=c('f','g','h','i','j'),
u=c(8,7,4,5,3))
df4<-data.frame(z=c(8,7,9,1,3,4,6,2),
x=c('a','b','c','d','f','e','h', 'j'))
df_aaa<-data.frame(w=c(1000,500),
x=c('a','b'))
ghj_df<-data.frame(q=c(36,32,31,35,34,34,31,35,36),
x=c('a','b','c','d','e','f','g','h','j'))
CodePudding user response:
- Clean your environment.
- load all desired dataframes into your environment
- make a list of all your dataframes in your environment with code below
dfs ....
- use
bind_rows
to combine all of them to one dataframe
library(dplyr)
dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))
bind_rows(dfs, .id="column_label")
column_label s x k y u z w q
1 df1 1 a NA NA NA NA NA NA
2 df1 2 b NA NA NA NA NA NA
3 df1 3 c NA NA NA NA NA NA
4 df1 4 d NA NA NA NA NA NA
5 df1 5 e NA NA NA NA NA NA
6 df1 6 f NA NA NA NA NA NA
7 df1 7 g NA NA NA NA NA NA
8 df2 NA a 6 NA NA NA NA NA
9 df2 NA b 7 NA NA NA NA NA
10 df2 NA c 8 NA NA NA NA NA
11 df2 NA d 9 NA NA NA NA NA
12 df3 NA f NA 10 8 NA NA NA
13 df3 NA g NA 11 7 NA NA NA
14 df3 NA h NA 12 4 NA NA NA
15 df3 NA i NA 13 5 NA NA NA
16 df3 NA j NA 14 3 NA NA NA
17 df4 NA a NA NA NA 8 NA NA
18 df4 NA b NA NA NA 7 NA NA
19 df4 NA c NA NA NA 9 NA NA
20 df4 NA d NA NA NA 1 NA NA
21 df4 NA f NA NA NA 3 NA NA
22 df4 NA e NA NA NA 4 NA NA
23 df4 NA h NA NA NA 6 NA NA
24 df4 NA j NA NA NA 2 NA NA
25 df5 NA a NA NA NA NA 1000 NA
26 df5 NA b NA NA NA NA 500 NA
27 df6 NA a NA NA NA NA NA 36
28 df6 NA b NA NA NA NA NA 32
29 df6 NA c NA NA NA NA NA 31
30 df6 NA d NA NA NA NA NA 35
31 df6 NA e NA NA NA NA NA 34
32 df6 NA f NA NA NA NA NA 34
33 df6 NA g NA NA NA NA NA 31
34 df6 NA h NA NA NA NA NA 35
35 df6 NA j NA NA NA NA NA 36
CodePudding user response:
If inner_join does the trick for a pair of dataframes, we can use Reduce do apply inner_join to all data.frames.
I would first put all the data.frames in a list (here with mget(ls())
), then call Reduce
with inner_join
as the function to be reduced
library(dplyr)
Reduce(inner_join, mget(ls(pattern='df\\d ')))
I prefer using tidyverse for everything, so I usually use purrr::reduce here:
library(dplyr)
library(purrr)
reduce(mget(ls(pattern='df\\d ')), inner_join)
As an alternative to name matchin with ls(pattern=)), we can use a function to select all dataframes from your global environment. We can use Filter() or purrr::keep:
library(purrr)
library(dplyr)
mget(ls()) %>% keep(is.data.frame) %>% reduce(inner_join)