Home > OS >  Merge R data frames with differing lengths
Merge R data frames with differing lengths

Time:02-14

There are three data sets

Parent Category:

ID 
Ants
Cow
Dog
Hen
Tiger

df1:

ID     Number1
Ants    6
Dog     2
Hen     7

df2:

ID     Number2
Ants    5
Cow     7
Tiger   3

The final data (df3) should look like this:

ID      Number1  Number2
Ants       6        5
Cow        0        7
Dog        2        0
Hen        7        0
Tiger      0        3

Is there any direct way to achieve this in R. I can do it in a manual way where I try to find out the missing values and then input it in each row and assign it to 0 and then arrange it in ascending order. Lastly I will merge the two data frames into one. But can we make a function that can do all this for us, with even more than two datasets.

I found a solution, here

but this is taking to much of time to run and showing the error Error: cannot allocate a vector of size 46.0 MB

CodePudding user response:

We can put all dataframes into a list and use Reduce and merge from base R:

df <- Reduce(function(...) merge(..., by='ID', all.x=TRUE), list(parent, df1, df2))
df[is.na(df)] <- 0 

     ID Number1 Number2
1  Ants       6       5
2   Cow       0       7
3   Dog       2       0
4   Hen       7       0
5 Tiger       0       3

Or we can use join_all from plyr:

library(plyr)
join_all(list(parent, df1, df2), by='ID', type='left') %>% 
  replace(is.na(.), 0)

Or with purrr::reduce:

library(tidyverse)

reduce(list(parent, df1, df2), left_join, by = 'ID') %>% 
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

Data

parent <- structure(list(ID = c("Ants", "Cow", "Dog", "Hen", "Tiger")), 
                    class = "data.frame", row.names = c(NA, -5L))

df1 <- structure(list(ID = c("Ants", "Dog", "Hen"), 
                      Number1 = c(6L, 2L, 7L)), 
                 class = "data.frame", row.names = c(NA, -3L))

df2 <- structure(list(ID = c("Ants", "Cow", "Tiger"), 
                      Number2 = c(5L, 7L, 3L)), 
                 class = "data.frame", row.names = c(NA, -3L))

CodePudding user response:

You can use full_join from dplyr :

full_join(df1, df2, "ID") %>% replace_NA(0)

For your data :

library(dplyr)
library(collapse)

df1 <- data.frame(ID = c("Ants", "Dog", "Hen"), Number1 = c(6,2,7))
df2 <- data.frame(ID = c("Ants", "Cow", "Tiger"), Number2 = c(5,7,3))

full_join(df1, df2, "ID") %>% replace_NA(0) %>% arrange(ID)
  • Related