There are three data sets
Parent Category:
ID
Ants
Cow
Dog
Hen
Tiger
df1:
ID Number1
Ants 6
Dog 2
Hen 7
df2:
ID Number2
Ants 5
Cow 7
Tiger 3
The final data (df3) should look like this:
ID Number1 Number2
Ants 6 5
Cow 0 7
Dog 2 0
Hen 7 0
Tiger 0 3
Is there any direct way to achieve this in R. I can do it in a manual way where I try to find out the missing values and then input it in each row and assign it to 0 and then arrange it in ascending order. Lastly I will merge the two data frames into one. But can we make a function that can do all this for us, with even more than two datasets.
I found a solution, here
but this is taking to much of time to run and showing the error Error: cannot allocate a vector of size 46.0 MB
CodePudding user response:
We can put all dataframes into a list and use Reduce
and merge
from base R:
df <- Reduce(function(...) merge(..., by='ID', all.x=TRUE), list(parent, df1, df2))
df[is.na(df)] <- 0
ID Number1 Number2
1 Ants 6 5
2 Cow 0 7
3 Dog 2 0
4 Hen 7 0
5 Tiger 0 3
Or we can use join_all
from plyr
:
library(plyr)
join_all(list(parent, df1, df2), by='ID', type='left') %>%
replace(is.na(.), 0)
Or with purrr::reduce
:
library(tidyverse)
reduce(list(parent, df1, df2), left_join, by = 'ID') %>%
mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
Data
parent <- structure(list(ID = c("Ants", "Cow", "Dog", "Hen", "Tiger")),
class = "data.frame", row.names = c(NA, -5L))
df1 <- structure(list(ID = c("Ants", "Dog", "Hen"),
Number1 = c(6L, 2L, 7L)),
class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(ID = c("Ants", "Cow", "Tiger"),
Number2 = c(5L, 7L, 3L)),
class = "data.frame", row.names = c(NA, -3L))
CodePudding user response:
You can use full_join
from dplyr
:
full_join(df1, df2, "ID") %>% replace_NA(0)
For your data :
library(dplyr)
library(collapse)
df1 <- data.frame(ID = c("Ants", "Dog", "Hen"), Number1 = c(6,2,7))
df2 <- data.frame(ID = c("Ants", "Cow", "Tiger"), Number2 = c(5,7,3))
full_join(df1, df2, "ID") %>% replace_NA(0) %>% arrange(ID)