Hello I have two daframes in R, and I would like to concatenate them. The structure of the dfs its like that:
x <- data.frame(
ID = c(1, 2, 3),
S1 = c(10, NA, NA),
S2 = c(21, 22, 23)
)
y <- data.frame(
ID = c(1, 2, 3, 4),
S1 = c(11, 12, 13, 14),
S2 = c(24, 25, 26, 27)
)
And I would like to have something like:
final <- data.frame(
ID = c(1, 2, 3, 4, 5, 6, 7),
S1 = c(10, 11, 12, 13, 14, NA, NA),
S2 = c(21, 22, 23, 24, 25, 26, 27)
)
I've tried to use natural_join but it's give me a erro:
>library("rquery")
> final <- natural_join(ipeadata_d, ipeadata_d.cont, by = "ID",jointype = "FULL")
Error in natural_join.relop(dnodea, dnodeb, jointype = jointype, by = by, :
rquery::natural_join.relop all tables must have all join keys, the following keys are not in some tables: ID
I Also tried the rbind, but the dataframe keeps the NA.
I would like to concatenate the dataframe like in the "final" example and would like to transform final in a csv file. Thanks for your help.
CodePudding user response:
There are a few issues here:
- First - your example isn't reproducible because those
data.frames
do not have the same number of elements in each vector. I assume your ID vector should be of equal length to S1 and S2 - Second: it sounds like you can accomplish what you want in base R, without any special functions. You are just attempting to concatenate or "union" the 2 data.frames. R uses the command
rbind
to do this.
I am making an assumption here on what your desired output is.
Here is a working example using rbind
:
x <- data.frame(
ID = c(1, 2, 3),
S1 = c(10, NA, NA),
S2 = c(21, 22, 23)
)
y <- data.frame(
ID = c(4, 5, 6, 7),
S1 = c(11, 12, 13, 14),
S2 = c(24, 25, 26, 27)
)
final <- rbind(x,y)
> rbind(x,y)
ID S1 S2
1 1 10 21
2 2 NA 22
3 3 NA 23
4 4 11 24
5 5 12 25
6 6 13 26
7 7 14 27
For your reference, "merging" usually refers to combining 2 data.frames based on a shared column or key.
CodePudding user response:
You may combine the two datasets using bind_rows
and sort
the columns putting NA
's at the last.
library(dplyr)
bind_rows(x, y) %>%
mutate(ID = row_number(),
across(c(S1, S2), sort, na.last = TRUE))
# ID S1 S2
#1 1 10 21
#2 2 11 22
#3 3 12 23
#4 4 13 24
#5 5 14 25
#6 6 NA 26
#7 7 NA 27