I am quite new to R and never worked with any bigger data. For the examples i reduced the two dataframes:
df1
id | val1 | val2 |
---|---|---|
11 | 1 | 2 |
11 | 2 | 5 |
22 | 2 | 2 |
22 | 4 | 6 |
... | ... | ... |
df2
id | val1 | val2 |
---|---|---|
11 | 5 | 3 |
22 | 6 | 5 |
... | ... | ... |
I am looking for a way to add the values of df2 to each value in df1 with the same id.
So the result should be something like this:
id | val1 | val2 |
---|---|---|
11 | 6 | 5 |
11 | 7 | 8 |
22 | 8 | 7 |
22 | 10 | 11 |
... | ... | ... |
Because the original data is over 3000 observations of 47 variables with 8 different id I am looking for a solution where the values are not added one by one.
#reproducible data
df1 <- read.table(text = "id val1 val2
11 1 2
11 2 5
22 2 2
22 4 6", header = TRUE)
df2 <- read.table(text = "id val1 val2
11 5 3
22 6 5", header = TRUE)
CodePudding user response:
You could use powerjoin
to handle conflicted columns when joining.
library(powerjoin)
power_left_join(df1, df2, by = "id", conflict = ` `)
# id val1 val2
# 1 11 6 5
# 2 11 7 8
# 3 22 8 7
# 4 22 10 11
CodePudding user response:
Merge the datasets then add columns:
# merge
res <- merge(df1, df2, by = "id")
# then add
cbind(res[ 1 ], res[, 2:3] res[, 4:5])
# id val1.x val2.x
# 1 11 6 5
# 2 11 7 8
# 3 22 8 7
# 4 22 10 11
CodePudding user response:
One approach is to merge both datasets by the id
variable, then additioning corresponding columns to create the new val1
and val2
variables, as suggested in the comments by @zx8754. Using dplyr
you can obtain the output with :
library(dplyr)
merge(df1,df2,by="id") %>%
mutate(val1=val1.x val1.y,val2=val2.x val2.y,)%>%
select(id,val1,val2)