I am looking to merge two data frames. Both tables have a "Name" column, and I wish to use this column as the one to connect the two. However, when I run something like:
final <- merge(df1, df2, by = "Name")
All of the columns are merged, but I only get the data from df1. And if I run something like:
final <- merge(df1, df2, by = "Name", all.y = TRUE)
The merged table just gives all of the data from df2 but excludes data from df1.
Is there a way around this?
#reproducible example
Names <- c("John", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)
df1 <- data.frame(Names, Age)
Names <- c("John", "Bill", "Maria", "Ben", "Tina")
Cars<- c("Ford", "Kia", "VW", "Toyota", "SAAB")
Color<- c("Red", "Green", "Blue", "Orange", "White")
df2<-data.frame(Names, Cars, Color)
final<- merge( df1, df2, by="Names")
#Expected Result
view(final)
Names Age Cars Color
John 23 Ford Red
Bill 41 Kia Green
Maria 32 VW Blue
Ben 58 Toyota Orange
Tina 26 SAAB White
#Actual Result
Names Age Cars Color
John NA Ford Red
Bill NA Kia Green
Maria NA VW Blue
Ben NA Toyota Orange
Tina NA SAAB White
In this example, I would get the data from df2, but df1 values would show NA.
Is there any way around this?
CodePudding user response:
your example works for me with little changes
library(tidyverse)
df1 <- tibble(
name=c("John", "Bill", "Maria", "Ben", "Tina"),
age=c(23, 41, 32, 58, 26))
df2<-tibble(
name=c("John", "Bill", "Maria", "Ben", "Tina"),
cars=c("Ford", "Kia", "VW", "Toyota", "SAAB"),
color=c("Red", "Green", "Blue", "Orange", "White"))
merge( df1, df2, by="name")
#
name age cars color
1 Ben 58 Toyota Orange
2 Bill 41 Kia Green
3 John 23 Ford Red
4 Maria 32 VW Blue
5 Tina 26 SAAB White
Here I use tidyverse
cause it's kind of a standard alternative to data.frames. I'm stick with using lowercased column names -- one more code standard.
I couldn't reproduce the example because of the Color varible with wrong quotes. Could it be the cause?
CodePudding user response:
You could try using dplyr package, which is great for managing dataframes:
final <- full_join(df1, df2, by= "Name")
If the names of the columns are different just use: by= c("Name1" = "Name2")
If you only want to drop the rows that are not common you can have a look on right_join
/left_join
/inner_join
CodePudding user response:
besides all.x
and all.y
there is also an all
argument for merge
.
Using your code:
final <- merge(df1, df2, by = "Name", all = TRUE)
should work.
The RDocumentation also explains how the arguments work for merge