Home > Mobile >  Why aren't both of my dataframes merging?
Why aren't both of my dataframes merging?

Time:06-30

I am looking to merge two data frames. Both tables have a "Name" column, and I wish to use this column as the one to connect the two. However, when I run something like:

final <- merge(df1, df2, by = "Name")

All of the columns are merged, but I only get the data from df1. And if I run something like:

final <- merge(df1, df2, by = "Name", all.y = TRUE)

The merged table just gives all of the data from df2 but excludes data from df1.

Is there a way around this?

#reproducible example

Names <- c("John", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)

df1 <- data.frame(Names, Age)

Names <- c("John", "Bill", "Maria", "Ben", "Tina")
Cars<- c("Ford", "Kia", "VW", "Toyota", "SAAB")
Color<- c("Red", "Green", "Blue", "Orange", "White")

df2<-data.frame(Names, Cars, Color)

final<- merge( df1, df2, by="Names")

#Expected Result

view(final)
Names Age Cars Color 
John  23  Ford Red 
Bill  41  Kia  Green
Maria 32  VW   Blue 
Ben   58  Toyota Orange 
Tina  26  SAAB   White 

#Actual Result

Names Age Cars Color 
John  NA  Ford Red 
Bill  NA  Kia  Green
Maria NA  VW   Blue 
Ben   NA  Toyota Orange 
Tina  NA  SAAB   White

In this example, I would get the data from df2, but df1 values would show NA.

Is there any way around this?

CodePudding user response:

your example works for me with little changes

library(tidyverse)

df1 <- tibble(
  name=c("John", "Bill", "Maria", "Ben", "Tina"), 
  age=c(23, 41, 32, 58, 26))

df2<-tibble(
  name=c("John", "Bill", "Maria", "Ben", "Tina"), 
  cars=c("Ford", "Kia", "VW", "Toyota", "SAAB"), 
  color=c("Red", "Green", "Blue", "Orange", "White"))

merge( df1, df2, by="name")

# 
   name age   cars  color
1   Ben  58 Toyota Orange
2  Bill  41    Kia  Green
3  John  23   Ford    Red
4 Maria  32     VW   Blue
5  Tina  26   SAAB  White

Here I use tidyverse cause it's kind of a standard alternative to data.frames. I'm stick with using lowercased column names -- one more code standard.

I couldn't reproduce the example because of the Color varible with wrong quotes. Could it be the cause?

CodePudding user response:

You could try using dplyr package, which is great for managing dataframes:

final <- full_join(df1, df2, by= "Name")

If the names of the columns are different just use: by= c("Name1" = "Name2")

If you only want to drop the rows that are not common you can have a look on right_join/left_join/inner_join

CodePudding user response:

besides all.x and all.y there is also an all argument for merge.

Using your code: final <- merge(df1, df2, by = "Name", all = TRUE) should work.

The RDocumentation also explains how the arguments work for merge

  • Related