Home > Software engineering >  Merging dataframes that have different columns
Merging dataframes that have different columns

Time:10-19

I have 2 dataframes:

df1 = data.frame(Bird_ID = c(1:6), Sex = c("Male","Female","Male","Male","Male","UNK"), Age.years =c("2","4","8","2","12","1"))

df2 = data.frame(Bird_ID = c(7), Sex = c("Female"), date.fledged= c("19/10/2021"))

df1
# Bird_ID Sex Age.years
# 1 Male 2
# 2 Female 4
# 3 Male 8
# 4 Male 2
# 5 Male 12
# 6 UNK 1
       
df2
# Bird_ID Sex Date.fledged
# 7 Female 19/10/2021
  • My first dataframe (df1) is my database where I have all my birds records with useful informations
  • My second dataframe (df2) is the "updater". I want to merge these informations to the main database (df1) and the output would be like this:
dfmerged = data.frame(Bird_ID = c(1:7), Sex = c("Male","Female","Male","Male","Male","UNK","Female"), Age.years =c("2","4","8","2","12","1",NA))`

dfmerged
# Bird_ID Sex Age.years
# 1 Male 2
# 2 Female 4
# 3 Male 8
# 4 Male 2
# 5 Male 12
# 6 UNK 1
# 7 Female NA

How can I update the bird database df1 using information from df2and keeping only (and all) columns that are in the main database df1? For example here dfmerged keeps only the columns from df1,drops the "Date.fledged" column from df2 and bird 7 has NA as "Age.years" because data is missing (and that is the wanted output).

CodePudding user response:

You could use

library(dplyr)

df1 %>% 
  bind_rows(df2) %>% 
  select(names(df1))

This returns

  Bird_ID    Sex Age.years
1       1   Male         2
2       2 Female         4
3       3   Male         8
4       4   Male         2
5       5   Male        12
6       6    UNK         1
7       7 Female      <NA>

CodePudding user response:

You may do a full join.

merge(df1, df2, by = c('Bird_ID', 'Sex'), all = TRUE)[-4]

#  Bird_ID    Sex Age.years
#1       1   Male         2
#2       2 Female         4
#3       3   Male         8
#4       4   Male         2
#5       5   Male        12
#6       6    UNK         1
#7       7 Female      <NA>

In dplyr -

library(dplyr)

full_join(df1, df2, by = c('Bird_ID', 'Sex')) %>%
  select(-date.fledged)
  • Related