Combine two data frames in R without repeated entries

Time:09-28

I have two data frames containing row entries with respective dates. Data frame 1 contains observations collected from 2010 to 2017.

   dates       A
 2010-01-01   21
 2010-01-02   27
 2010-01-03   34
...
 2017-12-29   22
 2017-12-30   32
 2017-12-31   25

Data frame 2 contains observations collected from 2015 to 2020.

  dates        A
 2015-01-01   20
 2015-01-02   29
 2015-01-03   34
...
 2020-12-29   22
 2020-12-30   27
 2020-12-31   32

Both data frames are missing observations for some days. I want to combine them to fill in the missing data and obtain a complete time series up to 2020 without any repeated entries, like the following data frame:

   dates       A
 2010-01-01   21
 2010-01-02   27
 2010-01-03   34
 ...
 2020-12-29   22
 2020-12-30   27
 2020-12-31   32

Using merge(df1, df2, by = 'dates') or full_join(df1, df2, by = 'dates') creates either duplicate entries or two columns, A.x and A.y, which is not what I want.

CodePudding user response:

If your data frames really have just these two columns, you should be able to use bind_rows and then distinct to remove duplicates.

library(dplyr)
# Stack both data frames, then keep one row per (dates, A) pair
df <- bind_rows(df1, df2) %>%
   distinct(dates, A)

Edit: This will not work if the two data frames disagree on a single date. If you have two records for 1/1/15 with different A values, both will be retained.
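If the values can disagree, one possible workaround is to decide which data frame takes priority and keep only the first row per date. The sketch below uses made-up toy values and assumes df2 should win on conflicts; that preference is an assumption, not something stated in the question.

```r
library(dplyr)

# Toy data (hypothetical values) with a conflict on 2015-01-02
df1 <- data.frame(dates = c("2015-01-01", "2015-01-02"), A = c(20, 29))
df2 <- data.frame(dates = c("2015-01-02", "2015-01-03"), A = c(30, 34))

# df2 is bound first, so its row is kept when both frames share a date
df <- bind_rows(df2, df1) %>%
  distinct(dates, .keep_all = TRUE) %>%
  arrange(dates)
```

Swapping the order to bind_rows(df1, df2) would give df1 priority instead.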

CodePudding user response:

Try the code below

# Stack both data frames and drop rows that are exact duplicates
dfout <- unique(rbind(df1,df2))
# Sort the result by date
dfout <- dfout[order(dfout$dates),]
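Note that unique() compares whole rows, so a date whose A value differs between the two data frames would still appear twice. A minimal base R sketch that keeps one row per date, using made-up toy values and assuming the df1 row should be kept on conflicts:

```r
# Toy data (hypothetical values) with a conflict on 2015-01-02
df1 <- data.frame(dates = c("2015-01-01", "2015-01-02"), A = c(20, 29))
df2 <- data.frame(dates = c("2015-01-02", "2015-01-03"), A = c(30, 34))

dfall <- rbind(df1, df2)
# duplicated() flags the later occurrences of a date, so the df1 row is kept
dfout <- dfall[!duplicated(dfall$dates), ]
# Sort the result by date
dfout <- dfout[order(dfout$dates), ]
```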

CodePudding user response:

Combine df1 and df2. If a date appears in both data frames, sum its A values, then use complete to add rows for the missing dates (with NA for A).

library(dplyr)
library(tidyr)

df1 %>%
  bind_rows(df2) %>%
  mutate(dates = as.Date(dates)) %>%
  group_by(dates) %>%
  summarise(A = sum(A)) %>%
  complete(dates = seq(min(dates), max(dates), by = 'day'))