Home > Enterprise >  Merge two dataframes via join and consider timestamp
Merge two dataframes via join and consider timestamp

Time:05-23

Let's say I have this data frame

  time1               weight
  <chr>                <dbl>
1 2012-04-25 17:40:00     82
2 2012-04-25 18:40:00     81

I now have another data frame of the form:

  time2                  d1    d2    d3
  <chr>               <dbl> <dbl> <dbl>
1 2012-04-25 17:45:00     1     0     1

I now want to join these dataframes. However, I want to only use the original values in d1 d2 d3 if the time2 is greater than time1. Otherwise the values should be zero. That is I want:

  time1               weight    d1    d2    d3
  <chr>                <dbl> <dbl> <dbl> <dbl>
1 2012-04-25 17:40:00     82     0     0     0
2 2012-04-25 18:40:00     81     1     0     1

What's a sneaky way to do so?

CodePudding user response:

You can use fuzzy-join:

library(fuzzyjoin)

fuzzy_left_join(df1, df2, by = c("time1" = "time2"), match_fun = `>`)

#                 time1 weight               time2 d1 d2 d3
# 1 2012-04-25 17:40:00     82                <NA> NA NA NA
# 2 2012-04-25 18:40:00     81 2012-04-25 17:45:00  1  0  1

The mismatched rows are filled with NA. Many functions can replace NA with other values so I don't deal with it here.


Data
df1 <- structure(list(time1 = c("2012-04-25 17:40:00", "2012-04-25 18:40:00"),
weight = 82:81), class = "data.frame", row.names = c(NA, -2L))

df2 <- structure(list(time2 = "2012-04-25 17:45:00", d1 = 1L, d2 = 0L, d3 = 1L),
class = "data.frame", row.names = c(NA, -1L))

CodePudding user response:

A possible approach:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

df1 <- tribble(
  ~time1, ~weight,
  "2012-04-25 17:40:00", 82,
  "2012-04-25 18:40:00", 81
) |> rename(time = time1)

df2 <- tribble(
  ~time2, ~d1, ~d2, ~d3,
  "2012-04-25 17:45:00", 1, 0, 1
)|> rename(time = time2)

bind_rows(df1, df2) |> 
  mutate(time = ymd_hms(time)) |> 
  arrange(time) |> 
  fill(-weight) |> 
  filter(!is.na(weight)) |> 
  mutate(across(-time, ~if_else(is.na(.), 0, .)))
#> # A tibble: 2 × 5
#>   time                weight    d1    d2    d3
#>   <dttm>               <dbl> <dbl> <dbl> <dbl>
#> 1 2012-04-25 17:40:00     82     0     0     0
#> 2 2012-04-25 18:40:00     81     1     0     1

Created on 2022-05-23 by the reprex package (v2.0.1)

CodePudding user response:

Since you are not actually joining anything, you are merely concatenating two data frames, you can do that first and then fix what is wrong. For ex. given your data

df1=read.table(text="
time1               weight
'2012-04-25 17:40:00'     82
'2012-04-25 18:40:00'     81",h=T)
df1$time1=as.POSIXct(df1$time1)

df2=read.table(text="
time2                  d1    d2    d3
'2012-04-25 17:45:00'     1     0     1",h=T)
df2$time2=as.POSIXct(df2$time2)

you can do

df3=cbind(df1,df2)
df3[df3$time2>df3$time1,grepl("d",colnames(df3))]=0

                time1 weight               time2 d1 d2 d3
1 2012-04-25 17:40:00     82 2012-04-25 17:45:00  0  0  0
2 2012-04-25 18:40:00     81 2012-04-25 17:45:00  1  0  1
  • Related