Let's say I have this data frame
time1 weight
<chr> <dbl>
1 2012-04-25 17:40:00 82
2 2012-04-25 18:40:00 81
I now have another data frame of the form:
time2 d1 d2 d3
<chr> <dbl> <dbl> <dbl>
1 2012-04-25 17:45:00 1 0 1
I now want to join these dataframes. However, I want to only use the original values in d1 d2 d3
if the time2
is greater than time1
. Otherwise the values should be zero. That is I want:
time1 weight d1 d2 d3
<chr> <dbl> <dbl> <dbl> <dbl>
1 2012-04-25 17:40:00 82 0 0 0
2 2012-04-25 18:40:00 81 1 0 1
What's a sneaky way to do so?
CodePudding user response:
You can use fuzzy-join
:
library(fuzzyjoin)
fuzzy_left_join(df1, df2, by = c("time1" = "time2"), match_fun = `>`)
# time1 weight time2 d1 d2 d3
# 1 2012-04-25 17:40:00 82 <NA> NA NA NA
# 2 2012-04-25 18:40:00 81 2012-04-25 17:45:00 1 0 1
The mismatched rows are filled with NA
. Many functions can replace NA
with other values so I don't deal with it here.
Data
df1 <- structure(list(time1 = c("2012-04-25 17:40:00", "2012-04-25 18:40:00"),
weight = 82:81), class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(time2 = "2012-04-25 17:45:00", d1 = 1L, d2 = 0L, d3 = 1L),
class = "data.frame", row.names = c(NA, -1L))
CodePudding user response:
A possible approach:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
df1 <- tribble(
~time1, ~weight,
"2012-04-25 17:40:00", 82,
"2012-04-25 18:40:00", 81
) |> rename(time = time1)
df2 <- tribble(
~time2, ~d1, ~d2, ~d3,
"2012-04-25 17:45:00", 1, 0, 1
)|> rename(time = time2)
bind_rows(df1, df2) |>
mutate(time = ymd_hms(time)) |>
arrange(time) |>
fill(-weight) |>
filter(!is.na(weight)) |>
mutate(across(-time, ~if_else(is.na(.), 0, .)))
#> # A tibble: 2 × 5
#> time weight d1 d2 d3
#> <dttm> <dbl> <dbl> <dbl> <dbl>
#> 1 2012-04-25 17:40:00 82 0 0 0
#> 2 2012-04-25 18:40:00 81 1 0 1
Created on 2022-05-23 by the reprex package (v2.0.1)
CodePudding user response:
Since you are not actually joining anything, you are merely concatenating two data frames, you can do that first and then fix what is wrong. For ex. given your data
df1=read.table(text="
time1 weight
'2012-04-25 17:40:00' 82
'2012-04-25 18:40:00' 81",h=T)
df1$time1=as.POSIXct(df1$time1)
df2=read.table(text="
time2 d1 d2 d3
'2012-04-25 17:45:00' 1 0 1",h=T)
df2$time2=as.POSIXct(df2$time2)
you can do
df3=cbind(df1,df2)
df3[df3$time2>df3$time1,grepl("d",colnames(df3))]=0
time1 weight time2 d1 d2 d3
1 2012-04-25 17:40:00 82 2012-04-25 17:45:00 0 0 0
2 2012-04-25 18:40:00 81 2012-04-25 17:45:00 1 0 1