I have two df.
date <- c("2021-09-27","2021-09-28","2021-09-29","2021-09-30",
"2021-10-01","2021-10-02","2021-10-03","2021-10-04")
temp <- c(10.5,12.1,11.3,10.2,
11.3,11.2,10.9,14.1)
df1 <- data.frame(date, temp)
id <- c("id1","id2","id3", "id4")
date_event <- c("2021-09-30","2021-10-01","2021-10-03","2021-10-04")
df2 <- data.frame(id, date_event)
I want a new column in df2:
df2$new_column <- #sum from the first value of temp column of df1 until date value of df1 match date_event value of df2
I appreciate any help! Thanks
CodePudding user response:
In base R you vould do
merge(df2, transform(df1,temp=cumsum(temp)), by.x="date_event", by.y="date")
date_event id temp
1 2021-09-30 id1 44.1
2 2021-10-01 id2 55.4
3 2021-10-03 id3 77.5
4 2021-10-04 id4 91.6
CodePudding user response:
If I understand you correctly, this should work, at least for smallish data. First I do a cross join between the two tables to connect every row in df2 with every row in df1, then filter out the ones in df1 that happened later, and then sum the temps for each id/date_event combo. This approach could be costly if you had a large database of id-dater_events, though.
df2 %>%
crossing(df1) %>%
filter(date <= date_event) %>%
count(id, date_event, wt = temp)
Alternatively, you could precompute the cumulative totals in df1
and join those to df2
; that will be much more efficient in most cases.
df2 %>%
left_join(
df1 %>%
arrange(date) %>%
mutate(n = cumsum(temp)),
by = c("date_event" = "date")) %>%
select(-temp) # take out temp column to match other method's output
Result
# A tibble: 4 × 3
id date_event n
<chr> <chr> <dbl>
1 id1 2021-09-30 44.1
2 id2 2021-10-01 55.4
3 id3 2021-10-03 77.5
4 id4 2021-10-04 91.6