Sum of a column based on value of column from another dataframe-CodePudding

I have two df.

date <- c("2021-09-27","2021-09-28","2021-09-29","2021-09-30",
          "2021-10-01","2021-10-02","2021-10-03","2021-10-04")
temp <- c(10.5,12.1,11.3,10.2,
          11.3,11.2,10.9,14.1)
df1 <- data.frame(date, temp)
id <- c("id1","id2","id3", "id4")
date_event <- c("2021-09-30","2021-10-01","2021-10-03","2021-10-04")
df2 <- data.frame(id, date_event)

I want a new column in df2:

df2$new_column <- #sum from the first value of temp column of df1 until date value of df1 match date_event value of df2

I appreciate any help! Thanks

CodePudding user response：

In base R you vould do

merge(df2, transform(df1,temp=cumsum(temp)), by.x="date_event", by.y="date")

date_event  id temp
1 2021-09-30 id1 44.1
2 2021-10-01 id2 55.4
3 2021-10-03 id3 77.5
4 2021-10-04 id4 91.6

CodePudding user response：

If I understand you correctly, this should work, at least for smallish data. First I do a cross join between the two tables to connect every row in df2 with every row in df1, then filter out the ones in df1 that happened later, and then sum the temps for each id/date_event combo. This approach could be costly if you had a large database of id-dater_events, though.

df2 %>%
  crossing(df1) %>%
  filter(date <= date_event) %>%
  count(id, date_event, wt = temp)

Alternatively, you could precompute the cumulative totals in df1 and join those to df2; that will be much more efficient in most cases.

df2 %>%
  left_join(
    df1 %>%
      arrange(date) %>%
      mutate(n = cumsum(temp)),
    by = c("date_event" = "date")) %>%
  select(-temp) # take out temp column to match other method's output

Result

# A tibble: 4 × 3
  id    date_event     n
  <chr> <chr>      <dbl>
1 id1   2021-09-30  44.1
2 id2   2021-10-01  55.4
3 id3   2021-10-03  77.5
4 id4   2021-10-04  91.6