I'm struggling to merge two data frames using two columns. One data frame contains measurements of how much palette was used on certain dates. The other contains the distance travelled by each car. I need to merge them so that each palette measurement is matched, by car, with the sum of the distance that car travelled up to the date of the measurement. Here is a toy example:
#palette measure dataframe
measure = data.frame(car = c("A", "A", "A", "B"), data1 = c("20-09-2020", "15-10-2020", "13-05-2021", "20-10-2021"), palette = c(5,4,3,5))
#> measure
# car data1 palette
#1 A 20-09-2020 5
#2 A 15-10-2020 4
#3 A 13-05-2021 3
#4 B 20-10-2021 5
#the distance dataframe
dist_ = data.frame(car = c("A", "C", "B", "A", "A", "A"), data2 = c("20-09-2020", "14-05-2020", "20-10-2021", "10-01-2021", "11-01-2021", "13-01-2021"), distance = c(10, 20, 10, 5, 3,8))
#> dist_
# car data2 distance
#1 A 20-09-2020 10
#2 C 14-05-2020 20
#3 B 20-10-2021 10
#4 A 10-01-2021 5
#5 A 11-01-2021 3
#6 A 13-01-2021 8
#for result I'd like something like
# car data1 palette distance
#1 A 20-09-2020 5 10
#2 A 15-10-2020 4 0
#3 A 13-05-2021 3 16
#4 B 20-10-2021 5 10
Note that the distances are summed up to each date on which the palette is measured, starting again after the previous measurement. So I can say that a car has covered a distance of 16 km and its palette is 3 cm.
I thought I could use something like merge(x = measure, y = dist_, by.x = c("car", "data1"), by.y = c("car", "data2"), all.x = TRUE), but I don't know how to sum the distance values up to the date of the palette measurement for a specific car.
Any hints on how I could do that?
CodePudding user response:
Something like this would work:
library(tidyverse)
library(lubridate)
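result <- left_join(measure, dist_, by = c("car")) %>%
  # parse the day-month-year strings into dates
  mutate(across(c("data1", "data2"), dmy)) %>%
  # keep only distances recorded on or before each measurement
  filter(data1 >= data2) %>%
  # for each distance record, find the earliest measurement that follows it
  group_by(car, data2) %>%
  mutate(threshold = min(data1)) %>%
  ungroup() %>%
  # assign each distance record to that measurement only
  filter(data1 == threshold) %>%
  group_by(car, data1, palette) %>%
  summarise(distance = sum(distance))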
result
# A tibble: 3 x 4
# Groups: car, data1 [3]
car data1 palette distance
<chr> <date> <dbl> <dbl>
1 A 2020-09-20 5 10
2 A 2021-05-13 3 16
3 B 2021-10-20 5 10
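To see why this works, it can help to stop the pipeline before the final summarise() and look at which measurement each distance record was assigned to. The sketch below repeats the toy data so it runs on its own. The names assigned and assigned_to are only for illustration and are not part of the pipeline above.

```r
library(dplyr)
library(lubridate)

# Toy data from the question
measure <- data.frame(
  car = c("A", "A", "A", "B"),
  data1 = c("20-09-2020", "15-10-2020", "13-05-2021", "20-10-2021"),
  palette = c(5, 4, 3, 5)
)
dist_ <- data.frame(
  car = c("A", "C", "B", "A", "A", "A"),
  data2 = c("20-09-2020", "14-05-2020", "20-10-2021",
            "10-01-2021", "11-01-2021", "13-01-2021"),
  distance = c(10, 20, 10, 5, 3, 8)
)

# Same steps as the pipeline above, stopped before the summarise().
# Note: dplyr >= 1.1.0 warns about a many-to-many join here, which is
# expected because each car has several rows in both data frames.
assigned <- left_join(measure, dist_, by = "car") %>%
  mutate(across(c("data1", "data2"), dmy)) %>%
  filter(data1 >= data2) %>%
  group_by(car, data2) %>%
  mutate(threshold = min(data1)) %>%
  ungroup() %>%
  filter(data1 == threshold) %>%
  # illustrative rename: the measurement date each distance is counted towards
  select(car, data2, distance, assigned_to = data1)

assigned
```

Each distance record appears exactly once, next to the first measurement of the same car taken on or after it. No record is assigned to the 15-10-2020 measurement of car A, which is why that row is missing from the summarised result.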
If you want to keep the non-matches, you could then rejoin with measure like so:
result.final <- measure %>%
  mutate(data1 = dmy(data1)) %>%
  left_join(result, by = c("car", "data1", "palette"))
result.final
car data1 palette distance
1 A 2020-09-20 5 10
2 A 2020-10-15 4 NA
3 A 2021-05-13 3 16
4 B 2021-10-20 5 10
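The expected output in the question shows 0 rather than NA for the measurement with no recorded distance. A minimal sketch of one way to get that, assuming a missing distance really does mean that no distance was travelled, is to replace the NA values with dplyr's coalesce() after the rejoin. The toy data is repeated so the block runs on its own, and result_final is just an illustrative name.

```r
library(dplyr)
library(lubridate)

# Toy data from the question
measure <- data.frame(
  car = c("A", "A", "A", "B"),
  data1 = c("20-09-2020", "15-10-2020", "13-05-2021", "20-10-2021"),
  palette = c(5, 4, 3, 5)
)
dist_ <- data.frame(
  car = c("A", "C", "B", "A", "A", "A"),
  data2 = c("20-09-2020", "14-05-2020", "20-10-2021",
            "10-01-2021", "11-01-2021", "13-01-2021"),
  distance = c(10, 20, 10, 5, 3, 8)
)

# Sum the distances per measurement, as in the pipeline above
result <- left_join(measure, dist_, by = "car") %>%
  mutate(across(c("data1", "data2"), dmy)) %>%
  filter(data1 >= data2) %>%
  group_by(car, data2) %>%
  mutate(threshold = min(data1)) %>%
  ungroup() %>%
  filter(data1 == threshold) %>%
  group_by(car, data1, palette) %>%
  summarise(distance = sum(distance), .groups = "drop")

# Rejoin with measure, then turn the unmatched NA into 0
# (assumption: no matching distance record means 0 distance travelled)
result_final <- measure %>%
  mutate(data1 = dmy(data1)) %>%
  left_join(result, by = c("car", "data1", "palette")) %>%
  mutate(distance = coalesce(distance, 0))

result_final
```

If NA should instead stay distinguishable from a true zero, skip the final mutate() and keep the NA as it is.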