I'm having trouble generating a new column in my dataframe which is based on matching dates in different columns:
df
looks something like this:
ID date booked.date weather
1 2016-12-01 NA clouds
1 2016-12-02 2014-10-24 sunny
1 2016-12-03 NA overcast
2 2016-12-01 2015-12-24 clouds
2 2016-12-02 2016-12-01 sunny
2 2016-12-03 2016-12-01 overcast
2 2016-12-04 2016-01-13 sunny
date
indicates the date of stay at the apartment, booking_date
tells us when the apartment was booked.
Now I would like to add a column booked_weather
indicating the weather during the time of booking if this information is included in the df
.
The output would then look like this:
ID date booked.date weather booked_weather
1 2016-12-01 NA clouds NA
1 2016-12-02 2014-10-24 sunny NA
1 2016-12-03 NA overcast NA
2 2016-12-01 2015-12-24 clouds NA
2 2016-12-02 2016-12-01 sunny clouds
2 2016-12-03 2016-12-01 overcast clouds
2 2016-12-04 2016-01-13 sunny NA
Note that there are readings for multiple apartment IDs, hence repeated dates with the same weather.
Here is what I have tried, does not quite get me what I need:
df %>%
mutate(weather_booked = case_when(
booked.date %in% date ~ weather[booked.date]
))
I understand why this won't give me the correct result, but I am not sure how to fix it.
CodePudding user response:
library(tidyverse)
df <- read_table("ID date booked.date weather
1 2016-12-01 NA clouds
1 2016-12-02 2014-10-24 sunny
1 2016-12-03 NA overcast
2 2016-12-01 2015-12-24 clouds
2 2016-12-02 2016-12-01 sunny
2 2016-12-03 2016-12-01 overcast
2 2016-12-04 2016-01-13 sunny")
df %>%
mutate(weather_booked = weather[match(booked.date, date)])
#> # A tibble: 7 x 5
#> ID date booked.date weather weather_booked
#> <dbl> <date> <date> <chr> <chr>
#> 1 1 2016-12-01 NA clouds <NA>
#> 2 1 2016-12-02 2014-10-24 sunny <NA>
#> 3 1 2016-12-03 NA overcast <NA>
#> 4 2 2016-12-01 2015-12-24 clouds <NA>
#> 5 2 2016-12-02 2016-12-01 sunny clouds
#> 6 2 2016-12-03 2016-12-01 overcast clouds
#> 7 2 2016-12-04 2016-01-13 sunny <NA>
Created on 2022-06-29 by the reprex package (v2.0.1)
CodePudding user response:
You can do this with a "self join", joining a modified subset of the data back to itself.
df %>%
select(booked.date, weather) %>%
rename(date = booked.date, booked_weather = weather) %>%
right_join(df, by = "date")