Home > Mobile >  New column value based on matching previous values in different columns
New column value based on matching previous values in different columns

Time:06-30

I'm having trouble generating a new column in my dataframe which is based on matching dates in different columns:

df looks something like this:

ID        date  booked.date   weather
 1  2016-12-01           NA    clouds
 1  2016-12-02   2014-10-24     sunny           
 1  2016-12-03           NA  overcast         
 2  2016-12-01   2015-12-24    clouds           
 2  2016-12-02   2016-12-01     sunny
 2  2016-12-03   2016-12-01  overcast
 2  2016-12-04   2016-01-13     sunny

date indicates the date of stay at the apartment, booking_date tells us when the apartment was booked. Now I would like to add a column booked_weather indicating the weather during the time of booking if this information is included in the df. The output would then look like this:

ID        date  booked.date   weather booked_weather
 1  2016-12-01           NA    clouds             NA
 1  2016-12-02   2014-10-24     sunny             NA
 1  2016-12-03           NA  overcast             NA
 2  2016-12-01   2015-12-24    clouds             NA
 2  2016-12-02   2016-12-01     sunny         clouds
 2  2016-12-03   2016-12-01  overcast         clouds
 2  2016-12-04   2016-01-13     sunny             NA

Note that there are readings for multiple apartment IDs, hence repeated dates with the same weather.

Here is what I have tried, does not quite get me what I need:

df %>%
  mutate(weather_booked = case_when(
    booked.date %in% date ~ weather[booked.date]
  ))

I understand why this won't give me the correct result, but I am not sure how to fix it.

CodePudding user response:

library(tidyverse)

df <- read_table("ID  date  booked.date   weather
 1  2016-12-01           NA    clouds
 1  2016-12-02   2014-10-24     sunny           
 1  2016-12-03           NA  overcast         
 2  2016-12-01   2015-12-24    clouds           
 2  2016-12-02   2016-12-01     sunny
 2  2016-12-03   2016-12-01  overcast
 2  2016-12-04   2016-01-13     sunny") 


df %>%  
  mutate(weather_booked = weather[match(booked.date, date)])


#> # A tibble: 7 x 5
#>      ID date       booked.date weather  weather_booked
#>   <dbl> <date>     <date>      <chr>    <chr>         
#> 1     1 2016-12-01 NA          clouds   <NA>          
#> 2     1 2016-12-02 2014-10-24  sunny    <NA>          
#> 3     1 2016-12-03 NA          overcast <NA>          
#> 4     2 2016-12-01 2015-12-24  clouds   <NA>          
#> 5     2 2016-12-02 2016-12-01  sunny    clouds        
#> 6     2 2016-12-03 2016-12-01  overcast clouds        
#> 7     2 2016-12-04 2016-01-13  sunny    <NA>

Created on 2022-06-29 by the reprex package (v2.0.1)

CodePudding user response:

You can do this with a "self join", joining a modified subset of the data back to itself.

df %>%
  select(booked.date, weather) %>%
  rename(date = booked.date, booked_weather = weather) %>%
  right_join(df, by = "date")
  •  Tags:  
  • r lag
  • Related