Home > database >  dplyr group_by and filter across dates
dplyr group_by and filter across dates

Time:08-19

I have a list of customers who viewed a house and who bought a house. I'd like to group_by customer and filter for customers who bought a house within a month of viewing.

example

customer <- c(1, 2, 3, 3, 4, 4, 4, 5)
action <- c("view", "view", "view", "buy", "view", "view", "buy", "view")
date <- c("2022/01/01", "2022/03/01", "2022/01/01", "2022/12/01", "2022/01/01", "2022/03/01", "2022/03/31", "2022/01/01")

df <- tibble(customer, action, date)

In this case I'd like to get back customer 4 from the filter, they viewed twice and bought within a month of the second viewing.

thanks!

CodePudding user response:

library(lubridate)
library(tidyverse)

df %>%  
  mutate(date = date %>%  
           as.Date("%Y/%m/%d")) %>% 
  pivot_wider(names_from = action, 
              values_from = date) %>%  
  unnest(everything()) %>%  
  mutate(diff = interval(view, buy) %>% 
           as.numeric("months")) %>% 
  filter(diff < 1)

# A tibble: 1 x 4
  customer view       buy         diff
     <dbl> <date>     <date>     <dbl>
1        4 2022-03-01 2022-03-31 0.986

CodePudding user response:

We may convert the 'date' to Date class, grouped by 'customer', create a 'date_buy' with 'date' where the 'action' value is 'buy', get the interval between the 'date', 'date_buy', return the number of months, check if there are any value that are 0 other than the 'buy' row to filter those 'customer' ids

library(dplyr)
library(lubridate)
df %>% 
  mutate(date = ymd(date)) %>%
  group_by(customer) %>%
  mutate(date_buy = date[action == 'buy'][1]) %>% 
  filter(any((interval(date, date_buy) %/% 
     months(1))[action != 'buy'] == 0, na.rm = TRUE)) %>%
  ungroup %>%
  select(-date_buy)

-output

# A tibble: 3 × 3
  customer action date      
     <dbl> <chr>  <date>    
1        4 view   2022-01-01
2        4 view   2022-03-01
3        4 buy    2022-03-31
  • Related