I have a list of customers who viewed a house and who bought a house. I'd like to group_by customer and filter for customers who bought a house within a month of viewing.
example
customer <- c(1, 2, 3, 3, 4, 4, 4, 5)
action <- c("view", "view", "view", "buy", "view", "view", "buy", "view")
date <- c("2022/01/01", "2022/03/01", "2022/01/01", "2022/12/01", "2022/01/01", "2022/03/01", "2022/03/31", "2022/01/01")
df <- tibble(customer, action, date)
In this case I'd like to get back customer 4 from the filter, they viewed twice and bought within a month of the second viewing.
thanks!
CodePudding user response:
library(lubridate)
library(tidyverse)
df %>%
mutate(date = date %>%
as.Date("%Y/%m/%d")) %>%
pivot_wider(names_from = action,
values_from = date) %>%
unnest(everything()) %>%
mutate(diff = interval(view, buy) %>%
as.numeric("months")) %>%
filter(diff < 1)
# A tibble: 1 x 4
customer view buy diff
<dbl> <date> <date> <dbl>
1 4 2022-03-01 2022-03-31 0.986
CodePudding user response:
We may convert the 'date' to Date
class, grouped by 'customer', create a 'date_buy' with 'date' where the 'action' value is 'buy', get the interval
between the 'date', 'date_buy', return the number of months, check if there are any
value that are 0 other than the 'buy' row to filter
those 'customer' ids
library(dplyr)
library(lubridate)
df %>%
mutate(date = ymd(date)) %>%
group_by(customer) %>%
mutate(date_buy = date[action == 'buy'][1]) %>%
filter(any((interval(date, date_buy) %/%
months(1))[action != 'buy'] == 0, na.rm = TRUE)) %>%
ungroup %>%
select(-date_buy)
-output
# A tibble: 3 × 3
customer action date
<dbl> <chr> <date>
1 4 view 2022-01-01
2 4 view 2022-03-01
3 4 buy 2022-03-31