Home > Back-end >  Add column to state whether two purchases were made by same customer within 30 days
Add column to state whether two purchases were made by same customer within 30 days

Time:07-28

I have a dataset that I can reduce to two columns - firstly, CustomerID, secondly, SaleDate. I have already taken steps to ensure that this dataset includes only customers with at least two sales. I would like to add a 3rd column which states whether or not the second sale (or 3rd sale compared to the second etc.) was within 30 days of the previous sale.

CustomerID SaleDate
1 14/09/2021
1 30/09/2021
1 03/10/2021
2 23/07/2021
2 24/07/2021

With the new column entitled 'SaleDate' adding a '1' if the resale is within 30 days of the previous sale and a '0' if not (e.g. it's the first sale or not within 30 of the previous sale). So the end result in this case would be:

CustomerID SaleDate ResaleWithin30
1 14/09/2021 0
1 30/09/2021 1
1 03/10/2021 1
2 23/07/2021 0
2 24/07/2021 1

Any advice on this would be very much appreciated, thank you.

CodePudding user response:

You can use lag to check whether the next record is within 30 days of the present record.

library(dplyr)

df %>% 
  group_by(CustomerID) %>% 
  mutate(SaleDate = as.Date(SaleDate, tryFormats = "%d/%m/%Y"),
         ResaleWithin30 = as.integer(SaleDate - lag(SaleDate) <= 30),
         ResaleWithin30 = replace_na(ResaleWithin30, 0))

# A tibble: 5 × 3
# Groups:   CustomerID [2]
  CustomerID SaleDate   ResaleWithin30
       <int> <date>              <int>
1          1 2021-09-14              0
2          1 2021-09-30              1
3          1 2021-10-03              1
4          2 2021-07-23              0
5          2 2021-07-24              1
  •  Tags:  
  • r
  • Related