I'm trying to identify customers by their recurring purchases. For that I have a table with the distinct customer IDs and several columns with the year and month (YYYY-MM
) filled in by a boolean that identifies whether the customer made any purchases in the month (0
does not have a purchase and 1
does), as shown below:
customer_id | 2020_03 | 2020_04 | 2020_05 | 2020_06 | 2020_07 | 2020_08 |
---|---|---|---|---|---|---|
12345 | 1 | 1 | 1 | 0 | 0 | 0 |
12346 | 0 | 0 | 1 | 1 | 1 | 0 |
12347 | 0 | 0 | 0 | 1 | 0 | 1 |
12348 | 0 | 1 | 0 | 0 | 0 | 0 |
12349 | 0 | 0 | 0 | 1 | 0 | 1 |
In the case above, if we define that a recurring customer has purchases in three consecutive months, then only customers 12345
and 12346
fit this premise.
The question I'm stuck with is: is there any way to be able to identify these customers if they have three booleans in a row marked with 1, or if I need to process my data in some other way to achieve the goal?
CodePudding user response:
You can concatenate all columns into a single string and then look for the sequence '111'
in it. For example:
select *
from t
where '' || "2020_03" || "2020_04" || "2020_05"
|| "2020_06" || "2020_07" || "2020_08" like '1%'