I have a query in a Postgres DB that combines clients subscriptions.
I want to add a variable that's called "consecutive months" but I'm not sure how to do it in Postgres.
My raw table is something like this:
client | product | Date |
---|---|---|
1 | Sub | 2020-10-01 |
1 | Sub | 2020-11-01 |
2 | Sub | 2020-11-01 |
2 | Sub | 2020-12-01 |
1 | Sub | 2021-01-01 |
1 | Sub | 2021-02-01 |
2 | Sub | 2021-02-01 |
and I inteed to have something that counts the consecutive months originating something like this:
client | product | Date | Consecutive_months |
---|---|---|---|
1 | Sub | 2020-10-01 | 1 |
1 | Sub | 2020-11-01 | 2 |
2 | Sub | 2020-11-01 | 1 |
2 | Sub | 2020-12-01 | 2 |
1 | Sub | 2021-01-01 | 1 |
1 | Sub | 2021-02-01 | 2 |
2 | Sub | 2021-02-01 | 1 |
Thank you for the help in advanced!
CodePudding user response:
Looks like you got yourself a Gaps-And-Islands type of problem.
The trick is to calculate some ranking based on connected dates per client.
Then a sequencial number can be calculated based on client and the rank.
select client, product, "Date" , row_number() over (partition by client, daterank order by "Date") as Consecutive_months from ( select "Date", client, product , dense_rank() over (partition by client order by "Date") (DATE_PART('year', AGE(current_date, "Date"))*12 DATE_PART('month', AGE(current_date, "Date"))) daterank from raw t ) q order by "Date", client
client | product | Date | consecutive_months -----: | :------ | :--------- | -----------------: 1 | Sub | 2020-10-01 | 1 1 | Sub | 2020-11-01 | 2 2 | Sub | 2020-11-01 | 1 2 | Sub | 2020-12-01 | 2 1 | Sub | 2021-01-01 | 1 1 | Sub | 2021-02-01 | 2 2 | Sub | 2021-02-01 | 1
db<>fiddle here
CodePudding user response:
Based on the tag OP apparently realizes this is a gaps and island problem. This query extracts month and year information to generate a sequence that increments on a monthly basis. After that it's only necessary to use the standard difference logic to find the rows that move out of step and create mark off the islands.
with A as (
select *,
date_part('year', dt) * 12 date_part('month', dt)
- row_number() over (partition by client, product order by dt) as grp
from T
)
select *,
row_number()
over (partition by client, product, grp order by dt) as consecutive_months
from A;
If it's acceptable to have multiple rows falling in the same month for a given client-product then switch row_number()
to dense_rank()
in both places.
https://dbfiddle.uk/?rdbms=postgres_9.5&fiddle=397a2f3282cab3b70bd7a47d1dc5ea0a