Thanks in advance for any help.
I have a table with unique tickets, customer IDs and ticket price. For each ticket, I want to see the number of tickets and total revenue from a customer 3 months after the date of the ticket.
I tried to use the partition by function with the date condition set in the on clause, but it just evaluates all tickets of the customer rather than the 3 month period I want.
select distinct on (at2.ticket_number)
at2.customer_id
,at2.ticket_id
,at2.ticket_number
,at2.initial_sale_date
,ata.tix "a_tix"
,ata.aov "a_aov"
,ata.rev "a_rev"
from reports.agg_tickets at2
left join (select at2.customer_id, at2.final_fare_value, at2.initial_sale_date, count(at2.customer_id) OVER (PARTITION BY at2.customer_id) AS tix,
avg(at2.final_fare_value) over (partition by at2.customer_id) as aov,
sum(at2.final_fare_value) over (partition by at2.customer_id) as rev
from reports.agg_tickets at2
) ata
on (ata.customer_id = at2.customer_id
and ata.initial_sale_date > at2.initial_sale_date
and ata.initial_sale_date < at2.initial_sale_date interval '3 months')
I could use a left join lateral, but it takes far too long. Slightly confused with how to achieve what I want, so any help would be greatly appreciated.
Many thanks
Edit: Here is the sample of data. Picture of data table.
The table is unique on ticket number, but not on customer.
CodePudding user response:
No need to use a join at all, this will yield (as you observe) a problemetic performnce. What is your solution is a plain window function with a frame_clause that will consider the next 3 months for each ticket
Example (self explained)
count(*) over (partition by customer_id order by initial_sale_date
range between current row and '3 months' following) ticket_cnt
Here a full query with simplified sample data and the result
with dt as (
select * from (values
(1, 1, date'2020-01-01', 10),
(1, 2, date'2020-02-01', 15),
(1, 3, date'2020-03-01', 20),
(1, 4, date'2020-04-01', 25),
(1, 5, date'2020-05-01', 30),
(2, 6, date'2020-01-01', 15),
(2, 7, date'2020-02-01', 20),
(2, 7, date'2021-01-01', 25)
) tab (customer_id, ticket_id, initial_sale_date,final_fare_value)
)
select
customer_id, ticket_id, initial_sale_date, final_fare_value,
count(*) over (partition by customer_id order by initial_sale_date range between current row and '3 months' following) ticket_cnt,
sum(final_fare_value) over (partition by customer_id order by initial_sale_date range between current row and '3 months' following) ticket_sum
from dt;
customer_id|ticket_id|initial_sale_date|final_fare_value|ticket_cnt|ticket_sum|
----------- --------- ----------------- ---------------- ---------- ----------
1| 1| 2020-01-01| 10| 4| 70|
1| 2| 2020-02-01| 15| 4| 90|
1| 3| 2020-03-01| 20| 3| 75|
1| 4| 2020-04-01| 25| 2| 55|
1| 5| 2020-05-01| 30| 1| 30|
2| 6| 2020-01-01| 15| 2| 35|
2| 7| 2020-02-01| 20| 1| 20|
2| 7| 2021-01-01| 25| 1| 25|