Below is the data set I am working with:
customer_id, event_date, status, credit_limit
1, 2019-1-1, C, 1000
1, 2019-1-5, F, 1000
1, 2019-3-10, [NULL], 1000
1, 2019-3-10, [NULL], 1000
1, 2019-8-27, L, 1000
2, 2019-1-1, L, 2000
2, 2019-1-5, [NULL], 2500
2, 2019-3-10, [NULL], 2500
3, 2019-1-1, S, 5000
3, 2019-1-5, [NULL], 6000
3, 2019-3-10, B, 5000
4, 2019-3-10, B, 10000
I am trying to solve for the following:
For each customer_id, show account status at month end for the year 2019
I have tried using windows function last_value(), but it does not give me the latest date in a month. Here is my query:
with cte1 as
(select customer_id, status,
event_date,
last_value(date_format(event_date, '%Y-%m-%d')) over ( partition by customer_id, event_date
order by event_date) as l_v
from cust_acct ca
where event_date between "2019-01-01 00:00:00" and "2019-12-31 11:59:59")
select * from cte1
It returns:
Customer_id, Status, Event_date, L_v
1, C, 2019-01-01 00:00:00, 2019-01-01
1, F, 2019-01-05 00:00:00, 2019-01-05
1, [NULL], 2019-03-10 00:00:00, 2019-03-10
1, [NULL], 2019-03-10 00:00:00, 2019-03-10
1, L, 2019-08-27 00:00:00, 2019-08-27
2, L, 2019-01-01 00:00:00, 2019-01-01
2, [NULL], 2019-01-05 00:00:00, 2019-01-05
2, [NULL], 2019-03-10 00:00:00, 2019-03-10
3, S, 2019-01-01 00:00:00, 2019-01-01
3, [NULL], 2019-01-05 00:00:00, 2019-01-05
3, B, 2019-03-10 00:00:00, 2019-03-10
4, B, 2019-03-10 00:00:00, 2019-03-10
Customer_id 1, for month 2019-01, should have a last_value of '2019-01-05' in the column l_v. Why is the query showing both dates in january in column l_v?
CodePudding user response:
LAST_VALUE()
is not the proper window function in this case.
It can be used only if you extend the window:
WITH cte1 AS (
SELECT customer_id, status, event_date,
LAST_VALUE(DATE(event_date)) OVER (
PARTITION BY customer_id, DATE_FORMAT(event_date, '%Y-%m')
ORDER BY event_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS l_v
FROM cust_acct ca
WHERE event_date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 11:59:59'
)
SELECT * FROM cte1;
You should use FIRST_VALUE()
:
WITH cte1 AS (
SELECT customer_id, status, event_date,
FIRST_VALUE(DATE(event_date)) OVER (
PARTITION BY customer_id, DATE_FORMAT(event_date, '%Y-%m')
ORDER BY event_date DESC
) AS l_v
FROM cust_acct ca
WHERE event_date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 11:59:59'
)
SELECT * FROM cte1;
or better MAX()
:
WITH cte1 AS (
SELECT customer_id, status, event_date,
MAX(DATE(event_date)) OVER (
PARTITION BY customer_id, DATE_FORMAT(event_date, '%Y-%m')
) AS l_v
FROM cust_acct ca
WHERE event_date BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 11:59:59'
)
SELECT * FROM cte1;
See the demo.