The image shows each account and its associated account status (i.e. open or closed) on two dates (31 Jan and 31 July). The code at the bottom generates the image shown. How can I edit/add to the code to extract how many accounts moved from OPEN to CLOSED between 31 January to 31 July and how many moved in the opposite direction? I am not well versed in window functions so would appreciate the help.
SELECT
date
,account_id
,account_status
,row_number() over(partition by account_id,account_status order by date asc) as row_rank
from X
INNER JOIN Y on x.account_id=y.account_id
qualify row_rank in (1,7)
CodePudding user response:
With STRING_AGG aggregate funtion instead of using a window function,
SELECT account_id, STRING_AGG(account_status, '->' ORDER BY date) AS status_change
FROM sample GROUP BY 1;
you can see the state change of each account.
Then counting status_change,
SELECT status_change, COUNT(1) cnt FROM (
SELECT account_id, STRING_AGG(account_status, '->' ORDER BY date) AS status_change
FROM sample GROUP BY 1
) GROUP BY 1;
output will be:
sample table:
CREATE TEMP TABLE sample AS
SELECT '2020-01-31' date, 1 account_id, 'OPEN' account_status, 1 row_rank UNION ALL
SELECT '2020-07-31', 1, 'CLOSED', 7 UNION ALL
SELECT '2020-01-31', 2, 'OPEN', 1 UNION ALL
SELECT '2020-07-31', 2, 'OPEN', 7 UNION ALL
SELECT '2020-01-31', 3, 'CLOSED', 1 UNION ALL
SELECT '2020-07-31', 3, 'OPEN', 7;
CodePudding user response:
You can try query below:
with sample_data as (
select date('2020-01-31') as date, 111 as account_id, 'OPEN' as account_status,
union all select date('2020-07-31') as date, 111 as account_id, 'OPEN' as account_status,
union all select date('2020-01-31') as date, 222 as account_id, 'OPEN' as account_status,
union all select date('2020-07-31') as date, 222 as account_id, 'CLOSED' as account_status,
union all select date('2020-01-31') as date, 333 as account_id, 'OPEN' as account_status,
union all select date('2020-07-31') as date, 333 as account_id, 'CLOSED' as account_status,
union all select date('2020-01-31') as date, 444 as account_id, 'OPEN' as account_status,
union all select date('2020-07-31') as date, 444 as account_id, 'OPEN' as account_status,
union all select date('2020-01-31') as date, 555 as account_id, 'CLOSED' as account_status,
union all select date('2020-07-31') as date, 555 as account_id, 'OPEN' as account_status,
union all select date('2020-08-01') as date, 555 as account_id, 'CLOSED' as account_status,
),
sample_data_mod as (
SELECT
date
,account_id
,account_status
,row_number() over(partition by account_id,account_status order by date asc) as row_rank,
lag(account_status) over (partition by account_id order by date asc) as prev_status
from sample_data
)
select
date,
account_id,
account_status,
prev_status,
countif(prev_status = 'OPEN' and account_status = 'CLOSED') over (partition by account_id order by date asc) as open_to_close,
countif(prev_status = 'CLOSED' and account_status = 'OPEN') over (partition by account_id order by date asc) as close_to_open,
from sample_data_mod
This shows the running update per account_status
change.
CodePudding user response:
Consider below option
select account_status, next_account_status, count(*) accounts_moved
from (
select *, lead(account_status) over statuses next_account_status
from your_table
window statuses as (partition by account_id order by date)
)
where not next_account_status is null
group by account_status, next_account_status