Home > OS >  Using partition by to find account status at the start and end of a period
Using partition by to find account status at the start and end of a period

Time:07-05

The image shows each account and its associated account status (i.e. open or closed) on two dates (31 Jan and 31 July). The code at the bottom generates the image shown. How can I edit/add to the code to extract how many accounts moved from OPEN to CLOSED between 31 January to 31 July and how many moved in the opposite direction? I am not well versed in window functions so would appreciate the help.

enter image description here

SELECT 
date
,account_id
,account_status 
,row_number() over(partition by account_id,account_status order by date asc) as row_rank

from X
INNER JOIN Y on x.account_id=y.account_id
qualify row_rank in (1,7)

CodePudding user response:

With STRING_AGG aggregate funtion instead of using a window function,

SELECT account_id, STRING_AGG(account_status, '->' ORDER BY date) AS status_change
  FROM sample GROUP BY 1;

you can see the state change of each account.

enter image description here

Then counting status_change,

SELECT status_change, COUNT(1) cnt FROM (
  SELECT account_id, STRING_AGG(account_status, '->' ORDER BY date) AS status_change
    FROM sample GROUP BY 1
) GROUP BY 1;

output will be:

enter image description here

sample table:

CREATE TEMP TABLE sample AS
SELECT '2020-01-31' date, 1 account_id, 'OPEN' account_status, 1 row_rank UNION ALL
SELECT '2020-07-31', 1, 'CLOSED', 7 UNION ALL
SELECT '2020-01-31', 2, 'OPEN', 1 UNION ALL
SELECT '2020-07-31', 2, 'OPEN', 7 UNION ALL
SELECT '2020-01-31', 3, 'CLOSED', 1 UNION ALL
SELECT '2020-07-31', 3, 'OPEN', 7;

CodePudding user response:

You can try query below:

with sample_data as (
  select date('2020-01-31') as date, 111 as account_id, 'OPEN' as account_status,
  union all select date('2020-07-31') as date, 111 as account_id, 'OPEN' as account_status,
  union all select date('2020-01-31') as date, 222 as account_id, 'OPEN' as account_status,
  union all select date('2020-07-31') as date, 222 as account_id, 'CLOSED' as account_status,
  union all select date('2020-01-31') as date, 333 as account_id, 'OPEN' as account_status,
  union all select date('2020-07-31') as date, 333 as account_id, 'CLOSED' as account_status,
  union all select date('2020-01-31') as date, 444 as account_id, 'OPEN' as account_status,
  union all select date('2020-07-31') as date, 444 as account_id, 'OPEN' as account_status,
  union all select date('2020-01-31') as date, 555 as account_id, 'CLOSED' as account_status,
  union all select date('2020-07-31') as date, 555 as account_id, 'OPEN' as account_status,
  union all select date('2020-08-01') as date, 555 as account_id, 'CLOSED' as account_status,
),
sample_data_mod as (
  SELECT 
date
,account_id
,account_status 
,row_number() over(partition by account_id,account_status order by date asc) as row_rank,
lag(account_status) over (partition by account_id order by date asc) as prev_status
from sample_data
)


select 
date,
account_id,
account_status,
prev_status,
countif(prev_status = 'OPEN' and account_status = 'CLOSED') over (partition by account_id order by date asc) as open_to_close,
countif(prev_status = 'CLOSED' and account_status = 'OPEN') over (partition by account_id order by date asc) as close_to_open,
from sample_data_mod

Output: enter image description here

This shows the running update per account_status change.

CodePudding user response:

Consider below option

select account_status, next_account_status, count(*) accounts_moved
from (
  select *, lead(account_status) over statuses next_account_status
  from your_table
  window statuses as (partition by account_id order by date)
)
where not next_account_status is null
group by account_status, next_account_status 
  • Related