I have data with 4 columns: company_id, obs_period (observation period), hist_period (historical observation period) and is_payroll (payroll status of company). I want to create dummy variable (dummy) that identify if company left payroll status (e.g. is_payroll column change from P to NP). how can I solve this problem in Oracle SQL.
compay_NO | obs_period | hist_period | is_payroll | dummy |
---|---|---|---|---|
11 | 2020/09 | 2018/10 | NP | 0 |
11 | 2020/09 | 2018/11 | NP | 0 |
11 | 2020/09 | 2018/12 | NP | 0 |
11 | 2020/09 | 2019/01 | P | 0 |
11 | 2020/09 | 2019/02 | P | 0 |
11 | 2020/09 | 2019/03 | P | 0 |
11 | 2020/09 | 2019/04 | P | 0 |
11 | 2020/09 | 2019/05 | P | 0 |
11 | 2020/09 | 2019/06 | P | 0 |
11 | 2020/09 | 2019/07 | P | 0 |
12 | 2020/09 | 2019/08 | P | 1 |
12 | 2020/09 | 2019/09 | P | 1 |
12 | 2020/09 | 2019/10 | P | 1 |
12 | 2020/09 | 2019/11 | P | 1 |
12 | 2020/09 | 2019/12 | P | 1 |
12 | 2020/09 | 2020/01 | NP | 1 |
12 | 2020/09 | 2020/02 | NP | 1 |
12 | 2020/09 | 2020/03 | NP | 1 |
12 | 2020/09 | 2020/04 | NP | 1 |
12 | 2020/09 | 2020/05 | NP | 1 |
12 | 2020/09 | 2020/06 | NP | 1 |
CodePudding user response:
Here's one option: compare is_payroll
with the next is_payroll
per each company_no
(sorted by hist_period
); if there was switch from P
to NP
, then dummy = 1
:
SQL> with temp as
2 (select company_no,
3 hist_period,
4 is_payroll,
5 --
6 case when is_payroll = 'P' and
7 lead(is_payroll) over (partition by company_no
8 order by hist_period) = 'NP' then 1
9 else 0
10 end dummy
11 from test t
12 )
13 select company_no,
14 hist_period,
15 is_payroll,
16 max(dummy) over (partition by company_no) dummy
17 from temp;
COMPANY_NO HIST_PERIOD IS_PAYROLL DUMMY
---------- ------------ ---------- ----------
11 2018/10 NP 0
11 2018/11 NP 0
11 2018/12 NP 0
11 2019/01 P 0
11 2019/02 P 0
11 2019/03 P 0
12 2019/11 P 1
12 2019/12 P 1
12 2020/01 NP 1
12 2020/02 NP 1
12 2020/03 NP 1
11 rows selected.
SQL>