I got a contract DataFrame looking like this :
Customer CONTRACT_ID PHASING_DATE SCHEDULED ARTICLE_CODE
A C0218 2021-01-01 21 001
A COZ19 2021-01-01 23 001
A IUD80 2021-01-01 43 001
A PAZO1 2021-02-01 12 002
B DZAP2 2021-01-01 3 003
B DZAH8 2021-01-01 4 003
B FGIG0 2021-03-01 5 003
C SDFH4 2021-01-01 4 004
C AZFE3 2021-04-01 54 005
C DAZJ9 2021-04-01 32 005
..
I would like to create a DataFrame based on the number of active contracts like this :
e.g there's 3 contracts with the same PHASING_DATE, I want 3 columns filled with the scheduled associated to each active contract
CUSTOMER DATE ARTICLE_CODE SCHEDULED_CT_1 SCHEDULED_CT_2 SCHEDULED_CT_3 SCHEDULED_CT_4
A 2021-01-01 001 21 23 43 0
B 2021-01-01 003 3 4 0 0
C 2021-01-01 004 4 0 0 0
A 2021-02-01 002 12 0 0 0
B 2021-03-01 003 5 0 0 0
C 2021-04-01 004 54 32 0 0
CodePudding user response:
Use GroupBy.cumcount
for counter, add column to DataFrame and pivoting by pivot
with replace missing values, rename columns by DataFrame.add_prefix
, some data cleaning and last sorting by both columns:
s = df.groupby(['Customer','PHASING_DATE','ARTICLE_CODE']).cumcount()
df = (df.assign(a=s)
.pivot(['Customer','PHASING_DATE','ARTICLE_CODE'], 'a', 'SCHEDULED')
.fillna(0)
.add_prefix('SCHEDULED_CT_')
.reset_index()
.rename_axis(None, axis=1)
.rename(columns={'PHASING_DATE':'DATE'})
.sort_values(['DATE','Customer']))
print (df)
Customer DATE ARTICLE_CODE SCHEDULED_CT_0 SCHEDULED_CT_1 \
0 A 2021-01-01 001 21 23
2 B 2021-01-01 003 3 4
4 C 2021-01-01 004 4 0
1 A 2021-02-01 002 12 0
3 B 2021-03-01 003 5 0
5 C 2021-04-01 005 54 32
SCHEDULED_CT_2
0 43
2 0
4 0
1 0
3 0
5 0