We have raw banking data for making credit scoring risk audit later. In simplified form it's looks like this:
import pandas as pd
d = {'contract_id':['1175082589', '1175082589', '1175082589','1175082589','1575082194','1575082194','1575082194','1575082194'],
'date_of_contract_signature':['2019-05-01','2019-05-01','2019-05-01','2019-05-01','2019-06-01','2019-06-01','2019-06-01','2019-06-01'],
'date_of_report': ['2019-05-01', '2019-06-01', '2019-07-01','2019-08-01','2019-06-01', '2019-07-01', '2019-08-01','2019-09-01'],
'debt_90_plus':[0, 0, 34800,34800,0,0,56500,56500]}
df=pd.DataFrame(data=d)
print(df)
contract_id date_of_contract_signature date_of_report debt_90_plus
0 1175082589 2019-05-01 2019-05-01 0
1 1175082589 2019-05-01 2019-06-01 0
2 1175082589 2019-05-01 2019-07-01 34800
3 1175082589 2019-05-01 2019-08-01 34800
4 1575082194 2019-06-01 2019-06-01 0
5 1575082194 2019-06-01 2019-07-01 0
6 1575082194 2019-06-01 2019-08-01 56500
7 1575082194 2019-06-01 2019-09-01 56500
The result should be month-by-month change in debt_90_plus:
d1={'date_of_contract_signature': ['2019-05-01', '2019-06-01'], 1:[0, 0],2:[0,0],3:[34800,56500],4:[34800,56500]}
df1=pd.DataFrame(data=d1)
print(df1)
date_of_contract_signature 1 2 3 4
0 2019-05-01 0 0 34800 34800
1 2019-06-01 0 0 56500 56500
I am trying:
print(pd.pivot_table(df,index=['date_of_contract_signature','date_of_report']))
Which returns very wrong result:
debt_90_plus
date_of_contract_signature date_of_report
2019-05-01 2019-05-01 0
2019-06-01 0
2019-07-01 34800
2019-08-01 34800
2019-06-01 2019-06-01 0
2019-07-01 0
2019-08-01 56500
2019-09-01 56500
What can help solving the issue?
CodePudding user response:
Example
I don't think long column names are needed in Q&A. It's just inconvenient in code or when creating output. so i make column name simple
d = {'id':['1175082589', '1175082589', '1175082589','1175082589','1575082194','1575082194','1575082194','1575082194'],
'date_A':['2019-05-01','2019-05-01','2019-05-01','2019-05-01','2019-06-01','2019-06-01','2019-06-01','2019-06-01'],
'date_B': ['2019-05-01', '2019-06-01', '2019-07-01','2019-08-01','2019-06-01', '2019-07-01', '2019-08-01','2019-09-01'],
'debt':[0, 0, 34800,34800,0,0,56500,56500]}
df=pd.DataFrame(data=d)
df
id date_A date_B debt
0 1175082589 2019-05-01 2019-05-01 0
1 1175082589 2019-05-01 2019-06-01 0
2 1175082589 2019-05-01 2019-07-01 34800
3 1175082589 2019-05-01 2019-08-01 34800
4 1575082194 2019-06-01 2019-06-01 0
5 1575082194 2019-06-01 2019-07-01 0
6 1575082194 2019-06-01 2019-08-01 56500
7 1575082194 2019-06-01 2019-09-01 56500
Code
make order series in same date
s1 = df.groupby('date_A').cumcount().add(1)
s1
0 1
1 2
2 3
3 4
4 1
5 2
6 3
7 4
dtype: int64
make pivot_table with s1
out = df.pivot_table('debt', index='date_A', columns=s1).reset_index()
out
date_A 1 2 3 4
0 2019-05-01 0 0 34800 34800
1 2019-06-01 0 0 56500 56500
Other way
If you are well with pandas, you can also use following 1-line code
out = (df.groupby('date_A')['debt'].apply(lambda x: pd.Series(list(x)))
.unstack().rename(columns=lambda x: x 1).reset_index())
same result