Here is the initial table as shown below.
Cust ID | Jan Transaction Fee | Jan Transaction Fee | Jan Product Fee | Jan Product Fee | Feb Transaction Fee | Feb Transaction Fee | Feb Product Fee | Feb Product Fee |
---|---|---|---|---|---|---|---|---|
HKD | USD | HKD | USD | HKD | USD | HKD | USD | |
100103 | 100 | 20 | 21 | 24 | 215 | 55 | 253 | 25 |
100104 | 200 | 30 | 31 | 34 | 315 | 65 | 353 | 35 |
I would like to convert the table from above to the below expected result.
Cust ID | Period | Type | FX | Price |
---|---|---|---|---|
100103 | 202201 | Transaction Fee | HKD | 100 |
100103 | 202201 | Transaction Fee | USD | 20 |
100103 | 202201 | Product Fee | HKD | 21 |
100103 | 202201 | Product Fee | USD | 24 |
100103 | 202202 | Transaction Fee | HKD | 215 |
100103 | 202202 | Transaction Fee | USD | 55 |
100103 | 202202 | Product Fee | HKD | 253 |
100103 | 202202 | Product Fee | USD | 25 |
100104 | 202201 | Transaction Fee | HKD | 200 |
100104 | 202201 | Transaction Fee | USD | 30 |
100104 | 202201 | Product Fee | HKD | 31 |
100104 | 202201 | Product Fee | USD | 34 |
100104 | 202202 | Transaction Fee | HKD | 315 |
100104 | 202202 | Transaction Fee | USD | 65 |
100104 | 202202 | Product Fee | HKD | 353 |
100104 | 202202 | Product Fee | USD | 35 |
My coding on import data is below
import pandas as pd
test=pd.DataFrame({'Cust ID':['','100103','100104'],'Jan Transaction Fee':['HKD',100,200],'Jan Transaction Fee.1':['USD',20,30],\
'Jan Product Fee':['HKD',21,31],'Jan Product Fee.1':['USD',24,34],
'Feb Transaction Fee':['HKD',215,315],'Feb Transaction Fee.1':['USD',55,65],
'Feb Product Fee':['HKD',253,353],'Feb Product Fee.1':['USD',25,35]})
test
is there a way to do the expected result with using Python?
CodePudding user response:
This is a complex reshape.
NB. I ignored the '.1', removed using test.columns = test.columns.map(lambda s: s.strip('.1'))
.
df = (test
.T.set_index(0, append=True).T
.set_index([('Cust ID', '')])
.stack()
.rename_axis(index=['Cust ID', 'FX'], columns='Type')
.stack()
.reset_index(name='Price')
)
df[['Period', 'Type']] = df['Type'].str.split(n=1, expand=True)
df['Period'] = pd.to_datetime('2022 ' df['Period']).dt.strftime('%Y%m')
output:
Cust ID FX Type Price Period
0 100103 HKD Transaction Fee 100 202201
1 100103 USD Product Fee 25 202202
2 100103 USD Transaction Fee 55 202202
3 100103 USD Product Fee 24 202201
4 100103 USD Transaction Fee 20 202201
5 100104 HKD Transaction Fee 200 202201
6 100104 USD Product Fee 35 202202
7 100104 USD Transaction Fee 65 202202
8 100104 USD Product Fee 34 202201
9 100104 USD Transaction Fee 30 202201