Home > Back-end >  Python Data Transformation--EDA
Python Data Transformation--EDA

Time:01-25

Trying to transform my data from

enter image description here

  • lm-stands for last month

- [enter image description here](https://img.codepudding.com/202301/bb6f687dd49f4f8e96b53a4adeab1c5f.png)

hopefully this makes sense ,how i have it

import pandas as pd
df = pd.read_excel('data.xlsx') #reading data
output = []
grouped = df.groupby('txn_id')
for txn_id, group in grouped:
avg_amt = group['avg_amount'].iloc[-1]
min_amt = group['min_amount'].iloc[-1]
lm_avg = group['avg_amount'].iloc[-6:-1]
min_amt_list = group['min_amount'].iloc[-6:-1]
output.append([txn_id, *lm_avg, min_amt, *min_amt_list])

result_df = pd.DataFrame(output, columns=['txn_id', 'lm_avg', 'lm_avg-1', 'lm_avg-2', 'lm_avg-3', 'lm_avg-4', 'lm_avg-5', 'min_am', 'min_amt-1', 'min_amt-2', 'min_amt-3', 'min_amt-4', 'min_amt-5'])#getting multiple crows for 1 txn_id which is not expected

CodePudding user response:

Use pivot_table:

# Rename columns before reshaping your dataframe with pivot_table
cols = df[::-1].groupby('TXN_ID').cumcount().astype(str)
out = (df.rename(columns={'AVG_Amount': 'lm_avg', 'MIN_AMOUNT': 'min_amnt'})
         .pivot_table(index='TXN_ID', values=['lm_avg', 'min_amnt'], columns=cols))

# Flat columns name
out.columns = ['-'.join(i) if i[1] != '0' else i[0] for i in out.columns.to_flat_index()]

# Reset index
out = out.reset_index()

Output:

>>> out
   TXN_ID  lm_avg  lm_avg-1  lm_avg-2  lm_avg-3  lm_avg-4  lm_avg-5  min_amnt  min_amnt-1  min_amnt-2  min_amnt-3  min_amnt-4  min_amnt-5
0       1     578       688       589       877       556        78       400          31          20         500         300          30
1       2     578       688       589       877       556        78       400          31          20           0           0          90
  • Related