Home > database >  How to create summarized data in pandas and python
How to create summarized data in pandas and python

Time:01-11

I am trying to format some data from an external source and make some sorting and groupings. Below is the snippet that I am using currently. I am trying to make the sorting and grouping work.

pmtAccount = []
pmtAmount = []

for payment in payments_sent:
    print ("{} {:>15}".format(payment['to'], payment['amount']))
    pmtAmount.append(payment['amount'])
    pmtAccount.append(payment['to'])


#-- Attempt to summarize output
df = pd.DataFrame(list(zip(pmtAccount, pmtAmount)), columns =['To', 'Amount'])    

result = df.groupby('To')['Amount'].agg(['count','sum']).reset_index()
result['count'] = result['count'].astype(str)   ' Transactions'
print(result)

Current Output:

#-- Output of the line print ("{} {:>15}".format(payment['to'], payment['amount']))

199SJFW1K     100.0000000
ABC123ABC     100.0000000
ABC123ABC      50.0000000
ABC123ABC     300.0000000
123ABC123     200.0000000
123ABC123      50.0000000
123ABC123     100.0000000
ABC123ABC     300.0000000
ABC123ABC     250.0000000
XYXY12XYX     250.0000000
XYXY12XYX     500.0000000
1OPR12DRT      41.0000000
KSJDHW812       0.1900000

#-- 
                                                  To           count                  sum
0  ABC123ABC       ...  5 Transactions  100.0000000 50.0000000 300.0000000 300.0000000250...
1  XYXY12XYX       ...  2 Transactions                             250.0000000500.0000000
2  KSJDHW812       ...  1 Transactions                                          0.1900000
3  1OPR12DRT       ...  1 Transactions                                         41.0000000
4  123ABC123       ...  3 Transactions                   200.000000050.0000000100.0000000
5  199SJFW1K       ...  1 Transactions                                        100.0000000

Needed Output: #-- Sorted and grouped

0  ABC123ABC    ...  5 Transactions       1,100.0000000 #-- Sum of (100.0000000, 50.0000000, 300.0000000, 300.0000000, 250.0000000)
1  123ABC123    ...  3 Transactions         350.0000000    
2  XYXY12XYX    ...  2 Transactions         750.0000000
3  1OPR12DRT    ...  1 Transactions          41.0000000
4  199SJFW1K    ...  1 Transactions         100.0000000    
5  KSJDHW812    ...  1 Transactions           0.1900000
Total Transactions: 13     Total Amount:  2,341.19 

CodePudding user response:

You have the right approach, but after you thought the data, you have not converted Amount into Float type.

df['Amount'] = df['Amount'].astype(float)

Now you can separately calculate Total Transactions and Total Amount:

result = df.groupby('To')['Amount'].agg(['count','sum']).reset_index()
total_transactions = result['count'].sum()
total_amount = result['sum'].sum()

Now the full block of code with these changes will look like this:

df = pd.DataFrame(list(zip(pmtAccount, pmtAmount)), columns =['To', 'Amount'])
df['Amount'] = df['Amount'].astype(float)

result = df.groupby('To')['Amount'].agg(['count','sum']).reset_index()
total_transactions = result['count'].sum()
total_amount = result['sum'].sum()
result['count'] = result['count'].astype(str)   ' Transactions'

result = result.sort_values(by=['sum', 'count'], ascending=[False, False])
print(result)
print("Total Transactions: {} Total Amount: {}".format(total_transactions, total_amount))

Output for this:

           To           count      sum
3  ABC123ABC  5 Transactions  1000.00
5  XYXY12XYX  2 Transactions   750.00
0  123ABC123  3 Transactions   350.00
1  199SJFW1K  1 Transactions   100.00
2  1OPR12DRT  1 Transactions    41.00
4  KSJDHW812  1 Transactions     0.19
Total Transactions: 13 Total Amount: 2241.19

CodePudding user response:

IIUC: you may need this:

import pandas as pd
df=pd.DataFrame([
    ['199SJFW1K',100.0000000],
    ['ABC123ABC',100.0000000],
    ['ABC123ABC',50.0000000],
    ['ABC123ABC',300.0000000],
    ['123ABC123',200.0000000],
    ['123ABC123',50.0000000],
    ['123ABC123',100.0000000],
    ['ABC123ABC',300.0000000],
    ['ABC123ABC',250.0000000],
    ['XYXY12XYX',250.0000000],
    ['XYXY12XYX',500.0000000],
    ['1OPR12DRT',41.0000000],
    ['KSJDHW812',0.1900000],
], columns=['to','amount'])

And you want to group by the to column

df2=df.groupby('to').size()\
    .reset_index(name='transaction')\
    .sort_values('transaction', ascending=False)
df2

groupby('to') will group by the to column size() will count each group reset_index(name='transaction') will set the count column to transaction sort_values('transaction', ascending=False) will sort values by the transaction column and ascending=False mean order from max to min This is the output

to transaction
3 ABC123ABC 5
0 123ABC123 3
5 XYXY12XYX 2
1 199SJFW1K 1
2 1OPR12DRT 1
4 KSJDHW812 1

and the total should use the code:

print(f'Total transaction: {df2.transaction.sum()}')

Then the output will be

Total transaction: 13

Additional, if you want to sum the amount, try this

df3=df.groupby('to')\
    .sum()\
    .reset_index()\
    .rename(columns={'amount':'sum_of_amount'})
df3

This is the output

to sum_of_amount
0 123ABC123 350.00
1 199SJFW1K 100.00
2 1OPR12DRT 41.00
3 ABC123ABC 1000.00
4 KSJDHW812 0.19
5 XYXY12XYX 750.00

And if you want to merge 2 dataframe, df2 and df3 to show to transaction and a sum_of_amount

pd.merge(df2,df3,on='to')

This is the final output

to transaction sum_of_amount
0 ABC123ABC 5 1000.00
1 123ABC123 3 350.00
2 XYXY12XYX 2 750.00
3 199SJFW1K 1 100.00
4 1OPR12DRT 1 41.00
5 KSJDHW812 1 0.19
  • Related