Home > Blockchain >  writing csv from Groupby with whole data and new columns
writing csv from Groupby with whole data and new columns

Time:10-26

The Situation:

I grouped my dataframe data and did some aggregation. When I'm printing, I get correct data, but while writing it into CSV file, I am getting only 3 columns coming in CSV which is in aggregation part.

Also I want to add one more column named AVG which will come from deviation:

AMOUNT/CUST_NUMB

My code:

final=pd.DataFrame(gg.groupby(['Phone','BID', 'P_NAME',
                'STATE','VILLAGE', 'TALUK', 'DISTRICT'])
    .agg({'AMOUNT':'sum', 'CUST_NUMB': 'count','DATE': 'max'}))
     
final.to_csv("spend.csv",index=False)

OUTPUT I AM GETTING FROM MY CODE

The Question:

So i want to write whole DF in one CSV with that extra column named :

avg_spend

Please could you help me with this.

CodePudding user response:

Can you try this one:

final.to_csv("spend.csv")

If this is not what you expect please provide more details

CodePudding user response:

Here is a minimal demo reproducing your problem:

import pandas as pd

data = {
    'Phone': ['x', 'x', 'x'],
    'BID': ['x', 'x', 'x'],
    'P_NAME': ['x', 'x', 'x'],
    'STATE': ['x', 'x', 'x'],
    'VILLAGE': ['x', 'x', 'x'],
    'TALUK': ['1', '2', '1'],
    'DISTRICT': ['x', 'x', 'x'],
    'AMOUNT': [3, 4, 5],
    'CUST_NUMB': ['12', '13', '14'],
    'DATE': ['20211025', '20211025', '20211025']
}
gg = pd.DataFrame(data)
final = pd.DataFrame(gg.groupby(['Phone', 'BID', 'P_NAME',
                                 'STATE', 'VILLAGE', 'TALUK', 'DISTRICT'])
                     .agg({'AMOUNT': 'sum', 'CUST_NUMB': 'count', 'DATE': 'max'}))
print(final)
#                                                AMOUNT  CUST_NUMB      DATE
# Phone BID P_NAME STATE VILLAGE TALUK DISTRICT                             
# x     x   x      x     x       1     x              8          2  20211025
#                                2     x              4          1  20211025

Use reset_index() to convert the groups to columns:

final = final.reset_index()
print(final)
#   Phone BID P_NAME STATE VILLAGE TALUK DISTRICT  AMOUNT  CUST_NUMB      DATE
# 0     x   x      x     x       x     1        x       8          2  20211025
# 1     x   x      x     x       x     2        x       4          1  20211025

Add new column avg_spend and write in CSV:

final['avg_spend'] = final['AMOUNT'] / final['CUST_NUMB']
print(final)
#   Phone BID P_NAME STATE  ... AMOUNT CUST_NUMB      DATE  avg_spend
# 0     x   x      x     x  ...      8         2  20211025        4.0
# 1     x   x      x     x  ...      4         1  20211025        4.0

final.to_csv("spend.csv", index=False)
  • Related