Home > Enterprise >  Groupby everything but one column
Groupby everything but one column

Time:07-22

I have a pandas dataframe that initially has these columns below and grows as the program continues.

'Branch_ID', 'Region', 'ActualRegion', 'Lease_Id', 'Year', 'Make', 'Model', 'VIN', 'Plate_Number', 'Full_Name', 'Employee Number', 'Job Title', 'Device Serial Number', 'Blackout since', 'Covered'

Currently I have multiple groupby Statements that look similar to this. This example sums everything in the last column called Miles Driven

report = report.groupby(['Branch_ID', 'Region', 'ActualRegion', 'Lease_Id', 
                         'Year', 'Make',  'Model',  'VIN',  
                        'Plate_Number',  'Full_Name', 'Employee Number', 'Job Title',
                      'Device Serial Number', 'Blackout since',  'Covered']).sum().reset_index()

I have to do a similar process multiple times and each time I do it a new column is added.. I am trying to create a reusable function to simplify and get rid of redundant code.

What I have tried; in my head the code above should be able to look like this.. I am summing a column called Miles Driven if there are duplicates in the other columns

columns_to_group = report.columns.difference(['Miles Driven'])

report = report.groupby(columns_to_group).sum().reset_index()

This is much more elegant and will help me create a function that could shorten my code signficantly but I have tried many ways to get something similar to work and can't.

The error above that I get is

raise ValueError("Grouper and axis must be same length")

If I print out columns_to_group it is identical to what I am inserting in the groupby statement above.

CodePudding user response:

If I print out columns_to_group it is identical to what I am inserting in the groupby statement above

Whilst it might looks like list, it is not, convert it to list and it should work. Simple example

import pandas as pd
df = pd.DataFrame({'X':[0,0,1,1],'Y':[1,1,0,0],'Z':[1,10,100,1000]})
group_cols = df.columns.difference(['Z'])
df_sum = df.groupby(list(group_cols)).sum().reset_index()
print(df_sum)

output

   X  Y     Z
0  0  1    11
1  1  0  1100

Note: I used own data sample for brevity sake.

  • Related