I have a pandas dataframe that initially has these columns below and grows as the program continues.
'Branch_ID', 'Region', 'ActualRegion', 'Lease_Id', 'Year', 'Make', 'Model', 'VIN', 'Plate_Number', 'Full_Name', 'Employee Number', 'Job Title', 'Device Serial Number', 'Blackout since', 'Covered'
Currently I have multiple groupby Statements that look similar to this. This example sums everything in the last column called Miles Driven
report = report.groupby(['Branch_ID', 'Region', 'ActualRegion', 'Lease_Id',
'Year', 'Make', 'Model', 'VIN',
'Plate_Number', 'Full_Name', 'Employee Number', 'Job Title',
'Device Serial Number', 'Blackout since', 'Covered']).sum().reset_index()
I have to do a similar process multiple times and each time I do it a new column is added.. I am trying to create a reusable function to simplify and get rid of redundant code.
What I have tried; in my head the code above should be able to look like this.. I am summing a column called Miles Driven if there are duplicates in the other columns
columns_to_group = report.columns.difference(['Miles Driven'])
report = report.groupby(columns_to_group).sum().reset_index()
This is much more elegant and will help me create a function that could shorten my code signficantly but I have tried many ways to get something similar to work and can't.
The error above that I get is
raise ValueError("Grouper and axis must be same length")
If I print out columns_to_group it is identical to what I am inserting in the groupby statement above.
CodePudding user response:
If I print out columns_to_group it is identical to what I am inserting in the groupby statement above
Whilst it might looks like list
, it is not, convert it to list
and it should work. Simple example
import pandas as pd
df = pd.DataFrame({'X':[0,0,1,1],'Y':[1,1,0,0],'Z':[1,10,100,1000]})
group_cols = df.columns.difference(['Z'])
df_sum = df.groupby(list(group_cols)).sum().reset_index()
print(df_sum)
output
X Y Z
0 0 1 11
1 1 0 1100
Note: I used own data sample for brevity sake.