Home > OS >  Using pandas to sum columns based on a criteria
Using pandas to sum columns based on a criteria

Time:04-09

I am trying to use pandas to group sales information based on category and a criteria.
For example in "Table 1" below, I want sales totals for each category excluding those with a "Not Stated" in the Reg/Org column. My ideal output would be in "Table 2" below. My actual data set has 184 columns, and I am trying to capture the sales volume by category across any values excluding those that are "Not Stated".

Thank you for any help or direction that you can provide.

TABLE 1

Category Reg/Org Sales
Apple Regular 10
Apple Organic 5
Apple Not Stated 5
Banana Regular 15
Banana Organic 5

TABLE 2

Category Reg/Org
Apple 15
Banana 20

The first part was to summarize the values by column for the entire data set. I utilized the code below to gather that info for each of the 184 columns. Now I want to create a further summary where I create those column totals again, but split by the 89 categories I have. Ideally, I am trying to create a cross tab, where the categories are listed down the rows, and each of the 184 columns contains the sales. (e.g. the column "Reg/Org" would no longer show "Organic" or "Regular", it would just show the sales volume for all values that are not "Not Stated".)

att_list = att.columns.tolist() 
ex_list = ['NOT STATED','NOT COLLECTED'] 
sales_list = [] 
for att_col in att_list:
    sales_list.append(att[~att[att_col].isin(ex_list)]['$'].sum())

CodePudding user response:

Try

df[df["Reg/Org"]!="Not Stated"].groupby("Category").sum()

Or

df.groupby("Category").sum().drop(index= ["Not Stated"])

CodePudding user response:

try using "YourDataframe.loc[]" with a filter inside

import pandas as pd

data = pd.read_excel('Test_excel.xlsx')
sales_volum = data.loc[data["Reg/Org"] != "Not Stated"]
print(str(sales_volum))
  • Related