Using pandas to sum columns based on a criteria-CodePudding

I am trying to use pandas to group sales information based on category and a criteria.
For example in "Table 1" below, I want sales totals for each category excluding those with a "Not Stated" in the Reg/Org column. My ideal output would be in "Table 2" below. My actual data set has 184 columns, and I am trying to capture the sales volume by category across any values excluding those that are "Not Stated".

Thank you for any help or direction that you can provide.

TABLE 1

Category	Reg/Org	Sales
Apple	Regular	10
Apple	Organic	5
Apple	Not Stated	5
Banana	Regular	15
Banana	Organic	5

TABLE 2

Category	Reg/Org
Apple	15
Banana	20

The first part was to summarize the values by column for the entire data set. I utilized the code below to gather that info for each of the 184 columns. Now I want to create a further summary where I create those column totals again, but split by the 89 categories I have. Ideally, I am trying to create a cross tab, where the categories are listed down the rows, and each of the 184 columns contains the sales. (e.g. the column "Reg/Org" would no longer show "Organic" or "Regular", it would just show the sales volume for all values that are not "Not Stated".)

att_list = att.columns.tolist() 
ex_list = ['NOT STATED','NOT COLLECTED'] 
sales_list = [] 
for att_col in att_list:
    sales_list.append(att[~att[att_col].isin(ex_list)]['$'].sum())

CodePudding user response：

Try

df[df["Reg/Org"]!="Not Stated"].groupby("Category").sum()

df.groupby("Category").sum().drop(index= ["Not Stated"])

CodePudding user response：

try using "YourDataframe.loc[]" with a filter inside

import pandas as pd

data = pd.read_excel('Test_excel.xlsx')
sales_volum = data.loc[data["Reg/Org"] != "Not Stated"]
print(str(sales_volum))