I am trying to use pandas to group sales information based on category and a criteria.
For example in "Table 1" below, I want sales totals for each category excluding those with a "Not Stated" in the Reg/Org column. My ideal output would be in "Table 2" below. My actual data set has 184 columns, and I am trying to capture the sales volume by category across any values excluding those that are "Not Stated".
Thank you for any help or direction that you can provide.
TABLE 1
Category | Reg/Org | Sales |
---|---|---|
Apple | Regular | 10 |
Apple | Organic | 5 |
Apple | Not Stated | 5 |
Banana | Regular | 15 |
Banana | Organic | 5 |
TABLE 2
Category | Reg/Org |
---|---|
Apple | 15 |
Banana | 20 |
The first part was to summarize the values by column for the entire data set. I utilized the code below to gather that info for each of the 184 columns. Now I want to create a further summary where I create those column totals again, but split by the 89 categories I have. Ideally, I am trying to create a cross tab, where the categories are listed down the rows, and each of the 184 columns contains the sales. (e.g. the column "Reg/Org" would no longer show "Organic" or "Regular", it would just show the sales volume for all values that are not "Not Stated".)
att_list = att.columns.tolist()
ex_list = ['NOT STATED','NOT COLLECTED']
sales_list = []
for att_col in att_list:
sales_list.append(att[~att[att_col].isin(ex_list)]['$'].sum())
CodePudding user response:
Try
df[df["Reg/Org"]!="Not Stated"].groupby("Category").sum()
Or
df.groupby("Category").sum().drop(index= ["Not Stated"])
CodePudding user response:
try using "YourDataframe.loc[]" with a filter inside
import pandas as pd
data = pd.read_excel('Test_excel.xlsx')
sales_volum = data.loc[data["Reg/Org"] != "Not Stated"]
print(str(sales_volum))