Home > Mobile >  Creating totals by column in pandas df
Creating totals by column in pandas df

Time:04-08

I have a large dataset of 481K rows x 184 columns. Below is an example table. I want to be able to have a sales total for each column based on a set of criteria.

Name (Criteria = All excl. Jim) = $275 sales
Position (Criteria = All exlc. Associates) = $250 sales
City (Criteria = All excl. Detroit) = $75 sales

Ideally the current 184 columns would return as a row with an associated sales value in the second column. I'm really just trying to figure out where to start with this one. Thank you for any helping or ideas you may have.

         Name  Position   City          Sales
0        Jon   Owner      Detroit       200.0
1        Joe   Manager    Philadelphia  50.0
2        Jim   Associate  Orlando       25.0
3        Jerry Associate  Detroit       25.0

CodePudding user response:

Example:

df[df['Name']!='Jon']['Sales'].sum()

Generalized:

df[df[col_name]!=exclude_variable]['Sales'].sum()

CodePudding user response:

Try this:

df = pd.DataFrame({'Name': ['Jon', 'Joe', 'Jim', 'Jerry'],
                   'Position':['Owner', 'Manager', 'Associate', 'Associate'],
                   'City':['Detroit', 'Philadelphia', 'Orlando', 'Detroit'],
                   'Sales':[200.0, 50.0, 25.0, 25.0]})


def grab_total_sales(df, base_column='Name', exclude_list=['Jim']):
    if base_column not in df.columns:
        print(f'unknown column: {base_column} not in {list(df_columns)}')
        return None
    else:
        df_ = df[~df[base_column].isin(exclude_list)]
        return df_.Sales.sum()
    return None

# print(grab_total_sales(df, base_column='Name', exclude_list=['Jim']))
# 275.0

# print(grab_total_sales(df, base_column='Position', exclude_list=['Associate']))
# 250.0 

# print(grab_total_sales(df, base_column='City', exclude_list=['Detroit']))
# 75.0
  • Related