The end goal is to sum the values in the "Price" column based on multiple conditions in other columns. I am unsure how to do it though.
import pandas as pd
#creating random dataset
rnd = pd.DataFrame({'Status':['Accepted', 'Denied', 'Accepted', 'Accepted', 'Denied'],
'Letter':['A 02-02-19', 'AB 10-31-21', 'A 03-07-18', 'CD 11-13-21', 'A 04-05-21'],
'Fruit':['Apple', 'Orange', 'Blueberry', 'Orange', 'Apple'],
'Price':[10,20,14,15,29]})
#output of dataframe
rnd
Needs to have "Accepted" as a value in the "Status" Column. I know this can be done by doing
''' net = rnd.loc(rnd["Status"] == "Accepted", "Price"].sum() '''
However, I also needed it to be summed based on what occurs in the "Letter" Column. I don't care about the random dates in the value, only the characters in the beginning of the string. AB would be grouped in a different Group than A, which would also be grouped differently than CD. I am trying to figure out how to sum "Price" if I only wanted those who had "A" in the Letter column and "Accepted" in the Status column.
CodePudding user response:
Try:
rnd.query('Status == "Accepted"')\
.groupby(rnd['Letter'].str.split(' ').str[0])['Price'].sum()
Output:
Letter
A 24
CD 15
Name: Price, dtype: int64
CodePudding user response:
In general:
foo = self.db[(self.db['COL_NAME'] == val1) & (self.db['OTHER_COL'] != 0)]['COL_TO_SUM'].sum()