Home > Mobile >  Pandas groupby with constant criteria
Pandas groupby with constant criteria

Time:05-17

I have dataframe like this one below.

df = pd.DataFrame({
    'date': ["22.10.2021", "22.10.2021", "22.10.2021", "22.10.2021", "23.10.2021"],
    'Product': ["apple", "apple", "orange", "orange", "cherry"],
    'sold_kg': [2, 3, 1, 6, 2]})

df['Qty of Orange Sold that day'] = df.groupby(['date', 'Product']).sold_kg.transform("sum")

My actual groupby will consist of 7-8 arguments. I need one of them be constant based. For example I want it calculate qty only of oranges sold that day but only for oranges. I mean even something like - but of course that didn't work. Could you advice something?

df['Qty of Orange Sold that day'] =  = df.groupby(['date', 'Product'=='orange']).sold_kg.transform("sum")

CodePudding user response:

IIUC, iterating through the groups and assigning the new value to a new column should work. This could be written into a custom lambda function using apply, but I've had similar speed results this way.

df = pd.DataFrame({
    'date': ["22.10.2021", "22.10.2021", "22.10.2021", "22.10.2021", "23.10.2021", "23.10.2021", "23.10.2021", "23.10.2021", "23.10.2021", "25.10.2021"],
    'Product': ["apple", "apple", "orange", "orange", "cherry","apple", "apple", "orange", "orange", "cherry"],
    'sold_kg': [2, 3, 1, 6, 2,1, 5, 2, 3, 3]})

# the if is needed in case there are no 'oranges' in a particular date. otherwise, it's throws an error.

my_constant = 'orange'
dfg = df.groupby(['date'])
df_list = []
for grp in dfg.groups:
    g = dfg.get_group(grp)
    if g.groupby(['Product']).sum().filter(like=my_constant, axis = 0)['sold_kg'].values.size > 0:
        val = g.groupby(['Product']).sum().filter(like=my_constant, axis = 0)['sold_kg'].values[0]
    else:
        val = 0

    df_list.append(g.assign(totals=val))
out_pd = pd.concat(df_list, axis=0)
out_pd

         date Product  sold_kg  totals
0  22.10.2021   apple        2       7
1  22.10.2021   apple        3       7
2  22.10.2021  orange        1       7
3  22.10.2021  orange        6       7
4  23.10.2021  cherry        2       5
5  23.10.2021   apple        1       5
6  23.10.2021   apple        5       5
7  23.10.2021  orange        2       5
8  23.10.2021  orange        3       5
9  25.10.2021  cherry        3       0

CodePudding user response:

You could do the whole process in two parts.

First, filter oranges:

oranges = df[df['Product'] == 'orange']

Then, groupby date

oranges.groupby(['date'])

If you want to do this in one line:

df[df['Product'] == 'orange'].groupby(['date'])

There could be some mistakes, I am on mobile.

  • Related