I have dataframe like this one below.
df = pd.DataFrame({
'date': ["22.10.2021", "22.10.2021", "22.10.2021", "22.10.2021", "23.10.2021"],
'Product': ["apple", "apple", "orange", "orange", "cherry"],
'sold_kg': [2, 3, 1, 6, 2]})
df['Qty of Orange Sold that day'] = df.groupby(['date', 'Product']).sold_kg.transform("sum")
My actual groupby will consist of 7-8 arguments. I need one of them be constant based. For example I want it calculate qty only of oranges sold that day but only for oranges. I mean even something like - but of course that didn't work. Could you advice something?
df['Qty of Orange Sold that day'] = = df.groupby(['date', 'Product'=='orange']).sold_kg.transform("sum")
CodePudding user response:
IIUC, iterating through the groups and assigning the new value to a new column should work. This could be written into a custom lambda function using apply, but I've had similar speed results this way.
df = pd.DataFrame({
'date': ["22.10.2021", "22.10.2021", "22.10.2021", "22.10.2021", "23.10.2021", "23.10.2021", "23.10.2021", "23.10.2021", "23.10.2021", "25.10.2021"],
'Product': ["apple", "apple", "orange", "orange", "cherry","apple", "apple", "orange", "orange", "cherry"],
'sold_kg': [2, 3, 1, 6, 2,1, 5, 2, 3, 3]})
# the if is needed in case there are no 'oranges' in a particular date. otherwise, it's throws an error.
my_constant = 'orange'
dfg = df.groupby(['date'])
df_list = []
for grp in dfg.groups:
g = dfg.get_group(grp)
if g.groupby(['Product']).sum().filter(like=my_constant, axis = 0)['sold_kg'].values.size > 0:
val = g.groupby(['Product']).sum().filter(like=my_constant, axis = 0)['sold_kg'].values[0]
else:
val = 0
df_list.append(g.assign(totals=val))
out_pd = pd.concat(df_list, axis=0)
out_pd
date Product sold_kg totals
0 22.10.2021 apple 2 7
1 22.10.2021 apple 3 7
2 22.10.2021 orange 1 7
3 22.10.2021 orange 6 7
4 23.10.2021 cherry 2 5
5 23.10.2021 apple 1 5
6 23.10.2021 apple 5 5
7 23.10.2021 orange 2 5
8 23.10.2021 orange 3 5
9 25.10.2021 cherry 3 0
CodePudding user response:
You could do the whole process in two parts.
First, filter oranges:
oranges = df[df['Product'] == 'orange']
Then, groupby date
oranges.groupby(['date'])
If you want to do this in one line:
df[df['Product'] == 'orange'].groupby(['date'])
There could be some mistakes, I am on mobile.