I need to groupby
my DataFrame in pandas but when I am doing that the null values are converting into zero, but I want to retain nulls. I am not sure how to do that in pandas.
Input:
Id Country Product sales qty price
1 Germany shoes 32 1 NaN
1 Germany shoes 32 1 2
2 England Shoes 22 1 NaN
2 England Shoes 22 1 NaN
3 Austria Shoes 0 3 NaN
3 Austria Shoes NaN NaN NaN
Desired output:
Id Country Product sales qty price
1 Germany shoes 64 2 2
2 England Shoes 44 2 NaN
3 Austria Shoes 0 3 NaN
CodePudding user response:
Use parameter min_count=1
in sum
:
df = df.groupby(['Id','Country','Product'], as_index=False).sum(min_count=1)
print (df)
Id Country Product sales qty price
0 1 Germany shoes 64.0 2.0 2.0
1 2 England Shoes 44.0 2.0 NaN
2 3 Austria Shoes 0.0 3.0 NaN
CodePudding user response:
You could mask
it using isna
group
all
out = (df.groupby(['Id','Country','Product']).sum()
.mask(df[['sales','qty','price']].isna()
.groupby([df['Id'], df['Country'], df['Product']]).all())
.reset_index())
The same idea written differently:
cols = ['Id','Country','Product']
g = df.groupby(cols)
out = (g.sum()
.mask(g.apply(lambda x: x.drop(columns=cols).isna().all()))
.reset_index())
Output:
Id Country Product sales qty price
0 1 Germany shoes 64.0 2.0 2.0
1 2 England Shoes 44.0 2.0 NaN
2 3 Austria Shoes 0.0 3.0 NaN