Pandas : Query then groupby and sum, and finally fillna-CodePudding

I'm trying to do something that seems completely non readable, and I was wondering if I could make it simplier ( And I bet I could, but can't find how ).

This is my code :

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df['D'] = [1,2,2,1,2]

df1 = df.query('B<1')[['A', 'D']].groupby('D').sum().reset_index()

df = df.set_index(['D'])
df1 = df1.set_index(['D'])

df2 = df1.join(df[['B', 'C']], how='inner', on=['D']).reset_index()

df2.loc[df2['B'] > 1, 'A'] = 0

df

          A         B         C
D
1  0.702204  2.288548 -0.251334
2 -0.346842 -1.486899 -0.576246
2 -1.183607 -2.210152  0.409037
1 -0.884401  0.124899  1.719387
2  0.305400  0.988187  0.160168

df2
   D         A         B         C
0  1  0.000000  2.288548 -0.251334
1  1 -0.884401  0.124899  1.719387
2  2 -1.225049 -1.486899 -0.576246
3  2 -1.225049 -2.210152  0.409037
4  2 -1.225049  0.988187  0.160168

This is what I came up, and it is correct but not really readable.

I need:

To query every line where B < 1
Group by another column ( D )
Sum a third column ( A )
Put 0 in every line not matching query in 1

Is it possible to do that without that ugly code?

Thanks,

CodePudding user response：

First sort values by D if need same order like df2, then convert A to 0 if greater like 1, so possible use GroupBy.transform with sum for new Series with same size like original DataFrame, last set 0 if B is greater like 1 and assign to column A:

#query 4 use query 1, so possible reuse mask
m = df.B.lt(1)

df['A'] = df['A'].where(m, 0).groupby(df['D']).transform('sum').where(m,0)
print (df)
   D         A         B         C
0  1  0.000000  2.288548 -0.251334
1  2 -1.225049 -1.486899 -0.576246
2  2 -1.225049 -2.210152  0.409037
3  1 -0.884401  0.124899  1.719387
4  2 -1.225049  0.988187  0.160168

How it working:

print (df['A'].where(m, 0))
0    0.000000
1   -1.225049
2   -1.225049
3   -0.884401
4   -1.225049
Name: A, dtype: float64

print (df['A'].where(m, 0).groupby(df['D']).transform('sum'))
0   -0.884401
1   -3.675147
2   -3.675147
3   -0.884401
4   -3.675147
Name: A, dtype: float64

print (df['A'].where(m, 0).groupby(df['D'])
              .transform('sum').where(m,0))
0    0.000000
1   -3.675147
2   -3.675147
3   -0.884401
4   -3.675147
Name: A, dtype: float64