Pandas operation

Time:12-08

I have the following pandas DataFrame in Python:

import pandas as pd

df = pd.DataFrame({
    'Date': [202101, 202102, 202103, 202104, 202105, 202106, 202107,
             202101, 202102, 202103, 202104, 202105, 202106],
    'ID': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'amnt': [300, 200, 100, 50, 0, 250, 100, 1000, 500, 200, 100, 0, 0],
    'trx': [100, 0, 0, 0, 0, 0, 100, 1000, 500, 0, 0, 0, 0],
})

    Date    ID  amnt  trx
0   202101  1   300   100   
1   202102  1   200   0 
2   202103  1   100   0 
3   202104  1   50    0 
4   202105  1   0     0 
5   202106  1   250   0 
6   202107  1   100   100   
7   202101  2   1000  1000  
8   202102  2   500   500   
9   202103  2   200   0 
10  202104  2   100   0 
11  202105  2   0     0 
12  202106  2   0     0 

Would like to obtain this dataframe without :

The rule is: if amnt = 0 and trx = 0 for the last 3 months, then status = No active (evaluated per ID). My dataframe has about 10,000,000 rows.

    Date    ID  amnt    trx    status
0   202101  1   300     100    active
1   202102  1   200     0      active
2   202103  1   100     0      active
3   202104  1   50      0      active
4   202105  1   0       0      No active
5   202106  1   250     0      active
6   202107  1   100     100    active
7   202101  2   1000    1000   active
8   202102  2   500     500    active
9   202103  2   200     0      active
10  202104  2   100     0      active
11  202105  2   0       0      active
12  202106  2   0       0      No active

I would be very happy with any advice or ideas on this. Thank you.

CodePudding user response:

IIUC, combine two boolean masks:

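# m1: the current month's amnt is zero
m1 = df['amnt'].eq(0)
# m2: trx sums to zero over a 4-row window (current month plus the 3 before it) within each ID
m2 = df.groupby('ID')['trx'].rolling(4).sum().eq(0).droplevel(0)
# A row is 'No active' only when both masks are True
df['status'] = (m1 & m2).replace({True: 'No active', False: 'active'})
print(df)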

# Output:

      Date  ID  amnt   trx     status
0   202101   1   300   100     active
1   202102   1   200     0     active
2   202103   1   100     0     active
3   202104   1    50     0     active
4   202105   1     0     0  No active
5   202106   1   250     0     active
6   202107   1   100   100     active
7   202101   2  1000  1000     active
8   202102   2   500   500     active
9   202103   2   200     0     active
10  202104   2   100     0     active
11  202105   2     0     0     active
12  202106   2     0     0  No active
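
For reference, here is a self-contained sketch of the same idea that makes its assumptions explicit. It assumes each ID has exactly one row per month with no gaps, that rows can be sorted by Date within ID, and that trx is never negative (otherwise a zero sum would not imply all zeros). The names no_amount, trx_sum and no_trx are only illustrative, and np.where is swapped in for .replace as one possible way to build the label column. The first three rows of each ID have an incomplete window, so their rolling sum is NaN and they end up labelled active.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": [202101, 202102, 202103, 202104, 202105, 202106, 202107,
             202101, 202102, 202103, 202104, 202105, 202106],
    "ID": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "amnt": [300, 200, 100, 50, 0, 250, 100, 1000, 500, 200, 100, 0, 0],
    "trx": [100, 0, 0, 0, 0, 0, 100, 1000, 500, 0, 0, 0, 0],
})

# Assumption: one row per (ID, month), no missing months, trx >= 0.
# Sorting makes the row-based window line up with calendar order.
df = df.sort_values(["ID", "Date"]).reset_index(drop=True)

# True where the current month's amount is zero.
no_amount = df["amnt"].eq(0)

# Sum of trx over the current row and the three rows before it, per ID.
# Rows with fewer than 4 observations in the window give NaN.
trx_sum = (
    df.groupby("ID")["trx"]
      .rolling(4)
      .sum()
      .droplevel(0)  # drop the ID level so the index matches df again
)

# NaN compared with 0 is False, so incomplete windows count as active.
no_trx = trx_sum.eq(0)

df["status"] = np.where(no_amount & no_trx, "No active", "active")
print(df)
```

On the sample data this flags the rows at index 4 and 12, the same rows as in the output above. If some months can be missing for an ID, a row-based window of 4 no longer corresponds to four calendar months, and the data would need to be reindexed to a complete monthly range first.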