How to use conditional statement to map duplicates first and last with multiple columns in a datafra-CodePudding

I am working with the following dataframe:

  issue_status   market_phase   trading_status   is_and_mp      market_state         reason
0     10              0             B0              100       UNSCHEDULED_AUCTION
1     20              0                             200       CONTINUOUS_TRADING
2     40              0             D1              400           POST_TRADE
3     10              0                             100           
4     10              0                             100  
5     40              0                             400
6     40              0                             400
7     40              0                             400

I am trying to write a Python condition where if is_and_mp is 100 and trading_status is None, then for the first instance of is_and_mp of 100, mark an F in the reason column.

And do the same if is_and_mp is 400 and trading_status is None. For the second last instance where is_and_mp is 400 and trading_status is None, mark an SL in the reason column. Finally for the last instance where is_and_mp is 400 and trading_status is None, mark L in the reason column.

So the above dataframe should look like this:

  issue_status   market_phase   trading_status   is_and_mp      market_state         reason
0     10              0             B0              100       UNSCHEDULED_AUCTION
1     20              0                             200       CONTINUOUS_TRADING
2     40              0             D1              400           POST_TRADE
3     10              0                             100                                F
4     10              0                             100  
5     40              0                             400                                F
6     40              0                             400                                SL
7     40              0                             400                                L

The logic for 100 and 400 doesn't have to be together, it can be separated if it is easier!

CodePudding user response：

Create a "group" column to represent similar group of records which match the given condition:

df["group"] = (~((df["is_and_mp"].eq(df["is_and_mp"].shift())) & (df["trading_status"] == ""))).cumsum()

Then create the output "reason" column by matching the required logic:

df["reason"] = df.groupby("group")["is_and_mp"].transform(lambda grp: ["F" if v in (100,400) and len(grp)>1 and i==0 else "SL" if v in (100,400) and len(grp)>2 and i==len(grp)-2 else "L" if v in (100,400) and len(grp)>2 and i==len(grp)-1 else "" for i,v in enumerate(grp)])

Full example:

df = pd.DataFrame(data=[[10,0,"B0",100,"UNSCHEDULED_AUCTION"],[20,0,"",200,"CONTINUOUS_TRADING"],[40,0,"D1",400,"POST_TRADE"],[10,0,"",100,""],[10,0,"",100,""],[40,0,"",400,""],[40,0,"",400,""],[40,0,"",400,""]], columns=["issue_status","market_phase","trading_status","is_and_mp","market_state"])

df["group"] = (~((df["is_and_mp"].eq(df["is_and_mp"].shift())) & (df["trading_status"] == ""))).cumsum()

df["reason"] = df.groupby("group")["is_and_mp"].transform(lambda grp: ["F" if v in (100,400) and len(grp)>1 and i==0 else "SL" if v in (100,400) and len(grp)>2 and i==len(grp)-2 else "L" if v in (100,400) and len(grp)>2 and i==len(grp)-1 else "" for i,v in enumerate(grp)])

[Out]:
   issue_status  market_phase trading_status  is_and_mp         market_state  group reason
0            10             0             B0        100  UNSCHEDULED_AUCTION      1       
1            20             0                       200   CONTINUOUS_TRADING      2       
2            40             0             D1        400           POST_TRADE      3       
3            10             0                       100                           4      F
4            10             0                       100                           4       
5            40             0                       400                           5      F
6            40             0                       400                           5     SL
7            40             0                       400                           5      L

CodePudding user response：

@Azhar, I tried your solution for the below input, but it didn't quite produce the desired output for groups, could you kindly advise? Thank you!

          trading_status is_and_mp         market_state        groups
0                           000               CLOSED              1  
1                           200       CONTINUOUS_TRADING          2  
2                           103                 None              3  
3                           204     UNSCHEDULED_AUCTION           4  
4                           203     UNSCHEDULED_AUCTION           5  
5                B0         100     UNSCHEDULED_AUCTION           6  
6                B1         200      CONTINUOUS_TRADING           7  
7                           400                 None              8  
8                A0         400               HALTED              9  
9                           100                 None             10  
10                          100                 None             10  
11                          400                 None             11