I am working with the following dataframe:
issue_status market_phase trading_status is_and_mp market_state reason
0 10 0 B0 100 UNSCHEDULED_AUCTION
1 20 0 200 CONTINUOUS_TRADING
2 40 0 D1 400 POST_TRADE
3 10 0 100
4 10 0 100
5 40 0 400
6 40 0 400
7 40 0 400
I am trying to write a Python condition where if is_and_mp
is 100
and trading_status
is None
, then for the first instance of is_and_mp
of 100, mark an F
in the reason
column.
And do the same if is_and_mp
is 400 and trading_status
is None
.
For the second last instance where is_and_mp
is 400
and trading_status
is None, mark an SL
in the reason
column.
Finally for the last instance where is_and_mp
is 400
and trading_status
is None
, mark L
in the reason
column.
So the above dataframe should look like this:
issue_status market_phase trading_status is_and_mp market_state reason
0 10 0 B0 100 UNSCHEDULED_AUCTION
1 20 0 200 CONTINUOUS_TRADING
2 40 0 D1 400 POST_TRADE
3 10 0 100 F
4 10 0 100
5 40 0 400 F
6 40 0 400 SL
7 40 0 400 L
The logic for 100
and 400
doesn't have to be together, it can be separated if it is easier!
CodePudding user response:
Create a "group" column to represent similar group of records which match the given condition:
df["group"] = (~((df["is_and_mp"].eq(df["is_and_mp"].shift())) & (df["trading_status"] == ""))).cumsum()
Then create the output "reason" column by matching the required logic:
df["reason"] = df.groupby("group")["is_and_mp"].transform(lambda grp: ["F" if v in (100,400) and len(grp)>1 and i==0 else "SL" if v in (100,400) and len(grp)>2 and i==len(grp)-2 else "L" if v in (100,400) and len(grp)>2 and i==len(grp)-1 else "" for i,v in enumerate(grp)])
Full example:
df = pd.DataFrame(data=[[10,0,"B0",100,"UNSCHEDULED_AUCTION"],[20,0,"",200,"CONTINUOUS_TRADING"],[40,0,"D1",400,"POST_TRADE"],[10,0,"",100,""],[10,0,"",100,""],[40,0,"",400,""],[40,0,"",400,""],[40,0,"",400,""]], columns=["issue_status","market_phase","trading_status","is_and_mp","market_state"])
df["group"] = (~((df["is_and_mp"].eq(df["is_and_mp"].shift())) & (df["trading_status"] == ""))).cumsum()
df["reason"] = df.groupby("group")["is_and_mp"].transform(lambda grp: ["F" if v in (100,400) and len(grp)>1 and i==0 else "SL" if v in (100,400) and len(grp)>2 and i==len(grp)-2 else "L" if v in (100,400) and len(grp)>2 and i==len(grp)-1 else "" for i,v in enumerate(grp)])
[Out]:
issue_status market_phase trading_status is_and_mp market_state group reason
0 10 0 B0 100 UNSCHEDULED_AUCTION 1
1 20 0 200 CONTINUOUS_TRADING 2
2 40 0 D1 400 POST_TRADE 3
3 10 0 100 4 F
4 10 0 100 4
5 40 0 400 5 F
6 40 0 400 5 SL
7 40 0 400 5 L
CodePudding user response:
@Azhar, I tried your solution for the below input, but it didn't quite produce the desired output for groups
, could you kindly advise? Thank you!
trading_status is_and_mp market_state groups
0 000 CLOSED 1
1 200 CONTINUOUS_TRADING 2
2 103 None 3
3 204 UNSCHEDULED_AUCTION 4
4 203 UNSCHEDULED_AUCTION 5
5 B0 100 UNSCHEDULED_AUCTION 6
6 B1 200 CONTINUOUS_TRADING 7
7 400 None 8
8 A0 400 HALTED 9
9 100 None 10
10 100 None 10
11 400 None 11