I am trying to create a mapping for a dataframe where certain columns contain duplicates. Here are two examples of dataframes I am working with:
issue_status  trading_state  reason
100           'A0'
100           None           'F'
400           None
100           None
400           None           'SL'
100           'B2'
400           None           'L'
100           None
400           'A6'
Here is what I need: three conditional rules in Python that do the following:
- when we see the first issue_status of 100 and trading_state of None, map 'F' in the reason column.
- when we see the second last issue_status of 400 and trading_state of None, map 'SL' in the reason column.
- when we see the last issue_status of 400 and trading_state of None, map 'L' in the reason column.
Here is another example:
issue_status  trading_state  reason
400           None           'SL'
100           'A0'
400           None           'L'
400           'A0'
100           None           'F'
100           None
CodePudding user response:
You can filter the 400 and None values into df1 and create a helper Series from a descending range, mapping the last and second last positions to 'L' and 'SL'. For the first 100 and None value, use Series.duplicated. Finally, join both Series with Series.combine_first:
import pandas as pd

#if None is stored as the string 'None', use:
#m1 = df['trading_state'].eq('None')
m1 = df['trading_state'].isna()
m2 = df['issue_status'].eq(400)
m3 = df['issue_status'].eq(100)

#descending counter over the 400/None rows: last -> 'L', second last -> 'SL'
df1 = df[m1 & m2]
s1 = pd.Series(range(len(df1), 0, -1), index=df1.index).map({1:'L', 2:'SL'})

#first 100/None row is not duplicated -> 'F', the rest become NaN
s2 = df.loc[m1 & m3, 'issue_status'].duplicated().map({False:'F'})

df['reason'] = s1.combine_first(s2)
print(df)
issue_status trading_state reason
0 100 'A0' NaN
1 100 None F
2 400 None NaN
3 100 None NaN
4 400 None SL
5 100 'B2' NaN
6 400 None L
7 100 None NaN
8 400 'A6' NaN
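Putting it together, the steps above can be verified end to end on the first sample frame, constructed here by hand (the quoted trading states are kept as plain strings, which is an assumption about the original data):

```python
import pandas as pd

# First sample frame from the question; None marks a missing trading_state
df = pd.DataFrame({
    'issue_status': [100, 100, 400, 100, 400, 100, 400, 100, 400],
    'trading_state': ["'A0'", None, None, None, None, "'B2'", None, None, "'A6'"],
})

m1 = df['trading_state'].isna()
m2 = df['issue_status'].eq(400)
m3 = df['issue_status'].eq(100)

# descending counter over the 400/None rows: last -> 'L', second last -> 'SL'
df1 = df[m1 & m2]
s1 = pd.Series(range(len(df1), 0, -1), index=df1.index).map({1: 'L', 2: 'SL'})

# first 100/None row is not duplicated -> 'F', later ones become NaN
s2 = df.loc[m1 & m3, 'issue_status'].duplicated().map({False: 'F'})

df['reason'] = s1.combine_first(s2)
print(df)
```

Rows 1, 4 and 6 get 'F', 'SL' and 'L'; every other row stays NaN.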
For the second example:
df['reason'] = s1.combine_first(s2)
print(df)
issue_status trading_state reason
0 400 None SL
1 100 'A0' NaN
2 400 None L
3 400 'A0' NaN
4 100 None F
5 100 None NaN
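The same logic can be checked against the second sample frame; only the input data changes (again constructed by hand from the question):

```python
import pandas as pd

# Second sample frame from the question
df = pd.DataFrame({
    'issue_status': [400, 100, 400, 400, 100, 100],
    'trading_state': [None, "'A0'", None, "'A0'", None, None],
})

m1 = df['trading_state'].isna()

# last 400/None row -> 'L', second last -> 'SL'
df1 = df[m1 & df['issue_status'].eq(400)]
s1 = pd.Series(range(len(df1), 0, -1), index=df1.index).map({1: 'L', 2: 'SL'})

# first 100/None row -> 'F'
s2 = df.loc[m1 & df['issue_status'].eq(100), 'issue_status'].duplicated().map({False: 'F'})

df['reason'] = s1.combine_first(s2)
print(df)
```

Here rows 0, 2 and 4 get 'SL', 'L' and 'F' respectively.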
If you need empty strings instead of NaN in the reason column, use:
df['reason'] = df['reason'].fillna('')