I am trying to write the following condition to map MARKET_STATES.OPENING_AUCTION
, however I am getting KeyError: (False, False, False)
when I run the following:
market_info_df.loc[market_info_df['issue_status' == '10', 'market_phase' == '0',
'trading_state' == ' '].iloc[0]] = MARKET_STATES.OPENING_AUCTION
Two questions:
- What is it that the query does? My understanding is that using
iloc[0]
it finds the first instance where the three columns at the values stated. But then it maps something toMARKET_STATES.OPENING_AUCTION
or? - How can I write a condition like the above where if the 3 columns match the values, then for the first occurrence of this, map another columns (say
trade_type
) toMARKET_STATES.OPENING_AUCTION
?
Here is an example:
e.g:
row issue_status market_phase trading_state trade_type
0 20 0 ' ' ->
1 10 0 ' ' -> OPENING_AUCTION
2 20 0 ' ' ->
3 10 0 ' ' ->
4 10 0 ' ' ->
5 10 0 ' ' ->
CodePudding user response:
I'll answer your second question first:
How can I write a condition like the above where if the 3 columns match the values, then for the first occurrence of this, map another columns (say
trade_type
) toMARKET_STATES.OPENING_AUCTION
?
market_info_df = pd.DataFrame({'issue_status': {0: 20, 1: 10, 2: 20, 3: 10, 4: 10, 5: 10},
'market_phase': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0},
'trading_state': {0: ' ', 1: ' ', 2: ' ', 3: ' ', 4: ' ', 5: ' '}})
Any time you want to combine multiple conditions, remember that you need a logical and or logical or operation on those conditions.
When you have three conditions and you want to select rows where all of them are True
, you need to and all the conditions. So in your case you'd need to do
mask = (market_info_df['issue_status'] == 10) & (market_info_df['market_phase'] == 0) & (market_info_df['trading_state'] == ' ')
To get a boolean mask that tells you which rows fulfill all those conditions. The &
operator is overloaded by pandas to do the element-wise logical and for multiple Series
objects. The parentheses are needed to override python's operator precedence which puts &
before ==
Note: My dataframe contains integers in the issue_status
and market_phase
columns, which is why I compare against integers e.g. market_info_df['issue_status'] == 10
. If your columns contain strings (as your code would indicate), compare against strings (market_info_df['issue_status'] == "10"
)
Next, you want to set those rows for the "trade_type"
column, so you'd do:
market_info_df.loc[mask, "trade_type"] = "OPENING_AUCTION"
Which sets those only rows:
issue_status market_phase trading_state trade_type
0 20 0 NaN
1 10 0 OPENING_AUCTION
2 20 0 NaN
3 10 0 OPENING_AUCTION
4 10 0 OPENING_AUCTION
5 10 0 OPENING_AUCTION
Now, we can use idxmax()
to get the index of the max value of the mask
. Since this mask only contains True
and False
values, the first occurrence of the True
value is the one that is returned. If we do this instead of setting all values of mask
, we get:
market_info_df.loc[mask.idxmax(), "trade_type"] = "OPENING_AUCTION"
which gives:
issue_status market_phase trading_state trade_type
0 20 0 NaN
1 10 0 OPENING_AUCTION
2 20 0 NaN
3 10 0 NaN
4 10 0 NaN
5 10 0 NaN
As for your first question:
What is it that the query does? My understanding is that using
iloc[0]
it finds the first instance where the three columns at the values stated. But then it maps something toMARKET_STATES.OPENING_AUCTION
or?
I have no idea what that code is supposed to do. The code inside the brackets i.e.:
'issue_status' == '10', 'market_phase' == '0', 'trading_state' == ' '
compares the string "issue_status"
to the string "10"
(which unsurprisingly is False
) and so on, and returns a tuple that contains three False
s.
Then, market_info_df[False, False, False]
tries to find the element in the dataframe at the False, False, False
key, and throws the error when it fails to do so.