I am checking whether any value in a pandas column matches a pre-defined regex, using .any(). However, .any() only tells me that a match exists; I need the index/row of the first match so that I can get the value of another column in that row.
I have the code below to check whether the reg_ex pattern exists in df['id_org']:
if df['id_org'].str.contains(pat=reg_ex, regex=True).any():
Once the above evaluates to true, how do I get the index/row that caused the expression to evaluate to true? I would like to use this index to access another column for that same row, using pandas df.at[index, 'desired_col'] or .iloc.
In the past I have done: df.at[df['id_org'][df['id_org'] == key].index[0], 'desired_col']
However, I can't use this line of code any more, because I am no longer checking for an exact match with the string "key" but rather for a regex match in that column.
CodePudding user response:
You can use idxmax combined with any:
reg_ex = 'xxx'
s = df['id_org'].str.contains(pat=reg_ex, regex=True)
out = s.idxmax() if s.any() else None
Or use first_valid_index on the filtered Series, which returns None when nothing matches:
s = df['id_org'].str.contains(pat=reg_ex, regex=True)
out = s[s].first_valid_index()
Example of outputs:
# reg_ex = 'e'
1
# reg_ex = 'z'
None
Used input:
id_org
0 abc
1 def
2 ghi
3 cde
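To tie the first-match index back to the lookup the question asks about, here is a minimal sketch. It assumes the sample input above plus a hypothetical 'desired_col' column with made-up values; the helper name first_match_value is also just for illustration. Passing na=False is an added precaution so that missing values in 'id_org' count as non-matches.

```python
import pandas as pd

# Sample input from the answer, plus a hypothetical 'desired_col'
# (its values are made up for illustration).
df = pd.DataFrame({
    "id_org": ["abc", "def", "ghi", "cde"],
    "desired_col": ["w", "x", "y", "z"],
})


def first_match_value(df, reg_ex, col="id_org", target="desired_col"):
    """Return target's value in the first row where col matches reg_ex, else None."""
    # na=False treats missing values as non-matches, so the mask is purely boolean.
    s = df[col].str.contains(pat=reg_ex, regex=True, na=False)
    if not s.any():
        return None
    # idxmax() returns the index *label* of the first True, which is what .at expects.
    return df.at[s.idxmax(), target]


print(first_match_value(df, "e"))  # x
print(first_match_value(df, "z"))  # None
```

Since idxmax returns a label rather than a position, it pairs with .at or .loc; .iloc would only give the same row when the index is the default 0..n-1 range.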
To get all matches:
s = df['id_org'].str.contains(pat=reg_ex, regex=True)
out = s.index[s]
Example for the regex 'e': Int64Index([1, 3], dtype='int64')
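If the goal is the other column's values for every matching row, the boolean mask can be passed straight to .loc. This is a sketch under the same assumptions as the question: 'desired_col' is a hypothetical column, and its values here are invented.

```python
import pandas as pd

# Sample input with a hypothetical 'desired_col' (values are illustrative only).
df = pd.DataFrame({
    "id_org": ["abc", "def", "ghi", "cde"],
    "desired_col": ["w", "x", "y", "z"],
})

# Boolean mask of rows whose 'id_org' matches the regex; na=False guards against NaN.
s = df["id_org"].str.contains(pat="e", regex=True, na=False)

# Select 'desired_col' for all matching rows; the original index labels are kept.
matched = df.loc[s, "desired_col"]
print(matched.tolist())  # ['x', 'z']
```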