I want to apply a regex function to selected rows in a dataframe. My solution works, but the code is terribly long, and I wonder whether there is a better, faster and more elegant way to solve this problem.
In words: I want my regex function to be applied to the elements of the source_value column, but only in rows where source_type == rhombus AND (rhombus_refer_to_odk_type == integer OR rhombus_refer_to_odk_type == decimal).
The code:
df_arrows.loc[(df_arrows['source_type']=='rhombus') & ((df_arrows['rhombus_refer_to_odk_type']=='integer') | (df_arrows['rhombus_refer_to_odk_type']=='decimal')),'source_value'] = df_arrows.loc[(df_arrows['source_type']=='rhombus') & ((df_arrows['rhombus_refer_to_odk_type']=='integer') | (df_arrows['rhombus_refer_to_odk_type']=='decimal')),'source_value'].apply(lambda x: re.sub(r'^[^<=>] ','', str(x)))
CodePudding user response:
Use Series.isin to build the condition, store it in a variable m, and use Series.str.replace for the replacement:
m = ((df_arrows['source_type']=='rhombus') &
     df_arrows['rhombus_refer_to_odk_type'].isin(['integer','decimal']))
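# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df_arrows.loc[m,'source_value'] = df_arrows.loc[m,'source_value'].astype(str).str.replace(r'^[^<=>] ','', regex=True)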
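To see the mask approach end to end, here is a minimal sketch on a small made-up dataframe. Only the column names and the regex come from the question; the sample values are hypothetical and chosen just to show which rows change.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df_arrows = pd.DataFrame({
    "source_type": ["rhombus", "rhombus", "rectangle", "rhombus"],
    "rhombus_refer_to_odk_type": ["integer", "decimal", "integer", "text"],
    "source_value": ["x >5", "y <=2.5", "z =1", "w >3"],
})

# Build the boolean mask once and reuse it on both sides of the assignment.
m = (
    (df_arrows["source_type"] == "rhombus")
    & df_arrows["rhombus_refer_to_odk_type"].isin(["integer", "decimal"])
)

# Apply the regex from the question only to the masked rows.
df_arrows.loc[m, "source_value"] = (
    df_arrows.loc[m, "source_value"]
    .astype(str)
    .str.replace(r"^[^<=>] ", "", regex=True)
)

print(df_arrows)
```

Only the first two rows satisfy both conditions, so only their source_value entries are rewritten; the other rows are left untouched.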
EDIT: If the mask turns out to be 2-dimensional, the likely cause is duplicated column names. You can check by printing each condition separately:
print ((df_arrows['source_type']=='rhombus'))
print (df_arrows['rhombus_refer_to_odk_type'].isin(['integer','decimal']))
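The following sketch shows why duplicated column names produce a 2-dimensional condition and one possible way to detect them. The dataframe df_dup and its values are hypothetical, and keeping only the first of each duplicated column is an assumption; whether that is the right fix depends on your data.

```python
import pandas as pd

# Hypothetical frame where the label 'source_type' appears twice.
df_dup = pd.DataFrame(
    [["rhombus", "circle"], ["rectangle", "rhombus"]],
    columns=["source_type", "source_type"],
)

# Selecting a duplicated label returns a DataFrame, so the comparison is 2-D.
cond = df_dup["source_type"] == "rhombus"
print(type(cond).__name__, cond.ndim)

# Detect duplicated column names.
has_duplicates = df_dup.columns.duplicated().any()
print(has_duplicates)

# Assumption: keep only the first occurrence of each duplicated column.
df_clean = df_dup.loc[:, ~df_dup.columns.duplicated()]
clean_cond = df_clean["source_type"] == "rhombus"
print(type(clean_cond).__name__, clean_cond.ndim)
```

Once the column labels are unique, each condition is a 1-dimensional boolean Series and can be combined into the mask m as shown above.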