How to apply a function to selected rows of a dataframe

I want to apply a regex substitution to selected rows of a dataframe. My solution works, but the code is terribly long, and I wonder whether there is a better, faster, and more elegant way to do this.

In words: I want the regex to be applied to the elements of the source_value column, but only in rows where source_type == 'rhombus' AND rhombus_refer_to_odk_type is either 'integer' or 'decimal'.

The code:

df_arrows.loc[(df_arrows['source_type']=='rhombus') & ((df_arrows['rhombus_refer_to_odk_type']=='integer') | (df_arrows['rhombus_refer_to_odk_type']=='decimal')),'source_value'] = df_arrows.loc[(df_arrows['source_type']=='rhombus') & ((df_arrows['rhombus_refer_to_odk_type']=='integer') | (df_arrows['rhombus_refer_to_odk_type']=='decimal')),'source_value'].apply(lambda x: re.sub(r'^[^<=>] ','', str(x)))

User response:

Build the condition once and store it in a variable m, using Series.isin to test for the two allowed types, then do the replacement with Series.str.replace:

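m = ((df_arrows['source_type']=='rhombus') &
     df_arrows['rhombus_refer_to_odk_type'].isin(['integer','decimal']))
# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df_arrows.loc[m,'source_value'] = df_arrows.loc[m,'source_value'].astype(str).str.replace(r'^[^<=>] ','', regex=True)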
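
To see the mask approach end to end, here is a minimal sketch on a small made-up frame. The column names come from the question, but the sample rows are hypothetical and only chosen so that some rows match the condition and some do not. The pattern is the one from the question, unchanged.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df_arrows = pd.DataFrame({
    'source_type': ['rhombus', 'rhombus', 'rectangle', 'rhombus'],
    'rhombus_refer_to_odk_type': ['integer', 'text', 'integer', 'decimal'],
    'source_value': ['x >5', 'x >5', 'x >5', 'y <=2.5'],
})

# Boolean mask: rhombus rows whose referenced type is integer or decimal.
m = (
    (df_arrows['source_type'] == 'rhombus')
    & df_arrows['rhombus_refer_to_odk_type'].isin(['integer', 'decimal'])
)

# Apply the substitution only to the masked rows; other rows stay untouched.
df_arrows.loc[m, 'source_value'] = (
    df_arrows.loc[m, 'source_value']
    .astype(str)
    .str.replace(r'^[^<=>] ', '', regex=True)
)

print(df_arrows['source_value'].tolist())
# ['>5', 'x >5', 'x >5', '<=2.5']
```

Only the first and last rows satisfy both conditions, so only those two values lose their leading character and space.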

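EDIT: If the mask turns out to be 2-dimensional, the likely cause is duplicated column names, because selecting a duplicated name returns a DataFrame instead of a Series. You can check by printing both conditions: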

print(df_arrows['source_type']=='rhombus')
print(df_arrows['rhombus_refer_to_odk_type'].isin(['integer','decimal']))
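
If printing the conditions is hard to read on a large frame, the duplicated names can also be listed directly. This is only a sketch on a hypothetical one-row frame that deliberately repeats the source_type column. Keeping the first occurrence of each name is an assumption on my part and may not be the right fix for your data.

```python
import pandas as pd

# Hypothetical frame in which 'source_type' appears twice.
df = pd.DataFrame(
    [['rhombus', 'circle', 'a']],
    columns=['source_type', 'source_type', 'source_value'],
)

# Selecting a duplicated name gives a DataFrame, so any comparison on it is 2-D.
sel = df['source_type']
print(type(sel).__name__, sel.ndim)

# List the column names that occur more than once.
dupes = df.columns[df.columns.duplicated()].tolist()
print(dupes)

# One possible fix (an assumption): keep only the first column for each name.
df_unique = df.loc[:, ~df.columns.duplicated()]
mask = df_unique['source_type'] == 'rhombus'
print(mask.ndim)
```

After the duplicates are dropped, the comparison yields a one-dimensional Series again and can be used with .loc as in the answer.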