Replace multiple "less than values" in different columns in pandas dataframe-CodePudding

I am working with python and pandas. I have a dataset of lab analysis where I am dealing with multiple parameters and detection limits(dl). Many of the samples are reported as below the dl (e.g.<dl,<4)

For example: import pandas as pd

df=pd.DataFrame([['<4','88.72','<0.09'],['<1','5','<0.09'],['2','17.6','<0.09']], columns=['var_1','var_2','var_3']) df

My goal is to replace all <dl with dl/2 as a float value.

I can do this for one column pretty easily.

df['var_3'] = df.var_3.str.replace('<' ,'').astype(float) df['var_3'] = df['var_3'].apply(lambda x: x/2 if x == 0.09 else x) df

but this requires me looking at the dl and inputting it.

I would like to streamline it to apply it across all variables with one or more detection limits per variable as I have many variables and the detection limit will not always be constant from data frame to data frame this is applied to.

I found something similar in R but not sure how to apply it in python. Any solutions would be appreciated.

CodePudding user response：

Use replace instead str.replace than eval all expressions:

>>> df.replace(r'<(.*)', r'\1/2', regex=True).apply(pd.eval)
   var_1  var_2  var_3
0    2.0  88.72  0.045
1    0.5   5.00  0.045
2    2.0  17.60  0.045

\1 will be replace by the first capture group .*

Update

Alternative:

out = df.melt(ignore_index=False)
m = out['value'].str.startswith('<')
out.loc[m, 'value'] = out.loc[m, 'value'].str.strip('<').astype(float) / 2
out = out.reset_index().pivot('index', 'variable', 'value') \
         .rename_axis(index=None, columns=None)

Output:

>>> out
  var_1  var_2  var_3
0   2.0  88.72  0.045
1   0.5      5  0.045
2     2   17.6  0.045