I am working with python and pandas. I have a dataset of lab analysis where I am dealing with multiple parameters and detection limits(dl). Many of the samples are reported as below the dl (e.g.<dl,<4)
For example: import pandas as pd
df=pd.DataFrame([['<4','88.72','<0.09'],['<1','5','<0.09'],['2','17.6','<0.09']], columns=['var_1','var_2','var_3']) df
My goal is to replace all <dl with dl/2 as a float value.
I can do this for one column pretty easily.
df['var_3'] = df.var_3.str.replace('<' ,'').astype(float) df['var_3'] = df['var_3'].apply(lambda x: x/2 if x == 0.09 else x) df
but this requires me looking at the dl and inputting it.
I would like to streamline it to apply it across all variables with one or more detection limits per variable as I have many variables and the detection limit will not always be constant from data frame to data frame this is applied to.
I found something similar in R but not sure how to apply it in python. Any solutions would be appreciated.
CodePudding user response:
Use replace
instead str.replace
than eval
all expressions:
>>> df.replace(r'<(.*)', r'\1/2', regex=True).apply(pd.eval)
var_1 var_2 var_3
0 2.0 88.72 0.045
1 0.5 5.00 0.045
2 2.0 17.60 0.045
\1
will be replace by the first capture group .*
Update
Alternative:
out = df.melt(ignore_index=False)
m = out['value'].str.startswith('<')
out.loc[m, 'value'] = out.loc[m, 'value'].str.strip('<').astype(float) / 2
out = out.reset_index().pivot('index', 'variable', 'value') \
.rename_axis(index=None, columns=None)
Output:
>>> out
var_1 var_2 var_3
0 2.0 88.72 0.045
1 0.5 5 0.045
2 2 17.6 0.045