I have a dataframe with multiple columns where some cells contain the characters "DL" and a float. The other cells contain floats only.
For example:
| | Column1 | Column2 |
|---|---|---|
| row1 | DL10.4 | 5.6 |
| row2 | 4.7 | DL8.8 |
I want to use Python to remove the characters "DL" and divide the remaining floats by 2. The cells without these characters should be left unchanged and not divided by 2.
Expected result:
| | Column1 | Column2 |
|---|---|---|
| row1 | 5.2 | 5.6 |
| row2 | 4.7 | 4.4 |
CodePudding user response:
Use Series.str.extract to get the values after DL, divide them by 2, and replace the non-DL values with those from the original DataFrame:
df1 = df.apply(lambda x: x.str.extract(r'DL(\d+\.\d+)', expand=False))
df = df1.astype(float).div(2).fillna(df).astype(float)
print(df)
      Column1  Column2
row1      5.2      5.6
row2      4.7      4.4
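To try this end to end, here is a minimal self-contained sketch. It assumes the data looks like the example in the question, with the DL cells stored as strings and the plain cells as floats. The frame construction and the names extracted, numeric and result are illustrative and not part of the answer above. It also casts each column to str first, so the .str accessor works even on a column that holds only floats, and fills the gaps from a numeric copy of the original frame.

```python
import pandas as pd

# Sample data assumed from the question: "DL" cells are strings, the rest are floats.
df = pd.DataFrame(
    {"Column1": ["DL10.4", 4.7], "Column2": [5.6, "DL8.8"]},
    index=["row1", "row2"],
)

# Cast to str so .str works on every column, then capture the number after "DL".
# Cells without the "DL" prefix do not match and become NaN.
extracted = df.apply(
    lambda col: col.astype(str).str.extract(r"DL(\d+\.\d+)", expand=False)
)

# Numeric copy of the original frame; the "DL..." strings become NaN here.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Halve the extracted numbers, then fill the remaining NaN cells with the original floats.
result = extracted.astype(float).div(2).fillna(numeric)
print(result)
```

The original df is left untouched here, so the two frames can be compared afterwards.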
CodePudding user response:
I will assume you have a way of looping through each row in the dataset. With that in mind, something like this should work for you:
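import re

for idx, row in df.iterrows():
    for col, col_value in row.items():
        # str() lets re.match handle cells that are already floats
        reg_match = re.match(r"^DL([0-9]+\.[0-9]+)", str(col_value))
        if reg_match:
            # drop the "DL" prefix and halve the number
            num = float(reg_match.group(1)) / 2
            # write back explicitly; rebinding col_value would not change df
            df.at[idx, col] = num

Note that assigning to col_value on its own only rebinds the loop variable and leaves the data frame untouched, which is why the result is written back with df.at. If you would rather not modify the data frame while iterating over it, you probably want to save the new values into a new data frame instead.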
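Saving the converted values into a new data frame could look roughly like the sketch below. The sample frame is assumed from the question, and the names DL_PATTERN, halve_if_dl and new_df are hypothetical helpers introduced only for illustration. It applies the same regular-expression idea cell by cell through Series.map rather than an explicit loop.

```python
import re

import pandas as pd

# Same pattern idea as above: "DL" followed by a decimal number.
DL_PATTERN = re.compile(r"^DL([0-9]+\.[0-9]+)")


def halve_if_dl(value):
    """Return half the number after a "DL" prefix, otherwise the value as a float."""
    match = DL_PATTERN.match(str(value))
    if match:
        return float(match.group(1)) / 2
    return float(value)


# Sample data assumed from the question.
df = pd.DataFrame(
    {"Column1": ["DL10.4", 4.7], "Column2": [5.6, "DL8.8"]},
    index=["row1", "row2"],
)

# Build a new frame column by column; df itself is not modified.
new_df = df.apply(lambda col: col.map(halve_if_dl))
print(new_df)
```

Because every cell goes through a Python function, this is likely slower than the vectorised string methods on large frames, but it keeps the per-cell logic in one easily testable place.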