I have a pandas DataFrame in which one column holds values in different formats, like this:
Name | Values |
---|---|
First | 5-9 |
Second | 7 |
Third | - |
Fourth | 12-16 |
I need to go through the whole `Values` column. If an entry is a range, like `5-9` in the first row or `12-16` in the fourth row, it should be replaced with the mean of the two numbers in the string: `5-9` becomes `7` and `12-16` becomes `14`.
If an entry is just `-`, like the third row, it should be replaced with `0`.
I have tried:

    if df["Value"].str.len() > 1:
        df["Value"] = df["Value"].str.split('-')
        df["Value"] = (df["Value"][0] + df["Value"][1]) / 2
    elif df["Value"].str.len() == 1:
        df["Value"] = df["Value"].str.replace('-', 0)

Expected output:
Name | Values |
---|---|
First | 7 |
Second | 7 |
Third | 0 |
Fourth | 14 |
CodePudding user response:
Split the column on `-` with `expand=True`, cast the pieces to float, and take the mean across the resulting columns (`axis=1`):
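
    # Split on '-' into separate columns; a bare '-' yields two empty strings
    s = df['Values'].str.split('-', expand=True)
    # Mask the empty strings to NaN, average each row, and use 0 for all-NaN rows
    df['Values'] = s[s != ''].astype(float).mean(axis=1).fillna(0)
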
         Name  Values
    0   First     7.0
    1  Second     7.0
    2   Third     0.0
    3  Fourth    14.0

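For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split-and-average idea. It rebuilds the sample frame from the question and swaps the `s != ''` mask for `pd.to_numeric(..., errors="coerce")`, which is a substitution on my part rather than the code above. The final `astype(int)` is also an assumption, made only to match the integer-looking expected output; it would truncate a non-integer mean such as 6.5.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["First", "Second", "Third", "Fourth"],
    "Values": ["5-9", "7", "-", "12-16"],
})

# One column per piece; a missing second piece is None, a bare '-' gives ''
parts = df["Values"].str.split("-", expand=True)

# Convert each column to numbers; anything unparseable ('' or None) becomes NaN
nums = parts.apply(pd.to_numeric, errors="coerce")

# Row-wise mean skips NaN; rows with no numbers at all become 0.
# astype(int) is an assumption: it truncates means that are not whole numbers.
df["Values"] = nums.mean(axis=1).fillna(0).astype(int)

print(df)
```

Dropping the `astype(int)` keeps the float results shown above.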
CodePudding user response:
You can use `str.replace` with a custom replacement function. Entries without a `-`, such as `7`, are left unchanged, and the column stays a string column:
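
    # Treat an empty capture group (as in a bare '-') as 0
    mint = lambda s: int(s or 0)
    # Average the two captured numbers and return the result as a string
    repl = lambda m: str(sum(map(mint, map(m.group, [1, 2]))) / 2)
    df['Values'] = df['Values'].str.replace(r'(\d*)-(\d*)', repl, regex=True)
    print(df)
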
         Name  Values
    0   First     7.0
    1  Second       7
    2   Third     0.0
    3  Fourth    14.0
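
If a numeric column is wanted rather than strings, one alternative is a plain Python helper applied with `Series.map`. This is a different technique from the regex replacement above, sketched under the assumption that every entry is a string of non-negative numbers separated by `-`; the helper name `range_mean` is hypothetical.

```python
import pandas as pd

def range_mean(text):
    """Return the mean of the numbers in an 'a-b' style string, or 0.0 if none."""
    # Splitting '-' alone gives ['', '']; the filter drops those empty pieces
    nums = [float(piece) for piece in text.split("-") if piece.strip()]
    return sum(nums) / len(nums) if nums else 0.0

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["First", "Second", "Third", "Fourth"],
    "Values": ["5-9", "7", "-", "12-16"],
})

# map calls the helper once per entry and returns a float column
df["Values"] = df["Values"].map(range_mean)

print(df)
```

Because the helper splits on every `-`, it would misread negative numbers, so it only fits data shaped like the sample.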