Manipulate string values in pandas

Time:07-11

I have a pandas DataFrame in which one column holds values in several different formats, like this:

Name Values
First 5-9
Second 7
Third -
Fourth 12-16

I need to go through the whole Values column. If a value is a range, like 5-9 in the first row or 12-16 in the fourth row, it should be replaced with the mean of the two numbers: 5-9 becomes 7 and 12-16 becomes 14. If a value is just -, as in the third row, it should be replaced with 0.

I have tried

if df["Value"].str.len() > 1:
    df["Value"] = df["Value"].str.split('-')
    df["Value"] = (df["Value"][0] + df["Value"][1]) / 2
elif df["Value"].str.len() == 1:
    df["Value"] = df["Value"].str.replace('-', 0)

Expected output

Name Values
First 7
Second 7
Third 0
Fourth 14

CodePudding user response:

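Split the column on - with expand=True so each part gets its own column, mask the empty strings, cast to float, and take the mean across each row. Rows with no numbers at all come out as NaN and are filled with 0: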

s = df['Values'].str.split('-', expand=True)
df['Values'] = s[s != ''].astype(float).mean(axis=1).fillna(0)

     Name  Values
0   First     7.0
1  Second     7.0
2   Third     0.0
3  Fourth    14.0
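
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split-and-mean idea. The sample frame is rebuilt from the question. Using pd.to_numeric with errors="coerce" in place of the s != '' mask, and the final cast to int, are my own substitutions rather than part of the answer above. The int cast assumes every mean is a whole number, as in the sample data; it would truncate something like 5-8 (6.5) to 6.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Name": ["First", "Second", "Third", "Fourth"],
    "Values": ["5-9", "7", "-", "12-16"],
})

# One column per side of the dash; a missing side is empty or null.
parts = df["Values"].str.split("-", expand=True)

# Anything that is not a number (empty string, missing value) becomes NaN.
nums = parts.apply(pd.to_numeric, errors="coerce")

# Row-wise mean skips NaN; rows with no numbers at all give NaN, filled with 0.
# Assumption: means are whole numbers, so casting to int loses nothing here.
df["Values"] = nums.mean(axis=1).fillna(0).astype(int)

print(df)
```

Drop the .astype(int) if fractional means need to be kept.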

CodePudding user response:

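You can use str.replace with a custom replacement function. The column stays a string column, so values that do not match the pattern (such as 7) are left unchanged: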

mint = lambda s: int(s or 0)
repl = lambda m: str(sum(map(mint, map(m.group, [1,2])))/2)
df['Values'] = df['Values'].str.replace(r'(\d*)-(\d*)', repl, regex=True)
print(df)

     Name Values
0   First    7.0
1  Second      7
2   Third    0.0
3  Fourth   14.0
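
If a numeric column is wanted from this approach, one option is to pass the replaced strings through pd.to_numeric. The sketch below is self-contained and rebuilds the sample frame from the question. The named helpers to_int and repl stand in for the lambdas above, and the to_numeric step is my addition. One caveat: an empty side of the dash is counted as 0, so a hypothetical one-sided value like 5- would become 2.5 rather than 5.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Name": ["First", "Second", "Third", "Fourth"],
    "Values": ["5-9", "7", "-", "12-16"],
})

def to_int(s):
    # An empty capture group (no digits on that side) counts as 0.
    return int(s or 0)

def repl(m):
    # Mean of the two captured numbers, returned as text for str.replace.
    return str((to_int(m.group(1)) + to_int(m.group(2))) / 2)

# Only values containing a dash match; a plain "7" passes through untouched.
replaced = df["Values"].str.replace(r"(\d*)-(\d*)", repl, regex=True)

# Added step: convert the mixed "7.0" / "7" strings to one float column.
df["Values"] = pd.to_numeric(replaced)

print(df)
```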