How to clean a dataset containing numeric interval values?

Time:02-16

I have a dataset containing a series of temperature measurements. Some entries are intervals (for example 10-15) and others are single values.

My question is: how can I convert this series into a new series that contains the maximum value when the entry is an interval, and the value itself when it is not? Thanks.

Answer:

Use Series.apply with map and str.split to split each value on '-', convert the pieces to float, and keep the largest:

In [1186]: df['maxval'] = df.temperatue.apply(lambda x: max(map(float, x.split('-'))))

In [1200]: df
Out[1200]: 
  temperatue  maxval
0     3.5-10    10.0
1          7     7.0
2      10-15    15.0
3        NAN     NaN
4   20.5-111   111.0
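The lambda above assumes every entry is a string: the missing value works only because it is stored as the text 'NAN', which float() parses as NaN. A real missing value (a float NaN, which is what read_csv produces for empty cells by default) has no .split method and raises AttributeError. Below is a minimal sketch of a more defensive version; the helper name max_of_interval is hypothetical, and the sample frame reuses the values from the output above with a real NaN in place of 'NAN'.

```python
import math

import pandas as pd


def max_of_interval(value):
    """Hypothetical helper: upper bound of an 'a-b' interval, else the value itself.

    Non-string entries are not split: missing values (None or float NaN)
    come back as NaN, and plain numbers are converted to float.
    """
    if not isinstance(value, str):
        return math.nan if pd.isna(value) else float(value)
    # '3.5-10' -> ['3.5', '10'] -> max(3.5, 10.0); '7' -> ['7'] -> 7.0
    return max(map(float, value.split('-')))


df = pd.DataFrame({'temperatue': ['3.5-10', '7', '10-15', float('nan'), '20.5-111']})
df['maxval'] = df['temperatue'].apply(max_of_interval)
print(df)
```

Splitting on '-' also means this approach (like the lambda) assumes non-negative readings: a value such as -5 splits into '' and '5', and float('') raises ValueError.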

Answer:

You can split the values on the separator, convert them to floats, and take the maximum of each row:

df['max'] = df['temperatue'].str.split('-', expand=True).astype(float).max(axis=1)
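A self-contained sketch of this one-liner, using the same sample values as the first answer (with a real NaN for the missing entry). It keeps the intermediate frame in a variable named parts, a name chosen only for illustration, to show why single values survive: rows without a dash get None in the second column, astype(float) turns that into NaN, and max(axis=1) skips NaN by default.

```python
import pandas as pd

df = pd.DataFrame({'temperatue': ['3.5-10', '7', '10-15', float('nan'), '20.5-111']})

# expand=True spreads the pieces into columns 0 and 1; '7' has no second
# piece, so column 1 is padded with None for that row
parts = df['temperatue'].str.split('-', expand=True)

# astype(float) converts the padding to NaN; max(axis=1) ignores NaN,
# so '7' yields 7.0 and the all-missing row stays NaN
df['max'] = parts.astype(float).max(axis=1)
print(df)
```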

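Or, to strip surrounding whitespace and turn values that cannot be parsed into NaN instead of raising an error: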

import pandas as pd
f = lambda x: pd.to_numeric(x.str.strip(), errors='coerce')
df['max'] = df['temperatue'].str.split('-', expand=True).apply(f).max(axis=1)
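A sketch of the to_numeric variant on illustrative input that is not from the question (assumed values with spaces around the dash and a text placeholder). The astype(float) version would raise ValueError on 'unknown', whereas errors='coerce' turns it into NaN. The lambda runs once per column of the split frame, so each column is stripped and parsed separately before the row-wise maximum.

```python
import pandas as pd

# Illustrative values (assumed): stray spaces around the dash and a
# placeholder that is not a number
df = pd.DataFrame({'temperatue': ['3.5 - 10', '7', 'unknown', '20.5-111']})

# Strip whitespace, then parse; anything non-numeric ('unknown', the None
# padding in column 1) becomes NaN instead of raising
f = lambda x: pd.to_numeric(x.str.strip(), errors='coerce')
df['max'] = df['temperatue'].str.split('-', expand=True).apply(f).max(axis=1)
print(df)
```

A row where nothing parses, like 'unknown', ends up as NaN in the result rather than stopping the whole conversion.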