I'm reading a df from csv that has 2 columns showing the prices of various items. In some cases the price is a single int/float, but other cases it could be a range of spaces seperated int/floats or mixture of int/floats with strings.
example df:
item prices
------ ---------------------------
a 2
b 3.5
c 5
d 0.04
e 1 8 3 4 2
f 0.04 0.04 0.01
g Normal: 4.56Premium: 4.75
What I'm looking is a nice pythonic way to get the prices column to display the lowest possible int/float value for every item. e.g.
item prices
------ --------
a 2
b 3.5
c 5
d 0.04
e 1
f 0.01
g 4.56
The only way I could think of solving this problem for items e and f would be to split the value by using str.split(" ")
and mapping the output to int or float, but this seems like it would be messy as not all values are the same types. And I don't even know how I would get the lowest value for item g.
any help would be appreciated
CodePudding user response:
Use Series.str.extractall
for get integer
s or float
s, convert to floats and get minimal values:
df['prices'] = (df['prices'].str.extractall('(\d \.\d |\d )')[0]
.astype(float)
.groupby(level=0)
.min())
print (df)
item prices
0 a 2.00
1 b 3.50
2 c 5.00
3 d 0.04
4 e 1.00
5 f 0.01
6 g 4.56