Pandas: take numbers out of a string

In my data, I have this column "price_range".

Dummy dataset:

df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114', '€112 - €146', 'No pricing available']})

I am using pandas. What is the most efficient way to get the upper and lower bounds of the price range in separate columns?

CodePudding user response：

Alternatively, you can parse each string directly (if you want the limits for each row, rather than the total range):

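import pandas as pd

# The 'No pricing available' row is left out here: the helpers below would raise on it.
df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114', '€112 - €146']})


def get_lower_limit(some_string):
    # '€4 - €25' -> ['€4', '€25']; keep the text after '€' in the first part
    a = some_string.split(' - ')
    return int(a[0].split('€')[-1])


def get_upper_limit(some_string):
    # Same split, but keep the text after '€' in the second part
    a = some_string.split(' - ')
    return int(a[1].split('€')[-1])


df['lower_limit'] = df.price_range.apply(get_lower_limit)
df['upper_limit'] = df.price_range.apply(get_upper_limit)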

Output:

Out[153]: 
   price_range  lower_limit  upper_limit
0     €4 - €25            4           25
1     €3 - €14            3           14
2   €25 - €114           25          114
3  €112 - €146          112          146
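
If the 'No pricing available' row has to stay in the frame, one possible variation is to pull both numbers out with a regular expression instead of splitting. This is only a sketch: the pattern assumes every priced row looks like '€<digits> - €<digits>', and the `bounds` variable is a name introduced here for illustration. Rows that do not match the pattern come out as NaN, so the new columns are floats rather than ints.

```python
import pandas as pd

df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114',
                                   '€112 - €146', 'No pricing available']})

# Two capture groups, one per number. expand=True returns a DataFrame
# with columns 0 and 1; rows that do not match get NaN in both.
bounds = df['price_range'].str.extract(r'€(\d+)\s*-\s*€(\d+)', expand=True)

# Convert the captured digit strings to numbers (missing values are kept).
df['lower_limit'] = pd.to_numeric(bounds[0])
df['upper_limit'] = pd.to_numeric(bounds[1])

print(df)
```

This avoids calling a Python function per row, which may matter on larger frames.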

CodePudding user response：

You can do the following. First create two extra columns, lower and upper, which hold the lower and upper bound of each row. If you also want the overall range, take the minimum of the lower column and the maximum of the upper column.

import pandas as pd

df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114', '€112 - €146', 'No pricing available']})

# Fill the new columns only for rows that hold a range; the other row gets NaN
has_price = df.price_range != 'No pricing available'
df.loc[has_price, 'lower'] = df['price_range'].str.split('-').str[0]
df.loc[has_price, 'upper'] = df['price_range'].str.split('-').str[1]

# Drop the euro sign; the float conversion tolerates the leftover spaces and keeps NaN
df['lower'] = df.lower.str.replace('€', '').astype(float)
df['upper'] = df.upper.str.replace('€', '').astype(float)

# Overall range across all rows (min and max skip NaN)
price_range = [df.lower.min(), df.upper.max()]

Output:

>>> price_range
[3.0, 146.0]
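
The same idea can probably be written with a single split per row by passing expand=True, which returns both halves as columns of a temporary frame. This is a sketch under the same assumption as above (the only non-range value is the literal 'No pricing available'); `has_price` and `parts` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114',
                                   '€112 - €146', 'No pricing available']})

# Keep only the rows that actually contain a range.
has_price = df['price_range'] != 'No pricing available'

# Split once; expand=True gives a DataFrame with columns 0 and 1.
parts = df.loc[has_price, 'price_range'].str.split('-', expand=True)

# Remove the euro sign and surrounding spaces, then convert to numbers.
# Assignment aligns on the index, so the excluded row becomes NaN.
df['lower'] = pd.to_numeric(parts[0].str.replace('€', '', regex=False).str.strip())
df['upper'] = pd.to_numeric(parts[1].str.replace('€', '', regex=False).str.strip())

price_range = [df['lower'].min(), df['upper'].max()]
print(price_range)
```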