In my data, I have this column "price_range".
Dummy dataset:
df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114', '€112 - €146', 'No pricing available']})
I am using pandas. What is the most efficient way to get the lower and upper bounds of each price range in separate columns?
CodePudding user response:
Alternatively, you can parse the string directly (if you want the limits for each row, rather than the total range):
df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114', '€112 - €146']})
def get_lower_limit(some_string):
    a = some_string.split(' - ')
    return int(a[0].split('€')[-1])
def get_upper_limit(some_string):
    a = some_string.split(' - ')
    return int(a[1].split('€')[-1])
df['lower_limit'] = df.price_range.apply(get_lower_limit)
df['upper_limit'] = df.price_range.apply(get_upper_limit)
Output:
Out[153]:
   price_range  lower_limit  upper_limit
0     €4 - €25            4           25
1     €3 - €14            3           14
2   €25 - €114           25          114
3  €112 - €146          112          146
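Note that the example above leaves out the 'No pricing available' row from the question; the helper functions would raise an error on it. As a rough sketch of a vectorized alternative (not the approach used above), `Series.str.extract` with a regular expression can pull both bounds in one pass and leave NaN where nothing matches. The regex and the assumption that prices are whole euro amounts are mine, based only on the dummy data:

```python
import pandas as pd

df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114',
                                   '€112 - €146', 'No pricing available']})

# Assumed pattern: "€<digits> - €<digits>". Each named group becomes a column;
# rows that do not match (e.g. 'No pricing available') get NaN in both.
pattern = r'€(?P<lower_limit>\d+)\s*-\s*€(?P<upper_limit>\d+)'
bounds = df['price_range'].str.extract(pattern)

# The captured values are strings, so convert each column to numbers.
bounds = bounds.apply(pd.to_numeric)

# Attach the two new columns to the original frame (aligned on the index).
df = df.join(bounds)
```

Because of the NaN row, the new columns come out as floats rather than ints. If the real data has decimals or thousands separators, the pattern would need adjusting.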
CodePudding user response:
You can do the following. First create two extra columns, lower and upper, which contain the lower bound and the upper bound from each row. Then find the minimum of the lower column and the maximum of the upper column.
df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114', '€112 - €146', 'No pricing available']})
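# Split on the dash only for rows that actually contain a range
df.loc[df['price_range'] != 'No pricing available', 'lower'] = df['price_range'].str.split('-').str[0]
df.loc[df['price_range'] != 'No pricing available', 'upper'] = df['price_range'].str.split('-').str[1]
# Drop the currency symbol and convert to float (rows without a price stay NaN)
df['lower'] = df['lower'].str.replace('€', '').astype(float)
df['upper'] = df['upper'].str.replace('€', '').astype(float)
# min() and max() skip NaN by default
price_range = [df['lower'].min(), df['upper'].max()]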
Output:
>>> price_range
[3.0, 146.0]
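If only the overall range is needed and the per-row columns are not, one possible shortcut is to collect every euro amount with `Series.str.extractall` and take the min and max of that. This is just a sketch under the assumption that every price looks like `€` followed by digits, as in the dummy data:

```python
import pandas as pd

df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114',
                                   '€112 - €146', 'No pricing available']})

# extractall returns one row per match, so each "€<digits>" amount ends up
# in column 0; rows with no match simply contribute nothing.
values = df['price_range'].str.extractall(r'€(\d+)')[0].astype(int)

price_range = [int(values.min()), int(values.max())]
```

Here `price_range` ends up as `[3, 146]`, matching the result above but as integers.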