I am trying to convert the elements of a list stored in a pandas DataFrame row into labels based on a range. For example, if a row held the value ["1","15","35"], I would want to convert it to ["1 - 9 lbs","10 - 19 lbs","Greater Than 30 lbs"].
I have a script that converts single values like [10], but it's the multiple values in a single row that are throwing me off. If anyone can help, I would greatly appreciate it. I know it's not ideal to have a list as a value, but it's what my work requires.
What I have:
import pandas as pd

metric = 'lbs'
range_string = """
1 - 9 lbs
10 - 19 lbs
20 - 29 lbs
Greater Than 30 lbs
"""
# range function builder
string = range_string.replace(metric, '')
lst = string.split('\n')
builder_base = f'''
def range_app(num):
'''
for val in lst:
    if val.find(' - ') > -1:
        original_val = val
        lower, upper = val.split(' - ')
        inner_f = f'''
    if num >= {lower} and num <= {upper}:
        return "{original_val}{metric}"
'''
        builder_base = builder_base + inner_f
    # match the capitalization used in range_string ("Greater Than")
    if val.find('Greater Than ') > -1:
        original_val = val
        upper = val.replace('Greater Than ', '')
        inner_f = f'''
    if num >= {upper}:
        return "{original_val}{metric}"
'''
        builder_base = builder_base + inner_f
final_else = '''
    else:
        return r"n/a"
'''
exec(builder_base + final_else)
print(builder_base + final_else)
df = pd.DataFrame({"A": [16, 14.97, 22.75]})
df['A']=df['A'].astype(float)
df['A'] = df['A'].apply(range_app)
What I need:
df = pd.DataFrame({"A": [["16","24.42"], ["14.97","16.06"], ["22.75","23"]]})
df['A']=df['A'].astype(float)
df['A'] = df['A'].apply(range_app)
Final output:
["10 - 19 lbs","20 - 29 lbs"]
["10 - 19 lbs","10 - 19 lbs"]
["20 - 29 lbs","20 - 29 lbs"]
CodePudding user response:
Try this.
import re
import pandas as pd

range_string = """
1 - 9 lbs
10 - 19 lbs
20 - 29 lbs
Greater Than 30 lbs
"""

# Map each range's numeric bounds to its label,
# e.g. (10, 19) -> "10 - 19 lbs" and (30,) -> "Greater Than 30 lbs".
range_params = {}
for range_entry in range_string.split('\n'):
    range_nums = re.findall(r'\d+', range_entry)
    if len(range_nums) > 0:
        range_params[tuple(map(int, range_nums))] = range_entry.strip()

def range_app(num_lst):
    updated_labels = []
    for num in num_lst:
        num = float(num)
        for range_param, range_label in range_params.items():
            if len(range_param) == 1:
                # open-ended range: only a lower bound
                if num >= range_param[0]:
                    updated_labels.append(range_label)
            else:
                if num >= range_param[0] and num <= range_param[1]:
                    updated_labels.append(range_label)
    return updated_labels
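One thing to watch with range_app: a value that falls between two ranges (9.5, say) or below the first one matches nothing, so no label is appended and the output list ends up shorter than the input. If that matters for your data, here is a rough sketch of a variant that always returns one label per element. The names label_for and range_app_strict and the "n/a" fallback are my own choices for illustration (the fallback is borrowed from the question's code), not anything pandas provides.

```python
import re

range_string = """
1 - 9 lbs
10 - 19 lbs
20 - 29 lbs
Greater Than 30 lbs
"""

# Parse each non-empty line into (bounds, label); bounds holds one or two ints.
ranges = []
for line in range_string.strip().split("\n"):
    bounds = tuple(int(n) for n in re.findall(r"\d+", line))
    if bounds:
        ranges.append((bounds, line.strip()))


def label_for(num, fallback="n/a"):
    """Return the first matching label, or `fallback` if no range matches."""
    num = float(num)
    for bounds, label in ranges:
        # One bound means an open-ended "Greater Than" range.
        if len(bounds) == 1 and num >= bounds[0]:
            return label
        if len(bounds) == 2 and bounds[0] <= num <= bounds[1]:
            return label
    return fallback


def range_app_strict(num_lst):
    """Hypothetical variant of range_app: exactly one label per input element."""
    return [label_for(num) for num in num_lst]
```

It is applied the same way, with df['A'].apply(range_app_strict).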
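Also, I don't think your type casting is correct: each value in the "A" column is a list, so df['A'].astype(float) cannot convert it. range_app already calls float() on each element, so the cast can be dropped.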
df = pd.DataFrame({"A": [["16","24.42"], ["14.97","16.06"], ["22.75","23"]]})
df['A'] = df['A'].apply(range_app)
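To illustrate the type-casting point, here is a small sketch, assuming a column of lists of numeric strings like the one in the question. It checks that astype(float) is rejected for such a column and shows one way to convert element by element instead. The names cast_failed and as_floats are only for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [["16", "24.42"], ["14.97", "16.06"]]})

# astype(float) tries to turn each cell into a single float,
# which is not possible when the cell is a list.
try:
    df["A"].astype(float)
    cast_failed = False
except (ValueError, TypeError):
    cast_failed = True

# Converting inside each list works, because float() sees one string at a time.
as_floats = df["A"].apply(lambda values: [float(v) for v in values])
```

With range_app this extra conversion step is not needed, since the function calls float() on every element itself.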