There are entries like ",orange" or "(if needed)" in the dataframe, so the code stores "," or "(" as the amount, which disrupts the rest of the code and raises a ValueError. I am not sure which stage of the code I should improve.
import re

import numpy as np


def unit_unify(list_of_texts, unit_dict):
    """Takes a list of strings that contains liquid units, and converts them
    into fluid ounces.
    """
    str_pattern = make_pattern(units_list)
    pattern = fr"(^[\d -/]*)({str_pattern})"
    new_list = []
    for text in list_of_texts:
        re_result = re.search(pattern, text)
        if re_result:
            amount = re_result.group(1).strip()
            unit = re_result.group(2).strip()
            if not amount:
                amount = "1.0"
            if "-" in amount:
                ranged = True
            else:
                ranged = False
            amount = re.sub(r"(\d) (/\d)", r"\1\2", amount)
            amount = amount.replace("-", "+").replace(" ", "+").strip()
            amount = re.sub(r"[+]+", "+", amount)
            amount_in_dec = frac_to_dec_converter(amount.split("+"))
            amount = np.sum(amount_in_dec)
            if ranged:
                to_oz = (amount*unit_dict[unit])/2
            else:
                to_oz = amount*unit_dict[unit]
            new_list.append(str(round(to_oz, 2)))
        else:
            new_list.append(text)
    return new_list
Here is the fraction converter:
def frac_to_dec_converter(num_strings):
    """Takes a list of strings that contains fractions and convert them into
    floats.
    """
    result_list = []
    for frac_str in num_strings:
        try:
            converted = float(frac_str)
        except ValueError:
            num, denom = frac_str.split('/')
            try:
                leading, num = num.split(' ')
                total = float(leading)
            except ValueError:
                total = 0
            frac = float(num) / float(denom)
            converted = total + frac
        result_list.append(converted)
    return result_list
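A quick way to see where the stray "," and "(" come from is to test the first capture group on its own. The sketch below is only an illustration: `oz|cup` is a hypothetical stand-in for whatever `make_pattern` returns, `leading_amount` is a made-up helper, and the sample strings are invented. Inside a character class, ` -/` is read as a range from the space character to the slash, and that range also covers `(`, `)` and `,`. Moving the hyphen to the end of the class makes it a literal hyphen instead.

```python
import re

# Same character class as in unit_unify. "oz|cup" is a hypothetical
# stand-in for the alternation built by make_pattern.
broad = re.compile(r"(^[\d -/]*)(oz|cup)")

# Hyphen moved to the end of the class: now a literal "-", not a range.
narrow = re.compile(r"(^[\d /-]*)(oz|cup)")


def leading_amount(pattern, text):
    """Return the stripped first capture group, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


# Invented sample inputs, for illustration only.
for sample in [", cup of milk", "(oz)", "1 1/2 cup", "1-2 oz"]:
    print(repr(sample), leading_amount(broad, sample), leading_amount(narrow, sample))
```

With the narrower class, strings that start with punctuation no longer match at all, so they would fall through to the `else` branch and be appended unchanged.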
CodePudding user response:
My first instinct would be to preprocess the entries in the dataframe, or simply to clean each one as you read it from the dataframe.
For example:
dataframeVariable = str(entry)  # entry: the value taken from the dataframe
usableVariable = dataframeVariable.replace(",","")
In the example ",orange", this replaces the ',' with an empty string, effectively removing it. The same approach works for any other character you want removed. str.translate is another option.
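A minimal sketch of the translate variant, assuming the only characters to drop are the comma and the parentheses. `UNWANTED`, `STRIP_TABLE` and `clean_entry` are hypothetical names chosen for the illustration, and the sample entries are the ones quoted in the question plus one invented amount:

```python
# Characters to remove before parsing (an assumption; extend as needed).
UNWANTED = ",()"

# With three arguments, str.maketrans maps every character in the
# third argument to None, which makes translate delete it.
STRIP_TABLE = str.maketrans("", "", UNWANTED)


def clean_entry(value):
    """Convert a dataframe entry to str and strip the unwanted characters."""
    return str(value).translate(STRIP_TABLE).strip()


entries = [",orange", "(if needed)", "1 1/2 cup"]
cleaned = [clean_entry(entry) for entry in entries]
print(cleaned)
```

The cleaned list could then be passed to unit_unify in place of the raw values. Compared with chained replace calls, a single translation table handles all the unwanted characters in one pass.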