I want to change the pattern so that it matches not only strings that contain both an amount and a unit, but also strings that contain only a unit. For instance, I want it to match "cubes" even though no amount is listed. Similarly, if a string contains only an amount and no unit, I want it to match the amount alone. Currently, the output returned is:
['1.0', '0.07', '32.0', '0.12', '1.01', 'cubes', '2']
I want the output to be as follows:
['1.0', '0.07', '32.0', '0.12', '1.01', '1.0', '2.0']
Here is the code:
```python
import re
import numpy as np

# liquid_units (unit -> conversion factor) and frac_to_dec_converter are defined elsewhere (not shown)
list_of_texts = ["1oz", "2ml", "4cup", "1 wedge", "2 slices", "cubes", "2"]
pattern = r"(^[\d -/]+)(oz|ml|cl|tsp|teaspoon|teaspoons|tea spoon|tbsp|tablespoon|tablespoons|table spoon|cup|cups|qt|quart|quarts|drop|drop|shot|shots|cube|cubes|dash|dashes|l|L|liters|Liters|wedge|wedges|pint|pints|slice|slices|twist of|top up|small bottle)"
new_list = []
for text in list_of_texts:
    re_result = re.search(pattern, text)
    if re_result:
        amount = re_result.group(1).strip()
        unit = re_result.group(2).strip()
        print(amount)
        print(unit)
        # A "-" marks a range such as "1-2"; its two ends are averaged below
        ranged = "-" in amount
        # Join "1 /2" into "1/2", turn range dashes into spaces, collapse runs of spaces
        amount = re.sub(r"(\d) (/\d)", r"\1\2", amount)
        amount = amount.replace("-", " ").replace("  ", " ").strip()
        amount = re.sub(r"[ ]+", " ", amount)
        amount_in_dec = frac_to_dec_converter(amount.split(" "))
        amount = np.sum(amount_in_dec)
        if ranged:
            to_oz = (amount * liquid_units[unit]) / 2
        else:
            to_oz = amount * liquid_units[unit]
        new_list.append(str(round(to_oz, 2)))
    else:
        # No match: keep the original string
        new_list.append(text)
```
Note: I have a dictionary, `liquid_units`, that maps each unit to its conversion factor.
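A quick way to see why "cubes" and "2" end up in the output unchanged is to run the pattern on its own. The sketch below assumes the quantifier after the character class is `+`, which requires at least one digit, space, or punctuation character before the unit, and the unit group itself is mandatory. The unit list is shortened here for brevity only:

```python
import re

list_of_texts = ["1oz", "2ml", "4cup", "1 wedge", "2 slices", "cubes", "2"]

# The question's pattern with "+" after the class; unit list shortened for brevity.
pattern = r"(^[\d -/]+)(oz|ml|cup|cups|cube|cubes|wedge|wedges|slice|slices)"

matches = {}
for text in list_of_texts:
    m = re.search(pattern, text)
    # None means the string has no leading amount, or no unit after the amount.
    matches[text] = m.groups() if m else None
    print(text, "->", matches[text])
```

"cubes" fails because `+` needs at least one amount character, and "2" fails because no unit follows the number, so both fall through to the `else` branch.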
Answer:
Make the number optional by using `*` instead of `+`. Then, if the first capture group is empty, treat the amount as 1.0.
```python
import re

pattern = r"(^[\d -/]*)(oz|ml|cl|tsp|teaspoon|teaspoons|tea spoon|tbsp|tablespoon|tablespoons|table spoon|cup|cups|qt|quart|quarts|drop|drop|shot|shots|cube|cubes|dash|dashes|l|L|liters|Liters|wedge|wedges|pint|pints|slice|slices|twist of|top up|small bottle)"
for text in list_of_texts:
    re_result = re.search(pattern, text)
    if re_result:
        amount = re_result.group(1).strip()
        if not amount:
            # No amount given (e.g. "cubes"): assume one unit
            amount = '1.0'
        unit = re_result.group(2).strip()
        print(amount)
        print(unit)
        # rest of your code
```
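The `*` change covers unit-only strings such as "cubes", but the unit group is still required, so a bare number such as "2" still falls through to the `else` branch and is appended unchanged. Below is a minimal sketch that makes both parts optional. It rests on several assumptions: `LIQUID_UNITS` is a hypothetical stand-in for the question's `liquid_units` (a few placeholder factors, with "cubes" set to 1.0 to mirror the desired output), `fractions.Fraction` replaces the question's `frac_to_dec_converter`, and the amount class is narrowed to `[\d ./-]` so it only admits characters the parsing below can handle. The `\b` after the unit makes "cubes" match as a whole word and stops `l` from matching the first letter of words such as "lemon", which the `*` pattern alone would accept.

```python
import re
from fractions import Fraction

# Unit alternatives from the question (duplicated "drop" changed to "drops").
UNITS = (
    "oz|ml|cl|tsp|teaspoon|teaspoons|tea spoon|tbsp|tablespoon|tablespoons|"
    "table spoon|cup|cups|qt|quart|quarts|drop|drops|shot|shots|cube|cubes|"
    "dash|dashes|l|L|liters|Liters|wedge|wedges|pint|pints|slice|slices|"
    "twist of|top up|small bottle"
)

# Group 1: optional amount (* instead of +). Group 2: optional whole-word unit.
PATTERN = re.compile(rf"^([\d ./-]*)(?:({UNITS})\b)?")

# Hypothetical placeholder factors (to fluid ounces); substitute your liquid_units.
LIQUID_UNITS = {"oz": 1.0, "ml": 0.033814, "cup": 8.0, "cubes": 1.0}


def to_number(amount):
    """Sum whitespace-separated numbers/fractions; average the ends of a range."""
    if "-" in amount:
        low, high = amount.split("-", 1)
        return (to_number(low) + to_number(high)) / 2
    return float(sum(Fraction(token) for token in amount.split()))


def parse(text):
    """Return (amount, unit), with unit None for a bare number, or None."""
    # Always matches (possibly empty) because both parts are optional.
    match = PATTERN.search(text)
    amount, unit = match.group(1).strip(), match.group(2)
    if not amount and unit is None:
        return None  # neither an amount nor a known unit
    # A unit with no amount (e.g. "cubes") counts as one of that unit.
    value = to_number(amount) if amount else 1.0
    return value, unit


def convert(text):
    parsed = parse(text)
    if parsed is None:
        return text  # leave unrecognised strings unchanged, like the else branch
    amount, unit = parsed
    if unit is not None:
        amount *= LIQUID_UNITS[unit]
    return str(round(amount, 2))


print([convert(t) for t in ["1oz", "2ml", "4cup", "cubes", "2"]])
```

A bare number is passed through without conversion, which yields '2.0' for "2"; a string that matches neither part is returned unchanged, as in the question's `else` branch.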