I have an example shopping list:
test_list = "2x 400g beans", "3 x 500 ml choco milk", " 2 chicken breasts"
I want to extract the quantities, not including the units, to get the following result:
quant_list = ['2x 400', '3 x 500', '2']
So far, I have attempted the following:
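import re

def strip_quantities(string):
    # "someregex" is a placeholder for the pattern I am trying to find
    x = re.search("someregex", string)
    # index of the character just after the start of the match
    sep = x.start() + 1
    return string.rsplit(string[sep])

quant_list = [strip_quantities(x)[0] for x in test_list]
print(quant_list)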
However, I cannot figure out a regular expression, "someregex", that will allow me to split the string (/) at 400/g, 500/ ml, 2/ chicken. The regular expression also needs to ignore (not match) the "2x" in the first list item and "3 x" in the second list item.
I think the expression needs to say: match any letter following a digit, or any letter following a digit and then whitespace, except when that letter is "x". My best guess for the expression so far is:
"\d[^x]|\d\s[^x]"
But the output for this is:
['2x 4', '2 ', '2 ']
This is not the intended result stated above. Any help in finding a solution would be greatly appreciated, using regex or an alternative method. I've only been learning Python for a few days, so any accompanying explanation would be awesome. Thanks!
CodePudding user response:
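You want to match any run of digits that is not followed by optional whitespace and an "x". For that, you need a negative lookahead: (?! *x) succeeds only if the text right after the digits is not zero or more spaces followed by "x", and it does not consume any characters itself.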
import re

test_list = "2x 400g beans", "3 x 500 ml choco milk", " 2 chicken breasts"
# one or more digits, not followed by optional spaces and an "x"
m = r"(\d+)(?! *x)"
for t in test_list:
    print(re.findall(m, t))
Output:
['400']
['500']
['2']
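If the multiplier should stay in the result, as in the quant_list from the question, one possible variation is to make the "digits, optional whitespace, x" part an optional prefix rather than excluding it. This is only a sketch: the pattern and the extract_quantity helper are my own names, not part of the answer above, and it assumes each item has at most one multiplier before the quantity.

```python
import re

# Hypothetical pattern: an optional "<digits> x " multiplier, then the quantity digits.
QUANTITY = re.compile(r"(?:\d+\s*x\s*)?\d+")

def extract_quantity(item):
    """Return the leading quantity text, or None if the item contains no digits."""
    match = QUANTITY.search(item)
    return match.group(0) if match else None

test_list = "2x 400g beans", "3 x 500 ml choco milk", " 2 chicken breasts"
quant_list = [extract_quantity(item) for item in test_list]
print(quant_list)  # ['2x 400', '3 x 500', '2']
```

Because the prefix is optional, an item such as " 2 chicken breasts" falls through to the plain \d+ and yields just '2'.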
CodePudding user response:
You can use this regex: ((\d([x\s]?)+)+)
Input: 2x 400g bexfgans", "3 x 500 ml choco milk", " 2 chicken breasts
Output:
2x 400
3 x 500
2
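For use from Python, the same idea could be wrapped as below. This is a sketch under assumptions: it uses a simplified pattern, (?:\d[x\s]*)+, which I believe matches the same text as the regex above, and the leading_quantity helper is a hypothetical name. The match can end in whitespace (for example after "500"), so the result is stripped.

```python
import re

# A digit followed by any run of "x" or whitespace, repeated one or more times.
PATTERN = re.compile(r"(?:\d[x\s]*)+")

def leading_quantity(item):
    """Return the first quantity-like run in item, stripped, or None if there is none."""
    match = PATTERN.search(item)
    return match.group(0).strip() if match else None

items = ["2x 400g bexfgans", "3 x 500 ml choco milk", " 2 chicken breasts"]
results = [leading_quantity(item) for item in items]
print(results)  # ['2x 400', '3 x 500', '2']
```

One caveat with this approach: the pattern also accepts an "x" that begins the next word, so "2 xylophones" would give "2 x" rather than "2".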