Strip quantities from list using regex

Time:02-17

I have an example shopping list:

test_list = "2x 400g beans", "3 x 500 ml choco milk", " 2 chicken breasts"

I want to extract the quantities, not including the units, to get the following result:

quant_list = ['2x 400', '3 x 500', '2']

So far, I have attempted the following:

import re

def strip_quantities(string):
    x = re.search("someregex", string)
    sep = x.start() + 1
    return string.rsplit(string[sep])

quant_list = [strip_quantities(x)[0] for x in test_list]
print(quant_list)

However, I cannot figure out a regular expression, "someregex", that will let me split each string at the point marked with / in 400/g, 500/ ml and 2/ chicken. The regular expression also needs to ignore (not match) the "2x" in the first item and the "3 x" in the second item.

I think the expression needs to say "match any letter following a digit, or any letter following a digit and then whitespace, except when that letter is x". My best guess for the expression so far is:

"\d[^x]|\d\s[^x]"

But the output for this is:

['2x 4', '2 ', '2 ']

Not the intended result as stated above. Any help in finding a solution would be greatly appreciated, using regex or alternative method. I've only been learning Python for a few days so any accompanying explanation would be awesome. Thanks!
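The idea in the question, finding where the unit begins and cutting the string there, can work if the pattern looks for the first character after a digit that is not whitespace, another digit, or "x". A minimal sketch, assuming every item starts with its quantity and that no unit or product name begins with the letter "x"; the helper name strip_quantity and the UNIT_START constant are hypothetical:

```python
import re

# (?<=\d)   : the match must start right after a digit.
# \s*       : optional whitespace between the number and the unit.
# [^\dx\s]  : first character of the unit; must not be a digit, "x", or whitespace.
UNIT_START = re.compile(r"(?<=\d)\s*[^\dx\s]")

def strip_quantity(item):
    """Return the text before the unit, e.g. '2x 400g beans' -> '2x 400'."""
    match = UNIT_START.search(item)
    if match is None:
        # No unit found after a number: keep the whole (trimmed) item.
        return item.strip()
    # Cut at the start of the unit and trim surrounding spaces.
    return item[:match.start()].strip()

test_list = ["2x 400g beans", "3 x 500 ml choco milk", " 2 chicken breasts"]
quant_list = [strip_quantity(x) for x in test_list]
print(quant_list)  # ['2x 400', '3 x 500', '2']
```

Because the lookbehind only requires a digit immediately before the match, the "x" in "2x" and "3 x" is skipped, and the first letter that is not "x" marks where the unit starts.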

Answer:

You want to "match any run of digits that is not followed by optional spaces and an x". For that, you need a negative lookahead:

import re
test_list = "2x 400g beans", "3 x 500 ml choco milk", " 2 chicken breasts"

m = r"(\d+)(?! *x)"
for t in test_list:
    print( re.findall( m, t) )

Output:

['400']
['500']
['2']
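One caveat worth checking before relying on the lookahead pattern in this answer: when the lookahead fails, "\d+" can backtrack and give up digits, so a multi-digit multiplier such as "12x" yields a stray '1'. Adding "\d" to the negative lookahead rejects that. Note also that this approach returns only the bare amount ('400'), not the '2x 400' form asked for in the question. A minimal sketch; the "12x 400g beans" item is a made-up test case, and the names loose and strict are hypothetical:

```python
import re

test_list = ["2x 400g beans", "3 x 500 ml choco milk", " 2 chicken breasts", "12x 400g beans"]

# Pattern from the answer: in "12x", \d+ can backtrack to "1",
# whose next character ("2") is not an "x", so "1" is matched.
loose = r"(\d+)(?! *x)"
# Stricter variant: the number must also not be followed by another digit,
# so a match can never end in the middle of "12".
strict = r"\d+(?!\d| *x)"

for t in test_list:
    print(re.findall(loose, t), re.findall(strict, t))
```

For the last item this prints ['1', '400'] ['400']; the first three items give the same result with both patterns.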

Answer:

You can use this regex: ((\d([x\s]?)+)+)

Input: "2x 400g beans", "3 x 500 ml choco milk", " 2 chicken breasts"

Output:

2x 400
3 x 500 
2 
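To apply a pattern like this in Python, re.search returns the first match in each item, and .strip() removes the trailing space visible in the output above. A minimal sketch; the pattern "(?:\d+[x\s]*)+" is a simplified variant written for this example (not the answerer's exact regex) that gives the same matches on these three items, and the QUANTITY name is hypothetical:

```python
import re

# One or more runs of "digits, then any mix of 'x' and whitespace".
QUANTITY = re.compile(r"(?:\d+[x\s]*)+")

test_list = ["2x 400g beans", "3 x 500 ml choco milk", " 2 chicken breasts"]

quant_list = []
for item in test_list:
    match = QUANTITY.search(item)  # first quantity in the item, or None
    if match:
        # Drop the trailing whitespace the pattern also consumes.
        quant_list.append(match.group().strip())

print(quant_list)  # ['2x 400', '3 x 500', '2']
```

Since "[x\s]*" accepts any "x", an item like "2 xl eggs" would absorb the "x" of the unit; the pattern assumes units do not start with "x".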