Problem:
I have a list of strings and I need to remove all whitespace except the spaces inside a substring that looks like 'digit / digit'. I've been stuck on this for quite a while and still don't understand how to fix it. I would appreciate any help.
Sample input:
steps = [
'mix butter , flour , 1 / 3 c',
'sugar and 1-1 / 4 t',
'vanilla'
]
Expected output:
[
'mixbutter,flour,1 / 3c',
'sugarand1-1 / 4t',
'vanilla'
]
My approach:
import re

steps_new = []
for step in steps:
    step = re.sub(r'\s+[^\d+\s/\s\d+]', '', step)
    steps_new.append(step)
steps_new
My output:
[
'mixutterlour 1 / 3',
'sugarnd 1-1 / 4',
'vanilla'
]
CodePudding user response:
You can use
import re

steps = ['mix butter , flour , 1 / 3 c', 'sugar and 1-1 / 4 t', 'vanilla']
# Keep "digits / digits" (Group 1) as-is; drop every other run of whitespace.
steps_new = [re.sub(r'(\d+\s*/\s*\d+)|\s+', lambda m: m.group(1) or "", step) for step in steps]
print(steps_new) # => ['mixbutter,flour,1 / 3c', 'sugarand1-1 / 4t', 'vanilla']
The (\d+\s*/\s*\d+)|\s+ regex either matches and captures into Group 1 one or more digits, zero or more whitespaces, a /, zero or more whitespaces, and one or more digits (the (\d+\s*/\s*\d+) part), or (|) just matches one or more whitespaces (\s+).
If Group 1 participated in the match, the replacement is the Group 1 value, i.e. no replacement occurs. Otherwise, the replacement is an empty string.
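To see which branch fires for each match, here is a minimal sketch. The names PATTERN and keep_fractions are hypothetical, introduced only for this illustration; the pattern assumes the fraction is written as digits, optional whitespace, a slash, optional whitespace, digits.

```python
import re

# Same alternation as above: fraction (captured) OR a run of whitespace.
PATTERN = re.compile(r'(\d+\s*/\s*\d+)|\s+')

def keep_fractions(match):
    # Group 1 is set only when the "digits / digits" branch matched;
    # returning it unchanged leaves the fraction intact.
    # Otherwise the match is plain whitespace, so it is dropped.
    return match.group(1) or ""

step = 'mix butter , flour , 1 / 3 c'

# List every match and what the callback will do with it.
for m in PATTERN.finditer(step):
    print(repr(m.group(0)), "kept" if m.group(1) else "removed")

result = PATTERN.sub(keep_fractions, step)
print(result)  # mixbutter,flour,1 / 3c
```

Because the fraction branch comes first in the alternation and consumes its inner spaces, those spaces are never offered to the \s+ branch.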
CodePudding user response:
You can remove all spaces first and then reinsert them around every slash that sits between two digits, (\d)/(\d):
import re
steps = ["mix butter , flour , 1 / 3 c", "sugar and 1-1 / 4 t", "vanilla"]
for step in steps:
    x = re.sub(r"(\d)/(\d)", r"\1 / \2", step.replace(" ", ""))
    print(x)
Prints:
mixbutter,flour,1 / 3c
sugarand1-1 / 4t
vanilla
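One consequence of this approach, shown in the sketch below: the spacing around the slash is normalised to exactly one space on each side, whatever the input had. The helper name squeeze_keep_fractions and the second sample string are made up for illustration. Note also that str.replace(" ", "") removes only the space character, so tabs or newlines would survive.

```python
import re

def squeeze_keep_fractions(step):  # hypothetical helper name
    # Drop every space, then put single spaces back around digit/digit slashes.
    compact = step.replace(" ", "")
    return re.sub(r"(\d)/(\d)", r"\1 / \2", compact)

print(squeeze_keep_fractions("sugar and 1-1 / 4 t"))     # sugarand1-1 / 4t
print(squeeze_keep_fractions("add 1/3 c or 2  /  5 c"))  # add1 / 3cor2 / 5c
```

If the original spacing inside the fraction has to be preserved exactly, a pattern that captures the fraction and leaves it untouched is probably the safer choice.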
CodePudding user response:
You may use this lookaround based solution to get this in just a single regex:
(?<!/)[ \t](?![ \t]*/)
RegEx Details:
(?<!/): Assert that the previous character is not /
[ \t]: Match a space or tab
(?![ \t]*/): Assert that the next position doesn't have a / after 0 or more spaces
Code:
import re
arr = ["mix butter , flour , 1 / 3 c", "sugar and 1-1 / 4 t", "vanilla"]
rx = re.compile(r'(?<!/)[ \t](?![ \t]*/)')
for i in arr:
    print(rx.sub('', i))
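Worth keeping in mind: the lookarounds only look at the slash, not at digits, so spaces around any / are preserved. A small sketch of the difference, compared against a digit-aware pattern; the function names and the "salt and / or pepper" string are hypothetical, added only for illustration.

```python
import re

def lookaround_strip(s):
    # Removes a space/tab unless it directly follows "/" or leads up to one.
    return re.sub(r'(?<!/)[ \t](?![ \t]*/)', '', s)

def digits_only_strip(s):
    # Keeps whitespace only inside "digits / digits"; removes all other runs.
    return re.sub(r'(\d+\s*/\s*\d+)|\s+', lambda m: m.group(1) or '', s)

# On the sample data both give the same result.
print(lookaround_strip("sugar and 1-1 / 4 t"))    # sugarand1-1 / 4t
print(digits_only_strip("sugar and 1-1 / 4 t"))   # sugarand1-1 / 4t

# With a slash between words they differ.
print(lookaround_strip("salt and / or pepper"))   # saltand / orpepper
print(digits_only_strip("salt and / or pepper"))  # saltand/orpepper
```

For input where slashes only ever appear in fractions, the two behave the same on the samples above.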