How to remove spaces before pattern and after it in python with regex?

Time:10-20

Problem: I have a list of strings and I need to remove all whitespace except the spaces inside a substring that looks like 'digit / digit'. I have been stuck on this for quite a while and still don't understand how to fix it. I will appreciate any help.

Sample input:

steps = [
'mix butter , flour , 1 / 3 c',
'sugar and 1-1 / 4 t',
'vanilla'
]

Expected output:

[
'mixbutter,flour,1 / 3c',
'sugarand1-1 / 4t',
'vanilla'
]

My approach:

steps_new = []
for step in steps:
    step = re.sub(r'\s+[^\d+\s/\s\d+]','',step)
    steps_new.append(step)
steps_new

My output:

[
'mixutterlour 1 / 3',
'sugarnd 1-1 / 4',
'vanilla'
]

CodePudding user response:

You can use

import re
steps = ['mix butter , flour , 1 / 3 c', 'sugar and 1-1 / 4 t', 'vanilla']
steps_new = [re.sub(r'(\d+\s*/\s*\d+)|\s+', lambda x: x.group(1) or "", x) for x in steps]
print(steps_new) # => ['mixbutter,flour,1 / 3c', 'sugarand1-1 / 4t', 'vanilla']

The regex (\d+\s*/\s*\d+)|\s+ has two alternatives. The first, (\d+\s*/\s*\d+), matches and captures into Group 1 one or more digits, zero or more whitespace characters, a /, zero or more whitespace characters, and one or more digits. The second, \s+, matches one or more whitespace characters without capturing anything.

If Group 1 participated in the match, the replacement is the Group 1 value, i.e. the fraction is put back unchanged. Otherwise, the replacement is an empty string, so the whitespace is removed.
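
To make the role of Group 1 easier to see, here is a minimal sketch with the lambda written out as a named function. It assumes the pattern is (\d+\s*/\s*\d+)|\s+; the names FRACTION_OR_SPACE, keep_fraction and squeeze are made up for illustration.

```python
import re

# Assumed pattern: a "digits / digits" fraction (captured), or a run of whitespace.
FRACTION_OR_SPACE = re.compile(r'(\d+\s*/\s*\d+)|\s+')

def keep_fraction(match):
    # Group 1 is set only when the fraction alternative matched.
    if match.group(1) is not None:
        return match.group(1)  # put the fraction back unchanged
    return ""                  # plain whitespace: delete it

def squeeze(step):
    return FRACTION_OR_SPACE.sub(keep_fraction, step)

print(squeeze('mix butter , flour , 1 / 3 c'))  # mixbutter,flour,1 / 3c
print(squeeze('sugar and 1-1 / 4 t'))           # sugarand1-1 / 4t
```

Because the fraction alternative comes first, the spaces around the slash are consumed as part of Group 1 before the \s+ alternative gets a chance to match them.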

CodePudding user response:

You can remove all spaces and then insert spaces back in the correct places, i.e. around the slash in (\d)/(\d):

import re

steps = ["mix butter , flour , 1 / 3 c", "sugar and 1-1 / 4 t", "vanilla"]

for step in steps:
    x = re.sub(r"(\d)/(\d)", r"\1 / \2", step.replace(" ", ""))
    print(x)

Prints:

mixbutter,flour,1 / 3c
sugarand1-1 / 4t
vanilla
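
One thing to be aware of with this approach: it normalizes the spacing instead of preserving it, so a fraction that was written without spaces gains them. A small sketch of that behaviour follows; the function name strip_then_restore and the second input string are invented for illustration.

```python
import re

def strip_then_restore(step):
    # Remove every literal space, then put " / " back between two digits.
    return re.sub(r"(\d)/(\d)", r"\1 / \2", step.replace(" ", ""))

print(strip_then_restore("sugar and 1-1 / 4 t"))  # sugarand1-1 / 4t
print(strip_then_restore("add 1/2 c milk"))       # add1 / 2cmilk
```

Note also that step.replace(" ", "") removes only space characters; tabs or newlines would need something like re.sub(r"\s+", "", step) instead.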

CodePudding user response:

You may use this lookaround based solution to get this in just a single regex:

(?<!/)[ \t](?![ \t]*/)

RegEx Details:

  • (?<!/): Assert that previous character is not /
  • [ \t]: Match a space or tab
  • (?![ \t]*/): Assert that next position doesn't have a / after 0 or more spaces

Code:

import re

arr = ["mix butter , flour , 1 / 3 c", "sugar and 1-1 / 4 t", "vanilla"]

rx = re.compile(r'(?<!/)[ \t](?![ \t]*/)')

for i in arr:
    print(rx.sub('', i))
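
To see how the two lookarounds decide which spaces go, the following sketch lists the index of each space the pattern matches in a short sample string. The sample strings and the variable names removed and kept are only for illustration.

```python
import re

rx = re.compile(r'(?<!/)[ \t](?![ \t]*/)')

s = "flour , 1 / 3 c"

# Indices of the spaces the pattern matches (these get deleted).
removed = [m.start() for m in rx.finditer(s)]
# Remaining spaces: the ones directly before or directly after the slash.
kept = [i for i, ch in enumerate(s) if ch == " " and i not in removed]

print(removed)  # [5, 7, 13]
print(kept)     # [9, 11]

# The pattern never checks for digits, so any slash protects its spaces.
print(rx.sub('', "this and / or that"))  # thisand / orthat
```

If only spaces around 'digit / digit' should survive, the lookarounds would have to mention \d as well; as written, the regex keys on the slash alone.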
