Pattern to extract, expand and form a sentence based on a certain delimiter

Time:04-06

I was trying to solve a regex problem:

The input is a sentence in one of these forms: Number1,2,3 or Number1/2/3 or Number1-2-3. The three delimiters are , / and -.

The expected output is: Number1,Number2,Number3

Pattern I've tried so far:

(?<=,)[^,]+(?=,)

but this misses the edge cases, i.e. the first and last elements. I also could not get it to work for '/'.

CodePudding user response:

You could separate out the key from values, then use a list comprehension to build the output you want.

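import re

inp = "Number1,2,3"
# Group 1: the non-digit key, group 2: everything after it
matches = re.search(r'(\D+)(.*)', inp)
# Split on any of the three delimiters and prefix each number with the key
output = [matches[1] + x for x in re.split(r'[,/-]', matches[2])]
print(output)  # ['Number1', 'Number2', 'Number3']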
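
The same idea can be wrapped in a small function that also rejects input that does not fit the expected shape. This is only a sketch: the function name expand_numbers and the choice to raise ValueError are assumptions for illustration, not part of the answer above.

```python
import re

def expand_numbers(text):
    """Turn 'Number1/2/3' style input into 'Number1,Number2,Number3'."""
    # Group 1: non-digit key; group 2: digits separated by , / or -
    match = re.fullmatch(r'(\D+)(\d+(?:[,/-]\d+)*)', text)
    if match is None:
        raise ValueError(f"unexpected input: {text!r}")
    key, numbers = match.groups()
    # Prefix every number with the key and join with commas
    return ','.join(key + n for n in re.split(r'[,/-]', numbers))

for sample in ("Number1,2,3", "Number1/2/3", "Number1-2-3"):
    print(expand_numbers(sample))
```

Note that this version accepts mixed separators such as Number1/2,3; it does not check that they are consistent.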

CodePudding user response:

You can do it in several steps: 1) validate the string to match your pattern, and once validated 2) add the first non-digit chunk to the numbers while replacing - and / separator chars with commas:

import re
texts = ['Number1,2,3', 'Number1/2/3', 'Number1-2-3']
for text in texts:
    m = re.search(r'^(\D+)(\d+(?=([,/-]))(?:\3\d+)*)$', text)
    if m:
        print( re.sub(r'(?<=,)(?=\d)', m.group(1).replace('\\', '\\\\'), text.replace('/',',').replace('-',',')) )
    else:
        print(f"NO MATCH in '{text}'")


Output:

Number1,Number2,Number3
Number1,Number2,Number3
Number1,Number2,Number3

The ^(\D+)(\d+(?=([,/-]))(?:\3\d+)*)$ regex validates your three types of input:

  • ^ - start of string
  • (\D+) - Group 1: one or more non-digits
  • (\d+(?=([,/-]))(?:\3\d+)*) - Group 2: one or more digits, then repetitions of a separator (, / or -) followed by one or more digits. The separators must be consistent: the positive lookahead captures the first separator in Group 3, and the \3 backreference in the non-capturing group requires every later separator to be the same. Because the lookahead demands a separator after the first number, at least one repetition is needed in practice.
  • $ - end of string.

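The re.sub pattern, (?<=,)(?=\d), matches the position between a comma and a digit, and the Group 1 value is inserted there. The .replace('\\', '\\\\') call is needed because the replacement string is built dynamically, and re.sub would otherwise interpret any backslash in it as an escape sequence.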
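
An alternative to escaping backslashes by hand is to pass a callable as the replacement, since re.sub inserts a callable's return value verbatim. The sketch below assumes the input has already been validated; the function name prefix_numbers and the sample prefix A\B are made up for illustration.

```python
import re

def prefix_numbers(text, prefix):
    # Normalise the separators to commas first, as in the answer above
    normalised = text.replace('/', ',').replace('-', ',')
    # The callable's return value is inserted as-is, so backslashes
    # in `prefix` need no extra escaping
    return re.sub(r'(?<=,)(?=\d)', lambda _match: prefix, normalised)

print(prefix_numbers("Number1/2/3", "Number"))
print(prefix_numbers(r"A\B1-2-3", r"A\B"))
```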

CodePudding user response:

import re

for text in ("Number1,2,3", "Number1-2-3", "Number1/2/3"):
  print(re.sub(r"(\D+)(\d+)[/,-](\d+)[/,-](\d+)", r"\1\2,\1\3,\1\4", text))
  • \D+ matches "Number" or any other non-number text
  • \d+ matches a number (one or more digits)
  • [/,-] matches any of /, ,, -

The rest is copy paste 3 times.

The substitution consists of backreferences to the matched "Number" string (\1) followed by each of the (\d+) groups in turn.

This works if you're sure that it's always three numbers divided by that separator. This does not ensure that it's the same separator between each number. But it's short.

Output:

Number1,Number2,Number3
Number1,Number2,Number3
Number1,Number2,Number3
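
If the separator does need to be the same throughout, one possible variation on this answer is to capture the first separator and reuse it through a backreference. This is a sketch under the same assumption of exactly three numbers; the name expand_three and the choice to return None for non-matching input are illustrative only.

```python
import re

# Group 3 captures the first separator; \3 requires the second one to be identical
pattern = re.compile(r"(\D+)(\d+)([/,-])(\d+)\3(\d+)")

def expand_three(text):
    # Return None unless the whole input is key + three numbers
    # joined by one consistent separator
    if pattern.fullmatch(text) is None:
        return None
    # The numbers are now groups 2, 4 and 5 because group 3 holds the separator
    return pattern.sub(r"\1\2,\1\4,\1\5", text)

for text in ("Number1,2,3", "Number1-2-3", "Number1/2,3"):
    print(expand_three(text))
```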

CodePudding user response:

If you can use the regex module from PyPI, you can use its captures collection with a named capture group.

([^\d\s,/]+)(?<num>\d+)([,/-])(?<num>\d+)(?:\3(?<num>\d+))*(?!\S)
  • ([^\d\s,/]+) Capture group 1, match 1+ chars other than digits, whitespace, , and /
  • (?<num>\d+) Named capture group num matching 1+ digits
  • ([,/-]) Capture either , / - in group 3
  • (?<num>\d+) Named capture group num matching 1+ digits
  • (?:\3(?<num>\d+))* Optionally repeat a backreference to group 3 to keep the separators the same and match 1+ digits in group num
  • (?!\S) Assert a whitespace boundary to the right to prevent a partial match


import regex as re

pattern = r"([^\d\s,/]+)(?<num>\d+)([,/-])(?<num>\d+)(?:\3(?<num>\d+))*(?!\S)"
s = "Number1,2,3 or Number4/5/6 but not Number7/8,9"

# captures("num") returns every value the repeated named group matched
for m in re.finditer(pattern, s):
    print(','.join([m.group(1) + c for c in m.captures("num")]))

Output:

Number1,Number2,Number3
Number4,Number5,Number6
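
For comparison, a similar result seems achievable with only the standard library re module, which keeps just the last value of a repeated group. The sketch below works around that by capturing the whole run of numbers in one group and splitting it on the captured separator. The pattern is an adaptation rather than the one from the answer above, and the name expand_all is made up for illustration.

```python
import re

# Group 1: key, group 2: the whole run of numbers, group 3: the separator
pattern = re.compile(r"([^\d\s,/]+)(\d+([,/-])\d+(?:\3\d+)*)(?!\S)")

def expand_all(s):
    results = []
    for m in pattern.finditer(s):
        key, run, sep = m.group(1), m.group(2), m.group(3)
        # Split the run on its own separator and prefix each number
        results.append(','.join(key + n for n in run.split(sep)))
    return results

print(expand_all("Number1,2,3 or Number4/5/6 but not Number7/8,9"))
```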