Regex Expression to Find All Digits in a list and dashes

Time:05-13

I am trying to convert the string '4-6,10-12,16' into a list that looks like this: [4,"-",6,10,"-",12,16]. The list should contain a mix of integers and the special character "-".

I tried using a regex in Python, but I could only extract the numbers; I need the dashes in the list as well. How can I include the dashes along with the numbers?

Here is my code:

interval='4-6,10-12,16'
import re
l=[int(s) for s in re.findall(r'\b\d+\b', interval)]

CodePudding user response:

Try this:

interval='4-6,10-12,16'
import re
l=[int(s) if s.isnumeric() else s for s in re.findall(r'\d+|-', interval)]
l

Output:

[4, '-', 6, 10, '-', 12, 16]

CodePudding user response:

You can use

import re
interval='4-6,10-12,16'
l=[int(s) if all(c.isdigit() for c in s) else '-' for s in re.findall(r'\d+|-', interval)]
print(l) # => [4, '-', 6, 10, '-', 12, 16]


Details:

  • re.findall(r'\d+|-', interval) extracts every run of one or more digits and every - character, in order of appearance.
  • int(s) if all(c.isdigit() for c in s) else '-' converts a match to an int when it consists entirely of digits; otherwise the match is a dash, which is kept as the string '-'.
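
The two steps described above can also be looked at separately. The following is a minimal sketch; the variable names tokens and result are illustrative and not part of the answer.

```python
import re

interval = '4-6,10-12,16'

# Step 1: findall returns every match as a string, in order of appearance.
tokens = re.findall(r'\d+|-', interval)
print(tokens)  # ['4', '-', '6', '10', '-', '12', '16']

# Step 2: convert digit-only tokens to int and keep dashes as strings.
result = [int(t) if t.isdigit() else t for t in tokens]
print(result)  # [4, '-', 6, 10, '-', 12, 16]
```

The commas never appear in tokens because the pattern has no alternative that matches them, so findall simply skips over them.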

CodePudding user response:

Useful functions:

  • str.isdigit (or str.isnumeric or str.isdecimal);
  • itertools.groupby to group adjacent characters that share a characteristic.
from itertools import groupby

def tokenize_digits_and_dashes(s):
    for k, g in groupby(s, key=lambda c: (c.isdigit(), c == '-')):
        if k == (True, False):
            yield int(''.join(g))
        elif k == (False, True):
            yield '-'

print(list(tokenize_digits_and_dashes('4-6,10-12,16')))
# [4, '-', 6, 10, '-', 12, 16]
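
To see what groupby does with this key function, it can help to print the groups before any filtering. This is only an illustrative sketch; the name groups is an assumption made for the example.

```python
from itertools import groupby

s = '4-6,10-12,16'

# Each key is a pair (is_digit, is_dash); adjacent characters with the
# same key end up in the same group.
groups = [(k, ''.join(g))
          for k, g in groupby(s, key=lambda c: (c.isdigit(), c == '-'))]

for key, text in groups:
    print(key, repr(text))
# (True, False) '4'
# (False, True) '-'
# (True, False) '6'
# (False, False) ','
# (True, False) '10'
# (False, True) '-'
# (True, False) '12'
# (False, False) ','
# (True, False) '16'
```

The commas get the key (False, False), which matches neither branch of the generator, so they are dropped. Note that several adjacent dashes would share one group and therefore produce a single '-'.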

Alternative approach

Your string already contains separators: the commas. These are useful, so don't ignore them. You can split the string on the commas using str.split, then split each piece on the dash.

def tokenize_intervals(s):
    for interval in s.split(','):
        i = interval.split('-')
        if len(i) == 2:
            yield tuple(int(w) for w in i)
        elif len(i) == 1:
            x = int(i[0])
            yield (x, x)

print(list(tokenize_intervals('4-6,10-12,16')))
# [(4, 6), (10, 12), (16, 16)]
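
One possible reason to prefer (start, end) tuples is that they are easy to process further. As a sketch, assuming each pair denotes an inclusive range (the question does not say so), a hypothetical helper expand_intervals could list every integer covered:

```python
def expand_intervals(pairs):
    """Hypothetical helper: flatten inclusive (start, end) pairs into integers."""
    numbers = []
    for start, end in pairs:
        # range excludes its stop value, so add 1 to include `end`.
        numbers.extend(range(start, end + 1))
    return numbers

pairs = [(4, 6), (10, 12), (16, 16)]
expanded = expand_intervals(pairs)
print(expanded)  # [4, 5, 6, 10, 11, 12, 16]
```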

CodePudding user response:

# By Using Regex #
# -------------- #

import re
interval = '4-6,10-12,16'
s_list = re.findall(r'\d+|-', interval)
x = [int(_) if _.isnumeric() else _ for _ in s_list]
print(x)

# By Using the split method #
# ------------------------- #
final_list = []
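for part in interval.split(','):
    sub_list = part.split('-')
    for idx, item in enumerate(sub_list):
        if item.isnumeric():
            final_list.append(int(item))
        # Add a dash after every item except the last one in this part
        if idx < len(sub_list) - 1:
            final_list.append('-')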
print(final_list)

# By Checking Character By Character #
# ---------------------------------- #
z = ""
s = []
count = 0
for _ in interval:
    count += 1
    if _.isnumeric():
        z += _
        if count == len(interval):
            s.append(int(z))
    elif _ == '-':
        s.append(int(z))
        z = ""
        s.append('-')
    else:
        s.append(int(z))
        z = ""
print(s)