Python regex split on plus or minus and keep character

Time:05-11

I have a set of data like this:

data_list = ['0+.25+4.06+5.12', '0+0-.033+933.00+9+48.002']

The only delimiters are the plus and minus signs. I want to split on them but keep the sign with each number. The leading 0 at the start of each element is also not needed.

Here's what I have so far:

import re

data_list = ['0+.25+4.06+5.12', '0+0-.033+933.00+9+48.002']
data_string = ""
for item in data_list:
    data_string += item[1:]

data_string = re.split(r', |\+|-', data_string)

new_data_list = []

for item in data_string:
    if item:
        new_data_list.append(item)

print(new_data_list)

This gives me close to the right output:

['.25', '4.06', '5.12', '0', '.033', '933.00', '9', '48.002']

but now I cannot determine which one is positive or negative.

I would like output to be like this:

['.25', '4.06', '5.12', '0', '-.033', '933.00', '9', '48.002']

where I can see that .033 is a negative number.

CodePudding user response:

You can use

import re
 
data_list = ['0+.25+4.06+5.12', '0+0-.033+933.00+9+48.002']
new_data_list = []
for item in data_list:
    new_data_list.extend(re.split(r'\+|(?=-)', item[2:]))
 
print(new_data_list)
# => ['.25', '4.06', '5.12', '0', '-.033', '933.00', '9', '48.002']


Note:

  • item[2:] - drops the first two characters (to be more precise, replace item[2:] with re.sub(r'^0\+', '', item), which removes a leading 0+ only when it is present)
  • \+|(?=-) matches a + or a location that is immediately followed by a - char.
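The two notes above can be combined into a small helper. This is a minimal sketch, assuming the data uses literal + and - signs as delimiters. The name split_signed is made up for illustration and is not part of the original answer. Splitting on an empty match such as (?=-) requires Python 3.7 or later.

```python
import re

def split_signed(item):
    """Split one record on '+' and before each '-', keeping the '-' sign.

    `split_signed` is a hypothetical helper name used for illustration only.
    """
    # Drop the leading "0+" only when it is really there,
    # instead of blindly slicing off two characters.
    body = re.sub(r'^0\+', '', item)
    # r'\+' consumes the plus sign; r'(?=-)' splits at the position just
    # before a minus sign without consuming it.
    # Empty pieces (e.g. when a record starts with '-') are filtered out.
    return [part for part in re.split(r'\+|(?=-)', body) if part]

data_list = ['0+.25+4.06+5.12', '0+0-.033+933.00+9+48.002']
result = [token for item in data_list for token in split_signed(item)]
print(result)
# => ['.25', '4.06', '5.12', '0', '-.033', '933.00', '9', '48.002']
```

Using re.sub instead of a fixed slice means a record that does not start with 0+ is left intact rather than losing its first two characters.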

CodePudding user response:

You could try with this list comprehension:

[el for el in re.findall(r'[+-]\d*\.?\d+', ''.join(data_list))]

Regex explanation:

  • [+-]: leading sign, either + or -
  • \d*: optional integer digits
  • \.?: optional decimal point
  • \d+: one or more digits
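One caveat with this approach: joining the records with ''.join(data_list) glues the last number of one record to the leading 0 of the next, so 5.12 comes back as +5.120. A minimal sketch of a per-record variant follows. It assumes the same sample data, and the names signed_number and tokens are illustrative only. Every match keeps its sign, so the redundant + is stripped afterwards.

```python
import re

data_list = ['0+.25+4.06+5.12', '0+0-.033+933.00+9+48.002']

# A sign followed by a number; the unsigned leading "0" of each record
# never matches because the pattern requires a sign first.
signed_number = re.compile(r'[+-]\d*\.?\d+')

tokens = []
for item in data_list:
    # Searching each record separately keeps the last number of one
    # record from running into the leading "0" of the next.
    tokens.extend(signed_number.findall(item))

print(tokens)
# => ['+.25', '+4.06', '+5.12', '+0', '-.033', '+933.00', '+9', '+48.002']

# Drop the redundant '+' while keeping '-' on negative values.
print([t.lstrip('+') for t in tokens])
# => ['.25', '4.06', '5.12', '0', '-.033', '933.00', '9', '48.002']
```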

CodePudding user response:

It can be done in a single findall without any loop:

import re
data_list = ['0 .25 4.06 5.12', '0 0-.033 933.00 9 48.002']

print(re.findall(r'-?(?!0+[+ ])\d*\.?\d+', ' '.join(data_list)))

Output:

['.25', '4.06', '5.12', '0', '-.033', '933.00', '9', '48.002']


RegEx Details:

  • -?: Match optional -
  • (?!0+[+ ]): Negative lookahead to fail the match if we have just 0s followed by a + or a space, which skips the leading 0 of each element
  • \d*\.?\d+: Match an integer or floating point number
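To see what the lookahead contributes, the same search can be run with and without it. This is a small sketch under the assumption that the lookahead reads (?!0+[+ ]), meaning zeros followed by a plus sign or a space. The variable names are illustrative. Converting the tokens with float() then makes the sign easy to test.

```python
import re

data_list = ['0+.25+4.06+5.12', '0+0-.033+933.00+9+48.002']
joined = ' '.join(data_list)

# With the lookahead, the leading "0" of each record is skipped.
with_lookahead = re.findall(r'-?(?!0+[+ ])\d*\.?\d+', joined)
# Without it, those leading zeros are returned as ordinary numbers.
without_lookahead = re.findall(r'-?\d*\.?\d+', joined)

print(with_lookahead)
# => ['.25', '4.06', '5.12', '0', '-.033', '933.00', '9', '48.002']
print(without_lookahead)
# => ['0', '.25', '4.06', '5.12', '0', '0', '-.033', '933.00', '9', '48.002']

# float() accepts forms such as '.25' and '-.033'.
values = [float(token) for token in with_lookahead]
print([v for v in values if v < 0])
# => [-0.033]
```

Note that under this assumption the lookahead would also drop a genuine 0 value that happens to be followed by a + sign, so it only suits data where that does not occur.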