How to split a string without removing the delimiter?

Time:10-30

I know this question has been answered before, but I am still having difficulties even after trying several suggestions I found online. What I want is simple: split this string

"__label__2:somedata" 

or

"__label__43:somedata" 

and get

['__label__2:', 'somedata'] 

or

['__label__43:', 'somedata'].

Here is the code I have:

import re
line = "__label__2:somedata"
p = re.split("(__label__{1,2}:)", line)
print (p)

But this unfortunately prints

['__label__2:somedata']

What am I doing wrong here?

CodePudding user response:

You need to add \d+ to your regular expression, like so:

(__label__\d+:)

This matches any run of digits, so you do not have to list every possible value.

CodePudding user response:

"(__label__{1,2}:)" is not doing what you think. {1,2} is a quantifier: it asks for 1 or 2 repetitions of the preceding token, which here is the final underscore of __label__, not for the characters 1 or 2.

To match the character 1 or 2, use a character class, [12]:

import re

re.split('(__label__[12]:)', "__label__2:somedata")

output: ['', '__label__2:', 'somedata']

To split after __label__ followed by any number, use \d+. Also slice the result to drop the leading empty string:

>>> re.split(r'(__label__\d+:)', "__label__789:somedata")[1:]
['__label__789:', 'somedata']
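
To see why the original pattern fails, it may help to check what the quantifier actually binds to. The sketch below uses only the standard re module; the variable names and the two underscore-only test strings are made up for illustration.

```python
import re

# In the original pattern, {1,2} quantifies only the token right before it,
# which is the final underscore of "__label__".
original = re.compile(r"__label__{1,2}:")

# So it matches "__label_" + one or two underscores + ":", with no digit.
matches_two_underscores = original.search("__label__:") is not None
matches_three_underscores = original.search("__label___:") is not None
matches_question_input = original.search("__label__2:somedata") is not None

# With \d+ the digit run is part of the delimiter, and the capture group
# makes re.split keep the delimiter in the result.
parts = re.split(r"(__label__\d+:)", "__label__43:somedata")

print(matches_two_underscores, matches_three_underscores, matches_question_input)
print(parts)
```

Because the pattern never matches the input from the question, re.split has nothing to split on and returns the whole string as a single element.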

CodePudding user response:

You seem to hesitate between splitting the string and matching a part of it. Both are possible, but they are different and have different use cases.

  1. split:

    You just have to split on : and add the delimiter back to every part but the last:

     lst = line.split(':')
     mx = len(lst) - 1
     result = [s if i == mx else s + ':' for i, s in enumerate(lst)]
    
  2. match:

    You have to match the first part and, separately, the rest of the line:

     m = re.match(r'(__label__\d{1,2}:)(.*)', line)
     result = m.groups()
    

Use split if you expect more than 2 fields, and match if you want to control the pattern of the first one.
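
One thing the match approach needs in practice is a guard for lines that do not fit the pattern, since re.match returns None in that case. A minimal sketch, assuming a helper named split_label (a hypothetical name) and using \d+ rather than \d{1,2} so that labels with any number of digits are accepted:

```python
import re

# Hypothetical helper; the pattern follows the match approach above but
# uses \d+ (an assumption) instead of \d{1,2}.
LABEL_RE = re.compile(r"(__label__\d+:)(.*)")

def split_label(line):
    """Return [label_with_colon, rest], or None if the line has no label."""
    m = LABEL_RE.match(line)
    if m is None:
        # re.match returns None on failure, so calling m.groups() unguarded
        # would raise AttributeError.
        return None
    return list(m.groups())

print(split_label("__label__2:somedata"))
print(split_label("no label here"))
```

Returning None is only one option; raising an exception or returning the line unchanged may suit other callers better.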

CodePudding user response:

You can use .partition:

>>> s="__label__2:somedata"
>>> t=s.partition(':')
>>> [t[0] + t[1], t[2]]
['__label__2:', 'somedata']

If you have several strings, you can use a comprehension:

>>> cases=("__label__2:somedata", "__label__43:somedata")
>>> [[t[0] + t[1], t[2]] for t in map(lambda s: s.partition(':'), cases)]
[['__label__2:', 'somedata'], ['__label__43:', 'somedata']]
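
Two properties of .partition are worth knowing here: it splits only at the first occurrence of the separator, and it returns empty strings for the separator and tail when the separator is absent. A small sketch wrapping this up, with split_keep_first as a hypothetical helper name:

```python
def split_keep_first(s, sep=":"):
    """Split s at the first sep, keeping sep on the left-hand part."""
    head, found, tail = s.partition(sep)
    if not found:
        # partition returns (s, '', '') when sep does not occur in s.
        return [s]
    return [head + found, tail]

print(split_keep_first("__label__2:somedata"))
print(split_keep_first("__label__2:some:data"))
print(split_keep_first("nodelimiter"))
```

Unlike the split(':') approach, any further colons stay inside the second element, which is usually what you want when the data itself may contain the delimiter.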