I know this question has been answered before, but I am still having trouble even after trying several suggestions I found online. What I want is simple: split this string
"__label__2:somedata"
or
"__label__43:somedata"
and get
['__label__2:', 'somedata']
or
['__label__43:', 'somedata'].
Here is the code I have:
import re
line = "__label__2:somedata"
p = re.split("(__label__{1,2}:)", line)
print(p)
But this unfortunately prints
['__label__2:somedata']
What am I doing wrong here?
Answer:
You need to add \d+ to your regular expression, like so:
(__label__\d+:)
This matches a label with any number of digits, so you don't have to list all the possible values.
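A minimal sketch of this fix applied to both strings from the question (LABEL_RE is just an illustrative name). Because the label sits at the very start of the string, re.split also returns an empty string before the captured delimiter.

```python
import re

# \d+ matches one or more digits, so any label number is accepted.
# The parentheses make re.split keep the matched delimiter in the result.
LABEL_RE = re.compile(r"(__label__\d+:)")

for line in ("__label__2:somedata", "__label__43:somedata"):
    print(LABEL_RE.split(line))
# ['', '__label__2:', 'somedata']
# ['', '__label__43:', 'somedata']
```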
Answer:
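"(__label__{1,2}:)" is not doing what you think. {1,2} repeats only the single character immediately before it, the final _ of __label__, so the pattern matches "__label__:" or "__label___:". It never matches the characters 1 or 2.
To match a single 1 or 2, use the character class [12]: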
import re
re.split('(__label__[12]:)', "__label__2:somedata")
output: ['', '__label__2:', 'somedata']
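A quick check of the quantifier's scope, using re.fullmatch so each test string must match the whole pattern. It shows that the original pattern only varies the number of trailing underscores.

```python
import re

# "__label__{1,2}:" parses as "__label_" + "_{1,2}" + ":",
# i.e. the quantifier applies only to the last underscore.
pattern = r"__label__{1,2}:"

print(bool(re.fullmatch(pattern, "__label__:")))    # True: one repeat of "_"
print(bool(re.fullmatch(pattern, "__label___:")))   # True: two repeats of "_"
print(bool(re.fullmatch(pattern, "__label__2:")))   # False: "2" is not "_"

# A character class matches exactly one character from the set.
print(bool(re.fullmatch(r"__label__[12]:", "__label__2:")))  # True
```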
If you want to split after __label__ followed by any number of digits and a colon, use \d+, and slice off the leading empty string:
>>> re.split(r'(__label__\d+:)', "__label__789:somedata")[1:]
['__label__789:', 'somedata']
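As an alternative (not from the answers above), you can split on the empty position right after the first colon with a zero-width lookbehind, which avoids the leading empty string. This sketch assumes Python 3.7 or later, where re.split accepts empty matches; split_after_first_colon is a hypothetical helper name, and unlike the patterns above it does not check the __label__ prefix.

```python
import re

def split_after_first_colon(line):
    # (?<=:) matches the empty position immediately after a ":".
    # maxsplit=1 stops at the first colon, so the data may contain ":".
    return re.split(r"(?<=:)", line, maxsplit=1)

print(split_after_first_colon("__label__789:somedata"))
# ['__label__789:', 'somedata']
```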
Answer:
You seem to hesitate between splitting the string and matching part of it. Both are possible, but they are different operations with different use cases.
split:
You just have to split on : and add the delimiter back to every part except the last:
lst = line.split(':')
mx = len(lst) - 1
result = [s if i == mx else s + ':' for i, s in enumerate(lst)]
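The same split approach wrapped in a function (split_keep_colons is a hypothetical name). The second call shows what happens when the data itself contains a colon: every colon produces a new field.

```python
def split_keep_colons(line):
    # Split on every ":" and re-attach it to all parts except the last one.
    parts = line.split(":")
    last = len(parts) - 1
    return [p if i == last else p + ":" for i, p in enumerate(parts)]

print(split_keep_colons("__label__2:somedata"))   # ['__label__2:', 'somedata']
print(split_keep_colons("__label__2:some:data"))  # ['__label__2:', 'some:', 'data']
```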
match:
You have to match the first part and, separately, the rest of the line:
m = re.match(r'(__label__\d{1,2}:)(.*)', line)
result = list(m.groups())
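A slightly more defensive sketch of the match approach, assuming you want any number of digits (\d+) and a clear signal when a line has no label. re.match returns None in that case, so calling .groups() on it directly would raise AttributeError. LABEL_LINE and match_label are hypothetical names.

```python
import re

# \d+ accepts any number of digits; (.*) captures the rest of the line.
LABEL_LINE = re.compile(r"(__label__\d+:)(.*)")

def match_label(line):
    m = LABEL_LINE.match(line)
    if m is None:
        # The line does not start with a label.
        return None
    return list(m.groups())

print(match_label("__label__43:somedata"))  # ['__label__43:', 'somedata']
print(match_label("__label__2:some:data"))  # ['__label__2:', 'some:data']
print(match_label("no label here"))         # None
```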
Use split if you expect more than two fields, and match if you want to control the pattern of the first one.
Answer:
You can use .partition:
>>> s="__label__2:somedata"
>>> t=s.partition(':')
>>> [t[0] + t[1], t[2]]
['__label__2:', 'somedata']
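The same idea as a small helper (split_label is a hypothetical name) that unpacks the triple returned by str.partition. If the string contains no colon, partition returns the whole string plus two empty strings, so the helper returns the string and an empty data part instead of raising.

```python
def split_label(s):
    # str.partition returns (head, sep, tail); sep and tail are "" if ":" is absent.
    head, sep, tail = s.partition(":")
    return [head + sep, tail]

print(split_label("__label__43:somedata"))  # ['__label__43:', 'somedata']
print(split_label("nodelimiter"))           # ['nodelimiter', '']
```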
If you have several strings, you can use a list comprehension:
>>> cases = ("__label__2:somedata", "__label__43:somedata")
>>> [[t[0] + t[1], t[2]] for t in map(lambda s: s.partition(':'), cases)]
[['__label__2:', 'somedata'], ['__label__43:', 'somedata']]
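An equivalent comprehension that replaces map and lambda with a generator expression and unpacks each triple by name, which some find easier to read than t[0], t[1], t[2]. It produces the same result for the same inputs.

```python
cases = ("__label__2:somedata", "__label__43:somedata")

# Unpack each (head, sep, tail) triple returned by str.partition.
result = [[head + sep, tail] for head, sep, tail in (s.partition(":") for s in cases)]
print(result)  # [['__label__2:', 'somedata'], ['__label__43:', 'somedata']]
```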