Regular expression in Python to split a string based on characters that begin with @ and end with :?

I have strings that look like this:

sentences = "@en:The dog went for a walk@es:El perro fue de paseo"

Desired output:

splitted = ['The dog went for a walk', 'El perro fue de paseo']

Current code:

splitted = re.split("^@:$", sentences)

So, I'd like to split the sentences on substrings that begin with an at sign @ and end with a colon :, since that is how every language is encoded, e.g. @en:, @es:, @fr:, @nl:, etc.

CodePudding user response：

You can split on everything from @ up to the next :, using a negated character class so that neither of those characters can occur in between.

Because the string starts with a tag, the result contains an empty first entry, which you can filter out.

@[^@:]*:


import re
sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
splitted = [s for s in re.split("@[^@:]*:", sentences) if s]

print(splitted)

Output

['The dog went for a walk', 'El perro fue de paseo']
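
To make the remark about empty entries concrete, here is a small sketch that prints the unfiltered result next to the filtered one, together with what the pattern from the question returns. The variable names unsplit, raw and filtered are only illustrative.

```python
import re

sentences = "@en:The dog went for a walk@es:El perro fue de paseo"

# The pattern from the question is anchored with ^ and $, so it can only
# match a string that is exactly "@:"; nothing is split here.
unsplit = re.split("^@:$", sentences)

# The string starts with a tag, so the text before the first match is "".
raw = re.split("@[^@:]*:", sentences)

# Dropping the empty items leaves only the sentences.
filtered = [s for s in raw if s]

print(unsplit)
print(raw)
print(filtered)
```

The first print shows the original string as a single list element, the second shows the leading empty string, and the third shows the two sentences only.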

CodePudding user response：

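Try this code:

import re
sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
# Split on tags such as @en: or @es: (one or more ASCII letters between @ and :)
splitted = re.split(r"@[a-zA-Z]+:", sentences)
# The string starts with a tag, so the first element is an empty string:
# ['', 'The dog went for a walk', 'El perro fue de paseo']
print(splitted)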
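
One detail worth checking when writing a character class for letters: a range written as A-z is not the same as A-Z. In ASCII, A-z runs from code point 65 to 122 and so also covers the six punctuation characters between Z and a. The sketch below is only an illustration of that difference; the variable names are made up for the example.

```python
import re

# The six characters that sit between "Z" and "a" in ASCII.
extras = "[\\]^_`"

# [A-z] covers code points 65-122, so it also accepts the extras.
matched_by_wide_range = [c for c in extras if re.fullmatch(r"[A-z]", c)]

# [a-zA-Z] accepts ASCII letters only.
matched_by_letter_ranges = [c for c in extras if re.fullmatch(r"[a-zA-Z]", c)]

print(matched_by_wide_range)
print(matched_by_letter_ranges)
```

For two-letter language codes the difference rarely matters in practice, but [a-zA-Z] says what is meant.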

CodePudding user response：

You need this regex: @[^@:]+:

First, @ matches a literal @.

Next, [^@:]+ matches one or more characters that are not @ or :.

Finally, : matches a literal :.

import re
sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
splitted = re.split("@[^@:]+:", sentences)
# The first element is an empty string (the text before the first tag), so skip it
print(splitted[1:])

output:

['The dog went for a walk', 'El perro fue de paseo']
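
The three parts described above can also be written out inside the pattern itself. This is a minimal sketch using the re.VERBOSE flag, which ignores whitespace in the pattern and allows comments; the name TAG is just a label chosen for this example.

```python
import re

# TAG is a hypothetical name for the compiled separator pattern.
TAG = re.compile(
    r"""
    @        # a literal at sign
    [^@:]+   # one or more characters that are neither an at sign nor a colon
    :        # a literal colon
    """,
    re.VERBOSE,
)

sentences = "@en:The dog went for a walk@es:El perro fue de paseo"

# Splitting yields an empty string before the first tag, so skip it.
parts = TAG.split(sentences)
print(parts[1:])
```

Compiling the pattern once is optional here; it mainly keeps the commented pattern separate from the call that uses it.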