I have strings that look like this:
sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
Desired output:
splitted = ['The dog went for a walk', 'El perro fue de paseo']
Current code:
splitted = re.split("^@:$", sentences)
So, I'd like to split the sentences on substrings that begin with an at sign @ and end with a colon :, since that is how every language is tagged, e.g. @en:, @es:, @fr:, @nl: etc.
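For reference, here is a quick check of why the current pattern does nothing. Without re.MULTILINE, ^ and $ anchor to the start and end of the whole string, so ^@:$ only matches a string that is exactly "@:". With no match, re.split returns the input unchanged inside a one-element list.

```python
import re

sentences = "@en:The dog went for a walk@es:El perro fue de paseo"

# "^@:$" requires the entire string to be exactly "@:", so it never matches here
result = re.split("^@:$", sentences)
print(result)  # a one-element list holding the unchanged input
```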
Answer:
You can split on everything from @ to :, using a negated character class so that neither @ nor : can be matched in between.
The result may contain empty entries (for example, the one before the leading tag), which you can filter out.
@[^@:]*:
import re
sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
splitted = [s for s in re.split("@[^@:]*:", sentences) if s]
print(splitted)
Output
['The dog went for a walk', 'El perro fue de paseo']
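To see why the filter is needed, here is the raw result of re.split before filtering. Because the string starts with a tag, the first match is at position 0, so re.split yields an empty string before it.

```python
import re

sentences = "@en:The dog went for a walk@es:El perro fue de paseo"

# Split without filtering to see what re.split actually returns
raw = re.split(r"@[^@:]*:", sentences)
print(raw)  # ['', 'The dog went for a walk', 'El perro fue de paseo']
```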
Answer:
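Try this code; it splits on @ followed by one or more letters and a colon:
import re
sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
# [a-zA-Z]+ matches the language code; empty entries (before the first tag) are dropped
splitted = [s for s in re.split(r"@[a-zA-Z]+:", sentences) if s]
print(splitted)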
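A note on letter ranges in character classes: a range written A-z (rather than A-Z) spans every character from "Z" to "a" in ASCII order, which includes six punctuation characters. The short check below lists them.

```python
import re

# In ASCII, the characters [ \ ] ^ _ ` sit between "Z" (90) and "a" (97),
# so the range A-z accepts them too, while A-Za-z does not.
between = "[\\]^_`"
loose = [c for c in between if re.fullmatch(r"[A-z]", c)]
strict = [c for c in between if re.fullmatch(r"[A-Za-z]", c)]
print(loose)   # ['[', '\\', ']', '^', '_', '`']
print(strict)  # []
```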
Answer:
You need this regex: @[^@:]+:
first, @ matches a literal @
next, [^@:]+ matches one or more characters that are neither @ nor :
finally, : matches a literal :
import re
sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
splitted = re.split(r"@[^@:]+:", sentences)
# The string starts with a tag, so the first entry is an empty string; skip it
print(splitted[1:])
output:
['The dog went for a walk', 'El perro fue de paseo']
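The first/next/finally breakdown can also be kept inside the pattern itself. This is a minimal sketch using re.VERBOSE, which ignores unescaped whitespace outside character classes and allows # comments in the pattern; the name TAG is only an illustrative choice.

```python
import re

# Same pattern as @[^@:]+: written out piece by piece.
TAG = re.compile(
    r"""
    @        # a literal @
    [^@:]+   # one or more characters that are neither @ nor :
    :        # a literal :
    """,
    re.VERBOSE,
)

sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
splitted = TAG.split(sentences)[1:]  # drop the empty string before the first tag
print(splitted)  # ['The dog went for a walk', 'El perro fue de paseo']
```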