Extract substrings with regular expression

Time:10-15

Let's say I have a string:

L1045    $    u0    $    m0    $    BIANCA    $    They do not!

I need to extract the name (BIANCA) and the text at the end into two variables. I tried something like this:

import re

dialogue = "L1045    $    u0    $    m0    $    BIANCA    $    They do not!"
name: str = ""
line: str = ""
name = re.findall(r'^L.*\s(.+?)\s.*', dialogue)

but I'm a little confused about how to use regular expressions. How can I solve this with a regular expression?

Thanks!

CodePudding user response:

You can do that without the re module, using str.split:

data = "L1045    $    u0    $    m0    $    BIANCA    $    They do not!"
parts = data.split('   $   ')
print(parts[-2].strip())
print(parts[-1].strip())

Output:

BIANCA
They do not!
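
As a hedged variation on the split approach, the sketch below splits on the bare "$" character and strips each field, so it does not depend on the exact number of spaces around the delimiter. The helper name split_fields is hypothetical, and the sketch assumes the text field never contains a "$" of its own.

```python
def split_fields(line, sep="$"):
    """Split a record on `sep` and trim the whitespace around every field."""
    return [field.strip() for field in line.split(sep)]


data = "L1045    $    u0    $    m0    $    BIANCA    $    They do not!"
fields = split_fields(data)

# The name is the second-to-last field and the text is the last one.
name, text = fields[-2], fields[-1]
print(name)
print(text)
```

If the text could contain "$", the last field would be cut short, so a stricter delimiter or a regex would be the safer choice in that case.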

CodePudding user response:

You can use this regex:

[ \t]+([^ \t$]+)[ \t]+\$[ \t]+([^$]+)$


Python:

>>> import re
>>> dialogue = "L1045    $    u0    $    m0    $    BIANCA    $    They do not!"
>>> re.findall(r'[ \t]+([^ \t$]+)[ \t]+\$[ \t]+([^$]+)$', dialogue)
[('BIANCA', 'They do not!')]

You can also split and slice:

>>> re.split(r'[ \t]+\$[ \t]+', dialogue)[-2:]
['BIANCA', 'They do not!']

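Note that split and slice does not fail gracefully if the $ delimiter is missing: re.split returns the whole string as a single element, so the slice holds one item instead of two. The findall pattern above simply returns an empty list in that case.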
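
The point about failing gracefully can be sketched as a small helper that returns None when a line has no delimiter. This is only an illustration under stated assumptions: the function name extract_name_and_text and the constant PATTERN are hypothetical, the name field is assumed to contain no whitespace, and the text field is assumed to contain no "$".

```python
import re

# A name field, then the "$" delimiter, then the text running to the end of the line.
PATTERN = re.compile(r'[ \t]+([^ \t$]+)[ \t]+\$[ \t]+([^$]+)$')


def extract_name_and_text(dialogue):
    """Return (name, text), or None when the line does not match the pattern."""
    match = PATTERN.search(dialogue)
    if match is None:
        return None
    return match.group(1), match.group(2)


good = extract_name_and_text(
    "L1045    $    u0    $    m0    $    BIANCA    $    They do not!"
)
bad = extract_name_and_text("no delimiter here")
print(good)
print(bad)
```

Checking the result for None before unpacking avoids the ValueError that unpacking a one-element slice into two variables would raise.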
