I'm trying to extract any strings with an underscore _ in the middle. For example, from the string s below:
s = "name1_name2 _ nothing test1 _ test 2_ _3"
I would like to extract name1_name2.
Thank you for reading!
CodePudding user response:
>>> import re
>>> s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4'
>>> re.findall('[a-zA-Z0-9]+_[a-zA-Z0-9]+', s)
['name1_name2', 'name3_name4']
import re: this imports the regular expression module, which is part of the standard library. I suggest you get familiar with it; it's useful in many cases.
re.findall: the findall function from the re module returns all non-overlapping matches of the pattern in the string, as a list of strings or tuples.
[a-zA-Z0-9]+_[a-zA-Z0-9]+: this regex means one or more letters (a to z, lowercase or uppercase) or digits, followed by an underscore, followed by one or more letters or digits.
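Written as a plain script rather than an interactive session, the same idea might look like the sketch below. The function name extract_underscored is made up for illustration; the pattern is the one explained above, written as a raw string.

```python
import re

# One or more ASCII letters/digits, an underscore, then one or more again.
PATTERN = re.compile(r'[a-zA-Z0-9]+_[a-zA-Z0-9]+')

def extract_underscored(text):
    """Return every substring that has an underscore between letters/digits."""
    return PATTERN.findall(text)

s = 'name1_name2 _ nothing test1 _ test 2_ _3'
print(extract_underscored(s))  # ['name1_name2']
```

Note that a token with two underscores, such as a_b_c, yields only a_b with this pattern, because each match contains exactly one underscore.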
The regex \w+_\w+ might have unintended consequences, because \w also matches non-ASCII word characters. Look at the differences below:
>>> s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4 ˆd_d'
>>> re.findall('[a-zA-Z0-9]+_[a-zA-Z0-9]+', s)
['name1_name2', 'name3_name4', 'd_d']
>>> re.findall(r'\w+_\w+', s)
['name1_name2', 'name3_name4', 'ˆd_d']
One might say that you can use \w with the ASCII flag, which makes it equivalent to [a-zA-Z0-9_], but as you can see, that class includes the underscore, which can also have unintended consequences:
>>> s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4 ˆd_d_'
>>> re.findall(r'\w+_\w+', s, re.ASCII)
['name1_name2', 'name3_name4', 'd_d_']
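To see the three variants side by side, a small comparison like the one below may help. It is only a sketch: the sample string is shortened, and '\u02c6' is assumed to be the non-ASCII character used in the examples above.

```python
import re

# '\u02c6' (MODIFIER LETTER CIRCUMFLEX ACCENT) is a non-ASCII character
# that \w matches in the default Unicode mode (assumed sample character).
s = 'name3_name4 \u02c6d_d_'

explicit = re.findall(r'[a-zA-Z0-9]+_[a-zA-Z0-9]+', s)
unicode_w = re.findall(r'\w+_\w+', s)
ascii_w = re.findall(r'\w+_\w+', s, re.ASCII)

print(explicit)   # ['name3_name4', 'd_d']
print(unicode_w)  # ['name3_name4', 'ˆd_d_']
print(ascii_w)    # ['name3_name4', 'd_d_']
```

Only the explicit character class leaves out both the non-ASCII character and the trailing underscore.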
CodePudding user response:
How's this?
[\w]+_[\w]+
Does that work for you?
CodePudding user response:
You can use the regex given below; it will match a string that has an underscore _ inside its name. Note the ^ anchor: it only matches at the start of the input.
^\S+[_]\S+
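Assuming the pattern is meant as ^\S+[_]\S+ (the + quantifiers are a guess at the intended form), the ^ anchor means it can only match at the very start of the input. One way to apply it to a whole sentence is to test each whitespace-separated token on its own. A rough sketch, with a made-up helper name:

```python
import re

# Assumed form of the pattern: non-space run, underscore, non-space run.
TOKEN = re.compile(r'^\S+[_]\S+')

def tokens_with_underscore(text):
    # str.split() with no arguments splits on runs of whitespace.
    return [tok for tok in text.split() if TOKEN.match(tok)]

s = 'name1_name2 _ nothing test1 _ test 2_ _3'
print(TOKEN.findall(s))           # ['name1_name2']
print(TOKEN.findall('x ' + s))    # []
print(tokens_with_underscore('x ' + s))  # ['name1_name2']
```

Because \S matches any non-whitespace character, this variant also accepts punctuation and non-ASCII characters around the underscore.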