Extract/Match strings with underscore "_" in between two words

Time:03-31

I'm trying to extract any strings with an underscore _ in the middle. For example, from the string s below:

s = "name1_name2 _ nothing test1 _ test 2_ _3"

I would like to extract name1_name2.

Thank you for reading!

CodePudding user response:

>>> import re
>>> s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4'
>>> re.findall('[a-zA-Z0-9]+_[a-zA-Z0-9]+', s)
['name1_name2', 'name3_name4']

import re: this imports the regular expression module, which is part of the standard library. I suggest you get familiar with it; it's useful in many situations.

re.findall: the findall function of the re module returns all non-overlapping matches of the pattern in the string, as a list of strings or tuples.

[a-zA-Z0-9]+_[a-zA-Z0-9]+: this regex means one or more letters (a to z, lowercase or uppercase) or digits, followed by an underscore, followed by one or more letters or digits.
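If you need this in more than one place, the same pattern can be wrapped in a small helper. This is only a sketch: the names PATTERN and extract_underscored are made up for illustration, not something from the question.

```python
import re

# One or more letters/digits, a single underscore, one or more letters/digits.
PATTERN = re.compile(r'[a-zA-Z0-9]+_[a-zA-Z0-9]+')

def extract_underscored(text):
    """Return every word_word substring found in text (hypothetical helper)."""
    return PATTERN.findall(text)

s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4'
print(extract_underscored(s))  # ['name1_name2', 'name3_name4']
```

Note that findall returns non-overlapping substrings, so an input like 'a_b_c' yields only 'a_b'.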

The regex \w+_\w+ might have unintended consequences, because in Python 3 \w also matches non-ASCII word characters by default. Look at the differences below:

>>> s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4 ˆd_d'
>>> re.findall('[a-zA-Z0-9]+_[a-zA-Z0-9]+', s)
['name1_name2', 'name3_name4', 'd_d']
>>> re.findall(r'\w+_\w+', s)
['name1_name2', 'name3_name4', 'ˆd_d']

One might say that you can use \w with the ASCII flag, which makes it equivalent to [a-zA-Z0-9_]. But note that this class includes the underscore itself, which can also have unintended consequences:

>>> s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4 ˆd_d_'
>>> re.findall(r'\w+_\w+', s, re.ASCII)
['name1_name2', 'name3_name4', 'd_d_']
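If what you actually want is whole whitespace-separated tokens of the form word_word (an assumption on my part, the question doesn't say so explicitly), one possible approach is to split the string first and test each token with re.fullmatch. The name whole_tokens is hypothetical.

```python
import re

# Same character class as above; fullmatch requires the entire token to fit.
TOKEN = re.compile(r'[a-zA-Z0-9]+_[a-zA-Z0-9]+')

def whole_tokens(text):
    """Keep only whitespace-separated tokens that are exactly word_word."""
    return [tok for tok in text.split() if TOKEN.fullmatch(tok)]

s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4 ˆd_d_'
print(whole_tokens(s))  # ['name1_name2', 'name3_name4']
```

With this variant the last token is rejected as a whole instead of contributing a partial match such as 'd_d'.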

CodePudding user response:

How's this?

[\w]+_[\w]+

Does that work for you?

CodePudding user response:

You can use the regex given below; it will select any string that has a single _ in its name.

^\S+[_]\S+
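A quick check of how this pattern behaves, assuming the quantifiers are + (one or more). Since ^ anchors the match at the start of the string, only a leading token can be found, and \S also matches underscores, so tokens with more than one underscore are accepted too.

```python
import re

s = "name1_name2 _ nothing test1 _ test 2_ _3"

# Anchored: can only match at the very start of the string.
anchored = re.findall(r'^\S+[_]\S+', s)
print(anchored)  # ['name1_name2']

# The same pattern finds nothing if the token is not at the start.
not_first = re.findall(r'^\S+[_]\S+', 'x name1_name2')
print(not_first)  # []

# \S includes '_', so a token with two underscores matches as a whole.
double = re.findall(r'^\S+[_]\S+', 'a_b_c')
print(double)  # ['a_b_c']
```

Dropping the ^ lets the pattern match anywhere in the string.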