Search for a partial string in a list of strings

Time:06-27

I have a list of strings looking like this:

['MEASUREMENT   K02313  New York',\
 'MEASUREMENT   K02338  London [BC:2.7.7.7]',\
 'MEASUREMENT   K14761  Kairo [BC:1.2.-.-]',\
 'MEASUREMENT   K03629  Berlin',\
 'MEASUREMENT   K02470  Paris [BC:5.6.2.-]',\
 'MEASUREMENT   K02469  Madrid [BC:5.43.2.2]',\
....]

As you can see, some elements in the list contain a string with the format BC:x.x.x.x, where each x is either a number from 0-999 or a hyphen ("-").
Now I want to build a separate list containing all of these BC:x.x.x.x elements.
I tried using a regular expression:

re.findall(r"BC:([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|-).([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|-).([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|-).([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|-)", list_name)

but it doesn't work; I get the following error message:

TypeError: expected string or bytes-like object

CodePudding user response:

Use a list comprehension along with re.search:

import re

inp = ['MEASUREMENT   K02313  New York', 'MEASUREMENT   K02338  London [BC:2.7.7.7]', 'MEASUREMENT   K14761  Kairo [BC:1.2.-.-]', 'MEASUREMENT   K03629  Berlin', 'MEASUREMENT   K02470  Paris [BC:5.6.2.-]', 'MEASUREMENT   K02469  Madrid [BC:5.43.2.2]']
output = [re.search(r'\[BC:.*?\]', x).group() for x in inp if '[BC:' in x]
print(output)

This prints:

['[BC:2.7.7.7]', '[BC:1.2.-.-]', '[BC:5.6.2.-]', '[BC:5.43.2.2]']

CodePudding user response:

Your code doesn't work because you're passing a list of strings to re.findall, which expects a single string (or bytes-like object). If you want to use a single call, join the list into one string first:

re.findall(r"BC:[\d.-]+", ' '.join(list_name))
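
A minimal sketch of this join approach, using the sample data from the question. The character class [\d.-] is an assumption chosen here so that hyphen placeholders such as BC:1.2.-.- are captured in full; it is looser than the 0-999 rule in the question and accepts any run of digits, dots, and hyphens.

```python
import re

# Sample data copied from the question; `list_name` is the asker's variable name.
list_name = [
    'MEASUREMENT   K02313  New York',
    'MEASUREMENT   K02338  London [BC:2.7.7.7]',
    'MEASUREMENT   K14761  Kairo [BC:1.2.-.-]',
    'MEASUREMENT   K03629  Berlin',
    'MEASUREMENT   K02470  Paris [BC:5.6.2.-]',
    'MEASUREMENT   K02469  Madrid [BC:5.43.2.2]',
]

# Join into one string so re.findall receives a str rather than a list.
joined = ' '.join(list_name)

# [\d.-]+ matches one or more digits, dots, or hyphens after "BC:".
codes = re.findall(r"BC:[\d.-]+", joined)
print(codes)
# → ['BC:2.7.7.7', 'BC:1.2.-.-', 'BC:5.6.2.-', 'BC:5.43.2.2']
```

Note that joining discards which list element each match came from; if that matters, search each string separately instead.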

CodePudding user response:

You can use your pattern without capture groups, as you only want the full match. Note that the dots also need to be escaped as \. to match a literal dot; an unescaped . matches any character.

Then get the match for each item in the list using a list comprehension (assuming there is at most one BC: item between square brackets per string).

The pattern can be shortened with a quantifier {3} that repeats the dot followed by a number 0-999 without leading zeroes, and one alternative can be dropped by adding the hyphen to the first character class, giving [0-9-]:

\bBC:(?:[0-9-]|[1-9][0-9]|[1-9][0-9][0-9])(?:\.(?:[0-9-]|[1-9][0-9]|[1-9][0-9][0-9])){3}


import re

lst = ['MEASUREMENT   K02313  New York',
       'MEASUREMENT   K02338  London [BC:2.7.7.7]',
       'MEASUREMENT   K14761  Kairo [BC:1.2.-.-]',
       'MEASUREMENT   K03629  Berlin',
       'MEASUREMENT   K02470  Paris [BC:5.6.2.-]',
       'MEASUREMENT   K02469  Madrid [BC:5.43.2.2]'
       ]
pattern = r"\bBC:(?:[0-9-]|[1-9][0-9]|[1-9][0-9][0-9])(?:\.(?:[0-9-]|[1-9][0-9]|[1-9][0-9][0-9])){3}"

print([m.group() for s in lst for m in [re.search(pattern, s)] if m])

Output

['BC:2.7.7.7', 'BC:1.2.-.-', 'BC:5.6.2.-', 'BC:5.43.2.2']

CodePudding user response:

It seems that you are passing a list to the findall function. Try iterating over the elements of that list and passing each one to findall individually.
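
A minimal sketch of that idea, using the sample data from the question. The simplified pattern BC:[\d.-]+ is an assumption for illustration and has no capture groups; with capture groups, re.findall returns tuples of the groups rather than the full match.

```python
import re

# Sample data copied from the question.
lst = [
    'MEASUREMENT   K02313  New York',
    'MEASUREMENT   K02338  London [BC:2.7.7.7]',
    'MEASUREMENT   K14761  Kairo [BC:1.2.-.-]',
    'MEASUREMENT   K03629  Berlin',
    'MEASUREMENT   K02470  Paris [BC:5.6.2.-]',
    'MEASUREMENT   K02469  Madrid [BC:5.43.2.2]',
]

# Simplified pattern (an assumption): "BC:" followed by digits, dots, or hyphens.
pattern = r"BC:[\d.-]+"

codes = []
for s in lst:
    # re.findall is called on one string at a time and returns a list of
    # matches, which is empty for strings without a BC: entry.
    codes.extend(re.findall(pattern, s))

print(codes)
# → ['BC:2.7.7.7', 'BC:1.2.-.-', 'BC:5.6.2.-', 'BC:5.43.2.2']
```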
