I have a list of strings looking like this:
['MEASUREMENT K02313 New York',\
'MEASUREMENT K02338 London [BC:2.7.7.7]',\
'MEASUREMENT K14761 Kairo [BC:1.2.-.-]',\
'MEASUREMENT K03629 Berlin',\
'MEASUREMENT K02470 Paris [BC:5.6.2.-]',\
'MEASUREMENT K02469 Madrid [BC:5.43.2.2]',\
....]
As you can see, some elements in the list contain a string of the format BC:x.x.x.x, where each x is either a number from 0-999 or a hyphen ("-").
Now I want to get a separate list containing all of these BC:x.x.x.x elements.
I tried using a regular expression:
re.findall(r"BC:([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|-).([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|-).([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|-).([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|-)", list_name)
but it doesn't work; I get the following error message:
TypeError: expected string or bytes-like object
CodePudding user response:
Use a list comprehension along with re.search:
import re
inp = ['MEASUREMENT K02313 New York', 'MEASUREMENT K02338 London [BC:2.7.7.7]', 'MEASUREMENT K14761 Kairo [BC:1.2.-.-]', 'MEASUREMENT K03629 Berlin', 'MEASUREMENT K02470 Paris [BC:5.6.2.-]', 'MEASUREMENT K02469 Madrid [BC:5.43.2.2]']
output = [re.search(r'\[BC:.*?\]', x).group() for x in inp if '[BC:' in x]
print(output)
This prints:
['[BC:2.7.7.7]', '[BC:1.2.-.-]', '[BC:5.6.2.-]', '[BC:5.43.2.2]']
CodePudding user response:
Your code doesn't work because you're passing a list of strings to re.findall, which expects a single string. If you want to use a single command, join the list into one string first:
re.findall(r"BC:[\d.-]+", ' '.join(list_name))
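As a rough, runnable sketch of this join-then-findall idea on the sample data from the question: the name list_name is taken from the question, and the character class [\d.-] is an assumption chosen so that hyphen placeholders are kept as well. It is deliberately looser than the asker's pattern and does not check the 0-999 range.

```python
import re

# Sample data from the question.
list_name = ['MEASUREMENT K02313 New York',
             'MEASUREMENT K02338 London [BC:2.7.7.7]',
             'MEASUREMENT K14761 Kairo [BC:1.2.-.-]',
             'MEASUREMENT K03629 Berlin',
             'MEASUREMENT K02470 Paris [BC:5.6.2.-]',
             'MEASUREMENT K02469 Madrid [BC:5.43.2.2]']

# re.findall needs one string, so join the items with a space first.
joined = ' '.join(list_name)

# Loose pattern (assumption): "BC:" followed by digits, dots and hyphens.
codes = re.findall(r"BC:[\d.-]+", joined)
print(codes)
# ['BC:2.7.7.7', 'BC:1.2.-.-', 'BC:5.6.2.-', 'BC:5.43.2.2']
```

Since the pattern has no capture groups, re.findall returns the full matches rather than group contents.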
CodePudding user response:
You can use your pattern without capture groups as you only want the match.
Then get the match for each item in the list using a list comprehension (assuming there is one BC: item between square brackets per string).
The pattern can be shortened with a quantifier {3} that repeats the dot followed by a number 0-999 without leading zeros, and one alternative can be dropped by adding the hyphen to the first character class, [0-9-]:
\bBC:(?:[0-9-]|[1-9][0-9]|[1-9][0-9][0-9])(?:\.(?:[0-9-]|[1-9][0-9]|[1-9][0-9][0-9])){3}
import re
lst = ['MEASUREMENT K02313 New York',
'MEASUREMENT K02338 London [BC:2.7.7.7]',
'MEASUREMENT K14761 Kairo [BC:1.2.-.-]',
'MEASUREMENT K03629 Berlin',
'MEASUREMENT K02470 Paris [BC:5.6.2.-]',
'MEASUREMENT K02469 Madrid [BC:5.43.2.2]'
]
pattern = r"\bBC:(?:[0-9-]|[1-9][0-9]|[1-9][0-9][0-9])(?:\.(?:[0-9-]|[1-9][0-9]|[1-9][0-9][0-9])){3}"
print([m.group() for s in lst for m in [re.search(pattern, s)] if m])
Output
['BC:2.7.7.7', 'BC:1.2.-.-', 'BC:5.6.2.-', 'BC:5.43.2.2']
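One caveat that may be worth checking: because the single-character alternative [0-9-] is tried first, re.search can stop early when the last component has more than one digit. The sketch below illustrates this with a made-up input string (not from the question) and shows one possible guard, a negative lookahead (?!\d) appended to the pattern. This is only one option; reordering the alternatives would be another.

```python
import re

# Pattern from the answer above.
original = r"\bBC:(?:[0-9-]|[1-9][0-9]|[1-9][0-9][0-9])(?:\.(?:[0-9-]|[1-9][0-9]|[1-9][0-9][0-9])){3}"

# Possible guard (assumption, not part of the answer): the match must not
# be followed by another digit, which forces the longer alternatives.
guarded = original + r"(?!\d)"

# Hypothetical item whose last component has two digits.
sample = "MEASUREMENT K00001 Rome [BC:5.43.2.25]"

truncated = re.search(original, sample).group()
full = re.search(guarded, sample).group()

print(truncated)  # BC:5.43.2.2
print(full)       # BC:5.43.2.25
```

On the six sample strings from the question, both patterns give the same result, since every final component there is a single character.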
CodePudding user response:
It seems that you are passing a list to re.findall, which expects a string. Try iterating over the elements of the list and passing each one to re.findall.
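A minimal sketch of that loop, assuming the list is called list_name as in the question. The pattern here is a simplified stand-in: it uses non-capturing groups so that re.findall returns whole matches, and \d{1,3} for brevity, which (unlike the asker's pattern) also accepts leading zeros.

```python
import re

# Sample data from the question.
list_name = ['MEASUREMENT K02313 New York',
             'MEASUREMENT K02338 London [BC:2.7.7.7]',
             'MEASUREMENT K14761 Kairo [BC:1.2.-.-]',
             'MEASUREMENT K03629 Berlin',
             'MEASUREMENT K02470 Paris [BC:5.6.2.-]',
             'MEASUREMENT K02469 Madrid [BC:5.43.2.2]']

# Simplified pattern (assumption): four components separated by dots,
# each either 1-3 digits or a hyphen.
pattern = r"BC:(?:\d{1,3}|-)(?:\.(?:\d{1,3}|-)){3}"

codes = []
for item in list_name:
    # Each item is a string, so re.findall accepts it; items without
    # a BC code simply contribute an empty list.
    codes.extend(re.findall(pattern, item))

print(codes)
# ['BC:2.7.7.7', 'BC:1.2.-.-', 'BC:5.6.2.-', 'BC:5.43.2.2']
```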