I have list_a and string_tmp like this:
list_a = ['AA', 'BB', 'CC']
string_tmp = 'Hi AA How Are You'
I want to find out whether any of the words in string_tmp is an item of list_a. If it is, type = L1, else type = L2. How can I do that?
# for example
type = ''
for k in string_tmp.split():
    if k in list_a:
        type = 'L1'
if len(type) == 0:
    type = 'L2'
This is a simplified version of the real problem. In my project len(list_a) = 200,000 and len(string_tmp) = 10,000, so I need this to be very fast.
# this is the output of the example
type = 'L1'
Answer:
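Converting the reference list and the string's tokens to sets should improve performance, since a membership test on a set takes constant time on average instead of scanning the whole list. Something like this: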
list_a = ['AA', 'BB', 'CC']
string_tmp = 'Hi AA How Are You'
def get_type(s, r):  # s is the string, r is the reference list
    s = set(s.split())  # whitespace-separated tokens of the string
    r = set(r)          # reference items, for O(1) average lookups
    return 'L1' if any(x in r for x in s) else 'L2'

print(get_type(string_tmp, list_a))
Output:
L1
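Because list_a is the large, fixed input in the question, it may help to build the set once and reuse it for every string, and to let set.isdisjoint do the check, since it accepts any iterable and returns as soon as it finds a shared element. A minimal sketch under that assumption; get_type_prebuilt is a hypothetical name, not from the answer above.

```python
def get_type_prebuilt(s, ref):
    # ref is a set/frozenset built once from the reference list.
    # isdisjoint consumes s.split() directly and stops at the first common element.
    return 'L2' if ref.isdisjoint(s.split()) else 'L1'

list_a = ['AA', 'BB', 'CC']
ref = frozenset(list_a)  # built once, reused for every string

print(get_type_prebuilt('Hi AA How Are You', ref))  # L1
print(get_type_prebuilt('Hi DD How Are You', ref))  # L2
```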
Answer:
Using regex along with a list comprehension we can try:
import re

list_a = ['AA', 'BB', 'CC']
string_tmp = 'Hi AA How Are You'
# one result per item of list_a: 'L1' if that item occurs as a whole word
output = ['L1' if re.search(r'\b' + re.escape(x) + r'\b', string_tmp) else 'L2' for x in list_a]
print(output)  # ['L1', 'L2', 'L2']
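The comprehension returns one result per item of list_a, while the question asks for a single type. A minimal sketch that collapses it with any(), which stops at the first match; get_type_regex is a hypothetical name. It still runs one regex search per list item, so with 200,000 items it may be noticeably slower than a single compiled pattern or a set lookup.

```python
import re

def get_type_regex(s, items):
    # 'L1' as soon as any item occurs in s as a whole word, otherwise 'L2'
    return 'L1' if any(re.search(r'\b' + re.escape(x) + r'\b', s) for x in items) else 'L2'

list_a = ['AA', 'BB', 'CC']
print(get_type_regex('Hi AA How Are You', list_a))  # L1
print(get_type_regex('Hi DD How Are You', list_a))  # L2
```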
Answer:
Efficiency depends on which of the two inputs varies least. For instance, if list_a stays the same but you have many different strings to test, it may be worth turning that list into a single regular expression once and then reusing it for each string.
Here is a solution where you create an instance of a class for a given list, then reuse that instance for different strings:
import re

class Matcher:
    def __init__(self, lst):
        # one alternation of all (escaped) keys, anchored at word boundaries
        self.regex = re.compile(r"\b(" + "|".join(re.escape(key) for key in lst) + r")\b")

    def typeof(self, s):
        return "L1" if self.regex.search(s) else "L2"

# demo
list_a = ['AA', 'BB', 'CC']
matcher = Matcher(list_a)
string_tmp = 'Hi AA How Are You'
print(matcher.typeof(string_tmp))  # L1
string_tmp = 'Hi DD How Are You'
print(matcher.typeof(string_tmp))  # L2
A side effect of this regular expression is that it also matches words that have punctuation right next to them. For instance, the code above still returns "L1" for the string 'Hi AA, How Are You' (with the added comma).
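The split()-based approaches behave differently here: 'Hi AA, How Are You'.split() yields the token 'AA,', which does not match 'AA'. If the set-based approach should also tolerate adjacent punctuation, one option (an assumption, not part of the answers above) is to tokenize with re.findall(r'\w+', s). This only works if every item in list_a consists of word characters (letters, digits, underscore); the function names below are hypothetical.

```python
import re

list_a = ['AA', 'BB', 'CC']
ref = frozenset(list_a)

def get_type_split(s):
    # Whitespace tokens: 'AA,' stays 'AA,' and does not match 'AA'
    return 'L2' if ref.isdisjoint(s.split()) else 'L1'

def get_type_words(s):
    # Runs of word characters: punctuation is dropped, so 'AA,' yields 'AA'
    return 'L2' if ref.isdisjoint(re.findall(r'\w+', s)) else 'L1'

s = 'Hi AA, How Are You'
print(get_type_split(s))  # L2
print(get_type_words(s))  # L1
```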