I am using any() to check whether a longer string (description) contains any of the keywords from several lists. The code works, but it feels like an inefficient way of doing the comparison, and I would like feedback on how to make it more efficient.
def convert_category(description):
    categoryFood = ['COUNTDOWN', 'BAKE', 'MCDONALDS', 'ST PIERRE', 'PAK N SAVE', 'NEW WORLD']
    categoryDIY = ['BUNNINGS', 'MITRE10']
    containsFood = any(keyword in description for keyword in categoryFood)
    containsDIY = any(keyword in description for keyword in categoryDIY)
    if containsFood:
        return 'Food and Groceries'
    elif containsDIY:
        return 'Home and DIY'
    return ''
CodePudding user response:
I would use a regular expression. They are optimized for this kind of problem - searching for any of multiple strings - and the hot part of the code is pushed into a fast library. With big enough strings you should notice the difference.
import re
foodPattern = '|'.join(map(re.escape, categoryFood))
diyPattern = '|'.join(map(re.escape, categoryDIY))
containsFood = re.search(foodPattern, description) is not None
containsDIY = re.search(diyPattern, description) is not None
You can easily extend this with word boundaries (\b) or similar features to make the keyword matching smarter, for example so that keywords only match whole words.
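As a rough sketch of the whole-word idea, assuming the keyword lists from the question: the names build_pattern, FOOD_PATTERN and DIY_PATTERN below are made up for illustration, and the patterns are compiled once at module level instead of on every call. Note that with \b on both sides, 'BAKE' no longer matches inside 'BAKERY', which may or may not be what you want.

```python
import re

# Keyword lists copied from the question.
CATEGORY_FOOD = ['COUNTDOWN', 'BAKE', 'MCDONALDS', 'ST PIERRE', 'PAK N SAVE', 'NEW WORLD']
CATEGORY_DIY = ['BUNNINGS', 'MITRE10']


def build_pattern(keywords):
    """Compile one regex that matches any keyword as a whole word (hypothetical helper)."""
    # re.escape guards against regex metacharacters inside the keywords.
    alternatives = '|'.join(map(re.escape, keywords))
    # \b on both sides restricts matches to whole words.
    return re.compile(r'\b(?:' + alternatives + r')\b')


# Compiled once, reused for every description.
FOOD_PATTERN = build_pattern(CATEGORY_FOOD)
DIY_PATTERN = build_pattern(CATEGORY_DIY)


def convert_category(description):
    # search() scans the whole description and returns None when nothing matches.
    if FOOD_PATTERN.search(description):
        return 'Food and Groceries'
    if DIY_PATTERN.search(description):
        return 'Home and DIY'
    return ''
```

If substring matches such as 'BAKERY' should still count, dropping the two \b anchors gives the same behaviour as the original `in` checks.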
CodePudding user response:
From the sounds of things, the only way to make this faster is the negligible gain from returning earlier in some branches. Marking as answered and closing.
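A minimal sketch of what returning early could look like for the function in the question, assuming the same keyword lists (moved to module-level constants here, which is my own choice). The original evaluates both any() calls before branching; this version skips the DIY scan whenever a food keyword has already matched.

```python
# Keyword lists copied from the question, defined once instead of on every call.
CATEGORY_FOOD = ('COUNTDOWN', 'BAKE', 'MCDONALDS', 'ST PIERRE', 'PAK N SAVE', 'NEW WORLD')
CATEGORY_DIY = ('BUNNINGS', 'MITRE10')


def convert_category(description):
    # any() stops at the first keyword found, and the early return
    # means the DIY keywords are never checked for a food match.
    if any(keyword in description for keyword in CATEGORY_FOOD):
        return 'Food and Groceries'
    if any(keyword in description for keyword in CATEGORY_DIY):
        return 'Home and DIY'
    return ''
```

The results are the same as in the original; for lists this short the saving is unlikely to be measurable.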