Making multiple "any" more efficient

Time:05-15

I am using any() to check whether a longer string (description) contains any of the keywords in several lists. The code works, but it feels like an inefficient way to do the comparison, and I would like feedback on how to make it more efficient.

def convert_category(description):
    categoryFood = ['COUNTDOWN', 'BAKE', 'MCDONALDS', 'ST PIERRE', 'PAK N SAVE', 'NEW WORLD']
    categoryDIY = ['BUNNINGS', 'MITRE10']

    containsFood = any(keyword in description for keyword in categoryFood)
    containsDIY = any(keyword in description for keyword in categoryDIY)

    if containsFood:
        return 'Food and Groceries'
    elif containsDIY:
        return 'Home and DIY'
    return ''

CodePudding user response:

I would use a regular expression. Regular expressions are designed for this kind of problem, searching for any of several strings at once, and the hot part of the code is pushed into the re library instead of running as a Python-level loop. With long enough descriptions you should notice the difference.

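import re

# categoryFood and categoryDIY are the keyword lists from the question.
# re.escape keeps any special characters in a keyword literal.
foodPattern = '|'.join(map(re.escape, categoryFood))
diyPattern = '|'.join(map(re.escape, categoryDIY))

# re.search returns None when no keyword occurs anywhere in description.
containsFood = re.search(foodPattern, description) is not None
containsDIY = re.search(diyPattern, description) is not None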

You can easily extend this with word boundaries (\b) or similar features so that the keyword matching is smarter and only matches whole words.
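As a rough sketch of the word-boundary idea, the patterns could be compiled once at module level and wrapped in \b on each side. The helper name build_pattern is hypothetical, and compiling up front is an assumption about how the function is called (many descriptions, same keyword lists); the keyword lists are copied from the question.

```python
import re

# Keyword lists copied from the question.
categoryFood = ['COUNTDOWN', 'BAKE', 'MCDONALDS', 'ST PIERRE', 'PAK N SAVE', 'NEW WORLD']
categoryDIY = ['BUNNINGS', 'MITRE10']


def build_pattern(keywords):
    """Compile one regex matching any keyword as a whole word (hypothetical helper)."""
    alternation = '|'.join(map(re.escape, keywords))
    # \b requires a word boundary on both sides of the matched keyword.
    return re.compile(r'\b(?:' + alternation + r')\b')


# Compiled once here rather than on every call to convert_category.
foodPattern = build_pattern(categoryFood)
diyPattern = build_pattern(categoryDIY)


def convert_category(description):
    if foodPattern.search(description):
        return 'Food and Groceries'
    if diyPattern.search(description):
        return 'Home and DIY'
    return ''
```

Note that whole-word matching changes behaviour compared with the substring test in the question: 'BAKE' would no longer match inside 'BAKERY', which may or may not be what you want.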

CodePudding user response:

From the sound of things, the only way to make this faster is to return earlier, as soon as one category matches, and the gain from that is negligible. Marking as answered and closing.
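A minimal sketch of what returning earlier might look like, assuming the categories are checked in a fixed order and the first match wins. The CATEGORIES table is a hypothetical restructuring of the two lists from the question, not something the original code defines.

```python
# Hypothetical table built from the lists in the question.
# Order matters: the first category with a matching keyword wins.
CATEGORIES = [
    ('Food and Groceries',
     ['COUNTDOWN', 'BAKE', 'MCDONALDS', 'ST PIERRE', 'PAK N SAVE', 'NEW WORLD']),
    ('Home and DIY',
     ['BUNNINGS', 'MITRE10']),
]


def convert_category(description):
    for label, keywords in CATEGORIES:
        # any() stops at the first matching keyword, and returning here
        # skips every category that comes later in the table.
        if any(keyword in description for keyword in keywords):
            return label
    return ''
```

With only two short lists the saving is tiny, but the table also makes it easy to add a new category without writing another any() line.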
