Is there a better way to capture all the regex patterns in matching with nested lists within a dicti

Time:09-21

I am working on a simple text-matching task: I scraped the titles of blog posts and want to match each title against my pre-defined categories whenever it contains specific keywords.

So for example, the title of the blog post is

"Capture Perfect Night Shots with the Oppo Reno8 Series"

Since "oppo" is listed among my keywords, the title should be assigned to the "phone" category. My categories are defined like so:

categories = {
    "phone": ['apple', 'oppo', 'xiaomi', 'samsung', 'huawei', 'nokia'],
    "postpaid": ['signature', 'postpaid'],
    "prepaid": ['power all', 'giga'],
    "sku": ['data', 'smart bro'],
    "ewallet": ['gigapay'],
    "event": ['gigafest'],
    "software": ['ios', 'android', 'macos', 'windows'],
    "subculture": ['anime', 'korean', 'kpop', 'gaming', 'pop', 'culture', 'lgbtq', 'binge', 'netflix', 'games', 'ml', 'apple music'],
    "health": ['workout', 'workouts', 'exercise', 'exercises'],
    "crypto": ['axie', 'bitcoin', 'coin', 'crypto', 'cryptocurrency', 'nft'],
    "virtual": ['metaverse', 'virtual'],
}

Then my dataframe would look like this (the image from the original post is not available):

The advantage of this approach is that a title can have multiple categories assigned to it (see the 2nd title).
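One possible way to do the matching described above is to compile a single case-insensitive regex per category and collect every category whose pattern is found in the title. This is only a sketch, not the asker's code: the helper names build_patterns and match_categories are made up for illustration, the dictionary is an abridged copy of the one above, and the use of word boundaries (so that "pop" does not match inside a longer word) is an assumption about the desired behaviour.

```python
import re

# Abridged copy of the categories dict from the question.
categories = {
    "phone": ["apple", "oppo", "xiaomi", "samsung", "huawei", "nokia"],
    "software": ["ios", "android", "macos", "windows"],
    "subculture": ["anime", "kpop", "gaming", "pop", "netflix", "ml", "apple music"],
    "crypto": ["axie", "bitcoin", "coin", "crypto", "nft"],
}


def build_patterns(categories):
    """Compile one case-insensitive regex per category (hypothetical helper)."""
    patterns = {}
    for name, keywords in categories.items():
        # Longer keywords first, so "apple music" is tried before "apple".
        ordered = sorted(keywords, key=len, reverse=True)
        alternation = "|".join(re.escape(keyword) for keyword in ordered)
        # \b keeps keywords from matching inside longer words (an assumption).
        patterns[name] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def match_categories(title, patterns):
    """Return every category whose pattern occurs in the title."""
    return [name for name, pattern in patterns.items() if pattern.search(title)]


patterns = build_patterns(categories)
print(match_categories("Capture Perfect Night Shots with the Oppo Reno8 Series", patterns))
# ['phone']
```

If the titles live in a pandas dataframe, something like df["title"].apply(lambda t: match_categories(t, patterns)) should produce a column of category lists, assuming the column is named "title". Because each category is checked independently, a title that mentions keywords from several categories gets all of them.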
