Intersect split string with partial words on list (possibly with regex)

Time:03-16

I have two lists:

keywords = ['critic', 'argu', 'dog', 'cat']
splitSentences = ['Add', 'critical', 'argument', 'birds']

I need to find how many words in splitSentences begin with one of the words in keywords. In my example, that would be 2 (critical matching "critic" and argument matching "argu").

The problem is that doing set(keywords).intersection(splitSentences) returns 0. I tried prefixing every word in keywords with ^, but it still returns 0.

Apologies, I'm quite new to Python. I'm working in a Jupyter notebook.

CodePudding user response:

With regex:

import re

for i in keywords:
    count = 0
    # Anchor the keyword at the start of the word
    pref = '^' + i
    for word in splitSentences:
        if re.match(pref, word):
            count += 1
    # Number of words that start with this keyword
    print(count)
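
The loop above prints one count per keyword. If a single total is wanted, as in the question, one possible sketch joins the keywords into one alternation pattern. The names `pattern` and `total` are illustrative, and `re.escape` is added on the assumption that the keywords should be treated as literal text rather than regex syntax.

```python
import re

keywords = ['critic', 'argu', 'dog', 'cat']
splitSentences = ['Add', 'critical', 'argument', 'birds']

# Build one pattern such as "critic|argu|dog|cat"; re.escape keeps any
# special characters in a keyword literal (an assumption about intent).
pattern = re.compile('|'.join(re.escape(k) for k in keywords))

# re.match only matches at the start of the string, so no '^' is needed.
total = sum(1 for word in splitSentences if pattern.match(word))
print(total)  # 2
```

Each word is counted at most once here, even if it starts with more than one keyword.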

The semi one liner:

for i in keywords:
    print(sum([1 for word in splitSentences if word.startswith(i)]))

The one liner:

print({el:sum([1 for word in splitSentences if word.startswith(el)]) for el in keywords})
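
For the single total asked for in the question, a minimal sketch can rely on the fact that `str.startswith` accepts a tuple of prefixes. The names `prefixes` and `total` are illustrative and not part of the original answer.

```python
keywords = ['critic', 'argu', 'dog', 'cat']
splitSentences = ['Add', 'critical', 'argument', 'birds']

# str.startswith accepts a tuple and returns True if any prefix matches.
prefixes = tuple(keywords)

# Count each word once, no matter how many keywords it starts with.
total = sum(1 for word in splitSentences if word.startswith(prefixes))
print(total)  # 2
```

The comparison is case-sensitive, so 'Critical' would not match 'critic' unless both sides are lowercased first.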

CodePudding user response:

keywords = ['critic', 'argu', 'dog', 'cat']
splitSentences = ['Add', 'critical', 'argument', 'birds']

for s in splitSentences:
  for k in keywords:
    if s.startswith(k):
      print(s)

Pretty much self-explanatory. Iterate on splitSentences and for each word in splitSentences iterate on keywords and check if it starts with the keyword.

One-liner:

[s for k in keywords for s in splitSentences if s.startswith(k)]

Time complexity: O(sk), where s is the number of words and k is the number of keywords. A trie data structure would be more efficient: O(s + k).
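
The trie idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `build_trie` and `starts_with_keyword` and the `END` marker are hypothetical, and the trie is a plain nested dict rather than any library structure.

```python
END = None  # marker key: some keyword ends at this node

def build_trie(keywords):
    """Build a nested-dict trie, one level per character."""
    root = {}
    for keyword in keywords:
        node = root
        for ch in keyword:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def starts_with_keyword(trie, word):
    """Return True if any keyword in the trie is a prefix of word."""
    node = trie
    for ch in word:
        if END in node:       # a full keyword has already been consumed
            return True
        node = node.get(ch)
        if node is None:      # no keyword continues with this character
            return False
    return END in node        # the word itself is exactly a keyword

keywords = ['critic', 'argu', 'dog', 'cat']
splitSentences = ['Add', 'critical', 'argument', 'birds']

trie = build_trie(keywords)
total = sum(1 for word in splitSentences if starts_with_keyword(trie, word))
print(total)  # 2
```

Building the trie takes time proportional to the total length of the keywords, and each lookup walks at most the length of the word, so the work no longer grows with the number of word and keyword pairs. For short lists like the ones in the question, the plain startswith loop is likely simpler and fast enough.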
