Get words in a text up to any stop word, improving code

Time:07-27

I am trying to get the list of words in a text, reading backwards from a particular position (in this example, just the last position), up to the first stopword encountered (I have a list of stopwords).

The code I have is this:

stopwords = ['one','this','or']
mytext    = 'this is a text with a car more than this other blue moon name'

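result = []
# Walk the words from the last one backwards.
for word in mytext.split()[::-1]:
    if word not in stopwords:
        result.append(word)
    else:
        # Stop at the first stopword met going backwards.
        break

# result holds the words in reverse order; flip them back before joining.
print(' '.join(result[::-1]))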

This works perfectly: result is "other blue moon name". Still, I have a feeling (which I cannot prove) that there should be a better way than this chunky code for such a small task.

Any ideas for a one-liner?
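The question mentions starting from a particular position, while the code above always starts from the last word. Here is a minimal sketch of that generalization, assuming the position is given as a word index end (exclusive) into mytext.split(); the function name words_before and the end parameter are hypothetical, not from the question. Instead of reversing the list twice, it searches backwards for the nearest stopword and slices once.

```python
def words_before(text, stopwords, end=None):
    """Return the words before index `end` (exclusive), scanning backwards
    and stopping at the first stopword encountered.

    Hypothetical helper; assumes 0 <= end <= len(text.split()).
    """
    words = text.split()
    if end is None:
        end = len(words)          # default: start from the last word
    stop = set(stopwords)         # set membership is O(1) on average
    # Index of the nearest stopword before `end`, or -1 if there is none.
    cut = next((i for i in range(end - 1, -1, -1) if words[i] in stop), -1)
    return " ".join(words[cut + 1:end])


stopwords = ['one', 'this', 'or']
mytext = 'this is a text with a car more than this other blue moon name'
print(words_before(mytext, stopwords))         # other blue moon name
print(words_before(mytext, stopwords, end=8))  # is a text with a car more
```

If no stopword precedes the position, cut stays -1 and every word up to end is returned, which matches what the original loop does.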

Answer:

There's actually a reasonable way to do it in a one-liner using itertools:

from itertools import takewhile

stopwords = ['one','this','or']
mytext    = 'this is a text with a car more than this other blue moon name'

# Take words from the end until a stopword appears, then restore their order.
result = " ".join(list(takewhile(lambda x: x not in stopwords, reversed(mytext.split())))[::-1])
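The same takewhile idea can be wrapped in a small function for reuse and checked against the edge cases. This is a minimal sketch; the function name tail_after_stopword is hypothetical, and converting stopwords to a set is an optional tweak that makes each membership test O(1) on average.

```python
from itertools import takewhile


def tail_after_stopword(text, stopwords):
    """Words at the end of `text` that follow the last stopword."""
    stop = set(stopwords)
    # Walk the words from the end and keep them until a stopword appears.
    tail = takewhile(lambda w: w not in stop, reversed(text.split()))
    # `tail` yields words in reverse order, so reverse them back.
    return " ".join(reversed(list(tail)))


stopwords = ['one', 'this', 'or']
print(tail_after_stopword('this is a text with a car more than this other blue moon name', stopwords))
# other blue moon name
print(repr(tail_after_stopword('blue moon this', stopwords)))  # '' (last word is a stopword)
print(tail_after_stopword('blue moon name', stopwords))         # blue moon name (no stopword)
```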

It might be easier with a regex, though:

import re

stopwords = ['one','this','or']
mytext    = 'this is a text with a car more than this other blue moon name'

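# Construct the regex from the stopwords instead of writing it by hand.
# \b word boundaries stop 'or' from matching inside words such as 'more',
# and re.escape guards against stopwords containing regex metacharacters.
# Written by hand this would be r'.*\b(?:one|this|or)\b\W*(.*)'
rstr = r'.*\b(?:' + '|'.join(map(re.escape, stopwords)) + r')\b\W*(.*)'
match = re.match(rstr, mytext)
# If the text contains no stopword, keep the whole text (like the loop does).
result = match.group(1) if match else mytext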
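One thing to watch with a regex built from a bare alternation such as (?:one|this|or): it matches stopwords as substrings, so "or" also matches inside words like "more" or "doorbell". Because the leading .* is greedy, the last such match wins. The sketch below compares a bare alternation with one wrapped in \b word boundaries; the helper name tail_by_regex and the sample sentence are made up for illustration.

```python
import re

stopwords = ['one', 'this', 'or']
alternation = "|".join(map(re.escape, stopwords))

# Bare alternation: stopwords can match in the middle of other words.
loose = re.compile(rf'.*(?:{alternation})\W*(.*)')
# Word boundaries: only whole-word stopwords count.
strict = re.compile(rf'.*\b(?:{alternation})\b\W*(.*)')


def tail_by_regex(pattern, text):
    """Text after the last stopword matched by `pattern`, or the whole text if none."""
    match = pattern.match(text)
    return match.group(1) if match else text


text = 'a car this blue doorbell'
print(tail_by_regex(loose, text))    # bell  ('or' matched inside 'doorbell')
print(tail_by_regex(strict, text))   # blue doorbell
```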

Answer:

The only loop-free solution I can think of is the following, though unfortunately it requires NumPy:

import numpy as np

#input
stopwords = ['one','this','or']
mytext    = 'this is a text with a car more than this other blue moon name'

#ONELINER
" ".join(mytext.split(" ")[-np.isin(mytext.split(" ")[::-1], stopwords).argmax():])

#output
'other blue moon name'
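One edge case to be aware of in the NumPy one-liner: when the last word is itself a stopword, argmax() returns 0, and since -0 is just 0 the slice keeps the whole list instead of producing an empty string. Below is a sketch of a variant that finds the position of the last stopword with np.flatnonzero instead; the function name tail_np is hypothetical.

```python
import numpy as np


def tail_np(text, stopwords):
    """Words after the last stopword in `text` (all words if there is none)."""
    words = text.split()
    # Indices of every word that is a stopword.
    hits = np.flatnonzero(np.isin(words, stopwords))
    # Start just after the last stopword, or at 0 when there is none.
    start = int(hits[-1]) + 1 if hits.size else 0
    return " ".join(words[start:])


stopwords = ['one', 'this', 'or']
print(tail_np('this is a text with a car more than this other blue moon name', stopwords))
# other blue moon name
print(repr(tail_np('blue moon this', stopwords)))  # ''
```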