I have a function to remove stopwords, like this. How can I make it run faster?
```python
# `temp` and `vietnamese` are the asker's stopword lists, defined elsewhere (not shown)
temp.reverse()
def drop_stopwords(text):
    for x in temp:
        if len(x.split()) > 1:
            text_list = text.split()
            for y in range(len(text_list)-len(x.split())):
                if " ".join(text_list[y:y+len(x.split())]) == x:
                    del text_list[y:y+len(x.split())]
                    text = " ".join(text_list)
        else:
            text = " ".join(text for text in text.split() if text not in vietnamese)
    return text
```
Processing one text from my data takes 14 s. With a trick like the one below, the time drops to 3 s:
```python
temp.reverse()
def drop_stopwords(text):
    for x in temp:
        if len(x.split()) > 2:
            # the trick: plain substring replacement for phrases of 3+ words
            if x in text:
                text = text.replace(x, '')
        elif len(x.split()) > 1:
            text_list = text.split()
            for y in range(len(text_list)-len(x.split())):
                if " ".join(text_list[y:y+len(x.split())]) == x:
                    del text_list[y:y+len(x.split())]
                    text = " ".join(text_list)
        else:
            text = " ".join(text for text in text.split() if text not in vietnamese)
    return text
```
But I think this may give wrong results in some cases for my language. How can I rewrite this function in Python to make it faster? (In C and C++ I could solve it easily with the function above.)
Your function does a lot of the same thing over and over, particularly repeated `split` and `join` of the same text. Doing a single `split`, operating on the list, and then doing a single `join` at the end might be faster, and would definitely lead to simpler code. Unfortunately I don't have any of your sample data to test the performance with, but hopefully this gives you something to experiment with:
```python
temp = ["foo", "baz ola"]

def drop_stopwords(text):
    text_list = text.split()
    text_len = len(text_list)
    for word in temp:
        word_list = word.split()
        word_len = len(word_list)
        # slide a window of word_len words across the text
        for i in range(text_len + 1 - word_len):
            if text_list[i:i + word_len] == word_list:
                # blank out the match; None entries are dropped in the final join
                text_list[i:i + word_len] = [None] * word_len
    return ' '.join(t for t in text_list if t)

print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
```
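If the stopword list is long, much of the time probably goes into looping over every stopword for every text. One way around that, sketched here as an assumption rather than as the method above, is to group the phrases by word count into sets of tuples and make a single left-to-right pass over the words, trying the longest phrase length first at each position. The cost then grows with the number of distinct phrase lengths rather than the number of stopwords. `STOPWORDS`, `build_index`, `INDEX` and `LENGTHS` are hypothetical names; the question's `temp` and `vietnamese` lists would be merged into `STOPWORDS`. Because matching is greedy and left to right, results may differ from the loop above when two stopword phrases overlap.

```python
from collections import defaultdict

# Hypothetical merged stopword list (single words and multi-word phrases)
STOPWORDS = ["foo", "baz ola"]

def build_index(stopwords):
    """Group stopword phrases by word count, stored as tuples for set lookups."""
    index = defaultdict(set)
    for phrase in stopwords:
        words = tuple(phrase.split())
        if words:
            index[len(words)].add(words)
    # Try longer phrases first so "baz ola" would win over a single-word "baz"
    lengths = sorted(index, reverse=True)
    return index, lengths

# Build once, reuse for every text
INDEX, LENGTHS = build_index(STOPWORDS)

def drop_stopwords(text):
    words = text.split()
    kept = []
    i = 0
    while i < len(words):
        for n in LENGTHS:
            if tuple(words[i:i + n]) in INDEX[n]:
                i += n  # skip the whole matched phrase
                break
        else:
            kept.append(words[i])  # no stopword phrase starts at this word
            i += 1
    return " ".join(kept)

print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
```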
You could also just try iteratively doing `text.replace` in all cases and seeing how that performs compared to your more complex `split`-based solution:
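```python
temp = ["foo", "baz ola"]

def drop_stopwords(text):
    for word in temp:
        # note: str.replace also matches inside longer words ("foo" in "food")
        text = text.replace(word, '')
    # collapse the whitespace left behind by the removals
    return ' '.join(text.split())

print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
```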
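Plain `str.replace` is fast but also removes stopwords that appear inside longer words, which may be the kind of wrong result the question worries about. A possible middle ground, sketched here under the assumption that words in the text are separated by whitespace, swaps in a single compiled regular expression: the alternation lists phrases longest first, and the `(?<!\S)` / `(?!\S)` lookarounds only allow a match bounded by whitespace or the ends of the string. `STOPWORDS` and `PATTERN` are hypothetical names, not part of the code above.

```python
import re

# Hypothetical merged stopword list (single words and multi-word phrases)
STOPWORDS = ["foo", "baz ola"]

# Longest phrases first, so a multi-word phrase is preferred over a shorter
# stopword it starts with. The lookarounds require whitespace (or the string
# edge) on both sides, so "foo" does not match inside "food".
PATTERN = re.compile(
    r"(?<!\S)(?:"
    + "|".join(re.escape(w) for w in sorted(STOPWORDS, key=len, reverse=True))
    + r")(?!\S)"
)

def drop_stopwords(text):
    # Normalize runs of whitespace so "baz  ola" still matches "baz ola"
    normalized = " ".join(text.split())
    # One regex pass, then collapse the spaces left behind by the removals
    return " ".join(PATTERN.sub("", normalized).split())

print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
print(drop_stopwords("food for the foo"))
```

The pattern is compiled once per stopword list rather than once per text; whether it beats the list-based loop on real data is something to measure, for example with `timeit`.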