What is the most efficient way to grab and store part of a given string between keywords with python

Time:02-23

I have an array of keywords:

keyword_list = ['word1', 'anotherWord', 'wordup', 'word to your papa']

I have a string of text:

string_of_text = 'So this is a string of text. I want to talk about anotherWord...and then I\'m going to say something I\'ve been meaning to say "wordup". But I also wanted to say the following: word to your papa. And lastly I wanted to talk about word1...'

I want to return the following:

[{'list_word': 'word1', 'string_of_text_after': '...'}, {'list_word': 'anotherWord', 'string_of_text_after': '...and then I\'m going to say something I\'ve been meaning to say "'}, {'list_word': 'wordup', 'string_of_text_after': '". But I also wanted to say the following: '}, {'list_word': 'word to your papa', 'string_of_text_after': '. And lastly I wanted to talk about '}]

As you can see, it is a list of dictionaries, each holding a keyword from the list and the text that follows it, up to the point where the next keyword is detected.

What would be the most efficient way to do this in Python (Python 3 or later; Python 2 is also OK if there are any issues with deprecated methods)?

CodePudding user response:

You could try something like this:

keyword_list = ['word1', 'anotherWord', 'wordup', 'word to your papa']

string_of_text = """So this is a string of text.  I want to talk about anotherWord...\
                  and then I'm going to say something I've been meaning to say "wordup".\
                  But I also wanted to say the following: word to your papa.\
                  And lastly I wanted to talk about word1..."""


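def t(k, t):
    # Map each keyword to its length so the slice can start right after the keyword.
    tmp = {i: len(i) for i in k}
    # find() returns -1 when the keyword is absent; 0 is a valid match at the very start of the text.
    return [{"list_word": i, "string_of_text_after": t[t.find(i) + tmp[i]:]} for i in tmp if t.find(i) != -1]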

from pprint import pprint

pprint(t(keyword_list,string_of_text))

Result:

[{'list_word': 'wordup',
  'string_of_text_after': '".                  But I also wanted to say the following: word to your papa.                  And lastly I wanted to talk about word1...'},
 {'list_word': 'word1', 'string_of_text_after': '...'},
 {'list_word': 'anotherWord',
  'string_of_text_after': '...                  and then I\'m going to say something I\'ve been meaning to say "wordup".                  But I also wanted to say the following: word to your papa.                  And lastly I wanted to talk about word1...'},
 {'list_word': 'word to your papa',
  'string_of_text_after': '.                  And lastly I wanted to talk about word1...'}]
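In the result above, every string_of_text_after runs to the end of the text, while the question asks for it to stop at the next keyword. Here is a minimal sketch of one way to get that, assuming a single regular-expression scan is acceptable. The name split_after_keywords and the shortened sample text are made up for illustration and are not part of the code above:

```python
import re


def split_after_keywords(keywords, text):
    """Return one dict per keyword occurrence, in order of appearance.

    Each 'string_of_text_after' stops where the next keyword starts.
    Hypothetical helper, not the code from the answer above.
    """
    if not keywords:
        return []
    # Try longer keywords first so a keyword that is a prefix of another cannot shadow it.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    matches = list(pattern.finditer(text))
    result = []
    # Pair every match with the one that follows it; the last match is paired with None.
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following is not None else len(text)
        result.append({
            "list_word": current.group(),
            "string_of_text_after": text[current.end():end],
        })
    return result


keyword_list = ['word1', 'anotherWord', 'wordup', 'word to your papa']
sample_text = 'talk about anotherWord...then say "wordup". Finally word1...'
for item in split_after_keywords(keyword_list, sample_text):
    print(item)
```

The text is scanned once, and a keyword that occurs several times produces one entry per occurrence. The entries come back in the order the keywords appear in the text, not in the order of keyword_list.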

ATTENTION

The function t above has several caveats:

  1. keyword_list has to contain unique elements (the dict silently drops duplicates), and only the first occurrence of each keyword in the text is used
  2. t.find(i) is called twice for each keyword, once in the filter and once in the slice
  3. the function returns a list, which is held in memory in full; you could avoid this by returning a generator instead, like this:

return ({"list_word": i, "string_of_text_after": t[t.find(i) + tmp[i]:]} for i in tmp if t.find(i) != -1) and then consume it where and when needed.
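The second and third caveats can be handled together. This is a rough sketch, assuming a generator function is acceptable in place of the list; iter_text_after is a hypothetical name chosen for illustration:

```python
def iter_text_after(keywords, text):
    """Lazily yield one dict per keyword found in text.

    Hypothetical helper: str.find is called once per keyword, and nothing is
    stored beyond the dict currently being yielded.
    """
    for word in keywords:
        pos = text.find(word)  # -1 means the keyword is not in the text
        if pos != -1:
            yield {
                "list_word": word,
                "string_of_text_after": text[pos + len(word):],
            }


# Items are produced one at a time, only when the loop asks for them.
for item in iter_text_after(['word1', 'wordup'], 'say "wordup" and then word1...'):
    print(item)
```

Like the original function, this still returns everything up to the end of the text and only looks at the first occurrence of each keyword.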

Good luck! :)
