Get values after keyword until next keyword from string

Time:09-28

I have a string:

entry: 1.0 - 2.0 stop:1.0 tp: 3.0, 4.0 risk:   medium, type: Long

The string is originally a list of words that I combine with a simple for loop:

final_string = ""
for element in string_list:
    final_string += element + " "

string_list would be: ("entry:", "1.0", "-", "2.0", "stop:1.0", "tp:", "3.0,", "4.0", "risk:", "medium,", "type:", "Long")

I want to extract each value into its own variable. The result would be:

entry = " 1.0 - 2.0 "
stop = "1.0 "
tp = " 3.0, 4.0 "
risk = "   medium, "
type = " Long"

At first I wanted to append each word to a string until I hit a member of the list of all possible keywords: ["entry:", "stop:", "tp:", "risk:", "type:", "info:", "lev:"]. But the only implementation I could come up with was inefficient, and it missed values that weren't separated from the keyword by a space (such as stop:1.0).

CodePudding user response:

Regex is actually a good tool for this, if we use a lookahead.

import re

st = "entry: 1.0 - 2.0 stop:1.0 tp: 3.0, 4.0 risk:   medium, type: Long"

keywords = dict(re.findall(r'(\w*)\:(.*?)(?=\w*\:|$)', st))
# {'entry': ' 1.0 - 2.0 ', 
#  'stop': '1.0 ', 
#  'tp': ' 3.0, 4.0 ', 
#  'risk': '   medium, ', 
#  'type': ' Long'}

To break down the regex used:

(\w*)\:(.*?)(?=\w*\:|$)

(\w*)                    capture continuous word characters (no spaces)
     \:                  followed by a literal ':', which we ignore
       (.*?)             capture any character, non-greedy
            (?=       )  and stop capturing when the following is ahead
               \w*\:     another keyword (continuous word characters followed by ':')
                    |    or
                     $   the end of the string

If you want the : to be included as part of the keyword itself, then just move the \: to inside the first capture of the regular expression: (\w*\:)(.*?)(?=\w*\:|$)
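
One limitation of this pattern is that any run of word characters followed by ':' counts as a keyword, so a value such as 12:30 would be split in two. If the set of keywords is known in advance, as the list in the question suggests, one possible variant is to spell those keywords out in the pattern. The following is a minimal sketch under that assumption; the names KEYWORDS, pattern, and fields are illustrative, and stripping the values is an optional extra step.

```python
import re

# Keyword list taken from the question (without the trailing ':').
KEYWORDS = ["entry", "stop", "tp", "risk", "type", "info", "lev"]

# Alternation of the known keywords, e.g. "entry|stop|tp|...".
alternation = "|".join(map(re.escape, KEYWORDS))

# Same shape as the pattern above, but only a known keyword followed by ':'
# can start a field or end the previous value.
pattern = re.compile(rf"\b({alternation}):(.*?)(?=\b(?:{alternation}):|$)")

st = "entry: 1.0 - 2.0 stop:1.0 tp: 3.0, 4.0 risk:   medium, type: Long"

# Strip the surrounding whitespace from each value while building the dict.
fields = {key: value.strip() for key, value in pattern.findall(st)}
print(fields)
# {'entry': '1.0 - 2.0', 'stop': '1.0', 'tp': '3.0, 4.0', 'risk': 'medium,', 'type': 'Long'}
```

With this variant, a hypothetical input like "info: 12:30 lev: 5" keeps 12:30 together as the value of info, because 12 is not in the keyword list.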

CodePudding user response:

You shouldn't really be creating separate variables to hold each key (see How do I create a variable number of variables). It's much better to create a dictionary with those keys.

keys = []
vals = []

for item in string_list:
    if ":" in item:
        # Split item by :
        kk = item.split(":", 1)
        # First element of kk is the key, so 
        # Add a new key
        keys.append(kk[0])
        # Add a new list to hold values for this key
        # Any subsequent elements of kk are part of the value. 
        vals.append(kk[1:])
    else:
        # Append item to the last element of vals (the most recent key)
        vals[-1].append(item)

Now, you have:

keys = ['entry', 'stop', 'tp', 'risk', 'type']

vals = [['', '1.0', '-', '2.0'],
 ['1.0'],
 ['', '3.0,', '4.0'],
 ['', 'medium,'],
 ['', 'Long']]

To create your dictionary, iterate over keys and vals together and join the items of each vals entry with str.join(). The if v_elem condition in the generator expression filters out the blank strings left behind by items such as "entry:", which have nothing after the colon.

result = {
    k: " ".join(v_elem for v_elem in v if v_elem) 
    for k, v in zip(keys, vals)
}

Which gives the following dict:

{'entry': '1.0 - 2.0',
 'stop': '1.0',
 'tp': '3.0, 4.0',
 'risk': 'medium,',
 'type': 'Long'}
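
The two steps above can also be folded into a single function. This is only a sketch: parse_fields is a hypothetical helper name, and it makes two choices the loop above does not. Words that appear before the first key are skipped instead of raising an IndexError, and a key that appears twice has its values merged rather than overwritten.

```python
def parse_fields(words):
    """Group words under the most recent 'key:' token (hypothetical helper)."""
    result = {}
    current = None  # list of value parts for the key seen most recently
    for word in words:
        if ":" in word:
            # Split on the first ':' only; 'rest' is empty for items like "entry:".
            key, _, rest = word.partition(":")
            current = result.setdefault(key, [])
            if rest:
                current.append(rest)
        elif current is not None:
            current.append(word)
        # Words seen before any key are skipped (an assumption of this sketch).
    return {key: " ".join(parts) for key, parts in result.items()}


string_list = ("entry:", "1.0", "-", "2.0", "stop:1.0", "tp:",
               "3.0,", "4.0", "risk:", "medium,", "type:", "Long")
print(parse_fields(string_list))
# {'entry': '1.0 - 2.0', 'stop': '1.0', 'tp': '3.0, 4.0', 'risk': 'medium,', 'type': 'Long'}
```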