In python, find tokens in line

Time:03-20

A long time ago I wrote a tool that parses text files line by line and performs actions depending on commands and conditions in the file. I used regex for this; however, I was never good at regex.

A line holding a condition looks like this:

[type==STRING]

And the regex I use is:

re.compile(r'^[^\[\]]*\[([^\]\[=]*)==([^\]\[=]*)\][^\]\[]*$', re.MULTILINE)

This regex gives me the keyword "type" and the value "STRING".

However, now I need to update my tool to allow more than one condition per line, e.g.

[type==STRING][amount==0]

I need to update my regex to give me two pairs of results: one pair type/STRING and one pair amount/0. But I'm lost on this. My regex above gets me zero results with this line.

Any ideas how to do this?

CodePudding user response:

You could either match a second pair of groups:

^[^\[\]]*\[([^\]\[=]*)==([^\]\[=]*)\][^\]\[]*(?:\[([^\]\[=]*)==([^\]\[=]*)\][^\]\[]*)?$


Or you can omit the anchors and the [^\[\]]* parts, so that the pattern matches once per condition and you get the group 1 and group 2 values for every match:

\[([^\]\[=]*)==([^\]\[=]*)\]

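As a rough sketch of how the unanchored pattern could be used from Python: re.findall returns one tuple per match when the pattern has two groups, so each condition comes back as a (keyword, value) pair. The names PAIR_RE and find_conditions are made up for this illustration and are not from the original tool.

```python
import re

# Unanchored pattern: matches one [keyword==value] condition at a time.
PAIR_RE = re.compile(r'\[([^\]\[=]*)==([^\]\[=]*)\]')

def find_conditions(line):
    """Return a list of (keyword, value) tuples, one per condition in the line."""
    # With two capture groups, findall yields a tuple of both groups per match.
    return PAIR_RE.findall(line)

print(find_conditions('[type==STRING][amount==0]'))
```

A line with a single condition still works with this approach; it simply yields a list with one pair, and a line without any condition yields an empty list.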

CodePudding user response:

Is it a requirement that you use regex? You can alternatively accomplish this pretty easily using the split function twice and stripping the first opening and last closing bracket.

line_to_parse = "[type==STRING]"

# omit the first and last char before splitting
pairs = line_to_parse[1:-1].split("][") 
for pair in pairs:
    x, y = pair.split("==")
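For a line with several conditions, the same split idea might be wrapped up like this. It is only a sketch: the function name parse_conditions is hypothetical, and it assumes the line consists of nothing but bracketed conditions, with no text before, between, or after them.

```python
def parse_conditions(line):
    """Split a line like '[a==b][c==d]' into (keyword, value) tuples."""
    line = line.strip()
    # Assumption: the whole line is made of bracketed conditions.
    if not (line.startswith('[') and line.endswith(']')):
        return []
    pairs = []
    # Drop the outer brackets, then split on the '][' between conditions.
    for chunk in line[1:-1].split(']['):
        key, sep, value = chunk.partition('==')
        if sep:  # skip chunks that contain no '=='
            pairs.append((key, value))
    return pairs

print(parse_conditions('[type==STRING][amount==0]'))
```

Using partition instead of a second split avoids an unpacking error when a chunk contains no '==' or more than one.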

CodePudding user response:

It rather depends on the precise "rules" that describe your data. However, for your given data, why not:

import re

text = '[type==STRING][amount==0]'
words = re.findall(r'\w+', text)
lst = []
for i in range(0, len(words), 2):
    lst.append((words[i], words[i + 1]))

print(lst)

Output:

[('type', 'STRING'), ('amount', '0')]