This is the string I am reading from a text file in Python:
String= '''/begin FUNCTION N1ame1 Some texts and special char
/begin Variable_1name1
Adlkfj_dADDF_A32111 BAd_afd111
/end Variable_1name1
/begin Variable_1name2
Adlkfj_dADDF_A32222 BAd_afd222
/end Variable_1name2
/begin Variable_1name3
Adlkfj_dADDF_A32333 BAd333_afd333 333DSFadss
/end Variable_1name3
FUNCTION_DFADS
/end FUNCTION
/begin FUNCTION N2ame2 Sometexts and special char "dlfkjaodfja;lkd
/begin Variable_1name1
Adlkfj_dADDF_A32111_1 BAd_afd111_1
/end Variable_1name1
/begin Variable_1name2
Adlkfj_dADDF_A32222_2 BAd_afd222_2
/end Variable_1name2
/begin Variable_1name3
Adlkfj_dADDF_A32333_3 BAd333_afd333_3 333DSFadss_3
/end Variable_1name3
FUNCTION_DFADS
/end FUNCTION'''
I need to extract the data for a single FUNCTION name and one or more Variable names.
Example 1: for /begin FUNCTION N1ame1 and /begin Variable_1name1, the expected output is:
['Adlkfj_dADDF_A32111','BAd_afd111']
Example 2: for /begin FUNCTION N2ame2 and /begin Variable_1name1, /begin Variable_1name3,
the expected output is (a two-dimensional list would also be fine):
['Adlkfj_dADDF_A32111_1', 'BAd_afd111_1','Adlkfj_dADDF_A32333_3','BAd333_afd333_3','333DSFadss_3']
I tried basic pattern matching with split and findall, but matching a pattern that spans two lines is difficult:
res = re.findall('begin(.*?)end', String, re.DOTALL)
print([s.split() for s in re.findall(r'/Variable_1name1\s+((?:.+\n)+?)/end', String)])
I already have a solution that uses a for loop, but I am trying to do this with a regex to understand the concept. Is it possible with a pattern like this?
CodePudding user response:
Just basic text parsing.
String= '''/begin FUNCTION N1ame1 Some texts and special char
/begin Variable_1name1
Adlkfj_dADDF_A32111 BAd_afd111
/end Variable_1name1
/begin Variable_1name2
Adlkfj_dADDF_A32222 BAd_afd222
/end Variable_1name2
/begin Variable_1name3
Adlkfj_dADDF_A32333 BAd333_afd333 333DSFadss
/end Variable_1name3
FUNCTION_DFADS
/end FUNCTION
/begin FUNCTION N2ame2 Sometexts and special char "dlfkjaodfja;lkd
/begin Variable_1name1
Adlkfj_dADDF_A32111_1 BAd_afd111_1
/end Variable_1name1
/begin Variable_1name2
Adlkfj_dADDF_A32222_2 BAd_afd222_2
/end Variable_1name2
/begin Variable_1name3
Adlkfj_dADDF_A32333_3 BAd333_afd333_3 333DSFadss_3
/end Variable_1name3
FUNCTION_DFADS
/end FUNCTION'''
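data = {}
variable = False  # True while we are inside a /begin Variable ... /end block
for line in String.splitlines():
    parts = line.split()
    if not parts:
        continue  # skip blank lines so parts[0] cannot fail
    if parts[0] == "/begin":
        if parts[1] == "FUNCTION":
            # Start a new entry keyed by the function name.
            funcname = parts[2]
            data[funcname] = []
        elif parts[1].startswith('Variable'):
            variable = True
    elif parts[0] == "/end":
        variable = False
    elif variable:
        # A data line inside a variable block: store its tokens.
        data[funcname].append(parts)

from pprint import pprint
pprint(data)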
Output:
{'N1ame1': [['Adlkfj_dADDF_A32111', 'BAd_afd111'],
            ['Adlkfj_dADDF_A32222', 'BAd_afd222'],
            ['Adlkfj_dADDF_A32333', 'BAd333_afd333', '333DSFadss']],
 'N2ame2': [['Adlkfj_dADDF_A32111_1', 'BAd_afd111_1'],
            ['Adlkfj_dADDF_A32222_2', 'BAd_afd222_2'],
            ['Adlkfj_dADDF_A32333_3', 'BAd333_afd333_3', '333DSFadss_3']]}
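The result above groups values by function only, so it cannot yet answer "give me Variable_1name1 and Variable_1name3 of one function". A possible extension, as a rough sketch: remember the current variable name as well and store the tokens in a nested dict. The names parse_blocks, select and sample are made up for this illustration, and the sketch assumes that variable blocks always start with "/begin Variable..." and are never nested.

```python
# Shortened copy of the input from the question (assumed format).
sample = '''/begin FUNCTION N1ame1 Some texts and special char
/begin Variable_1name1
Adlkfj_dADDF_A32111 BAd_afd111
/end Variable_1name1
/begin Variable_1name3
Adlkfj_dADDF_A32333 BAd333_afd333 333DSFadss
/end Variable_1name3
FUNCTION_DFADS
/end FUNCTION'''


def parse_blocks(text):
    """Return {function_name: {variable_name: [token, ...]}}."""
    data = {}
    func = None  # name of the FUNCTION block currently being read
    var = None   # name of the Variable block currently being read
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if parts[0] == "/begin" and len(parts) > 1:
            if parts[1] == "FUNCTION" and len(parts) > 2:
                func = parts[2]
                data[func] = {}
            elif func is not None and parts[1].startswith("Variable"):
                var = parts[1]
                data[func][var] = []
        elif parts[0] == "/end":
            if len(parts) > 1 and parts[1] == "FUNCTION":
                func = None
            var = None
        elif func is not None and var is not None:
            # Data line inside a variable block: keep its tokens.
            data[func][var].extend(parts)
    return data


def select(data, func, variables):
    """Flatten the tokens of the requested variables of one function."""
    return [token for name in variables for token in data[func][name]]


parsed = parse_blocks(sample)
print(select(parsed, "N1ame1", ["Variable_1name1"]))
# ['Adlkfj_dADDF_A32111', 'BAd_afd111']
```

If a two-dimensional result is preferred, [parsed[func][name] for name in variables] gives one inner list per variable instead of a flat list.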
CodePudding user response:
text= '''/begin FUNCTION N1ame1 Some texts and special char
/begin Variable_1name1
Adlkfj_dADDF_A32111 BAd_afd111
/end Variable_1name1
/begin Variable_1name2
Adlkfj_dADDF_A32222 BAd_afd222
/end Variable_1name2
/begin Variable_1name3
Adlkfj_dADDF_A32333 BAd333_afd333 333DSFadss
/end Variable_1name3
FUNCTION_DFADS
/end FUNCTION
/begin FUNCTION N2ame2 Sometexts and special char "dlfkjaodfja;lkd
/begin Variable_1name1
Adlkfj_dADDF_A32111_1 BAd_afd111_1
/end Variable_1name1
/begin Variable_1name2
Adlkfj_dADDF_A32222_2 BAd_afd222_2
/end Variable_1name2
/begin Variable_1name3
Adlkfj_dADDF_A32333_3 BAd333_afd333_3 333DSFadss_3
/end Variable_1name3
FUNCTION_DFADS
/end FUNCTION'''
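import re

# Find each block of lines between the "/begin Variable_" and "/end Variable_" markers.
# re.DOTALL lets "." match newlines, so a single match can span several lines.
lst = re.findall(r'/begin Variable_(.*?)\s*/end Variable_', text, re.DOTALL)
# Collapse every run of whitespace (including the newline) into a single space.
print([re.sub(r'\s+', ' ', x) for x in lst])
Output: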
['1name1 Adlkfj_dADDF_A32111 BAd_afd111', '1name2 Adlkfj_dADDF_A32222 BAd_afd222', '1name3 Adlkfj_dADDF_A32333 BAd333_afd333 333DSFadss', '1name1 Adlkfj_dADDF_A32111_1 BAd_afd111_1', '1name2 Adlkfj_dADDF_A32222_2 BAd_afd222_2', '1name3 Adlkfj_dADDF_A32333_3 BAd333_afd333_3 333DSFadss_3']
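The list above mixes the blocks of both functions and still carries the variable name in front of the values. To get what the question asks for with regex only, one option is to do it in two steps: first cut out the body of the requested FUNCTION, then search that body for each requested variable. This is only a sketch; the helper name extract and the shortened sample are made up here, and it assumes that blocks are not nested and that every "/begin FUNCTION" has a matching "/end FUNCTION".

```python
import re

# Shortened copy of the input from the question (assumed format).
sample = '''/begin FUNCTION N1ame1 Some texts and special char
/begin Variable_1name1
Adlkfj_dADDF_A32111 BAd_afd111
/end Variable_1name1
FUNCTION_DFADS
/end FUNCTION
/begin FUNCTION N2ame2 Sometexts and special char "dlfkjaodfja;lkd
/begin Variable_1name1
Adlkfj_dADDF_A32111_1 BAd_afd111_1
/end Variable_1name1
/begin Variable_1name3
Adlkfj_dADDF_A32333_3 BAd333_afd333_3 333DSFadss_3
/end Variable_1name3
FUNCTION_DFADS
/end FUNCTION'''


def extract(text, func, variables):
    """Return the tokens of the given variables inside one FUNCTION block."""
    # Step 1: capture everything between the function header line
    # and the first "/end FUNCTION" that follows it.
    func_pat = rf'/begin FUNCTION {re.escape(func)}(?=\s)[^\n]*\n(.*?)/end FUNCTION'
    func_match = re.search(func_pat, text, re.DOTALL)
    if func_match is None:
        return []
    body = func_match.group(1)

    # Step 2: inside that body, capture the lines of each requested variable.
    result = []
    for name in variables:
        var_pat = rf'/begin {re.escape(name)}\s+(.*?)\s*/end {re.escape(name)}'
        var_match = re.search(var_pat, body, re.DOTALL)
        if var_match is not None:
            result.extend(var_match.group(1).split())
    return result


print(extract(sample, "N2ame2", ["Variable_1name1", "Variable_1name3"]))
# ['Adlkfj_dADDF_A32111_1', 'BAd_afd111_1', 'Adlkfj_dADDF_A32333_3', 'BAd333_afd333_3', '333DSFadss_3']
```

re.escape keeps special characters in the names from being read as regex syntax, and the non-greedy (.*?) together with re.DOTALL is what lets a match cross line breaks without running on into the next block.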