Capture data between a difficult pattern in Python

Time:08-25

This is the string I am reading from a text file in Python:

    String= '''/begin FUNCTION N1ame1 Some texts and special char 
            /begin Variable_1name1  
                Adlkfj_dADDF_A32111 BAd_afd111
            /end Variable_1name1
            /begin Variable_1name2  
                Adlkfj_dADDF_A32222 BAd_afd222
            /end Variable_1name2
            /begin Variable_1name3    
                Adlkfj_dADDF_A32333 BAd333_afd333 333DSFadss
            /end Variable_1name3
            FUNCTION_DFADS
        /end FUNCTION
        /begin FUNCTION N2ame2 Sometexts and special char "dlfkjaodfja;lkd 
            /begin Variable_1name1  
                Adlkfj_dADDF_A32111_1 BAd_afd111_1
            /end Variable_1name1
            /begin Variable_1name2  
                Adlkfj_dADDF_A32222_2 BAd_afd222_2
            /end Variable_1name2
            /begin Variable_1name3  
                Adlkfj_dADDF_A32333_3 BAd333_afd333_3 333DSFadss_3
            /end Variable_1name3
            FUNCTION_DFADS
    /end FUNCTION'''

I need to extract the data for a single FUNCTION name and one or more Variable names:

Example 1: for /begin FUNCTION N1ame1 and /begin Variable_1name1, the expected output is:

    ['Adlkfj_dADDF_A32111','BAd_afd111']

Example 2: for /begin FUNCTION N2ame2 with /begin Variable_1name1 and /begin Variable_1name3, the expected output is (a two-dimensional list would also be fine):

    ['Adlkfj_dADDF_A32111_1', 'BAd_afd111_1','Adlkfj_dADDF_A32333_3','BAd333_afd333_3','333DSFadss_3']

I tried basic pattern matching with split and findall, but matching a pattern that spans two lines is difficult:

    import re

    res = re.findall('begin(.*?)end', String, re.DOTALL)

    print([s.split() for s in re.findall(r'/Variable_1name1\s+((?:.+\n)+?)/end ', String)])

I have a solution using a for loop, but I am trying to do it with regex to understand the concept. Is it possible with this pattern?
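A minimal sketch of the concept being asked about, assuming each variable block is identified by its full /begin and /end marker lines: with re.DOTALL, a lazy (.*?) can cross the line break and stop at the nearest matching /end line. The input here is a trimmed, hypothetical copy of the question's string. This pattern on its own is not scoped to a FUNCTION, so it returns the Variable_1name1 block from every function.

```python
import re

# Trimmed sample of the question's input (two FUNCTION blocks, one variable each)
String = '''/begin FUNCTION N1ame1 Some texts
    /begin Variable_1name1
        Adlkfj_dADDF_A32111 BAd_afd111
    /end Variable_1name1
/end FUNCTION
/begin FUNCTION N2ame2 Sometexts
    /begin Variable_1name1
        Adlkfj_dADDF_A32111_1 BAd_afd111_1
    /end Variable_1name1
/end FUNCTION'''

# re.DOTALL lets .*? match newlines; the lazy ? stops at the nearest closing marker,
# and \s+ / \s* absorb the newline and indentation around the captured tokens
pattern = r'/begin Variable_1name1\s+(.*?)\s*/end Variable_1name1'
blocks = [m.split() for m in re.findall(pattern, String, re.DOTALL)]
print(blocks)
# [['Adlkfj_dADDF_A32111', 'BAd_afd111'], ['Adlkfj_dADDF_A32111_1', 'BAd_afd111_1']]
```

Restricting the result to one function needs a second step, for example first isolating the text between /begin FUNCTION <name> and the next /end FUNCTION.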

Answer:

This is just basic line-by-line text parsing; no regex is needed.

String= '''/begin FUNCTION N1ame1 Some texts and special char 
        /begin Variable_1name1  
            Adlkfj_dADDF_A32111 BAd_afd111
        /end Variable_1name1
        /begin Variable_1name2  
            Adlkfj_dADDF_A32222 BAd_afd222
        /end Variable_1name2
        /begin Variable_1name3    
            Adlkfj_dADDF_A32333 BAd333_afd333 333DSFadss
        /end Variable_1name3
        FUNCTION_DFADS
    /end FUNCTION
    /begin FUNCTION N2ame2 Sometexts and special char "dlfkjaodfja;lkd 
        /begin Variable_1name1  
            Adlkfj_dADDF_A32111_1 BAd_afd111_1
        /end Variable_1name1
        /begin Variable_1name2  
            Adlkfj_dADDF_A32222_2 BAd_afd222_2
        /end Variable_1name2
        /begin Variable_1name3  
            Adlkfj_dADDF_A32333_3 BAd333_afd333_3 333DSFadss_3
        /end Variable_1name3
        FUNCTION_DFADS
/end FUNCTION'''

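from pprint import pprint

data = {}         # function name -> list of token lists, one per variable line
funcname = None   # name of the FUNCTION block currently being read
variable = False  # True while inside a /begin Variable ... /end block
for line in String.splitlines():
    parts = line.split()
    if not parts:  # skip blank lines instead of raising IndexError on parts[0]
        continue
    if parts[0] == "/begin":
        if parts[1] == "FUNCTION":
            funcname = parts[2]
            data[funcname] = []
        elif parts[1].startswith('Variable'):
            variable = True
    elif parts[0] == "/end":
        variable = False
    elif variable and funcname is not None:
        data[funcname].append(parts)

pprint(data)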

Output:

{'N1ame1': [['Adlkfj_dADDF_A32111', 'BAd_afd111'],
            ['Adlkfj_dADDF_A32222', 'BAd_afd222'],
            ['Adlkfj_dADDF_A32333', 'BAd333_afd333', '333DSFadss']],
 'N2ame2': [['Adlkfj_dADDF_A32111_1', 'BAd_afd111_1'],
            ['Adlkfj_dADDF_A32222_2', 'BAd_afd222_2'],
            ['Adlkfj_dADDF_A32333_3', 'BAd333_afd333_3', '333DSFadss_3']]}

Answer:

text= '''/begin FUNCTION N1ame1 Some texts and special char 
            /begin Variable_1name1  
                Adlkfj_dADDF_A32111 BAd_afd111
            /end Variable_1name1
            /begin Variable_1name2  
                Adlkfj_dADDF_A32222 BAd_afd222
            /end Variable_1name2
            /begin Variable_1name3    
                Adlkfj_dADDF_A32333 BAd333_afd333 333DSFadss
            /end Variable_1name3
            FUNCTION_DFADS
        /end FUNCTION
        /begin FUNCTION N2ame2 Sometexts and special char "dlfkjaodfja;lkd 
            /begin Variable_1name1  
                Adlkfj_dADDF_A32111_1 BAd_afd111_1
            /end Variable_1name1
            /begin Variable_1name2  
                Adlkfj_dADDF_A32222_2 BAd_afd222_2
            /end Variable_1name2
            /begin Variable_1name3  
                Adlkfj_dADDF_A32333_3 BAd333_afd333_3 333DSFadss_3
            /end Variable_1name3
            FUNCTION_DFADS
    /end FUNCTION'''

import re

# Find each block of lines between the /begin Variable_ and /end Variable_ markers;
# re.DOTALL lets .*? span newlines, and the lazy ? stops at the nearest /end marker
lst = re.findall(r'\/begin Variable_(.*?)\s*\/end Variable_', text, re.MULTILINE | re.DOTALL)

# Collapse each run of whitespace (newline plus indentation) into a single space
print([re.sub(r'\s+', ' ', x) for x in lst])

Output:

['1name1 Adlkfj_dADDF_A32111 BAd_afd111', '1name2 Adlkfj_dADDF_A32222 BAd_afd222', '1name3 Adlkfj_dADDF_A32333 BAd333_afd333 333DSFadss', '1name1 Adlkfj_dADDF_A32111_1 BAd_afd111_1', '1name2 Adlkfj_dADDF_A32222_2 BAd_afd222_2', '1name3 Adlkfj_dADDF_A32333_3 BAd333_afd333_3 333DSFadss_3']
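The findall above returns the variable blocks of both functions mixed together. A minimal sketch of scoping the same regex idea to the question's inputs, assuming FUNCTION blocks are never nested: first isolate the body of the requested FUNCTION block, then search only that body for each requested variable. The function name extract is hypothetical, and the sample is the question's string with normalized indentation.

```python
import re

# The question's input with normalized indentation
SAMPLE = '''/begin FUNCTION N1ame1 Some texts and special char
    /begin Variable_1name1
        Adlkfj_dADDF_A32111 BAd_afd111
    /end Variable_1name1
    /begin Variable_1name2
        Adlkfj_dADDF_A32222 BAd_afd222
    /end Variable_1name2
    /begin Variable_1name3
        Adlkfj_dADDF_A32333 BAd333_afd333 333DSFadss
    /end Variable_1name3
    FUNCTION_DFADS
/end FUNCTION
/begin FUNCTION N2ame2 Sometexts and special char "dlfkjaodfja;lkd
    /begin Variable_1name1
        Adlkfj_dADDF_A32111_1 BAd_afd111_1
    /end Variable_1name1
    /begin Variable_1name2
        Adlkfj_dADDF_A32222_2 BAd_afd222_2
    /end Variable_1name2
    /begin Variable_1name3
        Adlkfj_dADDF_A32333_3 BAd333_afd333_3 333DSFadss_3
    /end Variable_1name3
    FUNCTION_DFADS
/end FUNCTION'''


def extract(text, funcname, varnames):
    """Hypothetical helper: tokens of the requested variables inside one FUNCTION block."""
    # Body of the named FUNCTION block; the lazy .*? stops at the first /end FUNCTION
    func = re.search(r'/begin FUNCTION ' + re.escape(funcname) + r'\s(.*?)/end FUNCTION',
                     text, re.DOTALL)
    if func is None:
        return []
    body = func.group(1)
    tokens = []
    for name in varnames:
        # \s+ right after the name keeps Variable_1name1 from matching Variable_1name10
        m = re.search(r'/begin ' + re.escape(name) + r'\s+(.*?)\s*/end ' + re.escape(name),
                      body, re.DOTALL)
        if m:
            tokens.extend(m.group(1).split())
    return tokens


print(extract(SAMPLE, 'N1ame1', ['Variable_1name1']))
# ['Adlkfj_dADDF_A32111', 'BAd_afd111']
print(extract(SAMPLE, 'N2ame2', ['Variable_1name1', 'Variable_1name3']))
# ['Adlkfj_dADDF_A32111_1', 'BAd_afd111_1', 'Adlkfj_dADDF_A32333_3', 'BAd333_afd333_3', '333DSFadss_3']
```

Searching inside the isolated body, rather than writing one large pattern, keeps each regex simple and prevents a variable from one function being picked up while looking in another.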