Code to pull information before and after matching substring

Time:08-16

I built the regex in regex101, where it works, but when I run the same code in Python it does not work. Can anyone please help me correct the mistake? The code is:

> for file in glob.glob('*TxtMsg*'):
>     with open(file) as f:
>         names = []
>         contents = f.read()
>         if 'Nbsequence::findtest System.SerialNumber=' in contents:
>             print(re.match(r'(\w*)(?: Active\) Nbsequence::findtest System.SerialNumber=)(\d+)', contents))

The code identifies the required files in a location and reads them one by one to check whether each contains the required info. If it does, it applies the regex pattern to extract that info.

Sample data is

succeeded to get serial number, Module = ABC Active, SerialNumber = 8212____________
[00007841][2022-07-04 16:48:30.581][Info][P00800 SunnyDay][T18304]: DayModule::FetchSerialNumber, fetching via web service..., SlotId = 1, IpAddress = 156.185.1.21, Port = 3990
[00007842][2022-07-04 16:48:30.597][Info][P00800 SunnyDay][T13844]: (ABC Active) Nbsequence::findtest (System.SerialNumber) 
[00007843][2022-07-04 16:48:30.597][Info][P00800 SunnyDay][T13844]: (ABC Active) Nbsequence::findtest System.SerialNumber=8212____________
[00007844][2022-07-04 16:48:30.606][Info][P00800 SunnyDay][T13844]: (PQR Active) Nbsequence::findtest (System.SerialNumber) 
[00007845][2022-07-04 16:48:30.608][Info][P00800 SunnyDay][T13844]: (PDIM Active) Nbsequence::findtest (System.SerialNumber) 
[00007846][2022-07-04 16:48:30.613][Info][P00800 SunnyDay][T13844]: (PQR Active) Nbsequence::findtest System.SerialNumber=8198____________
[00007847][2022-07-04 16:48:30.615][Info][P00800 SunnyDay][T13844]: (WPC Activ

Appreciate your help in this

Expected result: ABC: 8212, PQR: 8198


CodePudding user response:

Try:

import re

s = """\
succeeded to get serial number, Module = ABC Active, SerialNumber = 8212____________
[00007841][2022-07-04 16:48:30.581][Info][P00800 SunnyDay][T18304]: DayModule::FetchSerialNumber, fetching via web service..., SlotId = 1, IpAddress = 156.185.1.21, Port = 3990
[00007842][2022-07-04 16:48:30.597][Info][P00800 SunnyDay][T13844]: (ABC Active) Nbsequence::findtest (System.SerialNumber) 
[00007843][2022-07-04 16:48:30.597][Info][P00800 SunnyDay][T13844]: (ABC Active) Nbsequence::findtest System.SerialNumber=8212____________
[00007844][2022-07-04 16:48:30.606][Info][P00800 SunnyDay][T13844]: (PQR Active) Nbsequence::findtest (System.SerialNumber) 
[00007845][2022-07-04 16:48:30.608][Info][P00800 SunnyDay][T13844]: (PDIM Active) Nbsequence::findtest (System.SerialNumber) 
[00007846][2022-07-04 16:48:30.613][Info][P00800 SunnyDay][T13844]: (PQR Active) Nbsequence::findtest System.SerialNumber=8198____________
[00007847][2022-07-04 16:48:30.615][Info][P00800 SunnyDay][T13844]: (WPC Active)"""

pat = re.compile(r"\((.*?) Active\).*?System\.SerialNumber=(\d+)")


for result in pat.findall(s):
    print(result)

Prints:

('ABC', '8212')
('PQR', '8198')

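Note: re.match only matches at the very beginning of the string, and \w does not match spaces or brackets (each line starts with a bracketed prefix that contains a space in the date-time part), so the pattern cannot match from there. re.search and re.findall scan the whole text instead.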
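To make the difference concrete, here is a minimal sketch using one line from the sample data. The variable names `line` and `pattern` are only illustrative, and the pattern is a simplified variant of the ones in this thread.

```python
import re

# One line taken from the sample log in the question.
line = (
    "[00007843][2022-07-04 16:48:30.597][Info][P00800 SunnyDay][T13844]: "
    "(ABC Active) Nbsequence::findtest System.SerialNumber=8212____________"
)

# Illustrative pattern: module name before " Active)", digits after "=".
pattern = re.compile(r"(\w+) Active\) Nbsequence::findtest System\.SerialNumber=(\d+)")

# re.match is anchored at position 0, where the line starts with "[",
# so it cannot match.
print(pattern.match(line))    # None

# re.search scans forward and returns the first match anywhere in the string.
found = pattern.search(line)
print(found.groups())         # ('ABC', '8212')
```

If the whole file is read into one string, re.findall (or re.finditer) returns every occurrence rather than only the first.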

CodePudding user response:

I edited your regex a little; this seems to work for me:

re.search(r'\(\w*(?: Active\) Nbsequence::findtest System.SerialNumber=)(\d+)', contents)


Here is the same regex, with the name and code added to a dictionary as requested:

import re

dictionary = {}
text = '(ABC Active) Nbsequence::findtest (System.SerialNumber) (ABC Active) Nbsequence::findtest System.SerialNumber=8212______ (PQR Active) Nbsequence::findtest (System.SerialNumber) (PDIM Active) Nbsequence::findtest (System.SerialNumber) (PQR Active) Nbsequence::findtest System.SerialNumber=8196____________'
# findall returns one (name, digits) tuple per matching entry
matches = re.findall(r'\((\w*) Active\) Nbsequence::findtest System.SerialNumber=(\d+)', text)
for match in matches:
    dictionary[match[0]] = match[1]
print(dictionary)
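The dictionary approach can be combined with the file loop from the question. The following is only a sketch under assumptions: the helper names `extract_serials` and `scan_files` are made up for illustration, and the `*TxtMsg*` glob is taken from the question's code.

```python
import glob
import re

# Capture the module name before " Active)" and the leading digits after "=".
SERIAL_RE = re.compile(
    r"\((\w+) Active\) Nbsequence::findtest System\.SerialNumber=(\d+)"
)

def extract_serials(text):
    """Return {module_name: serial_digits} for every matching entry in text."""
    # findall yields (name, digits) tuples, which dict() accepts directly.
    return dict(SERIAL_RE.findall(text))

def scan_files(file_glob="*TxtMsg*"):
    """Map each matching file name to the serial numbers found in it."""
    results = {}
    for path in glob.glob(file_glob):
        # errors="replace" keeps the scan going if a log has odd bytes.
        with open(path, errors="replace") as f:
            serials = extract_serials(f.read())
        if serials:
            results[path] = serials
    return results

if __name__ == "__main__":
    for path, serials in scan_files().items():
        print(path, serials)
```

Keeping the regex work in `extract_serials` means it can be checked against a plain string, such as the sample data, without touching any files.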