Extracting a string with a regular expression in a Python script

I tried to extract every 'Startpoint' match from my text file using a Python regular expression.

It did not display the result I wanted. I am a beginner and was trying to extract all matches of the string 'Startpoint' from my text file. Could someone please help?

import re 
with open('report.txt') as file: 
    f = file.readlines() 
    # pattern=(r'(Startpoint\:*\s[a-zA-Z]\w*)') 
    pattern=(r'([^S..t$]\:*\s[A-Z]\W*)') 
    print(pattern) 
    mat=re.match(r'([^S..t$]\:*\s[A-Z]\W*)',f,re.M|re.I) 
    print(mat)

I also tried the following, but because I used re.sub, it removes the 'S' from my string:

with open('report.txt') as file:
    text=file.read()
    f=file.readlines()
    pat=re.sub(r'([^S..t$]:*\s[A-Z]\W*)'," ",text)#for replacing
    print(pat)

I have included part of my text here:

exintf_max.txt:3870:  Startpoint: EXINTF_DATA0

exintf_max.txt:3920:  Startpoint: EXINTF_DATA9

exintf_max.txt:3972:  Startpoint: EXINTF_DATA11

exintf_max.txt:4022:  Startpoint: top/soc/xintf_cntrl/lv_temp_xintf_rg_chip_1_select_enable_reg

exintf_max.txt:4054:  Startpoint: EXINTF_DATA15

exintf_max.txt:4105:  Startpoint: top/soc/xintf_cntrl/lv_temp_xintf_rg_chip_0_select_enable_reg

exintf_max.txt:4137:  Startpoint: EXINTF_DATA1

exintf_max.txt:4189:  Startpoint: EXINTF_DATA6

exintf_max.txt:4241:  Startpoint: EXINTF_DATA10

exintf_max.txt:4290:  Startpoint: EXINTF_DATA10

exintf_max.txt:4341:  Startpoint: top/soc/xintf_cntrl/lv_temp_xintf_rg_XWR_reg

I got the lines above with the grep command; I want to get the same result from a Python script.

Please help me; I am a beginner.

CodePudding user response：

This checks a single line of that text; you can then use the same pattern inside a loop.

import re
# txt = "exintf_max.txt:4341: Startpoint: top/soc/xintf_cntrl/lv_temp_xintf_rg_XWR_reg"
txt = "exintf_max.txt:4290: Startpoint: EXINTF_DATA10"
m = re.match(r"[\w_]+\.\w+:(\d+):\s*(Startpoint):\s*(.*)", txt)
if m:
  print(m.groups())
  print(m.group(3))
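
To use that check inside a loop, something like the following sketch could work. The function name extract_startpoints and the sample list are made up for illustration; in practice the lines would come from iterating over the open file.

```python
import re

# Same shape as the single-line check above, compiled once for reuse.
# \s* tolerates the two spaces seen before "Startpoint" in the sample lines.
LINE_RE = re.compile(r"[\w_]+\.\w+:(\d+):\s*Startpoint:\s*(.*)")

def extract_startpoints(lines):
    """Return (line_number, startpoint) pairs for every matching line.

    Hypothetical helper, not part of the original answer.
    """
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())  # strip() drops the trailing newline
        if m:
            results.append((int(m.group(1)), m.group(2)))
    return results

# Illustrative input copied from the question's sample text.
sample = [
    "exintf_max.txt:3870:  Startpoint: EXINTF_DATA0",
    "",
    "exintf_max.txt:4341:  Startpoint: top/soc/xintf_cntrl/lv_temp_xintf_rg_XWR_reg",
]
print(extract_startpoints(sample))
```

Lines that do not match (such as the blank ones) are simply skipped, so the loop can run over the whole file.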

CodePudding user response：

To get the values in Python for the string Startpoint, you could use:

^[^:\n]*:\d+:\s*Startpoint:\s*(.+)

Explanation

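^ Start of string (with re.M, the start of each line)
[^:\n]* Optionally match any character except : or a newline
:\d+: Match 1 or more digits between colons
\s* Match optional whitespace chars
Startpoint: Match literally
\s*(.+) Match optional whitespace chars, then capture 1 or more characters (the rest of the line) in group 1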

If you want to match spaces without newlines, you can change \s to [^\S\n]
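
A small sketch of why that change can matter: \s also matches a newline, so if a Startpoint: line has nothing after the colon, \s* can run on into the next line. The two-line text below (including the file name a.txt) is invented for illustration and does not come from the question's report.

```python
import re

# Hypothetical input: the first line has an empty value after "Startpoint:".
text = "a.txt:1: Startpoint:\na.txt:2: Startpoint: EXINTF_DATA0\n"

# \s* may cross the newline and swallow the whole next line into the group.
crosses_lines = r"^[^:\n]*:\d+:\s*Startpoint:\s*(.+)"

# [^\S\n]* matches whitespace except newlines, so the match stays on one line.
same_line = r"^[^:\n]*:\d+:[^\S\n]*Startpoint:[^\S\n]*(.+)"

print(re.findall(crosses_lines, text, re.M))
# ['a.txt:2: Startpoint: EXINTF_DATA0']
print(re.findall(same_line, text, re.M))
# ['EXINTF_DATA0']
```

With the sample lines from the question, every Startpoint: has a value on the same line, so both patterns should give the same result there.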

Example code

import re

with open('report.txt') as file:
    f = file.read()
    pattern = r"^[^:\n]*:\d+:\s*Startpoint:\s*(.+)"
    print(re.findall(pattern, f, re.M))

Output

[
'EXINTF_DATA0',
'EXINTF_DATA9',
'EXINTF_DATA11',
'top/soc/xintf_cntrl/lv_temp_xintf_rg_chip_1_select_enable_reg', 
'EXINTF_DATA15', 
'top/soc/xintf_cntrl/lv_temp_xintf_rg_chip_0_select_enable_reg',
'EXINTF_DATA1',
'EXINTF_DATA6',
'EXINTF_DATA10',
'EXINTF_DATA10', 
'top/soc/xintf_cntrl/lv_temp_xintf_rg_XWR_reg'
]
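
For comparison with the attempt in the question, here is a short sketch of how re.match and re.findall differ. It uses a two-line string copied from the sample text rather than the real file, and the variable names are only illustrative.

```python
import re

text = (
    "exintf_max.txt:3870:  Startpoint: EXINTF_DATA0\n"
    "exintf_max.txt:3920:  Startpoint: EXINTF_DATA9\n"
)
pattern = r"^[^:\n]*:\d+:\s*Startpoint:\s*(.+)"

# re.match only tries at the very start of the string, so it yields
# at most one result no matter how many lines follow.
first = re.match(pattern, text)
print(first.group(1))  # EXINTF_DATA0

# re.findall scans the whole string; re.M lets ^ match at each line start.
all_values = re.findall(pattern, text, re.M)
print(all_values)  # ['EXINTF_DATA0', 'EXINTF_DATA9']

# Passing a list (what readlines() returns) instead of a string
# raises TypeError, because re functions expect str or bytes.
raised = False
try:
    re.match(pattern, text.splitlines(keepends=True))
except TypeError:
    raised = True
print(raised)  # True
```

This is why the example above reads the file with read() and passes the resulting single string to re.findall.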

CodePudding user response：

If you want to grab what comes after Startpoint: you can just do:

import re
from pathlib import Path

path = Path("report.txt")

pat = re.compile(r'.*Startpoint:\s*(\S*)')
print(pat.findall(path.read_text()))

The leading .* skips everything on the line before Startpoint:, \s* skips the whitespace after it, and (\S*) captures the run of non-whitespace characters that follows.

This outputs:

['EXINTF_DATA0', 'EXINTF_DATA9', 'EXINTF_DATA11',
 'top/soc/xintf_cntrl/lv_temp_xintf_rg_chip_1_select_enable_reg', 'EXINTF_DATA15',
'top/soc/xintf_cntrl/lv_temp_xintf_rg_chip_0_select_enable_reg', 
'EXINTF_DATA1', 'EXINTF_DATA6', 'EXINTF_DATA10', 'EXINTF_DATA10', 
'top/soc/xintf_cntrl/lv_temp_xintf_rg_XWR_reg']
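
If the goal is to reproduce the grep-style lines from the question (file name, line number, matching line) rather than only the values, a rough sketch might look like this. It assumes the input is the raw report and not the grep output itself; the helper name grep_lines, the label argument, and the "Endpoint" sample line are all hypothetical.

```python
import re

def grep_lines(lines, needle="Startpoint", label="report.txt"):
    """Return 'label:lineno:line' strings for lines containing needle.

    Roughly imitates grep's output with line numbers and a file name prefix.
    """
    pat = re.compile(re.escape(needle))  # treat needle as literal text
    return [
        f"{label}:{lineno}:{line.rstrip()}"
        for lineno, line in enumerate(lines, start=1)
        if pat.search(line)
    ]

# Illustrative stand-in for the lines of the report file.
sample = [
    "  Startpoint: EXINTF_DATA0\n",
    "  Endpoint: some_other_pin\n",
    "  Startpoint: EXINTF_DATA9\n",
]
for hit in grep_lines(sample):
    print(hit)
```

With a real file, the lines argument could be the open file object itself, since iterating over it yields one line at a time.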