I want to select groups of lines from a text file in order to get all the jobs related to a given IP reference (ipref). The test file looks like this: job numbers: (1, 2, 3), IP refs: (10, 12, 10).
text file :
1
... (several lines of text)
xxx 10
2
... (several lines of text)
xxx 12
3
... (several lines of text)
xxx 10
I want to select the job numbers for ipref = 10.
Code :
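#!/usr/bin/python
import re

# Read the whole file into a single string
with open('test2.xml', 'r') as fic:
    texte = fic.read()

# Attempt with a negative lookahead (not working):
#pattern = r'\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10'
# Lazy match from a digit up to the next "xxx 10"
pattern = r'\n?\d.*?xxx 10'
result = re.findall(pattern, texte, re.DOTALL)

# Print each match with a running counter
i = 1
for match in result:
    print("\nmatch:", i)
    i = i + 1
    print(match)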
Result :
match: 1
1
a
b
xxx 10
match: 2
1
a
b
xxx 12
1
a
b
xxx 10
I tried to replace .* with a negative lookahead assertion, so that a group is selected only if no expression like "\n?xxx \d{2}\n" occurs before "xxx 10":
pattern='\n?\d(?!(?:\n?xxx \d{2}\n)*)xxx 10'
but it does not work.
CodePudding user response:
You can write the pattern this way, repeating a newline followed by a line that is not xxx and 1 or more digits:
^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$
The pattern matches:
^ Start of string (with re.MULTILINE, the start of each line)
\d Match a single digit (or \d+ for 1 or more)
(?: Non-capture group
\n Match a newline
(?!xxx \d+$) Negative lookahead to assert that the line is not xxx followed by 1 or more digits
.* If the assertion is true, match the whole line
)* Close the group and optionally repeat it
\nxxx 10$ Match a newline, xxx and 10 at the end of the line
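As a quick check of the pattern above, here is a minimal sketch that runs it against an inline copy of the sample data instead of test2.xml. The names sample and blocks are only illustrative, and re.MULTILINE is assumed so that ^ and $ anchor at line boundaries.

```python
import re

# Inline copy of the sample data (assumption: one item per line,
# as printed in the question's output).
sample = "1\na\nb\nxxx 10\n1\na\nb\nxxx 12\n1\na\nb\nxxx 10\n"

# Pattern from the answer, written as a raw string.
pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

# No capturing groups, so findall returns the full text of each match.
blocks = re.findall(pattern, sample, re.MULTILINE)

for number, block in enumerate(blocks, start=1):
    print("\nmatch:", number)
    print(block)
```

Only the two groups that end in xxx 10 should be returned; the group that ends in xxx 12 cannot match because the lookahead stops the repetition at its last line.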
CodePudding user response:
Good day to you :) and thank you very much for your quick response! The result is below. Note: I replaced re.DOTALL with re.DOTALL|re.MULTILINE, because there is no match at all without re.MULTILINE. Sorry for the previous presentation, it was not very clear.
Text file :
1
a
b
xxx 10
1
a
b
xxx 12
1
a
b
xxx 10
Code with your pattern :
#!/usr/bin/python
import re

# Read the whole file into a single string
with open('test2.xml', 'r') as fic:
    texte = fic.read()
print(texte)

#pattern = r'<\/?(?!(?:span|br|b)(?: [^>]*)?>)[^>\/]*>'
#pattern = r'\n?\d(?!(?:\n?xxx \d{2}\n?)*?)xxx 10'
#pattern = r'\n?\d.*?xxx 10'
pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'
result = re.findall(pattern, texte, re.DOTALL | re.MULTILINE)

# Print each match with a running counter
i = 1
for match in result:
    print("\nmatch:", i)
    i = i + 1
    print(match)
Result :
match: 1
1
a
b
xxx 10
1
a
b
xxx 12
1
a
b
xxx 10
but I am trying to obtain:
match: 1
1
a
b
xxx 10
match: 2
1
a
b
xxx 10
CodePudding user response:
Thank you very much (you saved my day!). As you said:
pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'
result = re.findall(pattern, texte, re.MULTILINE)
Result: OK, the line group (1 ... xxx 12) is ignored. Note: I can adapt this to a case where line 1 gives job information and the "xxx 12" line gives printer IP information.
match: 1
1
a
b
xxx 10
match: 2
1
a
b
xxx 10
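The two runs differ only in the flags. Here is a minimal sketch of the comparison, assuming the same sample data inline (the variable names are illustrative). With re.DOTALL, the .* inside the group is allowed to cross newlines, so it runs past the xxx 12 line without the lookahead being checked again, and the three groups merge into a single match.

```python
import re

# Inline copy of the sample data (assumption: one item per line).
sample = "1\na\nb\nxxx 10\n1\na\nb\nxxx 12\n1\na\nb\nxxx 10\n"
pattern = r'^\d(?:\n(?!xxx \d+$).*)*\nxxx 10$'

# With DOTALL, "." also matches "\n", so ".*" can swallow several lines.
with_dotall = re.findall(pattern, sample, re.DOTALL | re.MULTILINE)

# Without DOTALL, ".*" stops at the end of each line and the lookahead
# is checked at the start of every following line.
multiline_only = re.findall(pattern, sample, re.MULTILINE)

print(len(with_dotall))     # 1
print(len(multiline_only))  # 2
```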
CodePudding user response:
file :
job_number job_id
1 10202
bla bla
bla bla bla
xxx 100.10.10.100
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100
bash script with embedded python script :
#!/bin/bash
# get_jobs_ip: print the job blocks for one printer
# $1 : ip of a printer, with the dots already escaped for the regex
get_jobs_ip ()
{
cat <<EOF | python
import re

with open('test3.xml', 'r') as fic:
    texte = fic.read()

# Pattern, shown here for ip="100\.10\.10\.100"
# (thank you to Fourth bird for the pattern!):
#   ^\d\s+\d+(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*\nxxx 100\.10\.10\.100$
# ^                              Start of a line (re.MULTILINE)
# \d\s+\d+                       A single-digit job number, whitespace, the job id
# (?:                            Non-capture group
# \n                             Match a newline
# (?!xxx \d+\.\d+\.\d+\.\d+$)    Negative lookahead: the line is not xxx followed by an IP address
# .*                             If the assertion is true, match the whole line
# )*                             Close the group and optionally repeat it
# \nxxx 100\.10\.10\.100$        Match a newline, xxx and the IP address

ip = r"$1"
pattern_template = r'^\d\s+\d+(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*\nxxx @ip@$'
pattern = pattern_template.replace('@ip@', ip)
result = re.findall(pattern, texte, re.MULTILINE)

# Print each match with a running counter
i = 1
for match in result:
    print("\nmatch:", i)
    i = i + 1
    print(match)
EOF
}
get_jobs_ip "100\.10\.10\.100"
get_jobs_ip "100\.10\.10\.102"
result :
match: 1
1 10202
bla bla
bla bla bla
xxx 100.10.10.100
match: 2
3 10204
bla bla bla
bla bla bla
xxx 100.10.10.100
match: 1
2 10203
bla bla
bla bla bla
bla bla bla
xxx 100.10.10.102
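The script above needs the caller to escape the dots of the IP by hand. One possible alternative, sketched here under assumptions, is to let re.escape build the literal part of the pattern. The function name jobs_for_ip and the inline sample are hypothetical, and the job number is matched with \d+ instead of \d so that numbers above 9 would also work; the rest of the pattern is the one used above.

```python
import re

def jobs_for_ip(text, ip):
    """Return the blocks of lines that end with the line 'xxx <ip>'."""
    pattern = (
        r'^\d+\s+\d+'                            # "job_number job_id" line
        r'(?:\n(?!xxx \d+\.\d+\.\d+\.\d+$).*)*'  # following lines that are not an IP line
        r'\nxxx ' + re.escape(ip) + r'$'         # closing IP line, dots escaped for us
    )
    return re.findall(pattern, text, re.MULTILINE)

# Inline copy of the sample file shown above (assumption: same layout as test3.xml).
sample = (
    "job_number job_id\n"
    "1 10202\nbla bla\nbla bla bla\nxxx 100.10.10.100\n"
    "2 10203\nbla bla\nbla bla bla\nbla bla bla\nxxx 100.10.10.102\n"
    "3 10204\nbla bla bla\nbla bla bla\nxxx 100.10.10.100\n"
)

# Print only the first line (job number and job id) of each matching block.
for block in jobs_for_ip(sample, "100.10.10.100"):
    print(block.splitlines()[0])
```

Called this way, the IP can be passed in its plain form ("100.10.10.100"), which also avoids pushing backslashes through the shell here-document.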