I have a file named input.txt.gz with content like this:
<hello script="2.5">
<welcome>
<hgsdhjaghjdghjagdjhgjdhgdajhgdajhgdhjjgfkjg
<number new="0x0000-0x3FF" Id="bhi" Range="4" no_id="hello" />
<----jsdjhsdjndkjjdhjdJHksdkjdnknnddnekfgrejgjorgj jregjgkrjglrjgojggjorjg--->
<number new="0x02" Id="bhi" Unit="0" Range="4" info="0x00000012" no_id="hi all" />
<number new="0x04" Id="bhi" Unit="0" Range="4" info="0x0000023f" no_id="dbhwd" />
<---- dfiuhdwiudi iwqdidffenfj odwqjdjqwgru jdqkkjwfkjfwn odHHOIJD JSDNKS nsk---->
<number new="0x06" Id="bhi" Unit="0" Range="4" info="0x00000f22" no_id="sjkdnkl jdsnj" />
<number new="0x08" Id="bhi" Unit="0" Range="4" info="0x00000f1b" no_id="dm o" />
<---bdheuh jwdhjwdkiwh---->
<number new="0x32" Id="bhi" Range="4" info="0x000012f5" no_id="he d kd" />
<number new="0x336" Id="bhi" Range="4" info="0x00000df2" no_id="dnkwn" />
<number new="0x428" Id="bhi" Range="4" info="0x0001cbf2" no_id="h nd" />
<--new model vdhjsb---->
<number new="0x06" Id="bhi" Unit="0" unit_id="hi_all" Range="2" info="0x0f22" no_id="sjkdnkl jdsnj' />
<number new="0x08" Id="bhi" Unit="0" unit_id="this new" Range="4" info="0x00000f1b" no_id="dm o" /
<--adhhj jdwjdkkj jsSDjkasdj jefnflefk kjsjfoekfle kajfofkp ksaokdfpef---->
<---the end of file---->
From this file I need to get the new and info string values and save them to another file named output.txt.
Expected output.txt:
new="0x02" info="0x00000012"
new="0x04" info="0x0000023f"
new="0x06" info="0x00000f22"
new="0x08" info="0x00000f1b"
new="0x32" info="0x000012f5"
new="0x336" info="0x00000df2"
new="0x428" info="0x0001cbf2"
new="0x06" info="0x0f22"
new="0x08" info="0x00000f1b"
However, with my current code I'm not able to do this.
This is my current code:
import gzip

with gzip.open("input.txt.gz", "rb") as fin:
    with open("output.txt", "w") as fout:
        for line in fin:
            if line.decode('utf-8').strip():
                line = line.decode('utf-8').strip("\n' '")
                cols = line.split(" ")
                if len(cols) >= 5:
                    print(cols[1], cols[5])
CodePudding user response:
You could use re to parse each line. Here's the pattern I've used, but you can update it as required.
import gzip
import re

# Capture the new="..." and info="..." attributes when both occur on one line.
pattern = r'(new="\w+").*(info="\w+")'
with gzip.open("input.txt.gz", "rb") as fin:
    with open("output.txt", "w") as fout:
        for line in fin:
            for match_new, match_info in re.findall(pattern, line.decode('utf-8')):
                fout.write(f'{match_new} {match_info}\n')
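Splitting on spaces is fragile here because Unit and unit_id are optional and some values contain spaces, so info does not sit at a fixed column. As a variation on the regex approach, here is a minimal sketch that searches for each attribute separately, so it does not depend on their order within the line. The names extract_new_info and convert are made up for illustration, and it assumes the wanted values contain only word characters (letters, digits, underscore), as in the sample.

```python
import gzip
import re

# Assumption: values are plain word characters, e.g. 0x02 or 0x00000f1b.
NEW_RE = re.compile(r'\bnew="(\w+)"')
INFO_RE = re.compile(r'\binfo="(\w+)"')


def extract_new_info(lines):
    """Yield 'new="..." info="..."' for every line that has both attributes."""
    for line in lines:
        new = NEW_RE.search(line)
        info = INFO_RE.search(line)
        if new and info:
            yield f'new="{new.group(1)}" info="{info.group(1)}"'


def convert(src="input.txt.gz", dst="output.txt"):
    # "rt" opens the gzip stream in text mode, so no manual decode is needed.
    with gzip.open(src, "rt", encoding="utf-8") as fin, open(dst, "w") as fout:
        for row in extract_new_info(fin):
            fout.write(row + "\n")
```

Calling convert() with the default arguments reads input.txt.gz and writes output.txt. Lines without an info attribute, such as the one with new="0x0000-0x3FF", are skipped.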