How to get certain strings and their respective values from a file and copy them to another file?

Time:10-25

I have a file named input.txt.gz like this:

<hello script="2.5">
<welcome>
     <hgsdhjaghjdghjagdjhgjdhgdajhgdajhgdhjjgfkjg
     <number new="0x0000-0x3FF" Id="bhi" Range="4" no_id="hello" />
               
          <----jsdjhsdjndkjjdhjdJHksdkjdnknnddnekfgrejgjorgj jregjgkrjglrjgojggjorjg--->
          <number new="0x02" Id="bhi" Unit="0" Range="4" info="0x00000012" no_id="hi all" />
          <number new="0x04" Id="bhi" Unit="0" Range="4" info="0x0000023f" no_id="dbhwd" />
          <---- dfiuhdwiudi iwqdidffenfj odwqjdjqwgru jdqkkjwfkjfwn odHHOIJD JSDNKS nsk---->
          <number new="0x06" Id="bhi" Unit="0" Range="4" info="0x00000f22" no_id="sjkdnkl jdsnj" />
          <number new="0x08" Id="bhi" Unit="0" Range="4" info="0x00000f1b" no_id="dm o" />
    <---bdheuh jwdhjwdkiwh---->
          <number new="0x32" Id="bhi"  Range="4" info="0x000012f5" no_id="he d kd" />
          <number new="0x336" Id="bhi" Range="4" info="0x00000df2" no_id="dnkwn" />
          <number new="0x428" Id="bhi" Range="4" info="0x0001cbf2" no_id="h nd" />
<--new model vdhjsb---->
      <number new="0x06" Id="bhi" Unit="0" unit_id="hi_all" Range="2" info="0x0f22" no_id="sjkdnkl jdsnj' />
       <number new="0x08" Id="bhi" Unit="0" unit_id="this new" Range="4" info="0x00000f1b" no_id="dm o" /

<--adhhj jdwjdkkj jsSDjkasdj jefnflefk kjsjfoekfle kajfofkp ksaokdfpef---->
<---the end of file---->

From this file I need to get the new and info string values and save them to another file named output.txt.

output.txt

new="0x02" info="0x00000012"
new="0x04" info="0x0000023f"
new="0x06" info="0x00000f22"
new="0x08" info="0x00000f1b"
new="0x32" info="0x000012f5"
new="0x336" info="0x00000df2"
new="0x428" info="0x0001cbf2"
new="0x06" info="0x0f22"
new="0x08" info="0x00000f1b"

However, with my current code I'm not able to do this.

This is my current code:

import gzip
with gzip.open("input.txt.gz", "rb") as fin:
     with open("output.txt", "w") as fout:
           for line in fin:
                if line.decode('utf-8').strip():
                   line = line.decode('utf-8').strip("\n' '")
                   cols = line.split(" ")
                   if len(cols) >= 5:
                      print(cols[1], cols[5])

CodePudding user response:

You could use re to parse each line. Here's the pattern I've used, but you can update as required.

import gzip, re
pattern = r'(new=\"\w+\").*(info=\"\w+\")'

with gzip.open("input.txt.gz", "rb") as fin:
    with open("output.txt", "w") as fout:
        for line in fin:
            for match_new, match_info in re.findall(pattern, line.decode('utf-8')):
                fout.write(f'{match_new} {match_info}\n')
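
As a variation on the same idea, here is a minimal sketch that separates the matching from the file handling so the pattern can be tried on a few lines in memory first. The function names extract_pairs and copy_pairs are made up for this example, and the lazy .*? between the two groups is my own choice, not part of the pattern above. It assumes each line holds at most one new/info pair and that both values consist only of word characters.

```python
import gzip
import re

# Same two capture groups as the pattern above; ".*?" is a lazy variant
# chosen for this sketch (an assumption, the original uses greedy ".*").
PATTERN = re.compile(r'(new="\w+").*?(info="\w+")')


def extract_pairs(lines):
    """Yield 'new="..." info="..."' for every line that has both attributes."""
    for line in lines:
        match = PATTERN.search(line)
        if match:
            yield f"{match.group(1)} {match.group(2)}"


def copy_pairs(src_path, dst_path):
    """Read a gzip-compressed text file and write the extracted pairs."""
    # Mode "rt" lets gzip decode bytes to str, so no per-line .decode() is needed.
    with gzip.open(src_path, "rt", encoding="utf-8") as fin, \
            open(dst_path, "w", encoding="utf-8") as fout:
        for pair in extract_pairs(fin):
            fout.write(pair + "\n")
```

Calling copy_pairs("input.txt.gz", "output.txt") would then write one pair per line. Lines without an info attribute, such as the one with new="0x0000-0x3FF", and the comment lines are skipped because the pattern does not match them.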