Removing lines that have strings with same hexadecimal values from a text file

Time:11-24

I have a file in1.txt

info="0x0000b573" data="0x7" id="sp. PCU(Si)"
info="0x0000b573" data="0x00000007" id="HI all. SHa"
info="0x00010AC3" data="0x00000003" id="abc_16. PS"
info="0x00010ac3" data="0x00000045" id="hB2_RC/BS (Spr)"
info="0x205" data="0x00000010" id="cgc_15. PK"
info="0x205" data="0x10" id="cgsd_GH/BS (Scd)"

Expected output: out.txt

info="0x00010AC3" data="0x00000003" id="abc_16. PS"
info="0x00010ac3" data="0x00000045" id="hB2_RC/BS (Spr)"

I need only the lines that have the same info value but different data values to be written to out.txt.

But my current code removes every line that contains the string data:

with open("in.txt", "r") as fin,open("out.txt", "w") as fout:
    for line in fin:
        if 'data' not in line:
            fout.write(line.strip() + '\n')

What I need is, for example: lines 1 and 2 have the same info="0x0000b573", and their data values "0x7" and "0x00000007" are numerically the same, so those lines should be removed.
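To illustrate the comparison described above: hex strings that differ only in zero padding or letter case parse to the same integer, so comparing the parsed values rather than the raw text is one way to treat "0x7" and "0x00000007" as equal. This is a minimal sketch; the helper name hex_equal is hypothetical and not part of the original code.

```python
def hex_equal(a, b):
    """Compare two hex strings by numeric value, ignoring padding and case."""
    # int(..., 16) accepts an optional "0x" prefix.
    return int(a, 16) == int(b, 16)

print(hex_equal("0x7", "0x00000007"))         # True
print(hex_equal("0x00010AC3", "0x00010ac3"))  # True
print(hex_equal("0x00000003", "0x00000045"))  # False
```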

CodePudding user response:

You can use a regular expression to extract the values and compare them as integers:

import re

s = '''info="0x0000b573" data="0x7" id="sp. PCU(Si)"
info="0x0000b573" data="0x00000007" id="HI all. SHa"
info="0x00010AC3" data="0x00000003" id="abc_16. PS"
info="0x00010ac3" data="0x00000045" id="hB2_RC/BS (Spr)"
info="0x205" data="0x00000010" id="cgc_15. PK"
info="0x205" data="0x10" id="cgsd_GH/BS (Scd)"'''

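# Capture the info and data values of every line.
parsed_data = re.findall(r'info="([^"]+)" data="([^"]+)" id="[^"]+"', s, re.MULTILINE)
# Convert both values to integers, append the original row index,
# and sort so that rows with equal info values become adjacent.
parsed_data = sorted([list(map(lambda x: int(x, 16), i)) + [index] for index, i in enumerate(parsed_data)])

# Walk the sorted rows in pairs (this assumes every info value occurs exactly twice)
# and keep the row indices of pairs whose data values differ.
row_numbers = [j for i in [[parsed_data[i][-1], parsed_data[i + 1][-1]] for i in range(0, len(parsed_data), 2) if parsed_data[i][1] != parsed_data[i + 1][1]] for j in i]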


final_output = []

for index,line in enumerate(s.split('\n')):
    if index in row_numbers:
        final_output.append(line)
        
        
final_out_text = '\n'.join(final_output)
print(final_out_text)

# info="0x00010AC3" data="0x00000003" id="abc_16. PS"
# info="0x00010ac3" data="0x00000045" id="hB2_RC/BS (Spr)"

CodePudding user response:

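Something like this could work as a starting point. Note that it keeps only the first line for each info string and compares the strings as text, not by numeric value: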

found_info_values = []

with open("in.txt", "r") as fin,open("out.txt", "w") as fout:
    for line in fin:
        info = line.split('"')[1]
        if info not in found_info_values:
            fout.write(line.strip() + '\n')
        found_info_values.append(info)
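As an alternative that does not depend on each info value occurring exactly twice, the lines could be grouped by the integer value of info and kept only when that group contains more than one distinct data value. The following is a sketch under the assumption that every line follows the info="0x..." data="0x..." layout of the sample; the names LINE_RE and lines_with_conflicting_data are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the sample: info="0x..." data="0x..." id="..."
LINE_RE = re.compile(r'info="(0x[0-9a-fA-F]+)"\s+data="(0x[0-9a-fA-F]+)"')

def lines_with_conflicting_data(lines):
    """Return the lines whose info value occurs with more than one data value."""
    data_by_info = defaultdict(set)
    parsed = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip blank or malformed lines
        # Compare by numeric value so padding and letter case do not matter.
        info, data = (int(value, 16) for value in match.groups())
        data_by_info[info].add(data)
        parsed.append((info, line.rstrip("\n")))
    # Keep the original order of the surviving lines.
    return [line for info, line in parsed if len(data_by_info[info]) > 1]

sample = [
    'info="0x0000b573" data="0x7" id="sp. PCU(Si)"',
    'info="0x0000b573" data="0x00000007" id="HI all. SHa"',
    'info="0x00010AC3" data="0x00000003" id="abc_16. PS"',
    'info="0x00010ac3" data="0x00000045" id="hB2_RC/BS (Spr)"',
    'info="0x205" data="0x00000010" id="cgc_15. PK"',
    'info="0x205" data="0x10" id="cgsd_GH/BS (Scd)"',
]

result = lines_with_conflicting_data(sample)
print("\n".join(result))
```

To use it with files, the open input file can be passed directly as lines and the result written to out.txt one line at a time. If an info value appears three or more times with at least two different data values, this sketch keeps all of its lines; whether that is the desired behavior is not stated in the question.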