How to convert specific strings or numbers into a desired list using Python?

I need to convert the input below into the desired output using Python.

The rule is to pair each idx with the values on the data rows that follow its header line. If the size is 10 or more, the values need one extra space after every 10 values (e.g. if the size is 025, an extra space is inserted twice). FYI, the numbers inside '[]' are in hexadecimal format.
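To make the spacing rule concrete, here is a minimal sketch. The helper name format_values and its group parameter are made up for illustration, and it assumes the values have already been collected into a list of strings.

```python
def format_values(values, size, group=10):
    """Join the first `size` values, with one extra space between groups of `group`."""
    kept = values[:size]
    # Split the kept values into chunks of `group` items each
    chunks = [kept[i:i + group] for i in range(0, len(kept), group)]
    # Single space inside a chunk, double space between chunks
    return "  ".join(" ".join(chunk) for chunk in chunks)


if __name__ == "__main__":
    print(format_values(["2b", "03"], 2))
    print(format_values([f"{n:02x}" for n in range(25)], 25))
```

With 25 values the result contains two double spaces, one after the 10th value and one after the 20th.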

The input and desired output are shown below:

<input.txt>

[00], idx=06, size=001
[06000] 00
[01], idx=07, size=001
[07000] 00
[02], idx=20, size=007
[20000] 00 00 00 00 40 00 00
[03], idx=33, size=002
[33000] 2b 03
[04], idx=45, size=015
[45000] ff 07 00 00 00 00 00 00 00 00 
[4500a] 00 00 00 00 00 00 00 00 00 00
[45014] 00 00 00 00 00

<output.txt>

[index]06 00
[index]07 00
[index]20 00 00 00 00 40 00 00
[index]33 2b 03
[index]45 ff 07 00 00 00 00 00 00 00  00 00 00 00 00

To achieve this, I came up with the following steps:

1. Put [index] at the beginning of every line.
2. List all the numbers that match 'idx'.
3. Append the numbers from the data rows that follow each index.

I've managed to do the following so far:

import sys,os,re

def get_index(f):
    for line in f.readlines():
        if "idx" in line:
            yield line[10:12]

def main():
    with open("input.txt", "r") as f:
        with open("output.txt", "w") as oup:
            for line in get_index(f):
                oup.write('[index]' + line + '\n')

if __name__ == '__main__':
    main()

For me, step 3 seems hard. How can I develop it from here? Or is there a better way to solve this? Any help will be appreciated. Thank you very much.

CodePudding user response：

I think this is a 2-step process.

Consume the file building a dictionary of the data you're interested in.

Use re to isolate the idx and size values rather than fixed offsets.

Once you have the data in a manageable structure, you can then work through the dictionary to generate the required output.

Here's one way of doing it:

import re

INFILE = 'input.txt'
OUTFILE = 'output.txt'

result = dict()
key = None

with open(INFILE) as txt:
    for line in map(str.strip, txt):
        if (found := re.findall(r'idx=(\d+).*size=(\d+).*', line)):
            key, size = found[0]
            result[key] = [int(size)]
        else:
            result[key].extend(line.split()[1:])

output = []

for k, v in result.items():
    output.append(f'[index]{k}')
    size, hexvals = v[0], v[1:]
    for i in range(0, size, 10):
        output[-1] += ' ' if i == 0 else '  '
        output[-1] += ' '.join(hexvals[i:min(i + 10, size)])

with open(OUTFILE, 'w') as outfile:
    print(*output, sep='\n', file=outfile)

Output (in file):

[index]06 00
[index]07 00
[index]20 00 00 00 00 40 00 00
[index]33 2b 03
[index]45 ff 07 00 00 00 00 00 00 00 90  50 00 00 00 60

CodePudding user response：

Here is a different idea: write the output while consuming the input file.

Because the number of values that follow idx is determined by size, it is best to parse idx and size together: idx, size = line[10:12], int(line[19:22], 10).

Note: I suggest using a regular expression for robustness.
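As a rough sketch of that regular-expression variant: the names HEADER and parse_header are hypothetical, and the pattern assumes idx may contain hex digits while size is decimal, which the question does not state explicitly.

```python
import re

# Assumed header shape: "..., idx=<hex digits>, size=<decimal digits>"
HEADER = re.compile(r"idx=([0-9a-fA-F]+),\s*size=(\d+)")


def parse_header(line):
    """Return (idx, size) for a header line, or None for any other line."""
    match = HEADER.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    print(parse_header("[00], idx=06, size=001"))
    print(parse_header("[06000] 00"))
```

Unlike the fixed slices, this keeps working if the leading [..] counter grows by a digit or the spacing after the commas changes.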

Once we have the integer size, it tells us how to process the following lines of the input file. Each data line contributes at most 10 values. We continue until size values have been output for the current idx, inserting the spaces as required along the way.

def main():
    with open("input.txt", "r") as inf, open("output.txt", "w") as ouf:
        idx, size = "", 0
        outline = ""
        for line in inf:
            if "idx" in line:
                idx, size = line[10:12], int(line[19:22], 10)
                if outline != "":
                    ouf.write("[index]" + outline + "\n")
                outline = idx
            else:
                if size > 0:  # output values
                    lst = line.split()
                    outline += " " + " ".join(lst[1:] if size >= 10 else lst[1:size + 1]) + " "
                    size -= 10
        if outline != "":
            assert size <= 0, size
            ouf.write("[index]" + outline + "\n")

if __name__ == '__main__':
    main()

CodePudding user response：

The accepted answer is very good and they were faster than me, but I thought it would be good to post my solution as well :)

The solution is pretty similar: it uses a regexp to find what I call the status line, which tells how much data to retrieve from the subsequent lines.

First we find where the status lines are and get their indexes.
Then we start at these indexes and walk down to retrieve the data.
We use a regexp to retrieve the data, defined as a bracketed hex address followed by whitespace-separated values.
We iterate over this data until it runs out or the size has been reached.

For the sake of testing I used strings instead of files, so you can also test it in an online Python editor.

import re

in_file = """[00], idx=06, size=001
[06000] 00
[01], idx=07, size=001
[07000] 00
[02], idx=20, size=007
[20000] 00 00 00 00 40 00 00
[03], idx=33, size=002
[33000] 2b 03
[04], idx=45, size=015
[45000] ff 07 00 00 00 00 00 00 00 00 
[4500a] 00 00 00 00 00 00 00 00 00 00
[45014] 00 00 00 00 00"""

out_file = ""

# same as open(f).readlines()
lines = in_file.split("\n")

# This regexp matches the status lines, e.g. [00], idx=06, size=001
rstatus = re.compile(r"\[[0-9a-fA-F]+\],\s+idx=([0-9a-fA-F]+),\s+size=(\d+)")
# This regexp matches the data lines, e.g. [06000] 00
rdata   = re.compile(r"\[[0-9a-fA-F]+\]\s+(.*)")

# Find all the indexes of these status lines
status_indexes = [i for i, line in enumerate(lines) if rstatus.match(line) is not None]

# Now use the index as a starting point and walk down the lines
for si in status_indexes:

    # Get the status line
    line = lines[si]

    # Match idx and size
    status_match = rstatus.match(line)
    idx  = status_match.group(1)  # index is hex, so keep it as a string
    size = int(status_match.group(2))

    # Start the output line for this index
    out_file += f"[index]{idx}"

    # Iterate until the size is consumed
    offset = si + 1
    while size > 0:

        # Get data from the next line; split() ignores trailing whitespace
        dline = lines[offset]
        data  = rdata.match(dline).group(1).split()

        for d in data:
            out_file += " " + d
            size -= 1

            if size == 0:
                break

        # One extra space before the next data line (next group of 10 values)
        if size > 0:
            out_file += " "

        offset += 1

    out_file += '\n'

print(out_file)