Why am I getting an empty list when I use split()?

Time:02-13

I have a text file like this:

-- Generated ]
FILEUNIT
  METRIC /

Hello
-- timestep: Jan 01,2017 00:00:00
  3*2 344.0392 343.4564 343.7741
  343.9302 343.3884 343.7685 0.0000 341.0843
  342.2441 342.5899 343.0728 343.4850 342.8882
  342.0056 342.0564 341.9619 341.8840 342.0447 /

I have written code that reads the file, skips the lines containing words, removes stray characters and empty lines, does some other processing, and finally returns the numbers in the last four lines. I cannot work out how to put all the numbers from the text file into a single list. Right now new_line holds a string with the numbers of each of those lines.

import string

def expand(chunk):
    l = chunk.split("*")
    chunk = [str(float(l[1]))] * int(l[0])

    return chunk

with open('old_textfile.txt', 'r') as infile1:
    for line in infile1:
        if set(string.ascii_letters.replace("e","")) & set(line):
            continue

        chunks = line.split(" ")
        #Get rid of newlines
        chunks = list(map(lambda chunk: chunk.strip(), chunks))
        if "/" in chunks:
            chunks.remove("/")

        new_chunks = []
        for i in range(len(chunks)):
            if '*' in chunks[i]:
                new_chunks += expand(chunks[i])
            else:
                new_chunks.append(chunks[i])
        new_chunks[len(new_chunks)-1] = new_chunks[len(new_chunks)-1] + "\n"
        new_line = " ".join(new_chunks)

When I then use

A = new_line.split()
B = list(map(float, A))

it returns an empty list. Do you have any idea how I can put all these numbers into one single list? Currently I write new_line to a text file and read it back in, but that increases my runtime, which is not good.

f = open('new_textfile.txt').read()
A = f.split()
B = list(map(float, A))
list_1.extend(B)

Another suggested solution uses a regex, but it drops 3*2. I want that token processed as 2 2 2.

import re

with open('old_textfile.txt', 'r') as infile1:
    lines = infile1.read()

nums = re.findall(r'\d+\.\d+', lines)
print(nums)

CodePudding user response:

I'm not entirely sure I understand what you are trying to do, but as I understand it you want to extract all numbers that are either in decimal form, \d+\.\d+, or an integer multiplied by another integer using an asterisk, \d+\*\d+. You want the results in a single list of floats, where the decimals go into the list directly and, for the integer pairs, the second integer is repeated as many times as the first one says.
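
As a quick check of those two patterns on a single data line, here is a minimal sketch. The sample string is copied from the question; the variable names are only for illustration.

```python
import re

# One data line copied from the question's file.
sample = "  3*2 344.0392 343.4564 343.7741"

# The decimal alternative is tried first; "3*2" contains no dot, so the
# match falls through to the int*int alternative.
tokens = re.findall(r"\d+\.\d+|\d+\*\d+", sample)
print(tokens)  # ['3*2', '344.0392', '343.4564', '343.7741']
```

The matches come back as strings, so the 3*2 token still has to be split and expanded before converting to float.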

One way to do this would be:

import re

lines = """
-- Generated ]
FILEUNIT
  METRIC /

Hello
-- timestep: Jan 01,2017 00:00:00
  3*2 344.0392 343.4564 343.7741
  343.9302 343.3884 343.7685 0.0000 341.0843
  342.2441 342.5899 343.0728 343.4850 342.8882
  342.0056 342.0564 341.9619 341.8840 342.0447 /
"""

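nums = []
# Match either a decimal (e.g. 344.0392) or a repeat token (e.g. 3*2).
for n in re.findall(r'(\d+\.\d+|\d+\*\d+)', lines):
    split_by_ast = n.split("*")
    if len(split_by_ast) == 1:
        # Plain decimal: append it as a float.
        nums += [float(split_by_ast[0])]
    else:
        # count*value: repeat the value count times.
        nums += [float(split_by_ast[1])] * int(split_by_ast[0])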

print(nums)

Which returns:

[2.0, 2.0, 2.0, 344.0392, 343.4564, 343.7741, 343.9302, 343.3884, 343.7685, 0.0, 341.0843, 342.2441, 342.5899, 343.0728, 343.485, 342.8882, 342.0056, 342.0564, 341.9619, 341.884, 342.0447]

The regular expression finds every token that matches one of the two formats (decimal or int*int). A decimal is appended to the list directly. An int*int token is turned into a short list that repeats the second integer as many times as the first integer says, and that list is then concatenated onto the result.
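
If you would rather keep the line-by-line split() approach from the question, the intermediate file should not be needed: you can extend one list inside the loop. My guess at the empty result is that new_line is overwritten on every iteration, so after the loop it only holds the last line, which is blank if the file ends with an empty line. That is an assumption, since I have not seen the actual file. A minimal sketch follows; is_number and parse_numbers are hypothetical helper names, and lines whose tokens are not all numeric are skipped, which is slightly different from the letter check in the question.

```python
def is_number(token):
    """Return True if token parses as a float (e.g. '344.0392')."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_numbers(lines):
    """Collect every number from an iterable of lines into one flat list."""
    numbers = []
    for line in lines:
        # split() with no argument ignores leading/trailing whitespace and
        # never yields empty strings, unlike split(" ").
        tokens = [t for t in line.split() if t != "/"]
        expanded = []
        for token in tokens:
            if "*" in token:
                count, _, value = token.partition("*")
                if not (count.isdigit() and is_number(value)):
                    break  # not a data line
                expanded.extend([float(value)] * int(count))
            elif is_number(token):
                expanded.append(float(token))
            else:
                break  # a word or other text: skip the whole line
        else:
            # Reached only when no token broke out of the loop.
            numbers.extend(expanded)
    return numbers


# Sample lines copied from the question's file.
sample_lines = [
    "-- Generated ]",
    "FILEUNIT",
    "  METRIC /",
    "",
    "Hello",
    "-- timestep: Jan 01,2017 00:00:00",
    "  3*2 344.0392 343.4564 343.7741",
    "  343.9302 343.3884 343.7685 0.0000 341.0843",
    "  342.2441 342.5899 343.0728 343.4850 342.8882",
    "  342.0056 342.0564 341.9619 341.8840 342.0447 /",
    "",
]
result = parse_numbers(sample_lines)
print(result)
```

With a real file you could pass the open file object directly, for example parse_numbers(infile1) inside the with block, because iterating a file yields its lines.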
