I have a text file like this:
-- Generated ]
FILEUNIT
METRIC /
Hello
-- timestep: Jan 01,2017 00:00:00
3*2 344.0392 343.4564 343.7741
343.9302 343.3884 343.7685 0.0000 341.0843
342.2441 342.5899 343.0728 343.4850 342.8882
342.0056 342.0564 341.9619 341.8840 342.0447 /
I have written some code that reads the file, removes the words, stray characters and empty lines, does some further processing, and finally returns the numbers in the last four lines. I cannot work out how to collect all the numbers from the text file in a single list. Right now, new_line is a string built from those lines of numbers:
import string

def expand(chunk):
    l = chunk.split("*")
    chunk = [str(float(l[1]))] * int(l[0])
    return chunk

with open('old_textfile.txt', 'r') as infile1:
    for line in infile1:
        if set(string.ascii_letters.replace("e","")) & set(line):
            continue
        chunks = line.split(" ")
        #Get rid of newlines
        chunks = list(map(lambda chunk: chunk.strip(), chunks))
        if "/" in chunks:
            chunks.remove("/")
        new_chunks = []
        for i in range(len(chunks)):
            if '*' in chunks[i]:
                new_chunks += expand(chunks[i])
            else:
                new_chunks.append(chunks[i])
        new_chunks[len(new_chunks)-1] = new_chunks[len(new_chunks)-1] + "\n"
        new_line = " ".join(new_chunks)
When I then use
A = new_line.split()
B = list(map(float, A))
it returns an empty list. Do you have any idea how I can put all these numbers in one single list?
Currently, I am writing new_line to a text file and reading it back in, but that increases my runtime, which is not good.
f = open('new_textfile.txt').read()
A = f.split()
B = list(map(float, A))
list_1.extend(B)
There was another solution using a regex, but it drops 3*2. I want that to be processed as 2 2 2.
import re

with open('old_textfile.txt', 'r') as infile1:
    lines = infile1.read()
    nums = re.findall(r'\d+\.\d+', lines)
    print(nums)
Answer:
I'm not entirely sure I understand what you are trying to do, but as I understand it, you want to extract all numbers that are either in decimal form (\d+\.\d+) or an integer multiplied by another integer with an asterisk (\d+\*\d+). You want the results in a single list of floats, where each decimal goes into the list directly and, for each int*int pair, the second integer is repeated as many times as the first integer says.
One way to do this would be:
import re

lines = """
-- Generated ]
FILEUNIT
METRIC /
Hello
-- timestep: Jan 01,2017 00:00:00
3*2 344.0392 343.4564 343.7741
343.9302 343.3884 343.7685 0.0000 341.0843
342.2441 342.5899 343.0728 343.4850 342.8882
342.0056 342.0564 341.9619 341.8840 342.0447 /
"""

nums = []
# Match either a decimal (344.0392) or a repeat expression (3*2)
for n in re.findall(r'(\d+\.\d+|\d+\*\d+)', lines):
    split_by_ast = n.split("*")
    if len(split_by_ast) == 1:
        # Plain decimal: add it to the list as a float
        nums += [float(split_by_ast[0])]
    else:
        # count*value: repeat the value count times
        nums += [float(split_by_ast[1])] * int(split_by_ast[0])
print(nums)
Which returns:
[2.0, 2.0, 2.0, 344.0392, 343.4564, 343.7741, 343.9302, 343.3884, 343.7685, 0.0, 341.0843, 342.2441, 342.5899, 343.0728, 343.485, 342.8882, 342.0056, 342.0564, 341.9619, 341.884, 342.0447]
The regular expression searches for numbers matching one of the two formats (decimal or int*int). A decimal is added to the list directly. An int*int match is expanded into a smaller list that repeats the second integer as many times as the first integer says, and that list is then concatenated onto the result.
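If you would rather not rely on a regex, the same result can probably be reached by splitting each line into tokens and extending one list as you go, which also avoids writing an intermediate file. The following is only a minimal sketch: the helper names expand_token and parse_numbers are made up for illustration, and it assumes that "/" is just a terminator to drop and that any line containing a non-numeric token is a header to skip (similar in spirit to the letter check in your code).

```python
def expand_token(token):
    """Turn '3*2' into [2.0, 2.0, 2.0] and '344.0392' into [344.0392]."""
    count, sep, value = token.partition("*")
    if sep:
        # Repeat expression: count*value
        return [float(value)] * int(count)
    return [float(token)]


def parse_numbers(lines):
    """Collect the numbers from an iterable of lines into one flat list."""
    values = []
    for line in lines:
        # Assumption: "/" only marks the end of a record, so drop it.
        tokens = [t for t in line.split() if t != "/"]
        try:
            row = [x for t in tokens for x in expand_token(t)]
        except ValueError:
            # Assumption: a token that is not a number means a header line.
            continue
        values.extend(row)
    return values


sample = """\
-- Generated ]
FILEUNIT
METRIC /
Hello
-- timestep: Jan 01,2017 00:00:00
3*2 344.0392 343.4564 343.7741
343.9302 343.3884 343.7685 0.0000 341.0843
342.2441 342.5899 343.0728 343.4850 342.8882
342.0056 342.0564 341.9619 341.8840 342.0447 /
"""

numbers = parse_numbers(sample.splitlines())
print(numbers)
```

Because a file object yields its lines when iterated, passing the open file to parse_numbers (for example inside with open('old_textfile.txt') as infile1:) should work the same way. Since the tokens go through float(), negative values and exponent notation such as 1e-3 would be accepted too, but so would the words nan and inf, so a stricter check may be needed if those can appear in your headers.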