I want to create an array A[ ] [ ] in Python from a txt file.
My txt file looks like this:
A=[ -5 1 7 9 -1 -2 6 -2
3 -3 5 1 1 -1 7 8
4 8 -6 1 -1 2 4 -6
1 2 -1 -1 12 6 1 8
2 -9 15 11 9 -1 -1 -1
3 -9 1 1 -2 1 5 9]
The numbers are tab-delimited.
Can anyone help me store this in a 2-D array? Also, what if the numbers were not only ints and I also had floats?
CodePudding user response:
All of the methods below use filedata to represent your file data. Tabs are written explicitly as \t because my editor converts tabs to spaces for Python. The trailing newline and the call to strip were added on purpose to illustrate that you have to account for them. A float was also added for testing purposes. Otherwise, it's just a copy of what you posted. All timeit times are based on 10000 iterations.
filedata = ('A=[\t-5.6\t1\t7\t9\t-1\t-2\t6\t-2\n'
'\t3\t-3\t5\t1\t1\t-1\t7\t8\n'
'\t4\t8\t-6\t1\t-1\t2\t4\t-6\n'
'\t1\t2\t-1\t-1\t12\t6\t1\t8\n'
'\t2\t-9\t15\t11\t9\t-1\t-1\t-1\n'
'\t3\t-9\t1\t1\t-2\t1\t5\t9]\n').strip()
First Method
Use regex to parse the data (timeit: 1.4896928800153546)
import re
d = re.compile(r'-?\d+(\.\d*)?') #int/float regex
ch = (int, float) #choice: index 0 (False) is int, index 1 (True) is float
out = [] #for results
#get rid of the name, which could have numbers in it that would break this technique
filedata = filedata.split('[')[1].strip()
#iterate over lines
for line in filedata.split('\n'):
    #get all numbers in this line as str
    t = [m.group() for m in d.finditer(line)]
    #format str to float or int based on the existence of a dot
    out.append([ch['.' in i](i) for i in t])
print(*out, sep='\n')
However, you could actually cut the number of iterations in half with a cleverly placed walrus (:=). The version above has to loop over the numbers in each line twice: once to collect them, and again to convert them. The version below does both in one loop, although it is actually slower (timeit: 1.6025475490023382).
#iterate over lines
for line in filedata.split('\n'):
    #everything in one ~ half as many iterations as the above version
    out.append([ch['.' in (i:=m.group())](i) for m in d.finditer(line)])
Second Method
Reformat the data to JSON and load (timeit: 2.2486417230102234)
import re, json
#get rid of name, and make sure we don't have a trailing new line
filedata = filedata.split('=')[1].strip()
#replace new lines with brackets
filedata = filedata.replace('\n', '],[')
#replace Num Whitespace with Num Comma
filedata = re.compile(r'(\d)\s').sub('\\1,', filedata)
#wrap
filedata = '[' + filedata + ']'
#load as json
out = json.loads(filedata)
print(*out, sep='\n')
Third Method
One character at a time (timeit: 0.997349611017853)
The conditions are placed in the order that things happen, to hopefully be easier to follow. This is not the best order. The best order would be to move the current if branch to the end, and then fix the if/elif keywords accordingly. This is because you are mostly going to find number characters, so that should be the first condition. Conversely, initiating the result container only happens once, so it should be the last condition. Doing this changes timeit to 0.9182042190223001.
out = None
num = []
ch = (int, float)
#iterate over every character individually
for c in filedata:
    #initiate result container
    if c == '[':
        out = [[]]
    #store number character
    elif c in '-.0123456789':
        num.append(c)
    elif (c in '\t\n]') and num:
        #format number, append to the last child, and reset num container
        i = ''.join(num)
        out[-1].append(ch['.' in i](i))
        num = []
        #start a new child
        if c == '\n':
            out.append([])
print(*out, sep='\n')
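The reordering described for this method is not shown in the answer, so here is a minimal sketch of what it might look like, assuming the branches are simply rearranged so the number test comes first and the '[' test comes last. The sample data is repeated only so the sketch runs on its own.

```python
# Same sample data as in the answer, repeated so this sketch is self-contained.
filedata = ('A=[\t-5.6\t1\t7\t9\t-1\t-2\t6\t-2\n'
            '\t3\t-3\t5\t1\t1\t-1\t7\t8\n'
            '\t4\t8\t-6\t1\t-1\t2\t4\t-6\n'
            '\t1\t2\t-1\t-1\t12\t6\t1\t8\n'
            '\t2\t-9\t15\t11\t9\t-1\t-1\t-1\n'
            '\t3\t-9\t1\t1\t-2\t1\t5\t9]\n').strip()

out = None
num = []
ch = (int, float)

for c in filedata:
    # most common case first: a character that belongs to a number
    if c in '-.0123456789':
        num.append(c)
    # a delimiter that ends a pending number
    elif (c in '\t\n]') and num:
        i = ''.join(num)
        out[-1].append(ch['.' in i](i))
        num = []
        # a newline also starts a new row
        if c == '\n':
            out.append([])
    # rarest case last: the opening bracket is seen only once
    elif c == '[':
        out = [[]]

print(*out, sep='\n')
```

The result is the same as with the original ordering; only the order in which the conditions are tested changes.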
Fourth Method
String splitting (timeit: 0.5249546770355664)
This finishes the answer provided by @Pete. If you swap the uncommented line in the try block with the commented one below it, timeit goes to 0.4442364440183155.
import re
out = []
ch = (int, float)
try:
    #get only list guts
    filedata = re.compile(r'.+=\[(.*)\]', re.S).search(filedata).group(1)
    #filedata = filedata.split('[')[1].split(']')[0].strip()
except Exception as e:
    print(e) #issues
else:
    for line in filedata.split('\n'):
        out.append([ch['.' in i](i) for i in line.split('\t') if i])
    print(*out, sep='\n')
All methods result in the below
#[-5.6, 1, 7, 9, -1, -2, 6, -2]
#[3, -3, 5, 1, 1, -1, 7, 8]
#[4, 8, -6, 1, -1, 2, 4, -6]
#[1, 2, -1, -1, 12, 6, 1, 8]
#[2, -9, 15, 11, 9, -1, -1, -1]
#[3, -9, 1, 1, -2, 1, 5, 9]
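The answer does not show how the timeit figures were measured. One way to take a comparable measurement, as a rough sketch only, is to wrap a method in a function and pass it to timeit.timeit with number=10000. The function name split_method is made up for this example, and it uses the plain string-splitting variant of the fourth method. Absolute times will differ from machine to machine.

```python
import timeit

# Same sample data as in the answer.
filedata = ('A=[\t-5.6\t1\t7\t9\t-1\t-2\t6\t-2\n'
            '\t3\t-3\t5\t1\t1\t-1\t7\t8\n'
            '\t4\t8\t-6\t1\t-1\t2\t4\t-6\n'
            '\t1\t2\t-1\t-1\t12\t6\t1\t8\n'
            '\t2\t-9\t15\t11\t9\t-1\t-1\t-1\n'
            '\t3\t-9\t1\t1\t-2\t1\t5\t9]\n').strip()

def split_method(data):
    """Hypothetical wrapper around the string-splitting approach, so it can be timed."""
    ch = (int, float)
    # keep only the text between '[' and ']'
    guts = data.split('[')[1].split(']')[0].strip()
    return [[ch['.' in i](i) for i in line.split('\t') if i]
            for line in guts.split('\n')]

# number=10000 matches the iteration count quoted in the answer
elapsed = timeit.timeit(lambda: split_method(filedata), number=10000)
print(f'{elapsed:.4f} seconds for 10000 runs')
```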
CodePudding user response:
Iterate through your file line by line. Python allows you to do:
for line in file:
Split each line on tabs:
elements = line.split("\t")
Loop through the elements and add them to your array.
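Put together, those steps might look like the sketch below. The function name parse_rows is made up for this example, io.StringIO stands in for an open file, and the sketch assumes numbers are written in plain decimal notation (a dot means float, anything else is read as int).

```python
import io

def parse_rows(lines):
    """Turn tab-delimited lines into a list of lists of int/float values."""
    rows = []
    for line in lines:
        # drop the 'A=[' prefix and the closing ']' if they are present
        line = line.split('[')[-1].split(']')[0]
        row = []
        for element in line.split('\t'):
            element = element.strip()
            if not element:
                continue  # skip blanks left by leading tabs or the newline
            # assumption: a dot marks a float, everything else is an int
            row.append(float(element) if '.' in element else int(element))
        if row:
            rows.append(row)
    return rows

# io.StringIO behaves like an open text file for iteration purposes
sample = io.StringIO('A=[\t-5\t1\t7\n\t3\t-3.5\t5]\n')
A = parse_rows(sample)
print(A)
```

With a real file, the same function can be given the file object directly, for example inside a with open(...) as f: block, since iterating over a text file yields its lines.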