I am trying to import data from a text file that I've received.
The text file is somewhat large (400 MB). It is available from this link (https://drive.google.com/file/d/11CwId3feJRZGvP2OUAtixuZEFztrCP3W/view?usp=sharing). It may take a few minutes to download given its size.
The data in the file are in a format I've never encountered before. The delimiter between columns seems to be a semi-colon, and the data rows seem to be separated from each other by a blank row.
I've not been able to read in the data. The following is the Python code I'm using to try to import one column of string data and two columns of float data from the file:
import numpy as np
f = 'summ.txt'
ID = np.loadtxt(f, dtype=str, unpack=True, usecols=[4], skiprows=8, delimiter='; ')
hbeg, hend = np.loadtxt(f, unpack=True, usecols=[67, 73], skiprows=8, delimiter='; ')
A solution/guidance would be wonderful.
CodePudding user response:
I would simply use the csv module to reformat it:
import csv
import time

start = time.time()

# newline='' lets the csv module control line endings in the output file
with open('summ.txt') as fin, open('output.txt', 'w', newline='') as fout:
    csv_reader = csv.reader(fin, delimiter=';')   # read semicolon-delimited input
    csv_writer = csv.writer(fout, delimiter=',')  # write comma-delimited output
    for row in csv_reader:
        if row:  # skip empty rows
            row = [x.strip() for x in row]  # remove surrounding spaces
            csv_writer.writerow(row)

end = time.time()
print('time:', end - start)
On my computer it took ~31 seconds.
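To see what this reformatting does on a small scale, here is a minimal sketch run on an in-memory sample. The sample text and the helper name `reformat` are made up for illustration and are not taken from summ.txt; the sketch only assumes the layout described in the question (semicolon-separated fields, with a blank line between data rows).

```python
import csv
import io

# Hypothetical sample mimicking the described layout:
# semicolon-delimited fields with a blank line between data rows.
sample = "a ; 1.5 ; 2.5\n\nb ; 3.0 ; 4.0\n\n"

def reformat(text):
    """Return comma-delimited text with blank rows dropped and fields stripped."""
    fin = io.StringIO(text)
    fout = io.StringIO()
    reader = csv.reader(fin, delimiter=';')
    writer = csv.writer(fout, delimiter=',', lineterminator='\n')
    for row in reader:
        if row:  # csv.reader yields an empty list for a blank line
            writer.writerow([x.strip() for x in row])
    return fout.getvalue()

print(reformat(sample))
```

Working on a few lines like this first is a cheap way to check the delimiter and the blank-row handling before processing the full 400 MB file.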
You can also keep the values in lists and convert them to a NumPy array or a pandas DataFrame:
import csv
import time

start = time.time()

IDs = []
hbeg = []
hend = []

with open('summ.txt') as fin:
    csv_reader = csv.reader(fin, delimiter=';')
    for row in csv_reader:
        if row:  # skip empty rows
            row = [x.strip() for x in row]  # remove surrounding spaces
            if len(row) > 1:  # skip lines that hold a single field
                # values are kept as strings at this point
                IDs.append(row[4])
                hbeg.append(row[64])
                hend.append(row[73])

end = time.time()
print('time:', end - start)

print(IDs[:10])
print(hbeg[:10])
print(hend[:10])
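The lists collected above hold strings, so a conversion step is still needed to get the float columns the question asks for. A minimal sketch of that step follows; the list contents here are hypothetical stand-ins, not values from summ.txt. Note that `float()` raises `ValueError` on non-numeric text, so any header rows would have to be skipped before converting.

```python
import numpy as np

# Hypothetical values standing in for the lists built by the loop above
IDs = ['A1', 'B2', 'C3']
hbeg = ['1.5', '2.0', '3.25']
hend = ['4.5', '5.0', '6.75']

ID_arr = np.array(IDs)                         # array of strings
hbeg_arr = np.array([float(x) for x in hbeg])  # float64 array
hend_arr = np.array([float(x) for x in hend])  # float64 array

print(ID_arr)
print(hbeg_arr)
print(hend_arr)
```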