Reading a text document with special formatting into a numpy array

Time:12-14

I have a text file with the following format:

(( X_value Y_value Z_value) ID)

I would like to read this into an array, and I have been partly able to do so with: positions = np.genfromtxt(file, skip_header=N_header_lines, usecols=(1, 2, 3))

However, I run into a problem when the X_value is negative, which results in the following:

((-X_value Y_value Z_value) ID)

The problem is that NumPy now reads "((-X_value" as one column and does not separate the string from the float.

I hope I was able to convey my problem clearly. Does anyone know how to solve this problem?

CodePudding user response:

a = "(( -1 2.0 3.7) 5)"
a = a.replace('(','').replace(')','').split()
result = [float(i) for i in a]
print(result)

output:

[-1.0, 2.0, 3.7, 5.0]
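
The same idea can be applied to every line of the file. The following is only a minimal sketch: the helper name parse_line and the sample lines are made up for illustration, and it assumes each line holds exactly three numbers followed by one ID token. Parentheses are replaced with spaces rather than removed, so a fused field like "((-1" still splits cleanly, and the ID is kept as text in case it is not numeric.

```python
def parse_line(line):
    """Return ([x, y, z], id_text) for a line like '((-1 2.0 3.7) 5)'."""
    # Replace parentheses with spaces so '((-1' becomes '  -1' and splits cleanly.
    fields = line.replace("(", " ").replace(")", " ").split()
    xyz = [float(value) for value in fields[:3]]
    return xyz, fields[3]


# Hypothetical sample lines in the question's format.
sample_lines = ["(( 1 2.0 3.7) 5)", "((-1 2.0 3.7) 6)"]
positions = [parse_line(line)[0] for line in sample_lines]
print(positions)
```

Passing positions to np.array would then give the N x 3 array asked for in the question.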

CodePudding user response:

Willian's answer is great; however, if you do not wish to load the text file and manipulate it beforehand, you can use the following one-liner:

arr = np.genfromtxt('blob.txt', usecols=(1,2,3),converters={i:lambda x: float(x.decode().strip('(').strip(')')) for i in [1,2,3]} )

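Basically, genfromtxt reads each field as bytes (with the default encoding='bytes' of the NumPy version used here) and then converts it to the right type, so you can manipulate the fields with converters as you wish. Note that the converters only strip the parentheses: a row starting with "((-X_value" still has one field fewer, so usecols=(1,2,3) picks different fields on that row.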

CodePudding user response:

genfromtxt is meant for CSV files, which have consistent rows and columns: a simple table. Extra characters like () can mess it up, as can inconsistent delimiters (whitespace is the default).

With a sample

In [68]: txt = """(( 1 2 3) xxx)
    ...: (-1 2 3) yyy)
    ...: """
In [69]: np.genfromtxt(txt.splitlines(), usecols=(1,2,3))
Out[69]: 
array([[ 1.,  2., nan],
       [ 2., nan, nan]])

The nan values mark strings that can't be converted to floats.

Without usecols, we see that it gets different numbers of columns in each line.

In [70]: np.genfromtxt(txt.splitlines())
Traceback (most recent call last):
  File "<ipython-input-70-3a7e73045f73>", line 1, in <module>
    np.genfromtxt(txt.splitlines())
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py", line 2124, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #2 (got 4 columns instead of 5)

Here's what it's doing for each line:

In [71]: for row in txt.splitlines():print(row.split())
['((', '1', '2', '3)', 'xxx)']
['(-1', '2', '3)', 'yyy)']

You need to clean up the file before passing it to genfromtxt, or use your own parsing that can deal as you want with the ().

With a clean file:

In [72]: txt = """1 2 3 xxx
    ...: -1 2 3 yyy
    ...: """
In [73]: np.genfromtxt(txt.splitlines(),usecols=(0,1,2))
Out[73]: 
array([[ 1.,  2.,  3.],
       [-1.,  2.,  3.]])
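
The cleanup step itself can be sketched as follows. This is one possible approach rather than the only one: the sample lines and the variable names are invented for illustration, and it assumes that parentheses never appear inside a value. Each parenthesis is mapped to a space, so every row ends up with the same four whitespace-separated fields before genfromtxt sees it.

```python
import numpy as np

# Hypothetical sample in the question's format; a real file would be read line by line.
raw_lines = ["(( 1 2 3) xxx)", "((-1 2 3) yyy)"]

# Map both parenthesis characters to spaces so every row has the same fields.
paren_to_space = str.maketrans("()", "  ")
clean_lines = [line.translate(paren_to_space) for line in raw_lines]

# After cleaning, the three coordinates are always columns 0, 1 and 2.
positions = np.genfromtxt(clean_lines, usecols=(0, 1, 2))
print(positions)
```

For a real file, the list comprehension could iterate over the open file object instead of raw_lines, with skip_header passed to genfromtxt as in the question.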