Reading a text document with special formatting into a numpy array

I have a text file with the following format:

(( X_value Y_value Z_value) ID)

I would like to read this into an array, and I have been partly able to do so with:

positions = np.genfromtxt(file, skip_header=N_header_lines, usecols=(1, 2, 3))

However, I run into a problem when the X_value is negative, because the line then looks like this:

((-X_value Y_value Z_value) ID)

The problem is that NumPy now reads "((-X_value" as one column and does not separate the string from the float.

I hope I was able to convey my problem clearly. Does someone know how to solve this problem?

CodePudding user response：

a = "(( -1 2.0 3.7) 5)"
a = a.replace('(','').replace(')','').split()
result = [float(i) for i in a]
print(result)

output:

[-1.0, 2.0, 3.7, 5.0]
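
To apply the same idea to a whole file and end up with a NumPy array, something along these lines might work. It is only a sketch: parse_positions and its skip_header parameter are hypothetical names, the sample data is made up, and it assumes the first three tokens on each line are the X, Y and Z values.

```python
import io
import itertools

import numpy as np


def parse_positions(lines, skip_header=0):
    """Hypothetical helper: return an (N, 3) float array of X, Y, Z values."""
    rows = []
    for line in itertools.islice(lines, skip_header, None):
        # Turn the parentheses into spaces so "((-1.5" becomes "  -1.5".
        tokens = line.replace('(', ' ').replace(')', ' ').split()
        if not tokens:
            continue  # skip blank lines
        # Keep only the three coordinates; the trailing ID is ignored.
        rows.append([float(t) for t in tokens[:3]])
    return np.array(rows)


# Made-up sample standing in for an open file object.
sample = io.StringIO("header line\n(( 1.0 2.0 3.7) 5)\n((-1.5 0.0 2.25) 6)\n")
positions = parse_positions(sample, skip_header=1)
print(positions)
```

With a real file you would pass the open file object instead of the StringIO, for example inside a `with open(...) as f:` block, and set skip_header to the number of header lines.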

CodePudding user response：

Willian's answer is great; however, if you do not wish to load the text file and manipulate it beforehand, you can use the following one-liner:

arr = np.genfromtxt('blob.txt', usecols=(1, 2, 3), converters={i: lambda x: float(x.decode().strip('(').strip(')')) for i in [1, 2, 3]})

Basically, genfromtxt tries to read the data as bytes and convert it to the right format, so you can manipulate each value with converters as you wish.
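
A slightly more defensive variant of the converter, as a sketch. Whether a converter receives bytes or str depends on the NumPy version and the encoding argument, so the hypothetical helper strip_parens below accepts both. Note that this only helps when whitespace separates the parentheses from the numbers; a token such as "((-1" still shifts the columns.

```python
import numpy as np


def strip_parens(token):
    """Hypothetical converter: drop surrounding parentheses, return a float."""
    if isinstance(token, bytes):
        token = token.decode()
    return float(token.strip('()'))


# Made-up sample; every number is separated from "((" by a space.
lines = ["(( 1 2 3) 7)", "(( -1 2.5 3) 8)"]
arr = np.genfromtxt(lines, usecols=(1, 2, 3),
                    converters={i: strip_parens for i in (1, 2, 3)})
print(arr)
```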

CodePudding user response：

genfromtxt is meant for CSV-style files: a simple table with consistent rows and columns. Extra characters like () can mess it up, and so can inconsistent delimiters (white-space is the default).

With a sample

In [68]: txt = """(( 1 2 3) xxx)
    ...: (-1 2 3) yyy)
    ...: """
In [69]: np.genfromtxt(txt.splitlines(), usecols=(1,2,3))
Out[69]: 
array([[ 1.,  2., nan],
       [ 2., nan, nan]])

nan appears wherever a string can't be converted to a float.

Without usecols, we see that it gets different numbers of columns in each line.

In [70]: np.genfromtxt(txt.splitlines())
Traceback (most recent call last):
  File "<ipython-input-70-3a7e73045f73>", line 1, in <module>
    np.genfromtxt(txt.splitlines())
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py", line 2124, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #2 (got 4 columns instead of 5)

Here's what it's doing for each line:

In [71]: for row in txt.splitlines():print(row.split())
['((', '1', '2', '3)', 'xxx)']
['(-1', '2', '3)', 'yyy)']

You need to clean up the file before passing it to genfromtxt, or use your own parsing that handles the () the way you want.

With a clean file:

In [72]: txt = """1 2 3 xxx
    ...: -1 2 3 yyy
    ...: """
In [73]: np.genfromtxt(txt.splitlines(),usecols=(0,1,2))
Out[73]: 
array([[ 1.,  2.,  3.],
       [-1.,  2.,  3.]])
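
The cleanup step itself can be quite short. As a sketch (raw is made-up sample data, and replacing every parenthesis with a space is just one possible way to clean the lines):

```python
import numpy as np

# Made-up sample in the original format, including a negative X value
# that is glued to the opening parentheses.
raw = """(( 1 2 3) xxx)
((-1 2 3) yyy)
"""

# Replace each parenthesis with a space so every value is its own column.
cleaned = [line.replace('(', ' ').replace(')', ' ') for line in raw.splitlines()]

# After cleaning, the coordinates are columns 0, 1 and 2; the ID is column 3.
positions = np.genfromtxt(cleaned, usecols=(0, 1, 2))
print(positions)
```

For a file on disk, the same list comprehension can run over the open file object, with skip_header passed to genfromtxt as before.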