I have a text file with the following format:
(( X_value Y_value Z_value) ID)
I would like to read this into an array, and I have been partly able to do so with:
positions = np.genfromtxt(file, skip_header=N_header_lines, usecols=(1, 2, 3))
However, I run into a problem when the X_value is negative, which results in the following:
((-X_value Y_value Z_value) ID)
The problem is that NumPy now reads "((-X_value" as one column and does not separate the string from the float.
I hope I was able to convey my problem clearly. Does anyone know how to solve this?
CodePudding user response:
a = "(( -1 2.0 3.7) 5)"
a = a.replace('(','').replace(')','').split()
result = [float(i) for i in a]
print(result)
output:
[-1.0, 2.0, 3.7, 5.0]
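The same idea can be applied line by line to build the array directly. This is only a minimal sketch: the helper name parse_positions and the two sample lines are made up for illustration, and it assumes the header lines have already been skipped and that the first three fields of every line are X, Y and Z. Replacing the parentheses with spaces (rather than deleting them) means "((-1" splits the same way as "(( 1".

```python
import numpy as np

def parse_positions(lines):
    """Return an (N, 3) float array of X, Y, Z from lines like '((-1 2.0 3.7) 5)'.

    Hypothetical helper, not part of NumPy.
    """
    rows = []
    for line in lines:
        # Turn parentheses into spaces so the fields always split cleanly.
        fields = line.replace('(', ' ').replace(')', ' ').split()
        if len(fields) < 3:
            continue  # skip blank or too-short lines
        # Only the first three fields are converted; the ID is ignored.
        rows.append([float(v) for v in fields[:3]])
    return np.array(rows)

# Made-up sample data in the format from the question.
sample = ["(( 1 2.0 3.7) 5)", "((-1 2.0 3.7) 6)"]
positions = parse_positions(sample)
print(positions)
```

For a real file you could pass the open file object instead of the list, after skipping the header lines yourself.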
CodePudding user response:
Willian's answer is great; however, if you do not wish to load the text file and manipulate it beforehand, you can use the following one-liner:
arr = np.genfromtxt('blob.txt', usecols=(1,2,3),converters={i:lambda x: float(x.decode().strip('(').strip(')')) for i in [1,2,3]} )
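Basically, genfromtxt reads each field as bytes and then converts it to the right format, so you can manipulate the field with converters as you wish. Depending on the NumPy version and the encoding argument, the converter may receive str rather than bytes, in which case the .decode() call should be dropped. Note also that this relies on every row splitting into the same whitespace-separated columns; a row that starts with "((-X_value" has one field fewer, so the column indices shift.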
CodePudding user response:
genfromtxt is meant for csv files, which have consistent rows and columns: a simple table. Extra characters like () can mess it up, and so can inconsistent delimiters (white-space is the default).
With a sample:
In [68]: txt = """(( 1 2 3) xxx)
...: (-1 2 3) yyy)
...: """
In [69]: np.genfromtxt(txt.splitlines(), usecols=(1,2,3))
Out[69]:
array([[ 1.,  2., nan],
       [ 2., nan, nan]])
nan is returned for strings that can't be made into floats.
Without usecols, we see that it gets a different number of columns in each line.
In [70]: np.genfromtxt(txt.splitlines())
Traceback (most recent call last):
  File "<ipython-input-70-3a7e73045f73>", line 1, in <module>
    np.genfromtxt(txt.splitlines())
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py", line 2124, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #2 (got 4 columns instead of 5)
Here's what it's doing for each line:
In [71]: for row in txt.splitlines():print(row.split())
['((', '1', '2', '3)', 'xxx)']
['(-1', '2', '3)', 'yyy)']
You need to clean up the file before passing it to genfromtxt, or use your own parsing that can deal with the () as you want.
With a clean file:
In [72]: txt = """1 2 3 xxx
...: -1 2 3 yyy
...: """
In [73]: np.genfromtxt(txt.splitlines(),usecols=(0,1,2))
Out[73]:
array([[ 1.,  2.,  3.],
       [-1.,  2.,  3.]])
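One way to do that clean-up without writing a second file is to strip the parentheses on the fly and hand the cleaned lines to genfromtxt. This is a rough sketch rather than the only way to do it: strip_parentheses is a hypothetical helper, and the two raw lines are made-up samples in the format from the question. Once the parentheses are gone, the X, Y and Z values sit in columns 0, 1 and 2.

```python
import numpy as np

def strip_parentheses(lines):
    """Yield each line with '(' and ')' replaced by spaces (hypothetical helper)."""
    table = str.maketrans('()', '  ')
    for line in lines:
        yield line.translate(table)

# Made-up sample rows, one with a space before X and one with a negative X.
raw = ["(( 1 2 3) xxx)", "((-1 2 3) yyy)"]

# genfromtxt accepts a list of strings, each treated as one line.
cleaned = list(strip_parentheses(raw))
positions = np.genfromtxt(cleaned, usecols=(0, 1, 2))
print(positions)
```

With a real file, the same helper could wrap the open file object, and skip_header can still be passed to genfromtxt as in the question.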