My code is the following:
import numpy as np

file_ = open('file.txt', 'r')
lines = file_.readlines()
data = []
for row in lines:
    temp = row.split()
    data.append(np.array(temp).astype(np.float64))
I want to cast every item in the array to float EXCEPT the final one, which I want to remain a string.
How can I do this?
CodePudding user response:
No, there is no function to cast elements of the same array to different types. Unlike regular Python lists, numpy arrays are homogeneous and store elements with fixed physical record sizes, so each element of the array must always have the same type.
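To illustrate the homogeneity point, here is a small sketch (the variable names are made up for the example) showing that NumPy picks one common type when you mix floats and a string, whereas a plain list keeps each element's own type:

```python
import numpy as np

# A regular Python list keeps each element's own type.
row = [1.0, 2.5, "abc"]
list_types = [type(item).__name__ for item in row]

# NumPy has to pick a single dtype for the whole array, so the floats
# are converted to strings as well.
mixed = np.array(row)
array_kind = mixed.dtype.kind  # 'U' means a unicode string dtype
```

Calling astype(np.float64) on such an array fails as soon as it reaches the non-numeric element, which is why the last field has to be handled separately.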
You could handle the strings separately and parse only the numeric part into a numpy array:
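data = []
labels = []
for row in lines:
    temp = row.split()
    numbers = temp[:-1]    # everything except the last field
    stringbit = temp[-1]   # the last field stays a string
    data.append(np.array(numbers).astype(np.float64))
    labels.append(stringbit)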
Alternatively, if your data is very consistent and each line always has the same type structure, you might be able to use a more complex numpy dtype and numpy.genfromtxt to make each line an element of a larger array.
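A minimal sketch of that approach, assuming every line has exactly three floats followed by one string of at most 10 characters. The field names c1, c2, c3 and label, and the in-memory sample text standing in for file.txt, are invented for the example:

```python
import io

import numpy as np

# Stand-in for the contents of file.txt (hypothetical sample data).
text = "1.0 2.0 3.0 foo\n4.0 5.0 6.0 bar\n"

# One field per column: three floats and a string of up to 10 characters.
dtype = [("c1", np.float64), ("c2", np.float64), ("c3", np.float64), ("label", "U10")]

# Each line becomes one element of a 1-D structured array.
table = np.genfromtxt(io.StringIO(text), dtype=dtype, encoding="utf-8")

first_column = table["c1"]  # float64 array with one entry per line
labels = table["label"]     # string array with one entry per line
```

With a real file you would pass the file name instead of the io.StringIO object.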
You might also find a pandas.DataFrame fits better for working with this kind of heterogeneous data.
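As a rough sketch of the pandas route, assuming whitespace-separated lines with three floats and a trailing string (the column names and sample text are invented for the example):

```python
import io

import pandas as pd

# Stand-in for the contents of file.txt (hypothetical sample data).
text = "1.0 2.0 3.0 foo\n4.0 5.0 6.0 bar\n"

# Each column gets its own dtype, so floats and strings can coexist.
df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",      # split on runs of whitespace
    header=None,     # the file has no header row
    names=["c1", "c2", "c3", "label"],
)

floats = df[["c1", "c2", "c3"]].to_numpy()  # plain float64 NumPy array
labels = df["label"].tolist()
```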
A related question with useful details: NumPy array/matrix of mixed types
CodePudding user response:
You can use structured arrays (NumPy's recarrays build on these).
If your rows are records with similar data, you can create a custom dtype that does what you want. The requirement for a homogeneous datatype in this case is that the number of elements is constant and there is an upper bound on the number of characters in the final string.
Here is an example that assumes the string only holds ASCII characters:
max_len = 10
dtype = np.dtype([('c1', np.float64), ('c2', np.float64), ('c3', np.float64), ('str', f'S{max_len}')])
row = [(10.0, 1.2, 4.5, b'abc')]
result = np.array(row, dtype)
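To connect this with the file-reading loop from the question, here is a sketch that turns each split line into one record. It assumes three floats per line and an ASCII string of at most max_len characters; the sample lines stand in for file_.readlines():

```python
import numpy as np

# Stand-in for file_.readlines() (hypothetical sample data).
lines = ["10.0 1.2 4.5 abc\n", "7.5 0.1 2.2 defgh\n"]

max_len = 10
dtype = np.dtype([('c1', np.float64), ('c2', np.float64),
                  ('c3', np.float64), ('str', f'S{max_len}')])

records = []
for row in lines:
    temp = row.split()
    numbers = tuple(float(x) for x in temp[:-1])
    label = temp[-1].encode('ascii')  # the 'S' dtype stores bytes
    records.append(numbers + (label,))

result = np.array(records, dtype=dtype)

first_floats = result['c1']  # float64 array, one entry per line
strings = result['str']      # fixed-width byte strings
```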
If you don't want to name each float column separately, you can make that field a subarray:
dtype = np.dtype([('flt', np.float64, 3), ('str', f'S{max_len}')])
row = [([10.0, 1.2, 4.5], b'abc')]
If the strings are not of a known length, you can use the object dtype in that field and simply store a reference.
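A small sketch of the object-dtype variant, reusing the subarray layout and made-up sample rows. The string field then holds ordinary Python str objects of any length:

```python
import numpy as np

# The float field is a length-3 subarray; the string field stores references
# to arbitrary Python objects, so no maximum length is needed.
dtype = np.dtype([('flt', np.float64, 3), ('str', object)])

rows = [([10.0, 1.2, 4.5], 'abc'),
        ([7.5, 0.1, 2.2], 'a much longer string than before')]
result = np.array(rows, dtype=dtype)

floats = result['flt']   # shape (2, 3), dtype float64
strings = result['str']  # object array holding the original str objects
```

The trade-off is that object fields are not stored inline, so operations on that field run at Python speed rather than NumPy speed.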
Even though it's possible, you may find it simpler to just load the floats into one array and the strings into another. I generally find it simpler to work with arrays of a homogeneous built-in dtype than recarrays.