How can I encode a NumPy array of object elements into ASCII?

Suppose I have four lists of different data types and a 2D matrix, and I want to merge them column-wise.

For example, given the following data:

train_x_111 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
train_y_111 = ['abcd', 'bcde', 'cdef', 'defg', 'efgh', 'fghi', 'ghij', 'hijk', 'ijkl', 'jklm']
train_z_111 = [[0.0, 0.1, 0.2, 0.3],
 [0.1, 0.2, 0.3, 0.4],
 [0.2, 0.3, 0.4, 0.5],
 [0.3, 0.4, 0.5, 0.6],
 [0.4, 0.5, 0.6, 0.7],
 [0.5, 0.6, 0.7, 0.8],
 [0.6, 0.7, 0.8, 0.9],
 [0.7, 0.8, 0.9, 1.0],
 [0.8, 0.9, 1.0, 1.1],
 [0.9, 1.0, 1.1, 1.2]]

I want the following output in a text file:

1   a   abcd    0.0     0.1     0.2     0.3
2   b   bcde    0.1     0.2     0.3     0.4
3   c   cdef    0.2     0.3     0.4     0.5
4   d   defg    0.3     0.4     0.5     0.6
5   e   efgh    0.4     0.5     0.6     0.7
6   f   fghi    0.5     0.6     0.7     0.8
7   g   ghij    0.6     0.7     0.8     0.9
8   h   hijk    0.7     0.8     0.9     1.0
9   i   ijkl    0.8     0.9     1.0     1.1
0   j   jklm    0.9     1.0     1.1     1.2

source_code.py

import numpy as np

if __name__ == "__main__":
    train_x_111, train_y_111, train_z_111 = load_data() # load_data() returns three TF tensors

    features_data_int_2d = np.array(train_x_111, dtype=int)
    sum_int_1d = np.sum(features_data_int_2d, axis=1)
    sum_int_1d = sum_int_1d.reshape(-1, 1)

    sum_data_1d_obj = sum_int_1d.astype(np.object_)
    features_data_2d_obj = np.array(train_x_111, dtype=np.object_)
    classes_data_1d_obj = np.array(train_y_111, dtype=np.object_)
    classes_data_1d_obj = classes_data_1d_obj.reshape(10,1)
    classes_string_1d_obj = np.array(train_z_111, dtype=np.object_)
    classes_string_1d_obj = classes_string_1d_obj.reshape(10, 1)

    sum_matrix = np.concatenate((sum_data_1d_obj, classes_data_1d_obj), axis=-1)
    sum_matrix = np.concatenate((sum_matrix, classes_string_1d_obj), axis=-1)
    sum_matrix = np.concatenate((sum_matrix, features_data_int_2d), axis=-1)

    sum_matrix = sum_matrix.encode('ascii')
    print(sum_matrix)
    np.savetxt("my_file.txt", sum_matrix, fmt='%s', delimiter='\t')

Error output

C:\ProgramData\Miniconda3\python.exe C:/Users/pc/source/repos/my_project/data_hashing.py
Traceback (most recent call last):
  File "C:\Users\pc\source\repos\my_project\data_hashing.py", line 151, in <module>
    sum_matrix = sum_matrix.encode('ascii')
AttributeError: 'numpy.ndarray' object has no attribute 'encode'

Process finished with exit code 1

How can I encode a numpy array of object elements into ASCII?

CodePudding user response：

.encode('ascii') is a method of Python strings, not of NumPy arrays, so you should replace the problematic line with:

newArray = []
for row in sum_matrix:
    # Convert each element to str first, then encode it to ASCII bytes
    newArray.append([str(item).encode('ascii') for item in row])
sum_matrix = np.array(newArray)

This goes through your array, encodes each element, and then puts everything back into an array. There might be a way to vectorize the encode call, but I don't know how to do it.
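For what it's worth, NumPy does ship an element-wise encode, np.char.encode, which may serve as the vectorized version of the loop above. Here is a minimal sketch, assuming a small made-up object array in place of the real sum_matrix (the sample values are hypothetical):

```python
import numpy as np

# Hypothetical stand-in for the asker's object array (mixed ints, strings, floats).
sum_matrix = np.array([[1, 'a', 'abcd', 0.0],
                       [2, 'b', 'bcde', 0.1]], dtype=np.object_)

# np.char.encode expects a string array, so convert the objects to str first.
as_text = sum_matrix.astype(str)

# Encode every element to ASCII bytes in one call.
encoded = np.char.encode(as_text, encoding='ascii')

print(encoded.dtype.kind)  # 'S' (byte strings)
print(encoded[0, 1])       # b'a'
```

Note that the result holds byte strings, so on Python 3 writing it with fmt='%s' would most likely put the b'...' prefix into the text file. If the goal is only a readable tab-separated file, a plain string array is probably the better fit.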

CodePudding user response：

I solved the issue by adding the following line

sum_matrix = sum_matrix.astype('U')

in place of

sum_matrix = sum_matrix.encode('ascii')
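For anyone landing here later, this is roughly how the whole thing can fit together. astype('U') turns every element into a Unicode string (not ASCII bytes), which is what np.savetxt with fmt='%s' handles cleanly. This is only a sketch: it uses the first three rows of the sample data from the question, plain Python lists instead of the tensors returned by load_data(), and a temporary directory as an assumed output location.

```python
import os
import tempfile
import numpy as np

# First three rows of the sample data from the question.
letters = ['a', 'b', 'c']
words = ['abcd', 'bcde', 'cdef']
values = [[0.0, 0.1, 0.2, 0.3],
          [0.1, 0.2, 0.3, 0.4],
          [0.2, 0.3, 0.4, 0.5]]

# Build each column as an object array so mixed types can sit side by side.
index_col = np.arange(1, len(letters) + 1).astype(np.object_).reshape(-1, 1)
letter_col = np.array(letters, dtype=np.object_).reshape(-1, 1)
word_col = np.array(words, dtype=np.object_).reshape(-1, 1)
value_cols = np.array(values, dtype=np.object_)

sum_matrix = np.concatenate((index_col, letter_col, word_col, value_cols), axis=-1)

# Convert every element to a Unicode string instead of calling .encode().
sum_matrix = sum_matrix.astype('U')

# Write to a temporary directory (assumed location, adjust as needed).
path = os.path.join(tempfile.mkdtemp(), "my_file.txt")
np.savetxt(path, sum_matrix, fmt='%s', delimiter='\t')

with open(path) as fh:
    lines = fh.read().splitlines()
print(lines[0])
```

Each row comes out as tab-separated text, starting with the row number, then the letter, the word, and the four float values.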