saving and reading back numpy array

Time:02-18

I have an ndarray like the one below. I write it to a DataFrame, save the DataFrame as a pickle, read the pickle back, and then build a new array from it. Why does np.array_equal(my_array2, X_train) return False? I tried to debug this and wrote some code to understand the problem, but I am having a hard time.

How should I change the code so that both arrays match?

import numpy as np
import pandas as pd

X_train = np.array([[" I I want to know how much s it thank you"],
                    [" press any key to connect P Thank you Too <unk> I "]],
                   dtype='<U97064')
X_train
#array([[' I I want to know how much s it thank you'],
#       [' press any key to connect P Thank you Too <unk> I ']],
#      dtype='<U97064')

X_train[0]
#array([' I I want to know how much s it thank you'], dtype='<U97064')


df = pd.DataFrame(X_train, columns = ['Column_A'])


df.to_pickle('df.pkl')
df2 = pd.read_pickle('df.pkl')

my_array2= df2['Column_A'].to_numpy(dtype='<U97064')

np.array_equal(my_array2[0],X_train[0])
#False

np.array_equal(my_array2,X_train)
#False

The types of the two arrays match:

print (type(my_array2))
print (type(X_train))

#<class 'numpy.ndarray'>
#<class 'numpy.ndarray'>

but the types of their individual elements don't match:

#not sure why datatype of individual elements is different
print (type(my_array2[0]))
print (type(X_train[0]))
#<class 'numpy.str_'>
#<class 'numpy.ndarray'>

X_train.dtype
#dtype('<U97064')


type(X_train.dtype)
#numpy.dtype

CodePudding user response:

In your code, X_train[0] is itself an array while my_array2[0] is a string.

print(X_train[0])
>>[' I I want to know how much s it thank you']
print(my_array2[0])
>> I I want to know how much s it thank you
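The difference is easier to see by comparing shapes. Here is a minimal sketch that uses short stand-in strings rather than the original data (the names a_2d and a_1d are illustrative, not from the question):

```python
import numpy as np

# Stand-in data: a (2, 1) array like X_train and a (2,) array like my_array2.
a_2d = np.array([["foo"], ["bar"]])
a_1d = np.array(["foo", "bar"])

print(a_2d.shape, a_1d.shape)  # (2, 1) (2,)

# Indexing the 2-D array yields a 1-D array; indexing the 1-D array yields a scalar.
print(type(a_2d[0]), type(a_1d[0]))

# np.array_equal returns False whenever the shapes differ,
# even if the flattened contents are identical.
shapes_differ = not np.array_equal(a_2d, a_1d)
same_contents = np.array_equal(a_2d.ravel(), a_1d)
print(shapes_differ, same_contents)  # True True
```

So the pickle round trip most likely did not alter the strings; selecting a single column simply produced a 1-D result.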

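df2['Column_A'] is a one-dimensional Series, so to_numpy() returns an array of shape (2,), while X_train has shape (2,1). np.array_equal returns False whenever the shapes differ, even if the contents are the same. If you want my_array2 to have the same shape as X_train, add .reshape(2,1).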

my_array2= df2['Column_A'].to_numpy(dtype='<U97064').reshape(2,1)

print(np.array_equal(my_array2[0],X_train[0]))
>>True

print(np.array_equal(my_array2,X_train))
>>True
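Hard-coding (2,1) only works for exactly two rows. Below is a sketch of a more general round trip, assuming pandas and NumPy are installed. The temporary file location and the default (shorter) string dtype are illustrative choices, not part of the original question:

```python
import os
import tempfile

import numpy as np
import pandas as pd

X_train = np.array([[" I I want to know how much s it thank you"],
                    [" press any key to connect P Thank you Too <unk> I "]])

df = pd.DataFrame(X_train, columns=["Column_A"])

# Write the pickle to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df.pkl")
    df.to_pickle(path)
    df2 = pd.read_pickle(path)

# Option 1: restore the original shape instead of hard-coding (2, 1).
restored = df2["Column_A"].to_numpy(dtype=X_train.dtype).reshape(X_train.shape)

# Option 2: select with a list of columns so the result stays 2-D.
restored_2d = df2[["Column_A"]].to_numpy(dtype=X_train.dtype)

print(np.array_equal(restored, X_train))
print(np.array_equal(restored_2d, X_train))
```

Both options should compare equal to X_train. Option 2 avoids the reshape entirely because a DataFrame selection, unlike a Series, converts to a two-dimensional array.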